macOS · Agent · Claude Integration · RAG · MCP

Vidix

A native macOS application that embeds AI directly into your workflow. Select text or an image in any app, trigger a shortcut, and get AI-powered results without switching context. Built on a multi-agent architecture with RAG-powered recipes, MCP server integration, multi-provider support, and a strict privacy-first approach where zero data touches our servers.

Visit vidix.app
5+ AI providers supported
50+ built-in recipes & agents
0 data stored on our servers
3 output modes (replace, type, editor)
The Challenge

Eliminating the context switch tax.

Every time you use an AI tool, you pay a context switch tax: leave your application, open a browser, navigate to a chat interface, paste your content, wait for a response, copy the result, switch back to your app, paste it in. That's 8 steps minimum, repeated dozens of times per day. For knowledge workers — developers, writers, analysts, project managers — this friction adds up to hours of lost productivity weekly.

Existing solutions were either web-based (still requiring a switch) or locked to a single AI provider. None offered the deep system-level integration needed to truly make AI invisible: capturing content from any application, processing it through the right model with the right prompt, and delivering results exactly where you need them.

We needed to build a tool that lived at the OS level — accessible from every application via a single shortcut, supporting multiple AI providers and custom workflows, with an absolute commitment to privacy: no data stored, no intermediary servers, no tracking. Just AI where you work.

Architecture

A multi-agent system inside your Mac.

Vidix isn't a simple API wrapper. It's a coordinated system of specialized agents, each responsible for a different aspect of the AI workflow, orchestrated by a central engine that routes requests to the right agent with the right context.

Agent 01

Capture Agent

Interfaces with macOS Accessibility APIs to capture selected text, images, or screen regions from any running application. Handles the complexity of different app frameworks — native Cocoa, Electron, web views, Terminal — with fallback strategies for apps that don't expose standard accessibility hooks. Detects content type automatically and routes to the appropriate processing pipeline.
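The last step above, content-type detection, can be sketched as a small routing function. This is an illustrative sketch only: the enum cases, the code-marker heuristic, and all names are assumptions for the example, not Vidix's actual API (real capture goes through the Accessibility APIs described above).

```swift
import Foundation

// Hypothetical sketch of the Capture Agent's final routing step:
// given captured content, detect its type and pick a pipeline.
enum CapturedContent {
    case text(String)
    case image(Data)
}

enum Pipeline {
    case textProcessing, codeProcessing, visionProcessing
}

func route(_ content: CapturedContent) -> Pipeline {
    switch content {
    case .image:
        // Any image capture goes to the vision pipeline.
        return .visionProcessing
    case .text(let body):
        // Crude heuristic: multiple code-like markers suggest source code.
        let codeMarkers = ["{", "}", ";", "func ", "def ", "=>"]
        let hits = codeMarkers.filter { body.contains($0) }.count
        return hits >= 2 ? .codeProcessing : .textProcessing
    }
}
```

A production detector would be richer (language detection, file context from the active app), but the shape is the same: classify once, then hand off to a specialized pipeline.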

Agent 02

Router Agent

The orchestration layer. Receives captured content and determines how to process it: which recipe to apply, which AI provider to use, what system prompt to inject. Uses a RAG-indexed recipe library to match content type and user intent to the right processing pipeline. Handles provider failover — if Claude is unavailable, it can route to GPT or a local Ollama model based on user-defined fallback preferences.
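The failover behavior reduces to walking an ordered preference list until an available provider is found. A minimal sketch, assuming a simple availability check (the type names and the closure-based check are illustrative, not Vidix's real implementation):

```swift
import Foundation

// Providers the router can choose between.
enum Provider {
    case claude, gpt, gemini, ollama
}

struct RouterAgent {
    // User-defined fallback order, e.g. [.claude, .gpt, .ollama].
    var fallbackChain: [Provider]
    // Injected availability check (network reachability, API key present, etc.).
    var isAvailable: (Provider) -> Bool

    // Return the first available provider in the chain, or nil if none are up.
    func selectProvider() -> Provider? {
        fallbackChain.first(where: isAvailable)
    }
}
```

Injecting `isAvailable` as a closure keeps the routing logic testable without touching the network, which matters for an app that promises direct-to-provider calls only.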

Agent 03

Recipe Agents

Each recipe is effectively a mini-agent with its own system prompt, provider preference, temperature setting, and output format. Built-in recipes cover common use cases: "Improve writing," "Explain code," "Translate to Spanish," "Extract key points." Users create custom recipe agents without code — defining the prompt, selecting a provider, and assigning a keyboard shortcut. Each recipe agent manages its own conversation context.
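A recipe, as described, is essentially data: a prompt, a provider preference, sampling parameters, and a shortcut. A sketch of what that model might look like (field names are assumptions, not Vidix's actual schema):

```swift
import Foundation

// Illustrative recipe model: everything a recipe agent needs to run.
struct Recipe {
    let name: String
    let systemPrompt: String
    let preferredProvider: String   // e.g. "claude", "gpt", "ollama"
    let temperature: Double         // lower = more deterministic output
    let outputFormat: String        // e.g. "plain", "markdown"
    let shortcut: String?           // optional keyboard shortcut
}

// A built-in recipe expressed in this model.
let improveWriting = Recipe(
    name: "Improve Writing",
    systemPrompt: "Rewrite the user's text for clarity and tone. Preserve meaning.",
    preferredProvider: "claude",
    temperature: 0.3,
    outputFormat: "plain",
    shortcut: "⌥⇧I"
)
```

Because a recipe is plain data rather than code, "creating a recipe without code" is just filling in this struct from a form, which is also what makes the custom-recipe UI described below feasible.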

Agent 04

Vision Agent

Handles image inputs using Claude's vision capabilities and GPT-4V. Users can capture a screen region and ask questions about it, extract text from images (OCR), describe visual content, analyze charts and diagrams, or convert mockups into code. The agent automatically selects the best vision-capable provider based on the task and user's API key availability.

Agent 05

Output Agent

Manages how AI responses are delivered back to the user. Three modes: direct replacement (swaps selected text), character-by-character typing (for apps that block paste, like certain terminals and form fields), and editor mode (opens a markdown buffer where users can iterate — "make it shorter," "add bullet points" — before inserting). Handles formatting preservation and clipboard management.
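The three modes map naturally onto an enum, with mode selection driven by the target app's behavior and the user's intent. A hedged sketch (the selection rules here are assumptions for illustration):

```swift
import Foundation

// The three delivery modes described above.
enum OutputMode {
    case replace   // swap the selected text in place
    case typeOut   // simulate keystrokes for paste-blocking apps
    case editor    // open a markdown buffer for iteration first
}

// Pick a delivery mode: iteration requests win, then paste-blocking
// apps force keystroke simulation, otherwise replace directly.
func chooseMode(appBlocksPaste: Bool, userWantsIteration: Bool) -> OutputMode {
    if userWantsIteration { return .editor }
    return appBlocksPaste ? .typeOut : .replace
}
```

Keeping the decision in one pure function makes the fallback order explicit and easy to extend, for example with per-app overrides.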

Agent 06

MCP Bridge Agent

Integrates with Model Context Protocol (MCP) servers to extend Vidix's capabilities beyond text and image processing. Users can connect MCP servers for database queries, API calls, file system operations, and custom tools — all accessible through the same shortcut-driven interface. The bridge agent handles MCP server discovery, connection management, and tool routing.

RAG & Recipes

A knowledge-driven recipe engine.

The recipe system is powered by RAG, enabling intelligent recipe matching and context-aware suggestions that go beyond simple keyword search.

Semantic recipe search

The Palette (command launcher) uses vector search to match user queries to recipes. Typing "make this email more professional" finds the "Improve Writing" recipe even without exact keyword matches. The recipe library is indexed in a local vector store for instant retrieval.
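At its core, vector search over a small recipe library is a nearest-neighbor lookup on embedding vectors. A toy sketch with hand-made two-dimensional embeddings (the real app would embed queries with a model; the vectors and names below are fabricated for illustration):

```swift
import Foundation

// Cosine similarity between two embedding vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (normA * normB)
}

// Return the name of the recipe whose embedding is closest to the query.
func bestRecipe(query: [Double],
                index: [(name: String, embedding: [Double])]) -> String? {
    index.max(by: {
        cosineSimilarity(query, $0.embedding) < cosineSimilarity(query, $1.embedding)
    })?.name
}
```

With real embeddings, "make this email more professional" and "Improve Writing" land near each other in vector space even though they share no keywords, which is exactly the behavior the Palette relies on.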

Context-aware suggestions

The Router Agent analyzes captured content and suggests the most relevant recipes. Select code and it surfaces development-related recipes; select prose and it suggests writing recipes. Suggestions are ranked by content type, frequency of use, and the active application.
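Ranking by several weak signals can be sketched as a simple linear score. The signals match the three factors named above; the weights and field names are assumptions for the example, not Vidix's actual tuning:

```swift
import Foundation

// One candidate recipe with the three ranking signals described above.
struct RecipeCandidate {
    let name: String
    let matchesContentType: Bool   // e.g. code recipe for selected code
    let usageCount: Int            // how often the user has run it
    let matchesActiveApp: Bool     // e.g. dev recipes while in an IDE
}

// Weighted sum: content-type match dominates, frequency is capped,
// the active app acts as a tiebreaker.
func score(_ c: RecipeCandidate) -> Double {
    let typeSignal = c.matchesContentType ? 2.0 : 0.0
    let freqSignal = min(Double(c.usageCount) / 10.0, 1.0)
    let appSignal  = c.matchesActiveApp ? 0.5 : 0.0
    return typeSignal + freqSignal + appSignal
}

// Highest-scoring candidates first.
func rank(_ candidates: [RecipeCandidate]) -> [String] {
    candidates.sorted { score($0) > score($1) }.map(\.name)
}
```

Capping the frequency signal keeps a heavily used but irrelevant recipe from outranking the right one for the current selection.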

Custom recipe creation

Users build their own recipe agents without writing code: define a system prompt, choose a provider, set parameters, assign a shortcut. Custom recipes are automatically indexed into the RAG store for semantic search alongside built-in recipes.

Conversation context

The editor mode maintains conversation context, enabling iterative refinement. The system stores recent interaction history in a local RAG index, allowing follow-up prompts like "now translate that to French" to work seamlessly across sessions.

Privacy Architecture

Zero trust. Zero data retention.

Vidix was designed from the ground up with a non-negotiable privacy constraint: we never see, store, or process user data. The entire architecture enforces this at every level.

All processing happens locally on the user's Mac. When AI is needed, content routes directly from the application to the user's chosen provider — Claude, GPT, Gemini, or a local model via Ollama. No intermediary server, no proxy, no logging. Users bring their own API keys, stored in the macOS Keychain.

For users who can't send data to any external API, Ollama support enables fully offline AI processing with local models. The same recipes, the same interface, the same workflow — but nothing leaves the machine.

No intermediary servers — direct API calls only
API keys stored in macOS Keychain
No analytics, no telemetry, no tracking
Ollama support for fully offline operation
Local vector store for recipe indexing
Development

Built with Claude Code.

A native Swift application, with Claude Code driving the entire development workflow, from architecture design to App Store submission.

Swift + AI architecture

Used Claude Code's plan mode to design the agent architecture in Swift, mapping how native macOS APIs (Accessibility, Keychain, Pasteboard) would interface with the multi-provider AI layer.

Custom /build-recipe skill

A project-specific slash command that scaffolds new recipe agents: generates the prompt template, configures provider settings, creates test fixtures, and indexes the recipe into the local vector store.

CLAUDE.md for Swift conventions

Project rules enforce Swift naming conventions, ensure proper async/await patterns for API calls, mandate error handling for all provider interactions, and maintain privacy-first patterns across the codebase.

MCP integration testing

Claude Code's subagent capabilities enabled parallel development of the MCP Bridge Agent alongside the core application, testing server discovery and tool routing in isolation.

Tech Stack

The full system.

Native macOS

Swift · SwiftUI · Accessibility API · Keychain · Pasteboard · App Store

AI & Providers

Claude API · OpenAI API · Gemini API · Ollama · MCP Protocol · Local Vector Store

Development

Claude Code · CLAUDE.md Rules · Custom Skills · Markdown Rendering · RAG Pipeline · Agent Architecture

Need AI embedded in your workflow?

We build native applications with deep AI integration, multi-provider support, and privacy-first architecture. Let's talk about your use case.