Assistant

The assistant layer turns Voice Mode from a dictation tool into a voice-driven agent. The user speaks, the assistant thinks, and the reply is spoken back (and, when appropriate, drafted into the Slate window rather than pasted into other apps). The assistant is a Pro-only feature.

Providers

The user picks a provider in Dashboard → Assistant → Provider. Each has different strengths:

Claude — uses the Claude CLI installed on the user’s machine. Has full filesystem and tool access, so it can read files, run commands, and act as a coding agent. It is the most capable option, but it requires the CLI to be installed.
Codex — uses OpenAI’s Codex CLI. Also has filesystem and tool access for coding tasks.
Cursor — uses Cursor Agent through ACP with the user’s existing Cursor login.
OpenCode — uses OpenCode through ACP with the user’s configured providers.
Ollama — talks to a local Ollama server. Fully offline, no tools. Good for chat and quick questions.
Local MLX — uses a bundled MLX model. Fully offline, no tools. The lowest-setup option — works out of the box.

Claude, Codex, Cursor, and OpenCode are the right choice when the user wants the assistant to do things on their machine. Ollama and Local MLX are the right choice for privacy-first conversational use.
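
The split between the two provider groups can be pictured as a small lookup table. This is an illustrative sketch only: the Provider class, the PROVIDERS table, and the suggest helper are hypothetical names, not part of Voice Mode, and the flags simply restate what the provider list says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    has_tools: bool  # can act on the machine (read files, run commands)
    offline: bool    # described as "fully offline" in the provider list


# Hypothetical table restating the provider list; not a Voice Mode API.
PROVIDERS = [
    Provider("Claude", has_tools=True, offline=False),
    Provider("Codex", has_tools=True, offline=False),
    Provider("Cursor", has_tools=True, offline=False),
    Provider("OpenCode", has_tools=True, offline=False),
    Provider("Ollama", has_tools=False, offline=True),
    Provider("Local MLX", has_tools=False, offline=True),
]


def suggest(goal: str) -> list[str]:
    """Return provider names suited to a goal: 'act' or 'private'."""
    if goal == "act":
        return [p.name for p in PROVIDERS if p.has_tools]
    if goal == "private":
        return [p.name for p in PROVIDERS if p.offline]
    raise ValueError(f"unknown goal: {goal!r}")
```

Calling suggest("act") yields the four tool-capable providers, and suggest("private") yields the two on-device ones.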

Triggering

Two ways to start an assistant turn:

Assistant hotkey — a dedicated hotkey that always routes to the assistant.
Trigger word — while using the normal dictation hotkey, starting the sentence with the configured trigger word (e.g. “hey Claude”) routes that turn to the assistant. The rest of the sentence becomes the query.
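
The two trigger paths can be sketched as a single routing function. This is a minimal sketch under stated assumptions, not Voice Mode's actual implementation: route_turn and its parameters are hypothetical, and the matching rules (case-insensitive prefix, word boundary, stripping leading punctuation) are guesses at reasonable behavior.

```python
def route_turn(transcript: str,
               trigger_word: str = "hey Claude",
               via_assistant_hotkey: bool = False) -> tuple[str, str]:
    """Return (destination, text); destination is 'assistant' or 'dictation'."""
    text = transcript.strip()

    # The dedicated assistant hotkey always routes to the assistant.
    if via_assistant_hotkey:
        return ("assistant", text)

    # Assumed rule: the trigger word must open the sentence, ignoring case.
    prefix = text[:len(trigger_word)]
    if prefix.lower() == trigger_word.lower():
        rest = text[len(trigger_word):]
        # Require a word boundary so "hey Claudette" does not match.
        if not rest or not rest[0].isalnum():
            # The rest of the sentence becomes the query.
            return ("assistant", rest.lstrip(" ,.:;!?-"))

    # Anything else stays ordinary dictation.
    return ("dictation", text)
```

For example, route_turn("Hey Claude, summarize this file") routes "summarize this file" to the assistant, while a sentence that does not open with the trigger word is returned unchanged as dictation.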

Responses

Replies are spoken via TTS in the active persona’s voice.
When Slate is enabled and open, drafted content (an email, a code snippet, a structured doc) streams into the Slate window as proper markdown while the spoken reply stays the primary channel. The assistant doesn’t paste into other apps — drafted content lives on the Slate, where the user can read, edit, and save it.

Cancelling mid-response

The user can say the configured cancel phrase to stop generation and any in-flight TTS playback. This is useful when the assistant starts down the wrong path or the reply is longer than needed.
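
Conceptually, cancelling means one shared flag that both generation and playback check between chunks. The sketch below illustrates that idea only; AssistantTurn is a hypothetical class, and appending to a list stands in for real TTS playback.

```python
import threading


class AssistantTurn:
    """Hypothetical model of one assistant reply that can be cancelled."""

    def __init__(self) -> None:
        self._cancelled = threading.Event()
        self.spoken: list[str] = []  # stand-in for audio sent to TTS

    def cancel(self) -> None:
        # Would be called when the cancel phrase is recognized.
        self._cancelled.set()

    def run(self, chunks) -> str:
        for chunk in chunks:
            # Check the flag before speaking each chunk, so both
            # generation and playback stop at the next boundary.
            if self._cancelled.is_set():
                break
            self.spoken.append(chunk)
        return "".join(self.spoken)
```

Because the flag is checked per chunk, a cancel takes effect at the next chunk boundary rather than after the whole reply has been generated.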

Personas

Personas shape how the assistant speaks and writes. See personas.md.

Voice Mode’s own tools

When the backend is Claude, Codex, Cursor, or OpenCode and the user has connected Voice Mode in Dashboard → Assistant → Tools, Voice Mode itself appears as an MCP server. Use those tools to answer questions about the user’s current state (e.g. current_settings, list_dictionary, list_augments, super_voice_mode_version) instead of guessing or reading config files. See tools.md for the full list and the rule against editing Voice Mode’s config files directly.
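
For orientation, an MCP tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a message for one of the tools named above. The build_tool_call helper is hypothetical, and the empty arguments object is an assumption, since the tools' parameters are not described here; in practice the provider's MCP client constructs these messages.

```python
import json
from typing import Optional


def build_tool_call(tool_name: str,
                    arguments: Optional[dict] = None,
                    request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Assumption: the tool takes no arguments; see tools.md.
            "arguments": arguments or {},
        },
    }
    return json.dumps(payload)


# Ask Voice Mode for the user's current settings instead of guessing.
request = build_tool_call("current_settings")
```

The same shape applies to list_dictionary, list_augments, and super_voice_mode_version; only the name field changes.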

Pro tips

Switch backends per task. Claude, Codex, Cursor, or OpenCode when the user wants the assistant to actually do something on their machine (read a file, run a command, edit code). Ollama or local MLX for chat that should stay fully on-device. The same query can give very different results — pick the backend that matches the goal, not just availability.
Cancel early when the reply is going wrong. The cancel phrase stops generation and any in-flight TTS. Faster than waiting it out and cheaper on tokens for the cloud backends.
Pair with the Concise augment for fast spoken replies. TTS can drag if the assistant gives a paragraph when the user wanted a sentence. Concise plus a normal persona gets short, useful answers fast.
For coding work, lean into Claude, Codex, Cursor, or OpenCode tools. They can read the user’s repo, run tests, and apply edits — by voice. Ollama and local MLX can talk about code but can’t touch it.
The Voice Mode Help augment exists for “how does this work” turns. Toggle it on, ask the question, toggle it off. Don’t leave it on full time — it occupies a slot.