Getting Started
Voice Mode is a macOS menu-bar app for voice dictation and a voice assistant. Everything that can run locally does — audio, transcription, and correction all happen on the user’s machine and never leave it. The optional assistant layer can route to local models (Ollama, MLX) for a fully offline experience or to Claude / Codex for more capable responses.
The dictation flow
- User holds the dictation hotkey and speaks.
- Voice Mode captures audio and trims silence automatically.
- On-device speech-to-text transcribes what was said.
- The Dictionary fixes specific words or phrases the user has configured.
- An optional on-device LLM cleans up punctuation, capitalization, and obvious transcription errors.
- The final text is pasted into whatever app has focus.
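The steps above can be sketched as a small pipeline. This is only an illustration, not Voice Mode's actual implementation: the function names (`trim_silence`, `apply_dictionary`, `dictate`), the amplitude threshold, and the idea of passing the speech-to-text and correction stages in as callables are all assumptions made for the example.

```python
from __future__ import annotations

import re
from typing import Callable, Optional, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples quieter than the (assumed) threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0] : loud[-1] + 1])


def apply_dictionary(text: str, dictionary: dict[str, str]) -> str:
    """Replace each configured word or phrase, matching whole words case-insensitively."""
    for wrong, right in dictionary.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps backslashes in `right` literal.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


def dictate(
    samples: Sequence[float],
    transcribe: Callable[[list[float]], str],
    dictionary: dict[str, str],
    correct: Optional[Callable[[str], str]] = None,
) -> str:
    """Run the dictation stages in order and return the text to paste."""
    audio = trim_silence(samples)
    if not audio:
        return ""  # nothing but silence was captured
    text = transcribe(audio)                   # on-device speech-to-text (stand-in)
    text = apply_dictionary(text, dictionary)  # user-configured fixes
    if correct is not None:                    # optional on-device LLM cleanup
        text = correct(text)
    return text  # pasting into the focused app is left to the caller
```

In this sketch the Dictionary runs before the optional cleanup step, mirroring the order of the list above, so user-configured fixes are already in place when the LLM sees the text.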
The assistant flow
If the user has Pro and either holds the assistant hotkey or starts their sentence with the assistant trigger word, Voice Mode routes the transcript to the assistant backend instead of pasting. The assistant can speak its reply back via TTS and, for drafted content, can paste too.
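The routing rule in this section can be written as a single decision function. The sketch below is hypothetical: `Settings`, `route`, and the single-word trigger are assumptions for illustration, and the example trigger word "computer" is made up. It also assumes that a transcript which does not qualify for the assistant simply falls through to normal dictation.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    has_pro: bool        # whether the user has a Pro license
    trigger_word: str    # assumed to be a single word in this sketch


def route(transcript: str, assistant_hotkey_held: bool, settings: Settings) -> str:
    """Return "assistant" or "paste" for a finished transcript."""
    if not settings.has_pro:
        return "paste"  # assistant routing requires Pro

    words = transcript.strip().split()
    # Compare the first word without trailing punctuation, ignoring case.
    first = words[0].strip(".,!?:;").lower() if words else ""
    starts_with_trigger = first == settings.trigger_word.lower()

    if assistant_hotkey_held or starts_with_trigger:
        return "assistant"
    return "paste"
```

For example, with a Pro license and the trigger word "computer", `route("Computer, what time is it", False, settings)` returns `"assistant"`, while the same sentence without Pro returns `"paste"`.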
Privacy
Dictation is fully local. Audio capture, voice activity detection, speech-to-text, dictionary substitutions, and AI Correction (when enabled) all run on the user’s machine. In the dictation flow, neither the audio nor the transcript leaves the Mac.
The assistant flow is different: it depends on which backend the user chose. Local backends (Ollama, local MLX) stay on-device. Claude and Codex backends route to those services through their respective CLIs — when the user picks one of those, their query and the assistant’s reply go to that provider, the same as if the user had typed the question into the CLI by hand.
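The privacy rules above reduce to a small lookup: dictation never leaves the machine, and the assistant flow depends on the chosen backend. The following is a sketch under that reading; the `Backend` enum, its string values, and `data_leaves_device` are illustrative names, not identifiers from Voice Mode.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    OLLAMA = "ollama"
    MLX = "mlx"
    CLAUDE = "claude"
    CODEX = "codex"


# Backends the section describes as staying on-device.
LOCAL_BACKENDS = frozenset({Backend.OLLAMA, Backend.MLX})


def data_leaves_device(flow: str, backend: Optional[Backend] = None) -> bool:
    """Return True if the user's query and the reply go to an outside provider."""
    if flow == "dictation":
        return False  # dictation is fully local regardless of backend
    if flow == "assistant":
        if backend is None:
            raise ValueError("the assistant flow needs a backend")
        return backend not in LOCAL_BACKENDS
    raise ValueError(f"unknown flow: {flow!r}")
```

A helper like this makes it easy to answer "does this leave my Mac?" from the user's current settings instead of from memory.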
Updates
Voice Mode updates itself when a new version is released. The user is prompted before any update is applied; nothing installs silently.
What the user typically asks
- “Why didn’t it type anything?” → troubleshooting.md
- “How do I change the hotkey?” → hotkeys.md
- “How do I stop it mishearing X as Y?” → replacements.md
- “How do I talk to Claude/Codex with my voice?” → assistant.md
- “Can I change its personality?” → personas.md
- “What’s free vs paid?” → tiers.md
- “How do I activate Pro / find my license?” → licensing.md
- “What’s an augment?” → augments.md
- “What’s my current setup?” / “What backend am I on?” → tools.md (call current_settings rather than guessing)