Voicemode voicemode
Voice interaction for Claude Code. Use when users mention voice mode, speak, talk, converse, voice status, or voice troubleshooting.
git clone https://github.com/mbailey/voicemode
T=$(mktemp -d) && git clone --depth=1 https://github.com/mbailey/voicemode "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/voicemode" ~/.claude/skills/mbailey-voicemode-voicemode && rm -rf "$T"
.claude/skills/voicemode/SKILL.md
First-Time Setup
If VoiceMode isn't working or MCP fails to connect, run:
/voicemode:install
After install, reconnect MCP:
/mcp → select voicemode → "Reconnect" (or restart Claude Code).
VoiceMode
Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).
Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).
When to Use MCP vs CLI
| Task | Use | Why |
|---|---|---|
| Voice conversations | MCP | Faster - server already running |
| Service start/stop | MCP | Works within Claude Code |
| Installation | CLI | One-time setup |
| Configuration | CLI | Edit settings directly |
| Diagnostics | CLI | Administrative tasks |
Usage
Use the converse MCP tool to speak to users and hear their responses:

    # Speak and listen for response (most common usage)
    voicemode:converse("Hello! What would you like to work on?")

    # Speak without waiting (for narration while working)
    voicemode:converse("Searching the codebase now...", wait_for_response=False)
For most conversations, just pass your message - defaults handle everything else. Use default converse tool parameters unless there's a good reason not to. Timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection - don't override unless the user requests it or you see a clear need. Defaults are configurable by the user via ~/.voicemode/voicemode.env.
| Parameter | Default | Description |
|---|---|---|
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |
For all parameters, see Converse Parameters.
Best Practices
- Narrate without waiting - Use wait_for_response=False when announcing actions
- One question at a time - Don't bundle multiple questions in voice mode
- Check status first - Verify services are running before starting conversations
- Let VoiceMode auto-select - Don't hardcode providers unless user has preference
- First run is slow - Model downloads happen on first start (2-5 min); subsequent starts are near-instant
Parallel Tool Calls (Zero Dead Air)
When performing actions during a voice conversation, use parallel tool calls to eliminate dead air. Send the voice message and the action in the same turn so they execute concurrently.
Pattern: Speak + Act in Parallel
    # FAST: One turn, voice and action fire simultaneously
    # Turn 1: speak (fire-and-forget) + do the work (all parallel)
    voicemode:converse("Checking that now.", wait_for_response=False)
    bash("git status")
    Agent(prompt="Research X", run_in_background=True)

    # Turn 2: speak the results (with listening)
    voicemode:converse("Here's what I found: ...", wait_for_response=True)

    # SLOW: Two turns, unnecessary sequential delay
    # Turn 1: speak
    voicemode:converse("Checking that now.", wait_for_response=False)

    # Turn 2: do the work
    bash("git status")

    # Turn 3: speak results
    voicemode:converse("Here's what I found: ...", wait_for_response=True)
When to Use Parallel vs Sequential
| Scenario | Approach | Why |
|---|---|---|
| Announce + do work | Parallel | No dependency between speech and action |
| Announce + spawn agent | Parallel | Agent runs in background anyway |
| Check result then report | Sequential | Need result before speaking |
| Listen for response | Sequential | wait_for_response=True blocks until user finishes |
Key Rules
- All tool types can be parallel: MCP, Bash, Agent, Read — mix freely in one turn
- Wall-clock time = longest call, not the sum of all calls
- Use wait_for_response=False for the speak call when combining with other tools
- Great for demos: Audience hears continuous speech with no awkward silences
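The "longest call, not the sum" rule can be illustrated with a small timing sketch. This is not VoiceMode code: `fake_tool` is a hypothetical stand-in that just sleeps, and the durations are invented for illustration. It only shows how concurrent calls compare with sequential ones.

```python
import asyncio
import time


async def fake_tool(name, seconds):
    """Hypothetical stand-in for a tool call (speech, shell, agent) lasting `seconds`."""
    await asyncio.sleep(seconds)
    return name


async def run_parallel(calls):
    """Start every call at once; elapsed time tracks the slowest call."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_tool(n, s) for n, s in calls.items()))
    return list(results), time.perf_counter() - start


async def run_sequential(calls):
    """Run calls one after another; elapsed time is the sum of all calls."""
    start = time.perf_counter()
    results = [await fake_tool(n, s) for n, s in calls.items()]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    calls = {"speak": 0.2, "git_status": 0.3}  # made-up durations in seconds
    _, parallel_time = asyncio.run(run_parallel(calls))
    _, sequential_time = asyncio.run(run_sequential(calls))
    print(f"parallel: {parallel_time:.2f}s, sequential: {sequential_time:.2f}s")
```

With these made-up durations the parallel run should take roughly as long as the slower call, while the sequential run takes roughly both durations added together.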
Handling Pauses and Wait Requests
When the user asks you to wait or give them time:
Short pauses (up to 60 seconds): If the user's utterance ends with "wait" (e.g., "hang on, wait", "give me a sec, wait", or just "wait"), VoiceMode automatically pauses for 60 seconds, then resumes listening. This is built in.
Longer pauses (2+ minutes): Use bash sleep N where N is seconds. For example, if the user says "give me 5 minutes":
sleep 300 # Wait 5 minutes
Then call converse again when the wait is over:
voicemode:converse("Five minutes is up. Ready when you are.")
Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
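As a rough sketch of the decision above, the helper below turns a spoken request into the `sleep N` command to run, or returns `None` when the built-in pause should be enough. The function names and the regular expression are hypothetical, not part of VoiceMode, and the sketch only understands durations written with digits ("5 minutes", not "five minutes").

```python
import re
from typing import Optional

# Default short-pause length described above (VOICEMODE_WAIT_DURATION).
BUILT_IN_PAUSE_SECONDS = 60

_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600}
_DURATION = re.compile(r"(\d+)\s*(second|sec|minute|min|hour)s?\b")


def requested_wait_seconds(utterance: str) -> Optional[int]:
    """Return the explicit duration in seconds, or None if none is mentioned."""
    match = _DURATION.search(utterance.lower())
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def sleep_command_for(utterance: str) -> Optional[str]:
    """Return a `sleep N` command for long waits, or None to rely on the built-in pause."""
    seconds = requested_wait_seconds(utterance)
    if seconds is None or seconds <= BUILT_IN_PAUSE_SECONDS:
        return None
    return f"sleep {seconds}"


if __name__ == "__main__":
    print(sleep_command_for("give me 5 minutes"))  # long wait: a sleep command
    print(sleep_command_for("hang on, wait"))      # short wait: None
```

Treating anything at or below the default pause length as "short" is an assumption of this sketch; adjust the threshold if VOICEMODE_WAIT_DURATION has been changed.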
STT Recovery - Manual Transcription
If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:
    # Transcribe the most recent recording
    whisper-cli ~/.voicemode/audio/latest-STT.wav

    # Or check if file exists first (safe for inclusion in automation)
    if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
      whisper-cli ~/.voicemode/audio/latest-STT.wav
    fi
Requirements:
- Audio saving must be enabled via one of:
  - VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
  - VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
  - VOICEMODE_DEBUG=true (enables debug mode with audio saving)
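A quick way to check the requirement above is to test whether any of the three flags is set. The sketch below works on a plain dictionary of settings (for example `os.environ`); the helper name is hypothetical, and treating only the value `true` (case-insensitive) as enabled is an assumption, since VoiceMode may accept other spellings.

```python
import os

# The three settings named above; any one of them enables audio saving.
SAVE_FLAGS = ("VOICEMODE_SAVE_AUDIO", "VOICEMODE_SAVE_ALL", "VOICEMODE_DEBUG")


def audio_saving_enabled(settings) -> bool:
    """Return True if at least one audio-saving flag is set to "true"."""
    return any(
        str(settings.get(flag, "")).strip().lower() == "true"
        for flag in SAVE_FLAGS
    )


if __name__ == "__main__":
    # Checks the current process environment only, not voicemode.env itself.
    print("audio saving enabled:", audio_saving_enabled(os.environ))
```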
How it works:
- VoiceMode saves all STT recordings to ~/.voicemode/audio/ with timestamps
- The latest-STT.wav symlink always points to the most recent recording
- If the STT API fails, the recording is still saved for manual recovery
- This lets you recover the user's speech without asking them to repeat
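The recovery flow above can be sketched in Python: resolve the `latest-STT.wav` symlink, confirm the recording exists, and only then build the `whisper-cli` command. The helper names are hypothetical; the sketch builds the command without running it, so it is safe to try without Whisper installed.

```python
from pathlib import Path
from typing import List, Optional

DEFAULT_AUDIO_DIR = Path.home() / ".voicemode" / "audio"


def latest_recording(audio_dir: Path = DEFAULT_AUDIO_DIR) -> Optional[Path]:
    """Return the file behind latest-STT.wav, or None if it is missing or dangling."""
    link = audio_dir / "latest-STT.wav"
    if not link.exists():  # exists() follows symlinks, so a dangling link is False
        return None
    return link.resolve()


def transcribe_command(audio_dir: Path = DEFAULT_AUDIO_DIR) -> Optional[List[str]]:
    """Build the whisper-cli invocation for the latest recording, if there is one."""
    recording = latest_recording(audio_dir)
    if recording is None:
        return None
    return ["whisper-cli", str(recording)]


if __name__ == "__main__":
    command = transcribe_command()
    print(command if command else "No saved recording found; is audio saving enabled?")
```

The returned list can be passed to `subprocess.run` when `whisper-cli` is on the PATH.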
When to use:
- STT service timeout or connection failure
- Transcription returned empty but user definitely spoke
- Need to verify what was actually said vs. what was transcribed
See also: Troubleshooting - No Speech Detected
Check Status
    voicemode service status          # All services
    voicemode service status whisper  # Specific service
Shows service status including running state, ports, and health.
Installation
    # Install VoiceMode CLI and configure services
    uvx voice-mode-install --yes

    # Install local services (Apple Silicon recommended)
    voicemode service install whisper
    voicemode service install kokoro
See Getting Started for detailed steps.
Service Management
    # Start/stop services
    voicemode:service("whisper", "start")
    voicemode:service("kokoro", "start")

    # View logs for troubleshooting
    voicemode:service("whisper", "logs", lines=50)
| Service | Port | Purpose |
|---|---|---|
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |
Actions: status, start, stop, restart, logs, enable, disable
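When the CLI is not at hand, a rough liveness check is to see whether anything is listening on the ports in the table above. The sketch below is an assumption-laden shortcut with hypothetical helper names: an open port only shows that some process accepts connections there, not that the service is healthy, so `voicemode service status` remains the better check.

```python
import socket

# Ports taken from the service table above.
SERVICE_PORTS = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_report(ports=SERVICE_PORTS) -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, listening in service_report().items():
        print(f"{name:10s} {'listening' if listening else 'not reachable'}")
```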
Configuration
    voicemode config list                          # Show all settings
    voicemode config set VOICEMODE_TTS_VOICE nova  # Set default voice
    voicemode config edit                          # Edit config file

Config file: ~/.voicemode/voicemode.env
See Configuration Guide for all options.
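For scripts that need to read the current settings, the config file can be parsed directly. The sketch below assumes `voicemode.env` holds simple `KEY=VALUE` lines with optional `#` comment lines and an optional `export` prefix; that layout is an assumption, the function names are hypothetical, and inline comments after a value are not handled.

```python
from pathlib import Path

DEFAULT_ENV_PATH = Path.home() / ".voicemode" / "voicemode.env"


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and lines starting with '#'."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_voicemode_env(path: Path = DEFAULT_ENV_PATH) -> dict:
    """Read the config file if it exists; return an empty dict otherwise."""
    return parse_env_text(path.read_text()) if path.exists() else {}


if __name__ == "__main__":
    for key, value in sorted(load_voicemode_env().items()):
        print(f"{key}={value}")
```

For changing values, `voicemode config set` is the safer route, since it is the documented interface.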
DJ Mode
Background music during VoiceMode sessions with track-level control.
    # Core playback
    voicemode dj play /path/to/music.mp3  # Play a file or URL
    voicemode dj status                   # What's playing
    voicemode dj pause                    # Pause playback
    voicemode dj resume                   # Resume playback
    voicemode dj stop                     # Stop playback

    # Navigation and volume
    voicemode dj next       # Skip to next chapter
    voicemode dj prev       # Go to previous chapter
    voicemode dj volume 30  # Set volume to 30%

    # Music For Programming
    voicemode dj mfp list     # List available episodes
    voicemode dj mfp play 49  # Play episode 49
    voicemode dj mfp sync     # Convert CUE files to chapters

    # Music library
    voicemode dj find "daft punk"  # Search library
    voicemode dj library scan      # Index ~/Audio/music
    voicemode dj library stats     # Show library info

    # Play history and favorites
    voicemode dj history   # Show recent plays
    voicemode dj favorite  # Toggle favorite on current track

Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).
CLI Cheat Sheet
    # Service management
    voicemode service status         # All services
    voicemode service start whisper  # Start a service
    voicemode service logs kokoro    # View logs

    # Diagnostics
    voicemode deps          # Check dependencies
    voicemode diag info     # System info
    voicemode diag devices  # Audio devices

    # DJ Mode
    voicemode dj play <file|url>  # Start playback
    voicemode dj status           # What's playing
    voicemode dj next/prev        # Navigate chapters
    voicemode dj stop             # Stop playback
    voicemode dj mfp play 49      # Music For Programming
Voice Handoff Between Agents
Transfer voice conversations between Claude Code agents for multi-agent workflows.
Use cases:
- Personal assistant routing to project-specific foremen
- Foremen delegating to workers for focused tasks
- Returning control when work is complete
Quick Reference
    # 1. Announce the transfer
    voicemode:converse("Transferring you to a project agent.", wait_for_response=False)

    # 2. Spawn with voice instructions (mechanism depends on your setup)
    spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")

    # 3. Go quiet - let new agent take over
Hand-back:
    voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
    # Stop conversing, exit or go idle
Key Principles
- Announce transfers: Always tell the user before transferring
- One speaker: Only one agent should use converse at a time
- Distinct voices: Different voices make handoffs audible
- Provide context: Tell receiving agent why user is being transferred
Auto-focus tmux pane on speak (opt-in)
When you run multiple voice agents in separate tmux panes, set
VOICEMODE_AUTO_FOCUS_PANE=true to make tmux follow the speaker. Focus
switches after conch acquisition, so agents waiting on the conch never
steal focus -- only the agent about to speak does. It also respects the
~/.voicemode/focus-hold sentinel written by the show-me plugin, so a
file you just opened stays on screen for its hold window.
    # ~/.voicemode/voicemode.env
    VOICEMODE_AUTO_FOCUS_PANE=true
Off by default. Silent no-op outside tmux.
Detailed Documentation
See Call Routing for comprehensive guides:
- Handoff Pattern - Complete hand-off and hand-back process
- Voice Proxy - Relay pattern for agents without voice
- Call Routing Overview - All routing patterns
Sharing Voice Services Over Tailscale
Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.
Why
- Browsers require HTTPS for microphone access (e.g., VoiceMode Connect web app)
- Tailscale serve provides automatic HTTPS with valid Let's Encrypt certificates for *.ts.net domains
- Enables using your powerful local machine's GPU from any device on your Tailnet
Setup
    # Expose TTS (Kokoro on port 8880)
    tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech

    # Expose STT (Whisper on port 2022)
    tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions

    # Verify configuration
    tailscale serve status

    # Reset all serve config
    tailscale serve reset
Endpoints
After setup, endpoints are available at:
- TTS: https://<hostname>.<tailnet>.ts.net/v1/audio/speech
- STT: https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions
Important Notes
- Path mapping: Tailscale strips the incoming path before forwarding, so you MUST include the full path in the target URL
- Same-machine testing: Traffic doesn't route through Tailscale locally — test from another Tailnet device
- Multiple paths: You can configure different paths to different backends on the same or different machines
- CORS: Kokoro has CORS configured to allow https://app.voicemode.dev origins
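The path-mapping note is easy to get wrong by hand, so here is a small sketch that generates the `tailscale serve` commands from a route table, repeating the path in the target URL each time. The helper names and the example hostname and tailnet are hypothetical; the flags and ports are the ones used in the Setup commands, and nothing is executed.

```python
# Route table: public path -> local port, taken from the Setup commands.
ROUTES = {
    "/v1/audio/speech": 8880,          # Kokoro TTS
    "/v1/audio/transcriptions": 2022,  # Whisper STT
}


def serve_command(path: str, port: int) -> list:
    """Build one `tailscale serve` command with the full path repeated in the target."""
    target = f"http://localhost:{port}{path}"
    return ["tailscale", "serve", "--bg", "--set-path", path, target]


def public_url(hostname: str, tailnet: str, path: str) -> str:
    """Build the HTTPS endpoint other Tailnet devices would call."""
    return f"https://{hostname}.{tailnet}.ts.net{path}"


if __name__ == "__main__":
    for path, port in ROUTES.items():
        print(" ".join(serve_command(path, port)))
        # "mybox" and "example" are placeholder names, not real hosts.
        print("  ->", public_url("mybox", "example", path))
```

Generating both the command and the public URL from the same table keeps the two in sync when a route is added or a port changes.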
Use with VoiceMode Connect
In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:
- TTS Endpoint: https://<hostname>.<tailnet>.ts.net
- STT Endpoint: https://<hostname>.<tailnet>.ts.net
Soundfonts
Audio feedback tones that play during Claude Code tool use. Toggle with voicemode soundfonts on/off. See Soundfonts Guide.
Documentation Index
| Topic | Link |
|---|---|
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| Soundfonts | Soundfonts Guide |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |
Related Skills
- VoiceMode Connect - Remote voice via mobile/web clients (no local STT/TTS needed)