VoiceMode

Voice interaction for Claude Code. Use when users mention voice mode, speak, talk, converse, voice status, or voice troubleshooting.

Install

Source: clone the upstream repo.

git clone https://github.com/mbailey/voicemode

Claude Code: install into ~/.claude/skills/.

T=$(mktemp -d) && git clone --depth=1 https://github.com/mbailey/voicemode "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/voicemode" ~/.claude/skills/mbailey-voicemode-voicemode && rm -rf "$T"

Manifest: .claude/skills/voicemode/SKILL.md

First-Time Setup

If VoiceMode isn't working or MCP fails to connect, run:

/voicemode:install

After install, reconnect MCP:

/mcp
→ select voicemode → "Reconnect" (or restart Claude Code).


VoiceMode

Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).

Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).

When to Use MCP vs CLI

| Task | Use | Why |
| --- | --- | --- |
| Voice conversations | MCP (voicemode:converse) | Faster - server already running |
| Service start/stop | MCP (voicemode:service) | Works within Claude Code |
| Installation | CLI (voice-mode-install) | One-time setup |
| Configuration | CLI (voicemode config) | Edit settings directly |
| Diagnostics | CLI (voicemode diag) | Administrative tasks |

Usage

Use the converse MCP tool to speak to users and hear their responses:

# Speak and listen for response (most common usage)
voicemode:converse("Hello! What would you like to work on?")

# Speak without waiting (for narration while working)
voicemode:converse("Searching the codebase now...", wait_for_response=False)

For most conversations, just pass your message and let the defaults handle everything else; override a converse parameter only when there is a good reason. The timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection, so don't override them unless the user asks or you see a clear need. Users can change the defaults in ~/.voicemode/voicemode.env.

| Parameter | Default | Description |
| --- | --- | --- |
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |

For all parameters, see Converse Parameters.
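
As a rough illustration of "pass the message and let the defaults do the rest", the sketch below builds the argument payload for a converse call and includes a parameter only when it differs from the defaults in the table above. The helper name converse_args is hypothetical and not part of VoiceMode; only the three parameter names come from the table.

```python
from typing import Any, Dict, Optional


def converse_args(
    message: str,
    wait_for_response: bool = True,
    voice: Optional[str] = None,
) -> Dict[str, Any]:
    """Build arguments for a converse call, omitting values left at their defaults.

    Hypothetical helper: only the parameter names come from the table above.
    """
    if not message:
        raise ValueError("message is required")
    args: Dict[str, Any] = {"message": message}
    if not wait_for_response:
        # Only sent when narrating without listening.
        args["wait_for_response"] = False
    if voice is not None:
        # None means "let VoiceMode pick the voice".
        args["voice"] = voice
    return args


print(converse_args("Hello! What would you like to work on?"))
print(converse_args("Searching the codebase now...", wait_for_response=False))
```

Keeping the payload minimal means later changes to the user's defaults in ~/.voicemode/voicemode.env still take effect.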

Best Practices

  1. Narrate without waiting - Use wait_for_response=False when announcing actions
  2. One question at a time - Don't bundle multiple questions in voice mode
  3. Check status first - Verify services are running before starting conversations
  4. Let VoiceMode auto-select - Don't hardcode providers unless the user has a preference
  5. First run is slow - Model downloads happen on first start (2-5 min); after that, starts are near-instant

Parallel Tool Calls (Zero Dead Air)

When performing actions during a voice conversation, use parallel tool calls to eliminate dead air. Send the voice message and the action in the same turn so they execute concurrently.

Pattern: Speak + Act in Parallel

# FAST: One turn for announce + work (voice and action fire simultaneously)
# Turn 1: speak (fire-and-forget) + do the work (all parallel)
voicemode:converse("Checking that now.", wait_for_response=False)
bash("git status")
Agent(prompt="Research X", run_in_background=True)

# Turn 2: speak the results (with listening)
voicemode:converse("Here's what I found: ...", wait_for_response=True)

# SLOW: Two turns for announce + work (unnecessary sequential delay)
# Turn 1: speak
voicemode:converse("Checking that now.", wait_for_response=False)
# Turn 2: do the work
bash("git status")
# Turn 3: speak results
voicemode:converse("Here's what I found: ...", wait_for_response=True)

When to Use Parallel vs Sequential

| Scenario | Approach | Why |
| --- | --- | --- |
| Announce + do work | Parallel | No dependency between speech and action |
| Announce + spawn agent | Parallel | Agent runs in background anyway |
| Check result then report | Sequential | Need result before speaking |
| Listen for response | Sequential | wait_for_response=True blocks until user finishes |

Key Rules

  • All tool types can be parallel: MCP, Bash, Agent, Read — mix freely in one turn
  • Wall-clock time = longest call, not the sum of all calls
  • Use wait_for_response=False for the speak call when combining with other tools
  • Great for demos: Audience hears continuous speech with no awkward silences
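
The "wall-clock time = longest call" rule can be illustrated with a small analogy in Python. This is not how Claude Code dispatches tool calls; fake_tool is a made-up stand-in that only sleeps, and the durations are arbitrary. The point is only that concurrent calls finish in roughly the time of the slowest one.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_tool(name: str, seconds: float) -> str:
    """Stand-in for a tool call; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return name


async def run_turn() -> Tuple[List[str], float]:
    """Run three stand-in calls concurrently and time the whole turn."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_tool("converse", 0.2),
        fake_tool("bash", 0.1),
        fake_tool("agent", 0.3),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(run_turn())
# Elapsed time is close to the longest call (0.3 s), not the 0.6 s sum.
print(results, f"{elapsed:.2f}s")
```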

Handling Pauses and Wait Requests

When the user asks you to wait or give them time:

Short pauses (up to 60 seconds): If the user says something ending with "wait" (e.g., "hang on", "give me a sec", "wait"), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.

Longer pauses (2+ minutes): Use bash sleep N, where N is the number of seconds. For example, if the user says "give me 5 minutes":

sleep 300  # Wait 5 minutes

Then call converse again when the wait is over:

voicemode:converse("Five minutes is up. Ready when you are.")

Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
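
The choice between the two pause styles comes down to simple arithmetic. The sketch below is a hypothetical helper (pause_plan is not a VoiceMode function) that converts a requested pause into either "rely on the built-in pause" or a sleep command. Treating everything longer than the built-in duration as a sleep is an assumption made here for illustration.

```python
BUILT_IN_WAIT_SECONDS = 60  # documented default of VOICEMODE_WAIT_DURATION


def pause_plan(requested_seconds: int, built_in: int = BUILT_IN_WAIT_SECONDS) -> str:
    """Return the action for a requested pause (hypothetical helper)."""
    if requested_seconds <= 0:
        raise ValueError("requested_seconds must be positive")
    if requested_seconds <= built_in:
        # Short enough for the automatic pause; no extra command needed.
        return "built-in pause"
    return f"sleep {requested_seconds}"


print(pause_plan(5 * 60))  # "give me 5 minutes"
```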

STT Recovery - Manual Transcription

If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:

# Transcribe the most recent recording
whisper-cli ~/.voicemode/audio/latest-STT.wav

# Or check if file exists first (safe for inclusion in automation)
if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
  whisper-cli ~/.voicemode/audio/latest-STT.wav
fi

Requirements:

  • Audio saving must be enabled via one of:
    • VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
    • VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
    • VOICEMODE_DEBUG=true (enables debug mode with audio saving)

How it works:

  • When audio saving is enabled, VoiceMode saves all STT recordings to ~/.voicemode/audio/ with timestamps
  • The latest-STT.wav symlink always points to the most recent recording
  • If the STT API fails, the recording is still saved for manual recovery
  • This lets you recover the user's speech without asking them to repeat

When to use:

  • STT service timeout or connection failure
  • Transcription returned empty but user definitely spoke
  • Need to verify what was actually said vs. what was transcribed
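
For automation written in Python rather than shell, the same "check, then transcribe" step might look like the sketch below. The function name transcribe_command is hypothetical. It only builds the whisper-cli command shown earlier and leaves running it to the caller, so it is safe to call when no recording exists.

```python
from pathlib import Path
from typing import List, Optional

LATEST_STT = Path.home() / ".voicemode" / "audio" / "latest-STT.wav"


def transcribe_command(audio: Path = LATEST_STT) -> Optional[List[str]]:
    """Return the whisper-cli argv for the recording, or None if it is missing."""
    if not audio.is_file():  # follows symlinks; False for a dangling link
        return None
    return ["whisper-cli", str(audio.resolve())]


cmd = transcribe_command()
if cmd is None:
    print("No saved recording found; is audio saving enabled?")
else:
    print("Would run:", " ".join(cmd))
    # To actually transcribe: subprocess.run(cmd, check=True)
```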

See also: Troubleshooting - No Speech Detected

Check Status

voicemode service status          # All services
voicemode service status whisper  # Specific service

Shows service status including running state, ports, and health.

Installation

# Install VoiceMode CLI and configure services
uvx voice-mode-install --yes

# Install local services (Apple Silicon recommended)
voicemode service install whisper
voicemode service install kokoro

See Getting Started for detailed steps.

Service Management

# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")

# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)

| Service | Port | Purpose |
| --- | --- | --- |
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |

Actions: status, start, stop, restart, logs, enable, disable
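
As a quick sanity check alongside the status action, the sketch below tests whether anything is accepting TCP connections on the ports from the table above. It assumes the services listen on 127.0.0.1; port_open is a hypothetical helper. An open port only shows that something is listening, not that the service is healthy, so voicemode service status remains the authoritative check.

```python
import socket
from typing import Dict

# Ports from the table above.
SERVICE_PORTS: Dict[str, int] = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in SERVICE_PORTS.items():
    state = "listening" if port_open(port) else "not reachable"
    print(f"{name:10s} {port:5d} {state}")
```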

Configuration

voicemode config list                           # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova   # Set default voice
voicemode config edit                           # Edit config file

Config file: ~/.voicemode/voicemode.env

See Configuration Guide for all options.
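
If a script needs to read settings without going through the CLI, a minimal parser could look like the sketch below. It assumes the file contains plain KEY=VALUE lines with # comments, as in the snippets in this document; anything more elaborate (such as export prefixes or multi-line values) is not handled, and voicemode config list remains the supported way to inspect settings.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


sample = """
# ~/.voicemode/voicemode.env
VOICEMODE_TTS_VOICE=nova
VOICEMODE_SAVE_AUDIO=true
"""
print(parse_env(sample))
```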

DJ Mode

Background music during VoiceMode sessions with track-level control.

# Core playback
voicemode dj play /path/to/music.mp3  # Play a file or URL
voicemode dj status                    # What's playing
voicemode dj pause                     # Pause playback
voicemode dj resume                    # Resume playback
voicemode dj stop                      # Stop playback

# Navigation and volume
voicemode dj next                      # Skip to next chapter
voicemode dj prev                      # Go to previous chapter
voicemode dj volume 30                 # Set volume to 30%

# Music For Programming
voicemode dj mfp list                  # List available episodes
voicemode dj mfp play 49               # Play episode 49
voicemode dj mfp sync                  # Convert CUE files to chapters

# Music library
voicemode dj find "daft punk"          # Search library
voicemode dj library scan              # Index ~/Audio/music
voicemode dj library stats             # Show library info

# Play history and favorites
voicemode dj history                   # Show recent plays
voicemode dj favorite                  # Toggle favorite on current track

Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).

CLI Cheat Sheet

# Service management
voicemode service status            # All services
voicemode service start whisper     # Start a service
voicemode service logs kokoro       # View logs

# Diagnostics
voicemode deps                      # Check dependencies
voicemode diag info                 # System info
voicemode diag devices              # Audio devices

# DJ Mode
voicemode dj play <file|url>        # Start playback
voicemode dj status                 # What's playing
voicemode dj next                   # Next chapter
voicemode dj prev                   # Previous chapter
voicemode dj stop                   # Stop playback
voicemode dj mfp play 49            # Music For Programming

Voice Handoff Between Agents

Transfer voice conversations between Claude Code agents for multi-agent workflows.

Use cases:

  • Personal assistant routing to project-specific foremen
  • Foremen delegating to workers for focused tasks
  • Returning control when work is complete

Quick Reference

# 1. Announce the transfer
voicemode:converse("Transferring you to a project agent.", wait_for_response=False)

# 2. Spawn with voice instructions (mechanism depends on your setup)
spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")

# 3. Go quiet - let new agent take over

Hand-back:

voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
# Stop conversing, exit or go idle

Key Principles

  1. Announce transfers: Always tell the user before transferring
  2. One speaker: Only one agent should use converse at a time
  3. Distinct voices: Different voices make handoffs audible
  4. Provide context: Tell receiving agent why user is being transferred

Auto-focus tmux pane on speak (opt-in)

When you run multiple voice agents in separate tmux panes, set VOICEMODE_AUTO_FOCUS_PANE=true to make tmux follow the speaker. Focus switches after conch acquisition, so agents waiting on the conch never steal focus; only the agent about to speak does. It also respects the ~/.voicemode/focus-hold sentinel written by the show-me plugin, so a file you just opened stays on screen for its hold window.

# ~/.voicemode/voicemode.env
VOICEMODE_AUTO_FOCUS_PANE=true

Off by default. Silent no-op outside tmux.

Detailed Documentation

See Call Routing for comprehensive guides.

Sharing Voice Services Over Tailscale

Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.

Why

  • Browsers require HTTPS for microphone access (e.g., VoiceMode Connect web app)
  • Tailscale serve provides automatic HTTPS with valid Let's Encrypt certificates for *.ts.net domains
  • Enables using your powerful local machine's GPU from any device on your Tailnet

Setup

# Expose TTS (Kokoro on port 8880)
tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech

# Expose STT (Whisper on port 2022)
tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions

# Verify configuration
tailscale serve status

# Reset all serve config
tailscale serve reset

Endpoints

After setup, endpoints are available at:

  • TTS: https://<hostname>.<tailnet>.ts.net/v1/audio/speech
  • STT: https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions

Important Notes

  • Path mapping: Tailscale strips the incoming path before forwarding, so you MUST include the full path in the target URL
  • Same-machine testing: Traffic doesn't route through Tailscale locally — test from another Tailnet device
  • Multiple paths: You can configure different paths to different backends on the same or different machines
  • CORS: Kokoro has CORS configured to allow https://app.voicemode.dev origins
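
The path-mapping note is easy to get wrong by hand, so the sketch below generates the serve commands from a path and a port, repeating the path in the target URL. serve_command is a hypothetical helper; it only formats the same command strings shown in Setup and does not call Tailscale.

```python
def serve_command(path: str, port: int) -> str:
    """Build a tailscale serve command that forwards `path` to the same path locally."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # The path is repeated in the target because it is stripped before forwarding.
    return f"tailscale serve --bg --set-path {path} http://localhost:{port}{path}"


print(serve_command("/v1/audio/speech", 8880))
print(serve_command("/v1/audio/transcriptions", 2022))
```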

Use with VoiceMode Connect

In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:

  • TTS Endpoint: https://<hostname>.<tailnet>.ts.net
  • STT Endpoint: https://<hostname>.<tailnet>.ts.net
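
Note that the settings above take the base URL without a path, while the endpoints listed earlier include the full API path. The sketch below shows how the two relate. The hostname and tailnet values are placeholders, and both helper names are hypothetical.

```python
from typing import Dict


def tailnet_base(hostname: str, tailnet: str) -> str:
    """Base URL to enter in the VoiceMode Connect settings."""
    return f"https://{hostname}.{tailnet}.ts.net"


def endpoint_urls(hostname: str, tailnet: str) -> Dict[str, str]:
    """Full endpoint URLs exposed by the tailscale serve setup."""
    base = tailnet_base(hostname, tailnet)
    return {
        "tts": f"{base}/v1/audio/speech",
        "stt": f"{base}/v1/audio/transcriptions",
    }


# "mybox" and "example" are placeholder names, not real hosts.
print(tailnet_base("mybox", "example"))
print(endpoint_urls("mybox", "example"))
```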

Soundfonts

Audio feedback tones that play during Claude Code tool use. Toggle with voicemode soundfonts on/off. See Soundfonts Guide.

Documentation Index

| Topic | Link |
| --- | --- |
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| Soundfonts | Soundfonts Guide |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |

Related Skills