Awesome-omni-skill vox

Lightweight voice MCP server with local Moonshine STT + Kokoro TTS

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/vox" ~/.claude/skills/diegosouzapw-awesome-omni-skill-vox && rm -rf "$T"
manifest: skills/tools/vox/SKILL.md
source content

Vox — Claude Code Skill

Project Overview

Vox is a lightweight voice MCP server (~1,500 lines of Rust) providing local text-to-speech (Kokoro) and speech-to-text (Moonshine Base) via the MCP protocol. It runs as a stdio subprocess per MCP client, or as a shared HTTP daemon.

Build & Test

cargo check                    # type-check only
cargo test                     # run all unit tests (84 tests across 10 modules)
cargo clippy -- -D warnings    # lint — must pass with zero warnings
cargo build --release          # optimized build (LTO + single codegen unit)
cargo bench -- resample        # benchmark resampling

All of

cargo test
,
cargo clippy -- -D warnings
, and
cargo fmt --check
must pass before submitting changes.

Architecture

ModulePurpose
main.rs
Entry point, config loading, model download, stdio/daemon startup
cli.rs
Clap CLI parser: daemon, config, download-models subcommands
server.rs
MCP tool handlers (
say
,
listen
,
converse
), streaming TTS pipeline
tts.rs
Kokoro TTS engine wrapper, voice name → speaker ID resolution, sentence splitting
audio.rs
cpal-based mic capture and speaker playback, Lanczos-3 sinc resampling
stt.rs
Moonshine Base STT engine wrapper
vad.rs
Voice activity detection (Silero ONNX)
config.rs
TOML config loading, env var overrides (
VOX_*
prefix), path resolution
daemon.rs
HTTP daemon lifecycle: daemonize, PID file, start/stop/status/log
models.rs
Model readiness checks and download/extraction
lib.rs
Public re-exports for benchmarks (
audio
,
config
,
error
,
tts
)
error.rs
VoiceError
enum with
thiserror
derives

Transport Modes

  • Stdio (default):
    rmcp::transport::stdio()
    . One process per MCP client.
  • Daemon (
    vox daemon start [--port PORT]
    ):
    StreamableHttpService
    via rmcp. Single process, models loaded once, multiple clients connect over HTTP/SSE. Factory closure creates a
    VoiceMcpServer
    per session with shared
    Arc<Mutex<TtsEngine>>
    and
    Arc<Mutex<SttEngine>>
    .

Config System

Precedence (highest wins):

  1. Environment variables:
    VOX_SPEED
    ,
    VOX_VOICE
    ,
    VOX_MODEL_DIR
    ,
    VOX_LOG_LEVEL
    ,
    VOX_PORT
  2. TOML file:
    $XDG_CONFIG_HOME/vox/config.toml
  3. Compiled defaults (
    Config::default()
    )

CLI management:

vox config get [key]
,
vox config set <key> <value>
,
vox config path

MCP Tools

ToolDescription
say
Speak text aloud through speakers (TTS only)
listen
Record from microphone and transcribe (STT only)
converse
Speak text then listen for response (TTS + STT round-trip)

Available Voices

American female (

af_*
): heart, alloy, aoede, bella, jessica, kore, nicole, nova, river, sarah, sky American male (
am_*
): adam, echo, eric, liam, michael, onyx, puck, santa British female (
bf_*
): alice, emma, lily British male (
bm_*
): daniel, fable, george, lewis

Default:

af_heart
(ID 0). Voices can be specified by name or numeric ID.

Code Conventions

  • Edition 2024 — uses
    let
    chains (
    if let Ok(x) = ... && let Ok(y) = ...
    )
  • Visibility:
    pub(crate)
    for test-only exposure, not fully
    pub
  • Clippy: treat all warnings as errors (
    -D warnings
    )
  • Tests: inline
    #[cfg(test)] mod tests
    per module,
    tempfile
    for filesystem tests
  • unsafe impl Send
    :
    TtsEngine
    and
    CaptureHandle
    have manual
    Send
    impls due to non-Send cpal/sherpa internals confined to dedicated threads

Dev Workflow

  1. Make changes
  2. cargo test
    — verify all 84 tests pass
  3. cargo clippy -- -D warnings
    — zero warnings
  4. cargo fmt --check
    — formatting
  5. If touching
    audio.rs
    resampling:
    cargo bench -- resample