install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/vox" ~/.claude/skills/diegosouzapw-awesome-omni-skill-vox && rm -rf "$T"
manifest: skills/tools/vox/SKILL.md
Vox — Claude Code Skill
Project Overview
Vox is a lightweight voice MCP server (~1,500 lines of Rust) providing local text-to-speech (Kokoro) and speech-to-text (Moonshine Base) over the Model Context Protocol (MCP). It runs either as a stdio subprocess, one per MCP client, or as a shared HTTP daemon.
Build & Test
cargo check                  # type-check only
cargo test                   # run all unit tests (84 tests across 10 modules)
cargo clippy -- -D warnings  # lint; must pass with zero warnings
cargo build --release        # optimized build (LTO + single codegen unit)
cargo bench -- resample      # benchmark resampling
All of cargo test, cargo clippy -- -D warnings, and cargo fmt --check must pass before submitting changes.
Architecture
| Module | Purpose |
|---|---|
| | Entry point, config loading, model download, stdio/daemon startup |
| | Clap CLI parser: daemon, config, download-models subcommands |
| | MCP tool handlers, streaming TTS pipeline |
| | Kokoro TTS engine wrapper, voice name → speaker ID resolution, sentence splitting |
| audio.rs | cpal-based mic capture and speaker playback, Lanczos-3 sinc resampling |
| | Moonshine Base STT engine wrapper |
| | Voice activity detection (Silero ONNX) |
| | TOML config loading, env var overrides (VOX_ prefix), path resolution |
| | HTTP daemon lifecycle: daemonize, PID file, start/stop/status/log |
| | Model readiness checks and download/extraction |
| | Public re-exports for benchmarks |
| | enum with derives |
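The table mentions Lanczos-3 sinc resampling. As a rough illustration of that technique (not Vox's actual code), the kernel is sinc(x)·sinc(x/3) for |x| < 3 and zero elsewhere, and each output sample is a weighted sum of the six nearest input samples. The function names below are hypothetical, and the sketch only interpolates: it does not widen the kernel when downsampling, so it applies no anti-aliasing filter.

```rust
use std::f32::consts::PI;

/// Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
fn sinc(x: f32) -> f32 {
    if x.abs() < 1e-6 {
        1.0
    } else {
        (PI * x).sin() / (PI * x)
    }
}

/// Lanczos kernel with a = 3: non-zero only for |x| < 3.
fn lanczos3(x: f32) -> f32 {
    if x.abs() >= 3.0 {
        0.0
    } else {
        sinc(x) * sinc(x / 3.0)
    }
}

/// Resample mono samples from `from_hz` to `to_hz` (hypothetical helper).
fn resample(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = (input.len() as f64 / ratio).round() as usize;
    let last = input.len() as isize - 1;
    let mut out = Vec::with_capacity(out_len);
    for n in 0..out_len {
        // Position of this output sample, measured in input samples.
        let pos = n as f64 * ratio;
        let center = pos.floor() as isize;
        let mut acc = 0.0f32;
        let mut weight = 0.0f32;
        // Six taps cover the kernel's support around `pos`.
        for k in (center - 2)..=(center + 3) {
            let w = lanczos3((pos - k as f64) as f32);
            // Repeat the edge sample when a tap falls outside the input.
            let idx = k.clamp(0, last) as usize;
            acc += input[idx] * w;
            weight += w;
        }
        // Normalize so a constant signal stays constant.
        out.push(if weight != 0.0 { acc / weight } else { 0.0 });
    }
    out
}
```

Calling `resample(&samples, 48_000, 16_000)` on 480 samples would yield 160 samples. A production resampler would also scale the kernel width by the rate ratio when reducing the sample rate.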
Transport Modes
- Stdio (default): rmcp::transport::stdio(). One process per MCP client.
- Daemon (vox daemon start [--port PORT]): StreamableHttpService via rmcp. Single process, models loaded once, multiple clients connect over HTTP/SSE. A factory closure creates a VoiceMcpServer per session with shared Arc<Mutex<TtsEngine>> and Arc<Mutex<SttEngine>>.
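The daemon's sharing pattern can be sketched with the standard library alone. This is a minimal illustration, assuming stand-in engine types with made-up fields and methods; the real engines and the rmcp service wiring are not shown here.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real engines; the counters only make sharing visible.
struct TtsEngine {
    utterances: usize,
}

struct SttEngine {
    transcripts: usize,
}

/// One instance per client session; the engines behind the Arcs are shared.
struct VoiceMcpServer {
    tts: Arc<Mutex<TtsEngine>>,
    stt: Arc<Mutex<SttEngine>>,
}

impl VoiceMcpServer {
    /// Pretend to speak; returns the total utterance count across all sessions.
    fn speak(&self, _text: &str) -> usize {
        let mut tts = self.tts.lock().expect("TTS mutex poisoned");
        tts.utterances += 1;
        tts.utterances
    }

    /// Pretend to transcribe; returns the total transcript count across all sessions.
    fn listen(&self) -> usize {
        let mut stt = self.stt.lock().expect("STT mutex poisoned");
        stt.transcripts += 1;
        stt.transcripts
    }
}

/// Load the engines once, then hand out a closure that builds a server per session.
fn session_factory() -> impl Fn() -> VoiceMcpServer {
    let tts = Arc::new(Mutex::new(TtsEngine { utterances: 0 }));
    let stt = Arc::new(Mutex::new(SttEngine { transcripts: 0 }));
    move || VoiceMcpServer {
        tts: Arc::clone(&tts),
        stt: Arc::clone(&stt),
    }
}
```

Each call to the closure returned by `session_factory()` clones only the `Arc` handles, so two sessions lock the same engine and the models are never loaded twice. The `Mutex` serializes access, which means concurrent clients take turns on each engine.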
Config System
Precedence (highest wins):
- Environment variables: VOX_SPEED, VOX_VOICE, VOX_MODEL_DIR, VOX_LOG_LEVEL, VOX_PORT
- TOML file: $XDG_CONFIG_HOME/vox/config.toml
- Compiled defaults (Config::default())
CLI management: vox config get [key], vox config set <key> <value>, vox config path
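The precedence order above can be sketched as a layered merge: start from the defaults, overlay values from the file, then overlay environment variables. This is an illustration under assumptions, not Vox's config code. The struct fields, the file keys (voice, speed, port), and the default speed and port are hypothetical; only the VOX_* names and the af_heart default come from this document. TOML parsing is left out, so the file layer is passed in as an already-parsed map.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Config {
    voice: String,
    speed: f32,
    port: u16,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            voice: "af_heart".to_string(), // documented default voice
            speed: 1.0,                    // placeholder, not Vox's real default
            port: 3000,                    // placeholder, not Vox's real default
        }
    }
}

/// Merge the layers from lowest to highest priority so later layers win.
fn resolve_config(file: &HashMap<String, String>, env: &HashMap<String, String>) -> Config {
    let mut cfg = Config::default();
    let layers = [
        (file, "voice", "speed", "port"),
        (env, "VOX_VOICE", "VOX_SPEED", "VOX_PORT"),
    ];
    for (layer, voice_key, speed_key, port_key) in layers {
        if let Some(v) = layer.get(voice_key) {
            cfg.voice = v.clone();
        }
        // Values that fail to parse are ignored and the lower layer's value is kept.
        if let Some(v) = layer.get(speed_key).and_then(|s| s.parse::<f32>().ok()) {
            cfg.speed = v;
        }
        if let Some(v) = layer.get(port_key).and_then(|s| s.parse::<u16>().ok()) {
            cfg.port = v;
        }
    }
    cfg
}
```

In a real program the environment layer could be built with `std::env::vars().collect::<HashMap<_, _>>()`. Passing both layers in as maps keeps the merge easy to unit test without touching the process environment.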
MCP Tools
| Tool | Description |
|---|---|
| | Speak text aloud through speakers (TTS only) |
| | Record from microphone and transcribe (STT only) |
| | Speak text then listen for response (TTS + STT round-trip) |
Available Voices
American female (af_*): heart, alloy, aoede, bella, jessica, kore, nicole, nova, river, sarah, sky
American male (am_*): adam, echo, eric, liam, michael, onyx, puck, santa
British female (bf_*): alice, emma, lily
British male (bm_*): daniel, fable, george, lewis
Default: af_heart (ID 0). Voices can be specified by name or numeric ID.
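Accepting either a name or a numeric ID could look roughly like the sketch below. It assumes speaker IDs follow the order in which the voices are listed above; only af_heart = 0 is stated in this document, so the other IDs are illustrative, and the table is truncated to a few voices. The function name is hypothetical.

```rust
/// A truncated voice table; the index doubles as the speaker ID.
/// Only "af_heart" = 0 is documented; the remaining positions are assumed.
const VOICES: &[&str] = &["af_heart", "af_alloy", "af_aoede", "af_bella"];

/// Resolve user input that is either a numeric ID or a voice name.
/// Returns `None` for out-of-range IDs and unknown names.
fn resolve_voice(input: &str) -> Option<usize> {
    let input = input.trim();
    // Numeric input is treated as an ID and range-checked.
    if let Ok(id) = input.parse::<usize>() {
        return (id < VOICES.len()).then_some(id);
    }
    // Anything else is looked up by name, ignoring ASCII case.
    VOICES.iter().position(|v| v.eq_ignore_ascii_case(input))
}
```

With this table, `resolve_voice("af_heart")` and `resolve_voice("0")` both resolve to speaker 0, while an unknown name returns `None` so the caller can fall back to the default voice.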
Code Conventions
- Edition 2024: uses let chains (if let Ok(x) = ... && let Ok(y) = ...)
- Visibility: pub(crate) for test-only exposure, not fully pub
- Clippy: treat all warnings as errors (-D warnings)
- Tests: inline #[cfg(test)] mod tests per module, tempfile for filesystem tests
- unsafe impl Send: TtsEngine and CaptureHandle have manual Send impls due to non-Send cpal/sherpa internals confined to dedicated threads
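The visibility and inline-test conventions can be illustrated with a small hypothetical helper. The sentence splitter below is deliberately naive and is not Vox's implementation: it breaks on every '.', '!' or '?', so it would also split decimals such as "3.14".

```rust
/// Split text into sentences at '.', '!' or '?'.
/// `pub(crate)` lets other modules and their tests call it without
/// making it part of the public API.
pub(crate) fn split_sentences(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if matches!(c, '.' | '!' | '?') {
            // Include the punctuation mark in the sentence.
            let end = i + c.len_utf8();
            let sentence = text[start..end].trim();
            if !sentence.is_empty() {
                out.push(sentence);
            }
            start = end;
        }
    }
    // Keep trailing text that has no terminal punctuation.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        out.push(tail);
    }
    out
}

// Inline test module, compiled only under `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_on_terminal_punctuation() {
        assert_eq!(
            split_sentences("Hi there. How are you? Fine"),
            vec!["Hi there.", "How are you?", "Fine"]
        );
    }
}
```

Keeping the test module in the same file means the test can reach crate-private items through `use super::*`, which is why `pub(crate)` is enough here.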
Dev Workflow
- Make changes
- cargo test (verify all 84 tests pass)
- cargo clippy -- -D warnings (zero warnings)
- cargo fmt --check (formatting)
- If touching audio.rs resampling: cargo bench -- resample