Skills local-stt
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/araa47/local-stt" ~/.claude/skills/openclaw-skills-local-stt && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/araa47/local-stt" ~/.openclaw/skills/openclaw-skills-local-stt && rm -rf "$T"
manifest:
skills/araa47/local-stt/SKILL.mdsource content
Local STT (Parakeet / Whisper)
Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:
- Parakeet (default): Best accuracy for English, correctly captures names and filler words
- Whisper: Fastest inference, supports 99 languages
Usage
# Default: Parakeet v2 (best English accuracy) ~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg # Explicit backend selection ~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper ~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3 # Quiet mode (suppress progress) ~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
Options
:-b/--backend
(default),parakeetwhisper
: Model variant (see below)-m/--model
: Disable int8 quantization--no-int8
: Suppress progress-q/--quiet
: Matrix room ID for direct message--room-id
Models
Parakeet (default backend)
| Model | Description |
|---|---|
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |
Whisper
| Model | Description |
|---|---|
| tiny | Fastest, lower accuracy |
| base (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
Benchmark (24s audio)
| Backend/Model | Time | RTF | Notes |
|---|---|---|---|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
openclaw.json
{ "tools": { "media": { "audio": { "enabled": true, "models": [ { "type": "cli", "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py", "args": ["--quiet", "{{MediaPath}}"], "timeoutSeconds": 30 } ] } } } }