Skills local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/araa47/local-stt" ~/.claude/skills/openclaw-skills-local-stt && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/araa47/local-stt" ~/.openclaw/skills/openclaw-skills-local-stt && rm -rf "$T"
manifest: skills/araa47/local-stt/SKILL.md

Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

  • Parakeet (default): Best accuracy for English, correctly captures names and filler words
  • Whisper: Fastest inference, supports 99 languages

Usage

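# Paths below assume the skill lives in ~/.openclaw/skills/local-stt.
# The install commands above copy it to ~/.openclaw/skills/openclaw-skills-local-stt,
# so adjust the path to match your install.

# Default: Parakeet v2 (best English accuracy)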
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet

Options

  • -b/--backend: parakeet (default), whisper
  • -m/--model: Model variant (see below)
  • --no-int8: Disable int8 quantization
  • -q/--quiet: Suppress progress
  • --room-id: Matrix room ID for direct message
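
When the script is called from another program, the options above map directly onto an argument list. The following is a minimal sketch, not part of the skill: `build_command` and `transcribe` are hypothetical helper names, the script path is the one shown under Usage, and the sketch assumes the transcript is written to standard output, which the source does not state.

```python
import subprocess
from pathlib import Path

# Path as shown in the Usage section; adjust if the skill is installed elsewhere.
SCRIPT = Path("~/.openclaw/skills/local-stt/scripts/local-stt.py").expanduser()


def build_command(audio, backend="parakeet", model=None, int8=True, quiet=True):
    """Assemble the argument list for local-stt.py (hypothetical helper)."""
    if backend not in ("parakeet", "whisper"):
        raise ValueError(f"unknown backend: {backend!r}")
    cmd = [str(SCRIPT), str(audio), "-b", backend]
    if model is not None:
        cmd += ["-m", model]        # e.g. "v3" for Parakeet, "tiny" for Whisper
    if not int8:
        cmd.append("--no-int8")     # disable int8 quantization
    if quiet:
        cmd.append("--quiet")       # suppress progress output
    return cmd


def transcribe(audio, **options):
    """Run the script and return its stdout (assumed to hold the transcript)."""
    result = subprocess.run(
        build_command(audio, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `transcribe("audio.ogg", backend="whisper", model="small")` would run the same command as the explicit backend selection shown under Usage, with `--quiet` added.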

Models

Parakeet (default backend)

Model          Description
v2 (default)   English only, best accuracy
v3             Multilingual

Whisper

Model            Description
tiny             Fastest, lower accuracy
base (default)   Good balance
small            Better accuracy
large-v3-turbo   Best quality, slower

Benchmark (24s audio)

Backend/Model       Time    RTF      Notes
Whisper Base int8   0.43s   0.018x   Fastest
Parakeet v2 int8    0.60s   0.025x   Best accuracy
Parakeet v3 int8    0.63s   0.026x   Multilingual
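
The RTF column is consistent with the real-time factor computed as processing time divided by audio duration, so lower is faster. The short check below reproduces the table's arithmetic under that assumed definition (the source does not spell it out); `rtf` is a hypothetical helper name.

```python
AUDIO_SECONDS = 24.0  # clip length from the benchmark heading

# Processing times copied from the benchmark table.
timings = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}


def rtf(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Real-time factor: processing time / audio duration (assumed definition)."""
    return processing_seconds / audio_seconds


for name, seconds in timings.items():
    factor = rtf(seconds)
    # 1 / RTF gives how many times faster than real time the run was.
    print(f"{name}: RTF {factor:.3f}x, roughly {1 / factor:.0f}x faster than real time")
```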

openclaw.json

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
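
To see which command this entry would produce for a given audio file, the sketch below fills in the placeholder by hand. It assumes that OpenClaw replaces `{{MediaPath}}` with the path of the incoming audio file and expands `~` in `command`; the source confirms neither, and `expand` is a hypothetical helper, not an OpenClaw API.

```python
import json
import os

# The model entry from the openclaw.json snippet above.
ENTRY = json.loads("""
{
  "type": "cli",
  "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
  "args": ["--quiet", "{{MediaPath}}"],
  "timeoutSeconds": 30
}
""")


def expand(entry, media_path):
    """Return (argv, timeout) with the placeholder filled in (assumed behaviour)."""
    args = [arg.replace("{{MediaPath}}", media_path) for arg in entry["args"]]
    argv = [os.path.expanduser(entry["command"]), *args]
    return argv, entry["timeoutSeconds"]


argv, timeout = expand(ENTRY, "/tmp/voice-note.ogg")
print(argv, timeout)
```

With a 30-second timeout and the benchmark figures above (well under one second for a 24-second clip), there is ample headroom for short voice messages, though the first run may take longer if it has to load or fetch model files.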