Claude-code-minoan parakeet

install
source · Clone the upstream repo
git clone https://github.com/tdimino/claude-code-minoan
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/integration-automation/parakeet" ~/.claude/skills/tdimino-claude-code-minoan-parakeet && rm -rf "$T"
manifest: skills/integration-automation/parakeet/SKILL.md

Parakeet Dictation Skill

Local speech-to-text powered by NVIDIA Parakeet TDT 0.6B V3 (~600MB model, 100% offline).

Three Modes

1. Handy App (Primary — Push-to-Talk into Any Text Field)

Handy is a free, open-source Tauri app (Rust + React) providing push-to-talk dictation with Parakeet V3 built in. Inference via transcribe-rs (ONNX Runtime, int8 quantized).

brew install --cask handy
  • Default hotkey: ⌥Space (Option-Space) on macOS, Ctrl-Space on Windows/Linux
  • Modes: Push-to-talk (hold) or toggle (press to start/stop)
  • Select Parakeet V3 in Settings → Models (auto-downloads ~478MB)
  • Grant microphone + accessibility permissions
  • Includes VAD (Silero), model management UI
  • Additional models: Whisper (Small/Medium/Turbo/Large), Moonshine, SenseVoice
  • Models stored at ~/Library/Application Support/com.pais.handy/models/

2. CLI Scripts (Claude Code File Transcription & Terminal Dictation)

CLI scripts remain for headless/terminal use within Claude Code. These use NeMo/PyTorch.

Performance

| System | Speed | Engine |
|---|---|---|
| Handy (M4 Max) | ~30x realtime | transcribe-rs / ONNX int8 |
| Handy (Zen 3) | ~20x realtime | transcribe-rs / ONNX int8 |
| Handy (Skylake i5) | ~5x realtime | transcribe-rs / ONNX int8 |
| NeMo CLI (MPS) | Varies | NeMo / PyTorch |

  • Accuracy: 6.05% WER (Word Error Rate)
  • Languages: 25 European languages with automatic detection (no prompting)
  • Privacy: 100% local processing, no cloud API
  • License: CC BY 4.0 (model), MIT (Handy app)
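
To put the realtime factors in perspective, processing time is roughly the audio duration divided by the speed multiple. The sketch below works through that arithmetic. The function name is illustrative, and the speeds are the approximate figures quoted above, so the results are rough estimates rather than benchmarks.

```python
def estimated_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Rough processing time for audio of the given length at an 'Nx realtime' speed."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Approximate speeds quoted for Handy on three CPUs.
speeds = {"M4 Max": 30, "Zen 3": 20, "Skylake i5": 5}

one_hour = 60 * 60  # a 60-minute recording, in seconds
for system, factor in speeds.items():
    minutes = estimated_seconds(one_hour, factor) / 60
    print(f"{system}: ~{minutes:.0f} min")
```

For a one-hour recording this works out to about 2, 3, and 12 minutes respectively.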

Commands

Transcribe Audio File

/parakeet path/to/audio.wav
/parakeet ~/recordings/interview.mp3
/parakeet meeting.m4a

Supported formats: .wav, .mp3, .m4a, .flac, .ogg, .aac
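
A caller may want to reject unsupported files before invoking the skill. The helper below is a hypothetical sketch, not part of the skill's scripts: it compares the file extension against the formats listed above, ignoring case.

```python
from pathlib import Path

# Extensions listed as supported by the /parakeet file command.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    # Hypothetical helper: checks the extension only, not the file contents.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Checking only the extension is cheap but shallow; a mislabeled file would still reach the transcription script.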

Live Dictation (Terminal)

/parakeet
/parakeet dictate

Records from the microphone until Enter is pressed, then transcribes the recording.

Check Installation

/parakeet check

Verifies that Parakeet is properly installed and that the model can load.

Setup

Handy (Push-to-Talk UI)

brew install --cask handy

Launch Handy from Applications, select the Parakeet V3 model, and configure the hotkey.

CLI Scripts (Prerequisites)

  1. Parakeet Dictate repo at ~/Programming/parakeet-dictate/ with a Python venv
  2. Install dependencies:
    cd ~/Programming/parakeet-dictate
    uv venv && uv pip install -r requirements.txt
    
  3. (Optional) Set custom path:
    export PARAKEET_HOME=/path/to/parakeet-dictate

Implementation

When this skill is invoked:

  1. For audio files: Run the transcription script

    cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python transcribe.py "<filepath>"
    
  2. For live dictation: Run the dictation script

    cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python dictate.py
    
  3. For checking setup: Run the check script

    cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python check_setup.py
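
The three commands above share one pattern: resolve PARAKEET_HOME (falling back to the default path when it is unset or empty), then run a script with the venv's Python. A minimal Python sketch of that resolution follows. `build_command` and the `SCRIPTS` mapping are hypothetical names for illustration, not part of the skill.

```python
import os
from pathlib import Path

# Script names taken from the commands above; the mode keys are illustrative.
SCRIPTS = {
    "transcribe": "transcribe.py",
    "dictate": "dictate.py",
    "check": "check_setup.py",
}
DEFAULT_HOME = "~/Programming/parakeet-dictate"


def build_command(mode, filepath=None, env=None):
    """Return the argv list for one of the skill's scripts."""
    env = os.environ if env is None else env
    # Mirrors ${PARAKEET_HOME:-default}: unset or empty falls back to the default.
    home = Path(env.get("PARAKEET_HOME") or DEFAULT_HOME).expanduser()
    python = home / ".venv" / "bin" / "python"
    command = [str(python), SCRIPTS[mode]]
    if mode == "transcribe":
        if not filepath:
            raise ValueError("transcribe needs an audio file path")
        command.append(str(Path(filepath).expanduser()))
    return command
```

The result can be passed to `subprocess.run(..., cwd=scripts_dir, check=True)`, where `scripts_dir` is the skill's scripts directory. Building an argv list rather than a shell string avoids quoting problems with file paths that contain spaces.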
    

3. Server Mode (OpenAI-Compatible STT for Takopi)

An OpenAI-compatible transcription server for integration with Takopi and other API clients. It exposes POST /v1/audio/transcriptions, which wraps the NeMo CLI.

uv run --with fastapi,uvicorn,python-multipart \
  ~/.claude/skills/parakeet/scripts/parakeet_server.py --port 8384

Takopi config (~/.takopi/takopi.toml):

voice_transcription_base_url = "http://localhost:8384/v1"
voice_transcription_api_key = "local"
voice_transcription_model = "parakeet-tdt-0.6b"

Endpoints:

  • /v1/audio/transcriptions (POST, multipart)
  • /v1/models (GET)
  • /health (GET)

Supported response_format values: json, text, verbose_json.

Accepted file types: .wav, .mp3, .m4a, .flac, .ogg, .aac, .oga, .webm.

Limits: 50 MB upload, 120s transcription timeout.
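
For clients other than Takopi, a request can be assembled with the Python standard library alone. The sketch below rests on several assumptions: the form field names (`file`, `model`, `response_format`) and the `text` key in the JSON response follow the OpenAI transcription API, which the server is described as compatible with, and "50 MB" is read as 50 MiB. The function names are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path

ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac", ".oga", ".webm"}
RESPONSE_FORMATS = {"json", "text", "verbose_json"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is read as 50 MiB


def build_multipart(filename, audio, response_format="json", model="parakeet-tdt-0.6b"):
    """Validate the upload client-side and return (body, content_type)."""
    if Path(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {filename}")
    if len(audio) > MAX_UPLOAD_BYTES:
        raise ValueError("audio exceeds the 50 MB upload limit")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    boundary = uuid.uuid4().hex
    head = ""
    # Plain text fields first (names assumed from the OpenAI transcription API).
    for name, value in (("model", model), ("response_format", response_format)):
        head += f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
    # Then the header of the file part; the raw audio bytes follow it.
    head += (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{Path(filename).name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    body = head.encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8384/v1", timeout=120):
    """POST an audio file and return the transcript (JSON response assumed)."""
    body, content_type = build_multipart(Path(path).name, Path(path).read_bytes())
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["text"]
```

With the server running on port 8384, `transcribe("meeting.m4a")` would return the transcript as a string if those assumptions hold. Validating the extension and size before upload avoids sending a large file only to have it rejected.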

Model Caches

| System | Cache Location | Size | Engine |
|---|---|---|---|
| Handy | ~/Library/Application Support/com.pais.handy/models/ | ~478MB | transcribe-rs (ONNX int8) |
| NeMo CLI | ~/.cache/nemo/ | ~1.2GB | NeMo / PyTorch |

Model caches are separate. Handy's Parakeet V3 int8 model structure:

parakeet-tdt-0.6b-v3-int8/
├── encoder-model.int8.onnx
├── decoder_joint-model.int8.onnx
├── nemo128.onnx (audio preprocessor)
└── vocab.txt
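
If Handy fails to load the model, one quick check is whether the download is complete. The sketch below is a hypothetical helper, not something Handy ships: it reports which of the four files listed above are missing from a given model directory.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "encoder-model.int8.onnx",
    "decoder_joint-model.int8.onnx",
    "nemo128.onnx",
    "vocab.txt",
)


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present. On macOS the directory to check would be parakeet-tdt-0.6b-v3-int8 inside Handy's models folder.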

Troubleshooting

"No module named nemo"

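NeMo is installed only in the Parakeet virtual environment. The skill's commands call $PARAKEET_HOME/.venv/bin/python directly, so they pick up the correct Python automatically. If you run a script by hand, use that interpreter instead of the system Python.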

"MPS not available"

Metal (MPS) acceleration on Apple Silicon requires PyTorch 2.0+. When MPS is unavailable, the scripts fall back to CPU automatically.
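
To see which device PyTorch would choose on a given machine, a check along these lines can help. It is a sketch of the usual fallback order (CUDA, then MPS, then CPU), not the scripts' actual selection logic, and `pick_device` is a hypothetical name.

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this interpreter
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

Run it with the venv's Python so that it inspects the same PyTorch build the CLI scripts use.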

"Permission denied: microphone"

Grant microphone access in System Settings → Privacy & Security → Microphone (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Microphone).

Model download slow

The Parakeet model downloads on first use (~478MB for Handy, ~1.2GB for NeMo). Subsequent runs use the cached copy.

Configuration

| Variable | Default | Description |
|---|---|---|
| PARAKEET_HOME | ~/Programming/parakeet-dictate | Parakeet Dictate installation path |

Dependencies

Handy: brew install --cask handy (standalone, no other deps)

CLI scripts require:

  • Parakeet Dictate repo at $PARAKEET_HOME (default: ~/Programming/parakeet-dictate)
  • Python virtual environment at $PARAKEET_HOME/.venv
  • NeMo toolkit with ASR support (nemo_toolkit[asr]>=2.0.0)
  • PyTorch 2.0+ (for MPS/CUDA acceleration)
  • soundfile and sounddevice for audio handling