Learn-skills.dev whisper-voice

Native macOS menu bar app for live voice-to-text with auto-type using WhisperKit on Apple Silicon

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/aiagentwithdhruv/skills/whisper-voice" ~/.claude/skills/neversight-learn-skills-dev-whisper-voice && rm -rf "$T"
manifest: data/skills-md/aiagentwithdhruv/skills/whisper-voice/SKILL.md
source content

Whisper Voice — Live Speech-to-Text Mac App

Goal

Build and run a native macOS menu bar app that captures live microphone audio, transcribes it offline using WhisperKit (on Apple Silicon), and auto-types the text wherever the cursor is.

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | Whisper model: tiny, base (default), small |
| language | string | No | "en" (default) or "hi" for Hindi mode |
| chunk_duration | float | No | Seconds per audio chunk (default: 3.0) |
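
To make chunk_duration concrete, the sketch below converts a duration in seconds into a sample count and splits a buffer into full chunks. It is only an illustration: the `Chunker` type and its `split` method are hypothetical names, not taken from the app, and the 16 kHz rate is assumed because Whisper models consume 16 kHz mono audio.

```swift
import Foundation

// Hypothetical helper, not taken from the app's source.
// Assumes 16 kHz mono audio, the rate Whisper models expect.
struct Chunker {
    let chunkDuration: Double          // seconds, mirrors the chunk_duration input
    let sampleRate: Double = 16_000

    /// Number of samples in one full chunk.
    var samplesPerChunk: Int { Int(chunkDuration * sampleRate) }

    /// Splits a buffer into full chunks. A trailing partial chunk is returned
    /// separately so the caller can hold it until more audio arrives.
    func split(_ samples: [Float]) -> (chunks: [[Float]], remainder: [Float]) {
        let size = samplesPerChunk
        guard size > 0 else { return ([], samples) }
        var chunks: [[Float]] = []
        var start = 0
        while start + size <= samples.count {
            chunks.append(Array(samples[start..<(start + size)]))
            start += size
        }
        return (chunks, Array(samples[start...]))
    }
}
```

With the default of 3.0 seconds, each chunk holds 48,000 samples. A longer chunk gives the model more context per pass but delays the typed text.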

Process

1. Build the app

cd AiwithDhruv_Voice/WhisperAiwithDhruv
swift build

2. Run the app

swift run WhisperAiwithDhruv
# Or open the package in Xcode (open Package.swift), then press Cmd+R

3. First launch setup

  1. Grant microphone permission when prompted
  2. Grant Accessibility access in System Settings → Privacy & Security → Accessibility (required for auto-typing)
  3. Wait for model download (~140MB for base model)

4. Usage

  • Cmd+Shift+Space — Toggle recording on/off
  • Click mic icon in menu bar for controls
  • Speak — text auto-types at cursor position
  • Toggle Hindi mode for Hindi/Hinglish input

Outputs

| Name | Type | Description |
| --- | --- | --- |
| transcribed_text | string | Live transcribed text typed at cursor |
| history | array | Last 50 transcription entries in menu bar |
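
A bounded history like the one described could be kept with a small value type. This is a minimal sketch: `TranscriptionHistory` is a hypothetical name, and the newest-first ordering and the skipping of blank entries are assumptions rather than documented behaviour.

```swift
import Foundation

// Hypothetical type; the app's real history model is not shown in the source.
struct TranscriptionHistory {
    let capacity: Int
    private(set) var entries: [String] = []

    init(capacity: Int = 50) {
        self.capacity = max(1, capacity)
    }

    /// Inserts the newest entry at the front and drops the oldest entries
    /// once the list grows past capacity. Blank text is ignored.
    mutating func add(_ text: String) {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }
        entries.insert(trimmed, at: 0)
        if entries.count > capacity {
            entries.removeLast(entries.count - capacity)
        }
    }
}
```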

Edge Cases

  • No mic: Shows error in menu bar dropdown
  • Accessibility denied: Auto-type disabled, manual copy from history
  • Silence: VAD skips silent chunks (energy-based threshold)
  • Hallucinations: Filters common Whisper artifacts ("Thank you.", "...")
  • Model not downloaded: Shows download progress bar
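
The silence and hallucination cases above can be sketched as two small filters. This is an illustration under stated assumptions, not the app's implementation: the function names are hypothetical, RMS is used as one common energy measure, the 0.01 default threshold is picked arbitrarily, and the artifact list contains only the two examples named above.

```swift
import Foundation

// Hypothetical helpers; the threshold default and matching rules are assumptions.

/// Root-mean-square energy of a chunk of samples in the range -1...1.
func rmsEnergy(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// True when the chunk is quiet enough to skip transcription.
func isSilent(_ samples: [Float], threshold: Float = 0.01) -> Bool {
    rmsEnergy(samples) < threshold
}

/// Artifacts to drop, compared case-insensitively after trimming.
let knownArtifacts: Set<String> = ["thank you.", "..."]

/// Returns nil for empty output or a known artifact, otherwise the trimmed text.
func filterHallucination(_ text: String) -> String? {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty || knownArtifacts.contains(trimmed.lowercased()) {
        return nil
    }
    return trimmed
}
```

Running the energy check before transcription saves a model pass on quiet chunks, and the text filter catches artifacts that still get through.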

Environment

  • macOS 14+ (Sonoma)
  • Apple Silicon (M1/M2/M3/M4)
  • Xcode 15+ (for building)
  • No API keys needed (fully offline after the one-time model download)

Schema

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | tiny / base / small |
| language | string | No | en / hi |
| chunk_duration | float | No | 2.0 - 8.0 seconds |
| silence_threshold | float | No | 0.002 - 0.05 |
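
One way to enforce these ranges is a settings type that clamps out-of-range values. This is a sketch only: `VoiceSettings`, `ModelSize`, and `Language` are hypothetical names, clamping (rather than rejecting) is an assumed policy, and the 0.01 default for silence_threshold is an assumption because the schema gives only a range.

```swift
import Foundation

// Hypothetical settings type; field names mirror the schema above.
enum ModelSize: String { case tiny, base, small }
enum Language: String { case en, hi }

struct VoiceSettings {
    var modelSize: ModelSize = .base
    var language: Language = .en
    var chunkDuration: Double = 3.0
    var silenceThreshold: Double = 0.01   // assumed default; the schema gives only a range

    /// Returns a copy with numeric fields clamped into the documented ranges.
    func clamped() -> VoiceSettings {
        var copy = self
        copy.chunkDuration = min(max(chunkDuration, 2.0), 8.0)
        copy.silenceThreshold = min(max(silenceThreshold, 0.002), 0.05)
        return copy
    }
}
```

Because the enums are string-backed, an unsupported value such as `ModelSize(rawValue: "large")` yields nil and can be rejected before the model loads.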

Outputs

| Name | Type | Description |
| --- | --- | --- |
| transcription | string | Live text output |
| auto_typed | boolean | Whether text was injected at cursor |

Credentials

| Name | Source |
| --- | --- |
| None | Fully offline, no API keys |

Composable With

  • video-edit (add transcription captions)
  • send-telegram (send transcriptions to phone)

Cost

Free — runs entirely on-device. Model download is one-time (~140MB for base).