Trending-skills type4me-macos-voice-input
macOS voice input tool built in Swift, with local/cloud ASR engines, LLM text optimization, and fully local storage
git clone https://github.com/Aradotso/trending-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aradotso/trending-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/type4me-macos-voice-input" ~/.claude/skills/aradotso-trending-skills-type4me-macos-voice-input && rm -rf "$T"
skills/type4me-macos-voice-input/SKILL.md

Type4Me macOS Voice Input
Skill by ara.so — Daily 2026 Skills collection.
Type4Me is a macOS voice input tool that captures audio via global hotkey, transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes text via LLM, and injects the result into any app. All credentials and history are stored locally — no telemetry, no cloud sync.
Architecture Overview
```
Type4Me/
├── ASR/                            # ASR engine abstraction
│   ├── ASRProvider.swift           # Provider enum + protocols
│   ├── ASRProviderRegistry.swift   # Plugin registry
│   ├── Providers/                  # Per-vendor config files
│   ├── SherpaASRClient.swift       # Local streaming ASR
│   ├── SherpaOfflineASRClient.swift
│   ├── VolcASRClient.swift         # Volcengine streaming ASR
│   └── DeepgramASRClient.swift     # Deepgram streaming ASR
├── Bridge/                         # SherpaOnnx C API Swift bridge
├── Audio/                          # Audio capture
├── Session/                        # Core state machine: record→ASR→inject
├── Input/                          # Global hotkey management
├── Services/                       # Credentials, hotwords, model manager
├── Protocol/                       # Volcengine WebSocket codec
└── UI/                             # SwiftUI (FloatingBar + Settings)
```
Installation
Prerequisites
```bash
# Xcode Command Line Tools
xcode-select --install

# CMake (for local ASR engine)
brew install cmake
```
Build & Deploy from Source
```bash
git clone https://github.com/joewongjc/type4me.git
cd type4me

# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)
bash scripts/build-sherpa.sh

# Step 2: Build, bundle, sign, install to /Applications, and launch
bash scripts/deploy.sh
```
Download Pre-built App
Download `Type4Me-v1.2.3.dmg` from releases (cloud ASR only, no local engine):

https://github.com/joewongjc/type4me/releases/tag/v1.2.3
If macOS blocks the app:
xattr -d com.apple.quarantine /Applications/Type4Me.app
Download Local ASR Models
```bash
mkdir -p ~/Library/Application\ Support/Type4Me/Models

# Option A: Lightweight ~20MB
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/

# Option B: Balanced ~236MB (recommended)
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/

# Option C: Bilingual Chinese+English ~1GB
tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
```
Expected structure for Paraformer model:
```
~/Library/Application Support/Type4Me/Models/
└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    └── tokens.txt
```
Key Protocols
SpeechRecognizer Protocol
Every ASR client must implement this protocol:
```swift
protocol SpeechRecognizer: AnyObject {
    /// Start a new recognition session
    func startRecognition() async throws

    /// Feed raw PCM audio data
    func appendAudio(_ buffer: AVAudioPCMBuffer) async

    /// Stop and get final result
    func stopRecognition() async throws -> String

    /// Cancel without result
    func cancelRecognition() async

    /// Streaming partial results (optional)
    var partialResultHandler: ((String) -> Void)? { get set }
}
```
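For orientation, here is a minimal sketch of how a caller might drive any `SpeechRecognizer` conformer end to end. The `runRecognition` function and the `AsyncStream` audio source are illustrative stand-ins for Type4Me's Session/ and Audio/ layers, not APIs from the actual codebase:

```swift
import AVFoundation

// Sketch only: drives any SpeechRecognizer through a full session.
// `audioBuffers` is a hypothetical stream from the audio capture layer.
func runRecognition(
    _ recognizer: SpeechRecognizer,
    audioBuffers: AsyncStream<AVAudioPCMBuffer>
) async throws -> String {
    recognizer.partialResultHandler = { partial in
        // e.g. update the floating bar with live text
        print("partial: \(partial)")
    }
    try await recognizer.startRecognition()
    for await buffer in audioBuffers {
        await recognizer.appendAudio(buffer)
    }
    return try await recognizer.stopRecognition()
}
```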
ASRProviderConfig Protocol
Each vendor's credential definition:
```swift
protocol ASRProviderConfig {
    /// Unique identifier string
    static var providerID: String { get }

    /// Display name in Settings UI
    static var displayName: String { get }

    /// Credential fields shown in Settings
    static var credentialFields: [CredentialField] { get }

    /// Validate credentials before use
    static func validate(_ credentials: [String: String]) -> Bool

    /// Create the recognizer instance
    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer
}
```
Adding a New ASR Provider
Step 1: Create Provider Config
Create `Type4Me/ASR/Providers/OpenAIWhisperProvider.swift`:
```swift
import Foundation

struct OpenAIWhisperProvider: ASRProviderConfig {
    static let providerID = "openai_whisper"
    static let displayName = "OpenAI Whisper"

    static let credentialFields: [CredentialField] = [
        CredentialField(
            key: "api_key",
            label: "API Key",
            placeholder: "sk-...",
            isSecret: true
        ),
        CredentialField(
            key: "model",
            label: "Model",
            placeholder: "whisper-1",
            isSecret: false
        )
    ]

    static func validate(_ credentials: [String: String]) -> Bool {
        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {
            return false
        }
        return apiKey.hasPrefix("sk-")
    }

    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer {
        guard let apiKey = credentials["api_key"] else {
            throw ASRError.missingCredential("api_key")
        }
        let model = credentials["model"] ?? "whisper-1"
        return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)
    }
}
```
Step 2: Implement the ASR Client
Create `Type4Me/ASR/OpenAIWhisperASRClient.swift`:
```swift
import Foundation
import AVFoundation

final class OpenAIWhisperASRClient: SpeechRecognizer {
    var partialResultHandler: ((String) -> Void)?

    private let apiKey: String
    private let model: String
    private let config: RecognitionConfig
    private var audioData = Data()

    init(apiKey: String, model: String, config: RecognitionConfig) {
        self.apiKey = apiKey
        self.model = model
        self.config = config
    }

    func startRecognition() async throws {
        audioData = Data()
    }

    func appendAudio(_ buffer: AVAudioPCMBuffer) async {
        // Convert PCM buffer to raw bytes and accumulate
        guard let channelData = buffer.floatChannelData?[0] else { return }
        let frameCount = Int(buffer.frameLength)
        let samples = UnsafeBufferPointer(start: channelData, count: frameCount)

        // Convert Float32 PCM to Int16 for upload
        let int16Samples = samples.map { sample -> Int16 in
            Int16(max(-32768, min(32767, Int(sample * 32767))))
        }
        int16Samples.withUnsafeBytes { ptr in
            audioData.append(contentsOf: ptr)
        }
    }

    func stopRecognition() async throws -> String {
        // Build multipart form request to the Whisper API.
        // NOTE: Whisper expects a supported container (wav/mp3/etc.); raw PCM
        // as uploaded here may be rejected — prepend a WAV header in practice.
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        let boundary = UUID().uuidString
        request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

        var body = Data()

        // Append audio file part
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.raw\"\r\n".data(using: .utf8)!)
        body.append("Content-Type: audio/raw\r\n\r\n".data(using: .utf8)!)
        body.append(audioData)
        body.append("\r\n".data(using: .utf8)!)

        // Append model part
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)
        body.append("\(model)\r\n".data(using: .utf8)!)

        body.append("--\(boundary)--\r\n".data(using: .utf8)!)
        request.httpBody = body

        let (data, response) = try await URLSession.shared.data(for: request)
        guard let httpResponse = response as? HTTPURLResponse,
              httpResponse.statusCode == 200 else {
            throw ASRError.networkError("Whisper API returned error")
        }

        let result = try JSONDecoder().decode(WhisperResponse.self, from: data)
        return result.text
    }

    func cancelRecognition() async {
        audioData = Data()
    }
}

private struct WhisperResponse: Codable {
    let text: String
}
```
Step 3: Register the Provider
In `Type4Me/ASR/ASRProviderRegistry.swift`, add to the `all` array:
```swift
struct ASRProviderRegistry {
    static let all: [any ASRProviderConfig.Type] = [
        SherpaParaformerProvider.self,
        VolcengineProvider.self,
        DeepgramProvider.self,
        OpenAIWhisperProvider.self,  // ← Add your provider here
    ]
}
```
Credentials Storage
Credentials are stored at `~/Library/Application Support/Type4Me/credentials.json` with permissions 0600. Never hardcode secrets — always load via CredentialStore:
```swift
// Reading credentials
let store = CredentialStore.shared
let apiKey = store.get(providerID: "openai_whisper", key: "api_key")

// Writing credentials
store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)

// Checking if configured
let isConfigured = store.isConfigured(
    providerID: "openai_whisper",
    fields: OpenAIWhisperProvider.credentialFields
)
```
Custom Processing Modes with Prompt Variables
Processing modes use LLM post-processing with three context variables:
| Variable | Value |
|---|---|
| `{text}` | Recognized speech text |
| `{selected}` | Text selected in active app at record start |
| `{clipboard}` | Clipboard content at record start |
Example custom mode prompts:
```swift
// Translate selection using voice command
let translatePrompt = """
The user selected this text: {selected}
Voice command: {text}
Execute the command on the selected text. Output only the result.
"""

// Code review via voice
let codeReviewPrompt = """
Code to review: {clipboard}
Review instruction: {text}
Provide focused feedback addressing the instruction.
"""

// Email reply drafting
let emailPrompt = """
Original email: {selected}
My reply intent (spoken): {text}
Write a professional email reply. Output only the email body.
"""
```
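Before the LLM call, these placeholders get substituted with the captured context. A minimal sketch of that substitution, assuming hypothetical `RecordingContext` and `renderPrompt` names (not Type4Me's actual API):

```swift
import Foundation

// Illustrative sketch: context captured at record start, then spliced
// into the prompt template. Names here are assumptions, not Type4Me's API.
struct RecordingContext {
    let text: String       // recognized speech
    let selected: String   // selection captured at record start
    let clipboard: String  // clipboard captured at record start
}

func renderPrompt(_ template: String, with ctx: RecordingContext) -> String {
    template
        .replacingOccurrences(of: "{text}", with: ctx.text)
        .replacingOccurrences(of: "{selected}", with: ctx.selected)
        .replacingOccurrences(of: "{clipboard}", with: ctx.clipboard)
}
```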
Built-in Processing Modes
```swift
enum ProcessingMode {
    case fast                    // Direct ASR output, zero latency
    case performance             // Dual-channel: streaming + offline refinement
    case englishTranslation     // Chinese speech → English text
    case promptOptimize          // Raw prompt → optimized prompt via LLM
    case command                 // Voice command + selected/clipboard context → LLM action
    case custom(prompt: String)  // User-defined prompt template
}
```
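One way to see how these cases differ: only some modes need an LLM round-trip, and each LLM mode contributes a prompt template. The sketch below is an assumption about that mapping; the built-in templates shown are illustrative placeholders, not the ones shipped in Type4Me:

```swift
// Illustrative sketch: which prompt template (if any) a mode would send to
// the LLM. The built-in template strings are placeholders, not Type4Me's own.
extension ProcessingMode {
    var promptTemplate: String? {
        switch self {
        case .fast, .performance:
            return nil  // no LLM pass; performance refines via offline ASR
        case .englishTranslation:
            return "Translate the following Chinese speech into English: {text}"
        case .promptOptimize:
            return "Rewrite this raw prompt as an optimized LLM prompt: {text}"
        case .command:
            return "Selected text: {selected}\nClipboard: {clipboard}\nVoice command: {text}"
        case .custom(let prompt):
            return prompt
        }
    }
}
```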
Session State Machine
The core recording flow in `Session/`:
```
[Idle]
  → hotkey pressed → [Recording] (audio streams to ASR client)
  → hotkey released / pressed again → [Processing]
  → ASR returns text → [LLM Post-processing] (if mode requires)
  → [Injecting] (text injected into active app)
  → [Idle]
```
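Expressed in Swift, the states above might look like the enum below; the case names are inferred from the flow diagram, not copied from the `Session/` source:

```swift
// States inferred from the flow diagram; names are illustrative.
enum SessionState {
    case idle
    case recording       // hotkey held or toggled on; audio streaming to ASR
    case processing      // hotkey released; waiting for the final ASR result
    case postProcessing  // optional LLM pass, only if the mode requires it
    case injecting       // writing the result into the active app
}
```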
Updating After Source Changes
```bash
cd type4me
git pull
bash scripts/deploy.sh
# SherpaOnnx does NOT need recompiling unless engine version changed
```
Troubleshooting
App won't open (security warning)
xattr -d com.apple.quarantine /Applications/Type4Me.app
Local model not recognized in Settings
Verify the directory structure exactly matches:
```bash
ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/
# Must show: encoder.int8.onnx  decoder.int8.onnx  tokens.txt
```
SherpaOnnx build fails
```bash
# Ensure cmake is installed
brew install cmake

# Clean and retry
rm -rf Frameworks/
bash scripts/build-sherpa.sh
```
New ASR provider not appearing in Settings
- Confirm the provider type is added to `ASRProviderRegistry.all`
- Ensure `providerID` is unique across all providers
- Clean build: `swift package clean && bash scripts/deploy.sh`
Audio not captured / no floating bar
- Grant microphone permission: System Settings → Privacy & Security → Microphone → Type4Me ✓
- Grant Accessibility permission for text injection: System Settings → Privacy & Security → Accessibility → Type4Me ✓
Credentials not saving
```bash
# Check file exists and has correct permissions
ls -la ~/Library/Application\ Support/Type4Me/credentials.json
# Should show: -rw------- (0600)

# Fix permissions if needed:
chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json
```
Export history to CSV
Open Settings → History → select date range → Export CSV. The SQLite database is at:
```bash
~/Library/Application\ Support/Type4Me/history.db

# Direct query:
sqlite3 ~/Library/Application\ Support/Type4Me/history.db \
  "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"
```
System Requirements
- macOS 14.0 (Sonoma) or later
- Apple Silicon (M1/M2/M3/M4) recommended for local ASR inference
- Xcode Command Line Tools + CMake for source builds
- Internet connection only needed for cloud ASR providers