---
name: livekit-stt-selfhosted
description: Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.
---
# LiveKit Self-Hosted STT Plugin
Build self-hosted speech-to-text APIs and LiveKit voice agent plugins using Hugging Face models.
## Overview
This skill provides templates and guidance for:
- Building a self-hosted STT API server using FastAPI + Whisper/HF models
- Creating a LiveKit plugin that connects to your self-hosted API
- Deploying and scaling in production
## Quick Start

### Option 1: Build Both (API + Plugin)

When the user wants the complete setup:

1. **Create the API server:**

   ```bash
   python scripts/setup_api_server.py my-stt-server --model openai/whisper-medium
   cd my-stt-server
   pip install -r requirements.txt
   python main.py
   ```

2. **Create the plugin:**

   ```bash
   python scripts/setup_plugin.py custom-stt
   cd livekit-plugins-custom-stt
   pip install -e .
   ```

3. **Use in a LiveKit agent:**

   ```python
   from livekit.plugins import custom_stt

   stt = custom_stt.STT(api_url="ws://localhost:8000/ws/transcribe")
   ```
### Option 2: API Server Only

When the user only needs the API server:

- Use `scripts/setup_api_server.py` with the desired model
- See `references/api_server_guide.md` for implementation details
- Template in `assets/api-server/`
### Option 3: Plugin Only

When the user has an existing API and needs a LiveKit plugin:

- Use `scripts/setup_plugin.py` with the plugin name
- See `references/plugin_implementation.md` for details
- Template in `assets/plugin-template/`
## Model Selection

Help the user choose the right model:
| Use Case | Recommended Model | Rationale |
|---|---|---|
| Best accuracy | `openai/whisper-large-v3` | SOTA quality, requires GPU |
| Production balance | `openai/whisper-medium` | Good quality, reasonable speed |
| Real-time/fast | `openai/whisper-small` | Fast, acceptable quality |
| CPU-only | `openai/whisper-tiny` | Can run without GPU |
| English-only | `openai/whisper-medium.en` | Optimized for English |
For a detailed comparison and optimization tips, see `references/models_comparison.md`.
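
Whichever model is chosen, it loads the same way through the Hugging Face `pipeline` API. A minimal sketch (the model ID and `sample.wav` are placeholders for illustration):

```python
# Minimal sketch: load an ASR model once, then transcribe a local file.
# Model ID, device, and "sample.wav" are placeholders -- pick from the table above.
import torch
from transformers import pipeline

use_gpu = torch.cuda.is_available()
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",                             # swap for your chosen model
    device="cuda:0" if use_gpu else "cpu",
    torch_dtype=torch.float16 if use_gpu else torch.float32,   # FP16 on GPU
    model_kwargs={"low_cpu_mem_usage": True},                  # eases memory pressure at load
)

result = asr("sample.wav")   # 16 kHz mono input works best
print(result["text"])
```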
## Implementation Workflow

### Building the API Server
1. **Use the template**: Start with `assets/api-server/main.py`

2. **Key components**:
   - FastAPI app with WebSocket endpoint
   - Model loading at startup (kept in memory)
   - Audio buffer management
   - WebSocket protocol for streaming

3. **Customization points**:
   - Model selection (change `MODEL_ID` in `.env`)
   - Audio processing parameters
   - Batch size and optimization
   - Error handling
For the complete implementation guide, see `references/api_server_guide.md`.
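
A stripped-down sketch of how those components fit together, assuming the `transformers` ASR pipeline and the WebSocket protocol described below. Config handling, interim results, and robust error handling are omitted; `MODEL_ID`, `DEVICE`, and `/ws/transcribe` follow this document's conventions:

```python
# Minimal sketch of the API server: model loaded once at startup, audio
# buffered from a WebSocket, final transcript returned as JSON.
import os

import numpy as np
from fastapi import FastAPI, WebSocket
from transformers import pipeline

app = FastAPI()
asr = None  # loaded once at startup, kept in memory


@app.on_event("startup")
def load_model():
    global asr
    asr = pipeline(
        "automatic-speech-recognition",
        model=os.getenv("MODEL_ID", "openai/whisper-medium"),
        device=os.getenv("DEVICE", "cpu"),
    )


@app.websocket("/ws/transcribe")
async def transcribe(ws: WebSocket):
    await ws.accept()
    buf = bytearray()
    while True:
        msg = await ws.receive()
        if msg["type"] == "websocket.disconnect":
            return
        if msg.get("bytes"):
            buf.extend(msg["bytes"])           # binary frames: 16-bit PCM, 16 kHz
        elif msg.get("text") and '"end"' in msg["text"]:
            break                              # {"type": "end"} terminates the utterance
    # int16 PCM -> float32 in [-1, 1], the format the HF pipeline expects
    audio = np.frombuffer(bytes(buf), dtype=np.int16).astype(np.float32) / 32768.0
    result = asr({"raw": audio, "sampling_rate": 16000})
    await ws.send_json({"type": "final", "text": result["text"]})
```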
### Building the LiveKit Plugin
1. **Use the template**: Start with `assets/plugin-template/`

2. **Required implementations**:
   - `_recognize_impl()` - Non-streaming recognition
   - `stream()` - Return a `SpeechStream` instance
   - `SpeechStream` class - Handle streaming

3. **Key considerations**:
   - Audio format conversion (16 kHz, mono, 16-bit PCM) - see the sketch below
   - WebSocket connection management
   - Event emission (interim/final transcripts)
   - Error handling and cleanup
For the complete implementation guide, see `references/plugin_implementation.md`.
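
The format conversion is often the fiddliest of those considerations. A minimal, dependency-light sketch (the function name is hypothetical, and the naive linear resample is illustrative; a dedicated resampler such as `soxr` or `librosa` is preferable in production):

```python
# Sketch: convert raw int16 PCM of any sample rate / channel count to the
# 16 kHz mono 16-bit format the self-hosted API expects.
import numpy as np


def to_16k_mono_pcm(data: bytes, sample_rate: int, num_channels: int) -> bytes:
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32)
    if num_channels > 1:
        # interleaved channels -> average down to mono
        samples = samples.reshape(-1, num_channels).mean(axis=1)
    if sample_rate != 16000:
        # naive linear resample; fine for a sketch, not for production
        n_out = int(len(samples) * 16000 / sample_rate)
        x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        samples = np.interp(x_new, x_old, samples)
    return samples.astype(np.int16).tobytes()
```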
## Deployment

### Development

```bash
# API server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Test against the WebSocket endpoint:
# ws://localhost:8000/ws/transcribe
```
### Production

**Docker (recommended):**

```bash
docker-compose up
```

**Kubernetes**: Use the manifests in the deployment guide

**Cloud platforms**: AWS ECS, GCP Cloud Run, Azure Container Instances
For the complete deployment guide, including scaling, monitoring, and security, see `references/deployment.md`.
## WebSocket Protocol

### Client → Server

- **Audio**: binary frames (16-bit PCM, 16 kHz)
- **Config**: `{"type": "config", "language": "en"}`
- **End**: `{"type": "end"}`
### Server → Client

- **Interim**: `{"type": "interim", "text": "..."}`
- **Final**: `{"type": "final", "text": "...", "language": "en"}`
- **Error**: `{"type": "error", "message": "..."}`
## Common Tasks

### Change Model

Edit `.env`:

```bash
MODEL_ID=openai/whisper-small  # Faster model
```
### Add Language Support

In plugin usage:

```python
stt = custom_stt.STT(language="es")          # Spanish
stt = custom_stt.STT(detect_language=True)   # Auto-detect
```
### Enable GPU

In the API server's `.env`:

```bash
DEVICE=cuda:0  # Use GPU
```
### Scale Horizontally

Deploy multiple API server instances behind a load balancer. See `references/deployment.md` for Nginx configuration.
## Troubleshooting

### Out of Memory

- Use a smaller model (`whisper-small` or `whisper-tiny`)
- Reduce `batch_size` in the pipeline
- Enable `low_cpu_mem_usage=True`
### Slow Transcription

- Ensure the GPU is enabled (`DEVICE=cuda:0`)
- Use FP16 precision (automatic on GPU)
- Increase `batch_size`
- Use a smaller model
### Connection Issues
- Verify WebSocket support in load balancer
- Check firewall rules
- Increase timeout settings
## Scripts

- `scripts/setup_api_server.py` - Generate an API server from the template
- `scripts/setup_plugin.py` - Generate a LiveKit plugin from the template
## References

Load these as needed for detailed information:

- `references/api_server_guide.md` - Complete API implementation guide
- `references/plugin_implementation.md` - LiveKit plugin development
- `references/models_comparison.md` - Model selection and optimization
- `references/deployment.md` - Production deployment best practices
## Assets

Ready-to-use templates:

- `assets/api-server/` - Complete FastAPI server with Whisper
- `assets/plugin-template/` - LiveKit STT plugin structure
## Best Practices

- **Keep models in memory** - Load once at startup, not per request
- **Use an appropriate model size** - Balance quality vs. speed for your use case
- **Process audio in chunks** - 1-second chunks work well for streaming
- **Implement proper cleanup** - Close WebSocket connections gracefully
- **Monitor metrics** - Track latency, throughput, GPU utilization
- **Use Docker** - Ensures consistent deployments
- **Enable authentication** - Secure production APIs
- **Scale horizontally** - Use a load balancer for high availability