---
name: livekit-stt-selfhosted
description: Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.
---
# LiveKit Self-Hosted STT Plugin
Build self-hosted speech-to-text APIs and LiveKit voice agent plugins using Hugging Face models.
## Overview
This skill provides templates and guidance for:
- Building a self-hosted STT API server using FastAPI + Whisper/HF models
- Creating a LiveKit plugin that connects to your self-hosted API
- Deploying and scaling in production
## Quick Start

### Option 1: Build Both (API + Plugin)

When the user wants the complete setup:

1. **Create the API server:**

   ```bash
   python scripts/setup_api_server.py my-stt-server --model openai/whisper-medium
   cd my-stt-server
   pip install -r requirements.txt
   python main.py
   ```

2. **Create the plugin:**

   ```bash
   python scripts/setup_plugin.py custom-stt
   cd livekit-plugins-custom-stt
   pip install -e .
   ```

3. **Use in a LiveKit agent:**

   ```python
   from livekit.plugins import custom_stt

   stt = custom_stt.STT(api_url="ws://localhost:8000/ws/transcribe")
   ```
### Option 2: API Server Only

When the user only needs the API server:

- Use `scripts/setup_api_server.py` with the desired model
- See `references/api_server_guide.md` for implementation details
- Template in `assets/api-server/`
### Option 3: Plugin Only

When the user has an existing API and needs a LiveKit plugin:

- Use `scripts/setup_plugin.py` with the plugin name
- See `references/plugin_implementation.md` for details
- Template in `assets/plugin-template/`
## Model Selection

Help the user choose the right model:
| Use Case | Recommended Model | Rationale |
|---|---|---|
| Best accuracy | `openai/whisper-large-v3` | SOTA quality, requires GPU |
| Production balance | `openai/whisper-medium` | Good quality, reasonable speed |
| Real-time/fast | `openai/whisper-small` | Fast, acceptable quality |
| CPU-only | `openai/whisper-tiny` | Can run without GPU |
| English-only | `openai/whisper-medium.en` | Optimized for English |
For a detailed comparison and optimization tips, see `references/models_comparison.md`.
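
Whichever model is chosen, it loads the same way through the Hugging Face `pipeline` API. A minimal sketch (the model ID and `sample.wav` are placeholders for illustration):

```python
# Minimal sketch: load an ASR model once, then transcribe a local file.
# Model ID, device, and "sample.wav" are placeholders -- pick from the table above.
import torch
from transformers import pipeline

use_gpu = torch.cuda.is_available()
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",                             # swap for your chosen model
    device="cuda:0" if use_gpu else "cpu",
    torch_dtype=torch.float16 if use_gpu else torch.float32,   # FP16 on GPU
    model_kwargs={"low_cpu_mem_usage": True},                  # eases memory pressure at load
)

result = asr("sample.wav")   # 16 kHz mono input works best
print(result["text"])
```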
## Implementation Workflow

### Building the API Server
1. **Use the template**: Start with `assets/api-server/main.py`

2. **Key components**:
   - FastAPI app with WebSocket endpoint
   - Model loading at startup (kept in memory)
   - Audio buffer management
   - WebSocket protocol for streaming

3. **Customization points**:
   - Model selection (change `MODEL_ID` in `.env`)
   - Audio processing parameters
   - Batch size and optimization
   - Error handling
For the complete implementation guide, see `references/api_server_guide.md`.
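
A stripped-down sketch of how those components fit together, assuming the `transformers` ASR pipeline and the WebSocket protocol described below. Config handling, interim results, and robust error handling are omitted; `MODEL_ID`, `DEVICE`, and `/ws/transcribe` follow this document's conventions:

```python
# Minimal sketch of the API server: model loaded once at startup, audio
# buffered from a WebSocket, final transcript returned as JSON.
import os

import numpy as np
from fastapi import FastAPI, WebSocket
from transformers import pipeline

app = FastAPI()
asr = None  # loaded once at startup, kept in memory


@app.on_event("startup")
def load_model():
    global asr
    asr = pipeline(
        "automatic-speech-recognition",
        model=os.getenv("MODEL_ID", "openai/whisper-medium"),
        device=os.getenv("DEVICE", "cpu"),
    )


@app.websocket("/ws/transcribe")
async def transcribe(ws: WebSocket):
    await ws.accept()
    buf = bytearray()
    while True:
        msg = await ws.receive()
        if msg["type"] == "websocket.disconnect":
            return
        if msg.get("bytes"):
            buf.extend(msg["bytes"])           # binary frames: 16-bit PCM, 16 kHz
        elif msg.get("text") and '"end"' in msg["text"]:
            break                              # {"type": "end"} terminates the utterance
    # int16 PCM -> float32 in [-1, 1], the format the HF pipeline expects
    audio = np.frombuffer(bytes(buf), dtype=np.int16).astype(np.float32) / 32768.0
    result = asr({"raw": audio, "sampling_rate": 16000})
    await ws.send_json({"type": "final", "text": result["text"]})
```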
### Building the LiveKit Plugin
1. **Use the template**: Start with `assets/plugin-template/`

2. **Required implementations**:
   - `_recognize_impl()` - Non-streaming recognition
   - `stream()` - Return a `SpeechStream` instance
   - `SpeechStream` class - Handle streaming

3. **Key considerations**:
   - Audio format conversion (16 kHz, mono, 16-bit PCM) - see the sketch below
   - WebSocket connection management
   - Event emission (interim/final transcripts)
   - Error handling and cleanup
For the complete implementation guide, see `references/plugin_implementation.md`.
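
The format conversion is often the fiddliest of those considerations. A minimal, dependency-light sketch (the function name is hypothetical, and the naive linear resample is illustrative; a dedicated resampler such as `soxr` or `librosa` is preferable in production):

```python
# Sketch: convert raw int16 PCM of any sample rate / channel count to the
# 16 kHz mono 16-bit format the self-hosted API expects.
import numpy as np


def to_16k_mono_pcm(data: bytes, sample_rate: int, num_channels: int) -> bytes:
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32)
    if num_channels > 1:
        # interleaved channels -> average down to mono
        samples = samples.reshape(-1, num_channels).mean(axis=1)
    if sample_rate != 16000:
        # naive linear resample; fine for a sketch, not for production
        n_out = int(len(samples) * 16000 / sample_rate)
        x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        samples = np.interp(x_new, x_old, samples)
    return samples.astype(np.int16).tobytes()
```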
## Deployment

### Development

```bash
# API server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Test against the WebSocket endpoint:
# ws://localhost:8000/ws/transcribe
```
### Production

**Docker (recommended):**

```bash
docker-compose up
```

**Kubernetes**: Use the manifests in the deployment guide

**Cloud platforms**: AWS ECS, GCP Cloud Run, Azure Container Instances
For the complete deployment guide, including scaling, monitoring, and security, see `references/deployment.md`.
## WebSocket Protocol

### Client → Server

- **Audio**: binary frames (16-bit PCM, 16 kHz)
- **Config**: `{"type": "config", "language": "en"}`
- **End**: `{"type": "end"}`
### Server → Client

- **Interim**: `{"type": "interim", "text": "..."}`
- **Final**: `{"type": "final", "text": "...", "language": "en"}`
- **Error**: `{"type": "error", "message": "..."}`
## Common Tasks

### Change Model

Edit `.env`:

```bash
MODEL_ID=openai/whisper-small  # Faster model
```
### Add Language Support

In plugin usage:

```python
stt = custom_stt.STT(language="es")          # Spanish
stt = custom_stt.STT(detect_language=True)   # Auto-detect
```
### Enable GPU

In the API server's `.env`:

```bash
DEVICE=cuda:0  # Use GPU
```
### Scale Horizontally

Deploy multiple API server instances behind a load balancer. See `references/deployment.md` for Nginx configuration.
## Troubleshooting

### Out of Memory

- Use a smaller model (`whisper-small` or `whisper-tiny`)
- Reduce `batch_size` in the pipeline
- Enable `low_cpu_mem_usage=True`
### Slow Transcription

- Ensure the GPU is enabled (`DEVICE=cuda:0`)
- Use FP16 precision (automatic on GPU)
- Increase `batch_size`
- Use a smaller model
### Connection Issues
- Verify WebSocket support in load balancer
- Check firewall rules
- Increase timeout settings
## Scripts

- `scripts/setup_api_server.py` - Generate an API server from the template
- `scripts/setup_plugin.py` - Generate a LiveKit plugin from the template
## References

Load these as needed for detailed information:

- `references/api_server_guide.md` - Complete API implementation guide
- `references/plugin_implementation.md` - LiveKit plugin development
- `references/models_comparison.md` - Model selection and optimization
- `references/deployment.md` - Production deployment best practices
## Assets

Ready-to-use templates:

- `assets/api-server/` - Complete FastAPI server with Whisper
- `assets/plugin-template/` - LiveKit STT plugin structure
## Best Practices

- **Keep models in memory** - Load once at startup, not per request
- **Use an appropriate model size** - Balance quality vs. speed for your use case
- **Process audio in chunks** - 1-second chunks work well for streaming
- **Implement proper cleanup** - Close WebSocket connections gracefully
- **Monitor metrics** - Track latency, throughput, GPU utilization
- **Use Docker** - Ensures consistent deployments
- **Enable authentication** - Secure production APIs
- **Scale horizontally** - Use a load balancer for high availability