Claude-skill-registry livekit-stt-selfhosted

Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/livekit-stt-selfhosted" ~/.claude/skills/majiayu000-claude-skill-registry-livekit-stt-selfhosted && rm -rf "$T"
manifest: skills/data/livekit-stt-selfhosted/SKILL.md
source content

LiveKit Self-Hosted STT Plugin

Build self-hosted speech-to-text APIs and LiveKit voice agent plugins using Hugging Face models.

Overview

This skill provides templates and guidance for:

  1. Building a self-hosted STT API server using FastAPI + Whisper/HF models
  2. Creating a LiveKit plugin that connects to your self-hosted API
  3. Deploying and scaling in production

Quick Start

Option 1: Build Both (API + Plugin)

When the user wants the complete setup:

  1. Create API Server:
python scripts/setup_api_server.py my-stt-server --model openai/whisper-medium
cd my-stt-server
pip install -r requirements.txt
python main.py
  2. Create Plugin:
python scripts/setup_plugin.py custom-stt
cd livekit-plugins-custom-stt
pip install -e .
  3. Use in LiveKit Agent:
from livekit.plugins import custom_stt

stt = custom_stt.STT(api_url="ws://localhost:8000/ws/transcribe")

Option 2: API Server Only

When the user needs only the API server:

  • Use scripts/setup_api_server.py with the desired model
  • See references/api_server_guide.md for implementation details
  • Template in assets/api-server/

Option 3: Plugin Only

When the user has an existing API and needs a LiveKit plugin:

  • Use scripts/setup_plugin.py with the plugin name
  • See references/plugin_implementation.md for details
  • Template in assets/plugin-template/

Model Selection

Help the user choose the right model:

  Use Case            Recommended Model              Rationale
  Best accuracy       openai/whisper-large-v3        SOTA quality, requires GPU
  Production balance  openai/whisper-medium          Good quality, reasonable speed
  Real-time/fast      openai/whisper-small           Fast, acceptable quality
  CPU-only            openai/whisper-tiny            Can run without GPU
  English-only        facebook/wav2vec2-large-960h   Optimized for English

For a detailed comparison and optimization tips, see references/models_comparison.md.
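For scripted setups, the table above can be mirrored in a small helper. This is a sketch; the use-case keys and the default fallback are illustrative, not part of the skill's scripts:

```python
# Map common use cases to Hugging Face model IDs (mirrors the table above).
MODEL_BY_USE_CASE = {
    "best-accuracy": "openai/whisper-large-v3",
    "production": "openai/whisper-medium",
    "realtime": "openai/whisper-small",
    "cpu-only": "openai/whisper-tiny",
    "english-only": "facebook/wav2vec2-large-960h",
}


def pick_model(use_case: str) -> str:
    """Return a recommended model ID, defaulting to the balanced choice."""
    return MODEL_BY_USE_CASE.get(use_case, "openai/whisper-medium")
```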

Implementation Workflow

Building the API Server

  1. Use the template: Start with assets/api-server/main.py

  2. Key components:

    • FastAPI app with WebSocket endpoint
    • Model loading at startup (kept in memory)
    • Audio buffer management
    • WebSocket protocol for streaming

  3. Customization points:

    • Model selection (change MODEL_ID in .env)
    • Audio processing parameters
    • Batch size and optimization
    • Error handling
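Of these components, buffer management is the easiest to get subtly wrong. A minimal sketch, assuming 16 kHz mono 16-bit PCM (so one second of audio is 32,000 bytes) and that the server feeds complete chunks to the model; the class name and chunk policy are this sketch's assumptions, not the template's exact code:

```python
class AudioBuffer:
    """Accumulates raw PCM bytes and yields fixed-size chunks for inference.

    Assumes 16 kHz mono 16-bit PCM, so one second = 16000 samples * 2 bytes.
    """

    def __init__(self, sample_rate: int = 16000, chunk_seconds: float = 1.0):
        self._buf = bytearray()
        self._chunk_bytes = int(sample_rate * 2 * chunk_seconds)

    def feed(self, data: bytes) -> list[bytes]:
        """Append incoming audio and return any complete chunks."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self._chunk_bytes:
            chunks.append(bytes(self._buf[:self._chunk_bytes]))
            del self._buf[:self._chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever remains, e.g. when an "end" message arrives."""
        rest, self._buf = bytes(self._buf), bytearray()
        return rest
```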

For the complete implementation guide, see references/api_server_guide.md.

Building the LiveKit Plugin

  1. Use the template: Start with assets/plugin-template/

  2. Required implementations:

    • _recognize_impl() - Non-streaming recognition
    • stream() - Returns a SpeechStream instance
    • SpeechStream class - Handles streaming

  3. Key considerations:

    • Audio format conversion (16 kHz, mono, 16-bit PCM)
    • WebSocket connection management
    • Event emission (interim/final transcripts)
    • Error handling and cleanup
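The format-conversion step can be sketched with the standard library. This naive downmix-and-decimate version (assuming an integer-ratio source rate such as 48 kHz, native-endian samples, and no anti-aliasing filter) illustrates the idea; a production plugin would use a proper resampler:

```python
import array


def to_pcm16_mono_16k(samples: bytes, src_rate: int, channels: int) -> bytes:
    """Convert interleaved 16-bit PCM to 16 kHz mono 16-bit PCM.

    Naive approach: average channels to mono, then decimate by an integer
    factor (src_rate must be a multiple of 16000). Sketch only: no
    anti-aliasing filter, native byte order assumed.
    """
    pcm = array.array("h")
    pcm.frombytes(samples)
    # Downmix interleaved channels to mono by averaging each frame.
    mono = array.array(
        "h",
        (sum(pcm[i:i + channels]) // channels
         for i in range(0, len(pcm), channels)),
    )
    # Drop samples down to 16 kHz.
    factor = src_rate // 16000
    return mono[::factor].tobytes()
```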

For the complete implementation guide, see references/plugin_implementation.md.

Deployment

Development

# API Server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Test WebSocket
ws://localhost:8000/ws/transcribe

Production

Docker (Recommended):

docker-compose up

Kubernetes: Use manifests in deployment guide

Cloud Platforms: AWS ECS, GCP Cloud Run, Azure Container Instances

For the complete deployment guide, including scaling, monitoring, and security, see references/deployment.md.

WebSocket Protocol

Client → Server

  • Audio: Binary (16-bit PCM, 16kHz)
  • Config:
    {"type": "config", "language": "en"}
  • End:
    {"type": "end"}

Server → Client

  • Interim:
    {"type": "interim", "text": "..."}
  • Final:
    {"type": "final", "text": "...", "language": "en"}
  • Error:
    {"type": "error", "message": "..."}
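The protocol above is small enough to wrap in a pair of helpers, assuming text frames carry JSON and binary frames carry audio (the helper names are this sketch's, not the template's):

```python
import json


def encode_client_message(msg_type: str, **fields) -> str:
    """Build a client-to-server control frame, e.g. config or end."""
    return json.dumps({"type": msg_type, **fields})


def decode_server_message(frame: str) -> dict:
    """Parse a server-to-client frame and reject unknown types."""
    msg = json.loads(frame)
    if msg.get("type") not in {"interim", "final", "error"}:
        raise ValueError(f"unexpected message type: {msg.get('type')!r}")
    return msg
```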

Common Tasks

Change Model

Edit .env:

MODEL_ID=openai/whisper-small  # Faster model

Add Language Support

In plugin usage:

stt = custom_stt.STT(language="es")  # Spanish
stt = custom_stt.STT(detect_language=True)  # Auto-detect

Enable GPU

In API server:

DEVICE=cuda:0  # Use GPU
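Assuming the server reads these settings from environment variables, as the .env examples suggest, the loading logic might look like the following sketch (the CPU-safe defaults are assumptions of this example):

```python
import os

# Read runtime settings with CPU-safe defaults; the key names match the
# .env entries shown above (MODEL_ID, DEVICE).
MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-medium")
DEVICE = os.getenv("DEVICE", "cpu")  # e.g. "cuda:0" to enable the GPU


def describe_config() -> str:
    """Summarize the active configuration for startup logs."""
    return f"model={MODEL_ID} device={DEVICE}"
```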

Scale Horizontally

Deploy multiple API server instances behind a load balancer. See references/deployment.md for the Nginx configuration.

Troubleshooting

Out of Memory

  • Use a smaller model (whisper-small or whisper-tiny)
  • Reduce batch_size in the pipeline
  • Enable low_cpu_mem_usage=True

Slow Transcription

  • Ensure GPU is enabled (DEVICE=cuda:0)
  • Use FP16 precision (automatic on GPU)
  • Increase batch_size
  • Use a smaller model

Connection Issues

  • Verify WebSocket support in load balancer
  • Check firewall rules
  • Increase timeout settings

Scripts

  • scripts/setup_api_server.py - Generate an API server from the template
  • scripts/setup_plugin.py - Generate a LiveKit plugin from the template

References

Load these as needed for detailed information:

  • references/api_server_guide.md - Complete API implementation guide
  • references/plugin_implementation.md - LiveKit plugin development
  • references/models_comparison.md - Model selection and optimization
  • references/deployment.md - Production deployment best practices

Assets

Ready-to-use templates:

  • assets/api-server/ - Complete FastAPI server with Whisper
  • assets/plugin-template/ - LiveKit STT plugin structure

Best Practices

  1. Keep models in memory - Load once at startup, not per request
  2. Use appropriate model size - Balance quality vs. speed for your use case
  3. Process audio in chunks - 1-second chunks work well for streaming
  4. Implement proper cleanup - Close WebSocket connections gracefully
  5. Monitor metrics - Track latency, throughput, GPU utilization
  6. Use Docker - Ensures consistent deployments
  7. Enable authentication - Secure production APIs
  8. Scale horizontally - Use load balancer for high availability