Voicebox add-tts-engine
Use this skill to add a new TTS engine to Voicebox. It walks through dependency research, backend implementation, frontend wiring, PyInstaller bundling, and frozen-build testing. Always start with Phase 0 (dependency audit) before writing any code.
```sh
# Clone the repo
git clone https://github.com/jamiepine/voicebox

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jamiepine/voicebox "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/add-tts-engine" ~/.claude/skills/jamiepine-voicebox-add-tts-engine && rm -rf "$T"
```
`.agents/skills/add-tts-engine/SKILL.md`

Add TTS Engine
Goal
Integrate a new text-to-speech engine into Voicebox end-to-end: dependency research, backend protocol implementation, frontend UI wiring, PyInstaller bundling, and frozen-build verification. The user should only need to test the final build locally.
Reference Doc
The full phased guide lives at `docs/content/docs/developer/tts-engines.mdx`. Read this file in its entirety before starting. It contains:
- Phase 0: Dependency research (mandatory before writing code)
- Phase 1: Backend implementation (`TTSBackend` protocol)
- Phase 2: Route and service integration (usually zero changes)
- Phase 3: Frontend integration (5 files)
- Phase 4: Dependencies (`requirements.txt`, justfile, CI, Docker)
- Phase 5: PyInstaller bundling (`build_binary.py` + `server.py`)
- Phase 6: Common upstream workarounds
- Implementation checklist (gate between phases)
Workflow
1. Read the guide
```sh
# Read the full TTS engines doc
cat docs/content/docs/developer/tts-engines.mdx
```
Internalize all phases, especially Phase 0 and Phase 5. The v0.2.3 release took three patch releases because Phase 0 was skipped.
2. Dependency research (Phase 0)
Clone the model library into a temporary directory and audit it. Do NOT skip this.
```sh
mkdir /tmp/engine-research && cd /tmp/engine-research
git clone <model-library-url>
```
Run the grep searches from Phase 0.2 in the guide against the cloned source and its transitive dependencies. Produce a written dependency audit covering:
- PyPI vs non-PyPI packages
- PyInstaller directives needed (`--collect-all`, `--copy-metadata`, `--hidden-import`)
- Runtime data files that must be bundled
- Native library paths that need env var overrides in frozen builds
- Monkey-patches needed (`torch.load`, float64, MPS, HF token)
- Sample rate
- Model download method (`from_pretrained` vs `snapshot_download` + `from_local`)
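The grep step of the audit can be automated with a short script. This is a minimal sketch, assuming the library is cloned locally; the patterns mirror the audit items above but are not exhaustive, so extend them per Phase 0.2 of the guide:

```python
import pathlib

# Patterns that commonly break frozen builds, mapped to the likely remedy.
# Illustrative subset only; extend per Phase 0.2 of the guide.
RISKY_PATTERNS = {
    "inspect.getsource": "ships source introspection; likely needs --collect-all",
    "importlib.metadata": "likely needs --copy-metadata",
    "torch.load": "check map_location for CPU-only builds",
    "from_pretrained": "note the model download method",
}

def audit(root: str) -> dict:
    """Return {pattern: [files]} for every risky pattern found under root."""
    hits: dict = {}
    for py in pathlib.Path(root).rglob("*.py"):
        text = py.read_text(errors="ignore")
        for pattern in RISKY_PATTERNS:
            if pattern in text:
                hits.setdefault(pattern, []).append(str(py))
    return hits
```

Run it against the cloned library and each of its transitive dependencies, then fold the hits into the written audit.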
Test model loading and generation on CPU in the throwaway venv before proceeding.
3. Implement (Phases 1–4)
Follow the guide's phases in order. Key files to modify:
Backend (Phase 1):
- Create `backend/backends/<engine>_backend.py`
- Register in `backend/backends/__init__.py` (ModelConfig + TTS_ENGINES + factory)
- Update regex in `backend/models.py`
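For orientation, a new backend shell might look like the sketch below. This is an illustrative assumption: the real `TTSBackend` protocol lives in `backend/backends/base.py` and its actual method names may differ, so copy the real protocol, not this sketch.

```python
from typing import Optional, Protocol, runtime_checkable

@runtime_checkable
class TTSBackend(Protocol):
    """Stand-in for the real protocol in backend/backends/base.py.
    Method names here are assumptions for illustration only."""
    def load_model(self, model_name: str) -> None: ...
    def generate(self, text: str, voice_path: Optional[str] = None) -> bytes: ...

class MyEngineBackend:
    """Skeleton for backend/backends/<engine>_backend.py."""
    sample_rate = 24_000  # confirm against the upstream library in Phase 0

    def load_model(self, model_name: str) -> None:
        # Phase 0 decides whether this uses from_pretrained or
        # snapshot_download + a local load.
        self.model_name = model_name

    def generate(self, text: str, voice_path: Optional[str] = None) -> bytes:
        raise NotImplementedError("call the engine's inference API here")
```

The registry in `backends/__init__.py` then only needs the new class wired into its ModelConfig, TTS_ENGINES, and factory entries.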
Frontend (Phase 3):
- `app/src/lib/api/types.ts` — engine union type
- `app/src/lib/constants/languages.ts` — ENGINE_LANGUAGES
- `app/src/components/Generation/EngineModelSelector.tsx` — ENGINE_OPTIONS, ENGINE_DESCRIPTIONS
- `app/src/lib/hooks/useGenerationForm.ts` — Zod schema, model-name mapping
- `app/src/components/ServerSettings/ModelManagement.tsx` — MODEL_DESCRIPTIONS
Dependencies (Phase 4):
- `backend/requirements.txt`
- `justfile` (setup-python, setup-python-release targets)
- `.github/workflows/release.yml`
- `Dockerfile` (if applicable)
4. PyInstaller bundling (Phase 5)
Register the engine in `backend/build_binary.py`:

- `--hidden-import` for the backend module and model package
- `--collect-all` for packages using `inspect.getsource`, shipping data files, or native libraries
- `--copy-metadata` for packages using `importlib.metadata`
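As a concrete sketch of those three directive types (the variable name and the `myengine` package are hypothetical; check how `build_binary.py` actually assembles its argument list):

```python
# Hypothetical per-engine additions; "myengine" stands in for the real package.
ENGINE_PYINSTALLER_ARGS = [
    "--hidden-import=backends.myengine_backend",  # the new backend module
    "--hidden-import=myengine",                   # the model package itself
    "--collect-all=myengine",                     # data files, sources for inspect.getsource
    "--copy-metadata=myengine",                   # satisfies importlib.metadata lookups
]
```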
If the engine has native data paths, add an `os.environ.setdefault()` call in `backend/server.py` inside the `if getattr(sys, 'frozen', False):` block.
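The frozen-path override follows this shape. A sketch only: `ESPEAK_DATA_PATH` and the `espeak-ng-data` directory are example names, not necessarily what your engine's native dependency expects.

```python
import os
import sys

def set_frozen_data_paths(env=None):
    """Point native libraries at bundled data when running a PyInstaller build.

    ESPEAK_DATA_PATH / espeak-ng-data are illustrative; substitute the env var
    and data directory your engine's native dependency actually reads.
    """
    env = os.environ if env is None else env
    if getattr(sys, "frozen", False):
        # PyInstaller one-file builds unpack bundled data to sys._MEIPASS.
        bundle_dir = getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
        env.setdefault("ESPEAK_DATA_PATH", os.path.join(bundle_dir, "espeak-ng-data"))
    return env
```

Using `setdefault` (rather than assignment) lets a user-supplied env var win over the bundled path.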
5. Verify in dev mode
```sh
just dev
```
Test the full chain: model download → load → generate → voice cloning.
6. Use the checklist
Walk through the Implementation Checklist at the bottom of `tts-engines.mdx`. Every item must be checked before handing the build to the user.
Key Lessons (from v0.2.3)
These are the most common failure modes. Phase 0 research catches all of them:
| Pattern | Symptom in Frozen Build | Fix |
|---|---|---|
| `inspect.getsource` / source introspection | "could not get source code" | `--collect-all` |
| Package ships pretrained model files | Bundled data files missing at runtime | `--collect-all` |
| C library with hardcoded system paths | Library can't find its data files | `--collect-all` + env var in `server.py` |
| Package uses `importlib.metadata` | "No package metadata found" | `--copy-metadata` |
| `torch.load` without `map_location` | CUDA device not available on CPU build | Monkey-patch `torch.load` |
| MPS on float64 data | dtype mismatch RuntimeError | Cast to float32 |
| `from_pretrained` in HF download calls | Auth failure without stored HF token | Use `snapshot_download` + `from_local` |
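The `torch.load` row is the classic case. Below is a generic sketch of the wrapper pattern; it is demonstrated with a stand-in function rather than torch itself, since importing torch here is an assumption.

```python
import functools

def default_to_cpu(load_fn):
    """Wrap a torch.load-style callable so map_location defaults to 'cpu'.

    In the real backend this would be applied as
        torch.load = default_to_cpu(torch.load)
    before the upstream library first calls it.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("map_location", "cpu")  # only if caller didn't set one
        return load_fn(*args, **kwargs)
    return wrapper

# Stand-in for torch.load, so the pattern can be shown without torch installed.
def fake_load(path, map_location=None):
    return {"path": path, "map_location": map_location}

patched = default_to_cpu(fake_load)
```

Because `setdefault` is used, callers that explicitly pass a `map_location` are left untouched; only the bare calls that would default to CUDA get redirected.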
Notes
- The route and service layers have zero per-engine dispatch points; `main.py` requires zero changes.
- The model config registry in `backends/__init__.py` handles all dispatch automatically.
- Use `get_torch_device()` and `model_load_progress()` from `backends/base.py` — don't reimplement device detection or progress tracking.
- Always test with a clean HuggingFace cache (no pre-downloaded models from dev).
- Do NOT push or create a release. Hand the build to the user for local testing.