## Install

**Source** · Clone the upstream repo:

```shell
git clone https://github.com/sundial-org/awesome-openclaw-skills
```

**Claude Code** · Install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/pocket-tts" ~/.claude/skills/sundial-org-awesome-openclaw-skills-pocket-tts && rm -rf "$T"
```

**OpenClaw** · Install into `~/.openclaw/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/pocket-tts" ~/.openclaw/skills/sundial-org-awesome-openclaw-skills-pocket-tts && rm -rf "$T"
```

Manifest: `skills/pocket-tts/SKILL.md`
# Pocket TTS Skill
Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.
## Features
- 🎯 Fully local - No API calls, runs completely offline
- 🚀 CPU-only - No GPU required, works on any computer
- ⚡ Fast generation - ~2-6x real-time on CPU
- 🎤 8 built-in voices - alba, marius, javert, jean, fantine, cosette, eponine, azelma
- 🎭 Voice cloning - Clone any voice from a WAV sample
- 🔊 Low latency - ~200ms first audio chunk
- 📚 Simple Python API - Easy integration into any project
## Installation

```shell
# 1. Accept the model license on Hugging Face:
#    https://huggingface.co/kyutai/pocket-tts

# 2. Install the package
pip install pocket-tts

# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
```
## Usage

### CLI

```shell
# Basic usage
pocket-tts "Hello, I am your AI assistant"

# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav

# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav

# Adjust speed
pocket-tts "Hello" --speed 1.2

# Start local server
pocket-tts --serve

# List available voices
pocket-tts --list-voices
```
### Python API

```python
from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
tts_model = TTSModel.load_model()

# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())

# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")
```
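The "~2-6x real-time" claim is easy to check on your own hardware. A minimal benchmarking sketch using only the API calls shown above; the helper name `real_time_factor` is mine, not part of pocket-tts:

```python
import time

def real_time_factor(gen_seconds: float, n_samples: int, sample_rate: int) -> float:
    """Seconds of audio produced per wall-clock second; >1 means faster than real time."""
    return (n_samples / sample_rate) / gen_seconds

if __name__ == "__main__":
    from pocket_tts import TTSModel

    tts_model = TTSModel.load_model()
    voice_state = tts_model.get_state_for_audio_prompt(
        "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
    )
    t0 = time.perf_counter()
    # generate_audio returns a 1D torch tensor of PCM samples
    audio = tts_model.generate_audio(voice_state, "Benchmarking pocket TTS.")
    elapsed = time.perf_counter() - t0
    rtf = real_time_factor(elapsed, audio.shape[0], tts_model.sample_rate)
    print(f"RTF: {rtf:.2f}x")
```

Note the model is loaded once outside the timed region, so the measurement reflects generation speed rather than startup cost.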
## Available Voices
| Voice | Description |
|---|---|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |
Or use `--voice-file /path/to/wav.wav` for custom voice cloning.
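To audition all eight presets, you can drive the CLI from Python. A sketch assuming the `--voice`/`--output` flags shown in the CLI section; the helper `sample_command` is mine:

```python
import subprocess

VOICES = ["alba", "marius", "javert", "jean",
          "fantine", "cosette", "eponine", "azelma"]

def sample_command(voice: str, text: str = "Hello from pocket-tts") -> list[str]:
    """Build a pocket-tts CLI invocation that writes <voice>.wav."""
    return ["pocket-tts", text, "--voice", voice, "--output", f"{voice}.wav"]

if __name__ == "__main__":
    for voice in VOICES:
        subprocess.run(sample_command(voice), check=True)
```

Each run writes one WAV file per voice into the current directory, so you can compare them side by side.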
## Options

| Option | Description | Default |
|---|---|---|
| `text` | Text to convert | Required |
| `--output` | Output WAV file | |
| `--voice` | Voice preset | |
| `--speed` | Speech speed (0.5-2.0) | |
| `--voice-file` | Custom WAV for voice cloning | None |
| `--serve` | Start HTTP server | False |
| `--list-voices` | List all voices | False |
## Requirements
- Python 3.10-3.14
- PyTorch 2.5+ (CPU version works)
- Works on 2 CPU cores
## Notes
- ⚠️ Model is gated - accept license on Hugging Face first
- 🌍 English language only (v1)
- 💾 First run downloads model (~100M parameters)
- 🔊 Audio is returned as 1D torch tensor (PCM data)
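The last note matters when saving audio yourself: `scipy.io.wavfile.write` encodes the sample format from the array's dtype, so a float tensor and an int16 conversion produce different files. A small conversion helper, assuming the tensor holds float PCM in [-1, 1] (an assumption on my part, not something the docs above state):

```python
import numpy as np

def float_pcm_to_int16(audio: np.ndarray) -> np.ndarray:
    """Clip float PCM to [-1, 1] and scale to 16-bit signed integers."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

# Usage with pocket-tts (audio.numpy() converts the torch tensor):
# scipy.io.wavfile.write("out.wav", tts_model.sample_rate,
#                        float_pcm_to_int16(audio.numpy()))
```

Writing int16 halves the file size versus float32 and is the most widely compatible WAV encoding.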