tts
Use this skill whenever the user wants to convert text into speech, generate audio from text, or produce voiceovers. Triggers include: any mention of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', 'dubbing', or requests to turn written content into spoken audio. Also use when converting EPUB/PDF/SRT/articles to audio, cloning voices from reference audio, controlling emotion or speed in speech, aligning speech to subtitle timelines, or producing per-segment voice-mapped audio.
```bash
# Clone the whole skills repository
git clone https://github.com/NoizAI/skills

# Or install only the tts skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/NoizAI/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tts" ~/.claude/skills/noizai-skills-tts && rm -rf "$T"
```
Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.
Triggers
- text to speech / tts / speak / say
- voice clone / dubbing
- epub to audio / srt to audio / convert to audio
- 语音 (voice) / 说 (say) / 讲 (tell) / 说话 (talk)
Simple Mode — text to audio
`speak` is the default subcommand, so it can be omitted:
```bash
# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world"  # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3

# Voice cloning: local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav

# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg
```
Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.
Timeline Mode — SRT to time-aligned audio
For precise per-segment timing (dubbing, subtitles, video narration).
Step 1: Get or create an SRT
If the user doesn't have one, generate from text:
```bash
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500
```
`--cps` sets characters per second (default 4, which suits Chinese; use about 15 for English). The agent can also write the SRT by hand.
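To make the `--cps` arithmetic concrete, here is a minimal Python sketch of turning text into SRT cues: each sentence gets a duration of `len(sentence) / cps` seconds, and consecutive cues are separated by a gap. This is not the `to-srt` implementation; the sentence-splitting rule, the helper names `text_to_srt` and `fmt_ts`, and the reading of `--gap` as milliseconds of silence between cues are all assumptions.

```python
import re


def fmt_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def text_to_srt(text: str, cps: float = 4.0, gap_ms: int = 500) -> str:
    """Hypothetical text-to-SRT: one cue per sentence, duration = len / cps."""
    # Split after ASCII or CJK sentence-ending punctuation and drop empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    cues, t = [], 0
    for i, sentence in enumerate(sentences, start=1):
        dur = round(len(sentence) / cps * 1000)  # characters / (chars per second), in ms
        cues.append(f"{i}\n{fmt_ts(t)} --> {fmt_ts(t + dur)}\n{sentence}\n")
        t += dur + gap_ms  # assumed: --gap is silence (ms) before the next cue
    return "\n".join(cues)


print(text_to_srt("Hello world. Hi!", cps=4))
```

With `cps=4`, the 12-character sentence `Hello world.` gets a 3-second cue, which is why English text needs a much higher `--cps` than Chinese.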
Step 2: Create a voice map
A JSON file controls the default and per-segment voice settings.
`segments` keys accept a single index (`"3"`) or a range (`"5-8"`).
Kokoro voice map:
```json
{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}
```
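How a map like this could resolve to per-segment settings is sketched below in Python. It assumes that segment indices are the 1-based SRT cue numbers, that a segment's fields override the matching `default` fields key by key, and that later keys win if ranges overlap; `build_segment_table` and `settings_for` are hypothetical helpers, not part of `tts.py`.

```python
import json


def build_segment_table(voice_map: dict) -> dict[int, dict]:
    """Expand segment keys like "3" or "5-8" into {cue_index: overrides}."""
    table = {}
    for key, overrides in voice_map.get("segments", {}).items():
        if "-" in key:
            start, end = (int(part) for part in key.split("-", 1))
        else:
            start = end = int(key)
        for idx in range(start, end + 1):  # ranges are inclusive
            table[idx] = overrides
    return table


def settings_for(voice_map: dict, index: int) -> dict:
    """Copy the defaults, then apply any per-segment overrides (segment wins)."""
    merged = dict(voice_map.get("default", {}))
    merged.update(build_segment_table(voice_map).get(index, {}))
    return merged


vm = json.loads("""{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}""")
print(settings_for(vm, 1))  # {'voice': 'zm_yunxi', 'lang': 'cmn'}
print(settings_for(vm, 6))  # {'voice': 'af_sarah', 'lang': 'en-us', 'speed': 0.9}
print(settings_for(vm, 3))  # {'voice': 'zf_xiaoni', 'lang': 'cmn'}
```

Under this reading, segment 1 keeps the default `lang` while swapping the voice, and segments without an entry fall back to `default` unchanged.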
Noiz voice map (adds `emo` and `reference_audio` support). `reference_audio` can be a local path or a URL (the user's own audio; Noiz only):
```json
{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}
```
Dynamic Reference Audio Slicing: If you are translating or dubbing a video and want each sentence to automatically use the audio from the original video at the same timestamp as its reference audio, use the `--ref-audio-track` argument instead of setting `reference_audio` in the map:
```bash
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav
```
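One way to picture `--ref-audio-track` is to cut the original track at each cue's start and end times and use each clip as that sentence's reference. The Python sketch below only builds the `ffmpeg` command lines; it assumes the cut is done with `ffmpeg` (already required for timeline mode) and that mono WAV clips are acceptable references. The function names and `ref_NNNN.wav` output names are illustrative, and `tts.py` may slice the track differently.

```python
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:03,250".
TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_cues(srt_text: str) -> list[tuple[float, float]]:
    """Return (start_seconds, end_seconds) for every cue in an SRT document."""
    cues = []
    for match in TS.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        cues.append((h1 * 3600 + m1 * 60 + s1 + ms1 / 1000,
                     h2 * 3600 + m2 * 60 + s2 + ms2 / 1000))
    return cues


def slice_commands(track: str, srt_text: str) -> list[list[str]]:
    """Build one ffmpeg command per cue that extracts that span as mono WAV."""
    cmds = []
    for i, (start, end) in enumerate(srt_cues(srt_text), start=1):
        cmds.append([
            "ffmpeg", "-y", "-i", track,
            "-ss", f"{start:.3f}", "-to", f"{end:.3f}",  # seek after -i: accurate cut
            "-vn", "-ac", "1",  # drop any video stream, downmix to mono
            f"ref_{i:04d}.wav",
        ])
    return cmds
```

Each command can be run with `subprocess.run(cmd, check=True)`. Placing `-ss` after `-i` makes ffmpeg decode from the start and discard frames until the cue, which is slower than seeking before `-i` but keeps `-to` on the original timeline.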
See `examples/` for full samples.
Step 3: Render
```bash
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav
```
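Timeline mode aligns each rendered segment to its SRT start time. As a conceptual sketch only (the skill uses `ffmpeg` for this step, and its exact padding and trimming rules are not described here), the Python below places mono sample lists into a silent buffer at their cue offsets and trims any clip that would run past its cue end. The `(start, end, samples)` layout, with positions in samples (for example `round(seconds * sample_rate)`), is an assumption made for illustration.

```python
def place_on_timeline(clips, total_samples):
    """Mix mono clips into a silent buffer of total_samples samples.

    clips: iterable of (start, end, samples), start/end given as sample indices.
    Samples past a clip's cue end are dropped; a short clip leaves silence after it.
    """
    out = [0.0] * total_samples
    for start, end, samples in clips:
        for offset, value in enumerate(samples[: max(0, end - start)]):
            pos = start + offset
            if 0 <= pos < total_samples:  # ignore anything outside the buffer
                out[pos] += value  # add rather than overwrite, so overlaps would mix
    return out


# Cue 1 spans samples 0-2 but its clip is 3 samples long, so the last one is trimmed;
# cue 2 starts at sample 3 and is shorter than its span, so silence follows it.
print(place_on_timeline([(0, 2, [1.0, 1.0, 1.0]), (3, 5, [2.0])], 6))
# [1.0, 1.0, 0.0, 2.0, 0.0, 0.0]
```

Trimming keeps later cues on schedule at the cost of cutting speech short, which is one reason server-side exact durations (listed below as a Noiz feature) can matter.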
When to Choose Which
| Need | Recommended |
|---|---|
| Just read text aloud, no fuss | Kokoro (default) |
| EPUB/PDF audiobook with chapters | Kokoro (native support) |
| Voice blending | Kokoro |
| Voice cloning from reference audio | Noiz |
| Emotion control (`emo` param) | Noiz |
| Exact server-side duration per segment | Noiz |
When the user needs emotion control, voice cloning, and precise duration together, Noiz is the only backend that supports all three.
Guest Mode (no API key)
When no API key is configured, `tts.py` automatically falls back to guest mode, a limited Noiz endpoint that requires no authentication. Guest mode supports only `--voice-id`, `--speed`, and `--format`; voice cloning, emotion, duration, and timeline rendering are not available.
```bash
# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav

# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro
```
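The fallback rule reads like a small decision function. The Python sketch below assumes that, without `--backend`, the script targets Noiz when a key is present and guest mode otherwise, and it rejects any option outside the three the guest endpoint is said to accept. `choose_backend` and the option names are hypothetical, not taken from `tts.py`.

```python
from typing import Optional

# Options the text lists as available in guest mode.
GUEST_OPTIONS = {"voice_id", "speed", "format"}


def choose_backend(api_key: Optional[str], backend: Optional[str] = None, **options) -> str:
    """Return the backend name; raise if guest mode would need an unsupported option."""
    if backend:  # an explicit --backend always wins
        return backend
    if api_key:  # assumed: a configured key means full Noiz
        return "noiz"
    requested = {name for name, value in options.items() if value is not None}
    unsupported = requested - GUEST_OPTIONS
    if unsupported:
        raise ValueError(f"guest mode does not support: {', '.join(sorted(unsupported))}")
    return "guest"


print(choose_backend(None, voice_id="883b6b7c"))  # guest
print(choose_backend(None, backend="kokoro"))     # kokoro
```

Failing early with a clear message is friendlier than sending a cloning or emotion request to an endpoint that cannot honor it.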
Available guest voices (15 built-in):
| voice_id | name | lang | gender | tone |
|---|---|---|---|---|
|  | 販売員(なおみ) | ja | F | Joyful |
|  | 落ち着いた女性 | ja | F | Calm |
|  | 熱血漢(たける) | ja | M | Angry |
|  | 安らぎ(みなと) | ja | M | Calm |
|  | 旅人(かいと) | ja | M | Calm |
|  | 悦悦\|社交分享 | zh | F | Joyful |
|  | 婉青\|情绪抚慰 | zh | F | Calm |
|  | 阿豪\|磁性主持 | zh | M | Calm |
|  | 建国\|知识科普 | zh | M | Calm |
|  | 小明\|科技达人 | zh | M | Joyful |
|  | Science Narration | en | M | Calm |
|  | The Mentor (Alex) | en | M | Joyful |
|  | The Naturalist (Silas) | en | M | Calm |
|  | The Healer (Serena) | en | F | Calm |
|  | The Mentor (Maya) | en | F | Calm |
Security & data disclosure
This skill performs the following file and network operations at runtime:
- Credential storage: When you run `config --set-api-key`, the key is saved to `~/.config/noiz/api_key` (permissions `0600`). The `NOIZ_API_KEY` environment variable is also supported as an alternative.
- Legacy key migration: If `~/.noiz_api_key` exists and `~/.config/noiz/api_key` does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
- Network calls (Noiz backend): Text and optional reference audio are uploaded to `https://noiz.ai/v1/` for synthesis. No data is sent unless you invoke a Noiz command.
- Reference audio download: When `--ref-audio` is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice ID or reference audio is provided, a default reference audio is downloaded from `storage.googleapis.com` or `noiz.ai`.
- Temp files: Temporary audio/text files may be created during synthesis and are cleaned up after use.
- ffmpeg: Invoked only in timeline `render` mode to assemble the final audio.

No files outside the output path and `~/.config/noiz/` are modified. The Kokoro backend runs entirely offline with no network access.
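The credential handling above can be sketched with the Python standard library. This is an illustration, not the skill's code: `save_api_key` and `load_api_key` are hypothetical names, `home` is a parameter only so the sketch can be tried against a scratch directory, and giving `NOIZ_API_KEY` priority over the file is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def save_api_key(key: str, home: Path) -> Path:
    """Write the key to <home>/.config/noiz/api_key, readable by the owner only."""
    path = home / ".config" / "noiz" / "api_key"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with 0600 so the key is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key.strip())
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return path


def load_api_key(home: Path) -> Optional[str]:
    """Prefer NOIZ_API_KEY (assumed order), then the config file; migrate a legacy key."""
    env = os.environ.get("NOIZ_API_KEY")
    if env:
        return env.strip()
    new, legacy = home / ".config" / "noiz" / "api_key", home / ".noiz_api_key"
    if legacy.exists() and not new.exists():
        new.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(legacy, new)  # copy only; the old file is left in place
        os.chmod(new, 0o600)
        print(f"Copied legacy key from {legacy} to {new}; remove the old file manually.")
    return new.read_text().strip() if new.exists() else None
```

Opening with an explicit mode, rather than writing first and calling `chmod` afterwards, avoids a window where the file exists with default permissions.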
Requirements
- `ffmpeg` in `PATH` (timeline mode only)
- `requests` package: `uv pip install requests` (required for the Noiz backend)
- Get your API key at Noiz Developer, then run `python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY` (guest mode works without a key but has limited features)
- Kokoro: if already installed, pass `--backend kokoro` to use the local backend
Noiz API authentication
Use only the base64-encoded API key as the `Authorization` header value, with no prefix (e.g. no `APIKEY` or `Bearer`). Any prefix causes a 401 error.
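In client code, this rule amounts to sending the key string itself as the header value. A minimal Python sketch follows; `noiz_headers` is a hypothetical helper, and the returned dict could be passed as `headers=` to a `requests` call against the Noiz API, whose endpoint paths are not covered here.

```python
def noiz_headers(api_key: str) -> dict:
    """Use the bare key as the entire Authorization value (no scheme prefix)."""
    key = api_key.strip()
    # A scheme prefix is exactly the mistake that produces a 401.
    for prefix in ("bearer ", "apikey "):
        if key.lower().startswith(prefix):
            raise ValueError(f"remove the {key.split()[0]!r} prefix; send the key alone")
    return {"Authorization": key}


print(noiz_headers("abc123=="))  # {'Authorization': 'abc123=='}
```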
For backend details and full argument reference, see reference.md.