qwen-tts
Text-to-speech using the Qwen3-TTS CustomVoice MLX model. Supports 9 speakers and multiple emotion/style instructions. Optimized for Apple Silicon. Use when the user wants spoken audio output.
Install

Source: clone the upstream repo

```bash
git clone https://github.com/stvlynn/skills
```

Claude Code: install into `~/.claude/skills/`

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/stvlynn/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/qwen-tts" ~/.claude/skills/stvlynn-skills-qwen-tts && rm -rf "$T"
```
Manifest: `skills/qwen-tts/SKILL.md`
Qwen TTS Skill
Text-to-speech using Qwen3-TTS CustomVoice model, running locally on Apple Silicon via MLX.
Overview
- Model: `mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit` (4-bit quantized, ~600 MB)
- Runtime: MLX (Apple Silicon GPU acceleration)
- Speakers: 9 built-in voices
- Output: WAV audio files
- Auto-cleanup: Files older than 24 hours are removed automatically
First-time Deployment
Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
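Before creating the environment, a quick programmatic check of these prerequisites can save a confusing install failure. The sketch below is a hypothetical helper (not part of the skill) that uses only the standard library. It assumes `platform.machine()` reports `arm64` for a native Apple Silicon interpreter, so an x86_64 Python running under Rosetta is flagged as well.

```python
import platform
import sys


def check_prerequisites(system: str, machine: str, version: tuple) -> list[str]:
    """Return a list of problems; an empty list means the host looks usable."""
    problems = []
    # MLX runs only on macOS ("Darwin") with a native Apple Silicon ("arm64") interpreter.
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    # The skill requires Python 3.10 or newer.
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


def check_current_host() -> list[str]:
    """Run the check against the interpreter executing this code."""
    return check_prerequisites(platform.system(), platform.machine(), sys.version_info)
```

Running `print(check_current_host())` with the interpreter you plan to use should print an empty list on a supported machine.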
1. Create virtual environment
```bash
cd /path/to/skills/skills/qwen-tts
python3 -m venv venv
source venv/bin/activate
```
2. Install dependencies
The model is downloaded from the ModelScope mirror (faster within mainland China):
```bash
pip install -r scripts/requirements.txt
```
3. Pre-download model (optional)
The model is downloaded automatically on first run. To download it in advance:
```bash
source venv/bin/activate
export HF_ENDPOINT="https://hf-mirror.com"
python3 -c "from mlx_audio.tts.utils import load_model; load_model('mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit')"
```
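The same pre-download can be done from Python. This is a sketch under stated assumptions: `predownload` and its `loader` parameter are hypothetical names, and it relies on `huggingface_hub` reading `HF_ENDPOINT` once at import time, which is why the variable is set before `mlx_audio` is imported. The only call taken from this document is `load_model` from `mlx_audio.tts.utils`.

```python
import os
from typing import Callable, Optional

MODEL_ID = "mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit"


def predownload(endpoint: str = "https://hf-mirror.com",
                loader: Optional[Callable[[str], object]] = None) -> object:
    """Point Hugging Face downloads at `endpoint`, then load (and cache) the model."""
    # Assumption: huggingface_hub copies HF_ENDPOINT into a module-level constant
    # when first imported, so the variable must be set before that import happens.
    os.environ["HF_ENDPOINT"] = endpoint
    if loader is None:
        # Imported lazily, only after the environment variable is in place.
        from mlx_audio.tts.utils import load_model
        loader = load_model
    return loader(MODEL_ID)
```

The `loader` hook exists only so the function can be exercised without downloading anything; normal use is simply `predownload()`.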
4. Verify
```bash
source venv/bin/activate
python3 scripts/tts.py "你好,这是一段测试语音。" --output /tmp  # "Hello, this is a test utterance."
# Should produce a WAV file in /tmp/
```
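To confirm the verification run produced usable audio (and to diagnose the 0-byte output case listed under Troubleshooting), a small standard-library check can locate the newest `tts_*.wav` file and read its duration. This is a hypothetical helper, not part of `scripts/tts.py`; it assumes the script writes PCM WAV, since Python's `wave` module cannot open IEEE-float WAV files.

```python
import wave
from pathlib import Path


def newest_wav(directory: str, pattern: str = "tts_*.wav") -> Path:
    """Return the most recently modified file in `directory` matching `pattern`."""
    candidates = sorted(Path(directory).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no {pattern} files in {directory}")
    return candidates[-1]


def wav_duration(path: Path) -> float:
    """Return the duration in seconds, failing on empty or frameless files."""
    if path.stat().st_size == 0:
        # A 0-byte file usually means generation failed (e.g. very short input text).
        raise ValueError(f"{path} is empty")
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    if frames == 0:
        raise ValueError(f"{path} contains no audio frames")
    return frames / rate
```

For example, `wav_duration(newest_wav("/tmp"))` returns the clip length in seconds, or raises if the newest file is empty.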
Troubleshooting
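| Problem | Solution |
|---|---|
| | Make sure you are using an Apple Silicon Mac; MLX does not support Intel Macs |
| Slow model download | Set |
| Out of memory | The 4-bit model needs about 1.5 GB of RAM; close other large applications |
| No audio output | Check whether the output file is 0 bytes; the input text may be too short |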
Configuration
⚠️ The values below are example defaults. Change speaker and instruct to suit your use case.
- Speaker: Serena (example)
- Instruct: 撒娇语气 (coquettish tone; example)
- Language: Chinese
- Speed: 1.0
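When calling the script programmatically, it can help to keep these defaults in one place and override only what changes. A minimal sketch, assuming a hypothetical `TTSConfig` dataclass (not part of the skill) whose fields mirror the documented `--speaker`, `--instruct`, `--language`, and `--speed` flags:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TTSConfig:
    """Example defaults from the Configuration section; adjust per use case."""
    speaker: str = "Serena"
    instruct: str = "撒娇语气"  # coquettish tone (example value)
    language: str = "Chinese"
    speed: float = 1.0

    def to_cli_args(self) -> list[str]:
        """Translate the config into flags accepted by scripts/tts.py."""
        return [
            "--speaker", self.speaker,
            "--instruct", self.instruct,
            "--language", self.language,
            "--speed", str(self.speed),
        ]
```

`replace(TTSConfig(), speaker="Ryan", language="English").to_cli_args()` keeps the remaining example defaults and swaps only the speaker and language.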
Available Speakers
| Speaker | Language |
|---|---|
| | Chinese |
| | Chinese |
| | Chinese |
| | Chinese |
| | Chinese |
| | English |
| | English |
| | Japanese |
| | Korean |
Available Instructs (emotion/style)
- 撒娇语气: coquettish
- 冷静分析: calm analysis
- 惊讶: surprised
- 兴奋: excited
- 神秘: mysterious
- 开心: happy
- 委屈: wronged/sad
Free-form natural-language instructions are also supported, e.g. 用特别愤怒的语气说 ("say it in a very angry tone").
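Because the presets are Chinese strings, a caller working in English may want aliases. The sketch below is a hypothetical convenience mapping, not something `scripts/tts.py` provides: known English glosses resolve to the preset, and any other text passes through unchanged as a free-form instruction.

```python
# Preset instructs from the list above, keyed by their English gloss (hypothetical aliases).
PRESET_INSTRUCTS = {
    "coquettish": "撒娇语气",
    "calm analysis": "冷静分析",
    "surprised": "惊讶",
    "excited": "兴奋",
    "mysterious": "神秘",
    "happy": "开心",
    "wronged": "委屈",
    "sad": "委屈",
}


def resolve_instruct(value: str) -> str:
    """Map an English alias to its preset; return anything else unchanged.

    Unrecognized values are treated as free-form natural-language instructions.
    """
    return PRESET_INSTRUCTS.get(value.strip().lower(), value)
```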
Usage
```bash
# Default settings
python3 scripts/tts.py "你好!"

# Custom speaker
python3 scripts/tts.py "Hello!" --speaker Ryan --language English

# Custom emotion
python3 scripts/tts.py "其实我真的有发现..." --instruct 冷静分析

# Full customization
python3 scripts/tts.py "哥哥,你回来啦!" \
  --speaker Serena \
  --instruct 撒娇语气 \
  --speed 1.0

# Custom output directory
python3 scripts/tts.py "测试" --output /tmp

# Skip auto-cleanup of old files
python3 scripts/tts.py "测试" --no-cleanup
```
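For integrations that call the script from Python rather than a shell, passing an argument list to `subprocess.run` avoids quoting problems with Chinese text and punctuation. This sketch assumes the flags shown above are the script's full interface; `build_tts_command` and `synthesize` are hypothetical names. `sys.executable` is used so the command runs under whichever interpreter (e.g. the activated venv) launched the caller.

```python
import subprocess
import sys
from typing import Optional


def build_tts_command(
    text: str,
    script: str = "scripts/tts.py",
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    speed: Optional[float] = None,
    output: Optional[str] = None,
    no_cleanup: bool = False,
) -> list[str]:
    """Build the argv for scripts/tts.py; omitted options fall back to the script's defaults."""
    cmd = [sys.executable, script, text]
    for flag, value in (
        ("--speaker", speaker),
        ("--language", language),
        ("--instruct", instruct),
        ("--speed", speed),
        ("--output", output),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    if no_cleanup:
        cmd.append("--no-cleanup")
    return cmd


def synthesize(text: str, **options) -> subprocess.CompletedProcess:
    """Run the TTS script; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(build_tts_command(text, **options), check=True)
```

`synthesize("Hello!", speaker="Ryan", language="English")` is the Python counterpart of the "Custom speaker" example.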
Audio Output
- Default directory: `~/tts-output/` (override with `$QWEN_TTS_OUTPUT_DIR`)
- File naming: `tts_{timestamp}_{index}.wav`
- Auto-cleanup: files older than 24 hours are removed on each run (disable with `--no-cleanup`)
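The directory and cleanup rules can be approximated as follows. This is a hypothetical reimplementation for illustration, not the script's actual code: it assumes cleanup matches `tts_*.wav` by modification time, while the real script may match files or measure age differently.

```python
import os
import time
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours


def resolve_output_dir(env: Mapping = os.environ) -> Path:
    """Use $QWEN_TTS_OUTPUT_DIR if set, otherwise ~/tts-output/."""
    override = env.get("QWEN_TTS_OUTPUT_DIR")
    return Path(override).expanduser() if override else Path.home() / "tts-output"


def cleanup_old_files(directory: Path, max_age: float = MAX_AGE_SECONDS,
                      now: Optional[float] = None) -> list[Path]:
    """Delete tts_*.wav files last modified more than max_age seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob("tts_*.wav"):
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

Passing a fixed `now` makes the 24-hour rule easy to test without waiting a day.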
Integration
- Generate audio using the TTS script
- Send the audio file as a voice message (Telegram, etc.)
- Old files are cleaned up automatically
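The integration flow can be sketched as a function that snapshots the output directory, runs generation, and hands each new WAV file to a sender. Both callables are hypothetical placeholders: `generate` might wrap `scripts/tts.py --output <dir>`, and `send_voice` stands in for whatever messaging API (Telegram or otherwise) the integration uses. Diffing the directory listing avoids depending on the exact timestamp format in the file names.

```python
from pathlib import Path
from typing import Callable


def generate_and_send(
    text: str,
    output_dir: Path,
    generate: Callable[[str, Path], None],
    send_voice: Callable[[Path], None],
) -> list[Path]:
    """Run `generate`, then pass every new WAV in output_dir to `send_voice`."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    before = set(output_dir.glob("*.wav"))
    generate(text, output_dir)  # e.g. a wrapper around scripts/tts.py --output <dir>
    new_files = sorted(set(output_dir.glob("*.wav")) - before)
    for path in new_files:
        send_voice(path)  # e.g. upload as a voice message through a chat bot
    # No explicit deletion here: the TTS script prunes files older than 24 hours on each run.
    return new_files
```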