qwen-audio-lab
Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS and Aliyun Qwen. Use it when the user wants to convert text to speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create voiceovers from PPT speaker notes.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aliyx/qwen-audio-lab" ~/.claude/skills/openclaw-skills-qwen-audio-lab && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/aliyx/qwen-audio-lab" ~/.openclaw/skills/openclaw-skills-qwen-audio-lab && rm -rf "$T"
manifest:
skills/aliyx/qwen-audio-lab/SKILL.md
Qwen Audio Lab
Use this skill for text-to-speech on macOS or with Aliyun Qwen.
Choose the backend
- Use `mac-say` for fast local playback, notifications, and low-friction speech on a Mac.
- Use `qwen-tts` when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
- If `DASHSCOPE_API_KEY` is missing, fall back to `mac-say` for local playback.
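The backend choice above boils down to a single environment check. This is only a sketch of the documented fallback, not part of the skill's script:

```shell
# Prefer the Qwen backend when DASHSCOPE_API_KEY is set,
# otherwise fall back to local mac-say playback.
if [ -n "${DASHSCOPE_API_KEY:-}" ]; then
  backend="qwen-tts"
else
  backend="mac-say"
fi
echo "backend=$backend"
```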
Environment
- `DASHSCOPE_API_KEY`: required for Qwen synthesis and voice cloning.
- `QWEN_AUDIO_REGION`: optional, `cn` (default) or `intl`.
- `QWEN_AUDIO_OUTPUT_DIR`: optional directory for generated audio files. Defaults to `~/.openclaw/data/qwen-audio-lab/output`.
- `QWEN_AUDIO_STATE_DIR`: optional directory for local state such as remembered voices. Defaults to `~/.openclaw/data/qwen-audio-lab/state`.
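The optional variables follow the usual fall-back-to-default pattern, which standard POSIX parameter expansion expresses directly. This snippet only mirrors the documented defaults; it is not part of the skill:

```shell
# Resolve the effective settings, falling back to the documented
# defaults when the environment variables are unset or empty.
region="${QWEN_AUDIO_REGION:-cn}"
output_dir="${QWEN_AUDIO_OUTPUT_DIR:-$HOME/.openclaw/data/qwen-audio-lab/output}"
state_dir="${QWEN_AUDIO_STATE_DIR:-$HOME/.openclaw/data/qwen-audio-lab/state}"
printf 'region=%s\noutput=%s\nstate=%s\n' "$region" "$output_dir" "$state_dir"
```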
Commands
Run all commands through:
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py <command> [...]
```
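Since every command shares the same long prefix, a small wrapper function keeps interactive use short. The function name `qwen_audio` is a hypothetical convenience, not something the skill defines; the script path is the install location documented above:

```shell
# Hypothetical wrapper around the skill's single entry point.
qwen_audio() {
  python3 "$HOME/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py" "$@"
}

# Usage: qwen_audio <command> [...]
# e.g.   qwen_audio list-voices
```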
Preferred high-level commands
Use these first for most user-facing narration tasks:
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx
```
Use the older commands only when you specifically want the legacy workflow names. Generated audio and remembered voice state now default to `~/.openclaw/data/qwen-audio-lab/` instead of the skill folder.
Local macOS speech
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
  --text "开会了,别忘了带电脑" \
  --voice Tingting
```
Qwen TTS from inline text
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text "你好,我是你的语音助手。" \
  --voice Cherry \
  --model qwen3-tts-flash \
  --language-type Chinese \
  --download
```
Qwen TTS from a text file
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text-file /path/to/script.txt \
  --voice Cherry \
  --download
```
Qwen TTS from stdin
```
cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --stdin \
  --voice Cherry \
  --download
```
Clone a voice
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
  --audio /path/to/reference.mp3 \
  --name claw-voice-01 \
  --target-model qwen3-tts-vc-2026-01-22
```
- Keep the cloning `--target-model` aligned with the synthesis model family.
- Use a clean speech sample with minimal background noise.
- Ask before cloning a third-party voice when consent is unclear.
Design a voice from a text prompt
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
  --prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
  --name doc-voice-01 \
  --target-model qwen3-tts-vd-2026-01-26 \
  --preview-format wav
```
Legacy command: reuse the latest cloned voice
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
  --text "你好,这是我的声音测试。" \
  --download
```
High-level narration from any text source
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
  --text "这是要转成语音的正文" \
  --output narration.wav
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
  --text-file /path/to/script.txt
```
- Default voice source is `last-cloned`.
- Use `--voice-source last-designed` to use the latest designed voice instead.
- Use `--voice` and optionally `--model` to force a specific voice id and synthesis model.
Legacy command: narrate PPT speaker notes with the latest cloned voice
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice --ppt "/path/to/file.pptx"
```
High-level PPT narration
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt "/path/to/file.pptx"
```
- Default voice source is `last-cloned`.
- Use `--voice-source last-designed` to switch to the latest designed voice.
- Use `--voice` and optionally `--model` to force a specific voice id and synthesis model.
- Keep `ppt-own-voice` as the backward-compatible alias for the original workflow.
Inspect or manage remembered voices
```
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01
```
Workflow rules
- Reuse an existing cloned voice before asking for a new sample.
- Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
- Prefer the `narrate-*` commands as the primary high-level interface for narration tasks.
- Keep `speak-last-cloned` and `ppt-own-voice` for backward compatibility with older workflows.
- Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.