Awesome-openclaw-skills audio-reply
Generate audio replies using TTS. Trigger with "read it to me [URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response. Also responds to "speak", "say it", "voice reply".
install
source · Clone the upstream repo
git clone https://github.com/sundial-org/awesome-openclaw-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/audio-reply" ~/.claude/skills/sundial-org-awesome-openclaw-skills-audio-reply && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/audio-reply" ~/.openclaw/skills/sundial-org-awesome-openclaw-skills-audio-reply && rm -rf "$T"
manifest: skills/audio-reply/SKILL.md
source content
Audio Reply Skill
Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).
Trigger Phrases
- "read it to me [URL]" - Fetch content from URL and read it aloud
- "talk to me [topic/question]" - Generate a conversational response as audio
- "speak", "say it", "voice reply" - Convert your response to audio
How to Use
Mode 1: Read URL Content
User: read it to me https://example.com/article
- Fetch the URL content using WebFetch
- Extract readable text (strip HTML, focus on main content)
- Generate audio using TTS
- Play the audio and delete the file afterward
Mode 2: Conversational Audio Response
User: talk to me about the weather today
- Generate a natural, conversational response
- Keep it concise (TTS works best with shorter segments)
- Convert to audio, play it, then delete the file
Implementation
TTS Command
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your text here" \
  --play \
  --file_prefix /tmp/audio_reply
Key Parameters
- --model mlx-community/chatterbox-turbo-fp16 - Fast, natural voice
- --play - Auto-play the generated audio
- --file_prefix - Save to temp location for cleanup
- --exaggeration 0.3 - Optional: add expressiveness (0.0-1.0)
- --speed 1.0 - Adjust speech rate if needed
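The way the required and optional flags combine can be sketched as a small wrapper. This is a minimal sketch, not part of the skill: the function name `speak` and the variables `TTS_RUNNER`, `EXAGGERATION`, and `SPEED` are hypothetical, and the flags are taken as documented above without checking them against the installed mlx_audio version.

```shell
# speak TEXT [PREFIX]: hypothetical wrapper around the skill's TTS command.
# TTS_RUNNER, EXAGGERATION and SPEED are illustrative variables, not skill settings.
speak() {
  text=$1
  prefix=${2:-/tmp/audio_reply}
  # Start from the flags the skill always passes.
  set -- run mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$text" --play --file_prefix "$prefix"
  # Append the optional flags only when the caller has set them.
  if [ -n "${EXAGGERATION:-}" ]; then set -- "$@" --exaggeration "$EXAGGERATION"; fi
  if [ -n "${SPEED:-}" ]; then set -- "$@" --speed "$SPEED"; fi
  # TTS_RUNNER=echo prints the command instead of running it (dry run).
  "${TTS_RUNNER:-uv}" "$@"
}
```

Setting `TTS_RUNNER=echo` before calling `speak "Hello there"` prints the assembled command, which is a cheap way to check the flags before the model is downloaded.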
Text Preparation Guidelines
For "read it to me" mode:
- Fetch URL with WebFetch tool
- Extract main content, strip navigation/ads/boilerplate
- Summarize if very long (>500 words) - keep key points
- Add natural pauses with periods and commas
For "talk to me" mode:
- Write conversationally, as if speaking
- Use contractions (I'm, you're, it's)
- Add filler words sparingly for naturalness ([chuckle], um, anyway)
- Keep responses under 200 words for best quality
- Avoid technical jargon unless explaining it
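The two length limits above (500 words before summarizing, 200 words for a conversational reply) can be checked before calling TTS. A minimal sketch, assuming whitespace-separated words are a good enough measure; the helper names `word_count` and `within_limit` are hypothetical.

```shell
# word_count TEXT: number of whitespace-separated words in TEXT.
word_count() {
  # tr strips the padding some wc implementations add around the number.
  printf '%s\n' "$1" | wc -w | tr -d '[:space:]'
}

# within_limit TEXT MAX: succeeds when TEXT has at most MAX words.
within_limit() {
  [ "$(word_count "$1")" -le "$2" ]
}

# Usage (illustrative):
#   within_limit "$reply" 200 || echo "reply too long, shorten it first"
#   within_limit "$article" 500 || echo "article too long, summarize it first"
```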
Audio Generation & Cleanup (IMPORTANT)
Always delete the audio file after playing - it's already in the chat history.
# Generate with unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your response text" \
  --play \
  --file_prefix "$OUTPUT_FILE"

# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
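One possible hardening of the cleanup step, offered as a sketch rather than as the skill's own method: a timestamp prefix can collide when two replies start in the same second, and a plain `rm` is skipped if the TTS command is interrupted. The hypothetical helper `with_audio_cleanup` below writes into a fresh `mktemp -d` directory and removes it from an `EXIT` trap, so the files are deleted whether the command succeeds or fails.

```shell
# with_audio_cleanup CMD...: hypothetical helper. Runs CMD with a fresh file
# prefix appended as its last argument, then removes everything CMD wrote.
# The body is a subshell, so the trap and variables stay local to each call.
with_audio_cleanup() (
  dir=$(mktemp -d) || exit 1
  # The EXIT trap fires when this subshell ends, on success or on failure.
  trap 'rm -rf "$dir"' EXIT
  "$@" "$dir/audio_reply"
  status=$?
  exit "$status"
)

# Usage (illustrative): --file_prefix must come last so the prefix lands after it.
#   with_audio_cleanup uv run mlx_audio.tts.generate \
#     --model mlx-community/chatterbox-turbo-fp16 \
#     --text "Your response text" --play --file_prefix
```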
Error Handling
If TTS fails:
- Check if model is downloaded (first run downloads ~500MB)
- Ensure uv is installed and in PATH
- Fall back to text response with apology
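The checks above can be sketched as a single function that tries TTS and falls back to text. This is an illustrative sketch: the function name `speak_or_text`, the `TTS_RUNNER` override, and the wording of the apology are assumptions, not part of the skill.

```shell
# speak_or_text TEXT: try TTS; on any failure print an apology plus the text.
# TTS_RUNNER is a hypothetical override used for testing; it defaults to uv.
speak_or_text() {
  text=$1
  runner=${TTS_RUNNER:-uv}
  prefix="/tmp/audio_reply_$$"
  # Fail early with a text reply when the runner is not in PATH.
  if ! command -v "$runner" >/dev/null 2>&1; then
    printf 'Sorry, I could not generate audio (%s is not in PATH). %s\n' "$runner" "$text"
    return 0
  fi
  # A non-zero exit (for example, a failed model download) also falls back to text.
  if ! "$runner" run mlx_audio.tts.generate \
      --model mlx-community/chatterbox-turbo-fp16 \
      --text "$text" --play --file_prefix "$prefix"; then
    printf 'Sorry, audio generation failed. %s\n' "$text"
  fi
  # Clean up whatever was written, whether or not playback succeeded.
  rm -f "$prefix"*.wav
}
```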
Example Workflows
Example 1: Read URL
User: read it to me https://blog.example.com/new-feature

Assistant actions:
1. WebFetch the URL
2. Extract article content
3. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Here's what I found... [article summary]" \
     --play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."
Example 2: Talk to Me
User: talk to me about what you can help with

Assistant actions:
1. Generate conversational response text
2. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Hey! So I can help you with all kinds of things..." \
     --play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)
Notes
- First run may take longer as the model downloads (~500MB)
- Audio quality is best for English; other languages may vary
- For long content, consider chunking into multiple audio segments
- The --play flag uses system audio - ensure volume is up
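The chunking suggested in the notes can be sketched with awk. This is one possible approach, not the skill's own: the helper name `chunk_words` is hypothetical, it splits on word count only (so a chunk can end mid-sentence), and the chunk size of 150 in the usage comment is an assumption chosen to stay under the 200-word guideline.

```shell
# chunk_words MAX: read text on stdin, print it back as lines of at most
# MAX words each. Each output line is one chunk to send to TTS.
chunk_words() {
  awk -v max="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # Start a new chunk or extend the current one.
        line = (n == 0) ? $i : line " " $i
        if (++n == max) { print line; n = 0; line = "" }
      }
    }
    END { if (n > 0) print line }  # flush the final partial chunk
  '
}

# Usage (illustrative): speak each chunk in turn, cleaning up after each one.
#   printf '%s\n' "$long_text" | chunk_words 150 | while IFS= read -r chunk; do
#     uv run mlx_audio.tts.generate \
#       --model mlx-community/chatterbox-turbo-fp16 \
#       --text "$chunk" --play --file_prefix /tmp/audio_reply_chunk
#     rm -f /tmp/audio_reply_chunk*.wav
#   done
```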