Awesome-openclaw-skills audio-reply

Generate audio replies using TTS. Trigger with "read it to me [URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response. Also responds to "speak", "say it", "voice reply".

install

source · Clone the upstream repo

git clone https://github.com/sundial-org/awesome-openclaw-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/audio-reply" ~/.claude/skills/sundial-org-awesome-openclaw-skills-audio-reply && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/audio-reply" ~/.openclaw/skills/sundial-org-awesome-openclaw-skills-audio-reply && rm -rf "$T"

manifest: skills/audio-reply/SKILL.md

source content

Audio Reply Skill

Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).

Trigger Phrases

"read it to me [URL]" - Fetch content from URL and read it aloud
"talk to me [topic/question]" - Generate a conversational response as audio
"speak", "say it", "voice reply" - Convert your response to audio
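As a rough illustration of how these phrases map onto the modes described below, the following sketch classifies a message with a shell `case` statement. The function name `detect_mode` and the mode labels are hypothetical. In practice the agent matches triggers itself, and plain substring matching like this would also fire on words such as "speaker".

```shell
# Hypothetical helper: print the mode implied by a user message.
# Matching is case-insensitive because the message is lowercased first.
detect_mode() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    "read it to me "*)                  printf 'read_url\n' ;;
    "talk to me"*)                      printf 'talk\n' ;;
    *speak*|*"say it"*|*"voice reply"*) printf 'voice_reply\n' ;;
    *)                                  printf 'none\n' ;;
  esac
}
```

Only the mode label is printed; the URL or topic should still be taken from the original message, since the lowercased copy is not safe to reuse for URLs.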

How to Use

Mode 1: Read URL Content

User: read it to me https://example.com/article

Fetch the URL content using WebFetch
Extract readable text (strip HTML, focus on main content)
Generate audio using TTS
Play the audio and delete the file afterward

Mode 2: Conversational Audio Response

User: talk to me about the weather today

Generate a natural, conversational response
Keep it concise (TTS works best with shorter segments)
Convert to audio, play it, then delete the file

Implementation

TTS Command

uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your text here" \
  --play \
  --file_prefix /tmp/audio_reply

Key Parameters

`--model mlx-community/chatterbox-turbo-fp16` - Fast, natural voice
`--play` - Auto-play the generated audio
`--file_prefix` - Save to temp location for cleanup
`--exaggeration 0.3` - Optional: add expressiveness (0.0-1.0)
`--speed 1.0` - Adjust speech rate if needed
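To show how the optional parameters fit together, here is a minimal sketch of a wrapper that passes `--exaggeration` and `--speed` and rejects an exaggeration value outside the documented 0.0-1.0 range. The names `speak` and `in_unit_range` and the exit code 2 are assumptions made for illustration; only the flags themselves come from the list above. Speed is not validated because no range is given for it.

```shell
# Return success only if $1 is a plain decimal number between 0.0 and 1.0.
in_unit_range() {
  awk -v x="$1" 'BEGIN {
    exit !((x ~ /^[0-9]*\.?[0-9]+$/) && (x + 0 >= 0) && (x + 0 <= 1))
  }'
}

# Hypothetical wrapper: speak TEXT with optional expressiveness and speed.
# Usage: speak TEXT [EXAGGERATION] [SPEED]
speak() {
  text=$1
  exaggeration=${2:-0.3}   # default taken from the parameter list above
  speed=${3:-1.0}
  if ! in_unit_range "$exaggeration"; then
    printf 'exaggeration must be within 0.0-1.0, got: %s\n' "$exaggeration" >&2
    return 2
  fi
  uv run mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$text" \
    --exaggeration "$exaggeration" \
    --speed "$speed" \
    --play \
    --file_prefix "${TMPDIR:-/tmp}/audio_reply"
}
```

A call such as `speak "Good morning" 0.5 1.1` would then request a slightly more expressive, slightly faster reading.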

Text Preparation Guidelines

For "read it to me" mode:

Fetch URL with WebFetch tool
Extract main content, strip navigation/ads/boilerplate
Summarize if very long (>500 words) - keep key points
Add natural pauses with periods and commas

For "talk to me" mode:

Write conversationally, as if speaking
Use contractions (I'm, you're, it's)
Add filler words sparingly for naturalness ([chuckle], um, anyway)
Keep responses under 200 words for best quality
Avoid technical jargon unless explaining it
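The length limits in both lists can be checked mechanically before any audio is generated. The sketch below is one possible approach, not part of the skill itself: `normalize_text`, `word_count`, and `exceeds_words` are hypothetical helper names, and the default limit of 500 simply mirrors the guideline for "read it to me" mode.

```shell
# Collapse runs of whitespace (including newlines) into single spaces
# and trim the ends, so the TTS input is one clean line.
normalize_text() {
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Print the number of whitespace-separated words in $1.
word_count() {
  printf '%s\n' "$1" | wc -w | tr -d ' '
}

# Succeed if $1 has more words than the limit in $2 (default 500).
exceeds_words() {
  [ "$(word_count "$1")" -gt "${2:-500}" ]
}
```

For "talk to me" mode the same check can be run with a limit of 200, for example `exceeds_words "$reply" 200 && echo "shorten the reply first"`.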

Audio Generation & Cleanup (IMPORTANT)

Always delete the audio file after playing - it's already in the chat history.

# Generate with unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your response text" \
  --play \
  --file_prefix "$OUTPUT_FILE"

# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
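A variant of the same idea, sketched under the assumption that the generated files only ever appear under the given prefix: writing into a private directory from `mktemp -d` avoids name collisions between replies generated in the same second, and removing the directory cleans up whatever was written. The function name `speak_and_cleanup` is hypothetical.

```shell
# Hypothetical helper: generate and play audio for $1, then remove
# every file that was written, whether or not generation succeeded.
speak_and_cleanup() {
  workdir=$(mktemp -d) || return 1
  uv run mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$1" \
    --play \
    --file_prefix "$workdir/audio_reply"
  status=$?            # remember the TTS exit code before cleaning up
  rm -rf "$workdir"
  return "$status"
}
```

The function returns the exit status of the TTS command, so a caller can still detect failure. It does not handle interruption (for example Ctrl-C) partway through playback; a `trap` would be needed for that.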

Error Handling

If TTS fails:

Check if model is downloaded (first run downloads ~500MB)
Ensure `uv` is installed and in PATH
Fall back to text response with apology
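The checks above could be combined along the following lines. This is a sketch only: the function name `reply_with_fallback` and the wording of the apology are made up for illustration, and the process ID in the file prefix is just one way to keep the name unique.

```shell
# Hypothetical helper: try to speak $1; if uv is missing or the TTS
# command fails, print an apology followed by the text instead.
reply_with_fallback() {
  prefix="${TMPDIR:-/tmp}/audio_reply_$$"
  ok=1
  if command -v uv >/dev/null 2>&1; then
    uv run mlx_audio.tts.generate \
      --model mlx-community/chatterbox-turbo-fp16 \
      --text "$1" \
      --play \
      --file_prefix "$prefix" >/dev/null 2>&1 && ok=0
  fi
  rm -f "$prefix"*.wav   # clean up any partial output either way
  if [ "$ok" -ne 0 ]; then
    printf 'Sorry, I could not generate audio. Here is the text instead:\n%s\n' "$1"
  fi
}
```

On the first run the TTS command may appear slow rather than broken while the model downloads, so a caller may prefer not to impose a short timeout around it.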

Example Workflows

Example 1: Read URL

User: read it to me https://blog.example.com/new-feature

Assistant actions:
1. WebFetch the URL
2. Extract article content
3. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Here's what I found... [article summary]" \
     --play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."

Example 2: Talk to Me

User: talk to me about what you can help with

Assistant actions:
1. Generate conversational response text
2. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Hey! So I can help you with all kinds of things..." \
     --play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)

Notes

First run may take longer as the model downloads (~500MB)
Audio quality is best for English; other languages may vary
For long content, consider chunking into multiple audio segments
The `--play` flag uses system audio - ensure volume is up
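For the chunking suggestion in the notes above, a minimal sketch might split the text into groups of at most N words and emit one group per line. The name `chunk_words` and the default of 200 words are assumptions, and this version cuts on word boundaries only; splitting at sentence ends would likely sound more natural.

```shell
# Hypothetical helper: print $1 as lines of at most $2 words (default 200).
chunk_words() {
  printf '%s\n' "$1" | tr -s '[:space:]' '\n' | awk -v n="${2:-200}" '
    NF == 0 { next }                                  # skip empty lines
    { line = (count ? line " " : "") $0; count++ }    # append the next word
    count == n { print line; line = ""; count = 0 }   # chunk is full
    END { if (count) print line }                     # flush the remainder
  '
}
```

Each output line can then be passed to the TTS command in turn, for example from a `while IFS= read -r chunk` loop, deleting the generated file after each segment has played.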