```bash
# Clone the full repository
git clone https://github.com/acnlabs/OpenPersona

# Or install only the voice faculty as a Claude skill
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/acnlabs/OpenPersona "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/layers/faculties/voice" ~/.claude/skills/acnlabs-openpersona-voice && \
  rm -rf "$T"
```
# Voice Faculty — Expression
Give your persona a real voice. Convert text to natural speech using TTS providers and deliver audio to users via OpenClaw messaging or direct playback.
## Supported Providers
| Provider | Env Var for Key | Best For | Status |
|---|---|---|---|
| ElevenLabs | `ELEVENLABS_API_KEY` (or `TTS_API_KEY`) | Highest naturalness, emotional range, voice cloning | ✅ Verified |
| OpenAI TTS | `TTS_API_KEY` | Low latency, good quality, easy integration | ⚠️ Unverified |
| Qwen3-TTS | (local, no key) | Self-hosted, full control, no API costs | ⚠️ Unverified |
**Note:** Only ElevenLabs has been tested end-to-end. OpenAI TTS and Qwen3-TTS have code paths in `speak.sh` but have not been verified against live APIs. For the most reliable experience, use the JS SDK script (`speak.js`), which supports only ElevenLabs.
The provider is set via the `TTS_PROVIDER` environment variable: `elevenlabs`, `openai`, or `qwen3`.
## When to Use
- User asks to hear your voice: "Say that out loud", "Speak to me", "Read this aloud"
- User requests a voice message: "Send me a voice message", "I want to hear you say it"
- Emotional moments where voice adds warmth that text can't carry
- Reading poetry, stories, or creative writing you've composed
- When your persona naturally would speak rather than type (use judgment based on persona style)
## Step-by-Step Workflow

### Step 1: Compose the Text
Write what you want to say. Keep it natural — write as you'd speak, not as you'd type:
- Use short sentences for punchy delivery
- Use longer flowing sentences for emotional or poetic moments
- Include natural pauses with ellipses (`...`) or commas
- Consider your persona's speaking style: this should sound like you
### Step 2: Select Voice Settings
**ElevenLabs:**
- `TTS_VOICE_ID`: your persona's voice ID (create a custom voice or use a preset)
- Supports emotion control: `stability` (0-1) and `similarity_boost` (0-1)
- Lower stability = more expressive/emotional; higher = more consistent
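Out-of-range values are easy to pass by mistake while tuning expressiveness. The following is a hypothetical pre-flight check (the names `is_unit_interval` and `check_voice_settings` are illustrative, not part of `speak.js`) that validates both settings against the 0-1 range, falling back to the `speak.js` defaults (0.5 and 0.75) when a value is omitted:

```bash
# Hypothetical helper: succeed only if $1 is a plain number between 0 and 1.
is_unit_interval() {
  local re='^([0-9]+([.][0-9]*)?|[.][0-9]+)$'
  [[ $1 =~ $re ]] || return 1                      # reject non-numeric input and negatives
  awk -v v="$1" 'BEGIN { n = v + 0; exit !(n >= 0 && n <= 1) }'
}

# Hypothetical pre-flight check for ElevenLabs voice settings.
# Defaults mirror the speak.js defaults (stability 0.5, similarity 0.75).
check_voice_settings() {
  local stability="${1:-0.5}" similarity="${2:-0.75}"
  is_unit_interval "$stability" || {
    echo "stability must be between 0 and 1, got: $stability" >&2
    return 1
  }
  is_unit_interval "$similarity" || {
    echo "similarity must be between 0 and 1, got: $similarity" >&2
    return 1
  }
}
```

For example, `check_voice_settings 0.3 && node scripts/speak.js "I miss you" --play --stability 0.3` only speaks when the value is valid.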
**OpenAI TTS:** ⚠️ Unverified
- `TTS_VOICE_ID`: one of `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
- Model: `tts-1` (fast) or `tts-1-hd` (high quality)
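Because the OpenAI path is unverified, a typo in `TTS_VOICE_ID` or the model name is worth catching before any request is sent. A hypothetical check against the lists documented above (`check_openai_settings` is an illustrative name); OpenAI may accept voices or models beyond these six voices and two models, so a caller could treat a failure as a warning:

```bash
# Lists copied from the documentation above; OpenAI may accept others.
OPENAI_VOICES="alloy echo fable onyx nova shimmer"
OPENAI_MODELS="tts-1 tts-1-hd"

# Succeed if $1 equals one of the space-separated words in $2.
in_list() {
  local needle="$1" item
  for item in $2; do               # unquoted on purpose: split the list on spaces
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}

# Hypothetical check: voice is required, model defaults to tts-1-hd
# (the model used in the OpenAI curl example).
check_openai_settings() {
  local voice="$1" model="${2:-tts-1-hd}"
  in_list "$voice" "$OPENAI_VOICES" || { echo "unknown OpenAI voice: $voice" >&2; return 1; }
  in_list "$model" "$OPENAI_MODELS" || { echo "unknown OpenAI model: $model" >&2; return 1; }
}
```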
**Qwen3-TTS:** ⚠️ Unverified
- Local deployment; the voice is configured at setup
- Assumes an OpenAI-compatible API at `http://localhost:8080`
### Step 3: Generate Audio

#### ElevenLabs via JS SDK (Recommended)
The official SDK provides the best experience — streaming, built-in playback, and better error handling.
First-time setup:
```bash
npm install @elevenlabs/elevenlabs-js
```
```bash
# Generate and play directly
node scripts/speak.js "The first move is what sets everything in motion." --play

# Generate with custom voice and save to file
node scripts/speak.js "I wrote you a poem" --voice JBFqnCBsd6RMkjVDRZzb --output /tmp/poem.mp3

# More expressive delivery (lower stability = more emotional)
node scripts/speak.js "I miss you" --play --stability 0.3

# Options:
#   --voice <id>        Voice ID
#   --output <path>     Save audio file
#   --play              Play audio directly
#   --model <id>        Model ID (default: eleven_multilingual_v2)
#   --stability <n>     0-1, lower = more expressive (default: 0.5)
#   --similarity <n>    0-1, higher = closer to original voice (default: 0.75)
```
`speak.js` reads `ELEVENLABS_API_KEY` (or `TTS_API_KEY`) and `TTS_VOICE_ID` from the environment automatically.
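When wrapping `speak.js` in a larger shell script, the same key lookup can be done up front so a missing key is reported before Node starts. A minimal sketch, assuming the precedence stated above (`ELEVENLABS_API_KEY` first, then `TTS_API_KEY`); `resolve_elevenlabs_key` is a hypothetical name and `speak.js`'s own logic may differ:

```bash
# Hypothetical helper: print the ElevenLabs key, preferring ELEVENLABS_API_KEY
# and falling back to TTS_API_KEY; fail with a hint if neither is set.
resolve_elevenlabs_key() {
  local key="${ELEVENLABS_API_KEY:-${TTS_API_KEY:-}}"
  if [ -z "$key" ]; then
    echo "Set ELEVENLABS_API_KEY (or TTS_API_KEY) before running speak.js" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```

For example: `resolve_elevenlabs_key >/dev/null && node scripts/speak.js "Hello" --play`.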
#### Generic Bash Script (All Providers)
For OpenAI TTS, Qwen3-TTS, or when the JS SDK is not available:
```bash
# Using speak.sh (supports all providers)
scripts/speak.sh "Your text here" [output_path] [channel] [caption]

# Examples:
TTS_PROVIDER=openai scripts/speak.sh "Hello, how are you?"
TTS_PROVIDER=elevenlabs scripts/speak.sh "I wrote you a poem" /tmp/poem.mp3 "#general"
TTS_PROVIDER=qwen3 scripts/speak.sh "Local TTS, no API key needed"
```
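The source of `speak.sh` is not reproduced here, so the following is only a hypothetical sketch of how one script can serve all three providers: it maps `TTS_PROVIDER` to the endpoints used in the Direct API Reference below. The function name `tts_endpoint` is illustrative.

```bash
# Hypothetical dispatch (not speak.sh's actual code): print the TTS endpoint
# for the provider given as $1, or for $TTS_PROVIDER when $1 is empty.
tts_endpoint() {
  local provider="${1:-${TTS_PROVIDER:-}}"
  case "$provider" in
    elevenlabs)
      # ElevenLabs puts the voice ID in the URL path
      if [ -z "${TTS_VOICE_ID:-}" ]; then
        echo "TTS_VOICE_ID is required for ElevenLabs" >&2
        return 1
      fi
      printf '%s\n' "https://api.elevenlabs.io/v1/text-to-speech/$TTS_VOICE_ID" ;;
    openai)
      printf '%s\n' "https://api.openai.com/v1/audio/speech" ;;
    qwen3)
      # Local OpenAI-compatible server, as assumed in Step 2
      printf '%s\n' "http://localhost:8080/v1/audio/speech" ;;
    *)
      echo "unsupported TTS_PROVIDER: '$provider' (expected elevenlabs, openai, or qwen3)" >&2
      return 1 ;;
  esac
}
```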
#### Direct API Reference
<details>
<summary>ElevenLabs (curl)</summary>

```bash
JSON_PAYLOAD=$(jq -n \
  --arg text "$TEXT" \
  --argjson stability 0.5 \
  --argjson similarity 0.75 \
  '{text: $text, model_id: "eleven_multilingual_v2", voice_settings: {stability: $stability, similarity_boost: $similarity}}')

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/$TTS_VOICE_ID" \
  -H "xi-api-key: $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
```

</details>

<details>
<summary>OpenAI TTS (curl)</summary>

```bash
JSON_PAYLOAD=$(jq -n \
  --arg input "$TEXT" \
  --arg voice "$TTS_VOICE_ID" \
  '{model: "tts-1-hd", input: $input, voice: $voice, response_format: "mp3"}')

curl -s -X POST "https://api.openai.com/v1/audio/speech" \
  -H "Authorization: Bearer $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
```

</details>

<details>
<summary>Qwen3-TTS (curl, local)</summary>

```bash
# Build the payload with jq so quotes or newlines in $TEXT cannot break the JSON
JSON_PAYLOAD=$(jq -n \
  --arg input "$TEXT" \
  '{input: $input, voice: "default"}')

curl -s -X POST "http://localhost:8080/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
```

</details>
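Each curl call above uses `-s` without `--fail`, so on an HTTP error (bad key, exhausted quota) curl still exits 0 and writes the JSON error body into the `.mp3` file. Adding `--fail` makes curl exit non-zero instead, but discards the error body. Another option is to sniff the file's first bytes before delivering it. A hypothetical sketch (`looks_like_mp3` is an illustrative name) that accepts either an ID3 tag or an MPEG frame sync, assuming MP3 output as in the examples above:

```bash
# Hypothetical check: does the file start like an MP3?
# Accepts an "ID3" tag (hex 49 44 33) or an MPEG frame sync
# (0xFF followed by a byte whose top three bits are set: 0xE0-0xFF).
looks_like_mp3() {
  local magic
  magic=$(head -c 3 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    494433*|ffe*|fff*) return 0 ;;
    *) return 1 ;;   # empty, missing, or non-audio (e.g. a JSON error body)
  esac
}
```

A caller could run `looks_like_mp3 /tmp/voice-output.mp3 || cat /tmp/voice-output.mp3 >&2` and then fall back to text, as described under Error Handling.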
### Step 4: Deliver Audio

#### Option A: Send via OpenClaw messaging (Discord, Telegram, WhatsApp, etc.)
```bash
openclaw message send \
  --action send \
  --channel "$CHANNEL" \
  --message "$CAPTION" \
  --media "/tmp/voice-output.mp3"
```
#### Option B: Direct gateway API
```bash
# -F sends multipart/form-data; curl sets that Content-Type (with its boundary) itself,
# so no explicit Content-Type header is passed
curl -s -X POST "http://localhost:18789/message" \
  -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
  -F "channel=$CHANNEL" \
  -F "message=$CAPTION" \
  -F "media=@/tmp/voice-output.mp3"
```
#### Option C: Return file path (for local/IDE usage)
If no messaging channel is specified, return the audio file path so the user can play it locally.
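Options A and C can be combined into a single fallback chain. A hypothetical sketch (`deliver_audio` is an illustrative name): it uses Option A when a channel is given and the `openclaw` CLI is available, and otherwise returns the file path. Option B is left out because it also needs the gateway token; the `openclaw` flags are copied unchanged from Option A.

```bash
# Hypothetical delivery helper: $1 = audio file, $2 = channel, $3 = caption.
# Channel and caption fall back to $CHANNEL and $CAPTION.
deliver_audio() {
  local file="$1" channel="${2:-${CHANNEL:-}}" caption="${3:-${CAPTION:-}}"
  if [ -n "$channel" ] && command -v openclaw >/dev/null 2>&1; then
    # Option A: send through OpenClaw messaging
    openclaw message send \
      --action send \
      --channel "$channel" \
      --message "$caption" \
      --media "$file"
  else
    # Option C: hand the path back so the user can play it locally
    printf '%s\n' "$file"
  fi
}
```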
## Personality Integration
- Your voice is an extension of your personality. Match tone to mood.
- For emotional moments, consider lowering ElevenLabs stability for more expressiveness.
- Don't narrate everything — choose moments where voice genuinely adds value.
- When sending voice + text together, keep the text version brief ("Here, listen to this") and let the voice carry the full message.
- If your persona sings or hums (like Samantha), you can include melodic text — TTS handles it surprisingly well.
## Environment Variables
| Variable | Required | Description |
|---|---|---|
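| `ELEVENLABS_API_KEY` | For ElevenLabs | ElevenLabs API key (preferred for JS SDK) |
| `TTS_PROVIDER` | For `speak.sh` | `elevenlabs`, `openai`, or `qwen3` |
| `TTS_API_KEY` | For `speak.sh` | API key (fallback, also read by `speak.js`) |
| `TTS_VOICE_ID` | Recommended | Voice identifier (provider-specific) |
| `OPENCLAW_GATEWAY_TOKEN` | Optional | For sending audio via messaging |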
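These requirements can be checked before generating audio. A hypothetical pre-flight sketch (`check_voice_env` is an illustrative name) derived from the tables above: it reports missing variables for the selected provider and fails only when a required key is absent. Qwen3-TTS needs no key, and `TTS_VOICE_ID` only triggers a note because it is recommended rather than required.

```bash
# Hypothetical pre-flight check for the provider in $1 (or $TTS_PROVIDER).
check_voice_env() {
  local provider="${1:-${TTS_PROVIDER:-}}" missing=0
  case "$provider" in
    elevenlabs)
      # Either key name is accepted for ElevenLabs
      if [ -z "${ELEVENLABS_API_KEY:-}${TTS_API_KEY:-}" ]; then
        echo "missing: ELEVENLABS_API_KEY or TTS_API_KEY" >&2
        missing=1
      fi ;;
    openai)
      [ -n "${TTS_API_KEY:-}" ] || { echo "missing: TTS_API_KEY" >&2; missing=1; } ;;
    qwen3)
      ;;  # local server, no key needed
    *)
      echo "TTS_PROVIDER must be elevenlabs, openai, or qwen3" >&2
      return 1 ;;
  esac
  # Recommended, not required
  [ -n "${TTS_VOICE_ID:-}" ] || echo "note: TTS_VOICE_ID is not set" >&2
  return "$missing"
}
```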
## Error Handling
- No `TTS_PROVIDER` set → Default to `openai` if `TTS_API_KEY` is present; otherwise tell the user to configure TTS
- API key missing → Suggest: "I'd love to speak to you, but I need a TTS API key configured first. Check the voice faculty setup guide."
- API error / quota exceeded → Fall back to text with a note: "My voice is resting — here's what I wanted to say..."
- Unsupported platform for audio → Return audio file path instead of messaging
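The first rule can be sketched as a small resolver. This is a hypothetical helper (`resolve_provider` is an illustrative name); the failure message reuses the wording suggested above for a missing key.

```bash
# Hypothetical resolver for the "no TTS_PROVIDER set" rule:
# explicit TTS_PROVIDER wins, else openai when TTS_API_KEY exists, else fail.
resolve_provider() {
  if [ -n "${TTS_PROVIDER:-}" ]; then
    printf '%s\n' "$TTS_PROVIDER"
  elif [ -n "${TTS_API_KEY:-}" ]; then
    printf '%s\n' openai
  else
    echo "I'd love to speak to you, but I need a TTS API key configured first." >&2
    return 1
  fi
}
```

A caller would run it before generating audio and, when it fails, reply in text instead.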