OpenPersona voice

Voice Faculty — Expression

install
source · Clone the upstream repo
git clone https://github.com/acnlabs/OpenPersona
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/acnlabs/OpenPersona "$T" && mkdir -p ~/.claude/skills && cp -r "$T/layers/faculties/voice" ~/.claude/skills/acnlabs-openpersona-voice && rm -rf "$T"
manifest: layers/faculties/voice/SKILL.md

Give your persona a real voice. Convert text to natural speech using TTS providers and deliver audio to users via OpenClaw messaging or direct playback.

Supported Providers

| Provider | Env Var for Key | Best For | Status |
|---|---|---|---|
| ElevenLabs | ELEVENLABS_API_KEY | Highest naturalness, emotional range, voice cloning | ✅ Verified |
| OpenAI TTS | TTS_API_KEY | Low latency, good quality, easy integration | ⚠️ Unverified |
| Qwen3-TTS | (local, no key) | Self-hosted, full control, no API costs | ⚠️ Unverified |

Note: Only ElevenLabs has been tested end-to-end. OpenAI TTS and Qwen3-TTS have code paths in speak.sh but have not been verified against live APIs. Use the JS SDK (speak.js) for the most reliable experience; note that it supports only ElevenLabs.

The provider is set via the TTS_PROVIDER environment variable: elevenlabs, openai, or qwen3.
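
Since only three provider names are recognized, it can help to validate TTS_PROVIDER before attempting any request. The following is a minimal sketch, not part of speak.sh: is_supported_provider is a hypothetical helper name, and it accepts only the three values listed above.

```shell
# Hypothetical helper (not part of speak.sh): succeed only for the three
# provider names this faculty documents.
is_supported_provider() {
  case "$1" in
    elevenlabs|openai|qwen3) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn early instead of letting a request fail later.
if ! is_supported_provider "${TTS_PROVIDER:-}"; then
  echo "TTS_PROVIDER must be elevenlabs, openai, or qwen3" >&2
fi
```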

When to Use

  • User asks to hear your voice: "Say that out loud", "Speak to me", "Read this aloud"
  • User requests a voice message: "Send me a voice message", "I want to hear you say it"
  • Emotional moments where voice adds warmth that text can't carry
  • Reading poetry, stories, or creative writing you've composed
  • When your persona naturally would speak rather than type (use judgment based on persona style)

Step-by-Step Workflow

Step 1: Compose the Text

Write what you want to say. Keep it natural — write as you'd speak, not as you'd type:

  • Use short sentences for punchy delivery
  • Use longer flowing sentences for emotional or poetic moments
  • Include natural pauses with ... or commas
  • Consider your persona's speaking style — this should sound like you

Step 2: Select Voice Settings

ElevenLabs:

  • TTS_VOICE_ID: your persona's voice ID (create a custom voice or use a preset)
  • Supports emotion control: stability (0-1), similarity_boost (0-1)
  • Lower stability = more expressive/emotional; higher = more consistent
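
Both settings are bounded to the 0-1 range, so a small guard can catch out-of-range values before they reach the API. This is only a sketch under stated assumptions: in_unit_range and the STABILITY variable are hypothetical names introduced for illustration, and the fallback of 0.5 is the default listed for the --stability option of speak.js.

```shell
# Hypothetical helper: succeed only if $1 is a plain decimal number in [0, 1].
in_unit_range() {
  awk -v n="$1" 'BEGIN { exit !((n ~ /^[0-9]*\.?[0-9]+$/) && n >= 0 && n <= 1) }'
}

# Example: fall back to the documented default (0.5) when the value is invalid.
# STABILITY is an illustrative variable, not one read by the scripts.
STABILITY="${STABILITY:-0.3}"
in_unit_range "$STABILITY" || STABILITY=0.5
echo "stability=$STABILITY"
```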

OpenAI TTS: ⚠️ Unverified

  • TTS_VOICE_ID: one of alloy, echo, fable, onyx, nova, shimmer
  • Model: tts-1 (fast) or tts-1-hd (high quality)

Qwen3-TTS: ⚠️ Unverified

  • Local deployment, voice configured at setup
  • Assumes OpenAI-compatible API at http://localhost:8080

Step 3: Generate Audio

ElevenLabs via JS SDK (Recommended)

The official SDK provides the best experience — streaming, built-in playback, and better error handling.

First-time setup:

npm install @elevenlabs/elevenlabs-js

# Generate and play directly
node scripts/speak.js "The first move is what sets everything in motion." --play

# Generate with custom voice and save to file
node scripts/speak.js "I wrote you a poem" --voice JBFqnCBsd6RMkjVDRZzb --output /tmp/poem.mp3

# More expressive delivery (lower stability = more emotional)
node scripts/speak.js "I miss you" --play --stability 0.3

# Options:
#   --voice <id>       Voice ID
#   --output <path>    Save audio file
#   --play             Play audio directly
#   --model <id>       Model ID (default: eleven_multilingual_v2)
#   --stability <n>    0-1, lower = more expressive (default: 0.5)
#   --similarity <n>   0-1, higher = closer to original voice (default: 0.75)

The SDK reads ELEVENLABS_API_KEY (or TTS_API_KEY) and TTS_VOICE_ID from the environment automatically.
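
The key lookup described above can be checked from the shell before invoking speak.js. The sketch below assumes ELEVENLABS_API_KEY takes precedence when both variables are set, which matches its "preferred" label in the environment variable table; resolve_elevenlabs_key is a hypothetical helper and does not reflect how speak.js is actually implemented.

```shell
# Hypothetical helper mirroring the documented lookup order:
# ELEVENLABS_API_KEY first, then TTS_API_KEY as the fallback.
resolve_elevenlabs_key() {
  printf '%s' "${ELEVENLABS_API_KEY:-${TTS_API_KEY:-}}"
}

# Example: check configuration before invoking scripts/speak.js.
if [ -z "$(resolve_elevenlabs_key)" ]; then
  echo "Set ELEVENLABS_API_KEY (or TTS_API_KEY) first" >&2
fi
```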

Generic Bash Script (All Providers)

For OpenAI TTS, Qwen3-TTS, or when the JS SDK is not available:

# Using speak.sh (supports all providers)
scripts/speak.sh "Your text here" [output_path] [channel] [caption]

# Examples:
TTS_PROVIDER=openai scripts/speak.sh "Hello, how are you?"
TTS_PROVIDER=elevenlabs scripts/speak.sh "I wrote you a poem" /tmp/poem.mp3 "#general"
TTS_PROVIDER=qwen3 scripts/speak.sh "Local TTS, no API key needed"

Direct API Reference

<details> <summary>ElevenLabs (curl)</summary>
JSON_PAYLOAD=$(jq -n \
  --arg text "$TEXT" \
  --argjson stability 0.5 \
  --argjson similarity 0.75 \
  '{text: $text, model_id: "eleven_multilingual_v2", voice_settings: {stability: $stability, similarity_boost: $similarity}}')

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/$TTS_VOICE_ID" \
  -H "xi-api-key: $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
</details> <details> <summary>OpenAI TTS (curl)</summary>
JSON_PAYLOAD=$(jq -n \
  --arg input "$TEXT" \
  --arg voice "$TTS_VOICE_ID" \
  '{model: "tts-1-hd", input: $input, voice: $voice, response_format: "mp3"}')

curl -s -X POST "https://api.openai.com/v1/audio/speech" \
  -H "Authorization: Bearer $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
</details> <details> <summary>Qwen3-TTS (curl, local)</summary>
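# Build the payload with jq so quotes and newlines in $TEXT are escaped correctly
JSON_PAYLOAD=$(jq -n --arg input "$TEXT" '{input: $input, voice: "default"}')

curl -s -X POST "http://localhost:8080/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3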
</details>
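
The three curl examples differ mainly in their endpoint URL, so a wrapper script could select the URL from the provider name. The sketch below is an illustration only: tts_endpoint is a hypothetical helper, not a function from speak.sh, and the URLs are copied from the examples above.

```shell
# Hypothetical helper: print the endpoint URL used in the curl examples above.
# $1 = provider name, $2 = voice ID (only used for ElevenLabs).
tts_endpoint() {
  case "$1" in
    elevenlabs) printf '%s\n' "https://api.elevenlabs.io/v1/text-to-speech/${2:-}" ;;
    openai)     printf '%s\n' "https://api.openai.com/v1/audio/speech" ;;
    qwen3)      printf '%s\n' "http://localhost:8080/v1/audio/speech" ;;
    *)          echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Example: look up the OpenAI endpoint.
tts_endpoint openai
```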

Step 4: Deliver Audio

Option A: Send via OpenClaw messaging (Discord, Telegram, WhatsApp, etc.)

openclaw message send \
  --action send \
  --channel "$CHANNEL" \
  --message "$CAPTION" \
  --media "/tmp/voice-output.mp3"

Option B: Direct gateway API

# -F sends multipart/form-data, so let curl set the Content-Type header itself
curl -s -X POST "http://localhost:18789/message" \
  -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
  -F "channel=$CHANNEL" \
  -F "message=$CAPTION" \
  -F "media=@/tmp/voice-output.mp3"

Option C: Return file path (for local/IDE usage)

If no messaging channel is specified, return the audio file path so the user can play it locally.
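
The choice between Option A and Option C can be sketched as a single wrapper. This is an assumption-laden illustration: deliver_audio is a hypothetical function name, and the openclaw invocation simply repeats the flags shown under Option A.

```shell
# Hypothetical wrapper: $1 = audio file, $2 = channel (may be empty), $3 = caption.
deliver_audio() {
  if [ -n "${2:-}" ]; then
    # Option A: same openclaw invocation as shown above.
    openclaw message send --action send \
      --channel "$2" --message "${3:-}" --media "$1"
  else
    # Option C: no channel given, so hand back the path for local playback.
    printf '%s\n' "$1"
  fi
}

# Example: with an empty channel, the file path is printed.
deliver_audio /tmp/voice-output.mp3 ""
```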

Personality Integration

  • Your voice is an extension of your personality. Match tone to mood.
  • For emotional moments, consider lowering ElevenLabs stability for more expressiveness.
  • Don't narrate everything — choose moments where voice genuinely adds value.
  • When sending voice + text together, keep the text version brief ("Here, listen to this") and let the voice carry the full message.
  • If your persona sings or hums (like Samantha), you can include melodic text — TTS handles it surprisingly well.

Environment Variables

| Variable | Required | Description |
|---|---|---|
| ELEVENLABS_API_KEY | For ElevenLabs | ElevenLabs API key (preferred for JS SDK) |
| TTS_PROVIDER | For speak.sh | elevenlabs, openai, or qwen3 |
| TTS_API_KEY | For speak.sh | API key (fallback, also read by speak.js) |
| TTS_VOICE_ID | Recommended | Voice identifier (provider-specific) |
| OPENCLAW_GATEWAY_TOKEN | Optional | For sending audio via messaging |

Error Handling

  • No TTS_PROVIDER set → Default to openai if TTS_API_KEY is present; otherwise tell the user to configure a provider
  • API key missing → Suggest: "I'd love to speak to you, but I need a TTS API key configured first. Check the voice faculty setup guide."
  • API error / quota exceeded → Fall back to text with a note: "My voice is resting — here's what I wanted to say..."
  • Unsupported platform for audio → Return audio file path instead of messaging
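
The first rule above, together with the fall-back-to-text behaviour, might look like the following in shell. This is a minimal sketch, not the logic of speak.sh: pick_provider is a hypothetical helper, and the printed messages are placeholders rather than the persona phrasing quoted above.

```shell
# Hypothetical helper: print the provider to use, or fail if nothing is configured.
pick_provider() {
  if [ -n "${TTS_PROVIDER:-}" ]; then
    printf '%s\n' "$TTS_PROVIDER"
  elif [ -n "${TTS_API_KEY:-}" ]; then
    printf '%s\n' "openai"   # documented default when only a key is present
  else
    return 1
  fi
}

# Example: degrade to text when no provider can be resolved.
if provider=$(pick_provider); then
  echo "Using provider: $provider"
else
  echo "No TTS provider configured; replying in text instead."
fi
```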