ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abhishek-official1/clawvox" ~/.claude/skills/openclaw-skills-clawvox && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/abhishek-official1/clawvox" ~/.openclaw/skills/openclaw-skills-clawvox && rm -rf "$T"
skills/abhishek-official1/clawvox/SKILL.md
ClawVox
Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.
Quick Reference
| Action | Command | Description |
|---|---|---|
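| Speak | {baseDir}/scripts/speak.sh | Convert text to speech |
| Transcribe | {baseDir}/scripts/transcribe.sh | Speech to text |
| Clone | {baseDir}/scripts/clone.sh | Clone a voice |
| SFX | {baseDir}/scripts/sfx.sh | Generate sound effects |
| Voices | {baseDir}/scripts/voices.sh list | List available voices |
| Dub | {baseDir}/scripts/dub.sh | Translate audio |
| Isolate | {baseDir}/scripts/isolate.sh | Remove background noise |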
Setup
- Get your API key from elevenlabs.io/app/settings/api-keys
- Configure in ~/.openclaw/openclaw.json:

    {
      skills: {
        entries: {
          "clawvox": {
            apiKey: "YOUR_ELEVENLABS_API_KEY",
            config: {
              defaultVoice: "Rachel",
              defaultModel: "eleven_turbo_v2_5",
              outputDir: "~/.openclaw/audio"
            }
          }
        }
      }
    }

Or set the environment variable:
export ELEVENLABS_API_KEY="your_api_key_here"
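Before running any script, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch and not part of ClawVox: `check_api_key` is a hypothetical helper, and the 20-character minimum is the figure quoted under Troubleshooting.

```bash
# Hypothetical helper (not shipped with ClawVox): sanity-check the API key.
# Uses the first argument if given, otherwise $ELEVENLABS_API_KEY.
check_api_key() {
  key="${1:-${ELEVENLABS_API_KEY:-}}"
  if [ -z "$key" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  # Troubleshooting states that a usable key is at least 20 characters long.
  if [ "${#key}" -lt 20 ]; then
    echo "API key looks too short (${#key} characters)" >&2
    return 1
  fi
  return 0
}

# Usage (commented out so the block has no side effects):
# check_api_key && {baseDir}/scripts/speak.sh 'Key looks fine'
```

This only checks presence and length; it does not contact ElevenLabs, so a revoked key would still pass.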
Voice Generation (TTS)
Basic Text-to-Speech
    # Quick speak with default voice (Rachel)
    {baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'

    # Specify voice by name
    {baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'

    # Save to file
    {baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'

    # Use specific model
    {baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'

    # Adjust voice settings
    {baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'

    # Adjust speed
    {baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'

    # Use multilingual model for other languages
    {baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
    {baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'
Voice Models
| Model | Latency | Languages | Best For |
|---|---|---|---|
| eleven_flash_v2_5 | ~75ms | 32 | Real-time, streaming |
| eleven_turbo_v2_5 | ~250ms | 32 | Balanced quality/speed |
| eleven_multilingual_v2 | ~500ms | 29 | Long-form, highest quality |
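One way to read the table is as a latency budget: pick the highest-quality model whose approximate latency still fits. The sketch below assumes the three rows correspond to `eleven_flash_v2_5`, `eleven_turbo_v2_5`, and `eleven_multilingual_v2` (the Flash ID is an assumption based on ElevenLabs naming; the other two appear in the examples). `pick_model` is a hypothetical helper, not a ClawVox command.

```bash
# Hypothetical helper: choose a model ID for a latency budget in whole
# milliseconds, using the approximate latencies from the table.
pick_model() {
  budget_ms="$1"
  if [ "$budget_ms" -lt 250 ]; then
    # Fastest option (~75ms); also the fallback when nothing fits the budget.
    echo "eleven_flash_v2_5"
  elif [ "$budget_ms" -lt 500 ]; then
    echo "eleven_turbo_v2_5"        # ~250ms
  else
    echo "eleven_multilingual_v2"   # ~500ms
  fi
}

# Usage (commented out):
# {baseDir}/scripts/speak.sh --model "$(pick_model 300)" 'Hello'
```

The latencies are rough figures, so treat the thresholds as a starting point rather than a guarantee.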
Available Voices
Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will
Long-Form Content
    # Generate audio from text file
    {baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3
Speech-to-Text (Transcription)
Basic Transcription
    # Transcribe audio file
    {baseDir}/scripts/transcribe.sh recording.mp3

    # Save to file
    {baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3

    # Transcribe with language hint
    {baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3

    # Include timestamps
    {baseDir}/scripts/transcribe.sh --timestamps podcast.mp3
Supported Formats
- MP3, MP4, MPEG, MPGA, M4A, WAV, WebM
- Maximum file size: 100MB
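A quick extension check before uploading can save a failed request. This is a minimal sketch based only on the format list above; `is_transcribable` is a hypothetical helper, and it looks at the file name rather than the actual container.

```bash
# Hypothetical helper: succeed if the file extension is one of the
# transcription formats listed above (case-insensitive).
is_transcribable() {
  # Names without a dot have no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (commented out):
# is_transcribable recording.mp3 && {baseDir}/scripts/transcribe.sh recording.mp3
```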
Voice Cloning
Instant Voice Clone
    # Clone from single sample (minimum 30 seconds recommended)
    {baseDir}/scripts/clone.sh --name MyVoice recording.mp3

    # Clone with description
    {baseDir}/scripts/clone.sh --name BusinessVoice \
      --description 'Professional male voice' \
      sample.mp3

    # Clone with labels
    {baseDir}/scripts/clone.sh --name MyVoice \
      --labels '{"gender":"male","age":"adult"}' \
      sample.mp3

    # Remove background noise during cloning
    {baseDir}/scripts/clone.sh --name CleanVoice \
      --remove-bg-noise \
      sample.mp3

    # Test cloned voice
    {baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'
Voice Library Management
    # List all available voices
    {baseDir}/scripts/voices.sh list

    # Get voice details
    {baseDir}/scripts/voices.sh info --name Rachel
    {baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM

    # Search voices (filter output with grep)
    {baseDir}/scripts/voices.sh list | grep -i "female"

    # Filter by category
    {baseDir}/scripts/voices.sh list --category premade
    {baseDir}/scripts/voices.sh list --category cloned

    # Download voice preview
    {baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3

    # Delete custom voice
    {baseDir}/scripts/voices.sh delete --id "voice_id"
Sound Effects
    # Generate sound effect
    {baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'

    # With duration
    {baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'

    # With prompt influence (higher = more accurate)
    {baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'

    # Save to file
    {baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'
Note: Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
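To see what a requested duration turns into under that rule, the rounding and clamping can be reproduced locally. This is a sketch of the arithmetic only; `normalize_duration` is a hypothetical helper, and whether the script or the API performs the rounding is not stated here.

```bash
# Hypothetical helper: round a duration to the nearest 0.5 seconds and
# clamp it to the documented 0.5 to 22 second range.
normalize_duration() {
  awk -v d="$1" 'BEGIN {
    r = int(d * 2 + 0.5) / 2      # round to the nearest 0.5
    if (r < 0.5) r = 0.5          # lower bound
    if (r > 22)  r = 22           # upper bound
    printf "%.1f\n", r
  }'
}

# Usage (commented out):
# {baseDir}/scripts/sfx.sh --duration "$(normalize_duration 5.3)" 'Forest ambiance'
```

For example, a request for 5.3 seconds becomes 5.5, and anything above 22 is capped at 22.0.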
Voice Isolation
    # Remove background noise and isolate voice
    {baseDir}/scripts/isolate.sh noisy_recording.mp3

    # Save to specific file
    {baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3

    # Don't tag audio events
    {baseDir}/scripts/isolate.sh --no-audio-events recording.mp3
Requirements:
- Minimum duration: 4.6 seconds
- Supported formats: MP3, WAV, M4A, OGG, FLAC
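The minimum-duration requirement can be checked before calling the script if you already know the clip length. The sketch below compares a duration in seconds against the 4.6-second minimum; `meets_min_duration` is a hypothetical helper, and reading the duration with `ffprobe` in the usage comment assumes FFmpeg is installed, which ClawVox does not require.

```bash
# Hypothetical helper: succeed if a duration (in seconds, decimals allowed)
# meets the 4.6 second minimum for voice isolation.
meets_min_duration() {
  # awk exits 0 when the comparison holds, 1 otherwise.
  awk -v d="$1" 'BEGIN { exit !(d >= 4.6) }'
}

# Usage (commented out; assumes ffprobe from FFmpeg is available):
# secs=$(ffprobe -v error -show_entries format=duration \
#          -of default=noprint_wrappers=1:nokey=1 recording.mp3)
# meets_min_duration "$secs" && {baseDir}/scripts/isolate.sh recording.mp3
```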
Dubbing (Multi-Language Translation)
    # Dub audio to Spanish
    {baseDir}/scripts/dub.sh --target es audio.mp3

    # Dub with source language specified
    {baseDir}/scripts/dub.sh --source en --target ja video.mp4

    # Check dubbing status
    {baseDir}/scripts/dub.sh --status --id "dubbing_id"

    # Download dubbed audio
    {baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
Supported languages: en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th
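A target code can be validated against this list before a dubbing job is submitted. This is a minimal sketch; `DUB_LANGS` and `is_dub_language` are hypothetical names, and the list is copied from the line above rather than fetched from ElevenLabs.

```bash
# Language codes copied from the supported-languages list above.
DUB_LANGS="en es fr de it pt pl hi ar zh ja ko nl ru tr vi sv da fi cs el he id ms no ro uk hu th"

# Hypothetical helper: succeed if the code is in the list.
is_dub_language() {
  # Pad with spaces so that only whole codes match.
  case " $DUB_LANGS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (commented out):
# is_dub_language ja && {baseDir}/scripts/dub.sh --source en --target ja video.mp4
```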
API Usage Examples
All scripts use curl under the hood. For direct API access, you can call the endpoints yourself:
    # Direct TTS API call
    curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
      -H "xi-api-key: $ELEVENLABS_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
      --output speech.mp3
Error Handling
All scripts provide helpful error messages:
- 401: Authentication failed - Check your API key
- 403: Permission denied - Your API key may not have access
- 429: Rate limit exceeded - Wait before trying again
- 500/502/503: ElevenLabs API issues - Try again later
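When calling the API directly with curl, the same mapping can be applied to the HTTP status yourself. The sketch below is not the scripts' actual implementation; `explain_status` is a hypothetical helper that mirrors the list above.

```bash
# Hypothetical helper: turn an HTTP status code into the hints listed above.
explain_status() {
  case "$1" in
    2??) echo "OK" ;;
    401) echo "Authentication failed: check your API key" ;;
    403) echo "Permission denied: your API key may not have access" ;;
    429) echo "Rate limit exceeded: wait before trying again" ;;
    500|502|503) echo "ElevenLabs API issues: try again later" ;;
    *)   echo "Unexpected HTTP status: $1" ;;
  esac
}

# Usage (commented out): capture the status with curl's -w option.
# status=$(curl -s -o speech.mp3 -w '%{http_code}' -X POST \
#   "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
#   -H "xi-api-key: $ELEVENLABS_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}')
# explain_status "$status"
```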
Testing
Run the test suite to verify everything works:
{baseDir}/test.sh YOUR_API_KEY
Or with environment variable:
    export ELEVENLABS_API_KEY="your_key"
    {baseDir}/test.sh
Troubleshooting
Common Issues
- "exec host not allowed (requested gateway)"
  - The skill needs to run commands in a sandbox environment
  - Configure OpenClaw to use sandbox: tools.exec.host: "sandbox"
  - Or enable sandboxing in your OpenClaw config
  - Alternative: Configure exec approvals for gateway host (see OpenClaw docs)
- Parse errors with quotes or exclamation marks
  - Use single quotes instead of double quotes: 'Hello world' not "Hello world!"
  - Avoid exclamation marks (!) in text when using double quotes
  - For complex text, use the --input option with a file
- "ELEVENLABS_API_KEY not set"
  - Ensure ELEVENLABS_API_KEY is set or configured in openclaw.json
  - Check that the API key is at least 20 characters long
- "jq is required but not installed"
  - Install jq: apt-get install jq (Linux) or brew install jq (macOS)
- "Rate limited"
  - Check your ElevenLabs plan quota at elevenlabs.io/app/usage
  - Free tier: ~10,000 characters/month
- "Voice not found"
  - Use {baseDir}/scripts/voices.sh list to see available voices
  - Check if the voice ID is correct
- "Dubbing failed"
  - Ensure source audio is clear and audible
  - Check supported language codes
- "File too large"
  - Transcription: 100MB max
  - Dubbing: 500MB max
  - Voice cloning: 50MB per file
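The size limits above can be checked locally before an upload. This is a sketch under one explicit assumption: 1MB is treated as 1024 × 1024 bytes, which may not match how ElevenLabs counts. `size_limit_mb` and `fits_limit` are hypothetical helpers.

```bash
# Hypothetical helper: print the size limit in MB for an operation.
size_limit_mb() {
  case "$1" in
    transcribe) echo 100 ;;
    dub)        echo 500 ;;
    clone)      echo 50 ;;
    *)          return 1 ;;
  esac
}

# Hypothetical helper: succeed if a size in bytes fits the operation's limit.
# Returns 2 for an unknown operation.
fits_limit() {
  limit_mb=$(size_limit_mb "$1") || return 2
  # Assumption: 1MB = 1024 * 1024 bytes.
  [ "$2" -le $((limit_mb * 1024 * 1024)) ]
}

# Usage (commented out):
# bytes=$(wc -c < audio.mp3 | tr -d ' ')
# fits_limit transcribe "$bytes" || echo "File too large" >&2
```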
Debug Mode
    # Enable verbose output
    DEBUG=1 {baseDir}/scripts/speak.sh 'test'

    # Show API request details
    DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3
Pricing Notes
ElevenLabs API pricing (approximate):
- Flash v2.5: ~$0.06/min
- Turbo v2.5: ~$0.06/min
- Multilingual v2: ~$0.12/min
- Voice cloning: Included in plan
- Sound effects: ~$0.02/generation
- Transcription: ~$0.02/min (Scribe v1)
Free tier: ~10,000 characters/month
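For a rough budget, the per-minute rates above can be multiplied by the expected audio length. The sketch below uses those approximate figures as-is, so its output is only as accurate as they are. `estimate_tts_cost` and the tier labels `flash`, `turbo`, and `multilingual` are hypothetical names.

```bash
# Hypothetical helper: estimate TTS cost in USD from the approximate
# per-minute rates listed above.
# Arguments: tier (flash|turbo|multilingual), minutes of audio.
estimate_tts_cost() {
  case "$1" in
    flash|turbo)  rate=0.06 ;;
    multilingual) rate=0.12 ;;
    *) echo "unknown model tier: $1" >&2; return 1 ;;
  esac
  awk -v r="$rate" -v m="$2" 'BEGIN { printf "%.2f\n", r * m }'
}

# Usage (commented out):
# estimate_tts_cost multilingual 10
```

Ten minutes on the multilingual tier comes to about 1.20 USD at these rates.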