Skills elevenlabs-speech
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/amreahmed/elevenlabs-voice" ~/.claude/skills/clawdbot-skills-elevenlabs-speech && rm -rf "$T"
skills/amreahmed/elevenlabs-voice/SKILL.mdElevenLabs Speech
Complete voice solution — both TTS and STT using one API:
- TTS: Text-to-Speech (high-quality voices)
- STT: Speech-to-Text via Scribe (accurate transcription)
Quick Start
Environment Setup
Set your API key:
export ELEVENLABS_API_KEY="sk_..."
Or create
.env file in workspace root.
Text-to-Speech (TTS)
Convert text to natural-sounding speech:
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
With custom voice:
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
List Available Voices
python scripts/elevenlabs_speech.py voices
Using in Code
from scripts.elevenlabs_speech import ElevenLabsClient client = ElevenLabsClient(api_key="sk_...") # Basic TTS result = client.text_to_speech( text="Hello from zerox", output_path="greeting.mp3" ) # With custom settings result = client.text_to_speech( text="Your text here", voice_id="21m00Tcm4TlvDq8ikWAM", # Rachel stability=0.5, similarity_boost=0.75, output_path="output.mp3" ) # Get available voices voices = client.get_voices() for voice in voices['voices']: print(f"{voice['name']}: {voice['voice_id']}")
Popular Voices
| Voice ID | Name | Description |
|---|---|---|
| Rachel | Natural, versatile (default) |
| Domi | Strong, energetic |
| Bella | Soft, soothing |
| Antoni | Well-rounded |
| Elli | Warm, friendly |
| Josh | Deep, calm |
| Arnold | Authoritative |
Voice Settings
- stability (0-1): Lower = more emotional, Higher = more stable
- similarity_boost (0-1): Higher = closer to original voice
Default: stability=0.5, similarity_boost=0.75
Models
- Fast, high quality (default)eleven_turbo_v2_5
- Best for non-Englisheleven_multilingual_v2
- English onlyeleven_monolingual_v1
Integration with Telegram
When user sends text and wants voice reply:
# Generate speech result = client.text_to_speech(text=user_text, output_path="reply.mp3") # Send via Telegram message tool with media path message(action="send", media="path/to/reply.mp3", as_voice=True)
Pricing
Check https://elevenlabs.io/pricing for current rates. Free tier available!
Speech-to-Text (STT) with ElevenLabs Scribe
Transcribe voice messages using ElevenLabs Scribe:
Transcribe Audio
python scripts/elevenlabs_scribe.py voice_message.ogg
With specific language:
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
With speaker diarization (multiple speakers):
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
Using in Code
from scripts.elevenlabs_scribe import ElevenLabsScribe client = ElevenLabsScribe(api_key="sk-...") # Basic transcription result = client.transcribe("voice_message.ogg") print(result['text']) # With language hint (improves accuracy) result = client.transcribe("voice_message.ogg", language_code="ara") # With speaker detection result = client.transcribe("voice_message.ogg", num_speakers=2)
Supported Formats
- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (
).ogg
Language Support
Scribe supports 99 languages including:
- Arabic (
)ara - English (
)eng - Spanish (
)spa - French (
)fra - And many more...
Without language hint, it auto-detects.
Complete Workflow Example
User sends voice message → You reply with voice:
from scripts.elevenlabs_scribe import ElevenLabsScribe from scripts.elevenlabs_speech import ElevenLabsClient # 1. Transcribe user's voice message stt = ElevenLabsScribe() transcription = stt.transcribe("user_voice.ogg") user_text = transcription['text'] # 2. Process/understand the text # ... your logic here ... # 3. Generate response text response_text = "Your response here" # 4. Convert to speech tts = ElevenLabsClient() tts.text_to_speech(response_text, output_path="reply.mp3") # 5. Send voice reply message(action="send", media="reply.mp3", as_voice=True)
Pricing
Check https://elevenlabs.io/pricing for current rates:
TTS (Text-to-Speech):
- Free tier: 10,000 characters/month
- Paid plans available
STT (Speech-to-Text) - Scribe:
- Free tier available
- Check website for current pricing