Skills elevenlabs-speech

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/amreahmed/elevenlabs-voice" ~/.claude/skills/clawdbot-skills-elevenlabs-speech && rm -rf "$T"

manifest: skills/amreahmed/elevenlabs-voice/SKILL.md

source content

ElevenLabs Speech

Complete voice solution — both TTS and STT using one API:

TTS: Text-to-Speech (high-quality voices)
STT: Speech-to-Text via Scribe (accurate transcription)

Quick Start

Environment Setup

Set your API key:

export ELEVENLABS_API_KEY="sk_..."

Or create

.env

file in workspace root.

Text-to-Speech (TTS)

Convert text to natural-sounding speech:

python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3

With custom voice:

python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3

List Available Voices

python scripts/elevenlabs_speech.py voices

Using in Code

from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")

Popular Voices

Voice ID	Name	Description
`21m00Tcm4TlvDq8ikWAM`	Rachel	Natural, versatile (default)
`AZnzlk1XvdvUeBnXmlld`	Domi	Strong, energetic
`EXAVITQu4vr4xnSDxMaL`	Bella	Soft, soothing
`ErXwobaYiN019PkySvjV`	Antoni	Well-rounded
`MF3mGyEYCl7XYWbV9V6O`	Elli	Warm, friendly
`TxGEqnHWrfWFTfGW9XjX`	Josh	Deep, calm
`VR6AewLTigWG4xSOukaG`	Arnold	Authoritative

Voice Settings

stability (0-1): Lower = more emotional, Higher = more stable
similarity_boost (0-1): Higher = closer to original voice

Default: stability=0.5, similarity_boost=0.75

Models

```
eleven_turbo_v2_5
```
- Fast, high quality (default)
```
eleven_multilingual_v2
```
- Best for non-English
```
eleven_monolingual_v1
```
- English only

Integration with Telegram

When user sends text and wants voice reply:

# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via Telegram message tool with media path
message(action="send", media="path/to/reply.mp3", as_voice=True)

Pricing

Check https://elevenlabs.io/pricing for current rates. Free tier available!

Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

Transcribe Audio

python scripts/elevenlabs_scribe.py voice_message.ogg

With specific language:

python scripts/elevenlabs_scribe.py voice_message.ogg --language ara

With speaker diarization (multiple speakers):

python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2

Using in Code

from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk-...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)

Supported Formats

mp3, mp4, mpeg, mpga, m4a, wav, webm
Max file size: 100 MB
Works great with Telegram voice messages (
```
.ogg
```
)

Language Support

Scribe supports 99 languages including:

Arabic (
```
ara
```
)
English (
```
eng
```
)
Spanish (
```
spa
```
)
French (
```
fra
```
)
And many more...

Without language hint, it auto-detects.

Complete Workflow Example

User sends voice message → You reply with voice:

from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)

Pricing

Check https://elevenlabs.io/pricing for current rates:

TTS (Text-to-Speech):

Free tier: 10,000 characters/month
Paid plans available

STT (Speech-to-Text) - Scribe:

Free tier available
Check website for current pricing