install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/elevenlabs" ~/.claude/skills/comeonoliver-skillshub-elevenlabs && rm -rf "$T"
manifest:
skills/TerminalSkills/skills/elevenlabs/SKILL.mdsource content
ElevenLabs — AI Voice Synthesis & Cloning
Overview
You are an expert in ElevenLabs, the AI voice platform for high-quality text-to-speech, voice cloning, and conversational AI. You help developers build voice-enabled applications with natural-sounding speech, custom voice creation, multilingual support, and real-time streaming TTS for voice agents, audiobooks, podcasts, and accessibility features.
Instructions
Text-to-Speech
# Basic TTS — generate audio from text from elevenlabs import ElevenLabs client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"]) # Generate and save audio audio = client.text_to_speech.convert( voice_id="pNInz6obpgDQGcFmaJgB", # "Rachel" — warm, professional text="Welcome to Bright Smile Dental. How can I help you today?", model_id="eleven_turbo_v2_5", # Optimized for low latency (~200ms) voice_settings={ "stability": 0.6, # Lower = more expressive, higher = more consistent "similarity_boost": 0.8, # How closely to match the original voice "style": 0.3, # Style exaggeration (0-1) "use_speaker_boost": True, # Enhance clarity }, ) # Save to file with open("greeting.mp3", "wb") as f: for chunk in audio: f.write(chunk) # Streaming TTS — for real-time applications audio_stream = client.text_to_speech.convert_as_stream( voice_id="pNInz6obpgDQGcFmaJgB", text="Let me check our available appointments for next Tuesday.", model_id="eleven_turbo_v2_5", output_format="pcm_24000", # Raw PCM for WebRTC/LiveKit ) for chunk in audio_stream: send_to_audio_output(chunk) # Stream directly to speaker
Voice Cloning
# Instant voice clone — from a single audio sample voice = client.voices.add( name="Dr. Smith", files=[open("dr_smith_sample.mp3", "rb")], description="Calm, authoritative male voice for medical context", labels={"use_case": "voice_agent", "language": "en"}, ) print(f"Cloned voice ID: {voice.voice_id}") # Professional voice clone (higher quality, requires consent) # Needs 30+ minutes of clean audio for best results
Conversational AI Agent
# ElevenLabs Conversational AI — fully managed voice agent from elevenlabs import ConversationalAI agent = ConversationalAI( api_key=os.environ["ELEVENLABS_API_KEY"], agent_id="your-agent-id", # Created in ElevenLabs dashboard ) # WebSocket connection for real-time conversation async def handle_call(websocket): async for audio_chunk in websocket: # Send caller audio to ElevenLabs response = await agent.process_audio(audio_chunk) # Send AI response audio back to caller await websocket.send(response.audio)
JavaScript / React
// Browser-based TTS import { ElevenLabsClient } from "elevenlabs"; const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_KEY }); // Stream audio in browser const response = await client.textToSpeech.convertAsStream(voiceId, { text: "Hello! How can I assist you?", model_id: "eleven_turbo_v2_5", output_format: "mp3_44100_128", }); // Play audio using Web Audio API const audioContext = new AudioContext(); const reader = response.getReader(); // ... decode and play chunks
Available Models
| Model | Latency | Quality | Best For |
|---|---|---|---|
| ~200ms | High | Voice agents, real-time apps |
| ~400ms | Highest | Multilingual, audiobooks |
| ~300ms | Good | English-only, cost-sensitive |
Installation
pip install elevenlabs # Python npm install elevenlabs # Node.js
Examples
Example 1: User asks to set up elevenlabs
User: "Help me set up elevenlabs for my project"
The agent should:
- Check system requirements and prerequisites
- Install or configure elevenlabs
- Set up initial project structure
- Verify the setup works correctly
Example 2: User asks to build a feature with elevenlabs
User: "Create a dashboard using elevenlabs"
The agent should:
- Scaffold the component or configuration
- Connect to the appropriate data source
- Implement the requested feature
- Test and validate the output
Guidelines
- Turbo model for voice agents — Use
for real-time conversations; 200ms latency feels instanteleven_turbo_v2_5 - Streaming for real-time — Use
instead ofconvert_as_stream
for voice agents; first audio chunk arrives in ~200msconvert - Voice settings tuning — Lower stability (0.3-0.5) for expressive narration; higher (0.7-0.9) for consistent voice agents
- PCM output for WebRTC — Use
orpcm_24000
output format when feeding into WebRTC/LiveKit; no decoding overheadpcm_16000 - Voice library — Browse ElevenLabs' voice library (1000+ voices) before cloning; many professional voices are already available
- Pronunciation dictionary — Upload custom pronunciation rules for medical terms, brand names, and technical jargon
- Character count billing — ElevenLabs bills per character; cache common phrases and greetings to reduce costs
- SSML-like control — Use
in text for natural pauses; helps with phone menu options<break time="0.5s"/>