Claude-skill-registry: elevenlabs

Install by cloning the registry:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy only this skill into your local skills directory:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/elevenlabs" ~/.claude/skills/majiayu000-claude-skill-registry-elevenlabs && rm -rf "$T"
```

skills/data/elevenlabs/SKILL.md

ElevenLabs Audio Generation
Purpose
This skill enables AI-powered audio generation through the ElevenLabs API. Create lifelike text-to-speech in 32 languages, generate custom sound effects for games and videos, and compose royalty-free music from text descriptions. It supports 100+ professional voices, custom voice cloning, real-time streaming, and multi-speaker dialogue.
When to Use
This skill should be invoked when the user asks to:
- Generate speech from text ("convert this to speech", "create audio narration...")
- Create voiceovers for videos, presentations, or content
- Generate audio in specific voices or languages
- Create sound effects ("generate footstep sounds", "create explosion audio...")
- Compose music from descriptions ("generate upbeat background music...")
- Build multi-speaker dialogue or conversations
- Clone voices from audio samples
- Stream audio in real-time applications
- Create audiobooks, podcasts, or audio content
Available Capabilities
1. Text-to-Speech (Voice Generation)
Models:
- Eleven Multilingual v2 (`eleven_multilingual_v2`) - Highest quality, 29 languages
- Eleven Flash v2.5 (`eleven_flash_v2_5`) - Ultra-low 75ms latency, 32 languages, 50% cheaper
- Eleven Turbo v2.5 (`eleven_turbo_v2_5`) - Balanced quality and latency
Features:
- 100+ premade professional voices
- Custom voice cloning from audio samples
- Multi-speaker dialogue generation
- Real-time audio streaming
- 32 language support
- Emotional and natural intonation
- Voice settings customization (stability, similarity, style)
Output Formats:
- MP3 (various bitrates: 32kbps to 192kbps)
- PCM (8kHz to 48kHz)
- Opus, µ-law, A-law
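The format identifiers follow a `codec_samplerate_bitrate` pattern (e.g. `mp3_44100_128`, as used in the examples below). As a sketch, a small lookup table can map a use case to a format string — the mapping and function name here are illustrative, and availability of the higher-bitrate formats may depend on your tier, so verify the exact strings against the API reference:

```python
# Illustrative use-case → output_format mapping. Verify the exact
# identifiers and tier requirements in the ElevenLabs API reference.
USE_CASE_FORMATS = {
    "web": "mp3_44100_128",           # good quality, small size
    "high_quality": "mp3_44100_192",  # higher bitrate (may require paid tier)
    "telephony": "ulaw_8000",         # µ-law for phone systems
    "raw_processing": "pcm_44100",    # uncompressed PCM for post-processing
}


def pick_output_format(use_case: str) -> str:
    """Return an output_format string for a known use case."""
    try:
        return USE_CASE_FORMATS[use_case]
    except KeyError:
        raise ValueError(f"Unknown use case: {use_case!r}") from None
```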
2. Sound Effects Generation
Model:
- Eleven Text-to-Sound v2 (`eleven_text_to_sound_v2`)
Features:
- Generate sound effects from text descriptions
- Customizable duration
- Looping support for seamless audio
- Prompt influence control
- High-quality audio for games, videos, UI/UX
Use Cases:
- Game audio (footsteps, explosions, ambient)
- Video production sounds
- UI/UX sound design
- Nature sounds (rain, wind, waves)
- Mechanical sounds (doors, engines, machines)
- Fantasy/sci-fi effects
3. Music Generation
Features:
- Text-to-music composition
- Vocal and instrumental tracks
- Multiple genres and styles
- Customizable track duration
- Composition plans (structured music blueprints)
- Royalty-free generated music
Parameters:
- Text prompts describing desired music
- Duration control (milliseconds)
- Genre, style, mood specifications
- Section-level composition control
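Since the API takes durations in milliseconds (`music_length_ms`), a tiny conversion helper avoids unit mistakes — a trivial sketch, with the function name being illustrative:

```python
def music_length_ms(seconds: float) -> int:
    """Convert a duration in seconds to the milliseconds the music API expects."""
    if seconds <= 0:
        raise ValueError("Duration must be positive")
    return int(round(seconds * 1000))
```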
Requirements:
- Paid ElevenLabs account (music API not available on free tier)
Content Policy:
- No copyrighted material (artist names, band names, trademarks)
- Returns suggestions for restricted prompts
Instructions
Step 1: Understand the Request
Analyze the user's request to determine:
- Task Type: Text-to-speech, sound effects, or music generation
- Content: What text/description to convert
- Voice/Sound: Specific voice, language, or sound characteristics
- Format: Output format requirements (MP3, streaming, etc.)
- Duration: Length requirements (for sound effects or music)
- Use Case: Narration, video, game, podcast, etc.
Step 2: Select Appropriate Model/Capability
For Text-to-Speech:
- High quality needed → `eleven_multilingual_v2`
- Low latency/real-time → `eleven_flash_v2_5`
- Balanced → `eleven_turbo_v2_5`

For Sound Effects:
- Use the `eleven_text_to_sound_v2` model
- Consider duration and looping needs
For Music:
- Ensure user has paid account
- Determine track length and style
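The text-to-speech selection logic above can be sketched as a small helper. The model IDs are the ones listed earlier; the function name and category labels are illustrative:

```python
def select_tts_model(need: str) -> str:
    """Map a requirement category to a text-to-speech model ID."""
    models = {
        "quality": "eleven_multilingual_v2",  # highest-quality narration
        "latency": "eleven_flash_v2_5",       # real-time / streaming
        "balanced": "eleven_turbo_v2_5",      # middle ground
    }
    if need not in models:
        raise ValueError(f"Expected one of {sorted(models)}, got {need!r}")
    return models[need]
```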
Step 3: Set Up API Authentication
```python
import os

from elevenlabs.client import ElevenLabs

# Initialize client with API key
client = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))
```
The API key should be set as an environment variable:

```shell
export ELEVENLABS_API_KEY="your-api-key-here"
```
Step 4: Implement Based on Task Type
Text-to-Speech Implementation
Basic Speech Generation:
```python
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate speech
audio = client.text_to_speech.convert(
    text="Your text content here",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # Default voice (George)
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Save to file
output_path = Path("speech_output.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Audio saved to: {output_path}")
```
Streaming Speech (Real-time):
```python
import os

from elevenlabs import stream
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Stream audio in real-time
audio_stream = client.text_to_speech.convert_as_stream(
    text="This will be streamed as it generates",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # Low-latency model for streaming
    output_format="mp3_44100_128",
)

# Stream to speakers
stream(audio_stream)
```
Multi-Speaker Dialogue:
```python
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs
from pydub import AudioSegment

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate a conversation with multiple voices
speakers = [
    {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",  # Speaker 1 (George)
        "text": "Hello, how are you today?",
    },
    {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",  # Speaker 2 (Rachel)
        "text": "I'm doing great, thanks for asking!",
    },
]

# Generate each speaker's audio and combine
combined = AudioSegment.empty()

for speaker in speakers:
    audio = client.text_to_speech.convert(
        text=speaker["text"],
        voice_id=speaker["voice_id"],
        model_id="eleven_multilingual_v2",
    )

    # Save a temporary file for pydub to read
    temp_path = Path(f"temp_{speaker['voice_id']}.mp3")
    with temp_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    # Append to the combined audio
    combined += AudioSegment.from_mp3(str(temp_path))
    temp_path.unlink()  # Clean up

# Export the final dialogue
combined.export("dialogue.mp3", format="mp3")
```
List Available Voices:
```python
# Get all available voices
voices = client.voices.get_all()

print("Available voices:")
for voice in voices.voices:
    print(f"- {voice.name} (ID: {voice.voice_id})")
    print(f"  Labels: {voice.labels}")
    print(f"  Description: {voice.description}")
```
Common Voice IDs:
- `JBFqnCBsd6RMkjVDRZzb` - George (male, English, middle-aged)
- `21m00Tcm4TlvDq8ikWAM` - Rachel (female, English, young)
- `AZnzlk1XvdvUeBnXmlld` - Domi (female, English, young)
- `EXAVITQu4vr4xnSDxMaL` - Bella (female, English, young)
- `ErXwobaYiN019PkySvjV` - Antoni (male, English, young)
- `MF3mGyEYCl7XYWbV9V6O` - Elli (female, English, young)
- `TxGEqnHWrfWFTfGW9XjX` - Josh (male, English, young)
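For convenience, these IDs can be kept in a lookup table so scripts reference voices by name instead of raw IDs — a simple sketch built from the list above:

```python
# Voice name → voice_id lookup, built from the list above
VOICES = {
    "george": "JBFqnCBsd6RMkjVDRZzb",
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "antoni": "ErXwobaYiN019PkySvjV",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
}


def voice_id(name: str) -> str:
    """Look up a voice ID by (case-insensitive) name."""
    return VOICES[name.lower()]
```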
Sound Effects Implementation
Basic Sound Effect Generation:
```python
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate sound effect
audio = client.text_to_sound_effects.convert(
    text="footsteps on wooden floor, slow paced walking",
    duration_seconds=5.0,
    prompt_influence=0.5,  # How closely to follow the prompt (0.0-1.0)
)

# Save to file
output_path = Path("footsteps.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Sound effect saved to: {output_path}")
```
Looping Sound Effect:
```python
# Generate seamlessly looping audio
audio = client.text_to_sound_effects.convert(
    text="gentle rain falling on leaves, ambient nature sound",
    duration_seconds=10.0,
    prompt_influence=0.5,
    # Note: a loop parameter may be available in newer API versions
)

output_path = Path("rain_loop.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
```
Multiple Sound Effects:
```python
# Generate various sound effects for a game
sound_effects = [
    {
        "name": "explosion",
        "description": "large explosion, debris falling, action movie style",
        "duration": 3.0,
    },
    {
        "name": "door_open",
        "description": "creaky wooden door slowly opening, horror atmosphere",
        "duration": 2.0,
    },
    {
        "name": "ui_click",
        "description": "soft button click, UI feedback sound, pleasant tone",
        "duration": 0.5,
    },
]

for sfx in sound_effects:
    audio = client.text_to_sound_effects.convert(
        text=sfx["description"],
        duration_seconds=sfx["duration"],
    )

    output_path = Path(f"{sfx['name']}.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    print(f"Generated: {output_path}")
```
Music Generation Implementation
Basic Music Composition:
```python
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate music from a prompt
prompt = """Upbeat indie pop song with acoustic guitar, light drums,
and cheerful melody. Modern and energetic feel, perfect for background
music in a lifestyle video. Instrumental only, no vocals."""

try:
    audio = client.music_generation.compose(
        prompt=prompt,
        music_length_ms=30000,  # 30 seconds
    )

    # Save the music file
    output_path = Path("background_music.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    print(f"Music saved to: {output_path}")
except Exception as e:
    if "paid" in str(e).lower() or "subscription" in str(e).lower():
        print("Error: Music generation requires a paid ElevenLabs account")
    else:
        print(f"Error: {e}")
```
Music with Composition Plan:
```python
# Create a structured composition plan first
composition_plan = client.music_generation.composition_plan.create(
    prompt="""Electronic dance music track with energetic build-up,
    drop section, and chill outro. Progressive house style.""",
    music_length_ms=60000,  # 60 seconds
)

# Generate music from the plan (allows for more control)
audio = client.music_generation.compose(
    composition_plan=composition_plan,
)

output_path = Path("edm_track.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
```
Genre-Specific Music:
```python
# Generate music for different genres/moods
music_prompts = {
    "cinematic": """Epic cinematic orchestral music with dramatic strings,
        powerful brass, and heroic theme. Perfect for a movie trailer,
        inspiring and grand.""",
    "lo-fi": """Chill lo-fi hip hop beats with jazz piano, vinyl crackle,
        and mellow drums. Relaxing study music atmosphere, instrumental.""",
    "ambient": """Ambient soundscape with ethereal pads, subtle textures,
        and peaceful atmosphere. Meditative and calming, perfect for relaxation.""",
    "game_menu": """Mysterious fantasy game menu music with harp, soft strings,
        and magical atmosphere. Medieval RPG feel, looping background music.""",
}

for name, prompt in music_prompts.items():
    try:
        audio = client.music_generation.compose(
            prompt=prompt,
            music_length_ms=20000,  # 20 seconds
        )

        output_path = Path(f"music_{name}.mp3")
        with output_path.open("wb") as f:
            for chunk in audio:
                f.write(chunk)

        print(f"Generated: {output_path}")
    except Exception as e:
        print(f"Error generating {name}: {e}")
```
Step 5: Handle Output and Errors
Save Audio Files:
```python
from pathlib import Path


def save_audio(audio_generator, filename):
    """Save an audio generator to a file."""
    output_path = Path(filename)
    with output_path.open("wb") as f:
        for chunk in audio_generator:
            f.write(chunk)
    print(f"Saved: {output_path.absolute()}")
    return output_path
```
Error Handling:
```python
import os


def check_api_key():
    """Verify the API key is set."""
    if not os.environ.get("ELEVENLABS_API_KEY"):
        raise ValueError(
            "ELEVENLABS_API_KEY not set. "
            "Please set the environment variable: export ELEVENLABS_API_KEY='your-key'"
        )


def handle_elevenlabs_request(func, *args, **kwargs):
    """Wrapper that translates common API errors into helpful messages."""
    try:
        return func(*args, **kwargs)
    except Exception as e:
        error_msg = str(e).lower()
        if "api key" in error_msg or "authentication" in error_msg:
            print("Error: Invalid or missing API key")
            print("Set your API key: export ELEVENLABS_API_KEY='your-key'")
        elif "quota" in error_msg or "limit" in error_msg:
            print("Error: API quota exceeded")
            print("Check your usage at https://elevenlabs.io/app/usage")
        elif "paid" in error_msg or "subscription" in error_msg:
            print("Error: This feature requires a paid subscription")
        elif "bad_prompt" in error_msg:
            print("Error: Prompt contains restricted content")
            print("Avoid copyrighted material (artist names, brands)")
        else:
            print(f"Error: {e}")
        raise
```
Step 6: Provide Output to User
- Report what was generated
- Show file path where audio was saved
- Provide playback options if appropriate
- Offer refinements (different voice, longer duration, etc.)
- Display metadata (duration, format, model used)
Requirements
API Key:
- ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- Set as environment variable: `ELEVENLABS_API_KEY`
Python Packages:
```shell
pip install elevenlabs pydub python-dotenv
```
System:
- Python 3.8+
- Internet connection for API access
- Audio playback library (optional, for playing generated audio)
- ffmpeg (required by pydub for audio processing)
Account Requirements:
- Free tier: Text-to-speech and sound effects
- Paid tier: Music generation, higher quotas
Best Practices
Text-to-Speech
- Choose Appropriate Model:
  - High quality narration → `eleven_multilingual_v2`
  - Real-time/streaming → `eleven_flash_v2_5`
  - Balanced use cases → `eleven_turbo_v2_5`
- Select the Right Voice:
  - Match voice to content (age, gender, accent)
  - Use `voices.get_all()` to explore options
  - Consider voice labels and descriptions
- Optimize for the Use Case:
  - Long content: Use standard conversion, not streaming
  - Real-time apps: Use the Flash model with streaming
  - Dialogue: Generate separate audio per speaker
- Format Selection:
  - Web/mobile: MP3 (good quality, small size)
  - High quality: Use a higher bitrate (128kbps+)
  - Phone systems: µ-law or A-law format
Sound Effects Generation
- Be Descriptive:
  - Include context: "footsteps on gravel, slow walking pace"
  - Specify mood: "creepy door creak, horror atmosphere"
  - Add technical details: "deep bass explosion, action movie"
- Duration Control:
  - Short sounds: 0.5-2 seconds (UI clicks, impacts)
  - Medium sounds: 2-5 seconds (footsteps, doors)
  - Ambient loops: 5-10+ seconds (rain, wind, environments)
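A small helper can enforce sensible durations before calling the API. The 22-second ceiling reflects the typical limit noted under Limitations; the function name and default bounds are illustrative, so adjust them if your account's limits differ:

```python
def clamp_sfx_duration(seconds: float, minimum: float = 0.5, maximum: float = 22.0) -> float:
    """Clamp a requested sound-effect duration to a safe range.

    The 22-second ceiling reflects the typical API limit; adjust the
    bounds if your account's limits differ.
    """
    return max(minimum, min(maximum, seconds))
```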
- Prompt Influence:
  - High (0.7-1.0): Follows the prompt closely, more literal
  - Medium (0.4-0.6): Balanced creativity and adherence
  - Low (0.0-0.3): More creative interpretation
- Iteration:
  - Generate multiple variations
  - Adjust descriptions based on results
  - Combine multiple effects if needed
Music Generation
- Detailed Prompts:
  - Specify genre, instruments, mood, tempo
  - Mention structure (intro, build-up, drop, outro)
  - Include use-case context (game menu, video background)
- Avoid Copyrighted References:
  - Don't mention artist names, band names, or song titles
  - Use generic style descriptions instead
  - Focus on characteristics, not examples
- Duration Planning:
  - Short clips: 10-30 seconds (loops, backgrounds)
  - Full tracks: 60-120 seconds (complete songs)
  - Consider export time (longer = more processing)
- Composition Plans:
  - Use for complex multi-section tracks
  - Better control over structure
  - Allows section-level customization
General Best Practices
- API Key Security:
  - Store in environment variables, never in code
  - Use `.env` files for local development
  - Rotate keys periodically
- Error Handling:
  - Always wrap API calls in try/except
  - Check for quota limits
  - Provide helpful error messages
- Cost Optimization:
  - Use the Flash model when the quality difference is minimal
  - Cache/reuse generated audio when possible
  - Monitor usage via the dashboard
- File Management:
  - Use descriptive filenames
  - Organize by type (speech, sfx, music)
  - Clean up temporary files
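A hypothetical helper for the organization scheme above — the directory names and base folder are illustrative:

```python
from pathlib import Path


def output_path(kind: str, name: str, base: str = "audio_output") -> Path:
    """Build an organized output path like audio_output/sfx/door_open.mp3.

    kind should be one of "speech", "sfx", or "music".
    """
    if kind not in {"speech", "sfx", "music"}:
        raise ValueError(f"Unknown audio type: {kind!r}")
    directory = Path(base) / kind
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{name}.mp3"
```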
- Testing:
  - Test with short durations first
  - Verify output quality before long generations
  - Check different voices/settings
Examples
Example 1: Audiobook Narration
User request: "Convert this chapter to audiobook format"
Expected behavior:
- Select appropriate voice (e.g., narrative voice like George)
- Use a high-quality model (`eleven_multilingual_v2`)
- Generate speech from the chapter text
- Save as MP3 with high bitrate
- Report duration and file location
```python
audio = client.text_to_speech.convert(
    text=chapter_text,
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)
save_audio(audio, "chapter_1.mp3")
```
Example 2: Video Game Sound Effects
User request: "Generate sound effects for a fantasy RPG game"
Expected behavior:
- Create multiple sound effects with descriptions
- Set appropriate durations for each
- Save with descriptive names
- Organize in game audio folder
```python
sfx_list = [
    ("sword_swing", "sword whooshing through air, fantasy combat", 1.0),
    ("potion_drink", "drinking magical potion, gulp sound, RPG game", 0.8),
    ("spell_cast", "magical spell casting, ethereal whoosh, fantasy magic", 1.5),
    ("footsteps_stone", "footsteps on stone dungeon floor, echoing", 2.0),
]

for name, description, duration in sfx_list:
    audio = client.text_to_sound_effects.convert(
        text=description,
        duration_seconds=duration,
    )
    save_audio(audio, f"sfx_{name}.mp3")
```
Example 3: Podcast Intro with Music
User request: "Create a podcast intro with voice and background music"
Expected behavior:
- Generate intro speech
- Generate background music
- Note that mixing would need external tools (pydub)
- Provide both audio files
```python
# Generate the intro speech
intro_text = (
    "Welcome to the Tech Talk podcast, where we discuss "
    "the latest in technology and innovation."
)

speech = client.text_to_speech.convert(
    text=intro_text,
    voice_id="TxGEqnHWrfWFTfGW9XjX",  # Josh (energetic)
    model_id="eleven_flash_v2_5",
)
save_audio(speech, "podcast_intro_voice.mp3")

# Generate background music (requires a paid account)
music = client.music_generation.compose(
    prompt="Upbeat tech podcast intro music, electronic beats, modern and energetic",
    music_length_ms=10000,  # 10 seconds
)
save_audio(music, "podcast_intro_music.mp3")

print("Use audio editing software to mix voice and music")
```
Example 4: Multilingual Content
User request: "Create welcome messages in English, Spanish, and French"
Expected behavior:
- Generate speech in each language
- Use multilingual model
- Select appropriate voices for each language
- Save with language-specific filenames
```python
messages = {
    "english": ("Hello and welcome!", "JBFqnCBsd6RMkjVDRZzb"),
    "spanish": ("¡Hola y bienvenido!", "ThT5KcBeYPX3keUQqHPh"),  # Spanish voice
    "french": ("Bonjour et bienvenue!", "XB0fDUnXU5powFXDhCwa"),  # French voice
}

for lang, (text, voice_id) in messages.items():
    audio = client.text_to_speech.convert(
        text=text,
        voice_id=voice_id,
        model_id="eleven_multilingual_v2",
    )
    save_audio(audio, f"welcome_{lang}.mp3")
```
Example 5: Real-time Voice Streaming
User request: "Stream this news article as audio"
Expected behavior:
- Use Flash model for low latency
- Stream audio as it generates
- Provide real-time playback or save incrementally
```python
from elevenlabs import stream

audio_stream = client.text_to_speech.convert_as_stream(
    text=news_article_text,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    model_id="eleven_flash_v2_5",
    output_format="mp3_44100_128",
)

# Stream to speakers in real time
stream(audio_stream)
```
Limitations
- Music Generation:
  - Requires a paid subscription
  - No copyrighted material allowed
  - Processing time increases with duration
- API Quotas:
  - Character limits per month (tier-dependent)
  - Rate limits on requests
  - Different limits for free vs. paid tiers
- Voice Cloning:
  - Not covered in the Tier 1 implementation
  - Requires voice samples and additional setup
- Audio Quality:
  - Output format affects quality and file size
  - Higher-quality formats may require a paid tier
  - Streaming has slightly lower quality than standard conversion
- Language Support:
  - 32 languages supported, but quality varies
  - Some voices are language-specific
  - The multilingual model is recommended for non-English content
- Sound Effects:
  - Limited to description-based generation
  - No editing of generated effects via the API
  - Duration limitations (typically under 22 seconds)
- Content Policy:
  - No harmful or copyrighted content
  - Music generation rejects artist/band names
  - Strict content moderation on all endpoints
Related Skills
- `image-generation` - For visual content creation
- `python-plotting` - For visualizing audio data
- `scientific-writing` - For generating narration text
- `python-best-practices` - For writing clean audio-processing code
Additional Resources
- ElevenLabs Documentation: https://elevenlabs.io/docs
- Python SDK: https://github.com/elevenlabs/elevenlabs-python
- API Reference: https://elevenlabs.io/docs/api-reference/introduction
- Voice Library: https://elevenlabs.io/voice-library
- Pricing: https://elevenlabs.io/pricing
- Usage Dashboard: https://elevenlabs.io/app/usage