When to use
- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines
ElevenLabs TTS Tool Skill
Purpose
Comprehensive guide for the elevenlabs-tts-tool CLI, a professional command-line interface for ElevenLabs text-to-speech synthesis. It provides both direct audio playback and file output, with support for 42+ premium voices and multiple models.
When to Use This Skill
Use this skill when:
- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools
Do NOT use this skill for:
- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)
CLI Tool: elevenlabs-tts-tool
Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.
Installation
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version
Prerequisites
- Python: 3.13 or higher
- API Key: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- Environment Variable:
export ELEVENLABS_API_KEY='your-api-key'
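The prerequisites above can be checked before the first synthesis call. The following is a minimal sketch, not part of the tool itself: `check_api_key` and `check_tool` are hypothetical helper names, and the only assumptions are the `ELEVENLABS_API_KEY` variable and the `elevenlabs-tts-tool` executable name given in this guide.

```shell
# Hypothetical helpers (not shipped with the tool): verify prerequisites.

# Succeeds only when ELEVENLABS_API_KEY is set to a non-empty value.
check_api_key() {
    if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
        echo "ELEVENLABS_API_KEY is not set" >&2
        return 1
    fi
}

# Succeeds only when the CLI can be found on PATH.
check_tool() {
    if ! command -v elevenlabs-tts-tool >/dev/null 2>&1; then
        echo "elevenlabs-tts-tool is not on PATH" >&2
        return 1
    fi
}
```

A script might call `check_api_key && check_tool` first and stop early if either fails.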
Quick Start
# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
Progressive Disclosure
Core Commands
synthesize - Convert Text to Speech
Convert text to speech using ElevenLabs API. Supports direct playback or file output.
Usage:
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
Arguments:
- TEXT: Text to synthesize (optional if --stdin is used)
- --stdin, -s: Read text from stdin instead of the argument
- --voice, -v NAME: Voice name or ID (default: rachel)
- --model, -m ID: Model ID (default: eleven_turbo_v2_5)
- --output, -o PATH: Save to an audio file instead of playing
- --format, -f FORMAT: Output format (default: mp3_44100_128)
Examples:
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
Output: Plays audio through default speakers or saves to specified file format.
Available Formats:
- mp3_44100_128 (default): MP3, 44.1kHz, 128kbps
- mp3_44100_64: MP3, 44.1kHz, 64kbps
- mp3_22050_32: MP3, 22.05kHz, 32kbps
- pcm_44100: PCM WAV, 44.1kHz (requires Pro tier)
list-voices - Show Available Voices
List all available ElevenLabs voices with characteristics.
Usage:
elevenlabs-tts-tool list-voices
Examples:
# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender (-w matches whole words, so "male" does not also match "female")
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
Output:
Voice       Gender  Age          Accent    Description
====================================================================================================
rachel      female  young        American  Calm and friendly American voice...
adam        male    middle_aged  American  Deep, authoritative American male...
charlotte   female  middle_aged  British   Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
Popular Voices:
- rachel: Calm, friendly American female (default)
- adam: Deep, authoritative American male
- charlotte: Professional British female
- josh: Young, casual American male
- bella: Expressive Italian female
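Filtering the voice table with plain `grep` matches text anywhere on a line, including the description column. As an alternative, here is a small sketch that compares whole columns instead. It assumes the whitespace-separated layout shown in the sample output above (voice, gender, age, accent, description) and single-word accent values; `filter_voices` is a hypothetical helper, not a command of the tool.

```shell
# Hypothetical helper: filter list-voices output by exact column values.
# Usage: elevenlabs-tts-tool list-voices | filter_voices GENDER [AGE] [ACCENT]
filter_voices() {
    awk -v g="$1" -v a="${2:-}" -v c="${3:-}" '
        # Column 2 = gender, 3 = age, 4 = accent; empty filters match anything.
        $2 == g && (a == "" || $3 == a) && (c == "" || $4 == c)
    '
}
```

For example, `elevenlabs-tts-tool list-voices | filter_voices female middle_aged British` would keep only rows whose three columns match exactly; header, separator, and total lines are dropped because their second column is not a gender.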
list-models - Show TTS Models
List all available ElevenLabs TTS models with characteristics and use cases.
Usage:
elevenlabs-tts-tool list-models
Examples:
# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
Output: Comprehensive model information including:
- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)
Key Models:
- eleven_turbo_v2_5: Fast, high-quality (default, best value)
- eleven_flash_v2_5: Ultra-low latency (real-time applications)
- eleven_multilingual_v2: 29 languages, production quality
- eleven_v3: Most expressive with emotion tags (alpha, 2x cost)
Cost Multipliers:
- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)
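To make the multipliers concrete, the sketch below estimates how many characters a request would count against the quota. It is an illustration only: `billed_chars` is a hypothetical helper, and it assumes the 2x multiplier applies to any model ID starting with `eleven_v3` and 1x to everything else, as listed above.

```shell
# Hypothetical helper: characters charged against quota for a model.
# Usage: billed_chars MODEL_ID CHARACTER_COUNT
billed_chars() {
    case "$1" in
        eleven_v3*) echo $(( $2 * 2 )) ;;   # v3 models: 2x cost
        *)          echo "$2" ;;            # Turbo, Flash, Multilingual v2: 1x
    esac
}

billed_chars eleven_v3 1000           # prints 2000
billed_chars eleven_turbo_v2_5 1000   # prints 1000
```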
info - Show Subscription Info
Display subscription tier, character usage, quota limits, and historical usage.
Usage:
elevenlabs-tts-tool info [--days N]
Arguments:
- --days, -d N: Number of days of historical usage to display (default: 7)
Examples:
# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
Output Information:
- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits
Use Cases:
- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods
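One way to avoid hitting the quota unexpectedly is to compare the length of the text with the remaining characters reported by `info` before synthesizing. The sketch below takes the remaining count as an argument that you read off the `info` output yourself, since this guide does not specify a machine-readable format for it; `fits_quota` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the text on stdin has at most
# REMAINING characters. REMAINING is copied by hand from the info output.
# Usage: fits_quota REMAINING < document.txt
fits_quota() {
    remaining=$1
    needed=$(wc -m | tr -d ' ')   # character count; tr strips wc padding
    [ "$needed" -le "$remaining" ]
}
```

A guarded run could then look like `fits_quota 5000 < document.txt && elevenlabs-tts-tool synthesize --stdin --output audio.mp3 < document.txt`.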
update-voices - Update Voice Table
Fetch latest voices from ElevenLabs API and update local lookup table.
Usage:
elevenlabs-tts-tool update-voices [--output PATH]
Arguments:
- --output, -o PATH: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)
Examples:
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
Behavior:
- Fetches all premade voices from ElevenLabs API
- Saves to the user config directory by default (~/.config/elevenlabs-tts-tool/)
- Creates the config directory if it doesn't exist
- Updates take precedence over package default
- Persists across package reinstalls
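The precedence rule described above can be pictured as a simple file check. This is only an illustration of the documented behavior, not the tool's actual code; `voices_lookup_source` is a hypothetical name, and the path is the default given for `--output`.

```shell
# Illustration of the documented precedence (not the tool's own code):
# a lookup file in the user config directory wins over the packaged one.
voices_lookup_source() {
    user_file="$HOME/.config/elevenlabs-tts-tool/voices_lookup.json"
    if [ -f "$user_file" ]; then
        echo "$user_file"
    else
        echo "package default"
    fi
}
```

Running `voices_lookup_source` before and after `elevenlabs-tts-tool update-voices` is a quick way to see which table is expected to be in effect.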
pricing - Show Pricing Information
Display ElevenLabs pricing tiers and feature comparison.
Usage:
elevenlabs-tts-tool pricing
Examples:
# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
Output Information:
- Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- Minutes included per tier
- Additional minute costs
- Audio quality options
- Concurrency limits
- Priority levels
- API formats by tier
- Model cost multipliers
Key Insights:
- Free tier: 10,000-20,000 characters/month
- v3 models cost 2x (half the minutes/tokens)
- Use Flash v2.5 for high-volume integrations
- Reserve v3 for content requiring emotional expression
- PCM 44.1kHz requires Pro tier
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
elevenlabs-tts-tool completion [bash|zsh|fish]
Installation:
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
Features:
- Tab-complete commands and subcommands
- Tab-complete options and flags
- Context-aware completion for file paths
Emotion Control (v3 Models)
The ElevenLabs v3 model (eleven_v3) supports Audio Tags for emotional expression.
Available Emotion Tags:
- Basic emotions: [happy], [excited], [sad], [angry], [nervous], [curious]
- Delivery styles: [cheerfully], [playfully], [mischievously], [resigned tone], [flatly], [deadpan]
- Speech characteristics: [whispers], [laughs], [gasps], [sighs], [pauses], [hesitates], [stammers], [gulps]
Usage Examples:
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
Best Practices:
- Place tags at the beginning of phrases
- Align text content with emotional intent
- Test with different voices for best results
- Use sparingly - let AI infer emotion from context when possible
- Remember: v3 models cost 2x as much (half the minutes/tokens)
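Because v3 costs twice as much, a script may want to select it only when the text actually contains an audio tag. The following sketch does that with a simple pattern match; `pick_model` is a hypothetical helper, and the pattern (lowercase words inside square brackets) is an assumption based on the tag list above, so it may also match unrelated bracketed text.

```shell
# Hypothetical helper: choose eleven_v3 only when the text contains
# something that looks like an audio tag, e.g. [happy] or [resigned tone].
pick_model() {
    if printf '%s' "$1" | grep -Eq '\[[a-z][a-z ]*\]'; then
        echo eleven_v3
    else
        echo eleven_turbo_v2_5   # documented default model
    fi
}
```

It could be combined with the documented `--model` option as `elevenlabs-tts-tool synthesize "$text" --model "$(pick_model "$text")"`.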
Pause Control (SSML)
Add natural pauses using SSML <break> tags.
Syntax:
<break time="X.Xs" />
Examples:
# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."

# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3
Limitations:
- Maximum pause duration: 3 seconds
- Recommended: 2-4 breaks per generation
- Too many breaks can cause:
  - AI speedup
  - Audio artifacts
  - Background noise
  - Generation instability
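The recommended limit on breaks can be checked before sending text to the API. The sketch below counts `<break ... />` tags and warns when there are more than four; `count_breaks` and `check_breaks` are hypothetical helpers, and the threshold simply mirrors the 2-4 recommendation above. It relies on `grep -o`, which is widely available but not part of POSIX.

```shell
# Hypothetical helpers: count SSML <break> tags in a text and warn
# when the count exceeds the recommended maximum of 4.
count_breaks() {
    printf '%s\n' "$1" | grep -o '<break [^>]*>' | wc -l | tr -d ' '
}

check_breaks() {
    n=$(count_breaks "$1")
    if [ "$n" -gt 4 ]; then
        echo "warning: $n <break> tags; consider splitting the text" >&2
        return 1
    fi
}
```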
Alternative Methods:
- Dashes (- or —) for shorter pauses (less consistent)
- Ellipses (...) for hesitation (may add nervous tone)
- SSML <break> is most reliable
Verbosity Control
Multi-level verbosity for progressive detail control.
Verbosity Levels:
- No flag (default): WARNING level - only critical issues
- -v: INFO level - high-level operations, important events
- -vv: DEBUG level - detailed operations, API calls, validation steps
- -vvv: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals
Usage:
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
Dependent Library Logging: At trace level (-vvv), the following libraries enable DEBUG logging:
- elevenlabs - ElevenLabs SDK internals
- httpx/httpcore - HTTP request/response details
- urllib3 - Low-level HTTP operations
Pipeline Integration
The tool is designed for composition with other CLI tools.
Design Principles:
- JSON output to stdout, logs/errors to stderr
- Stdin support for text input
- Exit codes for success/failure detection
- Shell completion for productivity
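Exit codes make it straightforward to wrap any command with a spoken result. The sketch below runs a command, announces success or failure, and passes the original exit code through. `run_and_announce` and the `TTS_CMD` override are hypothetical; the override exists only so the wrapper can be dry-run with `echo` instead of calling the API.

```shell
# TTS_CMD is a hypothetical override; by default it is the documented
# synthesize command. Set TTS_CMD=echo for a dry run without API calls.
TTS_CMD=${TTS_CMD:-"elevenlabs-tts-tool synthesize"}

# Run the given command, announce the outcome, and preserve its exit code.
run_and_announce() {
    if "$@"; then
        $TTS_CMD "Command succeeded"
    else
        code=$?
        $TTS_CMD "Command failed with exit code $code"
        return "$code"
    fi
}
```

For example, `run_and_announce make build` would speak the result of the build and still return the build's exit code to the caller.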
Examples:
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
  elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

# Conditional execution (emotion tags require eleven_v3)
if make build; then
  elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
  elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi

# Process multiple texts
for text in "First" "Second" "Third"; do
  elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
Claude Code Integration
Use elevenlabs-tts-tool as a notification system for Claude Code hooks.
Use Cases:
- Task Completion Alerts
# After long-running task (emotion tags require eleven_v3)
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
- Error Notifications
# On build failure
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
- Custom Workflows
# Shell script integration
if make build; then
  elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
  elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi
- Multi-Tool Integration
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
  elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
Hook Configuration:
Create hooks in ~/.config/claude-code/hooks.json:
{
  "hooks": {
    "after_command": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
    },
    "on_error": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam --model eleven_v3"
    }
  }
}
Benefits:
- Audio alerts for completed tasks without monitoring terminal
- Error notifications while away from screen
- Multi-step automation with voice feedback
- Voice-enabled AI agent pipelines
Common Issues
Issue: "API key not found" error
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
Solution:
- Get an API key from https://elevenlabs.io/app/settings/api-keys
- Export it as an environment variable:
export ELEVENLABS_API_KEY='your-api-key'
- Add it to your shell profile for persistence:
echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Issue: "Voice not found" error
# Symptom
ValueError: Voice 'unknown' not found in lookup table
Solution:
- List available voices:
elevenlabs-tts-tool list-voices
- Update the voice table if needed:
elevenlabs-tts-tool update-voices
- Use the correct voice name (case-insensitive):
elevenlabs-tts-tool synthesize "Hello" --voice rachel
Issue: Character quota exceeded
# Symptom
Error: Character quota exceeded for this month
Solution:
- Check current usage:
elevenlabs-tts-tool info
- Wait until the quota reset date
- Consider upgrading the subscription tier:
elevenlabs-tts-tool pricing
- Use more efficient models (Flash/Turbo vs v3)
Issue: Audio quality issues
Symptom: Poor audio quality or artifacts
Solution:
- Try a different output format:
elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
- Use a higher-quality model:
elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
- For professional content, use PCM format (requires Pro tier):
elevenlabs-tts-tool synthesize "Text" --format pcm_44100
Issue: Emotional tags not working
Symptom: Emotion tags like [happy] are spoken literally
Solution:
- Ensure you are using the v3 model:
elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
- Place tags at the beginning of phrases
- Test with different voices (some work better than others)
Issue: Too many SSML breaks causing issues
Symptom: Audio artifacts, speedup, or noise with multiple <break> tags
Solution:
- Limit to 2-4 breaks per generation
- Use a maximum of 3 seconds per break
- Consider splitting into multiple generations:
elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
Getting Help
# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version
Additional Resources:
- GitHub Issues: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- ElevenLabs Docs: https://elevenlabs.io/docs
- API Reference: https://elevenlabs.io/docs/api-reference
Free Tier Limitations
ElevenLabs Free Tier (2024-2025):
- ✅ 10,000-20,000 characters per month
- ✅ All 42 premade voices
- ✅ Create up to 3 custom voices
- ✅ MP3 formats (all bitrates)
- ✅ Basic SSML support (<break>, phonemes)
- ✅ Emotional tags (v3 models)
- ✅ Full API access
- ❌ No commercial license (personal/experimentation only)
- ❌ PCM 44.1kHz format (requires Pro tier)
- ⚠️ Max 2,500 characters per single generation
Upgrade Tiers:
- Starter ($5/month): 30,000 characters, commercial license
- Creator ($22/month): 100,000 characters, PCM formats
- Pro ($99/month): 500,000 characters, PCM 44.1kHz, highest priority
- Scale ($330/month): 2,000,000 characters
- Business (custom): Custom limits and features
Rate Limits: Not publicly documented - expect reasonable use restrictions on free tier
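The 2,500-character limit per generation means longer documents need to be split before synthesis. The sketch below is one possible approach using `fold`: it joins the input into a single line and re-wraps it at spaces so that no chunk exceeds the limit. `chunk_text` and `synthesize_chunks` are hypothetical helpers, and `fold` counts columns rather than the characters ElevenLabs bills, so treat the result as approximate for non-ASCII text.

```shell
# Hypothetical helper: emit stdin as lines of at most MAX characters,
# breaking at spaces where possible (default MAX: 2500).
chunk_text() {
    max=${1:-2500}
    { tr '\n' ' '; echo; } | fold -s -w "$max"
}

# Hypothetical driver: synthesize each chunk to its own numbered file
# using the documented synthesize TEXT --output PATH form.
synthesize_chunks() {
    i=0
    chunk_text 2500 | while IFS= read -r chunk; do
        i=$((i + 1))
        elevenlabs-tts-tool synthesize "$chunk" --output "part${i}.mp3"
    done
}
```

Usage would be `synthesize_chunks < document.txt`, which produces part1.mp3, part2.mp3, and so on.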
Exit Codes
- 0: Success
- 1: General error (validation, API error, etc.)
Output Formats
Audio Formats:
- mp3_44100_128: MP3, 44.1kHz, 128kbps (default, best quality)
- mp3_44100_64: MP3, 44.1kHz, 64kbps (good quality, smaller)
- mp3_22050_32: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- pcm_44100: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)
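The bitrate in each MP3 format name gives a rough idea of file size: bytes ≈ kbps × 1000 / 8 × seconds. The sketch below applies that formula, assuming a constant bitrate and ignoring file headers; `estimate_mp3_bytes` is a hypothetical helper.

```shell
# Hypothetical helper: approximate MP3 size in bytes.
# Usage: estimate_mp3_bytes KBPS SECONDS
estimate_mp3_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

estimate_mp3_bytes 128 60   # prints 960000 (about 1 MB per minute)
estimate_mp3_bytes 32 60    # prints 240000
```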
Text Formats:
- Human-readable tables for list commands
- Structured output with clear sections
- Errors to stderr, audio/data to stdout
Best Practices
- Use Turbo v2.5 for High Volume: Default model offers best value (1x cost, fast, high quality)
- Reserve v3 for Emotional Content: Use v3 only when emotion tags needed (costs 2x)
- Monitor Quota Regularly: Check the info command before large generations
- Update Voices Periodically: Run update-voices monthly to get the latest voices
- Test Voices for Your Use Case: Different voices work better for different content types
- Use SSML Breaks Sparingly: Limit to 2-4 breaks per generation for stability
- Pipeline for Efficiency: Combine with other tools for automated workflows
- Set Verbosity Appropriately: Use -vv or -vvv for debugging, default for production
Resources
- GitHub Repository: https://github.com/dnvriend/elevenlabs-tts-tool
- ElevenLabs Documentation: https://elevenlabs.io/docs
- API Reference: https://elevenlabs.io/docs/api-reference
- Voice Library: https://elevenlabs.io/voice-library
- Python SDK: https://github.com/elevenlabs/elevenlabs-python
- Claude Code: https://docs.anthropic.com/claude-code