Learn-skills.dev ai-voice-cloning
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai
install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/1nfsh/s/ai-voice-cloning" ~/.claude/skills/neversight-learn-skills-dev-ai-voice-cloning-f0af16 && rm -rf "$T"
manifest: data/skills-md/1nfsh/s/ai-voice-cloning/SKILL.md
AI Voice Generation
Generate natural AI voices via inference.sh CLI.

Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{ "text": "Hello! This is an AI-generated voice that sounds natural and engaging.", "voice": "af_sarah" }'
Available Models
| Model | App ID | Best For |
|---|---|---|
| Kokoro TTS | infsh/kokoro-tts | Natural, multiple voices |
| DIA | infsh/dia-tts | Conversational, expressive |
| Chatterbox | | Casual, entertainment |
| Higgs | | Professional narration |
| VibeVoice | | Emotional range |
Kokoro Voice Library
American English
| Voice ID | Gender | Style |
|---|---|---|
| | Female | Warm, friendly |
| | Female | Professional |
| | Female | Youthful |
| | Male | Authoritative |
| | Male | Conversational |
| | Male | Clear, neutral |
British English
| Voice ID | Gender | Style |
|---|---|---|
| | Female | Refined |
| | Female | Warm |
| | Male | Classic |
| | Male | Modern |
Voice Generation Examples
Professional Narration
infsh app run infsh/kokoro-tts --input '{ "text": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.", "voice": "am_michael", "speed": 1.0 }'
Conversational Style
infsh app run infsh/dia-tts --input '{ "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?", "voice": "conversational" }'
Audiobook Narration
infsh app run infsh/kokoro-tts --input '{ "text": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.", "voice": "bf_emma", "speed": 0.9 }'
Video Voiceover
infsh app run infsh/kokoro-tts --input '{ "text": "Introducing the next generation of productivity. Work smarter, not harder.", "voice": "af_nicole", "speed": 1.1 }'
Podcast Host
infsh app run infsh/kokoro-tts --input '{ "text": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.", "voice": "am_adam" }'
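An apostrophe inside the text would end the single-quoted shell string used in these examples. One possible workaround, sketched below, is to build the JSON in a helper and pass it to `--input` in double quotes. `json_escape` and `tts_payload` are hypothetical helper names, not part of the infsh CLI, and the escaping covers only backslashes and double quotes, so text with newlines or other control characters would need a proper JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the infsh CLI).

# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Assumption: the text has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a payload with the "text" and "voice" fields used in the examples above.
tts_payload() {
  printf '{"text": "%s", "voice": "%s"}\n' "$(json_escape "$1")" "$2"
}
```

With these defined, a call might look like `infsh app run infsh/kokoro-tts --input "$(tts_payload "I'm your host." am_adam)"`.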
Multi-Voice Conversation
# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{ "text": "Have you seen the latest AI developments? It is incredible how fast things are moving.", "voice": "am_michael" }' > speaker1.json

# Speaker 2
infsh app run infsh/kokoro-tts --input '{ "text": "I know, right? Just last week I tried that new image generator and was blown away.", "voice": "af_sarah" }' > speaker2.json

# Merge conversation
infsh app run infsh/media-merger --input '{ "audio_files": ["<speaker1-url>", "<speaker2-url>"], "crossfade_ms": 300 }'
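The merge step needs real URLs in place of `<speaker1-url>` and `<speaker2-url>`. The structure of the JSON that `infsh app run` writes is not shown here, so the sketch below assumes only that each saved file contains the audio URL as a double-quoted `https://` string and takes the first match. `first_url` and `merge_payload` are hypothetical helpers; check them against real output before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: print the first https:// URL found in a saved output file.
# Assumption: the audio URL appears in the file as a double-quoted string.
first_url() {
  grep -o 'https://[^"]*' "$1" | head -n 1
}

# Build a media-merger payload from two URLs (fields as in the example above).
# The third argument is the crossfade in milliseconds and defaults to 300.
merge_payload() {
  printf '{"audio_files": ["%s", "%s"], "crossfade_ms": %s}\n' "$1" "$2" "${3:-300}"
}
```

The merge could then be run as `infsh app run infsh/media-merger --input "$(merge_payload "$(first_url speaker1.json)" "$(first_url speaker2.json)")"`.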
Long-Form Content
Chunked Processing
For content over 5000 characters, split into chunks:
# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
infsh app run infsh/kokoro-tts --input '{ "text": "<chunk-1>", "voice": "bf_emma" }' > chunk1.json

# Chunk 2
infsh app run infsh/kokoro-tts --input '{ "text": "<chunk-2>", "voice": "bf_emma" }' > chunk2.json

# Merge chunks
infsh app run infsh/media-merger --input '{ "audio_files": ["<chunk1-url>", "<chunk2-url>"], "crossfade_ms": 100 }'
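The splitting step itself is left open above. A minimal sketch of one way to do it uses the standard `fold` utility to break the text at spaces so that no chunk exceeds the limit. `chunk_text` is a hypothetical helper name. It assumes plain text, starts a new chunk at any existing newline, and breaks at word boundaries rather than sentence boundaries, so chunk edges may fall mid-sentence.

```shell
#!/bin/sh
# Hypothetical helper: split text into lines of at most $2 characters
# (default 5000, the limit mentioned above), breaking at spaces where possible.
# Each output line is one chunk.
# Assumption: fold may count bytes rather than characters, so multi-byte text
# can produce shorter chunks than expected.
chunk_text() {
  printf '%s\n' "$1" | fold -s -w "${2:-5000}"
}
```

Each output line can then be sent as the `text` field of one `infsh/kokoro-tts` call, for example by reading the lines with `while IFS= read -r chunk`, and the resulting files merged as in the example above.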
Voice + Video Workflow
Add Voiceover to Video
# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{ "text": "This stunning footage shows the beauty of nature in its purest form.", "voice": "am_michael" }' > voiceover.json

# 2. Merge with video
infsh app run infsh/media-merger --input '{ "video_url": "https://your-video.mp4", "audio_url": "<voiceover-url>" }'
Create Talking Head
# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{ "text": "Hi, I am excited to share some updates with you today.", "voice": "af_sarah" }' > speech.json

# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "<speech-url>" }'
Speed and Pacing
| Speed | Effect | Use For |
|---|---|---|
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energy |
| 1.2 | Fast | Quick announcements |
# Slow narration
infsh app run infsh/kokoro-tts --input '{ "text": "Take a deep breath. Let yourself relax.", "voice": "bf_emma", "speed": 0.8 }'
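For scripted use, the speed table can be encoded as a small lookup. This is only a sketch: `speed_for` is a hypothetical helper, and the use-case keywords are shorthand chosen here for the rows of the table.

```shell
#!/bin/sh
# Hypothetical helper: map a use case from the speed table to a speed value.
# Unknown use cases fall back to 1.0 (general purpose).
speed_for() {
  case "$1" in
    audiobook|meditation) echo 0.8 ;;
    education|tutorial)   echo 0.9 ;;
    commercial)           echo 1.1 ;;
    announcement)         echo 1.2 ;;
    *)                    echo 1.0 ;;
  esac
}
```

The result can be dropped into the payload as the `speed` field, for example `"speed": $(speed_for tutorial)` inside a double-quoted JSON string.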
Punctuation for Pacing
Use punctuation to control speech rhythm:
| Punctuation | Effect |
|---|---|
| Period (.) | Full pause |
| Comma (,) | Brief pause |
| Ellipsis (...) | Extended pause |
| Exclamation mark (!) | Emphasis |
| Question mark (?) | Question intonation |
| | Quick break |
infsh app run infsh/kokoro-tts --input '{ "text": "Wait... Did you hear that? Something is coming. Something big!", "voice": "am_adam" }'
Best Practices
- Match voice to content - Professional voice for business, casual for social
- Use punctuation - Control pacing with periods and commas
- Keep sentences short - Easier to generate and more natural-sounding
- Test different voices - Same text sounds different across voices
- Adjust speed - Slightly slower often sounds more natural
- Break long content - Process in chunks for consistency
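To compare voices on the same text, as suggested above, one option is to generate one payload per voice and run them in a loop. The sketch below only builds the payloads. `voice_payloads` is a hypothetical helper, and it assumes the text contains no double quotes or backslashes because it does no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print one kokoro-tts payload per voice for the same text.
# Usage: voice_payloads "text" voice1 [voice2 ...]
# Assumption: the text has no double quotes or backslashes (no escaping is done).
voice_payloads() {
  text=$1
  shift
  for voice in "$@"; do
    printf '{"text": "%s", "voice": "%s"}\n' "$text" "$voice"
  done
}
```

Each line can be passed to the CLI in turn, for example `voice_payloads "Same text." af_sarah am_michael | while IFS= read -r p; do infsh app run infsh/kokoro-tts --input "$p"; done`.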
Use Cases
- Voiceovers - Video narration, commercials
- Audiobooks - Full book narration
- Podcasts - AI hosts and guests
- E-learning - Course narration
- Accessibility - Screen reader content
- IVR - Phone system messages
- Content localization - Translate and voice
Related Skills
# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@inference-sh
Browse audio apps:
infsh app list --category audio