Learn-skills.dev ai-avatar-video
Install
Source · Clone the upstream repo
```bash
git clone https://github.com/NeverSight/learn-skills.dev
```
Claude Code · Install into ~/.claude/skills/
```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/data/skills-md/1nf-sh/skills/ai-avatar-video" ~/.claude/skills/neversight-learn-skills-dev-ai-avatar-video \
  && rm -rf "$T"
```
Manifest: data/skills-md/1nf-sh/skills/ai-avatar-video/SKILL.md
Source content
AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.
Quick Start
```bash
# Install the CLI and log in
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
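The `--input` flag takes a single JSON string, which gets awkward to quote by hand for longer payloads. A minimal sketch of one way to build the payload with jq before passing it to the CLI (the URLs are placeholders, and jq is an assumption about your environment rather than part of inference.sh):

```bash
# Build the input JSON with jq to avoid shell-quoting mistakes.
# The URLs below are placeholders; point them at your own hosted files.
INPUT=$(jq -n \
  --arg img "https://example.com/portrait.jpg" \
  --arg aud "https://example.com/speech.mp3" \
  '{image_url: $img, audio_url: $aud}')

infsh app run bytedance/omnihuman-1-5 --input "$INPUT"
```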
Available Models
| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, best quality |
| OmniHuman 1.0 | | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync |
Search Avatar Apps
```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```
Examples
OmniHuman 1.5 (Multi-Character)
```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Supports specifying which character to drive in multi-person images.
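For multi-person images you typically need to tell the app which face to animate. The exact input field is app-specific and not documented here; the sketch below uses a hypothetical `character_index` field purely for illustration, so check the app's actual input schema and substitute the real parameter name:

```bash
# Hypothetical: "character_index" is NOT a confirmed parameter name for this app.
# Consult the omnihuman-1-5 input schema and replace it with the real field.
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://group-photo.jpg",
  "audio_url": "https://speech.mp3",
  "character_index": 1
}'
```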
Fabric 1.0 (Image Talks)
```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```
PixVerse Lipsync
```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Generates highly realistic lipsync from any audio.
Full Workflow: TTS + Avatar
```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
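Wiring the two steps together means pulling the generated audio URL out of `speech.json`. The field name used below (`.audio_url`) is an assumption about the TTS output, not a documented contract; inspect the file once and adjust the jq path to whatever kokoro-tts actually returns:

```bash
# Assumption: the TTS result exposes the generated file under .audio_url.
# Run `cat speech.json` first to confirm the real field name before scripting this.
AUDIO_URL=$(jq -r '.audio_url' speech.json)

infsh app run bytedance/omnihuman-1-5 --input "$(jq -n \
  --arg img "https://presenter-photo.jpg" \
  --arg aud "$AUDIO_URL" \
  '{image_url: $img, audio_url: $aud}')"
```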
Full Workflow: Dub Video in Another Language
```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```
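Steps 1 and 3 can be chained the same way: read the transcript out of `transcript.json`, translate it however you like, then feed the translation to the TTS step. The `.text` field below is an assumption about the transcription output, so verify it against the actual file before automating:

```bash
# Assumption: the transcription result exposes the transcript under .text;
# check transcript.json and adjust the jq path if the field differs.
ORIGINAL_TEXT=$(jq -r '.text' transcript.json)

# Translate ORIGINAL_TEXT by hand or with an LLM of your choice, then:
TRANSLATED_TEXT="<your translated text>"

infsh app run infsh/kokoro-tts \
  --input "$(jq -n --arg t "$TRANSLATED_TEXT" '{text: $t}')" > new_speech.json
```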
Use Cases
- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements
Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
Related Skills
```bash
# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inferencesh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inferencesh/skills@ai-image-generation
```
Browse all video apps:
```bash
infsh app list --category video
```
Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates