Learn-skills.dev ai-avatar-video
Install
Source · Clone the upstream repo
```bash
git clone https://github.com/NeverSight/learn-skills.dev
```
Claude Code · Install into ~/.claude/skills/
```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/data/skills-md/1nf-sh/skills/ai-avatar-video" ~/.claude/skills/neversight-learn-skills-dev-ai-avatar-video \
  && rm -rf "$T"
```
Manifest: data/skills-md/1nf-sh/skills/ai-avatar-video/SKILL.md
Source content
AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.
Quick Start
```bash
# Install the CLI and log in
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
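The `--input` flag takes a single JSON string, which gets awkward to quote by hand for longer payloads. A minimal sketch of one way to build the payload with jq before passing it to the CLI (the URLs are placeholders, and jq is an assumption about your environment rather than part of inference.sh):

```bash
# Build the input JSON with jq to avoid shell-quoting mistakes.
# The URLs below are placeholders; point them at your own hosted files.
INPUT=$(jq -n \
  --arg img "https://example.com/portrait.jpg" \
  --arg aud "https://example.com/speech.mp3" \
  '{image_url: $img, audio_url: $aud}')

infsh app run bytedance/omnihuman-1-5 --input "$INPUT"
```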
Available Models
| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, best quality |
| OmniHuman 1.0 | | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync |
Search Avatar Apps
```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```
Examples
OmniHuman 1.5 (Multi-Character)
```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Supports specifying which character to drive in multi-person images.
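For multi-person images you typically need to tell the app which face to animate. The exact input field is app-specific and not documented here; the sketch below uses a hypothetical `character_index` field purely for illustration, so check the app's actual input schema and substitute the real parameter name:

```bash
# Hypothetical: "character_index" is NOT a confirmed parameter name for this app.
# Consult the omnihuman-1-5 input schema and replace it with the real field.
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://group-photo.jpg",
  "audio_url": "https://speech.mp3",
  "character_index": 1
}'
```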
Fabric 1.0 (Image Talks)
```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```
PixVerse Lipsync
```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Generates highly realistic lipsync from any audio.
Full Workflow: TTS + Avatar
```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
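Wiring the two steps together means pulling the generated audio URL out of `speech.json`. The field name used below (`.audio_url`) is an assumption about the TTS output, not a documented contract; inspect the file once and adjust the jq path to whatever kokoro-tts actually returns:

```bash
# Assumption: the TTS result exposes the generated file under .audio_url.
# Run `cat speech.json` first to confirm the real field name before scripting this.
AUDIO_URL=$(jq -r '.audio_url' speech.json)

infsh app run bytedance/omnihuman-1-5 --input "$(jq -n \
  --arg img "https://presenter-photo.jpg" \
  --arg aud "$AUDIO_URL" \
  '{image_url: $img, audio_url: $aud}')"
```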
Full Workflow: Dub Video in Another Language
```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```
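Steps 1 and 3 can be chained the same way: read the transcript out of `transcript.json`, translate it however you like, then feed the translation to the TTS step. The `.text` field below is an assumption about the transcription output, so verify it against the actual file before automating:

```bash
# Assumption: the transcription result exposes the transcript under .text;
# check transcript.json and adjust the jq path if the field differs.
ORIGINAL_TEXT=$(jq -r '.text' transcript.json)

# Translate ORIGINAL_TEXT by hand or with an LLM of your choice, then:
TRANSLATED_TEXT="<your translated text>"

infsh app run infsh/kokoro-tts \
  --input "$(jq -n --arg t "$TRANSLATED_TEXT" '{text: $t}')" > new_speech.json
```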
Use Cases
- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements
Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
Related Skills
```bash
# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inferencesh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inferencesh/skills@ai-image-generation
```
Browse all video apps:
```bash
infsh app list --category video
```
Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates