Ai-video-generator-claude seedance-ai-avatar

Generate AI avatar and digital persona video prompts for Seedance 2.0 on Higgsfield. Use for virtual spokesperson content, digital twin videos, AI presenter clips, avatar-based marketing, virtual influencer content, or any video featuring a digital/AI-generated character as the main subject. Triggers on avatar, digital persona, virtual presenter, AI character, digital twin, virtual influencer, synthetic media, animated spokesperson.

install

source · Clone the upstream repo

git clone https://github.com/rediumvex/ai-video-generator-claude

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/rediumvex/ai-video-generator-claude "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/09-ai-avatar" ~/.claude/skills/rediumvex-ai-video-generator-claude-seedance-ai-avatar && rm -rf "$T"

manifest: skills/09-ai-avatar/SKILL.md

source content

AI Avatar — Digital Persona Video Prompts

A complete system for crafting Seedance 2.0 prompts on Higgsfield that feature AI avatars, digital personas, virtual presenters, and synthetic characters as the primary subject. This skill covers everything from photorealistic digital humans to stylized 3D characters, including hook patterns, environment design, camera work, lighting, and audio direction.

1. Input Specs

Before generating prompts, gather the following from the user:

Character definition

Avatar style: photorealistic human / stylized 3D / 2D animated / abstract-geometric
Gender presentation, approximate age, and visual identity descriptors (if applicable)
Skin tone, hair, wardrobe notes, or reference aesthetics
Personality register: corporate, playful, authoritative, futuristic, warm

Content purpose

Use case: product demo, brand spokesperson, educational content, virtual influencer post, sales video, onboarding clip
Platform destination: Instagram Reel, YouTube, LinkedIn, TikTok, website hero
Desired emotional response: trust, excitement, curiosity, aspiration

Technical requirements

Clip duration: 2–4 sec / 6–10 sec / 15–30 sec
Aspect ratio: 9:16 vertical / 16:9 landscape / 1:1 square
Motion intensity: static with subtle animation / active gestures / high-energy movement

Brand constraints

Color palette or forbidden colors
Brand keywords or phrases the avatar should embody visually
Any style references (films, games, other AI avatars)
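
The inputs above could be collected into a small structure before any prompt is written. The following is a minimal Python sketch, not part of the skill itself: the class name `AvatarBrief`, its field names, the `problems` helper, and the choice of which fields count as required are all assumptions made for illustration. It does not call any Seedance or Higgsfield API.

```python
from dataclasses import dataclass, field

# Option lists copied from the Input Specs above.
AVATAR_STYLES = ("photorealistic human", "stylized 3D", "2D animated", "abstract-geometric")
ASPECT_RATIOS = ("9:16", "16:9", "1:1")
MOTION_LEVELS = ("static with subtle animation", "active gestures", "high-energy movement")


@dataclass
class AvatarBrief:
    """Hypothetical container for the answers gathered from the user."""
    avatar_style: str = ""
    personality: str = ""
    use_case: str = ""
    platform: str = ""
    emotional_response: str = ""
    duration_sec: float = 0.0
    aspect_ratio: str = ""
    motion_intensity: str = ""
    brand_notes: list = field(default_factory=list)  # palette, keywords, references

    def problems(self):
        """Return a list of issues; an empty list means the brief is ready to use."""
        issues = []
        # Assumption: these four fields are treated as the minimum needed to start.
        for name in ("avatar_style", "use_case", "platform", "aspect_ratio"):
            if not getattr(self, name):
                issues.append(f"missing: {name}")
        if self.avatar_style and self.avatar_style not in AVATAR_STYLES:
            issues.append(f"unknown avatar style: {self.avatar_style}")
        if self.aspect_ratio and self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unknown aspect ratio: {self.aspect_ratio}")
        if self.motion_intensity and self.motion_intensity not in MOTION_LEVELS:
            issues.append(f"unknown motion intensity: {self.motion_intensity}")
        if self.duration_sec <= 0:
            issues.append("missing: duration_sec")
        return issues


brief = AvatarBrief(
    avatar_style="stylized 3D",
    use_case="product demo",
    platform="TikTok",
    aspect_ratio="9:16",
    duration_sec=8,
)
print(brief.problems())  # an empty list for this complete brief
```

Returning a list of issues rather than raising on the first one lets all missing inputs be requested from the user in a single follow-up.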

2. Philosophy — Why AI Avatars Are the Future of Scalable Content

The uncanny valley problem — and how to avoid it

The uncanny valley is the psychological discomfort triggered when a synthetic human looks almost real but falls short in ways the brain cannot consciously articulate. Eyes that do not quite track. Skin that lacks micro-texture. Movement that is smooth but wrong. The brain pattern-matches to illness, deception, or death.

There are two reliable exits from the uncanny valley:

Exit 1: Go fully photorealistic. Push render quality, skin subsurface scattering, eye moisture, micro-expression, and hair fidelity past the threshold where the brain accepts the character as real. This requires explicit prompt language: "hyper-photorealistic skin texture," "subsurface light scattering on face," "wet corneal reflection," "micro-expressions," "natural breath movement."

Exit 2: Commit to stylization. A clearly stylized character — 3D cartoon, cel-shaded, geometric — never triggers uncanny valley because the brain never expected biological realism. Own the artifice. Make it beautiful and internally consistent.

The danger zone is the middle ground: a character that is clearly rendered but aspires to realism without achieving it. Avoid half-measures. Either push all the way to photorealism or lean fully into a defined non-realistic aesthetic.

Additional uncanny valley avoidance rules:

Never describe eye movement without describing blink timing
Always specify natural weight and momentum in gestures
Avoid "perfect symmetry" — real faces are asymmetric
Include micro-imperfections: a strand of hair out of place, slight asymmetry in the smile
Ground the avatar with realistic environmental interaction: light falloff, shadow contact, hair affected by subtle air movement

3. Avatar Style Spectrum

Photorealistic Human

The most persuasive style for commercial and B2B content. Optimized for spokesperson videos, testimonials, and brand ambassadors.

Prompt markers: "photorealistic digital human," "hyper-realistic skin with visible pores," "subsurface scattering skin shader," "natural asymmetry in facial features," "wet corneal highlight," "professional studio grooming," "fine hair strands rendered individually"

Best for: SaaS product demos, financial services, healthcare, e-commerce spokesperson

Risks: Highest uncanny valley exposure — demand maximum fidelity or shift styles

Stylized 3D Character

Distinct personality with artistic intentionality. Strong brand identity potential. Avoids uncanny valley entirely.

Prompt markers: "stylized 3D character with smooth simplified geometry," "large expressive eyes," "exaggerated proportions with cartoon appeal," "cel-shaded render," "clean edge lighting," "Pixar-adjacent character design," "bold color palette skin tones"

Best for: Consumer brands, gaming, entertainment, apps, youth-oriented products

Risks: May feel low-trust for serious B2B content — calibrate style to audience

2D Animated

Flat or semi-flat illustrated character with motion. Highly distinctive, excellent for social media and explainer content.

Prompt markers: "2D animated character in motion graphics style," "flat illustration aesthetic," "bold outline strokes," "limited animation with expressive poses," "vector art quality," "anime-influenced design," "looping idle animation"

Best for: EdTech, productivity tools, startup explainers, social media content

Risks: Hard to convey facial nuance — lean on body language and motion design

Abstract / Geometric

The avatar is not a humanoid but a geometric entity — particle systems, fluid simulations, crystalline structures — that communicates through movement and light. Most experimental.

Prompt markers: "abstract digital entity formed from particles of light," "geometric crystalline structure with inner luminescence," "fluid simulation avatar morphing between states," "data visualization made sentient," "void-space entity with no fixed form"

Best for: Tech companies, AI products, concept content, high-concept brand films

Risks: Low relatability — pair with clear voice-over or text overlay
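
The "Best for" and "Risks" notes for the four styles can be treated as a lookup table. The sketch below is one possible encoding in Python; the names `STYLE_GUIDE` and `suggest_styles` are hypothetical, and the keyword matching is a naive substring search, so short keywords can match unintended entries.

```python
# "Best for" and "Risks" entries copied from the four style descriptions above.
STYLE_GUIDE = {
    "photorealistic human": {
        "best_for": ["SaaS product demos", "financial services", "healthcare", "e-commerce spokesperson"],
        "risk": "Highest uncanny valley exposure",
    },
    "stylized 3D": {
        "best_for": ["consumer brands", "gaming", "entertainment", "apps", "youth-oriented products"],
        "risk": "May feel low-trust for serious B2B content",
    },
    "2D animated": {
        "best_for": ["EdTech", "productivity tools", "startup explainers", "social media content"],
        "risk": "Hard to convey facial nuance",
    },
    "abstract-geometric": {
        "best_for": ["tech companies", "AI products", "concept content", "high-concept brand films"],
        "risk": "Low relatability",
    },
}


def suggest_styles(keyword):
    """Return (style, risk) pairs whose "best_for" list mentions the keyword."""
    needle = keyword.lower()
    return [
        (style, info["risk"])
        for style, info in STYLE_GUIDE.items()
        if any(needle in use.lower() for use in info["best_for"])
    ]


print(suggest_styles("gaming"))
print(suggest_styles("healthcare"))
```

Returning the risk alongside each suggestion keeps the trade-off visible at the moment a style is chosen.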

4. 2-Second Hook Patterns

The first two seconds largely determine whether a viewer keeps watching. For avatar content, there are four reliable hook architectures.

The Glitch-to-Perfect

Opens with visual corruption — static, scan lines, fragmented pixels, digital artifacts — then resolves into a clean, confident avatar. Communicates: synthetic origin, premium output.

Prompt language: "opens with heavy RGB chromatic aberration and scan line distortion, digital glitch artifacts fragmenting the face, resolves at 1.5 seconds into pristine clarity, sharp focus snapping to perfect render quality"

Use when: Launching an AI product, announcing new technology, establishing digital-native brand identity

The Digital Birth

The avatar materializes from nothing — particles assembling, wireframe filling, data streams converging into form. Communicates: creation, emergence, future.

Prompt language: "avatar assembles from scattered luminous particles converging to a central point, wireframe mesh fills with texture and color, form resolves from abstract to fully rendered in under two seconds, volumetric glow during formation"

Use when: Product launches, brand debut videos, origin story content, intro sequences

The Style Morph

The avatar rapidly shifts visual style — sketch to 3D to photorealistic, or multiple artistic styles in sequence — before landing on a final defined look. Communicates: versatility, creativity, range.

Prompt language: "avatar transitions through four visual styles in rapid succession: pencil sketch, flat 2D illustration, stylized 3D, photorealistic render, each held for 0.4 seconds, final state held as primary look, transitions use dissolve with chromatic fringing"

Use when: Creative agencies, design platforms, AI tools, content creation products
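
The Style Morph timing can be checked with simple arithmetic: four styles held for 0.4 seconds each finish at 1.6 seconds, inside the two-second hook window. The Python sketch below works that out. The function names and the use of integer milliseconds are illustrative choices, and it assumes the dissolve transitions take no extra time, which the prompt language above does not specify.

```python
HOOK_WINDOW_MS = 2000  # the two-second hook window this section is built around


def morph_schedule(styles, hold_ms):
    """Return (style, start_ms, end_ms) tuples for back-to-back style holds."""
    schedule = []
    start = 0
    for style in styles:
        schedule.append((style, start, start + hold_ms))
        start += hold_ms
    return schedule


def fits_hook_window(schedule, window_ms=HOOK_WINDOW_MS):
    """True if the last style finishes its hold inside the hook window."""
    return not schedule or schedule[-1][2] <= window_ms


styles = ["pencil sketch", "flat 2D illustration", "stylized 3D", "photorealistic render"]
schedule = morph_schedule(styles, hold_ms=400)
for style, start, end in schedule:
    print(f"{start:>4} to {end:>4} ms: {style}")
print("fits hook window:", fits_hook_window(schedule))
```

Raising the hold to 600 ms per style would push the last hold to 2400 ms, past the window, which is a sign to drop a style or shorten the holds.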

The Fourth Wall Break

Avatar makes direct, deliberate eye contact with the camera, creating the immediate impression of personal address. Can begin mid-gesture or mid-sentence, in medias res. Communicates: directness, confidence, presence.

Prompt language: "avatar turns to face camera directly with deliberate intention, locks eye contact with subtle confident smile, slight lean forward, no introduction or preamble, as if resuming a conversation"

Use when: Sales content, coaching products, high-conversion landing page videos, personal brand content

5. Environment Design

The environment shapes perception of the avatar's authority, energy, and brand world.

Void / Abstract Space

The avatar exists in pure black, pure white, or a gradient void with subtle environmental effects (fog, particles, light beams). Maximum focus on the character. Minimum distraction.

Prompt language: "infinite dark void background, subtle volumetric fog at floor level, no horizon line, directional key light from upper left, ambient fill light from right, deep shadows behind avatar"

Best for: Premium product launches, minimalist brands, close-up spokesperson content

Virtual Studio

A clearly designed, branded environment suggesting a broadcast or content studio — but digital. Can include branded color washes, abstract architecture, floating graphic elements.

Prompt language: "virtual broadcast studio environment with clean geometric architecture, brand color accent lighting on background panels, subtle depth-of-field blur on background, professional lighting rig casting clean key light, polished floor with soft reflection"

Best for: B2B, SaaS, media companies, professional services

Futuristic Office / Corporate Space

A recognizable workspace — desk, window, shelving — but elevated with futuristic design language. Grounds the avatar in a professional context while signaling technological advancement.

Prompt language: "near-future office environment with clean architectural lines, large floor-to-ceiling window with city view, ambient blue-white environmental light, holographic display elements visible in background, depth-of-field blur keeping avatar in sharp focus"

Best for: Enterprise, finance, leadership content, thought leadership

Neon Cyber Environment

High-contrast urban digital environments — neon-lit corridors, data centers, glowing grid floors, cyberpunk architecture. High energy, high visual impact.

Prompt language: "neon-lit cyberpunk environment with deep shadows and saturated color pools of magenta and cyan, glowing geometric floor grid extending to horizon, electric particle effects drifting through air, haze atmosphere with volumetric light shafts"

Best for: Gaming, crypto/web3, youth brands, entertainment, tech startups

6. Camera Techniques for Avatar Content

Camera work communicates status, intimacy, and energy for digital characters as much as for human subjects.

Framing fundamentals

Close-up (face-fill frame): maximum trust and intimacy — best for direct address and emotional moments
Medium (chest-up): most versatile — shows gesture, expression, and upper body simultaneously
Wide (full body): establishes character scale and presence — best for action or environment reveals
Rule of thirds: place avatar eyes on upper third horizontal line — avoid dead center unless intentional

Camera movement vocabulary

Slow push-in: builds anticipation and intimacy. "Camera slowly pushes in from medium to close-up over 4 seconds, no shake, smooth dolly motion"

Ken Burns on static frame: creates life from a still render. "Subtle slow zoom in with slight rightward drift, smooth and steady, evoking documentary intimacy"

Orbit / arc move: reveals the avatar's three-dimensionality. "Camera arcs 30 degrees around avatar from left to right at shoulder height, smooth constant speed, avatar rotates slightly to maintain eye contact"

Reveal push: avatar is revealed as camera moves. "Camera starts on environment detail, pushes forward to reveal avatar in foreground, avatar turns to address camera at full reveal"

Match cut / style cut: avatar holds a pose that cuts perfectly to a different style or angle. Requires specifying the exact pose to hold at cut point.

Depth of field guidance

Photorealistic avatars: always specify shallow depth of field — it reads as cinematic and elevates perceived quality
Stylized avatars: can run sharper focus — their visual world tends toward graphic clarity
Prompt language: "f/1.8 equivalent shallow depth of field, background in soft bokeh blur, avatar in sharp focus from eyes to chin"

Stabilization language

For authority and trust: "locked-off tripod shot, zero camera shake"
For energy and immediacy: "subtle handheld movement, barely perceptible breathing motion on camera"
For cinematic premium: "smooth gimbal-stabilized movement, fluid and controlled"

7. Lighting for Avatar Content

Holographic Glow

The avatar emits or is surrounded by luminous energy — neon, bioluminescent, data-light. Strongly signals AI/digital nature.

Prompt language: "avatar lit from within by soft blue-white holographic glow, light emanating from chest area and face, ambient environmental light fills from all sides evenly, subtle rim light of electric cyan outlines the silhouette, no hard shadows"

Studio Clean

Professional three-point lighting setup. Key light, fill light, rim light. The standard for commercial spokesperson content.

Prompt language: "professional three-point lighting: strong key light from upper left at 45 degrees, soft fill light from right reducing shadow contrast, tight rim light from behind right creating clean separation from background, color temperature 5600K daylight-balanced"

Neon Rim

Hard, colored rim light from behind creates dramatic silhouette and strong graphic identity. Background colors bleed onto subject.

Prompt language: "neon rim lighting with magenta light from behind left and cyan from behind right, strong color separation, dark front face with colored halo around silhouette, moody underexposure on face revealing only key features"

Volumetric Atmosphere

Light behaves as physical substance — fog, haze, dust particles catching beams. Creates depth, atmosphere, and cinematic quality.

Prompt language: "volumetric light shafts cutting through atmospheric haze at 45-degree angle, dust and particle matter visible in light beams, soft falloff from bright to dark, environmental light wrapping avatar with soft fill from bounced volumetric scatter"

Lighting rules for photorealistic avatars specifically:

Always include a catch light in the eyes — a small specular highlight
Specify skin tone interaction with colored light: "warm amber light warming skin to golden tones"
Avoid flat frontal lighting — it eliminates the three-dimensionality that makes a render feel real
Include shadow direction consistency between avatar and environment

8. Sound Design for Avatar Content

Sound design for AI avatar videos operates in three layers.

Synthetic Voice Integration

The avatar's voice is its most direct communication channel. Even in a video prompt (which describes the visual), the voice direction informs character and performance.

Voice direction markers to include in prompts or production notes:

"Voice has measured, authoritative cadence with subtle warmth"
"Speech pace is deliberate — one beat pause before key statements"
"Slight synthetic processing on voice: clean reverb tail, light high-frequency shimmer"
"Voice texture is clear and close-mic'd, no room ambience"
"AI-native voice quality: intelligible, precise articulation, slight harmonic brightness"

Digital Ambient

Environmental sound that signals the digital world the avatar inhabits.

Textures to specify:

"Sub-bass hum of active server infrastructure, very low level"
"High-frequency tonal shimmer suggesting energy fields"
"Subtle digital breathing sound: periodic soft clicks and chirps"
"Near-silence with occasional data-ping transient"
"Low-level static white noise suggesting signal presence"

Futuristic Music

Background music that contextualizes the avatar's world and emotional register.

Music direction markers:

"Minimal electronic score with synthetic textures, no live instruments"
"Tempo-matched to avatar's speaking rhythm, approximately 90 BPM"
"Cinematic trailer energy: percussive hits synchronized to visual cuts"
"Ambient generative music: evolving pads, no melodic line, pure atmosphere"
"Clean tech-house pulse at low level, provides forward momentum without distraction"

Sound design principle for AI avatars: sound should feel designed, not organic. Real-world ambience (birds, traffic, HVAC) undermines the digital world effect. Every sonic element should feel intentional and synthetic.

Platform Optimization

TikTok (9:16): High energy, fast cuts acceptable. Stylized 3D or 2D avatars perform best. Keep under 15s.
Instagram Reels (9:16): Higher visual polish expected. Photorealistic avatars work well here. Cover frame must show avatar clearly.
YouTube (16:9): Longer format, more room for establishing shots. Enterprise/educational use cases.
LinkedIn (16:9 or 1:1): Professional, corporate-appropriate. Photorealistic only. Subtle animation, minimal visual effects.
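
The platform notes above can be expressed as a table and checked against a planned clip. The following Python sketch is one way to do that. `PLATFORM_GUIDE` and `platform_warnings` are hypothetical names, and the split between a hard restriction ("only") and a recommendation ("preferred") is an interpretation of the wording above, not something the source defines.

```python
# Guidance copied from the Platform Optimization list above.
# "only" is read as a hard restriction; "preferred" as a recommendation.
PLATFORM_GUIDE = {
    "tiktok": {"ratios": {"9:16"}, "preferred": {"stylized 3D", "2D animated"},
               "only": set(), "under_sec": 15},
    "instagram reels": {"ratios": {"9:16"}, "preferred": {"photorealistic human"},
                        "only": set(), "under_sec": None},
    "youtube": {"ratios": {"16:9"}, "preferred": set(),
                "only": set(), "under_sec": None},
    "linkedin": {"ratios": {"16:9", "1:1"}, "preferred": set(),
                 "only": {"photorealistic human"}, "under_sec": None},
}


def platform_warnings(platform, style, ratio, duration_sec):
    """Return warnings for a planned clip; an empty list means no conflict found."""
    guide = PLATFORM_GUIDE[platform.lower()]  # raises KeyError for unlisted platforms
    warnings = []
    if ratio not in guide["ratios"]:
        warnings.append(f"aspect ratio {ratio} is not listed for {platform}")
    if guide["only"] and style not in guide["only"]:
        warnings.append(f"{platform} calls for: {', '.join(sorted(guide['only']))}")
    elif guide["preferred"] and style not in guide["preferred"]:
        warnings.append(f"{platform} tends to favor: {', '.join(sorted(guide['preferred']))}")
    if guide["under_sec"] is not None and duration_sec >= guide["under_sec"]:
        warnings.append(f"keep {platform} clips under {guide['under_sec']}s")
    return warnings


print(platform_warnings("TikTok", "stylized 3D", "9:16", 10))
print(platform_warnings("LinkedIn", "stylized 3D", "1:1", 20))
```

Destinations mentioned elsewhere in this guide but not in the list above, such as a website hero, are deliberately left out of the table rather than guessed at.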

9. Complete Example Prompts

Example 1 — SaaS Product Spokesperson (Photorealistic)

Photorealistic AI avatar of a professional woman in her early 30s, South Asian appearance,
shoulder-length dark hair, wearing a sharp minimal blazer in charcoal, positioned center-frame
in medium shot from chest up.

Avatar is placed in a virtual studio environment: clean architectural background with frosted
glass panels catching diffuse blue-white ambient light, polished light floor with subtle
reflection of subject, background held in soft bokeh blur.

Lighting: professional three-point setup — strong key light from upper left at 45 degrees,
warm fill from the right reducing shadow contrast, thin rim light from behind creating clean
silhouette separation from background. Color temperature 5600K.

Camera: static tripod shot at eye level, no movement, locked and authoritative. Shallow
depth of field with background in smooth bokeh.

Avatar's expression is open and confident — slight genuine smile, direct eye contact with
camera, a catch light visible in both eyes. Subtle asymmetry in the smile. A single strand
of hair rests slightly displaced to the right, adding naturalism.

At frame open she has just finished a thought and takes a brief natural breath before
beginning to speak. Head has a slight, natural tilt of attentiveness. No glitch effect —
clean professional entry.

Skin rendered with visible pore texture, subsurface light scattering at cheek and brow
areas, natural lip moisture highlight, eye corneal wetness with point source reflection.

Micromotion: almost imperceptible chest movement from breathing, barely visible blink at
second 2, slight weight shift that creates minimal shoulder movement. Everything reads as
alive, not frozen.

Color grade: clean and cool, slight desaturation, professional broadcast color treatment.

Example 2 — Virtual Influencer Hook (Stylized 3D, Digital Birth Hook)

Stylized 3D avatar character: gender-neutral presentation, large luminous eyes with
teal irises, smooth simplified skin geometry with clean subsurface warmth, short silver
hair rendered in individual strands, wearing a graphic oversized jacket in deep purple
with gold accent trim.

Opens with The Digital Birth hook: scattered luminous gold and teal particles
in void space, drifting inward as if being called together, density increasing,
wireframe mesh of the character's face emerging at center, particles filling in
surface texture, color and detail resolving over 1.8 seconds into the fully rendered
stylized avatar.

At full resolution the avatar opens its eyes — both irises catch light simultaneously —
and looks directly into the camera with a slow, deliberate smile. This moment is held
for 0.5 seconds in silence before any other motion.

Environment: pure void background, deep black with barely visible particle drift,
no horizon, rim lighting of teal from behind left and gold from behind right creating
dual-color silhouette framing that matches character palette.

Camera: medium close-up, centered, no movement during formation sequence, then very
slow push-in beginning at full resolution moment, arriving at close-up by second 5.

Lighting: rim glow from environment colors, soft fill from below and in front,
no hard shadows on face — clean, graphic, beautiful.

Music/sound: during particle formation, high-frequency shimmer building in intensity,
resolving to a single clean synth tone at full avatar reveal. Then minimal electronic
ambient pulse begins beneath subsequent content.

Cel-adjacent render with clean edge highlights, smooth gradient skin tones, bold
stylistic intentionality throughout. Never attempts photorealism.

Example 3 — AI Presenter, Enterprise Tech Launch (Futuristic, Fourth Wall Break Hook)

Photorealistic digital human avatar, male, mid-40s, Northwestern European appearance,
close-cropped dark hair with salt-and-pepper at temples, wearing a clean white shirt
under a structured dark navy jacket, no tie. Commands immediate authority.

Opens in medias res with The Fourth Wall Break: avatar is already mid-turn toward camera,
as if he became aware of our presence. He completes the turn in the first 0.8 seconds,
makes direct, unhurried eye contact, and leans forward almost imperceptibly — the body
language of a person with something important to say, not a presentation to deliver.

Environment: near-future corporate interior, large window occupying the entire background,
out-of-focus cityscape in soft bokeh, the room's architecture is clean and minimal:
single floating desk edge visible at frame right, ambient indirect ceiling light from above,
blue-hour exterior light bleeding through window glass and catching the side of his face.

Lighting: ambient blue-hour light from window provides cool atmospheric fill, key light from
left at 45 degrees slightly warmer to balance the cool ambient, crisp but narrow rim
light from the window side. High dynamic range — bright window, controlled shadows on
face, catch light clearly visible in both eyes.

Camera: low eye-level camera — slightly below eyeline, reinforcing the subject's authority.
After initial static hold (0–3 seconds), a nearly imperceptible slow push-in begins,
arriving 8% closer by second 10. This movement is felt subconsciously, not noticed.

Skin: maximum photorealistic fidelity — visible pore texture, subsurface warmth at
cheeks and forehead, five o'clock shadow rendered at individual hair level, natural
lip texture, oil/moisture on brow area.

Micromotion: regular natural blinks every 3–4 seconds, subtle jaw movement as if
processing a thought before speaking, minimal but present chest breath movement,
a very slight asymmetric quality to his resting expression — the face of someone
thinking, not posing.

Sound: room ambience is near-silent, with only a very low server hum that suggests
he exists at the intersection of physical and digital. No music until he begins speaking.

Color grade: slightly desaturated with lifted blacks, clean digital cinema aesthetic,
neutral whites, no stylization that would undermine the photorealistic character intent.

10. Prompt Rules for Avatar Content

Rule 1: Commit to a style lane and stay in it. Mixed signals between photorealism and stylization produce uncanny valley. Every detail in the prompt must serve the same aesthetic target. If you write "photorealistic skin" and "cartoon eyes" in the same prompt, the output will be incoherent.

Rule 2: Describe aliveness, not perfection. Perfect symmetry, frozen expressions, and stillness read as dead or rendered. Include asymmetry, micro-motion, imperfection, and breath. "A strand of hair slightly displaced" does more for believability than five sentences about render quality.

Rule 3: Eyes anchor everything. The eyes are the first thing viewers evaluate for believability and connection. Always specify: catch light presence, iris color and texture detail, corneal wetness, natural blink timing, and where the gaze is directed. Underspecified eyes produce the single most common uncanny valley failure.

Rule 4: Ground the avatar to its environment. The avatar should interact with its environment through light and shadow — light from a window should fall on the skin, a neon environment should tint the avatar's face. A character that floats in front of its background without environmental interaction reads as a composite, not a presence.

Rule 5: Camera language is character language. How the camera frames and moves around an avatar communicates personality and status before the character does anything. A locked-off shot says authority. A slow push-in says intimacy. A low angle says power. A slight handheld wobble says authenticity. Choose intentionally.

Rule 6: Use sound as world-building, not decoration. Synthetic ambience and designed sound textures establish that this is a digital world, which paradoxically makes the avatar more believable — you are setting context for synthetic existence rather than trying to pass it off as real.

Rule 7: Specify the hook in the first sentence of the prompt. Seedance and most AI video models weight the opening of the prompt heavily for the opening of the video. Put the hook description first — what happens in seconds 0–2 — before any other detail.

Rule 8: Duration shapes detail density. For 2–4 second clips: focus entirely on one moment, one expression, one action. Every detail must earn its place. For 6–10 second clips: allow for a single transition or beat — hook, then hold. For 15–30 second clips: you can describe an arc — opening state, middle development, closing beat.

Rule 9: Avoid "AI-looking" descriptions. Paradoxically, describing an avatar as "AI-generated" or "digital" in the prompt often produces visual clichés — blue grids, floating data, lens flares on eyes. If the goal is photorealism, describe a real human. If the goal is stylized, describe a designed character. Let the aesthetic communicate the AI identity, not the text.

Rule 10: Test photorealism with the close-up rule. If you are writing a photorealistic avatar prompt, ask: does this description hold up in extreme close-up? If the answer is no — if the pores, eye detail, hair texture, and micro-expression are underspecified — push harder on those elements before finalizing the prompt.
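
Three of the rules above are mechanical enough to sketch in code: Rule 1 (one style lane), Rule 7 (hook first), and Rule 8 (duration bands). The Python below is a rough illustration under stated assumptions. All function names and the `LANE_MARKERS` table are hypothetical, the marker phrases are a small sample taken from the Avatar Style Spectrum section, and the lane check is a plain substring match that does not understand negation.

```python
# Marker phrases per style lane, sampled from the Avatar Style Spectrum section.
# Assumption: substring matching is good enough for a first-pass check.
LANE_MARKERS = {
    "photorealistic": ("photorealistic", "hyper-realistic", "visible pore", "subsurface scattering"),
    "stylized": ("cartoon", "cel-shaded", "stylized 3d", "exaggerated proportions"),
    "2d": ("2d animated", "flat illustration", "vector art", "anime-influenced"),
}


def style_lanes(prompt):
    """Rule 1: return the set of style lanes whose markers appear in the prompt."""
    text = prompt.lower()
    return {lane for lane, markers in LANE_MARKERS.items() if any(m in text for m in markers)}


def detail_guidance(duration_sec):
    """Rule 8: map a clip duration to the guidance for its band, or None."""
    if 2 <= duration_sec <= 4:
        return "one moment, one expression, one action"
    if 6 <= duration_sec <= 10:
        return "a single transition or beat: hook, then hold"
    if 15 <= duration_sec <= 30:
        return "an arc: opening state, middle development, closing beat"
    return None  # the rules do not cover durations between or outside the bands


def assemble_prompt(hook, details):
    """Rule 7: put the hook (seconds 0-2) first, then the remaining paragraphs."""
    parts = [hook.strip()] + [d.strip() for d in details if d.strip()]
    return "\n\n".join(parts)


draft = assemble_prompt(
    "Avatar turns to face camera directly and locks eye contact.",
    ["Environment: infinite dark void background.", "Lighting: professional three-point lighting."],
)
print(draft)
print(style_lanes("photorealistic skin and cartoon eyes"))  # two lanes: a Rule 1 conflict
print(detail_guidance(8))
```

A prompt that returns more than one lane from `style_lanes` is worth rereading by hand. The check flags candidates for review and cannot judge whether the mix is actually incoherent.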