Zkagi-video-engine image-prompt-craft

AI Image Prompt Craft Skill

install
source · Clone the upstream repo
git clone https://github.com/ZkAGI/zkagi-video-engine
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ZkAGI/zkagi-video-engine "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/image-prompt-craft" ~/.claude/skills/zkagi-zkagi-video-engine-image-prompt-craft && rm -rf "$T"
manifest: .claude/skills/image-prompt-craft/SKILL.md

Prompt Structure Formula

Every image prompt should follow this order:

[SUBJECT] + [ACTION/POSE] + [SETTING/ENVIRONMENT] + [ART STYLE] + [LIGHTING] + [CAMERA/COMPOSITION] + [QUALITY BOOSTERS]

Example:

"A nervous cartoon tiger clutching a giant golden key to its chest, standing in a dark alley with shadowy figures reaching from the walls, Pixar 3D render style, dramatic rim lighting with warm highlights, low angle hero shot, highly detailed, vibrant saturated colors, cinematic composition"

Component Breakdown

SUBJECT — Who/what is in the scene? Be specific.

  • BAD: "a person" → GOOD: "a nervous middle-aged man in a wrinkled suit"
  • BAD: "a wallet" → GOOD: "a battered leather wallet with coins spilling out"
  • BAD: "technology" → GOOD: "a glowing holographic cube floating above a desk"

ACTION/POSE — What are they doing? Static = boring.

  • BAD: "standing" → GOOD: "clutching a golden key to their chest, eyes darting around"
  • BAD: "holding a phone" → GOOD: "frantically tapping a cracked phone screen, sweating"

SETTING — Where is this? Ground the scene.

  • BAD: "a room" → GOOD: "a dimly lit basement with exposed pipes and a single flickering bulb"
  • BAD: "cyberspace" → GOOD: "a massive server room with glowing blue racks stretching to infinity"

ART STYLE — Pick from the style library below.

LIGHTING — Sets the mood. Always specify.

  • Dramatic: "harsh rim lighting, deep shadows"
  • Warm: "golden hour sunlight, warm ambient glow"
  • Cold: "cool blue moonlight, clinical fluorescent"
  • Cinematic: "volumetric light rays, atmospheric haze"
  • Neon: "pink and cyan neon glow, reflective wet surfaces"

CAMERA — Composition matters.

  • "low angle hero shot" — makes subject look powerful
  • "bird's eye view" — shows scale and layout
  • "extreme close-up" — emotion and detail
  • "wide establishing shot" — sets the scene
  • "dutch angle" — creates tension/unease
  • "over-the-shoulder" — creates intimacy

QUALITY BOOSTERS — Always add 2-3.

  • "highly detailed, sharp focus, professional illustration"
  • "cinematic composition, depth of field"
  • "vibrant saturated colors, artstation quality"
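
The formula and component breakdown above can be sketched as a small helper that joins the slots in order. This is an illustration only, not part of the skill: `PromptParts` and `build_prompt` are hypothetical names, and the sample values are taken from the tiger example earlier in this section.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """One field per slot in the formula, in the same order (hypothetical type)."""
    subject: str
    action: str
    setting: str
    art_style: str
    lighting: str
    camera: str
    quality_boosters: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the slots in formula order, skipping any that are empty."""
    slots = [
        # Subject and action read as one phrase, so they share a slot.
        f"{parts.subject} {parts.action}".strip(),
        parts.setting,
        parts.art_style,
        parts.lighting,
        parts.camera,
        *parts.quality_boosters,
    ]
    return ", ".join(s.strip() for s in slots if s.strip())


tiger = PromptParts(
    subject="A nervous cartoon tiger",
    action="clutching a giant golden key to its chest",
    setting="standing in a dark alley with shadowy figures reaching from the walls",
    art_style="Pixar 3D render style",
    lighting="dramatic rim lighting with warm highlights",
    camera="low angle hero shot",
    quality_boosters=["highly detailed", "vibrant saturated colors", "cinematic composition"],
)
print(build_prompt(tiger))
```

Keeping each slot as a separate field makes it easy to spot a missing component (for example, an empty `lighting`) before the prompt is sent to a model.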

Art Style Library (20+ Styles)

1. Pixar 3D

Keywords: "3D rendered, Pixar style, smooth shading, subsurface scattering, expressive cartoon characters, soft ambient occlusion"
Best for: Fun, approachable scenes. Characters with big eyes and exaggerated expressions.
Negative: "realistic, photographic, uncanny valley, flat shading"

2. Anime / Manga

Keywords: "anime style illustration, cel shading, dramatic speed lines, vibrant colors, detailed eyes, manga influence, clean linework"
Best for: Action scenes, emotional moments, dramatic reveals.
Negative: "3d render, photorealistic, western cartoon, blurry"

3. Comic Book / Graphic Novel

Keywords: "comic book art, bold black outlines, halftone dots, dynamic composition, pop art influence, Ben-Day dots"
Best for: Hook scenes, dramatic moments, roasts. High energy.
Negative: "speech bubbles, soft, blurry, realistic, watercolor, pastel"

4. Studio Ghibli

Keywords: "Studio Ghibli inspired, hand-painted backgrounds, soft lighting, lush detail, whimsical atmosphere, watercolor textures, Miyazaki style"
Best for: Calm/warm scenes, nature, hope, resolution moments.
Negative: "dark, gritty, neon, harsh lighting, photorealistic"

5. Synthwave / Retrowave

Keywords: "synthwave aesthetic, neon grid landscape, chrome reflections, 80s retro futurism, sunset gradient purple orange pink, VHS scan lines, laser beams"
Best for: Tech/crypto vibes, night scenes, futuristic themes.
Negative: "natural, organic, daytime, pastel, watercolor"

6. Claymation / Stop Motion

Keywords: "claymation style, stop motion aesthetic, polymer clay characters, fingerprint texture, miniature set, practical lighting, handmade feel"
Best for: Quirky/funny scenes, making complex topics feel approachable.
Negative: "smooth, digital, photorealistic, 2D, flat"

7. Watercolor

Keywords: "watercolor painting, soft wet edges, ink outline details, paper texture visible, flowing color bleeds, delicate washes, artistic"
Best for: Emotional scenes, transitions, storytelling moments.
Negative: "digital, sharp, photorealistic, neon, harsh"

8. Retro 80s

Keywords: "1980s aesthetic, Memphis design, geometric shapes, bold primary colors, CRT monitor glow, VHS grain, cassette tape era, radical"
Best for: Nostalgic/fun hooks, comparison scenes (old vs new).
Negative: "modern, minimal, clean, monochrome"

9. Isometric 3D

Keywords: "isometric perspective, low poly 3D, miniature diorama, tilt-shift effect, clean geometric shapes, soft shadows, pastel color palette"
Best for: Explainer scenes, system architecture, showing how things connect.
Negative: "perspective, realistic, organic, dark, gritty"

10. Cyberpunk (SPECIFIC — not generic)

Keywords: "cyberpunk street scene, rain-slicked neon reflections, holographic advertisements, dense urban layers, exposed cables and pipes, smog and volumetric light, Blade Runner atmosphere"
Best for: Dystopian contrast scenes (showing the problem), hacker vibes.
Negative: "clean, bright, minimal, natural, pastoral"
RULE: Never use generic "cyberpunk neon" as a lazy default. If using cyberpunk, be SPECIFIC about the scene.

11. Flat Illustration / Vector

Keywords: "flat vector illustration, geometric shapes, limited color palette, clean lines, modern graphic design, no gradients, bold silhouettes"
Best for: Clean explainer frames, infographic-style scenes, professional tone.
Negative: "realistic, 3D, textured, gritty, painterly"

12. Noir / Detective

Keywords: "film noir style, high contrast black and white with accent color, venetian blind shadows, cigarette smoke, fedora silhouette, dramatic chiaroscuro"
Best for: Mystery, suspicion scenes (before revealing the solution), security threats.
Negative: "colorful, bright, cheerful, cartoon, flat"

13. Vaporwave

Keywords: "vaporwave aesthetic, Greek bust statues, pastel pink and teal gradient, palm trees, geometric grids, glitch distortion, Japanese text accents, dreamy"
Best for: Ironic/meme-adjacent tone, crypto culture, "aesthetic" vibes.
Negative: "realistic, dark, serious, natural, organic"

14. Storybook / Children's Illustration

Keywords: "children's book illustration, gentle rounded shapes, warm cozy lighting, textured paper background, friendly characters, gouache painting style"
Best for: Making complex topics simple, warm/friendly CTAs, approachable explanations.
Negative: "dark, scary, realistic, complex, harsh"

15. Glitch Art / Digital Corruption

Keywords: "glitch art, RGB channel splitting, pixel sorting, data moshing, corrupted digital image, scan line artifacts, chromatic aberration, broken data"
Best for: Hacking/security threat scenes, "things going wrong" moments, before the fix.
Negative: "clean, pristine, analog, natural, smooth"

16. Art Deco / Gatsby

Keywords: "Art Deco style, gold and black geometric patterns, Gatsby era elegance, ornate symmetric frames, chrome and marble, luxury illustration, 1920s glamour"
Best for: Premium/luxury product positioning, "gold standard" metaphors, wealth/value scenes.
Negative: "casual, minimal, organic, grunge, messy"

17. Ukiyo-e / Japanese Woodblock

Keywords: "ukiyo-e style, Japanese woodblock print, flat areas of color, bold outlines, wave patterns, traditional Japanese composition, Hokusai influence"
Best for: Artistic variety, nature metaphors, standing out from typical tech aesthetics.
Negative: "photorealistic, 3D, digital, neon, modern"

18. Graffiti / Street Art

Keywords: "street art style, spray paint texture, concrete wall background, stencil art, bold tags, dripping paint, urban gritty, Banksy influence"
Best for: Rebellious/disruptive messaging, "breaking free" scenes, indie/punk energy.
Negative: "clean, corporate, minimal, elegant, smooth"

19. Paper Craft / Cut Paper

Keywords: "paper craft illustration, layered cut paper, slight shadows between layers, construction paper texture, handmade collage, tactile depth"
Best for: Whimsical explainers, step-by-step processes, approachable complexity.
Negative: "digital, smooth, photorealistic, glossy, 3D render"

20. Pixel Art

Keywords: "pixel art, 16-bit retro game aesthetic, limited color palette, dithering, sprite-style characters, nostalgic game screen, CRT scanlines"
Best for: Gamification themes, nostalgic hooks, "level up" metaphors.
Negative: "realistic, smooth, high resolution, photographic, 3D"

21. Concept Art / Matte Painting

Keywords: "concept art, digital matte painting, epic scale, atmospheric perspective, dramatic sky, photobashed details, environment design, cinematic scope"
Best for: Establishing shots, big dramatic reveals, "the future" scenes.
Negative: "cartoon, simple, flat, minimal, close-up"

22. Botanical / Scientific Illustration

Keywords: "scientific illustration, botanical drawing, precise line work, labeled diagram feel, aged paper, naturalist sketchbook, cross-section view"
Best for: "Under the hood" technical explainers, anatomy/structure metaphors.
Negative: "abstract, neon, dark, cartoon, blurry"
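
One way to use the library from code is to store each entry as data and append its keywords to a scene description. The sketch below copies three entries from the list above; the `STYLES` dictionary, its keys, and `apply_style` are hypothetical names chosen for illustration, not something the skill defines.

```python
# A small excerpt of the style library as data (three of the 22 entries).
STYLES = {
    "pixar_3d": {
        "keywords": "3D rendered, Pixar style, smooth shading, subsurface scattering, "
                    "expressive cartoon characters, soft ambient occlusion",
        "negative": "realistic, photographic, uncanny valley, flat shading",
    },
    "noir": {
        "keywords": "film noir style, high contrast black and white with accent color, "
                    "venetian blind shadows, cigarette smoke, fedora silhouette, "
                    "dramatic chiaroscuro",
        "negative": "colorful, bright, cheerful, cartoon, flat",
    },
    "watercolor": {
        "keywords": "watercolor painting, soft wet edges, ink outline details, "
                    "paper texture visible, flowing color bleeds, delicate washes, artistic",
        "negative": "digital, sharp, photorealistic, neon, harsh",
    },
}


def apply_style(scene: str, style_key: str) -> tuple[str, str]:
    """Return (prompt, style_negative) for a scene description.

    An unknown style_key raises KeyError, which surfaces typos early.
    """
    style = STYLES[style_key]
    return f"{scene}, {style['keywords']}", style["negative"]


prompt, negative = apply_style("a blindfolded robot chef in a spotless kitchen", "pixar_3d")
print(prompt)
print(negative)
```

Returning the style's negative terms alongside the prompt keeps the two paired, so they can be merged with the universal negatives later.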


Character-Driven Prompts vs Abstract Concepts

The Golden Rule: Characters > Concepts

People connect with CHARACTERS in SITUATIONS, not abstract ideas.

ABSTRACT (weak):

  • "blockchain security concept art"
  • "privacy technology illustration"
  • "decentralized network visualization"

CHARACTER-DRIVEN (strong):

  • "a tiny person sitting inside a giant opaque crystal ball, curious onlookers pressing their faces against the outside trying to peek in, whimsical illustration"
  • "a cartoon thief trying to break into a vault but the vault keeps splitting into three smaller vaults that run away in different directions, comic book style"
  • "a sleeping person in pajamas while a robot in a suit makes trades on Wall Street, split scene, cozy bedroom vs chaotic trading floor, Pixar style"

Character Prompt Templates:

The Metaphor Scene:

Template: "[Character] doing [unexpected thing] that represents [concept], in [setting], [style]"
Example: "A chef cooking with a blindfold on, perfectly plating a gourmet meal without seeing the ingredients, warm kitchen setting, Pixar 3D style"
(Metaphor for: AI processing data without seeing it)

The Contrast Scene:

Template: "[Character A] in [bad situation] vs [Character B] in [good situation], split composition, [style]"
Example: "Left side: a stressed person drowning in paper keys and passwords, dark chaotic room. Right side: same person relaxing in a hammock with a single glowing card, bright sunny beach. Split composition, comic book style"

The Reaction Scene:

Template: "[Character] reacting to [event] with [emotion], in [setting], [style]"
Example: "A person's jaw dropping as their wallet transforms from a rusty lockbox into a sleek glowing vault, dramatic lighting from the vault illuminating their face, anime style"

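The three templates above map naturally onto format strings. The following is a minimal sketch: `TEMPLATES` and `fill` are hypothetical names. It also assumes the concept behind a metaphor stays out of the prompt text, as in the chef example, so the metaphor template has no concept slot.

```python
# Bracketed slots from the templates become named format fields.
TEMPLATES = {
    "metaphor": "{character} doing {unexpected_thing}, in {setting}, {style}",
    "contrast": "{character_a} in {bad_situation} vs {character_b} in {good_situation}, "
                "split composition, {style}",
    "reaction": "{character} reacting to {event} with {emotion}, in {setting}, {style}",
}


def fill(template_name: str, **slots: str) -> str:
    """Fill a template by name; a missing slot raises KeyError."""
    return TEMPLATES[template_name].format(**slots)


print(fill(
    "reaction",
    character="a person",
    event="their wallet transforming from a rusty lockbox into a sleek glowing vault",
    emotion="jaw-dropping surprise",
    setting="a dark room lit by the vault",
    style="anime style",
))
```

Because a missing slot fails loudly, a half-filled template cannot slip through as a vague prompt.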

Negative Prompt Library

Universal Negatives (always include):

text, words, letters, numbers, watermark, signature, logo, blurry, low quality, deformed, disfigured, extra limbs, bad anatomy, cropped, out of frame

Per-Style Additions:

For character scenes:

+ bad hands, extra fingers, fused fingers, poorly drawn face, mutation, ugly

For landscapes/environments:

+ oversaturated, flat lighting, amateur, simple background

For comic/cartoon styles:

+ photorealistic, photograph, 3d render, uncanny valley

For realistic/cinematic styles:

+ cartoon, anime, illustration, drawing, painting, sketch

For clean/professional styles:

+ messy, chaotic, grunge, dirty, noisy, grainy
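
Combining the universal negatives with the per-style additions can be sketched as follows. The term lists are copied from this section; the `ADDITIONS` keys and the `build_negative` function are hypothetical names, and removing duplicate terms is an added convenience rather than something the skill requires.

```python
UNIVERSAL = (
    "text, words, letters, numbers, watermark, signature, logo, blurry, low quality, "
    "deformed, disfigured, extra limbs, bad anatomy, cropped, out of frame"
)

ADDITIONS = {
    "character": "bad hands, extra fingers, fused fingers, poorly drawn face, mutation, ugly",
    "landscape": "oversaturated, flat lighting, amateur, simple background",
    "comic": "photorealistic, photograph, 3d render, uncanny valley",
    "realistic": "cartoon, anime, illustration, drawing, painting, sketch",
    "clean": "messy, chaotic, grunge, dirty, noisy, grainy",
}


def build_negative(*categories: str, style_negative: str = "") -> str:
    """Universal negatives first, then each category, then style-specific terms.

    Terms are de-duplicated while preserving first-seen order.
    """
    terms: list[str] = []
    chunks = (UNIVERSAL, *(ADDITIONS[c] for c in categories), style_negative)
    for chunk in chunks:
        for term in chunk.split(","):
            term = term.strip()
            if term and term not in terms:
                terms.append(term)
    return ", ".join(terms)


# A character scene in an anime style, reusing that style's Negative line.
print(build_negative(
    "character", "comic",
    style_negative="3d render, photorealistic, western cartoon, blurry",
))
```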

Quality Boosters That Actually Work

Tier 1 — Always Include (pick 2-3):

  • highly detailed
  • sharp focus
  • professional illustration / professional photograph
  • cinematic composition
  • vibrant saturated colors

Tier 2 — Style-Dependent:

  • artstation quality: for illustration/concept art
  • depth of field: for cinematic/realistic
  • volumetric lighting: for dramatic/moody
  • subsurface scattering: for 3D renders
  • atmospheric perspective: for landscapes/epic scenes

Tier 3 — Avoid (diminishing returns or meaningless):

  • 8k uhd masterpiece: vague, clutters the prompt
  • trending on artstation: doesn't reliably improve quality
  • best quality, amazing: filler words, zero effect
  • hyper-realistic ultra-detailed: pick one specific style instead
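
The tiers above lend themselves to a simple lint pass over a finished prompt. This is a rough sketch under stated assumptions: `check_boosters` is a hypothetical helper, matching is plain substring search, and the Tier 3 bullets are split into individual phrases (for example "8k uhd" and "masterpiece") so partial matches are caught.

```python
TIER_1 = [
    "highly detailed",
    "sharp focus",
    "professional illustration",
    "professional photograph",
    "cinematic composition",
    "vibrant saturated colors",
]

# Tier 3 bullets split into individual phrases (an assumption of this sketch).
TIER_3 = [
    "8k uhd",
    "masterpiece",
    "trending on artstation",
    "best quality",
    "amazing",
    "hyper-realistic",
    "ultra-detailed",
]


def check_boosters(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means the prompt passes."""
    text = prompt.lower()
    warnings = []
    found = [b for b in TIER_1 if b in text]
    if not 2 <= len(found) <= 3:
        warnings.append(f"expected 2-3 Tier 1 boosters, found {len(found)}")
    for phrase in TIER_3:
        if phrase in text:
            warnings.append(f"Tier 3 filler: {phrase!r}")
    return warnings


print(check_boosters("a nervous cartoon tiger, highly detailed, sharp focus"))
print(check_boosters("a tiger, best quality, 8k uhd masterpiece"))
```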

Before/After Examples

Example 1: Wallet Security Scene

BEFORE (generic, lazy):

"digital wallet security, glowing blue shield, cyber background, futuristic"

AFTER (character-driven, specific):

"a cartoon character nervously clutching an enormous golden key while shadowy hands reach for it from every direction, dramatic spotlight from above, dark alley setting, comic book art style with bold outlines, halftone dots, vibrant saturated colors, expressive panicked face, cinematic composition"

Why it's better: Character with emotion. Specific action. Specific setting. Specific style. Tells a story in one frame.

Example 2: AI Privacy Concept

BEFORE (abstract, meaningless):

"AI privacy technology, abstract data visualization, blue and purple"

AFTER (metaphor-driven):

"a blindfolded robot chef in a spotless kitchen, perfectly slicing vegetables and plating an elegant meal without seeing any of the ingredients, ingredients floating in sealed glass containers around it, warm kitchen lighting, Pixar 3D render, soft ambient occlusion, whimsical, highly detailed"

Why it's better: The blindfolded chef IS the metaphor for private AI computation. Instantly understandable. Memorable. Fun.

Example 3: Key Recovery Feature

BEFORE (generic tech):

"data recovery concept, cloud storage, secure backup"

AFTER (scenario-based):

"a person standing in pouring rain looking at their phone which just cracked on the ground, but a glowing holographic phoenix rises from the broken phone carrying a golden key upward, dark rainy street with warm golden light from the phoenix, anime style illustration with dramatic lighting, detailed rain particles, cinematic"

Why it's better: Emotional scenario (phone broke). Visual metaphor (phoenix = recovery). Dramatic and memorable.

Example 4: Multi-Key Security

BEFORE (abstract):

"three keys security system, digital protection"

AFTER (humorous scenario):

"three tiny cartoon guards each holding one piece of a golden puzzle key, standing in front of three separate vaults on three separate mountain peaks, a confused thief at the bottom looking up at all three with a comically long to-do list, isometric perspective, Pixar style, warm golden lighting, whimsical and playful"

Why it's better: Shows the concept (3 keys, 3 locations) through characters and humor. The confused thief SHOWS why it's secure.

Example 5: Trading Bot

BEFORE (generic):

"AI trading bot, stock market, futuristic interface"

AFTER (split-scene storytelling):

"split scene: left side shows a cozy bedroom with a person sleeping peacefully in pajamas hugging a pillow, right side shows a serious robot in a tiny suit frantically working multiple screens on a Wall Street trading floor, warm bedroom light vs cold blue monitor glow, Pixar 3D style, exaggerated expressions, cinematic split composition, highly detailed"

Why it's better: Tells the STORY of the feature (you sleep, bot trades) in one image. Humor from the contrast. Memorable.


Critical Rules

  1. NEVER default to generic cyberpunk neon. If every scene looks like Tron, the video is visually monotonous. Vary styles across scenes.

  2. NEVER write abstract concept prompts. "Blockchain technology" generates garbage. Always use characters, scenarios, or metaphors.

  3. NEVER forget the negative prompt. Without it, you'll get text, watermarks, and deformed anatomy.

  4. ALWAYS vary art styles across scenes in a single video. Scene 1 comic book, Scene 2 Pixar, Scene 3 watercolor = visual interest. All scenes same style = boring.

  5. ALWAYS include lighting direction. Lighting carries much of a scene's mood: "dramatic rim lighting" and "soft ambient glow" give the same subject a completely different feel.

  6. Match style to scene emotion:

    • Hook (attention): Bold, high contrast, dramatic (comic book, noir, glitch)
    • Problem (pain): Dark, moody, chaotic (cyberpunk, noir, graffiti)
    • Solution (relief): Bright, clean, warm (Pixar, Ghibli, watercolor)
    • CTA (action): Energetic, vibrant, inviting (synthwave, pop art, storybook)

  7. Test at 768x512 for images that will be used as LTX-2 input, and at 1344x768 for standalone images.
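
Rules 4, 6, and 7 can be combined into a small planning step. The sketch below is illustrative only: `ROLE_STYLES`, `SIZES`, and `assign_styles` are hypothetical names, the style lists follow rule 6, and the picker takes the first style in each list that the video has not used yet.

```python
# Style options per scene role, in the order rule 6 lists them.
ROLE_STYLES = {
    "hook": ["comic book", "noir", "glitch"],
    "problem": ["cyberpunk", "noir", "graffiti"],
    "solution": ["Pixar", "Ghibli", "watercolor"],
    "cta": ["synthwave", "pop art", "storybook"],
}

# Test resolutions from rule 7, as (width, height).
SIZES = {"ltx2_input": (768, 512), "standalone": (1344, 768)}


def assign_styles(roles: list[str]) -> list[str]:
    """Pick one style per scene, preferring styles not yet used in this video.

    If every option for a role is already used, fall back to its first option.
    """
    chosen: list[str] = []
    for role in roles:
        options = ROLE_STYLES[role]
        pick = next((s for s in options if s not in chosen), options[0])
        chosen.append(pick)
    return chosen


print(assign_styles(["hook", "problem", "problem", "solution", "cta"]))
print(SIZES["ltx2_input"])
```

If cyberpunk is picked for a problem scene, rule 1 still applies: the scene description itself needs to be specific.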