kling-ai
git clone https://github.com/maciejdzierzek/kling-ai-prompt-generator
T=$(mktemp -d) && git clone --depth=1 https://github.com/maciejdzierzek/kling-ai-prompt-generator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/kling-ai" ~/.claude/skills/maciejdzierzek-kling-ai-prompt-generator-kling-ai && rm -rf "$T"
skills/kling-ai/SKILL.mdKling AI Video Generation
Platform Access
Primary interface: app.klingai.com/global
Alternative platforms with Kling integration: Higgsfield, Pollo.ai, Fal.ai, Media.io, Artlist, Vidful.ai, Scenario, BasedLabs, LetzAI
Quick Start: Animate an Image in 60 Seconds
- Go to app.klingai.com/global → AI Videos → Image to Video
- Select Video 3.0 (V3)
- Upload your image
- Write a short motion prompt - describe only what moves, not the whole scene:
Subtle breeze moves through hair. Eyes blink naturally. Camera static. - Set duration: 5s for loops, 10-15s for narrative
- Generate at 1080p Standard mode (saves credits, same prompt quality as Professional)
That's it. For model selection, advanced prompting, and multi-shot workflows - read on.
Model Lineup (as of February 2026)
| Model | Best For | Resolution | Audio | Max Duration |
|---|---|---|---|---|
| Video 3.0 (V3) | Multi-shot storytelling, cinematic control | 1080p | Yes (5 languages) | 15s |
| Video 3.0 Omni (O3) | Reference-based consistency, voice cloning, 4K | 4K / 60fps | Yes (5 languages) | 15s |
| Video O1 | Complex scenes, character consistency, editing | 1080p | No | 10s |
| Video 2.6 | Audio-visual sync, dialogue (EN/CN) | 1080p | Yes (EN/CN) | 10s |
| Video 2.5 Turbo | Fast drafts, simple scenes | 1080p | No | 10s |
| 2.6 Motion Control | Motion transfer from reference video | 1080p | Optional | 30s |
Model Selection Guide
Choose Video 3.0 (V3) when:
- Need multi-shot sequences (up to 6 shots in one generation)
- Want first-and-last-frame control for seamless transitions
- Want native multilingual audio (EN, ZH, JA, KO, ES)
- Working from text or image prompts (prompt-first workflow)
- Need videos up to 15 seconds with narrative structure
Choose Video 3.0 Omni (O3) when:
- Have a reference video or image to anchor character identity
- Need voice cloning from a reference clip
- Require 4K resolution or 60fps for fast-paced content (60fps requires Pro plan)
- Editing or remixing existing footage
- Need the strongest subject consistency across shots
Choose O1 when:
- Complex multi-element scenes with the older pipeline
- Video-to-video editing tasks in pre-3.0 workflows
Choose 2.6 when:
- Need synchronized audio specifically in English or Chinese only
Choose 2.5 Turbo when:
- Rapid prototyping before committing credits to a 3.0 generation
- Simple 3-4-element scenes without audio needs
Choose 2.6 Motion Control when:
- Have a reference video with exact motion you want to transfer to a character
Core Workflows
Image-to-Video
- Navigate to AI Videos - Image to Video
- Select model (V3 or O3 recommended)
- Upload image (min 300x300px, max 10MB, JPG/PNG/WEBP)
- Write a motion-focused prompt - describe only what moves, not the image content (the scene already exists in your image)
- Optionally set an end frame (new in 3.0) to control where motion resolves
- Set duration (5s for loops/social, up to 15s for narrative)
- Set aspect ratio to match source image
- Generate
Text-to-Video
- Navigate to AI Videos - Text to Video
- Select model (V3 for prompt-driven, O3 for reference-driven)
- Write prompt in this structured order: Scene - Characters - Action - Camera - Audio & Style
- For 3.0: optionally use multi-shot mode to define each shot separately
- Set duration (3-15s for V3/O3), aspect ratio, and quality
- Generate at 1080p for drafts; use 4K (O3 only) for final output
Multi-Shot Storyboard (3.0 Only)
The biggest new workflow in Kling 3.0. Instead of one continuous clip, you direct a complete scene sequence in a single generation pass.
- Enter Custom Storyboard mode in the V3/O3 interface
- Define up to 6 shots - for each shot specify: duration, camera movement, composition, and subject action
- Keep 4-6 shots for a 10-15 second video (more than 6 shots in under 10 seconds feels rushed)
- Use Elements 3.0 or reference images to maintain character consistency across shots
- Generate
Example multi-shot prompt structure:
Shot 1 (3s): Wide establishing shot of rain-slicked Tokyo street at night, neon reflections on pavement. Camera: static. Shot 2 (4s): Medium shot - young woman in red coat emerges from subway exit, looks around. Camera: slow push in. Shot 3 (3s): Close-up on her face, raindrops on cheek, determined expression. Camera: static. Shot 4 (5s): She walks toward camera into the crowd. Camera: tracking shot from behind.
First and Last Frame Control (3.0 Only)
Define exactly where a video starts and ends visually.
- Start + End frames: Full control over composition and motion arc
- End frame only: Guide how motion resolves without fixing the start
- Upload reference images for either or both frames
Use this to create near-seamless loops by matching the end frame composition to the start.
Elements 3.0 - Subject Consistency
Reference specific subjects or objects using the
@element_name syntax:
- In the V3/O3 interface, open the Elements panel and upload a reference image
- Give it a name in the name field (e.g.,
)hero_character - In your prompt, reference it as
@hero_character walks through the market
Elements can be reused across separate generations, enabling a consistent visual library of characters and objects. O3 has the strongest implementation - use for commercial work where identity must stay locked.
Motion Control (2.6)
- Navigate to Motion Control
- Upload character image (full/half body with background)
- Upload reference video (3-30s) or select from library
- Set character orientation:
(max 10s) orimage
(max 30s)video - Add optional text prompt for atmosphere/style
- Generate
Language of Prompts
Always write Kling prompts in English, regardless of the language the user is writing in. Kling was trained predominantly on English-language descriptions, and English prompts produce significantly better and more predictable results than prompts in other languages.
The workflow is:
- Communicate with the user in their language (Polish, German, French, etc.)
- Write all Kling prompts in English
If the user writes their prompt in Polish or another language, translate it to English before presenting the final version they should paste into Kling. Explain this briefly if it's not obvious.
Exception: when using the native audio features (V3/O3), dialogue text in the
[Speaker: Name] "text" syntax should be in the target language (e.g., Polish dialogue for a Polish-language video). The surrounding prompt structure should still be in English.
Prompt Engineering
The Director's Mindset
Kling 3.0 understands cinematic intent. The key shift: stop writing prompts like image captions, start directing like a DoP (Director of Photography). Think of each prompt as a mini-screenplay:
Scene setting - Camera direction - Subject action
Structured Prompt Formula
[Scene/Environment] + [Characters/Subjects] + [Sequential Actions] + [Camera Movement] + [Audio & Style]
For 3.0, you can describe sequential actions: "First she looks up, then turns toward the window, finally smiles." Previous models couldn't handle this reliably - 3.0 can.
Model-Specific Guidance
| Model | Prompt Length | Max Elements | Key Pattern |
|---|---|---|---|
| V3 / O3 | 100-200 words | 6-7 | Sequential actions, multi-shot |
| O1 | 60-100 words | 5-7 | Clear element descriptions |
| 2.6 | 50-80 words | 5-7 | Include audio instructions |
| 2.5 Turbo | 40-60 words | 3-4 | Keep it simple |
Essential Prompt Rules
- Specify camera behavior explicitly - Without it, camera will improvise unpredictably
- Add motion endpoints - "then settles back" or "returns to starting position" prevents 99% of generation hangs
- Use spatial language - "left to right", "toward camera", "background depth"
- Be hyper-specific about setting - "cyberpunk alleyway at midnight, neon reflections on wet pavement" beats "a street"
- One main action per shot - Use multi-shot for complexity, not a single overloaded prompt
- Use negative prompts - Explicitly tell the model what to avoid
Keeping Elements Static
All [element] must remain absolutely fixed and unchanged throughout. [element] stays completely static. No movement on [element].
For critical logos or text: generate without text, add as a post-production overlay.
Loop-Friendly Prompts
Subtle [motion type], gentle oscillation, returns to starting position. Breathing effect, slow pulse, cyclical movement.
With 3.0: use first/last frame control with matching compositions. Post-process with 0.3-0.5s cross-dissolve.
Resolution Workflow
- 1080p for all drafts - saves credits, same prompt quality
- 4K (O3 only) for final deliverables - switch only when prompt is locked in
- 30fps for most content; 60fps for dance, sport, fast action (Kling 3.0, requires Pro plan)
Prompt Templates
See
references/prompt-templates.md for ready-to-use templates organized by use case.
Common Issues & Solutions
| Issue | Cause | Solution |
|---|---|---|
| Generation stuck at 99% | Open-ended motion | Add endpoint: "then stops", "returns to start" |
| Unwanted camera movement | No camera instruction | Add "static camera" or specific movement |
| Text/elements changing | AI interpretation | Repeat "fixed", "unchanged", "static" multiple times |
| Character morphing across shots | Identity drift | Use O3 with Elements 3.0 and reference images. Rewrite prompt from scratch rather than retrying with same settings. |
| Artifacts on hands/faces | Model limitation | Simplify scene, reduce duration, use O3 |
| Multi-shot feels disjointed | No shared style anchor | Define visual style once at start, use @element refs |
| Audio desync | Missing speaker attribution | Use format |
Settings Reference
Duration: 5s (loops/social), up to 15s (V3/O3 narrative), 10s (O1/2.6)
Aspect Ratios: 16:9 (landscape), 9:16 (vertical/mobile), 1:1 (square)
Quality modes:
- Standard mode (~10 credits/5s) - for drafts and iteration
- Professional mode (~35 credits/5s) - for finals (requires paid plan)
V3 vs O3: Same generation quality - O3 adds 4K, reference-based control, voice cloning. Both cost the same per generation.
Credits & Pricing (2026)
Free tier: 66 credits/day (expire in 24h, watermarked output)
Paid plans:
| Plan | Monthly | Credits/mo | ~Videos/mo (Pro mode, 5s) |
|---|---|---|---|
| Standard | ~$10 | 660 | ~18 |
| Pro | ~$37 | 3,000 | ~85 |
| Premier | ~$92 | 8,000 | ~230 |
| Ultra | ~$180 | 26,000 | ~740 |
Annual plans: ~34% discount. Additional credit packs available from $5 (330 credits).
All paid plans include: watermark removal, Professional mode access, 1080p output, priority processing.
Audio surcharge: Audio-enabled generation (Kling 2.6 with native audio) costs approximately 5x more credits than silent generation (~50 credits/5s standard, ~100 credits/5s professional).
Audio Features (V3, O3, and 2.6)
Languages for native audio: Chinese, English, Japanese, Korean, Spanish (V3/O3); English + Chinese only (2.6)
Audio types: speech/dialogue, narration, singing/rap, sound effects, ambient
Speaker attribution syntax (critical for accurate lip-sync):
[Speaker: Character Name] "[dialogue]" in a [tone/emotion] [accent] voice. Add [sound: footsteps / rain / door closing] when [action occurs]. Background ambient: [environment description].
Voice cloning (O3 only): Upload an audio clip (min 3s) as part of an Element. O3 binds the voice profile to the character's element_id, reusable across generations. In the prompt, reference the character via
@element_name - voice follows automatically.