Awesome-omni-skill ltxv2-video
Build LTX-V2 19B video workflows — text-to-video, image-to-video, distilled model, camera control LoRAs, and two-stage upscaling
```bash
# Clone the full repo
git clone https://github.com/diegosouzapw/awesome-omni-skill

# Or copy just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/content-media/ltxv2-video" ~/.claude/skills/diegosouzapw-awesome-omni-skill-ltxv2-video && rm -rf "$T"
```
skills/content-media/ltxv2-video/SKILL.md

LTX-V2 19B Video Workflows
Overview
LTX-V2 (LTX-2) is a 19-billion parameter DiT-based video foundation model from Lightricks. It uses a Gemma 3 12B text encoder and supports both text-to-video (T2V) and image-to-video (I2V). Key features:
- Distilled model for fast 8-step generation
- Two-stage pipeline: Generate at low res, then 2x spatial upscale in latent space
- Camera control LoRAs for cinematic movements
- Audio-video generation in a single pass (optional)
Models
Checkpoint (Installed)
| Component | Node | Model | Notes |
|---|---|---|---|
| Checkpoint | CheckpointLoaderSimple | ltx-2-19b-distilled.safetensors | 41GB bf16, distilled variant |
Text Encoder (Installed)
| Component | Node | Model | Notes |
|---|---|---|---|
| Gemma 3 | CLIPLoader (type=ltxv) | gemma_3_12B_it_fp4_mixed.safetensors | 9GB FP4, in text_encoders/ |
Loading note: The checkpoint bundles the VAE internally. The Gemma 3 text encoder loads separately. Use CLIPLoader with type: "ltxv" pointing at the text_encoders/ directory.
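For reference, a minimal loader pair, using the filenames from the complete workflow later in this document:

```json
{
  "1": { "class_type": "CheckpointLoaderSimple",
         "inputs": { "ckpt_name": "ltx-2-19b-distilled.safetensors" }},
  "2": { "class_type": "CLIPLoader",
         "inputs": { "clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv" }}
}
```

The checkpoint's VAE output (slot 2) feeds the decode nodes; the CLIPLoader output feeds both CLIPTextEncode nodes.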
LoRAs (Installed)
| LoRA | File | Purpose |
|---|---|---|
| Distilled LoRA | | Apply to base model for distilled behavior |
| Distilled LoRA (384) | | Apply to base model for distilled behavior |
| Camera Dolly Left | ltx-2-19b-lora-camera-control-dolly-left.safetensors | Camera dolly left movement |
Concept/Style LoRAs (Installed)
Located in loras/LTXV2/:
- style/PLORAV7_LTX_000010500.safetensors
- concept/head_swap_v1_13500_first_frame.safetensors
- concept/LTX-2 - Better Female Nudity.safetensors
- action/LTX2-i2v-OralSuite.safetensors
- action/LTX2-i2v-SexThrust.safetensors
- And more in concept/ and action/ subfolders
Key Nodes
LTXVConditioning
Binds text conditioning with frame rate information:
{ "class_type": "LTXVConditioning", "inputs": { "positive": ["<clip_text_encode>", 0], "negative": ["<clip_text_encode_neg>", 0], "frame_rate": 25 } }
EmptyLTXVLatentVideo
Creates the initial video latent (for T2V):
{ "class_type": "EmptyLTXVLatentVideo", "inputs": { "width": 768, "height": 512, "length": 97, "batch_size": 1 } }
Frame count constraint: Must be 8n + 1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121).
LTXVScheduler
Dedicated sigma schedule for LTX-V2 latent space:
{ "class_type": "LTXVScheduler", "inputs": { "steps": 8, "max_shift": 2.05, "base_shift": 0.95, "stretch": true, "terminal": 0.1 } }
Connect the optional latent input for latent-aware shift scaling.
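For example, wiring the latent from EmptyLTXVLatentVideo into the scheduler, as the complete workflow below does:

```json
{ "class_type": "LTXVScheduler",
  "inputs": { "steps": 8, "max_shift": 2.05, "base_shift": 0.95,
              "stretch": true, "terminal": 0.1,
              "latent": ["<empty_latent>", 0] }}
```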
LTXVImgToVideo (For I2V)
All-in-one node that encodes image, creates latent, and wraps conditioning:
{ "class_type": "LTXVImgToVideo", "inputs": { "positive": ["<conditioning>", 0], "negative": ["<conditioning>", 0], "vae": ["<checkpoint>", 2], "image": ["<load_image>", 0], "width": 768, "height": 512, "length": 97, "batch_size": 1, "strength": 0.6 } }
LTXVLatentUpsampler (For Two-Stage Upscale)
{ "class_type": "LTXVLatentUpsampler", "inputs": { "latent": ["<sampler_output>", 0], "upscale_model": ["<upscale_loader>", 0] } }
Requires LatentUpscaleModelLoader with ltx-2-spatial-upscaler-x2-1.0.safetensors.
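A sketch of that loader; the input field name is an assumption and may differ across node-pack versions:

```json
{ "class_type": "LatentUpscaleModelLoader",
  "inputs": { "upscale_model": "ltx-2-spatial-upscaler-x2-1.0.safetensors" }}
```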
Sampler Settings
Distilled Model (Installed)
Uses SamplerCustomAdvanced with explicit sigmas, NOT standard KSampler:
| Parameter | Stage 1 (Generate) | Stage 2 (Upscale) |
|---|---|---|
| sampler | euler | euler |
| steps | 8 | 4 |
| cfg | 1.0 | 1.0 |
| scheduler | LTXVScheduler | Manual sigmas |
Stage 1 sigmas (via LTXVScheduler):
max_shift=2.05, base_shift=0.95, stretch=true, terminal=0.1
Stage 2 sigmas (manual, for upscale refinement):
0.909375, 0.725, 0.421875, 0.0
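Core ComfyUI has no single canonical node for a literal sigma list; one hedged sketch uses a text-to-sigmas utility (SigmasFromText is a hypothetical name; several node packs ship an equivalent) to feed the Stage 2 sampler:

```json
{
  "20": { "class_type": "SigmasFromText",
          "inputs": { "text": "0.909375, 0.725, 0.421875, 0.0" }},
  "21": { "class_type": "SamplerCustomAdvanced",
          "inputs": { "noise": ["<random_noise>", 0], "guider": ["<cfg_guider>", 0],
                      "sampler": ["<euler_sampler>", 0],
                      "sigmas": ["20", 0], "latent_image": ["<upscaled_latent>", 0] }}
}
```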
Base Model (If Using Distilled LoRA on Base)
| Parameter | Value |
|---|---|
| sampler | res_2s |
| steps | 20 |
| cfg | 4.0 |
| scheduler | LTXVScheduler |
| distilled_lora_strength | 0.6 |
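A sketch of this base-model path, applying the distilled LoRA at 0.6 per the table; the LoRA filename is a placeholder to fill in, and res_2s requires a custom sampler pack to be registered:

```json
{
  "1": { "class_type": "LoraLoaderModelOnly",
         "inputs": { "model": ["<base_checkpoint>", 0],
                     "lora_name": "<distilled_lora>.safetensors",
                     "strength_model": 0.6 }},
  "2": { "class_type": "KSamplerSelect", "inputs": { "sampler_name": "res_2s" }}
}
```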
Resolution and Frame Count
Resolutions (Must be multiples of 32)
| Aspect | Stage 1 | After 2x Upscale | Notes |
|---|---|---|---|
| 3:2 landscape | 768x512 | 1536x1024 | Default |
| 16:9 landscape | 960x544 | 1920x1088 | Official example |
| 1:1 square | 640x640 | 1280x1280 | |
| 4:3 landscape | 704x512 | 1408x1024 | |
Start at lower resolution for Stage 1 to manage VRAM, then upscale.
Frame Count (8n + 1)
| Frames | Duration @25fps | Duration @24fps | Notes |
|---|---|---|---|
| 49 | 1.96s | 2.04s | Quick test |
| 81 | 3.24s | 3.38s | Short clip |
| 97 | 3.88s | 4.04s | Default |
| 121 | 4.84s | 5.04s | Official example, recommended |
| 161 | 6.44s | 6.71s | Longer clip |
| 257 | 10.28s | 10.71s | Maximum |
Frame Rate
Standard: 25 fps (conditioned via LTXVConditioning). 24 and 30 fps also supported.
Pipeline Flow: T2V Distilled
```
CheckpointLoaderSimple → MODEL + VAE
CLIPLoader (ltxv, gemma_3_12B_it_fp4_mixed) → CLIP
  ├─ CLIPTextEncode (positive) → CONDITIONING
  └─ CLIPTextEncode (negative) → CONDITIONING
LTXVConditioning (positive, negative, frame_rate=25) → pos/neg CONDITIONING
EmptyLTXVLatentVideo (768x512, 121 frames) → LATENT
LTXVScheduler (steps=8, max_shift=2.05, base_shift=0.95) → SIGMAS
SamplerCustomAdvanced (model, sigmas, positive, negative, latent) → Stage 1 LATENT
[Optional: LTXVLatentUpsampler → 2x LATENT → SamplerCustomAdvanced Stage 2]
VAEDecode (or LTXVSpatioTemporalTiledVAEDecode for VRAM savings) → IMAGE
VHS_VideoCombine (or CreateVideo + SaveVideo) → MP4
```
Complete Workflow: T2V Distilled (8-Step)
{ "1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "ltx-2-19b-distilled.safetensors" }}, "2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv" }}, "3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "<positive prompt>" }}, "4": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "" }}, "5": { "class_type": "LTXVConditioning", "inputs": { "positive": ["3", 0], "negative": ["4", 0], "frame_rate": 25 }}, "6": { "class_type": "EmptyLTXVLatentVideo", "inputs": { "width": 768, "height": 512, "length": 121, "batch_size": 1 }}, "7": { "class_type": "LTXVScheduler", "inputs": { "steps": 8, "max_shift": 2.05, "base_shift": 0.95, "stretch": true, "terminal": 0.1, "latent": ["6", 0] }}, "8": { "class_type": "KSamplerSelect", "inputs": { "sampler_name": "euler" }}, "9": { "class_type": "SamplerCustomAdvanced", "inputs": { "model": ["1", 0], "positive": ["5", 0], "negative": ["5", 1], "sigmas": ["7", 0], "latent_image": ["6", 0], "noise": ["10", 0], "sampler": ["8", 0], "guider": ["11", 0] }}, "10": { "class_type": "RandomNoise", "inputs": { "noise_seed": 42 }}, "11": { "class_type": "CFGGuider", "inputs": { "model": ["1", 0], "positive": ["5", 0], "negative": ["5", 1], "cfg": 1.0 }}, "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }}, "13": { "class_type": "VHS_VideoCombine", "inputs": { "images": ["12", 0], "frame_rate": 25, "loop_count": 0, "filename_prefix": "ltxv2", "format": "video/h264-mp4", "pingpong": false, "save_output": true, "pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false }} }
Alternative simple output (built-in nodes instead of VHS):
{ "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }}, "13": { "class_type": "CreateVideo", "inputs": { "images": ["12", 0], "fps": 25 }}, "14": { "class_type": "SaveVideo", "inputs": { "video": ["13", 0], "filename_prefix": "video/ltxv2", "format": "auto", "codec": "auto" }} }
Camera Control LoRAs
Seven official camera control LoRAs from Lightricks:
| Movement | LoRA File |
|---|---|
| Dolly Left | ltx-2-19b-lora-camera-control-dolly-left.safetensors |
| Dolly Right | ltx-2-19b-lora-camera-control-dolly-right.safetensors |
| Dolly In | ltx-2-19b-lora-camera-control-dolly-in.safetensors |
| Dolly Out | ltx-2-19b-lora-camera-control-dolly-out.safetensors |
| Jib Up | ltx-2-19b-lora-camera-control-jib-up.safetensors |
| Jib Down | ltx-2-19b-lora-camera-control-jib-down.safetensors |
| Static | ltx-2-19b-lora-camera-control-static.safetensors |
Usage: Apply with LoraLoaderModelOnly at strength 1.0. Do NOT describe camera movement in your prompt — the LoRA handles it.
{ "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["<checkpoint>", 0], "lora_name": "ltx-2-19b-lora-camera-control-dolly-left.safetensors", "strength_model": 1.0 } }
Cannot combine camera control LoRA with IC-LoRA (canny/depth/pose) in the same generation.
Concept/Style LoRAs
Apply with LoraLoaderModelOnly. Typical strength: 0.5–1.0.
{ "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["<checkpoint_or_camera_lora>", 0], "lora_name": "LTXV2\\concept\\LTX-2 - Better Female Nudity.safetensors", "strength_model": 0.8 } }
Concept/style LoRAs CAN be stacked with camera control LoRAs.
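For example, chaining a concept LoRA after a camera LoRA, with the strengths recommended above:

```json
{
  "1": { "class_type": "LoraLoaderModelOnly",
         "inputs": { "model": ["<checkpoint>", 0],
                     "lora_name": "ltx-2-19b-lora-camera-control-dolly-left.safetensors",
                     "strength_model": 1.0 }},
  "2": { "class_type": "LoraLoaderModelOnly",
         "inputs": { "model": ["1", 0],
                     "lora_name": "LTXV2\\concept\\LTX-2 - Better Female Nudity.safetensors",
                     "strength_model": 0.8 }}
}
```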
VRAM Considerations
| Config | VRAM | Notes |
|---|---|---|
| bf16 checkpoint + FP4 Gemma | ~24GB+ | Tight on RTX 4090, may OOM |
| FP8 checkpoint + FP4 Gemma | ~16-20GB | Recommended for 24GB GPUs |
| bf16 + tiled VAE decode | ~22GB | Use LTXVSpatioTemporalTiledVAEDecode |
VRAM warnings from MEMORY.md: "LTXV2 can OOM on 24GB — suggest FP8 quantized models or --lowvram"
Tips for 24GB GPUs
- Use VAEDecodeTiled or LTXVSpatioTemporalTiledVAEDecode instead of standard VAEDecode (see the sketch after this list)
- Start at 768x512 resolution, upscale in Stage 2
- Use FP4 Gemma text encoder (installed)
- Consider GGUF quantized models for tighter VRAM budgets
- Always clear_vram before switching to LTX-V2 from another model family
- Reduce frame count to 81 or 49 if OOM persists
Prompt Style
Natural language descriptions. Be specific about motion, camera angles, and temporal progression:
Good: "A woman with flowing auburn hair walks through a sun-dappled forest, leaves falling gently around her, soft golden hour lighting, cinematic depth of field" Bad: "woman, forest, walking"
Describe the entire scene progression, not just a single moment. Include lighting, mood, and motion cues.
Two-Stage Upscale Pattern
For production quality, generate at low resolution then upscale:
- Stage 1: Generate at 768x512, 121 frames, 8 steps (distilled)
- Upscale: LTXVLatentUpsampler (2x spatial) → 1536x1024
- Stage 2: Resample the upscaled latent with 3-4 steps at CFG 1.0
- Decode: Use tiled VAE decode for the larger resolution

This requires the spatial upscaler model: ltx-2-spatial-upscaler-x2-1.0.safetensors (place in models/latent_upscale_models/). A wiring sketch of the chain follows.
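A sketch of the upscale-and-resample chain, continuing the node numbering of the complete T2V workflow above (node 9 is the Stage 1 sampler; SigmasFromText is the hypothetical manual-sigmas utility from the Sampler Settings section; the loader's input name is an assumption):

```json
{
  "14": { "class_type": "LatentUpscaleModelLoader",
          "inputs": { "upscale_model": "ltx-2-spatial-upscaler-x2-1.0.safetensors" }},
  "15": { "class_type": "LTXVLatentUpsampler",
          "inputs": { "latent": ["9", 0], "upscale_model": ["14", 0] }},
  "16": { "class_type": "SigmasFromText",
          "inputs": { "text": "0.909375, 0.725, 0.421875, 0.0" }},
  "17": { "class_type": "SamplerCustomAdvanced",
          "inputs": { "noise": ["10", 0], "guider": ["11", 0], "sampler": ["8", 0],
                      "sigmas": ["16", 0], "latent_image": ["15", 0] }},
  "18": { "class_type": "VAEDecodeTiled",
          "inputs": { "samples": ["17", 0], "vae": ["1", 2], "tile_size": 512 }}
}
```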