Openclaw-master-skills ltx-video

install
source · Clone the upstream repo
git clone https://github.com/LeoYeAI/openclaw-master-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/LeoYeAI/openclaw-master-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ltx-video" ~/.claude/skills/leoyeai-openclaw-master-skills-ltx-video && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/LeoYeAI/openclaw-master-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/ltx-video" ~/.openclaw/skills/leoyeai-openclaw-master-skills-ltx-video && rm -rf "$T"
manifest: skills/ltx-video/SKILL.md
safety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • makes HTTP requests (curl)
  • references .env files
  • references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content

LTX-2.3 Video API

API Reference

Base URL:

https://api.ltx.video/v1

Auth:
Authorization: Bearer <API_KEY>

Response: MP4 binary (direct download, no polling)

Endpoints

| Endpoint | Input | Use |
|---|---|---|
| /v1/text-to-video | prompt | Generate video from text |
| /v1/image-to-video | image_uri + prompt | Animate a still image |
| /v1/audio-to-video | audio_uri + image_uri + prompt | Lip-sync video from audio + image |
| /v1/extend | video_uri + prompt | Extend a video at start or end |
| /v1/retake | video_uri + time range | Regenerate a section of a video |
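The endpoint/input table above can be sketched as a small payload builder. The required field names come straight from the table; /v1/retake is omitted because the exact name of its time-range field is not given here, and treating model/duration/resolution as optional extras is an assumption based on the later curl examples.

```python
# Required JSON fields per endpoint, taken from the endpoints table.
# /v1/retake is left out: the table says "time range" but does not
# name the actual field, so guessing it here would be fabrication.
REQUIRED_FIELDS = {
    "/v1/text-to-video": ["prompt"],
    "/v1/image-to-video": ["image_uri", "prompt"],
    "/v1/audio-to-video": ["audio_uri", "image_uri", "prompt"],
    "/v1/extend": ["video_uri", "prompt"],
}

def build_payload(endpoint, **fields):
    """Return a request payload dict, raising if a required field is missing."""
    missing = [f for f in REQUIRED_FIELDS[endpoint] if f not in fields]
    if missing:
        raise ValueError(f"{endpoint} is missing fields: {missing}")
    return fields

payload = build_payload("/v1/image-to-video",
                        image_uri="https://example.com/a.jpg",
                        prompt="The man turns his head slowly")
```

This catches a missing `audio_uri` or `video_uri` locally instead of burning an API call on a 4xx response.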

Models

| Model | Speed | Quality |
|---|---|---|
| ltx-2-3-fast | ~17s | Good (use for tests) |
| ltx-2-3-pro | ~30-60s | Best (use for final) |

Supported Resolutions

  • 1920x1080 (landscape 16:9)
  • 1080x1920 (portrait 9:16 — native vertical, trained on vertical data)
  • 1440x1080, 4096x2160 (text-to-video only)

audio-to-video only supports 1920x1080 or 1080x1920.
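The resolution rules above can be checked client-side before sending a request. This is a minimal sketch assuming the constraints listed here: the helper name is mine, and treating non-text endpoints as limited to the two 16:9/9:16 sizes follows from "text-to-video only" on the larger formats.

```python
# Resolutions from the "Supported Resolutions" list above.
ALL_RESOLUTIONS = {"1920x1080", "1080x1920", "1440x1080", "4096x2160"}

def check_resolution(endpoint, resolution):
    """Raise ValueError if the resolution is not supported for this endpoint."""
    if endpoint == "/v1/text-to-video":
        allowed = ALL_RESOLUTIONS
    else:
        # 1440x1080 and 4096x2160 are text-to-video only;
        # audio-to-video is explicitly limited to these two.
        allowed = {"1920x1080", "1080x1920"}
    if resolution not in allowed:
        raise ValueError(f"{resolution} not supported by {endpoint}")
```

Failing fast here is cheaper than waiting 30-60s for the API to reject the job.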


Quick Examples

Text to Video

curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4

Audio to Video (Lip-sync)

curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4

Python Wrapper

import requests

def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                       model="ltx-2-3-pro", resolution="1920x1080",
                       output_path="output.mp4"):
    """Generate a lip-sync video and stream the MP4 response to disk."""
    r = requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"audio_uri": audio_url, "image_uri": image_url,
              "prompt": prompt, "model": model, "resolution": resolution},
        timeout=300,   # generation can take up to a few minutes
        stream=True,   # response is the MP4 itself, so stream it
    )
    if r.status_code != 200:
        raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
    with open(output_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    return output_path

⚠️ Critical Rules (learned from experience)

File Hosting

  • URLs must be HTTPS — HTTP is rejected
  • Files must return the correct MIME type (not application/octet-stream)
  • uguu.se works: upload with curl -F "files[]=@file.mp3" https://uguu.se/upload
  • Audio: upload as MP3 (not WAV) → uguu returns audio/mpeg
  • 4K images fail → resize to 1920x1080 before uploading
# Upload MP3 to uguu.se
AUDIO_URL=$(curl -s -F "files[]=@audio.mp3" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image
IMAGE_URL=$(curl -s -F "files[]=@portrait.jpg" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

Image Size Limit

# Resize large images before upload
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg

Face Consistency

  • Avoid prompts where the character looks down — breaks face consistency
  • Keep head level and gaze forward throughout
  • Place objects already in frame instead of having character reach below frame

Last Frame

  • LTX does not support first+last frame natively
  • Workaround: generate clip A, generate clip B, then use /v1/extend to chain them
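The chaining workaround can be sketched as a payload builder for /v1/extend. Note the hedge: the document only says the endpoint takes video_uri + prompt and extends "at start or end", so the "position" field name here is an assumption, not a documented parameter; the actual HTTP call would follow the same requests.post pattern as the wrapper above.

```python
# Sketch of the clip-chaining workaround. "video_uri" and "prompt"
# come from the endpoints table; the "position" field ("start"/"end")
# is an ASSUMED name — the real parameter is not documented here.
def extend_payload(video_uri, prompt, position="end", model="ltx-2-3-pro"):
    """Build a request body for /v1/extend (parameter names partly assumed)."""
    if position not in ("start", "end"):
        raise ValueError("position must be 'start' or 'end'")
    return {"video_uri": video_uri, "prompt": prompt,
            "position": position, "model": model}

# Chain a continuation onto the end of a previously generated clip A:
payload = extend_payload("https://example.com/clip_a.mp4",
                         "The scene continues seamlessly...")
```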

Prompting Guide (LTX-2.3)

LTX-2.3 has a much stronger text connector. Specificity wins.

1. Use Verbs, Not Nouns

Weak: "A dramatic portrait of a man standing"

Better: "A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."

2. Block the Scene Like a Director

  • Specify left vs right, foreground vs background
  • Describe who moves, what moves, how they move, what the camera does
  • Spatial relationships are now respected

3. Describe Audio Explicitly (for text-to-video)

  • Name the type of sound: dialogue, ambient, music
  • Specify tone and intensity
  • Example:
    "His voice is clear and warm. Restaurant ambient sound softly in the background."

4. Avoid Static Photo-Like Prompts

  • If the prompt reads like a still image → the output behaves like one
  • Add wind, motion, breathing, gestures, camera movement

5. Describe Texture and Material

  • Hair, fabric, surface finish, lighting fall-off
  • "Individual hair strands visible in the backlight" now renders correctly

6. Portrait (9:16) Native

  • resolution: "1080x1920" → trained on vertical data
  • Frame for vertical intentionally; don't treat it as cropped landscape

7. Complex Shots Work Now

  • Layer multiple actions: "He picks up the banana, raises it to his ear, and smirks"
  • Combine character performance + environment + camera motion

Lip-Sync Prompt Template

A [description of person] sits/stands [location]. He/she speaks directly 
to camera, lips moving in perfect sync with his/her voice. [Gesture details]. 
Head stays level and gaze remains locked on camera throughout. 
[Environment description softly blurred in background]. 
[Lighting]. [Camera: holds steady at eye level, front-on].
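The template above can be filled programmatically when generating many talking-head clips. This is a minimal sketch: the slot names are mine, and the sentence structure follows the template verbatim (masculine pronouns hard-coded for brevity).

```python
# Fill the lip-sync prompt template above. Slot names are illustrative;
# only the fixed sentences come from the template itself.
def lipsync_prompt(person, location, gestures, environment, lighting):
    return (
        f"A {person} sits {location}. He speaks directly to camera, "
        f"lips moving in perfect sync with his voice. {gestures}. "
        f"Head stays level and gaze remains locked on camera throughout. "
        f"{environment} softly blurred in background. {lighting}. "
        f"Camera: holds steady at eye level, front-on."
    )

prompt = lipsync_prompt(
    "man in a navy blue suit", "at a luxury restaurant table",
    "He gestures gently with one hand", "Warm restaurant interior",
    "Soft golden key light")
```

Keeping the "head stays level / gaze locked" sentence fixed enforces the face-consistency rule from earlier in the document.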

ComfyUI Node

Custom nodes for ComfyUI (no manual API calls):

cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node

Nodes: LTX Text to Video, LTX Image to Video, LTX Extend Video

Category: LTX Video


API Key

Paul's key is stored in ~/clawd/.env as LTX_API_KEY.