install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bill492/video-understanding" ~/.claude/skills/openclaw-skills-video-understanding && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bill492/video-understanding" ~/.openclaw/skills/openclaw-skills-video-understanding && rm -rf "$T"
manifest: skills/bill492/video-understanding/SKILL.md
Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- yt-dlp: brew install yt-dlp / pip install yt-dlp
- ffmpeg: brew install ffmpeg (for merging video+audio streams)
- GEMINI_API_KEY environment variable
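A quick preflight check can confirm the requirements before the first run. This is a minimal sketch, not part of the skill: the function name `missing_requirements` and its injectable `env` and `which` parameters are hypothetical and exist only to make the check easy to test.

```python
import os
import shutil

# Tools and variable named in the requirements list above.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")
REQUIRED_ENV = ("GEMINI_API_KEY",)


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are hypothetical injection points for testing;
    by default the real environment and PATH lookup are used.
    """
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the current shell environment; anything it returns should be fixed before running the script.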
Default Output
Returns structured JSON:
- transcript: Verbatim transcript with [MM:SS] timestamps
- description: Visual description (people, setting, UI, text on screen, flow)
- summary: 2-3 sentence summary
- duration_seconds: Estimated duration
- speakers: Identified speakers
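Downstream code will usually want to check which of these fields came back and turn the transcript timestamps into numbers. The sketch below assumes the transcript is a single string containing `[MM:SS]` markers; the source does not specify the field types, so treat that as a guess. The helper names `load_result` and `timestamps_in_seconds` are hypothetical.

```python
import json
import re

# Field names taken from the list above; "answer" only appears when -q is used.
EXPECTED_FIELDS = ("transcript", "description", "summary", "duration_seconds", "speakers")

# Matches markers such as [00:05] or [12:34].
TIMESTAMP = re.compile(r"\[(\d{1,3}):([0-5]\d)\]")


def load_result(text):
    """Parse the JSON output and list any expected fields that are absent."""
    data = json.loads(text)
    missing = [name for name in EXPECTED_FIELDS if name not in data]
    return data, missing


def timestamps_in_seconds(transcript):
    """Convert every [MM:SS] marker in a transcript string to seconds."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(transcript)]
```

Reporting missing fields instead of raising keeps the helper usable with partial output, since a model-generated response may omit a field.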
Usage
Analyze a video (structured JSON output)
uv run {baseDir}/scripts/analyze_video.py "<video-url>"
Ask a question (adds "answer" field)
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
Override prompt entirely
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
Download only (no analysis)
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
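When calling the script from other Python code, the commands above can be assembled as an argument list instead of a shell string. This is a sketch using only the flags shown in the usage examples (`-q`, `-p`, `--raw`, `--download-only`, `-o`); `build_command` and `run_analysis` are hypothetical helper names, and `script_path` stands in for the resolved `{baseDir}/scripts/analyze_video.py`. It also assumes the script signals failure with a non-zero exit code, which the source does not state.

```python
import subprocess


def build_command(script_path, url, question=None, prompt=None, raw=False,
                  download_only=False, output=None):
    """Assemble the argv for analyze_video.py using only flags shown above."""
    cmd = ["uv", "run", script_path, url]
    if question is not None:
        cmd += ["-q", question]
    if prompt is not None:
        cmd += ["-p", prompt]
    if raw:
        cmd.append("--raw")
    if download_only:
        cmd.append("--download-only")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_analysis(script_path, url, **options):
    """Run the script and return its stdout.

    Raises subprocess.CalledProcessError if the process exits non-zero
    (assumed, not confirmed, to be how the script reports failure).
    """
    result = subprocess.run(
        build_command(script_path, url, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs and free-text questions.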
Options
| Flag | Description | Default |
|---|---|---|
| -q | Question to answer (added to default fields) | none |
| -p | Override entire prompt (ignores -q) | structured JSON |
|  | Gemini model | gemini-2.5-flash |
| -o | Save output to file | stdout |
|  | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
|  | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |
How It Works
- YouTube URLs → Passed directly to Gemini (no download needed)
- All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
- Gemini analyzes video with structured prompt → returns JSON
- Temp files and Gemini uploads cleaned up automatically
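The routing and polling steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the hostname set, the route labels, and the helper names (`is_youtube_url`, `choose_route`, `poll_until`) are all hypothetical, and the real script may detect YouTube URLs and poll the File API differently.

```python
import time
from urllib.parse import urlparse

# Assumed set of YouTube hostnames; the real script may match URLs differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """True when the URL's host is one of the assumed YouTube hostnames."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def choose_route(url):
    """Return 'direct' for YouTube, 'download_and_upload' for everything else."""
    return "direct" if is_youtube_url(url) else "download_and_upload"


def poll_until(is_ready, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_ready() repeatedly until it returns True or the timeout elapses.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a real pipeline, `is_ready` would wrap whatever call reports the uploaded file's processing state; a timeout keeps a stuck upload from blocking forever.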
Supported Sources
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
Tips
- Use -q for targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10min+) work fine — Gemini File API supports up to 2GB (free) / 20GB (paid)
- The script auto-installs Python dependencies via uv