
Name: video-understanding
Author: openclaw

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bill492/video-understanding" ~/.claude/skills/openclaw-skills-video-understanding && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bill492/video-understanding" ~/.openclaw/skills/openclaw-skills-video-understanding && rm -rf "$T"

manifest: skills/bill492/video-understanding/SKILL.md

Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

yt-dlp: `brew install yt-dlp` or `pip install yt-dlp`
ffmpeg: `brew install ffmpeg` (for merging video+audio streams)
`GEMINI_API_KEY` environment variable
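
A quick preflight check can catch a missing requirement before any download starts. The sketch below is illustrative and not part of the skill: `missing_requirements` is a hypothetical helper that looks for the two binaries on `PATH` and for the API key in the environment.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    # Hypothetical helper: env and which are injectable so it can be tested.
    env = os.environ if env is None else env
    problems = []
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running the file directly prints one line per missing requirement and nothing when everything is in place.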

Default Output

Returns structured JSON:

transcript: Verbatim transcript with `[MM:SS]` timestamps
description: Visual description (people, setting, UI, text on screen, flow)
summary: 2-3 sentence summary
duration_seconds: Estimated duration
speakers: Identified speakers
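
As a rough illustration of consuming this output, the sketch below parses a made-up sample and converts the `[MM:SS]` markers to seconds. It assumes `transcript` is a single string and `speakers` is a list of strings; the source does not spell out the exact types, so check them against real output. `timestamps_in_seconds` is a hypothetical helper.

```python
import json
import re

# Matches the [MM:SS] markers; minutes may run past two digits on long videos.
TIMESTAMP = re.compile(r"\[(\d+):(\d{2})\]")


def timestamps_in_seconds(transcript: str) -> list[int]:
    """Convert every [MM:SS] marker in a transcript to whole seconds."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(transcript)]


# Made-up sample for illustration only; not real output from the script.
sample_output = """
{
  "transcript": "[00:05] Welcome to the demo. [01:30] Here is the dashboard.",
  "description": "A presenter shares their screen.",
  "summary": "A short product demo.",
  "duration_seconds": 95,
  "speakers": ["Presenter"]
}
"""

result = json.loads(sample_output)
print(result["summary"])
print(timestamps_in_seconds(result["transcript"]))
```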

Usage

Analyze a video (structured JSON output)

uv run {baseDir}/scripts/analyze_video.py "<video-url>"

Ask a question (adds "answer" field)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"

Override prompt entirely

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw

Download only (no analysis)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
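
When calling the script from another Python program, the commands above can be wrapped with `subprocess`. This is a minimal sketch, assuming the script writes only the JSON document to stdout in its default mode. `build_command` and `analyze` are hypothetical names, and `script_path` stands in for the resolved `{baseDir}/scripts/analyze_video.py`.

```python
import json
import subprocess


def build_command(script_path, url, question=None):
    """Assemble the argv list for one analysis run."""
    cmd = ["uv", "run", script_path, url]
    if question:
        # -q adds an "answer" field on top of the default fields.
        cmd += ["-q", question]
    return cmd


def analyze(script_path, url, question=None):
    """Run the script and parse stdout as JSON (assumes stdout is pure JSON)."""
    completed = subprocess.run(
        build_command(script_path, url, question),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.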

Options

| Flag | Description | Default |
| --- | --- | --- |
| `-q` / `--question` | Question to answer (added to default fields) | none |
| `-p` / `--prompt` | Override entire prompt (ignores `-q`) | structured JSON |
| `-m` / `--model` | Gemini model | gemini-2.5-flash |
| `-o` / `--output` | Save output to file | stdout |
| `--keep` | Keep downloaded video file | false |
| `--download-only` | Download only, skip analysis | false |
| `--max-size` | Max file size in MB | 500 |
| `--raw` | Raw text output instead of JSON | false |
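
To make the interaction between flags concrete, here is a hypothetical `argparse` definition that mirrors the table. It is not the script's actual source: the flag names and defaults come from the table, while `build_parser`, `effective_prompt`, and the wording of the combined prompt are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="analyze_video.py")
    parser.add_argument("url", help="Video URL")
    parser.add_argument("-q", "--question", default=None)
    parser.add_argument("-p", "--prompt", default=None)
    parser.add_argument("-m", "--model", default="gemini-2.5-flash")
    parser.add_argument("-o", "--output", default=None)  # None means stdout
    parser.add_argument("--keep", action="store_true")
    parser.add_argument("--download-only", action="store_true")
    parser.add_argument("--max-size", type=int, default=500)  # megabytes
    parser.add_argument("--raw", action="store_true")
    return parser


def effective_prompt(args, default_prompt):
    """-p replaces the whole prompt and ignores -q; -q extends the default."""
    if args.prompt:
        return args.prompt
    if args.question:
        # Assumed wording; the real prompt text is not shown in this document.
        return f'{default_prompt}\nAlso include an "answer" field for: {args.question}'
    return default_prompt
```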

How It Works

YouTube URLs → Passed directly to Gemini (no download needed)
All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
Gemini analyzes video with structured prompt → returns JSON
Temp files and Gemini uploads cleaned up automatically
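
The routing and polling steps above could look roughly like the sketch below. It is an illustration under assumptions, not the script's code: the host list, `route`, and `wait_until_active` are hypothetical, and the state names `PROCESSING`, `ACTIVE`, and `FAILED` are assumed to match what the Gemini File API reports for uploaded files.

```python
import time
from urllib.parse import urlparse

# Assumed set of YouTube hosts; the real script may detect these differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def route(url):
    """Return "direct" for YouTube URLs, "download" for everything else."""
    host = (urlparse(url).hostname or "").lower()
    return "direct" if host in YOUTUBE_HOSTS else "download"


def wait_until_active(get_state, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll get_state() until the uploaded file is ready to be analyzed."""
    waited = 0.0
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if waited >= timeout:
            raise TimeoutError(f"file still {state} after {timeout} seconds")
        sleep(interval)
        waited += interval
```

Injecting `get_state` and `sleep` keeps the polling loop independent of any particular client library and easy to test.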

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

Use `-q` for targeted questions on top of the full analysis
YouTube is fastest (no download step)
Large videos (10min+) work fine — Gemini File API supports up to 2GB (free) / 20GB (paid)
The script auto-installs Python dependencies via `uv`