OpenMontage video-understand
install
source · Clone the upstream repo
git clone https://github.com/calesthio/OpenMontage
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/calesthio/OpenMontage "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/video-understand" ~/.claude/skills/calesthio-openmontage-video-understand-3ae2a4 && rm -rf "$T"
manifest: .claude/skills/video-understand/SKILL.md
video-understand
Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.
Prerequisites
- `ffmpeg` + `ffprobe` (required): `brew install ffmpeg`
- `openai-whisper` (optional, for transcription): `pip install openai-whisper`
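Before running the script, it can help to confirm that the prerequisites are actually on the machine. The following is a minimal sketch, not part of the skill itself: `check_prerequisites` is a hypothetical helper name, and it assumes only that `ffmpeg` and `ffprobe` are on `PATH` and that `openai-whisper` is importable as the `whisper` module.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the skill's dependencies are available locally.

    Hypothetical helper for illustration; the skill does not ship it.
    """
    return {
        # ffmpeg and ffprobe are normally installed together
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # the openai-whisper package installs a module named "whisper"
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `whisper` is reported missing, the `--no-transcribe` flag still allows frame extraction on its own.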
Commands
    # Scene detection + transcribe (default)
    python3 skills/video-understand/scripts/understand_video.py video.mp4

    # Keyframe extraction
    python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe

    # Regular interval extraction
    python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval

    # Limit frames extracted
    python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10

    # Use a larger Whisper model
    python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small

    # Frames only, skip transcription
    python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe

    # Quiet mode (JSON only, no progress)
    python3 skills/video-understand/scripts/understand_video.py video.mp4 -q

    # Output to file
    python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
CLI Options
| Flag | Description |
|---|---|
| (positional argument) | Input video file (required) |
| `-m` | Extraction mode: `scene` (default), `keyframe`, `interval` |
| `--max-frames` | Maximum frames to keep (default: 20) |
| `--whisper-model` | Whisper model size: tiny, base, small, medium, large (default: base) |
| `--no-transcribe` | Skip audio transcription, extract frames only |
| `-o` | Write result JSON to file instead of stdout |
| `-q` | Suppress progress messages, output only JSON |
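When the script is driven from other Python code rather than typed by hand, the flags can be assembled programmatically. This is a rough sketch under stated assumptions: `build_command` and `understand` are hypothetical names, the script path is copied from the commands shown earlier, and only flags that appear in those commands are used.

```python
import json
import subprocess
import sys

# Path as it appears in the example commands; adjust to your checkout.
SCRIPT = "skills/video-understand/scripts/understand_video.py"


def build_command(video, mode=None, max_frames=None, whisper_model=None,
                  transcribe=True, output=None, quiet=False):
    """Assemble the argv list for understand_video.py (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, video]
    if mode is not None:
        cmd += ["-m", mode]
    if max_frames is not None:
        cmd += ["--max-frames", str(max_frames)]
    if whisper_model is not None:
        cmd += ["--whisper-model", whisper_model]
    if not transcribe:
        cmd.append("--no-transcribe")
    if output is not None:
        cmd += ["-o", output]
    if quiet:
        cmd.append("-q")
    return cmd


def understand(video, **options):
    """Run the script in quiet mode and parse the JSON printed to stdout."""
    options["quiet"] = True          # keep progress messages out of stdout
    options.pop("output", None)      # JSON must go to stdout to be captured
    done = subprocess.run(build_command(video, **options),
                          capture_output=True, text=True, check=True)
    return json.loads(done.stdout)
```

For example, `understand("video.mp4", mode="keyframe", max_frames=10)` would run the keyframe mode and return the parsed result as a dictionary, assuming the script is present at `SCRIPT`.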
Extraction Modes
| Mode | How it works | Best for |
|---|---|---|
| `scene` | Detects scene changes via ffmpeg | Most videos, varied content |
| `keyframe` | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| `interval` | Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |
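To make the first two rows concrete, the sketch below shows ffmpeg `select` filter expressions that are commonly used for scene-change and I-frame selection. It is an illustration only: the script's actual ffmpeg invocation is not documented here, `select_filter` is a hypothetical helper, and the `0.3` scene threshold is an assumed value.

```python
def select_filter(mode, scene_threshold=0.3):
    """Return an ffmpeg -vf filter string for a frame-selection mode.

    Illustrative only; the threshold and filter choice are assumptions,
    not taken from understand_video.py.
    """
    if mode == "scene":
        # "scene" is ffmpeg's per-frame scene-change score in [0, 1]
        return f"select='gt(scene,{scene_threshold})'"
    if mode == "keyframe":
        # keep only frames whose picture type is I (intra-coded)
        return "select='eq(pict_type,I)'"
    raise ValueError(f"no select filter for mode: {mode!r}")
```

A string like this would be passed to ffmpeg after `-vf`, typically together with an image output pattern such as `frame_%04d.jpg`.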
If `scene` mode detects no scene changes, it automatically falls back to `interval` mode.
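The fallback and the interval sampling can be sketched as follows. This is one plausible reading, not the script's code: the function names are hypothetical, the spacing formula (`duration / max_frames`, starting at 0) is an assumption, and so is trimming scene frames by keeping the earliest ones.

```python
def interval_timestamps(duration, max_frames):
    """Return max_frames evenly spaced timestamps in [0, duration)."""
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    return [round(i * step, 3) for i in range(max_frames)]


def choose_timestamps(scene_changes, duration, max_frames):
    """Use detected scene changes, or fall back to interval sampling.

    Returns (mode_used, timestamps). How the real script trims to
    max_frames is not documented; keeping the earliest is an assumption.
    """
    if not scene_changes:
        return "interval", interval_timestamps(duration, max_frames)
    return "scene", sorted(scene_changes)[:max_frames]
```

For a 20-second clip with `max_frames=4` and no detected scene changes, this yields the `interval` mode with timestamps at 0, 5, 10, and 15 seconds.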
Output
The script outputs JSON to stdout (or to a file with `-o`). See references/output-format.md for the full schema.
    {
      "video": "video.mp4",
      "duration": 18.076,
      "resolution": {"width": 1224, "height": 1080},
      "mode": "scene",
      "frames": [
        {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
      ],
      "frame_count": 12,
      "transcript": [
        {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
      ],
      "text": "Full transcript...",
      "note": "Use the Read tool to view frame images for visual understanding."
    }
Use the Read tool on frame image paths to visually inspect extracted frames.
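Once the JSON is available, frames and transcript segments can be lined up by timestamp so that each image is read alongside what was being said. The sketch below relies only on the fields shown in the example output; `format_timestamp` and `captions_for_frames` are hypothetical helpers, and the `MM:SS` formatting is inferred from the single `"00:00"` sample rather than from the schema.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS (assumed format, inferred from "00:00")."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def captions_for_frames(result):
    """Pair each frame with the transcript text spoken at its timestamp.

    Returns a list of (frame_path, formatted_time, text) tuples; text is
    empty when no segment covers the frame or no transcript is present.
    """
    segments = result.get("transcript") or []
    paired = []
    for frame in result["frames"]:
        t = frame["timestamp"]
        text = next(
            (s["text"] for s in segments if s["start"] <= t < s["end"]),
            "",
        )
        paired.append((frame["path"], format_timestamp(t), text))
    return paired
```

Applied to the example output, the single listed frame pairs with the segment "Hello and welcome...".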
References
- references/output-format.md: Full JSON output schema documentation