Skills youtube-digest
Understand, summarize, translate, and extract key points from YouTube videos. Use when a user provides a YouTube URL and wants: (1) a Chinese summary, (2) a transcript or subtitle extraction, (3) translation of spoken content, (4) timestamps / chapter notes, (5) visual understanding via key frames, or (6) question answering about a video. Prefer this skill for transcript-first workflows.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/benheee/youtube-digest" ~/.claude/skills/openclaw-skills-youtube-digest && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/benheee/youtube-digest" ~/.openclaw/skills/openclaw-skills-youtube-digest && rm -rf "$T"
manifest: skills/benheee/youtube-digest/SKILL.md
YouTube Digest
Use a transcript-first workflow.
Quick workflow
- Run scripts/fetch_youtube.py <url> --out <dir> to collect metadata and subtitles. If behind a proxy, add --proxy <proxy-url>.
- If subtitles exist, read summary.json and the generated transcript file first.
- If the user only wants a quick answer, summarize directly from the transcript.
- If the user needs stronger visual grounding, extract key frames with ffmpeg after downloading the video or by using an existing local video file.
- If no subtitles are available, report that transcript extraction needs yt-dlp + a speech-to-text path (for example Whisper) before promising a result.
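The key-frame step in the workflow above could look roughly like the following. This is a minimal sketch, not part of the skill's script: the helper name keyframe_command, the 30-second default interval, and the frame_%04d.jpg naming are all assumptions chosen for illustration. It only builds the ffmpeg argument list, so nothing runs until you pass it to subprocess.

```python
from pathlib import Path


def keyframe_command(video: Path, out_dir: Path, every_seconds: int = 30) -> list[str]:
    """Build an ffmpeg argv that saves one frame every `every_seconds` seconds.

    Hypothetical helper: the interval and output naming are illustrative choices.
    """
    if every_seconds <= 0:
        raise ValueError("every_seconds must be positive")
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", str(video),
        # The fps filter with a fractional rate emits one frame per interval.
        "-vf", f"fps=1/{every_seconds}",
        # JPEG quality scale: lower values mean higher quality.
        "-q:v", "2",
        # ffmpeg expands %04d into a zero-padded frame counter.
        str(out_dir / "frame_%04d.jpg"),
    ]
```

To run it, create the output directory first and then call subprocess.run(keyframe_command(video, out_dir), check=True) against a local video file.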
Default behavior
- Prefer manual subtitles over auto subtitles.
- Prefer Chinese subtitles when available; otherwise use English auto/manual subtitles.
- Keep downloads minimal: subtitles + metadata first, full video only when visual analysis is necessary.
- For long videos, produce:
- 3-line executive summary
- bullet timeline with timestamps
- key insights / actionable points
- open questions or uncertainties
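The subtitle preferences above can be expressed as a small ranking function. The sketch below is one possible reading, not the script's actual logic: it assumes language takes priority (Chinese, then English, then anything else) and that manual subtitles beat auto-generated ones within the same language. The Track type and pick_track name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one available subtitle track."""
    lang: str       # language code such as "zh-Hans" or "en"
    is_auto: bool   # True for auto-generated captions


def pick_track(tracks: list[Track]) -> Optional[Track]:
    """Return the preferred track, or None when no tracks exist."""

    def rank(track: Track) -> tuple[int, int]:
        lang = track.lang.lower()
        if lang.startswith("zh"):
            lang_rank = 0
        elif lang.startswith("en"):
            lang_rank = 1
        else:
            lang_rank = 2
        # Assumption: language outranks the manual/auto distinction.
        return (lang_rank, int(track.is_auto))

    return min(tracks, key=rank, default=None)
```

If you would rather take manual English subtitles over auto-generated Chinese ones, swap the two elements of the tuple returned by rank.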
Outputs
For normal requests, return:
- Video topic
- Summary (in user's language)
- Key timestamps
- Notable quotes / insights
- If confidence is limited, say whether the result came from manual subtitles, auto subtitles, or partial metadata only.
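For the "Key timestamps" item, a timeline can be rendered from (seconds, note) pairs. This is a minimal sketch under assumed conventions: the function names and the "- [MM:SS] note" layout are illustrative, not something the skill prescribes.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS once the video passes one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def timeline(points: list[tuple[float, str]]) -> str:
    """Render (seconds, note) pairs as a chronologically sorted bullet list."""
    return "\n".join(
        f"- [{format_timestamp(t)}] {note}" for t, note in sorted(points)
    )
```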
Files produced by the script
The fetch script writes an output directory containing:
- summary.json: chosen subtitle file, title, uploader, duration, and extraction status
- transcript.txt: plain text transcript when subtitles are available
- raw subtitle files from yt-dlp (VTT/SRT)
Read summary.json first to decide what to do next.
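A sketch of that decision follows. The exact schema of summary.json is not documented here, so the key name "subtitle_file" is an assumption, as are the function name and the three return labels. Check the real file before relying on any of them.

```python
import json
from pathlib import Path


def next_step(out_dir: Path) -> str:
    """Decide what to do after the fetch script has run.

    Assumes a hypothetical "subtitle_file" key in summary.json.
    """
    summary_path = out_dir / "summary.json"
    if not summary_path.exists():
        # Nothing was written, so the fetch step has not succeeded yet.
        return "fetch-missing"

    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    has_subtitles = bool(summary.get("subtitle_file"))  # hypothetical key
    has_transcript = (out_dir / "transcript.txt").exists()

    if has_subtitles and has_transcript:
        return "summarize-transcript"
    # Metadata only: a speech-to-text path is needed before promising a result.
    return "needs-speech-to-text"
```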
Required runtime tools
- yt-dlp for metadata + subtitle extraction
- deno as JS runtime (required by yt-dlp 2026+)
- ffmpeg (optional) for media conversion and frame extraction
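Before running anything, it may help to confirm these tools are on PATH. The sketch below uses the standard library's shutil.which. The REQUIRED and OPTIONAL groupings mirror the list above, while the helper name missing_tools is an assumption.

```python
import shutil
from typing import Callable, Iterable, Optional

REQUIRED = ("yt-dlp", "deno")
OPTIONAL = ("ffmpeg",)


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        raise SystemExit(f"Install before continuing: {', '.join(missing)}")
    for name in missing_tools(OPTIONAL):
        print(f"Optional tool not found: {name}")
```

The `which` parameter exists only so the lookup can be replaced in tests.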
Key commands
Basic extraction:
python3 scripts/fetch_youtube.py "<youtube-url>" --out /tmp/youtube-digest
With proxy:
python3 scripts/fetch_youtube.py "<youtube-url>" --proxy http://your-proxy:port --out /tmp/youtube-digest
Prefer specific subtitle languages:
python3 scripts/fetch_youtube.py "<youtube-url>" --langs "zh.*,en.*" --out /tmp/youtube-digest
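When calling the script from Python rather than a shell, the same commands can be assembled as an argument list. This is a sketch: the flags (--out, --proxy, --langs) are the ones shown above, but the wrapper name fetch_command and its parameters are hypothetical. Passing a list to subprocess also keeps patterns such as zh.* away from shell globbing.

```python
import sys
from typing import Optional, Sequence


def fetch_command(
    url: str,
    out_dir: str,
    proxy: Optional[str] = None,
    langs: Optional[Sequence[str]] = None,
) -> list[str]:
    """Build the argv for scripts/fetch_youtube.py without involving a shell."""
    cmd = [sys.executable, "scripts/fetch_youtube.py", url, "--out", out_dir]
    if proxy:
        cmd += ["--proxy", proxy]
    if langs:
        # The script takes a comma-separated list of language patterns.
        cmd += ["--langs", ",".join(langs)]
    return cmd
```

Run the result with subprocess.run(fetch_command(...), check=True) from the skill's root directory so the relative script path resolves.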
Failure handling
- If yt-dlp is missing, stop and install it instead of improvising.
- If YouTube blocks the request (429 or bot detection), try using a proxy or report the limitation.
- If only metadata is available, do not pretend you understood the full video.
- If subtitles are auto-generated, mention that wording may be noisy.
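One way to tell the blocked-request cases apart is to inspect the error text from a failed run. The sketch below is a rough heuristic, not something the script is known to do: the function name, the return labels, and the matched substrings ("429", "too many requests", "not a bot") are assumptions about typical error messages and may need adjusting.

```python
def classify_failure(stderr: str) -> str:
    """Roughly classify an extraction failure from its error text.

    Heuristic only: the substrings below are assumed, not guaranteed.
    """
    text = stderr.lower()
    if "429" in text or "too many requests" in text:
        return "rate-limited"      # try a proxy or report the limitation
    if "not a bot" in text:
        return "bot-detection"     # try a proxy or report the limitation
    return "unknown"               # report the raw error rather than guessing
```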
References
- Read references/install-and-deploy.md for deployment instructions.
- Read references/usage-patterns.md for output templates for summaries, translations, or Q&A.