Claude-skill-registry get-y2b-clips

Extract the most meaningful, engaging clips from YouTube videos. Use when user provides a YouTube URL and wants to find highlights, best moments, controversial takes, or valuable segments. Supports specifying number of clips or topic focus.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/get-y2b-clips" ~/.claude/skills/majiayu000-claude-skill-registry-get-y2b-clips && rm -rf "$T"
manifest: skills/data/get-y2b-clips/SKILL.md
source content

YouTube Nuggets Extractor

Extract the most valuable clips ("nuggets") from YouTube videos automatically. Analyzes transcripts to find high-value segments based on controversy, insightful analysis, or user-specified topics.

When to Use This Skill

Activate when the user:

  • Wants to extract "best clips", "highlights", or "nuggets" from a YouTube video
  • Asks to find "interesting moments" or "valuable segments"
  • Wants controversial takes, insights, or specific topics from a video
  • Provides a YouTube URL and mentions clips, segments, or highlights

Dependencies Check

ALWAYS check dependencies first:

# Check for yt-dlp
command -v yt-dlp || echo "MISSING: yt-dlp"

# Check for ffmpeg
command -v ffmpeg || echo "MISSING: ffmpeg"

Install Missing Dependencies

yt-dlp:

# macOS
brew install yt-dlp

# Linux
sudo apt update && sudo apt install -y yt-dlp

# pip (universal)
pip3 install yt-dlp

ffmpeg:

# macOS
brew install ffmpeg

# Linux
sudo apt update && sudo apt install -y ffmpeg

Input Requirements

  • Required: YouTube URL
  • Optional (ask user if not specified for long videos >30 min):
    • Number of clips (default: 3-5 based on video length)
    • Topic focus keywords
    • Min/max clip duration (default: 30s-180s)

Available Python Scripts

The skill includes helper scripts in the

.claude/skills/get-y2b-clips/
directory:

ScriptPurpose
parse_vtt.py
Parse VTT subtitles into segments.json (cleans HTML entities)
extract_transcript.py
Extract transcript with auto sentence boundary detection
download_clip.py
Download video clip with retry logic and progress reporting
burn_subtitles.py
Generate subtitled video with hardcoded captions
utils.py
Shared utilities for timestamp parsing

Transcript Curation (CRITICAL)

Auto-generated YouTube captions lack punctuation. The transcript must be manually curated to ensure:

  1. Complete starting sentence: Must begin with a coherent thought, not mid-sentence
  2. Complete ending sentence: Must end with a complete thought, not cut off
  3. Proper formatting: Sentences on separate lines with blank lines between
  4. Punctuation added: Add periods, commas, question marks as needed

Workflow: Transcript → Video (Not the reverse!)

1. Identify target timestamps (where the insight is)
2. Run extract_transcript.py to get raw extraction + suggested video timestamps
3. MANUALLY CURATE the transcript:
   - Ensure first sentence is complete (may need to trim start)
   - Ensure last sentence is complete (may need to extend/trim end)
   - Add punctuation and formatting
   - Split into readable paragraphs
4. Use the VIDEO_START and VIDEO_END from script output
   - Video should start ~2s BEFORE first word of transcript
   - Video should end ~2s AFTER last word of transcript
5. Download video using those curated timestamps

Example Curation:

Raw extraction (bad):

Successful why do you think we're learned and it turns out that many or most of the people in The Venture business...

Curated (good):

It turns out that many or most of the people in the venture business historically would answer that question by telling you they finance the best and brightest, the greatest managers.

We do not.

We have always focused on the market - the size of the market, the dynamics of the market, the nature of the competition.

Because our objective always was to build big companies.

If you don't attack a big market, it's highly unlikely you're ever going to build a big company.

Using extract_transcript.py

# Run the script to get raw extraction and video timestamps
python3 extract_transcript.py \
    --start 00:04:00 \        # Target start (where insight begins)
    --end 00:05:08 \          # Target end (where insight ends)
    --title "Clip Title" \
    --source "Video Title" \
    --output "clip_folder/Transcript.txt" \
    --json                     # Also outputs JSON with timestamps

# Output will show:
#   VIDEO_START=00:03:58      <- Use this for video download
#   VIDEO_END=00:05:10        <- Use this for video download

Then manually edit the Transcript.txt file to curate the text before downloading the video.

Console Progress Reporting

IMPORTANT: Provide clear progress updates to the user at each stage:

[SETUP] Fetching video info...
  ✓ Video: "Title Here" (32 min)
  ✓ Output: ./clips/2024-01-01_12-00-00_video-slug/

[TRANSCRIPT] Downloading subtitles...
  ✓ Auto-generated English subtitles found
  ✓ Parsed 912 segments

[ANALYSIS] Identifying best clips...
  ✓ Found 5 high-value segments
  ✓ Selected top 2 clips

[CLIP 1/2] "Factory is the Weapon" (08:56 - 10:31)
  ✓ metadata.json created
  ✓ Transcript.txt extracted (321 words)
  ✓ Video.mp4 downloaded (14.1 MB)
  ✓ Subtitled.mp4 created (13.8 MB)

[CLIP 2/2] "Peter Thiel Always Right" (29:59 - 31:17)
  ✓ metadata.json created
  ✓ Transcript.txt extracted (307 words)
  ⚠ Download failed (403), retrying...
  ✓ Video.mp4 downloaded (10.4 MB)
  ✓ Subtitled.mp4 created (10.1 MB)

[DONE] Extracted 2 clips (2m53s total)

Retry Logic for Downloads

YouTube occasionally returns 403 errors. Always implement retry logic:

# Use the download_clip.py script with built-in retries
python3 .claude/skills/get-y2b-clips/download_clip.py \
    --url "$VIDEO_URL" \
    --start "00:08:56" \
    --end "00:10:31" \
    --output "$CLIP_DIR/Video.mp4" \
    --retries 3

Or implement inline:

import time
import subprocess

def download_with_retry(cmd, max_retries=3, delay=2):
    for attempt in range(max_retries):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"  ⚠ Retry {attempt + 1}/{max_retries}...")
        time.sleep(delay)
    return False

Complete Workflow

CRITICAL: The order of operations is WHY → TRANSCRIPT → VIDEO

The "Why" justifies the selection, the transcript defines the EXACT timestamps, and the video is downloaded to match those exact timestamps.

Phase 1: Setup

# Get video info
VIDEO_URL="USER_PROVIDED_URL"
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/:?*"<>|\\' '-')
VIDEO_DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
VIDEO_ID=$(yt-dlp --print "%(id)s" "$VIDEO_URL")

echo "Video: $VIDEO_TITLE"
echo "Duration: $((VIDEO_DURATION / 60)) minutes"

# Create output folder
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
SLUG=$(echo "$VIDEO_TITLE" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | cut -c1-50)
OUTPUT_DIR="./clips/${TIMESTAMP}_${SLUG}"
mkdir -p "$OUTPUT_DIR"

echo "Output folder: $OUTPUT_DIR"

Phase 2: Get Transcript with Exact Timestamps

Priority order: Manual subtitles → Auto-generated → Whisper

cd "$OUTPUT_DIR"

# Check available subtitles
yt-dlp --list-subs "$VIDEO_URL"

# Try manual subtitles first
if yt-dlp --write-sub --sub-langs "en" --skip-download -o "transcript" "$VIDEO_URL" 2>/dev/null; then
    echo "Manual subtitles downloaded"
elif yt-dlp --write-auto-sub --sub-langs "en" --skip-download -o "transcript" "$VIDEO_URL" 2>/dev/null; then
    echo "Auto-generated subtitles downloaded"
else
    echo "No subtitles available - Whisper transcription required"
    # Ask user before proceeding with Whisper (downloads audio)
fi

Parse VTT using the skill's Python script:

# Parse VTT file into segments.json and full_transcript.txt
python3 .claude/skills/get-y2b-clips/parse_vtt.py transcript.en.vtt

# This creates:
#   - segments.json (for precise timestamp lookup)
#   - full_transcript.txt (for reading/analysis)

Alternative inline Python (if script not available):

import re
import json

def parse_vtt(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()

    lines = content.split('\n')
    segments = []
    current_start = None
    current_end = None
    seen_text = set()

    for line in lines:
        line = line.strip()

        # Check if this is a timestamp line
        time_match = re.match(r'^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})', line)
        if time_match:
            current_start = f"{time_match.group(1)}.{time_match.group(2)}"
            current_end = f"{time_match.group(3)}.{time_match.group(4)}"
            continue

        # Skip metadata and empty lines
        if not line or line.startswith('WEBVTT') or line.startswith('Kind:') or line.startswith('Language:'):
            continue

        # Skip lines with tags (word-by-word breakdowns)
        if '<' in line:
            continue

        # This is a clean text line
        text = line.strip()
        if text and text not in seen_text and current_start:
            seen_text.add(text)
            segments.append({
                'start': current_start,
                'end': current_end,
                'text': text
            })

    return segments

segments = parse_vtt("transcript.en.vtt")

# Write full transcript with timestamps
with open('full_transcript.txt', 'w') as f:
    for seg in segments:
        f.write(f"[{seg['start']}] {seg['text']}\n")

# Write segments JSON for precise timestamp lookup
with open('segments.json', 'w') as f:
    json.dump(segments, f, indent=2)

print(f"Parsed {len(segments)} segments with exact timestamps")

Phase 3: Analyze Content & Generate "Why" FIRST

This is the critical phase - identify segments and justify selection BEFORE extracting.

Read

full_transcript.txt
and analyze using these scoring criteria:

  1. Controversy Signals (weight: 0.30)

    • "I disagree", "controversial", "unpopular opinion"
    • Strong language: "absolutely", "never", "always"
    • Debate markers: "push back", "challenge that"
  2. Insight Signals (weight: 0.35)

    • Statistics, data points, percentages
    • Predictions: "will happen", "in X years"
    • Frameworks: "the way I see it", "my model"
    • Expert knowledge, technical depth
  3. Engagement Signals (weight: 0.20)

    • Rhetorical questions
    • Stories: "let me tell you", "for example"
    • Emotional peaks, emphasis
    • Direct address: "think about it"
  4. Topic Match (weight: 0.15, or 0.40 if user specified topics)

    • Keyword presence
    • Semantic relevance

Analysis output - use EXACT timestamps from transcript:

For each identified clip, record:

  1. Title: Short descriptive name
  2. Start timestamp: EXACT timestamp from first line of segment (from segments.json)
  3. End timestamp: EXACT timestamp from last line of segment (from segments.json)
  4. Why: Full justification with scores

IMPORTANT: The start and end times MUST come from the transcript timestamps. Do not approximate or round. The video will be cut to match these exact times.

Phase 4: For Each Clip - Create Files in Order

Order: metadata.json → Transcript.txt → Video.mp4 → Subtitled.mp4

Step 1: Create metadata.json (combines "why" + transcript info)

import json

clip_metadata = {
    "title": "Clip Title",
    "source_video": "Video Title",
    "video_start": "00:12:06.000",
    "video_end": "00:14:05.000",
    "duration_seconds": 119,
    "word_count": 321,
    "transcript": "Full transcript text with proper formatting...",
    "selection_rationale": {
        "controversy": {
            "score": 8,
            "reason": "Explanation of controversy signals found"
        },
        "insight": {
            "score": 9,
            "reason": "Key insights delivered"
        },
        "engagement": {
            "score": 7,
            "reason": "Engagement signals found"
        },
        "relevance": {
            "score": 8,
            "reason": "How it relates to the main topic"
        }
    },
    "actionable_takeaway": "What viewers/investors should do with this information"
}

with open('Clip Title metadata.json', 'w') as f:
    json.dump(clip_metadata, f, indent=2)

Step 2: Create Transcript.txt (human-readable version)

Extract transcript text with a 5-second buffer before and after the video timestamps. This ensures all spoken words in the video clip are captured in the transcript (accounting for keyframe cuts).

Key rules:

  • Clean text only - no timestamps in the output
  • 5-second buffer - transcript covers slightly more than the video
  • Proper formatting - capitalize first letter of sentences, new line for each sentence
  • Readable flow - sentences separated by blank lines for easy reading

Formatting the transcript:

  1. Join all segment text together
  2. Split on sentence boundaries (. ! ?)
  3. Capitalize first letter of each sentence
  4. Write each sentence on its own line with blank line between
import json
import re

# Load segments
with open('segments.json', 'r') as f:
    segments = json.load(f)

# Define VIDEO boundaries (what will be downloaded)
video_start = "00:12:06.000"
video_end = "00:14:05.000"

# Define TRANSCRIPT boundaries (5-second buffer)
transcript_start = "00:12:01.000"  # 5s before video
transcript_end = "00:14:10.000"    # 5s after video

# Extract matching segments
clip_text = []
for seg in segments:
    if seg['start'] >= transcript_start and seg['start'] <= transcript_end:
        clip_text.append(seg['text'])

# Join and format nicely
raw_text = ' '.join(clip_text)

# Split into sentences and format
sentences = re.split(r'(?<=[.!?])\s+', raw_text)
formatted_sentences = []
for s in sentences:
    s = s.strip()
    if s:
        # Capitalize first letter
        s = s[0].upper() + s[1:] if len(s) > 1 else s.upper()
        formatted_sentences.append(s)

# Write clean, formatted transcript
with open('Clip_Title Transcript.txt', 'w') as f:
    f.write(f"Clip Title - Transcript\n")
    f.write(f"Source: Video Title\n")
    f.write(f"Video: {video_start[:8]} - {video_end[:8]}\n\n---\n\n")
    f.write('\n\n'.join(formatted_sentences))  # Each sentence on new line

Step 3: Download Video using EXACT timestamps

CRITICAL: Use the same timestamps from the transcript extraction.

START_TIME="00:12:06"  # Must match transcript start
END_TIME="00:14:05"    # Must match transcript end
CLIP_TITLE="Clip Title"
SAFE_TITLE=$(echo "$CLIP_TITLE" | tr '/:?*"<>|\\' '-')

# Create clip folder
CLIP_DIR="$OUTPUT_DIR/01_$(echo "$SAFE_TITLE" | tr '[:upper:]' '[:lower:]' | tr ' ' '_' | cut -c1-40)"
mkdir -p "$CLIP_DIR"

# Download video clip with EXACT timestamps
yt-dlp -f 'bestvideo[height<=1080]+bestaudio/best[height<=1080]' \
  --download-sections "*${START_TIME}-${END_TIME}" \
  --force-keyframes-at-cuts \
  --merge-output-format mp4 \
  -o "$CLIP_DIR/${SAFE_TITLE} Video.%(ext)s" \
  "$VIDEO_URL"

Step 4: Create Subtitled Video

ALWAYS create a subtitled version for social media sharing. Uses ffmpeg to burn subtitles directly into the video with styled captions.

# Use the burn_subtitles.py script
python3 .claude/skills/get-y2b-clips/burn_subtitles.py \
    --video "$CLIP_DIR/${SAFE_TITLE} Video.mp4" \
    --segments segments.json \
    --start "$START_TIME" \
    --end "$END_TIME" \
    --output "$CLIP_DIR/${SAFE_TITLE} Subtitled.mp4"

What it does:

  1. Reads
    segments.json
    for accurate word-level timing
  2. Generates SRT subtitles for the clip's time range
  3. Burns subtitles into video using ffmpeg with styling:
    • Large readable font (24pt)
    • Semi-transparent black background
    • White text with black outline
    • Bottom-center positioning

Subtitle options:

# Custom font size (default: 24)
--font-size 28

# Keep the SRT file (for external use)
--keep-srt

# When video has buffer before speech starts (sync fix)
--transcript-start "00:31:38.350"

# When video has buffer after speech ends
--transcript-end "00:32:50.000"

Important: If the video starts a few seconds before the actual speech, use

--transcript-start
to specify when the transcript content begins. This prevents showing subtitles for content before the clip's main topic.

Phase 5: Generate Metadata

{
  "source": {
    "url": "VIDEO_URL",
    "title": "VIDEO_TITLE",
    "video_id": "VIDEO_ID",
    "duration_seconds": DURATION
  },
  "extraction": {
    "date": "ISO_TIMESTAMP",
    "output_dir": "OUTPUT_DIR",
    "clip_count": N,
    "topic_filter": null
  },
  "clips": [
    {
      "index": 1,
      "folder": "01_clip_slug",
      "title": "Clip Title",
      "start_time": "00:12:06.000",
      "end_time": "00:14:05.000",
      "duration_seconds": 119,
      "scores": {
        "controversy": 0.9,
        "insight": 0.9,
        "engagement": 0.8,
        "overall": 0.87
      },
      "summary": "Brief description"
    }
  ]
}

Phase 6: Summary

Display to user:

  • Number of clips extracted
  • Total clip duration
  • List of clips with titles and EXACT timestamps
  • Output folder location

Timestamp Alignment Rules

These rules ensure video matches transcript exactly:

  1. Source of truth: The VTT transcript timestamps are the source of truth
  2. No rounding: Use timestamps exactly as they appear in segments.json
  3. Verify alignment: The first and last words in
    Transcript.txt
    should match the first and last words spoken in
    Video.mp4
  4. Buffer if needed: If a sentence is cut mid-word, extend to the next segment boundary

Clip Duration Guidelines

Video LengthRecommended ClipsDefault Clip Length
< 15 min1-230-60 seconds
15-30 min2-345-90 seconds
30-60 min3-460-120 seconds
1-2 hours4-590-180 seconds
> 2 hours5-790-180 seconds

Interactive Flow (Long Videos)

For videos > 30 minutes without user guidance, ask:

I found a [X] minute video. How would you like to proceed?

A) Extract top 5 clips automatically (recommended)
B) Focus on specific topics - please specify keywords
C) Extract more clips - specify how many
D) Let me scan the transcript first and suggest topics

Error Handling

IssueSolution
No subtitlesOffer Whisper with audio size warning
ffmpeg missingProvide install command
Clip download failsRetry with simpler format
-f best
Private videoInform user, cannot proceed
Very short videoSuggest 1 clip or full download
Timestamp mismatchRe-verify against segments.json

Output File Naming

  • Folder:
    YYYY-MM-DD_HH-MM-SS_<video-slug>/
  • Clips:
    NN_<clip-slug>/
  • Files per clip:
    • <Title> metadata.json
      - Complete clip info (transcript, timestamps, why selected)
    • <Title> Transcript.txt
      - Human-readable transcript with formatting
    • <Title> Video.mp4
      - Raw video clip
    • <Title> Subtitled.mp4
      - Video with burned-in captions (for social media)

Per-Clip metadata.json Structure

Each clip folder contains a

metadata.json
with all information combined:

{
  "title": "This Is NOT a Bubble",
  "source_video": "Bitcoin Fear Hits All-Time High",
  "video_url": "https://www.youtube.com/watch?v=VIDEO_ID&t=1896s",
  "video_start": "00:31:36.000",
  "video_end": "00:32:54.000",
  "transcript_start": "00:31:38.350",
  "duration_seconds": 78,
  "word_count": 218,
  "selection_rationale": {
    "controversy": {
      "score": 10,
      "reason": "Directly calls out Michael Burr and Jeff Gundlach as wrong about AI bubble"
    },
    "insight": {
      "score": 9,
      "reason": "Data-driven comparison: NDX 800% vs 100%, Nvidia $60B quarterly revenue"
    },
    "engagement": {
      "score": 9,
      "reason": "Strong emotional conviction, direct callout of famous contrarians"
    },
    "relevance": {
      "score": 9,
      "reason": "Directly addresses fear sentiment from video title"
    }
  },
  "actionable_takeaway": "Don't conflate extreme fear with bubble conditions. Look at earnings growth and historical comparisons.",
  "transcript": "Full transcript text here (always last key for readability)..."
}

Notes:

  • video_url
    : Direct link to the clip's start time in the original video (uses
    &t=XXXs
    parameter)
  • transcript_start
    : Used when video has a buffer before speech starts (ensures subtitle sync)

Example Session

User: Extract the best clips from https://www.youtube.com/watch?v=abc123

Claude:

  1. Checks dependencies (yt-dlp, ffmpeg)
  2. Gets video info: "AI Future Podcast - 1:45:00"
  3. Downloads transcript with exact timestamps
  4. Analyzes transcript, identifies top segments
  5. For each clip:
    • Creates metadata.json (transcript + selection rationale + scores)
    • Creates Transcript.txt (human-readable version)
    • Downloads video using exact timestamps
    • Creates subtitled version with burned-in captions
  6. Returns summary with folder location

User: Get 3 clips about "startup funding" from https://youtube.com/watch?v=xyz789

Claude:

  1. Same setup
  2. Filters transcript for "startup", "funding", "invest", "raise" keywords
  3. Scores segments with topic_match weighted higher
  4. For each of 3 clips: metadata.json → Transcript.txt → Video → Subtitled Video
  5. Returns summary