Awesome-omni-skill caption-clip

Download and caption YouTube clips. Use when the user asks to "caption a YouTube video", "download and add subtitles", "create captioned clips", "transcribe and caption a video", or mentions yt-dlp captioning workflows.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/caption-clip" ~/.claude/skills/diegosouzapw-awesome-omni-skill-caption-clip && rm -rf "$T"
manifest: skills/tools/caption-clip/SKILL.md
source content

Caption YouTube Clips

Download YouTube video clips with timestamps, transcribe with Deepgram, and burn styled captions.

Prerequisites

  • yt-dlp installed
  • ffmpeg installed
  • DEEPGRAM_API_KEY in .env file (or environment variable)
  • Avenir Next font (falls back to Arial if unavailable)

Workflow

Step 1: Download Clip

Download a specific section of a YouTube video using yt-dlp:

yt-dlp --download-sections "*START-END" -o "clip_name.%(ext)s" "YOUTUBE_URL"

Example with timestamps

02:53-04:15
:

yt-dlp --download-sections "*02:53-04:15" -o "clip_intro.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"

Step 2: Convert to MP4

Ensure consistent encoding with ffmpeg defaults:

ffmpeg -i clip_name.webm -y clip_name.mp4

Step 3: Transcribe with Deepgram

Load API key from .env and call Deepgram API:

DEEPGRAM_API_KEY=$(grep '^DEEPGRAM_API_KEY=' .env | cut -d'=' -f2 | tr -d '"' | tr -d "'")

curl -s --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&punctuate=true&utterances=true' \
  --header "Authorization: Token $DEEPGRAM_API_KEY" \
  --header 'Content-Type: audio/mp4' \
  --data-binary @clip_name.mp4 > clip_transcript.json

Step 4: Generate SRT

Convert Deepgram JSON to SRT format using bundled script:

python3 ~/.claude/skills/caption-clip/scripts/json-to-srt.py clip_transcript.json clip.srt

Step 5: Clean Up Transcription

Read the generated SRT file and create a cleaned version (

clip_clean.srt
):

  1. Fix transcription errors - Correct misheard words
  2. Remove filler words - Delete: "like", "you know", "I'd say", "yeah", "um", "uh", "sort of", "kind of"
  3. Consolidate fragments - Merge short utterances into complete sentences
  4. Improve readability - Adjust phrasing to flow better when read vs. heard
  5. Fix timestamps - Ensure no invalid times (e.g.,
    00:00:60
    should be
    00:01:00
    )

Write the cleaned content to

clip_clean.srt
.

Step 6: Burn Captions

Apply styled subtitles with semi-transparent background:

ffmpeg -i clip_name.mp4 \
  -vf "subtitles=clip_clean.srt:force_style='FontSize=24,FontName=Avenir Next Medium,PrimaryColour=&H00FFFFFF,BackColour=&H80000000,BorderStyle=4,Outline=0,Shadow=0,MarginV=30,MarginL=40,MarginR=40'" \
  -c:a copy -y clip_captioned.mp4

Caption Styling Reference

ParameterValueDescription
FontSize24Point size
FontNameAvenir Next MediumClean sans-serif (use Arial as fallback)
PrimaryColour&H00FFFFFFWhite text (AABBGGRR format)
BackColour&H8000000050% transparent black background
BorderStyle4Opaque box using BackColour
MarginV30Vertical margin from bottom
MarginL/R40Horizontal margins

Alternative Fonts

If Avenir Next is unavailable, these work well for captions:

  • SF Pro Display
  • Helvetica Neue
  • Futura
  • Arial (universal fallback)

Check available fonts:

fc-list : family | sort -u | grep -iE '(avenir|sf pro|helvetica|futura)'

Output Files

After completing the workflow:

FileDescription
clip_name.mp4
Original downloaded clip
clip_transcript.json
Raw Deepgram transcription (preserved for re-runs)
clip.srt
Auto-generated subtitles
clip_clean.srt
Cleaned/edited subtitles
clip_captioned.mp4
Final video with burned-in captions