Claude-skill-registry ffmpeg-captions-subtitles
Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) Burning subtitles (hardcoding SRT/ASS/VTT), (2) Adding soft subtitle tracks, (3) Extracting subtitles from video, (4) Subtitle format conversion, (5) Styled captions (font, color, outline, shadow), (6) Subtitle positioning and alignment, (7) CEA-608/708 closed captions, (8) Text overlays with drawtext, (9) Whisper AI automatic transcription (FFmpeg 8.0+ with VAD, multi-language, GPU), (10) Batch subtitle processing. Provides: Format reference tables, styling parameter guide, position alignment charts, Whisper model comparison, VAD configuration, dynamic text examples, accessibility best practices. Ensures: Professional captions with proper styling and accessibility compliance.
```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or install only this skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ffmpeg-captions-subtitles" ~/.claude/skills/majiayu000-claude-skill-registry-ffmpeg-captions-subtitles && rm -rf "$T"
```
skills/data/ffmpeg-captions-subtitles/SKILL.md
CRITICAL GUIDELINES
Windows File Path Requirements
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
Documentation Guidelines
NEVER create new documentation files unless explicitly requested by the user.
Quick Reference
| Task | Command |
|---|---|
| Burn SRT | `ffmpeg -i in.mp4 -vf "subtitles=subs.srt" out.mp4` |
| Burn ASS | `ffmpeg -i in.mp4 -vf "ass=subs.ass" out.mp4` |
| Add soft sub | `ffmpeg -i in.mp4 -i subs.srt -c copy -c:s mov_text out.mp4` |
| Extract sub | `ffmpeg -i in.mkv -map 0:s:0 out.srt` |
| Style subs | `ffmpeg -i in.mp4 -vf "subtitles=subs.srt:force_style='FontSize=24'" out.mp4` |
| Text overlay | `ffmpeg -i in.mp4 -vf "drawtext=text='Hi':x=10:y=10" out.mp4` |
| Format | Extension | Best For |
|---|---|---|
| SRT | .srt | Simple, universal |
| ASS | .ass | Styled, animated, anime |
| VTT | .vtt | Web/HTML5 video |
When to Use This Skill
Use for subtitle and caption operations:
- Hardcoding (burning) subtitles into video
- Adding soft subtitle tracks to containers
- Extracting subtitles from MKV/MP4
- Styling captions (font, color, position)
- Dynamic text overlays
FFmpeg Captions and Subtitles (2025)
Complete guide to working with subtitles, closed captions, and text overlays using FFmpeg.
Subtitle Format Reference
Supported Formats
| Format | Extension | Features | Use Case |
|---|---|---|---|
| SubRip | .srt | Simple timing + text | Universal, web |
| Advanced SubStation Alpha | .ass/.ssa | Rich styling, positioning, effects | Anime, styled subs |
| WebVTT | .vtt | Web standard, cues, styling | HTML5 video |
| TTML/DFXP | .ttml/.dfxp | Broadcast, accessibility | Streaming services |
| MOV Text | .mov (embedded) | QuickTime native | Apple ecosystem |
| DVB Subtitle | (embedded) | Bitmap-based | European broadcast |
| PGS | .sup | Blu-ray bitmap subtitles | Blu-ray |
| CEA-608/708 | (embedded) | Closed captions | US broadcast, streaming |
Format Characteristics
SRT (SubRip):
- Simple text-based format
- Supports basic HTML tags (`<b>`, `<i>`, `<u>`)
- Widely compatible
- No positioning or advanced styling

ASS/SSA:
- Advanced styling (fonts, colors, outlines)
- Precise positioning anywhere on screen
- Animation and effects support
- Karaoke timing

WebVTT:
- HTML5 standard format
- CSS-like styling
- Cue settings for positioning
- Speaker identification
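A minimal SRT file makes the format concrete — a numeric cue index, a timing line with a comma before the milliseconds, then the caption text (the WebVTT equivalent adds a `WEBVTT` header line and uses a period instead of the comma). Cue text is illustrative:

```text
1
00:00:01,000 --> 00:00:04,000
Hello, <i>world</i>!

2
00:00:04,500 --> 00:00:06,000
A <b>second</b> caption.
```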
Burning Subtitles (Hardcoding)
Basic Subtitle Burn-in
```bash
# Burn SRT subtitles
ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4

# Burn ASS/SSA subtitles (preserves styling)
ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4

# Burn subtitles from MKV container
ffmpeg -i video.mkv -vf "subtitles=video.mkv" output.mp4

# Burn specific subtitle track (index 0)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:si=0" output.mp4

# Burn subtitles with stream index
ffmpeg -i video.mkv -vf "subtitles=video.mkv:stream_index=1" output.mp4
```
Styled Subtitle Burn-in
```bash
# Force style (overrides subtitle styling)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1'" \
  output.mp4

# Yellow subtitles with black outline
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=28,PrimaryColour=&H00FFFF,OutlineColour=&H000000,Outline=3'" \
  output.mp4

# Larger font for accessibility
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=36,Bold=1'" \
  output.mp4
```
ASS Style Parameters
| Parameter | Description | Example |
|---|---|---|
| FontName | Font family | `FontName=Arial` |
| FontSize | Size in points | `FontSize=24` |
| PrimaryColour | Text color (AABBGGRR) | `PrimaryColour=&H00FFFFFF` |
| SecondaryColour | Karaoke color | `SecondaryColour=&H000000FF` |
| OutlineColour | Outline/border color | `OutlineColour=&H00000000` |
| BackColour | Shadow/background color | `BackColour=&H80000000` |
| Bold | Bold text (0/1) | `Bold=1` |
| Italic | Italic text (0/1) | `Italic=1` |
| Underline | Underlined text (0/1) | `Underline=1` |
| Outline | Outline width | `Outline=2` |
| Shadow | Shadow depth | `Shadow=1` |
| Alignment | Position (numpad style) | `Alignment=2` |
| MarginL/R/V | Margins in pixels | `MarginV=30` |
Color Format (ASS)
ASS uses the &HAABBGGRR format (Alpha, Blue, Green, Red):
- White: `&HFFFFFF` or `&H00FFFFFF`
- Black: `&H000000` or `&H00000000`
- Yellow: `&H00FFFF` (00 Blue, FF Green, FF Red)
- Red: `&H0000FF`
- Blue: `&HFF0000`
- 50% transparent black: `&H80000000`
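Since the byte order is reversed relative to web-style `RRGGBB` colors, translating by hand is error-prone. A small sketch of a converter (a hypothetical bash helper, not part of FFmpeg; alpha is left at 00, i.e. opaque):

```bash
# rgb_to_ass: convert a web-style RRGGBB hex color to the ASS &HBBGGRR form.
# Hypothetical helper for building force_style strings; assumes bash.
rgb_to_ass() {
  local hex="${1#\#}"                    # strip optional leading '#'
  local rr="${hex:0:2}" gg="${hex:2:2}" bb="${hex:4:2}"
  printf '&H%s%s%s\n' "${bb^^}" "${gg^^}" "${rr^^}"   # reverse to BBGGRR
}

rgb_to_ass FFFF00   # yellow -> &H00FFFF
```

The output can be spliced directly into a `force_style` string, e.g. `PrimaryColour=$(rgb_to_ass FFFF00)`.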
Adding Subtitle Tracks (Soft Subs)
Embed SRT as Track
```bash
# Add SRT to MP4 (MOV text)
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  output.mp4

# Add SRT to MKV
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s srt \
  output.mkv

# Add SRT to WebM (WebVTT)
ffmpeg -i video.webm -i subs.srt \
  -c copy -c:s webvtt \
  output.webm
```
Multiple Subtitle Tracks
```bash
# Add multiple languages
ffmpeg -i video.mp4 -i subs_en.srt -i subs_es.srt -i subs_fr.srt \
  -map 0:v -map 0:a -map 1 -map 2 -map 3 \
  -c copy -c:s mov_text \
  -metadata:s:s:0 language=eng -metadata:s:s:0 title="English" \
  -metadata:s:s:1 language=spa -metadata:s:s:1 title="Spanish" \
  -metadata:s:s:2 language=fra -metadata:s:s:2 title="French" \
  output.mp4

# Add ASS subtitles to MKV (preserves styling)
ffmpeg -i video.mp4 -i styled.ass \
  -map 0 -map 1 \
  -c copy -c:s ass \
  output.mkv
```
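Writing the `-map`/`-metadata` pairs by hand gets tedious past a few languages. A sketch that generates them from a language list (the `subs_<lang>.srt` naming is an assumption for illustration; the function just prints the command so you can inspect it before running):

```bash
# build_sub_cmd: print an ffmpeg command embedding one subtitle track per
# language code given as an argument. Hypothetical helper; assumes bash
# and files named subs_<lang>.srt next to video.mp4.
build_sub_cmd() {
  local langs=("$@")
  local inputs=(-i video.mp4) maps=(-map 0:v -map 0:a) meta=()
  local i=0 lang
  for lang in "${langs[@]}"; do
    inputs+=(-i "subs_${lang}.srt")      # one extra input per language
    maps+=(-map $((i + 1)))              # input 0 is the video
    meta+=("-metadata:s:s:${i}" "language=${lang}")
    i=$((i + 1))
  done
  echo ffmpeg "${inputs[@]}" "${maps[@]}" -c copy -c:s mov_text "${meta[@]}" output.mp4
}

build_sub_cmd eng spa fra
```

Swap `echo` for `exec` (or pipe the output to `sh`) once the printed command looks right.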
Set Default Subtitle Track
```bash
# Set subtitle as default
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 default \
  output.mp4

# Set forced subtitles (always display)
ffmpeg -i video.mp4 -i forced.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 forced \
  output.mp4
```
Extracting Subtitles
Extract to External File
```bash
# Extract first subtitle track to SRT
ffmpeg -i video.mkv -map 0:s:0 output.srt

# Extract specific subtitle stream
ffmpeg -i video.mkv -map 0:s:1 output.srt

# Extract to ASS format
ffmpeg -i video.mkv -map 0:s:0 output.ass

# Extract to WebVTT
ffmpeg -i video.mkv -map 0:s:0 output.vtt

# Extract all subtitle tracks (the SRT muxer takes one stream per output,
# so write each track to its own file)
ffmpeg -i video.mkv -map 0:s:0 subs_0.srt -map 0:s:1 subs_1.srt
```
List Available Subtitle Tracks
```bash
# Show all streams including subtitles
ffprobe -v error -show_entries stream=index,codec_name,codec_type:stream_tags=language,title \
  -of csv=p=0 video.mkv

# Show only subtitle streams
ffprobe -v error -select_streams s \
  -show_entries stream=index,codec_name:stream_tags=language,title \
  -of csv=p=0 video.mkv
```
Convert Subtitle Formats
```bash
# SRT to ASS
ffmpeg -i subs.srt subs.ass

# ASS to SRT (loses styling)
ffmpeg -i subs.ass subs.srt

# SRT to WebVTT
ffmpeg -i subs.srt subs.vtt

# WebVTT to SRT
ffmpeg -i subs.vtt subs.srt
```
Text Overlays (drawtext)
Basic Text Overlay
```bash
# Simple text overlay
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Hello World':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Centered text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Centered Text':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white" \
  output.mp4

# Text at bottom center
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Bottom Text':x=(w-tw)/2:y=h-th-20:fontsize=32:fontcolor=white" \
  output.mp4
```
Styled Text Overlays
```bash
# Text with background box
ffmpeg -i video.mp4 \
  -vf "drawtext=text='With Background':x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black@0.5:boxborderw=5" \
  output.mp4

# Text with outline (border)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Outlined Text':x=10:y=10:fontsize=48:fontcolor=white:borderw=3:bordercolor=black" \
  output.mp4

# Shadow effect
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Shadow Text':x=10:y=10:fontsize=36:fontcolor=white:shadowcolor=black:shadowx=3:shadowy=3" \
  output.mp4

# Custom font
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Custom Font':fontfile=/path/to/font.ttf:fontsize=24:fontcolor=white" \
  output.mp4
```
Dynamic Text
```bash
# Display timestamp
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{pts\:hms}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display frame number
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Frame\: %{n}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display current date/time
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{localtime\:%Y-%m-%d %H\\\:%M\\\:%S}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Text from file (reload=1 re-reads the file each frame)
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=message.txt:x=10:y=10:fontsize=24:fontcolor=white:reload=1" \
  output.mp4
```
Animated Text
```bash
# Scrolling text (credits)
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=credits.txt:x=(w-tw)/2:y=h-t*50:fontsize=24:fontcolor=white" \
  output.mp4

# Fade in text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Fade In':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white:alpha='if(lt(t,1),t,1)'" \
  output.mp4

# Text appears at specific time (3 seconds)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Appears at 3s':x=10:y=10:fontsize=24:fontcolor=white:enable='gte(t,3)'" \
  output.mp4

# Text visible between 2-5 seconds
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Visible 2-5s':x=10:y=10:fontsize=24:fontcolor=white:enable='between(t,2,5)'" \
  output.mp4

# Blinking text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='BLINK':x=10:y=10:fontsize=36:fontcolor=red:alpha='if(mod(floor(t*2),2),1,0)'" \
  output.mp4
```
Whisper AI Integration (FFmpeg 8.0+)
FFmpeg 8.0 integrates OpenAI's Whisper speech recognition model via whisper.cpp, enabling fully local and offline automatic transcription and subtitle generation.
Features
- Speech to Text: High accuracy transcription
- Multi-language: 99 languages supported
- Flexible Output: SRT subtitles, JSON, or plain text
- GPU Acceleration: Uses system GPU by default
- Voice Activity Detection: Silero VAD support for better segmentation
- Real-time: Works with live audio streams
Generate SRT Subtitles
```bash
# Generate subtitles from video using Whisper
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:queue=3:destination=output.srt:format=srt" \
  -f null -

# Specify language for better accuracy
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:destination=english.srt:format=srt" \
  -f null -

# Use larger model for higher quality
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=auto:destination=output.srt:format=srt" \
  -f null -
```
Live Transcription
```bash
# Live transcription from microphone (Linux PulseAudio)
ffmpeg -loglevel warning -f pulse -i default \
  -af "highpass=f=200,lowpass=f=3000,whisper=model=ggml-medium-q5_0.bin:language=en:queue=10:destination=-:format=json:vad_model=for-tests-silero-v5.1.2-ggml.bin" \
  -f null -

# Live transcription from microphone (Windows)
ffmpeg -loglevel warning -f dshow -i audio="Microphone" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -

# Live transcription from microphone (macOS)
ffmpeg -loglevel warning -f avfoundation -i ":0" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -
```
Display Live Subtitles on Video
```bash
# Whisper writes to frame metadata, drawtext reads it
ffmpeg -i input.mp4 \
  -af "whisper=model=ggml-base.en.bin:language=en" \
  -vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=24:fontcolor=white:x=10:y=h-th-10:box=1:boxcolor=black@0.5:boxborderw=5" \
  output_with_subtitles.mp4

# Centered subtitles with styling
ffmpeg -i input.mp4 \
  -af "whisper=model=ggml-base.bin:language=auto" \
  -vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=32:fontcolor=white:x=(w-tw)/2:y=h-th-50:borderw=2:bordercolor=black" \
  output.mp4
```
JSON Output for Post-Processing
```bash
# Generate JSON for custom processing
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:destination=output.json:format=json" \
  -f null -
```
With Voice Activity Detection (VAD)
```bash
# Use Silero VAD for better speech detection
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=for-tests-silero-v5.1.2-ggml.bin:vad_threshold=0.5" \
  -f null -

# VAD parameters for noisy audio
ffmpeg -i noisy_video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=silero.bin:vad_threshold=0.6:vad_min_speech_duration=0.2:vad_min_silence_duration=0.3" \
  -f null -
```
Disable GPU (CPU-only processing)
```bash
# Force CPU processing (slower but works without GPU)
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:use_gpu=false:destination=output.srt:format=srt" \
  -f null -
```
Whisper Models
| Model | Size | Speed | Quality | VRAM | Recommended For |
|---|---|---|---|---|---|
| tiny | 39 MB | Fastest | Basic | ~1 GB | Quick previews, low-resource |
| base | 74 MB | Fast | Good | ~1 GB | General use, balanced |
| small | 244 MB | Medium | Better | ~2 GB | Higher accuracy |
| medium | 769 MB | Slow | High | ~5 GB | Quality-critical |
| large | 1.55 GB | Slowest | Best | ~10 GB | Maximum accuracy |
Whisper Filter Parameters
| Parameter | Description | Default |
|---|---|---|
| `model` | Path to GGML model file | Required |
| `language` | Language code or "auto" | "auto" |
| `format` | Output format: "text", "srt", "json" | "text" |
| `destination` | Output file path or "-" for stdout | Required |
| `queue` | Buffer size in seconds | 3 |
| `vad_model` | Path to Silero VAD model | None |
| `vad_threshold` | VAD sensitivity (0-1) | 0.5 |
| `vad_min_speech_duration` | Min speech duration (seconds) | 0.1 |
| `vad_min_silence_duration` | Min silence duration (seconds) | 0.5 |
| `use_gpu` | Enable GPU acceleration | true |
Model Download
Download GGML models from the whisper.cpp project:
```bash
# Download base model
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Download medium model for better quality
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
```
CEA-608/708 Closed Captions
Extract Closed Captions
```bash
# Extract CEA-608 captions from an MPEG-TS broadcast stream
ffmpeg -f lavfi -i "movie=broadcast.ts[out0+subcc]" -map 0:1 captions.srt

# Check whether a video carries embedded captions: look for
# "ATSC A53 Part 4 Closed Captions" side data on the first frames
ffprobe -v quiet -select_streams v -show_frames -read_intervals "%+#30" \
  video_with_cc.mp4 | grep -m1 "Closed Captions"
```
Add CEA-608 Captions
FFmpeg's ability to author CEA-608/708 captions is limited: encoders such as libx264 can carry existing A53 caption side data through a re-encode, but FFmpeg cannot mux an SCC file into A53 side data on its own. Use a dedicated captioning tool (for example, libcaption) to inject SCC, then re-encode with caption passthrough enabled:

```bash
# Preserve existing embedded captions through a libx264 re-encode
# (a53cc is enabled by default; shown explicitly for clarity)
ffmpeg -i video_with_cc.mp4 -c:v libx264 -a53cc 1 -c:a copy output.mp4
```
Subtitle Positioning
ASS Alignment Values
```
7 (top-left)     8 (top-center)     9 (top-right)
4 (mid-left)     5 (mid-center)     6 (mid-right)
1 (bottom-left)  2 (bottom-center)  3 (bottom-right)
```
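The numpad mapping can be wrapped in a tiny lookup so scripts use readable position names. A sketch (hypothetical helper, not an FFmpeg feature):

```bash
# align_style: map a human-readable position name to its ASS Alignment
# value (numpad layout). Hypothetical helper for composing force_style.
align_style() {
  case "$1" in
    bottom-left)  echo 1 ;;  bottom-center) echo 2 ;;  bottom-right) echo 3 ;;
    mid-left)     echo 4 ;;  mid-center)    echo 5 ;;  mid-right)    echo 6 ;;
    top-left)     echo 7 ;;  top-center)    echo 8 ;;  top-right)    echo 9 ;;
    *) echo "unknown position: $1" >&2; return 1 ;;
  esac
}

align_style top-center   # -> 8
```

Example use: `-vf "subtitles=subs.srt:force_style='Alignment=$(align_style top-center)'"`.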
Position Examples
```bash
# Top center subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=8,MarginV=20'" \
  output.mp4

# Left-aligned subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=1,MarginL=50,MarginV=30'" \
  output.mp4

# Right side for speaker identification
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4
```
Multiple Subtitle Positions
```bash
# Two subtitle tracks at different positions (chain two subtitles filters)
ffmpeg -i video.mp4 \
  -vf "subtitles=speaker1.srt:force_style='Alignment=1,MarginL=50',subtitles=speaker2.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4
```
Batch Processing
Burn Subtitles to Multiple Videos
```bash
#!/bin/bash
# burn_subs_batch.sh
for video in *.mp4; do
  base="${video%.mp4}"
  if [ -f "${base}.srt" ]; then
    ffmpeg -i "$video" \
      -vf "subtitles=${base}.srt:force_style='FontSize=24,Outline=2'" \
      -c:a copy \
      "output/${base}_subbed.mp4"
  fi
done
```
Convert Subtitle Format Batch
```bash
#!/bin/bash
# convert_subs.sh
for srt in *.srt; do
  base="${srt%.srt}"
  ffmpeg -i "$srt" "${base}.vtt"
done
```
Troubleshooting
Common Issues
"Unable to find a suitable output format"
```bash
# Specify output format explicitly
ffmpeg -i video.mkv -map 0:s:0 -f srt output.srt
```
Garbled characters in subtitles
```bash
# Declare the subtitle file's character encoding so the decoder can
# convert it correctly (use the actual source encoding, e.g. CP1252,
# when the file is not UTF-8)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:charenc=CP1252" \
  output.mp4
```
Font not found
```bash
# Specify fonts directory
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:fontsdir=/path/to/fonts" \
  output.mp4

# List available fonts
fc-list : family | sort | uniq
```
Subtitle timing offset
```bash
# Delay subtitles by 2 seconds. Note: -itsoffset is an input option and
# must come before the -i it applies to; it is not a subtitles filter option.
ffmpeg -itsoffset 2 -i subs.srt delayed.srt

# Then burn the shifted file
ffmpeg -i video.mp4 -vf "subtitles=delayed.srt" output.mp4
```
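Since SRT is plain text, the shift can also be done without FFmpeg at all. A sketch using awk (a hypothetical helper; handles whole-second shifts and does not guard against negative results):

```bash
# shift_srt: add N seconds to every timestamp line in an SRT stream.
# Usage: shift_srt SECONDS < in.srt > out.srt
shift_srt() {
  awk -v d="$1" '
    function fmt(ms) {                       # milliseconds -> HH:MM:SS,mmm
      h = int(ms / 3600000); ms %= 3600000
      m = int(ms / 60000);   ms %= 60000
      s = int(ms / 1000);    ms %= 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    / --> / {                                # timing lines only
      split($1, a, /[:,]/); split($3, b, /[:,]/)
      t1 = (a[1]*3600 + a[2]*60 + a[3]) * 1000 + a[4] + d * 1000
      t2 = (b[1]*3600 + b[2]*60 + b[3]) * 1000 + b[4] + d * 1000
      print fmt(t1), "-->", fmt(t2); next
    }
    { print }                                # indexes and text pass through
  '
}
```

Example: `shift_srt 2 < subs.srt > delayed.srt`.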
Subtitle not showing on high-res video
```bash
# Increase the rendered font size so subtitles stay legible at 4K
# (the subtitles filter renders at the video's resolution)
ffmpeg -i video_4k.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=48,Outline=3'" \
  output.mp4
```
Verification Commands
```bash
# Check if subtitles are present
ffprobe -v error -select_streams s -show_entries stream=codec_name -of default=nw=1 video.mp4

# Count subtitle cues (one timing line per cue)
grep -c " --> " subs.srt

# Validate SRT format (parse errors are reported)
ffmpeg -v error -i subs.srt -f null -
```
Best Practices
Accessibility
- Use high contrast colors (white on dark, yellow on dark)
- Minimum font size of 24pt for standard video
- Include speaker identification for multiple speakers
- Position subtitles to avoid obscuring important visual content
- Keep lines short (42 characters max per line)
- Display duration: minimum 1 second, maximum 7 seconds per caption
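These limits can be checked mechanically. A sketch computing characters-per-second, a readability metric most style guides cap around 17-20 CPS (hypothetical helper; assumes a POSIX shell with awk, and a non-zero duration):

```bash
# cps: characters-per-second of one caption. Shell arithmetic is
# integer-only, so awk does the division and formatting.
cps() {  # usage: cps "caption text" duration_in_seconds
  awk -v n="${#1}" -v d="$2" 'BEGIN { printf "%.1f\n", n / d }'
}

cps "Hello there, General Kenobi." 2   # 28 chars over 2 s -> 14.0
```

Captions scoring above ~20 CPS are strong candidates for splitting or extending.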
Quality
- Use ASS format for styled subtitles (anime, music videos)
- Use SRT for simple dialogue
- Use WebVTT for web delivery
- Preserve original styling when possible
- Test on target devices before distribution
Performance
- Burn subtitles only when necessary (streaming, compatibility)
- Prefer soft subs for archival and flexibility
- Use hardware encoding when burning subtitles to large files
- Process in parallel for batch operations
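The parallel-processing point can be sketched with `xargs -P`, which caps concurrent jobs (here 4). The leading `echo` makes it a dry run that only prints each command; delete it to execute. Episode names are illustrative:

```bash
# Convert many subtitle files in parallel, 4 jobs at a time.
printf '%s\n' ep1 ep2 ep3 | xargs -P 4 -I{} \
  echo ffmpeg -i {}.srt {}.vtt
```

In a real batch you would feed the list from a glob, e.g. `ls *.srt | sed 's/\.srt$//'`. Note that with `-P` the output lines may interleave in any order.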
This guide covers FFmpeg subtitle and caption operations. For text overlays with shapes and graphics, see the shapes-graphics skill.