Claude-skill-registry ffmpeg-python-integration-reference

Authoritative Python-FFmpeg parameter integration reference ensuring type safety, accurate parameter mappings, and proper unit conversions. PROACTIVELY activate for: (1) ffmpeg-python library usage, (2) Python subprocess FFmpeg calls, (3) Caption/subtitle parameter mapping (drawtext, ASS), (4) Color format conversions (BGR, RGB, ABGR, ASS &HAABBGGRR), (5) Time unit conversions (seconds, centiseconds, milliseconds), (6) Type safety validation (int, float, string), (7) Coordinate systems, (8) Parameter range enforcement, (9) Frame pipe handling, (10) Error detection for type mismatches. Provides: Complete parameter type reference, color format conversion tables, time unit conversion formulas, validation patterns, working Python examples with proper typing.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ffmpeg-python-integration-reference" ~/.claude/skills/majiayu000-claude-skill-registry-ffmpeg-python-integration-reference && rm -rf "$T"
manifest: skills/data/ffmpeg-python-integration-reference/SKILL.md
source content

CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).


Python-FFmpeg Integration Reference (2025-2026)

Complete type-safe parameter mapping reference for integrating FFmpeg with Python.

Quick Reference

| FFmpeg Parameter | Python Type | Range/Format | Common Mistake |
|---|---|---|---|
| -crf | int or str | 0-51 (H.264/H.265) | Using float: crf=18.5 |
| -b:v | str | "5M", "1000k" | Using int: b:v=5000000 |
| fontsize | int | 1-999 | Using str: fontsize="24" |
| fontcolor | str | "white", "#FFFFFF", "0xFFFFFF" | Wrong format: fontcolor="255,255,255" |
| ASS PrimaryColour | str | "&HAABBGGRR" | Using RGB order: &H00FF0000 for red ❌ (that is blue in ASS; red is &H000000FF) |
| alpha | str (expression) | "0.5", "min(1,t/2)" | Using float: alpha=0.5 |
| x, y | str or int | "100", "(w-tw)/2" | Forgetting quotes on expressions ❌ |

Section 1: Color Format Conversions

Critical: FFmpeg Color Format Landscape

| Context | Format | Byte Order | Example | Python Type |
|---|---|---|---|---|
| FFmpeg drawtext | Named/Hex | RGB | "white", "#FFFFFF", "0xFFFFFF" | str |
| ASS PrimaryColour | &HAABBGGRR | ABGR | &H00FFFFFF (white) | str |
| ASS OutlineColour | &HAABBGGRR | ABGR | &H00000000 (black) | str |
| OpenCV cv2 | Array | BGR | [255, 255, 255] | np.ndarray |
| PIL/Pillow | Tuple/Hex | RGB | (255, 255, 255), "#FFFFFF" | tuple or str |
| NumPy | Array | RGB or BGR | Depends on source | np.ndarray |

Color Conversion Functions (Python)

from typing import Tuple

def rgb_to_bgr_hex(r: int, g: int, b: int) -> str:
    """
    Convert RGB (0-255) to BGR hex string.
    Used for OpenCV color specifications.

    Args:
        r, g, b: Red, Green, Blue (0-255)

    Returns:
        Hex string in BGR order: "0xBBGGRR"
    """
    return f"0x{b:02X}{g:02X}{r:02X}"

def rgb_to_ass_color(r: int, g: int, b: int, alpha: int = 0) -> str:
    """
    Convert RGB to ASS/SSA color format (&HAABBGGRR).

    CRITICAL: ASS uses BGR order, not RGB!

    Args:
        r, g, b: Red, Green, Blue (0-255)
        alpha: Alpha channel (0=opaque, 255=transparent)

    Returns:
        ASS color string: "&HAABBGGRR"

    Examples:
        >>> rgb_to_ass_color(255, 255, 255)  # White
        '&H00FFFFFF'
        >>> rgb_to_ass_color(255, 0, 0)      # Red
        '&H000000FF'
        >>> rgb_to_ass_color(0, 255, 0)      # Green
        '&H0000FF00'
        >>> rgb_to_ass_color(0, 0, 255)      # Blue
        '&H00FF0000'
    """
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"

def ass_color_to_rgb(ass_color: str) -> Tuple[int, int, int, int]:
    """
    Parse ASS color format (&HAABBGGRR) to RGBA.

    Args:
        ass_color: ASS color string like "&H00FFFFFF"

    Returns:
        Tuple (r, g, b, alpha)
    """
    # Remove &H prefix
    hex_val = ass_color.replace("&H", "").replace("&h", "")

    # Pad to 8 characters if needed (some formats omit alpha)
    hex_val = hex_val.zfill(8)

    # Extract AABBGGRR
    alpha = int(hex_val[0:2], 16)
    blue = int(hex_val[2:4], 16)
    green = int(hex_val[4:6], 16)
    red = int(hex_val[6:8], 16)

    return (red, green, blue, alpha)

def ffmpeg_color_to_rgb(color: str) -> Tuple[int, int, int]:
    """
    Parse FFmpeg named or hex color to RGB.

    Args:
        color: Named color ("white") or hex ("#FFFFFF", "0xFFFFFF")

    Returns:
        Tuple (r, g, b)
    """
    # Named colors (subset)
    named_colors = {
        "white": (255, 255, 255),
        "black": (0, 0, 0),
        "red": (255, 0, 0),
        "green": (0, 255, 0),
        "blue": (0, 0, 255),
        "yellow": (255, 255, 0),
        "cyan": (0, 255, 255),
        "magenta": (255, 0, 255),
    }

    if color.lower() in named_colors:
        return named_colors[color.lower()]

    # Parse hex
    hex_val = color.replace("#", "").replace("0x", "")
    r = int(hex_val[0:2], 16)
    g = int(hex_val[2:4], 16)
    b = int(hex_val[4:6], 16)

    return (r, g, b)

# Common color presets (ASS format)
ASS_COLORS = {
    "white": "&H00FFFFFF",
    "black": "&H00000000",
    "red": "&H000000FF",
    "green": "&H0000FF00",
    "blue": "&H00FF0000",
    "yellow": "&H0000FFFF",
    "cyan": "&H00FFFF00",
    "magenta": "&H00FF00FF",
    "orange": "&H0000A5FF",  # RGB(255, 165, 0)
    "purple": "&H00800080",  # RGB(128, 0, 128)
}

# Transparency examples (ASS alpha channel)
ASS_ALPHA = {
    "opaque": 0x00,        # Fully opaque (0%)
    "transparent_10": 0x1A,  # 10% transparent
    "transparent_25": 0x40,  # 25% transparent
    "transparent_50": 0x80,  # 50% transparent (common for shadows)
    "transparent_75": 0xBF,  # 75% transparent
    "invisible": 0xFF,     # Fully transparent (100%)
}

Color Format Comparison Chart

| Color | RGB | Hex (RGB) | ASS (&HAABBGGRR) | OpenCV BGR Array |
|---|---|---|---|---|
| White | (255,255,255) | #FFFFFF | &H00FFFFFF | [255,255,255] |
| Black | (0,0,0) | #000000 | &H00000000 | [0,0,0] |
| Red | (255,0,0) | #FF0000 | &H000000FF | [0,0,255] |
| Green | (0,255,0) | #00FF00 | &H0000FF00 | [0,255,0] |
| Blue | (0,0,255) | #0000FF | &H00FF0000 | [255,0,0] |
| Yellow | (255,255,0) | #FFFF00 | &H0000FFFF | [0,255,255] |
| Cyan | (0,255,255) | #00FFFF | &H00FFFF00 | [255,255,0] |
| Magenta | (255,0,255) | #FF00FF | &H00FF00FF | [255,0,255] |
| Orange | (255,165,0) | #FFA500 | &H0000A5FF | [0,165,255] |
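
Each chart row can be derived from a single RGB triple, which makes the chart easy to cross-check. The following is a minimal sketch, assuming a hypothetical helper named `color_row` that is not part of the original reference; it only restates the byte orders listed in the chart.

```python
from typing import Dict


def color_row(r: int, g: int, b: int, alpha: int = 0) -> Dict[str, object]:
    """Derive every chart column from one RGB triple.

    `color_row` is a hypothetical helper written for this illustration.
    """
    for name, value in (("r", r), ("g", g), ("b", b), ("alpha", alpha)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must be 0-255, got {value}")
    return {
        "rgb": (r, g, b),
        "hex_rgb": f"#{r:02X}{g:02X}{b:02X}",           # RGB order
        "ass": f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}",  # alpha byte, then BGR
        "opencv_bgr": [b, g, r],                        # BGR order
    }


orange = color_row(255, 165, 0)
print(orange["hex_rgb"], orange["ass"], orange["opencv_bgr"])
```

Only the hex (RGB) column keeps the channels in RGB order; the ASS and OpenCV columns both swap red and blue.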

Section 2: Time Unit Conversions

Critical: Three Different Time Systems

| Context | Unit | Python Type | Conversion Formula | Example |
|---|---|---|---|---|
| FFmpeg filters (fade, xfade) | Seconds | float or int | N/A | duration=1.5 |
| ASS karaoke (\k, \kf, \ko) | Centiseconds | int | cs = seconds * 100 | {\k50} = 0.5s |
| ASS animation (\t, \fad, \move) | Milliseconds | int | ms = seconds * 1000 | \t(0,500,...) = 0.5s |

Time Conversion Functions

from typing import Union

def seconds_to_centiseconds(seconds: float) -> int:
    """
    Convert seconds to centiseconds for ASS karaoke tags.

    Args:
        seconds: Time in seconds

    Returns:
        Centiseconds (1/100 second)

    Examples:
        >>> seconds_to_centiseconds(0.5)
        50
        >>> seconds_to_centiseconds(1.0)
        100
        >>> seconds_to_centiseconds(2.5)
        250
    """
    # round() rather than int(): float error can leave the product just below a whole number
    return round(seconds * 100)

def seconds_to_milliseconds(seconds: float) -> int:
    """
    Convert seconds to milliseconds for ASS animation tags.

    Args:
        seconds: Time in seconds

    Returns:
        Milliseconds (1/1000 second)

    Examples:
        >>> seconds_to_milliseconds(0.5)
        500
        >>> seconds_to_milliseconds(1.0)
        1000
        >>> seconds_to_milliseconds(0.2)
        200
    """
    # round() rather than int(): float error can leave the product just below a whole number
    return round(seconds * 1000)

def centiseconds_to_seconds(centiseconds: int) -> float:
    """
    Convert ASS karaoke centiseconds to seconds.

    Args:
        centiseconds: Duration in centiseconds

    Returns:
        Seconds
    """
    return centiseconds / 100.0

def milliseconds_to_seconds(milliseconds: int) -> float:
    """
    Convert ASS animation milliseconds to seconds.

    Args:
        milliseconds: Duration in milliseconds

    Returns:
        Seconds
    """
    return milliseconds / 1000.0

def format_ass_timestamp(seconds: float) -> str:
    """
    Format seconds as ASS timestamp (H:MM:SS.CC).

    Args:
        seconds: Time in seconds

    Returns:
        ASS timestamp string

    Examples:
        >>> format_ass_timestamp(1.5)
        '0:00:01.50'
        >>> format_ass_timestamp(65.25)
        '0:01:05.25'
        >>> format_ass_timestamp(3661.0)
        '1:01:01.00'
    """
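    # Round to whole centiseconds first so float error cannot drop a unit
    total_cs = round(seconds * 100)
    total_secs, centis = divmod(total_cs, 100)
    hours, remainder = divmod(total_secs, 3600)
    minutes, secs = divmod(remainder, 60)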

    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"

def parse_ass_timestamp(timestamp: str) -> float:
    """
    Parse ASS timestamp to seconds.

    Args:
        timestamp: ASS timestamp like "0:00:05.50"

    Returns:
        Time in seconds

    Examples:
        >>> parse_ass_timestamp("0:00:01.50")
        1.5
        >>> parse_ass_timestamp("0:01:05.25")
        65.25
    """
    parts = timestamp.split(":")
    hours = int(parts[0])
    minutes = int(parts[1])
    sec_parts = parts[2].split(".")
    seconds = int(sec_parts[0])
    centiseconds = int(sec_parts[1]) if len(sec_parts) > 1 else 0

    total = hours * 3600 + minutes * 60 + seconds + centiseconds / 100.0
    return total

# Quick conversion constants
SECOND_TO_CS = 100      # Centiseconds per second
SECOND_TO_MS = 1000     # Milliseconds per second
CS_TO_MS = 10           # Milliseconds per centisecond

Common Duration Mappings

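| Human Readable | Seconds | Centiseconds (ASS \k) | Milliseconds (ASS \t) |
|---|---|---|---|
| 100ms (flash) | 0.1 | 10 | 100 |
| 250ms (quick) | 0.25 | 25 | 250 |
| 500ms (half second) | 0.5 | 50 | 500 |
| 1 second | 1.0 | 100 | 1000 |
| 1.5 seconds | 1.5 | 150 | 1500 |
| 2 seconds | 2.0 | 200 | 2000 |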
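
The mapping rows can be generated rather than typed by hand. This is a minimal sketch, assuming a hypothetical helper `to_ass_units`; it rounds instead of truncating, because a product such as `0.29 * 100` can land slightly below the intended whole number in floating point.

```python
from typing import Tuple


def to_ass_units(seconds: float) -> Tuple[int, int]:
    """Return (centiseconds, milliseconds) for a duration in seconds.

    `to_ass_units` is a hypothetical helper written for this illustration.
    """
    if seconds < 0:
        raise ValueError(f"duration must be non-negative, got {seconds}")
    # round() guards against binary float error that int() would truncate
    return round(seconds * 100), round(seconds * 1000)


for duration in (0.1, 0.25, 0.5, 1.0, 1.5, 2.0):
    cs, ms = to_ass_units(duration)
    print(f"{duration}s -> \\k{cs} / {ms} ms")
```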

Section 3: FFmpeg drawtext Parameters

Complete Parameter Type Reference

Text Content

| Parameter | Python Type | Description | Example |
|---|---|---|---|
| text | str | Text to display | text='Hello World' |
| textfile | str (path) | Path to text file | textfile='/path/to/text.txt' |

Font Parameters

| Parameter | Python Type | Range/Format | Validation | Example |
|---|---|---|---|---|
| fontfile | str (path) | Absolute path to .ttf/.otf | File exists | fontfile='/fonts/Arial.ttf' |
| fontsize | int | 1-999 (practical: 12-200) | 1 <= size <= 999 | fontsize=48 |
| fontcolor | str | Named or hex RGB | Valid color | fontcolor='white' |
| fontcolor_expr | str (expression) | Dynamic color expression | Valid FFmpeg expr | fontcolor_expr='0xFFFFFF' |

CRITICAL: fontsize MUST be int, not str. Common error:

# ❌ WRONG:
drawtext(fontsize="24")

# ✅ CORRECT:
drawtext(fontsize=24)

Position Parameters

| Parameter | Python Type | Format | Example |
|---|---|---|---|
| x | str or int | Pixel value or expression | x=10 or x='(w-tw)/2' |
| y | str or int | Pixel value or expression | y=50 or y='h-th-20' |

Position Expression Variables:

  • w: Video width (pixels)
  • h: Video height (pixels)
  • tw: Text width (pixels)
  • th: Text height (pixels)
  • t: Time in seconds

# Static position (int)
x=100, y=50

# Dynamic position (str expression)
x='(w-tw)/2'  # Centered horizontally
y='h-th-20'   # 20px from bottom

# Time-based animation (str expression)
x='w-mod(t*100,w+tw)'  # Scrolling ticker
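
Common anchor positions follow directly from the variables listed above. The sketch below assumes a hypothetical helper `drawtext_position` (the name, the anchor keywords and the default margin are illustrative, not part of drawtext); it only assembles expression strings for FFmpeg to evaluate.

```python
from typing import Tuple


def drawtext_position(
    horizontal: str = "center",
    vertical: str = "bottom",
    margin: int = 20,
) -> Tuple[str, str]:
    """Build (x, y) drawtext expressions for a named anchor.

    Hypothetical helper: the anchor names are an assumption of this sketch.
    """
    if margin < 0:
        raise ValueError(f"margin must be non-negative, got {margin}")
    x_exprs = {
        "left": str(margin),
        "center": "(w-tw)/2",
        "right": f"w-tw-{margin}",
    }
    y_exprs = {
        "top": str(margin),
        "middle": "(h-th)/2",
        "bottom": f"h-th-{margin}",
    }
    if horizontal not in x_exprs or vertical not in y_exprs:
        raise ValueError(f"unknown anchor: {horizontal!r}, {vertical!r}")
    return x_exprs[horizontal], y_exprs[vertical]


x, y = drawtext_position("center", "bottom", margin=20)
print(x, y)
```

Returning strings in every case, including the plain pixel offsets, keeps the result uniform whether or not an expression is involved.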

Styling Parameters

| Parameter | Python Type | Range | Example |
|---|---|---|---|
| borderw | int | 0-20 (practical) | borderw=3 |
| bordercolor | str | Named or hex RGB | bordercolor='black' |
| shadowx | int | -50 to 50 (practical) | shadowx=2 |
| shadowy | int | -50 to 50 (practical) | shadowy=2 |
| shadowcolor | str | Named or hex RGB | shadowcolor='black' |
| box | int | 0 (off) or 1 (on) | box=1 |
| boxcolor | str | Named/hex + alpha | boxcolor='black@0.5' |
| boxborderw | int | 0-50 | boxborderw=5 |

Alpha Transparency in Colors:

# Format: "color@opacity"
# opacity: 0.0 (transparent) to 1.0 (opaque)

boxcolor='black@0.5'      # 50% transparent black
shadowcolor='red@0.3'     # 30% opaque red
fontcolor='white@0.8'     # 80% opaque white
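
A small formatter can keep the opacity suffix within range. This is a sketch only, assuming a hypothetical helper `color_with_opacity`; it does not check that the base color is one FFmpeg recognizes.

```python
def color_with_opacity(color: str, opacity: float) -> str:
    """Format an FFmpeg color with an opacity suffix, e.g. 'black@0.5'.

    Hypothetical helper written for this illustration.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError(f"opacity must be 0.0-1.0, got {opacity}")
    if "@" in color:
        raise ValueError(f"color already carries an opacity suffix: {color}")
    # :g drops trailing zeros, so 0.50 is written as 0.5 and 1.0 as 1
    return f"{color}@{opacity:g}"


print(color_with_opacity("black", 0.5))
print(color_with_opacity("#FFFFFF", 0.8))
```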

Timing Parameters

| Parameter | Python Type | Format | Example |
|---|---|---|---|
| enable | str (expression) | Boolean expression | enable='gte(t,2)' |
| alpha | str (expression) | 0.0-1.0 | alpha='min(1,t/2)' |

Timing Expressions:

# Show after 2 seconds
enable='gte(t,2)'

# Show between 2-5 seconds
enable='between(t,2,5)'

# Fade in over 1 second
alpha='min(1,t)'

# Fade out over last 2 seconds (10s video)
alpha='if(gt(t,8),1-(t-8)/2,1)'
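
The fade-in and fade-out expressions above can be combined into one alpha expression for text that is visible during a time window. The following is a minimal sketch, assuming a hypothetical builder `fade_alpha_expr`; it uses only the `if` and `lt` expression functions. On a raw command line the commas inside the expression generally need escaping or quoting.

```python
def fade_alpha_expr(
    start: float,
    end: float,
    fade_in: float = 0.5,
    fade_out: float = 0.5,
) -> str:
    """Build an alpha expression: 0 before start, ramp up, hold 1, ramp down, 0 after end.

    Hypothetical helper written for this illustration; all times are in seconds.
    """
    if fade_in <= 0 or fade_out <= 0:
        raise ValueError("fade durations must be positive")
    if end - start < fade_in + fade_out:
        raise ValueError("window is shorter than the two fades combined")

    def n(value: float) -> str:
        # Compact number formatting: 2 -> '2', 2.5 -> '2.5'
        return f"{value:g}"

    return (
        f"if(lt(t,{n(start)}),0,"
        f"if(lt(t,{n(start + fade_in)}),(t-{n(start)})/{n(fade_in)},"
        f"if(lt(t,{n(end - fade_out)}),1,"
        f"if(lt(t,{n(end)}),({n(end)}-t)/{n(fade_out)},0))))"
    )


print(fade_alpha_expr(2, 8, fade_in=1, fade_out=2))
```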

Section 4: ASS/SSA Subtitle Parameters

ASS Style Definition

from typing import NamedTuple

class ASSStyle(NamedTuple):
    """Type-safe ASS style definition."""
    name: str
    fontname: str
    fontsize: int
    primary_colour: str      # &HAABBGGRR format
    secondary_colour: str    # &HAABBGGRR format
    outline_colour: str      # &HAABBGGRR format
    back_colour: str         # &HAABBGGRR format (shadow)
    bold: int                # 0 or -1 (ASS convention: -1 = true)
    italic: int              # 0 or -1
    underline: int           # 0 or -1
    strikeout: int           # 0 or -1
    scale_x: int             # Percentage (100 = normal)
    scale_y: int             # Percentage (100 = normal)
    spacing: int             # Letter spacing in pixels
    angle: float             # Rotation angle in degrees
    border_style: int        # 1 (outline) or 3 (opaque box)
    outline: float           # Outline width (0.0-4.0)
    shadow: float            # Shadow distance (0.0-4.0)
    alignment: int           # Numpad alignment (1-9)
    margin_l: int            # Left margin (pixels)
    margin_r: int            # Right margin (pixels)
    margin_v: int            # Vertical margin (pixels)
    encoding: int            # Font charset ID (1 = default charset), not the file's text encoding

def ass_style_to_string(style: ASSStyle) -> str:
    """
    Convert ASSStyle to ASS format string.

    Returns:
        ASS Style line
    """
    return (
        f"Style: {style.name},"
        f"{style.fontname},{style.fontsize},"
        f"{style.primary_colour},{style.secondary_colour},"
        f"{style.outline_colour},{style.back_colour},"
        f"{style.bold},{style.italic},{style.underline},{style.strikeout},"
        f"{style.scale_x},{style.scale_y},{style.spacing},{style.angle},"
        f"{style.border_style},{style.outline},{style.shadow},"
        f"{style.alignment},"
        f"{style.margin_l},{style.margin_r},{style.margin_v},"
        f"{style.encoding}"
    )

# Example usage
karaoke_style = ASSStyle(
    name="Karaoke",
    fontname="Arial Black",
    fontsize=72,
    primary_colour="&H00FFFFFF",  # White: text once its karaoke time has elapsed (highlighted)
    secondary_colour="&H000000FF",  # Red: text before it is highlighted
    outline_colour="&H00000000",  # Black outline
    back_colour="&H80000000",     # 50% transparent black shadow
    bold=-1,                      # Bold (ASS uses -1 for true)
    italic=0,
    underline=0,
    strikeout=0,
    scale_x=100,
    scale_y=100,
    spacing=0,
    angle=0.0,
    border_style=1,               # Outline + shadow
    outline=3.0,                  # 3px outline
    shadow=2.0,                   # 2px shadow
    alignment=2,                  # Bottom center
    margin_l=10,
    margin_r=10,
    margin_v=50,                  # 50px from bottom
    encoding=1                    # Default font charset
)

print(ass_style_to_string(karaoke_style))

ASS Parameter Ranges and Types

| Parameter | Python Type | Range | Unit | Notes |
|---|---|---|---|---|
| fontsize | int | 1-999 | Points | Screen-relative |
| primary_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Text color |
| secondary_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Karaoke color before highlight |
| outline_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Border color |
| back_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Shadow color |
| bold | int | -1 (on), 0 (off) | Boolean | ASS convention: -1 = true |
| italic | int | -1 (on), 0 (off) | Boolean | Same convention as bold |
| scale_x, scale_y | int | 1-1000 | Percentage | 100 = normal |
| outline | float | 0.0-4.0 | Pixels | Border width |
| shadow | float | 0.0-4.0 | Pixels | Shadow offset |
| alignment | int | 1-9 | Numpad | See alignment chart |

ASS Alignment (Numpad Layout)

7 (top-left)      8 (top-center)      9 (top-right)
4 (middle-left)   5 (middle-center)   6 (middle-right)
1 (bottom-left)   2 (bottom-center)   3 (bottom-right)
ASS_ALIGNMENT = {
    "bottom_left": 1,
    "bottom_center": 2,
    "bottom_right": 3,
    "middle_left": 4,
    "middle_center": 5,
    "middle_right": 6,
    "top_left": 7,
    "top_center": 8,
    "top_right": 9,
}

Section 5: ASS Karaoke Tags

Karaoke Tag Reference

| Tag | Name | Unit | Python Type | Range | Effect |
|---|---|---|---|---|---|
| \k | Karaoke | Centiseconds | int | 0-9999 | Instant highlight |
| \kf / \K | Karaoke Fill | Centiseconds | int | 0-9999 | Progressive fill |
| \ko | Karaoke Outline | Centiseconds | int | 0-9999 | Outline sweep |

def generate_karaoke_line(
    words: list[str],
    durations: list[float],  # In SECONDS
    style: str = "Karaoke"
) -> str:
    """
    Generate ASS karaoke dialogue line.

    Args:
        words: List of words/syllables
        durations: Duration for each word IN SECONDS
        style: ASS style name

    Returns:
        ASS dialogue line with karaoke tags

    Example:
        >>> generate_karaoke_line(
        ...     ["Hello", "world"],
        ...     [0.5, 0.6]
        ... )
        '{\\k50}Hello {\\k60}world'
    """
    # Convert seconds to centiseconds
    karaoke_tags = []
    for word, duration_sec in zip(words, durations):
        cs = round(duration_sec * 100)  # Centiseconds (round, not truncate)
        karaoke_tags.append(f"{{\\k{cs}}}{word}")

    return " ".join(karaoke_tags)

# Example usage
words = ["Never", "gonna", "give", "you", "up"]
durations = [0.8, 0.6, 0.6, 0.5, 0.7]  # seconds

karaoke_text = generate_karaoke_line(words, durations)
print(karaoke_text)
# Output: {\k80}Never {\k60}gonna {\k60}give {\k50}you {\k70}up

Section 6: ASS Animation Tags

Animation Tag Reference

| Tag | Format | Unit | Example | Description |
|---|---|---|---|---|
| \t | \t(t1,t2,tags) | Milliseconds | \t(0,500,\fscx120) | Animate over time |
| \fad | \fad(in,out) | Milliseconds | \fad(300,200) | Fade in/out |
| \move | \move(x1,y1,x2,y2,t1,t2) | Milliseconds | \move(0,0,100,100,0,1000) | Move position |
| \fscx, \fscy | \fscxN, \fscyN | Percentage | \fscx120\fscy120 | Scale X/Y |
| \frz | \frzN | Degrees | \frz45 | Rotate (Z-axis) |
| \c, \1c | \c&HBBGGRR& | BGR hex (no alpha byte) | \c&HFF0000& | Primary color (example is blue) |
| \3c | \3c&HBBGGRR& | BGR hex (no alpha byte) | \3c&H000000& | Outline color |
| \4c | \4c&HBBGGRR& | BGR hex (no alpha byte) | \4c&H808080& | Shadow color |

from typing import List, Tuple

def create_scale_animation(
    duration_ms: int,
    start_scale: int = 80,
    peak_scale: int = 120,
    end_scale: int = 100
) -> str:
    """
    Create bounce scale animation (pop effect).

    Args:
        duration_ms: Total animation duration in MILLISECONDS
        start_scale: Initial scale percentage
        peak_scale: Peak scale (overshoot)
        end_scale: Final settled scale

    Returns:
        ASS animation tags

    Example:
        >>> create_scale_animation(400)
        '\\fscx80\\fscy80\\t(0,150,\\fscx120\\fscy120)\\t(150,300,\\fscx105\\fscy105)\\t(300,400,\\fscx100\\fscy100)'
    """
    t1 = int(duration_ms * 0.375)  # 37.5% to peak
    t2 = int(duration_ms * 0.75)   # 75% to settle
    t3 = duration_ms

    mid_scale = int((peak_scale + end_scale) / 2) - 5

    return (
        f"\\fscx{start_scale}\\fscy{start_scale}"
        f"\\t(0,{t1},\\fscx{peak_scale}\\fscy{peak_scale})"
        f"\\t({t1},{t2},\\fscx{mid_scale}\\fscy{mid_scale})"
        f"\\t({t2},{t3},\\fscx{end_scale}\\fscy{end_scale})"
    )

def create_fade_animation(
    fade_in_ms: int,
    fade_out_ms: int = 0
) -> str:
    """
    Create fade in/out animation.

    Args:
        fade_in_ms: Fade in duration in MILLISECONDS
        fade_out_ms: Fade out duration in MILLISECONDS (0 = no fade out)

    Returns:
        ASS fade tag

    Example:
        >>> create_fade_animation(300, 200)
        '\\fad(300,200)'
    """
    return f"\\fad({fade_in_ms},{fade_out_ms})"

def create_color_transition(
    start_color_rgb: Tuple[int, int, int],
    end_color_rgb: Tuple[int, int, int],
    duration_ms: int
) -> str:
    """
    Create smooth color transition animation.

    Args:
        start_color_rgb: Starting RGB color
        end_color_rgb: Ending RGB color
        duration_ms: Transition duration in MILLISECONDS

    Returns:
        ASS animation tags

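    Example:
        >>> create_color_transition((255,255,255), (255,0,0), 1000)
        '\\c&HFFFFFF&\\t(0,1000,\\c&H0000FF&)'
    """
    # Inline colour tags take &HBBGGRR& with no alpha byte, so build them
    # directly instead of slicing an &HAABBGGRR string (which would cut off red).
    def to_inline(rgb: Tuple[int, int, int]) -> str:
        r, g, b = rgb
        return f"&H{b:02X}{g:02X}{r:02X}&"

    start_ass = to_inline(start_color_rgb)
    end_ass = to_inline(end_color_rgb)

    return f"\\c{start_ass}\\t(0,{duration_ms},\\c{end_ass})"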

# Example: Complete animated karaoke line
def create_animated_karaoke_word(
    word: str,
    karaoke_duration_sec: float,
    pop_animation: bool = True
) -> str:
    """
    Create word with karaoke + pop animation.

    Args:
        word: Word text
        karaoke_duration_sec: Karaoke fill duration in SECONDS
        pop_animation: Add scale pop effect

    Returns:
        ASS karaoke word with animations
    """
    karaoke_cs = round(karaoke_duration_sec * 100)   # Centiseconds
    karaoke_ms = round(karaoke_duration_sec * 1000)  # Milliseconds

    if pop_animation:
        pop = create_scale_animation(karaoke_ms, 90, 115, 100)
        return f"{{\\k{karaoke_cs}{pop}}}{word}"
    else:
        return f"{{\\k{karaoke_cs}}}{word}"

# Usage
animated_line = " ".join([
    create_animated_karaoke_word("Never", 0.8, True),
    create_animated_karaoke_word("gonna", 0.6, True),
    create_animated_karaoke_word("give", 0.6, True),
    create_animated_karaoke_word("you", 0.5, True),
    create_animated_karaoke_word("up", 0.7, True),
])

Section 7: ffmpeg-python Library Integration

Type-Safe Filter Application

import ffmpeg
from typing import Optional, Union

def apply_drawtext_filter(
    input_stream,
    text: str,
    fontsize: int,
    fontcolor: str = "white",
    x: Union[str, int] = 10,
    y: Union[str, int] = 10,
    fontfile: Optional[str] = None,
    borderw: int = 0,
    bordercolor: str = "black",
    shadowx: int = 0,
    shadowy: int = 0,
    shadowcolor: str = "black",
    box: int = 0,
    boxcolor: str = "black@0.5",
    boxborderw: int = 0,
    enable: Optional[str] = None,
    alpha: Optional[str] = None
):
    """
    Apply drawtext filter with type safety.

    Args:
        input_stream: ffmpeg input stream
        text: Text to display
        fontsize: Font size in points (int, 1-999)
        fontcolor: Color name or hex string
        x: X position (int or expression string)
        y: Y position (int or expression string)
        fontfile: Path to font file (optional)
        borderw: Border width (int, 0-20)
        bordercolor: Border color
        shadowx: Shadow X offset (int)
        shadowy: Shadow Y offset (int)
        shadowcolor: Shadow color
        box: Enable background box (0 or 1)
        boxcolor: Box color with alpha (e.g., "black@0.5")
        boxborderw: Box border width
        enable: Enable expression (e.g., "gte(t,2)")
        alpha: Alpha expression (e.g., "min(1,t)")

    Returns:
        ffmpeg stream with drawtext filter applied

    Raises:
        TypeError: If parameters have incorrect types
        ValueError: If parameters are out of valid range
    """
    # Type validation
    if not isinstance(fontsize, int):
        raise TypeError(f"fontsize must be int, got {type(fontsize).__name__}")

    if not (1 <= fontsize <= 999):
        raise ValueError(f"fontsize must be 1-999, got {fontsize}")

    if not isinstance(borderw, int) or borderw < 0:
        raise ValueError(f"borderw must be non-negative int, got {borderw}")

    if not isinstance(box, int) or box not in (0, 1):
        raise ValueError(f"box must be 0 or 1, got {box}")

    # Build filter parameters
    params = {
        "text": text,
        "fontsize": fontsize,
        "fontcolor": fontcolor,
        "x": x,
        "y": y,
        "borderw": borderw,
        "bordercolor": bordercolor,
        "box": box,
    }

    # Optional parameters
    if fontfile:
        params["fontfile"] = fontfile

    # Emit the shadow settings together so a horizontal-only shadow keeps its color
    if shadowx != 0 or shadowy != 0:
        params["shadowx"] = shadowx
        params["shadowy"] = shadowy
        params["shadowcolor"] = shadowcolor

    if box == 1:
        params["boxcolor"] = boxcolor
        if boxborderw > 0:
            params["boxborderw"] = boxborderw

    if enable:
        params["enable"] = enable

    if alpha:
        params["alpha"] = alpha

    return input_stream.drawtext(**params)

# Example usage
input_file = ffmpeg.input("input.mp4")
output = apply_drawtext_filter(
    input_file,
    text="Hello World",
    fontsize=48,
    fontcolor="white",
    x="(w-tw)/2",
    y="(h-th)/2",
    borderw=2,
    bordercolor="black",
    shadowx=2,
    shadowy=2,
    enable="between(t,1,5)"
)
output = ffmpeg.output(output, "output.mp4")
ffmpeg.run(output)

Complete Audio/Video Filter Example

import ffmpeg
from pathlib import Path

def add_subtitles_with_audio(
    input_video: str,
    output_video: str,
    subtitle_text: str,
    fontsize: int = 48,
    crf: int = 18,
    audio_codec: str = "aac",
    audio_bitrate: str = "192k"
):
    """
    Add burned-in subtitles while preserving audio.

    CRITICAL: Always explicitly handle audio stream to prevent loss.

    Args:
        input_video: Input video path
        output_video: Output video path
        subtitle_text: Text to display
        fontsize: Font size (int)
        crf: Constant Rate Factor for H.264 (int, 0-51)
        audio_codec: Audio codec (default: "aac")
        audio_bitrate: Audio bitrate (default: "192k")
    """
    # Input
    input_file = ffmpeg.input(input_video)

    # Video processing
    video = input_file.video.drawtext(
        text=subtitle_text,
        fontsize=fontsize,
        fontcolor="white",
        x="(w-tw)/2",
        y="h-th-50",
        borderw=2,
        bordercolor="black"
    )

    # Audio passthrough (CRITICAL)
    audio = input_file.audio

    # Output with both streams
    output = ffmpeg.output(
        video,
        audio,
        output_video,
        vcodec="libx264",
        crf=crf,
        acodec=audio_codec,
        audio_bitrate=audio_bitrate
    )

    # Run
    output = output.overwrite_output()
    ffmpeg.run(output)

Section 8: Subprocess Pattern with Pipes

Frame-by-Frame Processing

import subprocess
import numpy as np
from typing import Generator, Tuple

def read_video_frames(
    input_path: str,
    width: int,
    height: int,
    pix_fmt: str = "rgb24"
) -> Generator[np.ndarray, None, None]:
    """
    Read video frames using FFmpeg subprocess.

    Args:
        input_path: Input video file path
        width: Frame width
        height: Frame height
        pix_fmt: Pixel format ("rgb24" or "bgr24")

    Yields:
        NumPy array frames (height, width, 3)

    Example:
        >>> for frame in read_video_frames("input.mp4", 1920, 1080):
        ...     # Process frame (RGB format)
        ...     processed = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    """
    # FFmpeg command
    cmd = [
        "ffmpeg",
        "-i", input_path,
        "-f", "rawvideo",
        "-pix_fmt", pix_fmt,
        "-"  # Output to stdout
    ]

    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        bufsize=10**8
    )

    frame_size = width * height * 3

    try:
        while True:
            raw_frame = process.stdout.read(frame_size)
            if len(raw_frame) != frame_size:
                break

            # Convert to NumPy array
            frame = np.frombuffer(raw_frame, dtype=np.uint8)
            frame = frame.reshape((height, width, 3))

            yield frame
    finally:
        process.stdout.close()
        process.wait()

def write_video_frames(
    output_path: str,
    width: int,
    height: int,
    fps: float = 30.0,
    pix_fmt: str = "rgb24",
    crf: int = 18
) -> subprocess.Popen:
    """
    Create FFmpeg process for writing frames.

    Args:
        output_path: Output video file path
        width: Frame width
        height: Frame height
        fps: Frames per second
        pix_fmt: Pixel format ("rgb24" or "bgr24")
        crf: Constant Rate Factor (0-51)

    Returns:
        subprocess.Popen instance (write frames to .stdin)

    Example:
        >>> writer = write_video_frames("output.mp4", 1920, 1080, 30)
        >>> for frame in frames:
        ...     writer.stdin.write(frame.tobytes())
        >>> writer.stdin.close()
        >>> writer.wait()
    """
    cmd = [
        "ffmpeg",
        "-y",  # Overwrite output
        "-f", "rawvideo",
        "-vcodec", "rawvideo",
        "-s", f"{width}x{height}",
        "-pix_fmt", pix_fmt,
        "-r", str(fps),
        "-i", "-",  # Read from stdin
        "-c:v", "libx264",
        "-preset", "fast",
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        output_path
    ]

    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stderr=subprocess.DEVNULL
    )

    return process

# Complete pipeline example
def process_video_frames(
    input_path: str,
    output_path: str,
    process_fn: callable
):
    """
    Read, process, and write video frames.

    Args:
        input_path: Input video
        output_path: Output video
        process_fn: Function to process each frame
    """
    # Get video dimensions
    import cv2
    cap = cv2.VideoCapture(input_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()

    # Create writer
    writer = write_video_frames(output_path, width, height, fps)

    try:
        # Process frames
        for frame in read_video_frames(input_path, width, height):
            processed = process_fn(frame)
            writer.stdin.write(processed.tobytes())
    finally:
        writer.stdin.close()
        writer.wait()
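
The reader above hard-codes 3 bytes per pixel, which holds for rgb24 and bgr24 only. The sketch below shows one way to compute the frame size per pixel format and to stop cleanly on a short final read. It assumes a small, hand-written table of packed 8-bit formats; `BYTES_PER_PIXEL`, `frame_size_bytes` and `iter_raw_frames` are hypothetical names, and `io.BytesIO` stands in for the FFmpeg stdout pipe.

```python
import io
from typing import BinaryIO, Iterator

# Packed 8-bit formats only; planar or subsampled formats (e.g. yuv420p)
# need a different size calculation and are deliberately left out.
BYTES_PER_PIXEL = {"gray": 1, "rgb24": 3, "bgr24": 3, "rgba": 4, "bgra": 4}


def frame_size_bytes(width: int, height: int, pix_fmt: str = "rgb24") -> int:
    """Return the number of bytes in one raw frame."""
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid frame size: {width}x{height}")
    if pix_fmt not in BYTES_PER_PIXEL:
        raise ValueError(f"unsupported pix_fmt for this sketch: {pix_fmt}")
    return width * height * BYTES_PER_PIXEL[pix_fmt]


def iter_raw_frames(stream: BinaryIO, frame_size: int) -> Iterator[bytes]:
    """Yield complete frames; a short read at the end is discarded."""
    while True:
        chunk = stream.read(frame_size)
        if len(chunk) < frame_size:
            break
        yield chunk


size = frame_size_bytes(2, 2, "rgb24")        # 12 bytes per tiny test frame
fake_pipe = io.BytesIO(bytes(2 * size + 5))   # two full frames plus a partial one
frames = list(iter_raw_frames(fake_pipe, size))
print(len(frames), frame_size_bytes(1920, 1080))
```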

Section 9: Common Pitfalls and Solutions

Pitfall 1: String vs Int for Numeric Parameters

# ❌ WRONG - passing string for int parameter
import ffmpeg
ffmpeg.input("input.mp4").drawtext(
    text="Hello",
    fontsize="24"  # ❌ Should be int
).output("output.mp4")

# ✅ CORRECT
ffmpeg.input("input.mp4").drawtext(
    text="Hello",
    fontsize=24  # ✅ int
).output("output.mp4")

Pitfall 2: RGB vs BGR Color Order

# ❌ WRONG - using RGB for ASS color
def wrong_ass_color(r: int, g: int, b: int) -> str:
    return f"&H00{r:02X}{g:02X}{b:02X}"  # ❌ RGB order

# ✅ CORRECT - BGR order for ASS
def correct_ass_color(r: int, g: int, b: int) -> str:
    return f"&H00{b:02X}{g:02X}{r:02X}"  # ✅ BGR order

# Example: Pure red
print(wrong_ass_color(255, 0, 0))   # ❌ "&H00FF0000" = Blue in ASS!
print(correct_ass_color(255, 0, 0))  # ✅ "&H000000FF" = Red in ASS

Pitfall 3: Centiseconds vs Milliseconds Confusion

# ❌ WRONG - mixing units
def wrong_karaoke_timing():
    # Karaoke tag uses centiseconds
    # Animation uses milliseconds - DIFFERENT!
    return r"{\k100\t(0,100,\fscx120)}Word"  # ❌ Mismatch!
    # \k100 = 1 second
    # \t(0,100,...) = 0.1 seconds (100ms)

# ✅ CORRECT - consistent timing
def correct_karaoke_timing(duration_sec: float):
    cs = round(duration_sec * 100)   # Karaoke: centiseconds
    ms = round(duration_sec * 1000)  # Animation: milliseconds
    return rf"{{\k{cs}\t(0,{ms},\fscx120)}}Word"

print(correct_karaoke_timing(1.0))
# Output: {\k100\t(0,1000,\fscx120)}Word
# Both tags now represent 1 second ✅

Pitfall 4: Forgetting Quotes on Expressions

# ❌ WRONG - expression without quotes
import ffmpeg
ffmpeg.input("input.mp4").drawtext(
    text="Hello",
    x=(w-tw)/2  # ❌ Python evaluates this, causes NameError
)

# ✅ CORRECT - expression as string
ffmpeg.input("input.mp4").drawtext(
    text="Hello",
    x="(w-tw)/2"  # ✅ FFmpeg evaluates this at runtime
)

Pitfall 5: Audio Stream Loss

# ❌ WRONG - audio is silently dropped
import ffmpeg
input_file = ffmpeg.input("input.mp4")
(
    input_file
    .filter("scale", 1280, 720)  # ❌ Only video stream
    .output("output.mp4")
    .run()
)

# ✅ CORRECT - explicitly handle audio
input_file = ffmpeg.input("input.mp4")
video = input_file.video.filter("scale", 1280, 720)
audio = input_file.audio  # ✅ Preserve audio
ffmpeg.output(video, audio, "output.mp4").run()

Pitfall 6: Incorrect ASS Bold Value

# ❌ WRONG - using 1 for bold (works in some renderers, not all)
ass_style = "Style: Default,Arial,48,&H00FFFFFF,...,1,..."  # ❌ May not work

# ✅ CORRECT - the ASS format defines -1 as true (bold) and 0 as false
ass_style = "Style: Default,Arial,48,&H00FFFFFF,...,-1,..."  # ✅ Proper bold
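To keep the -1/0 convention out of hand-written style strings, the flags can be generated from Python booleans. This is a minimal sketch with hypothetical helper names; it covers only the four boolean style fields (Bold, Italic, Underline, StrikeOut).

```python
def ass_bool(flag: bool) -> int:
    """Map a Python bool to the ASS style convention: -1 is true, 0 is false."""
    return -1 if flag else 0

def style_flags(
    bold: bool = False,
    italic: bool = False,
    underline: bool = False,
    strikeout: bool = False,
) -> str:
    """Build the 'Bold,Italic,Underline,StrikeOut' portion of a Style line."""
    return ",".join(str(ass_bool(f)) for f in (bold, italic, underline, strikeout))

print(style_flags(bold=True))               # -1,0,0,0
print(style_flags(bold=True, italic=True))  # -1,-1,0,0
```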

Section 10: Validation Helpers

from typing import Union
import re

def validate_fontsize(size: int) -> int:
    """Validate fontsize parameter."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(size, bool) or not isinstance(size, int):
        raise TypeError(f"fontsize must be int, got {type(size).__name__}")
    if not (1 <= size <= 999):
        raise ValueError(f"fontsize must be 1-999, got {size}")
    return size

def validate_crf(crf: int, codec: str = "h264") -> int:
    """Validate CRF parameter."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(crf, bool) or not isinstance(crf, int):
        raise TypeError(f"crf must be int, got {type(crf).__name__}")

    ranges = {
        "h264": (0, 51),
        "h265": (0, 51),
        "hevc": (0, 51),
        "vp9": (0, 63),
        "av1": (0, 63),
    }

    min_crf, max_crf = ranges.get(codec, (0, 51))
    if not (min_crf <= crf <= max_crf):
        raise ValueError(f"crf for {codec} must be {min_crf}-{max_crf}, got {crf}")

    return crf

def validate_ass_color(color: str) -> str:
    """Validate ASS color format."""
    pattern = r"&H[0-9A-Fa-f]{8}"
    # fullmatch rejects a trailing newline, which "$" with re.match would accept
    if not re.fullmatch(pattern, color):
        raise ValueError(f"Invalid ASS color format: {color} (expected &HAABBGGRR)")
    return color

def validate_alignment(alignment: int) -> int:
    """Validate ASS alignment (1-9 numpad)."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(alignment, bool) or not isinstance(alignment, int):
        raise TypeError(f"alignment must be int, got {type(alignment).__name__}")
    if not (1 <= alignment <= 9):
        raise ValueError(f"alignment must be 1-9, got {alignment}")
    return alignment

def validate_ffmpeg_color(color: str) -> str:
    """Validate FFmpeg color (named or hex).

    Deliberately conservative: accepts only a small subset of FFmpeg's named
    colors and 6-digit RGB hex, so some valid FFmpeg colors are rejected.
    """
    named_colors = {
        "white", "black", "red", "green", "blue",
        "yellow", "cyan", "magenta", "orange", "purple"
    }

    if color.lower() in named_colors:
        return color

    # Validate hex format (fullmatch also rejects a trailing newline)
    hex_pattern = r"(#|0x)?[0-9A-Fa-f]{6}"
    if re.fullmatch(hex_pattern, color):
        return color

    raise ValueError(f"Invalid FFmpeg color: {color}")

def validate_time_expression(expr: str) -> str:
    """Validate that an FFmpeg expression is passed as a string."""
    # FFmpeg evaluates expressions such as "(w-tw)/2" or "between(t,1,5)" at
    # runtime, so they must reach it as strings. This checks the type only;
    # it does not parse the expression syntax.
    if not isinstance(expr, str):
        raise TypeError(f"FFmpeg expression must be str, got {type(expr).__name__}")

    return expr
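The helpers above do not cover bitrate strings such as "5M" or "192k". A validator in the same style might look like the sketch below. The name validate_bitrate and the accepted pattern (digits, optional decimal part, optional k/M/G suffix) are assumptions chosen for illustration and are narrower than everything FFmpeg itself accepts.

```python
import re

# Digits, optional decimal part, optional k/M/G suffix (an intentionally narrow subset)
_BITRATE_RE = re.compile(r"[0-9]+(\.[0-9]+)?[kKmMgG]?")

def validate_bitrate(bitrate: str) -> str:
    """Validate a bitrate string such as "5M", "1000k" or "192k"."""
    if not isinstance(bitrate, str):
        raise TypeError(f"bitrate must be str, got {type(bitrate).__name__}")
    if not _BITRATE_RE.fullmatch(bitrate):
        raise ValueError(f"Invalid bitrate: {bitrate!r} (expected e.g. '5M', '192k')")
    return bitrate

print(validate_bitrate("5M"))    # 5M
print(validate_bitrate("192k"))  # 192k
```

Passing an int such as 5000000 raises TypeError, which matches the "Using int" mistake listed in the Quick Reference.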

Section 11: Complete Working Examples

Example 1: Type-Safe Karaoke Generator

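import ffmpeg
from typing import List, Tuple
from pathlib import Path

# Relies on helpers defined elsewhere in this reference: validate_fontsize,
# validate_crf, rgb_to_ass_color, seconds_to_centiseconds, format_ass_timestamp.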

class KaraokeGenerator:
    """Type-safe karaoke subtitle generator."""

    def __init__(
        self,
        video_path: str,
        output_path: str,
        font_size: int = 72,
        font_name: str = "Arial Black",
        text_color_rgb: Tuple[int, int, int] = (255, 255, 255),
        highlight_color_rgb: Tuple[int, int, int] = (255, 0, 0)
    ):
        self.video_path = video_path
        self.output_path = output_path
        self.font_size = validate_fontsize(font_size)
        self.font_name = font_name
        self.text_color = rgb_to_ass_color(*text_color_rgb)
        self.highlight_color = rgb_to_ass_color(*highlight_color_rgb)

        self.lyrics: List[Tuple[float, List[Tuple[str, float]]]] = []

    def add_line(
        self,
        start_time: float,
        words: List[str],
        durations: List[float]
    ):
        """
        Add karaoke line.

        Args:
            start_time: Line start time in SECONDS
            words: List of words
            durations: Duration for each word in SECONDS
        """
        if len(words) != len(durations):
            raise ValueError("words and durations must have same length")

        self.lyrics.append((start_time, list(zip(words, durations))))

    def generate_ass(self) -> str:
        """Generate complete ASS subtitle file."""
        lines = [
            "[Script Info]",
            "Title: Karaoke Subtitles",
            "ScriptType: v4.00+",
            "PlayResX: 1920",
            "PlayResY: 1080",
            "",
            "[V4+ Styles]",
            "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
            "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
            "ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
            "Alignment, MarginL, MarginR, MarginV, Encoding",
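            (
                # With \k tags, a syllable is drawn in SecondaryColour before it
                # starts and switches to PrimaryColour once sung, so the
                # highlight color goes in the PrimaryColour slot.
                f"Style: Karaoke,{self.font_name},{self.font_size},"
                f"{self.highlight_color},{self.text_color},"
                "&H00000000,&H80000000,"
                "-1,0,0,0,100,100,0,0,1,3,2,2,10,10,50,1"
            ),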
            "",
            "[Events]",
            "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
        ]

        # Generate dialogue lines
        for start_time, word_durations in self.lyrics:
            # Calculate end time
            total_duration = sum(dur for _, dur in word_durations)
            end_time = start_time + total_duration

            # Generate karaoke tags (centiseconds)
            karaoke_text = ""
            for word, duration_sec in word_durations:
                cs = seconds_to_centiseconds(duration_sec)
                karaoke_text += f"{{\\k{cs}}}{word} "

            karaoke_text = karaoke_text.strip()

            # Format timestamps
            start_str = format_ass_timestamp(start_time)
            end_str = format_ass_timestamp(end_time)

            dialogue = f"Dialogue: 0,{start_str},{end_str},Karaoke,,0,0,0,,{karaoke_text}"
            lines.append(dialogue)

        return "\n".join(lines)

    def render(self, crf: int = 18):
        """Render video with karaoke subtitles."""
        # Generate ASS file
        ass_content = self.generate_ass()
        ass_path = Path(self.output_path).with_suffix(".ass")
        ass_path.write_text(ass_content, encoding="utf-8")

        # Apply subtitles with ffmpeg-python
        input_file = ffmpeg.input(self.video_path)
        video = input_file.video.filter("ass", str(ass_path))
        audio = input_file.audio

        output = ffmpeg.output(
            video,
            audio,
            self.output_path,
            vcodec="libx264",
            crf=validate_crf(crf),
            acodec="aac"
        )

        ffmpeg.run(output.overwrite_output())

        print(f"✅ Karaoke video saved: {self.output_path}")

# Usage example
karaoke = KaraokeGenerator(
    "input.mp4",
    "karaoke_output.mp4",
    font_size=80,
    text_color_rgb=(255, 255, 255),  # White
    highlight_color_rgb=(255, 0, 0)   # Red
)

# Add lyrics
karaoke.add_line(
    start_time=1.0,
    words=["Never", "gonna", "give", "you", "up"],
    durations=[0.8, 0.6, 0.6, 0.5, 0.7]
)

karaoke.add_line(
    start_time=4.3,
    words=["Never", "gonna", "let", "you", "down"],
    durations=[0.8, 0.6, 0.6, 0.5, 0.7]
)

# Render
karaoke.render(crf=18)
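Because each line's end time is its start time plus the sum of its word durations, two lines can overlap without it being obvious from the add_line calls, and renderers usually stack overlapping lines on screen. A possible pre-render check, sketched with a hypothetical helper name and the same (start, durations) data as the usage above:

```python
from typing import List, Tuple

# (start_time_sec, per-word durations in seconds)
Line = Tuple[float, List[float]]

def find_overlaps(lines: List[Line]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where line j starts before line i has ended."""
    ordered = sorted(range(len(lines)), key=lambda i: lines[i][0])
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        prev_end = lines[prev][0] + sum(lines[prev][1])
        if lines[cur][0] < prev_end:
            overlaps.append((prev, cur))
    return overlaps

lines = [
    (1.0, [0.8, 0.6, 0.6, 0.5, 0.7]),  # ends at about 4.2 s
    (4.3, [0.8, 0.6, 0.6, 0.5, 0.7]),
]
print(find_overlaps(lines))  # []
```

Only neighbouring lines (by start time) are compared, which is enough to flag the common case of a line starting before the previous one has finished.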

Example 2: Dynamic Text Overlay with Type Safety

import ffmpeg
from typing import Optional

class TextOverlay:
    """Type-safe text overlay builder."""

    def __init__(self, text: str):
        self.text = text
        self.fontsize: int = 48
        self.fontcolor: str = "white"
        self.x: str = "10"
        self.y: str = "10"
        self.borderw: int = 0
        self.bordercolor: str = "black"
        self.shadowx: int = 0
        self.shadowy: int = 0
        self.enable: Optional[str] = None
        self.alpha: Optional[str] = None

    def set_size(self, size: int) -> 'TextOverlay':
        """Set font size (fluent API)."""
        self.fontsize = validate_fontsize(size)
        return self

    def set_color(self, color: str) -> 'TextOverlay':
        """Set font color (fluent API)."""
        self.fontcolor = validate_ffmpeg_color(color)
        return self

    def center(self) -> 'TextOverlay':
        """Center text horizontally and vertically."""
        self.x = "(w-tw)/2"
        self.y = "(h-th)/2"
        return self

    def set_border(self, width: int, color: str = "black") -> 'TextOverlay':
        """Add text border."""
        if not isinstance(width, int) or width < 0:
            raise ValueError(f"Border width must be a non-negative int, got {width!r}")
        self.borderw = width
        self.bordercolor = validate_ffmpeg_color(color)
        return self

    def set_shadow(self, x: int, y: int, color: str = "black") -> 'TextOverlay':
        """Add text shadow."""
        self.shadowx = x
        self.shadowy = y
        self.shadowcolor = validate_ffmpeg_color(color)
        return self

    def fade_in(self, duration: float) -> 'TextOverlay':
        """Add fade-in effect (alpha ramps from 0 at t=0 to 1 at t=duration)."""
        if duration <= 0:
            raise ValueError(f"Fade duration must be positive, got {duration}")
        self.alpha = f"min(1,t/{duration})"
        return self

    def show_between(self, start: float, end: float) -> 'TextOverlay':
        """Show text between specific times."""
        self.enable = f"between(t,{start},{end})"
        return self

    def apply(self, stream) -> ffmpeg.Stream:
        """Apply text overlay to ffmpeg stream."""
        params = {
            "text": self.text,
            "fontsize": self.fontsize,
            "fontcolor": self.fontcolor,
            "x": self.x,
            "y": self.y,
        }

        if self.borderw > 0:
            params["borderw"] = self.borderw
            params["bordercolor"] = self.bordercolor

        if self.shadowx != 0 or self.shadowy != 0:
            params["shadowx"] = self.shadowx
            params["shadowy"] = self.shadowy
            # shadowcolor is only set by set_shadow(); fall back to drawtext's default
            params["shadowcolor"] = getattr(self, "shadowcolor", "black")

        if self.enable:
            params["enable"] = self.enable

        if self.alpha:
            params["alpha"] = self.alpha

        return stream.drawtext(**params)

# Usage with fluent API
input_file = ffmpeg.input("input.mp4")

overlay = (
    TextOverlay("Hello World")
    .set_size(72)
    .set_color("white")
    .center()
    .set_border(3, "black")
    .set_shadow(2, 2, "black")
    .fade_in(1.0)
    .show_between(1.0, 5.0)
)

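# Filter only the video stream and pass the audio through explicitly (see Pitfall 5)
video = overlay.apply(input_file.video)
output = ffmpeg.output(video, input_file.audio, "output.mp4", vcodec="libx264", crf=18)
ffmpeg.run(output.overwrite_output())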
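In the usage above, the alpha expression ramps up from t=0 while show_between only makes the text visible from t=1.0, so the fade has already finished when the text appears. One possible way to line the two up is to offset the ramp by the start time. The helper below is a hypothetical sketch that only builds the expression string; it uses the FFmpeg expression functions min and max.

```python
def fade_in_alpha(start: float, duration: float) -> str:
    """Build an alpha expression that ramps 0 -> 1 over `duration` seconds from `start`."""
    if start < 0:
        raise ValueError(f"start must be non-negative, got {start}")
    if duration <= 0:
        raise ValueError(f"duration must be positive, got {duration}")
    # Clamp to [0, 1]: 0 before `start`, 1 after `start + duration`
    return f"min(1,max(0,(t-{start})/{duration}))"

print(fade_in_alpha(1.0, 1.0))  # min(1,max(0,(t-1.0)/1.0))
```

The resulting string can be assigned to the overlay's alpha attribute in place of the value set by fade_in.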

Summary: Type Safety Checklist

When integrating FFmpeg with Python:

✅ Always Use Correct Types

  • fontsize → int (not str)
  • crf → int (not float)
  • b:v, audio_bitrate → str with unit ("5M", "192k")
  • Color values → str (named or hex)
  • Position expressions → str (quoted)
  • Time values → float for seconds, int for frames

✅ Verify Unit Conversions

  • ASS karaoke tags: seconds × 100 = centiseconds
  • ASS animation tags: seconds × 1000 = milliseconds
  • FFmpeg filters: use seconds directly

✅ Check Color Format

  • FFmpeg drawtext: RGB hex or named
  • ASS colors: BGR format (&HAABBGGRR)
  • OpenCV: BGR array order

✅ Handle Audio Streams

  • Always explicitly preserve audio with .audio
  • Don't rely on automatic passthrough

✅ Validate Ranges

  • CRF: 0-51 (H.264/H.265), 0-63 (VP9/AV1)
  • Fontsize: 1-999 (practical: 12-200)
  • Alignment: 1-9 (numpad layout)

This reference is intended to support type-safe Python-FFmpeg integration through accurate parameter mappings and proper unit conversions.