Claude-skill-registry ffmpeg-hardware-acceleration
Complete GPU-accelerated encoding/decoding system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) NVIDIA NVENC/NVDEC encoding, (2) Intel Quick Sync Video (QSV), (3) AMD AMF encoding, (4) Apple VideoToolbox, (5) Linux VAAPI setup, (6) Vulkan Video 8.0 (FFv1, AV1, VP9, ProRes RAW), (7) VVC/H.266 hardware decoding (VAAPI/QSV), (8) GPU pipeline optimization with pad_cuda, (9) Docker GPU containers, (10) Performance benchmarking. Provides: Platform-specific commands, preset comparisons, quality tuning, full GPU pipeline examples, Vulkan compute codecs, VVC decoding, troubleshooting guides. Ensures: Maximum encoding speed with optimal quality using GPU acceleration.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ffmpeg-hardware-acceleration" ~/.claude/skills/majiayu000-claude-skill-registry-ffmpeg-hardware-acceleration && rm -rf "$T"
skills/data/ffmpeg-hardware-acceleration/SKILL.mdCRITICAL GUIDELINES
Windows File Path Requirements
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (
\) in file paths, NOT forward slashes (/).
Quick Reference
| Platform | Encoder | Decoder | Detect Command |
|---|---|---|---|
| NVIDIA | , , | , | |
| Intel QSV | , , | , | |
| AMD AMF | , , | N/A (use software) | |
| Apple | , | | macOS only |
| VAAPI | , , | with | Linux only |
When to Use This Skill
Use when GPU acceleration is needed:
- Encoding speed is critical (10-30x faster than CPU)
- Processing large batches of videos
- Real-time encoding for streaming
- Server-side transcoding at scale
- Docker containers with GPU passthrough
Key decision: GPU encoding trades some quality for massive speed. Use
-cq or -qp for quality control.
FFmpeg Hardware Acceleration (2025)
Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.
Current Latest: FFmpeg 8.0.1 (released 2025-11-20) - Check with
ffmpeg -version
Hardware Acceleration Overview
Hardware acceleration uses dedicated GPU/SoC components for video processing:
- NVENC/NVDEC (NVIDIA): Dedicated video encode/decode engines
- QSV (Intel): Quick Sync Video on Intel CPUs with integrated graphics
- AMF (AMD): Advanced Media Framework for AMD GPUs
- VideoToolbox (Apple): macOS/iOS hardware acceleration
- VAAPI (Linux): Video Acceleration API (Intel, AMD on Linux)
- Vulkan Video (Cross-platform): FFmpeg 7.1+/8.0 GPU acceleration
Performance Comparison (2025 Benchmarks)
| Method | Speed | Quality | Power | Use Case |
|---|---|---|---|---|
| libx264 (CPU) | 1x | Best | High | Quality-critical |
| libx265 (CPU) | 0.3x | Best | Very High | Archival |
| h264_nvenc | 10-20x | Good | Low | Real-time, streaming |
| hevc_nvenc | 8-15x | Good | Low | 4K streaming |
| h264_qsv | 8-15x | Good | Very Low | Laptop, efficiency |
| h264_amf | 8-15x | Good | Low | AMD systems |
NVIDIA NVENC/NVDEC
Requirements
- NVIDIA GPU (GTX 600+ / Quadro K series+)
- NVIDIA drivers 450+
- FFmpeg built with
--enable-nvenc --enable-cuda --enable-cuvid
Check NVIDIA Support
# Check available NVIDIA codecs ffmpeg -encoders | grep nvenc ffmpeg -decoders | grep cuvid # Check GPU info nvidia-smi # List hardware accelerators ffmpeg -hwaccels
Basic NVENC Encoding
# H.264 NVENC encoding ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -b:v 5M output.mp4 # H.265/HEVC NVENC encoding ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p4 -b:v 4M output.mp4 # AV1 NVENC (RTX 40 series+) ffmpeg -i input.mp4 -c:v av1_nvenc -preset p4 -b:v 3M output.mp4
NVENC Presets (FFmpeg 7+)
| Preset | Speed | Quality | Use Case |
|---|---|---|---|
| p1 | Fastest | Lowest | Real-time capture |
| p2 | Faster | Low | Screen recording |
| p3 | Fast | Medium | General streaming |
| p4 | Medium | Good | Recommended |
| p5 | Slow | Better | High-quality streaming |
| p6 | Slower | Best | Offline encoding |
| p7 | Slowest | Highest | Maximum quality |
Full GPU Pipeline (Decode + Encode)
# Keep frames in GPU memory (fastest) ffmpeg -y -vsync 0 \ -hwaccel cuda \ -hwaccel_output_format cuda \ -i input.mp4 \ -c:v h264_nvenc \ -preset p4 \ -b:v 5M \ -c:a copy \ output.mp4
GPU Scaling and Filtering
# GPU-based scaling with scale_cuda ffmpeg -y -vsync 0 \ -hwaccel cuda \ -hwaccel_output_format cuda \ -i input.mp4 \ -vf scale_cuda=1280:720 \ -c:v h264_nvenc \ -preset p4 \ output.mp4 # GPU overlay with overlay_cuda ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \ -hwaccel cuda -hwaccel_output_format cuda -i overlay.mp4 \ -filter_complex "[0:v][1:v]overlay_cuda=10:10" \ -c:v h264_nvenc output.mp4 # GPU padding with pad_cuda (FFmpeg 8.0+) ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \ -vf "pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2:black" \ -c:v h264_nvenc output.mp4 # Letterbox with pad_cuda ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \ -vf "scale_cuda=1920:-2,pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2" \ -c:v h264_nvenc output.mp4
CUDA Filters Reference (FFmpeg 8.0+)
| Filter | Description | Key Parameters |
|---|---|---|
| GPU video scaling | , , , |
| NPP-based scaling (higher quality) | , , |
| GPU overlay/compositing | , , |
| GPU padding (8.0+) | , , , , |
| GPU chroma keying | , , |
| Color space conversion | , , , |
| Bilateral filter | , , |
| Deinterlacing | , , |
| Upload to GPU | - |
| Download from GPU | - |
scale_npp (NVIDIA Performance Primitives)
For higher-quality scaling, use scale_npp with optimized interpolation algorithms:
# High-quality downscaling with super-sampling ffmpeg -hwaccel cuda -hwaccel_output_format cuda \ -i input.mp4 \ -vf "scale_npp=1280:720:interp_algo=super" \ -c:v h264_nvenc output.mp4 # Lanczos interpolation for upscaling ffmpeg -hwaccel cuda -hwaccel_output_format cuda \ -i input.mp4 \ -vf "scale_npp=1920:1080:interp_algo=lanczos" \ -c:v h264_nvenc output.mp4
scale_npp Interpolation Algorithms:
| Algorithm | Quality | Speed | Best For |
|---|---|---|---|
| Low | Fastest | Pixel art, nearest neighbor |
| Medium | Fast | General use |
| Good | Medium | Smooth scaling |
| High | Slow | High quality upscaling |
| Best | Slowest | Downscaling |
GPU Chromakey (Green Screen)
# Green screen removal on GPU ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i greenscreen.mp4 \ -hwaccel cuda -hwaccel_output_format cuda -i background.mp4 \ -filter_complex "[0:v]chromakey_cuda=0x00FF00:0.3:0.1[fg];[1:v][fg]overlay_cuda" \ -c:v h264_nvenc output.mp4
Hybrid GPU + CPU Filter Pipelines
When mixing CPU and GPU filters, use
hwupload_cuda and hwdownload:
# CPU decode -> CPU filter -> GPU encode ffmpeg -i input.mp4 \ -vf "fade=in:0:30,hwupload_cuda" \ -c:v h264_nvenc output.mp4 # GPU decode -> CPU filter (drawtext) -> GPU encode ffmpeg -hwaccel cuda -hwaccel_output_format cuda \ -i input.mp4 \ -vf "hwdownload,format=nv12,drawtext=text='Hello':fontsize=48,hwupload_cuda" \ -c:v h264_nvenc output.mp4 # Complete hybrid pipeline: GPU scale -> CPU text -> GPU encode ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \ -filter_complex "\ [0:v]scale_cuda=1920:1080[scaled];\ [scaled]hwdownload,format=nv12[cpu];\ [cpu]drawtext=text='Watermark':fontsize=48:x=10:y=10[text];\ [text]hwupload_cuda[gpu]" \ -map "[gpu]" \ -c:v h264_nvenc output.mp4
NVENC Quality Optimization
# High quality with lookahead ffmpeg -i input.mp4 \ -c:v hevc_nvenc \ -preset p5 \ -tune hq \ -rc vbr \ -cq 23 \ -b:v 0 \ -rc-lookahead 32 \ -spatial-aq 1 \ -temporal-aq 1 \ -c:a copy \ output.mp4 # Constant quality mode (CQP) ffmpeg -i input.mp4 \ -c:v h264_nvenc \ -preset p4 \ -rc constqp \ -qp 23 \ -c:a copy \ output.mp4
NVENC Two-Pass Encoding
# Two-pass for best quality (ABR) ffmpeg -i input.mp4 \ -c:v h264_nvenc \ -preset p5 \ -2pass 1 \ -b:v 5M \ -c:a copy \ output.mp4
NVENC B-Frames and GOP
# Enable B-frames for better compression ffmpeg -i input.mp4 \ -c:v hevc_nvenc \ -preset p4 \ -bf 4 \ -b_ref_mode 2 \ -g 250 \ -c:a copy \ output.mp4
GPU Memory Management
Critical Concepts
PCIe transfers between CPU and GPU memory are the primary bottleneck. Keeping frames in GPU memory throughout the pipeline is critical for performance.
Memory Flow Patterns:
Pattern 1: Full GPU Pipeline (OPTIMAL) Input File -> GPU Decode -> GPU Filter -> GPU Encode -> Output File | | | v v v [GPU Memory - No PCIe Transfer] Pattern 2: CPU Filter Insertion (SUBOPTIMAL) Input -> GPU Decode -> hwdownload -> CPU Filter -> hwupload -> GPU Encode -> Output | | v v [PCIe Transfer] [PCIe Transfer] Pattern 3: Software Decode + GPU Encode Input -> CPU Decode -> hwupload -> GPU Encode -> Output | v [PCIe Transfer]
Command Patterns Comparison
# OPTIMAL: Full GPU pipeline - no CPU-GPU transfers ffmpeg -y -vsync 0 \ -hwaccel cuda \ -hwaccel_output_format cuda \ -i input.mp4 \ -vf scale_cuda=1280:720 \ -c:v h264_nvenc \ output.mp4 # SUBOPTIMAL: GPU decode, CPU filter, GPU encode - 2 transfers ffmpeg -hwaccel cuda -hwaccel_output_format cuda \ -i input.mp4 \ -vf "hwdownload,format=nv12,drawtext=text='Hello',hwupload_cuda" \ -c:v h264_nvenc output.mp4 # AVOID: Missing -hwaccel_output_format - frame copies to CPU after decode ffmpeg -hwaccel cuda -i input.mp4 \ -vf scale=1280:720 \ -c:v h264_nvenc output.mp4
Why
matters: Without it, decoded frames transfer from GPU to CPU via PCIe, get processed, then transfer back to GPU for encoding. This can reduce throughput by up to 50%.-hwaccel_output_format cuda
Memory Monitoring
# NVIDIA GPU memory and utilization nvidia-smi dmon -s m # Watch GPU utilization in real-time watch -n 1 nvidia-smi # Intel GPU monitoring intel_gpu_top
Best Practices
- Use
to keep decoded frames in GPU memory-hwaccel_output_format - Use GPU filters when available (scale_cuda, overlay_cuda, pad_cuda)
- Batch similar operations to minimize filter graph complexity
- Monitor GPU memory for high-resolution content
- Use
to prevent frame duplication overhead-vsync 0
Multi-GPU Encoding
NVIDIA Multi-GPU
# Specify GPU by index (parallel processing on different GPUs) ffmpeg -hwaccel cuda \ -hwaccel_device 0 \ -hwaccel_output_format cuda \ -i input1.mp4 \ -c:v h264_nvenc output1.mp4 & ffmpeg -hwaccel cuda \ -hwaccel_device 1 \ -hwaccel_output_format cuda \ -i input2.mp4 \ -c:v h264_nvenc output2.mp4 & wait # Initialize specific CUDA device for filters ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \ -init_hw_device cuda=gpu1:1 \ -filter_complex "[0:v]scale_cuda=1280:720[scaled]" \ -map "[scaled]" \ -c:v h264_nvenc output.mp4
1:N Encoding (Single Input, Multiple Outputs)
More efficient than separate processes - shares CUDA context:
# Single input to multiple resolutions (ABR ladder) ffmpeg -y -vsync 0 \ -hwaccel cuda \ -hwaccel_output_format cuda \ -i input.mp4 \ -filter_complex "[0:v]split=3[v1][v2][v3];\ [v1]scale_cuda=1920:1080[hd];\ [v2]scale_cuda=1280:720[sd];\ [v3]scale_cuda=640:360[mobile]" \ -map "[hd]" -c:v h264_nvenc -b:v 5M output_1080p.mp4 \ -map "[sd]" -c:v h264_nvenc -b:v 2M output_720p.mp4 \ -map "[mobile]" -c:v h264_nvenc -b:v 500k output_360p.mp4
Parallel Encoding Strategy
For maximum throughput:
- Run 4+ parallel FFmpeg sessions
- Use inputs with 15+ seconds to amortize initialization
- Measure aggregate FPS across all sessions
# Environment variables for faster initialization export CUDA_VISIBLE_DEVICES=0 export CUDA_DEVICE_MAX_CONNECTIONS=2 # Parallel encoding script for i in {1..4}; do ffmpeg -hwaccel cuda -hwaccel_output_format cuda \ -i input_$i.mp4 \ -c:v h264_nvenc \ output_$i.mp4 & done wait
Intel Quick Sync Video (QSV)
Requirements
- Intel CPU with integrated graphics (Sandy Bridge+)
- Intel Media SDK or oneVPL
- FFmpeg built with
or--enable-libmfx--enable-libvpl
Check QSV Support
# Check available QSV codecs ffmpeg -encoders | grep qsv ffmpeg -decoders | grep qsv # Verify Intel GPU access (Linux) ls /dev/dri/ vainfo # Check VAAPI support
Basic QSV Encoding
# Initialize QSV device ffmpeg -init_hw_device qsv=hw \ -filter_hw_device hw \ -i input.mp4 \ -c:v h264_qsv \ -preset medium \ -b:v 5M \ output.mp4 # H.265/HEVC QSV ffmpeg -init_hw_device qsv=hw \ -filter_hw_device hw \ -i input.mp4 \ -c:v hevc_qsv \ -preset medium \ -b:v 4M \ output.mp4 # AV1 QSV (Intel Arc, 12th gen+) ffmpeg -init_hw_device qsv=hw \ -filter_hw_device hw \ -i input.mp4 \ -c:v av1_qsv \ -preset medium \ -b:v 3M \ output.mp4
Full QSV Pipeline
# Decode and encode on GPU ffmpeg -hwaccel qsv \ -hwaccel_output_format qsv \ -i input.mp4 \ -c:v h264_qsv \ -preset medium \ -b:v 5M \ output.mp4
QSV Quality Settings
# Constant quality (recommended) ffmpeg -hwaccel qsv -hwaccel_output_format qsv \ -i input.mp4 \ -c:v hevc_qsv \ -preset slow \ -global_quality 22 \ -look_ahead 1 \ output.mp4 # VBR with lookahead ffmpeg -hwaccel qsv -hwaccel_output_format qsv \ -i input.mp4 \ -c:v h264_qsv \ -preset medium \ -b:v 5M \ -maxrate 7M \ -bufsize 10M \ -look_ahead 1 \ -look_ahead_depth 40 \ output.mp4
QSV Quality Parameters:
| Parameter | Description | Range |
|---|---|---|
| Quality level (like CRF) | 1-51 (lower = better) |
| Encoding speed/quality | , , , , , , |
| Enable lookahead | 0 or 1 |
| Lookahead frames | 10-100 |
Multi-GPU QSV (Linux)
# Select specific GPU by render device ffmpeg -init_hw_device qsv=hw:/dev/dri/renderD129 \ -filter_hw_device hw \ -i input.mp4 \ -c:v h264_qsv output.mp4
QSV with Scaling
# GPU scaling with vpp_qsv ffmpeg -hwaccel qsv \ -hwaccel_output_format qsv \ -i input.mp4 \ -vf "vpp_qsv=w=1280:h=720" \ -c:v h264_qsv \ -preset medium \ output.mp4
VVC/H.266 QSV Decoding (FFmpeg 7.1+)
# Hardware VVC decoding ffmpeg -hwaccel qsv \ -hwaccel_output_format qsv \ -c:v vvc_qsv \ -i input.vvc \ -c:v h264_qsv \ output.mp4
AMD AMF
Requirements
- AMD GPU (GCN or newer)
- AMD drivers with AMF support
- FFmpeg built with
--enable-amf
Check AMF Support
ffmpeg -encoders | grep amf
Basic AMF Encoding
# H.264 AMF ffmpeg -i input.mp4 \ -c:v h264_amf \ -quality balanced \ -b:v 5M \ output.mp4 # H.265/HEVC AMF ffmpeg -i input.mp4 \ -c:v hevc_amf \ -quality balanced \ -b:v 4M \ output.mp4 # AV1 AMF (RDNA3+) ffmpeg -i input.mp4 \ -c:v av1_amf \ -quality balanced \ -b:v 3M \ output.mp4
AMF Quality Presets
| Preset | Description |
|---|---|
| speed | Fastest encoding |
| balanced | Recommended |
| quality | Best quality |
AMD Hardware Upscaling (FFmpeg 8.0+)
# Super resolution upscaling ffmpeg -i input.mp4 \ -vf "sr_amf=4096:2160:algorithm=sr1-1" \ -c:v hevc_amf \ output.mp4
VAAPI (Linux)
Requirements
- Intel, AMD, or NVIDIA GPU on Linux
- VAAPI drivers (intel-media-driver, mesa-va-drivers)
- FFmpeg built with
--enable-vaapi
Check VAAPI Support
# Check VAAPI info vainfo # List VAAPI codecs ffmpeg -encoders | grep vaapi ffmpeg -decoders | grep vaapi
Basic VAAPI Encoding
# H.264 VAAPI ffmpeg -vaapi_device /dev/dri/renderD128 \ -i input.mp4 \ -vf 'format=nv12,hwupload' \ -c:v h264_vaapi \ -b:v 5M \ output.mp4 # H.265/HEVC VAAPI ffmpeg -vaapi_device /dev/dri/renderD128 \ -i input.mp4 \ -vf 'format=nv12,hwupload' \ -c:v hevc_vaapi \ -b:v 4M \ output.mp4
Full VAAPI Pipeline
# Decode and encode on GPU ffmpeg -hwaccel vaapi \ -hwaccel_device /dev/dri/renderD128 \ -hwaccel_output_format vaapi \ -i input.mp4 \ -c:v h264_vaapi \ -b:v 5M \ output.mp4
VAAPI Scaling
ffmpeg -hwaccel vaapi \ -hwaccel_device /dev/dri/renderD128 \ -hwaccel_output_format vaapi \ -i input.mp4 \ -vf 'scale_vaapi=w=1280:h=720' \ -c:v h264_vaapi \ output.mp4
VVC VAAPI Decoding (FFmpeg 8.0+)
FFmpeg 8.0 adds VVC/H.266 hardware decoding on Intel and AMD GPUs via VAAPI.
# Hardware VVC/H.266 decoding ffmpeg -hwaccel vaapi \ -hwaccel_device /dev/dri/renderD128 \ -hwaccel_output_format vaapi \ -i input.vvc \ -c:v h264_vaapi \ output.mp4 # VVC decode + transcode to H.265 ffmpeg -hwaccel vaapi \ -hwaccel_device /dev/dri/renderD128 \ -hwaccel_output_format vaapi \ -i input.mkv \ -c:v hevc_vaapi \ -b:v 4M \ output.mp4 # VVC with Screen Content Coding (SCC) support # FFmpeg 8.0 adds full SCC support including: # - IBC (Inter Block Copy) # - Palette Mode # - ACT (Adaptive Color Transform) ffmpeg -hwaccel vaapi \ -hwaccel_device /dev/dri/renderD128 \ -i screen_recording.vvc \ output.mp4
VVC VAAPI Requirements:
- Intel Xe2 graphics (Lunar Lake) or newer for full VVC support
- FFmpeg 8.0 or later
- Intel media driver with VVC support
Apple VideoToolbox
Requirements
- macOS 10.8+ or iOS 8+
- FFmpeg built with
--enable-videotoolbox
Check VideoToolbox Support
ffmpeg -encoders | grep videotoolbox ffmpeg -decoders | grep videotoolbox
Basic VideoToolbox Encoding
# H.264 VideoToolbox ffmpeg -i input.mp4 \ -c:v h264_videotoolbox \ -b:v 5M \ output.mp4 # H.265/HEVC VideoToolbox ffmpeg -i input.mp4 \ -c:v hevc_videotoolbox \ -b:v 4M \ -tag:v hvc1 \ output.mp4 # ProRes VideoToolbox ffmpeg -i input.mp4 \ -c:v prores_videotoolbox \ -profile:v 3 \ output.mov
VideoToolbox Quality Settings
# Quality-based encoding ffmpeg -i input.mp4 \ -c:v h264_videotoolbox \ -q:v 65 \ output.mp4 # Hardware accelerated decode + encode ffmpeg -hwaccel videotoolbox \ -i input.mp4 \ -c:v h264_videotoolbox \ -b:v 5M \ output.mp4
Vulkan Video (FFmpeg 7.1+/8.0)
Overview
Vulkan Video provides cross-platform GPU acceleration using Vulkan compute shaders. Unlike proprietary hardware accelerators, Vulkan codecs are based on compute shaders and work on any implementation of Vulkan 1.3.
FFmpeg 7.1 Vulkan Features
- H.264 Vulkan encoding
- H.265/HEVC Vulkan encoding
FFmpeg 8.0 Vulkan Features (New)
- AV1 Vulkan encoding - GPU-accelerated AV1 via compute shaders
- VP9 Vulkan decoding - Hardware-accelerated VP9 decode
- FFv1 Vulkan encode/decode - Lossless codec for archival/capture
- ProRes RAW Vulkan decode - Apple ProRes RAW hardware decode
Benefits of Vulkan Compute Codecs:
- Cross-platform: Same code works on AMD, Intel, and NVIDIA
- No vendor lock-in: Works with any Vulkan 1.3 driver
- Ideal for: Lossless screen capture, high-throughput archival, professional workflows
Basic Vulkan Encoding
# H.264 Vulkan ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -c:v h264_vulkan \ -b:v 5M \ output.mp4 # H.265/HEVC Vulkan ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -c:v hevc_vulkan \ -b:v 4M \ output.mp4 # AV1 Vulkan (FFmpeg 8.0+) ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -c:v av1_vulkan \ -b:v 3M \ output.mp4 # FFv1 Vulkan Lossless (FFmpeg 8.0+) ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -c:v ffv1_vulkan \ output.mkv
Full Vulkan Pipeline
# Complete Vulkan decode-filter-encode ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -hwaccel vulkan \ -hwaccel_output_format vulkan \ -i input.mp4 \ -vf "scale_vulkan=1280:720" \ -c:v h264_vulkan \ output.mp4
VP9 Vulkan Decoding (FFmpeg 8.0+)
# Hardware decode VP9 with Vulkan ffmpeg -init_hw_device vulkan \ -hwaccel vulkan \ -hwaccel_output_format vulkan \ -i input.webm \ -c:v h264_vulkan \ output.mp4
ProRes RAW Vulkan Decoding (FFmpeg 8.0+)
# Hardware decode ProRes RAW with Vulkan ffmpeg -init_hw_device vulkan \ -hwaccel vulkan \ -i input.mov \ -c:v libx264 \ output.mp4
Lossless Screen Capture with FFv1 Vulkan
# High-throughput lossless capture (Linux X11) ffmpeg -init_hw_device vulkan \ -f x11grab -framerate 60 -i :0.0 \ -c:v ffv1_vulkan \ screen_capture.mkv # Lossless screen recording (Windows) ffmpeg -init_hw_device vulkan \ -f gdigrab -framerate 60 -i desktop \ -c:v ffv1_vulkan \ screen_capture.mkv
Upcoming Vulkan Codecs
The next minor update will add:
- ProRes (encode and decode)
- VC-2 (encode and decode)
Vulkan Filters (FFmpeg 8.0+)
Vulkan filters provide cross-platform GPU acceleration for video processing.
Vulkan Filter Reference
| Filter | Description | Key Parameters |
|---|---|---|
| GPU scaling | , , |
| Compositing | , |
| Rotate/transpose | , |
| Horizontal flip | - |
| Box blur | , |
| Gaussian blur | , , |
| Chromatic aberration | , |
| NLMeans denoising | , , |
| Deinterlacing | , , |
| Transitions | , , |
| Libplacebo processing | , , |
Vulkan Scaling
# Basic Vulkan scaling ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -i input.mp4 \ -vf "hwupload,scale_vulkan=1920:1080,hwdownload,format=yuv420p" \ output.mp4 # Full Vulkan pipeline with scaling ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -hwaccel vulkan -hwaccel_output_format vulkan \ -i input.mp4 \ -vf "scale_vulkan=1280:720" \ -c:v h264_vulkan output.mp4 # Vulkan scaling with specific scaler ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,scale_vulkan=w=1920:h=1080:scaler=lanczos,hwdownload,format=yuv420p" \ output.mp4
Vulkan Compositing (overlay_vulkan)
# Picture-in-picture with Vulkan ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -i main.mp4 -i overlay.mp4 \ -filter_complex "\ [0:v]hwupload[main];\ [1:v]hwupload,scale_vulkan=320:180[pip];\ [main][pip]overlay_vulkan=x=10:y=10[out];\ [out]hwdownload,format=yuv420p" \ -map "[out]" output.mp4 # Full Vulkan overlay pipeline ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -hwaccel vulkan -hwaccel_output_format vulkan -i main.mp4 \ -hwaccel vulkan -hwaccel_output_format vulkan -i overlay.mp4 \ -filter_complex "[0:v][1:v]overlay_vulkan=x=W-w-10:y=10" \ -c:v h264_vulkan output.mp4
Vulkan Rotation and Transpose
# Transpose (rotate 90° clockwise) ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,transpose_vulkan=dir=clock,hwdownload,format=yuv420p" \ output.mp4 # Horizontal flip ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,flip_vulkan,hwdownload,format=yuv420p" \ output.mp4
transpose_vulkan directions:
| dir | Rotation |
|---|---|
| 90° counter-clockwise + vertical flip |
| 90° clockwise |
| 90° counter-clockwise |
| 90° clockwise + vertical flip |
Vulkan Blur Effects
# Gaussian blur ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,gblur_vulkan=sigma=5,hwdownload,format=yuv420p" \ output.mp4 # Box blur (averaging) ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,avgblur_vulkan=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \ output.mp4
Vulkan Chromatic Aberration
# Add chromatic aberration effect ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,chromaber_vulkan=dist_x=-0.002:dist_y=0.002,hwdownload,format=yuv420p" \ output.mp4
Vulkan Denoising (nlmeans_vulkan)
# Non-local means denoising on GPU ffmpeg -init_hw_device vulkan \ -i noisy.mp4 \ -vf "hwupload,nlmeans_vulkan=s=3:p=7:r=15,hwdownload,format=yuv420p" \ output.mp4 # Full Vulkan denoising pipeline ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -hwaccel vulkan -hwaccel_output_format vulkan \ -i noisy.mp4 \ -vf "nlmeans_vulkan=s=4" \ -c:v h264_vulkan output.mp4
Vulkan Deinterlacing (bwdif_vulkan)
# Bob-Weaver deinterlacing ffmpeg -init_hw_device vulkan \ -i interlaced.mp4 \ -vf "hwupload,bwdif_vulkan=mode=send_frame,hwdownload,format=yuv420p" \ output.mp4 # Full Vulkan deinterlace pipeline ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -hwaccel vulkan -hwaccel_output_format vulkan \ -i interlaced.mp4 \ -vf "bwdif_vulkan=1" \ -c:v h264_vulkan output.mp4
Vulkan Transitions (xfade_vulkan)
# GPU-accelerated crossfade between two clips ffmpeg -init_hw_device vulkan=vk \ -filter_hw_device vk \ -i clip1.mp4 -i clip2.mp4 \ -filter_complex "\ [0:v]hwupload[v1];\ [1:v]hwupload[v2];\ [v1][v2]xfade_vulkan=transition=fade:duration=1:offset=4[out];\ [out]hwdownload,format=yuv420p" \ -map "[out]" output.mp4
xfade_vulkan transitions:
fade, dissolve, wipeleft, wiperight, wipeup, wipedown, slideleft, slideright, slideup, slidedown, and more.
Libplacebo Processing (Advanced)
Libplacebo provides advanced color management and processing:
# HDR tonemapping with libplacebo ffmpeg -init_hw_device vulkan \ -i hdr_input.mp4 \ -vf "hwupload,libplacebo=tonemapping=hable:colorspace=bt709:color_primaries=bt709:color_trc=bt709,hwdownload,format=yuv420p" \ sdr_output.mp4 # Scaling with libplacebo (high quality) ffmpeg -init_hw_device vulkan \ -i input.mp4 \ -vf "hwupload,libplacebo=w=3840:h=2160:upscaler=ewa_lanczos,hwdownload,format=yuv420p10le" \ output.mp4
OpenCL Filters
OpenCL provides GPU acceleration that works across vendors (NVIDIA, AMD, Intel).
OpenCL Filter Reference
| Filter | Description | Key Parameters |
|---|---|---|
| GPU scaling | , |
| Compositing | , |
| Padding | , , , , |
| Chroma keying | , , |
| Sharpening | , , , , , |
| NLMeans denoising | , , |
| Stabilization | , , |
| HDR tonemapping | , , |
| Box blur | , |
| Custom kernel | , , etc. |
OpenCL Initialization
# List OpenCL devices ffmpeg -init_hw_device opencl=gpu:0.0 -filter_hw_device gpu -f lavfi -i nullsrc -vf "hwupload,avgblur_opencl=2" -t 1 -f null - # Basic OpenCL filter pattern ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,<filter_opencl>,hwdownload,format=yuv420p" \ output.mp4
OpenCL Scaling
# GPU scaling with OpenCL ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,scale_opencl=1920:1080,hwdownload,format=yuv420p" \ output.mp4
OpenCL Compositing
# Picture-in-picture with OpenCL ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i main.mp4 -i overlay.mp4 \ -filter_complex "\ [0:v]hwupload[main];\ [1:v]hwupload,scale_opencl=320:180[pip];\ [main][pip]overlay_opencl=x=10:y=10[out];\ [out]hwdownload,format=yuv420p" \ -map "[out]" output.mp4
OpenCL Chroma Keying (colorkey_opencl)
# Green screen removal with OpenCL ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i greenscreen.mp4 -i background.mp4 \ -filter_complex "\ [0:v]hwupload,colorkey_opencl=0x00FF00:0.3:0.1[fg];\ [1:v]hwupload[bg];\ [bg][fg]overlay_opencl[out];\ [out]hwdownload,format=yuv420p" \ -map "[out]" output.mp4
OpenCL Denoising (nlmeans_opencl)
# Non-local means denoising on GPU ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i noisy.mp4 \ -vf "hwupload,nlmeans_opencl=s=3:p=7:r=15,hwdownload,format=yuv420p" \ output.mp4 # Strong denoising ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i noisy.mp4 \ -vf "hwupload,nlmeans_opencl=s=5:p=7:r=21,hwdownload,format=yuv420p" \ output.mp4
OpenCL Stabilization (deshake_opencl)
# GPU-accelerated video stabilization ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i shaky.mp4 \ -vf "hwupload,deshake_opencl,hwdownload,format=yuv420p" \ output.mp4 # Tripod mode (camera stays in one place) ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i shaky.mp4 \ -vf "hwupload,deshake_opencl=tripod=1,hwdownload,format=yuv420p" \ output.mp4
OpenCL Sharpening (unsharp_opencl)
# Basic sharpening ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,unsharp_opencl=lx=5:ly=5:la=1.0,hwdownload,format=yuv420p" \ output.mp4 # Subtle sharpening for HD video ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,unsharp_opencl=lx=3:ly=3:la=0.5,hwdownload,format=yuv420p" \ output.mp4
OpenCL HDR Tonemapping (tonemap_opencl)
# HDR to SDR conversion with GPU ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i hdr_input.mp4 \ -vf "hwupload,tonemap_opencl=tonemap=hable:transfer=bt709:matrix=bt709:primaries=bt709,hwdownload,format=yuv420p" \ sdr_output.mp4 # Reinhard tonemapping ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i hdr_input.mp4 \ -vf "hwupload,tonemap_opencl=tonemap=reinhard:param=0.5,hwdownload,format=yuv420p" \ sdr_output.mp4
tonemap_opencl algorithms:
| Algorithm | Description |
|---|---|
| No tonemapping |
| Simple clipping |
| Linear scaling |
| Gamma-based |
| Reinhard operator |
| Hable/Filmic (most popular) |
| Mobius (preserves blacks) |
| ITU-R BT.2390 (broadcast standard) |
OpenCL Blur
# Box blur ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,avgblur_opencl=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \ output.mp4
OpenCL Custom Convolution
# Edge detection kernel ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,convolution_opencl=0m='-1 -1 -1 -1 8 -1 -1 -1 -1':0rdiv=1:0bias=128,hwdownload,format=yuv420p" \ output.mp4 # Sharpen kernel ffmpeg -init_hw_device opencl=gpu \ -filter_hw_device gpu \ -i input.mp4 \ -vf "hwupload,convolution_opencl=0m='0 -1 0 -1 5 -1 0 -1 0':0rdiv=1:0bias=0,hwdownload,format=yuv420p" \ output.mp4
GPU Filter Comparison
Cross-Platform Filter Availability
| Filter Type | CUDA | Vulkan | OpenCL | VAAPI | QSV |
|---|---|---|---|---|---|
| Scaling | ✅ | ✅ | ✅ | ✅ | ✅ |
| Overlay | ✅ | ✅ | ✅ | - | - |
| Denoising | ✅ | ✅ | ✅ | - | - |
| Deinterlace | ✅ | ✅ | - | ✅ | ✅ |
| Chromakey | ✅ | - | ✅ | - | - |
| Tonemapping | - | ✅ | ✅ | ✅ | - |
| Stabilization | - | - | ✅ | - | - |
| Padding | ✅ | - | ✅ | - | - |
Performance Comparison
| API | Platform Support | Filter Variety | Performance |
|---|---|---|---|
| CUDA | NVIDIA only | Excellent | Best on NVIDIA |
| Vulkan | Cross-platform | Good (growing) | Very good |
| OpenCL | Cross-platform | Good | Good |
| VAAPI | Linux only | Limited | Good on Linux |
| QSV | Intel only | Limited | Good on Intel |
Docker with Hardware Acceleration
NVIDIA GPU in Docker
# Run with NVIDIA GPU support docker run --gpus all \ --rm \ -v $(pwd):/data \ jrottenberg/ffmpeg:nvidia \ -hwaccel cuda \ -hwaccel_output_format cuda \ -i /data/input.mp4 \ -c:v h264_nvenc \ /data/output.mp4
Intel QSV in Docker
# Run with Intel GPU access docker run --rm \ --device=/dev/dri:/dev/dri \ -v $(pwd):/data \ jrottenberg/ffmpeg:vaapi \ -hwaccel qsv \ -i /data/input.mp4 \ -c:v h264_qsv \ /data/output.mp4
Troubleshooting
Common Issues
"No NVENC capable devices found"
# Check GPU support nvidia-smi # Verify driver version nvidia-smi --query-gpu=driver_version --format=csv # Check CUDA version nvcc --version
"Cannot load libcuda.so"
# Set library path (Linux) export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
QSV "Unsupported format"
# Force pixel format conversion ffmpeg -i input.mp4 \ -vf "format=nv12" \ -c:v h264_qsv \ output.mp4
VAAPI permission denied
# Add user to video/render group sudo usermod -aG video $USER sudo usermod -aG render $USER # Re-login or use newgrp
Performance Debugging
# Benchmark encode speed ffmpeg -benchmark -i input.mp4 -c:v h264_nvenc -f null - # Show hardware decode stats ffmpeg -hwaccel cuda -hwaccel_output_format cuda -benchmark -i input.mp4 -f null - # Monitor GPU usage (NVIDIA) nvidia-smi dmon -s u # Monitor GPU usage (Intel) intel_gpu_top
Best Practices
- Use full GPU pipelines when possible to avoid CPU-GPU memory transfers
- Match decode and encode hardware for best performance
- Use appropriate presets - faster isn't always better for quality
- Enable lookahead and AQ for quality-critical encodes
- Test on target hardware - quality varies by GPU generation
- Monitor GPU memory for high-resolution content
- Consider power efficiency for laptops and servers
- Update drivers regularly for performance and feature improvements
Recommended Settings by Use Case
Live Streaming
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \ -c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \ -f flv rtmp://server/live/stream
VOD Encoding (Quality)
ffmpeg -i input.mp4 \ -c:v hevc_nvenc -preset p6 -tune hq \ -rc vbr -cq 22 -b:v 0 \ -rc-lookahead 32 -spatial-aq 1 \ output.mp4
Batch Processing
# Multiple parallel streams on one GPU ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 & ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 & wait
This guide covers hardware acceleration fundamentals. For specific platform optimizations and advanced configurations, consult the respective vendor documentation.