Learn-skills.dev video-edit

Complete video editing toolkit - silence removal, auto-captions, vertical crop, YouTube clipping, 3D transitions, and social media compression. Use when user asks to edit video, remove silences, add captions/subtitles, crop to vertical/shorts, download YouTube clips, compress video, or create video teasers.

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/aiagentwithdhruv/skills/video-edit" ~/.claude/skills/neversight-learn-skills-dev-video-edit && rm -rf "$T"
manifest: data/skills-md/aiagentwithdhruv/skills/video-edit/SKILL.md
source content

Video Editing Toolkit

Goal

Complete video production pipeline: silence removal, auto-captions, vertical cropping, YouTube clipping, compression, and 3D transitions.

Scripts (7 total)

ScriptPurpose
jump_cut_vad_singlepass.py
Remove silences with neural VAD (Silero)
auto_captions.py
Generate + burn styled subtitles (Whisper + FFmpeg)
vertical_crop.py
Auto-crop 16:9 → 9:16 with face tracking
youtube_clip.py
Download YouTube + AI chapter clipping
compress_video.py
Social media compression with platform presets
simple_video_edit.py
Full pipeline: silence removal + transcription + metadata + upload
insert_3d_transition.py
3D swivel teaser insertion

Quick Start Recipes

Recipe 1: Full Social Media Pipeline

# 1. Remove silences
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4

# 2. Add captions
python3 ./scripts/auto_captions.py .tmp/edited.mp4 .tmp/captioned.mp4 --word-level --max-words 2

# 3. Crop to vertical for Shorts/Reels
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for platform
python3 ./scripts/compress_video.py .tmp/vertical.mp4 output.mp4 --preset instagram-reel

Recipe 2: YouTube → Shorts Pipeline

# 1. Download + auto-clip YouTube video
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --output-dir clips/

# 2. Add captions to best clip
python3 ./scripts/auto_captions.py clips/chapters/01_intro.mp4 .tmp/captioned.mp4 --word-level

# 3. Crop to vertical
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for YouTube Shorts
python3 ./scripts/compress_video.py .tmp/vertical.mp4 short.mp4 --preset youtube-shorts

Recipe 3: Quick Edit + Upload

# All-in-one: silence removal + transcription + metadata + Auphonic upload
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

Script 1: VAD Silence Removal

File:

./scripts/jump_cut_vad_singlepass.py

How It Works

  1. Extracts audio as WAV (16kHz mono)
  2. Runs Silero VAD to detect speech segments
  3. Merges close segments, adds padding
  4. Uses FFmpeg trim+concat to join segments in single pass
  5. Hardware encodes with hevc_videotoolbox (H.265, 17Mbps, 30fps)

CLI Arguments

ArgumentDefaultDescription
--min-silence
0.5Min silence duration to cut (seconds)
--min-speech
0.25Min speech duration to keep (seconds)
--padding
100Padding around speech (ms)
--merge-gap
0.3Merge segments closer than this (seconds)
--keep-start
trueAlways start from 0:00

Usage

python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4 --min-silence 1.0 --padding 200

Processing Time

~8 minutes for a 49-min 4K video


Script 2: Auto-Captions

File:

./scripts/auto_captions.py

How It Works

  1. Transcribes video using faster-whisper (word-level timestamps)
  2. Generates SRT subtitles (segment-level or word-level)
  3. Burns styled captions into video with FFmpeg

CLI Arguments

ArgumentDefaultDescription
--model
baseWhisper model: tiny, base, small, medium, large-v3
--language
autoForce language (en, es, hi, etc.)
--word-level
falseShort punchy captions (1-3 words at a time)
--max-words
3Max words per caption in word-level mode
--font-size
22Caption font size (auto-scales for vertical)
--font-color
whiteText color: white, yellow, cyan, green, red, orange, pink
--outline-color
blackOutline color
--position
bottomCaption position: bottom, top, middle
--srt-only
falseOnly generate SRT, don't burn
--srt-path
autoCustom SRT output path
--bold
trueBold text

Usage

# Basic captions
python3 ./scripts/auto_captions.py input.mp4 output.mp4

# Word-level (CapCut/Submagic style)
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --word-level --max-words 2

# Yellow captions at top
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --font-color yellow --position top

# SRT only (no burn)
python3 ./scripts/auto_captions.py input.mp4 --srt-only

# High accuracy
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --model large-v3

Dependencies

pip install faster-whisper

Script 3: Vertical Crop

File:

./scripts/vertical_crop.py

How It Works

  1. Samples frames throughout the video
  2. Detects faces using OpenCV Haar cascade
  3. Smooths face positions to avoid jitter
  4. Crops 16:9 → 9:16 centered on the speaker
  5. For moving subjects: segments video into 2s chunks with per-segment tracking

CLI Arguments

ArgumentDefaultDescription
--ratio
9:16Target aspect ratio (e.g., 9:16, 4:5, 1:1)
--position
autoCrop position: auto (face tracking), left, center, right
--smoothing
30Smoothing window in frames
--sample-interval
5Sample every N frames for detection

Usage

# Auto face-tracking crop
python3 ./scripts/vertical_crop.py input.mp4 output.mp4

# Square crop (Instagram post)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 1:1

# 4:5 crop (Instagram feed)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 4:5

# Center crop (no tracking)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --position center

Dependencies

pip install opencv-python numpy

Script 4: YouTube Clip

File:

./scripts/youtube_clip.py

How It Works

  1. Downloads video via yt-dlp (up to 4K)
  2. Extracts YouTube chapters if available
  3. Falls back to AI chapter generation (Whisper + Claude)
  4. Clips video into individual chapter files

CLI Arguments

ArgumentDefaultDescription
--download-only
falseOnly download, don't clip
--max-quality
1080Max quality: 480, 720, 1080, 1440, 2160
--audio-only
falseDownload audio only (MP3)
--start
noneManual clip start (seconds)
--end
noneManual clip end (seconds)
--auto-clip
falseAI-powered chapter detection and clipping
--use-yt-chapters
trueUse YouTube chapters if available
--max-clips
noneLimit number of clips
--whisper-model
baseWhisper model for transcription
--reencode
falseRe-encode clips (precise cuts)
--output-dir
.tmp/clipsOutput directory

Usage

# Download only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --download-only

# Auto-clip into chapters
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip

# Extract specific range
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --start 60 --end 180 -o clip.mp4

# Download audio only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --audio-only

# Top 3 clips only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --max-clips 3

Dependencies

pip install yt-dlp faster-whisper anthropic

Script 5: Video Compressor

File:

./scripts/compress_video.py

Platform Presets

PresetResolutionCRFMax DurationMax SizeNotes
youtube
1080p18nonenoneHigh quality
youtube-shorts
1080x19202060snoneVertical
instagram-reel
1080x19202390snoneVertical
instagram-post
1080x10802360snoneSquare/vertical
tiktok
1080x192023180snoneVertical
twitter
1080p23140s512MBAuto-bitrate
linkedin
1080p23600s200MBAuto-bitrate
telegram
720p28none50MBFor bots
whatsapp
720p28120s16MBAggressive
small
480p30nonenoneQuick sharing

CLI Arguments

ArgumentDefaultDescription
--preset
nonePlatform preset (see table above)
--resolution
1080Max height in pixels
--crf
23Quality (0=lossless, 51=worst)
--audio-bitrate
128kAudio bitrate
--target-size
noneTarget file size in MB
--max-duration
noneMax duration in seconds
--list-presets
falseShow all presets

Usage

# Platform preset
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset instagram-reel
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset whatsapp

# Target file size
python3 ./scripts/compress_video.py input.mp4 output.mp4 --target-size 25

# Custom
python3 ./scripts/compress_video.py input.mp4 output.mp4 --resolution 720 --crf 28

# List all presets
python3 ./scripts/compress_video.py input.mp4 output.mp4 --list-presets

Script 6: Simple Video Edit (Full Pipeline)

File:

./scripts/simple_video_edit.py

How It Works

  1. FFmpeg silence detection + cutting
  2. Audio normalization (loudnorm)
  3. Whisper transcription
  4. Claude-generated YouTube metadata (summary + chapters)
  5. Auphonic upload → YouTube (private draft)

CLI Arguments

ArgumentDefaultDescription
--video
requiredInput video path
--title
requiredYouTube title
--thumbnail
noneThumbnail image path
--no-upload
falseSkip Auphonic upload
--no-normalize
falseSkip audio normalization
--upload-only
falseSkip editing, just upload
--silence-threshold
-35Silence threshold in dB
--silence-duration
3.0Min silence duration (seconds)

Usage

# Full pipeline
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

# Local only
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "Test" --no-upload

Dependencies

pip install anthropic faster-whisper requests python-dotenv
# Requires: ANTHROPIC_API_KEY, AUPHONIC_API_KEY in .env

Script 7: 3D Swivel Teaser

File:

./scripts/insert_3d_transition.py

How It Works

  1. Extracts frames from later in video (default: 60s onwards)
  2. Creates 3D rotating "swivel" animation via Remotion
  3. Splits video: intro → transition → main content
  4. Re-encodes and concatenates with audio preserved

CLI Arguments

ArgumentDefaultDescription
--insert-at
3Where to insert teaser (seconds)
--duration
5Teaser duration (seconds)
--teaser-start
60Where to sample content from (seconds)
--bg-color
#2d3436Background color (hex)
--bg-image
noneBackground image path

Final Timeline

[0-3s intro] [3-8s swivel teaser @ 100x] [8s onwards: edited content]
Audio: Original audio plays continuously

Dependencies

pip install torch  # For Silero VAD (jump_cut script)
brew install ffmpeg node  # macOS
cd video_effects && npm install  # For Remotion 3D rendering

All Dependencies (Install Once)

# Core (required for all scripts)
brew install ffmpeg

# Silence removal
pip install torch

# Auto-captions + YouTube clip
pip install faster-whisper

# Vertical crop
pip install opencv-python numpy

# YouTube download
pip install yt-dlp

# AI features (chapters, metadata)
pip install anthropic

# Full pipeline (simple_video_edit)
pip install requests python-dotenv

# 3D transitions
brew install node
cd scripts/video_effects && npm install

Troubleshooting

IssueSolution
Cuts feel abrupt
--padding 200
in jump_cut
Too much cut
--min-silence 1.0
in jump_cut
Captions too small
--font-size 32
in auto_captions
Vertical video detectedFont auto-scales 1.3x
Won't play in QuickTimeEnsure hvc1 codec tag
Face not detectedTry
--position center
in vertical_crop
YouTube download failsUpdate yt-dlp:
pip install -U yt-dlp
File too largeUse
--preset whatsapp
or
--target-size 25
Hardware encoder failsAuto-falls back to software (libx264)

Technical Details

  • macOS: Hardware encoding (hevc_videotoolbox / h264_videotoolbox)
  • Fallback: libx264/libx265 with CRF
  • Audio: AAC 128-192kbps
  • Uses
    hvc1
    codec tag for QuickTime compatibility
  • All scripts support
    --help
    for full argument list

Schema

Inputs

NameTypeRequiredDescription
input_video
file_pathYesInput video file path
recipe
stringNoPipeline recipe: full-social, youtube-shorts, quick-edit
preset
stringNoCompression preset: youtube, instagram-reel, tiktok, whatsapp, etc.

Outputs

NameTypeDescription
output_video
file_pathProcessed video file path

Credentials

NameSource
ANTHROPIC_API_KEY
.env (for AI chapters)
AUPHONIC_API_KEY
.env (for upload)

Composable With

Skills that chain well with this one:

pan-3d-transition
,
recreate-thumbnails

Cost

Free locally (FFmpeg + Silero VAD)