# Skillshub jump-cut-vad

Jump Cut Editor (VAD-based)

## Install

**Source** · Clone the upstream repo:

```bash
git clone https://github.com/ComeOnOliver/skillshub
```

**Claude Code** · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/Harmeet10000/skills/jump-cut-vad" ~/.claude/skills/comeonoliver-skillshub-jump-cut-vad && rm -rf "$T"
```

Manifest: `skills/Harmeet10000/skills/jump-cut-vad/SKILL.md`

## Source Content

# Jump Cut Editor (VAD-based)

Automatically remove silences from talking-head videos using neural voice activity detection (Silero VAD). More accurate than FFmpeg silence detection, especially for videos with background noise, breathing sounds, or quiet speech.

## Execution Script

`execution/jump_cut_vad.py`


## Quick Start

```bash
# Basic silence removal
python3 execution/jump_cut_vad.py input.mp4 output.mp4

# With audio enhancement and color grading
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
    --enhance-audio \
    --apply-lut .tmp/cinematic.cube

# With "cut cut" restart detection (removes mistakes)
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
    --detect-restarts \
    --enhance-audio

# Fine-tuned parameters
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
    --min-silence 0.8 \
    --padding 150 \
    --enhance-audio
```

## What It Does

  1. Extracts audio from video as WAV
  2. Runs Silero VAD (neural voice activity detection) to identify speech segments
  3. Optionally detects "cut cut" restart phrases and removes mistake segments
  4. Concatenates speech segments with padding
  5. Applies audio enhancement (optional): EQ, compression, loudness normalization
  6. Applies color grading (optional): LUT-based color correction
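Steps 3-5 boil down to interval arithmetic over speech segments. A minimal sketch of the merge-and-pad step, assuming segments as `(start, end)` tuples in seconds; the function name and signature are illustrative, not the script's actual code:

```python
def merge_and_pad(segments, merge_gap=0.3, padding_ms=100, duration=None):
    """Merge speech segments closer than merge_gap, then pad each side.

    segments: sorted list of (start, end) times in seconds.
    Defaults mirror the --merge-gap and --padding CLI flags.
    """
    pad = padding_ms / 1000.0
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= merge_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous segment
        else:
            merged.append([start, end])
    # Pad each segment, clamping to the clip bounds
    padded = []
    for start, end in merged:
        start = max(0.0, start - pad)
        end = end + pad if duration is None else min(duration, end + pad)
        padded.append((start, end))
    return padded
```

With the defaults, two segments separated by a 0.18 s gap collapse into one, and each surviving segment gains 100 ms of headroom on both sides.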

## CLI Arguments

| Argument | Default | Description |
|---|---|---|
| `input` | required | Input video file path |
| `output` | required | Output video file path |
| `--min-silence` | 0.5 | Minimum silence gap to cut (seconds) |
| `--min-speech` | 0.25 | Minimum speech duration to keep (seconds) |
| `--padding` | 100 | Padding around speech in milliseconds |
| `--merge-gap` | 0.3 | Merge segments closer than this (seconds) |
| `--keep-start` | true | Preserve intro (start from 0:00) |
| `--no-keep-start` | - | Allow cutting silence at the beginning |
| `--enhance-audio` | false | Apply audio enhancement chain |
| `--detect-restarts` | false | Detect "cut cut" and remove mistakes |
| `--restart-phrase` | "cut cut" | Custom restart trigger phrase |
| `--whisper-model` | base | Whisper model for restart detection |
| `--apply-lut` | none | Path to LUT file for color grading |
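For reference, a plausible `argparse` skeleton matching this table — flag names and defaults are taken from the table, but the actual script's parser may be structured differently:

```python
import argparse

def build_parser():
    # Flags and defaults mirror the CLI table; the real script may differ.
    p = argparse.ArgumentParser(description="VAD-based jump cut editor")
    p.add_argument("input", help="Input video file path")
    p.add_argument("output", help="Output video file path")
    p.add_argument("--min-silence", type=float, default=0.5)
    p.add_argument("--min-speech", type=float, default=0.25)
    p.add_argument("--padding", type=int, default=100)
    p.add_argument("--merge-gap", type=float, default=0.3)
    p.add_argument("--keep-start", dest="keep_start",
                   action="store_true", default=True)
    p.add_argument("--no-keep-start", dest="keep_start",
                   action="store_false")
    p.add_argument("--enhance-audio", action="store_true")
    p.add_argument("--detect-restarts", action="store_true")
    p.add_argument("--restart-phrase", default="cut cut")
    p.add_argument("--whisper-model", default="base")
    p.add_argument("--apply-lut", default=None)
    return p
```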

## Features

### 1. Silero VAD (Voice Activity Detection)

Uses a neural network trained specifically for voice detection. Much better than FFmpeg's volume-based silence detection:

| Silero VAD | FFmpeg silencedetect |
|---|---|
| Detects actual speech | Detects volume drops |
| Ignores breathing | Cuts on breathing pauses |
| Works with background noise | Fails with background noise |
| Handles quiet speech | Misses quiet speech |

### 2. "Cut Cut" Restart Detection

Say "cut cut" during recording to mark a mistake. The script will:

  1. Detect the phrase using Whisper transcription
  2. Remove the segment containing "cut cut"
  3. Remove the previous segment (where the mistake is)

This lets you redo takes naturally without stopping the recording.

```bash
# Enable restart detection
python3 execution/jump_cut_vad.py input.mp4 output.mp4 --detect-restarts

# Custom restart phrase
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
    --detect-restarts --restart-phrase "start over"
```
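The removal rule described above (drop the segment containing the phrase, plus the one before it) can be sketched in pure Python. The segment dict shape is an assumption modeled on Whisper's transcription output, not the script's internal format:

```python
def drop_restart_takes(segments, phrase="cut cut"):
    """Remove each segment whose text contains the restart phrase,
    plus the segment immediately before it (the flubbed take).

    segments: list of dicts like {"start": s, "end": e, "text": "..."}.
    """
    to_drop = set()
    for i, seg in enumerate(segments):
        if phrase in seg["text"].lower():
            to_drop.add(i)
            if i > 0:
                to_drop.add(i - 1)  # the mistake lives in the prior segment
    return [s for i, s in enumerate(segments) if i not in to_drop]
```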

### 3. Audio Enhancement

Applies a professional voice processing chain:

```
highpass=f=80            # Remove rumble below 80Hz
lowpass=f=12000          # Remove harsh highs above 12kHz
equalizer (200Hz, -1dB)  # Reduce muddiness
equalizer (3kHz, +2dB)   # Boost presence/clarity
acompressor              # Gentle compression (3:1 ratio)
loudnorm=I=-16           # YouTube loudness standard (-16 LUFS)
```

```bash
python3 execution/jump_cut_vad.py input.mp4 output.mp4 --enhance-audio
```

### 4. LUT Color Grading

Apply color grading using standard LUT files:

```bash
# Apply .cube LUT
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
    --apply-lut .tmp/cinematic.cube
```

Supported formats: `.cube`, `.3dl`, `.dat`, `.m3d`, `.csp`
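Extension checking plus the FFmpeg `lut3d` filter is all this feature needs. An illustrative sketch (helper name and validation behavior are assumptions, not the script's code):

```python
import os

# Formats accepted by FFmpeg's lut3d filter, per the list above
SUPPORTED_LUTS = {".cube", ".3dl", ".dat", ".m3d", ".csp"}

def lut_filter_arg(path):
    """Return an ffmpeg lut3d filter string, or raise for unsupported formats."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_LUTS:
        raise ValueError(f"Unsupported LUT format: {ext}")
    return f"lut3d=file={path}"
```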


## Parameter Tuning

### Silence Detection

| Goal | Parameter | Value |
|---|---|---|
| More aggressive cuts | `--min-silence` | 0.3-0.4 |
| Preserve natural pauses | `--min-silence` | 0.8-1.0 |
| Keep very short utterances | `--min-speech` | 0.1-0.2 |
| Ignore brief sounds | `--min-speech` | 0.4-0.5 |

### Padding

| Goal | `--padding` value |
|---|---|
| Tight cuts | 50-80 |
| Natural feel | 100-150 |
| Extra breathing room | 200-300 |

## Recording Workflow

### With Restart Detection

  1. Start recording
  2. Speak naturally
  3. Make a mistake → Say "cut cut" → Pause briefly → Redo from checkpoint
  4. Continue recording
  5. Stop when done

The script automatically removes:

- The segment containing "cut cut"
- The previous segment (your mistake)

### Without Restart Detection

  1. Start recording
  2. Speak with natural pauses
  3. Long pauses (>0.5s by default) will be cut
  4. Finish and run the script

## Dependencies

### System Requirements

```bash
brew install ffmpeg  # macOS
```

### Python Dependencies

```bash
pip install torch  # For Silero VAD
pip install openai-whisper  # For restart detection (optional)
```

Note: the PyPI package for OpenAI's Whisper is `openai-whisper` (plain `pip install whisper` installs an unrelated package). Silero VAD is downloaded automatically from torch.hub on first run.


## Example Output

```
🎬 Jump Cut Editor (Silero VAD)
   Input: .tmp/recording.mp4
   Output: .tmp/recording_edited.mp4

📏 Video duration: 180.00s
🎵 Extracting audio...
🎯 Running Silero VAD (min_silence=0.5s, min_speech=0.25s)...
   Found 24 speech segments
     1. 0.00s - 8.32s (8.32s)
     2. 9.12s - 15.44s (6.32s)
     3. 16.28s - 22.60s (6.32s)
     ... and 21 more
📎 After merging close segments: 18 segments
🔲 After adding 100ms padding: 18 segments
📌 Preserving intro: extended first segment to start at 0:00
✂️  Concatenating 18 segments...
🎧 Audio enhancement enabled
✅ Output saved to .tmp/recording_edited.mp4

📊 Stats:
   Original: 180.00s
   New: 142.34s
   Removed: 37.66s (20.9%)
```
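The closing stats are simple arithmetic over the kept segments. A hypothetical helper (not taken from the script) reproduces them:

```python
def cut_stats(original_duration, segments):
    """Return (new_duration, removed_seconds, removed_percent).

    segments: list of kept (start, end) tuples in seconds.
    """
    new = sum(end - start for start, end in segments)
    removed = original_duration - new
    return new, removed, 100.0 * removed / original_duration
```

For a 180 s source whose kept segments total 142.34 s, this yields 37.66 s removed, i.e. 20.9%, matching the log above.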

## vs. simple_video_edit.py

| Feature | jump_cut_vad.py | simple_video_edit.py |
|---|---|---|
| Silence detection | Neural VAD | FFmpeg volume-based |
| Accuracy | High | Medium |
| Restart detection | Yes ("cut cut") | No |
| Audio enhancement | Full chain | Basic loudnorm |
| LUT color grading | Yes | No |
| Whisper transcription | Optional | Built-in |
| YouTube upload | No | Yes (Auphonic) |

Use `jump_cut_vad.py` for: better silence detection, restart phrase support, audio enhancement, color grading.

Use `simple_video_edit.py` for: an end-to-end YouTube workflow with metadata generation and upload.


## Troubleshooting

### "No speech detected"

- Check that an audio track exists in the video
- Try lowering `--min-speech` to 0.1

### Cuts feel too aggressive

- Increase `--padding` (e.g., 150-200)
- Increase `--min-silence` (e.g., 0.8)

### Breathing sounds being cut

VAD should handle this automatically. If not:

- Increase `--merge-gap` to 0.5
- Increase `--padding` slightly

### Restart detection not finding "cut cut"

- Ensure you speak the phrase clearly
- Try `--whisper-model medium` for better accuracy
- Check that Whisper is installed: `pip install openai-whisper`

### LUT not applying

- Check that the file path is correct
- Ensure the format is supported (`.cube`, `.3dl`, `.dat`, `.m3d`, `.csp`)
- Check that FFmpeg has the lut3d filter: `ffmpeg -filters | grep lut3d`

## Performance

### Hardware Encoding (Apple Silicon)

The script automatically uses hardware encoding (`h264_videotoolbox`) on macOS when available:

| Encoding | Speed | File Size | When Used |
|---|---|---|---|
| Hardware (`h264_videotoolbox`) | 5-10x faster | ~10-20% larger | macOS with Apple Silicon/Intel |
| Software (`libx264`) | Baseline | Baseline | Fallback if hardware unavailable |

Benchmark (90-minute 1080p video):

- Software encoding: ~20-25 minutes
- Hardware encoding: ~2-4 minutes

The script checks for hardware encoder availability at startup and caches the result. You'll see either:

- 🚀 Hardware encoding enabled (h264_videotoolbox)
- 💻 Using software encoding (libx264)
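One way to implement this kind of auto-detection is to ask ffmpeg for its encoder list and fall back to `libx264`. A sketch under that assumption; the script's actual check may work differently:

```python
import subprocess

def pick_video_encoder():
    """Prefer h264_videotoolbox when ffmpeg reports it, else libx264.

    Falls back to libx264 if ffmpeg is missing or the query fails.
    """
    try:
        out = subprocess.run(
            ["ffmpeg", "-hide_banner", "-encoders"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return "libx264"
    return "h264_videotoolbox" if "h264_videotoolbox" in out else "libx264"
```

Calling this once at startup and caching the result, as the script does, avoids re-spawning ffmpeg for every encode.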

## Output

- **Deliverable**: Edited video at the specified output path
- **Format**: MP4 (H.264)
- **Encoding**: Hardware (10 Mbps) or software (CRF 18), auto-detected
- **Audio**: AAC 192 kbps (enhanced if `--enhance-audio`)
- **Resolution/FPS**: Matches source