Claude-skill-registry autocut-shorts

Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/autocut-shorts" ~/.claude/skills/majiayu000-claude-skill-registry-autocut-shorts && rm -rf "$T"
manifest: skills/data/autocut-shorts/SKILL.md
source content

Autocut Shorts

This is the main orchestration skill that combines all other skills to automatically create short-form content from long videos.

What It Does

This skill automates the entire workflow:

  1. Download video from YouTube URL (if provided)
  2. Transcribe audio using Whisper or Gemini API
  3. Perform speaker diarization (pyannote or Gemini) - identifies who speaks when
  4. Detect highlights using combined analysis:
    • Transcript analysis (hooks, viral phrases)
    • Speaker dynamics (debates, interactions, overlapping speech)
    • Laughter detection (humorous moments)
    • Sentiment analysis (emotional peaks)
    • Scene detection (cut points)
  5. Select best segments (15-60 seconds each)
  6. Trim video to highlight segments
  7. Resize to 9:16 portrait format (1080x1920)
  8. Add burned-in subtitles with speaker labels
  9. Export multiple clips ready for upload

When to Use

  • User wants to create TikTok clips from a YouTube video
  • Converting podcasts to short-form content
  • Finding viral moments in vlogs or tutorials
  • Repurposing gaming content for Shorts/Reels
  • Batch processing multiple videos

Available Scripts

scripts/autocut.py

Main autocut workflow script.

Usage:

python skills/autocut-shorts/scripts/autocut.py <video_or_url> [options]

Options:

  • --source
    : Source type (file, youtube) - auto-detected
  • --num-clips
    : Number of clips to generate (default: 5)
  • --min-duration
    : Minimum clip duration in seconds (default: 15)
  • --max-duration
    : Maximum clip duration in seconds (default: 60)
  • --platform
    : Target platform (tiktok, shorts, reels, facebook) - default: tiktok
  • --output-dir
    : Output directory (default:
    ./shorts/
    )
  • --transcription-model
    : Transcription model (auto, whisper, gemini) - default: auto
  • --diarization-model
    : Speaker diarization (auto, pyannote, gemini, none) - default: auto
  • --huggingface-token
    : HuggingFace token for pyannote (or use env var)
  • --focus-speaker
    : Extract clips only for specific speaker (SPEAKER_00, etc.)
  • --gemini-api-key
    : Gemini API key (or use env var)
  • --skip-transcribe
    : Skip transcription if already have transcript
  • --skip-diarization
    : Skip speaker diarization
  • --skip-scenes
    : Skip scene detection
  • --skip-laughter
    : Skip laughter detection
  • --skip-sentiment
    : Skip sentiment analysis
  • --transcript-path
    : Use existing transcript file
  • --style
    : Subtitle style (tiktok, shorts, reels) - default: tiktok

Examples:

Basic autocut from file:

python skills/autocut-shorts/scripts/autocut.py video.mp4

Autocut from YouTube URL:

python skills/autocut-shorts/scripts/autocut.py "https://www.youtube.com/watch?v=VIDEO_ID"

Generate 10 clips for Instagram Reels:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --num-clips 10 --platform reels --style reels

Use Gemini for transcription:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model gemini

Custom duration range:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --min-duration 20 --max-duration 45

Use existing transcript:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcript-path video.srt --skip-transcribe

scripts/quick_cut.py

Quick cut without full analysis (faster).

Usage:

python skills/autocut-shorts/scripts/quick_cut.py <video_path> [options]

Options:

  • --timestamps
    : JSON file with timestamps to cut
  • --output-dir
    : Output directory
  • --platform
    : Target platform

Example:

python skills/autocut-shorts/scripts/quick_cut.py video.mp4 --timestamps cuts.json

Workflow Steps

Step 1: Download (Optional)

If URL provided:

  • Downloads from YouTube using yt-dlp
  • Best quality MP4
  • Saves to temp directory

Step 2: Transcribe

Extracts audio and transcribes:

  • Auto mode: Chooses based on requirements
  • Whisper: Local processing, good for privacy
  • Gemini: Cloud processing, better quality + features

Step 3: Detect Highlights

Runs detection modules:

  • Transcript analysis: Viral phrases, hooks, questions
  • Laughter detection: Funny moments (if enabled)
  • Sentiment analysis: Emotional peaks (if enabled)
  • Scene detection: Visual cut points (if enabled)

Step 4: Score and Rank

Combines all signals:

Virality Score = 
  35% Transcript (hooks, viral content) +
  25% Laughter (humor) +
  25% Sentiment (emotion) +
  15% Scenes (visual transitions)

Ranks all segments and selects top N.

Step 5: Trim

For each highlight:

  • Extends 2-3 seconds before/after for context
  • Trims using FFmpeg (stream copy for speed)
  • Validates duration constraints

Step 6: Resize to Portrait

Converts to 9:16:

  • Smart crop (focus on subjects)
  • 1080x1920 resolution
  • Maintains quality

Step 7: Add Subtitles

Burns in captions:

  • Platform-specific styling
  • White text with black outline
  • Bottom position
  • Readable size (24-28px)

Step 8: Export

Saves final clips:

  • Named:
    {original}_short_{index}.mp4
  • Organized in output directory
  • JSON report with metadata

Output Format

Directory Structure

shorts/
  video_short_001.mp4
  video_short_002.mp4
  video_short_003.mp4
  report.json

JSON Report

{
  "success": true,
  "source": {
    "type": "youtube",
    "url": "https://youtube.com/watch?v=...",
    "title": "Video Title",
    "duration": 1200.5
  },
  "processing": {
    "transcription_model": "gemini-flash-lite-latest",
    "detection_methods": ["transcript", "laughter", "sentiment", "scenes"],
    "platform": "tiktok"
  },
  "results": {
    "total_clips": 5,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "text": "This is the key moment...",
        "output_path": "shorts/video_short_001.mp4"
      }
    ],
    "total_duration": 135.5,
    "avg_virality_score": 0.78
  },
  "performance": {
    "total_time": 180.5,
    "transcription_time": 45.2,
    "analysis_time": 67.3,
    "processing_time": 68.0
  }
}

Platform Presets

TikTok

  • Resolution: 1080x1920
  • Duration: 15-60 seconds
  • Subtitle style: TikTok
  • Output naming:
    _tiktok_{index}.mp4

YouTube Shorts

  • Resolution: 1080x1920
  • Duration: 15-60 seconds
  • Subtitle style: Shorts
  • Output naming:
    _shorts_{index}.mp4

Instagram Reels

  • Resolution: 1080x1920
  • Duration: 15-90 seconds
  • Subtitle style: Reels
  • Output naming:
    _reels_{index}.mp4

Facebook Reels

  • Resolution: 1080x1920
  • Duration: 15-90 seconds
  • Subtitle style: Default
  • Output naming:
    _facebook_{index}.mp4

Viral Detection Algorithm

High-Value Signals

Transcript (35% weight):

  • Viral phrases ("you won't believe", "this changes everything")
  • Hooks ("let me tell you", "here's the secret")
  • Questions and answers
  • Story beats

Laughter (25% weight):

  • Explicit laughter markers
  • High-confidence laughter detection
  • Audience reactions

Sentiment (25% weight):

  • Positive emotions (excitement, joy)
  • Surprise moments
  • Negative emotions (controversy, drama)
  • Emotional intensity > 0.7

Scenes (15% weight):

  • Scene transitions
  • Visual changes
  • Topic shifts

Scoring

virality_score = (
    transcript_score * 0.35 +
    laughter_score * 0.25 +
    sentiment_score * 0.25 +
    scene_score * 0.15
)

Premium Clips (0.8-1.0): Must include Excellent Clips (0.6-0.8): High priority Good Clips (0.4-0.6): Consider including

AI Agent Decision Guide

When to Use pyannote vs Gemini Diarization

Default Behavior (--diarization-model auto): The AI agent automatically selects based on context:

# Use pyannote when:
if "podcast" in user_request or "interview" in user_request:
    return "pyannote"  # Multi-speaker, needs accuracy
if "accurate" in user_request or "precise" in user_request:
    return "pyannote"  # User explicitly wants accuracy
if "panel" in user_request or "debate" in user_request:
    return "pyannote"  # Complex multi-speaker scenarios
if "overlapping" in user_request or "talk over" in user_request:
    return "pyannote"  # Overlapping speech detection
if "privacy" in user_request or "offline" in user_request:
    return "pyannote"  # Local processing needed

# Use Gemini when:
if "quick" in user_request or "fast" in user_request:
    return "gemini"  # Speed priority
if "single speaker" in user_request or "monologue" in user_request:
    return "gemini"  # Simple scenario
if "no diarization" in user_request or "skip speakers" in user_request:
    return "none"  # User doesn't want speaker detection

# Default for ambiguous cases:
return "pyannote" if likely_multi_speaker(video) else "gemini"

Decision Matrix:

ScenarioRecommendedReason
Podcast with 2-3 hostspyannoteHigh accuracy for multi-speaker
Interview (host + guest)pyannotePrecise speaker separation
Panel discussionpyannoteHandles 4+ speakers well
Single speaker vloggeminiFaster, good enough
Gaming commentarygeminiUsually 1-2 speakers
Tutorial videogeminiSingle speaker, speed matters
Debate/competitivepyannoteOverlapping speech detection
Privacy-sensitivepyannoteLocal processing

Examples by Use Case:

# Podcast - use pyannote automatically
python skills/autocut-shorts/scripts/autocut.py podcast.mp4

# Interview - use pyannote for accuracy
python skills/autocut-shorts/scripts/autocut.py interview.mp4

# Vlog - use gemini (single speaker, faster)
python skills/autocut-shorts/scripts/autocut.py vlog.mp4

# Force pyannote explicitly
python skills/autocut-shorts/scripts/autocut.py video.mp4 --diarization-model pyannote

# Skip diarization for simple content
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --diarization-model none

# Extract only host's segments
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --focus-speaker SPEAKER_00

Smart Defaults

The agent automatically detects:

  1. Content type (podcast, vlog, tutorial, gaming, etc.)
  2. Likely speaker count based on audio patterns
  3. User priority (speed vs accuracy vs privacy)
  4. Available resources (GPU, internet, API keys)

Override any time: Users can always override with

--diarization-model
flag.

Integration

This skill uses all other skills:

  • youtube-downloader
    : Download from URL
  • video-transcriber
    : Transcribe audio
  • scene-detector
    : Find visual cut points
  • laughter-detector
    : Find funny moments
  • sentiment-analyzer
    : Find emotional peaks
  • highlight-scanner
    : Combine all signals
  • video-trimmer
    : Cut segments
  • portrait-resizer
    : Convert to 9:16
  • subtitle-overlay
    : Add captions

Common Use Cases

Podcast to Shorts

python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --num-clips 10 --platform shorts

Vlog Highlights

python skills/autocut-shorts/scripts/autocut.py vlog.mp4 --num-clips 5 --platform tiktok

YouTube to TikTok

python skills/autocut-shorts/scripts/autocut.py "https://youtube.com/watch?v=..." --platform tiktok

Tutorial Clips

python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --min-duration 30 --max-duration 60

Performance

Processing Time (approximate):

  • 1-minute video: ~30-60 seconds
  • 10-minute video: ~3-5 minutes
  • 30-minute video: ~8-12 minutes
  • 1-hour video: ~15-25 minutes

Breakdown:

  • Download: 5-30 seconds (depends on video)
  • Transcription: 20-60 seconds
  • Detection: 10-30 seconds per method
  • Trimming: 1-5 seconds per clip
  • Resizing: 5-10 seconds per clip
  • Subtitles: 5-10 seconds per clip

Error Handling

  • Download failure: Retries up to 3 times
  • Transcription failure: Falls back to alternative model
  • No highlights found: Returns error with suggestions
  • Processing failure: Reports which step failed
  • Partial success: Reports successful clips vs failed

Tips

  • Use Gemini transcription for best highlight detection
  • Provide more clips requested than needed (filter by score)
  • 15-30 second clips perform best on TikTok
  • 30-60 second clips work well for Shorts/Reels
  • Keep 2-3 second buffer around highlights
  • Test different platforms for best engagement
  • Use transcript-only mode for faster processing
  • Batch process multiple videos for efficiency

References