ZeinaGuard-Pro video-editing
Edit video files with FFmpeg — trim, merge, add subtitles, effects, voiceovers, and reformat.
```bash
git clone https://github.com/Mo7amed-Mahmoud/ZeinaGuard-Pro
```

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/Mo7amed-Mahmoud/ZeinaGuard-Pro "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.local/secondary_skills/video-editing" ~/.claude/skills/mo7amed-mahmoud-zeinaguard-pro-video-editing && rm -rf "$T"
```
`.local/secondary_skills/video-editing/SKILL.md`

Video Editing
Edit existing video files server-side using FFmpeg. This skill covers the full editing workflow: trimming, splicing, concatenating, transitions, text overlays, subtitles, audio mixing, effects, and format conversion.
When to Use
Trimming & Cutting
- "Trim this video to 0:30-1:15" / "Cut out the first 10 seconds"
- "Extract a clip from 2:00 to 3:30"
- "Remove the intro" / "Cut the ending"
Merging & Compositing
- "Merge these clips together" / "Concatenate these videos"
- "Put two videos side by side" / "Split screen"
- "Picture-in-picture"
Text, Titles & Subtitles
- "Add subtitles" / "Add captions"
- "Put a title card at the beginning"
- "Add text overlay"
Audio
- "Add background music" / "Mix in this audio track"
- "Add a voiceover" / "Narrate this video" (uses ElevenLabs)
- "Mute this video" / "Remove the audio"
- "Replace the audio with this track"
Speed & Motion
- "Speed up this video 2x" / "Slow motion"
- "Make this part slow-mo"
- "Reverse this clip"
- "Loop this clip 3 times"
Visual Effects & Fixes
- "Rotate this video 90 degrees" / "Flip horizontally"
- "Stabilize this shaky footage"
- "Crop out the watermark" / "Remove the watermark"
- "Add my logo to this video" (image overlay / branding)
- "Color correct" / "Make it brighter" / "Adjust contrast"
Format & Size
- "Convert this MP4 to WebM" / "Change format"
- "Compress this video" / "Make this file smaller"
- "Resize to 720p" / "Downscale" / "Change resolution"
- "Extract the audio as MP3"
- "Create a GIF from this clip"
- "Make a thumbnail from this video" (frame extraction as image)
Social Media & Platform
- "Make TikTok clips from this" / "Break this into Reels"
- "Make this vertical for Shorts"
- "Reframe for Instagram" / "LinkedIn" / "Pinterest" / "Facebook" / "Snapchat"
- "Convert this to square for Instagram feed"
AI-Powered Analysis
- "Find the best moments" / "Auto-trim for virality"
- "Extract the highlights" / "Score these clips"
- "Which parts are most engaging?"
Cleanup & Optimization
- "Remove dead space" / "Remove silence"
- "Shorten this by cutting out boring parts"
- "Tighten up this video"
Chunking & Splitting
- "Split this into short clips" / "Chunk this for TikTok"
- "Break this talk into Reels-length videos"
- "Make 15-second clips from this"
No Direction (user just uploads a video)
- Probe the file, report what it is, and ask what they want to do
When NOT to Use
- User wants to create an animated video from scratch using code (use the `video-js` skill instead)
- User wants a browser-based video editor UI (build as a `react-vite` artifact with this skill powering the backend)
User Interaction — ALWAYS Ask Before Acting
When a user uploads a video without specific instructions, you MUST ask what they want to do with it. Do NOT assume a workflow or start processing automatically.
Probe the video first (to report basic info), then ask the user what they'd like. Here's a good pattern:
- Probe the file and tell the user what they uploaded (duration, resolution, format)
- Ask what they want to do. Offer common options like:
  - Trim or cut specific sections
  - Find the best/most viral moments
  - Break it into short clips for a specific platform (TikTok, Reels, Shorts, X)
  - Remove dead space / silence
  - Add text, subtitles, or a voiceover
  - Reframe for a different aspect ratio
  - Convert to a different format
  - Something else
When the user specifies a platform, the full pipeline includes reframing to that platform's native format. Do not just chunk the video and leave it in the original aspect ratio. Always deliver platform-ready output — correct aspect ratio, resolution, and duration range.
Platform Specifications
| Platform | Aspect Ratio | Resolution | Duration Range | Ideal Length | Max File Size |
|----------|-------------|------------|---------------|-------------|--------------|
| TikTok | 9:16 vertical | 1080×1920 | 10–45s | 25s | 287 MB |
| Instagram Reels | 9:16 vertical | 1080×1920 | 10–30s | 20s | 250 MB |
| Instagram Stories | 9:16 vertical | 1080×1920 | 1–60s | 15s | 250 MB |
| Instagram Feed | 1:1 square or 4:5 portrait | 1080×1080 or 1080×1350 | 3–60s | 30s | 250 MB |
| YouTube Shorts | 9:16 vertical | 1080×1920 | 15–60s | 40s | 256 MB |
| YouTube (standard) | 16:9 landscape | 1920×1080 | any | any | 256 GB |
| X / Twitter | 16:9 landscape | 1280×720 | 10–45s | 25s | 512 MB |
| X / Twitter (square) | 1:1 square | 720×720 | 10–45s | 25s | 512 MB |
| Facebook Reels | 9:16 vertical | 1080×1920 | 10–30s | 20s | 250 MB |
| Facebook Feed | 16:9 or 1:1 | 1920×1080 or 1080×1080 | any | 15–60s | 4 GB |
| LinkedIn | 16:9 landscape or 1:1 | 1920×1080 or 1080×1080 | 10–60s | 30s | 5 GB |
| Pinterest | 9:16 or 2:3 vertical | 1080×1920 or 1000×1500 | 6–60s | 15–30s | 2 GB |
| Snapchat Spotlight | 9:16 vertical | 1080×1920 | 5–60s | 15–30s | 250 MB |
When the user says a platform name, apply both the correct duration targets AND the correct aspect ratio/resolution. For example:
- "Make TikTok clips" → chunk to 10-45s AND reframe to 9:16 (1080×1920)
- "Instagram feed posts" → chunk to 3-60s AND reframe to 1:1 (1080×1080) or 4:5 (1080×1350)
- "YouTube Shorts" → chunk to 15-60s AND reframe to 9:16 (1080×1920)
- "Clips for X" → chunk to 10-45s, keep 16:9 (1280×720)
When the user asks for clips or chunking without specifying a platform, ask which platform they're targeting so you can apply the right duration targets and reframing.
When you deliver results, always present the output files so the user can download and preview them. Never just describe what was done — show the files.
Environment
FFmpeg 6.1.2 is pre-installed in Replit with full codec support:
- Video codecs: H.264 (libx264), H.265 (libx265), VP8/VP9 (libvpx), AV1 (libaom, libsvtav1, librav1e), Theora
- Audio codecs: AAC, MP3 (libmp3lame), Opus, Vorbis, FLAC, WAV
- Image formats: PNG, JPEG, WebP, JPEG XL (libjxl)
- Subtitle rendering: libass (for styled subtitles), libfreetype + libfontconfig (for drawtext)
- Filters: All standard filters available — scale, crop, overlay, drawtext, concat, fade, crossfade, etc.
- Tools: `ffmpeg` and `ffprobe` (both available on PATH)
No additional installation is needed for FFmpeg itself.
Architecture Decisions
Script-based editing (simple tasks)
For one-off or single-operation tasks (trim a clip, convert a format, extract audio), write a Node.js script in `scripts/src/` using `fluent-ffmpeg` or direct `child_process.execFile` calls to FFmpeg.
```bash
pnpm --filter @workspace/scripts add fluent-ffmpeg
pnpm --filter @workspace/scripts add -D @types/fluent-ffmpeg
```
API-based editing (complex workflows or UI)
For multi-step editing workflows or when building a video editor UI, add routes to the API server (`artifacts/api-server/`) that accept video files, process them with FFmpeg, and return results.
```bash
pnpm --filter @workspace/api-server add fluent-ffmpeg multer
pnpm --filter @workspace/api-server add -D @types/fluent-ffmpeg @types/multer
```
Choosing the right approach
| Scenario | Approach |
|----------|----------|
| "Trim this video to 0:30-1:15" | Script in `scripts/src/` |
| "Convert this MP4 to WebM" | Script in `scripts/src/` |
| "Build me a video editor app" | React-vite artifact + API routes |
| "Add subtitles to all my videos" | Script in `scripts/src/` |
| "Merge these 5 clips with transitions" | Script in `scripts/src/` (or API if repeated) |
Instructions
1. Probe Before Processing
Always probe the input file first to understand its properties. This prevents errors from wrong assumptions about codecs, resolution, frame rate, or duration.
```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

async function probeVideo(filePath: string) {
  const { stdout } = await execFileAsync('ffprobe', [
    '-v', 'quiet',
    '-print_format', 'json',
    '-show_format', '-show_streams',
    filePath
  ]);
  return JSON.parse(stdout);
}
```
2. Use fluent-ffmpeg for Complex Operations
fluent-ffmpeg provides a chainable API that is easier to maintain than raw command strings. Use it for multi-step operations.
```ts
import ffmpeg from 'fluent-ffmpeg';

function trimVideo(input: string, output: string, start: string, duration: string): Promise<void> {
  return new Promise((resolve, reject) => {
    ffmpeg(input)
      .setStartTime(start)
      .setDuration(duration)
      .output(output)
      .on('end', () => resolve())
      .on('error', reject)
      .run();
  });
}
```
3. Use execFile for Simple One-Liners
For straightforward operations where the FFmpeg command is well-known, calling FFmpeg directly is cleaner.
```ts
await execFileAsync('ffmpeg', [
  '-i', inputPath,
  '-vf', 'scale=1280:720',
  '-c:a', 'copy',
  outputPath
]);
```
4. File Management
- Store uploaded/input files in a `tmp/` or `uploads/` directory
- Store output files where the user can access them — either `public/` for web serving or a known output directory
- Clean up temporary files after processing
- For large files, stream the output rather than buffering in memory
5. Progress Reporting
FFmpeg outputs progress to stderr. Parse it for progress reporting in longer operations:
```ts
ffmpeg(input)
  .output(output)
  .on('progress', (progress) => {
    console.log(`Processing: ${progress.percent?.toFixed(1)}% done`);
  })
  .on('end', () => console.log('Done'))
  .run();
```
6. Error Handling
FFmpeg errors are often cryptic. Always:
- Log the full stderr output on failure
- Validate input files exist before processing
- Check that output codecs are compatible with the output container
- Use the `-y` flag to overwrite output files without prompting
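A small wrapper can enforce all four habits in one place. A minimal sketch — the `runFfmpeg` helper is illustrative, not an existing utility:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';
import { access } from 'fs/promises';

const execFileAsync = promisify(execFile);

// Validates that the input exists, always passes -y, and surfaces
// FFmpeg's stderr when the command fails (FFmpeg logs errors there).
async function runFfmpeg(inputPath: string, args: string[]): Promise<void> {
  await access(inputPath); // throws early if the input file is missing
  try {
    await execFileAsync('ffmpeg', ['-y', '-i', inputPath, ...args]);
  } catch (err: any) {
    console.error(err.stderr ?? err); // full stderr is attached to the error
    throw err;
  }
}
```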
AI-Powered Virality Scoring
Use AI vision to analyze video content and score it on viral potential. This feature has two modes depending on the workflow.
This feature requires an AI integration (OpenAI or Gemini) to be set up. Set up the AI integration following the `ai-integrations-openai` or `ai-integrations-gemini` skill before proceeding. OpenAI is preferred because gpt-5.2 handles image analysis well and is the default for most tasks. Gemini (gemini-2.5-flash) is a good alternative, especially for high-volume analysis, since it supports video/image input natively.
Two Scoring Modes
Mode 1: Segment scoring (find the best moments) — For when the user says "find the best clips" or "auto-trim for virality." Detects scenes, scores small segments, and exports the top moments. Good for finding highlights in raw footage.
Mode 2: Clip scoring (rank ready-to-post clips) — For when clips already exist (from chunking, manual trimming, etc.) and the user wants to know which ones to post first. Scores each complete clip with a richer evaluation including narrative arc and standalone quality. This is the preferred mode when used with the chunking pipeline.
Recommended Pipeline
When the user has a long video and wants social-ready clips, use this order:
Long Video → Dead Space Removal → Chunking (10-60s clips) → Clip-Level Scoring → Reframe for Platform → Output
This way the AI scores complete, self-contained clips rather than tiny scene fragments. The scores are much more meaningful because each clip is what will actually be posted.
For "find the best moment in this video" requests (no chunking), use the original segment-level pipeline:
Video → Scene Detection → Frame Extraction → Segment Scoring → Rank → Trim/Export
When the user asks for auto-trim, ask which output they want
Present these three options and let the user choose each time. Do not assume a default:
- Best clip — The single highest-scoring segment/clip, trimmed and exported
- Multiple clips — Several top segments/clips exported as separate files, ranked by score
- Highlight reel — Top moments stitched together with crossfade transitions into one video
Scoring Criteria
Segment scoring evaluates: visual dynamism, emotional impact, hook potential, pacing/energy, uniqueness, and shareability. Each factor gets a 1-10 score, weighted to produce a final virality score.
Clip scoring adds three additional criteria for complete clips:
- Narrative completeness (weight: 15%) — Does the clip tell a complete micro-story? Does it have a clear beginning, middle, and end? Would a viewer feel satisfied or intrigued, not confused?
- Hook-to-payoff flow (weight: 10%) — Does the clip open with something that grabs attention and deliver on that promise? Or does it start slow and meander?
- Standalone quality (weight: 10%) — Would this clip make sense to someone who hasn't seen the full video? Can it be posted without context?
These three criteria reduce the weights of the original six factors proportionally so the total still sums to 100%.
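A sketch of that proportional rescaling, assuming the six segment-level weights live in a plain record (the weight values and names here are placeholders, not prescribed by this skill):

```ts
// The three clip-level criteria take a fixed 35% of the total weight.
const CLIP_WEIGHTS = { narrative: 0.15, hookToPayoff: 0.1, standalone: 0.1 };

// Scale the original six segment weights by the remaining share (0.65)
// so segment weights + clip weights still sum to 100%.
function rescaleSegmentWeights(segmentWeights: Record<string, number>) {
  const clipTotal = Object.values(CLIP_WEIGHTS).reduce((a, b) => a + b, 0);
  const scale = 1 - clipTotal; // 0.65
  return Object.fromEntries(
    Object.entries(segmentWeights).map(([name, w]) => [name, w * scale])
  );
}
```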
Frame Extraction
- Segments (2-10s): Extract 3 frames evenly spread across the segment.
- Clips (10-60s): Extract 5-8 frames to capture the full arc. Always include the first frame (hook evaluation) and last frame (payoff evaluation).
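A sketch of clip-level frame extraction with the end-of-file clamping described in the gotchas below — `extractFrames` is an illustrative helper, not part of this skill:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Extract one frame per timestamp. Timestamps are clamped slightly inside
// the clip's end so the final seek never lands past the real file duration.
async function extractFrames(
  input: string,
  timestamps: number[],
  endTime: number,
  outPrefix: string
): Promise<string[]> {
  const outputs: string[] = [];
  for (const [i, t] of timestamps.entries()) {
    const clamped = Math.min(t, endTime - 0.1);
    const out = `${outPrefix}-${i}.jpg`;
    await execFileAsync('ffmpeg', [
      '-ss', clamped.toFixed(3), '-i', input,
      '-frames:v', '1', '-q:v', '2', '-y', out,
    ]);
    outputs.push(out);
  }
  return outputs;
}
```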
Workflow
For the complete step-by-step implementation with code examples, see:
virality-scoring.md — Full virality analysis pipeline, AI prompts, scoring criteria, and output assembly
Key Gotchas (learned from testing)
- Scene detection: Use `ffmpeg` with the `select` + `showinfo` filters and parse stderr — the `ffprobe -f lavfi` approach doesn't work reliably in Replit (see the sketch after this list).
- Content safety: OpenAI vision may reject frames from documentary/medical/news content. Always wrap AI calls in try/catch and skip failed segments gracefully.
- Segment indexing: Use `index: segments.length` when filtering short segments, not the loop counter `i`.
- Package setup: For scripts, install `openai` directly (`pnpm add -w openai`) and create the client with the `AI_INTEGRATIONS_OPENAI_BASE_URL` / `AI_INTEGRATIONS_OPENAI_API_KEY` env vars. No need for the full workspace integration library.
- Timeouts: A 90-second video with ~13 segments takes 1-2 minutes for AI analysis. Warn the user and set generous timeouts.
- Minimum segment duration: Use a 2.0s minimum — segments under 2 seconds waste API calls and don't produce useful scores.
- Prefer clip-level scoring: When the video has already been chunked, always score the chunks rather than re-running segment detection. The clips are what will actually be posted.
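The scene-detection approach from the first gotcha, as a sketch — the `detectScenes` name and the 0.3 default threshold are illustrative choices:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Run select + showinfo; showinfo logs each selected frame's pts_time to
// stderr, which we parse to get scene-change timestamps in seconds.
async function detectScenes(input: string, threshold = 0.3): Promise<number[]> {
  const { stderr } = await execFileAsync('ffmpeg', [
    '-i', input,
    '-vf', `select='gt(scene,${threshold})',showinfo`,
    '-f', 'null', '-',
  ], { maxBuffer: 64 * 1024 * 1024 }); // showinfo output can be sizable

  return [...stderr.matchAll(/pts_time:([\d.]+)/g)].map((m) => parseFloat(m[1]));
}
```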
Dead Space Removal
Automatically detect and remove silence, dead air, and filler from a video to produce a tighter, more engaging cut. This is especially useful for talks, interviews, podcasts, tutorials, and raw footage.
How It Works
- Silence detection — Use FFmpeg's `silencedetect` filter to find all silent intervals (configurable threshold and minimum duration)
- Segment extraction — Extract all non-silent segments as individual clips
- Reassembly — Concatenate the non-silent segments back together with optional brief crossfade transitions
- Optional: AI-assisted filler removal — For more aggressive editing (removing "ums", filler words, repetitive sections), combine silence detection with AI transcription analysis
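A minimal sketch of the silence-detection step, running `silencedetect` and parsing the intervals it logs to stderr. The `detectSilence` helper name is illustrative; the defaults match the "Medium" preset in the table below:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// silencedetect writes "silence_start: X" and "silence_end: X" lines to stderr.
async function detectSilence(
  input: string,
  noiseDb = -35,
  minDuration = 0.5
): Promise<Array<{ start: number; end: number }>> {
  const { stderr } = await execFileAsync('ffmpeg', [
    '-i', input,
    '-af', `silencedetect=noise=${noiseDb}dB:d=${minDuration}`,
    '-f', 'null', '-',
  ], { maxBuffer: 16 * 1024 * 1024 });

  const starts = [...stderr.matchAll(/silence_start: (-?[\d.]+)/g)].map((m) => parseFloat(m[1]));
  const ends = [...stderr.matchAll(/silence_end: (-?[\d.]+)/g)].map((m) => parseFloat(m[1]));
  // If the file ends in silence there is no final silence_end line.
  return starts.map((start, i) => ({ start, end: ends[i] ?? Number.POSITIVE_INFINITY }));
}
```

Inverting these intervals gives the non-silent ranges to extract and concatenate.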
When the user asks to remove dead space
Ask these clarifying questions if not clear from context:
- How aggressive? Light (remove only true silence, > 1s gaps) vs. aggressive (remove short pauses > 0.3s for a fast-paced edit)
- Transitions? Hard cuts between segments (default) or brief crossfades (0.2-0.5s) for smoother flow
Configuration
| Preset | Silence Threshold | Min Silence Duration | Use Case |
|--------|------------------|---------------------|----------|
| Light | -40dB | 1.0s | Talks, interviews — remove obvious dead air |
| Medium | -35dB | 0.5s | Podcasts, tutorials — tighter pacing |
| Aggressive | -30dB | 0.3s | Fast-paced edits, social media — maximum tightness |
Implementation
For the complete FFmpeg commands and Node.js pipeline, see:
dead-space-and-chunking.md — See the "Dead Space Removal" section
Quick Summary
Video → Silence Detection (FFmpeg silencedetect) → Identify Non-Silent Ranges → Extract Segments → Concatenate → Output
Social Media Chunking
Break a longer video (ad, promo, talk, presentation, livestream) into self-contained clips suitable for TikTok, Reels, and Shorts. Each clip should be 10-60 seconds and feel complete on its own.
How It Works
- Scene detection + silence detection — Find natural break points using both visual scene changes and audio silence gaps
- Smart boundary selection — Merge adjacent segments into clips targeting 10-60 seconds, preferring to break at silence or scene changes rather than mid-sentence
- AI content analysis (optional) — Score each potential clip with AI to identify which ones are worth posting, suggest captions, and recommend clip order
- Export — Output each clip as a separate file, numbered and ranked
Clip Length Guidelines
| Platform | Ideal Length | Max Length | Notes |
|----------|-------------|------------|-------|
| TikTok | 15-45s | 10 min | 15-30s performs best for new accounts |
| Instagram Reels | 15-30s | 90s | Under 30s gets more reach |
| YouTube Shorts | 30-60s | 60s | Hard cap at 60 seconds |
| X / Twitter | 15-45s | 2m 20s | Shorter gets more engagement |
When the user asks to chunk a video
Ask which platform they're targeting (or default to TikTok at 15-45s). The chunking algorithm should:
- Never cut mid-sentence — Always break at silence gaps or natural pauses
- Prefer scene boundaries — Break where the visual content changes
- Ensure each clip stands alone — Each clip should have a clear beginning, not start mid-thought
- Target the platform's sweet spot — Not just "under 60s" but actually the ideal range for that platform
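One way to implement the boundary selection above is a greedy merge over candidate break points (from scene and silence detection). A sketch — `planClips` is an illustrative name, and the hard-cut fallback is a simplification of the real mid-sentence avoidance:

```ts
// breakPoints: sorted candidate cut times in seconds,
// including 0 and the total duration.
function planClips(
  breakPoints: number[],
  minLen: number,
  maxLen: number
): Array<{ start: number; end: number }> {
  const clips: Array<{ start: number; end: number }> = [];
  let start = breakPoints[0];
  for (const bp of breakPoints.slice(1)) {
    // No natural boundary lands in range: fall back to hard cuts at maxLen.
    while (bp - start > maxLen) {
      clips.push({ start, end: start + maxLen });
      start += maxLen;
    }
    // Close a clip at this natural boundary once it is long enough;
    // otherwise keep extending toward the next break point.
    if (bp - start >= minLen) {
      clips.push({ start, end: bp });
      start = bp;
    }
  }
  return clips;
}
```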
Implementation
For the complete pipeline with code examples, see:
dead-space-and-chunking.md — See the "Social Media Chunking" section
Quick Summary
Video → Scene Detection + Silence Detection → Merge into 10-60s Clips at Natural Boundaries → (Optional) AI Scoring → Export Individual Clips
Combining with Other Features
These features work well together in a pipeline:
Long Video → Dead Space Removal → Chunking into Clips → Virality Scoring → Reframe for TikTok → Output
This takes a raw long-form video and produces polished, platform-ready social clips end to end.
Voiceovers with ElevenLabs
Add AI-generated voiceovers to videos using ElevenLabs text-to-speech, then mix the audio into the video with FFmpeg.
Prerequisites
ElevenLabs is available as a Replit integration (connector). The user needs to connect their ElevenLabs account through Replit's integration system — no manual API keys needed. Search for the integration and propose it to the user:
```ts
const results = await searchIntegrations("elevenlabs");
// Then propose the connector so the user can authorize
await proposeIntegration("connector:ccfg_elevenlabs_...");
```
After authorization, use `addIntegration` to wire it to the project, then use `listConnections('elevenlabs')` in the code execution sandbox to get the credentials.
Workflow
- Generate voiceover audio — Send text to ElevenLabs TTS API, receive audio file
- Mix into video — Use FFmpeg to add the voiceover as a new audio track, optionally ducking (lowering the volume of) existing audio
Voiceover Modes
When the user asks for a voiceover, they may want one of these:
- Replace audio — Remove existing audio entirely, use only the voiceover
- Mix over — Layer the voiceover on top of existing audio (with the existing audio volume reduced)
- Add as track — Keep existing audio at full volume, add the voiceover on top
Ask which mode the user prefers if not clear from context.
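A sketch of the "mix over" mode with simple volume ducking — the `mixVoiceover` helper name and the 30% duck level are illustrative choices:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Duck the original audio to 30% volume, layer the voiceover on top,
// and copy the video stream untouched.
async function mixVoiceover(video: string, voiceover: string, output: string) {
  await execFileAsync('ffmpeg', [
    '-i', video, '-i', voiceover,
    '-filter_complex',
    '[0:a]volume=0.3[duck];[duck][1:a]amix=inputs=2:duration=first[aout]',
    '-map', '0:v', '-map', '[aout]',
    '-c:v', 'copy', '-c:a', 'aac',
    '-y', output,
  ]);
}
```

For the "replace audio" mode, map the voiceover directly (`-map 1:a`) instead of building a filter graph; for "add as track", drop the `volume` step.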
Implementation
For the complete step-by-step implementation with code examples, see:
voiceover.md — ElevenLabs TTS integration, voice selection, audio mixing with FFmpeg, and timed voiceover segments
Quick Summary
Text Script → ElevenLabs TTS → Audio File → FFmpeg Mix with Video → Output
Key capabilities:
- Choose from ElevenLabs' voice library or cloned voices
- Control voice settings (stability, similarity boost, style)
- Generate voiceover for the full video or specific time segments
- Duck existing audio under voiceover sections
- Support for multiple languages
Social Media Reframing
Automatically reframe videos for different social media platforms. Each platform has a preferred aspect ratio and resolution — this feature handles the conversion intelligently.
Platform Specs
| Platform | Aspect Ratio | Resolution | Notes |
|----------|-------------|------------|-------|
| YouTube | 16:9 | 1920x1080 | Standard landscape, also supports 4K (3840x2160) |
| TikTok / Reels | 9:16 | 1080x1920 | Vertical video, max 60s for Reels, 10min for TikTok |
| Instagram Feed | 1:1 | 1080x1080 | Square format |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 | Supports both, 2min 20s max |
Reframing Strategies
When converting between aspect ratios, there are three strategies:
- Center crop (default) — Crop from the center to fill the target ratio. Fast, works well for most content. May cut off edges.
- Letterbox/pillarbox — Add black bars (or a blurred background) to fit without cropping. Preserves all content but adds empty space.
- Blurred fill — Use a blurred, scaled-up version of the video as the background behind the original. Looks much better than black bars, especially for vertical reframing. This is the recommended approach for going from 16:9 to 9:16.
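A sketch of the blurred-fill strategy for a 16:9 → 9:16 (1080×1920) conversion. The filter chain is a standard FFmpeg pattern; the helper name and blur strength are illustrative:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Background: the same video scaled to fill 1080x1920, cropped, and blurred.
// Foreground: the original scaled to fit the width, overlaid in the center.
async function blurredFill916(input: string, output: string) {
  const filter =
    '[0:v]scale=1080:1920:force_original_aspect_ratio=increase,' +
    'crop=1080:1920,boxblur=20:5[bg];' +
    '[0:v]scale=1080:-2[fg];' +
    '[bg][fg]overlay=(W-w)/2:(H-h)/2';
  await execFileAsync('ffmpeg', [
    '-i', input,
    '-filter_complex', filter,
    '-c:a', 'copy', '-y', output,
  ]);
}
```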
When the user asks to reframe, ask which platform they're targeting. Use blurred fill by default for aspect ratio changes that would lose significant content (e.g., landscape to portrait). Use center crop when the content is center-focused. Use letterbox only if the user explicitly asks for it.
Implementation
For the complete FFmpeg commands for each reframing strategy, see:
operations.md — See the "Social Media Reframing" section
Multi-platform Export
When the user wants to export for multiple platforms at once, generate all versions in a batch:
Input (16:9) → YouTube (copy) + TikTok (9:16 blurred fill) + Instagram (1:1 center crop) + X (1280x720 copy)
This is a common workflow — the user shoots in landscape and needs versions for every platform.
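A sketch of such a batch, reusing the hypothetical `PLATFORM_SPECS` lookup from earlier and an assumed `reframeForPlatform` helper — neither exists in this skill; they stand in for whatever reframing code you write:

```ts
// Each export reads the same source and writes its own file, so the passes
// are independent: run them sequentially (gentler on CPU in Replit) or in
// parallel with Promise.all if resources allow.
async function exportAllPlatforms(input: string) {
  const targets = ['tiktok', 'reels', 'shorts', 'x'] as const;
  for (const platform of targets) {
    const spec = PLATFORM_SPECS[platform]; // hypothetical lookup from above
    await reframeForPlatform(input, `output-${platform}.mp4`, spec); // hypothetical helper
  }
}
```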
Common Operations Reference
For detailed FFmpeg commands and patterns for each operation type, see:
- operations.md — Complete command reference for all supported editing operations
- virality-scoring.md — AI-powered virality analysis and auto-trim pipeline
- voiceover.md — ElevenLabs voiceover generation and audio mixing
- dead-space-and-chunking.md — Dead space removal and social media chunking pipelines
Building a Video Editor UI
If the user wants a visual editor interface:
- Create a `react-vite` artifact for the frontend
- Add API routes to `artifacts/api-server/` for video processing
- Use `multer` for file uploads to the API
- Process videos with FFmpeg on the server
- Return processed files for download or preview
- Use the `object-storage` skill for persisting uploaded and processed files if needed
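As a concrete starting point, here is a minimal sketch of an upload-and-trim route, assuming the api-server is Express-based; the route path, field name, and directories are illustrative:

```ts
import express from 'express';
import multer from 'multer';
import ffmpeg from 'fluent-ffmpeg';
import path from 'path';

const app = express();
const upload = multer({ dest: 'tmp/' }); // uploads land in tmp/ first

// Accept a video upload plus start/duration fields, trim it, and return
// a URL the frontend can use for preview or download.
app.post('/api/video/trim', upload.single('video'), (req, res) => {
  const { start, duration } = req.body;
  const output = path.join('public', `trimmed-${Date.now()}.mp4`);
  ffmpeg(req.file!.path)
    .setStartTime(start)
    .setDuration(duration)
    .output(output)
    .on('end', () => res.json({ url: `/${output}` }))
    .on('error', (err) => res.status(500).json({ error: err.message }))
    .run();
});
```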
Key frontend considerations:
- Use a `<video>` element for preview playback
- Show a timeline UI for trim/cut operations
- Display progress during processing (poll an API status endpoint or use Server-Sent Events)
- Provide download links for processed output
Tested Bugs & Caveats (Do Not Regress)
These issues were discovered during real-world testing and are documented here to prevent regressions:
- Scene detection: The `ffprobe -f lavfi` approach does NOT work in Replit. Use `ffmpeg` with `select='gt(scene,threshold)',showinfo` and parse `pts_time` from stderr.
- Segment indexing: Use `index: segments.length` when building segments, NOT the loop counter `i` — filtered segments cause index mismatches.
- Content safety: OpenAI vision may reject frames (medical/documentary content). Always wrap AI calls in try/catch and skip gracefully.
- Frame extraction timestamp clamping: Clamp the last frame timestamp to `segment.endTime - 0.1`. Without this, the last clip in a video fails because `startTime + 1.0 * duration` can exceed the actual file duration due to floating point arithmetic.
- `silenceremove` filter: Only strips audio silence, NOT video frames. Must use the segmented extract+concat approach for proper dead space removal.
- Dead space with many segments: Videos with lots of short dialogue (50+ speaking segments) can be very slow to process with re-encoding. Use `-preset ultrafast` for testing, or skip dead space removal for dialogue-heavy content where there's little dead space anyway.
- OpenAI for scripts: Install with `pnpm add -w openai`, create the client with the `AI_INTEGRATIONS_OPENAI_BASE_URL` and `AI_INTEGRATIONS_OPENAI_API_KEY` env vars.
- AI scoring duration: A 90-second video with ~13 segments takes 1-2 minutes for AI analysis. A 3.5-minute video with 9 clips takes ~3 minutes. Always warn the user.
Limitations
- No GPU acceleration — Replit containers do not have GPU access, so hardware encoding (NVENC, VAAPI) is unavailable. Use software encoding (libx264, libx265, libvpx-vp9).
- Memory and CPU — Very large files or high-resolution encoding (4K+) may be slow or hit memory limits. For large files, prefer stream-copy (`-c copy`) where possible and avoid unnecessary re-encoding.
- Disk space — Video files are large. Clean up temporary files after processing. Monitor disk usage for batch operations.
- No real-time preview — FFmpeg processes offline. Users cannot preview effects in real time during editing (unlike desktop NLEs). Generate short previews of segments instead.