ZeinaGuard-Pro video-editing
Edit video files with FFmpeg — trim, merge, add subtitles, effects, voiceovers, and reformat.
```bash
git clone https://github.com/Mo7amed-Mahmoud/ZeinaGuard-Pro
```

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/Mo7amed-Mahmoud/ZeinaGuard-Pro "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.local/secondary_skills/video-editing" ~/.claude/skills/mo7amed-mahmoud-zeinaguard-pro-video-editing && rm -rf "$T"
```
`.local/secondary_skills/video-editing/SKILL.md`

Video Editing
Edit existing video files server-side using FFmpeg. This skill covers the full editing workflow: trimming, splicing, concatenating, transitions, text overlays, subtitles, audio mixing, effects, and format conversion.
When to Use
Trimming & Cutting
- "Trim this video to 0:30-1:15" / "Cut out the first 10 seconds"
- "Extract a clip from 2:00 to 3:30"
- "Remove the intro" / "Cut the ending"
Merging & Compositing
- "Merge these clips together" / "Concatenate these videos"
- "Put two videos side by side" / "Split screen"
- "Picture-in-picture"
Text, Titles & Subtitles
- "Add subtitles" / "Add captions"
- "Put a title card at the beginning"
- "Add text overlay"
Audio
- "Add background music" / "Mix in this audio track"
- "Add a voiceover" / "Narrate this video" (uses ElevenLabs)
- "Mute this video" / "Remove the audio"
- "Replace the audio with this track"
Speed & Motion
- "Speed up this video 2x" / "Slow motion"
- "Make this part slow-mo"
- "Reverse this clip"
- "Loop this clip 3 times"
Visual Effects & Fixes
- "Rotate this video 90 degrees" / "Flip horizontally"
- "Stabilize this shaky footage"
- "Crop out the watermark" / "Remove the watermark"
- "Add my logo to this video" (image overlay / branding)
- "Color correct" / "Make it brighter" / "Adjust contrast"
Format & Size
- "Convert this MP4 to WebM" / "Change format"
- "Compress this video" / "Make this file smaller"
- "Resize to 720p" / "Downscale" / "Change resolution"
- "Extract the audio as MP3"
- "Create a GIF from this clip"
- "Make a thumbnail from this video" (frame extraction as image)
Social Media & Platform
- "Make TikTok clips from this" / "Break this into Reels"
- "Make this vertical for Shorts"
- "Reframe for Instagram" / "LinkedIn" / "Pinterest" / "Facebook" / "Snapchat"
- "Convert this to square for Instagram feed"
AI-Powered Analysis
- "Find the best moments" / "Auto-trim for virality"
- "Extract the highlights" / "Score these clips"
- "Which parts are most engaging?"
Cleanup & Optimization
- "Remove dead space" / "Remove silence"
- "Shorten this by cutting out boring parts"
- "Tighten up this video"
Chunking & Splitting
- "Split this into short clips" / "Chunk this for TikTok"
- "Break this talk into Reels-length videos"
- "Make 15-second clips from this"
No Direction (user just uploads a video)
- Probe the file, report what it is, and ask what they want to do
When NOT to Use
- User wants to create an animated video from scratch using code (use the `video-js` skill instead)
- User wants a browser-based video editor UI (build as a `react-vite` artifact with this skill powering the backend)
User Interaction — ALWAYS Ask Before Acting
When a user uploads a video without specific instructions, you MUST ask what they want to do with it. Do NOT assume a workflow or start processing automatically.
Probe the video first (to report basic info), then ask the user what they'd like. Here's a good pattern:
- Probe the file and tell the user what they uploaded (duration, resolution, format)
- Ask what they want to do. Offer common options like:
  - Trim or cut specific sections
  - Find the best/most viral moments
  - Break it into short clips for a specific platform (TikTok, Reels, Shorts, X)
  - Remove dead space / silence
  - Add text, subtitles, or a voiceover
  - Reframe for a different aspect ratio
  - Convert to a different format
  - Something else
When the user specifies a platform, the full pipeline includes reframing to that platform's native format. Do not just chunk the video and leave it in the original aspect ratio. Always deliver platform-ready output — correct aspect ratio, resolution, and duration range.
Platform Specifications
| Platform | Aspect Ratio | Resolution | Duration Range | Ideal Length | Max File Size |
|----------|-------------|------------|---------------|-------------|--------------|
| TikTok | 9:16 vertical | 1080×1920 | 10–45s | 25s | 287 MB |
| Instagram Reels | 9:16 vertical | 1080×1920 | 10–30s | 20s | 250 MB |
| Instagram Stories | 9:16 vertical | 1080×1920 | 1–60s | 15s | 250 MB |
| Instagram Feed | 1:1 square or 4:5 portrait | 1080×1080 or 1080×1350 | 3–60s | 30s | 250 MB |
| YouTube Shorts | 9:16 vertical | 1080×1920 | 15–60s | 40s | 256 MB |
| YouTube (standard) | 16:9 landscape | 1920×1080 | any | any | 256 GB |
| X / Twitter | 16:9 landscape | 1280×720 | 10–45s | 25s | 512 MB |
| X / Twitter (square) | 1:1 square | 720×720 | 10–45s | 25s | 512 MB |
| Facebook Reels | 9:16 vertical | 1080×1920 | 10–30s | 20s | 250 MB |
| Facebook Feed | 16:9 or 1:1 | 1920×1080 or 1080×1080 | any | 15–60s | 4 GB |
| LinkedIn | 16:9 landscape or 1:1 | 1920×1080 or 1080×1080 | 10–60s | 30s | 5 GB |
| Pinterest | 9:16 or 2:3 vertical | 1080×1920 or 1000×1500 | 6–60s | 15–30s | 2 GB |
| Snapchat Spotlight | 9:16 vertical | 1080×1920 | 5–60s | 15–30s | 250 MB |
When the user says a platform name, apply both the correct duration targets AND the correct aspect ratio/resolution. For example:
- "Make TikTok clips" → chunk to 10-45s AND reframe to 9:16 (1080×1920)
- "Instagram feed posts" → chunk to 3-60s AND reframe to 1:1 (1080×1080) or 4:5 (1080×1350)
- "YouTube Shorts" → chunk to 15-60s AND reframe to 9:16 (1080×1920)
- "Clips for X" → chunk to 10-45s, keep 16:9 (1280×720)
When the user asks for clips or chunking without specifying a platform, ask which platform they're targeting so you can apply the right duration targets and reframing.
When you deliver results, always present the output files so the user can download and preview them. Never just describe what was done — show the files.
Environment
FFmpeg 6.1.2 is pre-installed in Replit with full codec support:
- Video codecs: H.264 (libx264), H.265 (libx265), VP8/VP9 (libvpx), AV1 (libaom, libsvtav1, librav1e), Theora
- Audio codecs: AAC, MP3 (libmp3lame), Opus, Vorbis, FLAC, WAV
- Image formats: PNG, JPEG, WebP, JPEG XL (libjxl)
- Subtitle rendering: libass (for styled subtitles), libfreetype + libfontconfig (for drawtext)
- Filters: All standard filters available — scale, crop, overlay, drawtext, concat, fade, crossfade, etc.
- Tools: `ffmpeg` and `ffprobe` (both available on PATH)
No additional installation is needed for FFmpeg itself.
Architecture Decisions
Script-based editing (simple tasks)
For one-off or single-operation tasks (trim a clip, convert a format, extract audio), write a Node.js script in `scripts/src/` using `fluent-ffmpeg` or direct `child_process.execFile` calls to FFmpeg.
```bash
pnpm --filter @workspace/scripts add fluent-ffmpeg
pnpm --filter @workspace/scripts add -D @types/fluent-ffmpeg
```
API-based editing (complex workflows or UI)
For multi-step editing workflows or when building a video editor UI, add routes to the API server (`artifacts/api-server/`) that accept video files, process them with FFmpeg, and return results.
```bash
pnpm --filter @workspace/api-server add fluent-ffmpeg multer
pnpm --filter @workspace/api-server add -D @types/fluent-ffmpeg @types/multer
```
Choosing the right approach
| Scenario | Approach |
|----------|----------|
| "Trim this video to 0:30-1:15" | Script in `scripts/src/` |
| "Convert this MP4 to WebM" | Script in `scripts/src/` |
| "Build me a video editor app" | React-vite artifact + API routes |
| "Add subtitles to all my videos" | Script in `scripts/src/` |
| "Merge these 5 clips with transitions" | Script in `scripts/src/` (or API if repeated) |
Instructions
1. Probe Before Processing
Always probe the input file first to understand its properties. This prevents errors from wrong assumptions about codecs, resolution, frame rate, or duration.
```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

async function probeVideo(filePath: string) {
  const { stdout } = await execFileAsync('ffprobe', [
    '-v', 'quiet',
    '-print_format', 'json',
    '-show_format', '-show_streams',
    filePath
  ]);
  return JSON.parse(stdout);
}
```
2. Use fluent-ffmpeg for Complex Operations
fluent-ffmpeg provides a chainable API that is easier to maintain than raw command strings. Use it for multi-step operations.
```ts
import ffmpeg from 'fluent-ffmpeg';

function trimVideo(input: string, output: string, start: string, duration: string): Promise<void> {
  return new Promise((resolve, reject) => {
    ffmpeg(input)
      .setStartTime(start)
      .setDuration(duration)
      .output(output)
      .on('end', () => resolve())
      .on('error', reject)
      .run();
  });
}
```
3. Use execFile for Simple One-Liners
For straightforward operations where the FFmpeg command is well-known, calling FFmpeg directly is cleaner.
```ts
await execFileAsync('ffmpeg', [
  '-i', inputPath,
  '-vf', 'scale=1280:720',
  '-c:a', 'copy',
  outputPath
]);
```
4. File Management
- Store uploaded/input files in a `tmp/` or `uploads/` directory
- Store output files where the user can access them — either `public/` for web serving or a known output directory
- Clean up temporary files after processing
- For large files, stream the output rather than buffering in memory
5. Progress Reporting
FFmpeg outputs progress to stderr. Parse it for progress reporting in longer operations:
```ts
ffmpeg(input)
  .output(output)
  .on('progress', (progress) => {
    console.log(`Processing: ${progress.percent?.toFixed(1)}% done`);
  })
  .on('end', () => console.log('Done'))
  .run();
```
6. Error Handling
FFmpeg errors are often cryptic. Always:
- Log the full stderr output on failure
- Validate input files exist before processing
- Check that output codecs are compatible with the output container
- Use the `-y` flag to overwrite output files without prompting
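A small wrapper can enforce all four habits in one place. A minimal sketch — the `runFfmpeg` helper is illustrative, not an existing utility:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';
import { access } from 'fs/promises';

const execFileAsync = promisify(execFile);

// Validates that the input exists, always passes -y, and surfaces
// FFmpeg's stderr when the command fails (FFmpeg logs errors there).
async function runFfmpeg(inputPath: string, args: string[]): Promise<void> {
  await access(inputPath); // throws early if the input file is missing
  try {
    await execFileAsync('ffmpeg', ['-y', '-i', inputPath, ...args]);
  } catch (err: any) {
    console.error(err.stderr ?? err); // full stderr is attached to the error
    throw err;
  }
}
```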
AI-Powered Virality Scoring
Use AI vision to analyze video content and score it on viral potential. This feature has two modes depending on the workflow.
This feature requires an AI integration (OpenAI or Gemini) to be set up. Set up the AI integration following the `ai-integrations-openai` or `ai-integrations-gemini` skill before proceeding. OpenAI is preferred because gpt-5.2 handles image analysis well and is the default for most tasks. Gemini (gemini-2.5-flash) is a good alternative, especially for high-volume analysis, since it supports video/image input natively.
Two Scoring Modes
Mode 1: Segment scoring (find the best moments) — For when the user says "find the best clips" or "auto-trim for virality." Detects scenes, scores small segments, and exports the top moments. Good for finding highlights in raw footage.
Mode 2: Clip scoring (rank ready-to-post clips) — For when clips already exist (from chunking, manual trimming, etc.) and the user wants to know which ones to post first. Scores each complete clip with a richer evaluation including narrative arc and standalone quality. This is the preferred mode when used with the chunking pipeline.
Recommended Pipeline
When the user has a long video and wants social-ready clips, use this order:
Long Video → Dead Space Removal → Chunking (10-60s clips) → Clip-Level Scoring → Reframe for Platform → Output
This way the AI scores complete, self-contained clips rather than tiny scene fragments. The scores are much more meaningful because each clip is what will actually be posted.
For "find the best moment in this video" requests (no chunking), use the original segment-level pipeline:
Video → Scene Detection → Frame Extraction → Segment Scoring → Rank → Trim/Export
When the user asks for auto-trim, ask which output they want
Present these three options and let the user choose each time. Do not assume a default:
- Best clip — The single highest-scoring segment/clip, trimmed and exported
- Multiple clips — Several top segments/clips exported as separate files, ranked by score
- Highlight reel — Top moments stitched together with crossfade transitions into one video
Scoring Criteria
Segment scoring evaluates: visual dynamism, emotional impact, hook potential, pacing/energy, uniqueness, and shareability. Each factor gets a 1-10 score, weighted to produce a final virality score.
Clip scoring adds three additional criteria for complete clips:
- Narrative completeness (weight: 15%) — Does the clip tell a complete micro-story? Does it have a clear beginning, middle, and end? Would a viewer feel satisfied or intrigued, not confused?
- Hook-to-payoff flow (weight: 10%) — Does the clip open with something that grabs attention and deliver on that promise? Or does it start slow and meander?
- Standalone quality (weight: 10%) — Would this clip make sense to someone who hasn't seen the full video? Can it be posted without context?
These three criteria reduce the weights of the original six factors proportionally so the total still sums to 100%.
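A sketch of that proportional rescaling, assuming the six segment-level weights live in a plain record (the weight values and names here are placeholders, not prescribed by this skill):

```ts
// The three clip-level criteria take a fixed 35% of the total weight.
const CLIP_WEIGHTS = { narrative: 0.15, hookToPayoff: 0.1, standalone: 0.1 };

// Scale the original six segment weights by the remaining share (0.65)
// so segment weights + clip weights still sum to 100%.
function rescaleSegmentWeights(segmentWeights: Record<string, number>) {
  const clipTotal = Object.values(CLIP_WEIGHTS).reduce((a, b) => a + b, 0);
  const scale = 1 - clipTotal; // 0.65
  return Object.fromEntries(
    Object.entries(segmentWeights).map(([name, w]) => [name, w * scale])
  );
}
```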
Frame Extraction
- Segments (2-10s): Extract 3 frames evenly spread across the segment.
- Clips (10-60s): Extract 5-8 frames to capture the full arc. Always include the first frame (hook evaluation) and last frame (payoff evaluation).
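A sketch of clip-level frame extraction with the end-of-file clamping described in the gotchas below — `extractFrames` is an illustrative helper, not part of this skill:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Extract one frame per timestamp. Timestamps are clamped slightly inside
// the clip's end so the final seek never lands past the real file duration.
async function extractFrames(
  input: string,
  timestamps: number[],
  endTime: number,
  outPrefix: string
): Promise<string[]> {
  const outputs: string[] = [];
  for (const [i, t] of timestamps.entries()) {
    const clamped = Math.min(t, endTime - 0.1);
    const out = `${outPrefix}-${i}.jpg`;
    await execFileAsync('ffmpeg', [
      '-ss', clamped.toFixed(3), '-i', input,
      '-frames:v', '1', '-q:v', '2', '-y', out,
    ]);
    outputs.push(out);
  }
  return outputs;
}
```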
Workflow
For the complete step-by-step implementation with code examples, see:
virality-scoring.md — Full virality analysis pipeline, AI prompts, scoring criteria, and output assembly
Key Gotchas (learned from testing)
- Scene detection: Use `ffmpeg` with the `select` + `showinfo` filters and parse stderr — the `ffprobe -f lavfi` approach doesn't work reliably in Replit (see the sketch after this list).
- Content safety: OpenAI vision may reject frames from documentary/medical/news content. Always wrap AI calls in try/catch and skip failed segments gracefully.
- Segment indexing: Use `index: segments.length` when filtering short segments, not the loop counter `i`.
- Package setup: For scripts, install `openai` directly (`pnpm add -w openai`) and create the client with the `AI_INTEGRATIONS_OPENAI_BASE_URL` / `AI_INTEGRATIONS_OPENAI_API_KEY` env vars. No need for the full workspace integration library.
- Timeouts: A 90-second video with ~13 segments takes 1-2 minutes for AI analysis. Warn the user and set generous timeouts.
- Minimum segment duration: Use a 2.0s minimum — segments under 2 seconds waste API calls and don't produce useful scores.
- Prefer clip-level scoring: When the video has already been chunked, always score the chunks rather than re-running segment detection. The clips are what will actually be posted.
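The scene-detection approach from the first gotcha, as a sketch — the `detectScenes` name and the 0.3 default threshold are illustrative choices:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Run select + showinfo; showinfo logs each selected frame's pts_time to
// stderr, which we parse to get scene-change timestamps in seconds.
async function detectScenes(input: string, threshold = 0.3): Promise<number[]> {
  const { stderr } = await execFileAsync('ffmpeg', [
    '-i', input,
    '-vf', `select='gt(scene,${threshold})',showinfo`,
    '-f', 'null', '-',
  ], { maxBuffer: 64 * 1024 * 1024 }); // showinfo output can be sizable

  return [...stderr.matchAll(/pts_time:([\d.]+)/g)].map((m) => parseFloat(m[1]));
}
```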
Dead Space Removal
Automatically detect and remove silence, dead air, and filler from a video to produce a tighter, more engaging cut. This is especially useful for talks, interviews, podcasts, tutorials, and raw footage.
How It Works
- Silence detection — Use FFmpeg's `silencedetect` filter to find all silent intervals (configurable threshold and minimum duration)
- Segment extraction — Extract all non-silent segments as individual clips
- Reassembly — Concatenate the non-silent segments back together with optional brief crossfade transitions
- Optional: AI-assisted filler removal — For more aggressive editing (removing "ums", filler words, repetitive sections), combine silence detection with AI transcription analysis
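A minimal sketch of the silence-detection step, running `silencedetect` and parsing the intervals it logs to stderr. The `detectSilence` helper name is illustrative; the defaults match the "Medium" preset in the table below:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// silencedetect writes "silence_start: X" and "silence_end: X" lines to stderr.
async function detectSilence(
  input: string,
  noiseDb = -35,
  minDuration = 0.5
): Promise<Array<{ start: number; end: number }>> {
  const { stderr } = await execFileAsync('ffmpeg', [
    '-i', input,
    '-af', `silencedetect=noise=${noiseDb}dB:d=${minDuration}`,
    '-f', 'null', '-',
  ], { maxBuffer: 16 * 1024 * 1024 });

  const starts = [...stderr.matchAll(/silence_start: (-?[\d.]+)/g)].map((m) => parseFloat(m[1]));
  const ends = [...stderr.matchAll(/silence_end: (-?[\d.]+)/g)].map((m) => parseFloat(m[1]));
  // If the file ends in silence there is no final silence_end line.
  return starts.map((start, i) => ({ start, end: ends[i] ?? Number.POSITIVE_INFINITY }));
}
```

Inverting these intervals gives the non-silent ranges to extract and concatenate.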
When the user asks to remove dead space
Ask these clarifying questions if not clear from context:
- How aggressive? Light (remove only true silence, > 1s gaps) vs. aggressive (remove short pauses > 0.3s for a fast-paced edit)
- Transitions? Hard cuts between segments (default) or brief crossfades (0.2-0.5s) for smoother flow
Configuration
| Preset | Silence Threshold | Min Silence Duration | Use Case |
|--------|------------------|---------------------|----------|
| Light | -40dB | 1.0s | Talks, interviews — remove obvious dead air |
| Medium | -35dB | 0.5s | Podcasts, tutorials — tighter pacing |
| Aggressive | -30dB | 0.3s | Fast-paced edits, social media — maximum tightness |
Implementation
For the complete FFmpeg commands and Node.js pipeline, see:
dead-space-and-chunking.md — See the "Dead Space Removal" section
Quick Summary
Video → Silence Detection (FFmpeg silencedetect) → Identify Non-Silent Ranges → Extract Segments → Concatenate → Output
Social Media Chunking
Break a longer video (ad, promo, talk, presentation, livestream) into self-contained clips suitable for TikTok, Reels, and Shorts. Each clip should be 10-60 seconds and feel complete on its own.
How It Works
- Scene detection + silence detection — Find natural break points using both visual scene changes and audio silence gaps
- Smart boundary selection — Merge adjacent segments into clips targeting 10-60 seconds, preferring to break at silence or scene changes rather than mid-sentence
- AI content analysis (optional) — Score each potential clip with AI to identify which ones are worth posting, suggest captions, and recommend clip order
- Export — Output each clip as a separate file, numbered and ranked
Clip Length Guidelines
| Platform | Ideal Length | Max Length | Notes |
|----------|-------------|------------|-------|
| TikTok | 15-45s | 10 min | 15-30s performs best for new accounts |
| Instagram Reels | 15-30s | 90s | Under 30s gets more reach |
| YouTube Shorts | 30-60s | 60s | Hard cap at 60 seconds |
| X / Twitter | 15-45s | 2m 20s | Shorter gets more engagement |
When the user asks to chunk a video
Ask which platform they're targeting (or default to TikTok at 15-45s). The chunking algorithm should:
- Never cut mid-sentence — Always break at silence gaps or natural pauses
- Prefer scene boundaries — Break where the visual content changes
- Ensure each clip stands alone — Each clip should have a clear beginning, not start mid-thought
- Target the platform's sweet spot — Not just "under 60s" but actually the ideal range for that platform
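One way to implement the boundary selection above is a greedy merge over candidate break points (from scene and silence detection). A sketch — `planClips` is an illustrative name, and the hard-cut fallback is a simplification of the real mid-sentence avoidance:

```ts
// breakPoints: sorted candidate cut times in seconds,
// including 0 and the total duration.
function planClips(
  breakPoints: number[],
  minLen: number,
  maxLen: number
): Array<{ start: number; end: number }> {
  const clips: Array<{ start: number; end: number }> = [];
  let start = breakPoints[0];
  for (const bp of breakPoints.slice(1)) {
    // No natural boundary lands in range: fall back to hard cuts at maxLen.
    while (bp - start > maxLen) {
      clips.push({ start, end: start + maxLen });
      start += maxLen;
    }
    // Close a clip at this natural boundary once it is long enough;
    // otherwise keep extending toward the next break point.
    if (bp - start >= minLen) {
      clips.push({ start, end: bp });
      start = bp;
    }
  }
  return clips;
}
```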
Implementation
For the complete pipeline with code examples, see:
dead-space-and-chunking.md — See the "Social Media Chunking" section
Quick Summary
Video → Scene Detection + Silence Detection → Merge into 10-60s Clips at Natural Boundaries → (Optional) AI Scoring → Export Individual Clips
Combining with Other Features
These features work well together in a pipeline:
Long Video → Dead Space Removal → Chunking into Clips → Virality Scoring → Reframe for TikTok → Output
This takes a raw long-form video and produces polished, platform-ready social clips end to end.
Voiceovers with ElevenLabs
Add AI-generated voiceovers to videos using ElevenLabs text-to-speech, then mix the audio into the video with FFmpeg.
Prerequisites
ElevenLabs is available as a Replit integration (connector). The user needs to connect their ElevenLabs account through Replit's integration system — no manual API keys needed. Search for the integration and propose it to the user:
```ts
const results = await searchIntegrations("elevenlabs");
// Then propose the connector so the user can authorize
await proposeIntegration("connector:ccfg_elevenlabs_...");
```
After authorization, use `addIntegration` to wire it to the project, then use `listConnections('elevenlabs')` in the code execution sandbox to get the credentials.
Workflow
- Generate voiceover audio — Send text to ElevenLabs TTS API, receive audio file
- Mix into video — Use FFmpeg to add the voiceover as a new audio track, optionally ducking (lowering the volume of) existing audio
Voiceover Modes
When the user asks for a voiceover, they may want one of these:
- Replace audio — Remove existing audio entirely, use only the voiceover
- Mix over — Layer the voiceover on top of existing audio (with the existing audio volume reduced)
- Add as track — Keep existing audio at full volume, add the voiceover on top
Ask which mode the user prefers if not clear from context.
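A sketch of the "mix over" mode with simple volume ducking — the `mixVoiceover` helper name and the 30% duck level are illustrative choices:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Duck the original audio to 30% volume, layer the voiceover on top,
// and copy the video stream untouched.
async function mixVoiceover(video: string, voiceover: string, output: string) {
  await execFileAsync('ffmpeg', [
    '-i', video, '-i', voiceover,
    '-filter_complex',
    '[0:a]volume=0.3[duck];[duck][1:a]amix=inputs=2:duration=first[aout]',
    '-map', '0:v', '-map', '[aout]',
    '-c:v', 'copy', '-c:a', 'aac',
    '-y', output,
  ]);
}
```

For the "replace audio" mode, map the voiceover directly (`-map 1:a`) instead of building a filter graph; for "add as track", drop the `volume` step.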
Implementation
For the complete step-by-step implementation with code examples, see:
voiceover.md — ElevenLabs TTS integration, voice selection, audio mixing with FFmpeg, and timed voiceover segments
Quick Summary
Text Script → ElevenLabs TTS → Audio File → FFmpeg Mix with Video → Output
Key capabilities:
- Choose from ElevenLabs' voice library or cloned voices
- Control voice settings (stability, similarity boost, style)
- Generate voiceover for the full video or specific time segments
- Duck existing audio under voiceover sections
- Support for multiple languages
Social Media Reframing
Automatically reframe videos for different social media platforms. Each platform has a preferred aspect ratio and resolution — this feature handles the conversion intelligently.
Platform Specs
| Platform | Aspect Ratio | Resolution | Notes |
|----------|-------------|------------|-------|
| YouTube | 16:9 | 1920x1080 | Standard landscape, also supports 4K (3840x2160) |
| TikTok / Reels | 9:16 | 1080x1920 | Vertical video, max 60s for Reels, 10min for TikTok |
| Instagram Feed | 1:1 | 1080x1080 | Square format |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 | Supports both, 2min 20s max |
Reframing Strategies
When converting between aspect ratios, there are three strategies:
- Center crop (default) — Crop from the center to fill the target ratio. Fast, works well for most content. May cut off edges.
- Letterbox/pillarbox — Add black bars (or a blurred background) to fit without cropping. Preserves all content but adds empty space.
- Blurred fill — Use a blurred, scaled-up version of the video as the background behind the original. Looks much better than black bars, especially for vertical reframing. This is the recommended approach for going from 16:9 to 9:16.
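A sketch of the blurred-fill strategy for a 16:9 → 9:16 (1080×1920) conversion. The filter chain is a standard FFmpeg pattern; the helper name and blur strength are illustrative:

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Background: the same video scaled to fill 1080x1920, cropped, and blurred.
// Foreground: the original scaled to fit the width, overlaid in the center.
async function blurredFill916(input: string, output: string) {
  const filter =
    '[0:v]scale=1080:1920:force_original_aspect_ratio=increase,' +
    'crop=1080:1920,boxblur=20:5[bg];' +
    '[0:v]scale=1080:-2[fg];' +
    '[bg][fg]overlay=(W-w)/2:(H-h)/2';
  await execFileAsync('ffmpeg', [
    '-i', input,
    '-filter_complex', filter,
    '-c:a', 'copy', '-y', output,
  ]);
}
```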
When the user asks to reframe, ask which platform they're targeting. Use blurred fill by default for aspect ratio changes that would lose significant content (e.g., landscape to portrait). Use center crop when the content is center-focused. Use letterbox only if the user explicitly asks for it.
Implementation
For the complete FFmpeg commands for each reframing strategy, see:
operations.md — See the "Social Media Reframing" section
Multi-platform Export
When the user wants to export for multiple platforms at once, generate all versions in a batch:
Input (16:9) → YouTube (copy) + TikTok (9:16 blurred fill) + Instagram (1:1 center crop) + X (1280x720 copy)
This is a common workflow — the user shoots in landscape and needs versions for every platform.
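A sketch of such a batch, reusing the hypothetical `PLATFORM_SPECS` lookup from earlier and an assumed `reframeForPlatform` helper — neither exists in this skill; they stand in for whatever reframing code you write:

```ts
// Each export reads the same source and writes its own file, so the passes
// are independent: run them sequentially (gentler on CPU in Replit) or in
// parallel with Promise.all if resources allow.
async function exportAllPlatforms(input: string) {
  const targets = ['tiktok', 'reels', 'shorts', 'x'] as const;
  for (const platform of targets) {
    const spec = PLATFORM_SPECS[platform]; // hypothetical lookup from above
    await reframeForPlatform(input, `output-${platform}.mp4`, spec); // hypothetical helper
  }
}
```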
Common Operations Reference
For detailed FFmpeg commands and patterns for each operation type, see:
- operations.md — Complete command reference for all supported editing operations
- virality-scoring.md — AI-powered virality analysis and auto-trim pipeline
- voiceover.md — ElevenLabs voiceover generation and audio mixing
- dead-space-and-chunking.md — Dead space removal and social media chunking pipelines
Building a Video Editor UI
If the user wants a visual editor interface:
- Create a `react-vite` artifact for the frontend
- Add API routes to `artifacts/api-server/` for video processing
- Use `multer` for file uploads to the API
- Process videos with FFmpeg on the server
- Return processed files for download or preview
- Use the `object-storage` skill for persisting uploaded and processed files if needed
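As a concrete starting point, here is a minimal sketch of an upload-and-trim route, assuming the api-server is Express-based; the route path, field name, and directories are illustrative:

```ts
import express from 'express';
import multer from 'multer';
import ffmpeg from 'fluent-ffmpeg';
import path from 'path';

const app = express();
const upload = multer({ dest: 'tmp/' }); // uploads land in tmp/ first

// Accept a video upload plus start/duration fields, trim it, and return
// a URL the frontend can use for preview or download.
app.post('/api/video/trim', upload.single('video'), (req, res) => {
  const { start, duration } = req.body;
  const output = path.join('public', `trimmed-${Date.now()}.mp4`);
  ffmpeg(req.file!.path)
    .setStartTime(start)
    .setDuration(duration)
    .output(output)
    .on('end', () => res.json({ url: `/${output}` }))
    .on('error', (err) => res.status(500).json({ error: err.message }))
    .run();
});
```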
Key frontend considerations:
- Use a `<video>` element for preview playback
- Show a timeline UI for trim/cut operations
- Display progress during processing (poll an API status endpoint or use Server-Sent Events)
- Provide download links for processed output
Tested Bugs & Caveats (Do Not Regress)
These issues were discovered during real-world testing and are documented here to prevent regressions:
- Scene detection: The `ffprobe -f lavfi` approach does NOT work in Replit. Use `ffmpeg` with `select='gt(scene,threshold)',showinfo` and parse `pts_time` from stderr.
- Segment indexing: Use `index: segments.length` when building segments, NOT the loop counter `i` — filtered segments cause index mismatches.
- Content safety: OpenAI vision may reject frames (medical/documentary content). Always wrap AI calls in try/catch and skip gracefully.
- Frame extraction timestamp clamping: Clamp the last frame timestamp to `segment.endTime - 0.1`. Without this, the last clip in a video fails because `startTime + 1.0 * duration` can exceed the actual file duration due to floating point arithmetic.
- `silenceremove` filter: Only strips audio silence, NOT video frames. Must use the segmented extract+concat approach for proper dead space removal.
- Dead space with many segments: Videos with lots of short dialogue (50+ speaking segments) can be very slow to process with re-encoding. Use `-preset ultrafast` for testing, or skip dead space removal for dialogue-heavy content where there's little dead space anyway.
- OpenAI for scripts: Install with `pnpm add -w openai`, create the client with the `AI_INTEGRATIONS_OPENAI_BASE_URL` and `AI_INTEGRATIONS_OPENAI_API_KEY` env vars.
- AI scoring duration: A 90-second video with ~13 segments takes 1-2 minutes for AI analysis. A 3.5-minute video with 9 clips takes ~3 minutes. Always warn the user.
Limitations
- No GPU acceleration — Replit containers do not have GPU access, so hardware encoding (NVENC, VAAPI) is unavailable. Use software encoding (libx264, libx265, libvpx-vp9).
- Memory and CPU — Very large files or high-resolution encoding (4K+) may be slow or hit memory limits. For large files, prefer stream-copy (`-c copy`) where possible and avoid unnecessary re-encoding.
- Disk space — Video files are large. Clean up temporary files after processing. Monitor disk usage for batch operations.
- No real-time preview — FFmpeg processes offline. Users cannot preview effects in real time during editing (unlike desktop NLEs). Generate short previews of segments instead.