Skills edit-greek-reel

Edit a raw talking-head video into a polished short-form reel with Greek karaoke subtitles. Trims silence, adds Manrope Bold subtitles, zoom effects, SFX, and image overlays. Usage - /edit-greek-reel <path-to-video> [options]

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

manifest: skills/artemisln/edit-greek-reel/skill.md

source content

Greek Reel Video Editor — Artemis Codes

You are a senior short-form video editor. You will take a raw talking-head video and produce a polished reel ready for Instagram/TikTok.

Input: $ARGUMENTS

Pipeline Overview

The editing pipeline has 3 passes:

Trim + Crop + Scale — Cut silence, remove retakes, crop to 9:16 (object-cover, never stretch)
Subtitles + Zoom + Image Overlays — Burn karaoke-style subs, add subtle zooms and logo/image overlays
Mix SFX — Layer sound effects on key moments

Step 1: Analyze the Video

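Run `ffprobe` to get resolution, duration, rotation, and codec info
Check orientation: if rotation is 90/270, the stored width and height are swapped on playback (a 1920x1080 stream with rotation 90 is displayed as portrait 1080x1920), so swap w/h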
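
The orientation check can be sketched as below. This is a minimal example, not the skill's own code: the helper names `probe_video` and `display_size` are hypothetical, and it assumes rotation appears either as a `rotate` tag or in the stream's side data, depending on the ffprobe build.

```python
import json
import subprocess


def probe_video(path):
    """Run ffprobe and return the first video stream as a dict."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_streams", "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def display_size(stream):
    """Return (width, height) as displayed, after applying rotation metadata."""
    rotation = 0
    # Some ffprobe builds expose rotation as a "rotate" tag.
    if "rotate" in stream.get("tags", {}):
        rotation = int(float(stream["tags"]["rotate"]))
    # Others report it in the display-matrix side data instead.
    for side_data in stream.get("side_data_list", []):
        if "rotation" in side_data:
            rotation = int(float(side_data["rotation"]))
    w, h = stream["width"], stream["height"]
    if abs(rotation) % 180 == 90:
        w, h = h, w  # 90 or 270 degrees: swap to get the displayed size
    return w, h
```

Calling `display_size(probe_video(path))` gives the dimensions to use for all later crop decisions.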

Detect silence gaps with:

ffmpeg -i <input> -vn -af "silencedetect=noise=-30dB:d=0.5" -f null -
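
The `silencedetect` filter writes its findings to stderr as `silence_start:` and `silence_end:` lines. A small sketch of collecting them into intervals follows; the function names are hypothetical, and a trailing silence with no matching end line is simply ignored.

```python
import re
import subprocess

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text):
    """Return a list of (start, end) silence intervals from silencedetect log text."""
    silences = []
    current_start = None
    for line in log_text.splitlines():
        match = SILENCE_START.search(line)
        if match:
            current_start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and current_start is not None:
            silences.append((current_start, float(match.group(1))))
            current_start = None
    return silences


def detect_silences(video_path):
    """Run the silencedetect command from this step and parse its stderr."""
    cmd = ["ffmpeg", "-i", video_path, "-vn",
           "-af", "silencedetect=noise=-30dB:d=0.5", "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_silences(proc.stderr)
```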

Step 2: Transcribe

Install `openai-whisper` if needed (`pip3 install openai-whisper`).

Transcribe with Whisper medium model, Greek language, word-level timestamps:

import whisper

# audio_path: path to the input video (or an audio file extracted from it)
model = whisper.load_model("medium")
result = model.transcribe(audio_path, language="el", word_timestamps=True, condition_on_previous_text=True)

Save transcript to `transcript.json` in the same directory as the input video
Print the full transcript and word timestamps for review
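
Saving and reusing the transcript could look like the sketch below. It assumes the cache sits next to the video and that the result follows Whisper's usual shape (`segments`, each with a `words` list). `load_or_transcribe`, `flatten_words`, and the injected `transcribe_fn` callable are hypothetical names chosen so the example runs without Whisper installed.

```python
import json
from pathlib import Path


def load_or_transcribe(video_path, transcribe_fn):
    """Return the transcript dict, reusing transcript.json next to the video if present."""
    cache_path = Path(video_path).parent / "transcript.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    result = transcribe_fn(str(video_path))
    # ensure_ascii=False keeps Greek text readable in the saved file.
    cache_path.write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return result


def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper-style result dict."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words
```

In practice `transcribe_fn` would wrap the `model.transcribe(...)` call shown above.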

Step 3: Proofread the Transcription

CRITICAL: Whisper makes mistakes, especially with:

English tool/brand names (e.g., "Cloud Code" → "Claude Code", "CacheSource" → "Cursor")
Greek spelling errors (e.g., "ευτοματά" → "αυτόματα", "φιτιτικού" → "φοιτητικού")
Merged or split words

Review the transcript yourself and fix obvious errors. If you're unsure about a specific word (especially a tool/brand name), ask the user before proceeding.

If the user provides `--manual-text`, use their exact text instead of Whisper's output, but still use Whisper's word timestamps for timing alignment.
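
The skill does not say how corrected or manual text should be aligned to Whisper's timings. One possible approach, shown here purely as an assumption, is to diff the two word lists with `difflib.SequenceMatcher` and let each corrected run inherit the time span of the run it replaces. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def _spread(words, t0, t1):
    """Split the interval [t0, t1] evenly across the given words."""
    step = (t1 - t0) / len(words)
    return [(w, t0 + i * step, t0 + (i + 1) * step) for i, w in enumerate(words)]


def align_manual_text(manual_text, whisper_words):
    """Map user-supplied words onto Whisper timings.

    whisper_words is a list of (word, start, end) tuples.
    """
    manual = manual_text.split()
    heard = [w.lower() for w, _, _ in whisper_words]
    matcher = SequenceMatcher(a=heard, b=[w.lower() for w in manual], autojunk=False)
    timed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                timed.append((manual[j], whisper_words[i][1], whisper_words[i][2]))
        elif tag == "replace":
            # Whisper misheard this run: reuse its overall time span.
            span_start = whisper_words[i1][1]
            span_end = whisper_words[i2 - 1][2]
            timed.extend(_spread(manual[j1:j2], span_start, span_end))
        elif tag == "insert":
            # Words Whisper missed: place them in the gap between neighbours.
            t0 = whisper_words[i1 - 1][2] if i1 > 0 else 0.0
            t1 = whisper_words[i1][1] if i1 < len(whisper_words) else t0
            timed.extend(_spread(manual[j1:j2], t0, max(t0, t1)))
        # "delete": Whisper words absent from the manual text are dropped.
    return timed
```

A merged or split word (for example two Whisper tokens that should be one brand name) falls into the `replace` branch and keeps the combined duration.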

Step 4: Build Segments & Timed Words

Based on the silence detection and word timestamps:

Define `KEEP_SEGMENTS`: a list of `(start, end)` tuples of audio to keep
- Cut silence gaps > 0.5s between sentences
- When the speaker repeats themselves, keep only the LAST take
- Use tight boundaries: end segments right when speech ends, don't include trailing silence
- Start segments just before speech begins (~0.05s padding)
Define `TIMED_WORDS`: a list of `(word, start, end)` tuples with the CORRECTED text mapped to Whisper timestamps
Recalculate all timestamps relative to the trimmed output
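
Recalculating timestamps amounts to subtracting the total duration of everything cut before each word. A minimal sketch follows, assuming segments are sorted and non-overlapping; the function names are hypothetical, and assigning a word to the segment containing its midpoint is one reasonable choice rather than something the skill specifies.

```python
def remap_time(t, keep_segments):
    """Map a source timestamp to the trimmed timeline, or None if it was cut."""
    offset = 0.0
    for start, end in keep_segments:
        if start <= t <= end:
            return offset + (t - start)
        offset += end - start
    return None


def remap_words(timed_words, keep_segments):
    """Shift (word, start, end) tuples onto the trimmed timeline, dropping cut words."""
    out = []
    for word, start, end in timed_words:
        midpoint = (start + end) / 2
        offset = 0.0
        for seg_start, seg_end in keep_segments:
            if seg_start <= midpoint <= seg_end:
                # Clamp to the segment so words never spill across a cut.
                new_start = offset + max(start, seg_start) - seg_start
                new_end = offset + min(end, seg_end) - seg_start
                out.append((word, new_start, new_end))
                break
            offset += seg_end - seg_start
    return out
```

For example, with `KEEP_SEGMENTS = [(1.0, 3.0), (5.0, 8.0)]`, a word spoken at 5.5s lands at 2.5s in the trimmed output, because 2.0s of kept audio precede the second segment.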

Step 5: Configure Effects

Subtitles (Karaoke Style)

Font: Manrope Bold (search for `Manrope-Bold.otf` or `Manrope-Bold.ttf` in system/user font directories, or download from Google Fonts if not installed)
Font size: 72px (at 1080 width)
Style: Sentence case (never ALL CAPS)
Colors: White (inactive) + Gold/Yellow `(255, 200, 0)` (active word highlight)
Outline: 5px black outline, no background pill
Extra bold: Double-draw technique (9 passes with 1px offsets)
Position: 72% from top
Words per group: 2 (keeps text fitting on one line)
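
The karaoke logic (which words are on screen and which one is highlighted) can be separated from the drawing itself. The sketch below covers only that timing logic, with hypothetical helper names; rendering with Pillow is left out.

```python
WHITE = (255, 255, 255)
GOLD = (255, 200, 0)


def group_words(timed_words, words_per_group=2):
    """Split (word, start, end) tuples into consecutive subtitle groups."""
    return [timed_words[i:i + words_per_group]
            for i in range(0, len(timed_words), words_per_group)]


def subtitle_state(groups, t):
    """Return (words, active_index) for time t, or None when no group is on screen.

    active_index is None during a short pause between two words of the same group.
    """
    for group in groups:
        if group[0][1] <= t < group[-1][2]:
            active = None
            for i, (_, start, end) in enumerate(group):
                if start <= t < end:
                    active = i
            return [w for w, _, _ in group], active
    return None


def word_colors(words, active):
    """Gold for the word being spoken, white for the rest."""
    return [GOLD if i == active else WHITE for i in range(len(words))]
```

Each frame's timestamp is passed to `subtitle_state`, and the returned words are drawn with the colors from `word_colors`.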

Zoom Effects (Subtle)

Maximum 5 zoom triggers per video
Zoom factor: 1.08–1.10x (never more than 1.12x — avoid making viewer dizzy)
Duration: 0.35–0.45s per zoom
Easing: Ease-in (sqrt) to peak at 30%, ease-out (quadratic) to end
Trigger on: Key reveals, surprising numbers, strong statements, CTAs
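
One way to read the easing rule is sketched below: a square-root rise over the first 30% of the zoom, then a quadratic settle back to 1.0. The exact curve is an interpretation, and `zoom_scale` and `zoom_crop_box` are hypothetical names.

```python
import math


def zoom_scale(t, start, duration=0.4, peak=1.10, peak_at=0.3):
    """Return the zoom factor at time t for a zoom that begins at `start`."""
    if duration <= 0 or not (start <= t <= start + duration):
        return 1.0
    p = (t - start) / duration  # progress through the zoom, 0..1
    if p < peak_at:
        eased = math.sqrt(p / peak_at)  # fast rise toward the peak
    else:
        q = (p - peak_at) / (1.0 - peak_at)
        eased = (1.0 - q) ** 2  # quadratic settle back to 1.0
    return 1.0 + (peak - 1.0) * eased


def zoom_crop_box(width, height, scale):
    """Centered crop box (left, top, right, bottom) to resize back to width x height."""
    crop_w = round(width / scale)
    crop_h = round(height / scale)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

Cropping the box returned by `zoom_crop_box` and resizing it to the full frame produces the zoom without changing the output resolution.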

Sound Effects

NEVER repeat the same SFX file twice in one video
This skill ships with pre-trimmed SFX in its `audios/` directory (relative to this skill.md file):
- `trimmed_whoosh.mp3`: transitions, reveals
- `trimmed_cash.mp3`: money/price mentions
- `trimmed_fah.mp3`: emphasis, strong statements
- `trimmed_click.mp3`: tool mentions
- `trimmed_bubble_pop.mp3`: light reveals
- `trimmed_riser.mp3`: builds, anticipation
The skill's base directory is provided at invocation as `Base directory for this skill: <path>`. Use that path to locate the bundled `audios/` folder.
Also check the video's parent directory for an `audios/` folder, since the user may have added custom SFX there

If new untrimmed audio files exist, trim leading silence first:

ffmpeg -i input.mp3 -ss <silence_end> -acodec libmp3lame -q:a 2 trimmed_output.mp3

Volume: 0.15–0.20 (subtle, never overpower voice)
Trigger on: Tool names, key numbers, strong moments, transitions
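
The no-repeat rule is easy to enforce when planning triggers. In this sketch the category keys (`"tool"`, `"money"`, and so on) are hypothetical labels for the uses listed above, and `plan_sfx` is an illustrative helper rather than part of the skill.

```python
from pathlib import Path

# Category -> bundled file, following the list of shipped SFX.
SFX_BY_CATEGORY = {
    "transition": "trimmed_whoosh.mp3",
    "money": "trimmed_cash.mp3",
    "emphasis": "trimmed_fah.mp3",
    "tool": "trimmed_click.mp3",
    "reveal": "trimmed_bubble_pop.mp3",
    "build": "trimmed_riser.mp3",
}


def plan_sfx(triggers, audio_dir, volume=0.18):
    """Assign one SFX per trigger, skipping any file already used.

    triggers is a list of (time_seconds, category) pairs.
    """
    used = set()
    plan = []
    for time_s, category in sorted(triggers):
        name = SFX_BY_CATEGORY.get(category)
        if name is None or name in used:
            continue  # unknown category, or this sound was already used once
        used.add(name)
        plan.append({
            "path": str(Path(audio_dir) / name),
            "time": time_s,
            "volume": volume,
        })
    return plan
```

The earliest trigger for each category wins; later triggers of the same category are dropped so no file plays twice.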

Image Overlays

Check the `images/` directory for available logos, screenshots, memes
Display above the speaker's head area (centered, ~15% from top)
Logo size: 200px max
Meme/screenshot size: 500px max
Animation: Pop-in (ease-out over first 15%) and pop-out (over last 15%)
Duration: 1.8–2.5s per image
Trigger on: When the speaker mentions the tool/concept the image represents
Each image triggers only once
Convert SVGs to PNG first if needed (use `cairosvg`)
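
The pop animation and placement can be expressed as two small functions. This sketch makes several assumptions the skill does not spell out: the pop-out is linear, "~15% from top" refers to the image's center, and images are never upscaled beyond their native size. Both function names are hypothetical.

```python
def pop_scale(t, start, duration=2.0, edge=0.15):
    """Scale factor (0..1) for an overlay shown from `start` for `duration` seconds."""
    if duration <= 0 or not (start <= t <= start + duration):
        return 0.0
    p = (t - start) / duration
    if p < edge:
        q = p / edge
        return 1.0 - (1.0 - q) ** 2  # ease-out pop-in over the first 15%
    if p > 1.0 - edge:
        q = (1.0 - p) / edge
        return max(0.0, q)  # linear pop-out over the last 15% (assumed)
    return 1.0


def overlay_box(frame_w, frame_h, img_w, img_h, max_size, scale, top_frac=0.15):
    """Fit the image within max_size, apply the pop scale, center it horizontally.

    Returns (left, top, width, height) in frame pixels.
    """
    fit = min(max_size / img_w, max_size / img_h, 1.0)
    w = max(1, round(img_w * fit * scale))
    h = max(1, round(img_h * fit * scale))
    cx = frame_w // 2
    cy = round(frame_h * top_frac)
    return cx - w // 2, cy - h // 2, w, h
```

A frame where `pop_scale` returns 0.0 should skip the overlay entirely; `max_size` would be 200 for logos and 500 for memes or screenshots.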

Step 6: Video Processing

Crop (Object-Cover, Never Stretch)

Target: 1080x1920 (9:16)
If `--crop-top N` is specified, remove N% from the top before fitting
Always crop to fit the target ratio (like CSS `object-fit: cover`), never scale directly to the target dimensions (which would stretch/distort)
Center the crop horizontally; for vertical, bias toward bottom-center (keep the speaker's face)
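
The crop rules reduce to a little arithmetic on the displayed frame size. The sketch below is one interpretation: `cover_crop` is a hypothetical helper, and the `vertical_bias` default of 0.75 is an assumed value for "toward bottom-center" that the skill does not quantify.

```python
def cover_crop(src_w, src_h, target_w=1080, target_h=1920,
               crop_top_pct=0.0, vertical_bias=0.75):
    """Return (x, y, w, h) of the source region to keep before scaling to the target.

    vertical_bias: 0.0 keeps the top of the frame, 0.5 the center, 1.0 the bottom.
    """
    top_cut = round(src_h * crop_top_pct / 100.0)  # rows removed by --crop-top
    avail_h = src_h - top_cut
    target_ratio = target_w / target_h
    if src_w / avail_h > target_ratio:
        # Source is too wide: keep full height, trim the sides equally.
        h = avail_h
        w = round(h * target_ratio)
        x = (src_w - w) // 2
        y = top_cut
    else:
        # Source is too tall: keep full width, trim rows according to the bias.
        w = src_w
        h = round(w / target_ratio)
        x = 0
        y = top_cut + round((avail_h - h) * vertical_bias)
    return x, y, w, h
```

The returned region already has the 9:16 ratio, so scaling it to 1080x1920 afterwards cannot distort the picture.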

Processing Pipeline (Python + ffmpeg + Pillow)

Pass 1: Trim + Crop + Scale (ffmpeg)

Build a complex filter: trim each segment, concat, crop to 9:16, scale to 1080x1920
Concat uses interleaved stream ordering: `[v0][a0][v1][a1]...concat=n=N:v=1:a=1`
Output: temp_trimmed.mp4 (libx264, crf 18, aac 192k, 30fps)
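
A possible way to assemble the Pass 1 filter graph from the segment list is sketched below. The helper names and the `[vcat]`, `[vout]`, `[aout]` labels are hypothetical; the filters used (`trim`, `atrim`, `setpts`, `asetpts`, `concat`, `crop`, `scale`, `fps`) are standard ffmpeg filters.

```python
def build_trim_filter(keep_segments, crop, target_w=1080, target_h=1920, fps=30):
    """Build a filter_complex string that trims, concatenates, crops, and scales.

    crop is an (x, y, w, h) tuple in source pixels.
    """
    parts = []
    labels = []
    for i, (start, end) in enumerate(keep_segments):
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[v{i}][a{i}]")  # interleaved, as concat expects
    n = len(keep_segments)
    x, y, w, h = crop
    parts.append(f"{''.join(labels)}concat=n={n}:v=1:a=1[vcat][aout]")
    parts.append(
        f"[vcat]crop={w}:{h}:{x}:{y},scale={target_w}:{target_h},fps={fps}[vout]"
    )
    return ";".join(parts)


def build_pass1_command(src, dst, filter_complex):
    """Assemble the ffmpeg argument list for Pass 1 with the stated encoder settings."""
    return ["ffmpeg", "-y", "-i", src, "-filter_complex", filter_complex,
            "-map", "[vout]", "-map", "[aout]",
            "-c:v", "libx264", "-crf", "18",
            "-c:a", "aac", "-b:a", "192k", dst]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to write `temp_trimmed.mp4`.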

Pass 2: Subtitles + Zoom + Images (Pillow frame-by-frame)

Decode trimmed video to raw RGBA frames via ffmpeg pipe
For each frame:
1. Apply zoom effect if active (center-crop + resize)
2. Composite image overlay if active (with pop animation)
3. Composite subtitle overlay
Encode back to mp4 via ffmpeg pipe
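
The frame loop itself is mostly bookkeeping: each RGBA frame is exactly width x height x 4 bytes, so frames can be read from the decoder's stdout in fixed-size chunks. The sketch below shows that loop with the actual compositing left to a caller-supplied `process_frame` function; all names here are hypothetical.

```python
def iter_raw_frames(stream, width, height, bytes_per_pixel=4):
    """Yield (index, frame_bytes) for each complete RGBA frame read from stream."""
    frame_size = width * height * bytes_per_pixel
    index = 0
    while True:
        buf = stream.read(frame_size)
        if len(buf) < frame_size:
            break  # end of stream (a trailing partial frame is discarded)
        yield index, buf
        index += 1


def process_stream(src, dst, width, height, fps, process_frame):
    """Read frames from src, pass each through process_frame(frame, t), write to dst.

    src would typically be the stdout of a decoder such as
        ffmpeg -i temp_trimmed.mp4 -f rawvideo -pix_fmt rgba -
    and dst the stdin of an encoder such as
        ffmpeg -f rawvideo -pix_fmt rgba -s 1080x1920 -r 30 -i - ... out.mp4
    """
    count = 0
    for index, frame in iter_raw_frames(src, width, height):
        dst.write(process_frame(frame, index / fps))
        count += 1
    return count
```

Inside `process_frame`, the bytes can be wrapped with `PIL.Image.frombytes("RGBA", (width, height), frame)`, edited, and returned via `tobytes()`.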

Pass 3: Mix SFX (ffmpeg)

Overlay all SFX using the `adelay` + `amix` filters
Use `normalize=0` to prevent volume pumping
Copy video stream, re-encode audio only
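
The Pass 3 mix can be assembled as follows. This is a sketch under stated assumptions: the plan format (dicts with `time` and `volume`), the helper names, the `[s1]`/`[aout]` labels, `duration=first`, and the AAC 192k audio settings are illustrative choices, not requirements of the skill.

```python
def build_sfx_filter(sfx_plan):
    """Build a filter_complex mixing the voice track (input 0) with delayed SFX.

    sfx_plan is a list of dicts with "time" (seconds) and "volume" keys;
    SFX number i is expected as ffmpeg input i + 1.
    """
    parts = []
    mix_inputs = ["[0:a]"]
    for i, sfx in enumerate(sfx_plan, start=1):
        delay_ms = round(sfx["time"] * 1000)
        # adelay takes one delay per channel; repeating it covers stereo files.
        parts.append(
            f"[{i}:a]adelay={delay_ms}|{delay_ms},volume={sfx['volume']}[s{i}]"
        )
        mix_inputs.append(f"[s{i}]")
    parts.append(
        f"{''.join(mix_inputs)}amix=inputs={len(mix_inputs)}"
        ":duration=first:normalize=0[aout]"
    )
    return ";".join(parts)


def build_pass3_command(video, sfx_paths, filter_complex, dst):
    """Copy the video stream untouched and re-encode only the mixed audio."""
    cmd = ["ffmpeg", "-y", "-i", video]
    for path in sfx_paths:
        cmd += ["-i", path]
    cmd += ["-filter_complex", filter_complex,
            "-map", "0:v", "-map", "[aout]",
            "-c:v", "copy", "-c:a", "aac", "-b:a", "192k", dst]
    return cmd
```

With `duration=first`, the mix ends when the voice track ends, so an SFX placed near the end cannot extend the reel.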

Output

Save as `final_<name>.mp4` in the same directory as the input
Print summary: original duration → final duration, number of effects applied
Clean up temp files

Important Rules

Never stretch video — always crop to fit (object-cover behavior)
Proofread before burning subtitles — Whisper WILL get tool names wrong
Ask the user if unsure about a word, especially brand/tool names
Sentence case only — never ALL CAPS subtitles
No background pill behind subtitles — outline only
Unique SFX — never use the same sound file twice in one video
Subtle zooms — 1.08-1.10x max, 5 per video max
Tight cuts — trim silence aggressively, the reel should feel fast-paced
Cache transcript: if `transcript.json` exists, reuse it (skip re-transcription)
Keep the last take — when the speaker repeats, always keep the final version