Name: ai-video-remix
Author: openclaw

AI-driven video remix generator that combines ShotAI semantic search, LLM planning, and Remotion rendering to produce styled video compositions from a user's local video library. Use it when the user asks for a video remix, highlight reel, travel vlog, sports highlight, nature montage, or any styled video cut from their library. It triggers on requests such as "帮我做一个混剪" (Chinese for "make me a remix"), "make a travel vlog from my library", "create a sports highlight", or "generate a video with my footage". Requires ShotAI (a local MCP server) to be running. Works with any OpenAI-compatible LLM API, or falls back to heuristic mode when no API key is set.

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abu-shotai/ai-video-remix" ~/.claude/skills/openclaw-skills-ai-video-remix && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/abu-shotai/ai-video-remix" ~/.openclaw/skills/openclaw-skills-ai-video-remix && rm -rf "$T"

manifest: skills/abu-shotai/ai-video-remix/SKILL.md

AI Video Remix Skill

Generate styled video compositions from a local ShotAI video library using natural language.

Prerequisites

See references/setup.md for full installation instructions, including:

ShotAI download and setup
ffmpeg installation
yt-dlp installation (for auto music)
Node.js dependencies

Quick Start

cd /path/to/ai-video-editor
cp .env.example .env    # fill in SHOTAI_URL, SHOTAI_TOKEN, and optionally AGENT_PROVIDER
npm install
npx tsx src/skill/cli.ts "帮我做一个旅行混剪"    # Chinese for "make me a travel remix"

Pipeline (8 steps)

parseIntent (Agent): the LLM extracts the theme, selects a composition, and optionally overrides the music style
refineQueries (Agent): the LLM rewrites per-slot search terms to match library content
pickShots (ShotAI): semantic search per slot; candidates are scored by similarity, duration, and mood, and the best shot is selected
resolveMusic (Music): yt-dlp YouTube search and download, or a local MP3 if `--bgm` is provided
extractClip (ffmpeg): each shot is trimmed to an independent `.mp4` clip file
annotateClips (Agent): the LLM assigns per-clip visual effect params (tone, dramatic, kenBurns, caption)
File Server: an HTTP server serves the clips to the Remotion renderer
render (Remotion): the composition is rendered to the final MP4
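
The pickShots step combines similarity, duration, and mood into one score, but the formula is not given here. The following is a minimal sketch of one way such a score could be computed. The `CandidateShot` shape, the weights, and the `targetSeconds` parameter are all illustrative assumptions, not the skill's actual implementation.

```typescript
// Hypothetical shot shape; the real ShotAI response fields may differ.
interface CandidateShot {
  path: string;
  similarity: number; // assumed to lie in [0, 1]
  durationSeconds: number;
  mood: string;
}

// Assumed weights, chosen only for illustration.
const WEIGHTS = { similarity: 0.6, duration: 0.25, mood: 0.15 };

// targetSeconds is the slot length and must be greater than zero.
function scoreShot(shot: CandidateShot, targetSeconds: number, wantedMood: string): number {
  // Duration fit is 1 when the shot covers the slot, proportionally less when it is shorter.
  const durationFit = Math.min(shot.durationSeconds / targetSeconds, 1);
  const moodFit = shot.mood === wantedMood ? 1 : 0;
  return (
    WEIGHTS.similarity * shot.similarity +
    WEIGHTS.duration * durationFit +
    WEIGHTS.mood * moodFit
  );
}

// Returns the highest-scoring candidate, or undefined when there are none.
function pickBestShot(
  candidates: CandidateShot[],
  targetSeconds: number,
  wantedMood: string,
): CandidateShot | undefined {
  let best: CandidateShot | undefined;
  let bestScore = -Infinity;
  for (const shot of candidates) {
    const score = scoreShot(shot, targetSeconds, wantedMood);
    if (score > bestScore) {
      best = shot;
      bestScore = score;
    }
  }
  return best;
}
```

Returning `undefined` for an empty candidate list lets the caller report a missing shot instead of inventing one.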

CLI Usage

npx tsx src/skill/cli.ts "<request>" [options]

Options:
  --composition <id>   Override composition (skip LLM selection)
  --bgm <path>         Local MP3 path (skip YouTube search)
  --output <dir>       Output directory (default: ./output)
  --lang <zh|en>       Output language: zh Chinese (default) / en English
                       Affects: video title, per-clip captions & location labels, attribution line
  --probe              Scan library first, let LLM plan slots from actual content
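
To make the option semantics concrete, here is a rough sketch of how these flags could be parsed. It is not the code in `src/skill/cli.ts`: the `CliOptions` type and `parseCliArgs` function are hypothetical, and only the flag names and defaults come from the usage text above.

```typescript
interface CliOptions {
  request: string;
  composition?: string;
  bgm?: string;
  output: string;
  lang: "zh" | "en";
  probe: boolean;
}

function parseCliArgs(argv: string[]): CliOptions {
  // Defaults taken from the usage text: ./output and zh.
  const opts: CliOptions = { request: "", output: "./output", lang: "zh", probe: false };
  const rest = [...argv];
  while (rest.length > 0) {
    const arg = rest.shift() as string;
    // Value-taking flags consume the next argument.
    const takeValue = (): string => {
      const value = rest.shift();
      if (value === undefined) throw new Error(`Missing value for ${arg}`);
      return value;
    };
    switch (arg) {
      case "--composition":
        opts.composition = takeValue();
        break;
      case "--bgm":
        opts.bgm = takeValue();
        break;
      case "--output":
        opts.output = takeValue();
        break;
      case "--lang": {
        const value = takeValue();
        if (value === "zh") opts.lang = "zh";
        else if (value === "en") opts.lang = "en";
        else throw new Error(`Unsupported --lang value: ${value}`);
        break;
      }
      case "--probe":
        opts.probe = true;
        break;
      default:
        if (arg.startsWith("--")) throw new Error(`Unknown option: ${arg}`);
        // A positional argument is treated as the request (the last one wins in this sketch).
        opts.request = arg;
    }
  }
  if (!opts.request) throw new Error("A request string is required");
  return opts;
}
```

For example, `parseCliArgs(["make a travel vlog", "--lang", "en", "--probe"])` yields English output, probe mode on, and the default `./output` directory.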

Compositions

ID	Label	Best For
`CyberpunkCity`	赛博朋克夜景	Neon city, night scenes, sci-fi
`TravelVlog`	旅行 Vlog	Multi-city travel with location cards
`MoodDriven`	情绪驱动混剪	Fast/slow emotion cuts
`NatureWild`	自然野生动物	BBC nature documentary style
`SwitzerlandScenic`	瑞士风光	Alpine/scenic travel with captions
`SportsHighlight`	体育集锦	ESPN-style with goal captions

Modes

Standard mode (default): LLM picks composition + generates search queries from registry templates.

Probe mode (`--probe`): Scans library videos first (names, shot samples, mood/scene tags), then the LLM generates custom slots tailored to what actually exists.

Choose probe mode when: library content is unknown, user wants "best of my library", or standard slots return low-quality shots.
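
The three conditions above can be written as a small decision helper. This is only a sketch: `ProbeDecisionInput`, `shouldUseProbeMode`, the phrase match, and the 0.5 cutoff for "low-quality" are assumptions made for illustration.

```typescript
interface ProbeDecisionInput {
  libraryContentKnown: boolean;
  request: string;
  // Best score per slot from an earlier standard-mode pass; empty if none was run.
  standardSlotScores: number[];
}

// Assumed cutoff for a "low-quality" slot result.
const LOW_QUALITY_CUTOFF = 0.5;

function shouldUseProbeMode(input: ProbeDecisionInput): boolean {
  // Unknown library content: scan first.
  if (!input.libraryContentKnown) return true;
  // A crude check for "best of my library" style requests.
  if (/best of my library/i.test(input.request)) return true;
  // Any slot that scored below the cutoff suggests the standard slots do not fit.
  return input.standardSlotScores.some((score) => score < LOW_QUALITY_CUTOFF);
}
```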

Environment Variables

See references/config.md for all environment variables and LLM provider setup.

Troubleshooting & Quality Tuning

See references/tuning.md for solutions to:

Clip boundary flicker / 1–2 frame flash at cuts
Red flash artifact in CyberpunkCity (GlitchFlicker on short clips)
Low-quality or off-topic shots
Music download failures

Recommended `.env` defaults for best quality:

MIN_SCORE=0.5    # filter short/low-quality shots
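
As a sketch of how a `MIN_SCORE` threshold might be read and applied: the helper names, the `ScoredShot` shape, and the choice to pass the environment in as a plain object are assumptions, and the actual variable handling is described in references/config.md.

```typescript
type Env = Record<string, string | undefined>;

// Reads MIN_SCORE, falling back when it is missing, empty, or not a number.
function readMinScore(env: Env, fallback = 0.5): number {
  const raw = env["MIN_SCORE"];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Hypothetical scored-shot shape used only for this example.
interface ScoredShot {
  id: string;
  score: number;
}

// Keeps only shots whose score meets the threshold.
function filterByMinScore(shots: ScoredShot[], minScore: number): ScoredShot[] {
  return shots.filter((shot) => shot.score >= minScore);
}
```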

Writing ShotAI Search Queries

ShotAI uses semantic search powered by AI-generated tags and embedding vectors. Query quality is the single biggest factor in shot relevance — invest time here.

Query construction rules

Always write full sentences or rich phrases, never bare keywords.

The search engine understands semantic similarity (`"ocean"` matches `"sea"`, `"waves"`, `"shoreline"`), so richer context produces better recall.

Quality	Example	When to use
⭐ Detailed description	`"A white seagull with spread wings gliding smoothly over calm blue ocean water, golden sunset light reflecting on the waves"`	Best precision — use for hero shots
⭐ Full sentence	`"A seagull flying gracefully over the ocean at sunset"`	Good balance of precision and recall
Short phrase	`"seagull flying over ocean"`	Acceptable fallback
Single keyword	`"seagull"`	Avoid — low precision, noisy results

What to include in a query

Describe the visual content of the ideal shot across these dimensions:

Subject: what/who is in frame (`a lone hiker`, `city traffic at night`, `athlete celebrating`)

Action: what is happening (`walking slowly through fog`, `speeding through intersection`, `jumping with arms raised`)

Environment: location, setting, time of day (`rain-soaked Tokyo street`, `mountain meadow at golden hour`, `empty stadium under floodlights`)

Mood / atmosphere: emotional tone (`melancholic`, `tense`, `euphoric`, `serene`)

Camera feel: implied movement or framing (`wide establishing shot`, `tight close-up`, `slow pan`, `handheld shaky`)

Not all dimensions are needed every time — include whichever are most distinctive for the shot you want.
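
One way to keep queries rich is to assemble them from these dimensions. The helper below is a hypothetical illustration; `QueryDimensions` and `buildShotQuery` are not part of the skill, and the joining format is an arbitrary choice.

```typescript
interface QueryDimensions {
  subject: string;
  action?: string;
  environment?: string;
  mood?: string;
  camera?: string;
}

// Joins whichever dimensions are present into one descriptive phrase.
function buildShotQuery(d: QueryDimensions): string {
  // Subject and action read best as a single clause.
  const core = [d.subject, d.action].filter(Boolean).join(" ");
  const parts = [core, d.environment, d.mood ? `${d.mood} mood` : undefined, d.camera];
  // Drop missing or blank dimensions before joining.
  return parts.filter((p): p is string => Boolean(p && p.trim())).join(", ");
}
```

Calling it with subject `a lone hiker`, action `walking slowly through fog`, environment `mountain meadow at golden hour`, mood `serene`, and camera `wide establishing shot` produces `a lone hiker walking slowly through fog, mountain meadow at golden hour, serene mood, wide establishing shot`.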

The refineQueries step

When the agent runs `refineQueries`, it rewrites the composition's default slot queries to better match the user's actual library. Apply these principles:

Start from the slot's semantic intent: what emotional or narrative role does this shot play in the composition?
Incorporate any context from the user's request: location names, event names, specific subjects mentioned
Expand synonyms: if the slot says `"water"`, try `"river flowing through forest"` or `"lake reflecting mountains"` based on what the library likely contains
Avoid negations: `"not indoors"` does not work; instead describe the positive version (`"outdoor daylight scene"`)
One query per slot: make it specific rather than trying to cover multiple scenarios

Examples: slot query → refined query

Slot default: "city at night"
User request: "帮我做一个东京旅行混剪" (Chinese for "make me a Tokyo travel remix")
Refined:      "Neon-lit Tokyo street at night, pedestrians crossing under glowing signs, rain reflections on pavement"

Slot default: "nature landscape"
User request: "trip to Patagonia last month"
Refined:      "Dramatic Patagonia mountain landscape, snow-capped peaks under stormy clouds, vast open wilderness"

Slot default: "athlete in action"
User request: "basketball highlight from last game"
Refined:      "Basketball player driving to the hoop, explosive movement, crowd in background blurred"
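
Some of the query rules can be checked mechanically before a refined query is sent to ShotAI. The function below is a hedged sketch of such a check; `lintQuery` is hypothetical, and its word-based heuristics are deliberately crude and will miss or misflag some phrasings.

```typescript
// Returns a list of problems found in a query; an empty list means no rule was tripped.
function lintQuery(query: string): string[] {
  const problems: string[] = [];
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
  // Rule: never use bare keywords.
  if (words.length < 2) {
    problems.push("bare keyword: describe the shot in a phrase or sentence");
  }
  // Rule: avoid negations such as "not indoors".
  if (/\b(not|no|without|never)\b/i.test(query)) {
    problems.push("negation: describe the positive version instead");
  }
  // Rule: one query per slot; a standalone "or" suggests several scenarios.
  if (/\bor\b/i.test(query)) {
    problems.push("multiple scenarios: keep one specific query per slot");
  }
  return problems;
}
```

Under these heuristics the refined Tokyo query above passes with no problems, while `"seagull"` and `"not indoors"` are each flagged once.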

Adding a New Composition

See references/composition-guide.md to add a new Remotion composition to the registry.

Safety and Fallback

If `SHOTAI_URL` or `SHOTAI_TOKEN` is unset, display a warning: "ShotAI MCP server is not configured. Set `SHOTAI_URL` and `SHOTAI_TOKEN` in your `.env` file. Download ShotAI at https://www.shotai.io."
If the ShotAI MCP server returns an error (connection refused, HTTP 4xx/5xx), display the error message and stop; do not fabricate shot results.
Never fabricate video file paths, shot timestamps, or similarity scores.
If music download fails (yt-dlp error or network unreachable), suggest using `--bgm <local.mp3>` to provide a local audio file instead.
If Remotion render fails, display the error output and suggest checking the Node.js version (18+) and that all clip files were extracted successfully.
If the LLM provider is unreachable, fall back to heuristic mode: use composition default queries directly without refinement, and skip `annotateClips` (use composition default effect params).
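
Two of these rules, the configuration warning and the heuristic fallback, can be sketched as small pure functions. The function and type names are hypothetical and the environment is passed in as a plain object for illustration; only the warning text and the fallback behaviour come from the rules above.

```typescript
type EnvVars = Record<string, string | undefined>;

// Returns the warning text when ShotAI is not configured, or null when both variables are set.
function shotAiConfigWarning(env: EnvVars): string | null {
  if (env["SHOTAI_URL"] && env["SHOTAI_TOKEN"]) return null;
  return (
    "ShotAI MCP server is not configured. Set SHOTAI_URL and SHOTAI_TOKEN " +
    "in your .env file. Download ShotAI at https://www.shotai.io."
  );
}

interface AgentStepPlan {
  refineQueries: boolean; // false means the composition's default queries are used as-is
  annotateClips: boolean; // false means the composition's default effect params are used
}

// Heuristic mode skips both LLM-backed steps when the provider cannot be reached.
function planAgentSteps(llmReachable: boolean): AgentStepPlan {
  return { refineQueries: llmReachable, annotateClips: llmReachable };
}
```

Keeping these checks free of side effects makes it easy to stop early with a clear message rather than continuing with made-up results.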

License

MIT-0 — Free to use, modify, and redistribute. No attribution required. See https://spdx.org/licenses/MIT-0.html