videodb
See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.
git clone https://github.com/video-db/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/video-db/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/python" ~/.claude/skills/video-db-skills-videodb && rm -rf "$T"
python/SKILL.mdVideoDB Skill
Perception + memory + actions for video, live streams, and desktop sessions.
Use this skill when you need to:
1) Desktop Perception
- Start/stop a desktop session capturing screen, mic, and system audio
- Stream live context and store episodic session memory
- Run real-time alerts/triggers on what’s spoken and what's happening on screen
- Produce session summaries, a searchable timeline, and playable evidence links
2) Video ingest + stream
- Ingest a file or URL and return a playable web stream link
- Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio
3) Index + search (timestamps + evidence)
- Build visual, spoken, and keyword indexes
- Search and return exact moments with timestamps and playable evidence
- Auto-create clips from search results
4) Timeline editing + generation
- Subtitles: generate, translate, burn-in
- Overlays: text/image/branding, motion captions
- Audio: background music, voiceover, dubbing
- Programmatic composition and exports via timeline operations
5) Live streams (RTSP) + monitoring
- Connect RTSP/live feeds
- Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows
Common inputs
- Local file path, public URL, or RTSP URL
- Desktop capture request: start / stop / summarize session
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
Common outputs
- Stream URL — make it playable:
https://console.videodb.io/player?url={STREAM_URL} - Search results with timestamps and evidence links
- Generated assets: subtitles, audio, images, clips
- Event/alert payloads for live streams
- Desktop session summaries and memory entries
Canonical prompts (examples)
- “Start desktop capture and alert when a password field appears.”
- “Record my session and produce an actionable summary when it ends.”
- “Ingest this file and return a playable stream link.”
- “Index this folder and find every scene with people, return timestamps.”
- “Generate subtitles, burn them in, and add light background music.”
- “Connect this RTSP URL and alert when a person enters the zone.”
Running Python code
CRITICAL: Always
cd to the user's project directory before running Python code. This ensures load_dotenv(".env") finds the correct .env file.
from dotenv import load_dotenv load_dotenv(".env") import videodb conn = videodb.connect()
This reads
VIDEO_DB_API_KEY from:
- Environment (if already exported)
- Project's
file in current directory.env
If the key is missing,
videodb.connect() raises AuthenticationError automatically.
Do NOT write a script file when a short inline command works.
When writing inline Python (
python -c "..."), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:
python << 'EOF' from dotenv import load_dotenv load_dotenv(".env") import videodb conn = videodb.connect() coll = conn.get_collection() print(f"Videos: {len(coll.get_videos())}") EOF
Setup
When the user asks to "setup videodb" or similar:
1. Install SDK
pip install "videodb[capture]" python-dotenv
If
videodb[capture] fails on Linux, install without the capture extra:
pip install videodb python-dotenv
2. Configure API key
The user must set
VIDEO_DB_API_KEY using either method:
- Export in terminal (recommended):
export VIDEO_DB_API_KEY=your-key - Project
file: Save.env
in the project'sVIDEO_DB_API_KEY=your-key
file.env
Get a free API key at https://console.videodb.io (50 free uploads, no credit card).
Do NOT read, write, or handle the API key yourself. Always let the user set it.
Quick Reference
Upload media
# URL video = coll.upload(url="https://example.com/video.mp4") # YouTube video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID") # Local file video = coll.upload(file_path="/path/to/video.mp4")
Transcript + subtitle
# force=True skips the error if the video is already indexed video.index_spoken_words(force=True) text = video.get_transcript_text() stream_url = video.add_subtitle()
Search inside videos
from videodb.exceptions import InvalidRequestError video.index_spoken_words(force=True) # search() raises InvalidRequestError when no results are found. # Always wrap in try/except and treat "No results found" as empty. try: results = video.search("product demo") shots = results.get_shots() stream_url = results.compile() except InvalidRequestError as e: if "No results found" in str(e): shots = [] else: raise
Scene search
import re from videodb import SearchType, IndexType, SceneExtractionType from videodb.exceptions import InvalidRequestError # index_scenes() has no force parameter — it raises an error if a scene # index already exists. Extract the existing index ID from the error. try: scene_index_id = video.index_scenes( extraction_type=SceneExtractionType.shot_based, prompt="Describe the visual content in this scene.", ) except Exception as e: match = re.search(r"id\s+([a-f0-9]+)", str(e)) if match: scene_index_id = match.group(1) else: raise # Use score_threshold to filter low-relevance noise (recommended: 0.3+) try: results = video.search( query="person writing on a whiteboard", search_type=SearchType.semantic, index_type=IndexType.scene, scene_index_id=scene_index_id, score_threshold=0.3, ) shots = results.get_shots() stream_url = results.compile() except InvalidRequestError as e: if "No results found" in str(e): shots = [] else: raise
Timeline editing
Use the Editor API to compose videos, images, audio, and text. See reference/editor.md for full workflow.
from videodb.editor import Timeline, Track, Clip, VideoAsset, ImageAsset, AudioAsset, Fit timeline = Timeline(conn) timeline.resolution = "1280x720" video_track = Track() video_track.add_clip(0, Clip(asset=VideoAsset(id=video.id, start=10), duration=20)) audio_track = Track() audio_track.add_clip(0, Clip(asset=AudioAsset(id=music.id, volume=0.2), duration=20)) timeline.add_track(video_track) timeline.add_track(audio_track) stream_url = timeline.generate_stream()
Transcode video (resolution / quality change)
from videodb import TranscodeMode, VideoConfig, AudioConfig # Change resolution, quality, or aspect ratio server-side job_id = conn.transcode( source="https://example.com/video.mp4", callback_url="https://example.com/webhook", mode=TranscodeMode.economy, video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"), audio_config=AudioConfig(mute=False), )
Reframe aspect ratio (for social platforms)
Warning:
reframe() is a slow server-side operation. For long videos it can take
several minutes and may time out. Best practices:
- Always limit to a short segment using
/start
when possibleend - For full-length videos, use
for async processingcallback_url - Trim the video on a
first, then reframe the shorter resultTimeline
from videodb import ReframeMode # Always prefer reframing a short segment: reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart) # Async reframe for full-length videos (returns None, result via webhook): video.reframe(target="vertical", callback_url="https://example.com/webhook") # Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9) reframed = video.reframe(start=0, end=60, target="square") # Custom dimensions reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
Generative media
image = coll.generate_image( prompt="a sunset over mountains", aspect_ratio="16:9", )
Error handling
from videodb.exceptions import AuthenticationError, InvalidRequestError try: conn = videodb.connect() except AuthenticationError: print("Check your VIDEO_DB_API_KEY") try: video = coll.upload(url="https://example.com/video.mp4") except InvalidRequestError as e: print(f"Upload failed: {e}")
Common pitfalls
| Scenario | Error message | Solution |
|---|---|---|
| Indexing an already-indexed video | | Use to skip if already indexed |
| Scene index already exists | | Extract the existing from the error with |
| Search finds no matches | | Catch the exception and treat as empty results () |
| Reframe times out | Blocks indefinitely on long videos | Use / to limit segment, or pass for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate before creating |
/ fails | or | Plan-gated features — inform the user about plan limits |
Additional docs
Reference documentation is in the
reference/ directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
- reference/api-reference.md - Complete VideoDB Python SDK API reference
- reference/index.md - Scene indexing & extraction workflow (index, extract frames, read back, manage)
- reference/index-reference.md - Scene index code reference (methods, SceneCollection/Scene/Frame classes)
- reference/search.md - In-depth guide to video search (spoken word and scene-based)
- reference/editor.md - Timeline editing workflow guide (4-layer model, use cases, examples)
- reference/editor-reference.md - Editor code reference (constructors, parameters, enums)
- reference/streaming.md - HLS streaming and instant playback
- reference/generative.md - AI-powered media generation (images, video, audio)
- reference/rtstream.md - Live stream ingestion workflow (RTSP/RTMP)
- reference/rtstream-reference.md - RTStream SDK methods and AI pipelines
- reference/capture.md - Desktop capture workflow
- reference/capture-reference.md - Capture SDK and WebSocket events
- reference/use-cases.md - Common video processing patterns and examples
Screen Recording (Desktop Capture)
Use
ws_listener.py to capture WebSocket events during recording sessions. Desktop capture supports macOS only.
Quick Start
- Start listener:
python scripts/ws_listener.py --cwd=<PROJECT_ROOT> & - Get WebSocket ID:
cat /tmp/videodb_ws_id - Run capture code (see reference/capture.md for full workflow)
- Events written to:
/tmp/videodb_events.jsonl
Query Events
import json events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")] # Get all transcripts transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"] # Get visual descriptions from last 5 minutes import time cutoff = time.time() - 300 recent_visual = [e for e in events if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
Utility Scripts
- scripts/ws_listener.py - WebSocket event listener (dumps to JSONL)
For complete capture workflow, see reference/capture.md.
Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, volume control, fade transitions, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (speed changes, crop/zoom, colour grading, keyframe animation).
When to use what
| Problem | VideoDB solution |
|---|---|
| Platform rejects video aspect ratio or resolution | or with |
| Need to resize video for Twitter/Instagram/TikTok | or |
| Need to change resolution (e.g. 1080p → 720p) | with |
| Need to overlay audio/music on video | on an Editor with volume control |
| Need to add subtitles | or on Editor |
| Need to combine/trim clips | on an Editor |
| Need to compose images with voiceover | + on separate Editor tracks |
| Need to generate voiceover, music, or SFX | , , |