Claude-skill-registry extract-moves-from-video
Guidance for extracting text-based game commands, moves, or inputs from video recordings using OCR and frame analysis. This skill applies when extracting user inputs from screen recordings of text-based games (Zork, interactive fiction), terminal sessions, or any video where typed commands need to be recovered. It covers OCR preprocessing, region-of-interest extraction, domain-aware validation, and deduplication strategies.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/extract-moves-from-video" ~/.claude/skills/majiayu000-claude-skill-registry-extract-moves-from-video && rm -rf "$T"
Extract Moves from Video
This skill provides structured guidance for extracting text-based commands or moves from video recordings, particularly for text-based games, terminal sessions, or any screen recording where typed inputs need to be recovered.
When to Use This Skill
This skill applies when:
- Extracting game commands from recordings of text-based games (Zork, interactive fiction, MUDs)
- Recovering typed inputs from terminal session recordings
- Extracting user commands from any screen recording where text input is visible
- Performing OCR-based extraction of sequential text entries from video
Strategic Approach
Phase 1: Environment Assessment
Before processing, assess the available tools and estimate resource requirements:
- Check available tools in one comprehensive sweep:
  - FFmpeg for frame extraction
  - OCR engines (tesseract, pytesseract)
  - Image processing libraries (opencv-python, Pillow)
  - Python environment and package managers (pip, uv, conda)
- Analyze video characteristics:
  - Duration and frame rate
  - Resolution and text clarity
  - Location of command input area (typically fixed position)
  - Text color contrast against background
- Estimate processing time:
  - Benchmark OCR on 3-5 sample frames before full processing
  - Calculate expected total time based on frame count and benchmark
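The assessment steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function names `check_tools` and `estimate_total_seconds` are hypothetical, and the module names checked (`pytesseract`, `cv2`, `PIL`) are assumed to be the import names of the packages listed above.

```python
import importlib.util
import shutil


def check_tools(binaries=("ffmpeg", "tesseract"),
                modules=("pytesseract", "cv2", "PIL")):
    """Report which command-line tools and Python modules are available."""
    report = {}
    for name in binaries:
        # shutil.which returns the executable path, or None if it is not on PATH
        report[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec locates a top-level module without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


def estimate_total_seconds(sample_seconds, frame_count):
    """Extrapolate full-run OCR time from the timings of a few sample frames."""
    if not sample_seconds:
        raise ValueError("need at least one sample timing")
    per_frame = sum(sample_seconds) / len(sample_seconds)
    return per_frame * frame_count


if __name__ == "__main__":
    print(check_tools())
    # Hypothetical timings: three sample frames, 600 frames in total
    print(estimate_total_seconds([0.4, 0.6, 0.5], 600))
```

The sample timings would come from timing the real OCR call on 3-5 frames; the values shown are placeholders.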
Phase 2: Region of Interest (ROI) Identification
Critical optimization: Identify and crop to the command input region before OCR.
For text-based games like Zork:
- Command input typically appears at a fixed screen location (often bottom)
- Command prompts have consistent markers (e.g., a ">" prefix)
- Cropping to the ROI dramatically improves OCR accuracy and speed
To identify the ROI:
- Extract a few sample frames where commands are visible
- Manually or programmatically identify the bounding box of the input area
- Apply consistent cropping to all frames before OCR
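One way to keep the crop consistent across frames is to describe the ROI as fractions of the frame size and convert it to pixels once. The sketch below assumes the input area is a horizontal band near the bottom of the frame; the function name `roi_box` and the default fractions are illustrative guesses, not values taken from any particular game.

```python
def roi_box(frame_width, frame_height, top_frac=0.85, bottom_frac=1.0,
            left_frac=0.0, right_frac=1.0):
    """Convert fractional ROI bounds into a (left, upper, right, lower) pixel box."""
    if not (0.0 <= left_frac < right_frac <= 1.0):
        raise ValueError("need 0 <= left_frac < right_frac <= 1")
    if not (0.0 <= top_frac < bottom_frac <= 1.0):
        raise ValueError("need 0 <= top_frac < bottom_frac <= 1")
    left = round(frame_width * left_frac)
    upper = round(frame_height * top_frac)
    right = round(frame_width * right_frac)
    lower = round(frame_height * bottom_frac)
    return (left, upper, right, lower)


if __name__ == "__main__":
    # Bottom quarter of a 1280x720 frame
    print(roi_box(1280, 720, top_frac=0.75))
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects; with a NumPy frame the same box corresponds to `frame[upper:lower, left:right]`.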
Phase 3: Frame Extraction Strategy
Select frame extraction rate based on content characteristics:
- Text-based games: Commands persist on screen for seconds; 1-3 second intervals typically suffice
- Fast-paced inputs: May require higher frequency (0.5 second intervals)
- Start conservative: Begin with lower frequency, increase only if commands are missed
# Example: Extract frames at 1 frame per second
ffmpeg -i video.mp4 -vf "fps=1" frames/frame_%04d.png
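When the sampling interval needs tuning, it can help to build the FFmpeg command from a single interval parameter. The following is a sketch under stated assumptions: `ffmpeg_extract_command` is a hypothetical helper, and the optional `crop` tuple is assumed to be given in the (width, height, x, y) order used by FFmpeg's `crop` filter.

```python
from fractions import Fraction


def ffmpeg_extract_command(video_path, out_pattern, interval_seconds=1.0, crop=None):
    """Build an ffmpeg argument list that samples one frame every interval_seconds."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Express the frame rate as an exact rational, e.g. 2 s interval -> 1/2 fps
    rate = Fraction(1) / Fraction(str(interval_seconds))
    filters = [f"fps={rate.numerator}/{rate.denominator}"]
    if crop is not None:
        width, height, x, y = crop
        filters.append(f"crop={width}:{height}:{x}:{y}")
    return ["ffmpeg", "-i", video_path, "-vf", ",".join(filters), out_pattern]


if __name__ == "__main__":
    cmd = ffmpeg_extract_command("video.mp4", "frames/frame_%04d.png",
                                 interval_seconds=2, crop=(1280, 108, 0, 612))
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. The output directory has to exist beforehand, since FFmpeg does not create it.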
Phase 4: OCR Preprocessing
Apply preprocessing to improve OCR accuracy:
- Convert to grayscale
- Apply thresholding (binary threshold for high contrast text)
- Consider additional techniques if needed:
  - Contrast enhancement
  - Noise reduction
  - Dilation/erosion for text clarity
  - Inversion if text is light on dark background
See references/ocr_video_processing.md for detailed preprocessing techniques.
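To make the grayscale, threshold, and inversion steps concrete, here is a dependency-free sketch that operates on rows of (R, G, B) tuples. In practice a library such as OpenCV or Pillow would do this far faster; the function names, the BT.601 luma weights, and the default threshold of 128 are assumptions chosen for illustration.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) pixels to 0-255 luma values (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def binarize(gray_rows, threshold=128, invert=False):
    """Map each pixel to 0 or 255.

    invert=True flips the result, turning light-on-dark text into
    dark-on-light text.
    """
    out = []
    for row in gray_rows:
        new_row = []
        for value in row:
            bit = 255 if value >= threshold else 0
            new_row.append(255 - bit if invert else bit)
        out.append(new_row)
    return out


if __name__ == "__main__":
    # One row: black background pixel, then a white text pixel
    gray = to_grayscale([[(0, 0, 0), (255, 255, 255)]])
    print(binarize(gray, invert=True))
```

The right threshold depends on the recording, so it is worth trying a few values on sample frames before committing to one.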
Phase 5: Domain-Aware Extraction and Validation
Key insight: Use domain knowledge to validate and correct OCR results.
For text-based games:
- Obtain or construct a list of valid commands for the game
- Use command vocabulary for spell-checking OCR output
- Identify command syntax patterns (e.g., VERB NOUN, DIRECTION)
- Flag entries that don't match known patterns for manual review
Common Zork-style commands include:
- Directions: n, s, e, w, ne, nw, se, sw, up, down
- Actions: get, take, drop, put, open, close, read, examine, look, inventory
- Combinations: get lamp, put sword in case, open mailbox
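Vocabulary-based correction can be sketched with the standard library's `difflib`. The word sets below are copied from the lists above and are far from a complete game vocabulary; `correct_token`, `check_command`, and the 0.6 similarity cutoff (difflib's default) are illustrative assumptions to tune against real OCR output.

```python
import difflib

DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw", "up", "down"}
VERBS = {"get", "take", "drop", "put", "open", "close", "read",
         "examine", "look", "inventory"}


def correct_token(token, vocabulary, cutoff=0.6):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    token = token.lower()
    if token in vocabulary:
        return token
    matches = difflib.get_close_matches(token, sorted(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def check_command(raw):
    """Return (normalized_command, needs_review) for one OCR'd command line."""
    words = raw.strip().lower().split()
    if not words:
        return ("", True)
    if len(words) == 1 and words[0] in DIRECTIONS:
        return (words[0], False)
    verb = correct_token(words[0], VERBS)
    if verb is None:
        # Unknown verb: keep the text as read and flag it for manual review
        return (" ".join(words), True)
    return (" ".join([verb] + words[1:]), False)


if __name__ == "__main__":
    for line in ["get lamp", "0pen mailbox", "N", "qqq lamp"]:
        print(line, "->", check_command(line))
```

Only the verb is corrected here, because nouns vary per game; a noun list could be checked the same way once one is available.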
Phase 6: Deduplication and Cleaning
Handle duplicates arising from:
- Same command captured across multiple frames
- OCR variations of the same command (e.g., "get lamp" vs "get 1amp")
Deduplication strategy:
- Normalize whitespace and case
- Use fuzzy matching to group similar entries
- When OCR variations exist, prefer the version matching known vocabulary
- Remove incomplete/partial commands (e.g., single letters that are neither valid directions nor known abbreviations such as i for inventory or l for look)
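The deduplication strategy above might look like the following sketch, which collapses runs of consecutive near-identical entries. The helper names and the 0.8 similarity threshold are assumptions; the threshold in particular needs checking against real data, since short commands differ by only a character or two.

```python
import difflib


def normalize(command):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(command.lower().split())


def known_words(command, vocabulary):
    """Count how many words of the command appear in the vocabulary."""
    return sum(1 for word in command.split() if word in vocabulary)


def dedupe_consecutive(commands, vocabulary=frozenset(), threshold=0.8):
    """Collapse runs of identical or near-identical consecutive entries."""
    result = []
    for raw in commands:
        current = normalize(raw)
        if not current:
            continue
        if result:
            previous = result[-1]
            ratio = difflib.SequenceMatcher(None, previous, current).ratio()
            if ratio >= threshold:
                # Same command seen again: keep the variant with more known words
                if known_words(current, vocabulary) > known_words(previous, vocabulary):
                    result[-1] = current
                continue
        result.append(current)
    return result


if __name__ == "__main__":
    frames = ["get 1amp", "get lamp", "Get  Lamp", "n", "n", "open mailbox"]
    print(dedupe_consecutive(frames, vocabulary={"get", "lamp", "open", "mailbox"}))
```

A limitation of this sketch is that a command genuinely typed twice in a row is merged into one entry. Distinguishing that case would need extra information, such as frame timestamps or the number of prompt lines visible on screen.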
Phase 7: Validation
Before finalizing, validate the extracted command list:
- Syntax validation: Verify commands match expected patterns
- Sequence plausibility: Check that command order makes logical sense
- Coverage check: Compare the extracted command count against a rough expectation based on video length
- Interpreter testing (if available): Run commands through a game interpreter to verify validity
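The syntax and coverage checks can be approximated with a regular expression and a simple rate bound. This sketch reuses only the directions and verbs listed earlier in this document, so it would reject many valid commands in a real game. The commands-per-minute bounds are hypothetical inputs that would need calibrating, for example from a manually counted stretch of the video.

```python
import re

DIRECTION = r"(?:n|s|e|w|ne|nw|se|sw|up|down)"
VERB = r"(?:get|take|drop|put|open|close|read|examine|look|inventory)"
# Either a bare direction, or a verb followed by zero or more lowercase words
COMMAND_RE = re.compile(rf"^(?:{DIRECTION}|{VERB}(?: [a-z]+)*)$")


def syntax_failures(commands):
    """Return the commands that do not match the expected patterns."""
    return [c for c in commands if not COMMAND_RE.match(c)]


def coverage_ok(command_count, video_seconds, min_per_minute, max_per_minute):
    """Check that the command count falls within an assumed plausible range."""
    minutes = video_seconds / 60
    return min_per_minute * minutes <= command_count <= max_per_minute * minutes


if __name__ == "__main__":
    extracted = ["n", "get lamp", "put sword in case", "get 1amp", "lamp get"]
    print(syntax_failures(extracted))
    print(coverage_ok(len(extracted), video_seconds=120,
                      min_per_minute=1, max_per_minute=6))
```

Entries returned by `syntax_failures` are candidates for manual review rather than automatic deletion, since the pattern is deliberately narrow.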
Common Pitfalls
- Skipping ROI extraction: Processing full frames wastes time and reduces accuracy
- Inadequate preprocessing: Raw frames often need contrast/threshold adjustments
- Ignoring domain knowledge: Valid command vocabulary enables validation and correction
- Ad-hoc cleaning scripts: Design one robust cleaning pipeline rather than multiple iterations
- No early validation: Test on sample frames before processing entire video
- Timeout misestimation: Benchmark before committing to full processing
- Capturing game output as commands: Filter to only lines with command prompt markers
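The last pitfall, capturing game output as commands, can be avoided with a small prompt filter applied to each frame's OCR text. This sketch assumes the prompt marker is ">" at the start of a line; `extract_prompt_commands` is a hypothetical helper name.

```python
def extract_prompt_commands(ocr_text, prompt=">"):
    """Keep only the text typed after a prompt marker, one entry per prompt line."""
    commands = []
    for line in ocr_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prompt):
            command = stripped[len(prompt):].strip()
            if command:  # skip an empty prompt waiting for input
                commands.append(command)
    return commands


if __name__ == "__main__":
    frame_text = (
        "West of House\n"
        "You are standing in an open field.\n"
        ">open mailbox\n"
        "Opening the small mailbox reveals a leaflet.\n"
        "> read leaflet\n"
        ">\n"
    )
    print(extract_prompt_commands(frame_text))
```

OCR sometimes misreads the prompt character itself, so it may be worth inspecting a few frames to see how the marker actually comes out before relying on this filter.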
Verification Checklist
- ROI identified and applied to frame extraction
- Preprocessing parameters tested on sample frames
- OCR benchmarked for time estimation
- Domain vocabulary used for validation
- Duplicates and near-duplicates removed
- Output validated against expected command syntax
- Command count reasonable for video duration