Claude-prime media-processor
Specialized visual and multimedia processing tools. Use this skill whenever a task involves complex visual content — UI mockups, dense screenshots, design images, charts, artwork — where precise details like spacing, hex colors, font sizes, and component hierarchy need to be extracted accurately. Also use for: reviewing or auditing existing UI against designs, comparing screenshots for visual regressions, transcribing audio/video, extracting data from PDFs with complex layouts, and generating images. Trigger whenever the user wants to implement from a design, review or compare UI screenshots, analyze visual details precisely, describe artwork or aesthetic content, or process any media file (audio, video, PDF).
git clone https://github.com/avibebuilder/claude-prime
T=$(mktemp -d) && git clone --depth=1 https://github.com/avibebuilder/claude-prime "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/media-processor" ~/.claude/skills/avibebuilder-claude-prime-media-processor && rm -rf "$T"
.claude/skills/media-processor/SKILL.md
Media Processor
Specialized tools for extracting precise visual details (exact colors, spacing, hierarchy), processing audio/video, and generating images.
Tools
All scripts live in scripts/ relative to this skill's directory. They auto-select the best model per task and handle retries, large file uploads, and error reporting.
| Script | Purpose |
|---|---|
| | Analyze images, transcribe audio/video, extract data from PDFs |
| | Generate and edit images (paid plan required) |
| | Convert PDF, DOCX, XLSX, PPTX to Markdown; extract page ranges and images |
Requires GEMINI_API_KEY in environment or .env in this skill's directory. Run any script with --help for setup details and available parameters.
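To check the key before launching a script, the two locations named above can be probed with a few lines of Python. This is a minimal sketch, not the scripts' own lookup code: the helper name resolve_api_key is hypothetical, and the precedence (environment first, then .env) is an assumption, since the order is not stated here.

```python
import os
from pathlib import Path


def resolve_api_key(skill_dir, environ=None):
    """Return GEMINI_API_KEY from the environment, else from <skill_dir>/.env, else None.

    Assumption: the environment variable wins over the .env file.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key

    env_file = Path(skill_dir) / ".env"
    if not env_file.is_file():
        return None

    for raw_line in env_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments; tolerate an optional "export " prefix.
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, value = line.partition("=")
        if sep and name.strip() == "GEMINI_API_KEY":
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

A None result means neither location provides a key, which is the point to stop and ask the user rather than run a script that will fail.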
Quick start — image analysis:
    python <skill-dir>/scripts/gemini_batch_process.py \
      --files <image-path> \
      --task analyze \
      --prompt "<tailored prompt>" \
      --output <output-path>.md
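When the call is made from Python rather than typed into a shell, building the argument list explicitly avoids quoting problems with long prompts. The sketch below uses only the flags shown in this document (--files, --task, --prompt, --output, and the optional --model); the function name build_analyze_command is hypothetical.

```python
from pathlib import Path


def build_analyze_command(skill_dir, image_path, prompt, output_path, model=None):
    """Build the argv list for the quick-start image analysis call."""
    argv = [
        "python",
        str(Path(skill_dir) / "scripts" / "gemini_batch_process.py"),
        "--files", str(image_path),
        "--task", "analyze",
        "--prompt", prompt,
        "--output", str(output_path),
    ]
    if model:
        # Optional override; omitted by default so the script picks the model.
        argv += ["--model", model]
    return argv
```

The list can be passed to subprocess.run(cmd, check=True). Because each argument is a separate list element, a prompt containing quotes or newlines reaches the script unchanged.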
Prompt Quality Matters
The prompt sent to the processing model is the single biggest factor in output quality. Tailor prompts to what the task actually needs — generic prompts produce generic results.
What makes a good analysis prompt:
- Ask for the specific details the task requires (hex colors, spacing in px, component hierarchy) rather than "describe this image"
- Structure the ask as a numbered list — the model mirrors the structure back, making output easy to parse
- Name the desired output format ("as a markdown table", "as JSON", "as a component tree")
- Include implementation context when relevant ("for React with Tailwind") so the model emphasizes useful details
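The four points above can be folded into a small prompt builder. This is an illustrative sketch: the function name build_analysis_prompt and the exact wording of the generated sentences are assumptions, not part of the skill's scripts.

```python
def build_analysis_prompt(details, output_format=None, context=None):
    """Compose a numbered, task-specific analysis prompt.

    details:       specific things to extract, e.g. "Exact hex colors"
    output_format: phrase such as "as a markdown table"
    context:       phrase such as "for React with Tailwind"
    """
    if not details:
        raise ValueError("list at least one specific detail to extract")
    lines = []
    if context:
        lines.append(f"Context: this analysis is {context}.")
    lines.append("Extract the following from the image:")
    # A numbered list encourages the model to answer item by item.
    lines.extend(f"{i}. {item}" for i, item in enumerate(details, start=1))
    if output_format:
        lines.append(f"Output {output_format}.")
    return "\n".join(lines)


prompt = build_analysis_prompt(
    ["Exact hex colors", "Spacing in px", "Component hierarchy"],
    output_format="as a markdown table",
    context="for React with Tailwind",
)
```

The resulting string is what would be passed as the --prompt value.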
Example prompt patterns:
- UI implementation: "Extract component hierarchy, layout type, exact hex colors, typography (sizes/weights), spacing in px, interactive states, icons and decorative elements"
- Chart data: "Extract chart type, axes with units, every data point with exact values, legend entries with colors. Output as a markdown table"
- Design review: "Compare this screenshot against the design. Flag differences in spacing, colors, alignment, missing elements, and visual inconsistencies. Note exact values for each discrepancy"
Pasted Images
When a user pastes images in chat, they are auto-saved to:
$CLAUDE_DIR/image-cache/<current_session_id>/<image_number>.png
Use ls "$CLAUDE_DIR/image-cache/" to discover the session ID, then list its contents to find available images.
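The same lookup can be done in Python when the image paths are needed programmatically. This sketch assumes the layout described above (one directory per session, images named by number); the helper name list_pasted_images is hypothetical, and treating the most recently modified session directory as the current one is a guess that holds only if no other session is active.

```python
from pathlib import Path


def list_pasted_images(cache_root, session_id=None):
    """Return pasted PNGs for a session, ordered by image number."""
    cache_root = Path(cache_root)
    if not cache_root.is_dir():
        return []

    if session_id is None:
        sessions = [p for p in cache_root.iterdir() if p.is_dir()]
        if not sessions:
            return []
        # Assumption: the most recently modified directory is the current session.
        session_dir = max(sessions, key=lambda p: p.stat().st_mtime)
    else:
        session_dir = cache_root / session_id
        if not session_dir.is_dir():
            return []

    # Keep only <image_number>.png files and sort numerically so 10 follows 9.
    images = [p for p in session_dir.glob("*.png") if p.stem.isdigit()]
    return sorted(images, key=lambda p: int(p.stem))
```

A typical call would be list_pasted_images(Path(os.environ["CLAUDE_DIR"]) / "image-cache"), with os imported by the caller.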
Model Overrides
Scripts auto-select models per task (see model-routing.md). Override with --model <model-id> when the default isn't enough, for example --model gemini-3.1-pro-preview for complex visual analysis where the pro model catches more detail than flash.
References
| Reference | When to read |
|---|---|
| api-gotchas.md | Before using image generation, video processing, or raw API calls — prevents common failures |
| model-routing.md | When choosing or overriding the default model for a task |
| media-optimization.md | When files are too large to upload — ffmpeg compression recipes |
Gotchas
- Rate limits — scripts retry up to 3 times with backoff. If still rate-limited after retries, stop and ask the user to check their API key quota or provide a new key.
- Model IDs change — Google frequently rotates preview model IDs. If you get a 404, the model was likely superseded — check the models page for current IDs.
- Safety filters — the API may refuse some content. Report clearly to the user rather than retrying.
- Large files auto-upload — files >20MB automatically use the File API (2GB max, 48h retention). No action needed.
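The retry and safety-filter behavior described above can be pictured as follows. This is a sketch of the general pattern, not the scripts' actual code: the exception classes are hypothetical stand-ins, the exponential 1s/2s/4s schedule is an assumption, and "up to 3 times" is read here as three retries after the first attempt.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit failure."""


class SafetyBlockedError(Exception):
    """Hypothetical stand-in for a safety-filter refusal."""


def call_with_retries(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying rate-limit failures up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                # Retries exhausted: surface the error so the user can
                # check their quota or supply a new key.
                raise
            # Exponential backoff: 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
        # SafetyBlockedError is deliberately not caught, so a refusal
        # propagates immediately and is reported instead of retried.
```

Injecting sleep as a parameter keeps the function testable without real delays.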