Claude-prime media-processor

Specialized visual and multimedia processing tools. Use this skill whenever a task involves complex visual content — UI mockups, dense screenshots, design images, charts, artwork — where precise details like spacing, hex colors, font sizes, and component hierarchy need to be extracted accurately. Also use for: reviewing or auditing existing UI against designs, comparing screenshots for visual regressions, transcribing audio/video, extracting data from PDFs with complex layouts, and generating images. Trigger whenever the user wants to implement from a design, review or compare UI screenshots, analyze visual details precisely, describe artwork or aesthetic content, or process any media file (audio, video, PDF).

install
source · Clone the upstream repo
git clone https://github.com/avibebuilder/claude-prime
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/avibebuilder/claude-prime "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/media-processor" ~/.claude/skills/avibebuilder-claude-prime-media-processor && rm -rf "$T"
manifest: .claude/skills/media-processor/SKILL.md

Media Processor

Specialized tools for extracting precise visual details (exact colors, spacing, hierarchy), processing audio/video, and generating images.

Tools

All scripts live in scripts/ relative to this skill's directory. They auto-select the best model per task and handle retries, large file uploads, and error reporting.

| Script | Purpose |
| --- | --- |
| gemini_batch_process.py | Analyze images, transcribe audio/video, extract data from PDFs |
| image_gen.py | Generate and edit images (paid plan required) |
| document_converter.py | Convert PDF, DOCX, XLSX, PPTX to Markdown; extract page ranges and images |

Requires GEMINI_API_KEY in environment or .env in this skill's directory. Run any script with --help for setup details and available parameters.

Quick start — image analysis:

python <skill-dir>/scripts/gemini_batch_process.py \
  --files <image-path> \
  --task analyze \
  --prompt "<tailored prompt>" \
  --output <output-path>.md
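
When the script is driven from another Python program, the same invocation can be assembled as an argument list. The sketch below is a minimal wrapper that assumes only the four flags shown above; the helper name `build_analyze_command`, the skill directory, and the file names are hypothetical placeholders, and nothing runs until the list is handed to `subprocess.run`.

```python
import sys
from pathlib import Path


def build_analyze_command(skill_dir, image_path, prompt, output_path):
    """Return the argv list for the quick-start analysis call (hypothetical helper)."""
    script = Path(skill_dir) / "scripts" / "gemini_batch_process.py"
    return [
        sys.executable,              # run with the current interpreter
        str(script),
        "--files", str(image_path),
        "--task", "analyze",
        "--prompt", prompt,
        "--output", str(output_path),
    ]


cmd = build_analyze_command(
    "path/to/media-processor",       # placeholder skill directory
    "mockup.png",                    # placeholder image
    "Extract exact hex colors and spacing in px",
    "analysis.md",
)
print(cmd[1:])
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the prompt contains quotes or newlines.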

Prompt Quality Matters

The prompt sent to the processing model is the single biggest factor in output quality. Tailor prompts to what the task actually needs — generic prompts produce generic results.

What makes a good analysis prompt:

  • Ask for the specific details the task requires (hex colors, spacing in px, component hierarchy) rather than "describe this image"
  • Structure the ask as a numbered list — the model mirrors the structure back, making output easy to parse
  • Name the desired output format ("as a markdown table", "as JSON", "as a component tree")
  • Include implementation context when relevant ("for React with Tailwind") so the model emphasizes useful details
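
The four guidelines above can be sketched as a small prompt builder. This is an illustration only, not part of the skill's scripts: the function name `build_analysis_prompt` and the exact wording of the generated lines are assumptions.

```python
def build_analysis_prompt(details, output_format, context=None):
    """Compose a numbered, format-specific analysis prompt (hypothetical helper)."""
    lines = ["Analyze the attached image and report the following:"]
    # Numbered asks encourage the model to mirror the same structure back.
    for number, detail in enumerate(details, start=1):
        lines.append(f"{number}. {detail}")
    # Naming the output format makes the response easier to parse.
    lines.append(f"Output format: {output_format}.")
    if context:
        # Implementation context steers the model toward useful details.
        lines.append(f"Implementation context: {context}.")
    return "\n".join(lines)


prompt = build_analysis_prompt(
    ["Exact hex colors", "Spacing in px", "Component hierarchy"],
    output_format="a markdown table",
    context="React with Tailwind",
)
print(prompt)
```

The resulting string can be passed directly as the `--prompt` argument.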

Example prompt patterns:

UI implementation: "Extract component hierarchy, layout type, exact hex colors, typography (sizes/weights), spacing in px, interactive states, icons and decorative elements"

Chart data: "Extract chart type, axes with units, every data point with exact values, legend entries with colors. Output as a markdown table"

Design review: "Compare this screenshot against the design. Flag differences in spacing, colors, alignment, missing elements, and visual inconsistencies. Note exact values for each discrepancy"

Pasted Images

When a user pastes images in chat, they are auto-saved to:

$CLAUDE_DIR/image-cache/<current_session_id>/<image_number>.png

Use ls "$CLAUDE_DIR/image-cache/" to discover the session ID, then list its contents to find available images.
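
The same discovery step can be done from Python. The sketch below assumes the directory layout described above (`image-cache/<session-id>/<n>.png`); the function name `list_pasted_images` is hypothetical, and it returns an empty mapping when the cache directory does not exist.

```python
import os
from pathlib import Path


def _image_order(path):
    # Numeric names sort by number (2.png before 10.png); anything else sorts last by name.
    stem = path.stem
    return (0, int(stem), "") if stem.isdigit() else (1, 0, path.name)


def list_pasted_images(claude_dir=None):
    """Map each session ID under image-cache/ to its ordered .png paths (hypothetical helper)."""
    base = claude_dir or os.environ.get("CLAUDE_DIR", "")
    root = Path(base) / "image-cache"
    if not base or not root.is_dir():
        return {}
    sessions = {}
    for session in sorted(p for p in root.iterdir() if p.is_dir()):
        sessions[session.name] = sorted(session.glob("*.png"), key=_image_order)
    return sessions


for session_id, images in list_pasted_images().items():
    print(session_id, [image.name for image in images])
```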

Model Overrides

Scripts auto-select models per task (see model-routing.md). Override with --model <model-id> when the default isn't enough, for example --model gemini-3.1-pro-preview for complex visual analysis where the pro model catches more detail than flash.
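
When commands are built programmatically, the override can be applied to an existing argument list. This is a minimal sketch: `with_model_override` is a hypothetical helper, and the base command is a shortened placeholder rather than a complete invocation.

```python
def with_model_override(argv, model_id=None):
    """Return a copy of argv with --model set to model_id; unchanged copy if model_id is None."""
    result = list(argv)
    if model_id is None:
        return result            # keep the script's automatic model selection
    if "--model" in result:
        i = result.index("--model")
        result[i + 1:i + 2] = [model_id]   # replace the existing value
    else:
        result += ["--model", model_id]
    return result


base = ["python", "scripts/gemini_batch_process.py", "--task", "analyze"]
print(with_model_override(base, "gemini-3.1-pro-preview"))
```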

References

| Reference | When to read |
| --- | --- |
| api-gotchas.md | Before using image generation, video processing, or raw API calls (prevents common failures) |
| model-routing.md | When choosing or overriding the default model for a task |
| media-optimization.md | When files are too large to upload (ffmpeg compression recipes) |

Gotchas

  • Rate limits — scripts retry up to 3 times with backoff. If still rate-limited after retries, stop and ask the user to check their API key quota or provide a new key.
  • Model IDs change — Google frequently rotates preview model IDs. If you get a 404, the model was likely superseded — check the models page for current IDs.
  • Safety filters — the API may refuse some content. Report clearly to the user rather than retrying.
  • Large files auto-upload — files >20MB automatically use the File API (2GB max, 48h retention). No action needed.
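
The scripts already implement the retry and upload behavior described above, and their actual code is not shown here. For readers who want to see the general pattern, the following is an illustrative sketch under stated assumptions: `RateLimitError` is a stand-in exception, "up to 3 times" is read as one initial attempt plus three retries, the delays double from one second, and "20MB" is taken as 20 MiB.

```python
import time


class RateLimitError(Exception):
    """Stand-in for whatever the API client raises when rate-limited (hypothetical)."""


def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError retry up to `retries` times with doubling delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == retries:
                raise            # still rate-limited: stop and ask the user about quota
            sleep(base_delay * 2 ** attempt)


INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB


def needs_file_api(size_bytes):
    """True when a file exceeds the inline limit and would go through the File API."""
    return size_bytes > INLINE_LIMIT_BYTES
```

Only rate-limit errors are retried here; a safety-filter refusal would propagate immediately, which matches the advice to report it rather than retry.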