Claude-skill-registry look-at

This skill should be used when the user asks to 'look at', 'analyze', 'describe', 'extract from', or 'what's in' media files like PDFs, images, diagrams, screenshots, or charts. Triggers include: 'what does this image show', 'extract the table from this PDF', 'describe this diagram', 'what's in this screenshot', 'analyze this chart', 'read this image', 'get text from this PDF', 'summarize this document', or requests for specific data extraction from visual or document files. Use when analyzed/interpreted content is needed rather than literal file reading (which uses Read tool).

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/look-at" ~/.claude/skills/majiayu000-claude-skill-registry-look-at && rm -rf "$T"
manifest: skills/data/look-at/SKILL.md
source content

Look At - Multimodal File Analysis

Fast, cost-effective file analysis using Google's Gemini 2.5 Flash Lite model for PDFs, images, diagrams, and other media files.

Tool Selection Enforcement

Rationalization Table - STOP When Thinking:

| Excuse | Reality | Do Instead |
| --- | --- | --- |
| "I can read images directly with Read" | You'll waste thousands of context tokens showing the full image | Use look_at for analysis |
| "I'll use Read for this PDF" | You'll lose table structure and visual information by extracting raw text | Use look_at for PDFs with tables/charts/diagrams |
| "Just a quick glance at the file" | Your quick glances still consume full context tokens | Use look_at for targeted extraction |
| "I need exact text, so Read is required" | Gemini's extraction is accurate for most use cases | Use look_at first, Read only if extraction is insufficient |
| "look_at adds complexity" | You gain context savings and faster processing | Use look_at for media files |
| "The file is small" | Your small files still waste context if uninterpreted | Size doesn't determine tool choice, content type does |
| "I'll process it myself" | You waste reasoning tokens on trivial extraction | Delegate to look_at |

Red Flags - STOP Immediately When Thinking:

  • If you catch yourself thinking "Let me Read this image/PDF/screenshot" → STOP. Use look_at for media files.
  • If you catch yourself thinking "I can see the image directly" → STOP. Seeing it directly still wastes context. Use look_at.
  • If you catch yourself thinking "Just need to glance at this diagram" → STOP. Glancing still costs context tokens. Use look_at.
  • If you catch yourself thinking "The PDF is text-based, so Read is fine" → STOP. If it has structure/tables/charts, use look_at.

Cost & Context Benefits

| Scenario | Read Tool | look_at Tool |
| --- | --- | --- |
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |

In the scenarios above, look_at cuts context usage by roughly 88-96% by returning only the relevant information.
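
As a quick check on the figures above, the per-scenario savings implied by the table can be worked out directly. This is a small illustrative sketch; the token counts are the table's approximate estimates, not measurements.

```python
# Approximate token counts from the table above: (Read tool, look_at tool).
SCENARIOS = {
    "PDF with table": (1000, 100),
    "Screenshot": (500, 50),
    "Diagram": (800, 100),
    "Multi-page PDF": (5000, 200),
}


def savings_percent(read_tokens: int, look_at_tokens: int) -> float:
    """Return the percentage of context tokens saved by using look_at."""
    return 100.0 * (read_tokens - look_at_tokens) / read_tokens


for name, (read_tokens, look_at_tokens) in SCENARIOS.items():
    print(f"{name}: {savings_percent(read_tokens, look_at_tokens):.1f}% saved")
```

Under these estimates the savings run from 87.5% (diagram) to 96.0% (multi-page PDF).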

When to Use

Use look_at when you need to:

  • Interpret media files the Read tool cannot interpret
  • Extract specific information or summaries from documents
  • Describe visual content in images or diagrams
  • Analyze charts, tables, or structured data in PDFs
  • Work with analyzed or extracted data rather than raw file contents

Never use look_at when:

  • You need the exact contents of source code or plain text files (use Read)
  • The file will be edited afterward (editing needs the literal content from Read)
  • A simple file read is enough and no interpretation is needed
  • Exact formatting or structure must be preserved

How It Works

  1. Provide a file path and a specific goal (what to extract)
  2. The helper script uploads the file to Gemini's API
  3. Gemini 2.5 Flash Lite analyzes the file and extracts requested information
  4. Only the relevant extracted information is returned (saves context tokens)
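
The four steps above can be sketched roughly as follows. This is not the actual look_at.py helper, whose source is not shown here; it is a minimal sketch assuming the google-genai SDK (`genai.Client`, `client.files.upload`, `client.models.generate_content`). The names `build_prompt` and `look_at` and the prompt wording are hypothetical.

```python
import os

DEFAULT_MODEL = "gemini-2.5-flash-lite"


def build_prompt(goal: str) -> str:
    """Build an instruction asking for only the information the goal names.

    The wording is an assumption for illustration, not the helper's prompt.
    """
    return (
        "Analyze the attached file and extract only what is requested.\n"
        f"Goal: {goal}\n"
        "If the requested information is not present, say so clearly. "
        "Do not add any preamble."
    )


def look_at(file_path: str, goal: str, model: str = DEFAULT_MODEL) -> str:
    """Upload a file to Gemini and return the extracted text (hypothetical)."""
    # Step 1: a file path and a specific goal are the only inputs.
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")

    # Imported lazily so this module loads even without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    # Step 2: upload the file to the Gemini API.
    uploaded = client.files.upload(file=file_path)
    # Step 3: ask the model to analyze the file against the goal.
    response = client.models.generate_content(
        model=model,
        contents=[uploaded, build_prompt(goal)],
    )
    # Step 4: return only the extracted text.
    return response.text or ""
```

Keeping the prompt builder separate from the network call makes the prompt easy to inspect and test without an API key.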

Usage Pattern

CRITICAL - Display Requirement: Always set the Bash tool description parameter to show a clean invocation:

description: "look-at: [goal text]"

Never display the full Python command to the user.

# Basic usage
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/file.pdf" \
    --goal "Extract the title and date from this document"

# With custom model
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/diagram.png" \
    --goal "Describe the architecture shown in this diagram" \
    --model "gemini-2.5-flash"

IMPORTANT:

  • Always use absolute paths for files
  • Always set the Bash tool description to "look-at: [goal]" for clean UX
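
When the helper is driven from another Python script rather than typed by hand, the same invocation can be assembled programmatically. A minimal sketch: the `--file`, `--goal`, and `--model` flags and the script path come from the usage above, while `build_command` is a hypothetical helper name and the fallback to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import List, Optional


def build_command(file_path: str, goal: str, model: Optional[str] = None) -> List[str]:
    """Return the argv list for look_at.py, resolving the file to an absolute path."""
    # Assumption: fall back to the current directory if the variable is unset.
    plugin_root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = Path(plugin_root) / "skills" / "look-at" / "scripts" / "look_at.py"
    command = [
        "python3",
        str(script),
        "--file", str(Path(file_path).resolve()),  # always absolute
        "--goal", goal,
    ]
    if model is not None:
        command += ["--model", model]
    return command
```

The returned list can be passed to `subprocess.run(command, capture_output=True, text=True)`. Building a list rather than a shell string avoids quoting problems when the goal contains spaces or quotes.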

Response Rules

When using look_at, the response includes:

  • Only the extracted information matching the goal
  • Clear statement if requested information is not found
  • Concise output focused on the goal (no preamble)

Use this extracted information directly in continued work without loading the full file into context.

Supported File Types

| Type | Extensions | MIME Types |
| --- | --- | --- |
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
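
A caller might want to reject unsupported files before spending an API call. The sketch below hard-codes the extensions from the table above; `file_category` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported File Types table.
SUPPORTED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".heic", ".heif"},
    "video": {".mp4", ".mpeg", ".mov", ".avi", ".webm"},
    "audio": {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac"},
    "document": {".pdf", ".txt", ".csv", ".md", ".html"},
}


def file_category(path: str) -> Optional[str]:
    """Return the table category for a path, or None if the extension is unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

An explicit extension table is used here instead of the standard `mimetypes` module because MIME guesses for extensions such as .heic or .md can vary between platforms.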

Model Options

| Model | Use Case | Speed | Cost |
| --- | --- | --- | --- |
| gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
| gemini-3-flash | More complex extraction needs | Fast | Low |
| gemini-3-pro-preview | Highest accuracy required | Medium | Medium |

Default is gemini-2.5-flash-lite for optimal speed/cost ratio.

Common Patterns

REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.

Extract Specific Information

# Bash tool call with:
# description: "look-at: Extract the executive summary section"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/report.pdf" \
    --goal "Extract the executive summary section"

Describe Visual Content

# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/screenshot.png" \
    --goal "List all UI elements and their layout"

Analyze Diagrams

# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/architecture.png" \
    --goal "Explain the data flow and component relationships"

Extract Structured Data

# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
    --file "/path/to/table.pdf" \
    --goal "Extract the table data as JSON with columns: name, value, date"
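
When a goal asks for JSON, the returned text still has to be parsed before use. Models sometimes wrap JSON in a Markdown code fence, so a tolerant parser helps. A minimal sketch: `parse_json_output` is a hypothetical helper, and the sample string is made-up illustrative output, not a real response.

```python
import json
from typing import Any

FENCE = "`" * 3  # a Markdown code-fence marker


def parse_json_output(text: str) -> Any:
    """Parse JSON from look_at output, tolerating a surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        # Skip the opening fence line (it may carry a language tag such as
        # "json") and drop the closing fence.
        first_newline = cleaned.find("\n")
        start = first_newline + 1 if first_newline != -1 else len(FENCE)
        cleaned = cleaned[start:-len(FENCE)].strip()
    return json.loads(cleaned)


# Made-up sample output, for illustration only.
sample = FENCE + 'json\n[{"name": "Widget", "value": 3, "date": "2024-01-15"}]\n' + FENCE
rows = parse_json_output(sample)
print(rows[0]["name"])  # prints: Widget
```

`json.loads` raises `json.JSONDecodeError` on malformed output, which is a reasonable signal to rerun with a more specific goal.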

Environment Setup

Required environment variable:

export GOOGLE_API_KEY="your-api-key-here"

Required Python package:

pip install google-genai

For pixi-managed projects, add to pixi.toml:

[dependencies]
google-genai = ">=1.0.0"
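
Both requirements can be verified up front before invoking the helper. The following is a minimal preflight sketch; `check_environment` is a hypothetical name, and it checks only for the `GOOGLE_API_KEY` variable and the `google.genai` module named in this section.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def check_environment(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    try:
        sdk_found = importlib.util.find_spec("google.genai") is not None
    except ModuleNotFoundError:
        # Raised when the parent "google" package itself is missing.
        sdk_found = False
    if not sdk_found:
        problems.append("google-genai is not installed (pip install google-genai)")
    return problems


for problem in check_environment():
    print(f"setup problem: {problem}")
```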

Cost Optimization

  • Gemini 2.5 Flash Lite is the most cost-effective option
  • Only extracts requested information (saves on output tokens)
  • Avoids loading full files into main conversation context
  • Use specific goals to minimize unnecessary processing

Troubleshooting

| Issue | Solution |
| --- | --- |
| API key not set | Set the GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
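
For the rate-limit case, retry logic usually means exponential backoff with jitter. The sketch below is generic: `call_with_retry` is a hypothetical helper, and because the exception type that signals a rate limit depends on the SDK version, it takes the exception types as a parameter instead of guessing one.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), plus jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

A call such as `call_with_retry(lambda: subprocess.run(command, check=True), retry_on=(subprocess.CalledProcessError,))` would retry a failed helper invocation. Narrowing `retry_on` keeps permanent errors, such as a missing file, from being retried.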

Examples

See the examples/ directory for:

  • analyze_pdf.sh - PDF document extraction
  • describe_image.sh - Image analysis
  • extract_table.sh - Structured data extraction

Related Skills

  • /gemini-batch - For batch processing of many files
  • Standard Read tool - For text files needing exact contents