Memento-Skills image-analysis
Analyze local images using vision-capable LLM. Use when the user question depends on visual content from a local image file — visual question answering, describing images, reading text in images, identifying objects, etc.
install
source · Clone the upstream repo
git clone https://github.com/Memento-Teams/Memento-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Memento-Teams/Memento-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/builtin/skills/image-analysis" ~/.claude/skills/memento-teams-memento-skills-image-analysis && rm -rf "$T"
manifest:
builtin/skills/image-analysis/SKILL.mdsource content
Image Analysis
Analyze local images using the project's configured LLM (via litellm). Works with any vision-capable model (GPT-4o, Claude 3, Gemini, etc.).
Quick start
# Analyze an image with a question python3 scripts/analyze_image.py --image "/path/to/image.png" --prompt "Describe what you see in the image" # Use a specific model (overrides project config) python3 scripts/analyze_image.py --image "/path/to/photo.jpg" --prompt "What text is visible?" --model "openai/gpt-4o" # Basic mode: extract image metadata without LLM (no API key needed) python3 scripts/analyze_image.py --image "/path/to/image.png" --basic # Increase output length and timeout python3 scripts/analyze_image.py --image "/path/to/diagram.png" --prompt "Explain this diagram" --max-tokens 4096 --timeout 120
Options
| Flag | Description | Default |
|---|---|---|
| Path to local image file (required) | — |
| Question or instruction for the image | "Describe this image in detail" |
| Override model id (e.g. ) | project config |
| Max output tokens | |
| HTTP timeout in seconds | |
| Extract image metadata only (no LLM needed) | off |
Model configuration
The script reads model/API configuration from the project's config (
middleware/config). Ensure your configured model supports vision (multimodal) input.
Override with
--model to use a specific model for this call.
Basic mode
When
--basic is used (or when no LLM is configured), the script uses Pillow to extract:
- Image format, size, color mode
- EXIF metadata (camera, date, GPS if available)
- Color statistics (dominant colors, histogram)
Supported formats
PNG, JPEG, GIF, WebP, BMP, TIFF, and other common image formats.