Memento-Skills image-analysis

Analyze local images using vision-capable LLM. Use when the user question depends on visual content from a local image file — visual question answering, describing images, reading text in images, identifying objects, etc.

install
source · Clone the upstream repo
git clone https://github.com/Memento-Teams/Memento-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Memento-Teams/Memento-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/builtin/skills/image-analysis" ~/.claude/skills/memento-teams-memento-skills-image-analysis && rm -rf "$T"
manifest: builtin/skills/image-analysis/SKILL.md
source content

Image Analysis

Analyze local images using the project's configured LLM (via litellm). Works with any vision-capable model (GPT-4o, Claude 3, Gemini, etc.).

Quick start

# Analyze an image with a question
python3 scripts/analyze_image.py --image "/path/to/image.png" --prompt "Describe what you see in the image"

# Use a specific model (overrides project config)
python3 scripts/analyze_image.py --image "/path/to/photo.jpg" --prompt "What text is visible?" --model "openai/gpt-4o"

# Basic mode: extract image metadata without LLM (no API key needed)
python3 scripts/analyze_image.py --image "/path/to/image.png" --basic

# Increase output length and timeout
python3 scripts/analyze_image.py --image "/path/to/diagram.png" --prompt "Explain this diagram" --max-tokens 4096 --timeout 120

Options

FlagDescriptionDefault
--image
Path to local image file (required)
--prompt
Question or instruction for the image"Describe this image in detail"
--model
Override model id (e.g.
openai/gpt-4o
)
project config
--max-tokens
Max output tokens
2048
--timeout
HTTP timeout in seconds
60
--basic
Extract image metadata only (no LLM needed)off

Model configuration

The script reads model/API configuration from the project's config (

middleware/config
). Ensure your configured model supports vision (multimodal) input.

Override with

--model
to use a specific model for this call.

Basic mode

When

--basic
is used (or when no LLM is configured), the script uses Pillow to extract:

  • Image format, size, color mode
  • EXIF metadata (camera, date, GPS if available)
  • Color statistics (dominant colors, histogram)

Supported formats

PNG, JPEG, GIF, WebP, BMP, TIFF, and other common image formats.