Claude-skill-registry imggen
Use this skill when users want to generate images using OpenAI's image generation API (DALL-E or gpt-image-1), or extract text from images using OCR. Invoke when users request AI-generated images, artwork, logos, illustrations, visual content from text prompts, or need to extract text/data from images.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/imggen" ~/.claude/skills/majiayu000-claude-skill-registry-imggen && rm -rf "$T"
skills/data/imggen/SKILL.md
imggen - OpenAI Image Generation and OCR CLI
Generate images from text prompts and extract text from images using OpenAI's APIs.
Overview
imggen is a command-line tool that interfaces with OpenAI's image generation API. It supports multiple models (gpt-image-1, dall-e-3, dall-e-2) and provides options for image size, quality, format, and style.
Prerequisites
- imggen binary installed and available in PATH
- OPENAI_API_KEY environment variable set with a valid OpenAI API key
- Sufficient OpenAI API credits for image generation
Usage
imggen [flags] "prompt"
Available Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| | -m | gpt-image-1 | Model: gpt-image-1, dall-e-3, dall-e-2 |
| | -s | | Image dimensions |
| | -q | | Quality level |
| --count | -n | | Number of images (1-10 for gpt-image-1 and dall-e-2, 1 for dall-e-3) |
| --output | -o | auto-generated | Output filename or directory |
| | -f | | Output format: png, jpeg, webp |
| --style | | | Style for dall-e-3: vivid, natural |
| --transparent | -t | | Transparent background (gpt-image-1 + png/webp only) |
| --prompt | -P | | Prompt (can be specified multiple times) |
| --parallel | -p | 1 | Number of parallel workers for multiple prompts |
| | | | Override API key |
Model-Specific Parameters
gpt-image-1 (Default, Recommended)
- Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), auto
- Quality: auto, low, medium, high
- Max images: 10 per request
- Supports: Transparent backgrounds, multiple output formats
dall-e-3
- Sizes: 1024x1024, 1024x1792, 1792x1024
- Quality: standard, hd
- Max images: 1 per request
- Supports: Style parameter (vivid/natural)
dall-e-2
- Sizes: 256x256, 512x512, 1024x1024
- Max images: 10 per request
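The size lists above can be checked before a request is sent, which avoids a round trip that ends in an "invalid size" error. The following is a minimal sketch; `is_valid_size` is a hypothetical helper and not part of imggen, and the accepted values are copied from the lists in this section.

```shell
# Sketch: check a size against the per-model lists documented above.
# is_valid_size is a hypothetical helper, not an imggen command.
is_valid_size() {
  local model="$1" size="$2"
  case "$model" in
    gpt-image-1) case "$size" in 1024x1024|1536x1024|1024x1536|auto) return 0 ;; esac ;;
    dall-e-3)    case "$size" in 1024x1024|1024x1792|1792x1024) return 0 ;; esac ;;
    dall-e-2)    case "$size" in 256x256|512x512|1024x1024) return 0 ;; esac ;;
  esac
  # Unknown model or unsupported size
  return 1
}
```

For example, `is_valid_size dall-e-3 1792x1024` succeeds, while `is_valid_size dall-e-2 1792x1024` fails.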
Instructions
- Verify OPENAI_API_KEY is set in the environment
- Construct the imggen command with appropriate flags based on user requirements
- Execute the command using Bash tool
- Report the generated filename and any revised prompt returned by the API
- If the user wants to view the image, use Read tool on the generated file
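The first steps above could be wrapped as follows. This is a sketch under stated assumptions: `require_api_key` and `run_imggen` are hypothetical wrapper names, and only the `imggen [flags] "prompt"` invocation form comes from this document.

```shell
# Sketch of the pre-flight checks described in the steps above.
# require_api_key and run_imggen are hypothetical names, not imggen commands.
require_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
}

run_imggen() {
  # Stop early if the key or the binary is missing
  require_api_key || return 1
  command -v imggen >/dev/null 2>&1 || { echo "imggen not found in PATH" >&2; return 1; }
  # Pass all flags and the prompt through unchanged
  imggen "$@"
}
```

A call such as `run_imggen -m dall-e-3 -s 1792x1024 -q hd "panoramic view of a futuristic city"` then fails fast with a clear message instead of an API error.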
Output Format
The tool outputs:
- Progress message: "Generating N image(s) with MODEL..."
- Saved filename: "Saved: filename.png"
- Cost information: "Cost: $X.XXXX (N image(s) @ $X.XXXX each)"
- Revised prompt (if returned by API): "Revised prompt: ..."
- Completion message: "Done!"
Generated files are saved to the current working directory with timestamp-based names (e.g., image-20251216-120000.png) unless --output is specified.
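To report the generated filename, the "Saved:" lines can be pulled out of captured output. This sketch assumes each file is reported on its own line exactly as "Saved: <name>"; `saved_files` is a hypothetical helper.

```shell
# Sketch: extract saved filenames from captured imggen output.
# Assumes one "Saved: <name>" line per generated file.
saved_files() {
  # Print only lines with the "Saved: " prefix, with the prefix removed
  sed -n 's/^Saved: //p'
}
```

It reads from standard input, so it could be used as `imggen "a sunset over mountains" | saved_files`, assuming the status messages are written to stdout.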
Cost Tracking
All image generation costs are automatically logged to ~/.imggen/sessions.db. View costs using the cost subcommand:
# View total costs
imggen cost
# View today's costs
imggen cost today
# View this week's costs (last 7 days)
imggen cost week
# View this month's costs (last 30 days)
imggen cost month
# View costs by provider
imggen cost provider
Interactive Mode Cost Commands
In interactive mode (imggen -i), use the cost or $ command:
- cost today: Today's costs
- cost week: This week's costs
- cost month: This month's costs
- cost total: All-time total
- cost provider: Breakdown by provider
- cost session: Current session's costs
Database Management
Manage the SQLite database storing sessions and cost data:
# Reset database (delete all data)
imggen db reset
# Reset with backup of old data
imggen db reset --backup
# Show database location and stats
imggen db info
Examples
Basic image generation
imggen "a sunset over mountains"
High-quality landscape with DALL-E 3
imggen -m dall-e-3 -s 1792x1024 -q hd "panoramic view of a futuristic city"
Multiple images with gpt-image-1
imggen -n 4 -q high "abstract geometric pattern"
Logo with transparent background
imggen -t -f png "minimalist tech company logo, flat design"
Custom output filename
imggen -o hero-image.png "website hero banner with gradient"
Natural style portrait
imggen -m dall-e-3 --style natural "professional headshot, studio lighting"
Multiple prompts via command line
# Generate multiple images with --prompt flag
imggen --prompt "a sunset" --prompt "a cat" --prompt "a dog" -o ./output
# Short form with parallel processing (3 workers)
imggen -P "sunset" -P "mountains" -P "ocean" -o ./images -p 3
Batch generation from file
# From a text file (one prompt per line)
imggen batch prompts.txt -o ./output
# From a JSON file with per-prompt options
imggen batch prompts.json -o ./output
# With parallel processing
imggen batch prompts.txt -o ./output -p 3
Multiple Prompts
Generate multiple images from command-line prompts using the --prompt/-P flag:
imggen --prompt "a sunset over mountains" --prompt "a cat playing piano" -o ./output
This processes all prompts and saves images to the output directory with indexed filenames:
001-a-sunset-over-mountains.png
002-a-cat-playing-piano.png
Use --parallel/-p to control concurrent processing (default: 1 = sequential).
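When a script needs to predict the indexed output names, the pattern shown above can be approximated. The slug rules in this sketch (lowercase, runs of non-alphanumeric characters replaced by a single hyphen) are an assumption inferred from the two example filenames, not documented imggen behavior, and `indexed_name` is a hypothetical helper.

```shell
# Sketch: reproduce the indexed filename pattern from the examples above.
# The slug rules are inferred from those two names and may differ from imggen's.
indexed_name() {
  local index="$1" prompt="$2" ext="${3:-png}" slug
  slug=$(printf '%s' "$prompt" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  # Zero-pad the index to three digits, as in 001-..., 002-...
  printf '%03d-%s.%s\n' "$index" "$slug" "$ext"
}
```

Prompts with punctuation or very long text may be handled differently by the real tool, so treat the result as a guess and confirm against the "Saved:" lines.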
Batch Generation
Generate multiple images from a file of prompts using the batch subcommand:
imggen batch <input-file> [flags]
Input File Formats
Text file (.txt) - One prompt per line (lines starting with # are ignored):
a sunset over mountains
a cat playing piano
abstract geometric art
JSON file (.json) - Array of objects with optional per-prompt settings:
[
  {"prompt": "a sunset over mountains"},
  {"prompt": "a cat playing piano", "model": "dall-e-3", "quality": "hd"},
  {"prompt": "abstract art", "size": "1792x1024"}
]
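Before starting a batch from a text file, it can help to know how many images it will request. This sketch counts the prompts in a .txt input; skipping lines that start with # follows the format described above, while skipping blank lines is an assumption. `count_prompts` is a hypothetical helper.

```shell
# Sketch: count the prompts a text batch file would yield.
# Comment lines are skipped per the format above; skipping blank lines is assumed.
count_prompts() {
  grep -v -e '^#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}
```

The count gives a rough upper bound on cost when multiplied by the per-image price for the chosen model and quality.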
Batch Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| | -o | current dir | Output directory |
| | | | Default model |
| | | model default | Default image size |
| | | model default | Default quality level |
| | | | Output format |
| | -p | | Number of parallel workers |
| | | | Stop on first error |
| | | | Delay between requests (ms) |
Error Handling
Common errors and solutions:
- "API key required": Set the OPENAI_API_KEY environment variable
- "invalid size": Use a size supported by the selected model
- "supports maximum N images": Reduce the --count value
- "does not support --style": Only dall-e-3 supports the style flag
- "does not support --transparent": Only gpt-image-1 supports transparency
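The list above can be turned into a lookup that suggests a fix from a captured error message. This is a sketch only: `error_hint` is a hypothetical helper, and it assumes the error text contains the quoted fragments verbatim.

```shell
# Sketch: map an imggen error message to the remedy listed above.
# Assumes the message contains the quoted fragments exactly as written.
error_hint() {
  case "$1" in
    *"API key required"*)               echo "Set the OPENAI_API_KEY environment variable" ;;
    *"invalid size"*)                   echo "Use a size supported by the selected model" ;;
    *"supports maximum"*)               echo "Reduce the --count value" ;;
    *"does not support --style"*)       echo "Remove --style or switch to dall-e-3" ;;
    *"does not support --transparent"*) echo "Remove --transparent or switch to gpt-image-1" ;;
    *)                                  echo "No hint available" ;;
  esac
}
```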
Pricing Reference
Costs per image (USD):
gpt-image-1
| Size | Low | Medium | High |
|---|---|---|---|
| 1024x1024 | $0.011 | $0.042 | $0.167 |
| 1536x1024 | $0.016 | $0.063 | $0.250 |
| 1024x1536 | $0.016 | $0.063 | $0.250 |
dall-e-3
| Size | Standard | HD |
|---|---|---|
| 1024x1024 | $0.040 | $0.080 |
| 1024x1792 | $0.080 | $0.120 |
| 1792x1024 | $0.080 | $0.120 |
dall-e-2
| Size | Cost |
|---|---|
| 256x256 | $0.016 |
| 512x512 | $0.018 |
| 1024x1024 | $0.020 |
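The tables above make it possible to estimate a request's cost before running it. The sketch below covers gpt-image-1 only; `estimate_cost` is a hypothetical helper, and the prices are copied from this section, so they go stale whenever OpenAI's pricing changes.

```shell
# Sketch: estimate gpt-image-1 cost from the pricing table above.
# estimate_cost is a hypothetical helper; prices are copied from this section.
estimate_cost() {
  local size="$1" quality="$2" count="$3" price
  case "$size:$quality" in
    1024x1024:low)                     price=0.011 ;;
    1024x1024:medium)                  price=0.042 ;;
    1024x1024:high)                    price=0.167 ;;
    1536x1024:low|1024x1536:low)       price=0.016 ;;
    1536x1024:medium|1024x1536:medium) price=0.063 ;;
    1536x1024:high|1024x1536:high)     price=0.250 ;;
    *) echo "no price listed for $size at $quality" >&2; return 1 ;;
  esac
  # Four decimal places, matching the "Cost: $X.XXXX" output format
  LC_ALL=C awk -v p="$price" -v n="$count" 'BEGIN { printf "%.4f\n", p * n }'
}
```

For instance, four high-quality 1024x1024 images come to 4 × $0.167 = $0.668, so `estimate_cost 1024x1024 high 4` prints 0.6680.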
OCR (Optical Character Recognition)
Extract text from images using OpenAI's vision API with optional structured output support.
OCR Usage
imggen ocr <image-path> [flags]
OCR Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| | -m | gpt-5-mini | Model: gpt-5.2, gpt-5-mini, gpt-5-nano |
| --schema | | | JSON schema file for structured output |
| | | | Name for the JSON schema |
| --suggest-schema | | | Suggest a JSON schema based on image content |
| | -p | auto | Custom extraction prompt |
| | -o | stdout | Output file |
| --url | | | Image URL instead of file path |
| | | | Override API key |
| | | | Log HTTP requests and responses |
OCR Models
| Model | Cost (Input) | Cost (Output) | Best For |
|---|---|---|---|
| gpt-5-nano | $0.05/1M tokens | $0.40/1M tokens | Ultra budget, simple text |
| gpt-5-mini | $0.25/1M tokens | $2.00/1M tokens | Cost-effective, most OCR tasks |
| gpt-5.2 | $1.75/1M tokens | $14.00/1M tokens | Complex documents, highest accuracy |
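Because OCR is billed per token rather than per image, the cost of a call depends on its input and output token counts. The sketch below applies the per-million-token prices from the table; `ocr_cost` is a hypothetical helper, and the token counts must come from elsewhere (this document does not say whether imggen reports them).

```shell
# Sketch: estimate OCR cost from token counts using the table above.
# ocr_cost is a hypothetical helper; prices are USD per 1M tokens.
ocr_cost() {
  local model="$1" input_tokens="$2" output_tokens="$3" in_price out_price
  case "$model" in
    gpt-5-nano) in_price=0.05; out_price=0.40 ;;
    gpt-5-mini) in_price=0.25; out_price=2.00 ;;
    gpt-5.2)    in_price=1.75; out_price=14.00 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  LC_ALL=C awk -v i="$input_tokens" -v o="$output_tokens" \
    -v ip="$in_price" -v op="$out_price" \
    'BEGIN { printf "%.6f\n", (i * ip + o * op) / 1000000 }'
}
```

With 1,000 input tokens and 500 output tokens on gpt-5-mini, the estimate is (1000 × 0.25 + 500 × 2.00) / 1,000,000 = $0.00125.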
OCR Examples
Basic text extraction
imggen ocr document.png
Extract from URL
imggen ocr --url https://example.com/image.png
Save output to file
imggen ocr receipt.jpg -o extracted.txt
Structured output with JSON schema
# Create a schema file (invoice_schema.json):
# {
#   "type": "object",
#   "properties": {
#     "vendor": {"type": "string"},
#     "date": {"type": "string"},
#     "total": {"type": "number"},
#     "items": {
#       "type": "array",
#       "items": {
#         "type": "object",
#         "properties": {
#           "name": {"type": "string"},
#           "price": {"type": "number"}
#         },
#         "required": ["name", "price"],
#         "additionalProperties": false
#       }
#     }
#   },
#   "required": ["vendor", "date", "total"],
#   "additionalProperties": false
# }
imggen ocr receipt.jpg --schema invoice_schema.json -o invoice.json
Auto-suggest a JSON schema
# Analyze image and suggest appropriate schema
imggen ocr document.png --suggest-schema
# Save suggested schema to file
imggen ocr document.png --suggest-schema -o suggested_schema.json
Use higher accuracy model
imggen ocr complex-document.pdf -m gpt-5.2
Custom extraction prompt
imggen ocr business-card.jpg -p "Extract the name, title, email, and phone number"
OCR Structured Output
When using the --schema flag, the output will be structured JSON matching your schema. This is useful for:
- Extracting data from receipts, invoices, forms
- Parsing business cards, ID documents
- Converting tables and structured content to JSON
- Data entry automation
The schema must follow JSON Schema draft-07 format with additionalProperties: false for strict validation.
OCR Tips
- Use gpt-5-nano for simple text extraction (plain documents, basic receipts)
- Use gpt-5-mini (default) for most OCR tasks (receipts, business cards, forms)
- Use gpt-5.2 for complex documents (dense tables, handwriting, multi-language)
- Suggest schema first if unsure about document structure
- Custom prompts help when you need specific fields or formatting
- Supported formats: PNG, JPEG, GIF, WEBP, PDF (first page)