Claude-skill-registry gemini-image
Invoke Google Gemini for image generation and understanding using the Python google-genai SDK. Supports gemini-3-pro-image-preview (generation + understanding), gemini-2.5-flash-image (fast generation), and vision models for analysis.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gemini-image" ~/.claude/skills/majiayu000-claude-skill-registry-gemini-image-bf43af && rm -rf "$T"
manifest: skills/data/gemini-image/SKILL.md
Gemini Image Skill
Invoke Google Gemini models for image generation, image understanding, and visual analysis using the Python `google-genai` SDK.
Available Models
| Model ID | Description | Best For | Output Format |
|---|---|---|---|
| `gemini-3-pro-image-preview` | Best image generation + understanding | High-quality image gen, complex visual analysis | JPEG |
| `gemini-2.5-flash-image` | Fast image generation | Quick image creation | PNG |
| `gemini-3-pro-preview` | Multimodal understanding | Image analysis without generation | N/A |
| `gemini-2.5-flash` | Fast vision | Quick image analysis | N/A |
Configuration
API Key: `${GEMINI_API_KEY}`
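Every example in this skill reads the key from the `GEMINI_API_KEY` environment variable. A minimal sketch of checking for it up front, so that a missing key fails with a clear message, might look like the following. The helper name `require_api_key` is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `require_api_key` is a hypothetical helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

The returned value can then be passed as `api_key=` when constructing the client, as the usage examples do.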
Usage
Image Generation
python -c "
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Returns JPEG | Use gemini-2.5-flash-image for PNG
    contents='Generate an image of a sunset over mountains',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

# Map MIME types to file extensions
mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

# Save generated image
if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts:
        if hasattr(part, 'inline_data') and part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            filename = f'output{ext}'
            # Data is already raw bytes - no base64 decode needed
            with open(filename, 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to {filename} ({part.inline_data.mime_type})')
        elif getattr(part, 'text', None):
            print(part.text)
"
Image Understanding (Analyze Image from File)
python -c "
import base64
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read image file - must be base64 encoded for INPUT
with open('IMAGE_PATH', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Describe this image in detail'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"
Image Understanding (From URL)
python -c "
import base64
import os
import urllib.request

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Fetch image from URL - must be base64 encoded for INPUT
url = 'IMAGE_URL_HERE'
with urllib.request.urlopen(url) as resp:
    image_data = base64.b64encode(resp.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='What is in this image?'),
            types.Part(inline_data=types.Blob(mime_type='image/jpeg', data=image_data))
        ])
    ]
)
print(response.text)
"
Workflow
When this skill is invoked:
- Determine the task type:
  - Image Generation: User wants to create an image
  - Image Understanding: User wants to analyze an existing image
  - Image Editing: User wants to modify an image (generation with reference)
- Select the appropriate model:
  - Image generation → `gemini-3-pro-image-preview` (JPEG) or `gemini-2.5-flash-image` (PNG)
  - Image analysis → `gemini-3-pro-preview` or `gemini-2.5-flash`
- Prepare the input:
  - For generation: Text prompt describing desired image
  - For understanding: Load image file as base64
- Execute and handle output:
  - Generation: Save binary image data to file
  - Understanding: Return text description
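The model-selection and input-preparation steps above can be sketched with the standard library alone. The task and speed labels (`generate`, `analyze`, `edit`, `quality`, `fast`) and both function names are hypothetical; only the model IDs come from this document.

```python
import base64
from pathlib import Path

# Model IDs as listed in this skill; the (task, speed) keys are illustrative labels.
MODELS = {
    ("generate", "quality"): "gemini-3-pro-image-preview",
    ("generate", "fast"): "gemini-2.5-flash-image",
    ("analyze", "quality"): "gemini-3-pro-preview",
    ("analyze", "fast"): "gemini-2.5-flash",
}


def select_model(task, speed="quality"):
    """Pick a model ID for a task type ('generate', 'analyze', or 'edit')."""
    # Editing is generation with a reference image, so it maps to a generation model.
    if task == "edit":
        task = "generate"
    try:
        return MODELS[(task, speed)]
    except KeyError:
        raise ValueError(f"unknown task/speed: {task!r}/{speed!r}") from None


def load_image_base64(path):
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")
```

For instance, `select_model("analyze", "fast")` returns `gemini-2.5-flash`, and the string from `load_image_base64` is what the understanding examples pass as `data=`.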
Example Invocations
Generate Product Image
python -c "
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Create a professional product photo of a sleek wireless headphone on a white background, studio lighting',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts:
        if hasattr(part, 'inline_data') and part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            with open(f'headphone{ext}', 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to headphone{ext}')
"
Analyze Screenshot
python -c "
import base64
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

with open('screenshot.png', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Analyze this UI screenshot. Identify any usability issues and suggest improvements.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"
OCR / Extract Text from Image
python -c "
import base64
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

with open('document.png', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Extract all text from this image. Preserve formatting where possible.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"
Compare Two Images
python -c "
import base64
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

with open('image1.png', 'rb') as f:
    img1_data = base64.b64encode(f.read()).decode('utf-8')
with open('image2.png', 'rb') as f:
    img2_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Compare these two images. What are the key differences?'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img1_data)),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img2_data))
        ])
    ]
)
print(response.text)
"
Image Generation Parameters
When generating images, you can customize:
config=types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],  # Request both image and description
    temperature=1.0,  # Higher = more creative
    # Additional parameters may be model-specific
)
Supported Image Formats
Input (for understanding):
- PNG (`image/png`)
- JPEG (`image/jpeg`)
- GIF (`image/gif`)
- WebP (`image/webp`)
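The usage examples hardcode `mime_type='image/png'` or `'image/jpeg'`. A small sketch of deriving the MIME type from the file extension, limited to the four input formats listed above, could look like this. The helper name `input_mime_type` and the extension table are assumptions for illustration.

```python
from pathlib import Path

# Extensions for the four input formats this skill lists; the table itself is illustrative.
EXT_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def input_mime_type(path):
    """Return the MIME type for a supported image path, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return EXT_TO_MIME[suffix]
    except KeyError:
        raise ValueError(f"unsupported image format: {suffix or '(no extension)'}") from None
```

The result can replace the literal `mime_type` argument, so a `.jpg` file is not mislabeled as PNG.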
Output (from generation):
- PNG (default, `image/png`)
- The API returns raw bytes in `part.inline_data.data` (NOT base64 encoded)
- Check `part.inline_data.mime_type` to determine the actual format returned
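Since the returned format can differ by model, one possible way to choose a file extension is to trust the reported MIME type first and fall back to the file signature of the raw bytes. This is a sketch, not SDK behavior; `sniff_extension` and `output_extension` are hypothetical helpers.

```python
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def sniff_extension(data):
    """Guess an extension from the leading bytes, or return None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return None


def output_extension(mime_type, data, default=".png"):
    """Prefer the reported MIME type, then the byte signature, then the default."""
    return MIME_TO_EXT.get(mime_type) or sniff_extension(data) or default
```

In the generation examples this would be called as `output_extension(part.inline_data.mime_type, part.inline_data.data)`.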
Error Handling
Common errors and solutions:
- Image too large: Resize image before sending (max varies by model)
- Unsupported format: Convert to PNG/JPEG
- Generation blocked: Adjust prompt to comply with safety guidelines
- Rate limiting: Implement retry with exponential backoff
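For the rate-limiting case, a minimal sketch of retry with exponential backoff and jitter follows. This document does not say which exception the SDK raises on rate limiting, so the `retry_on` tuple is left to the caller. The function name and default delays are assumptions.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure.

    `retry_on` should be narrowed to the SDK's rate-limit exception in real use;
    the broad default here is only for illustration.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay / 2))
```

A request would be wrapped as `retry_with_backoff(lambda: client.models.generate_content(...))`, where `client` is the object created in the usage examples.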
Notes
- Image generation requires `response_modalities=['IMAGE', 'TEXT']` in config
- For best results with generation, be specific and descriptive in prompts
- Image understanding works with both local files and URLs
- Multiple images can be sent in a single request for comparison
- Gemini 3 Pro Image is NOT available via CLI - must use Python SDK
Tools to Use
- Bash: Execute Python commands
- Read: Load image files (binary mode)
- Write: Save generated images
- Glob: Find image files in directories