Claude-skill-registry gemini-image

Invoke Google Gemini for image generation and understanding using the Python google-genai SDK. Supports gemini-3-pro-image-preview (generation + understanding), gemini-2.5-flash-image (fast generation), and vision models for analysis.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gemini-image" ~/.claude/skills/majiayu000-claude-skill-registry-gemini-image-bf43af && rm -rf "$T"
manifest: skills/data/gemini-image/SKILL.md
source content

Gemini Image Skill

Invoke Google Gemini models for image generation, image understanding, and visual analysis using the Python

google-genai
SDK.

Available Models

Model IDDescriptionBest ForOutput Format
gemini-3-pro-image-preview
Best image generation + understandingHigh-quality image gen, complex visual analysisJPEG
gemini-2.5-flash-image
Fast image generationQuick image creationPNG
gemini-3-pro-preview
Multimodal understandingImage analysis without generationN/A
gemini-2.5-flash
Fast visionQuick image analysisN/A

Configuration

API Key:

${GEMINI_API_KEY}

Usage

Image Generation

python -c "
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Returns JPEG | Use gemini-2.5-flash-image for PNG
    contents='Generate an image of a sunset over mountains',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

# Map mime types to file extensions
mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

# Save generated image
if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts:
        if hasattr(part, 'inline_data') and part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            filename = f'output{ext}'
            # Data is already raw bytes - no base64 decode needed
            with open(filename, 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to {filename} ({part.inline_data.mime_type})')
        elif hasattr(part, 'text'):
            print(part.text)
"

Image Understanding (Analyze Image from File)

python -c "
from google import genai
from google.genai import types
import base64

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Read image file - must be base64 encoded for INPUT
with open('IMAGE_PATH', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Describe this image in detail'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"

Image Understanding (From URL)

python -c "
from google import genai
from google.genai import types
import urllib.request
import base64

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Fetch image from URL - must be base64 encoded for INPUT
url = 'IMAGE_URL_HERE'
with urllib.request.urlopen(url) as response:
    image_data = base64.b64encode(response.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='What is in this image?'),
            types.Part(inline_data=types.Blob(mime_type='image/jpeg', data=image_data))
        ])
    ]
)
print(response.text)
"

Workflow

When this skill is invoked:

  1. Determine the task type:

    • Image Generation: User wants to create an image
    • Image Understanding: User wants to analyze an existing image
    • Image Editing: User wants to modify an image (generation with reference)
  2. Select the appropriate model:

    • Image generation →
      gemini-3-pro-image-preview
      (JPEG) or
      gemini-2.5-flash-image
      (PNG)
    • Image analysis →
      gemini-3-pro-preview
      or
      gemini-2.5-flash
  3. Prepare the input:

    • For generation: Text prompt describing desired image
    • For understanding: Load image file as base64
  4. Execute and handle output:

    • Generation: Save binary image data to file
    • Understanding: Return text description

Example Invocations

Generate Product Image

python -c "
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Create a professional product photo of a sleek wireless headphone on a white background, studio lighting',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts:
        if hasattr(part, 'inline_data') and part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            with open(f'headphone{ext}', 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to headphone{ext}')
"

Analyze Screenshot

python -c "
from google import genai
from google.genai import types
import base64

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

with open('screenshot.png', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Analyze this UI screenshot. Identify any usability issues and suggest improvements.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"

OCR / Extract Text from Image

python -c "
from google import genai
from google.genai import types
import base64

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

with open('document.png', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Extract all text from this image. Preserve formatting where possible.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data))
        ])
    ]
)
print(response.text)
"

Compare Two Images

python -c "
from google import genai
from google.genai import types
import base64

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

with open('image1.png', 'rb') as f:
    img1_data = base64.b64encode(f.read()).decode('utf-8')
with open('image2.png', 'rb') as f:
    img2_data = base64.b64encode(f.read()).decode('utf-8')

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Compare these two images. What are the key differences?'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img1_data)),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img2_data))
        ])
    ]
)
print(response.text)
"

Image Generation Parameters

When generating images, you can customize:

config=types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],  # Request both image and description
    temperature=1.0,  # Higher = more creative
    # Additional parameters may be model-specific
)

Supported Image Formats

Input (for understanding):

  • PNG (
    image/png
    )
  • JPEG (
    image/jpeg
    )
  • GIF (
    image/gif
    )
  • WebP (
    image/webp
    )

Output (from generation):

  • PNG (default,
    image/png
    )
  • The API returns raw bytes in
    part.inline_data.data
    (NOT base64 encoded)
  • Check
    part.inline_data.mime_type
    to determine the actual format returned

Error Handling

Common errors and solutions:

  • Image too large: Resize image before sending (max varies by model)
  • Unsupported format: Convert to PNG/JPEG
  • Generation blocked: Adjust prompt to comply with safety guidelines
  • Rate limiting: Implement retry with exponential backoff

Notes

  • Image generation requires
    response_modalities=['IMAGE', 'TEXT']
    in config
  • For best results with generation, be specific and descriptive in prompts
  • Image understanding works with both local files and URLs
  • Multiple images can be sent in a single request for comparison
  • Gemini 3 Pro Image is NOT available via CLI - must use Python SDK

Tools to Use

  • Bash: Execute Python commands
  • Read: Load image files (binary mode)
  • Write: Save generated images
  • Glob: Find image files in directories