Skillshub image-generation

Image Generation & Editing Skill

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/michaelboeding/skills/image-generation" ~/.claude/skills/comeonoliver-skillshub-image-generation && rm -rf "$T"
manifest: skills/michaelboeding/skills/image-generation/SKILL.md
source content

Image Generation & Editing Skill

Generate and edit images using AI (Google Gemini Nano Banana Pro, OpenAI DALL-E 3).

Capabilities:

  • 🎨 Generate: Create new images from text descriptions
  • ✏️ Edit: Modify existing images (add/remove elements, change colors)
  • 🛍️ Product Placement: Put products into scenes
  • 🎭 Style Transfer: Apply artistic styles to photos
  • 🖼️ Composite: Combine multiple images into one

Quick Examples

Users can specify what they want:

User SaysModeWhat Happens
"Generate an image of a sunset"GenerateText-to-image, no reference needed
"Create a logo for my coffee shop"GenerateText-to-image with text rendering
"Edit this image: add a hat to the cat"EditUser provides image, AI modifies it
"Remove the background from this photo"EditUser provides image, AI edits it
"Put this product on a kitchen counter"ProductUser provides product + optional scene
"Make this photo look like Van Gogh painted it"StyleUser provides photo, AI applies style
"Combine these photos into a group shot"CompositeUser provides multiple images

Prerequisites

Environment variables must be configured for the APIs to work. At least one API key is required:

  • OPENAI_API_KEY
    - For OpenAI DALL-E 3 image generation
  • GOOGLE_API_KEY
    - For Google Gemini (Nano Banana / Nano Banana Pro)

See the repository README for setup instructions.

Available APIs

OpenAI GPT Image (Recommended for pure generation)

  • Models:
    • gpt-image-1.5
      (state of the art, best quality)
    • gpt-image-1
      (great quality, cost-effective)
    • gpt-image-1-mini
      (fastest, most affordable)
  • Best for: High-quality generation, transparency, text rendering, image editing
  • Sizes: 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait), or
    auto
  • Quality: low (fast), medium (balanced), high (best), or
    auto
  • Background: transparent, opaque, or
    auto
  • Output formats: png (default), jpeg (faster), webp
  • Compression: 0-100% (for jpeg/webp)
  • Features:
    • Image editing with up to 16 input images
    • Transparent backgrounds
    • Streaming with partial images
    • High input fidelity for preserving faces/logos
    • Inpainting with masks
    • 32,000 character prompts

⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026.

Google Gemini Native Image Generation (Recommended for editing)

  • Nano Banana (
    gemini-2.5-flash-image
    ): Fast, efficient, 1K resolution, up to 3 reference images
  • Nano Banana Pro (
    gemini-3-pro-image-preview
    ): Professional quality, up to 4K, thinking mode, up to 14 reference images (default)
  • Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • Resolutions (Pro only): 1K, 2K, 4K
  • Features:
    • Image editing (add/remove elements, color changes)
    • Product placement and composition
    • Style transfer
    • Advanced text rendering
    • Google Search grounding (Pro only)
    • Thinking mode for complex prompts (Pro only)

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the

AskUserQuestion
tool for each question below. Do not just print questions in your response — use the tool to create interactive prompts with the options shown.

Q0: Model Selection

"Which image generation model would you like to use?

  • Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)
  • OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images
  • OpenAI GPT Image 1 - Great quality, transparency, image editing
  • OpenAI GPT Image 1 Mini - Fastest, most affordable"

Wait for response. If user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.

Q1: Reference

"I'll generate that image for you! First — do you have any reference images?

  • Product photos to include
  • Style references
  • Images to edit
  • No, generate from scratch"

Wait for response.

Q2: Aspect Ratio

"What aspect ratio?

  • 1:1 (square)
  • 16:9 (landscape/widescreen)
  • 9:16 (portrait/vertical)
  • 4:3 / 3:4 (classic)
  • Other (2:3, 3:2, 4:5, 5:4, 21:9)
  • Or specify"

Wait for response.

Q3: Resolution

"What resolution?

  • 1K (fast)
  • 2K (balanced)
  • 4K (highest quality)"

Wait for response.

Q4: Style

"Any style preferences?

  • Photorealistic
  • Artistic/painterly
  • Cartoon/illustration
  • 3D render
  • Or describe your own"

Wait for response.

Quick Reference

QuestionDetermines
ReferenceGeneration vs editing mode
Aspect RatioImage dimensions
ResolutionQuality level
StylePrompt enhancement direction

Parsing:

  • If user provides reference images → use image editing mode
  • If user doesn't answer all questions → use sensible defaults and note assumptions
  • Parse: subject, style, mood, special requirements (colors, text, composition)

Step 2: Craft the Prompt

Transform the user request into an effective image generation prompt:

  1. Be specific: Add details the user might not have mentioned
  2. Describe style: "digital art", "oil painting", "photograph", "3D render"
  3. Include lighting: "soft lighting", "dramatic shadows", "golden hour"
  4. Specify quality: "highly detailed", "8k", "professional"

Example transformation:

  • User: "a cat in space"
  • Enhanced: "A majestic orange tabby cat floating in outer space, surrounded by colorful nebulae and distant stars, wearing a small astronaut helmet, digital art style, highly detailed, vibrant colors, cinematic lighting"

Step 3: Select the API

Use the model selected by the user in Q0:

  1. Check which API keys are configured in environment:

    • OPENAI_API_KEY
      → GPT Image models available
    • GOOGLE_API_KEY
      → Gemini (Nano Banana Pro) available
  2. If the user's selected model isn't available: Inform them and offer alternatives.

  3. Model mapping from Q0:

    • "Google Gemini (Nano Banana Pro)" → Use
      gemini.py
      with
      gemini-3-pro-image-preview
    • "OpenAI GPT Image 1.5" → Use
      openai_image.py
      with
      gpt-image-1.5
    • "OpenAI GPT Image 1" → Use
      openai_image.py
      with
      gpt-image-1
    • "OpenAI GPT Image 1 Mini" → Use
      openai_image.py
      with
      gpt-image-1-mini

Step 4: Generate the Image

Execute the appropriate script from

${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/
:

For OpenAI GPT Image - Text to Image:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "your enhanced prompt" \
  --model "gpt-image-1" \
  --size "1024x1024" \
  --quality "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - With Transparent Background:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A product icon with no background" \
  --model "gpt-image-1" \
  --background "transparent" \
  --quality "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Image Editing (with reference images):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Add a wizard hat to this cat" \
  --model "gpt-image-1" \
  --image "/path/to/cat.jpg" \
  --input-fidelity "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Multiple Reference Images:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Create a gift basket containing these items" \
  --model "gpt-image-1" \
  --image "/path/to/item1.png" \
  --image "/path/to/item2.png" \
  --image "/path/to/item3.png" \
  --output "/path/to/output.png"

For OpenAI GPT Image - With Mask (Inpainting):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Replace the pool with a garden" \
  --model "gpt-image-1" \
  --image "/path/to/scene.jpg" \
  --mask "/path/to/mask.png" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Streaming with Partial Images:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A beautiful sunset over mountains" \
  --model "gpt-image-1" \
  --stream \
  --partial-images 2 \
  --output "/path/to/output.png"

For Google Gemini (Nano Banana Pro) - Text to Image:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-3-pro-image-preview" \
  --aspect-ratio "1:1" \
  --resolution "2K" \
  --output "/path/to/output.png"

For Google Gemini - With Reference Images (editing, product placement, etc.):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Add a wizard hat to this cat" \
  --image "/path/to/cat.jpg" \
  --aspect-ratio "1:1" \
  --resolution "2K"

For Google Gemini - Multiple Reference Images (composition, style transfer):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Place this product on the kitchen counter in this scene" \
  --image "/path/to/product.png" \
  --image "/path/to/kitchen.jpg" \
  --aspect-ratio "16:9" \
  --resolution "2K"

For Google Gemini (Nano Banana - faster, fewer features):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-2.5-flash-image" \
  --aspect-ratio "1:1"

Step 5: Deliver the Result

  1. Show the generated image to the user
  2. Provide the enhanced prompt used (so they can iterate)
  3. Offer to:
    • Generate variations
    • Try a different style
    • Use a different API/model
    • Refine the prompt

Error Handling

Missing API key: Inform the user which key is needed and how to set it up:

API rate limit: Suggest waiting or trying the other API.

Content policy violation: Rephrase the prompt to be more appropriate.

Generation failed: Retry with simplified prompt or different API.

Reference Image Use Cases

Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:

OpenAI GPT Image: Up to 16 input images, with

input_fidelity: high
for preserving faces/logos Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)

Image Editing

  • "Add a santa hat to this person" + person.jpg
  • "Remove the background and replace with a beach scene" + product.jpg
  • "Change the sofa color to blue" + living_room.jpg

Product Placement

  • "Place this product on a marble kitchen counter" + product.png + kitchen.jpg
  • "Show this watch on a person's wrist" + watch.png + arm.jpg

Style Transfer

  • "Transform this photo into Van Gogh's Starry Night style" + photo.jpg
  • "Make this look like a watercolor painting" + landscape.jpg

Multi-Image Composition

  • "Create a group photo of these people in an office" + person1.jpg + person2.jpg + person3.jpg
  • "Combine these elements into a cohesive scene" + element1.png + element2.png + background.jpg

Character Consistency

  • "Show this character from a different angle" + character.jpg
  • "Put this person in a superhero costume" + person.jpg

Tip: For best results with reference images, be specific about what you want to preserve vs. change.

Prompt Engineering Tips

For Photorealism

  • Include "photograph", "DSLR", "35mm film"
  • Specify camera settings: "shallow depth of field", "bokeh"
  • Add lighting: "natural light", "studio lighting"

For Artistic Styles

  • Reference art movements: "impressionist", "art nouveau", "cyberpunk"
  • Name artist styles: "in the style of Studio Ghibli", "Moebius style"
  • Specify medium: "watercolor", "oil painting", "pencil sketch"

For Consistency

  • Use seed values when available
  • Save successful prompts for reference
  • Note which API produced best results for similar requests

API Comparison

FeatureGPT Image 1.5GPT Image 1GPT Image 1 MiniNano BananaNano Banana Pro
ProviderOpenAIOpenAIOpenAIGoogleGoogle
Model IDgpt-image-1.5gpt-image-1gpt-image-1-minigemini-2.5-flash-imagegemini-3-pro-image-preview
Best forState of the artQuality + valueSpeed + costFast generationProfessional assets
Sizes1024², 1536x1024, 1024x1536, autoSameSame1K onlyUp to 4K
Quality optionslow, medium, high, autoSameSameN/AN/A
Aspect ratios3 + autoSameSame10 options10 options
Reference imagesUp to 16Up to 16Up to 16Up to 3Up to 14
Image editingYesYesYesYesYes
Inpainting (mask)YesYesYesYesYes
Transparent backgroundYesYesYesNoNo
StreamingYesYesYesNoNo
Input fidelityhigh/lowhigh/lowlow onlyN/AN/A
Output formatspng, jpeg, webpSameSamepngpng
Compression0-100%SameSameNoNo
Text renderingExcellentExcellentGoodGoodExcellent
Thinking modeNoNoNoNoYes
Max prompt length32,000 chars32,000 chars32,000 charsN/AN/A
Speed~30-60s~20-40s~10-20s~10-20s~30-60s

⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026. Use GPT Image models instead.