Baoyu-skills baoyu-image-gen
[Deprecated: use baoyu-imagine] AI image generation with OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
git clone https://github.com/JimLiu/baoyu-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/JimLiu/baoyu-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/baoyu-image-gen" ~/.claude/skills/jimliu-baoyu-skills-baoyu-image-gen && rm -rf "$T"
skills/baoyu-image-gen/SKILL.md
Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.
User Input Tools
When this skill prompts the user, follow this tool-selection rule (priority order):
- Prefer built-in user-input tools exposed by the current agent runtime, e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
- Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
- Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.
Concrete AskUserQuestion references below are examples; substitute the local equivalent in other runtimes.
Script Directory
{baseDir} = this SKILL.md's directory. Main script: {baseDir}/scripts/main.ts. Resolve ${BUN_X}: prefer bun; else npx -y bun; else suggest brew install oven-sh/bun/bun.
Step 0: Load Preferences ⛔ BLOCKING
This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.
Check these paths in order; first hit wins:
| Path | Scope |
|---|---|
| | Project |
| | XDG |
| | User home |
- Found → load, parse, apply. If default_model.[provider] is null → ask for the model only.
- Not found → run first-time setup (references/config/first-time-setup.md) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.
EXTEND.md keys: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: references/config/preferences-schema.md.
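The "first hit wins" lookup in Step 0 can be sketched as follows. This is a minimal illustration, not the skill's implementation: the path strings are placeholders (the real locations are not reproduced here), and `findPreferences`, `Candidate`, and the injected `exists` predicate are hypothetical names.

```typescript
// Sketch of "first hit wins" across the three scopes from Step 0.
// The path strings are placeholders, not the skill's real locations.
type Scope = "project" | "xdg" | "user-home";

interface Candidate {
  scope: Scope;
  path: string;
}

// `exists` is injected so the sketch does not depend on any file-system API.
function findPreferences(
  candidates: Candidate[],
  exists: (path: string) => boolean,
): Candidate | null {
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate; // first hit wins
  }
  return null; // caller should run first-time setup before generating
}

// Order matters: project, then XDG, then user home.
const candidates: Candidate[] = [
  { scope: "project", path: "<project>/EXTEND.md" },
  { scope: "xdg", path: "<xdg-config>/EXTEND.md" },
  { scope: "user-home", path: "<home>/EXTEND.md" },
];
```

A `null` result corresponds to the "Not found" branch, where generation stays blocked until setup has saved an EXTEND.md.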
Usage
Minimum working examples; see references/usage-examples.md for the full set, including per-provider invocations and batch mode.
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k
# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro
# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
Options
| Option | Description |
|---|---|
| --prompt | Prompt text |
| --promptfiles | Read prompt from files (concatenated) |
| --image | Output image path (required in single-image mode) |
| --batchfile | JSON batch file for multi-image generation |
| --jobs | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| --provider | Force provider (default: auto-detect) |
| --model | Model ID; see provider references for defaults and allowed values |
| --ar | Aspect ratio (see Aspect Ratios for supported values) |
| --size | Explicit size as <WxH> (e.g., 1536x1024) |
| --quality | Quality preset (see Quality Presets for the default) |
| --imageSize | Image size for Google/OpenRouter (default: from quality) |
| --imageApiDialect | OpenAI-compatible endpoint dialect; use ratio-metadata for gateways that expect an aspect-ratio size plus metadata.resolution |
| --ref | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0 |
| | Number of images. Replicate supports only one image per run (single-output save semantics) |
| | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| | OpenAI API key |
| | Azure OpenAI API key |
| | OpenRouter API key |
| | Google API key |
| | DashScope API key |
| | Z.AI API key (an alias variable is also accepted) |
| | MiniMax API key |
| | Replicate API token |
| | Jimeng (即梦) Volcengine credentials |
| | Seedream (豆包) Volcengine ARK API key |
| <PROVIDER>_IMAGE_MODEL | Per-provider model override |
| AZURE_OPENAI_DEPLOYMENT (alias AZURE_OPENAI_IMAGE_MODEL) | Azure default deployment |
| | Per-provider endpoint override |
| | Azure image API version (optional; has a default) |
| | Jimeng region (optional; has a default) |
| | |
| | Optional OpenRouter attribution |
| | Override batch worker cap |
| | Per-provider concurrency |
| | Per-provider start-gap |
Load priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
Model Resolution
Priority (highest → lowest) applies to every provider:
- CLI flag --model <id>
- EXTEND.md default_model.[provider]
- Env var <PROVIDER>_IMAGE_MODEL
- Built-in default
For Azure, --model / default_model.azure is the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var; AZURE_OPENAI_IMAGE_MODEL is kept as a backward-compatible alias.
EXTEND.md overrides env vars: if EXTEND.md sets default_model.google: "gemini-3-pro-image-preview" and the env var sets GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview, EXTEND.md wins.
Display model info before each generation:
Using [provider] / [model]
Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL
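The four-level priority above can be expressed as a single fallback chain. The sketch below is an illustration under stated assumptions rather than the script's actual code: `resolveModel`, `ModelSources`, and `modelBanner` are hypothetical names, the built-in default is passed in as a placeholder, and Azure's separate AZURE_OPENAI_DEPLOYMENT variable is not handled.

```typescript
// Sketch of the model priority: CLI flag > EXTEND.md > env var > built-in default.
interface ModelSources {
  cliModel?: string; // value of --model, if given
  extendDefaults?: Record<string, string | null>; // default_model.[provider] from EXTEND.md
  env?: Record<string, string | undefined>; // environment variables
  builtInDefault: string; // placeholder; the real defaults live in the provider references
}

function resolveModel(provider: string, sources: ModelSources): string {
  // e.g. "google" -> "GOOGLE_IMAGE_MODEL"
  const envKey = `${provider.toUpperCase()}_IMAGE_MODEL`;
  return (
    sources.cliModel ??
    sources.extendDefaults?.[provider] ?? // a null entry falls through to the env var
    sources.env?.[envKey] ??
    sources.builtInDefault
  );
}

// First line of the banner shown before each generation.
function modelBanner(provider: string, model: string): string {
  return `Using ${provider} / ${model}`;
}
```

With EXTEND.md setting default_model.google and the environment setting GOOGLE_IMAGE_MODEL, `resolveModel("google", …)` returns the EXTEND.md value, matching the example above.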
OpenAI-Compatible Gateway Dialects
provider=openai means the auth and routing entrypoint is OpenAI-compatible. It does not guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set default_image_api_dialect in EXTEND.md, OPENAI_IMAGE_API_DIALECT, or --imageApiDialect:
- openai-native: pixel size (1536x1024) and native OpenAI quality fields
- ratio-metadata: aspect-ratio size (16:9) plus metadata.resolution (1K|2K|4K) and metadata.orientation
Use openai-native for the OpenAI native API or strict clones; try ratio-metadata for compatibility gateways in front of Gemini or similar models. Current limitation: ratio-metadata applies only to text-to-image; reference-image edits still need openai-native or a provider with first-class edit support.
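To make the difference between the two dialects concrete, here is a rough sketch of how a request body might differ. Only the `size`, `metadata.resolution`, and `metadata.orientation` fields come from the description above; `buildRequestBody`, `DialectInput`, and the orientation values ("landscape", "portrait", "square") are assumptions for illustration, and native quality fields are left out.

```typescript
type Dialect = "openai-native" | "ratio-metadata";
type Resolution = "1K" | "2K" | "4K";

interface DialectInput {
  prompt: string;
  pixelSize: string; // used by openai-native, e.g. "1536x1024"
  aspectRatio: string; // used by ratio-metadata, e.g. "16:9"
  resolution: Resolution;
}

interface ImageRequestBody {
  prompt: string;
  size: string;
  metadata?: { resolution: Resolution; orientation: string };
}

// Assumption: orientation is derived from the ratio; the gateway's real values may differ.
function orientationOf(aspectRatio: string): string {
  const parts = aspectRatio.split(":");
  const width = Number(parts[0]);
  const height = Number(parts[1]);
  if (!(width > 0) || !(height > 0)) {
    throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  }
  if (width === height) return "square";
  return width > height ? "landscape" : "portrait";
}

function buildRequestBody(dialect: Dialect, input: DialectInput): ImageRequestBody {
  if (dialect === "openai-native") {
    // Pixel size, no metadata block.
    return { prompt: input.prompt, size: input.pixelSize };
  }
  // ratio-metadata: the size field carries the ratio, resolution moves into metadata.
  return {
    prompt: input.prompt,
    size: input.aspectRatio,
    metadata: {
      resolution: input.resolution,
      orientation: orientationOf(input.aspectRatio),
    },
  };
}
```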
Provider-Specific Guides
Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:
| Provider | Reference |
|---|---|
| DashScope (Qwen-Image families, custom sizes) | |
| Z.AI (GLM-Image, cogview-4) | |
| MiniMax (image-01, subject-reference) | |
| OpenRouter (multimodal models, flow) | |
| Replicate (nano-banana, Seedream, Wan) | |
Provider Selection
- --ref provided + no --provider → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax (MiniMax's subject reference is more specialized toward character/portrait consistency)
- --provider specified → use it (if --ref, must be google/openai/azure/openrouter/replicate/seedream/minimax)
- Only one API key present → use that provider
- Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream
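The selection rules above reduce to an explicit override plus two priority lists. The following is a minimal sketch, not the script's real logic: `selectProvider` and `SelectionInput` are hypothetical names, the provider ids `zai` and `jimeng` are assumed spellings, and the error messages are illustrative.

```typescript
// Priority when --ref is given (only providers with reference-image support).
const REF_PRIORITY: readonly string[] = [
  "google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax",
];

// Default priority when several API keys are present.
// "zai" and "jimeng" are assumed ids for Z.AI and Jimeng.
const DEFAULT_PRIORITY: readonly string[] = [
  "google", "openai", "azure", "openrouter", "dashscope",
  "zai", "minimax", "replicate", "jimeng", "seedream",
];

interface SelectionInput {
  explicitProvider?: string; // value of --provider, if given
  hasRef: boolean; // true when --ref was passed
  available: string[]; // providers whose API keys are present
}

function selectProvider(input: SelectionInput): string {
  const { explicitProvider, hasRef, available } = input;
  if (explicitProvider !== undefined) {
    if (hasRef && REF_PRIORITY.indexOf(explicitProvider) === -1) {
      throw new Error(`${explicitProvider} does not support reference images`);
    }
    return explicitProvider;
  }
  // A single available key is simply the first (and only) match in the list.
  const order = hasRef ? REF_PRIORITY : DEFAULT_PRIORITY;
  for (const provider of order) {
    if (available.indexOf(provider) !== -1) return provider;
  }
  throw new Error("No usable provider: set an API key or pass --provider");
}
```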
Quality Presets
| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
|---|---|---|---|---|---|
| | 1K | 1024px | 1K | 1K | Quick previews |
| (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |
Google/OpenRouter imageSize can be overridden with --imageSize 1K|2K|4K.
Aspect Ratios
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1.
- Google multimodal: imageConfig.aspectRatio
- OpenAI: closest supported size
- OpenRouter: imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, the ratio is inferred
- Replicate: behavior is model-specific: google/nano-banana* uses aspect_ratio, bytedance/seedream-* uses documented Replicate ratios, Wan 2.7 maps --ar to a concrete size
- MiniMax: official aspect_ratio values; if --size <WxH> is given without --ar, sends width / height for image-01
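How the ratio is inferred from --size <WxH> is not spelled out here. One plausible approach, shown purely as an assumption, is to pick the supported ratio closest to the requested width/height; `inferAspectRatio` and `ratioValue` are hypothetical helpers.

```typescript
// Ratios listed as supported above.
const SUPPORTED_RATIOS: readonly string[] = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

// "16:9" -> 1.777...
function ratioValue(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

// Assumed strategy: nearest supported ratio, compared on a log scale so that
// "twice as wide" and "twice as tall" count as equally far from the target.
function inferAspectRatio(size: string): string {
  const match = /^(\d+)x(\d+)$/i.exec(size);
  if (match === null) throw new Error(`Invalid size: ${size}`);
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width <= 0 || height <= 0) throw new Error(`Invalid size: ${size}`);

  const target = width / height;
  let best = "1:1";
  let bestDistance = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const distance = Math.abs(Math.log(ratioValue(ratio) / target));
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}
```

Under this strategy a size such as 1536x1024 (ratio 1.5) lands on 4:3 rather than 16:9, because 4:3 is the nearer of the two on a log scale.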
Generation Mode
Default: sequential. Batch parallel: enabled automatically when --batchfile contains 2+ pending tasks.
| Situation | Prefer | Why |
|---|---|---|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead, easier debugging |
| Multiple images with saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, predictable throughput |
| Each image still needs its own reasoning / prompt writing / style exploration | Subagents | Work is still exploratory, each needs independent analysis |
| Input is an outline plus saved prompt files | Batch, after assembling the batch payload from them | The outline + prompt files already contain everything needed |
Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.
Parallel behavior:
- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
- Override with --jobs <count>
- Each image retries up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
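The parallel behavior listed above (a capped worker count, up to 3 attempts per image, and a final summary) follows a common worker-pool pattern. The sketch below illustrates that pattern only; `runBatch`, `BatchTask`, and `BatchSummary` are hypothetical names, and provider-specific throttling and start gaps are omitted.

```typescript
interface BatchTask {
  id: string;
  run: () => Promise<void>; // generates and saves one image
}

interface BatchSummary {
  succeeded: number;
  failed: number;
  failures: { id: string; reason: string }[];
}

async function runBatch(
  tasks: BatchTask[],
  jobs: number,
  maxAttempts = 3,
): Promise<BatchSummary> {
  const summary: BatchSummary = { succeeded: 0, failed: 0, failures: [] };
  let next = 0; // index of the next unclaimed task, shared by all workers

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const task = tasks[next++];
      if (task === undefined) break;
      let done = false;
      let lastError = "";
      for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
        try {
          await task.run();
          done = true;
        } catch (err) {
          lastError = err instanceof Error ? err.message : String(err);
        }
      }
      if (done) {
        summary.succeeded++;
      } else {
        summary.failed++;
        summary.failures.push({ id: task.id, reason: lastError });
      }
    }
  }

  // Never start more workers than there are tasks, and always at least one.
  const workerCount = Math.max(1, Math.min(jobs, tasks.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return summary;
}
```

Because each worker claims the next index synchronously before awaiting, no task is picked up twice even though the counter is shared.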
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint
References
| File | Content |
|---|---|
| references/usage-examples.md | Extended CLI examples across providers and batch mode |
| | DashScope families, sizes, limits |
| | Z.AI GLM-image / cogview-4 |
| | MiniMax image-01 + subject reference |
| | OpenRouter multimodal flow |
| | Replicate supported families + guardrails |
| references/config/preferences-schema.md | EXTEND.md schema |
| references/config/first-time-setup.md | First-time setup flow |
Extension Support
Custom configurations via EXTEND.md. See Step 0 for paths and schema.