Skills image-gen

Install the full skills collection:

```shell
git clone https://github.com/openclaw/skills
```

Or install just this skill:

```shell
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/openclaw/skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/0xfango/marswave-image-gen" ~/.claude/skills/clawdbot-skills-image-gen \
  && rm -rf "$T"
```

skills/0xfango/marswave-image-gen/SKILL.md

When to Use
- User wants to generate an AI image from a text description
- User says "generate image", "draw", "create picture", "配图" ("add an illustration")
- User says "生成图片" ("generate an image"), "画一张" ("draw one"), "AI图" ("AI image")
- User needs a cover image, illustration, or concept art
When NOT to Use
- User wants to create audio content (use /podcast or /speech)
- User wants to create a video (use /explainer)
- User wants to edit an existing image (not supported)
- User wants to extract content from a URL (use /content-parser)
Purpose
Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for error handling
- Image generation uses a different base URL: https://api.labnana.com/openapi/v1
- Always read config following shared/config-pattern.md before any interaction
- Output is saved to .listenhub/image-gen/YYYY-MM-DD-{jobId}/, never to ~/Downloads/
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
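The check could be sketched as below. This is an assumption, not a reproduction of shared/config-pattern.md: the helper name is ours, and the LISTENHUB_API_KEY variable name is taken from the curl example later in this document.

```shell
# Hypothetical sketch: return non-zero when the key is missing so the
# caller can stop immediately, per the rule above.
check_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; stopping." >&2
    return 1
  fi
}
```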
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If the file doesn't exist, ask for a location, then create it immediately:

```shell
mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
# (or $HOME/.listenhub/image-gen/config.json for global)
```
Then run Setup Flow below.
If the file exists, read the config, display a summary, and confirm:

Current config (image-gen):
Output mode: {inline / download / both}

Ask: "Use the saved configuration?" → if confirmed, continue directly; otherwise reconfigure.
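The read-and-summarize step could be sketched as follows, assuming jq is available and using the local config path from Step 0; the "inline" fallback for a missing outputMode is our assumption.

```shell
# Sketch: load the saved config and print the summary line shown above.
CONFIG_PATH=".listenhub/image-gen/config.json"
if [ -f "$CONFIG_PATH" ]; then
  CONFIG=$(cat "$CONFIG_PATH")
  OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode // "inline"')
  echo "Current config (image-gen): output mode: ${OUTPUT_MODE}"
fi
```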
Setup Flow (first run or reconfigure)
- outputMode: Follow shared/output-mode.md § Setup Flow Question.

Save immediately:

```shell
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
Interaction Flow
Step 1: Image Description
Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
Step 2: Model
Ask:
Question: "Which model?"
Options:
- "pro (recommended)": gemini-3-pro-image-preview, higher quality
- "flash": gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
Step 3: Resolution and Aspect Ratio
Ask both together (independent parameters):
Question: "What resolution?"
Options:
- "1K": standard quality
- "2K (recommended)": high quality, good balance
- "4K": ultra high quality, slower generation

Question: "What aspect ratio?"
Options (all models):
- "16:9": landscape, widescreen
- "1:1": square
- "9:16": portrait, phone screen
- "Other": 2:3, 3:2, 3:4, 4:3, 21:9
If flash model was selected, also offer:
1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
Step 4: Reference Images (optional)
Question: "Any reference images for style guidance?"
Options:
- "Yes, I have URL(s)": provide reference image URLs
- "No references": generate from prompt only
If yes, collect URLs (comma-separated, max 14). For each URL, infer mimeType from suffix and build:
```json
{ "fileData": { "fileUri": "<url>", "mimeType": "<inferred>" } }
```
Suffix mapping:
.jpg/.jpeg → image/jpeg, .png → image/png, .webp → image/webp, .gif → image/gif
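The suffix mapping above could be sketched as a small shell helper; the function names are ours, and the image/jpeg fallback for unknown suffixes is our assumption (the document does not say what to do with unmapped suffixes).

```shell
# Sketch: map a URL suffix to a mimeType per the table above.
infer_mime() {
  case "$1" in
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png"  ;;
    *.webp)       echo "image/webp" ;;
    *.gif)        echo "image/gif"  ;;
    *)            echo "image/jpeg" ;;  # fallback: our assumption
  esac
}

# Build one fileData entry for a reference URL with jq:
build_ref() {
  jq -n --arg uri "$1" --arg mime "$(infer_mime "$1")" \
    '{fileData: {fileUri: $uri, mimeType: $mime}}'
}
```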
Step 5: Confirm & Generate
Summarize all choices:
Ready to generate image:
Prompt: {prompt text}
Model: {pro / flash}
Resolution: {1K / 2K / 4K}
Aspect ratio: {ratio}
References: {yes (N URLs) / no}
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
- Build request: construct JSON with provider, model, prompt, imageConfig, and optional referenceImages
- Submit: POST https://api.labnana.com/openapi/v1/images/generation with a timeout of 600s
- Extract image: parse the base64 data from the response
- Decode and present the result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: decode base64 to a temp file, then use the Read tool.

```shell
JOB_ID=$(date +%s)
echo "$BASE64_DATA" | base64 -D > /tmp/image-gen-${JOB_ID}.jpg
```

Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.
Present:
Image generated!
download or both: save to the artifact directory.

```shell
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
```
Present:
Image generated!
Saved to .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
{jobId}.jpg
Base64 decoding (cross-platform):
```shell
# Linux
echo "$BASE64_DATA" | base64 -d > output.jpg
# macOS
echo "$BASE64_DATA" | base64 -D > output.jpg
# or
echo "$BASE64_DATA" | base64 --decode > output.jpg
```
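Instead of hard-coding the platform, a small wrapper could probe which flag the local base64 accepts; this helper and its name are our sketch, not part of the skill.

```shell
# Sketch: probe for GNU-style -d support once, fall back to BSD-style -D.
b64_decode() {
  if printf dGVzdA== | base64 -d >/dev/null 2>&1; then
    base64 -d
  else
    base64 -D
  fi
}
# usage: echo "$BASE64_DATA" | b64_decode > output.jpg
```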
Retry logic: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries.
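The retry policy above could be sketched as a wrapper (the function name is ours): it runs a request command that prints an HTTP status code, retrying on 429 up to 3 times with a 15-second pause.

```shell
# Sketch: retry on 429 per the policy above. The wrapped command is
# expected to print the HTTP status, e.g. curl -w '%{http_code}'.
with_429_retry() {
  attempt=0
  while :; do
    code=$("$@")
    if [ "$code" != "429" ] || [ "$attempt" -ge 3 ]; then
      echo "$code"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 15
  done
}
# usage:
# STATUS=$(with_429_retry curl -sS -o /tmp/resp.json -w '%{http_code}' ...)
```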
Prompt Handling
Default: Pass the user's prompt directly without modification.
When to offer optimization:
- Prompt is very short (a few words) AND user hasn't requested verbatim
- Ask: "Would you like help enriching the prompt with style/lighting/composition details?"
When to never modify:
- Long, detailed, or structured prompts — treat the user as experienced
- User says "use this prompt exactly"
Optimization techniques (if user agrees):
- Style: "cyberpunk" → add "neon lights, futuristic, dystopian"
- Scene: time of day, lighting, weather
- Quality: "highly detailed", "8K quality", "cinematic composition"
- Always use English keywords (models trained on English)
- Show optimized prompt before submitting
API Reference
- Image generation: shared/api-image.md
- Error handling: shared/common-patterns.md § Error Handling
Composability
- Invokes: nothing (direct API call)
- Invoked by: platform skills for cover images (Phase 2)
Example
User: "Generate an image: cyberpunk city at night"
Agent workflow:
- Prompt is short → offer enrichment → user declines
- Ask model → "pro"
- Ask resolution → "2K"
- Ask ratio → "16:9"
- No references
```shell
RESPONSE=$(curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "cyberpunk city at night",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
  }')
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg"
```
Decode the base64 data per outputMode (see shared/output-mode.md).