Skills IMA Studio

Install

Source: clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code: install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/allenfancy-gan/ima-all-ai" ~/.claude/skills/openclaw-skills-ima-studio && rm -rf "$T"

OpenClaw: install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/allenfancy-gan/ima-all-ai" ~/.openclaw/skills/openclaw-skills-ima-studio && rm -rf "$T"

Manifest: skills/allenfancy-gan/ima-all-ai/SKILL.md

Source content

IMA AI Creation

⚠️ Important: Model ID Reference

CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer the model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).

Quick Reference Table:

Image Models

| Friendly Name | model_id | Notes |
| --- | --- | --- |
| Nano Banana2 | gemini-3.1-flash-image | ❌ NOT nano-banana-2; budget choice, 4-13 pts |
| Nano Banana Pro | gemini-3-pro-image | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | doubao-seedream-4.5 | ✅ Recommended default, 5 pts |
| Midjourney | midjourney | ✅ Same as friendly name, 8-10 pts |

Video Models

| Friendly Name | model_id (t2v) | model_id (i2v) | Notes |
| --- | --- | --- | --- |
| Wan 2.6 | wan2.6-t2v | wan2.6-i2v | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | ima-pro | ima-pro | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | ima-pro-fast | ✅ IMA native low-latency model |
| Kling O1 | kling-video-o1 | kling-video-o1 | ⚠️ Note video- prefix |
| Kling 2.6 | kling-v2-6 | kling-v2-6 | ⚠️ Note v prefix |
| Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | MiniMax-Hailuo-02 | MiniMax-Hailuo-02 | ⚠️ Note 02, not 2.0 |
| Google Veo 3.1 | veo-3.1-generate-preview | veo-3.1-generate-preview | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | sora-2-pro | sora-2-pro | ✅ Straightforward |
| Pixverse | pixverse | pixverse | ✅ Same as friendly name |

Music Models

| Friendly Name | model_id | Notes |
| --- | --- | --- |
| Suno (sonic v4) | sonic | ⚠️ Simplified to sonic |
| DouBao BGM | GenBGM | ❌ NOT doubao-bgm |
| DouBao Song | GenSong | ❌ NOT doubao-song |

Speech/TTS Models

| Friendly Name | model_id | Notes |
| --- | --- | --- |
| seed-tts-2.0 | seed-tts-2.0 | ✅ Same as friendly name (default) |

How to get the correct model_id:

  1. Check this table first
  2. Use --list-models --task-type <type> to query available models
  3. Refer to the command examples in this SKILL.md

Runtime truth source: GET /open/v1/product/list (or --list-models). Any table in this document is guidance; actual availability depends on the current product list.

Example:

# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro

# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
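The image-model rows above can be captured in a small lookup helper. This is a sketch only: the mapping is copied from this document's table, and the live product list (via --list-models) remains the source of truth.

```python
# Hypothetical helper: resolve a friendly image-model name to the exact
# model_id from the table above. Values may drift from the live product list.
IMAGE_MODEL_IDS = {
    "nano banana2": "gemini-3.1-flash-image",
    "nano banana pro": "gemini-3-pro-image",
    "seedream 4.5": "doubao-seedream-4.5",
    "midjourney": "midjourney",
}

def resolve_image_model(friendly_name: str) -> str:
    """Return the exact model_id; never guess one from the friendly name."""
    key = friendly_name.strip().lower()
    try:
        return IMAGE_MODEL_IDS[key]
    except KeyError:
        raise ValueError(
            f"Unknown model {friendly_name!r}; run --list-models instead of inferring"
        )
```

Failing loudly on an unknown name is the point: a guessed id like nano-banana-pro produces a confusing API error, while an early ValueError steers the agent back to --list-models.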

📚 Optional Knowledge Enhancement (ima-knowledge-ai)

This skill is fully runnable as a standalone package. If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.

Recommended optional reads:

  1. Check for workflow complexity — read ima-knowledge-ai/references/workflow-design.md if:

    • User mentions: "MV", "宣传片" (promo video), "完整作品" (complete work), "配乐", "soundtrack"
    • Task spans multiple media types (image + video, video + music, etc.)
    • Complex multi-step workflows that need task decomposition
  2. Check for visual consistency needs — read ima-knowledge-ai/references/visual-consistency.md if:

    • User mentions: "系列" (series), "多张" (multiple images), "同一个" (the same one), "角色" (character), "续" (continuation), "series", "same"
    • Task involves: multiple images/videos, character continuity, product shots
    • Second or later request about the same subject (e.g., "旺财在游泳" ("Wangcai is swimming") after "生成旺财照片" ("generate a photo of Wangcai"))
  3. Check video modes — read ima-knowledge-ai/references/video-modes.md if:

    • Any video generation task
    • Need to understand the image_to_video vs reference_image_to_video difference
  4. Check model selection — read ima-knowledge-ai/references/model-selection.md if:

    • Unsure which model to use
    • Need cost/quality trade-off guidance
    • User specifies budget or quality requirements

Why this matters:

  • Multi-media workflows need proper task sequencing (e.g., video duration → matching music duration)
  • AI generation runs independently each time by default — without reference images, results will be inconsistent
  • Wrong video mode = wrong result (image_to_video ≠ reference_image_to_video)
  • Model choice affects cost and quality significantly

Example multi-media workflow:

User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗" ("Make me a product promo MV with background music, starring the puppy Wangcai")

❌ Wrong:
  1. Generate dog image (random look)
  2. Generate video (different dog)
  3. Generate music (unrelated)

✅ Right:
  1. Read workflow-design.md + visual-consistency.md
  2. Generate a Master Reference: an image of the puppy Wangcai
  3. Generate video shots using image_to_video with the Wangcai image as first frame
  4. Get the video duration (e.g., 15s)
  5. Generate BGM with matching duration and mood

How to check:

# Step 0: Determine media type first (image / video / music / speech)
# From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech)

# Step 1: Read knowledge base based on task type
if multi_media_workflow:
    read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")

if "same subject" or "series" or "character":
    read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")

if video_generation:
    read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")

# Step 2: Execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)

For simple single-media requests, you may proceed directly. For complex multi-media workflows, always read the knowledge base first — no exceptions.


📥 User Input Parsing (Media Type & Task Routing)

Purpose: So that any agent parses user intent consistently, first determine the media type from the user's request, then choose task_type and model.

1. User phrasing → media type (do this first)

| User intent / keywords | Media type | task_type examples |
| --- | --- | --- |
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | image | text_to_image, image_to_image |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | video | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | music | text_to_music |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | speech | text_to_speech |

If the request mixes media (e.g. "宣传片+配乐", promo video plus soundtrack), treat it as a multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.
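The routing table above can be sketched as a keyword classifier. This is illustrative only: the keyword lists are abbreviated from the table, and a real agent would also apply the user's fuller context.

```python
# Keyword → media type routing, per the table above. The classifier returns
# every matched type; more than one hit signals a multi-media workflow.
MEDIA_KEYWORDS = [
    ("speech", ["语音", "朗读", "TTS", "配音", "speech", "read aloud"]),
    ("music", ["音乐", "BGM", "背景音乐", "作曲", "music", "歌"]),
    ("video", ["视频", "video", "宣传片", "MV"]),
    ("image", ["画", "生成图", "图片", "image"]),
]

def classify_media(request: str) -> list[str]:
    """Return all matched media types; len > 1 means multi-media workflow."""
    hits = []
    for media, words in MEDIA_KEYWORDS:
        if any(w in request for w in words):
            hits.append(media)
    return hits
```

A mixed request like a promo video with background music matches both video and music, which is exactly the trigger for reading workflow-design.md first.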

2. Model and parameter parsing

  • Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as in the ima-image-ai skill (User Input Parsing section).

  • Video: For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow the ima-video-ai skill (User Input Parsing section).
    Sevio alias normalization in ima-all-ai:
    • Ima Sevio 1.0 → ima-pro
    • Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast
    Routing rule:
    • Normalize the alias first
    • Then resolve against the runtime product list for the selected task_type
    • If the model is absent in the current category, return the available model_ids from --list-models

  • Music: Suno (sonic) vs DouBao BGM/Song — infer from intent: "BGM"/"背景音乐" (background music) → BGM; "带歌词" (with lyrics)/"人声" (vocals) → Suno or Song. Use model_id sonic, GenBGM, or GenSong per the "Recommended Defaults" and "Music Generation" tables below.

  • Speech (TTS): Get the model_id from GET /open/v1/product/list?category=text_to_speech, or run the script with --task-type text_to_speech --list-models. Map user intent to parameters using the product's form_config:

    | User intent / phrasing | Parameter (if in form_config) | Notes |
    | --- | --- | --- |
    | 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
    | 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
    | 语速快/慢 / speed up/slow | speed | e.g. 0.8-1.2 |
    | 音调 / pitch | pitch | If supported |
    | 大声/小声 / volume | volume | If supported |

    If the user does not specify, use the form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product's credit_rules/attributes or form_config (script reflection strips others on retry).
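The "only send supported parameters" rule can be sketched as a filter over the product's form_config. Field names here are assumptions based on this section (the real script also consults credit_rules/attributes), so treat this as a shape, not the script's actual code.

```python
def filter_extra_params(extra: dict, form_config: dict) -> dict:
    """Drop extra parameters the product does not declare, mirroring the
    reflection-on-retry behavior described above (field names assumed)."""
    allowed = set(form_config.get("fields", {}))
    return {k: v for k, v in extra.items() if k in allowed}
```

For example, a user-requested pitch change would be silently dropped for a product whose form_config only declares speed and voice_type.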


⚙️ How This Skill Works

For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:

  • Sends your prompt to two IMA-owned domains (see "Network Endpoints" below)
  • Uses --user-id only locally, as a key for storing your model preferences
  • Returns image/video/music URLs when generation is complete

What gets sent to IMA servers:

  • ✅ Your prompt/description (image/video/music)
  • ✅ Model selection (SeeDream/Wan/Suno/etc.)
  • ✅ Generation parameters (size, duration, style, etc.)
  • ❌ NO API key in prompts (key is used for authentication only)
  • ❌ NO user_id (it's only used locally)

What's stored locally:

  • ~/.openclaw/memory/ima_prefs.json - your model preferences (< 1 KB)
  • ~/.openclaw/logs/ima_skills/ - generation logs (auto-deleted after 7 days)

🌐 Network Endpoints Used

| Domain | Owner | Purpose | Data Sent | Privacy |
| --- | --- | --- | --- | --- |
| api.imastudio.com | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
| imapi.liveme.com | IMA Studio | Image/video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
| *.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |

Key Points:

  • Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
  • Image/video tasks require imapi.liveme.com to obtain presigned URLs for uploading input images.
  • Your API key is sent to both api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
  • Verify network calls: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'. See "🌐 Network Endpoints Used" and "⚠️ Credential Security Notice" in this document for full disclosure.

⚠️ Credential Security Notice

Your API key is sent to both IMA-owned domains:

  1. Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
  2. Query param appUid=ima_xxx... → imapi.liveme.com (upload service)

Security best practices:

  • 🧪 Use test keys for experiments: Generate a separate API key for testing.
  • 🔍 Monitor usage: Check
    https://imastudio.com/dashboard
    for unauthorized activity.
  • ⏱️ Rotate keys: Regenerate your API key periodically (monthly recommended).
  • 📊 Review logs: Check
    ~/.openclaw/logs/ima_skills/
    for unexpected API calls.

Why two domains? IMA Studio uses a microservices architecture:

  • api.imastudio.com
    : Core AI generation API
  • imapi.liveme.com
    : Specialized image/video upload service (shared infrastructure)

Both domains are operated by IMA Studio. The same API key grants access to both services.

Agent Execution (Internal Reference)

Note for users: You can review the script source at scripts/ima_create.py anytime. The agent uses this script to simplify API calls. Music tasks use only api.imastudio.com, while image/video tasks also call imapi.liveme.com for file uploads (see "Network Endpoints" above).

Use the bundled script internally for all task types — it ensures correct parameter construction:

# ─── Image Generation ──────────────────────────────────────────────────────────

# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_image \
  --model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
  --user-id {user_id} --output-json

# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_image \
  --model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
  --size 2k --user-id {user_id} --output-json

# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type image_to_image \
  --model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
  --input-images https://example.com/photo.jpg --user-id {user_id} --output-json

# ─── Video Generation ──────────────────────────────────────────────────────────

# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_video \
  --model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
  --user-id {user_id} --output-json

# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_video \
  --model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
  --extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
  --user-id {user_id} --output-json

# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type image_to_video \
  --model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
  --input-images https://example.com/photo.jpg --user-id {user_id} --output-json

# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type first_last_frame_to_video \
  --model-id kling-video-o1 --prompt "smooth transition between frames" \
  --input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
  --user-id {user_id} --output-json

# ─── Music Generation ──────────────────────────────────────────────────────────

# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
  --user-id {user_id} --output-json

# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id sonic --prompt "pop ballad, emotional" \
  --extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
  --user-id {user_id} --output-json

# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id GenBGM --prompt "relaxing ambient music for meditation" \
  --user-id {user_id} --output-json

# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────

# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_speech --list-models

# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_speech \
  --model-id <model_id from list> --prompt "Text to be spoken here." \
  --user-id {user_id} --output-json

The script outputs JSON with url, model_name, and credit — use these values in the UX protocol messages below. The script internals (product-list query, parameter construction, polling) are invisible to users.
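Downstream, the agent only needs those three fields. A sketch of consuming the --output-json result (the sample payload is invented for illustration; the real output may carry additional fields):

```python
import json

# Hypothetical sample of the script's --output-json payload (fields per the
# documentation above: url, model_name, credit).
sample = '{"url": "https://cdn.example.com/out.png", "model_name": "SeeDream 4.5", "credit": 5}'

result = json.loads(sample)
# Compose a user-facing success line from the three documented fields.
message = f"✅ Done with {result['model_name']} ({result['credit']} pts): {result['url']}"
```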


Overview

Call the IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.


🔒 Security & Transparency Policy

This skill is community-maintained and open for inspection.

✅ What Users CAN Do

Full transparency:

  • Review all source code: check scripts/ima_create.py and ima_logger.py anytime
  • Verify network calls: music tasks use api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section)
  • Inspect local data: view ~/.openclaw/memory/ima_prefs.json and the log files
  • Control privacy: delete preferences/logs anytime, or disable file writes (see below)

Configuration allowed:

  • Set the API key in the environment or agent config:
    • Environment variable: export IMA_API_KEY=ima_your_key_here
    • OpenClaw/MCP config: add IMA_API_KEY to the agent's environment configuration
    • Get your key at: https://imastudio.com
  • Use scoped/test keys: test with limited API keys, rotate after testing
  • Disable file writes: make prefs/logs read-only or symlink them to /dev/null

Data control:

  • View stored data: cat ~/.openclaw/memory/ima_prefs.json
  • Delete preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
  • Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)

⚠️ Advanced Users: Fork & Modify

If you need to modify this skill for your use case:

  1. Fork the repository (don't modify the original)
  2. Update your fork with your changes
  3. Test thoroughly with limited API keys
  4. Document your changes for troubleshooting

Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.

❌ What to AVOID (Security Risks)

Actions that could compromise security:

  • ❌ Sharing API keys publicly or in skill files
  • ❌ Modifying API endpoints to unknown servers
  • ❌ Disabling SSL/TLS certificate verification
  • ❌ Logging sensitive user data (prompts, IDs, etc.)
  • ❌ Bypassing authentication or billing mechanisms

Why this matters:

  1. API Compatibility: Skill logic aligns with IMA Open API schema
  2. Security: Malicious modifications could leak credentials or bypass billing
  3. Support: Modified skills may not be supported
  4. Community: Breaking changes affect all users

📋 Privacy & Data Handling Summary

What this skill does with your data:

| Data Type | Sent to IMA? | Stored Locally? | User Control |
| --- | --- | --- | --- |
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ Never (local preference key only) | ✅ Yes (as prefs file key) | Change the --user-id value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup after 7 days |

Privacy recommendations:

  1. Use test/scoped API keys for initial testing
  2. Note: --user-id is never sent to IMA servers; it is only used locally as a key for storing preferences in ~/.openclaw/memory/ima_prefs.json
  3. Review the source code at scripts/ima_create.py to verify network calls (search for the create_task function)
  4. Rotate API keys after testing or if compromised

Get your IMA API key: Visit https://imastudio.com to register and get started.

🔧 For Skill Maintainers Only

Version control:

  • All changes must go through Git with proper version bumps (semver)
  • CHANGELOG.md must document all changes
  • Production deployments require code review

File checksums (optional):

# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py

If users report issues, verify file integrity first.


🧠 User Preference Memory (Image)

User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.

Storage:
~/.openclaw/memory/ima_prefs.json

Single file, shared across all IMA skills:

{
  "user_{user_id}": {
    "text_to_image":  { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5,  "last_used": "2026-02-27T03:07:27Z" },
    "image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5,  "last_used": "2026-02-27T03:07:27Z" },
    "text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
  }
}

Model Selection Flow (Image Generation)

Step 1: Get knowledge-ai recommendation (if installed)

knowledge_recommended_model = read_ima_knowledge_ai()  # e.g., "SeeDream 4.5"

Step 2: Check user preference

user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type)  # e.g., {"model_id": "midjourney", ...}

Step 3: Decide which model to use

if user_pref exists:
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default

Step 4: Check for mismatch (for later hint)

if user_pref exists and knowledge_recommended_model != user_pref["model_id"]:
    mismatch = True  # Will add hint in success message
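Steps 1-4 above combine into a single decision helper. This is a sketch: prefs is the already-loaded preferences dict, knowledge_rec is the knowledge-ai suggestion, and all names are illustrative rather than the script's actual API.

```python
def choose_model(user_id, task_type, prefs, knowledge_rec,
                 fallback="doubao-seedream-4.5"):
    """User preference wins; otherwise knowledge-ai, then the fallback default.
    Also reports a mismatch so the success message can add a gentle hint."""
    pref = prefs.get(f"user_{user_id}", {}).get(task_type)
    if pref:
        model = pref["model_id"]
        mismatch = knowledge_rec is not None and knowledge_rec != model
        return model, mismatch
    return (knowledge_rec or fallback), False
```

Note that knowledge-ai is still consulted even when a preference exists, purely to detect the mismatch; the preference is never overridden.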

When to Write (User Explicit Preference ONLY)

✅ Save a preference when the user explicitly specifies a model:

| User says | Action |
| --- | --- |
| 用XXX / 换成XXX / 改用XXX (use / switch to XXX) | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX (use XXX from now on / by default) | Save + confirm: ✅ 已记住!以后图片生成默认用 [XXX] ("Got it! Image generation will default to [XXX] from now on") |
| 我喜欢XXX / 我更喜欢XXX (I like / prefer XXX) | Save as preference |

❌ Do NOT save when:

  • Agent auto-selects from knowledge-ai → not user preference
  • Agent uses fallback default → not user preference
  • User says generic quality requests (see "Clear Preference" below) → clear preference instead

When to Clear (User Abandons Preference)

🗑️ Clear the preference when the user wants automatic selection:

| User says | Action |
| --- | --- |
| 用最好的 / 用最合适的 / best / recommended (use the best / most suitable) | Clear pref + use knowledge-ai recommendation |
| 推荐一个 / 你选一个 / 自动选择 (recommend one / you pick / auto-select) | Clear pref + use knowledge-ai recommendation |
| 用默认的 / 用新的 (use the default / the new one) | Clear pref + use knowledge-ai recommendation |
| 试试别的 / 换个试试 (try something else), without a specific model | Clear pref + use knowledge-ai recommendation |
| 重新推荐 (recommend again) | Clear pref + use knowledge-ai recommendation |

Implementation:

prefs.get(f"user_{user_id}", {}).pop(task_type, None)  # tolerate a missing entry
save_prefs(prefs)
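The save/clear operations above can be sketched against the JSON schema shown earlier. The path and helper names are illustrative, not the script's actual API.

```python
import json
from pathlib import Path

# Default location per this document; tests may pass an alternate path.
PREFS_PATH = Path.home() / ".openclaw" / "memory" / "ima_prefs.json"

def load_prefs(path: Path = PREFS_PATH) -> dict:
    return json.loads(path.read_text()) if path.exists() else {}

def save_pref(user_id: str, task_type: str, entry: dict,
              path: Path = PREFS_PATH) -> None:
    """Record an explicit user preference under user_{id} → task_type."""
    prefs = load_prefs(path)
    prefs.setdefault(f"user_{user_id}", {})[task_type] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(prefs, ensure_ascii=False, indent=2))

def clear_pref(user_id: str, task_type: str, path: Path = PREFS_PATH) -> None:
    """Drop the preference so knowledge-ai selection takes over."""
    prefs = load_prefs(path)
    prefs.get(f"user_{user_id}", {}).pop(task_type, None)
    path.write_text(json.dumps(prefs, ensure_ascii=False, indent=2))
```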

⭐ Model Selection Priority (Image)

Selection flow:

  1. User preference (if exists) → Highest priority, always respect
  2. ima-knowledge-ai skill (if installed) → Professional recommendation based on task
  3. Fallback defaults → Use table below (only if neither 1 nor 2 exists)

Important notes:

  • User preference is only saved when user explicitly specifies a model (see "When to Write" above)
  • Knowledge-ai is always consulted (even when user pref exists) to detect mismatches
  • When mismatch detected → add gentle hint in success message (does NOT interrupt generation)

The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.

When using user preference for image generation, show a line like:

🎨 根据你的使用习惯,将用 [Model Name] 帮你生成…
• 模型:[Model Name](你的常用模型)
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]

Preference Change Confirmation

When user switches to a different model than their saved preference:

💡 你之前喜欢用 [Old Model],这次换成了 [New Model]。
要把 [New Model] 设为以后的默认吗?
回复「是」保存 / 回复「否」仅本次使用

⭐ Recommended Defaults

These are fallback defaults — only used when no user preference exists.
Always default to the newest and most popular model. Do NOT default to the cheapest.

| Task Type | Default Model | model_id | version_id | Cost | Why |
| --- | --- | --- | --- | --- | --- |
| text_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest doubao flagship, photorealistic 4K |
| text_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Fastest and cheapest option |
| text_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10/10/18 pts | Premium quality, 1K/2K/4K options |
| text_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, creative styles |
| image_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest, best i2i quality |
| image_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Cheapest option |
| image_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10 pts | Premium quality |
| image_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, style transfer |
| text_to_video | Wan 2.6 | wan2.6-t2v | wan2.6-t2v | 25 pts | 🔥 Most popular t2v, balanced cost |
| text_to_video (premium) | Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | 38 pts | Higher quality |
| text_to_video (budget) | Vidu Q2 | viduq2 | viduq2 | 5 pts | Lowest cost t2v |
| image_to_video | Wan 2.6 | wan2.6-i2v | wan2.6-i2v | 25 pts | 🔥 Most popular i2v, 1080P |
| image_to_video (premium) | Kling 2.6 | kling-v2-6 | kling-v2-6 | 40-160 pts | Premium Kling i2v |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Newest Kling reasoning model |
| reference_image_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Best reference fidelity |
| text_to_music | Suno (sonic-v4) | sonic | sonic | 25 pts | Latest Suno engine, best quality |
| text_to_speech | (query product list) | Run --task-type text_to_speech --list-models; use the first or user-preferred model_id | n/a | n/a | n/a |

Premium options:

  • Image: Nano Banana Pro — Highest quality with size control (1K/2K/4K), higher cost (10-18 pts for text_to_image, 10 pts for image_to_image)
  • Video: Kling O1, Sora 2 Pro, Google Veo 3.1 — Premium quality with longer duration options

Quick selection guide (production as of 2026-02-27, sorted by popularity):

  • Image (4 models available) → SeeDream 4.5 (5, default); artistic → Midjourney 🎨 (8-10); budget → Nano Banana2 (4, 512px); premium → Nano Banana Pro (10-18)
  • 🔥 Video from text (most popular) → Wan 2.6 (25, balanced); premium → Hailuo 2.3 (38); budget → Vidu Q2 (5)
  • 🔥 Video from image (most popular) → Wan 2.6 (25)
  • Music → Suno (25); DouBao BGM/Song (30 each)
  • Cheapest → Nano Banana2 512px (4) for image; Vidu Q2 (5) for video

Selection guide by use case:

Image Generation:

  • General image generation → SeeDream 4.5 (5 pts)
  • Custom aspect ratio (16:9, 9:16, 4:3, etc.) → SeeDream 4.5 🌟 or Nano Banana Pro/2 🆕 (native support)
  • Budget-conscious / fast generation → Nano Banana2 (4 pts)
  • Highest quality with size control (1K/2K/4K) → Nano Banana Pro (text_to_image: 10-18 pts, image_to_image: 10 pts)
  • Artistic/creative styles, illustrations, paintings → Midjourney 🎨 (8-10 pts)
  • Style transfer / image editing → SeeDream 4.5 (5 pts) or Midjourney 🎨 (artistic)

Video Generation:

  • General video generation → Wan 2.6 (25pts, most popular)
  • Premium cinematic quality → Google Veo 3.1 (70-330pts) or Sora 2 Pro (122+pts)
  • Budget video → Vidu Q2 (5pts) or Hailuo 2.0 (5pts)
  • With audio support → Kling O1 (48+pts) or Google Veo 3.1 (70+pts)
  • First/last frame animation → Kling O1 (48+pts)
  • Reference image consistency → Kling O1 (48+pts) or Google Veo 3.1 (70+pts)

Music Generation:

  • Custom song with lyrics, vocals, style → Suno sonic-v5 (25 pts, default, ~2 min)
    • Full control: custom_mode, lyrics, vocal_gender, tags, negative_tags
    • Best for: complete songs, vocal tracks, artistic compositions
  • Background music / ambient loop → DouBao BGM (30 pts, ~30s)
    • Simplified: prompt-only, no advanced parameters
    • Best for: video backgrounds, ambient music, short loops
  • Simple song generation → DouBao Song (30 pts, ~30s)
    • Simplified: prompt-only
    • Best for: quick song generation, structured vocal compositions
  • User explicitly asks for cheapest → DouBao BGM/Song (6 pts each) — only if explicitly requested

Speech (TTS) Generation:

  • Text-to-speech / 语音合成 / 朗读 → text_to_speech. Always query GET /open/v1/product/list?category=text_to_speech (or --list-models) to get the current model_id and credit. No fixed default; use the first available or the user preference. Voice/speed/format parameters: see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document.

⚠️ Technical Note for Suno: the model_version inside parameters.parameters (e.g., "sonic-v5") is different from the outer model_version field (which is "sonic"). Always set both correctly when creating Suno tasks.

⚠️ Production Image Models (4 available):

  • SeeDream 4.5 (doubao-seedream-4.5): 5 pts, default
  • Midjourney 🎨 (midjourney): 8/10 pts for 480p/720p, artistic styles
  • Nano Banana2 (gemini-3.1-flash-image): 4/6/10/13 pts for 512px/1K/2K/4K
  • Nano Banana Pro (gemini-3-pro-image): 10/10/18 pts for 1K/2K/4K

All other image models mentioned in older documentation are no longer available in production.

🌟 Parameter Support Notes (All Task Types):

Image Models (text_to_image / image_to_image)

🆕 MAJOR UPDATE: Nano Banana series now has NATIVE aspect_ratio support!

  • Nano Banana Pro: ✅ Supports
    aspect_ratio
    (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
  • Nano Banana2: ✅ Supports
    aspect_ratio
    (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
  • SeeDream 4.5: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9)
  • Midjourney: ❌ 1:1 only (fixed 1024x1024)

Other parameter support details:

  • size:
    • Nano Banana2: 512px, 1K, 2K, 4K (via different attribute_ids, 4-13 pts)
    • Nano Banana Pro: 1K, 2K, 4K (via different attribute_ids, 10-18 pts)
    • SeeDream 4.5: adaptive default (5 pts)
    • Midjourney: 480p/720p (via attribute_id, 8/10 pts)
  • 8K: no model supports 8K (the max is 4K via Nano Banana Pro)
  • Non-standard aspect ratios (7:3, 8:5, etc.): not supported; use the closest supported ratio or video models
  • n: multiple outputs supported (1-4), credit × n

When user requests unsupported combinations for images:

  • Midjourney + aspect_ratio (16:9, etc.): Recommend SeeDream 4.5 or Nano Banana series instead
    ❌ Midjourney 暂不支持自定义 aspect_ratio(仅支持 1024x1024 方形)
    
    ✅ 推荐方案:
      1. SeeDream 4.5(支持虚拟参数 aspect_ratio)
         • 支持比例:1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
         • 成本:5 积分(性价比最佳)
      2. Nano Banana Pro/2(原生支持 aspect_ratio)
         • 支持比例:1:1, 16:9, 9:16, 4:3, 3:4
         • 成本:4-18 积分(按尺寸)
    
    需要我帮你用 SeeDream 4.5 生成吗?
    
  • Any model + 8K: Inform user no model supports 8K, max is 4K (Nano Banana Pro)
  • Any model + non-standard ratio (7:3, 8:5, etc.): Non-standard ratio, not supported. Suggest closest supported ratio (e.g., 21:9 for ultra-wide, 2:3 for portrait)
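The unsupported-combination checks above can be sketched as a small validator. The ratio sets are copied from this section, and the helper name is hypothetical.

```python
# Supported aspect ratios per image model, per the notes above.
ASPECT_SUPPORT = {
    "doubao-seedream-4.5": {"1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "21:9"},
    "gemini-3.1-flash-image": {"1:1", "16:9", "9:16", "4:3", "3:4"},
    "gemini-3-pro-image": {"1:1", "16:9", "9:16", "4:3", "3:4"},
    "midjourney": {"1:1"},
}

def check_aspect(model_id: str, ratio: str):
    """Return (ok, alternatives): whether the model supports the ratio,
    plus the models that do, for building the recommendation message."""
    ok = ratio in ASPECT_SUPPORT.get(model_id, set())
    alts = sorted(m for m, s in ASPECT_SUPPORT.items() if ratio in s)
    return ok, alts
```

For Midjourney plus 16:9 this returns ok=False with SeeDream 4.5 and the Nano Banana models as alternatives, matching the recommendation template above.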

Video Models (text_to_video / image_to_video / first_last_frame / reference_image)

  • resolution: 540P, 720P, 1080P, 2K, 4K (model-dependent, higher res = higher cost)
  • aspect_ratio: 16:9, 9:16, 1:1, 4:3 (model-dependent, check
    form_config
    )
  • duration: 4s, 5s, 10s, 15s (model-dependent, longer = higher cost)
  • ⚠️ generate_audio: Supported by Veo 3.1, Kling O1, Hailuo (check
    form_config
    )
  • prompt_extend: AI-powered prompt enhancement (most models support)
  • negative_prompt: Content exclusion (most models support)
  • shot_type: Single/multi-shot control (model-dependent)
  • seed: Reproducibility control (most models support, -1 = random)
  • n: Multiple outputs (1-4), credit × n

🆕 Special Case: Pixverse Model Parameter (v1.0.7+)

Auto-Inference Logic for Pixverse V5.5/V5/V4:

  • Problem: Pixverse V5.5, V5, V4 lack
    model
    field in
    form_config
    from Product List API
  • Backend Requirement: Backend requires
    model
    parameter (e.g.,
    "v5.5"
    ,
    "v5"
    ,
    "v4"
    )
  • Auto-Fix: System automatically extracts version from
    model_name
    and injects it
    • Example:
      model_name: "Pixverse V5.5"
      → auto-inject
      model: "v5.5"
    • Example:
      model_name: "Pixverse V4"
      → auto-inject
      model: "v4"
  • Note: V4.5 and V3.5 include
    model
    in
    form_config
    (no auto-inference needed)
  • Relevant Task Types: All video modes (text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video)

Error Prevention:

  • Without auto-inference: `err_code=400017 err_msg=Invalid value for model`
  • With auto-inference (v1.0.7+): Pixverse V5.5/V5/V4 work seamlessly ✅
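
The auto-inference rule above can be sketched as a small helper. This is an illustrative sketch of the documented behavior, not the actual implementation in `ima_create.py`; the function name is ours.

```python
import re

def infer_pixverse_model(model_name: str, params: dict) -> dict:
    """Inject a `model` parameter for Pixverse V5.5/V5/V4, per the auto-fix rule.

    V4.5 and V3.5 already carry `model` in form_config, so only fill the gap
    when the parameter is missing.
    """
    if "model" in params:
        return params  # form_config already provided it (V4.5 / V3.5)
    m = re.search(r"\bV(\d+(?:\.\d+)?)\b", model_name, re.IGNORECASE)
    if m:
        # e.g. "Pixverse V5.5" -> "v5.5"
        params = dict(params, model=f"v{m.group(1)}")
    return params

print(infer_pixverse_model("Pixverse V5.5", {}))  # {'model': 'v5.5'}
print(infer_pixverse_model("Pixverse V4", {}))    # {'model': 'v4'}
```

Without the injected `model` value, the backend rejects the task with `err_code=400017` as noted above.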

Music Models (text_to_music)

Suno sonic-v5 (Full-Featured):

  • custom_mode: Suno only (enables vocal_gender, lyrics, tags support)
  • vocal_gender: Suno only (male/female/mixed, requires custom_mode=True)
  • lyrics: Suno only (custom lyrics support, requires custom_mode=True)
  • make_instrumental: Suno only (force instrumental, no vocals)
  • auto_lyrics: Suno only (AI-generated lyrics)
  • tags: Suno only (genre/style tags)
  • negative_tags: Suno only (exclude unwanted styles)
  • title: Suno only (song title)
  • duration: Fixed-length output (DouBao ~30s, Suno ~2min, not user-controllable)
  • n: Multiple outputs supported (1-2), credit × n

DouBao BGM/Song (Simplified):

  • prompt: Text description only
  • No advanced parameters (no custom_mode, lyrics, vocal control)
  • duration: Fixed ~30s output

🎵 Suno Prompt Writing Guide (for `gpt_description_prompt`):

When using Suno, structure your prompt with these elements:

  1. Genre/Style:
     • Examples: "lo-fi hip hop", "orchestral cinematic", "upbeat pop", "dark ambient", "indie folk", "electronic dance"
  2. Tempo/BPM:
     • Examples: "80 BPM", "fast tempo", "slow ballad", "moderate pace 110 BPM"
  3. Vocals Control:
     • No vocals: "no vocals" → set make_instrumental=true
     • With vocals: "female vocals" → set vocal_gender="female"
     • Male vocals: "male vocals" → set vocal_gender="male"
     • Mixed: set vocal_gender="mixed"
  4. Mood/Emotion:
     • Examples: "happy and energetic", "melancholic", "tense and dramatic", "peaceful and calming"
  5. Negative Tags (exclude styles):
     • Use negative_tags: "heavy metal, distortion, screaming" to exclude unwanted elements
  6. Duration Hint:
     • Examples: "60 seconds", "30 second loop", "2 minute track"
     • Note: Suno typically generates ~2 min; length is not strictly controllable

Example Suno prompts:

"upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill"
→ Set: make_instrumental=true

"emotional pop ballad, slow tempo, female vocals, melancholic"
→ Set: vocal_gender="female"

"orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals"
→ Set: make_instrumental=true, tags="orchestral,cinematic,epic"

"acoustic indie folk, gentle guitar, male vocals, warm and nostalgic"
→ Set: vocal_gender="male", tags="acoustic,indie,folk"

⚠️ Technical Note for Suno: `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly.
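
To make the two `model_version` fields concrete, here is a sketched Suno create-payload fragment. The nesting follows this document's create-task shape (`parameters[].parameters`); exact field availability should always be confirmed against the live product list.

```python
# Sketch of the Suno parameter nesting described above. Field names follow this
# document's create-task shape; confirm against the live product list.
suno_task = {
    "task_type": "text_to_music",
    "model_version": "sonic",            # outer field: model family
    "parameters": [{
        "parameters": {
            "model_version": "sonic-v5", # inner field: concrete Suno version
            "gpt_description_prompt": "upbeat lo-fi hip hop, 90 BPM, relaxed and chill",
            "custom_mode": True,         # enables tags / vocal controls
            "make_instrumental": True,   # "no vocals" in the prompt
            "tags": "lo-fi,hip hop,chill",
        }
    }],
}

# The two model_version values intentionally differ:
assert suno_task["model_version"] != suno_task["parameters"][0]["parameters"]["model_version"]
```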

Common Parameter Patterns

  • n (batch generation): Supported by ALL models. Cost = base_credit × n. Creates n independent resources.
  • seed: Supported by most models (-1 = random, >0 = reproducible results)
  • prompt_extend: AI-powered prompt enhancement (video models only)

Decision Tree: When User Requests Unsupported Features

User asks for a non-standard aspect ratio image (e.g. "7:3 landscape")
  → ❌ Image models don't support non-standard ratios
  → ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频,然后截取首帧作为图片。"

User asks for 8K image
  → ❌ No model supports 8K
  → ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro,18积分)。要使用吗?"

User asks for video with audio
  → Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio
  → ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个?"

User asks for long music (e.g. "5 minute track")
  → ❌ Duration not user-controllable
  → ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。"

User asks for 30s video
  → Check model: Most models max 15s
  → ✅ Solution: "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。"

When user requests unsupported combinations:

  • Video + audio (unsupported model) → "该模型不支持音频。建议用 Veo 3.1 或 Kling O1 (支持 generate_audio 参数)"
  • Music + custom duration → "音乐时长由模型固定(Suno约2分钟,DouBao约30秒),无法自定义"
  • Video duration > 15s → "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)"

Note: Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.


🧠 User Preference Memory (Video)

User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.

Storage:
~/.openclaw/memory/ima_prefs.json

{
  "user_{user_id}": {
    "text_to_video":              { "model_id": "wan2.6-t2v",      "model_name": "Wan 2.6",  "credit": 25, "last_used": "..." },
    "image_to_video":             { "model_id": "wan2.6-i2v",      "model_name": "Wan 2.6",  "credit": 25, "last_used": "..." },
    "first_last_frame_to_video":  { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." },
    "reference_image_to_video":   { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }
  }
}
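
The `load_prefs()` / `save_prefs()` helpers referenced in the flow below can be minimal JSON wrappers over that file. A sketch (helper names match this document's pseudocode; the error handling is our assumption):

```python
import json
import os

PREFS_PATH = os.path.expanduser("~/.openclaw/memory/ima_prefs.json")

def load_prefs(path: str = PREFS_PATH) -> dict:
    """Return the saved preference map, or {} if the file doesn't exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def save_prefs(prefs: dict, path: str = PREFS_PATH) -> None:
    """Persist preferences, creating the memory directory on first use."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prefs, f, ensure_ascii=False, indent=2)
```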

Model Selection Flow (Video Generation)

Step 1: Get knowledge-ai recommendation (if installed)

knowledge_recommended_model = read_ima_knowledge_ai()  # e.g., "Wan 2.6"

Step 2: Check user preference

user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type)  # e.g., {"model_id": "kling-video-o1", ...}

Step 3: Decide which model to use

if user_pref exists:
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default

Step 4: Check for mismatch (for later hint)

if user_pref exists and knowledge_recommended_model != user_pref["model_name"]:  # knowledge-ai returns a friendly name
    mismatch = True  # Will add hint in success message

When to Write (User Explicit Preference ONLY)

✅ Save preference when user explicitly specifies a model:

| User says | Action |
|---|---|
| 用XXX / 换成XXX / 改用XXX | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX | Save + confirm: ✅ 已记住!以后视频生成默认用 [XXX] |
| 我喜欢XXX / 我更喜欢XXX | Save as preference |

❌ Do NOT save when:

  • Agent auto-selects from knowledge-ai → not user preference
  • Agent uses fallback default → not user preference
  • User says generic quality requests (see "Clear Preference" below) → clear preference instead

When to Clear (User Abandons Preference)

🗑️ Clear preference when user wants automatic selection:

| User says | Action |
|---|---|
| 用最好的 / 用最合适的 / best / recommended | Clear pref + use knowledge-ai recommendation |
| 推荐一个 / 你选一个 / 自动选择 | Clear pref + use knowledge-ai recommendation |
| 用默认的 / 用新的 | Clear pref + use knowledge-ai recommendation |
| 试试别的 / 换个试试 (without a specific model) | Clear pref + use knowledge-ai recommendation |
| 重新推荐 | Clear pref + use knowledge-ai recommendation |

Implementation:

del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)

⭐ Model Selection Priority (Video)

Selection flow:

  1. User preference (if exists) → Highest priority, always respect
  2. ima-knowledge-ai skill (if installed) → Professional recommendation based on task
  3. Fallback defaults → Use table below (only if neither 1 nor 2 exists)

Important notes:

  • User preference is only saved when user explicitly specifies a model (see "When to Write" above)
  • Knowledge-ai is always consulted (even when user pref exists) to detect mismatches
  • When mismatch detected → add gentle hint in success message (does NOT interrupt generation)

The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.


💬 User Experience Protocol (IM / Feishu / Discord) v2.0 🆕

v2.0 Updates (aligned with ima-image-ai v1.3):

  • Added Step 0 for correct message ordering (fixes group chat bug)
  • Added Step 5 for explicit task completion
  • Enhanced Midjourney support with proper timing estimates
  • Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done

This skill runs inside IM platforms (Feishu, and Discord via OpenClaw).
Generation takes anywhere from ~10 seconds (music) to ~6 minutes (video). Never let users wait in silence.
Always follow all 6 steps below, every single time.

🗣️ User-Friendly First, Transparent on Request

Default to plain-language updates in normal user flows. If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).

In standard progress messages, prioritize: model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.


Estimated Generation Time (All Task Types)

| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|---|---|---|---|---|
| text_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| text_to_image | Nano Banana2 💚 | 20~40s | 5s | 15s |
| text_to_image | Nano Banana Pro | 60~120s | 5s | 30s |
| text_to_image | Midjourney 🎨 | 40~90s | 8s | 25s |
| image_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| image_to_image | Nano Banana2 💚 | 20~40s | 5s | 15s |
| image_to_image | Nano Banana Pro | 60~120s | 5s | 30s |
| image_to_image | Midjourney 🎨 | 40~90s | 8s | 25s |
| text_to_video | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| text_to_video | SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s |
| text_to_video | Kling O1, Sora 2 Pro | 180~360s | 8s | 60s |
| image_to_video | (all models) | same ranges as text_to_video | 8s | 40s |
| first_last_frame / reference | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| text_to_music | DouBao BGM / Song | 10~25s | 5s | 10s |
| text_to_music | Suno (sonic-v5) | 20~45s | 5s | 15s |
| text_to_speech | (varies by model) | 5~30s | 3s | 10s |

`estimated_max_seconds` = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).


Step 0 — Initial Acknowledgment Reply (Normal Reply) 🆕

⚠️ CRITICAL: This step is essential for correct message ordering in IM platforms (Feishu, Discord).

Before doing anything else, reply to the user with a friendly acknowledgment message using your normal reply (not the `message` tool). This reply will automatically appear FIRST in the conversation.

Example acknowledgment messages:

For images:

好的!来帮你画一只萌萌的猫咪 🐱
收到!马上为你生成一张 16:9 的风景照 🏔️
OK! Starting image generation with SeeDream 4.5 🎨

For videos:

好的!来帮你生成一段视频 🎬
收到!开始用 Wan 2.6 生成视频 🎥

For music:

好的!来帮你创作一首音乐 🎵

Rules:

  • Keep it short and warm (< 15 words)
  • Match the user's language (Chinese/English)
  • Include relevant emoji (🐱/🎨/🎬/🎵/✨)
  • This is your ONLY normal reply — all subsequent updates use the `message` tool

Why this matters:

  • Normal replies automatically appear FIRST in the conversation thread
  • `message` tool pushes appear in chronological order AFTER your initial reply
  • This ensures users see: "好的!" → "🎨 开始生成..." → "⏳ 进度..." → "✅ 成功!" (correct order)
  • Without Step 0, the confirmation might appear LAST, confusing users

Step 1 — Pre-Generation Notification (Push via message tool)

After the Step 0 reply, use the `message` tool to push a notification immediately:

[Emoji] 开始生成 [内容类型],请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]

Emoji by content type:

  • 图片 → 🎨
  • 视频 → 🎬(加注:视频生成需要较长时间,我会定时汇报进度)
  • 音乐 → 🎵

Cost transparency (new requirement):

  • Always show credit cost with model tier context
  • For expensive models (>50 pts), offer cheaper alternative proactively
  • Examples:
    • Balanced (default): "使用 Wan 2.6(25 积分,最新 Wan)"
    • Premium (user explicit): "使用高端模型 Kling O1(48-120 积分),质量最佳"
    • Premium (auto-selected): "使用 Wan 2.6(25 积分)。若需更高质量可选 Kling O1(48 积分起)"
    • Budget (user asked): "使用 Vidu Q2(5 积分,最省钱)"

Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.


Step 2 — Progress Updates

Poll the task detail API every [Poll Every] seconds per the table.
Send a progress update every [Send Progress Every] seconds.

⏳ 正在生成中… [P]%
已等待 [elapsed]s,预计最长 [max]s

Progress formula:

P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100))

  • Cap at 95% — never reach 100% until the API confirms `success`
  • If elapsed > estimated_max: freeze at 95%, append 「快了,稍等一下…」
  • For video with max=360s: at 120s → 33%, at 250s → 69%, at 400s → 95% (frozen)
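
The formula and the worked example translate directly into a few lines; a minimal sketch:

```python
import math

def progress_percent(elapsed_seconds: float, estimated_max_seconds: float) -> int:
    """Progress shown to the user: linear against the estimated max, capped at 95%."""
    return min(95, math.floor(elapsed_seconds / estimated_max_seconds * 100))

# Matches the worked video example (max = 360s):
print(progress_percent(120, 360))  # 33
print(progress_percent(250, 360))  # 69
print(progress_percent(400, 360))  # 95 (frozen; append 「快了,稍等一下…」)
```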

Step 3 — Success Notification

When task status = `success`:

For Video Tasks (text_to_video / image_to_video / first_last_frame / reference_image)

3.1 Send video player first (IM platforms like Feishu will render inline player):

# Get result URL from script output or task detail API
result = get_task_result(task_id)
video_url = result["medias"][0]["url"]

# Build caption
caption = f"""✅ 视频生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]

[视频描述]"""

# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
    caption += f"""

💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""

# Send video with caption (use message tool if available)
message(
    action="send",
    media=video_url,  # ⚠️ Use HTTPS URL directly, NOT local file path
    caption=caption
)

Important:

  • Hint is non-intrusive — does NOT interrupt generation
  • Only shown when user pref conflicts with knowledge-ai recommendation
  • User can ignore the hint; video is already delivered

3.2 Then send link as text (for copying/sharing):

# Send link message immediately after video
message(action="send", text=f"🔗 视频链接(可复制分享):\n{video_url}")

⚠️ Critical for video:

  • Send video player FIRST (inline preview)
  • Send text link SECOND (for copying)
  • Include the first-frame thumbnail URL if available: `result["medias"][0]["cover"]`

For Image Tasks (text_to_image / image_to_image)

# Build caption
caption = f"""✅ 图片生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]

🔗 原始链接:{image_url}"""

# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
    caption += f"""

💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""

# Send image with caption
message(
    action="send",
    media=image_url,
    caption=caption
)

Important:

  • Hint is non-intrusive — does NOT interrupt generation
  • Only shown when user pref conflicts with knowledge-ai recommendation
  • User can ignore the hint; image is already delivered

For Music Tasks (text_to_music)

Send audio file with player:

✅ 音乐生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
• 时长:约 [duration]

[音频URL或直接发送音频文件]

For TTS Tasks (text_to_speech) — Full UX Protocol (Steps 0–5)

Step 0 — Initial acknowledgment (normal reply)
First reply with a short acknowledgment, e.g.: 好的,正在帮你把这段文字转成语音。 / OK, converting this text to speech.

Step 1 — Pre-generation (message tool)
Push once:

🔊 开始语音合成,请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]

Step 2 — Progress
Poll every 2–5s. Every 10–15s send:

⏳ 语音合成中… [P]%,已等待 [elapsed]s,预计最长 [max]s

Cap progress at 95% until the API returns success.

Step 3 — Success (message tool)
When `resource_status == 1` and `status != "failed"`, send media = `medias[0].url` with caption:

✅ 语音合成成功!
• 模型:[Model Name]
• 耗时:实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:[url]

Use the URL from the API (do not use local file paths).

Step 4 — Failure (message tool)
On failure, send user-friendly message. TTS error translation (do not expose raw API errors):

| Technical | ✅ Say (CN) | ✅ Say (EN) |
|---|---|---|
| 401 Unauthorized | 密钥无效或未授权,请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足,请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常,请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配,请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 语音合成失败,建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时,请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定,请检查后重试 | Network unstable, check and retry |
| Text too long (TTS) | 文本过长,请缩短后重试 | Text too long, please shorten |

Links: API key — https://www.imaclaw.ai/imaclaw/apikey; Credits — https://www.imaclaw.ai/imaclaw/subscription

Step 5 — Done
After Step 0–4, no further reply needed. Do not send duplicate confirmations.


Step 4 — Failure Notification

When task status = `failed` or any API/network error occurs, send:

❌ [内容类型]生成失败
• 原因:[natural_language_error_message]
• 建议改用:
  - [Alt Model 1]([特点],[N pts])
  - [Alt Model 2]([特点],[N pts])

需要我帮你用其他模型重试吗?

⚠️ CRITICAL: Error Message Translation

NEVER show technical error messages to users. Always translate API errors into natural language.
API key & credits: 密钥与积分管理入口为 imaclaw.ai(与 imastudio.com 同属 IMA 平台)。Key and subscription management: imaclaw.ai (same IMA platform as imastudio.com).

| Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) |
|---|---|---|---|
| 401 Unauthorized 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权<br>💡 生成新密钥: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized<br>💡 Generate API Key: https://www.imaclaw.ai/imaclaw/apikey |
| 4008 Insufficient points 🆕 | Insufficient points / Error 4008 | ❌ 积分不足,无法创建任务<br>💡 购买积分: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task<br>💡 Buy Credits: https://www.imaclaw.ai/imaclaw/subscription |
| "Invalid product attribute" / "Insufficient points" | Invalid product attribute | 生成参数配置异常,请稍后重试 | Configuration error, please try again later |
| Error 6006 (credit mismatch) | Error 6006 | 积分计算异常,系统正在修复 | Points calculation error, system is fixing |
| Error 6009 (no matching rule) | Error 6009 | 参数组合不匹配,已自动调整 | Parameter mismatch, auto-adjusted |
| Error 6010 (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配,请尝试其他模型 | Model parameters incompatible, try another model |
| error 400 (bad request) | error 400 / Bad request | 请求参数有误,请稍后重试 | Invalid request parameters, please try again |
| resource_status == 2 | Resource status 2 / Failed | 生成过程遇到问题,建议换个模型试试 | Generation failed, please try another model |
| status == "failed" (no details) | Task failed | 这次生成没成功,要不换个模型试试? | Generation unsuccessful, try a different model? |
| timeout | Task timed out / Timeout error | 生成时间过长已超时,建议用更快的模型 | Generation took too long, try a faster model |
| Network error / Connection refused | Connection refused / Network error | 网络连接不稳定,请检查网络后重试 | Network connection unstable, check network and retry |
| Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁,请稍等片刻再试 | Too many requests, please wait a moment |
| Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容,请修改后重试 | Prompt contains restricted content, please modify |
| Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用,建议换个模型 | Model temporarily unavailable, try another model |
| Lyrics format error (Suno only) 🎵 | Invalid lyrics format | 歌词格式有误,请调整后重试 | Lyrics format error, adjust and retry |
| Prompt too short/long (Music) 🎵 | Prompt length invalid | 音乐描述过短或过长,请调整到合适长度(建议20-100字)| Music description too short or long, adjust to appropriate length (20-100 chars recommended) |
| Text too long (TTS) 🔊 | TTS text length | 文本过长,请缩短后重试 | Text too long, please shorten and retry |

Generic fallback (when error is unknown):

  • Chinese:
    生成过程遇到问题,请稍后重试或换个模型试试
  • English:
    Generation encountered an issue, please try again or use another model

Best Practices:

  1. Focus on user action: Tell users what to do next, not what went wrong technically
  2. Be reassuring: Use phrases like "建议换个模型试试" instead of "失败了"
  3. Avoid blame: Never say "你的提示词有问题" → say "提示词需要调整一下"
  4. Provide alternatives: Always suggest 1-2 alternative models in the failure message
  5. 🆕 Include actionable links (v1.0.8+): For 401/4008 errors, provide clickable links to API key generation or credit purchase pages
  6. 🎵 Music-specific (v1.2.0+):
    • For Suno lyrics errors, suggest simplifying the lyrics or using auto-generated lyrics (`auto_lyrics=true`)
    • For prompt length errors, give an example length (e.g., "建议20-100字")
    • For BGM requests, recommend DouBao BGM over Suno
  7. 🔊 TTS-specific: Use the TTS error translation table in "For TTS Tasks (text_to_speech)" above; suggest another model via `--list-models` or shortening the text.

Step 5 — Done (No Further Action Needed) 🆕

After sending Step 3 (success) or Step 4 (failure):

  1. DO NOT send any additional messages unless the user asks a follow-up question
  2. The task is complete — wait for the user's next request
  3. User preference has been saved (if generation succeeded)
  4. The conversation is ready for the next generation request

Why this step matters:

  • Prevents unnecessary "anything else?" messages that clutter the chat
  • Allows users to naturally continue the conversation when ready
  • Respects the asynchronous nature of IM platforms

Exception: If the user explicitly asks "还有别的吗?" or similar, then respond naturally.


🆕 Enhanced Error Handling (v1.0.8):

The Reflection mechanism (3 automatic retries) now provides specific, actionable suggestions for common errors:

  • 401 Unauthorized: System suggests generating a new API key with clickable link
  • 4008 Insufficient Points: System suggests purchasing credits with clickable link
  • 500 Internal Server Error: Automatic parameter degradation (size, resolution, duration, quality)
  • 6009 No Rule Match: Automatic parameter completion from credit_rules
  • 6010 Attribute Mismatch: Automatic credit_rule reselection
  • Timeout: Helpful info with dashboard link for background task status

All error handling is automatic and transparent — users receive natural language explanations with next steps.

Failure fallback by task type:

| Task Type | Failed Model | First Alt | Second Alt |
|---|---|---|---|
| text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) |
| text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) |
| text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) |
| image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) |
| text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) |
| text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) |
| text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) |
| image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) |
| image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) |
| first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) |
| text_to_music 🎵 | Suno | DouBao BGM (30pts, 背景音乐) | DouBao Song (30pts, 歌曲生成) |
| text_to_music 🎵 | DouBao BGM | DouBao Song (30pts) | Suno (25pts, 功能最强) |
| text_to_music 🎵 | DouBao Song | DouBao BGM (30pts) | Suno (25pts, 功能最强) |
| text_to_speech 🔊 | (any) | Query `--list-models` for alternatives | Use another model_id from the product list |

Music-specific failure guidance:

  • If Suno fails → Recommend DouBao BGM (for background music) or DouBao Song (for songs)
  • If DouBao BGM fails → Try DouBao Song first (similar pricing), then Suno (more powerful)
  • If DouBao Song fails → Try DouBao BGM first (similar pricing), then Suno (more powerful)
  • For lyrics errors in Suno → Suggest simplifying the lyrics or using `auto_lyrics=true`
  • For prompt length errors → Recommend 20-100 characters

TTS-specific failure guidance:

  • If TTS fails → Run `--task-type text_to_speech --list-models` and suggest another model_id, or shorten the text / simplify the content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.

Supported Models at a Glance

Source: production `GET /open/v1/product/list` (2026-02-27). Model count reduced significantly. Always query the product list API at runtime.

Image Generation (4 models each)

| Category | Name | model_id | Cost |
|---|---|---|---|
| text_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| text_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| text_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| text_to_image | Nano Banana Pro | gemini-3-pro-image | 10/10/18 pts |
| image_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| image_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| image_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| image_to_image | Nano Banana Pro | gemini-3-pro-image | 10 pts |

Midjourney attribute_ids: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
Nano Banana2 size options: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
Nano Banana Pro size options: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)

Image Model Capabilities (Parameter Support)

⚠️ Critical: Models have varying parameter support. Custom aspect ratios are now supported by multiple models.

| Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes |
|---|---|---|---|---|
| SeeDream 4.5 | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) |
| Nano Banana2 | Native support 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| Nano Banana Pro | Native support 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| Midjourney 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via `attribute_id` | Fixed 1024x1024, artistic style focus |

Key Capabilities:

  • Aspect ratio control: SeeDream 4.5 (virtual params), Nano Banana Pro/2 (native support)
  • 8K: Not supported by any model (max is 4K)
  • Size control: Nano Banana2, Nano Banana Pro, and Midjourney support multiple size options via different `attribute_id`s
  • Budget option: Nano Banana2 is the cheapest at 4 pts for 512px, but 4K costs 13pts
  • 🎨 Artistic styles: Midjourney excels at creative, artistic, and illustration styles
  • 💡 Best value: SeeDream 4.5 at 5pts offers aspect ratio flexibility; Nano Banana2 512px at 4pts for fastest/cheapest

Video Generation

IMA Sevio Family (Dynamic Routing)

| Name | model_id | Typical positioning | task_type support |
|---|---|---|---|
| IMA Video Pro (Sevio 1.0) | ima-pro | Higher quality / consistency | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | Lower latency / faster iteration | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |

Notes:

  • `ima-all-ai` does not hardcode Sevio availability; it resolves models from the runtime `product/list`.
  • If a Sevio model is temporarily absent for a category, treat the `--list-models` result as the final source.

| Category | Name | model_id | Cost Range |
|---|---|---|---|
| text_to_video (14) | Wan 2.6 🔥 | wan2.6-t2v | 25-120 pts |
| text_to_video | Hailuo 2.3 | MiniMax-Hailuo-2.3 | 32+ pts |
| text_to_video | Hailuo 2.0 | MiniMax-Hailuo-02 | 5+ pts |
| text_to_video | Vidu Q2 | viduq2 | 5-70 pts |
| text_to_video | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 20+ pts |
| text_to_video | Sora 2 Pro | sora-2-pro | 122+ pts |
| text_to_video | Kling O1 | kling-video-o1 | 48-120 pts |
| text_to_video | Kling 2.6 | kling-v2-6 | 80+ pts |
| text_to_video | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| text_to_video | Pixverse V5.5 / V5 / V4.5 / V4 / V3.5 | pixverse | 12-48 pts |
| image_to_video (14) | Wan 2.6 🔥 | wan2.6-i2v | 25-120 pts |
| image_to_video | Hailuo 2.3 / 2.0 | MiniMax-Hailuo-2.3/02 | 25-32 pts |
| image_to_video | Vidu Q2 Pro | viduq2-pro | 20-70 pts |
| image_to_video | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 47+ pts |
| image_to_video | Sora 2 Pro | sora-2-pro | 122+ pts |
| image_to_video | Kling O1 / 2.6 | kling-video-o1/v2-6 | 48-120 pts |
| image_to_video | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| image_to_video | Pixverse V5.5-V3.5 | pixverse | 12-48 pts |
| first_last_frame (11) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| first_last_frame | Kling 2.6 | kling-v2-6 | 80+ pts |
| first_last_frame | Others (9): Hailuo 2.0, Vidu Q2 Pro, SeeDance 1.5 Pro, Veo 3.1, Pixverse V5.5-V3.5 | (various) | — |
| reference_image (6) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| reference_image | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| reference_image | Others (4): Vidu Q2, Pixverse V5.5/V5/V4.5 | (various) | — |

| Category | Name | model_id | Cost |
|---|---|---|---|
| text_to_video | SeeDance 1.5 Pro / 1.0 Pro | doubao-seedance-1.5-pro / doubao-seedance-1.0-pro | 16 / 15 pts |
| text_to_video | Sora 2 Pro / Sora 2 | sora-2-pro / sora-2 | 120 / 35 pts |
| text_to_video | Kling O1 / 2.6 / 2.5 Turbo / 1.6 | kling-video-o1 / kling-v2-6 / kling-v2-5-turbo / kling-v1-6 | 48 / 80 / 24 / 32 pts |
| text_to_video | Google Veo 3.1 Fast / 3.1 / 3.0 | veo-3.1-fast-generate-preview / veo-3.1-generate-preview / veo-3.0-generate-preview | 55 / 140 / 280 pts |
| text_to_video | Pixverse V3.5–V5.5 | pixverse | 12 pts |
| image_to_video | Wan 2.6 / 2.6 Flash / 2.5 / 2.2 Plus | wan2.6-i2v / wan2.6-i2v-flash / wan2.5-i2v-preview / wan2.2-i2v-plus | 25 / 12 / 12 / 10 pts |
| image_to_video | Kling 2.1 Master | kling-v2-1-master | 150 pts |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | 70 pts |
| reference_image_to_video | Kling O1 / Vidu Q2 / Q1 | kling-video-o1 / viduq2 / viduq1 | 48 / 10 / 25 pts |

Music Generation

| Category | Name | model_id | Cost | Notes |
|---|---|---|---|---|
| text_to_music | Suno | sonic | 25 pts | sonic-v5; custom_mode, lyrics, vocal_gender |
| text_to_music | DouBao BGM | GenBGM | 30 pts | Background music |
| text_to_music | DouBao Song | GenSong | 30 pts | Song generation |

Speech (TTS) — text_to_speech

Models and credits are not fixed. Always call `GET /open/v1/product/list?category=text_to_speech` (or run the script with `--task-type text_to_speech --list-models`) to get the current model_id, attribute_id, and credit.

ima-all-ai has complete TTS capability: this document and the bundled `ima_create.py` provide full TTS support (routing, parameters, create/poll, UX protocol Steps 0–5, error translation). The ima-tts-ai skill is an optional standalone package with the same specification.

TTS Task Detail — Response Shape

Poll `POST /open/v1/tasks/detail` until completion. For TTS, `medias[]` uses the same structure as other IMA audio tasks:

| Field | Type | Meaning |
|---|---|---|
| resource_status | int or null | 0=处理中, 1=可用, 2=失败, 3=已删除;null 视为 0 |
| status | string | "pending" / "processing" / "success" / "failed" |
| url | string | Audio URL when resource_status=1 (mp3/wav) |
| duration_str | string | Optional, e.g. "12s" |
| format | string | Optional, e.g. "mp3", "wav" |

Success example: When all medias have `resource_status == 1` and `status != "failed"`, read `medias[0].url` (or `watermark_url`). Example:
`{"medias":[{"resource_status":1,"status":"success","url":"https://cdn.../output.mp3","duration_str":"12s","format":"mp3"}]}`

TTS Create Task — Request Shape

`task_type`: `"text_to_speech"`. No image input: `src_img_url: []`, `input_images: []`.
`prompt` (the text to speak) must be inside `parameters[].parameters`, not at top level. Extra fields (e.g. voice_id, speed) come from the product `form_config`; pass them via `--extra-params` and include only params present in the product's credit_rules/form_config.
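
Putting those rules together, a TTS create-request body might look like the sketch below. The `voice_id` / `speed` extras are hypothetical placeholders: send such fields only if they appear in the product's form_config.

```python
# Sketch of a TTS create-task body following the rules above. `voice_id` and
# `speed` are hypothetical extras: include them only if the product's
# form_config / credit_rules list them.
tts_request = {
    "task_type": "text_to_speech",
    "src_img_url": [],       # no image input for TTS
    "input_images": [],
    "parameters": [{
        "parameters": {
            "prompt": "Welcome to IMA Studio.",  # text to speak lives HERE,
                                                 # never at the top level
            # "voice_id": "...",  # hypothetical, from form_config
            # "speed": 1.0,       # hypothetical, from form_config
        }
    }],
}

assert "prompt" not in tts_request  # top level stays prompt-free
```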

TTS Common Mistakes

| Mistake | Fix |
|---|---|
| prompt at top level | Put prompt inside `parameters[].parameters` (the script does this) |
| Wrong or missing attribute_id | Always call the product list first; use credit_rules |
| Single poll | Poll until all medias have resource_status == 1 |
| Ignoring status when resource_status=1 | Check status != "failed" |
| Sending params not in form_config/credit_rules | Use only params from the product list; script reflection strips others on retry |

Always call `GET /open/v1/product/list?category=<type>` first to get the live `attribute_id` and `form_config` defaults required for task creation.

There are two equivalent route systems serving the same backend logic:

| Route | Auth | Use Case |
|---|---|---|
| /open/v1/ | Authorization: Bearer ima_* only | Third-party / agent access |
| /api/v3/ | Token + API Key (dual auth) | Frontend App |

This skill documents the `/open/v1/` Open API. All business logic (credit validation, N-flattening, risk control) runs identically on both paths.

Environment

Base URL: `https://api.imastudio.com`

Required/recommended headers for all `/open/v1/` endpoints:

| Header | Required | Value | Notes |
|---|---|---|---|
| Authorization | ✅ | Bearer ima_your_api_key_here | API key authentication |
| x-app-source | ✅ | ima_skills | Fixed value — identifies skill-originated requests |
| x_app_language | recommended | en / zh | Product label language; defaults to en if omitted |

Authorization: Bearer ima_your_api_key_here
x-app-source: ima_skills
x_app_language: en
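
As a sketch, these headers attach to every call like this (stdlib `urllib` only; the API key is a placeholder value, and the helper name is ours):

```python
import urllib.request

BASE_URL = "https://api.imastudio.com"

def open_api_request(path: str, api_key: str, lang: str = "en") -> urllib.request.Request:
    """Build a /open/v1/ request carrying the required and recommended headers."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "x-app-source": "ima_skills",  # fixed value for skill traffic
            "x_app_language": lang,        # "en" or "zh" product labels
        },
    )

req = open_api_request("/open/v1/product/list?category=text_to_image",
                       "ima_your_api_key_here")
# urllib normalizes stored header keys via str.capitalize():
print(req.headers["X-app-source"])  # ima_skills
```

Pass the resulting `Request` to `urllib.request.urlopen(req)` (or port the same headers to any HTTP client) to execute the call.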

📤 When to Upload Images (Quick Reference)

The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.

| Task Type | Input Required? | Upload Before Create? | Notes |
|---|---|---|---|
| text_to_image | ❌ No | — | Prompt only |
| image_to_image | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| text_to_video | ❌ No | — | Prompt only |
| image_to_video | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| first_last_frame_to_video | ✅ Yes (2 images) | ✅ Upload first | First + last frame |
| reference_image_to_video | ✅ Yes (1+ images) | ✅ Upload first | Reference image(s) |
| text_to_music | ❌ No | — | Prompt only |
| text_to_speech | ❌ No | — | Prompt only (text to speak) |

Upload flow:

  1. User provides local file path or bytes → call `prepare_image_url()` (see section below)
  2. User provides HTTPS URL → use directly, no upload needed
  3. Use the returned CDN URL (`fdl`) as the value for `input_images` / `src_img_url`

Example workflow (image_to_image):

# User provides local file
image_url = prepare_image_url("/path/to/photo.jpg", api_key)
# → Returns: https://ima-ga.esxscloud.com/webAgent/privite/2026/02/27/..._uuid.jpeg

# Then create task with this URL
create_task(
    task_type="image_to_image",
    input_images=[image_url],  # Use uploaded URL
    prompt="turn into oil painting"
)

⚠️ MANDATORY: Always Query Product List First

CRITICAL: You MUST call `/open/v1/product/list` BEFORE creating any task. The `attribute_id` field is REQUIRED in the create request. If it is `0` or missing, you get `"Invalid product attribute"` or `"Insufficient points"` and the task fails completely. NEVER construct a create request from the model table alone. Always fetch the product first.

How to get attribute_id (all task types)

# Query product list with the correct category
GET /open/v1/product/list?app=ima&platform=web&category=<task_type>
# task_type: text_to_image | image_to_image | text_to_video | image_to_video |
#            first_last_frame_to_video | reference_image_to_video | text_to_music | text_to_speech

# Walk the V2 tree to find your target model (type=3 leaf nodes only).
# Use a function: a bare `break` would only exit the inner loop.
def find_version(response, target_model_id):
    for group in response["data"]:
        for version in group.get("children", []):
            if version["type"] == "3" and version["model_id"] == target_model_id:
                return version
    raise ValueError(f"model_id {target_model_id!r} not found in product list")

version       = find_version(response, target_model_id)
attribute_id  = version["credit_rules"][0]["attribute_id"]
credit        = version["credit_rules"][0]["points"]
model_version = version["id"]    # = version_id / model_version
model_name    = version["name"]
form_defaults = {f["field"]: f["value"] for f in version["form_config"]}

Quick Reference: Known attribute_ids

Pre-queried values for convenience. Always call the product list at runtime for accuracy.

| Model | Task Type | model_id | attribute_id | credit | Notes |
|---|---|---|---|---|---|
| SeeDream 4.5 | text_to_image | `doubao-seedream-4.5` | 2341 | 5 pts | Default, balanced |
| Nano Banana Pro (1K) | text_to_image | `gemini-3-pro-image` | 2399 | 10 pts | 1024×1024 |
| Nano Banana Pro (2K) | text_to_image | `gemini-3-pro-image` | 2400 | 10 pts | 2048×2048 |
| Nano Banana Pro (4K) | text_to_image | `gemini-3-pro-image` | 2401 | 18 pts | 4096×4096 |
| Wan 2.6 (720P, 5s) | text_to_video | `wan2.6-t2v` | 2057 | 25 pts | Default, balanced |
| Wan 2.6 (1080P, 5s) | text_to_video | `wan2.6-t2v` | 2058 | 40 pts | |
| Wan 2.6 (720P, 10s) | text_to_video | `wan2.6-t2v` | 2059 | 50 pts | |
| Wan 2.6 (1080P, 10s) | text_to_video | `wan2.6-t2v` | 2060 | 80 pts | |
| Wan 2.6 (720P, 15s) | text_to_video | `wan2.6-t2v` | 2061 | 75 pts | |
| Wan 2.6 (1080P, 15s) | text_to_video | `wan2.6-t2v` | 2062 | 120 pts | |
| Kling O1 (5s, std) | text_to_video | `kling-video-o1` | 2313 | 48 pts | Latest Kling |
| Kling O1 (5s, pro) | text_to_video | `kling-video-o1` | 2314 | 60 pts | |
| Kling O1 (10s, std) | text_to_video | `kling-video-o1` | 2315 | 96 pts | |
| Kling O1 (10s, pro) | text_to_video | `kling-video-o1` | 2316 | 120 pts | |
| Suno (sonic-v4) | text_to_music | `sonic` | 2370 | 25 pts | Default |
| DouBao BGM | text_to_music | `GenBGM` | 4399 | 30 pts | |
| DouBao Song | text_to_music | `GenSong` | 4398 | 30 pts | |
| All others | any | | → query `/open/v1/product/list` | | Always runtime query |

⚠️ Production warning: `attribute_id` and `credit` values change frequently in production. Always call `/open/v1/product/list` at runtime; the table above is a pre-queried reference only (2026-02-27).

Common Mistakes (and resulting errors)

| Mistake | Error |
|---|---|
| `attribute_id` is `0` or missing | `"Invalid product attribute"` + `"Insufficient points"` |
| `attribute_id` outdated (production changed) | Same errors; always query product list first |
| `attribute_id` doesn't match parameter combination | Error 6010: "Attribute ID does not match the calculated rule" |
| `prompt` at outer `parameters[]` level | Prompt ignored; wrong routing |
| `cast` missing from inner `parameters.parameters` | Billing validation failure |
| `credit` value wrong or missing | Error 6006 |
| `model_name` / `model_version` missing | Wrong backend routing |
| Skipped product list, used table values directly | All of the above |

⚠️ Critical for Google Veo 3.1 and multi-rule models:

Models like Google Veo 3.1 have multiple `credit_rules`, each with a different `attribute_id` for a different parameter combination:

  • 720p + 4s + optimized → attribute_id A
  • 720p + 8s + optimized → attribute_id B
  • 4K + 4s + high → attribute_id C

The script automatically selects the correct `attribute_id` by matching your parameters (`duration`, `resolution`, `compression_quality`, `generate_audio`) against each rule's `attributes`. If the match fails, you get error 6010.

Fix: The bundled script now checks these video-specific parameters for smart credit_rule selection. Always use the script, not manual API construction.
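The rule matching described above can be sketched as follows. This is a minimal illustration, not the bundled script itself; it assumes each rule's `attributes` map uses the same keys as your request parameters, and the example `veo` node with attribute_ids 9001/9002 is hypothetical:

```python
def select_credit_rule(version: dict, requested: dict) -> dict:
    """Pick the credit_rule whose declared attributes all match the
    requested parameters (compared as strings). Raising here corresponds
    to the situation that produces error 6010 server-side."""
    for rule in version["credit_rules"]:
        attrs = rule.get("attributes", {})
        if all(str(requested.get(k)) == str(v) for k, v in attrs.items()):
            return rule
    raise ValueError("no credit_rule matches the requested parameters (error 6010)")


# Hypothetical Veo-style version node with two rules:
veo = {"credit_rules": [
    {"attribute_id": 9001, "points": 40,
     "attributes": {"resolution": "720p", "duration": "4"}},
    {"attribute_id": 9002, "points": 80,
     "attributes": {"resolution": "720p", "duration": "8"}},
]}

rule = select_credit_rule(veo, {"resolution": "720p", "duration": 8})
# rule carries both the attribute_id and the points for the cast field
```

The selected rule supplies `attribute_id`, `credit`, and `cast.points` for the create request in one place, so they cannot drift apart.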


Core Flow

1. GET /open/v1/product/list?app=ima&platform=web&category=<type>
   → REQUIRED: Get attribute_id, credit, model_version, model_name, form_config defaults

[If input image required]
2. Upload image → get public HTTPS URL
   → See "Image Upload" section below

3. POST /open/v1/tasks/create
   → Must include: attribute_id, model_name, model_version, credit, cast, prompt (nested!)

4. POST /open/v1/tasks/detail  {"task_id": "..."}
   → Poll until medias[].resource_status == 1
   → Extract url from completed media

Image Upload (Required Before Image Tasks)

The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.

When a user provides an image (local file, bytes, base64), you must upload it first and get a URL. This is exactly what the IMA frontend does before every image task.

Real Upload Flow (from IMA Frontend Source)

The frontend uses a two-step presigned URL flow via the IM platform:

Step 1: GET /api/rest/oss/getuploadtoken   → returns { ful, fdl }
          ful = presigned PUT URL (upload destination, expires ~7 days)
          fdl = final CDN download URL (use this as input_images value)

Step 2: PUT {ful}  with raw image bytes + Content-Type header
          → image is stored in Aliyun OSS: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com
          → accessible via CDN: https://ima-ga.esxscloud.com/...

Step 1: Get Upload Token

GET https://imapi.liveme.com/api/rest/oss/getuploadtoken

Required query parameters (11 total — sourced directly from frontend `generateUploadInfo`):

| Parameter | Example | Description |
|---|---|---|
| `appUid` | `ima_xxx...` | Use IMA API key directly — no separate login needed |
| `appId` | `webAgent` | App identifier (fixed) |
| `appKey` | `32jdskjdk320eew` | App secret (fixed, used for `sign` generation) |
| `cmimToken` | `ima_xxx...` | Use IMA API key directly — same as appUid |
| `sign` | `117CF6CF...` | IM auth HMAC: `SHA1("webAgent\|32jdskjdk320eew\|{timestamp}\|{nonce}").upper()` |
| `timestamp` | `1772042430` | Unix timestamp (seconds), generated per request |
| `nonce` | `CxI1FLI5ajLJZ1jlxZmeg` | Random nonce string, generated per request |
| `fService` | `privite` | Fixed: storage service type |
| `fType` | `picture` | `picture` for images, `video`, `audio` |
| `fSuffix` | `jpeg` | File extension: `jpeg`, `png`, `mp4`, `mp3` |
| `fContentType` | `image/jpeg` | MIME type of the file |

Simplified auth: use the IMA API key directly for the `appUid` and `cmimToken` parameters; no separate credential fetch is needed.

Response:

{
  "ful": "https://zhubite-imagent-bot.oss-us-east-1.aliyuncs.com/webAgent/privite/2026/02/26/..._uuid.jpeg?Expires=...&OSSAccessKeyId=...&Signature=...",
  "fdl": "https://ima-ga.esxscloud.com/webAgent/privite/2026/02/26/..._uuid.jpeg",
  "ful_expire": "...",
  "fdl_expire": "...",
  "fdl_key": "..."
}

Step 2: Upload Image via Presigned URL

PUT {ful}
Content-Type: image/jpeg
Body: [raw image bytes]

No auth headers needed — the presigned URL already encodes the credentials.

Step 3: Use `fdl` as the Image URL

After the PUT succeeds, use `fdl` (the CDN URL) as the value for `input_images` / `src_img_url`.

Python Implementation

import hashlib, time, uuid, requests, mimetypes

# ── 🌐 IMA Upload Service Endpoint (IMA-owned, for image/video uploads) ──────
IMA_IM_BASE = "https://imapi.liveme.com"

# ── 🔑 Hardcoded APP_KEY (Public, Shared Across All Users) ──────────────────
# This APP_KEY is a PUBLIC identifier used by IMA Studio's image/video upload 
# service. It is NOT a secret—it's intentionally shared across all users and 
# embedded in the IMA web frontend. This key is used to generate HMAC signatures 
# for upload token requests, but your IMA API key (ima_xxx...) is the ACTUAL 
# authentication credential. Think of APP_KEY as a "client ID" rather than a 
# "client secret."
#
# ⚠️ Security Note: Your ima_xxx... API key is the sensitive credential. It is 
# sent to imapi.liveme.com as query parameters (appUid, cmimToken). Always use 
# test keys for experiments and rotate your API key regularly.
#
# 📖 See SECURITY.md for complete disclosure and network verification guide.
APP_ID    = "webAgent"
APP_KEY   = "32jdskjdk320eew"   # Public shared key (used for HMAC sign generation)
APP_UID   = "<your_ima_api_key>"    # appUid: IMA API key works directly (frontend uses POST /api/v3/login/app -> data.user_id)
APP_TOKEN = "<your_ima_api_key>"    # cmimToken: same IMA API key (frontend uses data.token)


def _gen_sign() -> tuple[str, str, str]:
    """Generate per-request (sign, timestamp, nonce)."""
    nonce = uuid.uuid4().hex[:21]
    ts    = str(int(time.time()))
    raw   = f"{APP_ID}|{APP_KEY}|{ts}|{nonce}"
    sign  = hashlib.sha1(raw.encode()).hexdigest().upper()
    return sign, ts, nonce


def get_upload_token(app_uid: str, app_token: str,
                     suffix: str, content_type: str) -> dict:
    """Step 1: Get presigned upload URL from IMA's upload service.
    
    Calls GET imapi.liveme.com/api/rest/oss/getuploadtoken with exactly 11 params.
    Returns: { "ful": "<presigned PUT URL>", "fdl": "<CDN download URL>" }
    
    Args:
        app_uid: Your IMA API key (ima_xxx...), used as appUid parameter
        app_token: Your IMA API key (ima_xxx...), used as cmimToken parameter
        suffix: File extension (jpeg, png, mp4, mp3)
        content_type: MIME type (image/jpeg, video/mp4, etc.)
    
    Security Note:
        Your IMA API key (ima_xxx...) is sent to imapi.liveme.com as query 
        parameters (appUid, cmimToken). This is IMA Studio's image/video upload 
        service, separate from the main api.imastudio.com API. Both domains are 
        owned by IMA Studio—this is part of IMA's microservices architecture.
        
        Why two domains?
        - api.imastudio.com: Core AI generation API (product list, task creation)
        - imapi.liveme.com: Specialized upload service (presigned URL generation)
        
        Your API key grants access to both services. For security verification, 
        see SECURITY.md section "Network Traffic Verification."
    """
    sign, ts, nonce = _gen_sign()
    r = requests.get(
        f"{IMA_IM_BASE}/api/rest/oss/getuploadtoken",
        params={
            # App Key params
            "appUid":       app_uid,       # APP_UID
            "appId":        APP_ID,
            "appKey":       APP_KEY,
            "cmimToken":    app_token,     # APP_TOKEN
            "sign":         sign,
            "timestamp":    ts,
            "nonce":        nonce,
            # File params
            "fService":     "privite",     # fixed
            "fType":        "picture",     # picture / video / audio
            "fSuffix":      suffix,        # jpeg / png / mp4 / mp3
            "fContentType": content_type,
        },
    )
    r.raise_for_status()
    return r.json()["data"]


def upload_image_to_oss(image_bytes: bytes, content_type: str, ful: str) -> None:
    """Step 2: PUT image bytes to the presigned OSS URL. No auth needed."""
    resp = requests.put(ful, data=image_bytes, headers={"Content-Type": content_type})
    resp.raise_for_status()


def prepare_image_url(source, api_key: str) -> str:
    """
    Full workflow: upload any image and return the CDN URL (fdl).
    
    Args:
        source: file path (str), raw bytes, or already-public HTTPS URL
        api_key: IMA API key for upload authentication
    
    Returns: public HTTPS CDN URL ready to use as input_images value
    """
    # Already a public URL → use directly, no upload needed
    if isinstance(source, str) and source.startswith("https://"):
        return source
    
    # Read file bytes
    if isinstance(source, str):
        ext = source.rsplit(".", 1)[-1].lower() if "." in source else "jpeg"
        with open(source, "rb") as f:
            image_bytes = f.read()
        content_type = mimetypes.guess_type(source)[0] or "image/jpeg"
    else:
        image_bytes = source
        ext = "jpeg"
        content_type = "image/jpeg"

    # Step 1: Get presigned URL using API key directly
    token_data = get_upload_token(api_key, api_key, ext, content_type)
    ful = token_data["ful"]
    fdl = token_data["fdl"]

    # Step 2: Upload to OSS
    upload_image_to_oss(image_bytes, content_type, ful)

    # Step 3: Return CDN URL
    return fdl   # use this as input_images / src_img_url value

OSS path format: `webAgent/privite/{YYYY}/{MM}/{DD}/{timestamp}_{uid}_{uuid}.{ext}`
CDN base: `https://ima-ga.esxscloud.com/`
OSS bucket: `zhubite-imagent-bot.oss-us-east-1.aliyuncs.com`


Quick Reference

Task Types (category values)

| category | Capability | Input |
|---|---|---|
| `text_to_image` | Text → Image | prompt |
| `image_to_image` | Image → Image | prompt + input_images |
| `text_to_video` | Text → Video | prompt |
| `image_to_video` | Image → Video | prompt + input_images |
| `first_last_frame_to_video` | First+Last Frame → Video | prompt + src_img_url[2] |
| `reference_image_to_video` | Reference Image → Video | prompt + src_img_url[1+] |
| `text_to_music` | Text → Music | prompt |
| `text_to_speech` | Text → Speech | prompt (text to speak) |

Detail API status values

Each media in `medias[]` has two fields:

| Field | Type | Values | Description |
|---|---|---|---|
| `resource_status` | int (or `null`) | `0`, `1`, `2`, `3` | 0=processing, 1=available, 2=failed, 3=deleted. The API may return `null`; treat it as 0. |
| `status` | string | `"pending"`, `"processing"`, `"success"`, `"failed"` | Human-readable task state. When polling, `resource_status` is authoritative; `status == "failed"` means failure. |

Poll on `resource_status` first, then ensure `status` is not `"failed"`:

| resource_status | status | Meaning | Action |
|---|---|---|---|
| `0` or `null` | `pending` / `processing` | Processing | Keep polling; do not stop (null = 0) |
| `1` | `success` (or `completed`) | Done | Read `url`; stop only when all medias are 1 |
| `1` | `failed` | Failed (status wins) | Stop, handle error |
| `2` | any | Failed | Stop, handle error |
| `3` | any | Deleted | Stop |

Important: (1) Treat `resource_status: null` as 0. (2) Stop only when all medias have `resource_status == 1`. (3) When `resource_status=1`, still check `status != "failed"`.


API 1: Product List

GET /open/v1/product/list?app=ima&platform=web&category=text_to_image

Internally calls downstream `/v1/products/listv2`. Returns a V2 tree structure: `type=2` nodes are model groups, `type=3` nodes are versions (leaves). Only `type=3` nodes contain `credit_rules` and `form_config`.

`webAgent` is auto-converted to `ima` by the gateway — you can use either value for `app`.

[
  {
    "id": "SeeDream",
    "type": "2",
    "name": "SeeDream",
    "model_id": "",
    "children": [
      {
        "id": "doubao-seedream-4-0-250828",
        "type": "3",
        "name": "SeeDream 4.0",
        "model_id": "doubao-seedream-4.0",
        "credit_rules": [
          { "attribute_id": 332, "points": 5, "attributes": { "default": "enabled" } }
        ],
        "form_config": [
          { "field": "size", "type": "tags", "value": "1K",
            "options": [{"label":"1K","value":"1K"}, {"label":"2K","value":"2K"}] }
        ]
      }
    ]
  }
]

How to pick a version for task creation:

  1. Traverse nodes to find `type=3` leaves (versions)
  2. Use `model_id` and `id` (= `model_version`) from the leaf
  3. Pick the `credit_rules[].attribute_id` matching your desired quality/size (the `attributes` field shows the config)
  4. Use `form_config[].value` as default `parameters` values

`credit_rules[].attribute_id` → required for task creation as `attribute_id`.
`credit_rules[].points` → required for task creation as `credit` and `cast.points`.


API 2: Create Task

POST /open/v1/tasks/create

Request Structure

{
  "task_type": "text_to_image",
  "enable_multi_model": false,
  "src_img_url": [],
  "upload_img_src": "",
  "parameters": [
    {
      "attribute_id": 8538,
      "model_id":      "doubao-seedream-4.5",
      "model_name":    "SeeDream 4.5",
      "model_version": "doubao-seedream-4-5-251128",
      "app":           "ima",
      "platform":      "web",
      "category":      "text_to_image",
      "credit":        5,
      "parameters": {
        "prompt":       "a beautiful mountain sunset, photorealistic",
        "size":         "4k",
        "n":            1,
        "input_images": [],
        "cast":         {"points": 5, "attribute_id": 8538}
      }
    }
  ]
}

Field Reference

| Field | Required | Description |
|---|---|---|
| `task_type` | ✅ | Must match `parameters[].category` |
| `parameters[].attribute_id` | ✅ | From `credit_rules[].attribute_id` in product list |
| `parameters[].model_id` | ✅ | From `type=3` leaf node `model_id` |
| `parameters[].model_version` | ✅ | From `type=3` leaf node `id` |
| `parameters[].app` | ✅ | Use `ima` (or `webAgent`, auto-converted) |
| `parameters[].platform` | ✅ | Use `web` |
| `parameters[].category` | ✅ | Must match top-level `task_type` |
| `parameters[].credit` | ✅ | Must equal `credit_rules[].points`. Error 6006 if wrong. |
| `parameters[].parameters.prompt` | ✅ | The actual prompt text used by the downstream service |
| `parameters[].parameters.cast` | ✅ | `{"points": N, "attribute_id": N}` — mirrors credit |
| `parameters[].parameters.n` | ✅ | Number of outputs (usually `1`). Gateway flattens N>1 into separate resources. |
| `parameters[].parameters.input_images` | image tasks | Array of input image URLs |
| top-level `src_img_url` | multi-image | Array for first_last_frame / reference tasks |
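The field rules above can be collected into a payload builder. This is a sketch, not part of the official API: `build_create_payload` is a hypothetical helper name, it uses the first `credit_rules` entry (multi-rule models need the attribute-matching described earlier), and it seeds the inner parameters from `form_config` defaults:

```python
def build_create_payload(task_type, version, prompt,
                         input_images=None, overrides=None):
    """Assemble a /open/v1/tasks/create body from a type=3 product leaf.

    `version` is a leaf node from the product list; `overrides` lets the
    caller replace form_config defaults (e.g. {"size": "4k"}).
    """
    rule = version["credit_rules"][0]
    # Start from form_config defaults, then apply caller overrides.
    inner = {f["field"]: f["value"] for f in version.get("form_config", [])}
    inner.update(overrides or {})
    inner.update({
        "prompt": prompt,                 # nested prompt (required location)
        "n": 1,
        "input_images": input_images or [],
        "cast": {"points": rule["points"], "attribute_id": rule["attribute_id"]},
    })
    return {
        "task_type": task_type,
        "enable_multi_model": False,
        "src_img_url": input_images or [],
        "upload_img_src": "",
        "parameters": [{
            "attribute_id":  rule["attribute_id"],
            "model_id":      version["model_id"],
            "model_name":    version["name"],
            "model_version": version["id"],
            "app":           "ima",
            "platform":      "web",
            "category":      task_type,       # must match task_type
            "credit":        rule["points"],  # must equal credit_rules[].points
            "parameters":    inner,
        }],
    }
```

Because `credit`, `cast.points`, and `attribute_id` all come from the same rule object, the 6006/6010 mismatch errors in the table above cannot arise from a copy-paste slip.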

N-Field Flattening (Gateway Internal Logic)

When `n > 1`, the gateway automatically:

  1. Generates `n` independent `resourceBizId` values
  2. Deducts credits `n` times (one per resource)
  3. Creates `n` separate tasks in the downstream service

Response `medias[]` will contain `n` items. Poll until all have `resource_status == 1`.

Response

{
  "code": 0,
  "data": {
    "id": "task_abc123",
    "biz_id": "biz_xxx",
    "task_type": "text_to_image",
    "medias": [],
    "generate_count": 1,
    "created_at": 1700000000000,
    "timeout_at": 1700000300000
  }
}

`data.id` = task ID for polling. `timeout_at` = Unix ms deadline.


API 3: Task Detail (Poll)

POST /open/v1/tasks/detail
{"task_id": "<id from create response>"}

Poll every 2–5s (8s+ for video). Completed response:

{
  "id": "task_abc",
  "medias": [{
    "resource_status": 1,
    "status": "success",
    "url": "https://cdn.../output.jpg",
    "cover": "https://cdn.../cover.jpg",
    "format": "jpg",
    "width": 1024,
    "height": 1024
  }]
}

Polling stop condition (must implement exactly):

  • Treat `resource_status: null` (or missing) as 0 (processing). Do not stop when you see null; the backend may serialize a Go `*int` as null.
  • Stop only when ALL `medias[].resource_status == 1` and no `status == "failed"`. If you return on the first media with `resource_status == 1` while others are still 0, the task is not fully done and you will keep polling or get inconsistent state.
  • Stop immediately if any `status == "failed"`, `resource_status == 2`, or `resource_status == 3`.
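The stop conditions can be captured in a small classifier plus loop. A sketch with hypothetical names (`classify_medias`, `poll_task`); it assumes the detail response exposes `medias` at the top level as shown above, and takes the HTTP call as a callable so the decision logic stays testable:

```python
import time


def classify_medias(medias):
    """Apply the stop conditions: return "failed", "done", or "pending"."""
    for m in medias:
        rs = m.get("resource_status") or 0            # null/missing -> 0
        if m.get("status") == "failed" or rs in (2, 3):
            return "failed"
    # Done only when ALL medias reached 1 (and none failed above).
    if medias and all((m.get("resource_status") or 0) == 1 for m in medias):
        return "done"
    return "pending"


def poll_task(fetch_detail, task_id, interval=5.0, timeout=600.0):
    """Poll until done/failed. `fetch_detail` is any callable that POSTs
    /open/v1/tasks/detail with {"task_id": ...} and returns parsed JSON."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        detail = fetch_detail(task_id)
        state = classify_medias(detail.get("medias", []))
        if state != "pending":
            return state, detail
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
```

Use `interval=8.0` or higher for video tasks, matching the polling guidance above.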

Task Type Examples

text_to_image ✅ Verified

No image input: `src_img_url: []`, `input_images: []`. See API 2 for full example.

text_to_video ✅ Verified

Extra fields vs text_to_image — all from `form_config` defaults:

{
  "task_type": "text_to_video",
  "src_img_url": [],
  "parameters": [{
    "attribute_id":  4838,
    "model_id":      "wan2.6-t2v",
    "model_name":    "Wan 2.6",
    "model_version": "wan2.6-t2v",
    "category":      "text_to_video",
    "credit":        3,
    "app": "ima", "platform": "web",
    "parameters": {
      "prompt":          "a puppy dancing happily, sunny meadow",
      "negative_prompt": "",
      "prompt_extend":   false,
      "duration":        5,
      "resolution":      "1080P",
      "aspect_ratio":    "16:9",
      "shot_type":       "single",
      "seed":            -1,
      "n":               1,
      "input_images":    [],
      "cast":            {"points": 3, "attribute_id": 4838}
    }
  }]
}

Video-specific fields from `form_config`: `duration` (seconds), `resolution`, `aspect_ratio`, `shot_type`, `negative_prompt`, `prompt_extend`. Poll every 8s (video generation is slower). Response `medias[].cover` = first-frame thumbnail.

text_to_music

No image input: `src_img_url: []`, `input_images: []`.

image_to_image ✅ Verified

{
  "task_type": "image_to_image",
  "src_img_url": ["https://...input.jpg"],
  "parameters": [{
    "attribute_id":  8560,
    "model_id":      "doubao-seedream-4.5",
    "model_version": "doubao-seedream-4-5-251128",
    "category":      "image_to_image",
    "credit":        5,
    "app": "ima", "platform": "web",
    "parameters": {
      "prompt":       "turn into oil painting style",
      "size":         "4k",
      "n":            1,
      "input_images": ["https://...input.jpg"],
      "cast":         {"points": 5, "attribute_id": 8560}
    }
  }]
}

⚠️ `size` must be from `form_config` options (e.g. `"2k"`, `"4k"`, `"2048x2048"`). `"adaptive"` is NOT valid for SeeDream 4.5 i2i — causes error 400. Top-level `src_img_url` and `parameters.input_images` must both contain the input image URL. Some i2i models (e.g. `doubao-seededit-3.0-i2i`) may not be available in test environments — fall back to SeeDream 4.5.

image_to_video / first_last_frame_to_video / reference_image_to_video

{
  "src_img_url": ["https://first-frame.jpg", "https://last-frame.jpg"]
}

Index 0 = first frame (or reference), index 1 = last frame (first_last_frame only).


Common Mistakes

| Mistake | Fix |
|---|---|
| `attribute_id` not from `credit_rules` | Always fetch product list first |
| `credit` value wrong | Must exactly match `credit_rules[].points` — error 6006 |
| `prompt` at wrong location | Put prompt in `parameters[].parameters.prompt` (nested), not only at top level |
| Polling `biz_id` instead of `id` | Use `id` (task ID) for `/tasks/detail` |
| Single-poll instead of loop | Poll until `resource_status == 1` for ALL medias |
| Missing `app` / `platform` in parameters | Required fields — use `ima` / `web` |
| `category` mismatch | `parameters[].category` must match top-level `task_type` |
| `resource_status == 2` not handled | Check for failure, don't loop forever |
| `status == "failed"` ignored | `resource_status=1` + `status="failed"` means actual failure |
| `n > 1` and only checking first media | All `n` media items must reach `resource_status == 1` |

Complete Python Example

See the Python example sections throughout this documentation for implementation guidance covering all 7 task types.