git clone https://github.com/pexoai/pexo-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/pexoai/pexo-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/videoagent-image-studio" ~/.claude/skills/pexoai-pexo-skills-videoagent-image-studio && rm -rf "$T"
🎨 VideoAgent Image Studio
Use when: User asks to generate, draw, create, or make any kind of image, photo, illustration, icon, logo, or artwork.
Generate images with 8 state-of-the-art AI models. This skill automatically picks the best model for the job and handles all the complexity — including Midjourney's async polling — so you can focus on the conversation.
Quick Reference
| User Intent | Model | Speed |
|---|---|---|
| Artistic, cinematic, painterly | `midjourney` | ~15s |
| Photorealistic, portrait, product | `flux-pro` | ~8s |
| General purpose, balanced | | ~10s |
| Quick draft, fast iteration | `flux-schnell` | ~2s |
| Image with text, logo, poster | | ~10s |
| Vector art, icon, flat design | `recraft` | ~8s |
| Anime, stylized illustration | | ~5s |
| Gemini-powered, consistent style | | ~12s |
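The table above can be approximated with a small keyword router. This is only a sketch: `pickModel`, the keyword lists, and the `flux-pro` fallback are illustrative assumptions, and only the four model IDs that appear in this document's examples are used.

```javascript
// Hypothetical keyword router (not part of the skill itself).
// Rules are checked in order; the first matching keyword wins.
const RULES = [
  { model: "flux-schnell", keywords: ["draft", "quick", "fast", "preview"] },
  { model: "recraft", keywords: ["vector", "icon", "flat"] },
  { model: "flux-pro", keywords: ["photo", "portrait", "product"] },
  { model: "midjourney", keywords: ["artistic", "cinematic", "painterly"] },
];

// The fallback model is an assumption; pass a different one if needed.
function pickModel(request, fallback = "flux-pro") {
  const text = request.toLowerCase();
  for (const rule of RULES) {
    if (rule.keywords.some((kw) => text.includes(kw))) return rule.model;
  }
  return fallback;
}

console.log(pickModel("Show me a quick draft")); // flux-schnell
console.log(pickModel("Make me an App icon, flat style, blue theme")); // recraft
```

Substring matching is crude, so an explicit model request from the user should take precedence over any guess like this.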
How to Generate an Image
Step 1 — Enhance the prompt
Before calling the script, expand the user's prompt with style, lighting, and quality descriptors appropriate for the chosen model.
- Midjourney: Add `cinematic lighting`, `ultra detailed`, `--v 7`, `--style raw`
- Flux: Add `masterpiece`, `highly detailed`, `sharp focus`, `professional photography`
- Ideogram: Be explicit about text content, font style, and layout
- Recraft: Specify `vector illustration`, `flat design`, `icon style`
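The per-model hints above could be applied mechanically along these lines. This is a minimal sketch: `enhancePrompt` is a hypothetical helper, and treating every model ID that starts with `flux` as the "Flux" family is an assumption. Ideogram is left out because its guidance is about wording, not fixed descriptors.

```javascript
// Descriptors taken from the list above, keyed by model family.
const DESCRIPTORS = {
  midjourney: ["cinematic lighting", "ultra detailed"],
  flux: ["masterpiece", "highly detailed", "sharp focus", "professional photography"],
  recraft: ["vector illustration", "flat design", "icon style"],
};
const MIDJOURNEY_FLAGS = ["--v 7", "--style raw"];

function enhancePrompt(model, prompt) {
  // Assumption: "flux-pro", "flux-schnell", etc. all share the Flux hints.
  const family = model.startsWith("flux") ? "flux" : model;
  const base = prompt.trim();
  const lower = base.toLowerCase();
  // Skip descriptors the user already wrote.
  const extras = (DESCRIPTORS[family] || []).filter((d) => !lower.includes(d));
  let result = [base, ...extras].join(", ");
  if (family === "midjourney") {
    for (const flag of MIDJOURNEY_FLAGS) {
      const name = flag.split(" ")[0] + " ";
      if (!result.includes(name)) result += " " + flag;
    }
  }
  return result;
}

console.log(enhancePrompt("recraft", "a blue app icon"));
// a blue app icon, vector illustration, flat design, icon style
```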
Step 2 — Run the script
    node {baseDir}/tools/generate.js \
      --model <model_id> \
      --prompt "<enhanced prompt>" \
      --aspect-ratio <ratio>
All parameters:
| Parameter | Default | Description |
|---|---|---|
| `--model` | | Model ID from the table above |
| `--prompt` | (required) | The image generation prompt |
| `--aspect-ratio` | | Aspect ratio, e.g. 16:9, 3:4, 1:1 |
| | | Number of images (1–4; Midjourney always returns 4) |
| | none | Things to avoid (not supported by Midjourney) |
| | none | Seed for reproducibility |
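When the script is invoked programmatically, assembling the arguments as an array avoids shell-quoting problems with long prompts. The sketch below uses only the three flags that appear in the command above (`--model`, `--prompt`, `--aspect-ratio`); `buildGenerateArgs` is a hypothetical helper, not something the skill ships.

```javascript
// Build an argv array for tools/generate.js.
// Only --prompt is treated as required, following the parameter table.
function buildGenerateArgs({ model, prompt, aspectRatio }) {
  if (!prompt) throw new Error("prompt is required");
  const args = [];
  if (model) args.push("--model", model);
  args.push("--prompt", prompt);
  if (aspectRatio) args.push("--aspect-ratio", aspectRatio);
  return args;
}

const args = buildGenerateArgs({
  model: "flux-pro",
  prompt: "a luxury perfume bottle on a clean white background",
  aspectRatio: "3:4",
});
console.log(args.join(" "));
```

The resulting array can be passed to Node's `child_process.execFile("node", [scriptPath, ...args], callback)`, where `scriptPath` is the resolved `{baseDir}/tools/generate.js`.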
Step 3 — Return the result
The script always waits and returns the final image URL(s). No polling required.
{ "success": true, "model": "flux-pro", "imageUrl": "https://...", "images": ["https://..."] }
Send the `imageUrl` to the user.
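A caller might read the script's output roughly as follows. This is a sketch under two assumptions: the JSON arrives as a single string on stdout, and a failed run is signalled by `success` being false (the failure shape is not documented here). `extractImageUrls` is a hypothetical name.

```javascript
// Parse the JSON result and pull out the URL(s) to show the user.
function extractImageUrls(stdout) {
  const result = JSON.parse(stdout);
  if (!result.success) throw new Error("image generation failed");
  const images = Array.isArray(result.images) ? result.images : [];
  const primary = result.imageUrl || images[0];
  if (!primary) throw new Error("no image URL in result");
  return { primary, images: images.length > 0 ? images : [primary] };
}

const sample =
  '{"success":true,"model":"flux-pro","imageUrl":"https://example.com/a.png","images":["https://example.com/a.png"]}';
console.log(extractImageUrls(sample).primary); // https://example.com/a.png
```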
Midjourney Actions
After generating a 4-image grid with Midjourney, offer the user these options:
    # Upscale image #2 (subtle, preserves details)
    node {baseDir}/tools/generate.js \
      --model midjourney \
      --action upscale \
      --index 2 \
      --job-id <job_id>

    # Create a strong variation of image #3
    node {baseDir}/tools/generate.js \
      --model midjourney \
      --action variation \
      --index 3 \
      --job-id <job_id> \
      --variation-type 1

    # Regenerate with same prompt
    node {baseDir}/tools/generate.js \
      --model midjourney \
      --action reroll \
      --job-id <job_id>
Upscale types: 0 = Subtle (default, best for photos), 1 = Creative (best for illustrations)
Variation types: 0 = Subtle (default), 1 = Strong (dramatic changes)
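The three follow-up actions share most of their arguments, so a small builder can validate them before the script is called. This is a sketch: `buildMidjourneyActionArgs` is a hypothetical helper, and it emits only flags that appear in the commands above. No flag for the upscale type is shown in this document, so none is generated.

```javascript
// Build argv for a Midjourney follow-up action on an existing job.
function buildMidjourneyActionArgs({ action, jobId, index, variationType }) {
  if (!["upscale", "variation", "reroll"].includes(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  if (!jobId) throw new Error("jobId is required");
  const args = ["--model", "midjourney", "--action", action];
  if (action !== "reroll") {
    // The grid has four images, so the index must be 1 to 4.
    if (!Number.isInteger(index) || index < 1 || index > 4) {
      throw new Error("index must be an integer from 1 to 4");
    }
    args.push("--index", String(index));
  }
  args.push("--job-id", jobId);
  if (action === "variation" && variationType !== undefined) {
    if (variationType !== 0 && variationType !== 1) {
      throw new Error("variationType must be 0 (subtle) or 1 (strong)");
    }
    args.push("--variation-type", String(variationType));
  }
  return args;
}

console.log(
  buildMidjourneyActionArgs({ action: "variation", jobId: "job123", index: 3, variationType: 1 }).join(" ")
);
// --model midjourney --action variation --index 3 --job-id job123 --variation-type 1
```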
Example Conversations
User: "Draw a snow leopard on a snowy mountain with cinematic lighting"
    # Choose midjourney for artistic quality
    node {baseDir}/tools/generate.js \
      --model midjourney \
      --prompt "a majestic snow leopard on a snowy mountain peak, cinematic lighting, dramatic atmosphere, ultra detailed --ar 16:9 --v 7" \
      --aspect-ratio 16:9
🎨 Done! Which one to upscale? (U1-U4) Or create a variant? (V1-V4)
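If the user answers with one of the shortcuts offered above (U1-U4 or V1-V4), the reply can be mapped to an action and grid index. The sketch below is illustrative only: `parseGridChoice` is a hypothetical helper, and accepting lowercase letters or surrounding spaces is an assumption.

```javascript
// Map a reply such as "U2" or "v3" to { action, index }; return null otherwise.
function parseGridChoice(reply) {
  const match = /^\s*([uv])\s*([1-4])\s*$/i.exec(reply);
  if (!match) return null;
  return {
    action: match[1].toLowerCase() === "u" ? "upscale" : "variation",
    index: Number(match[2]),
  };
}

console.log(parseGridChoice("U2")); // { action: 'upscale', index: 2 }
console.log(parseGridChoice("make it bluer")); // null
```

A `null` result means the reply is free text and should be handled as a normal conversational turn.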
User: "Use Flux to generate a perfume product poster, white background"
    # Choose flux-pro for photorealistic product shots
    node {baseDir}/tools/generate.js \
      --model flux-pro \
      --prompt "a luxury perfume bottle on a clean white background, professional product photography, soft shadows, 8k, highly detailed" \
      --aspect-ratio 3:4
User: "Show me a quick draft"
    # flux-schnell for instant previews
    node {baseDir}/tools/generate.js \
      --model flux-schnell \
      --prompt "..." \
      --aspect-ratio 1:1
User: "Make me an App icon, flat style, blue theme"
    # recraft for vector/icon style
    node {baseDir}/tools/generate.js \
      --model recraft \
      --prompt "a minimal flat design app icon, blue color scheme, simple geometric shapes, vector style, white background"
Setup
Zero API keys needed! All requests go through a hosted proxy that handles authentication server-side.
The skill works out of the box — just install and use.
Advanced: Custom proxy or token
If you want to use your own proxy or a persistent token, set these environment variables:
    {
      "skills": {
        "entries": {
          "videoagent-image-studio": {
            "enabled": true,
            "env": {
              "IMAGE_STUDIO_PROXY_URL": "https://your-proxy.vercel.app",
              "IMAGE_STUDIO_TOKEN": "your_token_here"
            }
          }
        }
      }
    }
| Variable | Required | Description |
|---|---|---|
| `IMAGE_STUDIO_PROXY_URL` | No | Custom proxy base URL (defaults to the hosted proxy) |
| `IMAGE_STUDIO_TOKEN` | No | Persistent token (auto-obtained if not set, 100 free uses per token) |
To deploy your own proxy, see the videoagent-audio-studio proxy as a reference implementation. You'll need `FAL_KEY` and `LEGNEXT_KEY` as Vercel environment variables.
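How the two optional variables might be read can be sketched as below. The variable names come from the configuration above; `resolveProxyConfig` is hypothetical, and returning `null` to mean "use the hosted proxy" or "obtain a token automatically" is an assumption about how a caller would signal the defaults.

```javascript
// Read the optional overrides from an env-like object (e.g. process.env).
function resolveProxyConfig(env) {
  // Trim whitespace and trailing slashes so paths can be appended safely.
  const proxyUrl = (env.IMAGE_STUDIO_PROXY_URL || "").trim().replace(/\/+$/, "");
  const token = (env.IMAGE_STUDIO_TOKEN || "").trim();
  return {
    proxyUrl: proxyUrl || null, // null: fall back to the hosted proxy
    token: token || null, // null: let a token be obtained automatically
  };
}

console.log(resolveProxyConfig({ IMAGE_STUDIO_PROXY_URL: "https://your-proxy.vercel.app/" }));
// { proxyUrl: 'https://your-proxy.vercel.app', token: null }
```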
Changelog
v2.0.0
- Simplified async: The script now blocks until Midjourney completes. No more `--async`/`--poll` flags needed in SKILL.md instructions.
- Unified output format: All models return the same `{ success, imageUrl, images }` shape.
- Reference images for Nano Banana: Pass `--reference-images "url1,url2"` for character/style consistency across generations.
v1.3.0
- Added non-blocking async mode for Midjourney (`--async` + `--poll`).
v1.2.0
- Midjourney turbo mode enabled by default (~10-20s).
v1.1.0
- Switched Midjourney provider from TTAPI to Legnext.ai for better stability.
v1.0.0
- Initial release with Midjourney, Flux, SDXL, Nano Banana, Ideogram, Recraft.