Skills muapi-nano-banana
Reasoning-driven image generation using structured creative briefs (Gemini 3 style) — generates high-fidelity images via muapi.ai with logic-based prompting
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/anil-matcha/muapi-nano-banana-skill" ~/.claude/skills/openclaw-skills-muapi-nano-banana && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/anil-matcha/muapi-nano-banana-skill" ~/.openclaw/skills/openclaw-skills-muapi-nano-banana && rm -rf "$T"
manifest:
skills/anil-matcha/muapi-nano-banana-skill/SKILL.mdsource content
🍌 Nano-Banana Expert Skill (Gemini 3 Style)
A specialized skill for AI Agents to leverage "Reasoning-Driven" image generation. Based on the advanced prompting architecture of Google's Gemini 3 (Nano Banana Pro), this skill moves beyond keyword stuffing to structured, logic-based creative briefs.
Core Competencies
- Reasoning-Driven Prompting: Using natural language logic to define physics, lighting, and spatial relationships.
- Structured Creative Briefs: Implementing the "Perfect Prompt" formula:
.Subject + Action + Context + Composition + Lighting - Text Rendering Precision: Explicitly defining typography and signifiers for legible text integration.
- Contextual Grounding: Using "Search Grounding" logic (simulated) to anchor generations in real-world accuracy.
🏗️ Technical Specification
1. The "Perfect Prompt" Formula
| Component | Description | Example |
|---|---|---|
| Subject | Detailed entity description | "A stoic robot barista with exposed copper wiring" |
| Action | Dynamic interaction | "Pouring a latte art leaf with mechanical precision" |
| Context | Environment & Atmosphere | "Inside a neon-lit cyberpunk cafe at midnight" |
| Composition | Camera & Lens choice | "Close-up, 85mm lens, f/1.8 aperture" |
| Lighting | Mood & Direction | "Volumetric blue rim light, warm cafe glow" |
| Style | Aesthetic anchor | "Cinematic, photorealistic, 4K production value" |
2. Advanced Features
- Negative Constraint Logic: Instead of "no blurry," use "Ensure sharp focus on the subject's eyes."
- Identity Consistency: (Simulated) "Maintain consistent facial structure across variations."
- Text Integration: Use double quotes for specific text:
.The sign reads "OPEN 24/7"
🧠 Prompt Optimization Protocol (Agent Instruction)
Before calling the script, the Agent MUST rewrite the user's prompt into a logic-driven Reasoning Brief:
- NO KEYWORD SOUP: Remove "8k, masterpiece, ultra-detailed." Use full, descriptive sentences.
- PHYSICAL CONSISTENCY: Describe how elements interact (e.g., "The light from the crystal shards casts caustic patterns across the obsidian floor").
- TEXT PRECISION: If the user wants text, define it precisely:
.featuring a sign that says "STORE NAME" in a weathered serif font - OPTICAL DIRECTIVES: Specify lens behavior: Shallow Depth of Field (f/1.8), Macro Lens, Anamorphic Flare.
🚀 Protocol: Using Nano-Banana
Step 1: Define the Creative Logic
Provide the agent with a subject and a specific scenario.
Step 2: Invoke the Script
The
generate-nano-art.sh script translates the logic into a structured Gemini 3-style prompt.
# Generating a reasoning-driven image bash scripts/generate-nano-art.sh \ --subject "a glass chess piece" \ --action "shattering into liquid shards" \ --context "on a obsidian table" \ --style "macro photography"
⚠️ Constraints & Guardrails
- No Keyword Soup: MANDATORY - Do not use "trending on artstation, masterpiece, 8k". Use natural language descriptions.
- Physics Logic: Ensure the prompt describes physically possible lighting and reflection interactions.
- Full Sentences: The model parses relationships; use "light reflecting off the water" instead of "water, reflection".
⚙️ Implementation Details
This skill applies a "Logic Wrapper" around the
core/media/generate-image.sh primitive, converting fragmented inputs into a coherent, reasoning-ready narrative prompt.