Auto-claude-code-research-in-sleep paper-illustration
Generate publication-quality AI illustrations for academic papers using Gemini image generation. Creates architecture diagrams, method illustrations with Codex-supervised iterative refinement loop. Use when user says \"生成图表\", \"画架构图\", \"AI绘图\", \"paper illustration\", \"generate diagram\", or needs visual figures for papers.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/skills-codex/paper-illustration" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-paper-illustration-1c876e && rm -rf "$T"
skills/skills-codex/paper-illustration/SKILL.mdPaper Illustration: Multi-Stage Codex-Supervised Figure Generation
Generate publication-quality illustrations using a multi-stage workflow with Codex as the STRICT supervisor/reviewer.
Core Design Philosophy
┌──────────────────────────────────────────────────────────────────────────┐ │ MULTI-STAGE ITERATIVE WORKFLOW │ ├──────────────────────────────────────────────────────────────────────────┤ │ │ │ User Request │ │ │ │ │ ▼ │ │ ┌─────────────┐ │ │ │ Codex │ ◄─── Step 1: Parse request, create initial prompt │ │ │ (Planner) │ │ │ └──────┬──────┘ │ │ │ │ │ ▼ │ │ ┌─────────────┐ │ │ │ Gemini │ ◄─── Step 2: Optimize layout description │ │ │ (gemini-3-pro)│ - Refine component positioning │ │ │ Layout │ - Optimize spacing and grouping │ │ └──────┬──────┘ │ │ │ │ │ ▼ │ │ ┌─────────────┐ │ │ │ Gemini │ ◄─── Step 3: CVPR/NeurIPS style verification │ │ │ (gemini-3-pro)│ - Check color palette compliance │ │ │ Style │ - Verify arrow and font standards │ │ └──────┬──────┘ │ │ │ │ │ ▼ │ │ ┌─────────────┐ │ │ │ Paperbanana │ ◄─── Step 4: Render final image │ │ │ (gemini-3- │ - High-quality image generation │ │ │ pro-image) │ - Internal codename: Nano Banana Pro │ │ └──────┬──────┘ │ │ │ │ │ ▼ │ │ ┌─────────────┐ │ │ │ Codex │ ◄─── Step 5: STRICT visual review + SCORE (1-10) │ │ │ (Reviewer) │ - Verify EVERY arrow direction │ │ │ STRICT! │ - Verify EVERY block content │ │ └──────┬──────┘ - Verify aesthetics & visual appeal │ │ │ │ │ ▼ │ │ Score ≥ 9? ──YES──► Accept & Output │ │ │ │ │ NO │ │ │ │ │ ▼ │ │ Generate SPECIFIC improvement feedback ──► Loop back to Step 2 │ │ │ └──────────────────────────────────────────────────────────────────────────┘
Constants
- IMAGE_MODEL =
— Paperbanana (Nano Banana Pro) for image renderinggemini-3-pro-image-preview - REASONING_MODEL =
— Gemini for layout optimization and style checkinggemini-3-pro-preview - MAX_ITERATIONS = 5 — Maximum refinement rounds
- TARGET_SCORE = 9 — Minimum acceptable score (1-10) — RAISED FOR QUALITY
- OUTPUT_DIR =
— Output directoryfigures/ai_generated/ - API_KEY_ENV =
— Environment variableGEMINI_API_KEY
CVPR/ICLR/NeurIPS Top-Tier Conference Style Guide
What "CVPR Style" Actually Means:
Visual Standards
- Clean white background — No decorative patterns or gradients (unless subtle)
- Sans-serif fonts — Arial, Helvetica, or Computer Modern; minimum 14pt
- Subtle color palette — Not rainbow colors; use 3-5 coordinated colors
- Print-friendly — Must be readable in grayscale (many reviewers print papers)
- Professional borders — Thin (2-3px), solid colors, not flashy
Layout Standards
- Horizontal flow — Left-to-right is the standard for pipelines
- Clear grouping — Use subtle background boxes to group related modules
- Consistent sizing — Similar components should have similar sizes
- Balanced whitespace — Not cramped, not sparse
Arrow Standards (MOST CRITICAL)
- Thick strokes — 4-6px minimum (thin arrows disappear when printed)
- Clear arrowheads — Large, filled triangular heads
- Dark colors — Black or dark gray (#333333); avoid colored arrows
- Labeled — Every arrow should indicate what data flows through it
- No crossings — Reorganize layout to avoid arrow crossings
- CORRECT DIRECTION — Arrows must point to the RIGHT target!
Visual Appeal (科研风格 - Professional Academic Style)
目标:既不保守也不花哨,找到平衡点
✅ 应该有的视觉元素:
- Subtle gradient fills — 淡雅的渐变填充(同色系从浅到深),不是炫彩
- Rounded corners — 圆角矩形(6-10px radius),现代感但不夸张
- Clear visual hierarchy — 通过大小、颜色深浅区分层次
- Consistent color coding — 统一的配色方案(3-4种主色)
- Internal structure — 大模块内部显示子组件(如Encoder内部的layer结构)
- Professional typography — 清晰的标签,适当的字号层次
✅ 配色建议(学术专业):
- Inputs: 柔和的绿色系 (#10B981 / #34D399)
- Encoders: 专业的蓝色系 (#2563EB / #3B82F6)
- Fusion: 优雅的紫色系 (#7C3AED / #8B5CF6)
- Outputs: 温暖的橙色系 (#EA580C / #F97316)
- Arrows: 黑色或深灰 (#333333 / #1F2937)
- Background: 纯白 (#FFFFFF),不要花纹
❌ 要避免的过度装饰:
- ❌ Rainbow color schemes (彩虹配色)
- ❌ Heavy drop shadows (重阴影效果)
- ❌ 3D effects / perspective (3D透视)
- ❌ Excessive gradients (夸张的多色渐变)
- ❌ Clip art / cartoon icons (卡通图标)
- ❌ Decorative patterns in background (背景花纹)
- ❌ Glowing effects (发光效果)
- ❌ Too many small icons (过多小图标)
✓ 理想的视觉效果:
- 一眼看上去专业、清晰
- 有适度的视觉吸引力,但不抢眼
- 符合CVPR/NeurIPS论文的审美标准
- 打印友好(灰度模式下也能清晰辨认)
- 像精心设计的学术图表,而不是PPT模板
What to AVOID (CRITICAL)
- ❌ Rainbow color schemes (too many colors)
- ❌ Thin, hairline arrows (arrows must be THICK)
- ❌ Unlabeled connections
- ❌ Plain boring rectangles (add some visual interest)
- ❌ Over-decorated with shadows/glows/icons (too flashy)
- ❌ Small text that's unreadable when printed
- ❌ WRONG arrow directions — This is UNACCEPTABLE!
Scope
| Figure Type | Quality | Examples |
|---|---|---|
| Architecture diagrams | Excellent | Model architecture, pipeline, encoder-decoder |
| Method illustrations | Excellent | Conceptual diagrams, algorithm flowcharts |
| Conceptual figures | Good | Comparison diagrams, taxonomy trees |
Not for: Statistical plots (use
/paper-figure), photo-realistic images
Workflow: MUST EXECUTE ALL STEPS
Step 0: Pre-flight Check
# Check API key if [ -z "$GEMINI_API_KEY" ]; then echo "ERROR: GEMINI_API_KEY not set" echo "Get your key from: https://aistudio.google.com/app/apikey" echo "Set it: export GEMINI_API_KEY='your-key'" exit 1 fi # Create output directory mkdir -p figures/ai_generated
Step 1: Codex Plans the Figure (YOU ARE HERE)
CRITICAL: Codex must first analyze the user's request and create a detailed prompt.
Parse the input: $ARGUMENTS
Codex's task:
- Understand what figure the user wants
- Identify all components, connections, data flow
- Create a detailed, structured prompt for Gemini
- Include style requirements AND visual appeal requirements
Prompt Template for Codex to generate:
Create a PROFESSIONAL, VISUALLY APPEALING publication-quality academic diagram following CVPR/ICLR/NeurIPS standards. ## Visual Style: 科研风格 (Academic Professional Style) ### 目标:平衡 — 既不保守也不花哨 #### DO (应该有): - **Subtle gradients** — 同色系淡雅渐变(如 #2563EB → #3B82F6),不是多色炫彩 - **Rounded corners** — 圆角矩形(6-10px),现代感 - **Clear visual hierarchy** — 通过大小、深浅区分层次 - **Internal structure** — 大模块内显示子组件结构 - **Consistent color coding** — 统一的3-4色方案 - **Professional polish** — 精致但不夸张 #### DON'T (不要有): - ❌ Rainbow/multi-color gradients (彩虹渐变) - ❌ Heavy drop shadows (重阴影) - ❌ 3D effects / perspective (3D效果) - ❌ Glowing effects (发光效果) - ❌ Excessive decorative icons (过多装饰图标) - ❌ Plain boring rectangles (完全平淡的方块) #### 理想效果: 像顶会论文中精心设计的架构图 — 专业、清晰、有适度的视觉吸引力 ## Figure Type [Architecture Diagram / Pipeline / Comparison / etc.] ## Components to Include (BE SPECIFIC ABOUT CONTENT) 1. [Component 1]: - Label: "[exact text]" - Sub-label: "[smaller text below]" - Position: [left/center/right, top/middle/bottom] - Style: [border color, fill, internal structure] 2. [Component 2]: ... ## Layout - Direction: [left-to-right / top-to-bottom] - Spacing: [tight / normal / loose] - Grouping: [how components should be grouped] ## Connections (BE EXPLICIT ABOUT DIRECTION) EXACT arrow specifications: 1. [Component A] → [Component B]: Arrow goes FROM A TO B, label it "[data type]" 2. [Component C] → [Component D]: Arrow goes FROM C TO D, label it "[data type]" ... VERIFY: Each arrow must point to the CORRECT target! ## Style Requirements (CVPR/ICLR/NeurIPS Standard) ### Visual Style - Color palette: Professional academic colors - Inputs: Green (#10B981) - Encoders: Blue (#2563EB) - Fusion modules: Purple (#7C3AED) - Outputs: Orange (#EA580C) - Font: Sans-serif (Arial/Helvetica), minimum 14pt, bold for labels - Background: Clean white, no patterns - Blocks: Rounded rectangles (8-12px radius), subtle gradient fill, colored border (2-3px) - Subtle shadows for depth effect - Print-friendly (must work in grayscale) ### CRITICAL: Arrow & Data Flow Requirements 1. **ALL arrows must be VERY THICK** - minimum 5-6px stroke width 2. **ALL arrows must have CLEAR arrowheads** - large, visible triangular heads 3. **ALL arrows must be BLACK or DARK GRAY** - not colored 4. **Label EVERY arrow** with what data flows through it 5. **VERIFY arrow direction** - each arrow MUST point to the correct target 6. **No ambiguous connections** - every arrow should have a clear source and destination ### Logic Clarity Requirements 1. **Data flow must be immediately obvious** - viewer should understand the pipeline in 5 seconds 2. **No crossing arrows** - reorganize layout to avoid arrow crossings 3. **Consistent direction** - maintain left-to-right or top-to-bottom flow throughout 4. **Group related components** - use subtle background boxes or spacing to group modules 5. **Clear hierarchy** - main components larger, sub-components smaller ## Additional Requirements [Any specific requirements from user]
Step 2: Gemini Layout Optimization (gemini-3-pro)
Codex sends the initial prompt to Gemini (gemini-3-pro) for layout optimization.
#!/bin/bash # Step 2: Optimize layout using Gemini gemini-3-pro # This step refines component positioning and spacing set -e OUTPUT_DIR="figures/ai_generated" mkdir -p "$OUTPUT_DIR" API_KEY="${GEMINI_API_KEY}" URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY" # The initial prompt from Codex INITIAL_PROMPT='[Codex fills in the detailed prompt here]' # Layout optimization request LAYOUT_REQUEST="You are an expert in academic figure layout design for CVPR/NeurIPS papers. Analyze this figure request and provide an OPTIMIZED LAYOUT DESCRIPTION: $INITIAL_PROMPT Provide: 1. **Optimized Component Positions**: Exact positions (left/center/right, top/middle/bottom) for each component 2. **Spacing Recommendations**: Specific spacing between components 3. **Grouping Strategy**: Which components should be visually grouped together 4. **Arrow Routing**: Optimal paths for arrows to avoid crossings 5. **Visual Hierarchy**: Size recommendations for main vs sub-components Output a DETAILED layout specification that will be used for rendering." # Build JSON payload python3 << PYTHON import json payload = { "contents": [{"parts": [{"text": '''$LAYOUT_REQUEST'''}]}] } with open("/tmp/gemini_layout_request.json", "w") as f: json.dump(payload, f, indent=2) print("Layout request created") PYTHON # Call Gemini gemini-3-pro-preview for layout optimization (DIRECT connection, no proxy) RESPONSE=$(curl -s --max-time 90 \ -X POST "$URL" \ -H 'Content-Type: application/json' \ -d @/tmp/gemini_layout_request.json) # Extract layout description LAYOUT_DESCRIPTION=$(echo "$RESPONSE" | python3 -c " import sys, json data = json.load(sys.stdin) try: print(data['candidates'][0]['content']['parts'][0]['text']) except: print('Error extracting layout') ") echo "=== Layout Optimization Complete ===" echo "$LAYOUT_DESCRIPTION" echo "$LAYOUT_DESCRIPTION" > "$OUTPUT_DIR/layout_description.txt"
Step 3: Gemini Style Verification (gemini-3-pro)
Codex sends the optimized layout to Gemini for CVPR/NeurIPS style verification.
#!/bin/bash # Step 3: Verify and enhance style compliance using Gemini gemini-3-pro API_KEY="${GEMINI_API_KEY}" URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY" # Read layout from previous step LAYOUT=$(cat figures/ai_generated/layout_description.txt) # Style verification request STYLE_REQUEST="You are a CVPR/NeurIPS paper figure reviewer specializing in visual standards. Review and ENHANCE this figure specification for top-tier conference compliance: $LAYOUT Ensure compliance with: 1. **Color Palette**: Use professional academic colors (green for inputs, blue for encoders, purple for fusion, orange for outputs) 2. **Arrow Standards**: Thick (5-6px), black/dark gray, clear arrowheads, all labeled 3. **Font Standards**: Sans-serif, minimum 14pt, readable in print 4. **Visual Appeal (科研风格)**: - ✅ Subtle same-color gradients, rounded corners (6-10px), internal structure visible - ❌ NO heavy shadows, NO glowing effects, NO rainbow gradients Output an ENHANCED figure specification with explicit style instructions for rendering." # Build JSON payload python3 << PYTHON import json payload = { "contents": [{"parts": [{"text": '''$STYLE_REQUEST'''}]}] } with open("/tmp/gemini_style_request.json", "w") as f: json.dump(payload, f, indent=2) print("Style request created") PYTHON # Call Gemini gemini-3-pro-preview for style verification (DIRECT connection, no proxy) RESPONSE=$(curl -s --max-time 90 \ -X POST "$URL" \ -H 'Content-Type: application/json' \ -d @/tmp/gemini_style_request.json) # Extract style-enhanced specification STYLE_SPEC=$(echo "$RESPONSE" | python3 -c " import sys, json data = json.load(sys.stdin) try: print(data['candidates'][0]['content']['parts'][0]['text']) except: print('Error extracting style spec') ") echo "=== Style Verification Complete ===" echo "$STYLE_SPEC" echo "$STYLE_SPEC" > "figures/ai_generated/style_spec.txt"
Step 4: Paperbanana Image Rendering (gemini-3-pro-image-preview)
Codex sends the optimized, style-verified specification to Paperbanana for rendering.
#!/bin/bash # Step 4: Render image using Paperbanana (gemini-3-pro-image-preview) # Internal codename: Nano Banana Pro # Use DIRECT connection (no proxy) - proxy causes SSL errors set -e OUTPUT_DIR="figures/ai_generated" mkdir -p "$OUTPUT_DIR" API_KEY="${GEMINI_API_KEY}" URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" # Read the style-enhanced specification from previous step STYLE_SPEC=$(cat figures/ai_generated/style_spec.txt) # Add rendering instructions RENDER_PROMPT="Render a publication-quality academic diagram based on this specification: $STYLE_SPEC RENDERING REQUIREMENTS: - Output a clean, professional diagram suitable for CVPR/NeurIPS submission - Use vector-quality rendering with sharp edges and clear text - Ensure all elements are properly aligned and spaced - The diagram should be immediately understandable at a glance" # Build JSON payload using Python for proper escaping python3 << PYTHON import json payload = { "contents": [{"parts": [{"text": '''$RENDER_PROMPT'''}]}], "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]} } with open("/tmp/gemini_request.json", "w") as f: json.dump(payload, f, indent=2) print("JSON payload created") PYTHON # Call Paperbanana API WITHOUT proxy (direct connection works better) RESPONSE=$(curl -s --max-time 180 \ -X POST "$URL" \ -H 'Content-Type: application/json' \ -d @/tmp/gemini_request.json) # Check for error if echo "$RESPONSE" | grep -q '"error"'; then echo "API Error:" echo "$RESPONSE" | python3 -m json.tool 2>/dev/null || echo "$RESPONSE" exit 1 fi # Extract and save image echo "$RESPONSE" | python3 << 'PYTHON' import sys, json, base64 from pathlib import Path output_dir = Path("figures/ai_generated") data = json.load(sys.stdin) try: parts = data['candidates'][0]['content']['parts'] iteration = 1 # Codex increments this each iteration for part in parts: if 'text' in part: print(f"\n[Paperbanana]: {part['text'][:200]}...") elif 'inlineData' in part: img_data = base64.b64decode(part['inlineData']['data']) img_path = output_dir / f"figure_v{iteration}.png" with open(img_path, "wb") as f: f.write(img_data) print(f"\n✅ Image saved: {img_path}") print(f" Size: {len(img_data)/1024:.1f} KB") except Exception as e: print(f"Parse error: {e}") print(f"Raw response: {str(data)[:500]}") PYTHON
Step 5: Codex STRICT Visual Review & Scoring (MANDATORY)
Codex MUST read the generated image and perform a STRICT review:
- Visual Analysis: What does the image show in detail?
- Strengths: What's good about it?
- STRICT Verification: Check EVERY item below
- Score: Rate 1-10 (10 = perfect) — BE STRICT!
STRICT Review Template:
## Codex's STRICT Review of Figure v{N} ### What I See [Describe the generated image in DETAIL - every block, every arrow] ### Strengths - [Strength 1] - [Strength 2] ### ═══════════════════════════════════════════════════════════════ ### STRICT VERIFICATION CHECKLIST (ALL must pass for score ≥ 9) ### ═══════════════════════════════════════════════════════════════ #### A. Arrow Correctness Verification (CRITICAL - any failure = score ≤ 6) Check EACH arrow: - [ ] Arrow 1: [Source] → [Target] — Does it point to the CORRECT target? - [ ] Arrow 2: [Source] → [Target] — Does it point to the CORRECT target? - [ ] Arrow 3: [Source] → [Target] — Does it point to the CORRECT target? - [ ] Arrow 4: [Source] → [Target] — Does it point to the CORRECT target? - [ ] Arrow 5: [Source] → [Target] — Does it point to the CORRECT target? - [ ] Arrow 6: [Source] → [Target] — Does it point to the CORRECT target? #### B. Block Content Verification (any failure = score ≤ 7) Check EACH block: - [ ] Block 1 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 2 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 3 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 4 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 5 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 6 "[Name]": Has correct label? Has sub-label? Content correct? - [ ] Block 7 "[Name]": Has correct label? Has sub-label? Content correct? #### C. Arrow Visibility (any failure = score ≤ 7) - [ ] ALL arrows are THICK (≥5px visible stroke) - [ ] ALL arrows have CLEAR arrowheads (large triangular heads) - [ ] ALL arrows are BLACK or DARK GRAY (not light colors) - [ ] NO arrows are too thin or invisible #### D. Arrow Labels (any failure = score ≤ 7) - [ ] EVERY arrow has a text label - [ ] Labels are readable (not too small) - [ ] Labels correctly describe the data flowing #### E. Visual Appeal (科研风格 - Balanced Academic Style) (any failure = score ≤ 8) - [ ] **有适度视觉吸引力** — 有subtle渐变或圆角,但不夸张 - [ ] **不是平淡方块** — 有一定设计感 - [ ] **不过度装饰** — 没有重阴影、发光效果、彩虹配色 - [ ] **专业学术风格** — 像CVPR论文中的图表,不是PPT模板 - [ ] **Internal structure visible** — 大模块内部显示子组件结构 - [ ] **Color palette: 3-4种协调色** — 不是彩虹,也不是纯黑白 #### E2. Visual Appeal - RED FLAGS (immediate score ≤ 7 if found) - [ ] **NO heavy drop shadows** (重阴影 = too flashy) - [ ] **NO glowing effects** (发光效果 = too flashy) - [ ] **NO rainbow gradients** (彩虹渐变 = unprofessional) - [ ] **NO excessive decorative icons** (过多装饰图标 = distracting) #### F. Layout & Flow (any failure = score ≤ 7) - [ ] Clean horizontal left-to-right flow - [ ] No arrow crossings - [ ] Data flow traceable in 5 seconds - [ ] Balanced spacing (not cramped, not sparse) #### G. Style Compliance - [ ] CVPR/NeurIPS professional style - [ ] Color palette appropriate (not rainbow) - [ ] Font readable - [ ] Print-friendly (grayscale test) ### ═══════════════════════════════════════════════════════════════ ### Issues Found (BE SPECIFIC) 1. [Issue 1]: [EXACTLY what is wrong] → [How to fix] 2. [Issue 2]: [EXACTLY what is wrong] → [How to fix] 3. [Issue 3]: [EXACTLY what is wrong] → [How to fix] ### Score: X/10 ### STRICT Score Breakdown Guide: - **10**: Perfect. No issues. Publication-ready masterpiece. 视觉风格完美平衡。 - **9**: Excellent. Minor issues that don't affect understanding. 可以直接使用。 - **8**: Good but has noticeable issues. 视觉上太平淡或太花哨都需要改进。 - **7**: Usable but has clear problems. 箭头或内容有问题。 - **6**: Has arrow direction errors (箭头指向错误) OR missing major components. - **1-5**: Major issues. Unacceptable. ### Visual Style Scoring (视觉风格评分): - **太花哨 (Too flashy)**: 重阴影、发光效果、彩虹配色 → score ≤ 7 - **太平淡 (Too plain)**: 纯黑白方块、无任何视觉设计 → score ≤ 8 - **恰到好处 (Balanced)**: 适度渐变、圆角、清晰层次 → score 9-10 ### Verdict [ ] ACCEPT (score ≥ 9 AND all critical checks pass) [ ] REFINE (score < 9 OR any critical check fails) **If REFINE: List the EXACT issues that must be fixed**
Step 6: Decision Point
IF score >= 9 AND all critical checks pass: → Accept figure, generate LaTeX snippet, DONE ELSE IF iteration < MAX_ITERATIONS: → Generate SPECIFIC improvement prompt based on EXACT issues → Go to Step 2 (Gemini Layout) with refined prompt ELSE: → Max iterations reached, show best version → Ask user if they want to continue or accept
Step 7: Generate Improvement Prompt (for refinement)
Codex generates TARGETED improvement prompt with EXACT issues:
Refine this academic diagram. This is iteration {N}. ## ═══════════════════════════════════════════════════════════════ ## CRITICAL: Fix These EXACT Issues (from previous review) ## ═══════════════════════════════════════════════════════════════ ### Arrow Direction Errors (MUST FIX): 1. [EXACT issue]: Arrow from [A] to [B] is pointing to wrong target. It should point to [C] instead. 2. [EXACT issue]: ... ### Missing Arrow Labels (MUST FIX): 1. Arrow from [A] to [B] is missing label "[data type]" 2. ... ### Block Content Issues (MUST FIX): 1. Block "[Name]" has wrong label. Should be "[correct label]" 2. ... ### Visual Appeal Issues (SHOULD FIX): 1. Blocks are too plain. Add [gradients/shadows/internal structure] 2. ... ## Keep These Good Elements: - [What to preserve from previous version] ## Generate the improved figure with ALL issues fixed.
Step 8: Final Output
When figure is accepted (score ≥ 9):
% === AI-Generated Figure === \begin{figure*}[t] \centering \includegraphics[width=0.95\textwidth]{figures/ai_generated/figure_final.png} \caption{[Caption based on user's original request].} \label{fig:[label]} \end{figure*}
Key Rules (MUST FOLLOW - STRICT)
- NEVER skip the review step — Always read and STRICTLY score the image
- NEVER accept score < 9 — Keep refining until excellence
- VERIFY EVERY ARROW DIRECTION — Wrong direction = automatic fail (score ≤ 6)
- VERIFY EVERY BLOCK CONTENT — Wrong content = automatic fail (score ≤ 7)
- BE SPECIFIC in feedback — "Arrow from A to B points to wrong target C" not "arrow is wrong"
- SAVE all iterations — Keep version history for comparison
- Codex is the STRICT boss — Accept only excellence, not "good enough"
- ARROW CORRECTNESS IS NON-NEGOTIABLE — Any wrong arrow direction = reject
- VISUAL APPEAL MATTERS — Plain boring figures = score ≤ 8
- Target score is 9 — Not 8, not "good enough"
- USE MULTI-STAGE WORKFLOW — Codex → Gemini Layout → Gemini Style → Paperbanana → Codex Review
- USE CORRECT MODELS — gemini-3-pro for reasoning, gemini-3-pro-image-preview for rendering
Output Structure
figures/ai_generated/ ├── layout_description.txt # Step 2: Gemini layout optimization output ├── style_spec.txt # Step 3: Gemini style verification output ├── figure_v1.png # Iteration 1 (Paperbanana render) ├── figure_v2.png # Iteration 2 ├── figure_v3.png # Iteration 3 ├── figure_final.png # Accepted version (copy of best, score ≥ 9) ├── latex_include.tex # LaTeX snippet └── review_log.json # All review scores and STRICT feedback
Model Summary
| Stage | Model | Purpose |
|---|---|---|
| Step 1 | Codex | Parse request, create initial prompt |
| Step 2 | gemini-3-pro | Layout optimization (positioning, spacing, grouping) |
| Step 3 | gemini-3-pro | CVPR/NeurIPS style verification |
| Step 4 | gemini-3-pro-image-preview (Paperbanana) | High-quality image rendering |
| Step 5 | Codex | STRICT visual review and scoring |