Godogen visual-qa
install
source · Clone the upstream repo
git clone https://github.com/htdt/godogen
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/htdt/godogen "$T" && mkdir -p ~/.claude/skills && cp -r "$T/codex/skills/visual-qa" ~/.claude/skills/htdt-godogen-visual-qa-b0538b && rm -rf "$T"
manifest:
codex/skills/visual-qa/SKILL.mdsource content
Visual QA
CRITICAL: Your job is to find problems, not confirm that things look fine. Do not rationalize, justify, or explain away what you see. If it looks wrong, report it.
Backend
- Default (Gemini): Run the script below. All queries go to
.gemini-3-flash-preview - If the request includes
: inspect every referenced image directly with Codex's native image analysis. Do not run the Gemini script.--native - If the request includes
: run Gemini first, then do native image analysis. Aggregate verdicts as described below.--both
Mode Detection
From the request text, including any explicit flags and file paths:
- Reference image mentioned + 1 screenshot -> Static mode
- Reference image + multiple frames -> Dynamic mode — frames are 0.5s apart (2 FPS cadence)
- No reference, just a question about screenshots -> Question mode
Gemini Execution
Parse the request to construct the command. The script is at
.agents/skills/visual-qa/scripts/visual_qa.py.
# Static python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log [--context "Goal: ... Requirements: ... Verify: ..."] reference.png screenshot.png # Dynamic python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log [--context "..."] reference.png frame1.png frame2.png ... # Question python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log --question "the question" screenshot.png [frame2.png ...]
Always pass
--log .vqa.log. Print the script output as your response.
Native Execution
Inspect every image file referenced in the request directly. Analyze using the criteria and output format below. Never look at code — only images.
After producing output, append a debug log entry:
printf '%s\n' "$(cat <<'LOGEOF' {"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","mode":"MODE","model":"native","query":"QUERY","files":["FILE1","FILE2"],"output":"FIRST_LINE..."} LOGEOF )" >> .vqa.log
Aggregated Mode (--both
)
--both- Run the Gemini script and capture its output
- Inspect all images directly and do native analysis using the criteria below
- Produce the combined verdict:
- Either says
->failfail - Either says
and neither sayswarning
->failwarning - Both say
->passpass
- Either says
- Merge issue lists from both and deduplicate by location + description
- Label each issue source:
,[gemini]
, or[native][both] - Log both outputs to
.vqa.log
Analysis Criteria
Implementation Quality (static + dynamic)
Assets are usually fine — what breaks is how they are placed, scaled, and composed:
- Grid or uniform placement when the reference shows organic arrangement
- Uniform default scale when the reference shows varied, purposeful sizing
- Flat composition when the reference has depth and layering
- Stretched, tiled, or carelessly applied materials
- Objects unrelated to the environment, just placed on a flat plane
- Camera framing that does not match the reference perspective
Visual Bugs
- Z-fighting (flickering overlapping surfaces)
- Texture stretching, tiling seams, missing textures (magenta/checkerboard)
- Geometry clipping (objects visibly intersecting)
- Floating objects that should be grounded
- Shadow artifacts (detached, through walls, missing)
- Lighting leaks through opaque geometry
- Culling errors (missing faces, disappearing objects)
- UI overlap, truncated text, offscreen elements
Logical Inconsistencies
- Impossible orientations (sideways, upside-down, embedded in terrain)
- Scale mismatches (tree smaller than character, door too small)
- Misplaced objects (furniture on ceiling, rocks in sky)
- Broken spatial relationships (bridge not connecting, stairs into wall)
Placeholder Remnants
- Untextured primitives contrasting with surrounding detail
- Default Godot materials (grey StandardMaterial3D, magenta missing shader)
- Debug artifacts (collision shapes, nav mesh, axis gizmos)
Motion & Animation (dynamic mode only)
Compare consecutive frames (0.5s apart):
- Stuck entities (same position/pose across frames when movement is expected)
- Jitter or teleportation (large position jumps between frames)
- Sliding (position changes but pose is frozen)
- Physics breaks (objects through walls, endless bouncing, unnatural acceleration)
- Animation mismatches (walk anim at running speed, idle while moving)
- Camera issues (sudden jumps, clipping through geometry)
- Collision failures (overlapping objects that should collide)
Output Format
Static / Dynamic
### Verdict: {pass | fail | warning} ### Reference Match {1-3 sentences: does the game capture the reference's intent — placement logic, scaling, composition, camera? Distinguish lazy implementation from acceptable asset or engine limitations.} ### Goal Assessment {1-3 sentences from Task Context. "No task context provided." if none.} ### Issues {If none: "No issues detected." Otherwise:} #### Issue {N}: {short title} - **Type:** style mismatch | visual bug | logical inconsistency | motion anomaly | placeholder - **Severity:** major | minor | note - **Frames:** {dynamic only: which frames} - **Location:** {where in frame} - **Description:** {1-2 sentences} ### Summary {One sentence.}
Severity:
major and minor must be fixed. note is cosmetic.
Question Mode
### Answer {Direct, specific, actionable answer. Reference locations, frames, colors, objects.} ### Visual Evidence {What in the screenshots supports the answer. Reference specific frames and locations.}