# Godogen visual-qa

## Install

From source, clone the upstream repo:

```sh
git clone https://github.com/htdt/godogen
```

For Claude Code, install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/htdt/godogen "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/skills/visual-qa" ~/.claude/skills/htdt-godogen-visual-qa && rm -rf "$T"
```

Manifest: `claude/skills/visual-qa/SKILL.md` (source content below).
## Visual QA
$ARGUMENTS
CRITICAL: Your job is to find problems, not confirm things look fine. Do not rationalize, justify, or explain away what you see. If it looks wrong, report it.
### Backend

- Default (Gemini): Run the script below. All queries go to `gemini-3-flash-preview`.
- `--native` flag in arguments: Use Claude vision. Read every image with the Read tool and analyze directly. Do NOT run the Gemini script.
- `--both` flag in arguments: Run Gemini first, then do native analysis. Aggregate verdicts (details below).
### Mode Detection

From the arguments, which are freeform text with file paths:
- Reference image mentioned + 1 screenshot → Static mode
- Reference image + multiple frames → Dynamic mode — frames are 0.5s apart (2 FPS cadence)
- No reference, just a question about screenshots → Question mode
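The detection rules above can be sketched in Python. This is an illustration only; the `detect_mode` helper and its argument shape are hypothetical, not part of the skill:

```python
def detect_mode(has_reference: bool, n_screenshots: int, has_question: bool) -> str:
    """Classify a visual-QA request using the three rules above."""
    if not has_reference and has_question:
        return "question"   # no reference, just a question about screenshots
    if has_reference and n_screenshots == 1:
        return "static"     # reference image + one screenshot
    if has_reference and n_screenshots > 1:
        return "dynamic"    # reference + frame sequence (frames 0.5s apart)
    raise ValueError("arguments do not match any mode")
```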
### Gemini Execution

Parse the arguments to construct the command. The script is at `${CLAUDE_SKILL_DIR}/scripts/visual_qa.py`.

```sh
# Static
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log [--context "Goal: ... Requirements: ... Verify: ..."] reference.png screenshot.png

# Dynamic
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log [--context "..."] reference.png frame1.png frame2.png ...

# Question
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log --question "the question" screenshot.png [frame2.png ...]
```

Always pass `--log .vqa.log`. Print the script output as your response.
### Native Execution

Read every image file referenced in the arguments using the Read tool. Analyze using the criteria and output format below. Never look at code, only images.

After producing output, append a debug log entry (note the timestamp is passed as a `printf` argument so the command substitution actually expands):

```sh
printf '{"ts":"%s","mode":"MODE","model":"native","query":"QUERY","files":["FILE1","FILE2"],"output":"FIRST_LINE..."}\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> .vqa.log
```
### Aggregated Mode (`--both`)

- Run the Gemini script, capture output
- Read all images with the Read tool, do native analysis using the criteria below
- Produce a combined verdict:
  - Either says `fail` → `fail`
  - Either says `warning` and neither says `fail` → `warning`
  - Both `pass` → `pass`
- Merge issue lists from both, deduplicate by location + description
- Label each issue's source: `[gemini]`, `[native]`, or `[both]`
- Log both outputs to `.vqa.log`
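The verdict and merge rules above reduce to a small amount of logic. A minimal sketch, where the function names and the issue-dict shape (`location`, `description`, `source` keys) are assumptions for illustration:

```python
def combine_verdicts(gemini: str, native: str) -> str:
    """Aggregate two pass/warning/fail verdicts per the --both rules."""
    verdicts = {gemini, native}
    if "fail" in verdicts:
        return "fail"      # either says fail -> fail
    if "warning" in verdicts:
        return "warning"   # either says warning, neither fail -> warning
    return "pass"          # both pass -> pass

def merge_issues(gemini_issues: list[dict], native_issues: list[dict]) -> list[dict]:
    """Merge issue lists, deduplicating by (location, description),
    labeling each issue's source as gemini, native, or both."""
    merged: dict[tuple, dict] = {}
    for source, issues in (("gemini", gemini_issues), ("native", native_issues)):
        for issue in issues:
            key = (issue["location"], issue["description"])
            if key in merged and merged[key]["source"] != source:
                merged[key]["source"] = "both"   # reported by both backends
            elif key not in merged:
                merged[key] = {**issue, "source": source}
    return list(merged.values())
```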
### Analysis Criteria

#### Implementation Quality (static + dynamic)

Assets are usually fine; what breaks is how they're placed, scaled, and composed:
- Grid/uniform placement when reference shows organic arrangement
- Uniform/default scale when reference shows varied, purposeful sizing
- Flat composition when reference has depth and layering
- Stretched, tiled, or carelessly applied materials
- Objects unrelated to environment (just placed on a flat plane)
- Camera framing doesn't match reference perspective
#### Visual Bugs
- Z-fighting (flickering overlapping surfaces)
- Texture stretching, tiling seams, missing textures (magenta/checkerboard)
- Geometry clipping (objects visibly intersecting)
- Floating objects that should be grounded
- Shadow artifacts (detached, through walls, missing)
- Lighting leaks through opaque geometry
- Culling errors (missing faces, disappearing objects)
- UI overlap, truncated text, offscreen elements
#### Logical Inconsistencies
- Impossible orientations (sideways, upside-down, embedded in terrain)
- Scale mismatches (tree smaller than character, door too small)
- Misplaced objects (furniture on ceiling, rocks in sky)
- Broken spatial relationships (bridge not connecting, stairs into wall)
#### Placeholder Remnants
- Untextured primitives contrasting with surrounding detail
- Default Godot materials (grey StandardMaterial3D, magenta missing shader)
- Debug artifacts (collision shapes, nav mesh, axis gizmos)
#### Motion & Animation (dynamic mode only)

Compare consecutive frames (0.5s apart):
- Stuck entities (same position/pose across frames when movement expected)
- Jitter/teleportation (large position jumps between frames)
- Sliding (position changes but pose frozen — ice-skating)
- Physics breaks (objects through walls, endless bouncing, unnatural acceleration)
- Animation mismatches (walk anim at running speed, idle while moving)
- Camera issues (sudden jumps, clipping through geometry)
- Collision failures (overlapping objects that should collide)
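As an illustration only (the skill itself inspects raw frames visually, not structured data), a stuck-entity check over hypothetical per-frame positions might look like:

```python
def find_stuck_entities(tracks: dict[str, list[tuple[float, float]]],
                        min_frames: int = 3, eps: float = 1.0) -> list[str]:
    """Flag entities whose position barely changes across min_frames
    consecutive frames. At 2 FPS, three frames is about one second stuck."""
    stuck = []
    for name, positions in tracks.items():
        run = 1
        for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
            if abs(x1 - x0) <= eps and abs(y1 - y0) <= eps:
                run += 1
                if run >= min_frames:
                    stuck.append(name)
                    break
            else:
                run = 1   # movement detected, reset the streak
    return stuck
```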
### Output Format

#### Static / Dynamic
```
### Verdict: {pass | fail | warning}

### Reference Match
{1-3 sentences: does the game capture the reference's *intent* — placement logic, scaling, composition, camera? Distinguish lazy implementation (fail) from asset/engine limitations (acceptable).}

### Goal Assessment
{1-3 sentences from Task Context. "No task context provided." if none.}

### Issues
{If none: "No issues detected." Otherwise:}

#### Issue {N}: {short title}
- **Type:** style mismatch | visual bug | logical inconsistency | motion anomaly | placeholder
- **Severity:** major | minor | note
- **Frames:** {dynamic only: which frames}
- **Location:** {where in frame}
- **Description:** {1-2 sentences}

### Summary
{One sentence.}
```
Severity: major and minor must be fixed; note is cosmetic and can ship.
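That shipping rule is a one-liner; a hypothetical helper, assuming issues carry a `severity` key:

```python
def must_fix(issues: list[dict]) -> bool:
    """True if any issue blocks shipping: major and minor must be fixed,
    while note-level issues are cosmetic and can ship."""
    return any(i["severity"] in ("major", "minor") for i in issues)
```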
#### Question Mode
```
### Answer
{Direct, specific, actionable answer. Reference locations, frames, colors, objects.}

### Visual Evidence
{What in the screenshots supports the answer. Reference specific frames and locations.}
```