# Godogen visual-qa

## Install

From source, clone the upstream repo:

```sh
git clone https://github.com/htdt/godogen
```

For Claude Code, install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/htdt/godogen "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/skills/visual-qa" ~/.claude/skills/htdt-godogen-visual-qa && rm -rf "$T"
```

Manifest: `claude/skills/visual-qa/SKILL.md` (source content below).
## Visual QA
$ARGUMENTS
CRITICAL: Your job is to find problems, not confirm things look fine. Do not rationalize, justify, or explain away what you see. If it looks wrong, report it.
### Backend

- Default (Gemini): Run the script below. All queries go to `gemini-3-flash-preview`.
- `--native` flag in arguments: Use Claude vision. Read every image with the Read tool and analyze directly. Do NOT run the Gemini script.
- `--both` flag in arguments: Run Gemini first, then do native analysis. Aggregate verdicts (details below).
### Mode Detection

From the arguments, which are freeform text with file paths:
- Reference image mentioned + 1 screenshot → Static mode
- Reference image + multiple frames → Dynamic mode — frames are 0.5s apart (2 FPS cadence)
- No reference, just a question about screenshots → Question mode
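The detection rules above can be sketched in Python. This is an illustration only; the `detect_mode` helper and its argument shape are hypothetical, not part of the skill:

```python
def detect_mode(has_reference: bool, n_screenshots: int, has_question: bool) -> str:
    """Classify a visual-QA request using the three rules above."""
    if not has_reference and has_question:
        return "question"   # no reference, just a question about screenshots
    if has_reference and n_screenshots == 1:
        return "static"     # reference image + one screenshot
    if has_reference and n_screenshots > 1:
        return "dynamic"    # reference + frame sequence (frames 0.5s apart)
    raise ValueError("arguments do not match any mode")
```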
### Gemini Execution

Parse the arguments to construct the command. The script is at `${CLAUDE_SKILL_DIR}/scripts/visual_qa.py`.

```sh
# Static
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log [--context "Goal: ... Requirements: ... Verify: ..."] reference.png screenshot.png

# Dynamic
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log [--context "..."] reference.png frame1.png frame2.png ...

# Question
python3 ${CLAUDE_SKILL_DIR}/scripts/visual_qa.py --log .vqa.log --question "the question" screenshot.png [frame2.png ...]
```

Always pass `--log .vqa.log`. Print the script output as your response.
### Native Execution

Read every image file referenced in the arguments using the Read tool. Analyze using the criteria and output format below. Never look at code, only images.

After producing output, append a debug log entry (note the timestamp is passed as a `printf` argument so the command substitution actually expands):

```sh
printf '{"ts":"%s","mode":"MODE","model":"native","query":"QUERY","files":["FILE1","FILE2"],"output":"FIRST_LINE..."}\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> .vqa.log
```
### Aggregated Mode (`--both`)

- Run the Gemini script, capture output
- Read all images with the Read tool, do native analysis using the criteria below
- Produce a combined verdict:
  - Either says `fail` → `fail`
  - Either says `warning` and neither says `fail` → `warning`
  - Both `pass` → `pass`
- Merge issue lists from both, deduplicate by location + description
- Label each issue's source: `[gemini]`, `[native]`, or `[both]`
- Log both outputs to `.vqa.log`
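The verdict and merge rules above reduce to a small amount of logic. A minimal sketch, where the function names and the issue-dict shape (`location`, `description`, `source` keys) are assumptions for illustration:

```python
def combine_verdicts(gemini: str, native: str) -> str:
    """Aggregate two pass/warning/fail verdicts per the --both rules."""
    verdicts = {gemini, native}
    if "fail" in verdicts:
        return "fail"      # either says fail -> fail
    if "warning" in verdicts:
        return "warning"   # either says warning, neither fail -> warning
    return "pass"          # both pass -> pass

def merge_issues(gemini_issues: list[dict], native_issues: list[dict]) -> list[dict]:
    """Merge issue lists, deduplicating by (location, description),
    labeling each issue's source as gemini, native, or both."""
    merged: dict[tuple, dict] = {}
    for source, issues in (("gemini", gemini_issues), ("native", native_issues)):
        for issue in issues:
            key = (issue["location"], issue["description"])
            if key in merged and merged[key]["source"] != source:
                merged[key]["source"] = "both"   # reported by both backends
            elif key not in merged:
                merged[key] = {**issue, "source": source}
    return list(merged.values())
```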
### Analysis Criteria

#### Implementation Quality (static + dynamic)

Assets are usually fine; what breaks is how they're placed, scaled, and composed:
- Grid/uniform placement when reference shows organic arrangement
- Uniform/default scale when reference shows varied, purposeful sizing
- Flat composition when reference has depth and layering
- Stretched, tiled, or carelessly applied materials
- Objects unrelated to environment (just placed on a flat plane)
- Camera framing doesn't match reference perspective
#### Visual Bugs
- Z-fighting (flickering overlapping surfaces)
- Texture stretching, tiling seams, missing textures (magenta/checkerboard)
- Geometry clipping (objects visibly intersecting)
- Floating objects that should be grounded
- Shadow artifacts (detached, through walls, missing)
- Lighting leaks through opaque geometry
- Culling errors (missing faces, disappearing objects)
- UI overlap, truncated text, offscreen elements
#### Logical Inconsistencies
- Impossible orientations (sideways, upside-down, embedded in terrain)
- Scale mismatches (tree smaller than character, door too small)
- Misplaced objects (furniture on ceiling, rocks in sky)
- Broken spatial relationships (bridge not connecting, stairs into wall)
#### Placeholder Remnants
- Untextured primitives contrasting with surrounding detail
- Default Godot materials (grey StandardMaterial3D, magenta missing shader)
- Debug artifacts (collision shapes, nav mesh, axis gizmos)
#### Motion & Animation (dynamic mode only)

Compare consecutive frames (0.5s apart):
- Stuck entities (same position/pose across frames when movement expected)
- Jitter/teleportation (large position jumps between frames)
- Sliding (position changes but pose frozen — ice-skating)
- Physics breaks (objects through walls, endless bouncing, unnatural acceleration)
- Animation mismatches (walk anim at running speed, idle while moving)
- Camera issues (sudden jumps, clipping through geometry)
- Collision failures (overlapping objects that should collide)
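As an illustration only (the skill itself inspects raw frames visually, not structured data), a stuck-entity check over hypothetical per-frame positions might look like:

```python
def find_stuck_entities(tracks: dict[str, list[tuple[float, float]]],
                        min_frames: int = 3, eps: float = 1.0) -> list[str]:
    """Flag entities whose position barely changes across min_frames
    consecutive frames. At 2 FPS, three frames is about one second stuck."""
    stuck = []
    for name, positions in tracks.items():
        run = 1
        for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
            if abs(x1 - x0) <= eps and abs(y1 - y0) <= eps:
                run += 1
                if run >= min_frames:
                    stuck.append(name)
                    break
            else:
                run = 1   # movement detected, reset the streak
    return stuck
```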
### Output Format

#### Static / Dynamic
```
### Verdict: {pass | fail | warning}

### Reference Match
{1-3 sentences: does the game capture the reference's *intent* — placement logic, scaling, composition, camera? Distinguish lazy implementation (fail) from asset/engine limitations (acceptable).}

### Goal Assessment
{1-3 sentences from Task Context. "No task context provided." if none.}

### Issues
{If none: "No issues detected." Otherwise:}

#### Issue {N}: {short title}
- **Type:** style mismatch | visual bug | logical inconsistency | motion anomaly | placeholder
- **Severity:** major | minor | note
- **Frames:** {dynamic only: which frames}
- **Location:** {where in frame}
- **Description:** {1-2 sentences}

### Summary
{One sentence.}
```
Severity: major and minor must be fixed; note is cosmetic and can ship.
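That shipping rule is a one-liner; a hypothetical helper, assuming issues carry a `severity` key:

```python
def must_fix(issues: list[dict]) -> bool:
    """True if any issue blocks shipping: major and minor must be fixed,
    while note-level issues are cosmetic and can ship."""
    return any(i["severity"] in ("major", "minor") for i in issues)
```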
#### Question Mode
```
### Answer
{Direct, specific, actionable answer. Reference locations, frames, colors, objects.}

### Visual Evidence
{What in the screenshots supports the answer. Reference specific frames and locations.}
```