Godogen visual-qa

install

source · Clone the upstream repo

git clone https://github.com/htdt/godogen

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/htdt/godogen "$T" && mkdir -p ~/.claude/skills && cp -r "$T/codex/skills/visual-qa" ~/.claude/skills/htdt-godogen-visual-qa-b0538b && rm -rf "$T"

manifest: codex/skills/visual-qa/SKILL.md

source content

Visual QA

CRITICAL: Your job is to find problems, not confirm that things look fine. Do not rationalize, justify, or explain away what you see. If it looks wrong, report it.

Backend

Default (Gemini): Run the script below. All queries go to `gemini-3-flash-preview`.
If the request includes `--native`: inspect every referenced image directly with Codex's native image analysis. Do not run the Gemini script.
If the request includes `--both`: run Gemini first, then do native image analysis. Aggregate verdicts as described below.

Mode Detection

From the request text, including any explicit flags and file paths:

Reference image mentioned + 1 screenshot -> Static mode
Reference image + multiple frames -> Dynamic mode — frames are 0.5s apart (2 FPS cadence)
No reference, just a question about screenshots -> Question mode
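
The three rules above can be sketched as a small decision function. This is only an illustration: `detect_mode` and `frame_times` are hypothetical helper names, not part of the skill, and the sketch assumes the request has already been reduced to "is a reference image mentioned" and "how many screenshots or frames were given".

```python
def detect_mode(has_reference: bool, screenshot_count: int) -> str:
    """Map the shape of a request to a mode name (hypothetical helper)."""
    if screenshot_count < 1:
        raise ValueError("at least one screenshot is required")
    if not has_reference:
        # No reference image: the request is a question about the screenshots.
        return "question"
    # Reference plus one screenshot is static; reference plus several frames is dynamic.
    return "static" if screenshot_count == 1 else "dynamic"


def frame_times(frame_count: int, interval_s: float = 0.5) -> list[float]:
    """Timestamps of dynamic-mode frames at the 2 FPS cadence, starting at 0."""
    return [i * interval_s for i in range(frame_count)]
```

For example, `detect_mode(True, 4)` yields `"dynamic"`, and `frame_times(4)` places those frames at 0.0, 0.5, 1.0, and 1.5 seconds.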

Gemini Execution

Parse the request to construct the command. The script is at `.agents/skills/visual-qa/scripts/visual_qa.py`.

# Static
python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log [--context "Goal: ... Requirements: ... Verify: ..."] reference.png screenshot.png

# Dynamic
python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log [--context "..."] reference.png frame1.png frame2.png ...

# Question
python3 .agents/skills/visual-qa/scripts/visual_qa.py --log .vqa.log --question "the question" screenshot.png [frame2.png ...]

Always pass `--log .vqa.log`. Print the script output as your response.
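
As a rough sketch of how the three command shapes above could be assembled programmatically, the helper below builds the argument list. `build_command` is a hypothetical name, and treating `--context` and `--question` as mutually exclusive is an assumption made here for simplicity; only the flags and script path shown above are used.

```python
import shlex

SCRIPT = ".agents/skills/visual-qa/scripts/visual_qa.py"


def build_command(images, context=None, question=None, log_path=".vqa.log"):
    """Return the argv list for one visual_qa.py invocation (hypothetical helper)."""
    if not images:
        raise ValueError("at least one image path is required")
    if context is not None and question is not None:
        # Assumption: a request is either a comparison (context) or a question.
        raise ValueError("pass either context or question, not both")
    cmd = ["python3", SCRIPT, "--log", log_path]  # --log is always passed
    if question is not None:
        cmd += ["--question", question]
    elif context is not None:
        cmd += ["--context", context]
    cmd += list(images)  # reference first (if any), then screenshots or frames
    return cmd


if __name__ == "__main__":
    # shlex.join quotes arguments so the printed line is safe to paste into a shell.
    print(shlex.join(build_command(["reference.png", "screenshot.png"],
                                   context="Goal: ... Verify: ...")))
```

Building a list rather than a single string avoids quoting problems when the context or question contains spaces or quotes.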

Native Execution

Inspect every image file referenced in the request directly. Analyze using the criteria and output format below. Never look at code — only images.

After producing output, append a debug log entry:

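# Compute the UTC timestamp first so it is expanded before the JSON line is written.
# MODE, QUERY, FILE1, FILE2 and FIRST_LINE are placeholders; fill them in JSON-escaped.
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
printf '%s\n' '{"ts":"'"$TS"'","mode":"MODE","model":"native","query":"QUERY","files":["FILE1","FILE2"],"output":"FIRST_LINE..."}' >> .vqa.log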
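
If the query or output contains quotes or backslashes, hand-built JSON is easy to get wrong. One possible alternative, sketched below, is to let a JSON encoder do the escaping. The field names mirror the log entry above; `make_log_entry` and `append_log` are hypothetical helpers, and storing only the first line of the output is an assumption based on the `FIRST_LINE...` placeholder.

```python
import json
from datetime import datetime, timezone


def make_log_entry(mode, query, files, output, now=None):
    """Build one JSON log line for a native analysis (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    lines = output.splitlines()
    entry = {
        "ts": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "mode": mode,
        "model": "native",
        "query": query,
        "files": list(files),
        # Assumption: only the first line of the output is logged.
        "output": lines[0] if lines else "",
    }
    return json.dumps(entry, ensure_ascii=False)


def append_log(line, path=".vqa.log"):
    """Append a single entry to the debug log, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```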

Aggregated Mode (`--both`)

Run the Gemini script and capture its output
Inspect all images directly and do native analysis using the criteria below
Produce the combined verdict:
- Either says `fail` -> `fail`
- Either says `warning` and neither says `fail` -> `warning`
- Both say `pass` -> `pass`
Merge issue lists from both and deduplicate by location + description
Label each issue source: `[gemini]`, `[native]`, or `[both]`
Log both outputs to `.vqa.log`
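
The aggregation rules amount to "take the worse of the two verdicts, then merge and label the issues". A minimal sketch follows, assuming each issue is a dict with `location` and `description` keys and that two issues count as duplicates when both fields match after case and whitespace normalization. That matching rule is an assumption, since the text does not say how close two descriptions must be, and the function names are hypothetical.

```python
_RANK = {"pass": 0, "warning": 1, "fail": 2}


def combine_verdicts(gemini: str, native: str) -> str:
    """Return the more severe of the two verdicts (fail > warning > pass)."""
    return max(gemini, native, key=_RANK.__getitem__)


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different strings compare equal.
    return " ".join(text.lower().split())


def merge_issues(gemini_issues, native_issues):
    """Merge both issue lists, deduplicating by location + description."""
    merged = {}
    for source, issues in (("gemini", gemini_issues), ("native", native_issues)):
        for issue in issues:
            key = (_norm(issue["location"]), _norm(issue["description"]))
            if key not in merged:
                merged[key] = {**issue, "source": source}
            elif merged[key]["source"] != source:
                # Reported by both backends.
                merged[key]["source"] = "both"
    return list(merged.values())
```

Each merged issue carries a `source` of `gemini`, `native`, or `both`, which maps directly onto the `[gemini]`, `[native]`, and `[both]` labels.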

Analysis Criteria

Implementation Quality (static + dynamic)

Assets are usually fine — what breaks is how they are placed, scaled, and composed:

Grid or uniform placement when the reference shows organic arrangement
Uniform default scale when the reference shows varied, purposeful sizing
Flat composition when the reference has depth and layering
Stretched, tiled, or carelessly applied materials
Objects unrelated to the environment, just placed on a flat plane
Camera framing that does not match the reference perspective

Visual Bugs

Z-fighting (flickering overlapping surfaces)
Texture stretching, tiling seams, missing textures (magenta/checkerboard)
Geometry clipping (objects visibly intersecting)
Floating objects that should be grounded
Shadow artifacts (detached, through walls, missing)
Lighting leaks through opaque geometry
Culling errors (missing faces, disappearing objects)
UI overlap, truncated text, offscreen elements

Logical Inconsistencies

Impossible orientations (sideways, upside-down, embedded in terrain)
Scale mismatches (tree smaller than character, door too small)
Misplaced objects (furniture on ceiling, rocks in sky)
Broken spatial relationships (bridge not connecting, stairs into wall)

Placeholder Remnants

Untextured primitives contrasting with surrounding detail
Default Godot materials (grey StandardMaterial3D, magenta missing shader)
Debug artifacts (collision shapes, nav mesh, axis gizmos)

Motion & Animation (dynamic mode only)

Compare consecutive frames (0.5s apart):

Stuck entities (same position/pose across frames when movement is expected)
Jitter or teleportation (large position jumps between frames)
Sliding (position changes but pose is frozen)
Physics breaks (objects through walls, endless bouncing, unnatural acceleration)
Animation mismatches (walk anim at running speed, idle while moving)
Camera issues (sudden jumps, clipping through geometry)
Collision failures (overlapping objects that should collide)

Output Format

Static / Dynamic

### Verdict: {pass | fail | warning}

### Reference Match
{1-3 sentences: does the game capture the reference's intent — placement logic, scaling, composition, camera? Distinguish lazy implementation from acceptable asset or engine limitations.}

### Goal Assessment
{1-3 sentences from Task Context. "No task context provided." if none.}

### Issues

{If none: "No issues detected." Otherwise:}

#### Issue {N}: {short title}
- **Type:** style mismatch | visual bug | logical inconsistency | motion anomaly | placeholder
- **Severity:** major | minor | note
- **Frames:** {dynamic only: which frames}
- **Location:** {where in frame}
- **Description:** {1-2 sentences}

### Summary
{One sentence.}

Severity: `major` and `minor` must be fixed. `note` is cosmetic.
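
A consumer of the report could apply the severity rule mechanically. The sketch below assumes issues have been parsed into dicts with a `severity` key; `split_by_severity` is a hypothetical helper and the parsing step is not shown.

```python
BLOCKING = {"major", "minor"}   # must be fixed
KNOWN = BLOCKING | {"note"}     # "note" is cosmetic


def split_by_severity(issues):
    """Split issues into (must_fix, cosmetic) lists, preserving order."""
    must_fix, cosmetic = [], []
    for issue in issues:
        severity = issue["severity"]
        if severity not in KNOWN:
            raise ValueError(f"unknown severity: {severity!r}")
        (must_fix if severity in BLOCKING else cosmetic).append(issue)
    return must_fix, cosmetic
```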

Question Mode

### Answer
{Direct, specific, actionable answer. Reference locations, frames, colors, objects.}

### Visual Evidence
{What in the screenshots supports the answer. Reference specific frames and locations.}