Vibecosystem visual-verdict
Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.
git clone https://github.com/vibeeval/vibecosystem
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/visual-verdict" ~/.claude/skills/vibeeval-vibecosystem-visual-verdict && rm -rf "$T"
skills/visual-verdict/SKILL.mdVisual Verdict — Screenshot QA Skill
Pixel-imprecise human eyes miss visual bugs. This skill provides structured, scored visual review using screenshots, returning actionable verdicts rather than vague impressions.
When to Activate
- After a frontend-dev agent implements a UI component or page
- When cloning an external website (pairs with clone-website skill)
- After CSS changes to verify no visual regressions
- When a designer provides a Figma export or reference screenshot
- Before marking any UI task as complete
How It Works
1. Take screenshot of current implementation 2. Load reference (design mockup, Figma export, or previous version) 3. Score against 6 dimensions (0-100 each) 4. Compute weighted total score 5. Issue verdict: PASS (90+) / REVISE (60-89) / FAIL (<60) 6. List concrete mismatches with file + line fix hints 7. Loop: frontend-dev fixes → new screenshot → rescore → repeat until PASS
Prerequisites
- browser-use MCP installed and active (
)~/.mcp.json - Reference image available (Figma PNG export, screenshot, or URL)
- Implementation running (local dev server or deployed URL)
Scoring Dimensions
Dimension Weights
| Dimension | Weight | Description |
|---|---|---|
| Layout accuracy | 0.25 | Positioning, spacing, alignment of all elements |
| Typography | 0.15 | Font family, size, weight, line-height, letter-spacing |
| Color accuracy | 0.15 | Background, text, border, shadow, gradient colors |
| Responsive behavior | 0.15 | Breakpoint transitions, mobile/tablet rendering |
| Interactive states | 0.15 | Hover, focus, active, disabled, loading states |
| Content completeness | 0.15 | All text, images, icons, labels present |
Weighted Score Formula
total = (layout * 0.25) + (typography * 0.15) + (color * 0.15) + (responsive * 0.15) + (interactive * 0.15) + (content * 0.15)
Scoring Each Dimension (0-100)
Layout accuracy
- All elements exactly in position, spacing matches: 90-100
- Minor spacing deviation (< 4px): 75-89
- Noticeable misalignment (4-16px): 50-74
- Layout clearly wrong (columns swapped, elements missing): 0-49
Typography
- Font, size, weight, line-height all match: 90-100
- One attribute off (wrong weight only): 70-89
- Two attributes off: 50-69
- Wrong font family entirely: 0-49
Color accuracy
- All colors within 5% of reference: 90-100
- One element color clearly off: 65-89
- Multiple elements wrong color: 40-64
- Dark/light mode inverted: 0-39
Responsive behavior
- Tested at 3 breakpoints, all correct: 90-100
- One breakpoint breaks layout: 65-89
- Two breakpoints break: 40-64
- Not responsive at all: 0-39
Interactive states
- All states implemented and visible: 90-100
- Missing one minor state (loading): 70-89
- Missing hover or focus: 50-69
- No interactive states at all: 0-49
Content completeness
- All text, images, icons present: 90-100
- One non-critical item missing: 75-89
- One critical item missing (CTA, heading): 50-74
- Multiple items missing: 0-49
Verdict Thresholds
| Verdict | Score | Meaning |
|---|---|---|
| PASS | 90-100 | Acceptable for production |
| REVISE | 60-89 | Needs fixes, not shippable |
| FAIL | 0-59 | Major rework required |
Verdict Output Format
## Visual Verdict: [Component/Page Name] **Score: 85/100 — REVISE** **Attempt: 2/5** **Previous score: 72 (+13 this iteration)** ### Dimension Scores | Dimension | Score | Weight | Weighted | Issues | |-----------|-------|--------|----------|--------| | Layout | 92 | 0.25 | 23.0 | Minor padding on mobile | | Typography | 68 | 0.15 | 10.2 | Wrong font-weight on H2 | | Color | 96 | 0.15 | 14.4 | - | | Responsive | 80 | 0.15 | 12.0 | Card grid breaks at 768px | | Interactive | 82 | 0.15 | 12.3 | Missing hover on CTA | | Content | 91 | 0.15 | 13.7 | - | | **TOTAL** | | | **85.6** | | ### Concrete Mismatches (prioritized by impact) 1. **[Typography / HIGH]** H2 hero heading Expected: `font-weight: 700` Actual: `font-weight: 400` (normal weight) Fix: `src/components/Hero.tsx` → add `font-bold` class to `<h2>` 2. **[Responsive / MEDIUM]** Card grid layout Expected: 3-column grid collapses at 1024px Actual: Collapses at 768px — causes overflow between 768-1024 Fix: `src/components/CardGrid.tsx` → change `md:grid-cols-3` to `lg:grid-cols-3` 3. **[Interactive / MEDIUM]** Primary CTA button Expected: Darker background on hover (`bg-blue-700`) Actual: No hover state change Fix: `src/components/Hero.tsx` → add `hover:bg-blue-700 transition-colors` to CTA 4. **[Layout / LOW]** Mobile padding Expected: 16px horizontal padding on mobile Actual: 12px Fix: `src/components/Layout.tsx` → change `px-3` to `px-4` ### Next Steps Fix issues 1-3 above, then call visual-verdict again. Estimated score after fixes: ~94 (PASS)
Screenshot Workflow with browser-use MCP
Taking the Current Implementation Screenshot
Use browser-use MCP: 1. Navigate to implementation URL (e.g., http://localhost:3000/dashboard) 2. Wait for page fully loaded (no spinners) 3. Take full-page screenshot 4. Save to /tmp/verdict-current-{timestamp}.png
Loading the Reference
Three reference types accepted:
Figma PNG export: Designer exports component/page as PNG at 2x
Reference path: ~/Desktop/designs/dashboard-v2.png
Previous production screenshot: Regression check
Reference path: ~/.claude/benchmarks/screenshots/dashboard-prod.png
Live URL comparison: Compare two running versions
Reference URL A: http://localhost:3000/dashboard (current) Reference URL B: https://staging.app.com/dashboard (baseline)
Responsive Testing Breakpoints
Always test at these three widths unless told otherwise:
| Name | Width | Target |
|---|---|---|
| Mobile | 375px | iPhone SE |
| Tablet | 768px | iPad portrait |
| Desktop | 1440px | Standard laptop |
Integration with frontend-dev Agent
The typical iteration loop:
frontend-dev implements component → visual-verdict takes screenshot + scores → REVISE: sends concrete fix list to frontend-dev → frontend-dev applies fixes → visual-verdict rescores → Repeat until PASS (max 5 iterations) → PASS: task marked complete
Iteration Limit
Maximum 5 iterations per component. If PASS is not reached after 5 attempts:
ESCALATION: Visual QA failed after 5 iterations Component: [name] Best score achieved: [score] Blocker: [specific issue that keeps failing] Recommendation: [designer clarification needed / fundamentally wrong approach]
Integration with clone-website Skill
When cloning an external site, visual-verdict is the quality gate:
1. clone-website skill produces initial implementation 2. visual-verdict screenshots both original and clone 3. Scores similarity across all 6 dimensions 4. REVISE verdict triggers targeted fixes 5. PASS verdict confirms clone is production-ready
Color Comparison Method
Color is compared by extracting computed CSS values and comparing:
Reference color: #1D4ED8 (rgb 29, 78, 216) Actual color: #2563EB (rgb 37, 99, 235) Delta E (perceived): 6.2 — noticeable, score -15
Delta E thresholds:
- Delta E < 2: Imperceptible — full score
- Delta E 2-5: Barely noticeable — minor deduction
- Delta E 5-10: Noticeable — moderate deduction
- Delta E > 10: Clearly wrong — major deduction
Typography Extraction
Typography is checked by reading computed styles, not pixels:
// Extract from running implementation const element = document.querySelector('.hero-title') const styles = window.getComputedStyle(element) { fontFamily: styles.fontFamily, // Compare against reference fontSize: styles.fontSize, // "32px" vs expected "36px" fontWeight: styles.fontWeight, // "400" vs expected "700" lineHeight: styles.lineHeight, // "1.5" vs expected "1.25" letterSpacing: styles.letterSpacing // "normal" vs expected "-0.02em" }
Common Visual Bugs Caught
- Font weight 400 instead of 700 (bold not applying)
- Tailwind class not included in safelist (purged in build)
- Media query breakpoint uses px when rem expected
- Hover state not added after copy-paste from design
- Icon imported but wrong size prop
- Color defined in CSS variable but variable not set in theme
- Padding/margin using
when design showspx-3px-4 - Z-index issue hiding interactive element
- Border-radius value wrong (rounded vs rounded-lg vs rounded-full)
- Line-clamp not applied, text overflows on mobile
Saving Screenshots for Regression History
After a PASS verdict, save the screenshot as the new baseline:
cp /tmp/verdict-current-{timestamp}.png \ ~/.claude/benchmarks/screenshots/{component-name}-{date}.png
On the next change, this saved screenshot becomes the reference for regression comparison.
Remember: A visual that looks "pretty close" in a quick glance often has 3-5 compounding issues that add up to a broken experience. Score it. Trust the number, not the impression.