Claude-skill-registry agentic-vision
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agentic-vision" ~/.claude/skills/majiayu000-claude-skill-registry-agentic-vision && rm -rf "$T"
manifest:
skills/data/agentic-vision/SKILL.md
safety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- references .env files
- references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
Agentic Vision - The Sandwich Architecture
Version: 1.0.0
Last Updated: 2026-01-30
What is Agentic Vision?
Agentic Vision in Gemini 3 Flash converts image understanding from a static act into an agentic process. It combines visual reasoning with Code Execution.
Think → Act → Observe loop:
1. THINK: Analyze the image and formulate a plan
2. ACT: Generate and execute Python code (crop, measure, annotate)
3. OBSERVE: Process the results and refine understanding
Key capability: Instead of guessing that the padding is p-4, it measures the padding and returns the actual value, e.g. 24px.
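To make the ACT step concrete, here is a minimal sketch of the kind of measurement code the model could generate and run in the sandbox. It assumes the frame is already decoded into rows of RGB tuples and that padding is the run of background-colored pixels before the first content pixel. `measure_left_padding` and the synthetic row are hypothetical illustrations, not part of the skill.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def measure_left_padding(row: List[Pixel], background: Pixel, tolerance: int = 8) -> int:
    """Count leading pixels in a row that match the background color.

    Hypothetical helper: a pixel "matches" when every channel is within
    `tolerance` of the corresponding background channel.
    """
    for x, pixel in enumerate(row):
        if any(abs(c - b) > tolerance for c, b in zip(pixel, background)):
            return x  # first content pixel; everything before it is padding
    return len(row)  # the whole row is background


# Synthetic card row: 24 px of card surface, then 100 px of text-colored content.
surface = (30, 41, 59)     # #1e293b
text = (255, 255, 255)     # #ffffff
card_row = [surface] * 24 + [text] * 100

padding_px = measure_left_padding(card_row, background=surface)
print(f"padding: {padding_px}px")  # prints "padding: 24px"
```

The value comes from counting pixels rather than from matching the look of the UI to a nearby utility class, which is the difference the Think → Act → Observe loop is meant to capture.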
The Sandwich Architecture
                       REPLAY "SANDWICH" ARCHITECTURE
    ┌───────────────────────────────────────────────────────────────────┐
    │                                                                   │
    │  ┌──────────┐                                                     │
    │  │  Video   │──────────────────────────────┐                      │
    │  │  Input   │                              │                      │
    │  └────┬─────┘                              │                      │
    │       │                                    │                      │
    │       ▼                                    │                      │
    │  ┌─────────────────────────┐               │                      │
    │  │ PHASE 1: THE SURVEYOR   │               │                      │
    │  │ (Agentic Vision Flash)  │               │                      │
    │  ├─────────────────────────┤               │                      │
    │  │ 1. Measure Grids (px)   │               │                      │
    │  │ 2. Extract Colors (hex) │               │                      │
    │  │ 3. Map Layout (JSON)    │ ◄─── KEY      │                      │
    │  └────────────┬────────────┘               │                      │
    │               │                            │                      │
    │               ▼                            ▼                      │
    │  ┌──────────────┐             ┌─────────────────────────┐         │
    │  │ Gemini 3 Pro │◄────────────│ Architecture Specs      │         │
    │  │  (Code Gen)  │             │ (Hard Data JSON)        │         │
    │  └──────┬───────┘             └─────────────────────────┘         │
    │         │                                                         │
    │         ▼                                                         │
    │  ┌──────────────┐    ┌──────────────────────────────────┐         │
    │  │ Render View  │───▶│ PHASE 2: THE QA TESTER           │         │
    │  └──────────────┘    │ (Agentic Vision Flash)           │         │
    │                      ├──────────────────────────────────┤         │
    │                      │ 1. Compare Original vs Render    │         │
    │                      │ 2. "Spot the difference" (SSIM)  │         │
    │                      │ 3. Auto-fix suggestions          │         │
    │                      └─────────────────┬────────────────┘         │
    │                                        │                          │
    │                                        ▼                          │
    │                             ┌─────────────────────┐               │
    │                             │ FINAL PIXEL-PERFECT │               │
    │                             │      COMPONENT      │               │
    │                             └─────────────────────┘               │
    │                                                                   │
    └───────────────────────────────────────────────────────────────────┘
Phase 1: THE SURVEYOR
Measures layout BEFORE code generation.
API Endpoint
    POST /api/survey/measure
    {
      imageBase64: string,           // Base64 encoded frame
      mimeType?: string,             // default: 'image/png'
      useParallel?: boolean,         // default: true (faster)
      includePromptFormat?: boolean  // Include formatted prompt for generator
    }
Response
    {
      success: true,
      measurements: {
        imageDimensions: { width: 1920, height: 1080 },
        grid: { columns: 12, gap: "24px" },
        spacing: {
          sidebarWidth: "256px",
          navHeight: "64px",
          cardPadding: "24px",
          sectionGap: "48px",
          containerPadding: "32px"
        },
        colors: {
          background: "#0f172a",
          surface: "#1e293b",
          primary: "#6366f1",
          text: "#ffffff",
          textMuted: "#94a3b8",
          border: "#334155"
        },
        typography: {
          h1: "48px",
          h2: "32px",
          body: "16px",
          small: "14px"
        },
        components: [
          { type: "sidebar", bbox: {...}, confidence: 0.95 }
        ],
        confidence: 0.91
      },
      promptFormat: "... formatted for code generator ..."
    }
Code Usage
    import { runParallelSurveyor, formatSurveyorDataForPrompt } from '@/lib/agentic-vision';

    // 1. Run Surveyor on video frame
    const { measurements } = await runParallelSurveyor(frameBase64, 'image/png');

    // 2. Inject into code generator prompt
    const prompt = `
    ${SYSTEM_PROMPT}

    ${formatSurveyorDataForPrompt(measurements)}

    Generate code based on the video above.
    `;

    // 3. Generator uses EXACT values: p-[24px] not p-4
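The point of step 3 is that measured values reach the generator as exact numbers rather than as the nearest utility class. Below is a minimal sketch of that idea, assuming spacing values arrive as strings like "24px" (as in the Surveyor response) and that the generator targets Tailwind's arbitrary-value syntax. `to_tailwind_padding` and `format_spacing_for_prompt` are hypothetical helpers and do not reproduce the skill's own `formatSurveyorDataForPrompt`.

```python
import re
from typing import Dict


def to_tailwind_padding(value: str) -> str:
    """Turn a measured value such as "24px" into an exact class, p-[24px]."""
    if not re.fullmatch(r"\d+(\.\d+)?px", value):
        raise ValueError(f"expected a pixel value like '24px', got {value!r}")
    return f"p-[{value}]"


def format_spacing_for_prompt(spacing: Dict[str, str]) -> str:
    """Render measured spacing as plain lines a code generator can copy verbatim."""
    lines = ["MEASURED SPACING (use these exact values):"]
    for name, value in spacing.items():
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


spacing = {"cardPadding": "24px", "sectionGap": "48px"}
print(to_tailwind_padding(spacing["cardPadding"]))  # prints "p-[24px]"
print(format_spacing_for_prompt(spacing))
```

Rejecting anything that is not a pixel value keeps a guessed class like p-4 from slipping through as if it were a measurement.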
Phase 2: THE QA TESTER
Verifies generated UI AFTER render.
API Endpoint
    POST /api/verify/diff
    {
      originalImageBase64: string,   // Original frame from video
      generatedImageBase64: string,  // Screenshot of generated code
      mimeType?: string,             // default: 'image/png'
      quickCheck?: boolean,          // Only SSIM, skip full analysis
      includeReport?: boolean        // Include formatted text report
    }
Response
    {
      success: true,
      verification: {
        ssimScore: 0.94,
        overallAccuracy: "94%",
        verdict: "needs_fixes",  // "pass" | "needs_fixes" | "major_issues"
        issues: [
          {
            type: "spacing",
            severity: "medium",
            location: "card padding",
            description: "Card padding is 16px, should be 24px",
            expected: "24px",
            actual: "16px"
          }
        ],
        autoFixSuggestions: [
          {
            selector: ".card",
            property: "padding",
            suggestedValue: "24px",
            confidence: 0.85
          }
        ]
      },
      report: "✅ QA VERIFICATION REPORT..."
    }
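One way to consume autoFixSuggestions is to turn the confident ones into CSS override rules. The sketch below assumes each suggestion carries the four fields shown in the response and that `property` is already a valid CSS property name. `suggestions_to_css` and the 0.8 cutoff are illustrative choices, not part of the skill.

```python
from typing import Dict, List


def suggestions_to_css(suggestions: List[Dict], min_confidence: float = 0.8) -> str:
    """Build CSS override rules from suggestions at or above min_confidence."""
    rules = []
    for s in suggestions:
        if s["confidence"] >= min_confidence:
            rules.append(f'{s["selector"]} {{ {s["property"]}: {s["suggestedValue"]}; }}')
    return "\n".join(rules)


suggestions = [
    {"selector": ".card", "property": "padding", "suggestedValue": "24px", "confidence": 0.85},
    {"selector": ".nav", "property": "height", "suggestedValue": "64px", "confidence": 0.40},
]
print(suggestions_to_css(suggestions))  # prints ".card { padding: 24px; }"
```

Low-confidence suggestions are dropped rather than applied, so a doubtful fix cannot make the next render worse.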
Verdict Rules
| Verdict | Condition |
|---|---|
| pass | SSIM >= 0.95 AND no high severity issues |
| needs_fixes | SSIM >= 0.85 AND <= 3 high severity issues |
| major_issues | SSIM < 0.85 OR > 3 high severity issues |
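The rules can be read as a top-to-bottom decision, sketched below. The sketch assumes the three rows map, in order, to the verdict strings from the response ("pass", "needs_fixes", "major_issues") and that the first matching row wins. `decide_verdict` is a hypothetical name, not the skill's implementation.

```python
from typing import Dict, List


def decide_verdict(ssim_score: float, issues: List[Dict]) -> str:
    """Apply the verdict rules top to bottom; the first matching row wins."""
    high = sum(1 for issue in issues if issue.get("severity") == "high")
    if ssim_score >= 0.95 and high == 0:
        return "pass"
    if ssim_score >= 0.85 and high <= 3:
        return "needs_fixes"
    return "major_issues"


print(decide_verdict(0.94, [{"severity": "medium"}]))  # prints "needs_fixes"
```

Note that a render with SSIM of 0.95 or more but one to three high severity issues falls through to needs_fixes, because the first rule requires zero high severity issues.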
Enabling Code Execution
Agentic Vision requires the codeExecution tool in the Gemini API:
    import { GoogleGenAI } from '@google/genai';

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const response = await ai.models.generateContent({
      model: 'gemini-3-flash',
      contents: [
        { text: prompt },
        { inlineData: { data: imageBase64, mimeType: 'image/png' } }
      ],
      config: {
        tools: [{ codeExecution: {} }]  // <-- CRITICAL
      }
    });

    // Response contains:
    // - executableCode: { code: "Python code..." }
    // - codeExecutionResult: { outcome: "OUTCOME_OK", output: "JSON result" }
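Pulling the measurement JSON out of such a response could look like the sketch below. It works on a plain dictionary and assumes the raw response nests parts under candidates → content → parts, with executableCode and codeExecutionResult parts as described above. `extract_code_result` is a hypothetical helper and the sample is hand-written, not a captured API response.

```python
import json
from typing import Any, Dict, Optional


def extract_code_result(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the JSON output of the last successful code execution, if any."""
    result = None
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            execution = part.get("codeExecutionResult")
            if execution and execution.get("outcome") == "OUTCOME_OK":
                try:
                    result = json.loads(execution.get("output", ""))
                except json.JSONDecodeError:
                    continue  # output was plain text, not JSON; keep looking
    return result


# Hand-written sample in the assumed shape.
sample = {
    "candidates": [{
        "content": {"parts": [
            {"text": "Measuring the card padding."},
            {"executableCode": {"code": "print(json.dumps({'cardPadding': '24px'}))"}},
            {"codeExecutionResult": {"outcome": "OUTCOME_OK",
                                     "output": '{"cardPadding": "24px"}\n'}},
        ]}
    }]
}
print(extract_code_result(sample))  # prints {'cardPadding': '24px'}
```

Keeping the last successful result reflects the loop: later executions usually refine earlier ones.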
Available Python Libraries in Sandbox
    # Data Science
    import numpy as np
    import pandas as pd
    from scipy import ndimage
    from sklearn import preprocessing

    # Image Processing
    from PIL import Image
    from skimage import filters, measure, transform
    from skimage.metrics import structural_similarity as ssim

    # Visualization
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Utilities
    import io
    import json
Technical Considerations
1. Coordinate Normalization
Gemini may rescale images internally. Always request BOTH:
- Normalized coordinates (0.0-1.0)
- Image dimensions for backend rescaling
    def normalize_bbox(x, y, w, h, img_width, img_height):
        return {
            "x": x / img_width,
            "y": y / img_height,
            "width": w / img_width,
            "height": h / img_height
        }
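The backend half of this step is the inverse mapping: scaling normalized coordinates to the dimensions of the original frame. A minimal sketch, assuming the bbox uses the same four keys as normalize_bbox; `denormalize_bbox` is a hypothetical counterpart, not part of the skill.

```python
from typing import Dict


def denormalize_bbox(bbox: Dict[str, float], target_width: int, target_height: int) -> Dict[str, int]:
    """Scale a normalized (0.0-1.0) bbox to pixel coordinates of the target image."""
    return {
        "x": round(bbox["x"] * target_width),
        "y": round(bbox["y"] * target_height),
        "width": round(bbox["width"] * target_width),
        "height": round(bbox["height"] * target_height),
    }


# A sidebar measured as 256 px wide on a frame the model saw at 960x540 ...
normalized = {"x": 0.0, "y": 0.0, "width": 256 / 960, "height": 1.0}
# ... maps to the original 1920x1080 frame like this:
print(denormalize_bbox(normalized, 1920, 1080))
# prints {'x': 0, 'y': 0, 'width': 512, 'height': 1080}
```

Because the normalized values do not depend on the size the model actually processed, the same bbox can be mapped onto any resolution of the same frame.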
2. Parallel Execution for Speed
Run color sampling and spacing measurement in parallel:
    const [colors, spacing] = await Promise.all([
      surveyColors(frame),   // Fast
      surveySpacing(frame)   // Heavier CV
    ]);
    // Time reduced by ~50%
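The same pattern in Python is asyncio.gather. In the sketch below, sleeps stand in for the two model calls; `survey_colors`, `survey_spacing` and the durations are made up for illustration. Wall-clock time is roughly that of the slower call, so the saving approaches 50% only when both calls take about the same time.

```python
import asyncio
import time


async def survey_colors(frame: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for the fast color-sampling call
    return {"background": "#0f172a"}


async def survey_spacing(frame: str) -> dict:
    await asyncio.sleep(0.2)  # stand-in for the heavier spacing measurement
    return {"cardPadding": "24px"}


async def run_parallel(frame: str):
    start = time.perf_counter()
    # Both coroutines run concurrently; gather returns results in argument order.
    colors, spacing = await asyncio.gather(survey_colors(frame), survey_spacing(frame))
    elapsed = time.perf_counter() - start
    return colors, spacing, elapsed


colors, spacing, elapsed = asyncio.run(run_parallel("frame-bytes"))
print(colors, spacing, f"{elapsed:.2f}s")  # elapsed is close to 0.2s, not 0.3s
```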
3. SSIM with scikit-image
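Use the SSIM implementation from scikit-image:
    from skimage.metrics import structural_similarity as ssim

    # img1 and img2 must have the same shape; for RGB arrays also pass channel_axis=-1
    score, diff_image = ssim(img1, img2, full=True)
    # score: mean SSIM; 1.0 means identical, lower means less similar
    # diff_image: per-pixel SSIM map (low values mark differing regions)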
Integration with Replay Pipeline
Before (Without Surveyor)
Video → Gemini Pro "guesses" → p-4 or p-6? → 3-5 iterations
After (With Sandwich Architecture)
Video → Surveyor MEASURES → padding: 24px → Generator EXECUTES → 1-2 iterations
Result: the first generation is claimed to be about 80% better, and the pipeline needs 1-2 iterations instead of 3-5.
File Structure
    lib/agentic-vision/
    ├── index.ts        # Main exports
    ├── types.ts        # TypeScript interfaces
    ├── prompts.ts      # Surveyor & QA prompts
    ├── surveyor.ts     # Phase 1 implementation
    └── qa-tester.ts    # Phase 2 implementation

    app/api/
    ├── survey/measure/route.ts   # Surveyor endpoint
    └── verify/diff/route.ts      # QA Tester endpoint
Quick Start
    // Full pipeline with Agentic Vision

    // 1. PHASE 1: Measure before generation
    const surveyResult = await fetch('/api/survey/measure', {
      method: 'POST',
      body: JSON.stringify({
        imageBase64: videoFrame,
        includePromptFormat: true
      })
    });
    const { measurements, promptFormat } = await surveyResult.json();

    // 2. Generate code with HARD DATA
    const codeResult = await generateWithConstraints(video, promptFormat);

    // 3. Render and screenshot
    const screenshot = await renderAndCapture(codeResult.code);

    // 4. PHASE 2: Verify
    const qaResult = await fetch('/api/verify/diff', {
      method: 'POST',
      body: JSON.stringify({
        originalImageBase64: videoFrame,
        generatedImageBase64: screenshot
      })
    });
    const { verification } = await qaResult.json();

    // 5. Check result
    if (verification.verdict === 'pass') {
      console.log('✅ Pixel-perfect!');
    } else {
      console.log('⚠️ Apply fixes:', verification.autoFixSuggestions);
    }
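Step 5 stops at logging the suggestions. A bounded generate-verify loop that feeds them back could be sketched as follows. The generator and verifier are passed in as callables, and the stubs stand in for the code generator and the /api/verify/diff call. `refine_until_pass` and the cap of three iterations are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def refine_until_pass(
    generate: Callable[[Optional[List[Dict]]], str],
    verify: Callable[[str], Dict],
    max_iterations: int = 3,
) -> Tuple[str, Dict, int]:
    """Generate, verify, and regenerate with fix suggestions until the verdict is "pass"."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    fixes: Optional[List[Dict]] = None
    for iteration in range(1, max_iterations + 1):
        code = generate(fixes)
        verification = verify(code)
        if verification["verdict"] == "pass":
            break
        fixes = verification.get("autoFixSuggestions", [])
    return code, verification, iteration


# Stubs standing in for the generator and the QA Tester.
def fake_generate(fixes):
    return "p-[24px]" if fixes else "p-4"


def fake_verify(code):
    if code == "p-[24px]":
        return {"verdict": "pass", "autoFixSuggestions": []}
    return {"verdict": "needs_fixes",
            "autoFixSuggestions": [{"selector": ".card", "property": "padding",
                                    "suggestedValue": "24px", "confidence": 0.85}]}


code, verification, iterations = refine_until_pass(fake_generate, fake_verify)
print(code, verification["verdict"], iterations)  # prints "p-[24px] pass 2"
```

Capping the loop keeps a render that never reaches "pass" from retrying forever; the caller still receives the last verification to inspect.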