Claude-skills gemini-visual
Visual and front-end development assistant powered by Google Gemini's multimodal models. Use for UI analysis, design comparison, accessibility audits, color palette extraction, screenshot-to-code conversion, generating UI assets, and text-based design assistance from briefs.
git clone https://github.com/ckorhonen/claude-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ckorhonen/claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/gemini-visual" ~/.claude/skills/ckorhonen-claude-skills-gemini-visual && rm -rf "$T"
skills/gemini-visual/SKILL.mdGemini Visual - Front-End & Visual Development Assistant
Overview
A comprehensive toolkit leveraging Google Gemini's advanced visual reasoning capabilities for front-end development and design tasks. Gemini provides state-of-the-art multimodal understanding with spatial reasoning, document understanding, and high-resolution image processing.
When to Use
- UI/UX Analysis: Analyze screenshots for layout issues, visual hierarchy, and design patterns
- Accessibility Audits: Check contrast ratios, text readability, and WCAG compliance
- Design Comparison: Compare mockups, before/after screenshots, or different design variations
- Color Palette Extraction: Extract colors from images with HEX, RGB, and HSL values
- Screenshot to Code: Generate HTML/CSS from design screenshots
- UI Asset Generation: Create icons, backgrounds, gradients, and UI graphics
- Responsive Design Review: Analyze multi-device screenshots for consistency
- Visual Debugging: Identify rendering issues, broken layouts, or visual bugs
- Design from Brief: Generate designs, code, and components from text descriptions
- Interactive Design Sessions: Multi-turn conversations for iterative design refinement
Prerequisites
- Python 3.9+
packagegoogle-genai
environment variableGEMINI_API_KEY
Installation
pip install google-genai
Getting an API Key
- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the key and set it as an environment variable:
export GEMINI_API_KEY="your-api-key"
Add to your shell profile (
~/.zshrc or ~/.bashrc) for persistence:
echo 'export GEMINI_API_KEY="your-api-key"' >> ~/.zshrc
Scripts Overview
| Script | Purpose |
|---|---|
| Analyze UI screenshots for issues, patterns, and suggestions |
| Generate icons, backgrounds, and UI graphics |
| Compare two designs and highlight differences |
| Extract color palettes from images |
| Convert screenshots to HTML/CSS code |
| Generate designs and code from text briefs (no image required) |
Available Models
| Model | Best For | Notes |
|---|---|---|
| Visual analysis, code generation, fast reasoning | Best cost/quality for analysis |
| Complex visual reasoning, code generation | Highest quality analysis |
| Asset generation, image editing | Native image output |
| High-quality image generation, 4K output | Latest image generation model |
| Professional image generation | Previous-gen high quality |
Note: For image analysis tasks (analyze_ui, compare_designs, screenshot_to_code, extract_colors), use the text+vision models (
orgemini-2.5-flash). For image generation tasks (generate_ui_assets), use the image-output models.gemini-2.5-pro
Media Resolution Options
Control token usage and detail level with
--resolution:
| Resolution | Tokens/Image | Best For |
|---|---|---|
| 70 | Quick scans, thumbnails |
| 560 | Standard screenshots, OCR |
| 1120 | Detailed UI analysis |
| 2240+ | Fine text, complex layouts |
Script Reference
1. analyze_ui.py - UI Analysis
Analyze UI screenshots for design issues, accessibility problems, and improvement suggestions.
python scripts/analyze_ui.py [options] IMAGE Required: IMAGE Path to UI screenshot to analyze Options: -m, --mode MODE Analysis mode (default: comprehensive) Modes: comprehensive, accessibility, layout, ux -r, --resolution RES Media resolution (default: high) -o, --output FILE Save analysis to file (JSON or text) -f, --format FORMAT Output format: text, json, markdown (default: text) --thinking LEVEL Thinking level: low, high (default: high) -v, --verbose Show detailed progress
Examples:
# Comprehensive UI analysis python scripts/analyze_ui.py screenshot.png # Accessibility-focused analysis python scripts/analyze_ui.py -m accessibility app_screen.png # Layout analysis with JSON output python scripts/analyze_ui.py -m layout -f json -o report.json mockup.png # Quick UX review python scripts/analyze_ui.py -m ux --thinking low mobile_app.png
Analysis Modes:
- comprehensive: Full analysis including layout, colors, typography, accessibility, and UX
- accessibility: WCAG compliance, contrast ratios, screen reader compatibility, touch targets
- layout: Visual hierarchy, spacing, alignment, responsive issues
- ux: User flow, affordances, cognitive load, interaction patterns
2. generate_ui_assets.py - Asset Generation
Generate UI assets like icons, backgrounds, patterns, and graphics.
python scripts/generate_ui_assets.py [options] Required: -p, --prompt TEXT Description of asset to generate Options: -t, --type TYPE Asset type (default: icon) Types: icon, background, pattern, illustration, badge -s, --style STYLE Design style (default: modern) Styles: modern, minimal, flat, gradient, glassmorphism, neumorphism, material, ios, outlined -c, --colors COLORS Color palette (comma-separated HEX or names) -a, --aspect-ratio RATIO Aspect ratio (default: 1:1) --size SIZE Resolution: 1K, 2K, 4K (default: 1K) -o, --output FILE Output file path -r, --reference IMAGE Reference image for style guidance -v, --verbose Show detailed progress
Examples:
# Generate app icon python scripts/generate_ui_assets.py -p "Weather app icon with sun and clouds" -t icon # Create gradient background python scripts/generate_ui_assets.py -p "Soft gradient for login screen" -t background \ -c "#667eea,#764ba2" -a 9:16 -o login_bg.png # Generate pattern for UI python scripts/generate_ui_assets.py -p "Subtle geometric pattern for card backgrounds" \ -t pattern -s minimal -o pattern.png # Create illustration from reference python scripts/generate_ui_assets.py -p "Onboarding illustration, person using phone" \ -t illustration -r brand_style.png -o onboarding.png # Generate badge/label python scripts/generate_ui_assets.py -p "Premium badge with star" -t badge \ -c "gold,white" -s gradient
3. compare_designs.py - Design Comparison
Compare two design screenshots and analyze differences.
python scripts/compare_designs.py [options] IMAGE1 IMAGE2 Required: IMAGE1 First design image (before/version A) IMAGE2 Second design image (after/version B) Options: -m, --mode MODE Comparison mode (default: full) Modes: full, visual, content, accessibility -f, --format FORMAT Output format: text, json, markdown (default: text) -o, --output FILE Save comparison to file -r, --resolution RES Media resolution (default: high) -v, --verbose Show detailed progress
Examples:
# Full design comparison python scripts/compare_designs.py before.png after.png # Visual-only comparison python scripts/compare_designs.py -m visual old_design.png new_design.png # Compare for accessibility changes python scripts/compare_designs.py -m accessibility v1.png v2.png -o report.md # A/B test comparison as JSON python scripts/compare_designs.py -m full variant_a.png variant_b.png -f json
Comparison Modes:
- full: Comprehensive comparison of all visual and functional aspects
- visual: Focus on colors, typography, spacing, visual hierarchy
- content: Text changes, layout shifts, information architecture
- accessibility: Contrast, readability, touch targets, WCAG compliance changes
4. extract_colors.py - Color Palette Extraction
Extract color palettes from images with multiple output formats.
python scripts/extract_colors.py [options] IMAGE Required: IMAGE Image to extract colors from Options: -n, --count COUNT Number of colors to extract (default: 6) -f, --format FORMAT Output format: text, json, css, tailwind, scss (default: text) -o, --output FILE Save palette to file --named Include closest CSS color names --contrast Calculate contrast ratios between colors -v, --verbose Show detailed progress
Examples:
# Extract 6 main colors python scripts/extract_colors.py screenshot.png # Extract palette as CSS variables python scripts/extract_colors.py -f css -o colors.css brand_image.png # Get Tailwind config python scripts/extract_colors.py -f tailwind -o tailwind.config.js design.png # Detailed palette with contrast info python scripts/extract_colors.py -n 8 --named --contrast hero_image.jpg # SCSS variables output python scripts/extract_colors.py -f scss -o _colors.scss mockup.png
5. screenshot_to_code.py - Screenshot to Code
Convert UI screenshots to HTML/CSS code.
python scripts/screenshot_to_code.py [options] IMAGE Required: IMAGE UI screenshot to convert Options: -f, --framework FRAME CSS framework (default: tailwind) Frameworks: tailwind, css, bootstrap, vanilla -c, --components Extract as reusable components --responsive Generate responsive code -o, --output DIR Output directory for files -r, --resolution RES Media resolution (default: ultra_high) --thinking LEVEL Thinking level: low, high (default: high) -v, --verbose Show detailed progress
Examples:
# Convert to Tailwind HTML python scripts/screenshot_to_code.py landing_page.png # Generate vanilla CSS python scripts/screenshot_to_code.py -f vanilla -o ./output mockup.png # Create responsive Bootstrap components python scripts/screenshot_to_code.py -f bootstrap -c --responsive card.png # Full page conversion with components python scripts/screenshot_to_code.py -f tailwind -c --responsive -o ./components page.png
6. design_from_brief.py - Text-Based Design Assistant
Generate frontend designs, code, and components from text descriptions without needing visual input.
python scripts/design_from_brief.py [options] Input (one required): -p, --prompt TEXT Design brief or prompt text -b, --brief-file FILE Read brief from a file --interactive Start interactive design session Options: -m, --mode MODE Generation mode (default: code) Modes: design, code, component, review, brainstorm -fw, --framework FW Framework for code generation (default: tailwind) Frameworks: tailwind, css, bootstrap, react, vue, svelte, vanilla -c, --context TEXT Additional context (existing code, constraints) -f, --format FORMAT Output format: text, json, markdown (default: text) -o, --output FILE Save output to file -v, --verbose Show detailed progress
Examples:
# Generate code from a brief python scripts/design_from_brief.py -p "Create a pricing table with 3 tiers" -m code -fw tailwind # Get design advice and guidance python scripts/design_from_brief.py -p "Design a modern SaaS landing page" -m design # Generate a React component python scripts/design_from_brief.py -p "A toggle switch with smooth animation" -m component -fw react # Review a design idea python scripts/design_from_brief.py -p "Is a hamburger menu good for desktop navigation?" -m review # Brainstorm creative ideas python scripts/design_from_brief.py -p "Ideas for a fitness app dashboard" -m brainstorm # Read brief from file python scripts/design_from_brief.py -b project_brief.txt -m code -fw vue # Interactive multi-turn session python scripts/design_from_brief.py --interactive -m code -fw tailwind
Generation Modes:
- design: Get design guidance including colors, typography, layout, and UX recommendations
- code: Generate complete, production-ready frontend code
- component: Design and code reusable UI components with props and variants
- review: Get feedback and recommendations on design ideas or code
- brainstorm: Generate creative ideas and multiple design directions
Supported Frameworks:
| Framework | Description |
|---|---|
| Tailwind CSS utility classes |
| Custom CSS with variables and BEM |
| Bootstrap 5 components |
| React functional components with TypeScript |
| Vue 3 Composition API with TypeScript |
| Svelte components |
| Plain HTML/CSS/JavaScript |
Interactive Session Commands:
When using
--interactive, you can use these commands:
| Command | Description |
|---|---|
| Change generation mode |
| Change framework |
| Save last response to file |
| Clear conversation history |
| Exit session |
Use Cases
Front-End Development Workflow
# 1. Analyze a design mockup python scripts/analyze_ui.py mockup.png -f markdown -o analysis.md # 2. Extract brand colors python scripts/extract_colors.py mockup.png -f tailwind -o colors.js # 3. Generate code from screenshot python scripts/screenshot_to_code.py mockup.png -f tailwind -c -o ./src # 4. Generate placeholder icons python scripts/generate_ui_assets.py -p "Settings gear icon" -t icon -s outlined
Design Review Process
# Compare design iterations python scripts/compare_designs.py v1.png v2.png -f markdown -o review.md # Check accessibility python scripts/analyze_ui.py final_design.png -m accessibility -o a11y_report.json
Asset Production Pipeline
# Generate icon set for icon in "home" "search" "profile" "settings"; do python scripts/generate_ui_assets.py -p "${icon} icon" -t icon -s outlined -o icons/${icon}.png done # Generate backgrounds for different screens python scripts/generate_ui_assets.py -p "Auth screen gradient" -t background -a 9:16 -o bg_auth.png python scripts/generate_ui_assets.py -p "Dashboard header" -t background -a 21:9 -o bg_header.png
Design from Text Brief
# Start with design exploration python scripts/design_from_brief.py -p "E-commerce product page for sneakers" -m design -o design_spec.md # Generate the code python scripts/design_from_brief.py -p "E-commerce product page with image gallery, size selector, add to cart button, and reviews section" -m code -fw react -o product_page.tsx # Create reusable components python scripts/design_from_brief.py -p "Star rating component, 1-5 stars, supports half stars, shows count" -m component -fw react # Interactive refinement session python scripts/design_from_brief.py --interactive -m code -fw tailwind # Then iterate: "Add a sticky header", "Make the CTA more prominent", etc.
Full Project Workflow (No Images Needed)
# 1. Brainstorm ideas python scripts/design_from_brief.py -p "Modern dashboard for analytics SaaS" -m brainstorm # 2. Get detailed design spec python scripts/design_from_brief.py -p "Analytics dashboard with sidebar nav, KPI cards, charts, and data tables" -m design -f markdown -o design.md # 3. Generate components python scripts/design_from_brief.py -p "KPI card showing metric, trend, and sparkline" \ -m component -fw react -o components/KPICard.tsx python scripts/design_from_brief.py -p "Data table with sorting, filtering, pagination" \ -m component -fw react -o components/DataTable.tsx # 4. Generate full page layout python scripts/design_from_brief.py -p "Dashboard layout combining sidebar, header, and main content area with the KPI cards and data table" -m code -fw react
API Capabilities Summary
Gemini 3 Visual Features
- Spatial Reasoning: Understands element positioning, alignment, and relationships
- Document Understanding: OCR, text extraction, layout parsing
- High Resolution: Up to ultra-high resolution for fine details and small text
- Multi-Image Input: Compare up to 3,600 images per request
- Thought Signatures: Maintains reasoning context for multi-turn editing
Image Support
- Input Formats: PNG, JPEG, WebP, HEIC, HEIF
- Max File Size: 20MB inline, larger via File API
- Output Formats: PNG (generated images)
- Resolutions: 1K, 2K, 4K (generation), configurable input resolution
Troubleshooting
"GEMINI_API_KEY environment variable not set"
Set your API key:
export GEMINI_API_KEY="your-api-key"
"Rate limit exceeded"
Wait a few minutes and retry. For batch operations, add delays between requests.
Low quality analysis
Use higher resolution:
python scripts/analyze_ui.py -r ultra_high screenshot.png
Missing fine details
For detailed UI analysis, use ultra_high resolution:
python scripts/analyze_ui.py -r ultra_high -m comprehensive app.png
Code generation issues
Use high thinking level and ultra_high resolution:
python scripts/screenshot_to_code.py --thinking high -r ultra_high mockup.png
Asset generation blocked by content moderation
If you see "No image content returned. The request may have been blocked by content moderation", try:
- Simplifying or rephrasing your prompt
- Removing specific brand names or copyrighted references
- Using more generic descriptions
This is a safety filter from the Gemini API and not all prompts will be accepted.
Resolution affects token usage
The
--resolution parameter controls how many tokens are used for image processing:
: ~70 tokens/image - Quick scans, thumbnailslow
: ~560 tokens/image - Standard screenshots, OCRmedium
: ~1120 tokens/image - Detailed UI analysis (default)high
Higher resolution provides better accuracy but uses more tokens.
Best Practices
- Use appropriate resolution: Higher resolution = more tokens but better accuracy
- Match model to task: Use
for analysis,gemini-3-pro-preview
for generationgemini-3-pro-image-preview - Provide context: Add relevant details to prompts for better results
- Iterate on assets: Use multi-turn conversations for refinement
- Combine scripts: Use extraction + generation for consistent design systems
- Save outputs: Use
flag to save reports for documentation-o - Batch similar tasks: Process multiple similar items together for efficiency