Squire omni-vu
git clone https://github.com/eddiebelaval/squire
T=$(mktemp -d) && git clone --depth=1 https://github.com/eddiebelaval/squire "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/omni-vu" ~/.claude/skills/eddiebelaval-squire-omni-vu && rm -rf "$T"
skills/omni-vu/SKILL.md
---
name: omni-vu
description: Screen capture, AI vision analysis, and GUI automation for macOS. Use when you need to see what's on screen, analyze UI state, detect changes, or automate mouse/keyboard actions.
slug: omni-vu
category: operations
complexity: complex
version: "1.0.0"
author: "id8Labs"
triggers:
- "omni-vu"
- "omni vu"
tags:
- development
- tool-factory-retrofitted
---
omni.vu - Visual Understanding & Automation
Core Workflows
Workflow 1: Primary Action
- Analyze the input and context
- Validate prerequisites are met
- Execute the core operation
- Verify the output meets expectations
- Report results
Overview
omni.vu gives you eyes and hands on the user's macOS screen. Use it to:
- See what the user sees (screen capture)
- Understand UI state with AI vision
- Detect changes and wait for events
- Act with mouse and keyboard automation
When to Use This Skill
Proactively Use omni.vu When:
- Debugging UI issues - "The button isn't working" → capture and analyze
- Verifying changes - After modifying UI code, check if it rendered correctly
- Waiting for operations - Build finishing, deployment completing, tests running
- Understanding context - User describes something on screen you can't see
- Automating repetitive tasks - Clicking through UI flows, filling forms
- Documentation - Capturing screenshots of features
Do NOT Use When:
- Reading/writing files (use Read/Write tools)
- Running terminal commands (use Bash)
- Making API calls (use appropriate tools)
- User hasn't granted screen recording permission
Tool Reference
Capture Tools
vu - Full Screen Capture
vu(monitor=0, save_to_history=True)
Captures the entire screen. Returns base64 image.
Use when: You need to see everything on screen.
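The exact shape of the value returned by vu is not specified here. As a minimal sketch, assuming the base64 payload is available as a plain string, it could be decoded and written to disk as follows. The `save_capture` helper and its `image_b64` parameter are hypothetical and not part of omni.vu.

```python
import base64
from pathlib import Path


def save_capture(image_b64: str, path: str) -> int:
    """Decode a base64-encoded image and write it to `path`.

    Returns the number of bytes written. `image_b64` is assumed to be the
    base64 string from a capture tool; the real response may wrap it in a
    larger structure.
    """
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(image_b64, validate=True)
    Path(path).write_bytes(data)
    return len(data)
```

Writing the decoded bytes to a file makes it easy to open the capture in an image viewer when a vision answer looks wrong.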
vu_window - Window Capture
vu_window(window_id=None, title="VS Code", include_frame=False)
Captures a specific window by ID or title (partial match).
Use when: You only need one application's content.
Workflow:
- Call vu_list_windows to see available windows
- Find the window_id or use title matching
- Call vu_window with that ID/title
vu_region - Region Capture
vu_region(x=100, y=100, width=500, height=300)
Captures a specific rectangle of the screen.
Use when: You need a precise area (error message, specific component).
vu_list_windows - List Windows
vu_list_windows(filter_app="Chrome")
Lists all visible windows with metadata.
Returns: List of {window_id, title, owner, x, y, width, height}
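To pick a window out of that list before calling vu_window, a small lookup helper can do the partial title match on the client side. This is only a sketch: `find_window` is a hypothetical helper, the sample data is made up, and matching case-insensitively is an assumption (the tool's own matching rules are not documented here).

```python
from typing import Optional


def find_window(windows: list[dict], title_part: str) -> Optional[dict]:
    """Return the first window whose title contains `title_part` (case-insensitive)."""
    needle = title_part.lower()
    for window in windows:
        if needle in window.get("title", "").lower():
            return window
    return None


# Sample data shaped like the fields listed above (values are made up).
windows = [
    {"window_id": 11, "title": "README.md - squire", "owner": "Code",
     "x": 0, "y": 25, "width": 1200, "height": 800},
    {"window_id": 12, "title": "Squire - Google Chrome", "owner": "Google Chrome",
     "x": 100, "y": 60, "width": 1280, "height": 900},
]
match = find_window(windows, "chrome")
print(match["window_id"] if match else "no match")
```

Passing the resulting window_id to vu_window avoids ambiguity when several windows share a similar title.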
vu_list_monitors - List Displays
vu_list_monitors()
Lists connected displays with resolution and scale factor.
Vision Tools
vu_describe - AI Vision Analysis
vu_describe(
    prompt="What errors are visible?",
    provider="claude",  # or openai, gemini, ollama
    max_tokens=1024,
    capture_first=True
)
Captures screen and analyzes with AI.
Best prompts:
- "Describe what you see on screen"
- "Are there any error messages visible?"
- "What is the state of the build/test output?"
- "Is the login form filled correctly?"
- "What color is the status indicator?"
Providers:
| Provider | Speed | Quality | Cost |
|---|---|---|---|
| claude | Medium | Excellent | $$ |
| openai | Fast | Very Good | $$ |
| gemini | Fast | Good | $ |
| ollama | Slow | Varies | Free |
Utility Tools
vu_diff - Change Detection
vu_diff(threshold=0.02, monitor=0)
Detects if screen changed since last capture.
Returns: {changed: bool, diff_percentage: float}
Use for polling patterns:
# Wait for build to finish
while True:
    result = vu_diff(threshold=0.05)
    if result["changed"]:
        # Screen changed, check what happened
        analysis = vu_describe(prompt="Did the build succeed or fail?")
        break
    # Wait before checking again
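A polling loop like this benefits from a sleep between checks and an upper time limit so it cannot spin forever. The following is a hedged sketch, not omni.vu's own implementation: `wait_for_change` is a hypothetical helper, and `diff` stands for any zero-argument callable that returns a dict with a "changed" key, such as a thin wrapper around vu_diff.

```python
import time
from typing import Callable


def wait_for_change(
    diff: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `diff` until it reports a change or `timeout_s` elapses.

    Returns True if a change was seen, False on timeout. The `sleep`
    parameter exists so tests can replace the real delay.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if diff()["changed"]:
            return True
        sleep(interval_s)  # back off before the next check
    return False
```

The interval and timeout defaults are arbitrary illustrative values; a build that takes an hour needs a longer timeout, and a fast UI transition a shorter interval.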
vu_history - Capture History
vu_history(limit=10, capture_type="full_screen")
Gets recent capture metadata.
vu_last - Last Capture
vu_last()
Returns the most recent capture with image data.
Use when: You need to re-analyze without re-capturing.
vu_status - System Status
vu_status()
Returns system info: monitors, providers, safety settings.
Automation Tools
vu_click - Mouse Click
vu_click(x=500, y=300, button="left", count=1)
Clicks at screen coordinates.
Buttons: left, right, middle
Count: 1 (single), 2 (double), 3 (triple)
Safety: Coordinates validated against screen bounds.
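It can be useful to run the same kind of checks before issuing a click, so that bad input is caught with a readable message. The sketch below is an illustration only: `click_errors` is a hypothetical helper, and it assumes a single display with a zero-based origin, which may not match how omni.vu validates coordinates internally.

```python
VALID_BUTTONS = ("left", "right", "middle")
VALID_COUNTS = (1, 2, 3)  # single, double, triple


def click_errors(x: int, y: int, screen_width: int, screen_height: int,
                 button: str = "left", count: int = 1) -> list[str]:
    """Return a list of problems with a planned click; empty means it looks valid."""
    errors = []
    # Valid x is 0..screen_width-1 and valid y is 0..screen_height-1.
    if not (0 <= x < screen_width and 0 <= y < screen_height):
        errors.append(f"({x}, {y}) is outside the {screen_width}x{screen_height} screen")
    if button not in VALID_BUTTONS:
        errors.append(f"unknown button: {button!r}")
    if count not in VALID_COUNTS:
        errors.append(f"unsupported click count: {count}")
    return errors
```

Returning all problems at once, rather than raising on the first, makes it easier to report a complete diagnosis back to the user.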
vu_move - Move Cursor
vu_move(x=500, y=300, duration_ms=0)
Moves cursor to position. Use duration_ms for animated movement.
vu_drag - Drag Operation
vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)
Drags from start to end position.
Safety: Maximum 2000px drag distance.
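To check a planned drag against that limit ahead of time, the distance can be computed locally. This sketch assumes the limit applies to the straight-line (Euclidean) distance between the start and end points; the document does not say how omni.vu measures it, and both helper names are hypothetical.

```python
import math

MAX_DRAG_PX = 2000  # limit quoted above


def drag_distance(start_x: float, start_y: float, end_x: float, end_y: float) -> float:
    """Straight-line distance between the drag start and end points."""
    return math.hypot(end_x - start_x, end_y - start_y)


def is_drag_allowed(start_x: float, start_y: float, end_x: float, end_y: float) -> bool:
    """True if the drag stays within MAX_DRAG_PX (Euclidean distance assumed)."""
    return drag_distance(start_x, start_y, end_x, end_y) <= MAX_DRAG_PX
```

A drag that exceeds the limit could be split into several shorter drags, although whether the target application treats those as one gesture is application-specific.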
vu_scroll - Scroll
vu_scroll(direction="down", amount=3, x=None, y=None)
Scrolls at current or specified position.
Directions: up, down, left, right
Amount: Lines to scroll (1-20)
vu_type - Type Text
vu_type(text="Hello World", delay_between_ms=0)
Types text character by character.
Note: Click on target field first!
vu_hotkey - Keyboard Shortcut
vu_hotkey(keys="cmd+s")
Executes keyboard shortcuts.
Format: modifier+key (e.g., cmd+c, ctrl+shift+s, cmd+opt+i)
Modifiers: cmd, ctrl, alt/opt, shift
Blocked hotkeys (for safety):
- cmd+q (quit)
- cmd+shift+q (logout)
- cmd+opt+esc (force quit)
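The hotkey format and the blocked list can be checked before calling vu_hotkey. The following is a rough sketch under stated assumptions: `parse_hotkey` and `is_blocked` are hypothetical helpers, the parser treats the last `+`-separated part as the key, and it assumes opt and alt are interchangeable and that modifier order and letter case do not matter.

```python
MODIFIER_ALIASES = {"cmd": "cmd", "ctrl": "ctrl", "alt": "alt", "opt": "alt", "shift": "shift"}

# The three blocked combinations listed above, stored order-independently.
BLOCKED = {
    frozenset({"cmd", "q"}),
    frozenset({"cmd", "shift", "q"}),
    frozenset({"cmd", "alt", "esc"}),
}


def parse_hotkey(keys: str) -> tuple[frozenset, str]:
    """Split "cmd+opt+i" into (modifiers, key), normalizing opt to alt."""
    parts = [p.strip().lower() for p in keys.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return frozenset(MODIFIER_ALIASES[m] for m in mods), key


def is_blocked(keys: str) -> bool:
    """True if the combination matches one of the blocked hotkeys."""
    mods, key = parse_hotkey(keys)
    return (mods | {key}) in BLOCKED
```

Checking locally gives a clearer error than waiting for the tool to refuse, and it keeps a workflow from stalling halfway through a sequence of shortcuts.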
Common Workflows
1. Debug UI Issue
User: "The submit button doesn't do anything when I click it"
1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is button disabled? Is there validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")
2. Verify Build/Deploy
User: "Run the build and let me know when it's done"
1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result
3. Fill a Form
User: "Fill out the registration form with test data"
1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y) # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab") # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y) # Submit
8. vu_describe(prompt="Was the form submitted successfully?")
4. Navigate UI
User: "Open the settings panel in VS Code"
1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code") # Capture VS Code
3. vu_hotkey(keys="cmd+,") # Open settings
4. vu_diff() # Wait for settings to open
5. vu_describe(prompt="Are the VS Code settings now visible?")
5. Monitor for Changes
User: "Watch the dashboard and tell me when the status changes"
1. vu() # Initial capture
2. Loop every 5 seconds:
   - vu_diff(threshold=0.02)
   - If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user
Best Practices
1. Capture Before Acting
Always capture and analyze before clicking/typing:
❌ Bad:
vu_click(x=500, y=300) # Hope it's the right spot
✅ Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)
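The "parse coordinates" step depends entirely on how the vision provider words its answer, which is free text. One hedged approach is to ask for coordinates in a fixed form (for example "x=..., y=...") in the prompt and extract them with a regular expression. The `parse_coordinates` helper and the expected answer format are assumptions for illustration, not behavior guaranteed by vu_describe.

```python
import re
from typing import Optional

# Matches forms such as "x=512, y=340" or "X: 10 Y: 20".
COORD_RE = re.compile(r"x\s*[=:]\s*(\d+)\s*,?\s*y\s*[=:]\s*(\d+)", re.IGNORECASE)


def parse_coordinates(description: str) -> Optional[tuple[int, int]]:
    """Extract the first (x, y) pair from a description, or None if absent."""
    match = COORD_RE.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

When the helper returns None, it is safer to re-prompt or capture a narrower region than to click a guessed position.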
2. Use Appropriate Capture Scope
Full screen → General context, multi-app workflows
Window → Single app focus, cleaner output
Region → Specific element, faster/smaller
3. Specific Vision Prompts
❌ Vague: "What do you see?"
✅ Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"
4. Rate Limit Automation
Don't spam clicks. Wait between actions:
vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)
5. Verify After Actions
Always verify automation succeeded:
vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")
Troubleshooting
"Permission denied" or blank captures
→ Grant Screen Recording permission: System Settings > Privacy & Security > Screen Recording
Automation not working
→ Grant Accessibility permission: System Settings > Privacy & Security > Accessibility
Vision analysis failing
→ Check API keys in environment variables
→ Try a different provider: vu_describe(provider="openai")
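Switching providers by hand can be wrapped in a small fallback loop. This is a sketch only: `describe_with_fallback` is a hypothetical helper, `describe` stands for any callable that accepts `prompt` and `provider` keyword arguments (such as a wrapper around vu_describe), and it assumes a failing provider raises an exception, which may not be how the real tool reports errors.

```python
from typing import Callable, Sequence


def describe_with_fallback(
    describe: Callable[..., str],
    prompt: str,
    providers: Sequence[str] = ("claude", "openai", "gemini", "ollama"),
) -> tuple[str, str]:
    """Try each provider in order; return (provider, result) for the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, describe(prompt=prompt, provider=provider)
        except Exception as exc:  # provider-specific error types are unknown here
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the result makes it visible when an answer came from a fallback with different quality characteristics.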
Coordinates seem off
→ Retina display? Coordinates are in logical points, not pixels
→ Multi-monitor? Check vu_list_monitors() for display arrangement
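If a position was measured in image pixels on a Retina display, dividing by the display's scale factor gives the logical point to pass to the automation tools. The sketch below assumes captured images are in physical pixels and that the scale factor comes from vu_list_monitors; both should be verified on the actual setup. `pixels_to_points` is a hypothetical helper.

```python
def pixels_to_points(px_x: float, px_y: float, scale: float) -> tuple[int, int]:
    """Convert physical pixel coordinates to logical points for a given scale factor."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(px_x / scale), round(px_y / scale)


# On a 2x Retina display, pixel (1000, 600) corresponds to point (500, 300).
print(pixels_to_points(1000, 600, 2.0))
```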
Hotkey blocked
→ Some dangerous hotkeys (cmd+q) are blocked for safety
→ Unblock in config if absolutely necessary
Configuration
Environment variables:
OMNI_VU_ANTHROPIC_API_KEY=sk-ant-... # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-... # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium # low/medium/high
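A quick way to sanity-check these variables before a session is to load and validate them in one place. This is a minimal sketch, not omni.vu's own loader: `OmniVuConfig` and `load_config` are hypothetical names, and the fallback defaults ("claude" and "medium") are assumptions taken from the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = ("claude", "openai", "gemini", "ollama")
VALID_SAFETY_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class OmniVuConfig:
    default_provider: str
    safety_level: str
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> OmniVuConfig:
    """Read omni.vu settings from `env`, validating the enumerated values."""
    provider = env.get("OMNI_VU_DEFAULT_PROVIDER", "claude")  # assumed default
    safety = env.get("OMNI_VU_SAFETY_LEVEL", "medium")  # assumed default
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if safety not in VALID_SAFETY_LEVELS:
        raise ValueError(f"unknown safety level: {safety!r}")
    return OmniVuConfig(
        default_provider=provider,
        safety_level=safety,
        anthropic_api_key=env.get("OMNI_VU_ANTHROPIC_API_KEY"),
        openai_api_key=env.get("OMNI_VU_OPENAI_API_KEY"),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to test with a plain dict.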
History stored at:
~/.omni.vu/captures/