Trending-skills pi-computer-use
Control macOS applications with Pi agents using semantic Accessibility API targets and optional screenshots
git clone https://github.com/Aradotso/trending-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aradotso/trending-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/pi-computer-use" ~/.claude/skills/aradotso-trending-skills-pi-computer-use && rm -rf "$T"
pi-computer-use
Skill by ara.so — Daily 2026 Skills collection.
pi-computer-use gives Pi agents a semantic computer-use surface for visible macOS windows. It prefers Accessibility (AX) targets (like @e1) over raw coordinates, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak.
Installation
Via Pi (recommended)
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Pin to a specific version:
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1
Via npm
```bash
npm install @injaneity/pi-computer-use

# or pin a version
npm install @injaneity/pi-computer-use@0.2.1
```
Remove
```bash
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use
```
First-Run Permissions
On the first session, macOS prompts for permissions for the native helper:
~/.pi/agent/helpers/pi-computer-use/bridge
Grant both:
- Accessibility — required for AX ref targeting
- Screen Recording — required for screenshots
How It Works
Three components:
- Pi extension (`extensions/computer-use.ts`): registers the public tools and the `/computer-use` command
- TypeScript bridge (`src/bridge.ts`): manages window state, AX refs, fallback policy, batching, and execution metadata
- Native Swift helper (`native/macos/bridge.swift`): talks to macOS Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics
Available Tools
| Tool | Purpose |
|---|---|
| `list_apps` | List running apps |
| `list_windows` | List windows for an app |
| `screenshot` | Capture window + return AX state |
| `click` | Click element or coordinate |
| | Double-click element or coordinate |
| | Move cursor |
| `drag` | Drag from point to point |
| `scroll` | Scroll element or coordinate |
| `keypress` | Press key combination |
| | Type raw text |
| `set_text` | Replace element value via AX |
| | Pause execution |
| `arrange_window` | Position/resize window |
| `computer_actions` | Batch multiple actions |
Core Workflow
Always start a session with `screenshot` to select the controlled window and obtain AX refs:

```ts
// 1. Discover apps and windows if the target is ambiguous
list_apps()
list_windows({ app: "Safari" })

// 2. Select the window and get AX state
screenshot({ window: "@w1" })

// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
```
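The "screenshot first" rule can be checked mechanically before a plan is sent. This is a minimal sketch, assuming a hypothetical `ToolCall` record (tool name plus arguments) and treating `list_apps` and `list_windows` as the only calls allowed before the first `screenshot`; it is not part of the extension's API.

```ts
// Hypothetical model of a tool call: the tool names match this document,
// but the ToolCall record itself is an illustrative assumption.
type ToolCall = { tool: string; args: Record<string, unknown> };

// Calls assumed to be safe before the first screenshot (discovery only).
const DISCOVERY_TOOLS = new Set<string>(["list_apps", "list_windows"]);

// Returns the index of the first non-discovery call made before any
// screenshot, or -1 if the sequence takes a screenshot before acting.
function firstActionBeforeScreenshot(calls: ToolCall[]): number {
  let sawScreenshot = false;
  for (let i = 0; i < calls.length; i++) {
    const tool = calls[i].tool;
    if (tool === "screenshot") {
      sawScreenshot = true;
    } else if (!sawScreenshot && !DISCOVERY_TOOLS.has(tool)) {
      return i;
    }
  }
  return -1;
}

// The Core Workflow sequence, expressed as data.
const session: ToolCall[] = [
  { tool: "list_apps", args: {} },
  { tool: "list_windows", args: { app: "Safari" } },
  { tool: "screenshot", args: { window: "@w1" } },
  { tool: "click", args: { window: "@w1", ref: "@e1" } },
  { tool: "set_text", args: { ref: "@e2", text: "https://example.com" } },
  { tool: "keypress", args: { keys: ["Enter"] } },
];

console.log(firstActionBeforeScreenshot(session)); // -1
```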
AX Ref Targeting (Preferred)
AX refs such as `@e1` and `@e2` are returned by `screenshot` and carry capability metadata:
- `canSetValue`: supports `set_text`
- `canPress`: supports `click`
- `canFocus`: can receive focus
- `canScroll`: supports `scroll`
- `adjust`: supports value adjustment

```ts
// Click by AX ref, no coordinates needed
click({ ref: "@e1" })

// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })

// Replace a text field's value atomically
set_text({ ref: "@e2", text: "hello world" })
```
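The capability flags are enough to decide which ref-based tool fits an element. This is a hypothetical sketch: the `AxRef` interface and `toolForIntent` helper are illustrative assumptions, and the real AX state payload may use a different shape. `canFocus` and `adjust` are omitted to keep it short.

```ts
// Hypothetical shape of one AX ref entry; field names follow the
// capability flags listed above, but the real payload may differ.
interface AxRef {
  ref: string; // e.g. "@e1"
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
}

type Intent = "press" | "setValue" | "scroll";
type RefTool = "click" | "set_text" | "scroll";

// Picks the ref-based tool for an intent, or null when the element lacks
// the capability (the caller should then consider the coordinate fallback).
function toolForIntent(el: AxRef, intent: Intent): RefTool | null {
  if (intent === "press") return el.canPress ? "click" : null;
  if (intent === "setValue") return el.canSetValue ? "set_text" : null;
  return el.canScroll ? "scroll" : null; // intent === "scroll"
}
```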
Coordinate Fallback
Use coordinates only when no suitable AX target exists. Always include the `stateId` from the latest screenshot to guard against stale state:

```ts
click({ x: 320, y: 180, stateId: "abc123" })
```
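The preference order (a pressable AX ref first, coordinates plus `stateId` only as a fallback) can be expressed as a small payload builder. This is a sketch under assumptions: `Target`, `ClickArgs`, and `buildClick` are hypothetical names, and the argument shapes simply mirror the `click` examples in this document.

```ts
// Hypothetical argument shapes mirroring the click examples in this document.
type ClickArgs =
  | { ref: string }
  | { x: number; y: number; stateId: string };

// Hypothetical description of what the agent wants to click.
interface Target {
  ref?: string;                     // AX ref from the latest screenshot
  canPress?: boolean;               // capability flag for that ref
  point?: { x: number; y: number }; // coordinates, used only as a fallback
}

// Prefers a pressable AX ref; otherwise falls back to coordinates stamped
// with the latest stateId so that stale clicks can be rejected.
function buildClick(target: Target, latestStateId: string): ClickArgs {
  if (target.ref && target.canPress) {
    return { ref: target.ref };
  }
  if (target.point) {
    return { x: target.point.x, y: target.point.y, stateId: latestStateId };
  }
  throw new Error("no pressable AX ref or coordinate; take a fresh screenshot");
}
```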
Batching Actions
Use `computer_actions` to batch obvious sequential steps. A single semantic state update is returned after all actions complete:

```ts
computer_actions({
  stateId: "abc123",
  actions: [
    { type: "click", ref: "@e1" },
    { type: "set_text", ref: "@e2", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})
```
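For type-checked batches, the payload can be modeled as a discriminated union on `type`. The `BatchAction` union and `fillAndSubmit` helper are hypothetical and cover only the three action types in the batching example; the extension may accept more.

```ts
// Hypothetical union of the batched action types used in the example.
type BatchAction =
  | { type: "click"; ref: string }
  | { type: "set_text"; ref: string; text: string }
  | { type: "keypress"; keys: string[] };

interface ComputerActionsArgs {
  stateId: string; // from the screenshot the plan was built against
  actions: BatchAction[];
}

// Builds the click / set_text / Enter batch. All steps share one stateId
// because they were planned from the same screenshot.
function fillAndSubmit(
  stateId: string,
  clickRef: string,
  fieldRef: string,
  text: string
): ComputerActionsArgs {
  return {
    stateId,
    actions: [
      { type: "click", ref: clickRef },
      { type: "set_text", ref: fieldRef, text },
      { type: "keypress", keys: ["Enter"] },
    ],
  };
}
```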
Each action in the result includes execution metadata:
- `stealth`: background-safe AX path (no focus takeover)
- `default`: required focus or a raw-event fallback
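A caller that cares about focus takeover can scan this metadata after a batch. The sketch assumes a hypothetical result shape with an `execution` field; only the `stealth` and `default` values come from this document.

```ts
// Hypothetical per-action result record. Only the "stealth" and "default"
// values come from the documentation; the field names are assumptions.
interface ActionResult {
  type: string;
  execution: "stealth" | "default";
}

// Returns the indices of actions that needed focus or a raw-event fallback.
function foregroundSteps(results: ActionResult[]): number[] {
  const indices: number[] = [];
  results.forEach((result, i) => {
    if (result.execution === "default") indices.push(i);
  });
  return indices;
}
```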
Window Management
```ts
// List windows for a specific app
list_windows({ app: "Finder" })

// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })

// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })

// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
```
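If a preset does not fit, an explicit `frame` can be computed from the screen bounds. The `halfFrame` helper is hypothetical, its halving rule is an assumption rather than the extension's own `left-half` logic, and the screen bounds must come from somewhere outside this section.

```ts
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper for arrange_window's `frame` option. Splitting the
// screen in half (rounding the width down) is an assumption, not the
// extension's preset logic.
function halfFrame(screen: Frame, side: "left" | "right"): Frame {
  const width = Math.floor(screen.width / 2);
  return {
    x: side === "left" ? screen.x : screen.x + screen.width - width,
    y: screen.y,
    width,
    height: screen.height,
  };
}
```

The returned object can then be passed as the `frame` argument of `arrange_window`.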
Screenshot Modes
Control when screenshots are attached with the `image` option:

```ts
screenshot({ window: "@w1", image: "auto" })   // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" })  // never attach, AX state only
```
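The three modes reduce to a small decision, with `auto` acting as a threshold on AX coverage. This sketch assumes a hypothetical numeric `axCoverage` score between 0 and 1 and an arbitrary 0.5 cutoff; the extension's actual definition of "weak" coverage is not documented here.

```ts
type ImageMode = "auto" | "always" | "never";

// Hypothetical policy: `axCoverage` is an assumed 0..1 score of how much
// of the window the AX tree describes; 0.5 is an arbitrary default cutoff.
function shouldAttachImage(mode: ImageMode, axCoverage: number, threshold = 0.5): boolean {
  if (mode === "always") return true;
  if (mode === "never") return false;
  return axCoverage < threshold; // "auto": attach only when coverage is weak
}
```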
Common Patterns
Open URL in Safari
```ts
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })
```
Fill a Form
```ts
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button
```
Keyboard Shortcut
```ts
keypress({ keys: ["Cmd", "T"] })          // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New private window (Safari) or incognito window (Chrome)
keypress({ keys: ["Escape"] })
```
Scroll a Page
```ts
scroll({ ref: "@e2", scrollY: 800 })  // Scroll element down
scroll({ ref: "@e2", scrollY: -400 }) // Scroll up
```
Drag and Drop
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })
Strict AX Mode (Stealth / Background-Safe)
Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. In this mode, every action must succeed via a background-safe AX path. Strict mode is enabled through config (see the Configuration section), and actions that succeed report `stealth` in their execution metadata.
If strict mode is active and an action requires foreground focus, the action fails with a strict-mode error rather than taking focus.
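A plan can be pre-checked against the capability metadata before running it in strict mode. `strictViolation` is a hypothetical sketch that assumes any action without an AX ref would need raw pointer events, and that each ref-based action needs its matching flag (`canPress` for `click`, `canSetValue` for `set_text`, `canScroll` for `scroll`).

```ts
// Subset of the capability flags listed under AX Ref Targeting.
interface Caps {
  canPress?: boolean;
  canSetValue?: boolean;
  canScroll?: boolean;
}

// Hypothetical action shapes; a missing `ref` means a coordinate action.
type StrictAction =
  | { type: "click"; ref?: string; x?: number; y?: number }
  | { type: "set_text"; ref: string; text: string }
  | { type: "scroll"; ref?: string; scrollY: number };

// Returns a reason string if the action would break strict AX mode under
// the assumptions above, or null if it should stay background-safe.
function strictViolation(action: StrictAction, caps: Caps): string | null {
  if (!action.ref) {
    return `${action.type} has no AX ref and would need raw pointer events`;
  }
  const supported =
    (action.type === "click" && caps.canPress === true) ||
    (action.type === "set_text" && caps.canSetValue === true) ||
    (action.type === "scroll" && caps.canScroll === true);
  return supported ? null : `${action.type} on ${action.ref} lacks the required AX capability`;
}
```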
Configuration
Inspect effective config in Pi:
/computer-use
Config can be set via config files or environment variable overrides. Key options:
| Option | Description |
|---|---|
| | Screenshot attachment mode: `auto`, `always`, or `never` |
| | Enable background-safe strict AX mode |
| | Browser-aware targeting preference |
See `docs/configuration.md` for the full config file format and environment variable overrides.
Development
```bash
# Install dependencies
npm install

# Run checks
npm test

# Run local checkout without loading installed copy
pi --no-extensions -e .
```
Benchmarks
```bash
# Default QA benchmark
npm run benchmark:qa

# Full benchmark (may open apps)
npm run benchmark:qa:full
```
See `benchmarks/README.md` for metrics, regression policy, and comparison workflow.
Troubleshooting
Permissions not granted
Re-run Pi and grant both Accessibility and Screen Recording to:
~/.pi/agent/helpers/pi-computer-use/bridge
To grant them manually, open System Settings → Privacy & Security and enable the helper under both Accessibility and Screen Recording.
AX refs are stale
Take a fresh `screenshot` to get an updated `stateId` and new refs before acting. Stale-action detection uses `stateId` to reject outdated coordinates or refs.
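The idea behind stale-action detection can be mirrored on the client side by remembering the latest `stateId` and refusing to act on an older one. `StateGuard` is a hypothetical class for illustration only; the bridge performs its own check, and this is not its implementation.

```ts
// Hypothetical client-side guard; the bridge does its own stale-action
// detection, and this class only illustrates the idea.
class StateGuard {
  private latest: string | null = null;

  // Call with the stateId returned by each screenshot.
  recordScreenshot(stateId: string): void {
    this.latest = stateId;
  }

  // Throws if no screenshot has been taken yet or if stateId is outdated.
  check(stateId: string): void {
    if (this.latest === null) {
      throw new Error("no screenshot yet: call screenshot first");
    }
    if (stateId !== this.latest) {
      throw new Error(`stale stateId ${stateId} (latest is ${this.latest})`);
    }
  }
}
```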
Browser window not targeted correctly
Use `list_windows({ app: "Safari" })` (or Chrome/Firefox) first, then explicitly pass `window: "@wN"` to `screenshot` and subsequent actions.
Strict AX mode errors
The action could not complete via a background-safe AX path. Either disable strict mode or target an AX ref whose `canPress`/`canSetValue` capability supports the background path.
Helper not found
Ensure Pi installed the native helper:
ls ~/.pi/agent/helpers/pi-computer-use/bridge
If missing, reinstall:
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Key Concepts
- AX refs (`@e1`, `@e2`, …): semantic element handles from the macOS Accessibility API, stable within a state
- Window refs (`@w1`, `@w2`, …): stable handles from `list_windows`
- `stateId`: opaque ID from the latest screenshot; attach it to coordinate-based actions to detect stale state
- stealth execution: the action completed via AX without foregrounding the app or moving the real cursor
- semantic state: structured AX tree returned after every action, used instead of screenshots when coverage is sufficient