Learn-skills.dev agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/abpai/skills/agent-browser" ~/.claude/skills/neversight-learn-skills-dev-agent-browser-b05826 && rm -rf "$T"
data/skills-md/abpai/skills/agent-browser/SKILL.mdBrowser Automation with agent-browser
Core Workflow
Every browser automation follows this pattern:
- Connect (if
is set):$AGENT_BROWSER_CDP_PORT
— see CDP section belowagent-browser connect $AGENT_BROWSER_CDP_PORT - Navigate:
agent-browser open <url> - Snapshot:
(get element refs likeagent-browser snapshot -i
,@e1
)@e2 - Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form agent-browser get url agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result
Open Confirmation (Required)
After every
open, confirm the browser actually reached the target page before continuing:
agent-browser open <url> agent-browser get url agent-browser get title
If
get url is still about:blank, the page closes immediately, or load does not stabilize, stop and report this to the user before taking more actions.
CDP Connection (Authenticated Browser)
If
AGENT_BROWSER_CDP_PORT is set, connect to an existing Chrome instance instead of
launching a fresh browser. This preserves logged-in sessions, cookies, and extensions.
Connect
At the start of any browser session, connect to the CDP browser:
agent-browser connect $AGENT_BROWSER_CDP_PORT
If the connection fails (Chrome not running), launch it first:
# Check if browser is listening lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1 # If not running, launch via AGENT_BROWSER_CDP_LAUNCH if [ -n "$AGENT_BROWSER_CDP_LAUNCH" ]; then eval "$AGENT_BROWSER_CDP_LAUNCH" & sleep 3 fi # Then connect agent-browser connect $AGENT_BROWSER_CDP_PORT
Command-Runner Note (Isolated)
In some command-runner environments, launching Chrome with direct binary +
& can be cleaned up when that command exits. If this happens, use a detached macOS launch command and verify the listener before connecting:
if ! lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1; then open -na "Google Chrome" --args \ --remote-debugging-port="$AGENT_BROWSER_CDP_PORT" \ --user-data-dir="$HOME/Projects/ai-chrome-profile" sleep 3 fi lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1 agent-browser connect $AGENT_BROWSER_CDP_PORT
After connect succeeds, verify navigation worked (do not assume it did):
agent-browser open https://example.com agent-browser wait --load networkidle agent-browser get url
If
AGENT_BROWSER_CDP_PORT is set but CDP still fails after retrying launch/connect, do not silently switch methods. Explain the failure to the user and ask for a fallback choice:
:headed fresh sessionagent-browser --headed open <url>
:headless fresh sessionagent-browser open <url>
Example message:
I couldn't connect through CDP on port $AGENT_BROWSER_CDP_PORT after retrying. Would you like me to continue with a headed fresh browser session or a headless fresh session?
Once connected, all commands work normally — no extra flags needed:
agent-browser open https://example.com agent-browser snapshot -i agent-browser click @e1 agent-browser screenshot
For workflows split across separate command invocations,
--cdp per command can be more robust:
agent-browser --cdp $AGENT_BROWSER_CDP_PORT open https://example.com agent-browser --cdp $AGENT_BROWSER_CDP_PORT snapshot -i agent-browser --cdp $AGENT_BROWSER_CDP_PORT click @e1
Clean Session (No Profile)
When the user explicitly asks for a "clean", "fresh", or "incognito" session, skip CDP — use the default ephemeral browser regardless of env vars:
agent-browser open <url> # Without connect = fresh browser
Essential Commands
# Navigation agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser # Snapshot agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) agent-browser click @e1 # Click element agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser scroll down 500 # Scroll page # Get information agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds # Capture agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser pdf output.pdf # Save as PDF
Common Patterns
Form Submission
agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle
Authentication with State Persistence
# Login once and save state agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json # Reuse in future sessions agent-browser state load auth.json agent-browser open https://app.example.com/dashboard
Data Extraction
agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text # JSON output for parsing agent-browser snapshot -i --json agent-browser get text @e1 --json
Parallel Sessions
agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i agent-browser session list
Visual Browser (Debugging)
agent-browser --headed open https://example.com agent-browser highlight @e1 # Highlight element agent-browser record start demo.webm # Record session
iOS Simulator (Mobile Safari)
# List available iOS simulators agent-browser device list # Launch Safari on a specific device agent-browser -p ios --device "iPhone 16 Pro" open https://example.com # Same workflow as desktop - snapshot, interact, re-snapshot agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile-specific gesture # Take screenshot agent-browser -p ios screenshot mobile.png # Close session (shuts down simulator) agent-browser -p ios close
Requirements: macOS with Xcode, Appium (
npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use
--device "<UDID>" where UDID is from xcrun xctrace list devices.
Ref Lifecycle (Important)
Refs (
@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click agent-browser find label "Email" fill "user@test.com" agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output