Learn-skills.dev agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/abpai/skills/agent-browser" ~/.claude/skills/neversight-learn-skills-dev-agent-browser-b05826 && rm -rf "$T"
manifest: data/skills-md/abpai/skills/agent-browser/SKILL.md
source content

Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

  1. Connect (if
    $AGENT_BROWSER_CDP_PORT
    is set):
    agent-browser connect $AGENT_BROWSER_CDP_PORT
    — see CDP section below
  2. Navigate:
    agent-browser open <url>
  3. Snapshot:
    agent-browser snapshot -i
    (get element refs like
    @e1
    ,
    @e2
    )
  4. Interact: Use refs to click, fill, select
  5. Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser get url
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Open Confirmation (Required)

After every

open
, confirm the browser actually reached the target page before continuing:

agent-browser open <url>
agent-browser get url
agent-browser get title

If

get url
is still
about:blank
, the page closes immediately, or load does not stabilize, stop and report this to the user before taking more actions.

CDP Connection (Authenticated Browser)

If

AGENT_BROWSER_CDP_PORT
is set, connect to an existing Chrome instance instead of launching a fresh browser. This preserves logged-in sessions, cookies, and extensions.

Connect

At the start of any browser session, connect to the CDP browser:

agent-browser connect $AGENT_BROWSER_CDP_PORT

If the connection fails (Chrome not running), launch it first:

# Check if browser is listening
lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1

# If not running, launch via AGENT_BROWSER_CDP_LAUNCH
if [ -n "$AGENT_BROWSER_CDP_LAUNCH" ]; then
  eval "$AGENT_BROWSER_CDP_LAUNCH" &
  sleep 3
fi

# Then connect
agent-browser connect $AGENT_BROWSER_CDP_PORT

Command-Runner Note (Isolated)

In some command-runner environments, launching Chrome with direct binary +

&
can be cleaned up when that command exits. If this happens, use a detached macOS launch command and verify the listener before connecting:

if ! lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1; then
  open -na "Google Chrome" --args \
    --remote-debugging-port="$AGENT_BROWSER_CDP_PORT" \
    --user-data-dir="$HOME/Projects/ai-chrome-profile"
  sleep 3
fi

lsof -i :$AGENT_BROWSER_CDP_PORT -sTCP:LISTEN >/dev/null 2>&1
agent-browser connect $AGENT_BROWSER_CDP_PORT

After connect succeeds, verify navigation worked (do not assume it did):

agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser get url

If

AGENT_BROWSER_CDP_PORT
is set but CDP still fails after retrying launch/connect, do not silently switch methods. Explain the failure to the user and ask for a fallback choice:

  • headed fresh session
    :
    agent-browser --headed open <url>
  • headless fresh session
    :
    agent-browser open <url>

Example message:

I couldn't connect through CDP on port $AGENT_BROWSER_CDP_PORT after retrying. Would you like me to continue with a headed fresh browser session or a headless fresh session?

Once connected, all commands work normally — no extra flags needed:

agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser screenshot

For workflows split across separate command invocations,

--cdp
per command can be more robust:

agent-browser --cdp $AGENT_BROWSER_CDP_PORT open https://example.com
agent-browser --cdp $AGENT_BROWSER_CDP_PORT snapshot -i
agent-browser --cdp $AGENT_BROWSER_CDP_PORT click @e1

Clean Session (No Profile)

When the user explicitly asks for a "clean", "fresh", or "incognito" session, skip CDP — use the default ephemeral browser regardless of env vars:

agent-browser open <url>          # Without connect = fresh browser

Essential Commands

# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll page

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser pdf output.pdf          # Save as PDF

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle

Authentication with State Persistence

# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json

Parallel Sessions

agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i

agent-browser session list

Visual Browser (Debugging)

agent-browser --headed open https://example.com
agent-browser highlight @e1          # Highlight element
agent-browser record start demo.webm # Record session

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile-specific gesture

# Take screenshot
agent-browser -p ios screenshot mobile.png

# Close session (shuts down simulator)
agent-browser -p ios close

Requirements: macOS with Xcode, Appium (

npm install -g appium && appium driver install xcuitest
)

Real devices: Works with physical iOS devices if pre-configured. Use

--device "<UDID>"
where UDID is from
xcrun xctrace list devices
.

Ref Lifecycle (Important)

Refs (

@e1
,
@e2
, etc.) are invalidated when the page changes. Always re-snapshot after:

  • Clicking links or buttons that navigate
  • Form submissions
  • Dynamic content loading (dropdowns, modals)
agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click

Deep-Dive Documentation

ReferenceWhen to Use
references/commands.mdFull command reference with all options
references/snapshot-refs.mdRef lifecycle, invalidation rules, troubleshooting
references/session-management.mdParallel sessions, state persistence, concurrent scraping
references/authentication.mdLogin flows, OAuth, 2FA handling, state reuse
references/video-recording.mdRecording workflows for debugging and documentation
references/proxy-support.mdProxy configuration, geo-testing, rotating proxies

Ready-to-Use Templates

TemplateDescription
templates/form-automation.shForm filling with validation
templates/authenticated-session.shLogin once, reuse state
templates/capture-workflow.shContent extraction with screenshots
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output