Claude-prime agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

install

source · Clone the upstream repo

git clone https://github.com/avibebuilder/claude-prime

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/avibebuilder/claude-prime "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/agent-browser" ~/.claude/skills/avibebuilder-claude-prime-agent-browser && rm -rf "$T"

manifest: .claude/skills/agent-browser/SKILL.md

source content

Browser Automation with agent-browser

Use this skill to drive websites through the

agent-browser

CLI. Keep the main loop tight: inspect the page, act with refs, verify the result, and only pull deeper docs when the task actually needs them.

Core workflow

Prefer

agent-browser

directly for speed. Use

npx agent-browser

only if it is not installed globally.

For most tasks, follow this loop:

Open the page
Wait for the relevant state
Snapshot with refs
Interact using those refs
Re-snapshot after page or DOM changes
Verify the outcome
Close the session when done

agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i

Chain commands with

&&

only when you do not need to inspect intermediate output. Good:

open && wait && screenshot

. Bad:

snapshot && click

when you still need to read the refs from the snapshot.

Route fast

Read only the reference that matches the task:

Need	Read
Full command or flag lookup	references/commands.md
Ref lifecycle, stale refs, snapshot strategy	references/snapshot-refs.md
Login flows, OAuth, 2FA, saved auth state	references/authentication.md
Parallel sessions, state reuse, cleanup	references/session-management.md
Recording, profiling, local files, config, iOS, security	references/advanced-usage.md
Proxy setup	references/proxy-support.md
Recording workflows	references/video-recording.md
Profiling workflows	references/profiling.md

Golden path commands

Use these first; go to the command reference only when you need something more specific.

agent-browser open <url>
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser select @e3 "option"
agent-browser get url
agent-browser get text @e1
agent-browser diff snapshot
agent-browser screenshot --annotate
agent-browser close

Refs are the default interaction model

The main value of

agent-browser

is that snapshots produce compact refs like

@e1

@e2

@e3

. Those refs are cheaper and more reliable than repeatedly reasoning from raw HTML or long selectors.

Treat refs as short-lived. Re-snapshot after anything that can change the page state, especially:

navigation
form submission
opening dropdowns or modals
lazy-loaded or client-rendered content

If a ref fails or the page looks different from what you expected, your next move is usually

agent-browser snapshot -i

, not another blind click.

For the full lifecycle and troubleshooting rules, read references/snapshot-refs.md.

Choose the lightest tool that still proves the result

Default order:

```
snapshot -i
```
for structure and interactive targets
```
get text
```
,
```
get url
```
, or
```
get title
```
for precise verification
```
diff snapshot
```
when you need to confirm something changed
```
screenshot --annotate
```
when layout, icon-only controls, canvas, or visual context matters
semantic locators or
```
eval
```
only when refs are unavailable or the task truly needs them

If you need semantic locators, JavaScript evaluation, local file access, annotated screenshots, or config details, jump to references/advanced-usage.md and references/commands.md.

Authentication: decide sensitivity first

Before filling any credential, classify the auth flow.

Non-sensitive: localhost, staging, test accounts, or credentials the user explicitly provided for this task. The agent can usually fill these directly.
Sensitive: production domains, real user accounts, OAuth/SSO, or anything where the agent should not handle the secret. In that case, reach the auth step, switch to a headed browser if needed, and let the user complete sign-in manually.

After either path succeeds, offer to save reusable state if it would help next time. Do not auto-save credentials or session state without asking.

Use references/authentication.md for the exact decision rules and storage patterns.

Sessions: isolate work on purpose

Use named sessions when you are:

running parallel browser tasks
comparing two sites or variants
preserving auth state for reuse
avoiding interference across agents

When multiple agents may browse concurrently, use a named session from the start and close it explicitly when finished. Prefer semantic names over generic ones.

For session reuse and cleanup patterns, read references/session-management.md.

Security defaults for AI-driven browsing

If the page is untrusted or may contain hostile content, enable content boundaries before inspecting rich output. If the task is scoped to a known target, consider an allowlist for trusted domains.

The main point is to keep page content clearly separated from tool output and to narrow where the browser is allowed to go when the task permits it.

See references/advanced-usage.md for content boundaries, domain allowlists, action policy, and output limits.

Failure modes to catch early

Stale refs: you interacted after the page changed without re-snapshotting
Missing waits: you captured or clicked before async content settled
Visual-only UI: text snapshots missed icon buttons, canvas, or spatial layout
Shell quoting in
eval
: use the safer patterns from the advanced usage reference
Leaked sessions: you forgot to
```
close
```
the browser session after finishing

For worked examples and reusable flows, use the scripts in templates/ and the deeper references instead of expanding the hub.