Opencli opencli-browser

Use when an agent needs to drive a real Chrome window via opencli — inspect a page, fill forms, click through logged-in flows, or extract data ad-hoc. Covers the selector-first target contract, compound form fields, stale-ref handling, network capture, and the agent-native envelopes the CLI returns. Not for writing adapters — see opencli-adapter-author for that.

install

source · Clone the upstream repo

git clone https://github.com/jackwener/OpenCLI

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jackwener/OpenCLI "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/opencli-browser" ~/.claude/skills/jackwener-opencli-opencli-browser && rm -rf "$T"

manifest: skills/opencli-browser/SKILL.md

source content

opencli-browser

The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess.

This skill is for driving a live browser to accomplish an agent task. If you are building a reusable adapter under

~/.opencli/clis/<site>/

use

opencli-adapter-author

instead.

Prerequisites

opencli doctor

Until

doctor

is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which.

Mental model

Selector-first target contract. Every interaction command (
```
click
```
,
```
type
```
,
```
select
```
,
```
get text/value/attributes
```
) takes one
```
<target>
```
, which is either a numeric ref from
```
state
```
/
```
find
```
or a CSS selector. Use
```
--nth <n>
```
to disambiguate multiple CSS matches.
Every envelope reports
matches_n
and
match_level
.
```
match_level
```
is
```
exact
```
,
```
stable
```
, or
```
reidentified
```
— the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be.
Compact output first, full payload on demand.
```
state
```
is a budget-aware snapshot;
```
get html --as json
```
supports
```
--depth/--children-max/--text-max
```
;
```
network
```
returns shape previews and you re-fetch a single body with
```
--detail <key>
```
. If you emit a giant payload you are burning context you did not need to burn.
Structured errors are machine-readable. On failure the CLI emits
```
{error: {code, message, hint?, candidates?}}
```
. Branch on
```
code
```
, not on message strings.

Critical rules

Always inspect before you act. Run
```
state
```
or
```
find
```
first. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot.
Prefer numeric ref over CSS once you have it. Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders.
Read
match_level
after every write.
```
exact
```
= all good.
```
stable
```
= the element is the same but some soft attrs drifted — your action still applied.
```
reidentified
```
= the original ref was gone and the CLI found a unique replacement; double-check you hit the right element.
Use the
compound
field for form controls. Do not regex-guess a date format, do not
```
state
```
twice to get the full
```
<select>
```
options list. The compound envelope has the format string, full option list up to 50,
```
options_total
```
for overflow, and
```
accept
```
/
```
multiple
```
for
```
<input type=file>
```
.
Verify writes that matter. After
```
type <target> <text>
```
, run
```
get value <target>
```
. After
```
select
```
, run
```
get value
```
. Autocomplete widgets, React controlled inputs, and masked fields all silently eat characters. The CLI cannot detect this for you.
state
→ action →
state
after a page change. Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition.
Chain with
&&
. A chained sequence runs in one shell so refs acquired by the first command stay live for the second. Separate shell invocations lose the session context you just set up.
eval
is read-only. Wrap the JS in an IIFE and return JSON. If you need to change the page, use the structured
```
click
```
/
```
type
```
/
```
select
```
/
```
keys
```
commands instead — they produce structured output and fingerprints,
```
eval
```
does not.
Prefer
network
to screen-scraping. If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then
```
--detail <key>
```
the body you need.

Target contract (

<target>

for click / type / select / get text|value|attributes)

<target> ::= <numeric-ref> | <css-selector>

Numeric ref — the
```
[N]
```
index from
```
state
```
or
```
find
```
. Cheap, resilient to soft DOM drift.
CSS selector — anything
```
querySelectorAll
```
accepts. Must be unambiguous on write ops, or pair with
```
--nth <n>
```
.

Envelope on success

{ "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" }

{ "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" }

match_level

level meaning you should

exact

Fingerprint agreed on tag + strong IDs with at most one soft drift

Proceed.

stable

Tag + strong IDs still agree, soft signals (aria-label, role, text) drifted

Proceed, but if what you typed/clicked matters, re-check with

get value

state

reidentified

Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old ref

Double-check you hit the right element before chaining more writes.

Structured error codes

Branch on these, not on the human message:

code	meaning
`not_found`	Numeric ref is no longer in the DOM. Re- `state` .
`stale_ref`	Ref exists but the element at that ref changed identity. Re- `state` .
`invalid_selector`	CSS was rejected by `querySelectorAll` . Fix the selector.
`selector_not_found`	CSS matches 0 elements. Try `find` with a looser selector.
`selector_ambiguous`	CSS matches >1 and no `--nth` . Add `--nth` or narrow the selector.
`selector_nth_out_of_range`	`--nth` beyond match count.
`option_not_found`	`select` couldn't find an option matching that label/value. Error envelope includes `available: string[]` of the real option labels.
`not_a_select`	`select` was called on a non- `<select>` element.

Error envelope always includes

error.code

and

error.message

. Target errors (

selector_not_found

selector_ambiguous

, etc.) often add

error.candidates: string[]

with suggested selectors.

option_not_found

adds

error.available: string[]

instead.

Command reference

Inspect

command	purpose
`browser state`	Snapshot: text tree with `[N]` refs, scroll hints, hidden-interactive hints, `compounds (N):` sidecar for date/select/file refs.
`browser find --css <sel> [--limit N] [--text-max N]`	Run a CSS query and return one entry per match with `{nth, ref, tag, role, text, attrs, visible, compound?}` . Allocates refs for matches the prior snapshot didn't tag. Cheap alternative to `state` when you already know the selector.
`browser frames`	List cross-origin iframe targets. Pass the index to `--frame` on `eval` .
`browser screenshot [path]`	Viewport PNG. No path → base64 to stdout. Prefer `state` when you just need structure.

Get (read-only)

command	returns
`browser get title`	plain text
`browser get url`	plain text
`browser get text <target> [--nth N]`	`{value, matches_n, match_level}`
`browser get value <target> [--nth N]`	`{value, matches_n, match_level}`
`browser get attributes <target> [--nth N]`	`{value: {attr: val, ...}, matches_n, match_level}`
`browser get html [--selector <css>] [--as html\|json] [--depth N] [--children-max N] [--text-max N] [--max N]`	Raw HTML, or structured tree. JSON tree nodes have `{tag, attrs, text, children[], compound?}` . Truncation reported via `truncated: {depth?, children_dropped?, text_truncated?}` .

Interact

command	notes
`browser click <target> [--nth N]`	Returns `{clicked, target, matches_n, match_level}` .
`browser type <target> <text> [--nth N]`	Clicks first, then types. Returns `{typed, text, target, matches_n, match_level, autocomplete}` . `autocomplete: true` means a combobox/datalist popup appeared after typing — you almost always need `keys Enter` or a follow-up `click` on the suggestion to commit the value.
`browser select <target> <option> [--nth N]`	Matches option by label first, then value. Use `compound` from `find` / `state` to see exactly what labels are available.
`browser keys <key>`	`Enter` , `Escape` , `Tab` , `Control+a` , etc. Runs against the focused element.
`browser scroll <direction> [--amount px]`	`up` / `down` . Default amount `500` .

Wait

browser wait selector "<css>" [--timeout ms]    # wait until the selector matches
browser wait text "<substring>" [--timeout ms]  # wait until the text appears
browser wait time <seconds>                     # hard sleep, last resort

Default timeout

ms. SPA routes, login redirects, and lazy-loaded lists need

wait

before

state

get

Extract

browser eval <js> [--frame N]
— Run an expression in the page (or in a cross-origin frame via
```
--frame
```
). Wrap in an IIFE and return JSON. Read-only: no
```
document.forms[0].submit()
```
, no clicks, no navigations. If the result is a string, stdout is the raw string; otherwise it's JSON.

browser extract [--selector <css>] [--chunk-size N] [--start N]
— Markdown extraction of long-form content with a continuation cursor. Returns

{url, title, selector, total_chars, chunk_size, start, end, next_start_char, content}

. Loop on

next_start_char

until it is

null

. Auto-scopes to

<main>

<article>

<body>

if you don't pass

--selector

Network

browser network                        # shape preview + cache key list
browser network --detail <key>         # full body for one cached entry
browser network --filter "field1,field2"  # keep only entries whose body shape contains ALL fields as path segments
browser network --all                  # include static resources (usually noise)
browser network --raw                  # full bodies inline — large; use sparingly
browser network --ttl <ms>             # cache TTL (default 24h)

List entries look like

{key, method, status, url, ct, size, shape, body_truncated?}

. Detail envelope is

{key, url, method, status, ct, size, shape, body, body_truncated?, body_full_size?, body_truncation_reason}

. Cache lives in

~/.opencli/cache/browser-network/

so you can re-inspect without re-triggering the request.

Tabs & session

command	purpose
`browser tab list`	JSON array of `{index, page, url, title, active}` . The `page` string is the tab identity you pass as `<targetId>` to `tab select` / `tab close` , or to `--tab <targetId>` on any subcommand. ( `--tab` 's placeholder is historical — the value is always `page` .)
`browser tab new [url]`	Open a new tab. Prints the new `page` string.
`browser tab select [targetId]`	Make a tab the default. All subcommands accept `--tab <targetId>` to target one without changing the default.
`browser tab close [targetId]`	Close by `page` .
`browser back`	History back on the active tab.
`browser close`	Close the automation window when done.

Compound form controls

Every date/time, select, and file input carries a

compound

field. Use it — do not regex attributes.

Date family

{
  "control": "date",
  "format": "YYYY-MM-DD",
  "current": "2026-04-21",
  "min": "2026-01-01",
  "max": "2026-12-31"
}

control

is one of

date | time | datetime-local | month | week

format

is a concrete template string — type into the field using that exact format, or

select

by label if the site wraps the native input in a custom widget.

Select

{
  "control": "select",
  "multiple": false,
  "current": "United States",
  "options": [
    { "label": "United States", "value": "us", "selected": true },
    { "label": "Canada", "value": "ca" }
  ],
  "options_total": 137
}

options[]

is capped at 50 entries. current
is always correct even when the selected option is past the cap — it's computed by scanning every option, not from the truncated list. If

options_total > options.length

and you need an option that isn't in

options[]

, call

browser select <target> "<label>"

directly — the CLI matches against the live DOM, not the truncated list.

File

{
  "control": "file",
  "multiple": true,
  "current": ["report.pdf", "cover.png"],
  "accept": "application/pdf,image/*"
}

Do not invent file paths. Upload is done via the normal click flow — respect

accept

when telling the user what to upload.

Where compounds show up

```
browser find --css <sel>
```
entries: inline on each match.
```
browser get html --as json
```
tree nodes: inline on matching nodes.
```
browser state
```
snapshot: in a
```
compounds (N):
```
sidecar keyed by numeric ref, so you can tell at a glance which
```
[N]
```
entries have rich metadata.

Cost guide

Think about payload size per call. Budgets exist for a reason.

command	rough cost	when to use
`state`	medium (bounded by internal budget)	First call on any page, after every nav, when you need refs.
`find --css <sel>`	small	You already know the selector — one query, compact entries.
`get title` / `get url`	tiny	Sanity checks between steps.
`get text/value/attributes`	tiny per call	Verifying one specific field.
`get html` (raw)	can be huge	Avoid on unbounded pages. Always pair with `--selector` and a budget.
`get html --as json --depth 3 --children-max 20`	medium	When you need to reason about structure, not a specific field.
`screenshot`	large	Only when the page is visual (CAPTCHA, charts). Prefer `state` .
`extract`	medium per chunk	Long-form reading. Loop via `next_start_char` .
`network` (default)	small	First look at APIs.
`network --detail <key>`	varies	Pull one body.
`network --raw`	huge	Only after `--filter` narrowed the candidate set.
`eval "JSON.stringify(...)"`	controlled	Targeted extraction when none of the above fit.

Rule of thumb: one

state

per page transition, one
find
per follow-up query, one
get
/
click
/
type
per action. If your plan involves >10 calls per page you are probably scraping instead of interacting — consider

extract

network

Chaining rules

Good — one shell, live session:

opencli browser open "https://news.ycombinator.com" \
  && opencli browser state \
  && opencli browser click 3

Bad — each line is a fresh shell, refs from call 1 are already forgotten when call 2 runs. (Only a problem if you rely on shell-scoped state; browser refs themselves persist in-page, but interleaving unrelated shells invites races.) Prefer

&&

when the steps are meant to be atomic.

Never chain a write and then an immediate

state

without a

wait

if the action causes a network round-trip — you will snapshot the pre-response DOM and make bad decisions off stale data.

Recipes

Fill a login form

opencli browser open "https://example.com/login"
opencli browser state                          # find [N] for email, password, submit
opencli browser type 4 "me@example.com"
opencli browser type 5 "hunter2"
opencli browser get value 4                    # verify (autocomplete can eat chars)
opencli browser click 6                        # submit
opencli browser wait selector "[data-testid=account-menu]" --timeout 15000
opencli browser state                          # fresh refs on the logged-in page

Pick from a long dropdown

opencli browser state                          # sidebar shows [12] <select name=country>
opencli browser find --css "select[name=country]"
# the compound.options_total is 137, but compound.current is "" — unselected.
opencli browser select 12 "Uruguay"
opencli browser get value 12                   # { value: "uy", match_level: "exact" }

Scrape a list via network instead of DOM

opencli browser open "https://news.ycombinator.com"
opencli browser network --filter "title,score"
# -> find the /topstories entry, note its key
opencli browser network --detail topstories-a1b2

Read a long article in chunks

opencli browser open "https://blog.example.com/long-post"
opencli browser extract --chunk-size 8000
# -> content + next_start_char: 8000
opencli browser extract --start 8000 --chunk-size 8000
# ...until next_start_char is null

Cross-origin iframe

opencli browser frames
# -> [{"index": 0, "url": "https://checkout.stripe.com/...", ...}]
opencli browser eval "(() => document.querySelector('input[name=cardnumber]')?.value)()" --frame 0

Pitfalls

Do not submit forms via
eval "document.forms[0].submit()"
— modern sites intercept with JS handlers and silently drop the call. Either
```
click
```
the submit button via its ref, or (if you know the GET URL) just
```
open
```
it directly.
Do not reuse refs across a page transition.
```
wait
```
for the new state, then re-
```
state
```
. Old refs will either 404 or (worse)
```
reidentify
```
onto a similarly-shaped element on the new page.
match_level: reidentified
is a warning, not an error. The action went through, but if you are chaining 5 more writes that all depend on that being the right element, verify with a
```
get text
```
or
```
get value
```
before continuing.
Budget-aware commands silently cap.
```
get html --as json
```
with default budgets will return
```
truncated: {...}
```
. If your downstream logic needs the whole subtree, raise
```
--depth
```
/
```
--children-max
```
or tighten the selector.
autocomplete: true
on a
type
response is not an error. It means a suggestion popup is open and your value isn't committed yet. Typically
```
keys Enter
```
to accept the first suggestion, or
```
click
```
the one you want.
network --filter
is AND-semantics on path segments.
```
--filter "title,score"
```
keeps entries whose body shape contains both
```
title
```
and
```
score
```
as path segments, at any depth. It is not a regex.
Screenshots are for humans, not for agents. Use
```
state
```
+
```
find
```
unless the page is genuinely visual (captcha, chart). Screenshots burn tokens and rarely add signal an agent can act on.

Troubleshooting

symptom	fix
`opencli doctor` red: "Browser not connected"	Start Chrome with `--remote-debugging-port=9222` , or rerun the extension install.
`attach failed: chrome-extension://...`	Disable 1Password / other CDP-hungry extensions temporarily.
`selector_not_found` right after `state`	Page mutated. `wait selector "..."` then retry.
`stale_ref` across every command	You are reusing refs from a prior page. Re- `state` .
`click` succeeds but nothing happens	The element is probably a decorative wrapper stealing clicks from the real target. `find --css "..."` with a narrower selector and retry on the inner element.
`type` appears to finish but value is wrong	Autocomplete, masked input, or React controlled re-render. Verify with `get value` . Add `keys Enter` or re-type.
Giant `get html` output	Pass `--selector` + `--as json --depth 3 --children-max 20 --text-max 200` .
Network cache seems stale	Bump `--ttl` down, or let it expire. The cache lives at `~/.opencli/cache/browser-network/` .

Opencli opencli-browser

opencli-browser

Prerequisites

Mental model

Critical rules

Target contract (
`<target>`
for click / type / select / get text|value|attributes)

Envelope on success

match_level

Structured error codes

Command reference

Inspect

Get (read-only)

Interact

Wait

Extract

Network

Tabs & session

Compound form controls

Date family

Select

File

Where compounds show up

Cost guide

Chaining rules

Recipes

Fill a login form

Pick from a long dropdown

Scrape a list via network instead of DOM

Read a long article in chunks

Cross-origin iframe

Pitfalls

Troubleshooting

See also

Opencli opencli-browser

opencli-browser

Prerequisites

Mental model

Critical rules

Target contract (<target> for click / type / select / get text|value|attributes)

Envelope on success

match_level

Structured error codes

Command reference

Inspect

Get (read-only)

Interact

Wait

Extract

Network

Tabs & session

Compound form controls

Date family

Select

File

Where compounds show up

Cost guide

Chaining rules

Recipes

Fill a login form

Pick from a long dropdown

Scrape a list via network instead of DOM

Read a long article in chunks

Cross-origin iframe

Pitfalls

Troubleshooting

See also

Target contract (
`<target>`
for click / type / select / get text|value|attributes)