Opencli opencli-browser

Use when an agent needs to drive a real Chrome window via opencli — inspect a page, fill forms, click through logged-in flows, or extract data ad-hoc. Covers the selector-first target contract, compound form fields, stale-ref handling, network capture, and the agent-native envelopes the CLI returns. Not for writing adapters — see opencli-adapter-author for that.

install
source · Clone the upstream repo
git clone https://github.com/jackwener/OpenCLI
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jackwener/OpenCLI "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/opencli-browser" ~/.claude/skills/jackwener-opencli-opencli-browser && rm -rf "$T"
manifest: skills/opencli-browser/SKILL.md
source content

opencli-browser

The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess.

This skill is for driving a live browser to accomplish an agent task. If you are building a reusable adapter under

~/.opencli/clis/<site>/
use
opencli-adapter-author
instead.


Prerequisites

opencli doctor

Until

doctor
is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which.


Mental model

  1. Selector-first target contract. Every interaction command (
    click
    ,
    type
    ,
    select
    ,
    get text/value/attributes
    ) takes one
    <target>
    , which is either a numeric ref from
    state
    /
    find
    or a CSS selector. Use
    --nth <n>
    to disambiguate multiple CSS matches.
  2. Every envelope reports
    matches_n
    and
    match_level
    .
    match_level
    is
    exact
    ,
    stable
    , or
    reidentified
    — the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be.
  3. Compact output first, full payload on demand.
    state
    is a budget-aware snapshot;
    get html --as json
    supports
    --depth/--children-max/--text-max
    ;
    network
    returns shape previews and you re-fetch a single body with
    --detail <key>
    . If you emit a giant payload you are burning context you did not need to burn.
  4. Structured errors are machine-readable. On failure the CLI emits
    {error: {code, message, hint?, candidates?}}
    . Branch on
    code
    , not on message strings.

Critical rules

  1. Always inspect before you act. Run
    state
    or
    find
    first. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot.
  2. Prefer numeric ref over CSS once you have it. Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders.
  3. Read
    match_level
    after every write.
    exact
    = all good.
    stable
    = the element is the same but some soft attrs drifted — your action still applied.
    reidentified
    = the original ref was gone and the CLI found a unique replacement; double-check you hit the right element.
  4. Use the
    compound
    field for form controls.
    Do not regex-guess a date format, do not
    state
    twice to get the full
    <select>
    options list. The compound envelope has the format string, full option list up to 50,
    options_total
    for overflow, and
    accept
    /
    multiple
    for
    <input type=file>
    .
  5. Verify writes that matter. After
    type <target> <text>
    , run
    get value <target>
    . After
    select
    , run
    get value
    . Autocomplete widgets, React controlled inputs, and masked fields all silently eat characters. The CLI cannot detect this for you.
  6. state
    → action →
    state
    after a page change.
    Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition.
  7. Chain with
    &&
    .
    A chained sequence runs in one shell so refs acquired by the first command stay live for the second. Separate shell invocations lose the session context you just set up.
  8. eval
    is read-only.
    Wrap the JS in an IIFE and return JSON. If you need to change the page, use the structured
    click
    /
    type
    /
    select
    /
    keys
    commands instead — they produce structured output and fingerprints,
    eval
    does not.
  9. Prefer
    network
    to screen-scraping.
    If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then
    --detail <key>
    the body you need.

Target contract (
<target>
for click / type / select / get text|value|attributes)

<target> ::= <numeric-ref> | <css-selector>
  • Numeric ref — the
    [N]
    index from
    state
    or
    find
    . Cheap, resilient to soft DOM drift.
  • CSS selector — anything
    querySelectorAll
    accepts. Must be unambiguous on write ops, or pair with
    --nth <n>
    .

Envelope on success

{ "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" }
{ "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" }

match_level

levelmeaningyou should
exact
Fingerprint agreed on tag + strong IDs with at most one soft driftProceed.
stable
Tag + strong IDs still agree, soft signals (aria-label, role, text) driftedProceed, but if what you typed/clicked matters, re-check with
get value
or
state
.
reidentified
Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old refDouble-check you hit the right element before chaining more writes.

Structured error codes

Branch on these, not on the human message:

codemeaning
not_found
Numeric ref is no longer in the DOM. Re-
state
.
stale_ref
Ref exists but the element at that ref changed identity. Re-
state
.
invalid_selector
CSS was rejected by
querySelectorAll
. Fix the selector.
selector_not_found
CSS matches 0 elements. Try
find
with a looser selector.
selector_ambiguous
CSS matches >1 and no
--nth
. Add
--nth
or narrow the selector.
selector_nth_out_of_range
--nth
beyond match count.
option_not_found
select
couldn't find an option matching that label/value. Error envelope includes
available: string[]
of the real option labels.
not_a_select
select
was called on a non-
<select>
element.

Error envelope always includes

error.code
and
error.message
. Target errors (
selector_not_found
,
selector_ambiguous
, etc.) often add
error.candidates: string[]
with suggested selectors.
option_not_found
adds
error.available: string[]
instead.


Command reference

Inspect

commandpurpose
browser state
Snapshot: text tree with
[N]
refs, scroll hints, hidden-interactive hints,
compounds (N):
sidecar for date/select/file refs.
browser find --css <sel> [--limit N] [--text-max N]
Run a CSS query and return one entry per match with
{nth, ref, tag, role, text, attrs, visible, compound?}
. Allocates refs for matches the prior snapshot didn't tag. Cheap alternative to
state
when you already know the selector.
browser frames
List cross-origin iframe targets. Pass the index to
--frame
on
eval
.
browser screenshot [path]
Viewport PNG. No path → base64 to stdout. Prefer
state
when you just need structure.

Get (read-only)

commandreturns
browser get title
plain text
browser get url
plain text
browser get text <target> [--nth N]
{value, matches_n, match_level}
browser get value <target> [--nth N]
{value, matches_n, match_level}
browser get attributes <target> [--nth N]
{value: {attr: val, ...}, matches_n, match_level}
browser get html [--selector <css>] [--as html|json] [--depth N] [--children-max N] [--text-max N] [--max N]
Raw HTML, or structured tree. JSON tree nodes have
{tag, attrs, text, children[], compound?}
. Truncation reported via
truncated: {depth?, children_dropped?, text_truncated?}
.

Interact

commandnotes
browser click <target> [--nth N]
Returns
{clicked, target, matches_n, match_level}
.
browser type <target> <text> [--nth N]
Clicks first, then types. Returns
{typed, text, target, matches_n, match_level, autocomplete}
.
autocomplete: true
means a combobox/datalist popup appeared after typing — you almost always need
keys Enter
or a follow-up
click
on the suggestion to commit the value.
browser select <target> <option> [--nth N]
Matches option by label first, then value. Use
compound
from
find
/
state
to see exactly what labels are available.
browser keys <key>
Enter
,
Escape
,
Tab
,
Control+a
, etc. Runs against the focused element.
browser scroll <direction> [--amount px]
up
/
down
. Default amount
500
.

Wait

browser wait selector "<css>" [--timeout ms]    # wait until the selector matches
browser wait text "<substring>" [--timeout ms]  # wait until the text appears
browser wait time <seconds>                     # hard sleep, last resort

Default timeout

10000
ms. SPA routes, login redirects, and lazy-loaded lists need
wait
before
state
/
get
.

Extract

  • browser eval <js> [--frame N]
    — Run an expression in the page (or in a cross-origin frame via
    --frame
    ). Wrap in an IIFE and return JSON. Read-only: no
    document.forms[0].submit()
    , no clicks, no navigations. If the result is a string, stdout is the raw string; otherwise it's JSON.
  • browser extract [--selector <css>] [--chunk-size N] [--start N]
    — Markdown extraction of long-form content with a continuation cursor. Returns
    {url, title, selector, total_chars, chunk_size, start, end, next_start_char, content}
    . Loop on
    next_start_char
    until it is
    null
    . Auto-scopes to
    <main>
    /
    <article>
    /
    <body>
    if you don't pass
    --selector
    .

Network

browser network                        # shape preview + cache key list
browser network --detail <key>         # full body for one cached entry
browser network --filter "field1,field2"  # keep only entries whose body shape contains ALL fields as path segments
browser network --all                  # include static resources (usually noise)
browser network --raw                  # full bodies inline — large; use sparingly
browser network --ttl <ms>             # cache TTL (default 24h)

List entries look like

{key, method, status, url, ct, size, shape, body_truncated?}
. Detail envelope is
{key, url, method, status, ct, size, shape, body, body_truncated?, body_full_size?, body_truncation_reason}
. Cache lives in
~/.opencli/cache/browser-network/
so you can re-inspect without re-triggering the request.

Tabs & session

commandpurpose
browser tab list
JSON array of
{index, page, url, title, active}
. The
page
string is the tab identity you pass as
<targetId>
to
tab select
/
tab close
, or to
--tab <targetId>
on any subcommand. (
--tab
's placeholder is historical — the value is always
page
.)
browser tab new [url]
Open a new tab. Prints the new
page
string.
browser tab select [targetId]
Make a tab the default. All subcommands accept
--tab <targetId>
to target one without changing the default.
browser tab close [targetId]
Close by
page
.
browser back
History back on the active tab.
browser close
Close the automation window when done.

Compound form controls

Every date/time, select, and file input carries a

compound
field. Use it — do not regex attributes.

Date family

{
  "control": "date",
  "format": "YYYY-MM-DD",
  "current": "2026-04-21",
  "min": "2026-01-01",
  "max": "2026-12-31"
}

control
is one of
date | time | datetime-local | month | week
.
format
is a concrete template string — type into the field using that exact format, or
select
by label if the site wraps the native input in a custom widget.

Select

{
  "control": "select",
  "multiple": false,
  "current": "United States",
  "options": [
    { "label": "United States", "value": "us", "selected": true },
    { "label": "Canada", "value": "ca" }
  ],
  "options_total": 137
}

options[]
is capped at 50 entries.
current
is always correct
even when the selected option is past the cap — it's computed by scanning every option, not from the truncated list. If
options_total > options.length
and you need an option that isn't in
options[]
, call
browser select <target> "<label>"
directly — the CLI matches against the live DOM, not the truncated list.

File

{
  "control": "file",
  "multiple": true,
  "current": ["report.pdf", "cover.png"],
  "accept": "application/pdf,image/*"
}

Do not invent file paths. Upload is done via the normal click flow — respect

accept
when telling the user what to upload.

Where compounds show up

  • browser find --css <sel>
    entries: inline on each match.
  • browser get html --as json
    tree nodes: inline on matching nodes.
  • browser state
    snapshot: in a
    compounds (N):
    sidecar keyed by numeric ref, so you can tell at a glance which
    [N]
    entries have rich metadata.

Cost guide

Think about payload size per call. Budgets exist for a reason.

commandrough costwhen to use
state
medium (bounded by internal budget)First call on any page, after every nav, when you need refs.
find --css <sel>
smallYou already know the selector — one query, compact entries.
get title
/
get url
tinySanity checks between steps.
get text/value/attributes
tiny per callVerifying one specific field.
get html
(raw)
can be hugeAvoid on unbounded pages. Always pair with
--selector
and a budget.
get html --as json --depth 3 --children-max 20
mediumWhen you need to reason about structure, not a specific field.
screenshot
largeOnly when the page is visual (CAPTCHA, charts). Prefer
state
.
extract
medium per chunkLong-form reading. Loop via
next_start_char
.
network
(default)
smallFirst look at APIs.
network --detail <key>
variesPull one body.
network --raw
hugeOnly after
--filter
narrowed the candidate set.
eval "JSON.stringify(...)"
controlledTargeted extraction when none of the above fit.

Rule of thumb: one

state
per page transition, one
find
per follow-up query, one
get
/
click
/
type
per action.
If your plan involves >10 calls per page you are probably scraping instead of interacting — consider
extract
or
network
.


Chaining rules

Good — one shell, live session:

opencli browser open "https://news.ycombinator.com" \
  && opencli browser state \
  && opencli browser click 3

Bad — each line is a fresh shell, refs from call 1 are already forgotten when call 2 runs. (Only a problem if you rely on shell-scoped state; browser refs themselves persist in-page, but interleaving unrelated shells invites races.) Prefer

&&
when the steps are meant to be atomic.

Never chain a write and then an immediate

state
without a
wait
if the action causes a network round-trip — you will snapshot the pre-response DOM and make bad decisions off stale data.


Recipes

Fill a login form

opencli browser open "https://example.com/login"
opencli browser state                          # find [N] for email, password, submit
opencli browser type 4 "me@example.com"
opencli browser type 5 "hunter2"
opencli browser get value 4                    # verify (autocomplete can eat chars)
opencli browser click 6                        # submit
opencli browser wait selector "[data-testid=account-menu]" --timeout 15000
opencli browser state                          # fresh refs on the logged-in page

Pick from a long dropdown

opencli browser state                          # sidebar shows [12] <select name=country>
opencli browser find --css "select[name=country]"
# the compound.options_total is 137, but compound.current is "" — unselected.
opencli browser select 12 "Uruguay"
opencli browser get value 12                   # { value: "uy", match_level: "exact" }

Scrape a list via network instead of DOM

opencli browser open "https://news.ycombinator.com"
opencli browser network --filter "title,score"
# -> find the /topstories entry, note its key
opencli browser network --detail topstories-a1b2

Read a long article in chunks

opencli browser open "https://blog.example.com/long-post"
opencli browser extract --chunk-size 8000
# -> content + next_start_char: 8000
opencli browser extract --start 8000 --chunk-size 8000
# ...until next_start_char is null

Cross-origin iframe

opencli browser frames
# -> [{"index": 0, "url": "https://checkout.stripe.com/...", ...}]
opencli browser eval "(() => document.querySelector('input[name=cardnumber]')?.value)()" --frame 0

Pitfalls

  • Do not submit forms via
    eval "document.forms[0].submit()"
    — modern sites intercept with JS handlers and silently drop the call. Either
    click
    the submit button via its ref, or (if you know the GET URL) just
    open
    it directly.
  • Do not reuse refs across a page transition.
    wait
    for the new state, then re-
    state
    . Old refs will either 404 or (worse)
    reidentify
    onto a similarly-shaped element on the new page.
  • match_level: reidentified
    is a warning, not an error.
    The action went through, but if you are chaining 5 more writes that all depend on that being the right element, verify with a
    get text
    or
    get value
    before continuing.
  • Budget-aware commands silently cap.
    get html --as json
    with default budgets will return
    truncated: {...}
    . If your downstream logic needs the whole subtree, raise
    --depth
    /
    --children-max
    or tighten the selector.
  • autocomplete: true
    on a
    type
    response is not an error.
    It means a suggestion popup is open and your value isn't committed yet. Typically
    keys Enter
    to accept the first suggestion, or
    click
    the one you want.
  • network --filter
    is AND-semantics on path segments.
    --filter "title,score"
    keeps entries whose body shape contains both
    title
    and
    score
    as path segments, at any depth. It is not a regex.
  • Screenshots are for humans, not for agents. Use
    state
    +
    find
    unless the page is genuinely visual (captcha, chart). Screenshots burn tokens and rarely add signal an agent can act on.

Troubleshooting

symptomfix
opencli doctor
red: "Browser not connected"
Start Chrome with
--remote-debugging-port=9222
, or rerun the extension install.
attach failed: chrome-extension://...
Disable 1Password / other CDP-hungry extensions temporarily.
selector_not_found
right after
state
Page mutated.
wait selector "..."
then retry.
stale_ref
across every command
You are reusing refs from a prior page. Re-
state
.
click
succeeds but nothing happens
The element is probably a decorative wrapper stealing clicks from the real target.
find --css "..."
with a narrower selector and retry on the inner element.
type
appears to finish but value is wrong
Autocomplete, masked input, or React controlled re-render. Verify with
get value
. Add
keys Enter
or re-type.
Giant
get html
output
Pass
--selector
+
--as json --depth 3 --children-max 20 --text-max 200
.
Network cache seems staleBump
--ttl
down, or let it expire. The cache lives at
~/.opencli/cache/browser-network/
.

See also

  • opencli-adapter-author
    — turning what you just figured out into a reusable
    ~/.opencli/clis/<site>/<command>.js
    .
  • opencli-autofix
    — when an existing adapter breaks, this skill walks you through
    OPENCLI_DIAGNOSTIC
    and filing a fix.