Pinchtab pinchtab

Name: pinchtab
Author: pinchtab

Use this skill when a task needs browser automation through PinchTab: open a website, inspect interactive elements, click through flows, fill out forms, scrape page text, log into sites with a persistent profile, export screenshots or PDFs, manage multiple browser instances, or fall back to the HTTP API when the CLI is unavailable. Prefer this skill for token-efficient browser work driven by stable accessibility refs such as `e5` and `e12`.

install

source · Clone the upstream repo

git clone https://github.com/pinchtab/pinchtab

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/pinchtab/pinchtab "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/pinchtab" ~/.claude/skills/pinchtab-pinchtab-pinchtab && rm -rf "$T"

manifest: skills/pinchtab/SKILL.md

source content

Browser Automation with PinchTab

PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.

Preferred tool surface:

Use
```
pinchtab
```
CLI commands first.
Use
```
curl
```
for profile-management routes or non-shell/API fallback flows.
Use
```
jq
```
only when you need structured parsing from JSON responses.

Core Workflow

Every PinchTab automation follows this pattern:

Ensure the correct server, profile, or instance is available for the task.

Navigate with

pinchtab nav <url>

pinchtab instance navigate <instance-id> <url>

Observe with
```
pinchtab snap -i -c
```
,
```
pinchtab snap --text
```
, or
```
pinchtab text
```
, then collect the current refs such as
```
e5
```
.
Interact with those fresh refs using
```
click
```
,
```
fill
```
,
```
type
```
,
```
press
```
,
```
select
```
,
```
hover
```
, or
```
scroll
```
.
Re-snapshot or re-read text after any navigation, submit, modal open, accordion expand, or other DOM-changing action.

Rules:

Never act on stale refs after the page changes.
Default to
```
pinchtab text
```
when you need content, not layout.
Default to
```
pinchtab snap -i -c
```
when you need actionable elements.
Use screenshots only for visual verification, UI diffs, or debugging.
Start multi-site or parallel work by choosing the right instance or profile first.

Selectors

PinchTab uses a unified selector system. Any command that targets an element accepts these formats:

Selector	Example	Resolves via
Ref	`e5`	Snapshot cache (fastest)
CSS	`#login` , `.btn` , `[data-testid="x"]`	`document.querySelector`
XPath	`xpath://button[@id="submit"]`	CDP search
Text	`text:Sign In`	Visible text match
Semantic	`find:login button`	Natural language query via `/find`

Auto-detection: bare

e5

→ ref,

#id

.class

[attr]

→ CSS,

//path

→ XPath. Use explicit prefixes (

css:

xpath:

text:

find:

) when auto-detection is ambiguous.

pinchtab click e5                        # ref
pinchtab click "#submit"                 # CSS (auto-detected)
pinchtab click "text:Sign In"            # text match
pinchtab click "xpath://button[@type]"   # XPath
pinchtab fill "#email" "user@test.com"   # CSS
pinchtab fill e3 "user@test.com"         # ref

Same syntax in HTTP API via

selector

field. Legacy

ref

field still accepted.

Command Chaining

Use

&&

when you don't need intermediate output:

pinchtab nav <url> && pinchtab snap -i -c

. Run separately when you must read refs before acting.

Challenge Solving

If a page shows a challenge instead of content (e.g., "Just a moment..."), call

POST /solve

with

{"maxAttempts": 3}

to auto-detect and resolve it. Use

POST /tabs/TAB_ID/solve

for tab-scoped. Works best with

stealthLevel: "full"

in config. Safe to call speculatively — returns immediately if no challenge is present. See api.md for full solver options.

Handling Authentication and State

Patterns: (1) One-off:

pinchtab instance start

→

--server http://localhost:<port>

. (2) Reuse profile:

pinchtab instance start --profile work --mode headed

→ switch to headless after login. (3) HTTP:

POST /profiles

, then

POST /profiles/<name>/start

. (4) Human-assisted: headed login, then agent reuses headless.

Agent sessions:

pinchtab session create --agent-id <id>

POST /sessions

→ set

PINCHTAB_SESSION=ses_...

Essential Commands

Server and targeting

pinchtab server                                     # Start server foreground
pinchtab daemon install                             # Install as system service
pinchtab health                                     # Check server status
pinchtab instances                                  # List running instances
pinchtab profiles                                   # List available profiles
pinchtab --server http://localhost:9868 snap -i -c  # Target specific instance

Navigation and tabs

pinchtab nav <url>
pinchtab nav <url> --new-tab
pinchtab nav <url> --tab <tab-id>
pinchtab nav <url> --block-images
pinchtab nav <url> --block-ads
pinchtab nav <url> --print-tab-id                   # Print only the new tabId on stdout
pinchtab back                                       # Navigate back in history
pinchtab forward                                    # Navigate forward
pinchtab reload                                     # Reload current page
pinchtab tab                                        # List tabs (no subcommand - just `tab`)
pinchtab tab <tab-id>                               # Focus an existing tab
pinchtab tab new <url>                              # Open a new tab
pinchtab tab close <tab-id>                         # Close a tab — use this to clean up stale tabs between runs
pinchtab instance navigate <instance-id> <url>

Tab workflow — most commands target the active tab by default, so single-tab flows need no plumbing:

pinchtab nav http://example.com
pinchtab snap -i -c      # active tab
pinchtab click e5        # active tab
pinchtab text            # active tab

When you need to pin to a specific tab (parallel tabs, long-running flows, or shell-isolated runners like agent tool calls where env vars don't persist across invocations), capture the tab ID with

--print-tab-id

and pass

--tab

on every subsequent command:

TAB_ID=$(pinchtab nav http://example.com --print-tab-id)
pinchtab --tab "$TAB_ID" snap -i -c
pinchtab --tab "$TAB_ID" click e5
pinchtab --tab "$TAB_ID" text

Within a single shell session you can also

export PINCHTAB_TAB=$(pinchtab nav URL --print-tab-id)

and drop the

--tab

flag. This does not survive across separate shell invocations (each

Bash

tool call in an agent runs a fresh shell), so prefer explicit

--tab

for agent workflows.

Priority:

--tab <id>

flag >

PINCHTAB_TAB

env var > active tab.

Observation

pinchtab snap
pinchtab snap -i                                    # Interactive elements only
pinchtab snap -i -c                                 # Interactive + compact
pinchtab snap -d                                    # Diff from previous snapshot
pinchtab snap --selector <css>                      # Scope to CSS selector
pinchtab snap --max-tokens <n>                      # Token budget limit
pinchtab snap --text                                # Text output format
pinchtab text                                       # Page text content (Readability-filtered; drops nav/repeated headlines)
pinchtab text --full                                # Full page text (document.body.innerText) — use when Readability is dropping content you need
pinchtab text --raw                                 # Alias of --full
# CLI returns JSON; use `| jq -r .text` for plain text
pinchtab find <query>                               # Semantic element search
pinchtab find --ref-only <query>                    # Return refs only

Guidance:

```
snap -i -c
```
is the default for finding actionable refs.
```
snap -d
```
is the default follow-up snapshot for multi-step flows.
```
text
```
is the default for reading articles, dashboards, reports, or confirmation messages.
pinchtab find <query>
is the direct route when you already know the semantic target (e.g. "login button", "email input", "accept cookies link") — skips the full snapshot and returns a ranked match with its ref. Pair with
```
--ref-only
```
on large/dense pages to get just the ref string for piping straight into
```
click
```
/
```
fill
```
/
```
type
```
. Prefer
```
find
```
over
```
snap -i -c
```
+ visual scan whenever you can describe the target in a phrase.
Refs from
```
snap -i
```
and full
```
snap
```
use different numbering. Do not mix them — if you snapshot with
```
-i
```
, use those refs. If you re-snapshot without
```
-i
```
, get fresh refs before acting.

Interaction

All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.

pinchtab click <selector>                           # Click element
pinchtab click --wait-nav <selector>                # Click and wait for navigation
pinchtab click --x 100 --y 200                      # Click by coordinates
pinchtab click <selector> --dialog-action accept    # Click + auto-accept any alert/confirm the click opens
pinchtab click <selector> --dialog-action dismiss   # Click + auto-dismiss
pinchtab click <selector> --dialog-action accept \
    --dialog-text "hello"                           # Click + accept a prompt() with a response
pinchtab dblclick <selector>                        # Double-click element
pinchtab mouse move <selector>                      # Move pointer to element center
pinchtab mouse move <x> <y>                         # Move pointer to coordinates
pinchtab mouse down <selector> --button left        # Press a mouse button at an explicit target
pinchtab mouse down --button left                   # Press a mouse button at current pointer
pinchtab mouse up <selector> --button left          # Release a mouse button at an explicit target
pinchtab mouse up --button left                     # Release a mouse button at current pointer
pinchtab mouse wheel 240 --dx 40                    # Dispatch wheel deltas at current pointer
pinchtab drag <from> <to>                           # Drag between selector/ref or x,y points (synthesized mouse sequence)
pinchtab drag <selector> --drag-x <n> --drag-y <n>  # Single-step drag by pixel offset (mirrors HTTP /action dragX/dragY)
pinchtab type <selector> <text>                     # Type with keystrokes
pinchtab fill <selector> <text>                     # Set value directly
pinchtab press <key>                                # Press key (Enter, Tab, Escape...)
pinchtab hover <selector>                           # Hover element
pinchtab select <selector> <value|text>             # Select dropdown option by value attr, or fall back to visible text
pinchtab scroll <pixels|direction|selector>         # e.g. `scroll 1500`, `scroll down`, `scroll '#footer'`

Rules:

Prefer
```
fill
```
for deterministic form entry.
Prefer
```
type
```
only when the site depends on keystroke events.
Prefer
```
click --wait-nav
```
when a click is expected to navigate.
Prefer low-level
```
mouse
```
commands only when normal
```
click
```
/
```
hover
```
abstractions are insufficient, such as drag handles, canvas widgets, or sites that depend on exact pointer sequences.
Re-snapshot immediately after
```
click
```
,
```
press Enter
```
,
```
select
```
, or
```
scroll
```
if the UI can change.
```
select
```
matches by value attr first, then visible text (case-insensitive). Error lists available options if no match.

For JS dialogs: use

--dialog-action accept

--dialog-action dismiss

on click. Add

--dialog-text

for prompt responses.

For the
```
scroll
```
action via HTTP, use
```
"scrollX"
```
/
```
"scrollY"
```
for pixel deltas, or
```
"selector"
```
to scroll an element into view. Example:
```
{"kind":"scroll","scrollY":1500}
```
or
```
{"kind":"scroll","selector":"#footer"}
```
. The
```
x
```
/
```
y
```
fields are target viewport coordinates, not scroll deltas.
The download HTTP endpoint (
```
GET /download?url=...
```
or
```
GET /tabs/TAB_ID/download?url=...
```
) returns JSON
```
{contentType, data (base64), size, url}
```
, not raw bytes. Decode
```
data
```
with base64 to get the file. Only
```
http
```
/
```
https
```
URLs are allowed. Private/internal hosts are blocked unless listed in
```
security.downloadAllowedDomains
```
.

Waiting

Use

wait

when the DOM settles asynchronously — spinners, toasts, XHR-driven content.

pinchtab wait <selector>                            # Element to appear (default visible)
pinchtab wait <selector> --state hidden             # Element to disappear
pinchtab wait --text "Order confirmed"              # Text to appear
pinchtab wait --not-text "Loading..."               # Text to disappear (spinner/toast dismiss)
pinchtab wait --url "**/dashboard"                  # URL glob match
pinchtab wait --load networkidle                    # Network idle
pinchtab wait 500                                   # Fixed delay in ms (last resort)

Default timeout 10s, max 30s via

--timeout <ms>

. Prefer

--not-text

--state hidden

over polling.

Export, debug, and verification

pinchtab screenshot
pinchtab screenshot -o /tmp/pinchtab-page.png       # Format driven by extension
pinchtab screenshot -q 60                            # JPEG quality
pinchtab pdf
pinchtab pdf -o /tmp/pinchtab-report.pdf
pinchtab pdf --landscape

Advanced operations: explicit opt-in only

Use these only when the task explicitly requires them and safer commands are insufficient.

pinchtab eval "document.title"
pinchtab eval --await-promise "fetch('/api/me').then(r => r.json())"
pinchtab download <url> -o /tmp/pinchtab-download.bin
pinchtab upload /absolute/path/provided-by-user.ext -s <css>

Rules:

```
eval
```
is for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.
```
download
```
should prefer a safe temporary or workspace path over an arbitrary filesystem location.
```
upload
```
requires a file path the user explicitly provided or clearly approved for the task.

HTTP API fallback

Use curl when CLI unavailable. Key endpoints on instance port (e.g. 9867):

```
POST /navigate
```
with
```
{"url":"..."}
```

GET /snapshot?filter=interactive&format=compact

POST /action

with

{"kind":"fill","selector":"e3","text":"..."}

```
POST /actions
```
with a batch of actions — runs them in one round-trip. Body accepts either an array
```
[{"kind":"fill",...},{"kind":"click",...}]
```
or an envelope
```
{"actions":[...],"stopOnError":true,"tabId":"..."}
```
. Use this for tight form flows (fill + fill + click submit) to cut round-trip latency. Set
```
stopOnError:true
```
to halt on the first failure; the response contains a per-step
```
{index, success, result?, error?}
```
array. Tab-scoped variant:
```
POST /tabs/TAB_ID/actions
```
.
```
GET /text
```
```
POST /solve
```
with
```
{"maxAttempts": 3}
```

Tab-scoped HTTP API

Use

/tabs/TAB_ID/...

routes to target specific tabs. Get tab ID from navigate response or

GET /tabs

Pattern:

curl -H "Authorization: Bearer <token>" http://localhost:9867/tabs/TAB_ID/<endpoint>

Key endpoints:

navigate

snapshot

text

action

screenshot

pdf

back

forward

close

wait

download

upload

handoff

resume

Action examples:

Click:
```
{"kind":"click","selector":"#btn"}
```

Click with nav:

{"kind":"click","selector":"#link","waitNav":true}

Drag:

{"kind":"drag","selector":"#piece","dragX":12,"dragY":-158}

Scroll:

{"kind":"scroll","scrollY":1500}

{"kind":"scroll","selector":"#footer"}

Common Patterns

Form flow:
```
nav
```
→
```
snap -i -c
```
→
```
fill
```
fields →
```
click --wait-nav
```
submit →
```
text
```
to verify
Multi-step: After each action,
```
snap -d -i -c
```
for diff

Direct selectors: Skip snapshot when structure is known:

pinchtab click "text:Accept Cookies"

fill "#search" "query"

Form submission: Always click the submit button — never use

press Enter

Token Economy

Prefer low-token commands:

text

snap -i -c

snap -d

. Use

--block-images

for read-heavy tasks. Reserve screenshots/PDFs for visual verification.

Diffing and Verification

Use
```
pinchtab snap -d
```
after each state-changing action in long workflows.
Use
```
pinchtab text
```
to confirm success messages, table updates, or navigation outcomes. The default mode extracts Readability-filtered content (reader view), which may drop navigation, repeated headlines, short-text nodes, or collapse lists/grids down to a single representative item. Reach for
```
pinchtab text --full
```
whenever (a) you're verifying content on a list/grid/tab/accordion page, (b) the expected marker is short, or (c) a default read came back missing content you can see in the snapshot. It returns the raw
```
document.body.innerText
```
and is almost always the safer choice once you know Readability is going to trim.
Use
```
pinchtab screenshot
```
only when visual regressions, CAPTCHA, or layout-specific confirmation matters.
If a ref disappears after a change, treat that as expected and fetch fresh refs instead of retrying the stale one.
Action responses like
```
{"clicked":true,"submitted":true}
```
mean the event fired on the target element — not that the form was accepted by the server or passed native HTML validation. Always verify the expected success marker or state change via
```
snap
```
/
```
text
```
before treating a submission as complete.
Same-origin iframes are supported natively via
```
pinchtab frame <target>
```
— a stateful scope that subsequent selector-based
```
/snapshot
```
and
```
/action
```
calls inherit. Typical flow:
```
pinchtab frame '#payment-frame'
```
→
```
pinchtab snap -i -c
```
(refs reflect iframe interior) →
```
pinchtab fill '#card'
```
/
```
click '#pay'
```
→
```
pinchtab frame main
```
. Target accepts
```
main
```
, an iframe ref, a CSS selector for the iframe element, a frame name, or a frame URL. Nested iframes need multiple hops. Refs emitted by a full
```
snap
```
(no
```
-i
```
) for iframe descendants carry frame context — ref-based actions work across the boundary without an explicit scope set. Cross-origin iframes are not exposed as frame scopes; fall back to
```
eval
```
against
```
iframe.contentDocument
```
(same-origin-policy permitting).
```
pinchtab text
```
(and
```
text --full
```
) honors the active frame scope and also accepts an explicit
```
--frame <frameId>
```
flag for one-shot reads — so after
```
pinchtab frame '#content-frame'
```
, a following
```
pinchtab text --full
```
extracts from the iframe's document, not the outer page. The
--frame
argument must be a frame ID (the 32-char hex
frameId
from
pinchtab frame <target>
output), not a CSS selector. For a one-shot read, the idiom is:
```
FID=$(pinchtab frame '#content-frame' | jq -r .current.frameId); pinchtab frame main; pinchtab text --full --frame "$FID"
```
. Passing a selector like
```
text --frame '#content-frame'
```
returns "no frame for given id found".
eval
→ always IIFE.
```
eval
```
expressions share the same realm across calls, so any top-level
```
const
```
/
```
let
```
/
```
class
```
from one call collides with the next:
```
SyntaxError: Identifier 'x' has already been declared
```
. Use an IIFE on every
```
eval
```
that introduces identifiers, not only on multi-statement ones:
```
pinchtab eval "(() => { const r = document.querySelector('#x').getBoundingClientRect(); return {x: r.x, y: r.y, w: r.width, h: r.height}; })()"
```
. For a single expression that doesn't introduce identifiers (e.g.
```
document.title
```
,
```
document.getElementById('x').value
```
), the IIFE is optional. The IIFE pattern also fixes DOMRect serialization —
```
getBoundingClientRect()
```
returns a value whose own-enumerable fields don't survive JSON, so the explicit projection is what actually ships the numbers back.
pinchtab text
(both default and
--full
) returns content from
display:none
and
visibility:hidden
nodes because it reads
```
document.body.innerText
```
(and Readability's input) from raw DOM — the visibility cascade is not applied. When you need to confirm that a success banner or error message is actually visible (not just present as a pre-seeded hidden element), verify via
```
pinchtab snap
```
(the accessibility tree respects visibility and hides non-rendered subtrees) or via
```
eval
```
against the element's
```
offsetHeight
```
/
```
getComputedStyle().display
```
. A common trap: a page ships with a hidden success
```
<div>
```
pre-rendered;
```
text
```
will report the success string before the form is ever submitted.
The compact snapshot shows
```
<option>
```
elements by their visible text, not their
```
value
```
attribute. You don't normally need to look up the
```
value
```
: the
```
select
```
action accepts either — it matches on
```
value
```
first and falls back to visible text (case-insensitive). Only reach for
```
eval
```
+
```
Array.from(select.options)
```
when debugging an unexpected no-match error.
```
text:<value>
```
selectors are resolved by a JS-level search over visible text and can intermittently fail with
```
DOM Error
```
or
```
context deadline exceeded
```
on large/dynamic pages. If you have a fresh
```
snap -i -c
```
in hand, prefer the ref (
```
e12
```
) — refs resolve by stable backend node IDs and don't depend on page-side JS.
```
snap -i -c
```
(interactive, compact) skips non-interactive descendants. For iframe interiors, either set a
```
frame
```
scope first or use a full
```
pinchtab snap
```
(no
```
-i
```
) which flattens same-origin iframe descendants into the parent snapshot.
ARIA expansion state (
```
aria-expanded="true" | "false"
```
) is usually placed on the outermost container of an accordion/menu/disclosure section, not on the header/trigger that dispatches the click. When verifying state after a click, query
```
document.querySelector('#section-a').getAttribute('aria-expanded')
```
(or the wrapper's equivalent) rather than the clicked element.
```
click --wait-nav
```
can return
```
{"success": true}
```
or, immediately after the navigation fires,
```
Error 409: unexpected page navigation
```
— the latter means the server saw a navigation while mid-response and aborted its reply, not that the click failed. Treat 409 after a navigation-expected click as success and verify the resulting page with a fresh
```
snap
```
/
```
text
```
.

References

Full API: api.md
Minimal env vars: env.md
Agent optimization: agent-optimization.md
Profiles: profiles.md
MCP: mcp.md
Security model: TRUST.md