Pinchtab pinchtab
Use this skill when a task needs browser automation through PinchTab: open a website, inspect interactive elements, click through flows, fill out forms, scrape page text, log into sites with a persistent profile, export screenshots or PDFs, manage multiple browser instances, or fall back to the HTTP API when the CLI is unavailable. Prefer this skill for token-efficient browser work driven by stable accessibility refs such as `e5` and `e12`.
git clone https://github.com/pinchtab/pinchtab
T=$(mktemp -d) && git clone --depth=1 https://github.com/pinchtab/pinchtab "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/pinchtab" ~/.claude/skills/pinchtab-pinchtab-pinchtab && rm -rf "$T"
skills/pinchtab/SKILL.mdBrowser Automation with PinchTab
PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.
Preferred tool surface:
- Use
CLI commands first.pinchtab - Use
for profile-management routes or non-shell/API fallback flows.curl - Use
only when you need structured parsing from JSON responses.jq
Core Workflow
Every PinchTab automation follows this pattern:
- Ensure the correct server, profile, or instance is available for the task.
- Navigate with
orpinchtab nav <url>
.pinchtab instance navigate <instance-id> <url> - Observe with
,pinchtab snap -i -c
, orpinchtab snap --text
, then collect the current refs such aspinchtab text
.e5 - Interact with those fresh refs using
,click
,fill
,type
,press
,select
, orhover
.scroll - Re-snapshot or re-read text after any navigation, submit, modal open, accordion expand, or other DOM-changing action.
Rules:
- Never act on stale refs after the page changes.
- Default to
when you need content, not layout.pinchtab text - Default to
when you need actionable elements.pinchtab snap -i -c - Use screenshots only for visual verification, UI diffs, or debugging.
- Start multi-site or parallel work by choosing the right instance or profile first.
Selectors
PinchTab uses a unified selector system. Any command that targets an element accepts these formats:
| Selector | Example | Resolves via |
|---|---|---|
| Ref | | Snapshot cache (fastest) |
| CSS | , , | |
| XPath | | CDP search |
| Text | | Visible text match |
| Semantic | | Natural language query via |
Auto-detection: bare
e5 → ref, #id / .class / [attr] → CSS, //path → XPath. Use explicit prefixes (css:, xpath:, text:, find:) when auto-detection is ambiguous.
pinchtab click e5 # ref pinchtab click "#submit" # CSS (auto-detected) pinchtab click "text:Sign In" # text match pinchtab click "xpath://button[@type]" # XPath pinchtab fill "#email" "user@test.com" # CSS pinchtab fill e3 "user@test.com" # ref
Same syntax in HTTP API via
selector field. Legacy ref field still accepted.
Command Chaining
Use
&& when you don't need intermediate output: pinchtab nav <url> && pinchtab snap -i -c. Run separately when you must read refs before acting.
Challenge Solving
If a page shows a challenge instead of content (e.g., "Just a moment..."), call
POST /solve with {"maxAttempts": 3} to auto-detect and resolve it. Use POST /tabs/TAB_ID/solve for tab-scoped. Works best with stealthLevel: "full" in config. Safe to call speculatively — returns immediately if no challenge is present. See api.md for full solver options.
Handling Authentication and State
Patterns: (1) One-off:
pinchtab instance start → --server http://localhost:<port>. (2) Reuse profile: pinchtab instance start --profile work --mode headed → switch to headless after login. (3) HTTP: POST /profiles, then POST /profiles/<name>/start. (4) Human-assisted: headed login, then agent reuses headless.
Agent sessions:
pinchtab session create --agent-id <id> or POST /sessions → set PINCHTAB_SESSION=ses_....
Essential Commands
Server and targeting
pinchtab server # Start server foreground pinchtab daemon install # Install as system service pinchtab health # Check server status pinchtab instances # List running instances pinchtab profiles # List available profiles pinchtab --server http://localhost:9868 snap -i -c # Target specific instance
Navigation and tabs
pinchtab nav <url> pinchtab nav <url> --new-tab pinchtab nav <url> --tab <tab-id> pinchtab nav <url> --block-images pinchtab nav <url> --block-ads pinchtab nav <url> --print-tab-id # Print only the new tabId on stdout pinchtab back # Navigate back in history pinchtab forward # Navigate forward pinchtab reload # Reload current page pinchtab tab # List tabs (no subcommand - just `tab`) pinchtab tab <tab-id> # Focus an existing tab pinchtab tab new <url> # Open a new tab pinchtab tab close <tab-id> # Close a tab — use this to clean up stale tabs between runs pinchtab instance navigate <instance-id> <url>
Tab workflow — most commands target the active tab by default, so single-tab flows need no plumbing:
pinchtab nav http://example.com pinchtab snap -i -c # active tab pinchtab click e5 # active tab pinchtab text # active tab
When you need to pin to a specific tab (parallel tabs, long-running flows, or shell-isolated runners like agent tool calls where env vars don't persist across invocations), capture the tab ID with
--print-tab-id and pass --tab on every subsequent command:
TAB_ID=$(pinchtab nav http://example.com --print-tab-id) pinchtab --tab "$TAB_ID" snap -i -c pinchtab --tab "$TAB_ID" click e5 pinchtab --tab "$TAB_ID" text
Within a single shell session you can also
export PINCHTAB_TAB=$(pinchtab nav URL --print-tab-id)
and drop the --tab flag. This does not survive across separate shell invocations (each
Bash tool call in an agent runs a fresh shell), so prefer explicit --tab for agent workflows.
Priority:
--tab <id> flag > PINCHTAB_TAB env var > active tab.
Observation
pinchtab snap pinchtab snap -i # Interactive elements only pinchtab snap -i -c # Interactive + compact pinchtab snap -d # Diff from previous snapshot pinchtab snap --selector <css> # Scope to CSS selector pinchtab snap --max-tokens <n> # Token budget limit pinchtab snap --text # Text output format pinchtab text # Page text content (Readability-filtered; drops nav/repeated headlines) pinchtab text --full # Full page text (document.body.innerText) — use when Readability is dropping content you need pinchtab text --raw # Alias of --full # CLI returns JSON; use `| jq -r .text` for plain text pinchtab find <query> # Semantic element search pinchtab find --ref-only <query> # Return refs only
Guidance:
is the default for finding actionable refs.snap -i -c
is the default follow-up snapshot for multi-step flows.snap -d
is the default for reading articles, dashboards, reports, or confirmation messages.text
is the direct route when you already know the semantic target (e.g. "login button", "email input", "accept cookies link") — skips the full snapshot and returns a ranked match with its ref. Pair withpinchtab find <query>
on large/dense pages to get just the ref string for piping straight into--ref-only
/click
/fill
. Prefertype
overfind
+ visual scan whenever you can describe the target in a phrase.snap -i -c- Refs from
and fullsnap -i
use different numbering. Do not mix them — if you snapshot withsnap
, use those refs. If you re-snapshot without-i
, get fresh refs before acting.-i
Interaction
All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.
pinchtab click <selector> # Click element pinchtab click --wait-nav <selector> # Click and wait for navigation pinchtab click --x 100 --y 200 # Click by coordinates pinchtab click <selector> --dialog-action accept # Click + auto-accept any alert/confirm the click opens pinchtab click <selector> --dialog-action dismiss # Click + auto-dismiss pinchtab click <selector> --dialog-action accept \ --dialog-text "hello" # Click + accept a prompt() with a response pinchtab dblclick <selector> # Double-click element pinchtab mouse move <selector> # Move pointer to element center pinchtab mouse move <x> <y> # Move pointer to coordinates pinchtab mouse down <selector> --button left # Press a mouse button at an explicit target pinchtab mouse down --button left # Press a mouse button at current pointer pinchtab mouse up <selector> --button left # Release a mouse button at an explicit target pinchtab mouse up --button left # Release a mouse button at current pointer pinchtab mouse wheel 240 --dx 40 # Dispatch wheel deltas at current pointer pinchtab drag <from> <to> # Drag between selector/ref or x,y points (synthesized mouse sequence) pinchtab drag <selector> --drag-x <n> --drag-y <n> # Single-step drag by pixel offset (mirrors HTTP /action dragX/dragY) pinchtab type <selector> <text> # Type with keystrokes pinchtab fill <selector> <text> # Set value directly pinchtab press <key> # Press key (Enter, Tab, Escape...) pinchtab hover <selector> # Hover element pinchtab select <selector> <value|text> # Select dropdown option by value attr, or fall back to visible text pinchtab scroll <pixels|direction|selector> # e.g. `scroll 1500`, `scroll down`, `scroll '#footer'`
Rules:
- Prefer
for deterministic form entry.fill - Prefer
only when the site depends on keystroke events.type - Prefer
when a click is expected to navigate.click --wait-nav - Prefer low-level
commands only when normalmouse
/click
abstractions are insufficient, such as drag handles, canvas widgets, or sites that depend on exact pointer sequences.hover - Re-snapshot immediately after
,click
,press Enter
, orselect
if the UI can change.scroll
matches by value attr first, then visible text (case-insensitive). Error lists available options if no match.select- For JS dialogs: use
or--dialog-action accept
on click. Add--dialog-action dismiss
for prompt responses.--dialog-text - For the
action via HTTP, usescroll
/"scrollX"
for pixel deltas, or"scrollY"
to scroll an element into view. Example:"selector"
or{"kind":"scroll","scrollY":1500}
. The{"kind":"scroll","selector":"#footer"}
/x
fields are target viewport coordinates, not scroll deltas.y - The download HTTP endpoint (
orGET /download?url=...
) returns JSONGET /tabs/TAB_ID/download?url=...
, not raw bytes. Decode{contentType, data (base64), size, url}
with base64 to get the file. Onlydata
/http
URLs are allowed. Private/internal hosts are blocked unless listed inhttps
.security.downloadAllowedDomains
Waiting
Use
wait when the DOM settles asynchronously — spinners, toasts, XHR-driven content.
pinchtab wait <selector> # Element to appear (default visible) pinchtab wait <selector> --state hidden # Element to disappear pinchtab wait --text "Order confirmed" # Text to appear pinchtab wait --not-text "Loading..." # Text to disappear (spinner/toast dismiss) pinchtab wait --url "**/dashboard" # URL glob match pinchtab wait --load networkidle # Network idle pinchtab wait 500 # Fixed delay in ms (last resort)
Default timeout 10s, max 30s via
--timeout <ms>. Prefer --not-text / --state hidden over polling.
Export, debug, and verification
pinchtab screenshot pinchtab screenshot -o /tmp/pinchtab-page.png # Format driven by extension pinchtab screenshot -q 60 # JPEG quality pinchtab pdf pinchtab pdf -o /tmp/pinchtab-report.pdf pinchtab pdf --landscape
Advanced operations: explicit opt-in only
Use these only when the task explicitly requires them and safer commands are insufficient.
pinchtab eval "document.title" pinchtab eval --await-promise "fetch('/api/me').then(r => r.json())" pinchtab download <url> -o /tmp/pinchtab-download.bin pinchtab upload /absolute/path/provided-by-user.ext -s <css>
Rules:
is for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.eval
should prefer a safe temporary or workspace path over an arbitrary filesystem location.download
requires a file path the user explicitly provided or clearly approved for the task.upload
HTTP API fallback
Use curl when CLI unavailable. Key endpoints on instance port (e.g. 9867):
withPOST /navigate{"url":"..."}GET /snapshot?filter=interactive&format=compact
withPOST /action{"kind":"fill","selector":"e3","text":"..."}
with a batch of actions — runs them in one round-trip. Body accepts either an arrayPOST /actions
or an envelope[{"kind":"fill",...},{"kind":"click",...}]
. Use this for tight form flows (fill + fill + click submit) to cut round-trip latency. Set{"actions":[...],"stopOnError":true,"tabId":"..."}
to halt on the first failure; the response contains a per-stepstopOnError:true
array. Tab-scoped variant:{index, success, result?, error?}
.POST /tabs/TAB_ID/actionsGET /text
withPOST /solve{"maxAttempts": 3}
Tab-scoped HTTP API
Use
/tabs/TAB_ID/... routes to target specific tabs. Get tab ID from navigate response or GET /tabs.
Pattern:
curl -H "Authorization: Bearer <token>" http://localhost:9867/tabs/TAB_ID/<endpoint>
Key endpoints:
navigate, snapshot, text, action, screenshot, pdf, back, forward, close, wait, download, upload, handoff, resume.
Action examples:
- Click:
{"kind":"click","selector":"#btn"} - Click with nav:
{"kind":"click","selector":"#link","waitNav":true} - Drag:
{"kind":"drag","selector":"#piece","dragX":12,"dragY":-158} - Scroll:
or{"kind":"scroll","scrollY":1500}{"kind":"scroll","selector":"#footer"}
Common Patterns
- Form flow:
→nav
→snap -i -c
fields →fill
submit →click --wait-nav
to verifytext - Multi-step: After each action,
for diffsnap -d -i -c - Direct selectors: Skip snapshot when structure is known:
orpinchtab click "text:Accept Cookies"fill "#search" "query"
Form submission: Always click the submit button — never use
press Enter.
Token Economy
Prefer low-token commands:
text, snap -i -c, snap -d. Use --block-images for read-heavy tasks. Reserve screenshots/PDFs for visual verification.
Diffing and Verification
- Use
after each state-changing action in long workflows.pinchtab snap -d - Use
to confirm success messages, table updates, or navigation outcomes. The default mode extracts Readability-filtered content (reader view), which may drop navigation, repeated headlines, short-text nodes, or collapse lists/grids down to a single representative item. Reach forpinchtab text
whenever (a) you're verifying content on a list/grid/tab/accordion page, (b) the expected marker is short, or (c) a default read came back missing content you can see in the snapshot. It returns the rawpinchtab text --full
and is almost always the safer choice once you know Readability is going to trim.document.body.innerText - Use
only when visual regressions, CAPTCHA, or layout-specific confirmation matters.pinchtab screenshot - If a ref disappears after a change, treat that as expected and fetch fresh refs instead of retrying the stale one.
- Action responses like
mean the event fired on the target element — not that the form was accepted by the server or passed native HTML validation. Always verify the expected success marker or state change via{"clicked":true,"submitted":true}
/snap
before treating a submission as complete.text - Same-origin iframes are supported natively via
— a stateful scope that subsequent selector-basedpinchtab frame <target>
and/snapshot
calls inherit. Typical flow:/action
→pinchtab frame '#payment-frame'
(refs reflect iframe interior) →pinchtab snap -i -c
/pinchtab fill '#card'
→click '#pay'
. Target acceptspinchtab frame main
, an iframe ref, a CSS selector for the iframe element, a frame name, or a frame URL. Nested iframes need multiple hops. Refs emitted by a fullmain
(nosnap
) for iframe descendants carry frame context — ref-based actions work across the boundary without an explicit scope set. Cross-origin iframes are not exposed as frame scopes; fall back to-i
againsteval
(same-origin-policy permitting).iframe.contentDocument
(andpinchtab text
) honors the active frame scope and also accepts an explicittext --full
flag for one-shot reads — so after--frame <frameId>
, a followingpinchtab frame '#content-frame'
extracts from the iframe's document, not the outer page. Thepinchtab text --full
argument must be a frame ID (the 32-char hex--frame
fromframeId
output), not a CSS selector. For a one-shot read, the idiom is:pinchtab frame <target>
. Passing a selector likeFID=$(pinchtab frame '#content-frame' | jq -r .current.frameId); pinchtab frame main; pinchtab text --full --frame "$FID"
returns "no frame for given id found".text --frame '#content-frame'
→ always IIFE.eval
expressions share the same realm across calls, so any top-leveleval
/const
/let
from one call collides with the next:class
. Use an IIFE on everySyntaxError: Identifier 'x' has already been declared
that introduces identifiers, not only on multi-statement ones:eval
. For a single expression that doesn't introduce identifiers (e.g.pinchtab eval "(() => { const r = document.querySelector('#x').getBoundingClientRect(); return {x: r.x, y: r.y, w: r.width, h: r.height}; })()"
,document.title
), the IIFE is optional. The IIFE pattern also fixes DOMRect serialization —document.getElementById('x').value
returns a value whose own-enumerable fields don't survive JSON, so the explicit projection is what actually ships the numbers back.getBoundingClientRect()
(both default andpinchtab text
) returns content from--full
anddisplay:none
nodes because it readsvisibility:hidden
(and Readability's input) from raw DOM — the visibility cascade is not applied. When you need to confirm that a success banner or error message is actually visible (not just present as a pre-seeded hidden element), verify viadocument.body.innerText
(the accessibility tree respects visibility and hides non-rendered subtrees) or viapinchtab snap
against the element'seval
/offsetHeight
. A common trap: a page ships with a hidden successgetComputedStyle().display
pre-rendered;<div>
will report the success string before the form is ever submitted.text- The compact snapshot shows
elements by their visible text, not their<option>
attribute. You don't normally need to look up thevalue
: thevalue
action accepts either — it matches onselect
first and falls back to visible text (case-insensitive). Only reach forvalue
+eval
when debugging an unexpected no-match error.Array.from(select.options)
selectors are resolved by a JS-level search over visible text and can intermittently fail withtext:<value>
orDOM Error
on large/dynamic pages. If you have a freshcontext deadline exceeded
in hand, prefer the ref (snap -i -c
) — refs resolve by stable backend node IDs and don't depend on page-side JS.e12
(interactive, compact) skips non-interactive descendants. For iframe interiors, either set asnap -i -c
scope first or use a fullframe
(nopinchtab snap
) which flattens same-origin iframe descendants into the parent snapshot.-i- ARIA expansion state (
) is usually placed on the outermost container of an accordion/menu/disclosure section, not on the header/trigger that dispatches the click. When verifying state after a click, queryaria-expanded="true" | "false"
(or the wrapper's equivalent) rather than the clicked element.document.querySelector('#section-a').getAttribute('aria-expanded')
can returnclick --wait-nav
or, immediately after the navigation fires,{"success": true}
— the latter means the server saw a navigation while mid-response and aborted its reply, not that the click failed. Treat 409 after a navigation-expected click as success and verify the resulting page with a freshError 409: unexpected page navigation
/snap
.text
References
- Full API: api.md
- Minimal env vars: env.md
- Agent optimization: agent-optimization.md
- Profiles: profiles.md
- MCP: mcp.md
- Security model: TRUST.md