Claude-code-sdk browse
git clone https://github.com/SeifBenayed/cloclo
T=$(mktemp -d) && git clone --depth=1 https://github.com/SeifBenayed/cloclo "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/browse" ~/.claude/skills/seifbenayed-claude-code-sdk-browse && rm -rf "$T"
.claude/skills/browse/SKILL.md
Browse: Visual Feedback Browser Automation
You have a Browser tool with an action-based interface. This skill teaches you how to use it effectively with a screenshot → analyze → act → verify loop that prevents blind clicking.
Core Principle
Never act without seeing. Never assume an action worked. Always verify.
OBSERVE (get_state + screenshot) → DECIDE (analyze elements) → ACT (click/type) → VERIFY (get_state + screenshot + compare)
Starting a Session
Every browser session starts the same way:
- Navigate to the target URL
- Observe the page with get_state; this returns the DOM with indexed interactive elements and the page text
- Screenshot to see the visual layout
Browser { action: "navigate", url: "https://example.com" }
Browser { action: "get_state" }
Browser { action: "screenshot" }
Read the screenshot to understand the visual layout. The get_state output gives you:
- URL and title
- Scroll position (how far down the page you are)
- Interactive element count (links, inputs, buttons)
- Indexed elements: [0] <button> "Submit", [1] <input:text> name="email"
- Page text (first 3000 chars)
JSON Format
For structured processing, use format: "json":
Browser { action: "get_state", format: "json" }
Returns:
{ url, title, scroll, stats, elements: [{index, tag, text, ...}], text, session_id, active_tab_id }
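As a rough illustration of working with the JSON form, the sketch below looks up an element index by tag and visible text. The sample payload is invented for the example, and find_index is a hypothetical helper rather than part of the Browser tool; only the field names (url, title, elements, index, tag, text) come from the return shape above.

```python
import json

# Hypothetical get_state JSON payload; the values are invented for illustration.
SAMPLE_STATE = json.loads("""
{
  "url": "https://example.com",
  "title": "Example",
  "elements": [
    {"index": 0, "tag": "button", "text": "Submit"},
    {"index": 1, "tag": "input", "text": ""}
  ],
  "text": "Example page text"
}
""")


def find_index(state, tag, text):
    """Return the index of the first element matching tag and text, else None."""
    for element in state.get("elements", []):
        if element.get("tag") == tag and element.get("text") == text:
            return element["index"]
    return None


submit_index = find_index(SAMPLE_STATE, "button", "Submit")
print(submit_index)
```

The index found this way is the value you would pass to an index-based action such as click_element.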
Interacting with Elements
Always prefer element indices over CSS selectors. The indices come from get_state and are reliable. CSS selectors can break if the page structure changes.
Browser { action: "click_element", index: 5 }
Browser { action: "type_element", index: 3, value: "hello@example.com" }
Fall back to CSS selectors only when:
- The element doesn't appear in the indexed list
- You need to target something very specific (like input[name="csrf_token"])
Browser { action: "click", selector: "#submit-btn" }
Browser { action: "fill", selector: "input[name='email']", value: "test@example.com" }
Keyboard Input (send_keys)
For named keys and key combinations:
Browser { action: "send_keys", keys: "Enter" }
Browser { action: "send_keys", keys: "Tab Tab Enter" }
Browser { action: "send_keys", keys: "Ctrl+a" }
Browser { action: "send_keys", keys: "Shift+Tab" }
Supported named keys: Enter, Tab, Escape, Backspace, Delete, Space, ArrowUp/Down/Left/Right, Home, End, PageUp, PageDown, F1-F12. Modifiers: Ctrl, Alt, Shift, Meta/Cmd. Sequences: space-separated.
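To make the keys syntax concrete, here is a small sketch that splits a keys string into chords: space-separated entries form a sequence, and + joins modifiers to a key. parse_keys is a hypothetical helper written for this example; how the Browser tool parses the string internally is not described here.

```python
def parse_keys(keys):
    """Split a send_keys string into a list of chords.

    Each chord is a list with any modifiers first and the key last,
    for example ["Ctrl", "a"].
    """
    return [chord.split("+") for chord in keys.split()]


print(parse_keys("Tab Tab Enter"))
print(parse_keys("Ctrl+a"))
```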
File Upload
Browser { action: "upload_file", selector: "input[type='file']", file_path: "/path/to/file.pdf" }
Dropdowns
Browser { action: "dropdown_options", selector: "#my-select" }
Browser { action: "select_dropdown", selector: "#my-select", value: "option_value" }
Structured Data Extraction
Extract structured data from the page using CSS selectors:
Browser { action: "extract", schema: {"title": "h1", "price": ".price", "description": ".desc"} }
Returns JSON:
{"title": "Product Name", "price": "$29.99", "description": "..."}
Multi-Tab Workflows
Open, switch between, and close tabs for side-by-side comparison:
Browser { action: "navigate", url: "https://site-a.com" }
Browser { action: "new_tab", url: "https://site-b.com" }
Browser { action: "list_tabs" }
Browser { action: "switch_tab", tab_id: "PREVIOUS_TAB_ID" }
Browser { action: "close_tab", tab_id: "TAB_TO_CLOSE" }
Tab IDs are strings (CDP target IDs). Get them from list_tabs.
Multi-Session Workflows
Use separate sessions for isolated browser instances (different profiles, auth states):
Browser { action: "new_session", session_id: "admin", profile_name: "admin-profile" }
Browser { action: "navigate", url: "https://app.com/admin", session_id: "admin" }
Browser { action: "new_session", session_id: "user", profile_name: "user-profile" }
Browser { action: "navigate", url: "https://app.com/dashboard", session_id: "user" }
Browser { action: "list_sessions" }
Browser { action: "close_session", session_id: "admin" }
Named Profiles
Profiles persist cookies and state across sessions:
- profile_name: "my-profile" → stored in ~/.claude/browser-profiles/my-profile/
- user_data_dir: "/custom/path" → raw Chrome user data directory
- profile_dir: "Profile 1" → Chrome's --profile-directory flag
Attach Mode (Remote Chrome)
Connect to an already-running Chrome instance:
Browser { action: "new_session", session_id: "remote", cdp_url: "http://localhost:9222" }
Or set the BROWSER_CDP_URL=http://localhost:9222 env var for the default session.
In attach mode, close disconnects without killing Chrome.
Iframe Support
Browser { action: "list_frames" }
Browser { action: "click_element", index: 0, frame_id: "FRAME_ID" }
Browser { action: "fill", selector: "input", value: "text", frame_id: "FRAME_ID" }
Frame IDs come from list_frames. Use frame_id on any DOM action to target elements inside iframes.
Events & Dialogs
The browser captures events (dialogs, navigations, crashes, downloads):
Browser { action: "get_events" }
Browser { action: "set_dialog_auto_dismiss", enabled: true }
Dialogs (alert/confirm/prompt) are auto-dismissed by default. Set enabled: false to turn this off and handle dialogs manually.
The Verify Loop
After every action that changes the page, you must verify:
1. Take action (click, type, navigate, submit)
2. Wait briefly if needed: Browser { action: "wait_for", selector: ".result", timeout: 3000 }
3. Observe again: Browser { action: "get_state" }
4. Screenshot again: Browser { action: "screenshot" }
5. Compare: did the page change as expected?
   - YES → continue to next step
   - NO → try a different approach (different element, different selector, scroll first)
Iteration Budget
You have a maximum of 10 iterations (observe-act-verify cycles) per task. This prevents infinite loops. If you haven't achieved the goal in 10 iterations:
- Stop
- Report what you accomplished and what failed
- Include the last screenshot as evidence
Count your iterations. Mention the count when reporting.
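The budget can be pictured as a bounded loop. The sketch below is only an illustration of the counting rule: step is a hypothetical callable standing in for one observe-act-verify cycle, and run_with_budget is not part of the Browser tool.

```python
MAX_ITERATIONS = 10  # budget stated in this skill


def run_with_budget(step, max_iterations=MAX_ITERATIONS):
    """Call step(i) until it returns True or the budget runs out.

    step stands in for one observe-act-verify cycle and returns True
    once the goal is verified. Returns (succeeded, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        if step(iteration):
            return True, iteration
    return False, max_iterations


# Example: a stand-in step that succeeds on its fourth cycle.
done, used = run_with_budget(lambda i: i == 4)
print(f"Iterations: {used}/{MAX_ITERATIONS}")
```

Returning the count alongside the outcome makes it easy to mention the iteration total in the final report, whether or not the task succeeded.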
Error Recovery
When something doesn't work:
- Element not found: the page may have changed. Run get_state again to refresh the element indices.
- Click didn't work: the element might be obscured. Try scroll_to first, then click again.
- Page didn't load: try wait_for with a key selector, or reload.
- Form submission failed: screenshot to see error messages, read them, adjust input.
- Same action 3 times: the Browser tool has loop detection. If you get a loop warning, you MUST try a completely different approach.
Fallback chain for clicking:
click_element by index → click by CSS selector → evaluate with JS click → scroll + retry
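One way to think about this chain is as an ordered list of attempts, stopping at the first one that verifies. In the sketch below, click_with_fallback and the stand-in lambdas are hypothetical; in practice each attempt would issue the named Browser action and then check the result with get_state and a screenshot.

```python
def click_with_fallback(strategies):
    """Try each (name, attempt) pair in order; return the first name that works.

    attempt is a zero-argument callable returning True on a verified click.
    Returns None if every strategy fails.
    """
    for name, attempt in strategies:
        if attempt():
            return name
    return None


# Stand-in attempts that only simulate success or failure.
chain = [
    ("click_element", lambda: False),
    ("click", lambda: False),
    ("evaluate", lambda: True),
    ("scroll_to + retry", lambda: False),
]
print(click_with_fallback(chain))
```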
Scrolling
Pages are often longer than the viewport. The get_state output shows scroll position.
If you need elements below the fold:
Browser { action: "scroll_to", value: "500" } // scroll down 500px
Browser { action: "scroll_to", selector: "#footer" } // scroll to element
Browser { action: "get_state" } // refresh elements after scroll
Always call get_state after scrolling, because the element indices change.
Evidence and Reporting
When completing a task, provide evidence:
- Before state — screenshot of the page before your actions
- Actions taken — list of what you did (clicked X, typed Y, navigated to Z)
- After state — screenshot showing the result
- Verification — what you checked to confirm the task is done
Format your report:
## Browser Task: [description]

**URL:** https://example.com/page
**Session:** default
**Iterations:** 4/10

### Actions
1. Navigated to https://example.com
2. Clicked [3] <button> "Login"
3. Typed email into [5] <input:email>
4. Typed password into [7] <input:password>
5. Sent keys: Enter

### Verification
- Page title changed to "Dashboard"
- User avatar visible in header
- No error messages present

### Screenshots
- Before: [screenshot 1]
- After: [screenshot 2]
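If the report is assembled from collected data, a small formatter keeps the layout consistent. The sketch below is one possible approach, not something the skill prescribes: render_report and its parameters are hypothetical, and the Screenshots section is left out for brevity.

```python
def render_report(description, url, session, iterations, actions, checks, budget=10):
    """Build the report text from plain values and lists of strings."""
    lines = [
        f"## Browser Task: {description}",
        "",
        f"**URL:** {url}",
        f"**Session:** {session}",
        f"**Iterations:** {iterations}/{budget}",
        "",
        "### Actions",
    ]
    # Number the actions starting at 1, matching the report layout.
    lines += [f"{n}. {action}" for n, action in enumerate(actions, start=1)]
    lines += ["", "### Verification"]
    lines += [f"- {check}" for check in checks]
    return "\n".join(lines)


report = render_report(
    description="Log in",
    url="https://example.com/page",
    session="default",
    iterations=4,
    actions=["Navigated to https://example.com", 'Clicked [3] <button> "Login"'],
    checks=['Page title changed to "Dashboard"'],
)
print(report)
```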
JavaScript Evaluation
For complex checks or actions not covered by built-in actions:
Browser { action: "evaluate", value: "document.querySelectorAll('.error').length" }
Browser { action: "evaluate", value: "window.localStorage.getItem('token')" }
This is powerful but use it sparingly — prefer the built-in actions.
Cookies
For authenticated sessions:
Browser { action: "cookies_get" }
Browser { action: "cookies_set", cookie: { name: "session", value: "abc123", domain: ".example.com" } }
Browser { action: "cookies_clear" }
Closing
Always close the browser when done:
Browser { action: "close" }
This frees system resources. The browser will auto-launch again if needed.
Quick Reference
| Goal | Action | Notes |
|---|---|---|
| See the page | get_state + screenshot | Always first |
| See page (structured) | get_state with format: "json" | For parsing |
| Click a button | click_element with index | From get_state |
| Type text | type_element with index and value | |
| Press keys | send_keys with keys | "Enter", "Tab", "Ctrl+a" |
| Navigate | navigate with url | |
| Upload a file | upload_file with selector, file_path | |
| Read dropdown | dropdown_options with selector | Returns [{value,text,selected}] |
| Select dropdown | select_dropdown with selector, value | |
| Extract data | extract with schema | Returns structured JSON |
| Wait for content | wait_for with selector | |
| Scroll down | scroll_to with value or selector | |
| Open new tab | new_tab with optional url | |
| Switch tab | switch_tab with tab_id | |
| List tabs | list_tabs | |
| Close tab | close_tab with tab_id | |
| New session | new_session with session_id | Optional: profile_name, cdp_url |
| List sessions | list_sessions | |
| List frames | list_frames | For iframe inspection |
| Get events | get_events | Dialog, crash, download, navigation |
| Run JS | evaluate with value | |
| Go back | | |
| Save page as PDF | | |
| Check cookies | cookies_get | |
| Done | close | |