Claude-skill-registry browser-pilot
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/browser-pilot" ~/.claude/skills/majiayu000-claude-skill-registry-browser-pilot && rm -rf "$T"
skills/data/browser-pilot/SKILL.md
browser-pilot
Purpose
Automate the Chrome browser through the Chrome DevTools Protocol (CDP) using a daemon-based architecture. The daemon maintains a persistent browser connection, so commands run without reconnecting each time. Smart Mode uses an Interaction Map to target elements by text-based search instead of brittle selectors.
Always run scripts with --help first to see usage. DO NOT read the source until you have tried running the script and found that a customized solution is absolutely necessary. These scripts can be very large and thus pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.
When to Use
Use browser-pilot when tasks involve:
- Browser automation, web scraping, data extraction
- Screenshot capture, PDF generation
- Form filling, login automation, element interaction
- Tab management, cookie control, JavaScript execution
- Tasks requiring text-based element selection ("click the 3rd Delete button")
- Bot detection bypass requirements (navigator.webdriver = false)
⚠️ Important Guidelines
When to Ask User: Use AskUserQuestion tool if:
- Task requirements unclear or ambiguous
- Multiple implementation approaches possible
- Element selectors not working despite troubleshooting
- User intent uncertain (e.g., "automate this" without specifics)
DO NOT guess or assume user requirements. Always clarify first.
Prerequisites
Chrome must be installed. Local scripts initialize automatically on session start (no manual setup required).
Getting Help
All commands support --help for detailed options:
    # See all available commands
    node .browser-pilot/bp --help

    # Get help for specific command
    node .browser-pilot/bp <command> --help
Architecture
Daemon-based design:
- Background daemon maintains persistent CDP connection
- CLI commands communicate via IPC
- Auto-starts on first command, stops at session end
- 30-minute inactivity timeout
Interaction Map System:
- Auto-generates JSON map of interactive elements on page load
- Enables text-based search with automatic selector generation
- Handles duplicates with indexing
- 10-minute cache with auto-regeneration
Core Workflow
1. Extract Required Information
From user's request, identify:
- Target URL(s) to visit
- Actions to perform (screenshot, click, fill, etc.)
- Element identifiers (text content, CSS selectors, or XPath)
- Output file names (for screenshots/PDFs)
- Data to extract or forms to fill
When information is missing or ambiguous, use AskUserQuestion tool.
2. Execute Commands
All commands use the .browser-pilot/bp wrapper script. Replace placeholders with actual values.
Navigation:
    node .browser-pilot/bp navigate -u <url>
    node .browser-pilot/bp back
    node .browser-pilot/bp forward
    node .browser-pilot/bp reload
Interaction (Smart Mode - Recommended):
    # Text-based element search (map auto-generated)
    # No quotes for single words
    node .browser-pilot/bp click --text Login --type button
    node .browser-pilot/bp fill --text Email -v <value>

    # Use quotes when text contains spaces
    node .browser-pilot/bp click --text "Sign In" --type button
    node .browser-pilot/bp fill --text "Email Address" -v <value>

    # Handle duplicates with indexing
    node .browser-pilot/bp click --text Delete --index 2

    # Filter visible elements only
    node .browser-pilot/bp click --text Submit --viewport-only

    # Type aliases (auto-expanded)
    node .browser-pilot/bp click --text Search --type input  # Matches: input, input-text, input-search, etc.

    # Tag-based filtering (HTML tag)
    node .browser-pilot/bp click --text Submit --tag button  # Matches all <button> tags
    node .browser-pilot/bp fill --text Email --tag input -v user@example.com

    # 3-stage fallback (automatic)
    # Stage 1: Type search (with alias expansion)
    # Stage 2: Tag search (if type fails)
    # Stage 3: Map regeneration + retry (up to 3 attempts)
Interaction (Direct Mode - fallback for unique IDs):
    node .browser-pilot/bp click -s "#login-button"
    node .browser-pilot/bp fill -s "input[name='email']" -v <value>
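Because Direct Mode is described as a fallback, the two modes can be combined in a small wrapper. The following is a minimal sketch, not part of browser-pilot itself: the helper name click_with_fallback is hypothetical, and it assumes that bp exits with a non-zero status when a click fails, which this document does not confirm.

```shell
# Thin wrapper around the documented entry point.
bp() { node .browser-pilot/bp "$@"; }

# click_with_fallback <text> <css-selector>   (hypothetical helper)
# Tries Smart Mode first and uses Direct Mode only if that fails.
# Assumption: bp returns a non-zero exit status when the click fails.
click_with_fallback() {
  text=$1
  selector=$2
  if bp click --text "$text"; then
    return 0
  fi
  echo "Smart Mode click on '$text' failed; trying selector $selector" >&2
  bp click -s "$selector"
}
```

A call such as `click_with_fallback "Sign In" "#login-button"` keeps the selector quoted, so the shell does not treat the leading # as the start of a comment.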
Capture:
    # Screenshots saved to .browser-pilot/screenshots/
    node .browser-pilot/bp screenshot -o <filename>.png

    # Capture specific region
    node .browser-pilot/bp screenshot -o region.png --clip-x 100 --clip-y 200 --clip-width 800 --clip-height 600

    # Set viewport size for responsive testing
    node .browser-pilot/bp set-viewport -w 375 -h 667 --scale 2 --mobile

    # Get current viewport size
    node .browser-pilot/bp get-viewport

    # Get screen and viewport information
    node .browser-pilot/bp get-screen-info

    # PDFs saved to .browser-pilot/pdfs/
    node .browser-pilot/bp pdf -o <filename>.pdf
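For responsive testing, the set-viewport and screenshot commands can be looped over several sizes. This is a sketch under stated assumptions: the helper name shoot_viewports and the WIDTHxHEIGHT argument format are hypothetical conveniences, and only the -w, -h, and -o options listed in this document are used.

```shell
# Thin wrapper around the documented entry point.
bp() { node .browser-pilot/bp "$@"; }

# shoot_viewports <name-prefix> <WIDTHxHEIGHT>...   (hypothetical helper)
# Resizes the viewport, then captures one screenshot per size.
shoot_viewports() {
  prefix=$1
  shift
  for size in "$@"; do
    width=${size%x*}    # text before the "x"
    height=${size#*x}   # text after the "x"
    bp set-viewport -w "$width" -h "$height" || return 1
    bp screenshot -o "${prefix}-${size}.png" || return 1
  done
}
```

After navigating to a page, `shoot_viewports home 375x667 1280x800` would request home-375x667.png and home-1280x800.png, in that order.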
Chain Mode (multiple commands):
    # Basic chain (no quotes needed for single words)
    node .browser-pilot/bp chain navigate -u <url> click --text Submit extract -s .result

    # With spaces (quotes required)
    node .browser-pilot/bp chain navigate -u <url> click --text "Sign In" fill --text Email -v <email>

    # Login workflow
    node .browser-pilot/bp chain navigate -u <url> fill --text Email -v <email> fill --text Password -v <password> click --text Login

    # Screenshot workflow
    node .browser-pilot/bp chain navigate -u <url> wait -s .content-loaded screenshot -o result.png
Chain-specific options:
- --timeout <ms>: Map wait timeout after navigation (default: 10000ms)
- --delay <ms>: Fixed delay between commands (overrides random 300-800ms)
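The login chain above can be wrapped in a reusable shell function. This is only a sketch: the name bp_login is hypothetical, the Email, Password, and Login labels are taken from the example and must match the text on the actual page, and the chain-specific options are left out because their position on the command line is not specified here.

```shell
# Thin wrapper around the documented entry point.
bp() { node .browser-pilot/bp "$@"; }

# bp_login <url> <email> <password>   (hypothetical helper)
# Runs the documented login chain. Quoting each value keeps spaces or
# shell metacharacters in it from being split into extra arguments.
bp_login() {
  bp chain \
    navigate -u "$1" \
    fill --text Email -v "$2" \
    fill --text Password -v "$3" \
    click --text Login
}
```

For example, `bp_login "https://example.com/login" "user@example.com" "p@ss word"` passes the password as a single argument even though it contains a space.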
Data Extraction:
    node .browser-pilot/bp extract -s <selector>
    node .browser-pilot/bp content
    node .browser-pilot/bp console
    node .browser-pilot/bp cookies
Other Actions:
    node .browser-pilot/bp wait -s <selector> -t <timeout-ms>
    node .browser-pilot/bp scroll -s <selector>
    node .browser-pilot/bp eval -e <javascript-expression>
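The wait and extract commands are often used together when content loads asynchronously. The sketch below shows one way to combine them. The helper name extract_when_ready and the 5000 ms default are hypothetical, and it assumes that `bp wait` exits with a non-zero status on timeout, which this document does not state.

```shell
# Thin wrapper around the documented entry point.
bp() { node .browser-pilot/bp "$@"; }

# extract_when_ready <selector> [timeout-ms]   (hypothetical helper)
# Waits for the element, then extracts it. The 5000 ms default is arbitrary.
# Assumption: `bp wait` exits non-zero when the timeout is reached.
extract_when_ready() {
  selector=$1
  timeout_ms=${2:-5000}
  bp wait -s "$selector" -t "$timeout_ms" || {
    echo "Timed out waiting for $selector" >&2
    return 1
  }
  bp extract -s "$selector"
}
```

A call such as `extract_when_ready .result 10000` would skip the extract step entirely if the wait fails.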
3. Query Interaction Map (when needed)
    # List all element types
    node .browser-pilot/bp query --list-types

    # Find elements by text
    node .browser-pilot/bp query --text <text>

    # Check map status
    node .browser-pilot/bp map-status

    # Force regenerate map
    node .browser-pilot/bp regen-map
Best Practices
- 🌟 Use Smart Mode by default: Text-based search (--text) is more stable than CSS selectors
  - Recommended: click --text Login
  - Fallback: click -s "#login-btn" (only for unique IDs; quote the selector so the shell does not treat # as a comment)
- Maps auto-generate: No manual map generation needed, happens on page load
- Handle duplicates with indexing: --index 2 selects the 2nd match when multiple elements have the same text
- Filter with type aliases: --type input auto-expands to match input, input-text, input-search, etc.
  - Generic: --type input (matches all input types)
  - Specific: --type input-search (exact match only)
- Use tag-based search for flexibility: --tag button matches all <button> elements regardless of type
- 3-stage fallback is automatic: If an element is not found, the system automatically:
  - Tries type-based search (with alias expansion)
  - Falls back to tag-based search
  - Regenerates map and retries (up to 3 attempts)
- Verify element visibility: --viewport-only ensures the element is on screen
- Use Chain Mode for workflows: Execute multiple commands in sequence for complex automation
- Check console for errors: run node .browser-pilot/bp console after actions fail
- Let daemon auto-manage: Starts on first command, stops at session end
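Several of these practices can be combined in one helper that clicks by text and gathers diagnostics when the click fails. This is a sketch under stated assumptions: the name click_or_diagnose is hypothetical, and it assumes that bp exits with a non-zero status on failure, which this document does not confirm.

```shell
# Thin wrapper around the documented entry point.
bp() { node .browser-pilot/bp "$@"; }

# click_or_diagnose <text> [extra click options...]   (hypothetical helper)
# Assumption: bp returns a non-zero exit status when the click fails.
click_or_diagnose() {
  text=$1
  shift
  if bp click --text "$text" "$@"; then
    return 0
  fi
  echo "click --text '$text' failed; collecting diagnostics" >&2
  bp console      # browser console output
  bp map-status   # state of the interaction map
  return 1
}
```

For example, `click_or_diagnose Delete --index 2 --viewport-only` forwards the extra options to the click command unchanged.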
References
Detailed documentation in references/ folder (load as needed):
- references/commands-reference.md: Complete command list with all options and examples
- references/interaction-map.md: Smart Mode system, map structure, and query API
- references/selector-guide.md: Selector strategies, best practices, and troubleshooting
Load references when user needs detailed information about specific features, advanced usage patterns, or troubleshooting guidance.