MemOS browserwing-executor
Control browser automation through HTTP API. Supports page navigation, element interaction (click, type, select), data extraction, accessibility snapshot analysis, screenshot, JavaScript execution, and batch operations.
git clone https://github.com/MemTensor/MemOS
T=$(mktemp -d) && git clone --depth=1 https://github.com/MemTensor/MemOS "$T" && mkdir -p ~/.claude/skills && cp -r "$T/apps/memos-local-openclaw/skill/browserwing-executor" ~/.claude/skills/memtensor-memos-browserwing-executor && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/MemTensor/MemOS "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/apps/memos-local-openclaw/skill/browserwing-executor" ~/.openclaw/skills/memtensor-memos-browserwing-executor && rm -rf "$T"
BrowserWing Executor API
Overview
BrowserWing Executor provides comprehensive browser automation capabilities through HTTP APIs. You can control browser navigation, interact with page elements, extract data, and analyze page structure.
API Base URL: http://localhost:8080/api/v1/executor
Authentication: Use the X-BrowserWing-Key: <api-key> header or Authorization: Bearer <token>.
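As a minimal sketch of how the base URL and the API-key header fit together, the helper below builds (but does not send) a request object using only the Python standard library. The function name build_request is hypothetical and not part of BrowserWing; the base URL and header name are the ones stated above.

```python
import json
import urllib.request

# Base URL as given in this document; replace with the actual API host.
BASE_URL = "http://localhost:8080/api/v1/executor"


def build_request(method, path, payload=None, api_key=None):
    """Build (without sending) a request for an executor endpoint.

    build_request is an illustrative helper, not a BrowserWing API.
    """
    headers = {}
    if api_key is not None:
        headers["X-BrowserWing-Key"] = api_key
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


request = build_request("GET", "/help", api_key="<api-key>")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request) (requires a running server).
```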
Core Capabilities
- Page Navigation: Navigate to URLs, go back/forward, reload
- Element Interaction: Click, type, select, hover on page elements
- Data Extraction: Extract text, attributes, values from elements
- Accessibility Analysis: Get accessibility snapshot to understand page structure
- Advanced Operations: Screenshot, JavaScript execution, keyboard input
- Batch Processing: Execute multiple operations in sequence
API Endpoints
1. Discover Available Commands
IMPORTANT: Always call this endpoint first to see all available commands and their parameters.
curl -X GET 'http://localhost:8080/api/v1/executor/help'
Response: Returns complete list of all commands with parameters, examples, and usage guidelines.
Query specific command:
curl -X GET 'http://localhost:8080/api/v1/executor/help?command=extract'
2. Get Accessibility Snapshot
CRITICAL: Always call this after navigation to understand page structure and get element RefIDs.
curl -X GET 'http://localhost:8080/api/v1/executor/snapshot'
Response Example:
{ "success": true, "snapshot_text": "Clickable Elements:\n @e1 Login (role: button)\n @e2 Sign Up (role: link)\n\nInput Elements:\n @e3 Email (role: textbox) [placeholder: your@email.com]\n @e4 Password (role: textbox)" }
Use Cases:
- Understand what interactive elements are on the page
- Get element RefIDs (@e1, @e2, etc.) for precise identification
- See element labels, roles, and attributes
- The accessibility tree is cleaner than raw DOM and better for LLMs
- RefIDs identify elements more reliably than CSS selectors; they remain valid for 5 minutes after the snapshot, so re-snapshot after page changes
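To illustrate how RefIDs can be pulled out of snapshot_text, here is a small parsing sketch. It assumes every element line follows the "@eN Label (role: name)" shape shown in the response example above; the real snapshot format may contain other line shapes, and parse_snapshot is a hypothetical helper.

```python
import re

# Matches lines such as " @e3 Email (role: textbox) [placeholder: ...]".
# The pattern is inferred from the response example, not from a format spec.
REF_LINE = re.compile(r"^\s*(@e\d+)\s+(.*?)\s+\(role:\s*([^)]+)\)")


def parse_snapshot(snapshot_text):
    """Return {ref_id: {"label": ..., "role": ...}} for each element line."""
    elements = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line)
        if match:
            ref_id, label, role = match.groups()
            elements[ref_id] = {"label": label, "role": role.strip()}
    return elements


sample = (
    "Clickable Elements:\n @e1 Login (role: button)\n @e2 Sign Up (role: link)\n\n"
    "Input Elements:\n @e3 Email (role: textbox) [placeholder: your@email.com]\n"
    " @e4 Password (role: textbox)"
)
elements = parse_snapshot(sample)
print(elements["@e1"])
```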
3. Common Operations
Navigate to URL
curl -X POST 'http://localhost:8080/api/v1/executor/navigate' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
Click Element
curl -X POST 'http://localhost:8080/api/v1/executor/click' \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e1"}'
Identifier formats:
- RefID (Recommended): @e1, @e2 (from snapshot)
- CSS Selector: #button-id, .class-name
- XPath: //button[@type='submit']
- Text: Login (text content)
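The four identifier formats can usually be told apart by their leading characters. The classifier below is only a client-side heuristic for logging or validation; it is an assumption about how the formats differ, not a description of how the server resolves identifiers, and identifier_kind is a hypothetical name.

```python
import re


def identifier_kind(identifier):
    """Guess which identifier format a string uses (heuristic only)."""
    if re.fullmatch(r"@e\d+", identifier):
        return "refid"
    # XPath expressions normally start with a path step.
    if identifier.startswith(("//", "(//", "./")):
        return "xpath"
    # CSS selectors start with #, ., or [ or contain an attribute test.
    if identifier.startswith(("#", ".", "[")) or "[" in identifier:
        return "css"
    # Anything else is treated as visible text content.
    return "text"


for example in ["@e1", "#button-id", "//button[@type='submit']", "Login"]:
    print(example, "->", identifier_kind(example))
```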
Type Text
curl -X POST 'http://localhost:8080/api/v1/executor/type' \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e3", "text": "user@example.com"}'
Extract Data
curl -X POST 'http://localhost:8080/api/v1/executor/extract' \
  -H 'Content-Type: application/json' \
  -d '{
    "selector": ".product-item",
    "fields": ["text", "href"],
    "multiple": true
  }'
Wait for Element
curl -X POST 'http://localhost:8080/api/v1/executor/wait' \
  -H 'Content-Type: application/json' \
  -d '{"identifier": ".loading", "state": "hidden", "timeout": 10}'
Batch Operations
curl -X POST 'http://localhost:8080/api/v1/executor/batch' \
  -H 'Content-Type: application/json' \
  -d '{
    "operations": [
      {"type": "navigate", "params": {"url": "https://example.com"}, "stop_on_error": true},
      {"type": "click", "params": {"identifier": "@e1"}, "stop_on_error": true},
      {"type": "type", "params": {"identifier": "@e3", "text": "query"}, "stop_on_error": true}
    ]
  }'
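Batch bodies are repetitive to write by hand, so a small builder can help. The sketch below reproduces the payload shape from the curl example above (type, params, stop_on_error); the helper names op and batch_payload are hypothetical.

```python
import json


def op(op_type, stop_on_error=True, **params):
    """Describe one batch operation in the shape used by the /batch example."""
    return {"type": op_type, "params": params, "stop_on_error": stop_on_error}


def batch_payload(*operations):
    """Wrap operations in the top-level "operations" list."""
    return {"operations": list(operations)}


payload = batch_payload(
    op("navigate", url="https://example.com"),
    op("click", identifier="@e1"),
    op("type", identifier="@e3", text="query"),
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent as the body of POST /batch.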
Instructions
Step-by-step workflow:
- Discover commands: Call GET /help to see all available operations and their parameters (do this first if unsure).
- Navigate: Use POST /navigate to open the target webpage.
- Analyze page: Call GET /snapshot to understand page structure and get element RefIDs.
- Interact: Use element RefIDs (like @e1, @e2) or CSS selectors to:
  - Click elements: POST /click
  - Input text: POST /type
  - Select options: POST /select
  - Wait for elements: POST /wait
- Extract data: Use POST /extract to get information from the page.
- Present results: Format and show extracted data to the user.
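The workflow above can be sketched as a single function that takes the transport as a parameter, which makes the call order easy to check without a running browser. Here call(method, path, payload) is a hypothetical transport function and find_ref is a hypothetical callback that picks a RefID out of snapshot_text; neither is part of BrowserWing.

```python
def search_workflow(call, url, find_ref, query, item_selector):
    """Navigate, snapshot, type into the chosen element, then extract."""
    call("POST", "/navigate", {"url": url})
    snapshot = call("GET", "/snapshot", None)
    ref_id = find_ref(snapshot["snapshot_text"])
    call("POST", "/type", {"identifier": ref_id, "text": query})
    return call(
        "POST",
        "/extract",
        {"selector": item_selector, "fields": ["text", "href"], "multiple": True},
    )


# Dry run: record the calls instead of sending them.
calls = []


def dry_run(method, path, payload):
    calls.append((method, path, payload))
    return {"success": True, "snapshot_text": "@e3 Search (role: textbox)"}


search_workflow(
    dry_run,
    "https://example.com/search",
    lambda text: text.split()[0],  # naive: take the first token as the RefID
    "laptop",
    ".result-item",
)
for method, path, _ in calls:
    print(method, path)
```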
Complete Example
User Request: "Search for 'laptop' on example.com and get the first 5 results"
Your Actions:
- Navigate to search page:
curl -X POST 'http://localhost:8080/api/v1/executor/navigate' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com/search"}'
- Get page structure to find search input:
curl -X GET 'http://localhost:8080/api/v1/executor/snapshot'
Response shows:
@e3 Search (role: textbox) [placeholder: Search...]
- Type search query:
curl -X POST 'http://localhost:8080/api/v1/executor/type' \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e3", "text": "laptop"}'
- Press Enter to submit:
curl -X POST 'http://localhost:8080/api/v1/executor/press-key' \
  -H 'Content-Type: application/json' \
  -d '{"key": "Enter"}'
- Wait for results to load:
curl -X POST 'http://localhost:8080/api/v1/executor/wait' \
  -H 'Content-Type: application/json' \
  -d '{"identifier": ".search-results", "state": "visible", "timeout": 10}'
- Extract search results:
curl -X POST 'http://localhost:8080/api/v1/executor/extract' \
  -H 'Content-Type: application/json' \
  -d '{
    "selector": ".result-item",
    "fields": ["text", "href"],
    "multiple": true
  }'
- Present the extracted data:
Found 15 results for 'laptop':
1. Gaming Laptop - $1299 (https://...)
2. Business Laptop - $899 (https://...)
...
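The final presentation step could look like the sketch below. It assumes the extracted items arrive as a list of dictionaries keyed by the requested fields ("text", "href"); the actual shape of the /extract response data is not shown in this document, so treat that as an assumption. format_results and the sample URLs are illustrative.

```python
def format_results(query, items, limit=5):
    """Render a numbered summary of the first `limit` extracted items."""
    lines = [f"Found {len(items)} results for '{query}':"]
    for index, item in enumerate(items[:limit], start=1):
        lines.append(f"{index}. {item.get('text', '')} ({item.get('href', '')})")
    return "\n".join(lines)


# Hypothetical extracted items, for illustration only.
items = [
    {"text": "Gaming Laptop - $1299", "href": "https://example.com/a"},
    {"text": "Business Laptop - $899", "href": "https://example.com/b"},
]
print(format_results("laptop", items))
```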
Key Commands Reference
Navigation
- POST /navigate - Navigate to URL
- POST /go-back - Go back in history
- POST /go-forward - Go forward in history
- POST /reload - Reload current page
Element Interaction
- POST /click - Click element (supports: RefID @e1, CSS selector, XPath, text content)
- POST /type - Type text into input (supports: RefID @e3, CSS selector, XPath)
- POST /select - Select dropdown option
- POST /hover - Hover over element
- POST /wait - Wait for element state (visible, hidden, enabled)
- POST /press-key - Press keyboard key (Enter, Tab, Ctrl+S, etc.)
Data Extraction
- POST /extract - Extract data from elements (supports multiple elements, custom fields)
- POST /get-text - Get element text content
- POST /get-value - Get input element value
- GET /page-info - Get page URL and title
- GET /page-text - Get all page text
- GET /page-content - Get full HTML
Page Analysis
- GET /snapshot - Get accessibility snapshot (⭐ ALWAYS call after navigation)
- GET /clickable-elements - Get all clickable elements
- GET /input-elements - Get all input elements
Advanced
- POST /screenshot - Take page screenshot (base64 encoded)
- POST /evaluate - Execute JavaScript code
- POST /batch - Execute multiple operations in sequence
- POST /scroll-to-bottom - Scroll to page bottom
- POST /resize - Resize browser window
- POST /tabs - Manage browser tabs (list, new, switch, close)
- POST /fill-form - Intelligently fill multiple form fields at once
Debug & Monitoring
- GET /console-messages - Get browser console messages (logs, warnings, errors)
- GET /network-requests - Get network requests made by the page
- POST /handle-dialog - Configure JavaScript dialog (alert, confirm, prompt) handling
- POST /file-upload - Upload files to input elements
- POST /drag - Drag and drop elements
- POST /close-page - Close the current page/tab
Element Identification
You can identify elements using:
- RefID (Recommended): @e1, @e2, @e3
  - Most reliable method - stable across page changes
  - Get RefIDs from /snapshot endpoint
  - Valid for 5 minutes after snapshot
  - Example: "identifier": "@e1"
  - Works with multi-strategy fallback for robustness
- CSS Selector: #id, .class, button[type="submit"]
  - Standard CSS selectors
  - Example: "identifier": "#login-button"
- XPath: //button[@id='login'], //a[contains(text(), 'Submit')]
  - XPath expressions for complex queries
  - Example: "identifier": "//button[@id='login']"
- Text Content: Login, Sign Up, Submit
  - Searches buttons and links with matching text
  - Example: "identifier": "Login"
- ARIA Label: Elements with aria-label attribute
  - Automatically searched
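Because RefIDs are valid for 5 minutes after a snapshot, a client may want to track snapshot age and re-snapshot before reusing an old RefID. The sketch below is one way to do that on the client side; the SnapshotRefs class is hypothetical, and the only value taken from this document is the 5-minute window.

```python
import time

REF_TTL_SECONDS = 5 * 60  # "Valid for 5 minutes after snapshot"


class SnapshotRefs:
    """Remember which RefIDs a snapshot returned and when it was taken."""

    def __init__(self, ref_ids, taken_at=None):
        self.ref_ids = set(ref_ids)
        self.taken_at = time.monotonic() if taken_at is None else taken_at

    def is_usable(self, ref_id, now=None):
        """True if the RefID came from this snapshot and has not expired."""
        now = time.monotonic() if now is None else now
        return ref_id in self.ref_ids and (now - self.taken_at) < REF_TTL_SECONDS


refs = SnapshotRefs(["@e1", "@e2"])
if not refs.is_usable("@e1"):
    print("RefID expired or unknown: call GET /snapshot again")
```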
Guidelines
Before starting:
- Call GET /help if you're unsure about available commands or their parameters
- Ensure browser is started (if not, it will auto-start on first operation)
During automation:
- Always call /snapshot after navigation to get page structure and RefIDs
- Prefer RefIDs (like @e1) over CSS selectors for reliability and stability
- Re-snapshot after page changes to get updated RefIDs
- Use /wait for dynamic content that loads asynchronously
- Check element states before interaction (visible, enabled)
- Use /batch for multiple sequential operations to improve efficiency
Error handling:
- If operation fails, check element identifier and try different format
- For timeout errors, increase timeout value
- If element not found, call /snapshot again to refresh page structure
- Explain errors clearly to user with suggested solutions
Data extraction:
- Use fields parameter to specify what to extract: ["text", "href", "src"]
- Set multiple: true to extract from multiple elements
- Format extracted data in a readable way for user
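The "try a different format" advice from the error-handling guidelines can be sketched as a fallback loop. As before, call(method, path, payload) is a hypothetical transport function returning the decoded JSON response, and the code assumes failed operations lack a truthy "success" field, following the response formats documented in this file.

```python
def click_with_fallback(call, identifiers):
    """Try each identifier in order and return the first one that works."""
    errors = []
    for identifier in identifiers:
        response = call("POST", "/click", {"identifier": identifier})
        if response.get("success"):
            return identifier
        errors.append((identifier, response.get("detail", "unknown error")))
    raise LookupError(f"no identifier worked: {errors}")


# Fake transport for illustration: only the CSS selector succeeds.
def fake_call(method, path, payload):
    if payload["identifier"] == "#login-button":
        return {"success": True, "message": "clicked"}
    return {"error": "error.operationFailed", "detail": "element not found"}


used = click_with_fallback(fake_call, ["@e1", "#login-button", "Login"])
print("clicked using", used)
```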
Complete Workflow Example
Scenario: User wants to log in to a website
User: "Please log in to example.com with username 'john' and password 'secret123'"
Your Actions:
Step 1: Navigate to login page
POST http://localhost:8080/api/v1/executor/navigate
{"url": "https://example.com/login"}
Step 2: Get page structure
GET http://localhost:8080/api/v1/executor/snapshot
Response:
Clickable Elements:
  @e1 Login (role: button)
Input Elements:
  @e2 Username (role: textbox)
  @e3 Password (role: textbox)
Step 3: Enter username
POST http://localhost:8080/api/v1/executor/type
{"identifier": "@e2", "text": "john"}
Step 4: Enter password
POST http://localhost:8080/api/v1/executor/type
{"identifier": "@e3", "text": "secret123"}
Step 5: Click login button
POST http://localhost:8080/api/v1/executor/click
{"identifier": "@e1"}
Step 6: Wait for login success (optional)
POST http://localhost:8080/api/v1/executor/wait
{"identifier": ".welcome-message", "state": "visible", "timeout": 10}
Step 7: Inform user
"Successfully logged in to example.com!"
Batch Operation Example
Scenario: Fill out a form with multiple fields
Instead of making 5 separate API calls, use one batch operation:
curl -X POST 'http://localhost:8080/api/v1/executor/batch' \
  -H 'Content-Type: application/json' \
  -d '{
    "operations": [
      {
        "type": "navigate",
        "params": {"url": "https://example.com/form"},
        "stop_on_error": true
      },
      {
        "type": "type",
        "params": {"identifier": "#name", "text": "John Doe"},
        "stop_on_error": true
      },
      {
        "type": "type",
        "params": {"identifier": "#email", "text": "john@example.com"},
        "stop_on_error": true
      },
      {
        "type": "select",
        "params": {"identifier": "#country", "value": "United States"},
        "stop_on_error": true
      },
      {
        "type": "click",
        "params": {"identifier": "#submit"},
        "stop_on_error": true
      }
    ]
  }'
Best Practices
- Discovery first: If unsure, call /help or /help?command=<name> to learn about commands
- Structure first: Always call /snapshot after navigation to understand the page
- Use RefIDs: They're more reliable than CSS selectors (elements might have dynamic classes)
- Wait for dynamic content: Use /wait before interacting with elements that load asynchronously
- Batch when possible: Use /batch for multiple sequential operations
- Handle errors gracefully: Provide clear explanations and suggestions when operations fail
- Verify results: After operations, check if desired outcome was achieved
Common Scenarios
Form Filling
- Navigate to form page
- Get accessibility snapshot to find input elements and their RefIDs
- Use /type for each field: @e1, @e2, etc.
- Use /select for dropdowns
- Click submit button using its RefID
Data Scraping
- Navigate to target page
- Wait for content to load with /wait
- Use /extract with CSS selector and multiple: true
- Specify fields to extract: ["text", "href", "src"]
Search Operations
- Navigate to search page
- Get accessibility snapshot to locate search input
- Type search query into input
- Press Enter or click search button
- Wait for results
- Extract results data
Login Automation
- Navigate to login page
- Get accessibility snapshot to find RefIDs
- Type username: @e2
- Type password: @e3
- Click login button: @e1
- Wait for success indicator
Important Notes
- Browser must be running (it will auto-start on first operation if needed)
- Operations are executed on the currently active browser tab
- Accessibility snapshot updates after each navigation and click operation
- All timeouts are in seconds
- Use wait_visible: true (default) for reliable element interaction
- Replace localhost:8080 with actual API host address
- Authentication required: use X-BrowserWing-Key header or JWT token
Troubleshooting
Element not found:
- Call /snapshot to see available elements
- Try different identifier format (RefID, CSS selector, text)
- Check if page has finished loading
Timeout errors:
- Increase timeout value in request
- Check if element actually appears on page
- Use /wait with appropriate state before interaction
Extraction returns empty:
- Verify CSS selector matches target elements
- Check if content has loaded (use /wait first)
- Try different extraction fields or type
Quick Reference
# Discover commands
GET localhost:8080/api/v1/executor/help
# Navigate
POST localhost:8080/api/v1/executor/navigate {"url": "..."}
# Get page structure
GET localhost:8080/api/v1/executor/snapshot
# Click element
POST localhost:8080/api/v1/executor/click {"identifier": "@e1"}
# Type text
POST localhost:8080/api/v1/executor/type {"identifier": "@e3", "text": "..."}
# Extract data
POST localhost:8080/api/v1/executor/extract {"selector": "...", "fields": [...], "multiple": true}
Response Format
All operations return:
{
  "success": true,
  "message": "Operation description",
  "timestamp": "2026-01-15T10:30:00Z",
  "data": {
    // Operation-specific data
  }
}
Error response:
{ "error": "error.operationFailed", "detail": "Detailed error message" }
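A client can fold the two response shapes above into one code path: return data on success and raise on error. The sketch below assumes responses have already been decoded from JSON into dictionaries; ExecutorError and unwrap are hypothetical names, not part of BrowserWing.

```python
class ExecutorError(Exception):
    """Raised when a response carries an error or lacks success: true."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


def unwrap(response):
    """Return the operation-specific data, or raise ExecutorError."""
    if "error" in response:
        raise ExecutorError(response["error"], response.get("detail", ""))
    if not response.get("success", False):
        raise ExecutorError("unknown", response.get("message", ""))
    return response.get("data", {})


ok = {"success": True, "message": "Operation description", "data": {"url": "..."}}
print(unwrap(ok))

try:
    unwrap({"error": "error.operationFailed", "detail": "Detailed error message"})
except ExecutorError as exc:
    print("failed:", exc.code, "-", exc.detail)
```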