Openakita agent-browser
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video
```shell
# Clone the full repository
git clone https://github.com/openakita/openakita

# Or copy only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openakita/openakita "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agent-browser/skills/agentic-browser" ~/.claude/skills/openakita-openakita-agent-browser && rm -rf "$T"
```
Agentic Browser
Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple `@e` ref system for element interaction.

Quick Start
```shell
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
```
Install note: The install script only detects your OS/architecture, downloads the matching binary from `dist.inference.sh`, and verifies its SHA-256 checksum. It needs no elevated permissions and starts no background processes. Manual install and verification are also available.
Core Workflow
Every browser automation follows this pattern:
- Open - Navigate to URL, get `@e` refs for elements
- Interact - Use refs to click, fill, drag, etc.
- Re-snapshot - After navigation/changes, get fresh refs
- Close - End session (returns video if recording)
```shell
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{"url": "https://example.com/login"}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{"action": "fill", "ref": "@e1", "text": "user@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{"action": "fill", "ref": "@e2", "text": "password123"}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{"action": "click", "ref": "@e3"}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session "$SESSION_ID" --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session "$SESSION_ID" --input '{}'
```
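Every step above repeats the same `infsh app run agent-browser` prefix. A minimal sketch, assuming bash, of a wrapper that cuts that noise: `ab` is a hypothetical name, not part of the infsh CLI, and it only forwards its arguments to the same command used in the workflow.

```shell
# Hypothetical convenience wrapper (not part of infsh).
# ab FUNCTION SESSION [INPUT_JSON]   (INPUT_JSON defaults to {})
ab() {
  local fn="$1" session="$2" input="${3:-}"
  # snapshot and close are called with an empty JSON object as input.
  [ -n "$input" ] || input='{}'
  infsh app run agent-browser --function "$fn" --session "$session" --input "$input"
}

# Usage, mirroring steps 2-4 above:
#   ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'
#   ab snapshot "$SESSION_ID"
#   ab close "$SESSION_ID"
```

Passing `new` as the session argument also works for the first call, e.g. `ab open new '{"url": "https://example.com"}'`.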
Functions
| Function | Description |
|---|---|
| `open` | Navigate to URL, configure browser (viewport, proxy, video recording) |
| `snapshot` | Re-fetch page state with refs after DOM changes |
| `interact` | Perform actions using refs (click, fill, drag, upload, etc.) |
| `screenshot` | Take page screenshot (viewport or full page) |
| `execute` | Run JavaScript code on the page |
| `close` | Close session, returns video if recording was enabled |
Interact Actions
| Action | Description | Required Fields |
|---|---|---|
| `click` | Click element | `ref` |
| | Double-click element | |
| `fill` | Clear and type text | `ref`, `text` |
| | Type text (no clear) | |
| `press` | Press key (Enter, Tab, etc.) | `text` |
| | Select dropdown option | |
| | Hover over element | |
| | Check checkbox | |
| | Uncheck checkbox | |
| `drag` | Drag and drop | `ref`, `target_ref` |
| `upload` | Upload file(s) | `ref`, `file_paths` |
| | Scroll page (up/down/left/right) | |
| | Go back in history | - |
| `wait` | Wait milliseconds | `wait_ms` |
| | Navigate to URL | |
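Hand-writing the `--input` JSON gets error-prone once a value contains quotes. A minimal sketch, assuming bash and `sed`: the helper names (`json_string`, `click_input`, `fill_input`, and so on) are hypothetical, and they cover only `click`, `fill`, `press`, `wait` and `drag`, whose field names appear in the examples in this document.

```shell
# Hypothetical helpers (not part of infsh) that build interact payloads.

# json_string VALUE: print VALUE as a quoted JSON string.
# Escapes backslashes first, then double quotes; assumes VALUE contains
# no newlines or other control characters.
json_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

click_input() { printf '{"action": "click", "ref": %s}' "$(json_string "$1")"; }
fill_input()  { printf '{"action": "fill", "ref": %s, "text": %s}' "$(json_string "$1")" "$(json_string "$2")"; }
press_input() { printf '{"action": "press", "text": %s}' "$(json_string "$1")"; }
wait_input()  { printf '{"action": "wait", "wait_ms": %d}' "$1"; }
drag_input()  { printf '{"action": "drag", "ref": %s, "target_ref": %s}' "$(json_string "$1")" "$(json_string "$2")"; }

# Example: fill a field with text that contains double quotes.
#   infsh app run agent-browser --function interact --session "$SESSION" \
#     --input "$(fill_input @e1 'He said "hi"')"
```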
Element Refs
Elements are returned with `@e` refs:

```text
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
```
Important: Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading
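Because refs change after every re-snapshot, scripts are sturdier when they look a ref up by its label instead of hard-coding `@e3`. A minimal sketch, assuming bash and POSIX `awk`, and assuming you already have the element list as plain text, one element per line, in the format shown above; this document does not say which field of the JSON result holds that list, so extracting it is left out. `find_ref` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of infsh): look up a ref by its visible label.
# find_ref LABEL < elements.txt   -> prints the first matching ref, e.g. @e3
find_ref() {
  awk -v want="\"$1\"" '
    $1 ~ /^@e[0-9]+$/ {
      # Everything after the first "]" follows the [tag attrs] part.
      rest = substr($0, index($0, "]") + 1)
      sub(/^[ \t]+/, "", rest)
      # Match only the quoted label, not attributes such as placeholder="...".
      if (substr(rest, 1, length(want)) == want) { print $1; exit }
    }'
}

# Usage after a fresh snapshot:
#   SUBMIT_REF=$(find_ref "Submit" < elements.txt)
```

Matching only the leading quoted label avoids false hits on attribute values such as `placeholder="Search"`; labels containing backslashes are not handled because `awk -v` interprets escape sequences.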
Features
Video Recording
Record browser sessions for debugging or documentation:
```shell
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Returns: {"success": true, "video": <File>}
```
Cursor Indicator
Show a visible cursor in screenshots and video (useful for demos):
```shell
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'
```
The cursor appears as a red dot that follows mouse movements and shows click feedback.
Proxy Support
Route traffic through a proxy server:
```shell
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
```
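To keep proxy credentials out of scripts, the `open` input can be assembled from environment variables. A minimal sketch, assuming bash and `sed`: `PROXY_URL`, `PROXY_USER` and `PROXY_PASS` are hypothetical variable names that only this helper reads (infsh does not read them itself), and `open_input` is likewise a hypothetical helper.

```shell
# json_string VALUE: quote VALUE as a JSON string (escapes \ and ").
# Assumes VALUE contains no newlines or other control characters.
json_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# open_input URL: print {"url": ...}, adding proxy fields when PROXY_URL is set.
open_input() {
  local json
  json="{\"url\": $(json_string "$1")"
  if [ -n "${PROXY_URL:-}" ]; then
    json+=", \"proxy_url\": $(json_string "$PROXY_URL")"
    # Only send credentials when a username is configured.
    if [ -n "${PROXY_USER:-}" ]; then
      json+=", \"proxy_username\": $(json_string "$PROXY_USER")"
      json+=", \"proxy_password\": $(json_string "${PROXY_PASS:-}")"
    fi
  fi
  printf '%s}' "$json"
}

# Usage:
#   export PROXY_URL=http://proxy.example.com:8080 PROXY_USER=user PROXY_PASS=pass
#   infsh app run agent-browser --function open --session new --input "$(open_input https://example.com)"
```

The credentials still end up in the `infsh` command's arguments; this only keeps them out of the script source and shell history.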
File Upload
Upload files to file inputs:
```shell
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'
```
Drag and Drop
Drag elements to targets:
```shell
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'
```
JavaScript Execution
Run custom JavaScript:
```shell
infsh app run agent-browser --function execute --session "$SESSION" --input '{"code": "document.querySelectorAll(\"h2\").length"}'
# Returns: {"result": "5", "screenshot": <File>}
```
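Escaping every double quote in the JavaScript by hand, as in `\"h2\"` above, is easy to get wrong. A minimal sketch, assuming bash and `sed`, that wraps a one-line snippet in the `execute` input; `execute_input` and `json_string` are hypothetical helper names, and multi-line scripts are not handled.

```shell
# json_string VALUE: quote VALUE as a JSON string (escapes \ and ").
# Assumes VALUE is a single line without control characters.
json_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# execute_input JS: print {"code": "<JS as a JSON string>"}.
execute_input() { printf '{"code": %s}' "$(json_string "$1")"; }

# The same call as above, with plain double quotes in the JavaScript:
#   infsh app run agent-browser --function execute --session "$SESSION" \
#     --input "$(execute_input 'document.querySelectorAll("h2").length')"
```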
Deep-Dive Documentation
| Reference | Description |
|---|---|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |
Examples
Form Submission
```shell
SESSION=$(infsh app run agent-browser --function open --session new --input '{"url": "https://example.com/contact"}' | jq -r '.session_id')
# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "click", "ref": "@e4"}'
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```
Search and Extract
```shell
SESSION=$(infsh app run agent-browser --function open --session new --input '{"url": "https://google.com"}' | jq -r '.session_id')
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "wait", "wait_ms": 2000}'
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```
Screenshot with Video
```shell
SESSION=$(infsh app run agent-browser --function open --session new --input '{"url": "https://example.com", "record_video": true}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session "$SESSION" --input '{"full_page": true}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session "$SESSION" --input '{}')
echo "$RESULT" | jq '.video'
```
Sessions
Browser state persists within a session. Always:
- Start with `--session new` on the first call
- Use the returned `session_id` for subsequent calls
- Close the session when done
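The last rule is easy to break when a script fails partway through. One way to enforce it in bash is an `EXIT` trap; this sketch assumes the session ID is stored in `$SESSION`, as in most examples above, and `close_session` is a hypothetical helper name.

```shell
# Hypothetical pattern (not part of infsh): close the session on script exit,
# whether the script finishes normally or stops early.
close_session() {
  if [ -n "${SESSION:-}" ]; then
    infsh app run agent-browser --function close --session "$SESSION" --input '{}'
    # Mark the session as closed so a second call is a no-op.
    SESSION=""
  fi
}
trap close_session EXIT

# SESSION=$(infsh app run agent-browser --function open --session new \
#   --input '{"url": "https://example.com"}' | jq -r '.session_id')
# ... interact, snapshot, screenshot ...
```

If recording is enabled and you need the returned video, run the `close` call yourself, capture its output, and then set `SESSION=""` so the trap does not try to close the session again.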
Related Skills
```shell
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
```
Documentation
- inference.sh Sessions - Session management
- Multi-function Apps - How functions work