# Learn-skills.dev browser-use-integration
Self-hosted AI browser automation using Browser Use with any LLM (Claude, GPT, Ollama). Use when building web scraping agents, data extraction pipelines, self-hosted automation, or when you need flexibility without API rate limits.
## install

**source** · Clone the upstream repo

```bash
git clone https://github.com/NeverSight/learn-skills.dev
```

**Claude Code** · Install into `~/.claude/skills/`

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/browser-use-integration" ~/.claude/skills/neversight-learn-skills-dev-browser-use-integration && rm -rf "$T"
```

manifest: `data/skills-md/adaptationio/skrillz/browser-use-integration/SKILL.md`

## source content
# Browser Use Integration

## Overview
Browser Use is an open-source AI browser automation framework that works with any LLM. Unlike cloud-dependent solutions, you can self-host for unlimited usage with local models.
**Key Advantages:**

- **Open Source**: No API rate limits or vendor lock-in
- **Any LLM**: Claude, GPT-4, Ollama (local), and more
- **Self-Hosted**: Run on your own infrastructure
- **3-5x Faster**: Optimized for browser tasks
## Quick Start (10 Minutes)

### 1. Install Browser Use

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Browser Use
pip install browser-use

# Install LLM provider (choose one)
pip install langchain-anthropic  # For Claude
pip install langchain-openai     # For GPT-4
pip install langchain-ollama     # For local models
```
### 2. Configure API Key

```bash
# For Claude
export ANTHROPIC_API_KEY=your_key_here

# For OpenAI
export OPENAI_API_KEY=your_key_here

# For Ollama (no key needed, just run Ollama locally)
ollama serve
```
### 3. Write First Agent

```python
# agent.py
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

async def main():
    agent = Agent(
        task="Go to google.com and search for 'Browser Use AI automation'",
        llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
```
### 4. Run

```bash
python agent.py
```
## LLM Configuration

### Claude (Recommended)
```python
import os

from langchain_anthropic import ChatAnthropic

# Claude Sonnet (best balance)
llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)

# Claude Opus (highest quality)
llm = ChatAnthropic(model="claude-opus-4-20250514")

# Claude Haiku (fastest, cheapest)
llm = ChatAnthropic(model="claude-3-5-haiku-20241022")
```
### OpenAI

```python
import os

from langchain_openai import ChatOpenAI

# GPT-4o
llm = ChatOpenAI(
    model="gpt-4o",
    api_key=os.environ.get("OPENAI_API_KEY"),
)

# GPT-4 Turbo
llm = ChatOpenAI(model="gpt-4-turbo-preview")
```
### Ollama (Free, Local)

```bash
# First, install and run Ollama
ollama serve

# Pull a model
ollama pull llama3.2
```

```python
from langchain_ollama import ChatOllama

# Local Llama 3.2
llm = ChatOllama(
    model="llama3.2",
    base_url="http://localhost:11434",
)

# Local Mistral
llm = ChatOllama(model="mistral")

# Local Code Llama
llm = ChatOllama(model="codellama")
```
## Cost Comparison
| LLM | Cost per 1M tokens | Best For |
|---|---|---|
| Claude Haiku | ~$0.25 | Simple tasks |
| Claude Sonnet | ~$3.00 | Complex tasks |
| GPT-4o | ~$5.00 | General use |
| Ollama | Free | Unlimited local |
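As a rough illustration of how the table above translates into per-run cost, the sketch below estimates a run's price from token volume. The prices mirror the approximate figures in the table, and the token counts in the example are made-up assumptions, not measurements:

```python
# Approximate prices from the table above (USD per 1M tokens).
# These change over time; treat them as illustrative only.
PRICE_PER_M_TOKENS = {
    "claude-haiku": 0.25,
    "claude-sonnet": 3.00,
    "gpt-4o": 5.00,
    "ollama": 0.00,  # local model: no per-token cost
}

def task_cost(model: str, tokens_per_step: int, steps: int) -> float:
    """Estimate the cost of one agent run in USD."""
    total_tokens = tokens_per_step * steps
    return PRICE_PER_M_TOKENS[model] * total_tokens / 1_000_000

# e.g. an (assumed) 20-step run at ~5,000 tokens per step:
print(f"${task_cost('claude-sonnet', 5_000, 20):.2f}")  # → $0.30
```

This is why Ollama wins on cost for high-volume scraping: the per-token term is simply zero.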
## Agent Patterns

### Simple Task

```python
agent = Agent(
    task="Search for 'Python tutorials' on YouTube and get the top 5 video titles",
    llm=llm,
)
result = await agent.run()
```
### Multi-Step Task

```python
agent = Agent(
    task="""
    1. Go to amazon.com
    2. Search for 'wireless mouse'
    3. Filter by 4+ star rating
    4. Extract the top 5 products with name, price, and rating
    5. Return as JSON
    """,
    llm=llm,
)
result = await agent.run()
```
### Task with Extraction Schema

```python
from typing import List

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    rating: float
    url: str

class ProductList(BaseModel):
    products: List[Product]

agent = Agent(
    task="Find the top 5 laptops on BestBuy under $1000",
    llm=llm,
    output_schema=ProductList,  # Structured output
)
result = await agent.run()
# result.products is List[Product]
```
### With Custom Browser Settings

```python
from browser_use import Agent, Browser

browser = Browser(
    headless=False,  # Show browser
    proxy="http://proxy.example.com:8080",  # Use proxy
)

agent = Agent(
    task="Navigate to example.com",
    llm=llm,
    browser=browser,
)
```
## Error Handling

```python
import asyncio

from browser_use import Agent, AgentError

async def run_with_retry(task: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            agent = Agent(task=task, llm=llm)
            result = await agent.run()
            return result
        except AgentError as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

# Usage
result = await run_with_retry("Search Google for 'AI news'")
```
### Timeout Handling

```python
async def run_with_timeout(task: str, timeout: int = 60):
    agent = Agent(task=task, llm=llm)
    try:
        result = await asyncio.wait_for(agent.run(), timeout=timeout)
        return result
    except asyncio.TimeoutError:
        print(f"Task timed out after {timeout}s")
        return None
```
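The retry and timeout patterns compose naturally. As a sketch, the generic wrapper below applies a per-attempt timeout plus exponential backoff to any coroutine; it is plain asyncio and not part of the Browser Use API, and the `Agent` usage shown in the trailing comment is a hypothetical example:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def run_resilient(
    make_coro: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    timeout: float = 60.0,
    base_delay: float = 1.0,
) -> T:
    """Run with a per-attempt timeout and exponential backoff between attempts.

    `make_coro` must build a fresh coroutine on each call, because a
    coroutine object can only be awaited once.
    """
    for attempt in range(max_retries):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout)
        except Exception as e:  # includes asyncio.TimeoutError
            if attempt == max_retries - 1:
                raise
            print(f"Attempt {attempt + 1} failed: {e}")
            await asyncio.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")

# Hypothetical usage with Browser Use:
# result = await run_resilient(lambda: Agent(task="...", llm=llm).run())
```

Passing a factory rather than a coroutine is the key design choice: retrying must create a new `agent.run()` coroutine per attempt.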
## Self-Hosting

### Docker Setup

```dockerfile
# Dockerfile
FROM python:3.11-slim

# Install Chrome
RUN apt-get update && apt-get install -y \
    wget gnupg \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

CMD ["python", "agent.py"]
```
```text
# requirements.txt
browser-use
langchain-anthropic
langchain-ollama
```
### Docker Compose with Ollama

```yaml
# docker-compose.yml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]  # If GPU available

  browser-agent:
    build: .
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama-data:
```
### Run

```bash
# Build and run
docker-compose up -d

# View logs
docker-compose logs -f browser-agent
```
## Use Cases

### 1. Web Scraping

```python
agent = Agent(
    task="""
    Go to news.ycombinator.com
    Extract the top 30 stories with: title, points, comments, and URL
    Return as JSON array
    """,
    llm=llm,
)
```
### 2. Form Automation

```python
agent = Agent(
    task="""
    Go to example.com/contact
    Fill the form:
    - Name: John Doe
    - Email: john@example.com
    - Message: I'm interested in your services
    Submit the form
    """,
    llm=llm,
)
```
### 3. Price Monitoring

```python
agent = Agent(
    task="""
    Check the price of 'Sony WH-1000XM5' on:
    1. Amazon
    2. BestBuy
    3. Walmart
    Return prices from each site
    """,
    llm=llm,
)
```
### 4. Competitor Research

```python
agent = Agent(
    task="""
    Visit competitor.com
    Extract:
    - Pricing tiers
    - Feature list
    - Customer testimonials
    Format as structured report
    """,
    llm=llm,
)
```
### 5. Data Entry

```python
# Batch process data entry
data_entries = [
    {"name": "Product A", "price": 99.99},
    {"name": "Product B", "price": 149.99},
]

for entry in data_entries:
    agent = Agent(
        task=f"""
        Go to admin.example.com/products/new
        Add product: {entry['name']} with price ${entry['price']}
        Save and confirm
        """,
        llm=llm,
    )
    await agent.run()
```
## Best Practices

### 1. Be Specific

```python
# BAD - vague
agent = Agent(task="Find products", llm=llm)

# GOOD - specific
agent = Agent(
    task="Go to amazon.com, search for 'mechanical keyboard', filter by 4+ stars, extract top 5 with name and price",
    llm=llm,
)
```
### 2. Use Structured Output

```python
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str

agent = Agent(
    task="Search Google for 'AI news' and get top 5 results",
    llm=llm,
    output_schema=SearchResult,  # Type-safe output
)
```
### 3. Handle Authentication

```python
# Option 1: Include credentials in task
agent = Agent(
    task="""
    Go to app.example.com/login
    Login with email 'user@example.com' and password 'secure123'
    Navigate to dashboard
    """,
    llm=llm,
)

# Option 2: Use cookies/session (more secure)
browser = Browser()
await browser.load_cookies("session_cookies.json")
agent = Agent(task="...", llm=llm, browser=browser)
```
### 4. Rate Limiting

```python
import asyncio

async def run_with_rate_limit(tasks: list, rate_per_minute: int = 10):
    delay = 60 / rate_per_minute
    results = []
    for task in tasks:
        agent = Agent(task=task, llm=llm)
        result = await agent.run()
        results.append(result)
        await asyncio.sleep(delay)
    return results
```
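When throughput matters more than a fixed pace, a concurrency cap is a common alternative to a fixed delay. The helper below is a generic asyncio sketch (not part of the Browser Use API); the `Agent` call in the trailing comment is a hypothetical example of how it might plug in:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def gather_with_limit(
    coro_factories: list[Callable[[], Awaitable[T]]],
    max_concurrent: int = 3,
) -> list[T]:
    """Run coroutines concurrently with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await factory()

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(f) for f in coro_factories))

# Hypothetical usage with Browser Use:
# results = await gather_with_limit(
#     [lambda t=t: Agent(task=t, llm=llm).run() for t in tasks],
#     max_concurrent=3,
# )
```

The `lambda t=t: ...` default-argument trick binds each task at definition time; a bare `lambda: ...` in a loop would capture only the last value.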
## Comparison: Browser Use vs Stagehand
| Feature | Browser Use | Stagehand |
|---|---|---|
| Language | Python | TypeScript |
| Self-Hosted | Yes | Yes |
| Local LLM | Yes (Ollama) | Limited |
| Speed | 3-5x optimized | 44% faster (v3) |
| Best For | Python scraping | TypeScript testing |
| Learning Curve | Easy | Medium |
**When to use Browser Use:**
- Python projects
- Need local LLM (Ollama)
- Web scraping focus
- Cost optimization (free with Ollama)
**When to use Stagehand:**
- TypeScript/Node.js projects
- Testing focus
- Claude integration priority
- Self-healing tests
## References

- [Complete installation guide](references/browser-use-setup.md)
- [LLM setup for all providers](references/llm-configuration.md)
Browser Use gives you AI browser automation with full control: self-host with any LLM, no rate limits, no vendor lock-in.