openclaw-ultra-scraping

install
source · Clone the upstream repo
git clone https://github.com/LeoYeAI/openclaw-ultra-scraping
Claude Code · Install into ~/.claude/skills/
git clone --depth=1 https://github.com/LeoYeAI/openclaw-ultra-scraping ~/.claude/skills/leoyeai-openclaw-ultra-scraping-openclaw-ultra-scraping
OpenClaw · Install into ~/.openclaw/skills/
git clone --depth=1 https://github.com/LeoYeAI/openclaw-ultra-scraping ~/.openclaw/skills/leoyeai-openclaw-ultra-scraping-openclaw-ultra-scraping
manifest: SKILL.md
source content

OpenClaw Ultra Scraping

Adaptive web scraping framework for OpenClaw agents. Handles everything from single-page extraction to full-scale concurrent crawls with anti-bot bypass.

Setup

Run once before first use:

bash scripts/setup.sh

This installs all dependencies and browser engines into /opt/scrapling-venv.
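
To confirm the install worked, a minimal smoke test (assuming setup.sh completed and the venv exists):

#!/opt/scrapling-venv/bin/python3
# Smoke test: if this runs under the venv interpreter, scrapling is installed.
from importlib.metadata import version
print("scrapling", version("scrapling"))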

Quick Start — CLI Script

The bundled scripts/scrape.py provides a unified CLI:

PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch (JSON output)
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Extract text
$PYTHON scripts/scrape.py extract "https://example.com" --css "h1"

# Stealth mode (bypass Cloudflare)
$PYTHON scripts/scrape.py fetch "https://protected-site.com" --stealth --solve-cloudflare --css ".data"

# Dynamic (full browser rendering)
$PYTHON scripts/scrape.py fetch "https://spa-site.com" --dynamic --css ".product"

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 --css ".item" -o results.json

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md

Quick Start — Python

For complex tasks, write Python directly using the venv:

#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher

# Simple HTTP
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Bypass Cloudflare
page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
data = page.css('.product').getall()

Fetcher Selection Guide

Scenario                         Fetcher           Flag
Normal sites, fast scraping      Fetcher           (default)
JS-rendered SPAs                 DynamicFetcher    --dynamic
Cloudflare/anti-bot protected    StealthyFetcher   --stealth
Cloudflare Turnstile challenge   StealthyFetcher   --stealth --solve-cloudflare
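
The Quick Start sections already show Fetcher and StealthyFetcher; here is a minimal sketch for the DynamicFetcher row (the URL and selector are placeholders, and the network_idle option is assumed from Scrapling's dynamic-fetcher API):

#!/opt/scrapling-venv/bin/python3
# Sketch: render a JS-heavy SPA in a real browser before extracting.
from scrapling.fetchers import DynamicFetcher

# network_idle waits for network activity to settle so client-side
# rendering can finish before selectors run (option assumed).
page = DynamicFetcher.fetch('https://spa-site.com', headless=True, network_idle=True)
print(page.css('.product::text').getall())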

Selector Cheat Sheet

page.css('.class')                    # CSS
page.css('.class::text').getall()     # Text extraction
page.xpath('//div[@id="main"]')      # XPath
page.find_all('div', class_='item')  # BS4-style
page.find_by_text('keyword')         # Text search
page.css('.item', adaptive=True)     # Adaptive (survives redesigns)
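
All of these assume a page object returned by any fetcher. A self-contained run, using example.com as a stand-in target (results will vary by page):

#!/opt/scrapling-venv/bin/python3
# Context for the cheat sheet: every fetcher returns a page object
# supporting CSS, XPath, and BS4-style selection.
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')
print(page.css('h1::text').getall())           # CSS text extraction
print(page.xpath('//title/text()').getall())   # XPath
print(page.find_all('p'))                      # BS4-style search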

Advanced Features

  • Adaptive tracking: auto_save=True on the first run, adaptive=True afterwards; elements are found even after a site redesign (see the sketch after this list)
  • Proxy rotation: pass proxy="http://host:port" or use ProxyRotator
  • Sessions: FetcherSession, StealthySession, DynamicSession for cookie/state persistence
  • Spider framework: Scrapy-like concurrent crawling with pause/resume
  • Async support: All fetchers have async variants
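
A sketch of the adaptive-tracking workflow (URL and selector are placeholders; the auto_save/adaptive flags follow the description above rather than a verified signature):

#!/opt/scrapling-venv/bin/python3
# First run: save how the target elements look. Later runs: match
# against the saved fingerprints even if the page was redesigned.
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com', impersonate='chrome')
page.css('.item', auto_save=True)         # first run: persist element data

page = Fetcher.get('https://example.com', impersonate='chrome')
items = page.css('.item', adaptive=True)  # later: survives redesigns
print([el.text for el in items])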

For full API details, read references/api-reference.md.