# crawlee-web-scraper

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when `web_fetch` is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from `requests` to Crawlee on 403/429 responses.
## Install

Source: clone the upstream repo.

```bash
git clone https://github.com/openclaw/skills
```

Claude Code: install into `~/.claude/skills/`.

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bryantegomoh/crawlee-web-scraper" ~/.claude/skills/clawdbot-skills-crawlee-web-scraper && rm -rf "$T"
```
## Manifest

`skills/bryantegomoh/crawlee-web-scraper/SKILL.md`:

### crawlee-web-scraper

Drop-in replacement for `web_fetch` when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.
#### Scripts

- `crawlee_fetch.py` — main scraper; accepts a single URL or a file of URLs; returns JSON
- `crawlee_http.py` — library helper; tries `requests` first, falls back to Crawlee on 403/429/503 (a sketch of this logic follows)
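The fallback logic is easy to picture. Below is a hypothetical sketch, not the actual `crawlee_http.py` source: the `FetchResult` shape and `BLOCKED_STATUSES` set are assumptions, and the Crawlee-backed path is stubbed out here (a fuller sketch of it appears under "Library usage" below).

```python
# Hypothetical sketch of the requests-first fallback (not the real
# crawlee_http.py). FetchResult and BLOCKED_STATUSES are assumptions.
from dataclasses import dataclass

import requests

BLOCKED_STATUSES = {403, 429, 503}  # codes that trigger the Crawlee path


@dataclass
class FetchResult:
    """Minimal response shape mirroring the documented .status_code/.text usage."""
    status_code: int
    text: str


def fetch_with_fallback(url: str, timeout: float = 30.0) -> FetchResult:
    try:
        resp = requests.get(url, timeout=timeout)
        # Plain requests worked and the body is non-empty: no Crawlee needed.
        if resp.status_code not in BLOCKED_STATUSES and resp.text:
            return FetchResult(resp.status_code, resp.text)
    except requests.RequestException:
        pass  # network-level failure: treat like a block and fall through
    # Blocked, empty, or errored: retry through Crawlee.
    return crawlee_fetch(url)


def crawlee_fetch(url: str) -> FetchResult:
    # Placeholder: a sketch of the Crawlee-backed path appears below.
    raise NotImplementedError
```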
#### Usage

```bash
# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
```
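For bulk mode, the input file is presumably a plain list of URLs, one per line (the exact format is an assumption; the docs above don't spell it out):

```
https://example.com
https://example.org/products
https://example.net/docs
```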
#### Library usage

```python
from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])
```
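Under the hood, the Crawlee side of the fallback might look something like the following. This is a minimal sketch assuming Crawlee for Python's `BeautifulSoupCrawler`; import paths have moved between Crawlee releases, and the crawler may need the parser extra (e.g. `pip install 'crawlee[beautifulsoup]'`).

```python
# Hypothetical sketch of a Crawlee-backed fetch; the import path and the
# http_response attribute are assumptions that may vary by Crawlee version.
import asyncio
from dataclasses import dataclass

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


@dataclass
class FetchResult:
    status_code: int
    text: str


def crawlee_fetch(url: str) -> FetchResult:
    """Fetch one URL through Crawlee's session pool and retry machinery."""
    captured: dict = {}

    async def run() -> None:
        crawler = BeautifulSoupCrawler(max_request_retries=3)

        @crawler.router.default_handler
        async def handler(context: BeautifulSoupCrawlingContext) -> None:
            # By the time this handler runs, Crawlee has already handled
            # session rotation and retries for us.
            captured["status"] = context.http_response.status_code
            captured["text"] = str(context.soup)

        await crawler.run([url])

    asyncio.run(run())
    return FetchResult(captured.get("status", 0), captured.get("text", ""))
```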
#### Output

JSON array with one object per URL:

```json
[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
```
#### Installation

```bash
pip install crawlee requests
```
#### When to use

- `web_fetch` returns 403 / 429 / empty responses
- Bulk scraping 10+ URLs
- Sites using Cloudflare or similar bot protection