# cf-crawl

Crawl websites using the Cloudflare Browser Rendering `/crawl` API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and AI-powered structured data extraction. Use when crawling entire sites or multiple pages, building knowledge bases, extracting structured data from websites, or when `web_fetch` is insufficient (JS rendering, multi-page, authenticated crawls).
## Install

**Source** · clone the upstream repo:

```bash
git clone https://github.com/openclaw/skills
```

**Claude Code** · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bill492/cf-crawl" ~/.claude/skills/clawdbot-skills-cf-crawl && rm -rf "$T"
```

Manifest: `skills/bill492/cf-crawl/SKILL.md`
## Cloudflare /crawl

Async site crawler via the Cloudflare Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON per page.
### Quick Start

```bash
# Crawl a site (5 pages, markdown, no JS rendering = fast + free)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown

# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10

# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only

# Poll existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh <job-id>
```
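For context, here is a hedged sketch of what the wrapper scripts do against the raw API. The `browser-rendering/crawl` endpoint path and request-body fields are my assumptions from the API's naming, not verified against Cloudflare's published docs; prefer the scripts above.

```shell
# Hypothetical raw-API equivalents of crawl.sh / poll.sh.
# Endpoint path and JSON body shape are assumptions, not verified.
cf_crawl_start() {  # usage: cf_crawl_start <url> [limit]
  curl -s -X POST \
    "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/browser-rendering/crawl" \
    -H "Authorization: Bearer $CF_CRAWL_API_TOKEN" \
    -H "Content-Type: application/json" \
    --data "{\"url\":\"$1\",\"limit\":${2:-10}}"
}

cf_crawl_poll() {  # usage: cf_crawl_poll <job-id>
  curl -s \
    "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/browser-rendering/crawl/$1" \
    -H "Authorization: Bearer $CF_CRAWL_API_TOKEN"
}
```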
### Credentials

Stored at `~/.clawdbot/secrets/cloudflare-crawl.env`:

```bash
CF_ACCOUNT_ID=<account_id>
CF_CRAWL_API_TOKEN=<token_with_read_and_edit>
```
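A minimal pre-flight check before starting a crawl might look like this. The `load_cf_creds` helper name is illustrative (not part of the skill); the env-file path comes from this skill's docs.

```shell
# Source the secrets file and verify both variables are present.
load_cf_creds() {  # usage: load_cf_creds [env-file]
  env_file="${1:-$HOME/.clawdbot/secrets/cloudflare-crawl.env}"
  [ -f "$env_file" ] && . "$env_file"
  if [ -z "${CF_ACCOUNT_ID:-}" ] || [ -z "${CF_CRAWL_API_TOKEN:-}" ]; then
    echo "missing CF_ACCOUNT_ID or CF_CRAWL_API_TOKEN in $env_file" >&2
    return 1
  fi
}
```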
### Key Options

| Option | Description |
|---|---|
| `--limit` | Max pages (default 10) |
| `--depth` | Max link depth (default 10) |
| `--format` | Output format (default markdown) |
| `--render` | Enable headless browser (default: off = fast fetch, free during beta) |
| `--include` | Wildcard URL pattern to include (repeatable) |
| `--exclude` | Wildcard URL pattern to exclude (repeatable) |
| | Follow external domain links |
| | Follow subdomain links |
| | URL discovery method |
| `--json-prompt` | AI extraction prompt (with `--format json`) |
| `--json-schema` | JSON schema for structured extraction |
| | Max poll wait (default 300s) |
| `--output` | Write full results to file |
| | Output raw API response |
| `--start-only` | Print job ID without polling |
### Common Patterns

**Crawl docs site for knowledge base**

```bash
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
  --limit 50 --depth 3 --format markdown --output docs.json
```
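Once `docs.json` exists, the pages can be split into per-page markdown files. The `.result[].markdown` field names below are an assumption about the saved output shape; adjust them to match what `--output` actually writes.

```shell
# Split a saved crawl result into one markdown file per page.
# Assumes the JSON looks like {"result":[{"url":...,"markdown":...},...]}.
extract_pages() {  # usage: extract_pages <results.json> <outdir>
  mkdir -p "$2"
  count=$(jq '.result | length' "$1")
  i=0
  while [ "$i" -lt "$count" ]; do
    jq -r ".result[$i].markdown // empty" "$1" > "$2/page-$i.md"
    i=$((i + 1))
  done
}
```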
**Crawl with URL filtering**

```bash
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
  --include "/docs/**" --exclude "/docs/archive/**" --limit 20
```
**AI-powered structured extraction**

```bash
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
  --format json --render \
  --json-prompt "Extract product name, price, and description" \
  --json-schema schema.json
```
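An illustrative `schema.json` for the crawl above might look like the following. Plain JSON Schema is an assumption about what `--json-schema` expects; check `references/api-reference.md` for the exact dialect.

```shell
# Write an example extraction schema (illustrative shape, not verified).
cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "string" },
    "description": { "type": "string" }
  },
  "required": ["name", "price"]
}
EOF
```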
**Long-running crawl (background)**

```bash
JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
  --limit 1000 --start-only)

# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"
```
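For unattended runs, a small wait loop around `poll.sh` can block until the job finishes. I'm assuming here that `poll.sh` emits JSON with a top-level `status` field and that `completed` is the terminal value; adjust both to the script's real output.

```shell
# Poll a crawl job until it reports completion (field names assumed).
wait_for_crawl() {  # usage: wait_for_crawl <job-id> [interval-seconds]
  while :; do
    status=$(bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$1" | jq -r '.status // empty')
    [ "$status" = "completed" ] && return 0
    sleep "${2:-30}"
  done
}
```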
### Cost Notes

- `render: false` (default): fast HTML fetch, free during beta
- `render: true`: uses Browser Rendering minutes (paid)
- `format: json`: uses Workers AI tokens for extraction (paid)
- Results cached in R2 with `--max-age` (default 24hr)
### API Details

See `references/api-reference.md` for full parameter documentation, response schema, and lifecycle details.