Skills firecrawl
Web scraping and content extraction using Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/antonia-sz/web-scraper-firecrawl" ~/.claude/skills/openclaw-skills-firecrawl && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/antonia-sz/web-scraper-firecrawl" ~/.openclaw/skills/openclaw-skills-firecrawl && rm -rf "$T"
manifest:
skills/antonia-sz/web-scraper-firecrawl/SKILL.md
source content
Firecrawl Skill
Web scraping powered by Firecrawl: turn websites into LLM-ready markdown.
Overview
Firecrawl provides APIs for:
- Scrape - Single page extraction to markdown
- Crawl - Entire site crawling with depth control
- Map - URL discovery from a starting point
- Batch - Multiple URL processing
- Extract - Structured data extraction with schemas
Prerequisites
- Firecrawl API key - a free tier is available at https://firecrawl.dev
- Python dependencies: requests
Configuration
Set the environment variable:
export FIRECRAWL_API_KEY="fc-your-api-key"
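A script can read this key from the environment at startup. The sketch below assumes the key is sent as a bearer token; the helper name `auth_headers` is illustrative and not taken from the skill's own script.

```python
import os


def auth_headers(env=os.environ):
    """Build HTTP headers from FIRECRAWL_API_KEY (hypothetical helper)."""
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",  # assumed bearer-token scheme
        "Content-Type": "application/json",
    }
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.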
Usage
Single Page Scraping
# Basic scrape
firecrawl scrape https://example.com
# With specific options
firecrawl scrape https://example.com --formats markdown,html --only-main-content
# Wait for JS rendering
firecrawl scrape https://spa-app.com --wait-for 2000
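For readers calling the HTTP API directly rather than the CLI, the following is a minimal sketch of how the scrape options above might map onto a request. The endpoint URL and the JSON field names (`onlyMainContent`, `waitFor`) are assumptions based on Firecrawl's public REST API and should be checked against the current API reference; `build_scrape_request` is a hypothetical helper. The request is built but not sent.

```python
import json
import urllib.request

# Assumed endpoint; check the Firecrawl API reference for the current version.
SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(url, api_key, formats=("markdown",),
                         only_main_content=False, wait_for_ms=None):
    """Return an unsent urllib Request mirroring the CLI flags above."""
    payload = {"url": url, "formats": list(formats)}
    if only_main_content:
        payload["onlyMainContent"] = True   # assumed field name
    if wait_for_ms is not None:
        payload["waitFor"] = wait_for_ms    # assumed field name, milliseconds
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("https://example.com", "fc-your-api-key",
                           formats=("markdown", "html"), only_main_content=True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON response.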
Site Crawling
# Crawl entire site (up to limit)
firecrawl crawl https://docs.example.com --limit 50
# With depth control
firecrawl crawl https://blog.example.com --max-depth 2 --limit 100
# Include/exclude patterns
firecrawl crawl https://site.com --include "/blog/*" --exclude "/admin/*"
# Custom formats
firecrawl crawl https://docs.example.com --formats markdown,links
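To make the include/exclude behaviour concrete, here is a small sketch of one way such patterns could be applied to URL paths. It treats the patterns as shell-style globs, which is an assumption: the service may interpret them differently (for example as regular expressions). `url_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def url_allowed(url, include=(), exclude=()):
    """Decide whether a crawler should visit url, using glob patterns on its path."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pat) for pat in exclude):
        return False          # exclusions win over inclusions
    if include:
        return any(fnmatchcase(path, pat) for pat in include)
    return True               # no include list means everything is allowed


for candidate in ("https://site.com/blog/post-1",
                  "https://site.com/admin/users",
                  "https://site.com/about"):
    print(candidate, url_allowed(candidate, ["/blog/*"], ["/admin/*"]))
```

Checking exclusions first means a path matched by both lists is skipped, which is the more conservative choice for a crawler.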
URL Mapping
# Discover all URLs from a site
firecrawl map https://example.com
# With search term
firecrawl map https://docs.python.org --search "tutorial"
Batch Processing
# Scrape multiple URLs
firecrawl batch urls.txt --output ./scraped/
# From JSON list
firecrawl batch urls.json --formats markdown --concurrency 5
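The batch commands above imply two steps: reading a URL list and scraping it with bounded concurrency. The sketch below shows one way to do that with a thread pool. The input formats (one URL per line, or a JSON array of strings) are assumptions, and `load_urls` and `scrape_all` are hypothetical names; the per-URL scrape function is passed in by the caller.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_urls(path):
    """Read URLs from a .json array or a newline-delimited text file."""
    text = Path(path).read_text(encoding="utf-8")
    if str(path).endswith(".json"):
        return [str(u) for u in json.loads(text)]
    # Skip blank lines and '#' comments in plain-text lists.
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]


def scrape_all(urls, scrape_one, concurrency=5):
    """Apply scrape_one to every URL with a bounded thread pool.

    Returns {url: result or exception}, so one failure does not abort the batch.
    """
    def safe(url):
        try:
            return scrape_one(url)
        except Exception as exc:   # keep going; report the error per URL
            return exc

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(zip(urls, pool.map(safe, urls)))
```

Threads suit this workload because each task spends most of its time waiting on the network rather than using the CPU.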
Structured Extraction
# Extract specific data using CSS selectors
firecrawl extract https://example.com/products \
  --schema '{"name": ".product-title", "price": ".price", "description": ".desc"}'
# Extract to JSON
firecrawl extract https://news.example.com/article --schema article-schema.json
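The skill does not show how a schema of field names and selectors is resolved against a page. As an illustration only, the sketch below applies such a schema locally with the standard library. It supports just single-class selectors such as `.price`, takes the first matching element per field, and assumes reasonably well-formed HTML; `ClassTextExtractor` and `extract` are hypothetical names, not the skill's implementation.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying each requested class."""

    def __init__(self, schema):
        super().__init__()
        # Map class name -> field name, e.g. "product-title" -> "name".
        self.wanted = {sel.lstrip("."): field for field, sel in schema.items()}
        self.result = {}
        self.stack = []   # one entry per open tag: field being captured or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return        # void elements never get a closing tag
        classes = (dict(attrs).get("class") or "").split()
        field = next((self.wanted[c] for c in classes
                      if c in self.wanted and self.wanted[c] not in self.result),
                     None)
        if field is not None:
            self.result[field] = ""
        self.stack.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every field whose element is currently open.
        for field in {f for f in self.stack if f is not None}:
            self.result[field] += data


def extract(html, schema):
    """Return {field: text} with whitespace collapsed."""
    parser = ClassTextExtractor(schema)
    parser.feed(html)
    parser.close()
    return {k: " ".join(v.split()) for k, v in parser.result.items()}
```

A full CSS selector engine handles combinators, attributes, and pseudo-classes; a dedicated HTML library is the better choice once selectors go beyond a single class.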
Output Formats
Markdown
Clean, LLM-ready markdown with:
- Headings preserved
- Links converted to markdown format
- Images with alt text
- Tables formatted as markdown tables
HTML
Raw or cleaned HTML
Links
Extracted link lists for further crawling
Screenshot
Page screenshot (if requested)
Use Cases
Knowledge Base Building
# Crawl documentation site
firecrawl crawl https://docs.framework.com --limit 200 -o ./kb/
# Merge into single file for RAG
cat ./kb/*.md > knowledge-base.md
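The `cat` merge above loses track of which page each passage came from, which can matter for retrieval. A possible variant in Python is sketched below: it concatenates the files in name order and labels each with its source file in an HTML comment. The function name `merge_markdown` and the comment format are illustrative choices, not part of the skill.

```python
from pathlib import Path


def merge_markdown(src_dir, out_file):
    """Concatenate *.md files in name order, labelling each with its source file."""
    out_path = Path(out_file).resolve()
    parts = []
    for md in sorted(Path(src_dir).glob("*.md")):
        if md.resolve() == out_path:
            continue      # never fold the output file into itself
        body = md.read_text(encoding="utf-8").strip()
        parts.append(f"<!-- source: {md.name} -->\n\n{body}\n")
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)     # number of files merged
```

A call such as `merge_markdown("./kb", "knowledge-base.md")` would correspond to the shell command above.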
Research & Analysis
# Scrape competitor pricing
firecrawl batch competitors.txt --extract pricing-schema.json
# Monitor blog updates
firecrawl map https://blog.company.com --since 2024-01-01
Content Migration
# Export old CMS content
firecrawl crawl https://old-site.com --formats markdown,html -o ./export/
Scripts
All functionality is provided via scripts/firecrawl.py:
- Handles API authentication
- Automatic rate limiting
- Retry logic for failures
- Progress tracking for large crawls
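The script's retry logic is not shown here. One common approach is exponential backoff with a little jitter, sketched below as a generic illustration rather than the script's actual behaviour; `with_retries` and its parameters are hypothetical.

```python
import random
import time


def with_retries(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**n (plus jitter) and retry."""
    for n in range(attempts):
        try:
            return call()
        except Exception:
            if n == attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** n)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. In practice the `except` clause would usually be narrowed to network and rate-limit errors.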
Integration
Works well with:
- markdown-sync-pro: Sync scraped content to Notion/GitHub
- arxiv-paper: Combine with academic paper downloads
- maybe-finance: Scrape financial data for analysis