# claude-code-minoan / firecrawl
Scrape web pages to clean markdown using Firecrawl v2 — handles JS-heavy pages, site crawls, URL mapping, LLM-powered extraction, autonomous agent scraping, and post-scrape browser interaction (Interact API). Prefer over WebFetch for quality and completeness. Triggers on scrape URL, fetch page, crawl site, extract content, web to markdown, DeepWiki, Firecrawl.
Install:

```bash
git clone https://github.com/tdimino/claude-code-minoan
```

Or copy just this skill with a one-liner:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/firecrawl" ~/.claude/skills/tdimino-claude-code-minoan-firecrawl && rm -rf "$T"
```
`skills/research/firecrawl/SKILL.md`

# Firecrawl & Jina Web Scraping
## Firecrawl vs WebFetch
Prefer `firecrawl scrape URL --only-main-content` over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.
```bash
# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content
```
## Token-Efficient Scraping
Inspired by Anthropic's dynamic filtering—always filter before reasoning. In Anthropic's benchmarks, this approach reduced input tokens by ~24% and improved accuracy by ~11%.
**The Principle:** Search → Filter → Scrape → Filter → Reason

- **DO:** Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason
- **DON'T:** Search → Scrape everything → Reason over all of it
### Step-by-Step Efficient Workflow
```bash
# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --sections "API,Authentication" --max-chars 5000
```
### Post-Processing with `filter_web_results.py`
Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:
```bash
# Extract only matching sections from scraped page
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

# Keep only paragraphs with keywords
firecrawl search "query" --scrape --pretty | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

# Extract specific JSON fields from API output
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

# Combine filters with stats
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats
```
Full path: `python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py`
Flags: `--sections`, `--keywords`, `--max-chars`, `--max-lines`, `--fields` (JSON), `--strip-links`, `--strip-images`, `--compact`, `--stats`
### Other Token-Saving Patterns
- Use `--only-main-content` to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.
- Use `firecrawl map URL --search "topic"` first to find relevant subpages before scraping (see the sketch below).
- Use `--format links` first to get a URL list, evaluate, then scrape selectively.
- Use `exa_contents.py` with `--max-chars` to cap extraction length.
- Use `--formats summary` (Python API script) over full text when you need the gist, not raw content.
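A minimal sketch of the map-then-scrape pattern, composing the flags above (the URLs, search term, and keywords are placeholders):

```bash
# Cheap discovery pass: URLs only, no page content
firecrawl map https://docs.example.com --search "authentication"

# Scrape only the subpages that looked relevant, then filter before reasoning
firecrawl scrape https://docs.example.com/guides/auth --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --keywords "token,oauth" --max-chars 4000
```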
## Claude API Native Tools (for API Agent Builders)
Anthropic's API now offers built-in dynamic filtering tools:
Tools: `web_search_20260209` / `web_fetch_20260209`
Header: `anthropic-beta: code-execution-web-tools-2026-02-09`
These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.
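A minimal request sketch, assuming the standard Messages API shape; the tool type string and beta header come from above, while the model string and the tool `name` field are assumptions to verify against Anthropic's docs:

```bash
# Hedged sketch: request shape follows the standard Messages API.
# The "name" value and model string are assumptions, not from this doc.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: code-execution-web-tools-2026-02-09" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [{"type": "web_search_20260209", "name": "web_search"}],
    "messages": [{"role": "user", "content": "Summarize the latest Firecrawl release notes."}]
  }'
```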
## Available Tools
### 1. Official Firecrawl CLI (`firecrawl`) — Primary

Setup:

```bash
npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY
```
| Command | Purpose | Quick Example |
|---|---|---|
| `scrape` | Single page → markdown | `firecrawl scrape URL --only-main-content` |
| `crawl` | Entire site with progress | `firecrawl crawl URL --wait --progress` |
| `map` | Discover all URLs on a site | `firecrawl map URL --search "topic"` |
| `search` | Web search (+ optional scrape) | `firecrawl search "query" --scrape` |
Full CLI reference: `references/cli-reference.md`
### 2. Auto-Save Alias (`fc-save`) — Shell Alias

Requires shell alias setup (not bundled with this skill).

```bash
fc-save URL   # → Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md
```
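A minimal sketch of what such an alias could look like, assuming the slugified-URL naming shown above (the slug logic and shell-function form are assumptions, not the bundled alias):

```bash
# Hedged sketch, not the bundled alias: slugify the URL, then save
# main-content markdown into the Web-Scrapes folder shown above.
fc-save() {
  local slug
  slug=$(printf '%s' "$1" | sed -E 's~^https?://~~; s~[^A-Za-z0-9]+~-~g; s~-+$~~')
  firecrawl scrape "$1" --only-main-content \
    -o ~/Desktop/"Screencaps & Chats"/Web-Scrapes/"${slug}.md"
}
```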
### 3. Python API Script (`firecrawl_api.py`) — Advanced Features

Command: `python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>`

Requires: `FIRECRAWL_API_KEY` env var, `pip install firecrawl-py requests`
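One-time setup might look like this (the key value is a placeholder):

```bash
# Placeholder key value; substitute your real Firecrawl API key
export FIRECRAWL_API_KEY="fc-your-key-here"
pip install firecrawl-py requests
```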
| Command | Purpose |
|---|---|
| `search` | Web search with scraping |
| `scrape` | Single URL with page actions |
| `batch` | Multiple URLs concurrently |
| `crawl` | Website crawling |
| `map` | URL discovery |
| `extract` | LLM-powered structured extraction |
| `agent` | Autonomous extraction (no URLs needed) |
| | Bulk agent queries (v2.8.0+) |
| `interact` | Post-scrape browser interaction |
| `interact-stop` | Stop an interact session |
Agent models: `spark-1-fast` (10 credits, simple), `spark-1-mini` (default), `spark-1-pro` (thorough)
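Model selection presumably hangs off the `agent` command; the `--model` flag below is hypothetical, so check `references/python-api-reference.md` for the script's actual syntax:

```bash
# --model is a hypothetical flag, shown only to illustrate picking a spark model
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Find the current price of Firecrawl's hobby plan" --model spark-1-pro
```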
Full Python API reference: `references/python-api-reference.md`
### 4. DeepWiki — GitHub Repo Documentation

```bash
~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]
```

AI-generated wiki for any public GitHub repo. No API key required.
```bash
# Overview
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

# Browse sections
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

# Specific section
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

# Full dump for RAG
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save
```
### 5. Jina Reader (`jina`) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter, Jina works).

```bash
jina https://x.com/username/status/123456
```
## Firecrawl vs Exa vs Native Claude Tools
| Need | Best Tool | Why |
|---|---|---|
| Single page → markdown | `firecrawl scrape` | Cleanest output |
| Search + scrape in one shot | `firecrawl search --scrape` | Combined operation |
| Crawl entire site | `firecrawl crawl` | Link following + progress |
| Autonomous data finding | `firecrawl_api.py agent` | No URLs needed |
| Semantic/neural search | Exa | AI-powered relevance |
| Find research papers | Exa | Academic index |
| Quick research answer | Exa | Citations + synthesis |
| Find similar pages | Exa | Competitive analysis |
| Claude API agent building | Native | Built-in dynamic filtering |
| Twitter/X content | `jina` | Only tool that works |
| GitHub repo docs | `deepwiki.sh` | AI-generated wiki |
| Anti-bot / Cloudflare bypass | stealth fetch | Local Turnstile solver |
| Element-level extraction | + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | HTTP fetch | 100% local, no credentials |
| Site redesign resilience | adaptive mode | SQLite similarity matching |
| Budget JS-rendered scrape | | CF free tier: 10 min/day, $0.09/hr paid |
| Free static page fetch | | FREE during beta (no JS) |
| Budget multi-page crawl | | 5 free crawls/day, 100 pages each |
| Incremental re-crawl | | Built-in, Firecrawl lacks this |
| Page screenshot/PDF | | Built-in CF endpoints, cheaper |
| AI structured extraction | | Workers AI included free |
## Common Workflows
### Single Page Scraping

```bash
firecrawl scrape https://example.com/page --only-main-content

# Or auto-save:
fc-save URL

# Or to file:
firecrawl scrape URL --only-main-content -o page.md
```
### Documentation Crawling

```bash
# Map first, then crawl relevant paths
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress
```
### Research Workflow

```bash
firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown
```
### Agent-Powered Research (No URLs Needed)

```bash
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"
```
## Interact Workflows (Post-Scrape Browser Interaction)
Scrape a page, then take actions on it—click buttons, fill forms, extract dynamic content. Two modes: AI prompts (natural language) and code execution (Node.js/Python/Bash).
### When to Use Interact vs. Actions
| Need | Use | Why |
|---|---|---|
| Click/wait before a single scrape | `actions` on scrape | Fire-and-forget, no session overhead |
| Multiple interactions with same page | `interact` | Persistent session, back-and-forth |
| Fill forms, log in, navigate | `interact` | Stateful, multi-step |
| Simple "wait for JS to load" | `scrape` with `--wait-for` | Cheaper, no session |
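For the fire-and-forget case, page actions ride along on a single scrape call. The `--actions` flag below is hypothetical, though the action objects follow Firecrawl's documented wait/click schema; check the page-actions reference under Reference Documentation for the script's real syntax:

```bash
# --actions is a hypothetical flag shown for illustration; the wait/click
# objects follow Firecrawl's page-action schema.
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing" \
  --actions '[{"type": "wait", "milliseconds": 2000}, {"type": "click", "selector": "#enterprise-tab"}]'
```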
### Basic Interact (AI Prompt Mode)
```bash
# Step 1: Scrape and note the Scrape ID from output
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing"

# Step 2: Interact using natural language
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "Click the Enterprise pricing tab"

# Step 3: More interactions on same session
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "What is the monthly price for the Enterprise plan?"

# Step 4: Stop when done
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact-stop SCRAPE_ID
```
### Code Execution Mode (Cheaper)
```bash
# Execute Playwright code directly (2 credits/min vs 7 for prompts)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "const text = await page.locator('.pricing-table').textContent(); console.log(text);"

# Python mode
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "text = await page.locator('.content').text_content(); print(text)" \
  --language python
```
### Persistent Profile (Login Sessions)
```bash
# Scrape with a named profile — browser state persists across sessions
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/login" \
  --profile my-app --json

# Interact to log in
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "await page.fill('#email', 'user@example.com'); await page.fill('#password', 'pass'); await page.click('button[type=submit]');"

# Later: scrape another page with same profile — cookies restored
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/dashboard" \
  --profile my-app
```
**Important:** Interact does NOT return page markdown. To get updated content after interaction, use code mode to extract specific elements, or issue a follow-up scrape.
Full interact reference: `references/interact-reference.md`
## Troubleshooting
```bash
# Check status and credits
firecrawl --status && firecrawl credit-usage

# Re-authenticate
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

# Check API key
echo $FIRECRAWL_API_KEY
```
- Scrape fails: Try `jina URL`, or add `--wait-for 3000` for JS-heavy sites
- Async job stuck: Check with `crawl-status`/`batch-status`, cancel with `crawl-cancel`/`batch-cancel`
- Disable telemetry: `export FIRECRAWL_NO_TELEMETRY=1`
## Reference Documentation
| File | Contents |
|---|---|
| `references/cli-reference.md` | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| `references/python-api-reference.md` | Full Python API script reference (all commands, SDK examples) |
| | Firecrawl Search API reference |
| | Agent API (spark models, parallel agents, webhooks) |
| | Page actions for dynamic content (click, write, wait, scroll) |
| `references/interact-reference.md` | Interact API: post-scrape browser interaction (prompt, code, profiles) |
| | Brand identity extraction (colors, fonts, UI) |
## Test Suite
```bash
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick         # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py                 # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape   # Specific test
```