Claude-code-minoan firecrawl

Scrape web pages to clean markdown using Firecrawl v2 — handles JS-heavy pages, site crawls, URL mapping, LLM-powered extraction, autonomous agent scraping, and post-scrape browser interaction (Interact API). Prefer over WebFetch for quality and completeness. Triggers on scrape URL, fetch page, crawl site, extract content, web to markdown, DeepWiki, Firecrawl.

install
source · Clone the upstream repo
git clone https://github.com/tdimino/claude-code-minoan
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/firecrawl" ~/.claude/skills/tdimino-claude-code-minoan-firecrawl && rm -rf "$T"
manifest: skills/research/firecrawl/SKILL.md
source content

Firecrawl & Jina Web Scraping

Firecrawl vs WebFetch

Prefer firecrawl scrape URL --only-main-content over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.

# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content

Token-Efficient Scraping

Inspired by Anthropic's dynamic filtering—always filter before reasoning. This reduced input tokens by ~24% and improved accuracy by ~11% in their benchmarks.

The Principle: Search → Filter → Scrape → Filter → Reason

DO:

Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason

DON'T:

Search → Scrape everything → Reason over all of it

Step-by-Step Efficient Workflow

# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
  --sections "API,Authentication" --max-chars 5000

Post-Processing with filter_web_results.py

Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:

# Extract only matching sections from scraped page
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

# Keep only paragraphs with keywords
firecrawl search "query" --scrape --pretty | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

# Extract specific JSON fields from API output
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

# Combine filters with stats
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats

Full path: python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py

Flags: --sections, --keywords, --max-chars, --max-lines, --fields (JSON), --strip-links, --strip-images, --compact, --stats
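
A sketch combining the strip and length flags (the URL is illustrative):

# Strip links/images and cap output before reasoning
firecrawl scrape https://docs.example.com/changelog --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
  --strip-links --strip-images --max-lines 200 --stats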

Other Token-Saving Patterns

  • Use --only-main-content to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.
  • Use firecrawl map URL --search "topic" first to find relevant subpages before scraping
  • Use --format links first to get a URL list, evaluate, then scrape selectively (example sketch below)
  • Use --max-chars with exa_contents.py to cap extraction length
  • Use --formats summary (Python API script) over full text when you need the gist, not raw content
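
A sketch of the links-first pattern, using the --format links flag listed above (the URLs are illustrative):

# Get the link list first, evaluate, then scrape only the relevant pages
firecrawl scrape https://docs.example.com --format links
firecrawl scrape https://docs.example.com/api/authentication --only-main-content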

Claude API Native Tools (for API Agent Builders)

Anthropic's API now offers built-in dynamic filtering tools:

web_search_20260209 / web_fetch_20260209
Header: anthropic-beta: code-execution-web-tools-2026-02-09

These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.
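
A minimal request sketch, assuming these beta tool types follow the same server-tool request shape as earlier web_search betas (the model ID, prompt, and tool schema are placeholders/assumptions):

# Hypothetical request; the exact tool schema may differ from this sketch
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: code-execution-web-tools-2026-02-09" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 2048,
    "tools": [{"type": "web_search_20260209", "name": "web_search"}],
    "messages": [{"role": "user", "content": "Summarize recent Firecrawl v2 changes"}]
  }'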


Available Tools

1. Official Firecrawl CLI (firecrawl) — Primary

Setup:

npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY

| Command | Purpose | Quick Example |
| --- | --- | --- |
| scrape | Single page → markdown | firecrawl scrape URL --only-main-content |
| crawl | Entire site with progress | firecrawl crawl URL --wait --progress --limit 50 |
| map | Discover all URLs on a site | firecrawl map URL --search "API" |
| search | Web search (+ optional scrape) | firecrawl search "query" --limit 10 |

Full CLI reference: references/cli-reference.md

2. Auto-Save Alias (fc-save) — Shell Alias

Requires shell alias setup (not bundled with this skill).

fc-save URL
# → Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md
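
A minimal sketch of one possible definition (as a shell function) for ~/.bashrc or ~/.zshrc; the slugging logic and save directory are assumptions matching the example output above:

# Hypothetical fc-save: scrape a URL and save clean markdown under a slugged filename
fc-save() {
  local url="$1"
  local dir="$HOME/Desktop/Screencaps & Chats/Web-Scrapes"
  local slug
  slug=$(echo "$url" | sed -E 's#https?://##; s#[/.]+#-#g; s#-+$##')
  mkdir -p "$dir"
  firecrawl scrape "$url" --only-main-content -o "$dir/$slug.md"
}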

3. Python API Script (firecrawl_api.py) — Advanced Features

Command: python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>
Requires: FIRECRAWL_API_KEY env var, pip install firecrawl-py requests

| Command | Purpose | Quick Example |
| --- | --- | --- |
| search | Web search with scraping | firecrawl_api.py search "query" -n 10 |
| scrape | Single URL with page actions | firecrawl_api.py scrape URL --formats markdown summary |
| batch-scrape | Multiple URLs concurrently | firecrawl_api.py batch-scrape URL1 URL2 URL3 |
| crawl | Website crawling | firecrawl_api.py crawl URL --limit 20 |
| map | URL discovery | firecrawl_api.py map URL --search "query" |
| extract | LLM-powered structured extraction | firecrawl_api.py extract URL --prompt "Find pricing" |
| agent | Autonomous extraction (no URLs needed) | firecrawl_api.py agent "Find YC W24 AI startups" |
| parallel-agent | Bulk agent queries (v2.8.0+) | firecrawl_api.py parallel-agent "Q1" "Q2" "Q3" |
| interact | Post-scrape browser interaction | firecrawl_api.py interact SCRAPE_ID --prompt "Click pricing" |
| interact-stop | Stop an interact session | firecrawl_api.py interact-stop SCRAPE_ID |

Agent models: spark-1-fast (10 credits, simple), spark-1-mini (default), spark-1-pro (thorough)

Full Python API reference: references/python-api-reference.md

4. DeepWiki — GitHub Repo Documentation

~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]

AI-generated wiki for any public GitHub repo. No API key required.

# Overview
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

# Browse sections
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

# Specific section
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

# Full dump for RAG
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save

5. Jina Reader (jina) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter, Jina works).

jina https://x.com/username/status/123456

Firecrawl vs Exa vs Native Claude Tools

| Need | Best Tool | Why |
| --- | --- | --- |
| Single page → markdown | firecrawl scrape --only-main-content | Cleanest output |
| Search + scrape in one shot | firecrawl search --scrape | Combined operation |
| Crawl entire site | firecrawl crawl --wait --progress | Link following + progress |
| Autonomous data finding | firecrawl_api.py agent | No URLs needed |
| Semantic/neural search | Exa exa_search.py | AI-powered relevance |
| Find research papers | Exa --category "research paper" | Academic index |
| Quick research answer | Exa exa_research.py | Citations + synthesis |
| Find similar pages | Exa exa_similar.py | Competitive analysis |
| Claude API agent building | Native web_search_20260209 | Built-in dynamic filtering |
| Twitter/X content | jina URL | Only tool that works |
| GitHub repo docs | deepwiki.sh owner/repo | AI-generated wiki |
| Anti-bot / Cloudflare bypass | scrapling stealth fetch | Local Turnstile solver |
| Element-level extraction | scrapling + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | scrapling HTTP fetch | 100% local, no credentials |
| Site redesign resilience | scrapling adaptive mode | SQLite similarity matching |
| Budget JS-rendered scrape | cf_browser.py markdown URL | CF free tier: 10 min/day, $0.09/hr paid |
| Free static page fetch | cf_browser.py markdown URL --no-render | FREE during beta (no JS) |
| Budget multi-page crawl | cf_browser.py crawl URL | 5 free crawls/day, 100 pages each |
| Incremental re-crawl | cf_browser.py crawl --modified-since | Built-in, Firecrawl lacks this |
| Page screenshot/PDF | cf_browser.py screenshot/pdf URL | Built-in CF endpoints, cheaper |
| AI structured extraction | cf_browser.py json URL --prompt "..." | Workers AI included free |

Common Workflows

Single Page Scraping

firecrawl scrape https://example.com/page --only-main-content
# Or auto-save: fc-save URL
# Or to file: firecrawl scrape URL --only-main-content -o page.md

Documentation Crawling

# Map first, then crawl relevant paths
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress

Research Workflow

firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown

Agent-Powered Research (No URLs Needed)

python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"

Interact Workflows (Post-Scrape Browser Interaction)

Scrape a page, then take actions on it—click buttons, fill forms, extract dynamic content. Two modes: AI prompts (natural language) and code execution (Node.js/Python/Bash).

When to Use Interact vs. Actions

| Need | Use | Why |
| --- | --- | --- |
| Click/wait before a single scrape | --actions on scrape | Fire-and-forget, no session overhead |
| Multiple interactions with same page | interact | Persistent session, back-and-forth |
| Fill forms, log in, navigate | interact | Stateful, multi-step |
| Simple "wait for JS to load" | --actions with wait | Cheaper, no session |
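
A sketch of the fire-and-forget --actions path; the payload shape here is an assumption, see references/actions-reference.md for the exact schema:

# Hypothetical: wait for JS, click a tab, then scrape in one shot (no interact session)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing" \
  --actions '[{"type": "wait", "milliseconds": 2000}, {"type": "click", "selector": "#enterprise-tab"}]'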

Basic Interact (AI Prompt Mode)

# Step 1: Scrape and note the Scrape ID from output
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing"

# Step 2: Interact using natural language
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "Click the Enterprise pricing tab"

# Step 3: More interactions on same session
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "What is the monthly price for the Enterprise plan?"

# Step 4: Stop when done
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact-stop SCRAPE_ID

Code Execution Mode (Cheaper)

# Execute Playwright code directly (2 credits/min vs 7 for prompts)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "const text = await page.locator('.pricing-table').textContent(); console.log(text);"

# Python mode
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "text = await page.locator('.content').text_content(); print(text)" \
  --language python

Persistent Profile (Login Sessions)

# Scrape with a named profile — browser state persists across sessions
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/login" \
  --profile my-app --json

# Interact to log in
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "await page.fill('#email', 'user@example.com'); await page.fill('#password', 'pass'); await page.click('button[type=submit]');"

# Later: scrape another page with same profile — cookies restored
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/dashboard" \
  --profile my-app

Important: Interact does NOT return page markdown. To get updated content after interaction, use code mode to extract specific elements, or issue a follow-up scrape.

Full interact reference: references/interact-reference.md


Troubleshooting

# Check status and credits
firecrawl --status && firecrawl credit-usage

# Re-authenticate
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

# Check API key
echo $FIRECRAWL_API_KEY

  • Scrape fails: Try jina URL, or add --wait-for 3000 for JS-heavy sites
  • Async job stuck: Check with crawl-status / batch-status, cancel with crawl-cancel / batch-cancel
  • Disable telemetry: export FIRECRAWL_NO_TELEMETRY=1

Reference Documentation

| File | Contents |
| --- | --- |
| references/cli-reference.md | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| references/python-api-reference.md | Full Python API script reference (all commands, SDK examples) |
| references/firecrawl-api.md | Firecrawl Search API reference |
| references/firecrawl-agent-api.md | Agent API (spark models, parallel agents, webhooks) |
| references/actions-reference.md | Page actions for dynamic content (click, write, wait, scroll) |
| references/interact-reference.md | Interact API: post-scrape browser interaction (prompt, code, profiles) |
| references/branding-format.md | Brand identity extraction (colors, fonts, UI) |

Test Suite

python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick    # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py            # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape  # Specific test