# claude-code-minoan / firecrawl
Scrape web pages to clean markdown using Firecrawl v2 — handles JS-heavy pages, site crawls, URL mapping, LLM-powered extraction, autonomous agent scraping, and post-scrape browser interaction (Interact API). Prefer over WebFetch for quality and completeness. Triggers on scrape URL, fetch page, crawl site, extract content, web to markdown, DeepWiki, Firecrawl.
Install:

```bash
git clone https://github.com/tdimino/claude-code-minoan
```

Or copy just this skill with a one-liner:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/firecrawl" ~/.claude/skills/tdimino-claude-code-minoan-firecrawl && rm -rf "$T"
```
`skills/research/firecrawl/SKILL.md`

# Firecrawl & Jina Web Scraping
## Firecrawl vs WebFetch
Prefer `firecrawl scrape URL --only-main-content` over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.
```bash
# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content
```
## Token-Efficient Scraping
Inspired by Anthropic's dynamic filtering—always filter before reasoning. In Anthropic's benchmarks, this approach reduced input tokens by ~24% and improved accuracy by ~11%.
**The Principle:** Search → Filter → Scrape → Filter → Reason

- **DO:** Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason
- **DON'T:** Search → Scrape everything → Reason over all of it
### Step-by-Step Efficient Workflow
```bash
# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --sections "API,Authentication" --max-chars 5000
```
### Post-Processing with `filter_web_results.py`
Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:
```bash
# Extract only matching sections from scraped page
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

# Keep only paragraphs with keywords
firecrawl search "query" --scrape --pretty | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

# Extract specific JSON fields from API output
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

# Combine filters with stats
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats
```
Full path: `python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py`
Flags: `--sections`, `--keywords`, `--max-chars`, `--max-lines`, `--fields` (JSON), `--strip-links`, `--strip-images`, `--compact`, `--stats`
### Other Token-Saving Patterns
- Use `--only-main-content` to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.
- Use `firecrawl map URL --search "topic"` first to find relevant subpages before scraping (see the sketch below).
- Use `--format links` first to get a URL list, evaluate, then scrape selectively.
- Use `exa_contents.py` with `--max-chars` to cap extraction length.
- Use `--formats summary` (Python API script) over full text when you need the gist, not raw content.
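A minimal sketch of the map-then-scrape pattern, composing the flags above (the URLs, search term, and keywords are placeholders):

```bash
# Cheap discovery pass: URLs only, no page content
firecrawl map https://docs.example.com --search "authentication"

# Scrape only the subpages that looked relevant, then filter before reasoning
firecrawl scrape https://docs.example.com/guides/auth --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --keywords "token,oauth" --max-chars 4000
```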
## Claude API Native Tools (for API Agent Builders)
Anthropic's API now offers built-in dynamic filtering tools:
Tools: `web_search_20260209` / `web_fetch_20260209`
Header: `anthropic-beta: code-execution-web-tools-2026-02-09`
These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.
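A minimal request sketch, assuming the standard Messages API shape; the tool type string and beta header come from above, while the model string and the tool `name` field are assumptions to verify against Anthropic's docs:

```bash
# Hedged sketch: request shape follows the standard Messages API.
# The "name" value and model string are assumptions, not from this doc.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: code-execution-web-tools-2026-02-09" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [{"type": "web_search_20260209", "name": "web_search"}],
    "messages": [{"role": "user", "content": "Summarize the latest Firecrawl release notes."}]
  }'
```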
## Available Tools
### 1. Official Firecrawl CLI (`firecrawl`) — Primary

Setup:

```bash
npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY
```
| Command | Purpose | Quick Example |
|---|---|---|
| `scrape` | Single page → markdown | `firecrawl scrape URL --only-main-content` |
| `crawl` | Entire site with progress | `firecrawl crawl URL --wait --progress` |
| `map` | Discover all URLs on a site | `firecrawl map URL --search "topic"` |
| `search` | Web search (+ optional scrape) | `firecrawl search "query" --scrape` |
Full CLI reference: `references/cli-reference.md`
### 2. Auto-Save Alias (`fc-save`) — Shell Alias

Requires shell alias setup (not bundled with this skill).

```bash
fc-save URL   # → Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md
```
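A minimal sketch of what such an alias could look like, assuming the slugified-URL naming shown above (the slug logic and shell-function form are assumptions, not the bundled alias):

```bash
# Hedged sketch, not the bundled alias: slugify the URL, then save
# main-content markdown into the Web-Scrapes folder shown above.
fc-save() {
  local slug
  slug=$(printf '%s' "$1" | sed -E 's~^https?://~~; s~[^A-Za-z0-9]+~-~g; s~-+$~~')
  firecrawl scrape "$1" --only-main-content \
    -o ~/Desktop/"Screencaps & Chats"/Web-Scrapes/"${slug}.md"
}
```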
### 3. Python API Script (`firecrawl_api.py`) — Advanced Features

Command: `python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>`

Requires: `FIRECRAWL_API_KEY` env var, `pip install firecrawl-py requests`
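One-time setup might look like this (the key value is a placeholder):

```bash
# Placeholder key value; substitute your real Firecrawl API key
export FIRECRAWL_API_KEY="fc-your-key-here"
pip install firecrawl-py requests
```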
| Command | Purpose |
|---|---|
| `search` | Web search with scraping |
| `scrape` | Single URL with page actions |
| `batch` | Multiple URLs concurrently |
| `crawl` | Website crawling |
| `map` | URL discovery |
| `extract` | LLM-powered structured extraction |
| `agent` | Autonomous extraction (no URLs needed) |
| | Bulk agent queries (v2.8.0+) |
| `interact` | Post-scrape browser interaction |
| `interact-stop` | Stop an interact session |
Agent models: `spark-1-fast` (10 credits, simple), `spark-1-mini` (default), `spark-1-pro` (thorough)
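Model selection presumably hangs off the `agent` command; the `--model` flag below is hypothetical, so check `references/python-api-reference.md` for the script's actual syntax:

```bash
# --model is a hypothetical flag, shown only to illustrate picking a spark model
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Find the current price of Firecrawl's hobby plan" --model spark-1-pro
```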
Full Python API reference: `references/python-api-reference.md`
### 4. DeepWiki — GitHub Repo Documentation

```bash
~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]
```

AI-generated wiki for any public GitHub repo. No API key required.
```bash
# Overview
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

# Browse sections
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

# Specific section
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

# Full dump for RAG
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save
```
### 5. Jina Reader (`jina`) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter, Jina works).

```bash
jina https://x.com/username/status/123456
```
## Firecrawl vs Exa vs Native Claude Tools
| Need | Best Tool | Why |
|---|---|---|
| Single page → markdown | `firecrawl scrape` | Cleanest output |
| Search + scrape in one shot | `firecrawl search --scrape` | Combined operation |
| Crawl entire site | `firecrawl crawl` | Link following + progress |
| Autonomous data finding | `firecrawl_api.py agent` | No URLs needed |
| Semantic/neural search | Exa | AI-powered relevance |
| Find research papers | Exa | Academic index |
| Quick research answer | Exa | Citations + synthesis |
| Find similar pages | Exa | Competitive analysis |
| Claude API agent building | Native | Built-in dynamic filtering |
| Twitter/X content | `jina` | Only tool that works |
| GitHub repo docs | `deepwiki.sh` | AI-generated wiki |
| Anti-bot / Cloudflare bypass | stealth fetch | Local Turnstile solver |
| Element-level extraction | + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | HTTP fetch | 100% local, no credentials |
| Site redesign resilience | adaptive mode | SQLite similarity matching |
| Budget JS-rendered scrape | | CF free tier: 10 min/day, $0.09/hr paid |
| Free static page fetch | | FREE during beta (no JS) |
| Budget multi-page crawl | | 5 free crawls/day, 100 pages each |
| Incremental re-crawl | | Built-in, Firecrawl lacks this |
| Page screenshot/PDF | | Built-in CF endpoints, cheaper |
| AI structured extraction | | Workers AI included free |
## Common Workflows
### Single Page Scraping

```bash
firecrawl scrape https://example.com/page --only-main-content

# Or auto-save:
fc-save URL

# Or to file:
firecrawl scrape URL --only-main-content -o page.md
```
### Documentation Crawling

```bash
# Map first, then crawl relevant paths
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress
```
### Research Workflow

```bash
firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown
```
### Agent-Powered Research (No URLs Needed)

```bash
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"
```
## Interact Workflows (Post-Scrape Browser Interaction)
Scrape a page, then take actions on it—click buttons, fill forms, extract dynamic content. Two modes: AI prompts (natural language) and code execution (Node.js/Python/Bash).
### When to Use Interact vs. Actions
| Need | Use | Why |
|---|---|---|
| Click/wait before a single scrape | `actions` on scrape | Fire-and-forget, no session overhead |
| Multiple interactions with same page | `interact` | Persistent session, back-and-forth |
| Fill forms, log in, navigate | `interact` | Stateful, multi-step |
| Simple "wait for JS to load" | `scrape` with `--wait-for` | Cheaper, no session |
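For the fire-and-forget case, page actions ride along on a single scrape call. The `--actions` flag below is hypothetical, though the action objects follow Firecrawl's documented wait/click schema; check the page-actions reference under Reference Documentation for the script's real syntax:

```bash
# --actions is a hypothetical flag shown for illustration; the wait/click
# objects follow Firecrawl's page-action schema.
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing" \
  --actions '[{"type": "wait", "milliseconds": 2000}, {"type": "click", "selector": "#enterprise-tab"}]'
```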
### Basic Interact (AI Prompt Mode)
```bash
# Step 1: Scrape and note the Scrape ID from output
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing"

# Step 2: Interact using natural language
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "Click the Enterprise pricing tab"

# Step 3: More interactions on same session
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --prompt "What is the monthly price for the Enterprise plan?"

# Step 4: Stop when done
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact-stop SCRAPE_ID
```
### Code Execution Mode (Cheaper)
```bash
# Execute Playwright code directly (2 credits/min vs 7 for prompts)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "const text = await page.locator('.pricing-table').textContent(); console.log(text);"

# Python mode
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "text = await page.locator('.content').text_content(); print(text)" \
  --language python
```
### Persistent Profile (Login Sessions)
```bash
# Scrape with a named profile — browser state persists across sessions
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/login" \
  --profile my-app --json

# Interact to log in
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
  --code "await page.fill('#email', 'user@example.com'); await page.fill('#password', 'pass'); await page.click('button[type=submit]');"

# Later: scrape another page with same profile — cookies restored
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/dashboard" \
  --profile my-app
```
**Important:** Interact does NOT return page markdown. To get updated content after interaction, use code mode to extract specific elements, or issue a follow-up scrape.
Full interact reference: `references/interact-reference.md`
## Troubleshooting
```bash
# Check status and credits
firecrawl --status && firecrawl credit-usage

# Re-authenticate
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

# Check API key
echo $FIRECRAWL_API_KEY
```
- Scrape fails: Try `jina URL`, or add `--wait-for 3000` for JS-heavy sites
- Async job stuck: Check with `crawl-status`/`batch-status`, cancel with `crawl-cancel`/`batch-cancel`
- Disable telemetry: `export FIRECRAWL_NO_TELEMETRY=1`
## Reference Documentation
| File | Contents |
|---|---|
| `references/cli-reference.md` | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| `references/python-api-reference.md` | Full Python API script reference (all commands, SDK examples) |
| | Firecrawl Search API reference |
| | Agent API (spark models, parallel agents, webhooks) |
| | Page actions for dynamic content (click, write, wait, scroll) |
| `references/interact-reference.md` | Interact API: post-scrape browser interaction (prompt, code, profiles) |
| | Brand identity extraction (colors, fonts, UI) |
## Test Suite
```bash
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick         # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py                 # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape   # Specific test
```