Skillshub Firecrawl Automation
Automate web crawling and data extraction with Firecrawl -- scrape pages, crawl sites, extract structured data, batch scrape URLs, and map website structures through the Composio Firecrawl integration.
git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ComposioHQ/awesome-claude-skills/firecrawl-automation" ~/.claude/skills/comeonoliver-skillshub-firecrawl-automation && rm -rf "$T"
skills/ComposioHQ/awesome-claude-skills/firecrawl-automation/SKILL.md
Firecrawl Automation
Run Firecrawl web crawling and extraction directly from Claude Code. Scrape individual pages, crawl entire sites, extract structured data with AI, batch process URL lists, and map website structures without leaving your terminal.
Toolkit docs: composio.dev/toolkits/firecrawl
Setup
- Add the Composio MCP server to your configuration: https://rube.app/mcp
- Connect your Firecrawl account when prompted. The agent will provide an authentication link.
- Be mindful of credit consumption -- scope your crawls tightly and test on small URL sets before scaling.
Core Workflows
1. Scrape a Single Page
Fetch content from a URL in multiple formats with optional browser actions for dynamic pages.
Tool: `FIRECRAWL_SCRAPE`
Key parameters:
- `url` (required) -- fully qualified URL to scrape
- `formats` -- output formats: `markdown` (default), `html`, `rawHtml`, `links`, `screenshot`, `json`
- `onlyMainContent` (default true) -- extract main content only, excluding nav/footer/ads
- `waitFor` -- milliseconds to wait for JS rendering (default 0)
- `timeout` -- max wait in ms (default 30000)
- `actions` -- browser actions before scraping (click, write, wait, press, scroll)
- `includeTags` / `excludeTags` -- filter by HTML tags
- `jsonOptions` -- for structured extraction with `schema` and/or `prompt`
Example prompt: "Scrape the main content from https://example.com/pricing as markdown"
2. Crawl an Entire Site
Discover and scrape multiple pages from a website with configurable depth, path filters, and concurrency.
Tool: `FIRECRAWL_CRAWL_V2`
Key parameters:
- `url` (required) -- starting URL for the crawl
- `limit` (default 10) -- max pages to crawl
- `maxDiscoveryDepth` -- depth limit from the root page
- `includePaths` / `excludePaths` -- regex patterns for URL paths
- `allowSubdomains` -- include subdomains (default false)
- `crawlEntireDomain` -- follow sibling/parent links, not just children (default false)
- `sitemap` -- `include` (default), `skip`, or `only`
- `prompt` -- natural language to auto-configure crawler settings
- `scrapeOptions_formats` -- output format for each page
- `scrapeOptions_onlyMainContent` -- main content extraction per page
Example prompt: "Crawl the docs section of firecrawl.dev, max 50 pages, only paths matching docs"
3. Extract Structured Data
Extract structured JSON data from web pages using AI with a natural language prompt or JSON schema.
Tool: `FIRECRAWL_EXTRACT`
Key parameters:
- `urls` (required) -- array of URLs to extract from (max 10 in beta). Supports wildcards like `https://example.com/blog/*`
- `prompt` -- natural language description of what to extract
- `schema` -- JSON Schema defining the desired output structure
- `enable_web_search` -- allow crawling links outside initial domains (default false)

At least one of `prompt` or `schema` must be provided. Check extraction status with `FIRECRAWL_EXTRACT_GET` using the returned job id.
Example prompt: "Extract company name, pricing tiers, and feature lists from https://example.com/pricing"
4. Batch Scrape Multiple URLs
Scrape many URLs concurrently with shared configuration for efficient bulk data collection.
Tool: `FIRECRAWL_BATCH_SCRAPE`
Key parameters:
- `urls` (required) -- array of URLs to scrape
- `formats` -- output format for all pages (default `markdown`)
- `onlyMainContent` (default true) -- main content extraction
- `maxConcurrency` -- parallel scrape limit
- `ignoreInvalidURLs` (default true) -- skip bad URLs instead of failing the batch
- `location` -- geolocation settings with `country` code
- `actions` -- browser actions applied to each page
- `blockAds` (default true) -- block advertisements
Example prompt: "Batch scrape these 20 product page URLs as markdown with ad blocking"
5. Map Website Structure
Discover all URLs on a website from a starting URL, useful for planning crawls or auditing site structure.
Tool: `FIRECRAWL_MAP_MULTIPLE_URLS_BASED_ON_OPTIONS`
Key parameters:
- `url` (required) -- starting URL (must be `https://` or `http://`)
- `search` -- guide URL discovery toward specific page types
- `limit` (default 5000, max 100000) -- max URLs to return
- `includeSubdomains` (default true) -- include subdomains
- `ignoreQueryParameters` (default true) -- dedupe URLs differing only by query params
- `sitemap` -- `include`, `skip`, or `only`
Example prompt: "Map all URLs on docs.example.com, focusing on API reference pages"
6. Monitor and Manage Crawl Jobs
Track crawl progress, retrieve results, and cancel runaway jobs.
Tools: `FIRECRAWL_CRAWL_GET`, `FIRECRAWL_GET_THE_STATUS_OF_A_CRAWL_JOB`, `FIRECRAWL_CANCEL_A_CRAWL_JOB`
- `FIRECRAWL_CRAWL_GET` -- get status, progress, credits used, and crawled page data
- `FIRECRAWL_CANCEL_A_CRAWL_JOB` -- stop an active or queued crawl

Both require the crawl job `id` (UUID) returned when the crawl was initiated.
Example prompt: "Check the status of crawl job 019b0806-b7a1-7652-94c1-e865b5d2e89a"
Known Pitfalls
- Rate limiting: Firecrawl can trigger "Rate limit exceeded" errors (429). Prefer `FIRECRAWL_BATCH_SCRAPE` over many individual `FIRECRAWL_SCRAPE` calls, and implement backoff on 429/5xx responses.
- Credit consumption: `FIRECRAWL_EXTRACT` can fail with "Insufficient credits." Scope tightly and avoid broad homepage URLs that yield sparse fields. Test on small URL sets first.
- Nested error responses: Per-page failures may be nested in `response.data.code` (e.g., `SCRAPE_DNS_RESOLUTION_ERROR`) even when the outer API call succeeds. Always validate inner status/error fields.
- JS-heavy pages: Non-rendered fetches may miss key content. Use `waitFor` (e.g., 1000-5000ms) for dynamic pages, or configure `scrapeOptions_actions` to interact with the page before scraping.
- Extraction schema precision: Vague or shifting schemas/prompts produce noisy, inconsistent output. Freeze your schema and test on a small sample before scaling to many URLs.
- Crawl jobs are async: `FIRECRAWL_CRAWL_V2` returns immediately with a job ID. Use `FIRECRAWL_CRAWL_GET` to poll for results. Cancel stuck crawls with `FIRECRAWL_CANCEL_A_CRAWL_JOB` to avoid wasting credits.
- Extract job polling: `FIRECRAWL_EXTRACT` is also async for larger jobs. Retrieve final output with `FIRECRAWL_EXTRACT_GET`.
- URL batching for extract: Keep extract URL batches small (~10 URLs) to avoid 429 rate limit errors.
- Deeply nested responses: Results are often nested under `data.data` or deeper. Inspect the returned shape rather than assuming flat keys.
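To make the rate-limit and nested-error pitfalls concrete, here is a hedged retry sketch. It assumes the nested error shape described above (`response.data.code`) and reuses the hypothetical `call_tool` helper; the specific error values are illustrative, so inspect your real response shape before relying on them:

```python
import time

def call_tool(slug: str, params: dict) -> dict:
    """Hypothetical stand-in for your MCP client's tool invocation."""
    raise NotImplementedError

def scrape_with_backoff(url: str, retries: int = 4) -> dict:
    """Retry on rate limits with exponential backoff, and surface per-page
    errors nested inside an otherwise-successful response. The exact error
    field names and values below are assumptions."""
    delay = 2.0
    for _ in range(retries):
        resp = call_tool("FIRECRAWL_SCRAPE", {"url": url})
        data = resp.get("data", {})    # results often nest under data.data
        error_code = data.get("code")  # e.g. SCRAPE_DNS_RESOLUTION_ERROR
        if error_code == 429:          # rate limited: back off and retry
            time.sleep(delay)
            delay *= 2
            continue
        if error_code:                 # any other nested per-page failure
            raise RuntimeError(f"Scrape failed for {url}: {error_code}")
        return data
    raise RuntimeError(f"Still rate limited after {retries} attempts: {url}")
```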
Quick Reference
| Tool Slug | Description |
|---|---|
| `FIRECRAWL_SCRAPE` | Scrape a single URL with format/action options |
| `FIRECRAWL_CRAWL_V2` | Crawl a website with depth/path control |
| `FIRECRAWL_EXTRACT` | Extract structured data with AI prompt/schema |
| `FIRECRAWL_BATCH_SCRAPE` | Batch scrape multiple URLs concurrently |
| `FIRECRAWL_MAP_MULTIPLE_URLS_BASED_ON_OPTIONS` | Discover/map all URLs on a site |
| `FIRECRAWL_CRAWL_GET` | Get crawl job status and results |
| `FIRECRAWL_GET_THE_STATUS_OF_A_CRAWL_JOB` | Check crawl job progress |
| `FIRECRAWL_CANCEL_A_CRAWL_JOB` | Cancel an active crawl job |
| `FIRECRAWL_EXTRACT_GET` | Get extraction job status and results |
| | Preview crawl parameters before starting |
| | Web search + scrape top results |
Powered by Composio