# xcrawl-skills · xcrawl-scrape
Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.
## Install

Source · clone the upstream repo:

```shell
git clone https://github.com/xcrawl-api/xcrawl-skills
```

Claude Code · install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/xcrawl-api/xcrawl-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/xcrawl-scrape" ~/.claude/skills/xcrawl-api-xcrawl-skills-xcrawl-scrape \
  && rm -rf "$T"
```
Manifest: `skills/xcrawl-scrape/SKILL.md`
# XCrawl Scrape

## Overview
This skill handles single-page extraction with XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.
## Required Local Config

Before using this skill, the user must create a local config file and write
`XCRAWL_API_KEY` into it.

Path: `~/.xcrawl/config.json`

```json
{ "XCRAWL_API_KEY": "<your_api_key>" }
```
Read the API key from the local config file only. Do not require global environment variables.
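One way to create the file from the shell; the key value here is a placeholder, so substitute the real key from the dashboard before use:

```shell
# Create the local config with a placeholder key (replace before use).
mkdir -p ~/.xcrawl
printf '{ "XCRAWL_API_KEY": "%s" }\n' "your_api_key_here" > ~/.xcrawl/config.json
chmod 600 ~/.xcrawl/config.json   # keep the credential private
```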
## Credits and Account Setup

Using the XCrawl APIs consumes credits. If the user does not have an account or
available credits, guide them to register at https://dash.xcrawl.com/.
After registration, they can activate the free 1000-credit plan before running requests.
## Tool Permission Policy

Request runtime permissions for `curl` and `node` only. Do not request Python,
shell helper scripts, or other runtime permissions.
## API Surface

- Start scrape: `POST /v1/scrape`
- Read async result: `GET /v1/scrape/{scrape_id}`
- Base URL: `https://run.xcrawl.com`
- Required header: `Authorization: Bearer <XCRAWL_API_KEY>`
## Usage Examples

### cURL (sync)

```shell
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"
curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'
```
### cURL (async create + result)

```shell
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"
CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"
echo "$CREATE_RESP"
SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"
curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"
```
### Node

```shell
node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'
```
## Request Parameters

### Endpoint and headers

- Endpoint: `POST https://run.xcrawl.com/v1/scrape`
- Headers: `Content-Type: application/json`, `Authorization: Bearer <api_key>`
### Request body: top-level fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Target URL |
| `mode` | string | No |  | `sync` or `async` |
| `proxy` | object | No | - | Proxy config |
| `request` | object | No | - | Request config |
| `js_render` | object | No | - | JS rendering config |
| `output` | object | No | - | Output config |
| `webhook` | object | No | - | Async webhook config (`mode=async`) |
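As a sketch of how these fields compose, a minimal body builder; the defaults chosen here (`sync` mode, `markdown` output) are illustrative choices, not documented API defaults:

```javascript
// Build a minimal scrape request body: only `url` is required, everything
// else is opt-in, so include only the options the task actually needs.
function buildScrapeBody(url, { mode = "sync", formats = ["markdown"] } = {}) {
  return { url, mode, output: { formats } };
}

// Example: an async JSON-extraction request body.
const asyncBody = buildScrapeBody("https://example.com", { mode: "async", formats: ["json"] });
```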
### `proxy`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No |  | ISO-3166-1 alpha-2 country code |
|  | string | No | Auto-generated | Sticky session ID; the same ID attempts to reuse the same exit |
### `request`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No |  | Affects … |
|  | string | No |  | `desktop` / `mobile`; affects UA and viewport |
|  | object map | No | - | Cookie key/value pairs |
|  | object map | No | - | Header key/value pairs |
|  | boolean | No |  | Return main content only |
|  | boolean | No |  | Attempt to block ad resources |
|  | boolean | No |  | Skip TLS verification |
### `js_render`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | boolean | No |  | Enable browser rendering |
|  | string | No |  | … / … / … |
|  | integer | No | - | Viewport width (defaults differ for desktop and mobile) |
|  | integer | No | - | Viewport height (defaults differ for desktop and mobile) |
### `output`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `formats` | string[] | No |  | Output formats |
|  | string | No |  | … (only if `formats` includes …) |
| `prompt` | string | No | - | Extraction prompt |
| `json_schema` | object | No | - | JSON Schema |

`output.formats` enum: `html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`
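The two JSON-extraction styles named in the skill description (prompt or json_schema) can be sketched as request bodies. The nesting of `prompt` under a top-level `json` object follows the usage examples above; placing `json_schema` in the same object is an assumption, not documented:

```javascript
// JSON extraction via a free-form prompt (nesting taken from the usage examples).
const withPrompt = {
  url: "https://example.com/product/1",
  mode: "sync",
  output: { formats: ["json"] },
  json: { prompt: "Extract title and price." },
};

// JSON extraction via an explicit JSON Schema (placement assumed, not documented).
const withSchema = {
  url: "https://example.com/product/1",
  mode: "sync",
  output: { formats: ["json"] },
  json: {
    json_schema: {
      type: "object",
      properties: { title: { type: "string" }, price: { type: "number" } },
      required: ["title"],
    },
  },
};
```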
### `webhook`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No | - | Callback URL |
|  | object map | No | - | Custom callback headers |
|  | string[] | No |  | Events: … / … / … |
## Response Parameters

### Sync create response (`mode=sync`)

| Field | Type | Description |
|---|---|---|
| `scrape_id` | string | Task ID |
|  | string | Always … |
|  | string | Version |
| `status` | string | … / … |
| `url` | string | Target URL |
| `data` | object | Result data |
|  | string | Start time (ISO 8601) |
|  | string | End time (ISO 8601) |
|  | integer | Total credits used |
`data` fields (present based on `output.formats`): `html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`, `metadata` (page metadata), `traffic_bytes`, `credits_used`, `credits_detail`
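Reading result fields from a sync response can be sketched as follows; the sample object is illustrative, not real API output:

```javascript
// Pull selected outputs from a sync scrape response. Fields inside `data`
// are present only when requested via output.formats.
const resp = {
  scrape_id: "sc_123",              // hypothetical task ID
  status: "completed",
  url: "https://example.com",
  data: {
    markdown: "# Example Domain",
    links: ["https://www.iana.org/domains/example"],
    credits_used: 1,
  },
};

// Destructure with fallbacks, since unrequested formats are absent.
const { markdown = null, links = [] } = resp.data ?? {};
```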
`credits_detail` fields:

| Field | Type | Description |
|---|---|---|
|  | integer | Base scrape cost |
|  | integer | Traffic cost |
|  | integer | JSON extraction cost |
### Async create response (`mode=async`)

| Field | Type | Description |
|---|---|---|
| `scrape_id` | string | Task ID |
|  | string | Always … |
|  | string | Version |
| `status` | string | Always … |
### Async result response (`GET /v1/scrape/{scrape_id}`)

| Field | Type | Description |
|---|---|---|
| `scrape_id` | string | Task ID |
|  | string | Always … |
|  | string | Version |
| `status` | string | Status (four values; terminal: `completed` / `failed`) |
| `url` | string | Target URL |
| `data` | object | Same shape as sync |
|  | string | Start time (ISO 8601) |
|  | string | End time (ISO 8601) |
## Workflow

- Restate the user goal as an extraction contract: URL scope, required fields, accepted nulls, and precision expectations.
- Build the scrape request body. Keep only necessary options; prefer explicit `output.formats`.
- Execute the scrape and capture task metadata. Track `scrape_id`, `status`, and timestamps. If async, poll until `completed` or `failed`.
- Return raw API responses directly. Do not synthesize or compress fields by default.
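The async polling step can be sketched as a generic helper; `getResult` is a hypothetical callback that performs the `GET /v1/scrape/{scrape_id}` call, and the terminal statuses `completed` / `failed` come from this workflow:

```javascript
// Poll until the async result reaches a terminal status, bounding the
// number of attempts so a stuck task cannot hang the workflow.
async function pollUntilDone(getResult, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let i = 0; i < maxAttempts; i++) {
    const body = await getResult(); // e.g. fetch GET /v1/scrape/{scrape_id}, parsed as JSON
    if (body.status === "completed" || body.status === "failed") return body;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("scrape did not reach a terminal status");
}
```

Wire `getResult` to the `curl`/`fetch` call shown in the usage examples, then return the resulting `body` unmodified, per the output contract below.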
## Output Contract

Return:

- Endpoint(s) used and mode (`sync` or `async`)
- The `request_payload` used for the request
- Raw response body from each API call
- Error details when a request fails

Do not generate summaries unless the user explicitly requests a summary.
## Guardrails
- Do not invent unsupported output fields.
- Do not hardcode provider-specific tool schemas in core logic.
- Call out uncertainty when page structure is unstable.