xcrawl-skills · xcrawl-scrape

Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.

Install

Source · Clone the upstream repo:

git clone https://github.com/xcrawl-api/xcrawl-skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/xcrawl-api/xcrawl-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/xcrawl-scrape" ~/.claude/skills/xcrawl-api-xcrawl-skills-xcrawl-scrape && rm -rf "$T"

Manifest: skills/xcrawl-scrape/SKILL.md

XCrawl Scrape

Overview

This skill handles single-page extraction with XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.
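
If the file does not exist yet, a minimal setup sketch (the key value is a placeholder to be replaced by the user):

mkdir -p ~/.xcrawl && cat > ~/.xcrawl/config.json <<'EOF'
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
EOF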

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registration, they can activate the free 1,000-credit plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start scrape: POST /v1/scrape
  • Read async result: GET /v1/scrape/{scrape_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer <XCRAWL_API_KEY>

Usage Examples

cURL (sync)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'

cURL (async create + result)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"

echo "$CREATE_RESP"

SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"

Node

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/scrape
  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <api_key>

Request body: top-level fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL |
| mode | string | No | sync | sync or async |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async webhook config (mode=async) |
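
For illustration, a request body exercising several of these fields might look like this (all values are placeholders, not recommendations):

{
  "url": "https://example.com",
  "mode": "async",
  "proxy": { "location": "US" },
  "js_render": { "enabled": true },
  "output": { "formats": ["markdown", "links"] },
  "webhook": { "url": "https://example.com/hook" }
}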

proxy

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
| sticky_session | string | No | Auto-generated | Sticky session ID; same ID attempts to reuse the exit |
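
For example, a proxy block pinning a Japanese exit and reusing it across calls might be (the session ID is an arbitrary caller-chosen string):

"proxy": { "location": "JP", "sticky_session": "my-session-1" }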

request

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
| device | string | No | desktop | desktop / mobile; affects UA and viewport |
| cookies | object map | No | - | Cookie key/value pairs |
| headers | object map | No | - | Header key/value pairs |
| only_main_content | boolean | No | true | Return main content only |
| block_ads | boolean | No | true | Attempt to block ad resources |
| skip_tls_verification | boolean | No | true | Skip TLS verification |
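
A request block emulating a mobile client with a custom cookie might look like this (cookie name and value are illustrative):

"request": {
  "device": "mobile",
  "cookies": { "session": "<cookie_value>" },
  "only_main_content": true
}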

js_render

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | true | Enable browser rendering |
| wait_until | string | No | load | load / domcontentloaded / networkidle |
| viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
| viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |
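
For instance, a js_render block that waits for network idle with an explicit desktop viewport (sizes mirror the documented desktop defaults) might be:

"js_render": {
  "enabled": true,
  "wait_until": "networkidle",
  "viewport": { "width": 1920, "height": 1080 }
}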

output

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| formats | string[] | No | ["markdown"] | Output formats |
| screenshot | string | No | viewport | full_page / viewport (only if formats includes screenshot) |
| json.prompt | string | No | - | Extraction prompt |
| json.json_schema | object | No | - | JSON Schema |

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json
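
Combining these, an output block requesting a full-page screenshot plus schema-driven JSON extraction might look like the following, with json placed as a top-level sibling of output as in the usage examples above (the schema itself is a made-up example):

"output": { "formats": ["markdown", "screenshot", "json"], "screenshot": "full_page" },
"json": {
  "json_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["title"]
  }
}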

webhook

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | No | - | Callback URL |
| headers | object map | No | - | Custom callback headers |
| events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |
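
A webhook block that only reports terminal states might look like this (URL and header values are placeholders):

"webhook": {
  "url": "https://example.com/xcrawl-callback",
  "headers": { "X-Callback-Token": "<token>" },
  "events": ["completed", "failed"]
}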

Response Parameters

Sync create response (mode=sync)

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | completed / failed |
| url | string | Target URL |
| data | object | Result data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

data fields (based on output.formats):

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail

credits_detail fields:

| Field | Type | Description |
|---|---|---|
| base_cost | integer | Base scrape cost |
| traffic_cost | integer | Traffic cost |
| json_extract_cost | integer | JSON extraction cost |

Async create response (mode=async)

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | Always pending |

Async result response (GET /v1/scrape/{scrape_id})

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | pending / crawling / completed / failed |
| url | string | Target URL |
| data | object | Same shape as sync data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |

Workflow

  1. Restate the user goal as an extraction contract.
     • URL scope, required fields, accepted nulls, and precision expectations.
  2. Build the scrape request body.
     • Keep only necessary options.
     • Prefer explicit output.formats.
  3. Execute the scrape and capture task metadata.
     • Track scrape_id, status, and timestamps.
     • If async, poll until completed or failed (see the sketch after this list).
  4. Return raw API responses directly.
     • Do not synthesize or compress fields by default.
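
A compact Node sketch of steps 2-4 for the async path, under the same endpoints and fields documented above (target URL and poll interval are placeholders; assumes Node 18+ for built-in fetch):

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const base="https://run.xcrawl.com";
const sleep=ms=>new Promise(r=>setTimeout(r,ms));
(async()=>{
  // Step 2: minimal request body with explicit output.formats
  const create=await fetch(base+"/v1/scrape",{
    method:"POST",
    headers:{"Content-Type":"application/json",Authorization:"Bearer "+apiKey},
    body:JSON.stringify({url:"https://example.com",mode:"async",output:{formats:["markdown"]}})
  }).then(r=>r.json());
  console.log(JSON.stringify(create)); // raw create response
  // Step 3: poll until the task reaches a terminal status
  let result;
  do{
    await sleep(2000);
    result=await fetch(base+"/v1/scrape/"+create.scrape_id,{
      headers:{Authorization:"Bearer "+apiKey}
    }).then(r=>r.json());
  }while(result.status!=="completed"&&result.status!=="failed");
  // Step 4: return the raw result body as-is
  console.log(JSON.stringify(result));
})();
'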

Output Contract

Return:

  • Endpoint(s) used and mode (sync or async)
  • request_payload used for the request
  • Raw response body from each API call
  • Error details when a request fails

Do not generate summaries unless the user explicitly requests a summary.

Guardrails

  • Do not invent unsupported output fields.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Call out uncertainty when page structure is unstable.