xcrawl-skills · xcrawl-scrape

Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.

Install

Source · Clone the upstream repo:

git clone https://github.com/xcrawl-api/xcrawl-skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/xcrawl-api/xcrawl-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/xcrawl-scrape" ~/.claude/skills/xcrawl-api-xcrawl-skills-xcrawl-scrape && rm -rf "$T"

Manifest: skills/xcrawl-scrape/SKILL.md

XCrawl Scrape

Overview

This skill handles single-page extraction with XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.
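
If the file does not exist yet, a minimal setup sketch (the key value is a placeholder to be replaced by the user):

mkdir -p ~/.xcrawl && cat > ~/.xcrawl/config.json <<'EOF'
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
EOF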

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registration, they can activate the free 1,000-credit plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start scrape: POST /v1/scrape
  • Read async result: GET /v1/scrape/{scrape_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer <XCRAWL_API_KEY>

Usage Examples

cURL (sync)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'

cURL (async create + result)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"

echo "$CREATE_RESP"

SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"

Node

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/scrape
  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <api_key>

Request body: top-level fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL |
| mode | string | No | sync | sync or async |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async webhook config (mode=async) |
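
For illustration, a request body exercising several of these fields might look like this (all values are placeholders, not recommendations):

{
  "url": "https://example.com",
  "mode": "async",
  "proxy": { "location": "US" },
  "js_render": { "enabled": true },
  "output": { "formats": ["markdown", "links"] },
  "webhook": { "url": "https://example.com/hook" }
}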

proxy

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
| sticky_session | string | No | Auto-generated | Sticky session ID; same ID attempts to reuse the exit |
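
For example, a proxy block pinning a Japanese exit and reusing it across calls might be (the session ID is an arbitrary caller-chosen string):

"proxy": { "location": "JP", "sticky_session": "my-session-1" }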

request

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
| device | string | No | desktop | desktop / mobile; affects UA and viewport |
| cookies | object map | No | - | Cookie key/value pairs |
| headers | object map | No | - | Header key/value pairs |
| only_main_content | boolean | No | true | Return main content only |
| block_ads | boolean | No | true | Attempt to block ad resources |
| skip_tls_verification | boolean | No | true | Skip TLS verification |
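
A request block emulating a mobile client with a custom cookie might look like this (cookie name and value are illustrative):

"request": {
  "device": "mobile",
  "cookies": { "session": "<cookie_value>" },
  "only_main_content": true
}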

js_render

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | true | Enable browser rendering |
| wait_until | string | No | load | load / domcontentloaded / networkidle |
| viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
| viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |
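
For instance, a js_render block that waits for network idle with an explicit desktop viewport (sizes mirror the documented desktop defaults) might be:

"js_render": {
  "enabled": true,
  "wait_until": "networkidle",
  "viewport": { "width": 1920, "height": 1080 }
}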

output

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| formats | string[] | No | ["markdown"] | Output formats |
| screenshot | string | No | viewport | full_page / viewport (only if formats includes screenshot) |
| json.prompt | string | No | - | Extraction prompt |
| json.json_schema | object | No | - | JSON Schema |

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json
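
Combining these, an output block requesting a full-page screenshot plus schema-driven JSON extraction might look like the following, with json placed as a top-level sibling of output as in the usage examples above (the schema itself is a made-up example):

"output": { "formats": ["markdown", "screenshot", "json"], "screenshot": "full_page" },
"json": {
  "json_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["title"]
  }
}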

webhook

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | No | - | Callback URL |
| headers | object map | No | - | Custom callback headers |
| events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |
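
A webhook block that only reports terminal states might look like this (URL and header values are placeholders):

"webhook": {
  "url": "https://example.com/xcrawl-callback",
  "headers": { "X-Callback-Token": "<token>" },
  "events": ["completed", "failed"]
}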

Response Parameters

Sync create response (mode=sync)

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | completed / failed |
| url | string | Target URL |
| data | object | Result data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

data fields (based on output.formats):

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail

credits_detail fields:

| Field | Type | Description |
|---|---|---|
| base_cost | integer | Base scrape cost |
| traffic_cost | integer | Traffic cost |
| json_extract_cost | integer | JSON extraction cost |

Async create response (mode=async)

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | Always pending |

Async result response (GET /v1/scrape/{scrape_id})

| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | pending / crawling / completed / failed |
| url | string | Target URL |
| data | object | Same shape as sync data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |

Workflow

  1. Restate the user goal as an extraction contract.
     • URL scope, required fields, accepted nulls, and precision expectations.
  2. Build the scrape request body.
     • Keep only necessary options.
     • Prefer explicit output.formats.
  3. Execute the scrape and capture task metadata.
     • Track scrape_id, status, and timestamps.
     • If async, poll until completed or failed (see the sketch after this list).
  4. Return raw API responses directly.
     • Do not synthesize or compress fields by default.
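
A compact Node sketch of steps 2-4 for the async path, under the same endpoints and fields documented above (target URL and poll interval are placeholders; assumes Node 18+ for built-in fetch):

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const base="https://run.xcrawl.com";
const sleep=ms=>new Promise(r=>setTimeout(r,ms));
(async()=>{
  // Step 2: minimal request body with explicit output.formats
  const create=await fetch(base+"/v1/scrape",{
    method:"POST",
    headers:{"Content-Type":"application/json",Authorization:"Bearer "+apiKey},
    body:JSON.stringify({url:"https://example.com",mode:"async",output:{formats:["markdown"]}})
  }).then(r=>r.json());
  console.log(JSON.stringify(create)); // raw create response
  // Step 3: poll until the task reaches a terminal status
  let result;
  do{
    await sleep(2000);
    result=await fetch(base+"/v1/scrape/"+create.scrape_id,{
      headers:{Authorization:"Bearer "+apiKey}
    }).then(r=>r.json());
  }while(result.status!=="completed"&&result.status!=="failed");
  // Step 4: return the raw result body as-is
  console.log(JSON.stringify(result));
})();
'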

Output Contract

Return:

  • Endpoint(s) used and mode (sync or async)
  • request_payload used for the request
  • Raw response body from each API call
  • Error details when a request fails

Do not generate summaries unless the user explicitly requests a summary.

Guardrails

  • Do not invent unsupported output fields.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Call out uncertainty when page structure is unstable.