Goose-skills orthogonal-riveter

Web scraping with structured data extraction - define your output schema

install

source · Clone the upstream repo

git clone https://github.com/gooseworks-ai/goose-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/gooseworks-ai/goose-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/capabilities/orthogonal-riveter" ~/.claude/skills/gooseworks-ai-goose-skills-orthogonal-riveter && rm -rf "$T"

manifest: skills/capabilities/orthogonal-riveter/SKILL.md

Riveter - Structured Web Scraping

Setup

Read your credentials from ~/.gooseworks/credentials.json:

export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")

If ~/.gooseworks/credentials.json does not exist, tell the user to run:

npx gooseworks login

All endpoints use Bearer auth:

-H "Authorization: Bearer $GOOSEWORKS_API_KEY"
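For scripts that call the API from Python instead of the shell, the same setup can be sketched as below. This is a minimal sketch, not an official client: the function names `load_credentials` and `auth_headers` are hypothetical, and the only facts taken from this document are the file location, the `api_key` and `api_base` keys, the default base URL, and the Bearer header.

```python
import json
from pathlib import Path

# Same fallback the shell export above uses when api_base is absent.
DEFAULT_API_BASE = "https://api.gooseworks.ai"


def load_credentials(path=None):
    """Return (api_key, api_base) read from the credentials file.

    Hypothetical helper; mirrors the two shell exports above.
    """
    path = Path(path) if path else Path.home() / ".gooseworks" / "credentials.json"
    if not path.exists():
        # The document's instruction for a missing file.
        raise FileNotFoundError(f"{path} not found; run `npx gooseworks login` first")
    data = json.loads(path.read_text())
    return data["api_key"], data.get("api_base", DEFAULT_API_BASE)


def auth_headers(api_key):
    """Headers sent with every request (Bearer auth plus a JSON body)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Raising `FileNotFoundError` with the login hint keeps the "tell the user to run `npx gooseworks login`" step in one place.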

Scrape web pages and extract data into your defined structure.

Capabilities

Scrape: Scrape a webpage and return the text content
Run: Define the structure of your output directly in the API request
Run data: Retrieve the processed data from a completed project run (free)
Run status: Check the current status of a project run (free)
Stop run: Stop a currently running project (free)

Usage

Scrape

Scrape any public webpage and return its text content.

Parameters:

url* (string) - Example: "https://example.com"
proxy_country_code (string) - Optional two-character country code for proxy (e.g., 'us', 'gb', 'de')
skip_cache (boolean) - Default: false. Set to true to bypass cache and always fetch fresh content

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape","body":{"url":"https://example.com/article"}}'
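The curl example sets only `url`. A sketch of the same request from Python with the two optional parameters included might look like this. It assumes the optional parameters sit next to `url` inside `body`, which the example above does not show. `scrape_payload` and `post_json` are hypothetical helper names, and the response is assumed to be JSON.

```python
import json
import urllib.request


def scrape_payload(url, proxy_country_code=None, skip_cache=False):
    """Build the proxy request for /v1/scrape.

    Assumption: optional parameters go in "body" alongside "url".
    """
    body = {"url": url}
    if proxy_country_code is not None:
        body["proxy_country_code"] = proxy_country_code  # e.g. "us", "gb", "de"
    if skip_cache:
        body["skip_cache"] = True  # documented default is false, so omit otherwise
    return {"api": "riveter", "path": "/v1/scrape", "body": body}


def post_json(url, payload, api_key, timeout=60):
    """POST a JSON payload with Bearer auth and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the payload needs no network access.
print(json.dumps(scrape_payload("https://example.com/article", "de", skip_cache=True)))
```

To send it, pass the payload to `post_json` with `$GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run` as the URL and `$GOOSEWORKS_API_KEY` as the key.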

Run

Define the structure of your output directly in the API request. This endpoint allows you to define both your input data and output configuration in a single request.

Parameters:

input* (object) - Your source data.
  Keys are column/attribute names.
  Values are arrays of strings (all arrays must be the same length).
  Maximum 1000 rows per request.
output* (object) - The data you want to extract. Keys are the names of the attributes to extract.
  Each attribute requires:
    prompt: Instructions for finding/extracting this data
    contexts: Array of input or other output attribute names this depends on
  Each attribute can optionally include:
    format: Data type ('number', 'json', 'url', 'text', 'email', 'tag', 'date', 'boolean')
    format_details: Format-specific configuration (varies by format type). For the json format, provide a description (string), a schema (JSON Schema object), or both.
    tools: Array of tools to use (['web_search', 'web_scrape', 'query_pdf', 'query_image'])
    max_tool_calls: Number of tool calls allowed (0-10)
    run_when: When to run this extraction ('always', 'any_filled', 'all_filled')
run_key (string) - Custom identifier for this run (optional, will be generated if not provided)

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run","body":{
  "input": {
    "urls": ["https://example.com/products"]
  },
  "output": {
    "name": {"prompt": "Product name", "contexts": ["urls"]},
    "price": {"prompt": "Product price", "contexts": ["urls"], "format": "number"}
  }
}}'
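The constraints in the parameter list (equal-length input arrays, at most 1000 rows, a prompt and contexts per output attribute, the allowed format values, 0-10 tool calls) can be checked locally before a request is sent. The sketch below is one way to do that. `run_payload` is a hypothetical helper, wrapping the request under `body` is an assumption modelled on the scrape example, and the `description` key inside `format_details` is assumed from the wording of the parameter text.

```python
import json

MAX_ROWS = 1000  # "Maximum 1000 rows per request"
FORMATS = {"number", "json", "url", "text", "email", "tag", "date", "boolean"}


def run_payload(input_columns, output_spec, run_key=None):
    """Validate the documented constraints, then wrap input/output for /v1/run."""
    lengths = {len(values) for values in input_columns.values()}
    if len(lengths) > 1:
        raise ValueError("all input arrays must be the same length")
    if lengths and max(lengths) > MAX_ROWS:
        raise ValueError(f"at most {MAX_ROWS} rows per request")

    # contexts may name input columns or other output attributes.
    known_names = set(input_columns) | set(output_spec)
    for name, spec in output_spec.items():
        for required in ("prompt", "contexts"):
            if required not in spec:
                raise ValueError(f"output '{name}' is missing '{required}'")
        unknown = set(spec["contexts"]) - known_names
        if unknown:
            raise ValueError(f"output '{name}' has unknown contexts: {sorted(unknown)}")
        if "format" in spec and spec["format"] not in FORMATS:
            raise ValueError(f"output '{name}' has unsupported format '{spec['format']}'")
        if not 0 <= spec.get("max_tool_calls", 0) <= 10:
            raise ValueError(f"output '{name}': max_tool_calls must be 0-10")

    body = {"input": input_columns, "output": output_spec}
    if run_key is not None:
        body["run_key"] = run_key  # otherwise the service generates one
    # Assumption: as in the scrape example, the request goes under "body".
    return {"api": "riveter", "path": "/v1/run", "body": body}


payload = run_payload(
    {"urls": ["https://example.com/products"]},
    {
        "name": {"prompt": "Product name", "contexts": ["urls"]},
        "price": {"prompt": "Product price", "contexts": ["urls"], "format": "number"},
        "specs": {
            "prompt": "Key technical specifications",
            "contexts": ["urls", "name"],  # depends on an input and another output
            "format": "json",
            # Assumption: "description" is the key for the json description string.
            "format_details": {"description": "Object mapping spec name to value"},
            "tools": ["web_scrape"],
            "max_tool_calls": 3,
        },
    },
)
print(json.dumps(payload, indent=2))
```

Failing fast on the client side avoids spending a paid run on a request that breaks one of the documented limits.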

Run data (free)

Retrieve the processed data from a completed project run

Parameters:

run_key* (string) - The run key (UUID) of the project run to retrieve data for

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run_data","query":{"run_key":"abc123"}}'

Run status (free)

Check the current status of a project run

Parameters:

run_key* (string) - The run key (UUID) of the project run to check

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run_status","query":{"run_key":"abc123"}}'

Stop run (free)

Stop a currently running project. This halts all processing and marks the run as stopped.

Behavior:

If the run is already stopped or has already succeeded, returns success with the current status.
If the run is in progress, stops all pending cells and marks the run as stopped.
Stopped runs cannot be resumed.

Parameters:

run_key* (string) - The run key (UUID) of the project run to stop

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/stop_run","query":{"run_key":"abc123"}}'
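Taken together, the run endpoints suggest a start, poll, fetch workflow. The sketch below covers the polling step only. It assumes the status arrives as a plain string and treats `success` and `stopped` as the finished states, since those are the two this document names. Other states, such as a failure state, may exist and are not handled. `wait_for_run` and the injected `fetch_status` callable are hypothetical.

```python
import time

# The two finished states named in this document; others may exist.
TERMINAL_STATUSES = {"success", "stopped"}


def status_payload(run_key):
    """Proxy request for /v1/run_status, as in the curl example above."""
    return {"api": "riveter", "path": "/v1/run_status", "query": {"run_key": run_key}}


def wait_for_run(fetch_status, run_key, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the run reaches a terminal status or the timeout expires.

    fetch_status(run_key) must return the current status string. How that
    string is pulled out of the API response is left to the caller, because
    the response shape is not described in this document.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(run_key)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_key} still '{status}' after {timeout}s")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client, so it can be exercised with canned statuses before it is pointed at the real endpoint. Once the loop returns `success`, the run data call retrieves the results.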

Use Cases

E-commerce Scraping: Extract product data in consistent format
Job Listings: Gather job postings with structured fields
News Aggregation: Extract articles with title, date, content
Price Monitoring: Track prices across competitor sites

Discover More

For full endpoint details and parameters:

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"riveter API endpoints"}'   # List all endpoints
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/details \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape"}'   # Get endpoint details