Vibecosystem harvest-structured

Structured data extraction - tables, pricing, products, API endpoints with schema

install

source · Clone the upstream repo

git clone https://github.com/vibeeval/vibecosystem

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/harvest-structured" ~/.claude/skills/vibeeval-vibecosystem-harvest-structured && rm -rf "$T"

manifest: skills/harvest-structured/SKILL.md

Harvest Structured

Extract structured data from web pages using user-defined schemas. Turns messy HTML into clean JSON/CSV - pricing tables, product listings, API endpoint docs, comparison matrices.

Usage

/scrape <url> --schema "<field descriptions>"

Examples

# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"

# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"

# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"

Schema Definition

Define fields as comma-separated names. Use

[]

for arrays:

name            → Single text value
price           → Single value (auto-detects currency)
features[]      → Array of items
description     → Long text
url             → Auto-detects links
image_url       → Auto-detects image sources

How It Works

Fetch page content
Parse schema definition
Use CSS selectors or LLM extraction to match fields
Validate extracted data against schema
Output as JSON (default) or CSV

Output Format

JSON (default)

[
  {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing"
  }
]

CSV

plan_name,price,features,source_url
Pro,"$29/mo","Unlimited projects; Priority support; API access",https://example.com/pricing

Integration

growth: Competitor pricing extraction
migrator: Changelog/breaking changes extraction
tech-radar: Feature comparison across tools
data-analyst: Structured data for analysis

Rules

Only extract publicly visible data
Respect rate limits (1 req/sec)
Validate schema before extraction
Report confidence per field (high/medium/low)
Output includes source URL for every record