Vibecosystem harvest-structured

Structured data extraction - tables, pricing, products, API endpoints with schema

install
source · Clone the upstream repo
git clone https://github.com/vibeeval/vibecosystem
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/harvest-structured" ~/.claude/skills/vibeeval-vibecosystem-harvest-structured && rm -rf "$T"
manifest: skills/harvest-structured/SKILL.md
source content

Harvest Structured

Extract structured data from web pages using user-defined schemas. Turns messy HTML into clean JSON/CSV - pricing tables, product listings, API endpoint docs, comparison matrices.

Usage

/scrape <url> --schema "<field descriptions>"

Examples

# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"

# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"

# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"

Schema Definition

Define fields as comma-separated names. Use

[]
for arrays:

name            → Single text value
price           → Single value (auto-detects currency)
features[]      → Array of items
description     → Long text
url             → Auto-detects links
image_url       → Auto-detects image sources

How It Works

  1. Fetch page content
  2. Parse schema definition
  3. Use CSS selectors or LLM extraction to match fields
  4. Validate extracted data against schema
  5. Output as JSON (default) or CSV

Output Format

JSON (default)

[
  {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing"
  }
]

CSV

plan_name,price,features,source_url
Pro,"$29/mo","Unlimited projects; Priority support; API access",https://example.com/pricing

Integration

  • growth: Competitor pricing extraction
  • migrator: Changelog/breaking changes extraction
  • tech-radar: Feature comparison across tools
  • data-analyst: Structured data for analysis

Rules

  • Only extract publicly visible data
  • Respect rate limits (1 req/sec)
  • Validate schema before extraction
  • Report confidence per field (high/medium/low)
  • Output includes source URL for every record