Vibecosystem harvest-structured
Schema-driven structured data extraction: tables, pricing, products, and API endpoints
Install
Source: clone the upstream repo
git clone https://github.com/vibeeval/vibecosystem
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/harvest-structured" ~/.claude/skills/vibeeval-vibecosystem-harvest-structured && rm -rf "$T"
Manifest: skills/harvest-structured/SKILL.md
Harvest Structured
Extract structured data from web pages using user-defined schemas. The skill turns messy HTML into clean JSON or CSV: pricing tables, product listings, API endpoint docs, and comparison matrices.
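For a plain HTML table, the extraction step does not necessarily need an LLM. The following is a minimal sketch, assuming a well-formed <table> whose first row holds the column headers. It uses Python's standard html.parser rather than CSS selectors, and the names TableExtractor and table_to_records are hypothetical, not part of the skill:

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so " $29/mo \n" becomes "$29/mo".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html: str, source_url: str) -> list[dict]:
    """Treat the first row as headers and every later row as one record."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headers, *body = parser.rows
    # Attach the page URL to every record, matching the output format.
    return [{**dict(zip(headers, row)), "source_url": source_url} for row in body]
```

This sketch ignores colspan, rowspan and nested tables; pages that use them would need extra handling or the LLM extraction path instead.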
Usage
/scrape <url> --schema "<field descriptions>"
Examples
# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"

# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"

# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"
Schema Definition
Define fields as comma-separated names. Use [] for arrays:
name → Single text value
price → Single value (auto-detects currency)
features[] → Array of items
description → Long text
url → Auto-detects links
image_url → Auto-detects image sources
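The schema string can be parsed into field specs before any page is fetched, which is also a natural place to enforce the "Validate schema before extraction" rule. The following is a minimal sketch. parse_schema, Field and the name-to-kind mapping are hypothetical; the mapping only mirrors the table above, and restricting names to Python identifiers is an assumed rule, not documented behavior:

```python
from dataclasses import dataclass

# Hypothetical mapping mirroring the field table above; the skill's real
# auto-detection rules are not documented.
SPECIAL_KINDS = {
    "price": "currency",
    "description": "long_text",
    "url": "link",
    "image_url": "image",
}


@dataclass(frozen=True)
class Field:
    name: str
    is_array: bool
    kind: str


def parse_schema(schema: str) -> list:
    """Parse a string like 'plan_name, price, features[]' into Field objects."""
    fields = []
    seen = set()
    for raw in schema.split(","):
        token = raw.strip()
        if not token:
            raise ValueError(f"empty field name in schema: {schema!r}")
        # A trailing [] marks an array field.
        is_array = token.endswith("[]")
        name = token[:-2].strip() if is_array else token
        if not name.isidentifier():  # assumed naming rule
            raise ValueError(f"invalid field name: {token!r}")
        if name in seen:
            raise ValueError(f"duplicate field: {name!r}")
        seen.add(name)
        fields.append(Field(name, is_array, SPECIAL_KINDS.get(name, "text")))
    return fields
```

For example, parse_schema("plan_name, price, features[], cta_text") yields four fields, of which only features is an array and price is tagged as a currency.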
How It Works
- Fetch page content
- Parse schema definition
- Use CSS selectors or LLM extraction to match fields
- Validate extracted data against schema
- Output as JSON (default) or CSV
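The validation step could be as simple as checking each extracted record's shape against the schema. The following is a minimal sketch with a hypothetical validate_records helper. It assumes the schema has already been reduced to a dict mapping each field name to whether it is an array, and it also checks the source_url requirement from the Rules section:

```python
def validate_records(records: list, fields: dict) -> list:
    """Return a list of problems; an empty list means every record passed.

    `fields` maps each schema field name to True if it is an array field,
    e.g. {"plan_name": False, "features": True}.
    """
    problems = []
    for i, record in enumerate(records):
        for name, is_array in fields.items():
            if name not in record:
                problems.append(f"record {i}: missing field {name!r}")
            elif is_array and not isinstance(record[name], list):
                problems.append(f"record {i}: {name!r} should be a list")
            elif not is_array and isinstance(record[name], list):
                problems.append(f"record {i}: {name!r} should be a single value")
        # Every record must carry the URL it was extracted from.
        if not record.get("source_url"):
            problems.append(f"record {i}: missing source_url")
        # Flag fields the schema never asked for.
        extra = set(record) - set(fields) - {"source_url"}
        if extra:
            problems.append(f"record {i}: unexpected fields {sorted(extra)}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every mismatch from a page in a single pass.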
Output Format
JSON (default)
[
  {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing"
  }
]
CSV
plan_name,price,features,source_url
Pro,"$29/mo","Unlimited projects; Priority support; API access",https://example.com/pricing
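Going from the JSON records to CSV mostly means flattening array fields into one cell. The following is a minimal sketch using Python's csv module. records_to_csv is a hypothetical name, and the "; " separator is inferred from the example row above:

```python
import csv
import io


def records_to_csv(records: list, columns: list) -> str:
    """Serialize records to CSV, joining array fields with '; '."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the output compact; the csv default is "\r\n".
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        row = []
        for col in columns:
            value = record.get(col, "")  # missing fields become empty cells
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()
```

csv.writer quotes only when a cell needs it (QUOTE_MINIMAL), so $29/mo comes out unquoted here; a CSV reader parses both forms to the same value. An array item that itself contains "; " would not round-trip, so such data may be better kept in JSON.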
Integration
- growth: Competitor pricing extraction
- migrator: Changelog/breaking changes extraction
- tech-radar: Feature comparison across tools
- data-analyst: Structured data for analysis
Rules
- Only extract publicly visible data
- Respect rate limits (1 req/sec)
- Validate schema before extraction
- Report confidence per field (high/medium/low)
- Output includes source URL for every record
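The 1 req/sec limit can be enforced with a small blocking limiter that is called before each fetch. The following is a minimal sketch. RateLimiter is a hypothetical name, and it measures gaps with time.monotonic() so that wall-clock adjustments cannot shorten them:

```python
from __future__ import annotations

import time


class RateLimiter:
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last: float | None = None  # monotonic time of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()
```

Calling limiter.wait() on one shared instance immediately before every request keeps the whole run at or below one request per second.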