Vibecosystem harvest-single
Smart single-page extraction: turns articles, docs, and blog posts into clean markdown.
Install

source · Clone the upstream repo

```shell
git clone https://github.com/vibeeval/vibecosystem
```

Claude Code · Install into ~/.claude/skills/

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/harvest-single" ~/.claude/skills/vibeeval-vibecosystem-harvest-single && rm -rf "$T"
```
Manifest: `skills/harvest-single/SKILL.md`
Harvest Single Page
Extract and clean content from a single web page. Auto-detects content type (article, documentation, API reference, blog post) and produces clean, structured markdown.
Usage
```shell
/harvest <url>
```
Examples
```shell
# Extract a blog post
/harvest https://blog.example.com/best-practices-2024

# Extract API documentation page
/harvest https://docs.stripe.com/api/charges

# Extract a GitHub README
/harvest https://github.com/owner/repo
```
How It Works
- Fetch URL content via WebFetch or crawl4ai
- Detect content type (article, docs, API ref, blog, wiki)
- Extract main content, strip navigation/ads/footers
- Preserve code blocks, tables, images
- Add metadata header (source, date, word count)
- Save to `.claude/cache/agents/harvest/`
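The fetch-and-save steps above can be sketched as a small shell function. This is illustrative only: the function name `harvest_save` is hypothetical, `pandoc` stands in for the skill's actual extraction and cleanup logic, and the metadata lines mirror the output format below.

```shell
# Hypothetical sketch of the pipeline's fetch + save steps.
# pandoc is a stand-in for the skill's real content extraction.
harvest_save() {
  url="$1"
  out_dir=".claude/cache/agents/harvest"
  mkdir -p "$out_dir"

  # Fetch the page and convert HTML to GitHub-flavored markdown
  body=$(curl -sfL "$url" | pandoc -f html -t gfm) || return 1

  # Word count for the metadata header
  words=$(printf '%s' "$body" | wc -w | tr -d ' ')

  out="$out_dir/$(date +%Y%m%d-%H%M%S).md"
  {
    printf '> Source: %s\n' "$url"
    printf '> Extracted: %s\n' "$(date -u +%FT%TZ)"
    printf '> Words: %s\n\n' "$words"
    printf '%s\n' "$body"
  } > "$out"
  echo "$out"
}
```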
Output Format
```markdown
# [Page Title]

> Source: [URL]
> Extracted: [timestamp]
> Type: [article|docs|api|blog|wiki]
> Words: [count]

[Clean extracted content in markdown]

## Links Found

- [Link text](URL)
```
Fallback Chain
- crawl4ai Docker (port 11235) - preferred
- WebFetch tool - built-in fallback
- curl + html2text - last resort
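The chain above might look like the following shell sketch. The `/md` endpoint path and JSON payload are assumptions about the crawl4ai Docker API; WebFetch is a Claude Code tool rather than a CLI, so it is only noted in a comment; `html2text` must be installed for the last resort to work.

```shell
# Sketch of the fallback chain; the crawl4ai endpoint details are assumptions.
harvest_fetch() {
  url="$1"

  # 1. crawl4ai Docker on port 11235 (assumed /md endpoint), preferred
  curl -sf -X POST "http://localhost:11235/md" \
       -H 'Content-Type: application/json' \
       -d "{\"url\": \"$url\"}" && return

  # 2. WebFetch is an in-agent tool, not invokable from a shell -- skipped here

  # 3. curl + html2text as last resort
  curl -sfL "$url" | html2text
}
```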
When to Use
- Quick grab of a single page's content
- Extracting a specific doc page for reference
- Saving an article for later analysis
- Getting clean markdown from messy HTML