Vibecosystem harvest-single

Single-page smart extraction: converts articles, documentation, and blog posts to clean markdown

install
source · Clone the upstream repo
git clone https://github.com/vibeeval/vibecosystem
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/harvest-single" ~/.claude/skills/vibeeval-vibecosystem-harvest-single && rm -rf "$T"
manifest: skills/harvest-single/SKILL.md
source content

Harvest Single Page

Extract and clean content from a single web page. Auto-detects content type (article, documentation, API reference, blog post) and produces clean, structured markdown.

Usage

/harvest <url>

Examples

# Extract a blog post
/harvest https://blog.example.com/best-practices-2024

# Extract API documentation page
/harvest https://docs.stripe.com/api/charges

# Extract a GitHub README
/harvest https://github.com/owner/repo

How It Works

  1. Fetch URL content via WebFetch or crawl4ai
  2. Detect content type (article, docs, API ref, blog, wiki)
  3. Extract main content, strip navigation/ads/footers
  4. Preserve code blocks, tables, images
  5. Add metadata header (source, date, word count)
  6. Save to .claude/cache/agents/harvest/
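The skill's internals are not published here, but the six steps above can be sketched as a small pipeline. This is a minimal sketch, not the actual implementation: `fetch` and `to_markdown` are hypothetical injected helpers standing in for WebFetch/crawl4ai and the HTML-to-markdown conversion, and `detect_type` is a crude URL heuristic standing in for the real detector:

```python
import re
from pathlib import Path

CACHE_DIR = Path(".claude/cache/agents/harvest")

def detect_type(url):
    """Crude URL-based heuristic standing in for the skill's detector."""
    if "/api/" in url:
        return "api"
    if "docs." in url or "/docs/" in url:
        return "docs"
    if "blog." in url or "/blog/" in url:
        return "blog"
    if "wiki" in url:
        return "wiki"
    return "article"

def harvest(url, fetch, to_markdown):
    html = fetch(url)              # 1. fetch URL content
    ctype = detect_type(url)       # 2. detect content type
    markdown = to_markdown(html)   # 3-4. extract main content, keep code/tables
    # 5. prepend a metadata header
    header = f"> Source: {url}\n> Type: {ctype}\n> Words: {len(markdown.split())}\n\n"
    CACHE_DIR.mkdir(parents=True, exist_ok=True)  # 6. save to the cache dir
    path = CACHE_DIR / (re.sub(r"\W+", "-", url).strip("-") + ".md")
    path.write_text(header + markdown, encoding="utf-8")
    return path
```

A real run would route `fetch` through the fallback chain described below; the sketch only shows how the steps chain together.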

Output Format

# [Page Title]
> Source: [URL]
> Extracted: [timestamp]
> Type: [article|docs|api|blog|wiki]
> Words: [count]

[Clean extracted content in markdown]

## Links Found
- [Link text](URL)
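As a rough sketch of the template above (the function name and signature are hypothetical, only the field names come from the template), the metadata header could be assembled like this:

```python
from datetime import datetime, timezone

def metadata_header(title, url, content_type, body):
    """Render the header block from the Output Format template."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return "\n".join([
        f"# {title}",
        f"> Source: {url}",
        f"> Extracted: {stamp}",
        f"> Type: {content_type}",
        f"> Words: {len(body.split())}",
    ])
```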

Fallback Chain

  1. crawl4ai Docker (port 11235) - preferred
  2. WebFetch tool - built-in fallback
  3. curl + html2text - last resort
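One way to walk that chain, as a sketch under stated assumptions (the function names here are illustrative, not the skill's API): probe the crawl4ai Docker port first, fall back to the built-in WebFetch tool, and only shell out to `curl` + `html2text` if both binaries are on the PATH:

```python
import shutil
import socket

def crawl4ai_up(host="127.0.0.1", port=11235, timeout=0.5):
    """True if something is listening on the crawl4ai Docker port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def choose_fetcher(probe=crawl4ai_up, webfetch_available=True):
    """Walk the fallback chain in order of preference."""
    if probe():
        return "crawl4ai"
    if webfetch_available:
        return "WebFetch"
    if shutil.which("curl") and shutil.which("html2text"):
        return "curl+html2text"
    raise RuntimeError("no fetcher available")
```

Injecting the probe keeps the chain testable without a running Docker container.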

When to Use

  • Quick grab of a single page's content
  • Extracting a specific doc page for reference
  • Saving an article for later analysis
  • Getting clean markdown from messy HTML