PythonClaw web_scraper

install
source · Clone the upstream repo
git clone https://github.com/ericwang915/PythonClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/ericwang915/PythonClaw "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/pythonclaw/templates/skills/data/scraper" ~/.claude/skills/ericwang915-pythonclaw-web-scraper \
  && rm -rf "$T"
manifest: pythonclaw/templates/skills/data/scraper/SKILL.md
source content

Instructions

Fetch a web page and extract its readable content: cleaned text, links, and headings.

Prerequisites

Install dependencies:

pip install requests beautifulsoup4

Usage

python {skill_path}/scrape.py URL [--format text|json|links|headings]

Formats:

  • text (default): cleaned readable text
  • json: structured JSON with title, text, links, headings
  • links: all links on the page
  • headings: all headings (h1–h6)
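
The source of scrape.py is not shown here, so the following is only a minimal sketch of how the four formats above could be produced with BeautifulSoup. The names parse_page and render, the heading dictionary layout, and the choice of the built-in "html.parser" backend are assumptions for illustration, not the skill's actual implementation.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def parse_page(html: str, base_url: str = "") -> dict:
    """Return title, text, links, and headings for one HTML document.

    Hypothetical helper: the real scrape.py may structure its output differently.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Collect links and headings before any tags are removed.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    headings = [
        {"level": int(h.name[1]), "text": h.get_text(strip=True)}
        for h in soup.find_all(HEADING_TAGS)
    ]
    title = soup.title.get_text(strip=True) if soup.title else ""

    # Drop non-content tags so their contents do not leak into the readable text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    # Keep one non-empty, stripped line per text fragment.
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    text = "\n".join(line for line in lines if line)

    return {"title": title, "text": text, "links": links, "headings": headings}


def render(page: dict, fmt: str = "text") -> str:
    """Format a parsed page as text, json, links, or headings."""
    if fmt == "json":
        return json.dumps(page, indent=2, ensure_ascii=False)
    if fmt == "links":
        return "\n".join(page["links"])
    if fmt == "headings":
        return "\n".join(f"h{h['level']}: {h['text']}" for h in page["headings"])
    return page["text"]
```

In this sketch the HTML string would come from something like requests.get(url, timeout=10).text. Because requests does not execute JavaScript, content that a page renders client-side would not appear in any of the formats.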

Examples
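
The source lists no examples under this heading. As a hedged illustration only, the sketch below builds the command line from the Usage section and runs it from Python. The helpers build_command and run_scrape are hypothetical and not part of the skill, and the skill path and URL shown afterwards are assumed values.

```python
import os
import subprocess
import sys

FORMATS = ("text", "json", "links", "headings")  # formats listed under Usage


def build_command(skill_path: str, url: str, fmt: str = "text") -> list[str]:
    """Build the argv list for scrape.py (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    script = os.path.join(os.path.expanduser(skill_path), "scrape.py")
    return [sys.executable, script, url, "--format", fmt]


def run_scrape(skill_path: str, url: str, fmt: str = "text") -> str:
    """Run scrape.py and return its standard output.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        build_command(skill_path, url, fmt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Assuming the skill was installed with the Claude Code command above, run_scrape("~/.claude/skills/ericwang915-pythonclaw-web-scraper", "https://example.com", "links") would correspond to running scrape.py with --format links on that URL.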

Resources

File         Description
scrape.py    Generic web page scraper