PythonClaw web_scraper
install
source · Clone the upstream repo
git clone https://github.com/ericwang915/PythonClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ericwang915/PythonClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/pythonclaw/templates/skills/data/scraper" ~/.claude/skills/ericwang915-pythonclaw-web-scraper && rm -rf "$T"
manifest: pythonclaw/templates/skills/data/scraper/SKILL.md
Instructions
Scrape and extract readable content from any web page.
Prerequisites
Install dependencies:
pip install requests beautifulsoup4
Usage
python {skill_path}/scrape.py URL [--format text|json|links|headings]
Formats:
- text (default): cleaned readable text
- json: structured JSON with title, text, links, headings
- links: all links on the page
- headings: all headings (h1–h6)
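To make the four formats concrete, here is a minimal sketch of how each one could be derived from a page's HTML. It is not the skill's actual scrape.py: it works on an HTML string rather than fetching a URL, and it swaps in the standard library's html.parser for BeautifulSoup so that it runs without extra dependencies. The names PageParser and extract are hypothetical, and the exact shape of the output (for example, the keys used for headings) is an assumption.

```python
import json
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # content that is never readable text


class PageParser(HTMLParser):
    """Collects title, visible text, links, and headings from one HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self.headings = []
        self._stack = []  # open tags that change how text is handled

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS or tag in SKIP_TAGS or tag == "title":
            self._stack.append(tag)
            if tag in HEADING_TAGS:
                self.headings.append({"level": int(tag[1]), "text": ""})
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in SKIP_TAGS:
            return
        chunk = data.strip()
        if not chunk:
            return
        if current == "title":
            self.title = chunk
            return
        if current in HEADING_TAGS:
            heading = self.headings[-1]
            heading["text"] = (heading["text"] + " " + chunk).strip()
        self.text_parts.append(chunk)


def extract(html, fmt="text"):
    """Return the page content in one of the four documented formats."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.text_parts)
    if fmt == "text":
        return text
    if fmt == "links":
        return parser.links
    if fmt == "headings":
        return parser.headings
    if fmt == "json":
        return json.dumps(
            {
                "title": parser.title,
                "text": text,
                "links": parser.links,
                "headings": parser.headings,
            },
            indent=2,
        )
    raise ValueError(f"unknown format: {fmt}")
```

In this sketch the json format is simply the other three combined with the page title, which matches the field list given above; the real script may structure its output differently.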
Examples
- "Read the content of https://example.com"
- "Extract all links from https://news.ycombinator.com"
- "What does this page say? https://some-article.com/post"
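As an illustration of how requests like these might translate into invocations of the script, the sketch below builds the argument list for the documented usage line. The helper build_command and the path /path/to/skill are hypothetical, and which format an agent picks for a given request is an assumption.

```python
import shlex

# The four formats documented under Usage.
FORMATS = ("text", "json", "links", "headings")


def build_command(skill_path, url, fmt="text"):
    """Build the argv list for scrape.py; --format is omitted for the default."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    argv = ["python", f"{skill_path}/scrape.py", url]
    if fmt != "text":
        argv += ["--format", fmt]
    return argv


if __name__ == "__main__":
    # "Read the content of https://example.com"
    print(shlex.join(build_command("/path/to/skill", "https://example.com")))
    # "Extract all links from https://news.ycombinator.com"
    print(shlex.join(build_command(
        "/path/to/skill", "https://news.ycombinator.com", "links")))
```

Returning a list rather than a single string lets the caller pass it directly to subprocess.run without shell quoting concerns.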
Resources
| File | Description |
|---|---|
| scrape.py | Generic web page scraper |