PythonClaw web_scraper

Name: web_scraper
Author: ericwang915

install

source · Clone the upstream repo

git clone https://github.com/ericwang915/PythonClaw

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ericwang915/PythonClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/pythonclaw/templates/skills/data/scraper" ~/.claude/skills/ericwang915-pythonclaw-web-scraper && rm -rf "$T"

manifest: pythonclaw/templates/skills/data/scraper/SKILL.md

source content

Instructions

Scrape and extract readable content from any web page.

Prerequisites

Install dependencies:

pip install requests beautifulsoup4

Usage

python {skill_path}/scrape.py URL [--format text|json|links|headings]

Formats:

- `text` (default): cleaned readable text
- `json`: structured JSON with title, text, links, headings
- `links`: all links on the page
- `headings`: all headings (h1–h6)
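
To make the four formats concrete, here is a minimal sketch of what each one might contain. It is not the code in `scrape.py`, whose source is not reproduced here. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs, and it works on an HTML string rather than fetching a URL. The names `PageParser` and `extract`, and the exact shape of the JSON and heading records, are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # content that is not readable text


class PageParser(HTMLParser):
    """Hypothetical helper: collects title, text, links, and headings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self.headings = []
        self._skip_depth = 0    # > 0 while inside <script> or <style>
        self._in_title = False
        self._heading = None    # heading currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in HEADING_TAGS:
            self._heading = {"level": int(tag[1]), "text": ""}
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in HEADING_TAGS and self._heading is not None:
            self._heading["text"] = self._heading["text"].strip()
            self.headings.append(self._heading)
            self._heading = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        if self._in_title:
            self.title += data
            return
        if self._heading is not None:
            self._heading["text"] += data
        if data.strip():
            self.text_parts.append(data.strip())


def extract(html, fmt="text"):
    """Return page content in one of the four formats (assumed shapes)."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.text_parts)
    if fmt == "text":
        return text
    if fmt == "links":
        return parser.links
    if fmt == "headings":
        return parser.headings
    if fmt == "json":
        return json.dumps({
            "title": parser.title.strip(),
            "text": text,
            "links": parser.links,
            "headings": parser.headings,
        })
    raise ValueError(f"unknown format: {fmt}")
```

Calling `extract(html, "links")` on a fetched page body would return the raw `href` values in document order; resolving relative links against the page URL is left out of this sketch.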

Examples

"Read the content of https://example.com"
"Extract all links from https://news.ycombinator.com"
"What does this page say? https://some-article.com/post"
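
Requests like these would presumably map onto the command line shown under Usage. The sketch below builds the argument list for each case; `build_command` and the `/path/to/skill` location are hypothetical, and only the `URL` argument and `--format` option documented above are used.

```python
import shlex

FORMATS = ("text", "json", "links", "headings")


def build_command(skill_path, url, fmt="text"):
    """Build the argv for scrape.py (hypothetical helper, not part of the skill)."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return ["python", f"{skill_path}/scrape.py", url, "--format", fmt]


# Assumed install location; substitute the real skill directory.
SKILL_PATH = "/path/to/skill"

commands = [
    build_command(SKILL_PATH, "https://example.com"),
    build_command(SKILL_PATH, "https://news.ycombinator.com", "links"),
    build_command(SKILL_PATH, "https://some-article.com/post", "text"),
]

for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell
    print(shlex.join(argv))
```

Each list can be passed directly to `subprocess.run` once `requests` and `beautifulsoup4` are installed.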

Resources

| File | Description |
| --- | --- |
| `scrape.py` | Generic web page scraper |