Vibe-Skills scrapling
CLI-first web scraping and content extraction with an optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/scrapling" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-scrapling && rm -rf "$T"
bundled/skills/scrapling/SKILL.md
Scrapling Skill (VCO)
Scrapling is a Python-based web scraping / extraction toolkit that exposes:
- a CLI (scrapling ...) for fetching + extracting content into files
- an optional MCP server (scrapling mcp) so an agent can call structured scraping tools
This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).
When to use
Use scrapling when you need:
- Extract specific parts of a web page (CSS selector / XPath) into .md / .txt / .html
- Run repeatable scraping jobs (batch URLs with a small wrapper script)
- Reduce token usage by extracting only the relevant DOM region before passing to the LLM
- Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)
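The "small wrapper script" for batch jobs is not included in this skill. A minimal Python sketch, assuming the scrapling CLI is on PATH and using only the extract get form and the --css-selector flag shown under Common CLI patterns, might look like this; output_path_for, batch_commands and run_batch are hypothetical names, and the URL-to-filename scheme is an assumption.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def output_path_for(url: str, out_dir: Path, ext: str = ".md") -> Path:
    """Derive a filesystem-safe output file name from a URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".rstrip("/")
    # Replace every run of characters other than letters, digits, '.' and '-' with '_'.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "page"
    return out_dir / f"{slug}{ext}"


def batch_commands(urls, out_dir: Path, css_selector=None, ext=".md"):
    """Build one `scrapling extract get` argv list per URL, without running anything."""
    commands = []
    for url in urls:
        cmd = ["scrapling", "extract", "get", url, str(output_path_for(url, out_dir, ext))]
        if css_selector:
            cmd += ["--css-selector", css_selector]
        commands.append(cmd)
    return commands


def run_batch(urls, out_dir: Path, css_selector=None, ext=".md"):
    """Run the commands one after another and map each URL to its exit code."""
    urls = list(urls)  # allow generators: the URLs are iterated twice below
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for url, cmd in zip(urls, batch_commands(urls, out_dir, css_selector, ext)):
        results[url] = subprocess.run(cmd, check=False).returncode
    return results
```

For example, run_batch(["https://example.com"], Path("out"), css_selector="main") would target out/example.com.md. Query strings are dropped from the file name, so distinct URLs can collide; a real job may want a hash suffix.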
Boundaries (vs Playwright / Search)
vs playwright
- scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
- playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)
If you must navigate or click through a UI, use playwright.
If you can directly fetch the target page and just need extraction, use scrapling.
vs search tools
- Search tools are for discovering sources/URLs (query → result list → choose URLs).
- scrapling is for acquisition + extraction once you already know the URL(s).
A common pipeline:
- Search → find candidate URLs
- Scrapling → extract focused content from chosen URLs
- LLM → summarize / transform / analyze extracted outputs
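The pipeline above amounts to handing each stage's output to the next. A sketch in Python, where run_pipeline and the injected search, extract and summarize callables are hypothetical placeholders (for instance, extract could shell out to scrapling extract get and read the resulting file):

```python
from typing import Callable, Dict, List


def run_pipeline(
    query: str,
    search: Callable[[str], List[str]],
    extract: Callable[[str], str],
    summarize: Callable[[Dict[str, str]], str],
    max_urls: int = 3,
) -> str:
    """Search -> extract -> summarize, with every stage supplied by the caller."""
    # Discovery: the search tool turns a query into candidate URLs.
    candidates = search(query)
    # Choose URLs: drop duplicates (keeping order) and cap how many get scraped.
    chosen = list(dict.fromkeys(candidates))[:max_urls]
    # Acquisition + extraction: one focused extract per chosen URL.
    extracts = {url: extract(url) for url in chosen}
    # Transformation: the LLM stage only sees the focused extracts.
    return summarize(extracts)
```

Injecting the stages keeps the glue logic testable without network access or a model call.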
Prerequisite check (required)
- Python version (Scrapling requires Python >= 3.10):
python --version
- Scrapling CLI availability:
scrapling --help
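Both checks can also be scripted, for example as a preflight step in a batch job. This is a sketch: python_ok, scrapling_on_path and prerequisite_report are hypothetical helpers, and they rely only on the 3.10 minimum and the scrapling executable name stated above.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum stated in the prerequisite check above


def python_ok(version_info=None) -> bool:
    """True if the given (or running) interpreter version is at least 3.10."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def scrapling_on_path() -> bool:
    """True if a `scrapling` executable can be found on PATH."""
    return shutil.which("scrapling") is not None


def prerequisite_report() -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if not python_ok():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found; Scrapling requires Python >= 3.10")
    if not scrapling_on_path():
        problems.append('scrapling CLI not found; try: python -m pip install "scrapling[ai]"')
    return problems
```

Note that this checks the interpreter running the script, which may differ from whichever python is first on PATH.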
Installation (recommended)
Scrapling’s CLI and MCP features are enabled via extras.
Recommended (CLI + MCP + fetchers):
python -m pip install "scrapling[ai]"
If you only want CLI fetch/extract without MCP:
python -m pip install "scrapling[fetchers]"
If you use browser-based fetchers, you may need browser binaries:
# Option A: via Scrapling helper (after install)
scrapling install
# Option B: directly via Playwright
python -m playwright install
Wrapper script (Windows convenience)
This skill ships a thin PowerShell wrapper:
C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1
It checks whether scrapling exists and prints install hints if missing.
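The PowerShell wrapper itself is not reproduced here. Outside Windows, a rough Python analogue of the behavior described (look for scrapling, print install hints if it is missing, otherwise forward the arguments) could look like this; run_scrapling and INSTALL_HINTS are hypothetical, and the 127 exit code is a shell convention, not something the shipped wrapper is known to use.

```python
import shutil
import subprocess
import sys

INSTALL_HINTS = (
    "scrapling was not found on PATH. Install it with one of:\n"
    '  python -m pip install "scrapling[ai]"        # CLI + MCP + fetchers\n'
    '  python -m pip install "scrapling[fetchers]"  # CLI fetch/extract without MCP'
)


def run_scrapling(args, which=shutil.which) -> int:
    """Forward args to the scrapling CLI, or print install hints and return 127."""
    exe = which("scrapling")
    if exe is None:
        print(INSTALL_HINTS, file=sys.stderr)
        return 127  # common shell convention for "command not found"
    return subprocess.run([exe, *args], check=False).returncode
```

The which parameter exists only so the lookup can be swapped out in tests.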
Common CLI patterns
1) Extract full page body (to Markdown)
scrapling extract get "https://example.com" out.md
2) Extract a specific element (CSS selector) to text
scrapling extract get "https://example.com" out.txt --css-selector "main article"
3) Extract HTML for downstream parsing
scrapling extract get "https://example.com" out.html --css-selector "#content"
4) Use browser-backed fetcher mode (when a simple GET is blocked or the page is rendered dynamically)
scrapling extract fetch "https://example.com" out.md --css-selector "main"
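When these commands are generated from code, building the argument list directly avoids shell-quoting problems with selectors such as "main article". The sketch below uses a hypothetical build_extract_command helper that only emits the get and fetch subcommands and the --css-selector flag shown above, and it assumes (as the examples suggest) that the output file's extension selects the format.

```python
from typing import List, Optional

# Output formats used in the patterns above; the extension selects the format.
SUPPORTED_EXTENSIONS = (".md", ".txt", ".html")
# Fetcher modes used in this skill: plain HTTP GET vs browser-backed fetch.
SUPPORTED_MODES = ("get", "fetch")


def build_extract_command(
    url: str,
    output_file: str,
    css_selector: Optional[str] = None,
    mode: str = "get",
) -> List[str]:
    """Return argv for `scrapling extract <mode> URL OUTPUT [--css-selector SEL]`."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {SUPPORTED_MODES}")
    if not output_file.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError(f"output file must end with one of {SUPPORTED_EXTENSIONS}")
    cmd = ["scrapling", "extract", mode, url, output_file]
    if css_selector:
        cmd += ["--css-selector", css_selector]
    return cmd
```

Passing the result to subprocess.run(cmd, check=True) runs it without going through a shell.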
Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.
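One way to apply this tip is to compact and cap an extracted file before it goes into a prompt. In this sketch, compact_text and smallest_snippet are hypothetical helpers, and the 4000-character default budget is an arbitrary assumption; a character count is only a rough proxy for tokens.

```python
from pathlib import Path


def compact_text(text: str, max_chars: int = 4000) -> str:
    """Collapse repeated blank lines and cap the result at max_chars characters."""
    compact = []
    for line in (raw.rstrip() for raw in text.splitlines()):
        # Keep non-empty lines, plus a single blank line after a non-empty one.
        if line or (compact and compact[-1]):
            compact.append(line)
    result = "\n".join(compact).strip()
    if len(result) <= max_chars:
        return result
    # Cut at the last newline inside the budget so no line is split midway.
    cut = result.rfind("\n", 0, max_chars)
    return result[: cut if cut > 0 else max_chars]


def smallest_snippet(path: Path, max_chars: int = 4000) -> str:
    """Read an extracted output file (e.g. out.md) and compact it for a prompt."""
    return compact_text(path.read_text(encoding="utf-8"), max_chars)
```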
MCP server relationship (optional)
Scrapling can run as an MCP server. This is useful when:
- the agent needs tool-style scraping calls
- you want scraping results to be structured and deterministic
Start MCP server (stdio transport by default):
scrapling mcp
Optional: run MCP server with HTTP transport:
scrapling mcp --http --host 127.0.0.1 --port 8765
Example MCP server config snippet
{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}
Safety & ops notes
- Prefer selector-based extraction to minimize data volume.
- Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
- For aggressive bot protection, consider switching fetchers or using playwright.
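Handling timeouts, retries, and failures explicitly can be as simple as wrapping each CLI call. A sketch, assuming the command is an argv list such as ["scrapling", "extract", "get", URL, "out.md"]; run_with_retries and its defaults (3 attempts, 60-second timeout, exponential backoff from 2 seconds) are hypothetical choices, not Scrapling settings.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    timeout_s: float = 60.0,
    backoff_s: float = 2.0,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run a command with a per-attempt timeout, retrying failures with exponential backoff."""
    last_error: Exception = RuntimeError("no attempts were made")
    for attempt in range(1, attempts + 1):
        try:
            result = runner(list(cmd), timeout=timeout_s, capture_output=True, text=True)
            if result.returncode == 0:
                return result
            stderr = (result.stderr or "").strip()
            last_error = RuntimeError(f"exit code {result.returncode}: {stderr}")
        except subprocess.TimeoutExpired as exc:
            # subprocess.run kills the child before raising, so a retry starts clean.
            last_error = exc
        if attempt < attempts:
            # Wait backoff_s, then 2x, then 4x ... between attempts.
            sleep(backoff_s * 2 ** (attempt - 1))
    raise last_error
```

A missing executable (FileNotFoundError) is deliberately not retried, since waiting will not fix it.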