Vibe-Skills scrapling

CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).

install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/scrapling" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-scrapling && rm -rf "$T"
manifest: bundled/skills/scrapling/SKILL.md
source content

Scrapling Skill (VCO)

Scrapling is a Python-based web scraping / extraction toolkit that exposes:

  • a CLI (scrapling ...) for fetching and extracting content into files
  • an optional MCP server (scrapling mcp) so an agent can call structured scraping tools

This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).

When to use

Use scrapling when you need to:

  • Extract specific parts of a web page (CSS selector / XPath) into .txt / .md / .html
  • Run repeatable scraping jobs (batch URLs with a small wrapper script)
  • Reduce token usage by extracting only the relevant DOM region before passing to the LLM
  • Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)
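
The batch case in the list above can be covered with a few lines of Python. This is a minimal sketch, not something the skill ships: the helper names (output_path_for, build_extract_command, run_batch) and the file-naming scheme are hypothetical, and it assumes the scrapling CLI is on PATH and reports failure through a nonzero exit code. It uses only the scrapling extract get form and the --css-selector flag shown in this document.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse


def output_path_for(url: str, out_dir: Path, ext: str = "md") -> Path:
    """Derive a filesystem-safe file name from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Replace every run of characters that is unsafe in file names with "_".
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw) or "page"
    return out_dir / f"{slug}.{ext}"


def build_extract_command(
    url: str, out_file: Path, css_selector: Optional[str] = None
) -> List[str]:
    """Build the argument list for `scrapling extract get URL FILE`."""
    cmd = ["scrapling", "extract", "get", url, str(out_file)]
    if css_selector:
        cmd += ["--css-selector", css_selector]
    return cmd


def run_batch(
    urls: Iterable[str],
    out_dir: Path,
    css_selector: Optional[str] = None,
    ext: str = "md",
) -> Dict[str, int]:
    """Run one extraction per URL and return {url: exit_code}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, int] = {}
    for url in urls:
        out_file = output_path_for(url, out_dir, ext)
        cmd = build_extract_command(url, out_file, css_selector)
        # Assumption: a nonzero exit code means the extraction failed.
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results[url] = completed.returncode
    return results
```

Calling run_batch(["https://example.com"], Path("out"), css_selector="main") would write out/example.com.md and return a mapping from each URL to its exit code. If the scrapling executable is missing, subprocess.run raises FileNotFoundError, so run the prerequisite check first.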

Boundaries (vs Playwright / Search)

vs playwright

  • scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
  • playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)

If you must navigate or click through a UI, use playwright. If you can directly fetch the target page and just need extraction, use scrapling.

vs search tools

  • Search tools are for discovering sources/URLs (query → result list → choose URLs).
  • scrapling is for acquisition + extraction once you already know the URL(s).

A common pipeline:

  1. Search → find candidate URLs
  2. Scrapling → extract focused content from chosen URLs
  3. LLM → summarize / transform / analyze extracted outputs

Prerequisite check (required)

  1. Python version (Scrapling requires Python >= 3.10):
python --version
  2. Scrapling CLI availability:
scrapling --help
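
The two checks above can also be scripted so that a batch job fails early with a readable message. This is a minimal sketch under stated assumptions: check_prerequisites is a hypothetical helper, and it inspects the interpreter that runs the script, which may differ from the interpreter Scrapling was installed into.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

MIN_PYTHON = (3, 10)  # minimum version stated in this document


def check_prerequisites(
    version_info: Sequence[int] = sys.version_info,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python >= 3.10 is required, found {found}")
    if which("scrapling") is None:
        problems.append(
            'scrapling CLI not found on PATH; try: python -m pip install "scrapling[ai]"'
        )
    return problems
```

The version and the PATH lookup are parameters only so the function can be exercised without changing the environment; calling check_prerequisites() with no arguments checks the current interpreter and PATH.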

Installation (recommended)

Scrapling’s CLI and MCP features are enabled via extras.

Recommended (CLI + MCP + fetchers):

python -m pip install "scrapling[ai]"

If you only want CLI fetch/extract without MCP:

python -m pip install "scrapling[fetchers]"

If you use browser-based fetchers, you may need browser binaries:

# Option A: via Scrapling helper (after install)
scrapling install

# Option B: directly via Playwright
python -m playwright install

Wrapper script (Windows convenience)

This skill ships a thin PowerShell wrapper (the path below is from the author's machine; adjust it to your own skills directory):

  • C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1

It checks whether the scrapling command is available and prints install hints if it is missing.

Common CLI patterns

1) Extract full page body (to Markdown)

scrapling extract get "https://example.com" out.md

2) Extract a specific element (CSS selector) to text

scrapling extract get "https://example.com" out.txt --css-selector "main article"

3) Extract HTML for downstream parsing

scrapling extract get "https://example.com" out.html --css-selector "#content"

4) Use browser-backed fetcher mode (when a simple GET is blocked or the page is rendered dynamically)

scrapling extract fetch "https://example.com" out.md --css-selector "main"

Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.
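
Patterns 1 to 4 can be combined into a "try the simple GET first, fall back to the browser-backed fetcher" helper. The sketch below is an illustration, not part of the skill: extract_with_fallback and the injectable runner are hypothetical, and it assumes that a failed or blocked extraction shows up as a nonzero exit code or an empty output file, which the source does not confirm.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def _default_runner(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def extract_with_fallback(
    url: str,
    out_file: Path,
    css_selector: Optional[str] = None,
    runner: Runner = _default_runner,
) -> str:
    """Try `extract get`, then `extract fetch`; return the mode that worked."""
    for mode in ("get", "fetch"):
        # Remove stale output so an old file is not mistaken for success.
        out_file.unlink(missing_ok=True)
        cmd = ["scrapling", "extract", mode, url, str(out_file)]
        if css_selector:
            cmd += ["--css-selector", css_selector]
        exit_code = runner(cmd)
        produced = out_file.exists() and out_file.stat().st_size > 0
        # Assumption: success means exit code 0 and a non-empty output file.
        if exit_code == 0 and produced:
            return mode
    raise RuntimeError(f"both get and fetch failed for {url}")
```

The returned mode ("get" or "fetch") is worth logging, since pages that regularly need fetch are the slower, browser-backed ones.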

MCP server relationship (optional)

Scrapling can run as an MCP server. This is useful when:

  • the agent needs tool-style scraping calls
  • you want scraping results to be structured and deterministic

Start MCP server (stdio transport by default):

scrapling mcp

Optional: run MCP server with HTTP transport:

scrapling mcp --http --host 127.0.0.1 --port 8765

Example MCP server config snippet

{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}
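
If the config file already lists other servers, the entry can be merged in programmatically instead of edited by hand. This is a rough sketch: the key names ("servers", "mode", "command", "args", "required") are copied from the snippet above, the helper names are hypothetical, and which MCP client reads this schema is not stated in the source, so check your client's documentation before relying on it.

```python
import copy
import json

# Entry copied from the example snippet above (without the free-text note).
SCRAPLING_SERVER = {
    "mode": "stdio",
    "command": "scrapling",
    "args": ["mcp"],
    "required": False,
}


def add_scrapling_server(config: dict) -> dict:
    """Return a copy of config with a 'scrapling' server entry added.

    Existing servers, including an existing 'scrapling' entry, are left untouched.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("servers", {})
    servers.setdefault("scrapling", copy.deepcopy(SCRAPLING_SERVER))
    return merged


def render_config(config: dict) -> str:
    """Serialize the merged config as indented JSON."""
    return json.dumps(add_scrapling_server(config), indent=2)
```

For example, render_config({}) produces a JSON document containing only the scrapling entry, while an input that already defines other servers keeps them unchanged.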

Safety & ops notes

  • Prefer selector-based extraction to minimize data volume.
  • Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
  • For aggressive bot protection, consider switching fetchers or using playwright.
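
The advice on timeouts, retries, and failures can be made concrete with a small wrapper around the CLI call. The following is a sketch under assumptions: run_with_retries is a hypothetical helper, the default attempt count, timeout, and backoff values are illustrative, and a nonzero exit code is assumed to mean failure.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    timeout_s: float = 60.0,
    base_delay_s: float = 1.0,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run cmd with a per-attempt timeout and exponential backoff between tries.

    Returns the completed process on success; raises RuntimeError otherwise.
    """
    last_error = "no attempts made"
    for attempt in range(1, attempts + 1):
        try:
            completed = run(
                list(cmd), capture_output=True, text=True, timeout=timeout_s
            )
            if completed.returncode == 0:
                return completed
            last_error = f"exit code {completed.returncode}"
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {timeout_s}s"
        if attempt < attempts:
            # Wait base, 2x base, 4x base, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts: {last_error}")
```

A call such as run_with_retries(["scrapling", "extract", "get", "https://example.com", "out.md"]) then either returns a successful result or raises one explicit error that the caller can log or handle.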