Agent-toolkit web-to-markdown
Use ONLY when the user explicitly says: 'use the skill web-to-markdown ...' (or 'use a skill web-to-markdown ...'). Converts webpage URLs to clean Markdown by calling the local web2md CLI (Puppeteer + Readability), suitable for JS-rendered pages.
install
source · Clone the upstream repo
git clone https://github.com/softaworks/agent-toolkit
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/softaworks/agent-toolkit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/web-to-markdown/skills/web-to-markdown" ~/.claude/skills/softaworks-agent-toolkit-web-to-markdown && rm -rf "$T"
manifest: dist/plugins/web-to-markdown/skills/web-to-markdown/SKILL.md
web-to-markdown
Convert web pages to clean Markdown by driving a locally installed browser (via web2md).
Hard trigger gate (must enforce)
This skill MUST NOT be used unless the user explicitly wrote a phrase like one of these:
- use the skill web-to-markdown ...
- use a skill web-to-markdown ...
If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including:
use the skill web-to-markdown.
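A minimal sketch of the gate as a shell check, assuming the agent can hand the user's message to a shell function. The name has_trigger is hypothetical, and matching case-insensitively (so "Use the skill ..." at the start of a sentence also passes) is an assumption the skill text does not state.

```shell
#!/usr/bin/env bash
# Hypothetical gate check: succeeds (exit 0) only if the message contains
# "use the skill web-to-markdown" or "use a skill web-to-markdown".
# Case-insensitive matching is an assumption, not part of the skill text.
has_trigger() {
  printf '%s\n' "$1" | grep -qiE 'use (the|a) skill web-to-markdown'
}

msg='Use the skill web-to-markdown on https://example.com'
if has_trigger "$msg"; then
  echo "explicit trigger found: proceed"
else
  echo "stop: ask the user to re-issue the request including 'use the skill web-to-markdown'"
fi
```

A request such as "convert https://example.com to markdown" fails the check, so the agent stops and asks for the explicit phrase.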
What this skill does
- Handles JS-rendered pages (Puppeteer drives the user's locally installed Chrome).
- Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via puppeteer-core.
- Extracts main content (Readability).
- Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.
Non-goals
- Do not use Playwright or other browser automation stacks; the mechanism is web2md.
Inputs you should collect (ask only if missing)
- url (or a list of URLs)
- Output preference:
  - Print to stdout (--print), OR
  - Save to a file (--out ./file.md), OR
  - Save to a directory (--out ./some-dir/ to auto-name by page title)
- Optional rendering controls for tricky pages:
  - --chrome-path <path> (if Chrome auto-detection fails)
  - --interactive (show Chrome and pause so the user can complete human checks/login, then press Enter)
  - --wait-until load|domcontentloaded|networkidle0|networkidle2
  - --wait-for '<css selector>'
  - --wait-ms <milliseconds>
  - --headful (debug)
  - --no-sandbox (sometimes required in containers/CI)
  - --user-data-dir <dir> (login/session; use a dedicated profile directory)
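The three output preferences map onto web2md's --print and --out flags. Below is a hedged sketch of that mapping as a shell wrapper; convert_one and the WEB2MD override (used here for dry runs) are hypothetical names, and treating a trailing slash as "directory" simply follows the --out ./some-dir/ form listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: pick web2md's output flag from the user's preference.
#   ""                  -> --print          (Markdown on stdout)
#   path ending in "/"  -> --out <dir>/     (directory created; file auto-named by page title)
#   any other path      -> --out <file>     (explicit file)
# Extra arguments (rendering controls such as --wait-until networkidle2) are
# passed through before the output flag. WEB2MD overrides the command name
# (assumption, handy for dry runs); it defaults to web2md.
convert_one() {
  if [ "$#" -lt 2 ]; then
    echo "usage: convert_one URL TARGET [web2md options...]" >&2
    return 2
  fi
  local url="$1" target="$2"
  shift 2
  local bin="${WEB2MD:-web2md}"
  case "$target" in
    "") "$bin" "$url" "$@" --print ;;
    */) mkdir -p "$target" && "$bin" "$url" "$@" --out "$target" ;;
    *)  "$bin" "$url" "$@" --out "$target" ;;
  esac
}

# Dry run: with WEB2MD=echo the arguments are printed instead of executed.
WEB2MD=echo convert_one 'https://example.com' ./page.md --wait-until networkidle2
```

Passing an empty string as the target selects stdout, so the caller never has to remember which flag belongs to which preference.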
Workflow
- Confirm the user explicitly invoked the skill (use the skill web-to-markdown).
- Validate that each URL starts with http:// or https://.
- Ensure web2md is installed:
  - Run:
    command -v web2md
  - If missing, instruct the user to install it (assume the project exists at ~/workspace/softaworks/projects/web2md):
    cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link
  - Or:
    cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .
- Convert:
  - Single URL → file:
    web2md '<url>' --out ./page.md
  - Single URL → auto-named file in directory:
    mkdir -p ./out && web2md '<url>' --out ./out/
  - Human verification / login walls (interactive):
    mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/
    Then complete the check in the browser window and press Enter in the terminal to continue.
  - Print to stdout:
    web2md '<url>' --print
  - Multiple URLs (batch):
    - Create an output directory (e.g. ./out/), then run one web2md command per URL using --out ./out/.
- Validate output:
  - If writing files, verify they exist and are non-empty (e.g. ls -la <path> and wc -c <path>).
- Return:
  - The saved file path(s), or the Markdown (stdout mode).
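The batch path of the workflow can be sketched as one shell function that strings the steps together: URL validation, the install check, one web2md call per URL into a shared directory, and the non-empty check. This is a hedged sketch, not part of web2md; is_http_url, convert_batch and the WEB2MD override (which lets a stub stand in for web2md) are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper following the workflow steps. WEB2MD overrides the
# command name (an assumption, handy for dry runs with a stub); default: web2md.

# "Validate URL(s)": accept only http:// or https:// URLs.
is_http_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: convert_batch OUTDIR URL...
# Returns non-zero if web2md is missing, any URL is rejected or fails to
# convert, OUTDIR ends up with no files, or any file in OUTDIR is empty.
convert_batch() {
  [ "$#" -ge 1 ] || { echo "usage: convert_batch OUTDIR URL..." >&2; return 2; }
  local outdir="$1"; shift
  local bin="${WEB2MD:-web2md}"
  local url f status=0 count=0

  # "Ensure web2md is installed".
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "web2md not found; install it first" >&2
    return 1
  fi

  # "Multiple URLs (batch)": one web2md command per URL, all into OUTDIR/.
  mkdir -p "$outdir" || return 1
  for url in "$@"; do
    if ! is_http_url "$url"; then
      echo "skipping (not http/https): $url" >&2
      status=1
      continue
    fi
    "$bin" "$url" --out "$outdir/" || status=1
  done

  # "Validate output": files must exist and be non-empty (size shown via wc -c).
  for f in "$outdir"/*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    count=$((count + 1))
    if [ -s "$f" ]; then
      wc -c "$f"
    else
      echo "empty output: $f" >&2
      status=1
    fi
  done
  [ "$count" -gt 0 ] || status=1
  return "$status"
}

# Example (needs web2md on PATH):
# convert_batch ./out 'https://example.com/a' 'https://example.com/b'
```

The wc -c lines double as the list of saved paths to return to the user.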
Defaults (recommended)
- For most pages: --wait-until networkidle2
- For heavy apps: start with --wait-until domcontentloaded --wait-ms 2000, then add --wait-for 'main' (or another stable selector) if needed.
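The two recommendations can be chained into a fallback when the page type is unknown. This is a hedged sketch: the escalation order (the most-pages setting first, then the heavy-app setting, then adding --wait-for 'main') is an assumption built from the defaults, as is treating a non-zero exit or a missing/empty file as failure. attempt_web2md, convert_with_fallback and the WEB2MD override are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical fallback sketch built from the recommended defaults.
# WEB2MD overrides the command name (assumption); default: web2md.

# One attempt: drop any stale output, run web2md, require a non-empty file.
attempt_web2md() {
  local out="$1" url="$2"
  shift 2
  rm -f "$out"
  "${WEB2MD:-web2md}" "$url" "$@" --out "$out" && [ -s "$out" ]
}

# Usage: convert_with_fallback URL OUTFILE  (OUTFILE is a file path, not a directory)
convert_with_fallback() {
  local url="$1" out="$2"
  # Most pages.
  attempt_web2md "$out" "$url" --wait-until networkidle2 && return 0
  # Heavy apps: DOMContentLoaded plus a fixed 2000 ms wait.
  attempt_web2md "$out" "$url" --wait-until domcontentloaded --wait-ms 2000 && return 0
  # If that also fails: wait for a stable selector too ('main' here; adjust per site).
  attempt_web2md "$out" "$url" --wait-until domcontentloaded --wait-ms 2000 --wait-for 'main'
}

# Example (needs web2md on PATH):
# convert_with_fallback 'https://example.com' ./page.md
```

Note that the non-empty check only proves web2md wrote something; a page that renders a login wall can still produce a short, non-empty file, which is where --interactive comes in.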