Fetch-everything web-content-fetcher

install

source · Clone the upstream repo

git clone https://github.com/liangdabiao/fetch-everything

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/liangdabiao/fetch-everything "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/web-content-fetcher-main" ~/.claude/skills/liangdabiao-fetch-everything-web-content-fetcher && rm -rf "$T"

manifest: .claude/skills/web-content-fetcher-main/SKILL.md

source content

Web Content Fetcher: web page main-content extraction

Capabilities

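Given a URL, returns the main content as clean Markdown, preserving:

Heading levels (# ## ###)
Hyperlinks ([text](url))
Images (![alt](url))
Lists, code blocks, and block quotes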

Extraction strategy (three-tier fallback)

URL
 ↓
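1. Jina Reader (preferred)
   web_fetch("https://r.jina.ai/<url>", maxChars=30000)
   Pros: fast (~1.5 s), clean formatting
   Limit: free quota of 200 requests/day
   Fails on: WeChat Official Accounts (403), some platforms in mainland China
 ↓
2. Scrapling + html2text (when Jina is over quota or fails)
   exec: python3 scripts/fetch.py <url> 30000
   Pros: no quota, output quality comparable to Jina, can read WeChat Official Account articles
   Best for: mp.weixin.qq.com, Substack, Medium, and other platforms with anti-scraping measures
 ↓
3. Direct web_fetch (fallback for static pages)
   web_fetch(url, maxChars=30000)
   Best for: GitHub READMEs, ordinary static blogs, technical documentation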
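
The three-tier chain above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `jina_url` and `fetch_with_fallback` are hypothetical, and each tier is passed in as a plain callable so that the sketch does not assume anything about how `web_fetch` or `scripts/fetch.py` behave internally. Treating an exception or an empty result as "failure" is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns Markdown text (hypothetical interface).
Fetcher = Callable[[str], str]

JINA_PREFIX = "https://r.jina.ai/"
MAX_CHARS = 30000  # same limit as in the strategy above


def jina_url(url: str) -> str:
    """Build the Jina Reader URL by prefixing the target URL."""
    return JINA_PREFIX + url


def fetch_with_fallback(
    url: str,
    tiers: Sequence[Tuple[str, Fetcher]],
    max_chars: int = MAX_CHARS,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each tier in order and return (tier_name, markdown) from the
    first one that succeeds, or (None, None) if every tier fails."""
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception:
            # Assumption: any exception (HTTP 403, quota exhausted, ...)
            # means "fall through to the next tier".
            continue
        if text and text.strip():
            return name, text[:max_chars]
    return None, None
```

A caller would supply the tiers in the order Jina Reader, Scrapling, direct `web_fetch`, each wrapped in whatever function actually performs the request.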

Domain shortcut routing

Skip Jina entirely for these domains to save quota:

`mp.weixin.qq.com` → go straight to Scrapling

`zhuanlan.zhihu.com`, `juejin.cn`, `csdn.net` → try Scrapling first
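
One way to express this routing is a small hostname lookup. The sketch below is an assumption about how it could be done, not code from the skill: the function name `first_tier` is hypothetical, and matching subdomains (for example `blog.csdn.net`) is a guess that the list above does not confirm. The list also does not say how "go straight to" and "try first" differ in practice, so the sketch only decides which tier to start with.

```python
from urllib.parse import urlparse

# Domains taken from the routing list above.
SCRAPLING_ONLY = ("mp.weixin.qq.com",)
SCRAPLING_FIRST = ("zhuanlan.zhihu.com", "juejin.cn", "csdn.net")


def _matches(host: str, domain: str) -> bool:
    """True for the domain itself or any subdomain of it (assumed behavior)."""
    return host == domain or host.endswith("." + domain)


def first_tier(url: str) -> str:
    """Return the name of the tier to try first for this URL."""
    host = (urlparse(url).hostname or "").lower()
    if any(_matches(host, d) for d in SCRAPLING_ONLY + SCRAPLING_FIRST):
        return "scrapling"
    return "jina"
```

For instance, `first_tier("https://mp.weixin.qq.com/s/xxx")` selects Scrapling, while an unlisted host such as `example.com` still starts with Jina Reader.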

Usage

Automatic mode (recommended)

Just tell me the URL you want read, and I will pick a suitable method automatically:

Read this article for me: https://example.com/article

Specifying the method manually

Read with Scrapling: https://mp.weixin.qq.com/s/xxx

Installing dependencies

# Install the base dependencies (including the fetchers extra)
pip install "scrapling[fetchers]" html2text --break-system-packages

# Install browser dependencies (must be run before first use)
scrapling install

Script path

scripts/fetch.py: the Scrapling + html2text extraction script

Invocation:

python3 ~/.openclaw/workspace/skills/web-content-fetcher/scripts/fetch.py <url> [max_chars]
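
If the script needs to be driven from Python rather than a shell, a thin `subprocess` wrapper is one option. The sketch below is illustrative only: `build_command` and `run_fetch` are hypothetical names, the path is copied from the invocation above and may differ on your machine, and it assumes (without confirmation from the source) that the script prints the extracted Markdown to standard output and exits non-zero on failure.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Path copied from the invocation above; adjust to your install location.
SCRIPT = Path(
    "~/.openclaw/workspace/skills/web-content-fetcher/scripts/fetch.py"
).expanduser()


def build_command(url: str, max_chars: Optional[int] = None) -> List[str]:
    """Assemble the argument list; max_chars is optional, as in the usage line."""
    cmd = [sys.executable, str(SCRIPT), url]
    if max_chars is not None:
        cmd.append(str(max_chars))
    return cmd


def run_fetch(url: str, max_chars: Optional[int] = None,
              timeout: float = 120.0) -> str:
    """Run the script and return its stdout (assumed to be the Markdown)."""
    result = subprocess.run(
        build_command(url, max_chars),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout
```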

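Retry-loop prevention rule

Once the same URL has failed 2 times in total, give up, record it as "unable to extract", and do not retry it again.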
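
The rule above amounts to a per-URL failure counter. A minimal sketch follows, assuming a hypothetical `FailureTracker` class. Whether a "failure" means one tier failing or the whole fallback chain failing is not specified in the source, so the caller decides when to call `record_failure`.

```python
from collections import Counter

MAX_FAILURES = 2  # threshold stated in the rule above


class FailureTracker:
    """Counts cumulative failures per URL and reports when to stop retrying."""

    def __init__(self, limit: int = MAX_FAILURES) -> None:
        self.limit = limit
        self._failures: Counter = Counter()

    def record_failure(self, url: str) -> int:
        """Register one failed attempt and return the new total for this URL."""
        self._failures[url] += 1
        return self._failures[url]

    def should_give_up(self, url: str) -> bool:
        """True once the URL has reached the failure limit."""
        return self._failures[url] >= self.limit
```

Before each attempt the caller checks `should_give_up(url)`; once it returns `True`, the URL is reported as "unable to extract" instead of being fetched again.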