Skills web-fetcher
Smart web content fetcher - articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, etc. Triggers: '抓取文章', '下载网页', '保存文章', 'fetch URL', '下载视频', '抓取飞书文档', '抓取微信文章', '把这个链接内容保存下来', '下载B站视频', 'download video', 'scrape article'.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alexxxiong/web-fetcher" ~/.claude/skills/openclaw-skills-web-fetcher && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/alexxxiong/web-fetcher" ~/.openclaw/skills/openclaw-skills-web-fetcher && rm -rf "$T"
manifest:
skills/alexxxiong/web-fetcher/SKILL.md
Web Fetcher
Smart web content fetcher for Claude Code. Automatically detects platform and uses the best strategy to fetch articles or download videos.
Quick Start
```bash
# Fetch an article
python3 {SKILL_DIR}/fetcher.py "URL" -o ~/docs/

# Download a video
python3 {SKILL_DIR}/fetcher.py "https://b23.tv/xxx" -o ~/videos/

# Batch fetch from file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
```
Install Dependencies
Install only what you need — dependencies are checked at runtime:
| Dependency | Purpose | Install |
|---|---|---|
| scrapling | Article fetching (HTTP + browser) | `pip install scrapling` |
| yt-dlp | Video download | `pip install yt-dlp` |
| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | `pip install camoufox` |
| html2text | HTML to Markdown conversion | `pip install html2text` |
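Since dependencies are checked at runtime rather than required up front, a check like this can be sketched with `importlib`. The function name `check_dependency` matches the import shown in Manual Usage below; the exact implementation is an assumption:

```python
import importlib.util

def check_dependency(module_name: str) -> bool:
    """Return True if the Python module is importable (i.e. installed)."""
    return importlib.util.find_spec(module_name) is not None

# Only touch heavy optional deps when actually needed:
if check_dependency("yt_dlp"):
    pass  # safe to "import yt_dlp" here
```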
Smart Routing
The fetcher automatically detects the platform from the URL:
| Platform | Method | Notes |
|---|---|---|
| mp.weixin.qq.com | scrapling | Extracts images, handles SVG placeholders |
| *.feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |
| zhuanlan.zhihu.com | scrapling | Platform-specific content selector |
| www.zhihu.com | scrapling | Platform-specific content selector |
| www.toutiao.com | scrapling | Handles base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-bot protection requires stealth browser |
| www.weibo.com | camoufox | Anti-bot protection requires stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Unknown URLs | scrapling | Generic fetch with fallback tiers |
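The dispatch behind the table above can be sketched as a hostname-suffix lookup. `ROUTES` and `route_method` here are hypothetical names mirroring the table; the real `lib.router.route` returns a richer config (type, selector, post-processing step):

```python
from urllib.parse import urlparse

# Hypothetical routing table mirroring the one above.
ROUTES = {
    "mp.weixin.qq.com": "scrapling",
    "feishu.cn": "feishu",
    "zhuanlan.zhihu.com": "scrapling",
    "www.zhihu.com": "scrapling",
    "www.toutiao.com": "scrapling",
    "www.xiaohongshu.com": "camoufox",
    "www.weibo.com": "camoufox",
    "bilibili.com": "ytdlp",
    "b23.tv": "ytdlp",
    "youtube.com": "ytdlp",
    "youtu.be": "ytdlp",
    "douyin.com": "ytdlp",
}

def route_method(url: str) -> str:
    """Match the URL's hostname (or any parent domain) against ROUTES."""
    host = urlparse(url).hostname or ""
    for suffix, method in ROUTES.items():
        if host == suffix or host.endswith("." + suffix):
            return method
    return "scrapling"  # generic fallback tier for unknown URLs
```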
CLI Reference
```
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                     URL to fetch

Options:
  -o, --output DIR        Output directory (default: current)
  -q, --quality N         Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD         Force method: scrapling, camoufox, ytdlp, feishu
  --selector CSS          Force CSS selector for content extraction
  --urls-file FILE        File with URLs (one per line, # for comments)
  --audio-only            Extract audio only (video downloads)
  --no-images             Skip image download (articles)
  --cookies-browser NAME  Browser for cookies (e.g., chrome, firefox)
```
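The `--urls-file` format is one URL per line with `#` comment lines. A parser for that format might look like this (an illustrative sketch, not the skill's actual implementation):

```python
def parse_urls_file(text: str) -> list[str]:
    """One URL per line; blank lines and lines starting with '#' are skipped."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```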
Platform Notes
WeChat (mp.weixin.qq.com)
- Images use the `data-src` attribute with `mmbiz.qpic.cn` URLs
- Visible `<img>` tags contain SVG placeholders (lazy loading)
- Image download requires the `Referer: https://mp.weixin.qq.com/` header
- A plain Scrapling GET usually works; no browser needed
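Promoting the real `data-src` URL over the SVG placeholder can be sketched with a regex pass. This is a hypothetical helper, and the actual skill may use a DOM parser instead:

```python
import re

def promote_lazy_images(html: str) -> str:
    """Drop WeChat's SVG placeholder src, then rename data-src to src."""
    # Remove the placeholder src="data:image/svg..." attribute.
    html = re.sub(r'(<img[^>]*?)\ssrc="data:image/svg[^"]*"', r"\1", html)
    # Promote the lazy-load attribute to the real src.
    return html.replace("data-src=", "src=")

# Fetching the resulting mmbiz.qpic.cn URL then needs the Referer header,
# e.g. headers={"Referer": "https://mp.weixin.qq.com/"}.
```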
Feishu (*.feishu.cn)
- Uses virtual scrolling: content blocks are rendered on demand
- The fetcher scrolls through the entire document, collecting `[data-block-id]` elements
- Images require an authenticated fetch (cookies) and are downloaded via the browser's fetch API
- May show "Unable to print" artifacts, which are auto-cleaned
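Because scrolling yields overlapping batches of blocks, they have to be merged by `data-block-id` while preserving document order. A sketch of that dedup step (hypothetical; the real implementation works against live DOM nodes in the browser):

```python
def merge_block_batches(batches: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Merge (block_id, text) pairs collected at successive scroll
    positions, keeping the first occurrence of each block in order."""
    seen: set[str] = set()
    merged: list[tuple[str, str]] = []
    for batch in batches:
        for block_id, text in batch:
            if block_id not in seen:
                seen.add(block_id)
                merged.append((block_id, text))
    return merged
```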
Bilibili
- Short links (b23.tv) are auto-resolved
- For premium/member content, use `--cookies-browser chrome`
- Default quality is 1080p, adjustable with `-q`
Troubleshooting
| Problem | Solution |
|---|---|
| Article content too short | Try `--method camoufox` for JS-heavy pages |
| Feishu returns login page | The doc may require authentication; sign in via the browser first |
| Bilibili 403 | Use `--cookies-browser chrome` |
| Image download fails | Check network; WeChat images need the Referer header (auto-handled) |
Manual Usage
When the CLI doesn't fit your needs, use the modules directly:
```python
from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

# Route a URL
r = route("https://mp.weixin.qq.com/s/xxx")
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch article
fetch_article(url, output_dir="/tmp/out", route_config=r)

# Download video
fetch_video(url, output_dir="/tmp/out", quality="720")

# Fetch Feishu doc
fetch_feishu(url, output_dir="/tmp/out")
```