Cli-power-skills web-research
Use when extracting article text from news URLs or downloading video or audio with metadata
install
source · Clone the upstream repo
git clone https://github.com/ykotik/cli-power-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ykotik/cli-power-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/web-research" ~/.claude/skills/ykotik-cli-power-skills-web-research && rm -rf "$T"
manifest:
skills/web-research/SKILL.mdsource content
Web Research
When to Use
- Extracting clean article text and metadata from a news/blog URL
- Downloading video or audio from supported sites with full metadata
- Getting video/audio metadata without downloading the media
Tools
| Tool | Purpose | Structured output |
|---|---|---|
| newspaper4k | Extract article text, title, authors, date from URLs | Python object (access , , ) |
| yt-dlp | Download video/audio from 1000+ sites | for metadata JSON |
Patterns
Extract article text from a URL
python3 -c " from newspaper import Article a = Article('https://example.com/news/article') a.download() a.parse() import json print(json.dumps({'title': a.title, 'authors': a.authors, 'date': str(a.publish_date), 'text': a.text[:2000]}, indent=2)) "
Extract multiple articles
for url in "https://example.com/article1" "https://example.com/article2"; do python3 -c " from newspaper import Article import json a = Article('$url') a.download() a.parse() print(json.dumps({'url': '$url', 'title': a.title, 'text': a.text[:1000]})) " done
Get video metadata without downloading
yt-dlp --dump-json "https://www.youtube.com/watch?v=VIDEO_ID" | jq '{title, duration, view_count, upload_date}'
Download audio only (best quality)
yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID"
Download video (best quality, specific format)
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]" "https://www.youtube.com/watch?v=VIDEO_ID"
Download with metadata and subtitles
yt-dlp --write-info-json --write-subs --sub-langs en "https://www.youtube.com/watch?v=VIDEO_ID"
List available formats for a video
yt-dlp -F "https://www.youtube.com/watch?v=VIDEO_ID"
Pipelines
Get video metadata → query with DuckDB
yt-dlp --dump-json "https://www.youtube.com/@channel/videos" --flat-playlist | head -20 > videos.jsonl duckdb -c "SELECT title, view_count, duration FROM read_json_auto('videos.jsonl') ORDER BY view_count DESC LIMIT 10"
Each stage: yt-dlp dumps playlist metadata as JSONL, DuckDB queries for top videos by views.
Prefer Over
- Prefer newspaper4k over curl + HTML parsing for article extraction — handles boilerplate removal, metadata extraction automatically
- Prefer yt-dlp over browser downloads — supports 1000+ sites, can extract metadata without downloading
Do NOT Use When
- Need to interact with a web page (click, scroll, fill forms) — use web-crawling skill (Playwright)
- Real-time search with the WebSearch tool available — prefer the built-in tool for conversational search
- Downloading copyrighted content without authorization
- Target site requires authentication — use web-crawling skill with login automation