Cli-power-skills web-research

Use when extracting article text from news URLs or downloading video or audio with metadata

install

source · Clone the upstream repo

git clone https://github.com/ykotik/cli-power-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ykotik/cli-power-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/web-research" ~/.claude/skills/ykotik-cli-power-skills-web-research && rm -rf "$T"

manifest: skills/web-research/SKILL.md

source content

Web Research

When to Use

Extracting clean article text and metadata from a news/blog URL
Downloading video or audio from supported sites with full metadata
Getting video/audio metadata without downloading the media

Tools

Tool Purpose Structured output

newspaper4k

Extract article text, title, authors, date from URLs

Python object (access

.title

.text

.authors

)

yt-dlp

Download video/audio from 1000+ sites

--dump-json

for metadata JSON

Patterns

Extract article text from a URL

python3 -c "
from newspaper import Article
a = Article('https://example.com/news/article')
a.download()
a.parse()
import json
print(json.dumps({'title': a.title, 'authors': a.authors, 'date': str(a.publish_date), 'text': a.text[:2000]}, indent=2))
"

Extract multiple articles

for url in "https://example.com/article1" "https://example.com/article2"; do
  python3 -c "
from newspaper import Article
import json
a = Article('$url')
a.download()
a.parse()
print(json.dumps({'url': '$url', 'title': a.title, 'text': a.text[:1000]}))
"
done

Get video metadata without downloading

yt-dlp --dump-json "https://www.youtube.com/watch?v=VIDEO_ID" | jq '{title, duration, view_count, upload_date}'

Download audio only (best quality)

yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID"

Download video (best quality, specific format)

yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]" "https://www.youtube.com/watch?v=VIDEO_ID"

Download with metadata and subtitles

yt-dlp --write-info-json --write-subs --sub-langs en "https://www.youtube.com/watch?v=VIDEO_ID"

List available formats for a video

yt-dlp -F "https://www.youtube.com/watch?v=VIDEO_ID"

Pipelines

Get video metadata → query with DuckDB

yt-dlp --dump-json "https://www.youtube.com/@channel/videos" --flat-playlist | head -20 > videos.jsonl
duckdb -c "SELECT title, view_count, duration FROM read_json_auto('videos.jsonl') ORDER BY view_count DESC LIMIT 10"

Each stage: yt-dlp dumps playlist metadata as JSONL, DuckDB queries for top videos by views.

Prefer Over

Prefer newspaper4k over curl + HTML parsing for article extraction — handles boilerplate removal, metadata extraction automatically
Prefer yt-dlp over browser downloads — supports 1000+ sites, can extract metadata without downloading

Do NOT Use When

Need to interact with a web page (click, scroll, fill forms) — use web-crawling skill (Playwright)
Real-time search with the WebSearch tool available — prefer the built-in tool for conversational search
Downloading copyrighted content without authorization
Target site requires authentication — use web-crawling skill with login automation