Awesome-omni-skill web-search
Web search and content extraction toolkit. Use for searching documentation, facts, current information, or extracting readable content from URLs. Supports multiple providers (ddgs keyless, brave_api with key), caching, and safe defaults. Prefer this over browser-tools when no interaction is needed.
install
source · Clone the upstream repo
```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
```
Claude Code · Install into ~/.claude/skills/
```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/web-search" ~/.claude/skills/diegosouzapw-awesome-omni-skill-web-search-e22863 && rm -rf "$T"
```
manifest:
skills/development/web-search/SKILL.md
Web Search Toolkit
Search the web and extract readable content. Stable CLI with JSON output for agents.
Setup
```shell
cd {baseDir}
uv sync  # Install dependencies (once)
```
Optional: Set `BRAVE_API_KEY` for better search reliability (ddgs is keyless but flaky).
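For example, the key can be exported before running any commands (the value below is a placeholder, not a real key):

```shell
# Hypothetical key value; per the setup note above, wstk reads
# BRAVE_API_KEY from the environment when the brave_api provider is used.
export BRAVE_API_KEY="your-key-here"
```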
Commands
Search
```shell
{baseDir}/.venv/bin/wstk search "query"                         # Default (10 results)
{baseDir}/.venv/bin/wstk search "query" -n 5 --plain            # URLs only, one per line
{baseDir}/.venv/bin/wstk search "query" --json                  # Machine-readable
{baseDir}/.venv/bin/wstk search "query" --time-range w          # Last week
{baseDir}/.venv/bin/wstk search "site:docs.python.org asyncio"  # Site-scoped
```
Key flags:
- `-n, --max-results <N>` — Number of results (default: 10)
- `--time-range <d|w|m|y>` — Filter by recency
- `--provider <ddgs|brave_api|auto>` — Search provider
- `--plain` — Output URLs only (for piping)
- `--json` — Structured output
Pipeline (search → extract)
```shell
{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --json
{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --plan --plain
```
Key flags:
- `--top-k <N>` — Search results to consider
- `--extract-k <N>` — Number of results to extract
- `--plan` — Return candidates without fetching
- `--method <http|browser|auto>` — Extraction method (default: http)
Extract (fetch + extract readable content)
```shell
{baseDir}/.venv/bin/wstk extract https://example.com --plain  # Markdown output
{baseDir}/.venv/bin/wstk extract https://example.com --text   # Plain text
{baseDir}/.venv/bin/wstk extract https://example.com --json   # Full metadata
{baseDir}/.venv/bin/wstk extract ./local-file.html --plain    # From file
```
Key flags:
- `--markdown` / `--text` / `--both` — Output format
- `--strategy <auto|readability|docs>` — Extraction strategy
- `--max-chars <N>` — Truncate output
- `--allow-domain <domain>` — Restrict to specific domains (safety)
Fetch (raw HTTP, no extraction)
```shell
{baseDir}/.venv/bin/wstk fetch https://example.com --json   # Metadata + status
{baseDir}/.venv/bin/wstk fetch https://example.com --plain  # Path to cached body
```
List providers
```shell
{baseDir}/.venv/bin/wstk providers --plain
```
Decision Guide
- `search` — when you need discovery or candidate URLs.
- `pipeline` — when you want a one-shot search → extract bundle.
- `fetch` — when you need HTTP metadata or the cached body path (no extraction).
- `extract` — when you want readable content from a URL or local HTML.
- `render` — when a page is JS-only or blocked (or use `extract --method browser` for one-step extraction).
Common Patterns
Search → extract top result:
```shell
url=$({baseDir}/.venv/bin/wstk search "python asyncio tutorial" --plain | head -1)
{baseDir}/.venv/bin/wstk extract "$url" --plain --max-chars 8000
```
Search with JSON for programmatic use:
```shell
{baseDir}/.venv/bin/wstk search "openai api reference" --json | jq '.data.results[0].url'
```
Safe extraction (restrict domains):
```shell
{baseDir}/.venv/bin/wstk extract https://docs.python.org/3/library/asyncio.html \
  --allow-domain docs.python.org --plain
```
Output Formats
- `--plain` — Stable text for piping (URLs for search, content for extract)
- `--json` — Structured envelope: `{ "ok": bool, "data": {...}, "error": {...} }`
- Default — Human-readable with colors
Agent Defaults
- Default to `--json` in agent wrappers; parse `ok`, `error.code`, and `warnings`.
- Surface concise diagnostics by relaying `error.message` and `error.details.reason` when `ok=false`.
- Use `--plain` only for piping, and add `--no-input` for non-interactive runs.
- Consider `--redact` when handling sensitive URLs or content.
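A minimal sketch of these defaults: check `ok` in the `--json` envelope and relay `error.message` on failure. The `wstk` function below is a stub emitting a hypothetical envelope purely for illustration (the real binary is `{baseDir}/.venv/bin/wstk`), and the exact field layout is assumed from the envelope shape described above:

```shell
# Stub standing in for {baseDir}/.venv/bin/wstk; emits a hypothetical
# failure envelope so the parsing flow can be shown end to end.
wstk() {
  echo '{"ok": false, "data": null, "error": {"code": "blocked", "message": "403 Forbidden", "details": {"reason": "cloudflare"}}}'
}

resp=$(wstk search "example query" --json)
case "$resp" in
  *'"ok": true'*)
    echo "search succeeded" ;;
  *)
    # ok=false: relay a concise diagnostic, per the defaults above
    msg=$(printf '%s' "$resp" | sed -n 's/.*"message": "\([^"]*\)".*/\1/p')
    echo "search failed: $msg" ;;   # prints: search failed: 403 Forbidden
esac
```

In a real wrapper, a proper JSON parser (e.g. `jq`, as used in Common Patterns) is preferable to `sed` pattern matching.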
Exit Codes
- `0` — Success
- `1` — Runtime failure (network, provider error)
- `2` — Invalid usage
- `3` — Not found / empty result
- `4` — Blocked / access denied
- `5` — Needs JS rendering (page is JS-only)
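These codes can drive retry logic. Below is a sketch of the exit-code-5 fallback; the `wstk` function is a stub that simulates the plain HTTP method failing with code 5 and the browser method succeeding, so the branching can be shown without network access. Real calls would go to `{baseDir}/.venv/bin/wstk`:

```shell
# Stub for {baseDir}/.venv/bin/wstk: exits 5 (needs JS rendering)
# unless --method browser is passed.
wstk() {
  case "$*" in
    *"--method browser"*) echo "rendered page content" ;;
    *) return 5 ;;  # simulate: plain HTTP fetch cannot render JS
  esac
}

status=0
body=$(wstk extract "https://example.com" --plain) || status=$?
if [ "$status" -eq 5 ]; then
  # Exit code 5 means the page needs JS rendering: retry via browser.
  status=0
  body=$(wstk extract "https://example.com" --method browser --plain) || status=$?
fi
echo "$body"   # prints: rendered page content
```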
Global Flags
- `--timeout <seconds>` — Network timeout
- `--no-cache` — Disable caching
- `--fresh` — Bypass cache reads (still writes)
- `--quiet` — Minimal output
- `--verbose` — Debug diagnostics to stderr
- `--policy <standard|strict|permissive>` — Safety defaults
References
- `references/troubleshooting.md` — 403/JS-only guidance and advanced fetch flags.
- `references/providers.md` — Provider selection and privacy notes.
- `docs/claude-code.md` — Claude Code wrapper usage.
When to Use
- Searching for documentation or API references
- Looking up facts or current information
- Extracting content from known URLs
- Any task requiring web search without interactive browsing
Prefer browser-tools when you need: JS interaction, form filling, clicking, or visual inspection.