Awesome-omni-skill web-search

Web search and content extraction toolkit. Use for searching documentation, facts, current information, or extracting readable content from URLs. Supports multiple providers (ddgs keyless, brave_api with key), caching, and safe defaults. Prefer this over browser-tools when no interaction is needed.

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/web-search" ~/.claude/skills/diegosouzapw-awesome-omni-skill-web-search-e22863 && rm -rf "$T"

manifest: skills/development/web-search/SKILL.md

source content

Web Search Toolkit

Search the web and extract readable content. Stable CLI with JSON output for agents.

Setup

cd {baseDir}
uv sync  # Install dependencies (once)

Optional: Set

BRAVE_API_KEY

for better search reliability (ddgs is keyless but flaky).

Commands

Search

{baseDir}/.venv/bin/wstk search "query"                    # Default (10 results)
{baseDir}/.venv/bin/wstk search "query" -n 5 --plain       # URLs only, one per line
{baseDir}/.venv/bin/wstk search "query" --json             # Machine-readable
{baseDir}/.venv/bin/wstk search "query" --time-range w     # Last week
{baseDir}/.venv/bin/wstk search "site:docs.python.org asyncio"  # Site-scoped

Key flags:

```
-n, --max-results <N>
```
— Number of results (default: 10)
```
--time-range <d|w|m|y>
```
— Filter by recency
```
--provider <ddgs|brave_api|auto>
```
— Search provider
```
--plain
```
— Output URLs only (for piping)
```
--json
```
— Structured output

Pipeline (search → extract)

{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --json
{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --plan --plain

Key flags:

```
--top-k <N>
```
— Search results to consider
```
--extract-k <N>
```
— Number of results to extract
```
--plan
```
— Return candidates without fetching
```
--method <http|browser|auto>
```
— Extraction method (default: http)

Extract (fetch + extract readable content)

{baseDir}/.venv/bin/wstk extract https://example.com --plain     # Markdown output
{baseDir}/.venv/bin/wstk extract https://example.com --text      # Plain text
{baseDir}/.venv/bin/wstk extract https://example.com --json      # Full metadata
{baseDir}/.venv/bin/wstk extract ./local-file.html --plain       # From file

Key flags:

```
--markdown
```
/
```
--text
```
/
```
--both
```
— Output format
```
--strategy <auto|readability|docs>
```
— Extraction strategy
```
--max-chars <N>
```
— Truncate output
```
--allow-domain <domain>
```
— Restrict to specific domains (safety)

Fetch (raw HTTP, no extraction)

{baseDir}/.venv/bin/wstk fetch https://example.com --json        # Metadata + status
{baseDir}/.venv/bin/wstk fetch https://example.com --plain       # Path to cached body

List providers

{baseDir}/.venv/bin/wstk providers --plain

Decision Guide

```
search
```
when you need discovery or candidate URLs.
```
pipeline
```
when you want a one-shot search → extract bundle.
```
fetch
```
when you need HTTP metadata or the cached body path (no extraction).
```
extract
```
when you want readable content from a URL or local HTML.
```
render
```
when a page is JS-only or blocked (or use
```
extract --method browser
```
for one-step extraction).

Common Patterns

Search → extract top result:

url=$({baseDir}/.venv/bin/wstk search "python asyncio tutorial" --plain | head -1)
{baseDir}/.venv/bin/wstk extract "$url" --plain --max-chars 8000

Search with JSON for programmatic use:

{baseDir}/.venv/bin/wstk search "openai api reference" --json | jq '.data.results[0].url'

Safe extraction (restrict domains):

{baseDir}/.venv/bin/wstk extract https://docs.python.org/3/library/asyncio.html \
  --allow-domain docs.python.org --plain

Output Formats

```
--plain
```
— Stable text for piping (URLs for search, content for extract)

--json

— Structured envelope:

{ "ok": bool, "data": {...}, "error": {...} }

Default — Human-readable with colors

Agent Defaults

Default to
```
--json
```
in agent wrappers; parse
```
ok
```
,
```
error.code
```
, and
```
warnings
```
.
Surface concise diagnostics by relaying
```
error.message
```
and
```
error.details.reason
```
when
```
ok=false
```
.
Use
```
--plain
```
only for piping, and add
```
--no-input
```
for non-interactive runs.
Consider
```
--redact
```
when handling sensitive URLs or content.

Exit Codes

```
0
```
— Success
```
1
```
— Runtime failure (network, provider error)
```
2
```
— Invalid usage
```
3
```
— Not found / empty result
```
4
```
— Blocked / access denied
```
5
```
— Needs JS rendering (page is JS-only)

Global Flags

```
--timeout <seconds>
```
— Network timeout
```
--no-cache
```
— Disable caching
```
--fresh
```
— Bypass cache reads (still writes)
```
--quiet
```
— Minimal output
```
--verbose
```
— Debug diagnostics to stderr
```
--policy <standard|strict|permissive>
```
— Safety defaults

References

```
references/troubleshooting.md
```
— 403/JS-only guidance and advanced fetch flags.
```
references/providers.md
```
— Provider selection and privacy notes.
```
docs/claude-code.md
```
— Claude Code wrapper usage.

When to Use

Searching for documentation or API references
Looking up facts or current information
Extracting content from known URLs
Any task requiring web search without interactive browsing

Prefer

browser-tools

when you need: JS interaction, form filling, clicking, or visual inspection.