Awesome-omni-skill anydocs
Generic Documentation Indexing & Search. Index any documentation site (SPA/static) and search it instantly.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/documentation/anydocs" ~/.claude/skills/diegosouzapw-awesome-omni-skill-anydocs && rm -rf "$T"
skills/documentation/anydocs/SKILL.md
anydocs - Generic Documentation Indexing & Search
A powerful, reusable skill for indexing and searching ANY documentation site.
What It Does
anydocs solves a real problem: accessing documentation from code or CLI. Instead of opening a browser every time, you can:
- Index any documentation site (Discord, OpenClaw, internal docs, etc.)
- Search instantly from the command line or Python API
- Cache pages locally to avoid repeated network calls
- Configure multiple profiles for different doc sites
When to Use It
Use anydocs when you need to:
- Quickly look up API documentation without leaving the terminal
- Build agents that need to reference docs
- Extract specific information from documentation
- Search across multiple documentation sites
- Integrate docs into your workflow
Key Features
🔍 Multi-Method Search
- Keyword search: Fast, term-based matching with BM25-style scoring
- Hybrid search: Keyword + phrase proximity for better relevance
- Regex search: Advanced pattern matching for power users
🌐 Works with Any Docs Site
- Sitemap-based discovery (standard XML sitemap)
- Fallback crawling from base URL
- HTML content extraction with smart selector detection
- Automatic rate limiting to be respectful
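Sitemap-based discovery can be sketched with the standard library alone. This is a minimal illustration of reading a standard XML sitemap, not the code in scraper.py; the function name parse_sitemap_urls and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a standard XML sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content used only for illustration.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/guide/introduction</loc></url>
  <url><loc>https://example.com/api/reference</loc></url>
</urlset>"""

for url in parse_sitemap_urls(example):
    print(url)
```

A sitemap index file (one that lists other sitemaps) uses the same loc element, so the same function would return the nested sitemap URLs, which would then need to be fetched and parsed in turn.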
💾 Smart Caching
- Pages cached locally with 7-day TTL (configurable)
- Search indexes cached for instant second searches
- Cache statistics and cleanup commands
- Respects cache invalidation
⚙️ Profile-Based Configuration
- Support multiple doc sites simultaneously
- Per-profile search methods and cache TTLs
- Configuration stored in ~/.anydocs/config.json
- Examples for Discord, OpenClaw, and custom sites
🌐 JavaScript Rendering (Optional)
- Uses Playwright to render client-side SPAs (Single Page Apps)
- Automatically discovers links on JS-heavy sites like Discord docs
- Gracefully falls back to standard HTTP if Playwright unavailable
- Configure per-discovery session or globally per profile
Installation
cd /path/to/skills/anydocs
pip install -r requirements.txt
chmod +x anydocs.py
Optional: Browser-based rendering (for JavaScript-heavy sites)
For sites like Discord that use client-side rendering, install Playwright:
pip install playwright==1.40.0
playwright install  # Downloads the browser binaries, including Chromium
If Playwright is unavailable, anydocs gracefully falls back to standard HTTP fetching.
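The fallback described above might look roughly like the following. This is a sketch under stated assumptions, not the project's scraper: the function names fetch_rendered, fetch_plain, and fetch_html are hypothetical, and the fallback here triggers only when the playwright package cannot be imported.

```python
import urllib.request


def fetch_rendered(url: str) -> str:
    """Render the page in headless Chromium and return the resulting HTML."""
    # Imported lazily so a missing package raises ImportError only when used.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def fetch_plain(url: str) -> str:
    """Fetch the raw HTML over HTTP without executing any JavaScript."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_html(url: str, rendered=fetch_rendered, plain=fetch_plain) -> str:
    """Try browser rendering first; use plain HTTP if Playwright is missing."""
    try:
        return rendered(url)
    except ImportError:
        return plain(url)
```

Passing the two fetchers as parameters keeps the fallback logic testable without a browser or network access.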
Quick Start
1. Configure a Documentation Site
python anydocs.py config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml
2. Build the Index
python anydocs.py index vuejs
This discovers all pages via sitemap, scrapes content, and builds a searchable index.
3. Search
python anydocs.py search "composition api" --profile vuejs
python anydocs.py search "reactivity" --profile vuejs --limit 5
4. Fetch a Specific Page
python anydocs.py fetch "guide/introduction" --profile vuejs
CLI Commands
Configuration
# Add or update a profile
anydocs config <profile> <base_url> <sitemap_url> [--search-method hybrid] [--ttl-days 7]

# List configured profiles
anydocs list-profiles
Indexing
# Build index for a profile
anydocs index <profile>

# Force re-index (skip cache)
anydocs index <profile> --force
Search
# Basic keyword search
anydocs search "query" --profile discord

# Limit results
anydocs search "query" --profile discord --limit 5

# Regex search
anydocs search "^API" --profile discord --regex
Fetch
# Fetch a specific page (URL or path)
anydocs fetch "https://discord.com/developers/docs/resources/webhook"
anydocs fetch "resources/webhook" --profile discord
Cache Management
# Show cache statistics
anydocs cache status

# Clear all cache
anydocs cache clear

# Clear specific profile's cache
anydocs cache clear --profile discord
Python API
For use in agents and scripts:
from lib.config import ConfigManager
from lib.scraper import DiscoveryEngine
from lib.indexer import SearchIndex

# Load configuration
config_mgr = ConfigManager()
config = config_mgr.get_profile("discord")

# Scrape documentation
scraper = DiscoveryEngine(config["base_url"], config["sitemap_url"])
pages = scraper.fetch_all()

# Build search index
index = SearchIndex()
index.build(pages)

# Search
results = index.search("webhooks", limit=10)
for result in results:
    print(f"{result['title']} ({result['relevance_score']})")
    print(f"  {result['url']}")
Configuration File Format
Configuration is stored in ~/.anydocs/config.json:
{
  "discord": {
    "name": "discord",
    "base_url": "https://discord.com/developers/docs",
    "sitemap_url": "https://discord.com/developers/docs/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  },
  "openclaw": {
    "name": "openclaw",
    "base_url": "https://docs.openclaw.ai",
    "sitemap_url": "https://docs.openclaw.ai/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  }
}
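A script that only needs the profile settings can read this file directly. The sketch below assumes the layout shown above; load_profile is a hypothetical helper, not part of lib/config.py, and the fallback values mirror the documented defaults (hybrid search, 7-day TTL).

```python
import json
from pathlib import Path

# Location documented above; overridable for testing.
DEFAULT_CONFIG_PATH = Path.home() / ".anydocs" / "config.json"


def load_profile(name: str, path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Return one profile's settings, filling in the documented defaults."""
    profiles = json.loads(path.read_text(encoding="utf-8"))
    if name not in profiles:
        raise KeyError(f"No profile named {name!r} in {path}")
    profile = dict(profiles[name])
    profile.setdefault("search_method", "hybrid")
    profile.setdefault("cache_ttl_days", 7)
    return profile
```

For example, load_profile("discord")["base_url"] would return the base URL stored for the discord profile.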
Search Methods
Keyword Search
- Speed: Fast
- Best for: Common terms, exact matches
- How it works: Term matching with position weighting (title > tags > content)
- Example: anydocs search "webhooks"
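Position weighting of this kind can be illustrated with a small scorer. This is a sketch only: the weights in FIELD_WEIGHTS are hypothetical values chosen to express "title > tags > content", and the page dictionary shape is an assumption rather than the structure used in lib/indexer.py.

```python
import re

# Hypothetical weights; the only documented fact is the ordering
# title > tags > content.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "content": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, page: dict) -> float:
    """Sum query-term occurrences per field, scaled by the field's weight."""
    terms = tokenize(query)
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = page.get(field, "")
        if isinstance(value, list):  # tags may be stored as a list of strings
            value = " ".join(value)
        tokens = tokenize(value)
        score += weight * sum(tokens.count(term) for term in terms)
    return score
```

With these weights, a page titled "Webhooks" outranks a page that only mentions webhooks once in its body.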
Hybrid Search (Default)
- Speed: Fast
- Best for: Natural language queries
- How it works: Keyword search + phrase proximity scoring
- Example: anydocs search "how to set up webhooks"
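One way to combine keyword matching with phrase proximity is sketched below. The scoring formula (a bonus of 1 divided by the smallest token gap between consecutive query terms) is an assumption made for illustration; the source does not specify how anydocs computes proximity.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_bonus(query: str, text: str) -> float:
    """Reward texts where consecutive query terms occur close together.

    Each adjacent pair of query terms contributes 1 / (smallest gap, in
    tokens, between an occurrence of each), so a directly adjacent pair
    contributes 1.0.
    """
    terms = tokenize(query)
    positions: dict[str, list[int]] = {}
    for index, token in enumerate(tokenize(text)):
        positions.setdefault(token, []).append(index)
    bonus = 0.0
    for first, second in zip(terms, terms[1:]):
        if first != second and first in positions and second in positions:
            gap = min(abs(a - b) for a in positions[first] for b in positions[second])
            bonus += 1.0 / gap
    return bonus


def hybrid_score(query: str, text: str) -> float:
    """Keyword term count plus the proximity bonus."""
    tokens = tokenize(text)
    keyword = sum(tokens.count(term) for term in set(tokenize(query)))
    return keyword + proximity_bonus(query, text)
```

Two texts containing the same query terms then rank differently depending on whether the terms appear as a phrase or scattered across the text.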
Regex Search
- Speed: Medium
- Best for: Complex patterns
- How it works: Compiled regex pattern matching across all content
- Example: anydocs search "^(GET|POST)" --regex
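Regex search over page content can be sketched as follows. The function name regex_search and the page dictionary keys are hypothetical, and the use of re.MULTILINE (so that ^ matches at the start of every line, not just the start of the page) is an assumption about the intended behavior.

```python
import re


def regex_search(pattern: str, pages: list[dict], limit: int = 10) -> list[dict]:
    """Return pages whose content matches the pattern, most matches first."""
    compiled = re.compile(pattern, re.MULTILINE)  # compile once, reuse per page
    results = []
    for page in pages:
        match_count = sum(1 for _ in compiled.finditer(page["content"]))
        if match_count:
            results.append({"url": page["url"], "matches": match_count})
    results.sort(key=lambda result: result["matches"], reverse=True)
    return results[:limit]
```

Because every page's full content is scanned, the cost grows with the total size of the indexed text.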
Caching Behavior
- Pages: Cached as JSON with 7-day TTL (configurable)
- Indexes: Cached after indexing, invalidated on TTL expiry
- Cache location: ~/.anydocs/cache/
- Manual refresh: Use the --force flag or clear the cache
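The TTL behavior described above can be illustrated with a small file cache. This is a sketch, not the contents of cache.py: the class name TTLFileCache, the hashed file names, and the stored_at field are all assumptions made for the example.

```python
import hashlib
import json
import time
from pathlib import Path


class TTLFileCache:
    """Store JSON-serializable values on disk and expire them after a TTL."""

    def __init__(self, directory, ttl_days: float = 7):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_days * 24 * 60 * 60

    def _path(self, key: str) -> Path:
        # Hash the key so URLs become safe, fixed-length file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def set(self, key: str, value, now=None) -> None:
        record = {"stored_at": time.time() if now is None else now, "value": value}
        self._path(key).write_text(json.dumps(record), encoding="utf-8")

    def get(self, key: str, now=None):
        """Return the cached value, or None if it is missing or expired."""
        path = self._path(key)
        if not path.exists():
            return None
        record = json.loads(path.read_text(encoding="utf-8"))
        current = time.time() if now is None else now
        if current - record["stored_at"] > self.ttl_seconds:
            path.unlink()  # expired entries are removed on access
            return None
        return record["value"]
```

The optional now argument exists only so expiry can be exercised without waiting; normal callers would omit it.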
Performance Notes
- First index build takes 2-10 minutes depending on site size
- Subsequent searches are instant (cached indexes)
- Rate limit: 0.5s per page to be respectful
- Typical search returns ~100 results in <100ms
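A fixed delay between page fetches, such as the 0.5s mentioned above, can be enforced with a small helper. This is a minimal sketch; the RateLimiter class and its injectable clock and sleep parameters are hypothetical and not taken from the project.

```python
import time


class RateLimiter:
    """Ensure at least min_interval seconds pass between successive calls."""

    def __init__(self, min_interval: float = 0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        """Block until the minimum interval since the previous call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() before each request spaces requests at least 0.5s apart, which means a site with 1,200 pages needs at least 10 minutes of waiting alone.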
Troubleshooting
"No index for 'profile'" error
Run anydocs index <profile> first to build the index.
Sitemap not found
Check the sitemap URL. If the sitemap is unavailable, anydocs falls back to crawling from base_url.
Slow indexing
This is normal for large sites. Rate limiting prevents overwhelming servers.
Cache grows too large
Run anydocs cache clear or set --ttl-days to a smaller value.
Examples
Vue.js Framework Docs (SPA Example)
anydocs config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml
anydocs index vuejs
anydocs search "composition api"
Next.js API Docs
anydocs config nextjs \
  https://nextjs.org \
  https://nextjs.org/sitemap.xml
anydocs index nextjs
anydocs search "app router" --profile nextjs
Internal Company Documentation
anydocs config internal \
  https://docs.company.local \
  https://docs.company.local/sitemap.xml
anydocs index internal --force
anydocs search "deployment" --profile internal
Architecture
- scraper.py: Discovers URLs via sitemap, fetches and parses HTML
- indexer.py: Builds searchable indexes, implements multiple search strategies
- config.py: Manages configuration profiles
- cache.py: TTL-based file caching for pages and indexes
- cli.py: Click-based command-line interface
Contributing
To add new documentation sites, run:
anydocs config <profile> <base_url> <sitemap_url>
To extend search functionality, modify lib/indexer.py.
License
Part of the OpenClaw system.