## Install

**Source** · Clone the upstream repo:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

**Claude Code** · Install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/doc-scraper" ~/.claude/skills/majiayu000-claude-skill-registry-doc-scraper && rm -rf "$T"
```

Manifest: `skills/data/doc-scraper/SKILL.md`
# Snowflake Documentation Scraper
Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
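The cache itself is internal to doc-scraper, but the 7-day expiration model can be sketched in a few lines. This is an illustration only: the table name `pages` and its columns are assumptions, not the tool's actual schema.

```python
import sqlite3
import time

EXPIRATION_DAYS = 7  # matches the skill's default cache expiration

def get_cached(conn, url, now=None):
    """Return cached Markdown for url, or None if absent or older than 7 days."""
    now = time.time() if now is None else now
    cutoff = now - EXPIRATION_DAYS * 86400
    row = conn.execute(
        "SELECT content FROM pages WHERE url = ? AND fetched_at >= ?",
        (url, cutoff),
    ).fetchone()
    return row[0] if row else None

# In-memory demo: one fresh entry, one stale (8-day-old) entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, content TEXT, fetched_at REAL)")
conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
             ("https://docs.snowflake.com/en/fresh", "# Fresh", time.time()))
conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
             ("https://docs.snowflake.com/en/stale", "# Stale", time.time() - 8 * 86400))
```

Expired rows are simply filtered out of reads; a subsequent scrape can overwrite them in place.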
## Usage
First-time setup (auto-installs uv and doc-scraper):

```shell
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
```

Subsequent runs:

```shell
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
```
## Command Options

| Option | Default | Description |
|---|---|---|
| `--output-dir` | Required | Output directory for scraped docs |
| `--base-path` | | URL section to scrape |
| `--spider-depth` | | Link depth: 0=seeds only, 1=+links, 2=+second-level links |
| | None | Cap URLs (for testing) |
| | - | Preview without writing |
## Output

```
output-dir/
├── SKILL.md              # Auto-generated index
├── scraper_config.yaml   # Editable config (auto-created)
├── .cache/               # SQLite cache (auto-managed)
└── en/migrations/*.md    # Scraped pages with frontmatter
```
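Scraped pages carry frontmatter, which downstream tooling may want to read back. The exact keys the scraper writes are not documented here, so the `title` and `source_url` keys in this sketch are hypothetical; only the `---`-delimited layout (the common Markdown convention) is assumed.

```python
def split_frontmatter(text):
    """Split a Markdown page into (frontmatter lines, body).
    Assumes frontmatter is delimited by '---' lines, the common convention."""
    if not text.startswith("---\n"):
        return [], text
    end = text.index("\n---\n", 4)  # closing delimiter of the frontmatter block
    return text[4:end].splitlines(), text[end + 5:]

# Hypothetical scraped page; the real keys may differ.
page = ("---\n"
        "title: Migrating to Snowflake\n"
        "source_url: https://docs.snowflake.com/en/migrations\n"
        "---\n"
        "# Migrating to Snowflake\n")
meta, body = split_frontmatter(page)
```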
## Configuration

Auto-created at `{output-dir}/scraper_config.yaml`:

```yaml
rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
```
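Because the config is auto-created plain YAML, it can be edited by hand or scripted. A stdlib-only sketch for bumping a single value follows; a YAML library would be more robust, and this assumes the auto-generated file keeps one `key: value` pair per line, as shown above.

```python
def set_option(config_text, key, value):
    """Rewrite every 'key: value' line whose key matches, preserving indentation."""
    out = []
    for line in config_text.splitlines():
        name = line.split(":", 1)[0].strip()
        if name == key:
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}{key}: {value}"
        out.append(line)
    return "\n".join(out)

config = 'spider:\n  max_pages: 1000\n  allowed_paths: ["/en/"]\n'
updated = set_option(config, "max_pages", 2000)
```

Raising `max_pages` this way is one answer to the "Missing pages" case in Troubleshooting below.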
## Troubleshooting

| Issue | Solution |
|---|---|
| Too many pages | Lower `--spider-depth` or edit config |
| Missing pages | Increase `--spider-depth` |
| Cache corruption | Delete `.cache/` (rare) |
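For the cache-corruption case, deleting the cache directory is safe because it is auto-managed and rebuilt on the next run. A small helper, assuming the `.cache` location shown under Output:

```python
import shutil
import tempfile
from pathlib import Path

def reset_cache(output_dir):
    """Remove the auto-managed SQLite cache; doc-scraper recreates it on the next run."""
    cache = Path(output_dir) / ".cache"
    if cache.is_dir():
        shutil.rmtree(cache)
    return cache

# Demo against a throwaway directory standing in for an output dir.
tmp = Path(tempfile.mkdtemp())
(tmp / ".cache").mkdir()
removed = reset_cache(tmp)
```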