Claude-skill-registry debug-fetcher

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/debug-fetcher" ~/.claude/skills/majiayu000-claude-skill-registry-debug-fetcher && rm -rf "$T"

manifest: skills/data/debug-fetcher/SKILL.md

Debug-Fetcher Skill

Automated fetch failure handling that:

Queries /memory first - applies learned strategies before trying defaults
Exhausts all strategies - direct, playwright, wayback, brave, jina, proxy, UA rotation
Stores successes - saves working strategies to /memory for future runs
Collaborates with humans - uses /interview when all automated strategies fail

Quick Start

# Fetch single URL with failure handling
./run.sh fetch https://example.com

# Fetch batch with failure handling
./run.sh fetch-batch urls.txt

# Check what was learned about a domain
./run.sh recall example.com

# Export all learned strategies
./run.sh export-learnings

How It Works

URL Request
    │
    ▼
┌──────────────────────────┐
│  1. Query /memory        │
│  "What works for this    │
│   domain?"               │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  2. Try learned strategy │
│     (if exists)          │
└──────────────────────────┘
    │
    ▼ (fail or no learned strategy)
┌──────────────────────────┐
│  3. Exhaust strategies:  │
│  - direct fetch          │
│  - playwright            │
│  - wayback machine       │
│  - brave alternates      │
│  - jina reader           │
│  - proxy rotation        │
│  - user-agent rotation   │
└──────────────────────────┘
    │
    ▼ (all fail)
┌──────────────────────────┐
│  4. Launch /interview    │
│  Ask human for help:     │
│  - Credentials?          │
│  - Mirror URL?           │
│  - Manual download?      │
│  - Skip this URL?        │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  5. Store to /memory     │
│  - Successful strategy   │
│  - Domain patterns       │
│  - Human-provided info   │
└──────────────────────────┘
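
The flow in the diagram can be sketched as a single fallback loop. This is a minimal illustration, not the contents of `strategy_engine.py`: the function name `fetch_with_fallback`, the `Strategy` callable type, the dict used in place of /memory, and the `ask_human` callback standing in for /interview are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: a strategy takes a URL and returns content, or None on failure.
Strategy = Callable[[str], Optional[str]]

# Order taken from step 3 of the diagram; the key names are illustrative.
DEFAULT_ORDER = ["direct", "playwright", "wayback", "brave", "jina", "proxy", "ua_rotation"]


def fetch_with_fallback(
    url: str,
    domain: str,
    strategies: Dict[str, Strategy],
    memory: Dict[str, str],
    ask_human: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (content, strategy_name), or (None, None) if everything fails."""
    order = [name for name in DEFAULT_ORDER if name in strategies]

    # Steps 1-2: if a strategy was learned for this domain, try it first.
    learned = memory.get(domain)
    if learned in order:
        order.remove(learned)
        order.insert(0, learned)

    # Step 3: exhaust the strategies in order, stopping at the first success.
    for name in order:
        content = strategies[name](url)
        if content is not None:
            memory[domain] = name  # Step 5: remember what worked.
            return content, name

    # Step 4: every automated strategy failed, so hand over to a human.
    content = ask_human(url)
    if content is not None:
        memory[domain] = "human"  # Step 5, simplified: note that a human helped.
        return content, "human"
    return None, None
```

In this sketch a second fetch for the same domain calls only the remembered strategy when it succeeds, which is the behavior the diagram describes for learned strategies.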

Memory Schema

Each learned strategy stores:

Field	Description
`domain`	Target domain (e.g., "nytimes.com")
`path_pattern`	URL path pattern (e.g., "/article/*")
`successful_strategy`	What worked (e.g., "playwright")
`headers`	Custom headers that helped
`timing_ms`	How long the fetch took
`success_rate`	Historical success rate
`failure_count`	How many times this domain failed
`last_used`	Timestamp of last use
`discovered_at`	When strategy was first learned
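
As a rough picture of how these fields might fit together, here is a dataclass sketch. The file listing names a `FetchStrategy` dataclass in `memory_schema.py`, but the field types, the defaults, and the `matches` helper below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Dict, Optional
from urllib.parse import urlparse


def _now_iso() -> str:
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FetchStrategy:
    # Field names follow the table above; types and defaults are assumed.
    domain: str
    path_pattern: str
    successful_strategy: str
    headers: Dict[str, str] = field(default_factory=dict)
    timing_ms: int = 0
    success_rate: float = 0.0
    failure_count: int = 0
    last_used: Optional[str] = None
    discovered_at: str = field(default_factory=_now_iso)

    def matches(self, url: str) -> bool:
        """Hypothetical helper: does this record apply to the given URL?"""
        parsed = urlparse(url)
        return parsed.hostname == self.domain and fnmatchcase(parsed.path, self.path_pattern)


record = FetchStrategy("nytimes.com", "/article/*", "playwright")
print(record.matches("https://nytimes.com/article/2025/example"))
```

Glob matching via `fnmatchcase` is one plausible reading of a pattern like "/article/*"; the skill may interpret `path_pattern` differently.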

Commands

Command	Description
`fetch <url>`	Fetch single URL with failure handling
`fetch-batch <manifest>`	Fetch list of URLs with failure handling
`recall <domain>`	Show learned strategies for domain
`export-learnings`	Export all strategies to JSON

Environment Variables

Variable	Description
`DEBUG_FETCHER_MEMORY_SCOPE`	Memory scope for storing strategies (default: "fetcher_strategies")
`DEBUG_FETCHER_MAX_RETRIES`	Max retries per strategy (default: 2)
`DEBUG_FETCHER_INTERVIEW_THRESHOLD`	Min failures before triggering interview (default: 3)
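
A short sketch of reading these variables with the documented defaults. The `Settings` class and `load_settings` function are hypothetical names introduced here; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    memory_scope: str
    max_retries: int
    interview_threshold: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, falling back to the defaults listed above."""
    return Settings(
        memory_scope=env.get("DEBUG_FETCHER_MEMORY_SCOPE", "fetcher_strategies"),
        max_retries=int(env.get("DEBUG_FETCHER_MAX_RETRIES", "2")),
        interview_threshold=int(env.get("DEBUG_FETCHER_INTERVIEW_THRESHOLD", "3")),
    )


print(load_settings({"DEBUG_FETCHER_MAX_RETRIES": "5"}))
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.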

Integration with Fetcher

Debug-fetcher wraps the standard fetcher skill and adds failure handling capabilities. All fetcher environment variables (BRAVE_API_KEY, FETCHER_EMIT_MARKDOWN, etc.) are respected.

Examples

Learning from Failures

After fetching a batch of URLs, debug-fetcher stores successful strategies:

# Fetch a batch
./run.sh fetch-batch urls.txt --output results.jsonl

# View what was learned
./run.sh recall attack.mitre.org
# Output:
# Domain: attack.mitre.org
# Strategy: playwright
# Success rate: 95%
# Last used: 2025-01-30

# Next time, playwright will be tried first for attack.mitre.org
./run.sh fetch https://attack.mitre.org/techniques/T1059

Human-in-the-Loop Interview

When all strategies fail, an interview is generated:

# Fetch batch with failures
./run.sh fetch-batch difficult_urls.txt

# Interview generated at: /tmp/interview_abc123.json
# Run: ./agents/skills/interview/run.sh /tmp/interview_abc123.json

# Example interview questions:
# - "Failed 5 URLs from nytimes.com. Do you have credentials?"
# - "archive.org not working. Try a mirror URL?"

YouTube URL Handling

YouTube URLs are automatically detected and handled via the `/ingest-youtube` skill:

# YouTube URLs use transcript extraction
./run.sh fetch https://www.youtube.com/watch?v=abc123
# Uses: /ingest-youtube skill for transcript extraction
# Falls back to other strategies if transcript unavailable
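
Detection could be as simple as a hostname check. The sketch below is an assumption about how such a check might look, not the skill's actual logic; the host list and the name `is_youtube_url` are illustrative.

```python
from urllib.parse import urlparse

# Assumed set of hostnames; the real skill may recognize more or fewer.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's hostname is a known YouTube host."""
    host = urlparse(url).hostname or ""
    return host in YOUTUBE_HOSTS


print(is_youtube_url("https://www.youtube.com/watch?v=abc123"))
```

Comparing the parsed hostname against an exact set avoids false positives from substring checks, for example a domain such as notyoutube.com.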

Batch Analysis

After a batch run, analyze patterns:

from debug_fetcher.batch_analyzer import analyze_batch, get_failure_summary

# Get summary
summary = get_failure_summary(results)
# {
#   "total": 1000,
#   "success": 850,
#   "failed": 150,
#   "success_rate": "85.0%",
#   "top_failing_domains": [
#     {"domain": "nytimes.com", "count": 45},
#     {"domain": "wsj.com", "count": 30}
#   ],
#   "patterns": [
#     "All 45 URLs from nytimes.com returned HTTP 403",
#     "High failure rate: 50% of failures are paywalled sites"
#   ]
# }
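
To make the shape of that summary concrete, here is a hedged sketch of how the counting fields could be computed. It is not the implementation of `get_failure_summary`: the function name `summarize_failures` and the assumed result format (dicts with `url` and `ok` keys) are invented for the example, and the free-text `patterns` field is left out.

```python
from collections import Counter
from typing import Any, Dict, List
from urllib.parse import urlparse


def summarize_failures(results: List[Dict[str, Any]], top_n: int = 5) -> Dict[str, Any]:
    """Count successes and failures and rank the domains that fail most often."""
    total = len(results)
    failed = [r for r in results if not r["ok"]]
    success = total - len(failed)

    # Group failures by hostname (assumed result key: "url").
    domains = Counter(urlparse(r["url"]).hostname or "" for r in failed)

    return {
        "total": total,
        "success": success,
        "failed": len(failed),
        "success_rate": f"{100 * success / total:.1f}%" if total else "n/a",
        "top_failing_domains": [
            {"domain": domain, "count": count}
            for domain, count in domains.most_common(top_n)
        ],
    }


sample = [
    {"url": "https://nytimes.com/a", "ok": False},
    {"url": "https://nytimes.com/b", "ok": False},
    {"url": "https://wsj.com/a", "ok": False},
    {"url": "https://example.com/", "ok": True},
]
print(summarize_failures(sample))
```

Note that this sketch groups by exact hostname, so `www.nytimes.com` and `nytimes.com` would be counted separately unless normalized first.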

Recovery Actions

When a human provides help via the interview, it takes one of these forms:

Action Type	Description	Example
`credentials`	Login credentials provided	username/password for site
`mirror`	Alternative URL to try	archive.org mirror
`manual_file`	Human downloaded file manually	Path to local PDF
`skip`	URL not needed	"Not critical"
`retry`	Try again later	Server was down
`custom_strategy`	Specific approach suggested	"Use proxy"
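
One way to act on these action types is a small dispatcher that routes each interview response to a handler. This is a sketch under assumptions: the `type` key, the handler signature, and the name `dispatch_recovery` are hypothetical and do not describe `recovery_executor.py`.

```python
from typing import Callable, Dict

# Action types from the table above.
ACTION_TYPES = ("credentials", "mirror", "manual_file", "skip", "retry", "custom_strategy")

Handler = Callable[[Dict[str, str]], str]


def dispatch_recovery(action: Dict[str, str], handlers: Dict[str, Handler]) -> str:
    """Route a human-provided action to its handler and return the handler's result."""
    kind = action.get("type")
    if kind not in ACTION_TYPES:
        raise ValueError(f"unknown recovery action: {kind!r}")
    handler = handlers.get(kind)
    if handler is None:
        raise NotImplementedError(f"no handler registered for {kind!r}")
    return handler(action)


# Example handlers; real ones would fetch, read a file, or schedule a retry.
handlers: Dict[str, Handler] = {
    "mirror": lambda a: "fetch " + a["url"],
    "skip": lambda a: "skipped",
}
print(dispatch_recovery({"type": "mirror", "url": "https://web.archive.org/x"}, handlers))
```

Rejecting unknown types up front keeps a malformed interview response from being silently ignored.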

Files

.agents/skills/debug-fetcher/
├── SKILL.md           # This file
├── run.sh             # Entry point
├── pyproject.toml     # Dependencies
└── debug_fetcher/     # Python package
    ├── __init__.py
    ├── cli.py                 # CLI commands
    ├── memory_schema.py       # FetchStrategy dataclass
    ├── memory_bridge.py       # Recall/learn from /memory
    ├── strategy_engine.py     # Strategy exhaustion loop
    ├── batch_analyzer.py      # Analyze batch failures
    ├── interview_generator.py # Generate /interview JSON
    ├── interview_processor.py # Process interview responses
    ├── recovery_executor.py   # Execute recovery actions
    └── pdf_bridge.py          # Cross-skill integration with debug-pdf

Companion Skill: debug-pdf

`debug-fetcher` and `debug-pdf` work together in the pipeline:

URL → debug-fetcher → /fetcher → /extractor → debug-pdf
         ↓                           ↓
      fetch fail               extraction fail
         ↓                           ↓
    retry/recover            analyze PDF issues
         ↓                           ↓
      /memory                     /memory

Shared failure patterns:

Pattern	debug-fetcher	debug-pdf
`auth_required`	HTTP 401/403	N/A
`access_restricted`	HTTP 403	N/A
`paywall_detected`	Soft paywall	N/A
`password_protected`	N/A	Encrypted PDF
`scanned_no_ocr`	N/A	No text layer
`archive_org_wrap`	Wayback wrapper	Wayback wrapper

Cross-skill notifications:

When debug-fetcher successfully fetches a PDF but detects issues (password-protected, scanned), it notifies debug-pdf via agent-inbox.
When debug-fetcher fails to fetch a PDF URL, it notifies debug-pdf for tracking.

Related Skills

`/memory` - Stores learned fetch strategies
`/interview` - Human collaboration for unrecoverable URLs
`/ingest-youtube` - YouTube transcript extraction
`/fetcher` - Core URL fetching functionality
`/extractor` - Content extraction from fetched documents
`/debug-pdf` - Companion skill for PDF extraction failures