Claude-skill-registry debug-fetcher

Install

Clone the upstream repo:
git clone https://github.com/majiayu000/claude-skill-registry

Install into ~/.claude/skills/ (Claude Code):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/debug-fetcher" ~/.claude/skills/majiayu000-claude-skill-registry-debug-fetcher && rm -rf "$T"

Manifest: skills/data/debug-fetcher/SKILL.md

Debug-Fetcher Skill

Automated fetch failure handling that:

  1. Queries /memory first - applies learned strategies before trying defaults
  2. Exhausts all strategies - direct, playwright, wayback, brave, jina, proxy, UA rotation
  3. Stores successes - saves working strategies to /memory for future runs
  4. Collaborates with humans - uses /interview when all automated strategies fail

Quick Start

# Fetch single URL with failure handling
./run.sh fetch https://example.com

# Fetch batch with failure handling
./run.sh fetch-batch urls.txt

# Check what was learned about a domain
./run.sh recall example.com

# Export all learned strategies
./run.sh export-learnings

How It Works

URL Request
    │
    ▼
┌──────────────────────────┐
│  1. Query /memory        │
│  "What works for this    │
│   domain?"               │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  2. Try learned strategy │
│     (if exists)          │
└──────────────────────────┘
    │
    ▼ (fail or no learned strategy)
┌──────────────────────────┐
│  3. Exhaust strategies:  │
│  - direct fetch          │
│  - playwright            │
│  - wayback machine       │
│  - brave alternates      │
│  - jina reader           │
│  - proxy rotation        │
│  - user-agent rotation   │
└──────────────────────────┘
    │
    ▼ (all fail)
┌──────────────────────────┐
│  4. Launch /interview    │
│  Ask human for help:     │
│  - Credentials?          │
│  - Mirror URL?           │
│  - Manual download?      │
│  - Skip this URL?        │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  5. Store to /memory     │
│  - Successful strategy   │
│  - Domain patterns       │
│  - Human-provided info   │
└──────────────────────────┘
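The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `strategy_engine.py`: a plain dict stands in for /memory, each strategy is a hypothetical callable that returns page content or `None` on failure, and the identifiers in `DEFAULT_ORDER` are made-up names for the strategies in step 3.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical strategy type: takes a URL, returns content or None on failure.
Strategy = Callable[[str], Optional[str]]

# Hypothetical identifiers for the step-3 strategies, in diagram order.
DEFAULT_ORDER = ["direct", "playwright", "wayback", "brave", "jina", "proxy", "ua_rotation"]


def fetch_with_fallbacks(
    url: str,
    strategies: Dict[str, Strategy],
    learned: Dict[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (strategy_name, content), or (None, None) if every strategy failed."""
    domain = urlparse(url).hostname or ""
    # Steps 1-2: a strategy learned for this domain goes first.
    order: List[str] = []
    if domain in learned:
        order.append(learned[domain])
    # Step 3: then every default strategy not already in the list.
    order += [name for name in DEFAULT_ORDER if name not in order]

    for name in order:
        strategy = strategies.get(name)
        if strategy is None:
            continue  # strategy not available in this environment
        content = strategy(url)
        if content is not None:
            learned[domain] = name  # step 5: remember what worked
            return name, content
    return None, None  # step 4: caller escalates to /interview


if __name__ == "__main__":
    strategies = {
        "direct": lambda u: None,  # simulate a blocked direct fetch
        "playwright": lambda u: "<html>ok</html>",
    }
    memory: Dict[str, str] = {}
    print(fetch_with_fallbacks("https://attack.mitre.org/techniques/T1059", strategies, memory))
    # ('playwright', '<html>ok</html>')
    print(memory)  # {'attack.mitre.org': 'playwright'}
```

Putting the learned strategy at the front of the same list, rather than in a separate code path, means a stale memory entry costs at most one extra attempt before the defaults take over.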

Memory Schema

Each learned strategy stores:

| Field | Description |
|---|---|
| domain | Target domain (e.g., "nytimes.com") |
| path_pattern | URL path pattern (e.g., "/article/*") |
| successful_strategy | What worked (e.g., "playwright") |
| headers | Custom headers that helped |
| timing_ms | How long the fetch took |
| success_rate | Historical success rate |
| failure_count | How many times this domain failed |
| last_used | Timestamp of last use |
| discovered_at | When strategy was first learned |
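The actual `FetchStrategy` dataclass in `memory_schema.py` is not shown here. The sketch below is one plausible mapping of the fields above onto a dataclass; the field types, the defaults, the extra `attempts` counter, and the `record_result` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FetchStrategy:
    domain: str                          # e.g. "nytimes.com"
    successful_strategy: str             # e.g. "playwright"
    path_pattern: Optional[str] = None   # e.g. "/article/*"
    headers: Dict[str, str] = field(default_factory=dict)
    timing_ms: int = 0
    success_rate: float = 0.0            # fraction in [0, 1]
    failure_count: int = 0
    attempts: int = 0                    # hypothetical: needed to update success_rate
    last_used: str = field(default_factory=_now)
    discovered_at: str = field(default_factory=_now)

    def record_result(self, ok: bool, timing_ms: int) -> None:
        """Hypothetical helper: fold one fetch attempt into the running stats."""
        successes = round(self.success_rate * self.attempts) + (1 if ok else 0)
        self.attempts += 1
        self.success_rate = successes / self.attempts
        if not ok:
            self.failure_count += 1
        self.timing_ms = timing_ms
        self.last_used = _now()


if __name__ == "__main__":
    s = FetchStrategy(domain="attack.mitre.org", successful_strategy="playwright")
    for ok in [True] * 19 + [False]:
        s.record_result(ok, timing_ms=1200)
    print(f"{s.success_rate:.0%}")  # 95%
```

Recovering the success count with `round()` before each update keeps the running rate from drifting through repeated floating-point multiplication.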

Commands

| Command | Description |
|---|---|
| `fetch <url>` | Fetch single URL with failure handling |
| `fetch-batch <manifest>` | Fetch list of URLs with failure handling |
| `recall <domain>` | Show learned strategies for domain |
| `export-learnings` | Export all strategies to JSON |

Environment Variables

| Variable | Description |
|---|---|
| DEBUG_FETCHER_MEMORY_SCOPE | Memory scope for storing strategies (default: "fetcher_strategies") |
| DEBUG_FETCHER_MAX_RETRIES | Max retries per strategy (default: 2) |
| DEBUG_FETCHER_INTERVIEW_THRESHOLD | Min failures before triggering interview (default: 3) |
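A short sketch of reading these variables with their documented defaults. Only the variable names and default values come from the table; the `FetcherConfig` and `should_interview` names and the integer parsing are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FetcherConfig:  # hypothetical name
    memory_scope: str
    max_retries: int
    interview_threshold: int


def load_config(env: Optional[Mapping[str, str]] = None) -> FetcherConfig:
    """Read DEBUG_FETCHER_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return FetcherConfig(
        memory_scope=env.get("DEBUG_FETCHER_MEMORY_SCOPE", "fetcher_strategies"),
        max_retries=int(env.get("DEBUG_FETCHER_MAX_RETRIES", "2")),
        interview_threshold=int(env.get("DEBUG_FETCHER_INTERVIEW_THRESHOLD", "3")),
    )


def should_interview(failures: int, cfg: FetcherConfig) -> bool:
    # The threshold is a minimum: reaching it triggers the interview.
    return failures >= cfg.interview_threshold
```

Passing `env` explicitly makes the loader easy to test without touching the real process environment.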

Integration with Fetcher

Debug-fetcher wraps the standard fetcher skill and adds failure handling capabilities. All fetcher environment variables (BRAVE_API_KEY, FETCHER_EMIT_MARKDOWN, etc.) are respected.

Examples

Learning from Failures

After fetching a batch of URLs, debug-fetcher stores successful strategies:

# Fetch a batch
./run.sh fetch-batch urls.txt --output results.jsonl

# View what was learned
./run.sh recall attack.mitre.org
# Output:
# Domain: attack.mitre.org
# Strategy: playwright
# Success rate: 95%
# Last used: 2025-01-30

# Next time, playwright will be tried first for attack.mitre.org
./run.sh fetch https://attack.mitre.org/techniques/T1059

Human-in-the-Loop Interview

When all strategies fail, an interview is generated:

# Fetch batch with failures
./run.sh fetch-batch difficult_urls.txt

# Interview generated at: /tmp/interview_abc123.json
# Run: .agents/skills/interview/run.sh /tmp/interview_abc123.json

# Example interview questions:
# - "Failed 5 URLs from nytimes.com. Do you have credentials?"
# - "archive.org not working. Try a mirror URL?"

YouTube URL Handling

YouTube URLs are automatically detected and handled via the /ingest-youtube skill:

# YouTube URLs use transcript extraction
./run.sh fetch https://www.youtube.com/watch?v=abc123
# Uses: /ingest-youtube skill for transcript extraction
# Falls back to other strategies if transcript unavailable
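Detection can be as simple as a hostname check. A minimal sketch, assuming host-based detection (the rule debug-fetcher actually uses is not documented here); the host list and the `youtube_video_id` helper are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical host list; urlparse already lowercases .hostname.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube(url: str) -> bool:
    return (urlparse(url).hostname or "") in YOUTUBE_HOSTS


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from watch?v=... or youtu.be/... URLs."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```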

Batch Analysis

After a batch run, analyze patterns:

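import json

from debug_fetcher.batch_analyzer import analyze_batch, get_failure_summary

# Load the per-URL records written by `./run.sh fetch-batch urls.txt --output results.jsonl`
# (assumes one JSON object per line, in the shape get_failure_summary expects)
with open("results.jsonl") as f:
    results = [json.loads(line) for line in f if line.strip()]

# Get summary
summary = get_failure_summary(results)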
# {
#   "total": 1000,
#   "success": 850,
#   "failed": 150,
#   "success_rate": "85.0%",
#   "top_failing_domains": [
#     {"domain": "nytimes.com", "count": 45},
#     {"domain": "wsj.com", "count": 30}
#   ],
#   "patterns": [
#     "All 45 URLs from nytimes.com returned HTTP 403",
#     "High failure rate: 50% of failures are paywalled sites"
#   ]
# }
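To make the shape of that summary concrete, here is a self-contained sketch that computes the same kind of fields. It is not `get_failure_summary`: the `summarize` name is hypothetical, the record keys `url`, `ok`, and `status` are assumptions about a batch result, and the only pattern it detects is the "every failure from one domain returned the same HTTP status" case.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlparse


def summarize(results: List[Dict]) -> Dict:
    """Hypothetical re-creation of a batch failure summary."""
    total = len(results)
    failed = [r for r in results if not r["ok"]]
    success = total - len(failed)
    domains = Counter(urlparse(r["url"]).hostname for r in failed)

    patterns = []
    for domain, count in domains.most_common():
        statuses = {r.get("status") for r in failed if urlparse(r["url"]).hostname == domain}
        # Flag domains where every failure returned the same HTTP status.
        if len(statuses) == 1 and None not in statuses:
            patterns.append(f"All {count} URLs from {domain} returned HTTP {statuses.pop()}")

    return {
        "total": total,
        "success": success,
        "failed": len(failed),
        "success_rate": f"{100 * success / total:.1f}%" if total else "n/a",
        "top_failing_domains": [{"domain": d, "count": c} for d, c in domains.most_common(5)],
        "patterns": patterns,
    }
```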

Recovery Actions

When a human provides help through an interview, the response takes one of these action types:

| Action Type | Description | Example |
|---|---|---|
| credentials | Login credentials provided | username/password for site |
| mirror | Alternative URL to try | archive.org mirror |
| manual_file | Human downloaded file manually | Path to local PDF |
| skip | URL not needed | "Not critical" |
| retry | Try again later | Server was down |
| custom_strategy | Specific approach suggested | "Use proxy" |
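One way to route these responses is an enum plus a dispatch function. This is a hypothetical sketch of the idea behind `recovery_executor.py`, not its code: the response keys `action` and `value` are assumptions, and each branch only describes the next step instead of performing it.

```python
from enum import Enum
from typing import Dict


class RecoveryAction(str, Enum):
    CREDENTIALS = "credentials"
    MIRROR = "mirror"
    MANUAL_FILE = "manual_file"
    SKIP = "skip"
    RETRY = "retry"
    CUSTOM_STRATEGY = "custom_strategy"


def plan_recovery(url: str, response: Dict[str, str]) -> str:
    """Turn one interview answer into a description of the next step."""
    action = RecoveryAction(response["action"])  # raises ValueError on unknown types
    value = response.get("value", "")
    if action is RecoveryAction.CREDENTIALS:
        return f"refetch {url} with supplied credentials"  # never echo the secret
    if action is RecoveryAction.MIRROR:
        return f"fetch mirror {value} instead of {url}"
    if action is RecoveryAction.MANUAL_FILE:
        return f"ingest local file {value} for {url}"
    if action is RecoveryAction.SKIP:
        return f"mark {url} as skipped"
    if action is RecoveryAction.RETRY:
        return f"requeue {url} for a later attempt"
    return f"try custom strategy {value!r} on {url}"
```

Validating the action through the enum up front means a malformed interview response fails loudly instead of silently falling through to the custom-strategy branch.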

Files

.agents/skills/debug-fetcher/
├── SKILL.md           # This file
├── run.sh             # Entry point
├── pyproject.toml     # Dependencies
└── debug_fetcher/     # Python package
    ├── __init__.py
    ├── cli.py                 # CLI commands
    ├── memory_schema.py       # FetchStrategy dataclass
    ├── memory_bridge.py       # Recall/learn from /memory
    ├── strategy_engine.py     # Strategy exhaustion loop
    ├── batch_analyzer.py      # Analyze batch failures
    ├── interview_generator.py # Generate /interview JSON
    ├── interview_processor.py # Process interview responses
    ├── recovery_executor.py   # Execute recovery actions
    └── pdf_bridge.py          # Cross-skill integration with debug-pdf

Companion Skill: debug-pdf

debug-fetcher and debug-pdf work together in the pipeline:

URL → debug-fetcher → /fetcher → /extractor → debug-pdf
         ↓                           ↓
      fetch fail               extraction fail
         ↓                           ↓
    retry/recover            analyze PDF issues
         ↓                           ↓
      /memory                     /memory

Shared failure patterns:

| Pattern | debug-fetcher | debug-pdf |
|---|---|---|
| auth_required | HTTP 401/403 | N/A |
| access_restricted | HTTP 403 | N/A |
| paywall_detected | Soft paywall | N/A |
| password_protected | N/A | Encrypted PDF |
| scanned_no_ocr | N/A | No text layer |
| archive_org_wrap | Wayback wrapper | Wayback wrapper |
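The fetcher-side patterns in this table can be roughly approximated from the HTTP status, URL, and body. A sketch only: the table maps HTTP 403 to both `auth_required` and `access_restricted`, so the tie-break on a login hint below is an assumption, as are the paywall marker strings. The Wayback check assumes the usual `web.archive.org/web/<timestamp>/<original-url>` form, with an optional two-letter modifier such as `id_`.

```python
import re
from typing import Optional

WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$")
PAYWALL_MARKERS = ("subscribe to continue", "subscriber-only")  # hypothetical markers


def unwrap_wayback(url: str) -> Optional[str]:
    """Return the original URL inside a Wayback Machine URL, or None."""
    m = WAYBACK_RE.match(url)
    return m.group(2) if m else None


def classify_fetch_failure(url: str, status: int, body: str = "") -> Optional[str]:
    """Map a fetch outcome to one of the shared pattern names (heuristic sketch)."""
    text = body.lower()
    if status == 200 and unwrap_wayback(url) is not None:
        return "archive_org_wrap"
    if status == 401:
        return "auth_required"
    if status == 403:
        # Assumed tie-break: a login prompt suggests credentials would help.
        return "auth_required" if "password" in text or "sign in" in text else "access_restricted"
    if status == 200 and any(marker in text for marker in PAYWALL_MARKERS):
        return "paywall_detected"
    return None
```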

Cross-skill notifications:

  • When debug-fetcher successfully fetches a PDF but detects issues (password-protected, scanned without a text layer), it notifies debug-pdf via agent-inbox
  • When debug-fetcher fails to fetch a PDF URL, it notifies debug-pdf so the failure is tracked

Related Skills

  • /memory - Stores learned fetch strategies
  • /interview - Human collaboration for unrecoverable URLs
  • /ingest-youtube - YouTube transcript extraction
  • /fetcher - Core URL fetching functionality
  • /extractor - Content extraction from fetched documents
  • /debug-pdf - Companion skill for PDF extraction failures