Claude-skill-registry fetching-blocked-urls

Retrieve clean markdown from URLs when web_fetch fails. Converts pages via the Jina AI reader service with automatic retry. Use when web_fetch or curl returns a 403, a block or paywall page, a timeout, JavaScript-rendering errors, or empty content, or when the user explicitly suggests using Jina.

Install

Clone the upstream repo:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Install into Claude Code (~/.claude/skills/):

```shell
# Shallow-clone the registry to a temp dir, copy only this skill, then clean up
T=$(mktemp -d) &&
  git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" &&
  mkdir -p ~/.claude/skills &&
  cp -r "$T/skills/data/fetching-blocked-urls" ~/.claude/skills/majiayu000-claude-skill-registry-fetching-blocked-urls &&
  rm -rf "$T"
```

manifest: skills/data/fetching-blocked-urls/SKILL.md
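After installing, a quick check can confirm that the skill's SKILL.md manifest landed in the install directory. This is a minimal sketch, not part of the skill: `skill_installed` is a hypothetical helper, and its default path simply assumes the directory name used by the install command.

```shell
# Hypothetical check: succeed only if the installed skill directory contains SKILL.md.
# Default path assumes the directory name from the install command.
skill_installed() {
  local dir="${1:-$HOME/.claude/skills/majiayu000-claude-skill-registry-fetching-blocked-urls}"
  [ -f "$dir/SKILL.md" ]
}
```

Run `skill_installed && echo "installed"` with no argument to check the default location.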


Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

Activation Triggers

Invoke this skill immediately when web_fetch returns:

403 Forbidden or access denied
Paywall or login wall indicators
Empty, garbled, or truncated content
JavaScript-heavy SPA failures
Timeout errors
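When the failed fetch was a curl call rather than web_fetch, these triggers can be approximated from curl's exit code, the HTTP status, and the body. This is a sketch under stated assumptions: `should_use_jina` is a hypothetical helper, the status list (401, 403, 429, 5xx) and the 200-character "too short" threshold are arbitrary choices, and paywall or SPA failures are only caught indirectly through a near-empty body.

```shell
# Hypothetical helper: decide whether a direct curl fetch should fall back to Jina.
# Args: $1 = curl exit code, $2 = HTTP status (from -w '%{http_code}'), $3 = body file
# Returns 0 to fall back to Jina, 1 if the direct fetch looks usable.
should_use_jina() {
  local curl_exit=$1 status=$2 body_file=$3 chars
  # Transport failures, including timeouts (curl exit code 28)
  [ "$curl_exit" -ne 0 ] && return 0
  case "$status" in
    401|403|429|5??) return 0 ;;   # denied, rate limited, or server error (assumed list)
  esac
  # Count non-whitespace characters; near-empty pages suggest a JS shell or a block page
  chars=$(( $(tr -d '[:space:]' < "$body_file" | wc -c) ))
  [ "$chars" -lt 200 ] && return 0
  return 1
}
```

For example, run `status=$(curl -s -o body.html -w '%{http_code}' --max-time 30 "$url"); rc=$?` and then `should_use_jina "$rc" "$status" body.html`. On a timeout curl reports status 000 and exit code 28, which the first check catches.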

Core Command

```shell
curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
```

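Replace TARGET_URL with the full target URL, including its scheme (for example https://r.jina.ai/https://example.com/article). The service returns markdown with the page title, body text, and preserved links.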
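A small wrapper keeps the reader-URL construction in one place. This is a sketch, not part of the skill: `jina_reader_url` and `fetch_via_jina` are hypothetical names, and defaulting scheme-less input to https:// is an assumption about what callers want.

```shell
# Hypothetical helper: build the Jina reader URL for a target page.
# The full target URL (scheme included) is appended after https://r.jina.ai/
jina_reader_url() {
  local target=$1
  case "$target" in
    http://*|https://*) ;;            # already has a scheme
    *) target="https://$target" ;;    # assumption: default to https
  esac
  printf 'https://r.jina.ai/%s\n' "$target"
}

# Hypothetical helper: one fetch through Jina with the same 30-second cap as the core command.
fetch_via_jina() {
  curl -s --max-time 30 "$(jina_reader_url "$1")"
}
```

Usage: `fetch_via_jina example.com/article` requests https://r.jina.ai/https://example.com/article.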

Retry Pattern

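Jina's backend fails intermittently on roughly 10% of requests. Retrying up to three times raises the success rate above 99%, assuming failures are independent (all three attempts fail with probability 0.1³ = 0.001):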

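```shell
for attempt in 1 2 3; do
  # Treat a curl failure (e.g. timeout), empty output, or Jina's proxy error as a failed attempt
  if result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL") && [ -n "$result" ] &&
     ! printf '%s' "$result" | grep -q "upstream connect error"; then
    printf '%s\n' "$result"
    break
  fi
  [ "$attempt" -lt 3 ] && sleep 1
done
```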
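The same retry logic can be packaged as a reusable function with the attempt count as a parameter. This is a sketch under stated assumptions: `retry_fetch` is a hypothetical name, it takes the fetch command as its remaining arguments so it can wrap the Jina curl call (or a stub), and it counts a non-zero exit, empty output, or the "upstream connect error" text as a failed attempt.

```shell
# Hypothetical helper: run a fetch command up to N times until it yields usable output.
# Usage: retry_fetch ATTEMPTS CMD [ARGS...]
retry_fetch() {
  local attempts=$1 attempt=1 result
  shift
  while [ "$attempt" -le "$attempts" ]; do
    # Failed attempt = non-zero exit (e.g. curl timeout), empty output, or Jina's proxy error
    if result=$("$@") && [ -n "$result" ] &&
       ! printf '%s' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    [ "$attempt" -lt "$attempts" ] && sleep 1   # short pause between attempts
    attempt=$((attempt + 1))
  done
  echo "retry_fetch: all $attempts attempts failed" >&2
  return 1
}
```

Usage: `retry_fetch 3 curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"`. Passing the command as arguments, rather than hard-coding curl, lets the function be exercised offline with a stub command.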

Workflow Integration

Primary: Use web_fetch (native tool)
Fallback: This skill with retry when web_fetch fails
Escalate: Request user assistance only after retries are exhausted

Attempt this fallback before asking users to copy-paste content manually.
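Outside the agent's native tools, the same primary, fallback, escalate order can be approximated in a shell script. This is a sketch under loud assumptions: a direct curl call stands in for web_fetch (which is not a shell command), "success" means a non-empty body and no HTTP error, and `direct_fetch`, `jina_fetch`, and `fetch_with_fallback` are hypothetical names.

```shell
# Hypothetical stand-in for web_fetch: -f makes HTTP errors (e.g. 403) a non-zero exit.
direct_fetch() {
  local body
  body=$(curl -sfL --max-time 30 "$1") && [ -n "$body" ] && printf '%s\n' "$body"
}

# Fallback: Jina reader with up to 3 attempts.
jina_fetch() {
  local attempt result
  for attempt in 1 2 3; do
    if result=$(curl -s --max-time 30 "https://r.jina.ai/$1") && [ -n "$result" ] &&
       ! printf '%s' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    [ "$attempt" -lt 3 ] && sleep 1
  done
  return 1
}

# Primary, then fallback, then escalate to the user.
fetch_with_fallback() {
  direct_fetch "$1" && return 0
  jina_fetch "$1" && return 0
  echo "Could not retrieve $1 directly or via Jina; please paste the content." >&2
  return 1
}
```

Keeping the two fetchers as separate functions means either can be replaced (for instance by a stub in tests) without touching the escalation logic.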

Output Format

Jina returns structured markdown:

```
Title: [page title]
URL Source: [original URL]
Markdown Content:
[extracted body text, links preserved]
```
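When only one part of the response is needed, the header lines can be split from the body with awk. This is a sketch assuming the layout shown for Jina's output (header lines such as Title: and URL Source:, then everything after the Markdown Content: line is the body); `jina_field` and `jina_body` are hypothetical names, and real responses may carry other header lines, which these helpers simply ignore.

```shell
# Hypothetical helper: print the value of the first "NAME: value" header line (stdin).
jina_field() {
  awk -v name="$1" '
    /^Markdown Content:/ { exit }    # headers end where the body starts
    index($0, name ": ") == 1 { print substr($0, length(name) + 3); exit }
  '
}

# Hypothetical helper: print everything after the "Markdown Content:" line (stdin).
jina_body() {
  awk 'found { print } /^Markdown Content:/ { found = 1 }'
}
```

Usage: `curl -s --max-time 30 "https://r.jina.ai/TARGET_URL" | jina_field Title`.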

Limitations

Long pages may be truncated
Sites that block all scrapers remain inaccessible
Login-required content is limited to its public portions
Real-time dynamic content may not render

Domain Access

r.jina.ai is whitelisted in the Claude container's network configuration.