# claude-skill-registry: handler-http

Clone the full registry:

git clone https://github.com/majiayu000/claude-skill-registry

Or install just this skill:

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/handler-http" ~/.claude/skills/majiayu000-claude-skill-registry-handler-http && rm -rf "$T"
skills/data/handler-http/SKILL.md

<CONTEXT> Your responsibility is to fetch content from HTTP/HTTPS URLs with proper safety checks, timeout handling, and metadata extraction. You implement the handler interface for external URL sources.
You are part of the multi-source architecture (Phase 2 - SPEC-00030-03). </CONTEXT>
<CRITICAL_RULES> URL Validation:
- ONLY allow http:// and https:// protocols
- NEVER execute arbitrary protocols (file://, ftp://, javascript:, etc.)
- ALWAYS validate URL format before fetching
- TIMEOUT after 30 seconds (configurable)
Content Safety:
- Set reasonable size limits (10MB default, configurable)
- Handle HTTP redirects (max 5 redirects via curl -L)
- Validate content types
- Log all fetches for audit
Error Handling:
- Clear error messages for all HTTP status codes
- Retry with exponential backoff (3 attempts) for server errors (5xx)
- DO NOT retry for client errors (4xx)
- Cache 404s temporarily to avoid repeated failures
Security:
- Never execute fetched content
- Sanitize URLs before logging
- Respect robots.txt (future enhancement)
- Rate limiting per domain (future enhancement) </CRITICAL_RULES>
<WORKFLOW>
Step 1: Validate URL
IF reference starts with @codex/external/:
- Extract source name and path
- Look up source in configuration
- Construct URL from url_pattern
- FUTURE: Not implemented in Phase 2; return an error
ELSE:
- Use reference as direct URL
- Validate protocol (http:// or https:// only)
IF protocol validation fails:
- Error: "Invalid protocol: only http:// and https:// allowed"
- STOP
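
A minimal shell sketch of this protocol check; `validate_url` is a hypothetical helper name, not one of the shipped scripts:

```bash
# Hypothetical helper: accept only http:// and https:// URLs.
validate_url() {
  local url="$1"
  case "$url" in
    http://*|https://*)
      return 0 ;;
    *)
      echo "Invalid protocol: only http:// and https:// allowed" >&2
      return 1 ;;
  esac
}

validate_url "https://docs.example.com/guide.md"        # accepted
validate_url "file:///etc/passwd" || echo "rejected"    # security violation, rejected
```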
Step 2: Fetch Content
USE SCRIPT: ./scripts/fetch-url.sh
Arguments: {
  url: validated URL
  timeout: source_config.handler_config.timeout || 30
  max_size_mb: source_config.handler_config.max_size_mb || 10
}
OUTPUT to stdout: Content
OUTPUT to stderr: Metadata JSON
IF fetch fails:
- Check HTTP status code
- Return appropriate error message
- Log failure
- STOP
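
fetch-url.sh itself is not reproduced here; the sketch below shows one plausible shape for it, assuming plain curl. The flags (-L, --max-redirs, --max-time, --max-filesize, -D) are standard curl options, and the output contract matches Step 2: content on stdout, metadata on stderr.

```bash
#!/usr/bin/env bash
# Sketch only: one plausible shape for fetch-url.sh.
set -euo pipefail

url="$1"
timeout="${2:-30}"       # seconds (CRITICAL_RULES default)
max_size_mb="${3:-10}"   # megabytes (CRITICAL_RULES default)
max_bytes=$(( max_size_mb * 1024 * 1024 ))

headers=$(mktemp)
body=$(mktemp)
trap 'rm -f "$headers" "$body"' EXIT

# -L follows redirects, capped at 5 per the safety rules.
# --max-filesize aborts oversized downloads (curl exit 63); note it is
# only enforced up front when the server sends Content-Length.
# -D captures response headers for the metadata step.
http_code=$(curl -sS -L --max-redirs 5 \
  --max-time "$timeout" \
  --max-filesize "$max_bytes" \
  -D "$headers" -o "$body" \
  -w '%{http_code}' \
  "$url")

cat "$body"                                    # content -> stdout
printf '{"http_code": %s}\n' "$http_code" >&2  # metadata JSON -> stderr (abridged)
```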
Step 3: Parse Metadata
Extract from HTTP headers (captured by fetch-url.sh):
- content_type: MIME type
- content_length: Size in bytes
- last_modified: Last-Modified header
- etag: ETag header
- final_url: URL after redirects
IF content_type is text/markdown or contains YAML frontmatter:
  USE SCRIPT: ../document-fetcher/scripts/parse-frontmatter.sh
  Arguments: {content}
  OUTPUT: Frontmatter JSON
ELSE:
  frontmatter = {}
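
Assuming the headers were captured to a file with curl -D as sketched above, field extraction could look like this; `get_header` is a hypothetical helper and the awk parsing is illustrative only:

```bash
# Sketch: pull metadata fields from the header dump written by `curl -D`.
headers="headers.txt"   # file captured in the fetch step

get_header() {
  # Last occurrence wins: redirect chains write one header block per hop.
  awk -F': ' -v name="$1" \
    'tolower($1) == tolower(name) { val = $2 } END { if (val != "") print val }' \
    "$headers" | tr -d '\r'
}

content_type=$(get_header "Content-Type")
content_length=$(get_header "Content-Length")
last_modified=$(get_header "Last-Modified")
etag=$(get_header "ETag")
# final_url is easiest captured at fetch time via curl -w '%{url_effective}'.
```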
Step 4: Return Result
Return structured response:
{
  "success": true,
  "content": "<fetched content>",
  "metadata": {
    "content_type": "text/html",
    "content_length": 12543,
    "last_modified": "2025-01-15T10:00:00Z",
    "etag": "\"abc123\"",
    "final_url": "https://...",
    "frontmatter": {...}
  }
}
</WORKFLOW>
<COMPLETION_CRITERIA> Operation is complete when:
- ✅ URL validated and fetched successfully
- ✅ Content returned
- ✅ Metadata extracted and structured
- ✅ All errors logged
- ✅ No security violations </COMPLETION_CRITERIA>
<OUTPUTS>
Success Response:
{ "success": true, "content": "document content...", "metadata": { "content_type": "text/markdown", "content_length": 8192, "last_modified": "2025-01-15T10:00:00Z", "etag": "\"def456\"", "final_url": "https://docs.example.com/guide.md", "frontmatter": { "title": "API Guide", "codex_sync_include": ["*"] } } }
Error Response:
{
  "success": false,
  "error": "HTTP 404: Not found",
  "url": "https://docs.example.com/missing.md",
  "http_code": 404
}
</OUTPUTS>
<ERROR_HANDLING>
<INVALID_PROTOCOL> If URL uses non-HTTP(S) protocol:
- Error: "Invalid protocol: only http:// and https:// allowed"
- Example: file:///etc/passwd → REJECT
- Example: javascript:alert(1) → REJECT
- Log security violation
- STOP </INVALID_PROTOCOL>
<HTTP_ERROR_404> If HTTP status 404:
- Error: "HTTP 404: Not found"
- Suggest: Check URL spelling, verify document exists
- Cache 404 status for 1 hour (prevent repeated failures)
- STOP </HTTP_ERROR_404>
<HTTP_ERROR_403> If HTTP status 403:
- Error: "HTTP 403: Access denied"
- Suggest: Check authentication, verify permissions
- STOP </HTTP_ERROR_403>
<HTTP_ERROR_5XX> If HTTP status 500-599:
- Retry with exponential backoff (3 attempts)
- Delays: 1s, 2s, 4s
- If all retries fail:
- Error: "HTTP {code}: Server error (tried 3 times)"
- Log failure with timestamp
- STOP </HTTP_ERROR_5XX>
<SIZE_LIMIT> If content exceeds max_size_mb:
- Error: "Content too large: {size}MB exceeds limit of {max_size_mb}MB"
- Suggest: Increase max_size_mb in configuration
- STOP </SIZE_LIMIT>
</ERROR_HANDLING>
<DOCUMENTATION> Upon completion, output:

🎯 STARTING: handler-http
URL: https://docs.example.com/guide.md
Max size: 10MB | Timeout: 30s
───────────────────────────────────────
✓ URL validated
✓ Content fetched (8.2 KB)
✓ Metadata extracted
✓ Frontmatter parsed

✅ COMPLETED: handler-http
Source: External URL
Content-Type: text/markdown
Size: 8.2 KB
Fetch time: 1.2s
───────────────────────────────────────
Ready for caching and permission check
</DOCUMENTATION>

<NOTES>
## Retry Logic
Server errors (5xx) are retried with exponential backoff, i.e. the initial attempt plus up to three retries:
- Attempt 1: Immediate
- Attempt 2: After 1s delay
- Attempt 3: After 2s delay
- Attempt 4: After 4s delay
Client errors (4xx) are NOT retried (they won't succeed).
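
A shell sketch of that schedule; `fetch_once` is a hypothetical wrapper around the curl call above that prints the HTTP status code and succeeds only on 2xx:

```bash
# Retry 5xx responses with exponential backoff: 1s, 2s, 4s delays.
fetch_with_retry() {
  local url="$1" delay=1 attempt http_code
  for attempt in 1 2 3 4; do
    if http_code=$(fetch_once "$url"); then
      return 0                       # success
    fi
    case "$http_code" in
      4??) return 1 ;;               # client error: retrying won't help
    esac
    if [ "$attempt" -lt 4 ]; then
      sleep "$delay"
      delay=$(( delay * 2 ))         # 1s -> 2s -> 4s
    fi
  done
  echo "HTTP ${http_code}: Server error (tried 3 times)" >&2
  return 1
}
```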
## Caching 404s
To prevent repeated fetches of non-existent URLs:
- Cache 404 responses for 1 hour
- Store in cache index with special marker
- Future fetches within 1 hour return cached 404
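
One plausible file-based realization of this negative cache; the cache directory path is an assumption, not the registry's actual layout:

```bash
# Sketch: marker files keyed by URL hash, considered fresh for one hour.
CACHE_DIR="${HOME}/.cache/handler-http/404"   # assumed location
mkdir -p "$CACHE_DIR"

url_key() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }  # shasum -a 256 on macOS

is_cached_404() {
  local marker="$CACHE_DIR/$(url_key "$1")"
  # `find -mmin -60` prints the marker only if it is younger than one hour.
  [ -f "$marker" ] && [ -n "$(find "$marker" -mmin -60 2>/dev/null)" ]
}

record_404() { touch "$CACHE_DIR/$(url_key "$1")"; }

# Usage: skip the fetch entirely when a fresh 404 marker exists.
is_cached_404 "$url" && { echo "HTTP 404: Not found (cached)" >&2; exit 1; }
```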
## Content Type Handling
Supported content types:
- text/markdown → Parse frontmatter
- text/html → Extract metadata from HTML meta tags (future)
- text/plain → Plain text, no frontmatter
- application/json → JSON documents (future)
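
The dispatch itself can be a simple case statement; in this sketch, `parse_frontmatter` stands in for the ../document-fetcher/scripts/parse-frontmatter.sh call:

```bash
# Route the fetched body based on the Content-Type header.
case "${content_type%%;*}" in                     # strip "; charset=..." suffix
  text/markdown)
    frontmatter=$(parse_frontmatter "$body") ;;   # delegate frontmatter parsing
  text/plain)
    frontmatter='{}' ;;                           # plain text carries no frontmatter
  text/html|application/json)
    frontmatter='{}' ;;                           # richer handling is a future enhancement
  *)
    frontmatter='{}' ;;
esac
```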
## Future Enhancements
Phase 3 and beyond:
- robots.txt respect
- Rate limiting per domain
- Conditional GET (If-Modified-Since, If-None-Match)
- Compressed response handling (gzip, brotli)
- HTML→Markdown conversion
- PDF fetching and parsing </NOTES>