Claude-skill-registry doc-to-skill-scraper
Scrape external documentation (API references, library docs, protocol specifications) and generate Claude Agent Skills in SKILL.md format. Use when creating skills from documentation, integrating third-party knowledge, or building domain-specific skills. Parses HTML/Markdown, extracts structure, generates proper frontmatter and instructions. Triggers on "scrape documentation", "docs to skill", "generate skill from docs", "API documentation to skill".
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/doc-to-skill-scraper" ~/.claude/skills/majiayu000-claude-skill-registry-doc-to-skill-scraper && rm -rf "$T"
skills/data/doc-to-skill-scraper/SKILL.md
Documentation to Skill Scraper
Purpose
Meta-skill for converting external documentation into Claude Agent Skills with proper YAML frontmatter and structured instructions following the official specification.
When to Use
- Creating skills from API documentation
- Integrating third-party library knowledge
- Building domain-specific skills from protocols
- Converting documentation to reusable skills
- Continual learning from new documentation
Core Instructions
Step 1: Parse Documentation
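    from bs4 import BeautifulSoup
    import requests
    import yaml  # used by the later steps

    def scrape_documentation(url):
        """Scrape a documentation page and split it into headed sections."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # fail early on 4xx/5xx responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract structure
        h1 = soup.find('h1')
        title = h1.get_text(strip=True) if h1 else 'Unknown'
        sections = []
        for heading in soup.find_all(['h1', 'h2', 'h3']):
            sections.append({
                'level': heading.name,
                'text': heading.get_text(strip=True),
                # extract_content_after is not defined in this document; it is
                # expected to return the text between this heading and the next
                'content': extract_content_after(heading)
            })

        return {
            'title': title,
            'url': url,
            'sections': sections
        }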
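`scrape_documentation` calls `extract_content_after`, which this document never defines. The sketch below shows one way to get the same heading-to-content split. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra installs. The names `SectionParser` and `split_sections` are hypothetical, and the sketch makes no attempt to skip `<script>` or `<style>` text.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3"}


class SectionParser(HTMLParser):
    """Collect the text under each h1-h3 heading (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sections = []        # one dict per heading, in document order
        self._in_heading = False  # True while inside an h1-h3 element

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # A new heading ends the previous section and starts a new one.
            self.sections.append({"level": tag, "text": "", "content": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text that appears before the first heading
        key = "text" if self._in_heading else "content"
        self.sections[-1][key] += data


def split_sections(html):
    """Return [{'level', 'text', 'content'}, ...] with whitespace collapsed."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return [
        {key: " ".join(value.split()) for key, value in section.items()}
        for section in parser.sections
    ]


sample = "<h1>Widgets API</h1><p>Manage widgets.</p><h2>Create</h2><p>POST /widgets</p>"
sections = split_sections(sample)
```

Each entry has the same `level`, `text`, and `content` keys that Step 1 builds, so the result could stand in for the `sections` list there.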
Step 2: Extract Key Information
    def extract_skill_info(doc_data):
        """Extract skill components from scraped documentation."""
        # Generate skill name
        name = doc_data['title'].lower().replace(' ', '-')

        # Use the content under the first heading as the purpose
        purpose = doc_data['sections'][0]['content'] if doc_data['sections'] else ''

        # extract_code_blocks and extract_api_methods are helpers
        # that this document does not define
        examples = extract_code_blocks(doc_data)  # code blocks from the docs
        methods = extract_api_methods(doc_data)   # API methods or key functions

        return {
            'name': name,
            'title': doc_data['title'],  # generate_skill_md reads this key
            'purpose': purpose,
            'examples': examples,
            'methods': methods
        }
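The name above is built with `.lower().replace(' ', '-')`, which leaves punctuation, underscores, and repeated separators in place. A title such as "Stripe API: Payment Intents" would keep its colon. A minimal sketch of a stricter slug function follows. The name `slugify_skill_name` is hypothetical, and the 64-character cap is an assumption to adjust to whatever limit you validate against.

```python
import re

MAX_NAME_LENGTH = 64  # assumed cap; not stated in this document


def slugify_skill_name(title):
    """Turn a documentation title into a lowercase, hyphen-separated name."""
    # Replace every run of characters outside a-z / 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Apply the length cap, then drop hyphens left at either end.
    return slug[:MAX_NAME_LENGTH].strip("-")


name = slugify_skill_name("Stripe API: Payment Intents (v2)")
```

Because the cap is applied before the hyphens are stripped, a truncated name never ends in a hyphen.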
Step 3: Generate SKILL.md
    def generate_skill_md(skill_info):
        """Generate SKILL.md content (YAML frontmatter plus Markdown body)."""
        # infer_use_cases, infer_triggers, pad_description, generate_when_to_use,
        # generate_instructions, format_examples, and infer_dependencies are
        # helpers that this document does not define.

        # Build description
        description = (
            f"{skill_info['purpose']} Use when {infer_use_cases(skill_info)}. "
            f"Triggers on {infer_triggers(skill_info)}."
        )

        # Ensure description is 200-1024 chars
        if len(description) > 1024:
            description = description[:1021] + "..."
        elif len(description) < 200:
            description = pad_description(description, skill_info)

        # Generate frontmatter
        frontmatter = {
            'name': skill_info['name'],
            'description': description
        }

        # Generate body; fall back to the name when no 'title' key is present
        title = skill_info.get('title', skill_info['name'])
        body = f"""# {title}

    ## Purpose
    {skill_info['purpose']}

    ## When to Use
    {generate_when_to_use(skill_info)}

    ## Core Instructions
    {generate_instructions(skill_info)}

    ## Examples
    {format_examples(skill_info['examples'])}

    ## Dependencies
    {infer_dependencies(skill_info)}

    ## Version
    v1.0.0
    """

        # Combine; sort_keys=False (PyYAML 5.1+) keeps 'name' before 'description'
        yaml_str = yaml.dump(frontmatter, default_flow_style=False, sort_keys=False)
        return f"---\n{yaml_str}---\n\n{body}"
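`generate_skill_md` shortens an over-long description with a hard slice at 1021 characters, which can cut a word in half. The sketch below shows one alternative that backs up to the last word boundary before adding the ellipsis. The function name `truncate_description` is hypothetical, and collapsing whitespace first is an assumption about what a single-line frontmatter value should look like.

```python
MAX_DESCRIPTION_LENGTH = 1024  # upper bound used in generate_skill_md


def truncate_description(text, limit=MAX_DESCRIPTION_LENGTH):
    """Shorten text to at most `limit` characters, cutting between words."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    # Reserve three characters for the ellipsis, then back up to the last space.
    cut = text[: limit - 3]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    # Drop trailing punctuation so the result does not end in ",..." or "....".
    return cut.rstrip(" ,;:.") + "..."


short = truncate_description("alpha beta gamma delta epsilon", limit=20)
```

This handles only the upper bound. Padding a description that falls under 200 characters is left to `pad_description`, which the source does not define.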
Step 4: Validate Generated Skill
    import yaml

    def validate_skill(skill_md_content):
        """Validate generated SKILL.md."""
        # Parse frontmatter; maxsplit=2 leaves any later '---' in the body alone
        parts = skill_md_content.split('---\n', 2)
        if len(parts) < 3:
            raise ValueError("Invalid frontmatter structure")

        frontmatter = yaml.safe_load(parts[1])

        # Check required fields
        assert 'name' in frontmatter, "Missing 'name' field"
        assert 'description' in frontmatter, "Missing 'description' field"

        # Check description length
        desc_len = len(frontmatter['description'])
        assert 200 <= desc_len <= 1024, f"Description must be 200-1024 chars (got {desc_len})"

        # Check name format; any underscore fails, even if hyphens are present
        assert frontmatter['name'].islower(), "Name must be lowercase"
        assert '_' not in frontmatter['name'], "Use hyphens, not underscores"

        return True
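`validate_skill` stops at the first failed assertion, and `assert` statements are skipped entirely when Python runs with `-O`. A minimal sketch of an alternative follows: it takes the already-parsed frontmatter dictionary and returns every problem it finds. The function name `check_frontmatter` and the exact name pattern are assumptions. The 200-1024 bounds are the ones used in this document.

```python
import re

# Lowercase letters and digits, separated by single hyphens (assumed pattern).
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MIN_DESCRIPTION, MAX_DESCRIPTION = 200, 1024  # bounds used in this document


def check_frontmatter(frontmatter):
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    for field in ("name", "description"):
        if not isinstance(frontmatter.get(field), str):
            errors.append(f"missing or non-string '{field}' field")
    if errors:
        return errors  # the remaining checks need both fields as strings

    if not NAME_PATTERN.fullmatch(frontmatter["name"]):
        errors.append("name must be lowercase letters, digits, and single hyphens")
    length = len(frontmatter["description"])
    if not MIN_DESCRIPTION <= length <= MAX_DESCRIPTION:
        errors.append(f"description must be 200-1024 chars (got {length})")
    return errors


problems = check_frontmatter({"name": "Bad_Name", "description": "too short"})
```

Returning a list lets a caller report the name and description problems together instead of fixing one and re-running to find the next.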
Example: Stripe API to Skill
Input Documentation URL
https://stripe.com/docs/api/payment_intents
Scraping Process
- Parse HTML: Extract title, sections, code examples
- Extract Info:
  - Name: stripe-payment-intents
  - Purpose: "Create and manage PaymentIntents for Stripe payments"
  - Methods: create, retrieve, confirm, cancel
  - Examples: Code snippets from docs
- Generate SKILL.md:
````markdown
---
name: stripe-payment-intents
description: Create and manage Stripe PaymentIntents for processing payments with 3D Secure support, automatic payment methods, and webhook integration. Use when building payment flows, processing credit cards, handling payment confirmations, or integrating Stripe checkout. Supports one-time and recurring payments. Triggers on "Stripe payment", "PaymentIntent", "process payment", "Stripe checkout", "payment processing".
---

# Stripe Payment Intents

## Purpose
Create and manage PaymentIntents for Stripe payment processing with support for 3D Secure, multiple payment methods, and webhook confirmations.

## When to Use
- Building payment checkout flows
- Processing credit card payments
- Handling payment confirmations
- Supporting 3D Secure authentication
- Managing payment lifecycle

## Core Instructions

### Create Payment Intent

```python
import os
import stripe

stripe.api_key = os.getenv('STRIPE_SECRET_KEY')

# Create payment intent
intent = stripe.PaymentIntent.create(
    amount=2000,  # Amount in cents
    currency='usd',
    payment_method_types=['card'],
    metadata={'order_id': '12345'}
)
```

### Confirm Payment

```python
# Confirm payment with payment method
stripe.PaymentIntent.confirm(
    intent.id,
    payment_method='pm_card_visa'
)
```

### Handle Webhooks

```python
# Verify webhook signature.
# `request` is the web framework's request object (get_data() matches Flask);
# `webhook_secret` is the signing secret for your webhook endpoint.
payload = request.get_data()
sig_header = request.headers.get('Stripe-Signature')
event = stripe.Webhook.construct_event(
    payload, sig_header, webhook_secret
)

if event['type'] == 'payment_intent.succeeded':
    payment_intent = event['data']['object']
    # Handle successful payment
```

## Dependencies
- Python 3.7+
- stripe library
- Environment variable: STRIPE_SECRET_KEY

## Version
v1.0.0
````
## Best Practices

### Ethics and Respect
- **Respect robots.txt**: Check before scraping
- **Cache locally**: Don't repeatedly scrape same content
- **Rate limiting**: Add delays between requests
- **Attribution**: Credit original documentation source

### Quality
- **Validate generated skills**: Run through validation checks
- **Test in sandbox**: Test generated skill before using
- **Manual review**: Always review generated content
- **Iterate**: Refine based on actual usage

### Description Generation
- Extract key terms from documentation
- Include file types, domain terms, action verbs
- Add specific use cases from examples
- Ensure 200-1024 character length

## Workflow

```bash
# 1. Scrape documentation
python scrape_docs.py https://api.example.com/docs

# 2. Generate SKILL.md
python generate_skill.py scraped_data.json

# 3. Validate
python validate_skill.py generated_skill.md

# 4. Test
# Create .claude/skills/new-skill/SKILL.md
# Test with realistic prompt

# 5. Iterate
# Refine based on testing
```
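The Ethics and Respect practices listed under Best Practices (check robots.txt, cache locally, add delays) can be combined in one small wrapper around whatever function fetches a page. This is a sketch under stated assumptions. `PoliteFetcher` and its parameters are hypothetical names, the robots.txt text is passed in rather than downloaded, and the fetch function is injected so the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Wrap a fetch function with a robots.txt check, a cache, and a delay."""

    def __init__(self, fetch, robots_txt, user_agent="doc-to-skill-scraper",
                 min_delay=1.0, sleep=time.sleep, clock=time.monotonic):
        self._fetch = fetch  # callable: url -> text (could wrap requests.get)
        self._robots = RobotFileParser()
        self._robots.parse(robots_txt.splitlines())
        self._user_agent = user_agent
        self._min_delay = min_delay  # minimum seconds between real requests
        self._sleep = sleep
        self._clock = clock
        self._cache = {}
        self._last_request = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]  # cached: no request and no delay
        if not self._robots.can_fetch(self._user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        if self._last_request is not None:
            # Wait out whatever remains of the minimum delay.
            wait = self._min_delay - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        text = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = text
        return text


# Usage with a stub fetch function, so nothing is actually downloaded.
robots = "User-agent: *\nDisallow: /private/"
fetcher = PoliteFetcher(fetch=lambda url: f"<html>{url}</html>",
                        robots_txt=robots, min_delay=0.0)
page = fetcher.get("https://docs.example.com/api")
```

The cache here lives in memory for the life of the object. Persisting it to disk would be needed to avoid re-scraping across runs.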
Supporting Files
Generated skills can include:
- examples/: Usage examples from documentation
- templates/: API request templates
- scripts/: Helper scripts for common operations
Dependencies
- Python 3.8+
- beautifulsoup4 - HTML parsing
- requests - HTTP requests
- pyyaml - YAML generation
- Optional: markdown - Markdown parsing
Version
v1.0.0 (2025-10-23)