Claude-skill-registry anti-scraping
Use when you need to bypass Cloudflare protection, scrape websites with anti-bot measures, render JavaScript pages, or simulate real browser behavior for web scraping.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/anti-scraping" ~/.claude/skills/majiayu000-claude-skill-registry-anti-scraping && rm -rf "$T"
manifest:
skills/data/anti-scraping/SKILL.md
safety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- global npm install
- references .env files
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
Anti-Scraping & Web Scraping
When to use: Websites with Cloudflare protection, JavaScript rendering requirements, or anti-bot measures.
Overview
Provides battle-tested solutions for bypassing common anti-scraping measures, using a Playwright headless browser with stealth configurations.
Key Capabilities
- ✅ Cloudflare challenge bypass
- ✅ JavaScript rendering
- ✅ Real browser context simulation
- ✅ Stealth mode (hides automation detection)
- ✅ Screenshot capture for debugging
Quick Start
Prerequisites
# Install Playwright
npm install -g playwright
playwright install chromium
Basic Usage Pattern
// n8n Execute Command node
const { execSync } = require('child_process');
const fs = require('fs'); // needed for readFileSync below

const url = 'https://example.com';
const outputFile = '/tmp/page.html';

// Playwright command with stealth
const command = `node playwright-cloudflare.js "${url}" "${outputFile}"`;
execSync(command);

// Read result
const html = fs.readFileSync(outputFile, 'utf8');
Core Script: playwright-cloudflare.js
Location:
n8n-skills/anti-scraping/playwright-cloudflare.js
Key Features:
- Disables automation detection
- Sets real browser headers
- Configures viewport and user agent
- Handles Cloudflare waiting
- Captures screenshots on failure
Configuration:
const config = {
  waitForCloudflare: true,     // Wait for CF challenge
  waitTime: 15000,             // Max wait time (ms)
  selector: '.product-list',   // Element to wait for
  screenshotOnError: true,     // Debug screenshots
  userAgent: 'Mozilla/5.0...'  // Real browser UA
};
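The troubleshooting notes further down mention `--wait` and `--selector` flags, but the source does not show how the script reads them. The following is a minimal sketch of one way to merge such flags into defaults like the ones above. The `parseArgs` function, the `DEFAULTS` object, and the positional layout `<url> <outputFile>` are assumptions for illustration, not the script's confirmed interface.

```javascript
// Hypothetical defaults loosely based on the configuration object above.
const DEFAULTS = {
  waitForCloudflare: true,
  waitTime: 15000,
  selector: null,
  screenshotOnError: true,
};

// Parse `<url> <outputFile> [--wait <ms>] [--selector <css>]` into one options object.
// The positional layout and flag handling are assumptions, not the real script's CLI.
function parseArgs(argv) {
  const options = { ...DEFAULTS };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--wait') {
      const ms = Number(argv[++i]);
      if (!Number.isInteger(ms) || ms <= 0) {
        throw new Error('--wait expects a positive integer (milliseconds)');
      }
      options.waitTime = ms;
    } else if (arg === '--selector') {
      const selector = argv[++i];
      if (!selector) throw new Error('--selector expects a CSS selector');
      options.selector = selector;
    } else {
      positional.push(arg);
    }
  }
  if (positional.length < 2) {
    throw new Error('usage: <url> <outputFile> [--wait ms] [--selector css]');
  }
  const [url, outputFile] = positional;
  return { url, outputFile, ...options };
}
```

In a script this would typically be called as `parseArgs(process.argv.slice(2))`, so that invalid flags fail fast before a browser is launched.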
n8n Workflow Pattern
[Manual Trigger]
    ↓
[Set Parameters]
    target_url: https://site.com
    wait_selector: .content
    ↓
[Execute Command: Playwright]
    Command: node
    Arguments: playwright-cloudflare.js {{$json.target_url}} /tmp/output.html
    ↓
[Read HTML File]
    File: /tmp/output.html
    ↓
[Parse with Cheerio] (use html-parsing skill)
Performance
- Speed: 15-25 seconds per page
- Success Rate: ~95% for Cloudflare sites
- Resource Usage: ~200-300MB RAM per browser instance
Troubleshooting
Cloudflare Still Blocking
# Increase wait time
--wait 30000
# Add specific selector to wait for
--selector '.product-list'
# Check screenshot for errors
/tmp/error-screenshot.png
Timeout Errors
// Increase timeout in the Playwright script
timeout: 60000 // 60 seconds
Memory Issues
// Close the browser properly
await browser.close();
// Limit concurrent instances:
// use n8n Split Into Batches with batch size = 1
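As a rough illustration of the two memory tips above, the sketch below closes the browser in a `finally` block and processes URLs one at a time. The helper names `withBrowser` and `scrapeSequentially` are hypothetical. The `launch` parameter is any async function returning an object with a `close()` method, which is an assumption made so the pattern can be shown without depending on the real script.

```javascript
// Run `task` with a browser and always close it, even if the task throws.
// `launch` is any async function returning an object with an async close() method,
// e.g. () => require('playwright').chromium.launch({ headless: true }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Process URLs one at a time so only a single browser instance is alive at once.
async function scrapeSequentially(urls, launch, scrapeOne) {
  const results = [];
  for (const url of urls) {
    results.push(await withBrowser(launch, (browser) => scrapeOne(browser, url)));
  }
  return results;
}
```

Launching a fresh browser per URL trades speed for isolation. Reusing one browser across URLs would be faster, at the cost of memory growing over a long run.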
Best Practices
- Add Delays: Wait 3-5 seconds between requests
- Rotate User Agents: Change UA periodically
- Use Residential Proxies: For high-volume scraping
- Handle Errors: Implement retry logic with exponential backoff
- Respect robots.txt: Check site policies
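For the robots.txt item, a check can run before any page is requested. The sketch below is a deliberately simplified parser, not a full implementation of the Robots Exclusion Protocol: it only honours `User-agent: *` groups and treats rule values as plain path prefixes. The function name `isPathAllowed` is hypothetical.

```javascript
// Minimal robots.txt check: honours only the `User-agent: *` groups and treats
// Allow/Disallow values as plain path prefixes (no `*` or `$` wildcard support).
function isPathAllowed(robotsTxt, path) {
  let inStarGroup = false;
  let lastWasUserAgent = false;
  const rules = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      inStarGroup = lastWasUserAgent ? inStarGroup || value === '*' : value === '*';
      lastWasUserAgent = true;
      continue;
    }
    lastWasUserAgent = false;
    // An empty Disallow value means "nothing is disallowed", so it adds no rule.
    if (inStarGroup && (field === 'allow' || field === 'disallow') && value !== '') {
      rules.push({ allow: field === 'allow', prefix: value });
    }
  }
  // Longest matching prefix wins; Allow wins a tie; no match means allowed.
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (!best || rule.prefix.length > best.prefix.length ||
        (rule.prefix.length === best.prefix.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

A caller would fetch `/robots.txt` from the target origin, pass its text and the URL's pathname to this function, and skip the page when it returns `false`. For production use, a maintained parser library is likely the safer choice.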
Common Patterns
Pattern 1: Single Page Scraping
Trigger → Playwright → Parse → Export
Pattern 2: Multi-Page with Pagination
Trigger → Generate URLs (pagination skill) → Split Into Batches → Playwright → Wait 5s → Parse → Deduplicate → Export
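Outside n8n, the same pipeline could be sketched in plain Node.js roughly as follows. The `?page=N` URL scheme, the `scrapePage` callback, and the `keyOf` dedup key are all assumptions for illustration. The actual pagination and parsing steps belong to the pagination and html-parsing skills, which are not shown in this document.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical URL generator: sets ?page=1..pageCount on a base URL.
function generatePageUrls(baseUrl, pageCount) {
  return Array.from({ length: pageCount }, (_, i) => {
    const url = new URL(baseUrl);
    url.searchParams.set('page', String(i + 1));
    return url.toString();
  });
}

// Fetch pages one at a time, pausing between requests, then drop duplicate items.
// `scrapePage(url)` resolves to an array of items; `keyOf(item)` returns a dedup key.
async function scrapeAllPages(urls, scrapePage, keyOf, delayMs = 5000) {
  const seen = new Map();
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // the "Wait 5s" step
    for (const item of await scrapePage(urls[i])) {
      const key = keyOf(item);
      if (!seen.has(key)) seen.set(key, item); // keep the first occurrence
    }
  }
  return [...seen.values()];
}
```

Keeping the first occurrence of each key preserves page order in the exported result.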
Pattern 3: With Error Handling
Playwright → [Error Trigger] → Retry Logic → Notification
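The retry step in this pattern, combined with the exponential backoff recommended under Best Practices, might look like the sketch below. The function name `retryWithBackoff`, its option names, and the default values are assumptions. In an n8n workflow the `notify` callback would correspond to the Notification node.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async operation, doubling the delay after each failure.
// After the last attempt fails, call `notify` with the error and rethrow it.
async function retryWithBackoff(
  operation,
  { attempts = 3, baseDelayMs = 1000, notify = () => {} } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Delay grows as baseDelayMs * 1, 2, 4, ...
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  await notify(lastError);
  throw lastError;
}
```

Rethrowing after notification lets the caller (or the workflow) still mark the run as failed instead of silently continuing.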
Integration with Other Skills
- pagination: Generate URLs for multi-page scraping
- html-parsing: Extract data from rendered HTML
- error-handling: Retry on failures
- debugging: Validate extracted data
Full Code and Documentation
Complete implementation with examples:
/mnt/d/work/n8n_agent/n8n-skills/anti-scraping/
Files:
- Main scraping script: playwright-cloudflare.js
- Detailed documentation: README.md
- n8n workflow example: example-workflow.json
- Configuration template: config.template.env