# claude-code-plugins · brightdata-performance-tuning

## Install

Source: clone the upstream repo.

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code: install into `~/.claude/skills/`.

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/plugins/saas-packs/brightdata-pack/skills/brightdata-performance-tuning" \
           ~/.claude/skills/jeremylongshore-claude-code-plugins-brightdata-performance-tuning \
  && rm -rf "$T"
```

Manifest: `plugins/saas-packs/brightdata-pack/skills/brightdata-performance-tuning/SKILL.md`
# Bright Data Performance Tuning

## Overview
Optimize Bright Data scraping performance through connection pooling, response caching, concurrent request tuning, and smart product selection. Web Unlocker latency is typically 5-30s due to CAPTCHA solving; Scraping Browser sessions are 10-60s.
## Prerequisites
- Bright Data zone configured
- Understanding of async patterns
- Redis or file cache available (optional)
## Latency Benchmarks
| Product | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| Web Unlocker (simple) | 3s | 8s | 15s | No CAPTCHA |
| Web Unlocker (CAPTCHA) | 10s | 25s | 45s | With CAPTCHA solving |
| Scraping Browser | 8s | 20s | 40s | Full browser render |
| SERP API (sync) | 2s | 5s | 10s | Search results |
| Residential Proxy | 1s | 3s | 8s | Raw proxy, no unblocking |
## Instructions

### Step 1: Choose the Right Product
```typescript
// Product selection matrix
function selectProduct(target: { js: boolean; captcha: boolean; structured: boolean }) {
  if (target.structured) return 'serp_api';                // Pre-parsed JSON
  if (!target.js && !target.captcha) return 'residential'; // Fastest
  if (target.js) return 'scraping_browser';                // Browser rendering
  return 'web_unlocker';                                   // Best default
}
```
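Two sample calls show how the matrix resolves (the inputs are illustrative):

```typescript
// A JS-heavy page behind CAPTCHA -> full browser rendering
selectProduct({ js: true, captcha: true, structured: false });   // 'scraping_browser'

// Static HTML, no CAPTCHA -> fastest raw-proxy path
selectProduct({ js: false, captcha: false, structured: false }); // 'residential'
```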
### Step 2: Connection Pooling with Keep-Alive
```typescript
import { Agent } from 'https';
import axios from 'axios';

// Zone credentials; the env variable names here are illustrative
const proxyUser = process.env.BRIGHTDATA_PROXY_USER!;
const proxyPass = process.env.BRIGHTDATA_PROXY_PASS!;

// Reuse TCP connections to brd.superproxy.io
const httpsAgent = new Agent({
  keepAlive: true,
  maxSockets: 25,     // Match your concurrency limit
  maxFreeSockets: 5,
  timeout: 120000,
  rejectUnauthorized: false,
});

const client = axios.create({
  proxy: {
    host: 'brd.superproxy.io',
    port: 33335,
    auth: { username: proxyUser, password: proxyPass },
  },
  httpsAgent,
  timeout: 60000,
});
```
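A brief usage sketch with placeholder URLs; the win is that repeated requests through `client` reuse pooled sockets instead of paying DNS + TCP + TLS setup every time:

```typescript
async function scrape(url: string): Promise<string> {
  const res = await client.get(url);
  return res.data;
}

// e.g. the second call typically skips connection setup entirely
// await scrape('https://example.com/page-1');
// await scrape('https://example.com/page-2');
```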
### Step 3: Response Caching Layer
```typescript
// src/brightdata/cache.ts — avoid re-scraping identical URLs
import { createHash } from 'crypto';
import { LRUCache } from 'lru-cache';

const memoryCache = new LRUCache<string, string>({
  max: 500,                                     // Max cached pages
  maxSize: 100_000_000,                         // 100MB total
  sizeCalculation: (v) => Buffer.byteLength(v),
  ttl: 3600000,                                 // 1 hour
});

export async function cachedScrape(
  url: string,
  scraper: (url: string) => Promise<string>,
  ttlMs?: number
): Promise<string> {
  const key = createHash('sha256').update(url).digest('hex');
  const cached = memoryCache.get(key);
  if (cached) {
    console.log(`Cache HIT: ${url}`);
    return cached;
  }
  const html = await scraper(url);
  memoryCache.set(key, html, { ttl: ttlMs });
  console.log(`Cache MISS: ${url} (${Buffer.byteLength(html)} bytes)`);
  return html;
}
```
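The prerequisites list Redis as an optional backing store. A sketch of the same interface over Redis, assuming the `ioredis` client; the key prefix and function name are illustrative:

```typescript
import Redis from 'ioredis';
import { createHash } from 'crypto';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Same contract as cachedScrape, but shared across processes
export async function redisCachedScrape(
  url: string,
  scraper: (url: string) => Promise<string>,
  ttlSeconds = 3600
): Promise<string> {
  const key = `bd:cache:${createHash('sha256').update(url).digest('hex')}`;
  const cached = await redis.get(key);
  if (cached) return cached;

  const html = await scraper(url);
  await redis.set(key, html, 'EX', ttlSeconds); // EX = expiry in seconds
  return html;
}
```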
### Step 4: Concurrent Scraping with Backpressure
```typescript
import PQueue from 'p-queue';

// Tune concurrency based on your plan and target site
const scrapeQueue = new PQueue({
  concurrency: 10,  // Concurrent proxy connections
  interval: 1000,   // Per-second window
  intervalCap: 15,  // Max new requests per second
});

async function scrapeMany(urls: string[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  await Promise.allSettled(
    urls.map(url =>
      scrapeQueue.add(async () => {
        const html = await cachedScrape(url, (u) => client.get(u).then(r => r.data));
        results.set(url, html);
      })
    )
  );
  console.log(`Scraped ${results.size}/${urls.length} successfully`);
  return results;
}
```
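A usage sketch with placeholder URLs: the queue drains at most 10 requests at a time while never dispatching more than 15 new requests per second.

```typescript
const urls = Array.from({ length: 200 }, (_, i) => `https://example.com/item/${i}`);

scrapeMany(urls).then(pages => {
  // Failed URLs are simply absent; Promise.allSettled swallows rejections
  console.log(`Got ${pages.size}/${urls.length} pages`);
});
```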
### Step 5: Use Async API for Bulk Jobs
For 100+ URLs, use the Web Scraper API instead of individual proxy requests:
```typescript
// Bulk collection — one API call, Bright Data handles parallelism
async function bulkScrape(urls: string[]) {
  const response = await fetch(
    `https://api.brightdata.com/datasets/v3/trigger?dataset_id=${DATASET_ID}&format=json`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.BRIGHTDATA_API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(urls.map(url => ({ url }))),
    }
  );
  return response.json(); // Returns snapshot_id for status polling
}
// 1000 URLs via one trigger vs 1000 individual proxy requests
```
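The returned `snapshot_id` can then be polled until the data is ready. A hedged sketch, assuming the v3 `progress` and `snapshot` endpoints and a `status` field that reaches `ready`; verify both against the current Bright Data API reference:

```typescript
// Poll until the snapshot is ready, then download it.
async function waitForSnapshot(snapshotId: string, intervalMs = 10_000) {
  const headers = { Authorization: `Bearer ${process.env.BRIGHTDATA_API_TOKEN}` };
  for (;;) {
    const res = await fetch(
      `https://api.brightdata.com/datasets/v3/progress/${snapshotId}`,
      { headers }
    );
    const { status } = await res.json();
    if (status === 'ready') break;
    if (status === 'failed') throw new Error(`Snapshot ${snapshotId} failed`);
    await new Promise(r => setTimeout(r, intervalMs)); // Wait between polls
  }
  const data = await fetch(
    `https://api.brightdata.com/datasets/v3/snapshot/${snapshotId}?format=json`,
    { headers }
  );
  return data.json();
}
```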
### Step 6: Performance Monitoring
```typescript
class ScrapeMetrics {
  private timings: number[] = [];
  private errors = 0;
  private cacheHits = 0;

  record(durationMs: number) { this.timings.push(durationMs); }
  recordError() { this.errors++; }
  recordCacheHit() { this.cacheHits++; }

  report() {
    const sorted = [...this.timings].sort((a, b) => a - b);
    return {
      count: sorted.length,
      errors: this.errors,
      cacheHits: this.cacheHits,
      p50: sorted[Math.floor(sorted.length * 0.5)] || 0,
      p95: sorted[Math.floor(sorted.length * 0.95)] || 0,
      p99: sorted[Math.floor(sorted.length * 0.99)] || 0,
    };
  }
}
```
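One way to feed the metrics from the scrape path (counting cache hits would additionally require `cachedScrape` to call `recordCacheHit()`, which is omitted here):

```typescript
const metrics = new ScrapeMetrics();

// Wrap each scrape so timings and errors flow into the report
async function timedScrape(url: string): Promise<string | null> {
  const start = Date.now();
  try {
    const html = await cachedScrape(url, (u) => client.get(u).then(r => r.data));
    metrics.record(Date.now() - start);
    return html;
  } catch {
    metrics.recordError();
    return null;
  }
}

// After a batch, compare observed percentiles against the benchmark table
console.log(metrics.report()); // { count, errors, cacheHits, p50, p95, p99 }
```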
## Output
- Right product selection per use case
- Connection pooling reducing TCP overhead
- Response cache avoiding duplicate scrapes
- Concurrent scraping with backpressure control
- Bulk API for large-scale jobs
## Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Slow scrapes | CAPTCHA solving overhead | Expected for Web Unlocker; use cache |
| Connection exhausted | Too many concurrent | Reduce p-queue concurrency |
| Memory pressure | Large cached pages | Set maxSize on LRU cache |
| Timeout storms | All requests hitting slow site | Add circuit breaker |
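The last row recommends a circuit breaker. A minimal sketch; the threshold and cooldown values are illustrative, not tuned:

```typescript
// Trip after N consecutive failures; reject fast until the cooldown passes
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('Circuit open: skipping request');
      }
      this.failures = 0; // Half-open: let one attempt through
    }
    try {
      const result = await fn();
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures === this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Usage: wrap the proxy call so a slow or failing site can't stall the queue
const breaker = new CircuitBreaker();
// await breaker.run(() => client.get(url).then(r => r.data));
```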
## Resources

## Next Steps

For cost optimization, see `brightdata-cost-tuning`.