Claude-code-plugins-plus-skills exa-load-scale
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/exa-pack/skills/exa-load-scale" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-exa-load-scale && rm -rf "$T"
manifest:
plugins/saas-packs/exa-pack/skills/exa-load-scale/SKILL.mdsource content
Exa Load & Scale
Overview
Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.
Prerequisites
- k6 load testing tool installed
- Test environment Exa API key (separate from production)
- Redis for result caching
Capacity Reference
| Search Type | Typical Latency | Max Throughput (10 QPS) |
|---|---|---|
| < 150ms | 10 req/s (600/min) |
| < 425ms | 10 req/s (600/min) |
| 300-1500ms | 10 req/s (600/min) |
| 500-2000ms | 10 req/s (600/min) |
| 2-5s | 10 req/s (600/min) |
With caching (50% hit rate): Effective throughput doubles to 20 req/s equivalent.
Instructions
Step 1: k6 Load Test Against Your Wrapper
// exa-load-test.js import http from "k6/http"; import { check, sleep } from "k6"; export const options = { stages: [ { duration: "1m", target: 5 }, // Ramp up to 5 VUs { duration: "3m", target: 5 }, // Steady state { duration: "1m", target: 10 }, // Push toward rate limit { duration: "2m", target: 10 }, // Stress test { duration: "1m", target: 0 }, // Ramp down ], thresholds: { http_req_duration: ["p(95)<3000"], // 3s P95 for neural search http_req_failed: ["rate<0.05"], // < 5% error rate }, }; const queries = [ "best practices for building RAG systems", "transformer architecture improvements 2025", "TypeScript 5.5 new features", "vector database comparison guide", "AI safety alignment research", ]; export default function () { const query = queries[Math.floor(Math.random() * queries.length)]; const response = http.post( `${__ENV.APP_URL}/api/search`, JSON.stringify({ query, numResults: 3 }), { headers: { "Content-Type": "application/json" }, timeout: "10s", } ); check(response, { "status 200": (r) => r.status === 200, "has results": (r) => JSON.parse(r.body).results?.length > 0, "latency < 3s": (r) => r.timings.duration < 3000, }); sleep(0.5 + Math.random()); // 0.5-1.5s between requests }
# Run load test k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
Step 2: Throughput Maximizer with Request Queue
import Exa from "exa-js"; import PQueue from "p-queue"; const exa = new Exa(process.env.EXA_API_KEY); // Stay under 10 QPS rate limit const searchQueue = new PQueue({ concurrency: 8, // max concurrent requests interval: 1000, // per second intervalCap: 10, // Exa's QPS limit }); async function highThroughputSearch(queries: string[]) { const results = []; for (const query of queries) { const promise = searchQueue.add(async () => { const result = await exa.searchAndContents(query, { type: "auto", numResults: 3, text: { maxCharacters: 500 }, }); return { query, results: result.results }; }); results.push(promise); } return Promise.all(results); } // Process 100 queries respecting rate limits const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`); console.time("batch"); const results = await highThroughputSearch(queries); console.timeEnd("batch"); // Expected: ~10-12 seconds (100 queries / 10 QPS)
Step 3: Caching for Scale
import { LRUCache } from "lru-cache"; // Cache eliminates repeat queries entirely const cache = new LRUCache<string, any>({ max: 10000, ttl: 3600 * 1000, // 1-hour TTL }); async function scalableSearch(query: string, opts: any) { const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`; const cached = cache.get(key); if (cached) return cached; const result = await searchQueue.add(() => exa.searchAndContents(query, opts) ); cache.set(key, result); return result; } // With 50% cache hit rate: // 100 unique queries → 50 API calls → 5 seconds instead of 10
Step 4: Capacity Planning Calculator
interface CapacityEstimate { dailySearches: number; peakQPS: number; cacheHitRate: number; effectiveQPS: number; withinLimits: boolean; recommendation: string; } function estimateCapacity( dailySearches: number, peakMultiplier = 3, expectedCacheHitRate = 0.5 ): CapacityEstimate { const avgQPS = dailySearches / (24 * 3600); const peakQPS = avgQPS * peakMultiplier; const effectiveQPS = peakQPS * (1 - expectedCacheHitRate); const withinLimits = effectiveQPS <= 10; // Default Exa limit let recommendation = "Within default limits"; if (effectiveQPS > 10 && effectiveQPS <= 50) { recommendation = "Contact hello@exa.ai for Enterprise rate limits"; } else if (effectiveQPS > 50) { recommendation = "Requires Enterprise plan + aggressive caching + request queue"; } return { dailySearches, peakQPS, cacheHitRate: expectedCacheHitRate, effectiveQPS, withinLimits, recommendation }; } // Example: 50,000 searches/day const estimate = estimateCapacity(50000); console.log(estimate); // { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
Benchmark Results Template
## Exa Performance Benchmark **Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z | Metric | Value | |--------|-------| | Total Requests | N | | Success Rate | X% | | Cache Hit Rate | X% | | P50 Latency | Xms | | P95 Latency | Xms | | Peak QPS (actual API calls) | X | | 429 Rate Limit Errors | N |
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| 429 errors in load test | Exceeding 10 QPS | Reduce concurrency, add cache |
| Inconsistent latency | Different search types | Standardize on one type per test |
| Timeout errors | Deep search under load | Use or for load tests |
| Cache miss rate high | Unique queries per request | Use a fixed query pool |
Resources
Next Steps
For reliability patterns, see
exa-reliability-patterns.