Claude-code-plugins-plus-skills perplexity-load-scale
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/perplexity-pack/skills/perplexity-load-scale" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-perplexity-load-scale && rm -rf "$T"
manifest:
plugins/saas-packs/perplexity-pack/skills/perplexity-load-scale/SKILL.mdsource content
Perplexity Load & Scale
Overview
Load testing and capacity planning for Perplexity Sonar API. Key constraint: Perplexity rate limits at 50 RPM (default tier), and every request performs a live web search with variable latency. Load testing must respect these limits to avoid burning through credits.
Capacity Constraints
| Constraint | Default Limit | Impact |
|---|---|---|
| RPM (requests per minute) | 50 | Hard ceiling on throughput |
| Context window | 127K tokens | Limits conversation history |
latency | 1-3s | Throughput: ~20-50 concurrent |
latency | 3-8s | Throughput: ~6-16 concurrent |
| 20 domains max | Per-request limit |
Prerequisites
- k6 load testing tool installed
- Separate Perplexity API key for load testing
- Budget approval (load tests cost money)
Instructions
Step 1: k6 Load Test Script
// perplexity-load-test.js import http from "k6/http"; import { check, sleep } from "k6"; import { Rate, Trend } from "k6/metrics"; const errorRate = new Rate("perplexity_errors"); const citationCount = new Trend("perplexity_citations"); export const options = { stages: [ { duration: "1m", target: 5 }, // Ramp to 5 VUs { duration: "3m", target: 5 }, // Steady at 5 VUs { duration: "1m", target: 15 }, // Ramp to 15 VUs { duration: "3m", target: 15 }, // Steady at 15 VUs { duration: "1m", target: 0 }, // Ramp down ], thresholds: { http_req_duration: ["p(95)<10000"], // 10s P95 for sonar perplexity_errors: ["rate<0.05"], // <5% error rate }, }; const queries = [ "What is TypeScript?", "Latest Node.js features", "Python vs JavaScript for web development", "Current state of AI in healthcare", "Best practices for REST API design", ]; export default function () { const query = queries[Math.floor(Math.random() * queries.length)]; const response = http.post( "https://api.perplexity.ai/chat/completions", JSON.stringify({ model: "sonar", messages: [{ role: "user", content: query }], max_tokens: 200, }), { headers: { "Content-Type": "application/json", Authorization: `Bearer ${__ENV.PERPLEXITY_API_KEY}`, }, timeout: "15s", } ); const success = check(response, { "status is 200": (r) => r.status === 200, "has content": (r) => { try { return JSON.parse(r.body).choices[0].message.content.length > 0; } catch { return false; } }, }); errorRate.add(!success); if (response.status === 200) { try { const body = JSON.parse(response.body); citationCount.add(body.citations?.length || 0); } catch {} } // Critical: stay within 50 RPM sleep(1.5 + Math.random()); }
Step 2: Run Load Test
set -euo pipefail # Minimal test (5 queries, verify setup) k6 run --vus 1 --duration 30s \ --env PERPLEXITY_API_KEY=$PERPLEXITY_API_KEY \ perplexity-load-test.js # Full test (respecting 50 RPM) k6 run --env PERPLEXITY_API_KEY=$PERPLEXITY_API_KEY \ perplexity-load-test.js
Step 3: Capacity Estimation
interface CapacityEstimate { maxRPM: number; avgLatencyMs: number; maxConcurrent: number; dailyCapacity: number; estimatedDailyCost: number; } function estimateCapacity( rpm: number, avgLatency: number, model: "sonar" | "sonar-pro" ): CapacityEstimate { const costPerRequest = model === "sonar-pro" ? 0.02 : 0.005; return { maxRPM: rpm, avgLatencyMs: avgLatency, maxConcurrent: Math.floor((rpm / 60) * (avgLatency / 1000)), dailyCapacity: rpm * 60 * 24, estimatedDailyCost: rpm * 60 * 24 * costPerRequest, }; } // Example: 50 RPM, 2s avg latency, sonar const capacity = estimateCapacity(50, 2000, "sonar"); // { maxRPM: 50, maxConcurrent: 1, dailyCapacity: 72000, estimatedDailyCost: $360 }
Step 4: Request Queue for Scale
import PQueue from "p-queue"; // Queue that respects 50 RPM const searchQueue = new PQueue({ concurrency: 5, interval: 60_000, intervalCap: 45, // 45 RPM (safety margin below 50) }); async function scalableSearch(query: string) { return searchQueue.add(() => perplexity.chat.completions.create({ model: "sonar", messages: [{ role: "user", content: query }], max_tokens: 500, }) ); } // Queue status for monitoring function queueStatus() { return { pending: searchQueue.pending, size: searchQueue.size, isPaused: searchQueue.isPaused, }; }
Step 5: Scaling Strategy
| Scale | Queries/Day | Architecture | Cost/Day |
|---|---|---|---|
| Small | <1,000 | Direct API calls | <$5 |
| Medium | 1K-10K | Queue + cache (30%+ hit rate) | $5-$50 |
| Large | 10K-100K | Multi-key + cache + queue | $50-$500 |
| Enterprise | 100K+ | Contact Perplexity for custom limits | Custom |
For Medium+ scale, caching is mandatory. A 50% cache hit rate halves your API costs and doubles effective throughput.
Benchmark Results Template
## Perplexity Load Test Report **Date:** YYYY-MM-DD | **Model:** sonar | **Duration:** 10 min | Metric | Value | |--------|-------| | Total Requests | | | Success Rate | | | P50 Latency | | | P95 Latency | | | P99 Latency | | | Avg Citations/Response | | | Max Sustained RPM | | | Estimated Cost | |
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| 429 during load test | Exceeding 50 RPM | Reduce VUs, increase sleep |
| Inconsistent latency | Web search variability | Normal; use P95 not avg |
| k6 timeout | sonar-pro queries >15s | Increase timeout to 30s |
| High cost from test | Too many queries | Use for load tests |
Output
- k6 load test script calibrated for Perplexity rate limits
- Capacity estimation calculator
- Request queue for sustained throughput
- Scaling strategy by volume tier
Resources
Next Steps
For reliability patterns, see
perplexity-reliability-patterns.