# perplexity-rate-limits

## Install

Clone the upstream repo:

```shell
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Or, for Claude Code, install the skill into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/perplexity-pack/skills/perplexity-rate-limits" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-perplexity-rate-limits && rm -rf "$T"
```

Manifest: `plugins/saas-packs/perplexity-pack/skills/perplexity-rate-limits/SKILL.md`
# Perplexity Rate Limits

## Overview
Handle Perplexity Sonar API rate limits. Perplexity uses a leaky bucket algorithm: burst capacity is available, with tokens refilling continuously at your assigned rate. Rate limits are based on requests per minute (RPM).
## Rate Limit Tiers
| Tier | RPM | Notes |
|---|---|---|
| Free / Starter | 50 | Default for new API keys |
| Search API | ~3 req/sec | Per-endpoint limit |
| Higher tiers | Contact sales | Custom limits available |
Rate limits apply per API key, not per model. Using `sonar-pro` counts against the same RPM budget as `sonar`.
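The RPM budget translates directly into a minimum spacing between requests. A minimal sketch of that arithmetic (the helper name and `safetyFactor` parameter are illustrative, not part of any Perplexity SDK):

```typescript
// Minimum spacing between requests that stays under a given RPM budget.
// safetyFactor > 1 leaves headroom for clock skew and for other clients
// sharing the same API key. (Illustrative helper, not part of any SDK.)
function minSpacingMs(rpm: number, safetyFactor: number = 1.0): number {
  return Math.ceil((60_000 / rpm) * safetyFactor);
}

minSpacingMs(50);      // 1200 ms at the 50 RPM starter tier
minSpacingMs(50, 1.2); // 1440 ms with 20% headroom
```

The 1200 ms figure is where the `interval: 1200` setting in Step 2 comes from.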
## Prerequisites

- `PERPLEXITY_API_KEY` set in the environment
- Understanding of HTTP 429 responses
## Instructions

### Step 1: Exponential Backoff with Jitter
```typescript
async function withExponentialBackoff<T>(
  operation: () => Promise<T>,
  config = { maxRetries: 5, baseDelayMs: 1000, maxDelayMs: 30000, jitterMs: 500 }
): Promise<T> {
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      if (attempt === config.maxRetries) throw error;
      const status = error.status || error.response?.status;
      // Only retry on 429 (rate limit) and 5xx (server errors)
      if (status && status !== 429 && status < 500) throw error;
      const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
      const jitter = Math.random() * config.jitterMs;
      const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);
      console.warn(
        `[Perplexity] ${status || "error"} — retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`
      );
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const result = await withExponentialBackoff(() =>
  perplexity.chat.completions.create({
    model: "sonar",
    messages: [{ role: "user", content: "test query" }],
  })
);
```
### Step 2: Queue-Based Rate Limiting
```typescript
import PQueue from "p-queue";

// 50 RPM = ~0.83 req/sec. Set intervalCap=1, interval=1200ms for safety.
const perplexityQueue = new PQueue({
  concurrency: 3,
  interval: 1200,
  intervalCap: 1,
});

async function queuedSearch(query: string, model = "sonar") {
  return perplexityQueue.add(() =>
    withExponentialBackoff(() =>
      perplexity.chat.completions.create({
        model,
        messages: [{ role: "user", content: query }],
      })
    )
  );
}

// Batch queries are automatically rate-limited
const queries = ["query 1", "query 2", "query 3", "query 4", "query 5"];
const results = await Promise.all(queries.map((q) => queuedSearch(q)));
```
### Step 3: Token Bucket Implementation (No Dependencies)
```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number = 50,
    private refillRate: number = 50 / 60 // 50 per minute = ~0.83/sec
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  async acquire(): Promise<void> {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return;
    }
    // Wait until a token is available
    const waitMs = (1 / this.refillRate) * 1000;
    await new Promise((r) => setTimeout(r, waitMs));
    this.refill();
    this.tokens -= 1;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  get available(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

const bucket = new TokenBucket(50, 50 / 60);

async function rateLimitedSearch(query: string) {
  await bucket.acquire();
  return perplexity.chat.completions.create({
    model: "sonar",
    messages: [{ role: "user", content: query }],
  });
}
```
### Step 4: Python Rate Limiting
```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, rpm: int = 50):
        self.rpm = rpm
        self.window = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove timestamps older than 60 seconds
        while self.window and self.window[0] < now - 60:
            self.window.popleft()
        if len(self.window) >= self.rpm:
            sleep_time = 60 - (now - self.window[0])
            time.sleep(max(0, sleep_time))
        self.window.append(time.time())

limiter = RateLimiter(rpm=50)

def rate_limited_search(client, query: str, model: str = "sonar"):
    limiter.wait_if_needed()
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
```
## Error Handling

| Signal | Meaning | Action |
|---|---|---|
| HTTP 429 | RPM exceeded | Backoff and retry |
| `Retry-After` header | Seconds until reset | Honor this value exactly |
| Repeated 429s | Sustained overload | Reduce concurrency or add a queue |
| 429 on burst | Bucket empty | Space requests ~1.2s apart |
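When a 429 carries a `Retry-After` header, the delay should come from the server rather than the exponential schedule. A hedged sketch of parsing it (the helper name is illustrative; whether Perplexity sends this header on every 429 is not guaranteed, so a fallback delay is kept):

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Per HTTP semantics the value is either delta-seconds ("30") or an
// HTTP-date; anything unparseable falls back to a default delay.
// (Illustrative helper — not part of the Perplexity SDK.)
function retryAfterMs(retryAfter: string | null, fallbackMs = 1000): number {
  if (!retryAfter) return fallbackMs;
  const seconds = Number(retryAfter);
  if (!Number.isNaN(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(retryAfter);
  if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  return fallbackMs;
}

retryAfterMs("30"); // 30000
retryAfterMs(null); // 1000 (fallback)
```

In the Step 1 retry loop, this value would replace the computed `exponentialDelay` whenever the header is present.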
## Output
- Automatic retry with exponential backoff and jitter
- Queue-based rate limiting for batch operations
- Token bucket for fine-grained control
- Python rate limiter for synchronous code
## Resources

## Next Steps

For security configuration, see `perplexity-security-basics`.