Claude-code-plugins openrouter-rate-limits
Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code · Install into ~/.claude/skills/:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/plugins/saas-packs/openrouter-pack/skills/openrouter-rate-limits" \
           ~/.claude/skills/jeremylongshore-claude-code-plugins-openrouter-rate-limits \
  && rm -rf "$T"
```

manifest: plugins/saas-packs/openrouter-pack/skills/openrouter-rate-limits/SKILL.md
OpenRouter Rate Limits
Overview
OpenRouter rate limits are per-key, not per-account. Free-tier keys get lower limits; paid keys get higher limits that scale with credit balance. The OpenAI SDK has built-in retry with exponential backoff for 429 responses. Check your current limits via `GET /api/v1/auth/key`. Rate limit headers are returned on every response.
Check Your Rate Limits
```bash
# Query current rate limit configuration for your key
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '{
    label: .data.label,
    rate_limit: .data.rate_limit,
    is_free_tier: .data.is_free_tier,
    credits_used: .data.usage,
    credit_limit: .data.limit
  }'

# Example output:
# {
#   "label": "my-app-prod",
#   "rate_limit": {"requests": 200, "interval": "10s"},
#   "is_free_tier": false,
#   "credits_used": 12.34,
#   "credit_limit": 100
# }
```
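For use inside an application, the same preflight check can be done in Python. A minimal sketch with `requests`, mirroring the fields shown in the curl example above; the 90% warning threshold is an arbitrary choice:

```python
import os
import requests

def get_key_limits() -> dict:
    """Fetch the current key's rate limit and credit usage from /api/v1/auth/key."""
    resp = requests.get(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()["data"]
    return {
        "rate_limit": data.get("rate_limit"),   # e.g. {"requests": 200, "interval": "10s"}
        "is_free_tier": data.get("is_free_tier"),
        "credits_used": data.get("usage"),
        "credit_limit": data.get("limit"),
    }

if __name__ == "__main__":
    limits = get_key_limits()
    print(limits)
    # Warn when most of the credit limit is spent (threshold is arbitrary)
    if limits["credit_limit"] and limits["credits_used"] / limits["credit_limit"] > 0.9:
        print("WARNING: over 90% of credit limit used")
```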
Rate Limit Tiers
| Tier | Requests | Interval | Who |
|---|---|---|---|
| Free (no credits) | 20 | 10s | New accounts |
| Free (with credits) | 200 | 10s | Accounts with any credits |
| Paid | Higher | Varies | Based on credit balance |
Free models have separate limits: 50 req/day (free users), 1000 req/day (with $10+ credits).
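Because the free-model daily quota is separate from the per-key interval limit, it can help to track it locally. A minimal client-side sketch, assuming the quota resets at UTC midnight and using a placeholder free-model ID; the server remains the source of truth:

```python
import datetime

class DailyQuota:
    """Rough client-side tally of free-model requests against a daily budget."""

    def __init__(self, limit_per_day: int = 50):  # 50/day free tier, 1000/day with $10+ credits
        self.limit_per_day = limit_per_day
        self._day = None
        self._count = 0

    def try_consume(self) -> bool:
        today = datetime.datetime.now(datetime.timezone.utc).date()
        if today != self._day:
            # Assumption: the daily quota resets at UTC midnight
            self._day, self._count = today, 0
        if self._count >= self.limit_per_day:
            return False  # Budget exhausted; route to a paid model instead
        self._count += 1
        return True

quota = DailyQuota(limit_per_day=50)
# "some-provider/some-model:free" is a placeholder, not a real model ID
model = "some-provider/some-model:free" if quota.try_consume() else "openai/gpt-4o-mini"
```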
Read Rate Limit Headers
```python
import os

import requests as http_requests

# The OpenAI SDK abstracts headers, so use requests for direct access
def check_rate_headers():
    """Make a request and inspect rate limit headers."""
    resp = http_requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://my-app.com",
        },
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,
        },
    )
    return {
        "status": resp.status_code,
        "x-ratelimit-limit": resp.headers.get("x-ratelimit-limit"),
        "x-ratelimit-remaining": resp.headers.get("x-ratelimit-remaining"),
        "x-ratelimit-reset": resp.headers.get("x-ratelimit-reset"),
        "retry-after": resp.headers.get("retry-after"),
    }
```
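One way to use these headers is to back off before the server ever returns a 429. A minimal sketch building on `check_rate_headers` above; in practice you would read the headers from responses you are already making rather than issuing a probe request, and the threshold and pause values are arbitrary:

```python
import time

def wait_if_near_limit(min_remaining: int = 5, pause_seconds: float = 2.0) -> None:
    """Pause briefly when the remaining-request budget is nearly exhausted."""
    headers = check_rate_headers()
    remaining = headers.get("x-ratelimit-remaining")
    if remaining is not None and int(remaining) <= min_remaining:
        # Back off proactively instead of waiting for a 429
        time.sleep(pause_seconds)
```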
Retry Strategy with OpenAI SDK
```python
import os

from openai import OpenAI

# The SDK handles 429 retries automatically with exponential backoff
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    max_retries=5,   # Default is 2; increase for high-throughput workloads
    timeout=60.0,    # Per-request timeout
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# The SDK will:
# 1. Catch 429 responses
# 2. Read the Retry-After header
# 3. Wait with exponential backoff (+ jitter)
# 4. Retry up to max_retries times
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
)
```
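When the SDK exhausts `max_retries`, the final 429 is raised as `openai.RateLimitError`, so it is worth catching explicitly. A minimal sketch; the fallback behavior here is only illustrative:

```python
import openai

try:
    response = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=200,
    )
except openai.RateLimitError as exc:
    # All retries were consumed; surface the error or degrade gracefully
    print(f"Rate limited after retries: {exc}")
    response = None
except openai.APITimeoutError as exc:
    # Request exceeded the 60s timeout configured on the client
    print(f"Timed out: {exc}")
    response = None
```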
Custom Rate Limiter (Client-Side)
```python
import time
import threading
from collections import deque


class TokenBucket:
    """Client-side rate limiter to prevent hitting server limits."""

    def __init__(self, rate: int = 200, interval: float = 10.0):
        self.rate = rate          # Max requests per interval
        self.interval = interval
        self._timestamps = deque()
        self._lock = threading.Lock()

    def acquire(self, timeout: float = 30.0) -> bool:
        """Block until a request slot is available."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                now = time.monotonic()
                # Remove timestamps outside the window
                while self._timestamps and now - self._timestamps[0] > self.interval:
                    self._timestamps.popleft()
                if len(self._timestamps) < self.rate:
                    self._timestamps.append(now)
                    return True
            time.sleep(0.1)  # Wait and retry
        return False  # Timed out


limiter = TokenBucket(rate=150, interval=10.0)  # Stay under 200 limit


def rate_limited_completion(messages, **kwargs):
    """Completion with client-side rate limiting."""
    if not limiter.acquire(timeout=30):
        raise TimeoutError("Rate limiter timeout")
    return client.chat.completions.create(messages=messages, **kwargs)
```
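The bucket holds a lock, so one limiter instance can be shared across worker threads. A minimal usage sketch; the prompt list and worker count are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

prompts = ["Summarize A", "Summarize B", "Summarize C"]  # Placeholder prompts

def worker(prompt: str) -> str:
    response = rate_limited_completion(
        messages=[{"role": "user", "content": prompt}],
        model="openai/gpt-4o-mini",
        max_tokens=200,
    )
    return response.choices[0].message.content

# TokenBucket is thread-safe, so one limiter can serve all workers
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, prompts))
```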
Batch Processing with Rate Awareness
```python
import asyncio
import os

from openai import AsyncOpenAI


async def batch_with_rate_limit(prompts: list[str], model="openai/gpt-4o-mini",
                                max_concurrent=10, delay_between=0.05):
    """Process a batch of prompts with rate-aware concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    aclient = AsyncOpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
        max_retries=5,
        default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
    )

    async def process(prompt, idx):
        await asyncio.sleep(idx * delay_between)  # Stagger requests
        async with semaphore:
            response = await aclient.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200,
            )
            return response.choices[0].message.content

    return await asyncio.gather(*[process(p, i) for i, p in enumerate(prompts)])
```
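A short usage sketch; the prompt list is a placeholder, and `asyncio.gather` returns results in input order:

```python
import asyncio

prompts = [f"Write a one-line summary of topic {i}" for i in range(50)]  # Placeholder prompts

# asyncio.run drives the whole batch; results come back in the same order as prompts
results = asyncio.run(batch_with_rate_limit(prompts, max_concurrent=10))
for prompt, result in zip(prompts, results):
    print(prompt, "->", result[:60])
```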
Error Handling
| Error | Cause | Fix |
|---|---|---|
| 429 Too Many Requests | Exceeded requests per interval | SDK auto-retries; increase `max_retries` |
| Retry storm | Multiple clients retrying simultaneously | Add random jitter (0-1s) to retry delay |
| Silent throttling | Responses slow down before 429 | Monitor latency; proactively reduce rate |
| Free tier limit hit | 50 req/day on free models | Add credits ($10+) for 1000 req/day limit |
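For the retry-storm case, callers that manage retries themselves (with the SDK's `max_retries` set to 0 so the two strategies do not stack) can add jitter so concurrent clients do not wake up in lockstep. A minimal sketch; the base delay and cap are arbitrary choices:

```python
import random
import time

import openai

def completion_with_jitter(messages, model, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry on 429 with exponential backoff plus 0-1s of random jitter."""
    for attempt in range(max_attempts):
        try:
            # 'client' is the OpenAI client configured earlier (ideally with max_retries=0 here)
            return client.chat.completions.create(model=model, messages=messages, max_tokens=200)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at 30s, plus jitter to desynchronize clients
            delay = min(base_delay * 2 ** attempt, 30.0) + random.uniform(0.0, 1.0)
            time.sleep(delay)
```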
Enterprise Considerations
- Rate limits are per-key: use multiple keys to multiply effective throughput (see the sketch after this list)
- The OpenAI SDK handles 429 retries automatically; configure `max_retries` (default 2)
- Implement client-side rate limiting to stay under limits proactively (cheaper than retries)
- Free models have daily limits separate from the per-key rate limit
- Monitor `x-ratelimit-remaining` headers to detect approaching limits before hitting 429
- For batch workloads, use staggered concurrent requests rather than burst patterns
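Since limits are per-key, spreading traffic across several keys multiplies effective throughput. A minimal round-robin sketch; the `OPENROUTER_API_KEYS` environment variable and its comma-separated convention are assumptions of this example, not an OpenRouter feature:

```python
import itertools
import os

from openai import OpenAI

# Assumed convention: OPENROUTER_API_KEYS holds a comma-separated list of keys
_keys = os.environ["OPENROUTER_API_KEYS"].split(",")
_clients = itertools.cycle(
    OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=key.strip(),
        max_retries=5,
        default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
    )
    for key in _keys
)

def next_client() -> OpenAI:
    """Rotate through clients so each key sees roughly 1/N of the traffic."""
    return next(_clients)

response = next_client().chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
)
```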