Skillshub anth-rate-limits
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/anth-rate-limits" ~/.claude/skills/comeonoliver-skillshub-anth-rate-limits && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/anth-rate-limits/SKILL.mdsource content
Anthropic Rate Limits
Overview
The Claude API uses token-bucket rate limiting measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Limits increase automatically as you move through usage tiers.
Rate Limit Dimensions
| Dimension | Header | Description |
|---|---|---|
| RPM | | Requests per minute |
| ITPM | | Input tokens per minute |
| OTPM | | Output tokens per minute |
Limits are per-organization and per-model-class. Cached input tokens do NOT count toward ITPM limits.
Usage Tiers (Auto-Upgrade)
| Tier | Monthly Spend | Key Benefit |
|---|---|---|
| Tier 1 (Free) | $0 | Evaluation access |
| Tier 2 | $40+ | Higher RPM |
| Tier 3 | $200+ | Production-grade limits |
| Tier 4 | $2,000+ | High-throughput access |
| Scale | Custom | Custom limits via sales |
Check your current tier and limits at console.anthropic.com.
SDK Built-In Retry
import anthropic # The SDK retries 429 and 5xx errors automatically (2 retries by default) client = anthropic.Anthropic(max_retries=5) # Increase for high-traffic apps # Disable auto-retry for manual control client = anthropic.Anthropic(max_retries=0)
const client = new Anthropic({ maxRetries: 5 });
Custom Rate Limiter with Header Awareness
import time import anthropic class RateLimitedClient: def __init__(self): self.client = anthropic.Anthropic(max_retries=0) # We handle retries self.remaining_requests = 100 self.remaining_tokens = 100000 self.reset_at = 0.0 def create_message(self, **kwargs): # Pre-check: wait if near limit if self.remaining_requests < 3 and time.time() < self.reset_at: wait = self.reset_at - time.time() print(f"Pre-throttle: waiting {wait:.1f}s") time.sleep(wait) for attempt in range(5): try: response = self.client.messages.create(**kwargs) # Update from response headers (via _response) headers = response._response.headers self.remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 100)) self.remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 100000)) reset = headers.get("anthropic-ratelimit-requests-reset") if reset: from datetime import datetime self.reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00")).timestamp() return response except anthropic.RateLimitError as e: retry_after = float(e.response.headers.get("retry-after", 2 ** attempt)) print(f"429 — retry in {retry_after}s (attempt {attempt + 1})") time.sleep(retry_after) raise Exception("Exhausted rate limit retries")
Queue-Based Throughput Control
import PQueue from 'p-queue'; import Anthropic from '@anthropic-ai/sdk'; const client = new Anthropic(); // Enforce 50 RPM with concurrency limit const queue = new PQueue({ concurrency: 10, interval: 60_000, intervalCap: 50, }); async function rateLimitedCall(prompt: string) { return queue.add(() => client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages: [{ role: 'user', content: prompt }], }) ); } // Process 200 prompts without hitting limits const results = await Promise.all( prompts.map(p => rateLimitedCall(p)) );
Cost-Saving: Use Batches for Bulk Work
# Message Batches API: 50% cheaper, no rate limit pressure on real-time quota batch = client.messages.batches.create( requests=[ {"custom_id": f"req-{i}", "params": { "model": "claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [{"role": "user", "content": prompt}] }} for i, prompt in enumerate(prompts) ] )
Error Handling
| Header | Description | Action |
|---|---|---|
| Seconds until next request allowed | Sleep this duration exactly |
| Requests left in window | Throttle if < 5 |
| Tokens left in window | Reduce if low |
| ISO timestamp of window reset | Schedule retry after this time |
Resources
Next Steps
For security configuration, see
anth-security-basics.