Agent-skills api-rate-limiting-helper
install
source · Clone the upstream repo
git clone https://github.com/LambdaTest/agent-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/LambdaTest/agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/api/API-Ratelimit-Helper" ~/.claude/skills/lambdatest-agent-skills-api-rate-limiting-helper && rm -rf "$T"
manifest:
api/API-Ratelimit-Helper/SKILL.mdsource content
API Rate Limiting Skill
Design complete rate limiting, quota, and retry systems for any API.
Rate Limiting Algorithms
| Algorithm | Best For | Trade-offs |
|---|---|---|
| Token bucket | Bursty traffic with sustained avg | Allows bursts; slightly complex |
| Leaky bucket | Strict rate enforcement | Smooths bursts; can feel slow |
| Fixed window | Simple counting | Boundary spike problem |
| Sliding window log | Precise limiting | Memory-intensive |
| Sliding window counter | Balance of precision/memory | Best for most APIs |
Recommendation: Use sliding window counter for API endpoints, token bucket for streaming/upload endpoints.
Response Headers (RFC standard)
X-RateLimit-Limit: 100 X-RateLimit-Remaining: 42 X-RateLimit-Reset: 1700000060 X-RateLimit-Policy: 100;w=60;comment="per minute" Retry-After: 18
429 Response Body
{ "error": "rate_limit_exceeded", "message": "Too many requests. You have exceeded 100 requests per minute.", "retry_after_seconds": 18, "limit": 100, "window": "60s", "reset_at": "2024-01-01T00:01:00Z" }
Tiered Quota Design
| Tier | Requests/min | Requests/day | Burst | Concurrent |
|---|---|---|---|---|
| Free | 10 | 1,000 | 20 | 2 |
| Starter | 100 | 50,000 | 200 | 10 |
| Pro | 1,000 | 500,000 | 2,000 | 50 |
| Enterprise | Custom | Unlimited | Custom | Custom |
Quota Endpoints
GET /api/v1/account/quota — current usage vs limits GET /api/v1/account/quota/history — usage over time
Response:
{ "plan": "pro", "period": "2024-01", "limits": { "requests_per_minute": 1000, "requests_per_day": 500000 }, "usage": { "requests_today": 12345, "requests_this_minute": 234 }, "resets_at": "2024-02-01T00:00:00Z" }
Retry Logic (client-side)
Exponential backoff with jitter
import random, time def retry_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0): for attempt in range(max_retries): try: return fn() except RateLimitError as e: if attempt == max_retries - 1: raise # Use Retry-After header if present, else exponential backoff delay = min( e.retry_after or (base_delay * (2 ** attempt)), max_delay ) # Add jitter to prevent thundering herd delay += random.uniform(0, delay * 0.1) time.sleep(delay)
Retryable vs Non-retryable status codes
| Status | Retry? | Strategy |
|---|---|---|
| 429 | Yes | Respect header |
| 500 | Yes | Exponential backoff |
| 502/503 | Yes | Exponential backoff |
| 504 | Yes | Exponential backoff |
| 400 | No | Fix request |
| 401 | No | Refresh token, then retry once |
| 403 | No | Fix permissions |
| 404 | No | Fix URL |
| 422 | No | Fix payload |
Circuit Breaker Pattern
States: CLOSED → OPEN → HALF-OPEN → CLOSED CLOSED: normal operation - Track failure rate in rolling window - If failure rate > threshold (e.g. 50% in 10s): → OPEN OPEN: reject all requests immediately (fail-fast) - Return 503 without calling downstream - After cooldown period (e.g. 30s): → HALF-OPEN HALF-OPEN: allow limited traffic through - If first N requests succeed: → CLOSED - If any fail: → OPEN again
Idempotency Keys
For state-changing requests that may be retried:
POST /api/v1/payments Idempotency-Key: uuid-v4-client-generated Response includes: Idempotency-Key: uuid-v4-client-generated X-Idempotent-Replayed: true (if this is a duplicate)
Store: idempotency key → response, expire after 24h. Return cached response for duplicate keys.
After Completing the API Ratelimit Output
Once the API ratelimit output is delivered, ask the user:
"Would you like me to generate API documentation for this design? (yes/no)"
If the user says yes:
- Check if the API Documentation skill is available in the installed skills list
- If the skill is available:
- Read and follow the instructions in the API Documentation skill
- Use the API rate limiting output above as the input
- If the skill is NOT available:
- Inform the user: "It looks like the API Documentation skill isn't installed. You can install it and re-run.
If the user says no:
- End the task here