Skills rate-limiter
git clone https://github.com/TerminalSkills/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/rate-limiter" ~/.claude/skills/terminalskills-skills-rate-limiter && rm -rf "$T"
skills/rate-limiter/SKILL.md- eval/exec/Function constructor
- references .env files
Rate Limiter
Overview
This skill enables AI agents to design, implement, and configure production-grade rate limiting for APIs. It covers algorithm selection, middleware generation, Redis-backed distributed counting, abuse pattern detection, and proper HTTP response headers.
Instructions
1. Assess the API Surface
Before writing any code, analyze the target application:
- List all public endpoints and their HTTP methods
- Classify endpoints by sensitivity: authentication (highest), write operations (high), read operations (medium), static/health (low)
- Identify existing middleware stack and framework (Express, Fastify, Django, Gin, etc.)
- Check if Redis or another shared store is available for distributed rate limiting
2. Choose the Right Algorithm
Select based on the use case:
| Algorithm | Best For | Trade-off |
|---|---|---|
| Fixed Window | Simple per-minute caps | Burst at window edges |
| Sliding Window Log | Precise per-user limits | Higher memory per key |
| Sliding Window Counter | Balance of accuracy and memory | Slight approximation |
| Token Bucket | APIs with burst allowance | More complex to tune |
| Leaky Bucket | Smooth output rate | Delays rather than rejects |
Default recommendation: Sliding Window Counter — it handles 95% of use cases with good accuracy and reasonable memory usage.
3. Implement Layered Limits
Always implement at least two layers:
Layer 1 — Global IP limit: Catches volumetric abuse before authentication. Typical: 100-300 req/min per IP.
Layer 2 — Endpoint-specific limits: Different limits per endpoint category. Auth endpoints get the strictest limits (3-10 req/min).
Layer 3 — Authenticated user quotas (if applicable): Daily or hourly caps per API key or user ID. Return quota status in response headers.
4. Response Headers
Always include these headers on EVERY response (not just 429s):
X-RateLimit-Limit: 60 X-RateLimit-Remaining: 45 X-RateLimit-Reset: 1708012800 Retry-After: 30 (only on 429 responses)
5. Abuse Detection Patterns
Beyond simple counting, detect:
- Credential stuffing: Many unique usernames from one IP on auth endpoints
- Scraping: Sequential access patterns on listing endpoints
- Header anomalies: Missing User-Agent, rotating headers on rapid requests
- Distributed attacks: Same user-agent/fingerprint across many IPs
6. Graceful Degradation
Always implement a fallback when Redis is unavailable:
- Fall back to in-memory rate limiting per process
- Log the Redis failure prominently
- Do NOT fail open (allowing unlimited requests) unless explicitly configured
7. Avoid Common Pitfalls
- Trust proxy configuration: Always verify how
is parsed. Behind a load balancer, the client IP may not beX-Forwarded-Forreq.ip - Shared NAT: Don't set per-IP limits too low — corporate networks may have thousands of users behind one IP. Minimum 100 req/min for non-auth endpoints
- Key design: Use composite keys (
,ip:endpoint
) not just IP aloneuserId:endpoint - Time sync: Use Redis server time (
command) not local clock for distributed setupsTIME
Examples
Example 1: Express middleware with Redis sliding window
Prompt: "Add rate limiting to my Express API. Auth endpoints should allow 5 requests per minute per IP. All other endpoints 60 per minute per authenticated user."
Output: The agent creates
src/middleware/rateLimiter.ts:
import { Redis } from 'ioredis'; import { Request, Response, NextFunction } from 'express'; const redis = new Redis(process.env.REDIS_URL); interface RateLimitConfig { windowMs: number; max: number; keyGenerator: (req: Request) => string; } const configs: Record<string, RateLimitConfig> = { auth: { windowMs: 60_000, max: 5, keyGenerator: (req) => `rl:auth:${req.ip}`, }, default: { windowMs: 60_000, max: 60, keyGenerator: (req) => `rl:api:${(req as any).userId ?? req.ip}`, }, }; export function rateLimit(category: keyof typeof configs) { const config = configs[category]; return async (req: Request, res: Response, next: NextFunction) => { const key = config.keyGenerator(req); const now = Date.now(); const windowStart = now - config.windowMs; const multi = redis.multi(); multi.zremrangebyscore(key, 0, windowStart); multi.zadd(key, now.toString(), `${now}:${Math.random()}`); multi.zcard(key); multi.expire(key, Math.ceil(config.windowMs / 1000)); const results = await multi.exec(); const count = results![2][1] as number; res.set('X-RateLimit-Limit', config.max.toString()); res.set('X-RateLimit-Remaining', Math.max(0, config.max - count).toString()); res.set('X-RateLimit-Reset', Math.ceil((now + config.windowMs) / 1000).toString()); if (count > config.max) { res.set('Retry-After', Math.ceil(config.windowMs / 1000).toString()); return res.status(429).json({ error: 'Too many requests' }); } next(); }; }
Example 2: Abuse detection for credential stuffing
Prompt: "Detect credential stuffing on my login endpoint. Block IPs that try more than 20 unique usernames in 5 minutes."
Output: The agent adds a detection middleware that tracks unique username attempts per IP using a Redis HyperLogLog, blocking IPs that exceed the threshold and logging the event with full request metadata for incident response.
Guidelines
- Always test rate limiting with concurrent requests — race conditions in counter logic are common
- Include integration tests that verify 429 responses and header values
- Document the rate limits in your API documentation or OpenAPI spec
- Consider offering rate limit increase for paid tiers via configuration, not code changes
- Log all rate limit events in structured format for security monitoring