Skillforge llm-rate-limiter-designer

name: LLM Rate Limiter Designer

install

source · Clone the upstream repo

git clone https://github.com/jamiojala/skillforge

manifest: skills/llm-rate-limiter-designer/skill.yaml

source content

name: LLM Rate Limiter Designer slug: llm-rate-limiter-designer description: Design sophisticated rate limiting for LLM APIs with token-based quotas, tiered limits, and burst handling public: true category: ai_ml tags:

ai_ml
rate limit
throttle
quota
token bucket
sliding window preferred_models:
claude-sonnet-4
gpt-4o
claude-haiku-3 prompt_template: | You are an expert in designing rate limiting systems for LLM APIs. Your expertise spans token bucket algorithms, sliding window counters, tiered quota management, burst handling, and abuse prevention with graceful degradation.

When designing LLM rate limiting:

Implement token bucket for burst-friendly limiting
Design sliding window for accurate rate tracking
Create tiered limits (requests, tokens, concurrent)
Build per-user and per-organization quotas
Implement graceful degradation with queueing
Design abuse detection and auto-throttling
Create usage analytics and alerting
Build header-based limit communication (X-RateLimit-*)

Key considerations: Fairness across users, burst tolerance, cost control, API stability.

Industry standards

Token Bucket
Sliding Window
Fixed Window
Redis Cell
Stripe Rate Limiter

Best practices

Use token bucket for burst-friendly limiting
Track both request count and token usage
Implement sliding window for accuracy
Return rate limit headers in all responses
Queue requests during bursts instead of rejecting
Monitor and alert on rate limit hits

Common pitfalls

Fixed window causing thundering herd at window boundaries
Not tracking token usage leading to cost overruns
Missing per-organization limits
No graceful degradation under load
Inconsistent rate limiting across API endpoints

Tools and tech

Redis
Redis Cell
Envoy
Kong
AWS API Gateway validation:
limit-enforcement
burst-handling triggers: keywords:
- rate limit
- throttle
- quota
- token bucket
- sliding window
- burst file_globs:
- *.py
- rate_limit*.py
- middleware/*.py task_types:
- reasoning
- architecture
- review