API Rate Limit Headers
Rate limit headers are the consumer's instrumentation: without X-RateLimit-Remaining and X-RateLimit-Reset, clients cannot implement proactive throttling and are forced into reactive retry loops that amplify load on already-stressed infrastructure precisely when the API needs relief most.
When to Use
- Implementing rate limit response headers for a new or existing API
- Auditing an API that returns 429 errors without actionable throttling context
- Migrating from ad-hoc X-RateLimit-* headers to the IETF RateLimit draft standard
- Writing client-side SDK code that needs to read rate limit state and implement proactive throttling
- Documenting the rate limit header contract in an API style guide
- Debugging an integration where the client is exceeding rate limits despite implementing backoff
Instructions
Key Concepts
- X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (the de facto standard): These three headers form the widely adopted informal standard for rate limit signaling, used by GitHub, Twitter, Stripe, and hundreds of other APIs before any formal RFC existed. X-RateLimit-Limit is the total quota for the current window. X-RateLimit-Remaining is the number of requests left in the current window. X-RateLimit-Reset is the Unix timestamp at which the window resets and X-RateLimit-Remaining returns to X-RateLimit-Limit. Emit all three on every response, not only on 429, so clients can monitor their quota consumption proactively.
- IETF RateLimit Headers Draft (draft-ietf-httpapi-ratelimit-headers): The IETF HTTPAPI working group is standardizing rate limit headers under the names RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset (without the X- prefix, as X- prefixed headers are deprecated per RFC 6648). The draft also introduces RateLimit-Policy, which describes the quota policy in a machine-readable format. Field names and syntax have changed between draft revisions, so record which revision you implement. Adopt the draft standard for new implementations; emit both the X-RateLimit-* variants and the draft standard variants during the transition period for backward compatibility.
- Retry-After semantics: Retry-After appears on 429 responses and specifies when the client may safely retry. It accepts two formats: a delay in seconds (Retry-After: 60) or an HTTP-date (Retry-After: Wed, 01 Jan 2025 00:00:00 GMT). The delay format is simpler to implement and parse; the date format is more precise for long windows. A 429 response without Retry-After forces clients into guesswork; a 429 with a precise Retry-After enables clients to sleep exactly the right amount and retry once with confidence.
- Multiple quota windows (named resources): Some APIs apply different limits to different endpoint groups. GitHub uses X-RateLimit-Resource: core, X-RateLimit-Resource: search, and X-RateLimit-Resource: graphql to distinguish quota pools. Each resource has its own Limit, Remaining, and Reset values. When an API has multiple quota dimensions (per-user, per-app, per-endpoint-group), use named resource headers so clients know which quota they are consuming and which limit they hit.
- RateLimit-Policy (machine-readable quota description): The IETF draft's RateLimit-Policy header describes the quota policy in a structured format: RateLimit-Policy: 100;w=60;burst=20 means 100 requests per 60-second window with a burst allowance of 20. This enables SDK authors and monitoring tools to parse quota policies without scraping documentation pages. Include RateLimit-Policy on all responses when adopting the IETF draft format.
- Header emission timing (on all responses, not just 429): Rate limit headers must be emitted on successful responses (200, 201, 204) in addition to throttled responses (429). Clients need to track their remaining quota on every response to implement proactive throttling, slowing down before they hit the limit rather than reacting after a 429. A client that only sees rate limit headers on 429 responses cannot avoid the 429 in the first place.
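To illustrate the machine-readable policy idea, here is a minimal Python sketch that parses a policy string in the shape used in this document (100;w=60;burst=20). The QuotaPolicy type and parse_ratelimit_policy function are hypothetical names, and the syntax is assumed from the example above; other draft revisions use a different structured format, so treat this as a sketch rather than a conforming parser.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuotaPolicy:
    """Hypothetical container for one parsed policy string."""
    limit: int                                            # requests allowed per window
    params: Dict[str, int] = field(default_factory=dict)  # e.g. {"w": 60, "burst": 20}


def parse_ratelimit_policy(value: str) -> QuotaPolicy:
    """Parse a policy string such as '100;w=60;burst=20' (assumed syntax)."""
    head, *rest = [part.strip() for part in value.split(";")]
    params: Dict[str, int] = {}
    for part in rest:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, raw = part.partition("=")
        params[key.strip()] = int(raw)
    return QuotaPolicy(limit=int(head), params=params)


policy = parse_ratelimit_policy("100;w=60;burst=20")
print(policy.limit, policy.params)
```

An SDK could use the parsed window length (w) to size its own request scheduler instead of hard-coding values from documentation.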
Worked Example
GitHub REST API rate limit headers — the most-referenced public implementation.
Successful response (quota tracking):
    HTTP/1.1 200 OK
    Content-Type: application/json
    X-RateLimit-Limit: 5000
    X-RateLimit-Remaining: 4987
    X-RateLimit-Reset: 1704070800
    X-RateLimit-Used: 13
    X-RateLimit-Resource: core
The client can compute: 13 requests used, 4987 remaining, window resets at Unix timestamp 1704070800. If the client needs to make 5000 requests, it can calculate that it will exhaust its quota and needs to spread the work across multiple windows.
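The multi-window calculation described above can be sketched in Python as follows. The windows_needed function is a hypothetical helper, and it assumes the quota refills fully to X-RateLimit-Limit at each reset and that no other client shares the quota.

```python
import math
from typing import Mapping


def windows_needed(headers: Mapping[str, str], total_requests: int) -> int:
    """Count the rate limit windows (including the current one) a batch will span."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    if total_requests <= remaining:
        return 1  # the batch fits in what is left of the current window
    overflow = total_requests - remaining
    # Assumption: every later window offers the full limit.
    return 1 + math.ceil(overflow / limit)


headers = {"X-RateLimit-Limit": "5000", "X-RateLimit-Remaining": "4987"}
print(windows_needed(headers, 5000))
```

With the headers from the response above, 5000 planned requests exceed the 4987 remaining by 13, so the batch spills into a second window.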
Rate limit exceeded — 429 with Retry-After:
    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Limit: 5000
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1704070800
    X-RateLimit-Used: 5000
    X-RateLimit-Resource: core
    Retry-After: 3587
    Content-Type: application/json

    {
      "message": "API rate limit exceeded.",
      "documentation_url": "https://docs.github.com/rest/overview/rate-limits"
    }
Retry-After: 3587 tells the client to sleep for 3587 seconds (until the reset timestamp). No guessing, no exponential backoff to a random interval — a precise sleep.
IETF draft format (forward-compatible):
    HTTP/1.1 200 OK
    RateLimit-Limit: 100
    RateLimit-Remaining: 87
    RateLimit-Reset: 60
    RateLimit-Policy: 100;w=60;burst=20
RateLimit-Reset: 60 is a relative delay in seconds in the draft format (unlike X-RateLimit-Reset, which is an absolute Unix timestamp). Clients must parse both formats differently; document which format your API uses.
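One way a client might handle both conventions is to normalize them to a single absolute reset time. The sketch below is illustrative Python; reset_epoch is a hypothetical helper, and preferring the draft header when both are present is an assumption, not something either convention mandates.

```python
import time
from typing import Mapping, Optional


def reset_epoch(headers: Mapping[str, str], now: Optional[float] = None) -> Optional[float]:
    """Return the window reset time as Unix seconds, or None if no header is present."""
    now = time.time() if now is None else now
    lowered = {name.lower(): value for name, value in headers.items()}
    if "ratelimit-reset" in lowered:
        # Draft convention: relative delay in seconds from now.
        return now + float(lowered["ratelimit-reset"])
    if "x-ratelimit-reset" in lowered:
        # De facto convention: already an absolute Unix timestamp.
        return float(lowered["x-ratelimit-reset"])
    return None


print(reset_epoch({"RateLimit-Reset": "60"}, now=1000.0))
print(reset_epoch({"X-RateLimit-Reset": "1704070800"}, now=1000.0))
```

Downstream throttling code can then work with one representation regardless of which header family the API emits.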
Twilio rate limit headers — per-endpoint differentiation:
    HTTP/1.1 200 OK
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 95
    X-RateLimit-Reset: 1704067260
    X-Home-Region: us1
Twilio scopes limits per account + region, allowing higher limits for regional endpoints without a global quota.
Anti-Patterns
- Emitting X-RateLimit-* headers only on 429 responses. Headers emitted only on throttled responses prevent clients from implementing proactive throttling. The client's first signal of approaching the limit is the 429 itself, at which point the request has already been rejected. Emit rate limit headers on every response.
- X-RateLimit-Reset as a relative delay in seconds. The X-RateLimit-Reset header (de facto standard) is conventionally an absolute Unix timestamp. Using a relative delay (seconds until reset) is inconsistent with GitHub, Twitter, and Stripe implementations and will break clients that parse it as an absolute timestamp. The IETF draft standard uses a relative delay, but under a different header name (RateLimit-Reset). Do not mix the two conventions.
- Omitting Retry-After from 429 responses. A 429 without Retry-After tells the client it has been throttled but not when it can retry. The client must then fall back to exponential backoff with jitter to avoid thundering herd behavior. Retry-After makes the retry safe and deterministic. RFC 6585 only says a 429 response MAY include Retry-After, but treat it as required whenever the server knows the reset time.
- Different X-RateLimit-Reset semantics across endpoints. Using Unix timestamps for some endpoints and relative delays for others within the same API creates parsing complexity in clients. Standardize on one format, preferably a Unix timestamp for the X-RateLimit-* headers, and document it clearly.
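The client-side consequence of a missing Retry-After can be sketched as follows. This illustrative Python helper (retry_delay is a hypothetical name) honors Retry-After in either of its two formats when present and otherwise falls back to exponential backoff with full jitter; the base and cap values are arbitrary assumptions.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a 429 response."""
    now = datetime.now(timezone.utc) if now is None else now
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delay-seconds format
        try:
            when = parsedate_to_datetime(value)  # HTTP-date format
        except (TypeError, ValueError):
            when = None  # unparseable value: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable Retry-After: exponential backoff with full jitter (a guess).
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


print(retry_delay("60", attempt=0))
```

The first two branches are deterministic; only the fallback involves guessing, which is the cost the server imposes on clients by omitting the header.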
Details
Client-Side Proactive Throttling
A well-implemented API client reads X-RateLimit-Remaining and X-RateLimit-Reset on every response and slows down before hitting the limit. A simple proactive strategy: when X-RateLimit-Remaining drops below 10% of X-RateLimit-Limit, insert a delay between requests equal to (X-RateLimit-Reset - now) / X-RateLimit-Remaining. This spreads the remaining quota evenly across the remaining window, preventing a burst at the end that triggers 429 errors. Octokit (GitHub's official SDK) provides rate limit handling through its throttling plugin, making rate limit management largely transparent to SDK consumers.
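The pacing formula above can be sketched in Python. The throttle_delay function is a hypothetical helper; the 10% threshold comes from the text, while sleeping until the reset when nothing remains is an added assumption.

```python
def throttle_delay(limit: int, remaining: int, reset_epoch: float, now: float,
                   threshold: float = 0.10) -> float:
    """Seconds to pause before the next request, per the strategy described above."""
    seconds_left = max(0.0, reset_epoch - now)
    if remaining <= 0:
        return seconds_left  # assumption: quota exhausted, wait for the reset
    if remaining >= threshold * limit:
        return 0.0  # plenty of quota left, no pacing needed
    # Spread the remaining requests evenly over the rest of the window.
    return seconds_left / remaining


# 5 of 100 requests left with 60 seconds until reset: one request every 12 seconds.
print(throttle_delay(limit=100, remaining=5, reset_epoch=1060.0, now=1000.0))
```

A client would call this after every response and sleep for the returned duration before sending the next request.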
Concurrent Request Challenges
Rate limit headers reflect the server's view at the time the response is sent. In a distributed system, multiple concurrent requests may be in flight simultaneously, and several responses may report X-RateLimit-Remaining: 50 even though each request consumes quota. The client-side view of remaining quota is always stale by the number of in-flight requests. High-concurrency clients should track the in-flight request count and subtract it from the server-reported remaining value to get a more accurate estimate of true remaining capacity.
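A minimal Python sketch of the in-flight adjustment follows. QuotaEstimator is a hypothetical class; it keeps the most recently received remaining value, which is an assumption that ignores out-of-order responses.

```python
import threading
from typing import Optional


class QuotaEstimator:
    """Estimate remaining quota as server-reported remaining minus in-flight requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reported: Optional[int] = None  # last X-RateLimit-Remaining seen
        self._in_flight = 0

    def request_started(self) -> None:
        with self._lock:
            self._in_flight += 1

    def response_received(self, remaining: int) -> None:
        with self._lock:
            self._in_flight -= 1
            self._reported = remaining  # assumption: latest response wins

    def estimated_remaining(self) -> Optional[int]:
        with self._lock:
            if self._reported is None:
                return None  # no response seen yet
            return max(0, self._reported - self._in_flight)


estimator = QuotaEstimator()
for _ in range(3):
    estimator.request_started()
estimator.response_received(remaining=50)
print(estimator.estimated_remaining())
```

Here one of three requests has returned reporting 50 remaining, and two are still in flight, so the estimate is 48.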
Real-World Case Study: Twitter API Rate Limit Header Adoption
Twitter's v1.1 API introduced x-rate-limit-limit, x-rate-limit-remaining, and x-rate-limit-reset (Twitter's spelling of the limit, remaining, and reset trio) in 2012, and the pattern spread rapidly across the API industry. Before standardized headers, developers had to scrape error message text to determine when a rate limit would reset, leading to fragile string-parsing logic in client libraries. Twitter's headers enabled a generation of Twitter client libraries to implement reliable rate limit tracking. When Twitter moved to API v2, they maintained the same header names for backward compatibility, demonstrating that once a rate limit header contract is published, changing it imposes migration costs on every consumer's client library.
Source
- IETF draft-ietf-httpapi-ratelimit-headers
- RFC 6585 — Additional HTTP Status Codes (429 Too Many Requests)
- GitHub Rate Limit Headers Documentation
- RFC 6648 — Deprecating the "X-" Prefix in Application Protocols
- Stripe Rate Limiting Headers
Process
- Emit X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (Unix timestamp) on every API response, not only on 429.
- Add Retry-After (seconds until reset) to every 429 response; compute it as X-RateLimit-Reset - current_unix_time.
- For APIs with multiple quota dimensions, add X-RateLimit-Resource to identify which quota pool the response consumes.
- Add RateLimit-Policy header when adopting the IETF draft format to enable machine-readable quota policy discovery.
- Run harness validate to confirm skill files are well-formed and related skills are correctly cross-referenced.
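The header-emission steps above can be sketched server-side in Python. The rate_limit_headers function and its parameters are hypothetical; how the limit, remaining count, and reset time are tracked is left to whatever rate limiter the API already uses.

```python
import math
from typing import Dict, Optional


def rate_limit_headers(limit: int, remaining: int, reset_epoch: float, now: float,
                       throttled: bool = False,
                       resource: Optional[str] = None) -> Dict[str, str]:
    """Build rate limit headers for any response; adds Retry-After when throttled."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # absolute Unix timestamp
    }
    if resource is not None:
        headers["X-RateLimit-Resource"] = resource  # names the quota pool
    if throttled:
        # Retry-After = X-RateLimit-Reset - current_unix_time, never negative.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers


print(rate_limit_headers(5000, 0, 1704070800, now=1704067213,
                         throttled=True, resource="core"))
```

Calling the same function for successful and throttled responses keeps the three core headers present on every response.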
Harness Integration
- Type: knowledge — this skill is a reference document, not a procedural workflow.
- No tools or state — consumed as context by other skills and agents.
- related_skills: api-rate-limiting, api-retry-guidance, api-http-methods
Success Criteria
- X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset are present on every API response, including successful 200 responses.
- Every 429 response includes Retry-After with a value equal to the seconds until the rate limit window resets.
- X-RateLimit-Reset uses a consistent format (Unix timestamp) across all endpoints in the API.
- APIs with multiple quota pools use X-RateLimit-Resource to distinguish which limit is being reported.
- Client SDK documentation explains how to read rate limit headers and implement proactive throttling.
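Some of these criteria can be checked mechanically when auditing an API. The Python sketch below is a hypothetical audit_response helper; the test for "looks like a Unix timestamp" (an integer above one billion) is a heuristic assumption, not part of any specification.

```python
from typing import List, Mapping

REQUIRED = ("X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset")


def audit_response(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in one response's rate limit headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = [f"missing {name}" for name in REQUIRED if name.lower() not in lowered]
    reset = lowered.get("x-ratelimit-reset")
    # Heuristic: a small integer is probably a relative delay, not a timestamp.
    if reset is not None and not (reset.isdigit() and int(reset) > 1_000_000_000):
        problems.append("X-RateLimit-Reset is not a Unix timestamp")
    if status == 429 and "retry-after" not in lowered:
        problems.append("429 without Retry-After")
    return problems


print(audit_response(429, {"X-RateLimit-Limit": "100",
                           "X-RateLimit-Remaining": "0",
                           "X-RateLimit-Reset": "60"}))
```

Running it over a sample of recorded responses from each endpoint group gives a quick view of where the header contract is not yet met.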