Harness-engineering api-rate-limit-headers

API Rate Limit Headers

install

source · Clone the upstream repo

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/api-rate-limit-headers" ~/.claude/skills/intense-visions-harness-engineering-api-rate-limit-headers-908e07 && rm -rf "$T"

manifest: agents/skills/claude-code/api-rate-limit-headers/SKILL.md

Rate limit headers are the consumer's instrumentation. Without `X-RateLimit-Remaining` and `X-RateLimit-Reset`, clients cannot implement proactive throttling and are forced into reactive retry loops that amplify load on already-stressed infrastructure precisely when the API needs relief most.

When to Use

Implementing rate limit response headers for a new or existing API
Auditing an API that returns 429 errors without actionable throttling context
Migrating from ad-hoc `X-RateLimit-*` headers to the IETF RateLimit draft standard
Writing client-side SDK code that needs to read rate limit state and implement proactive throttling
Documenting the rate limit header contract in an API style guide
Debugging an integration where the client is exceeding rate limits despite implementing backoff

Instructions

Key Concepts

`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`: the de facto standard. These three headers form the widely adopted informal convention for rate limit signaling, used by GitHub, Twitter, and many other APIs before any formal specification existed. `X-RateLimit-Limit` is the total quota for the current window. `X-RateLimit-Remaining` is the number of requests left in the current window. `X-RateLimit-Reset` is the Unix timestamp at which the window resets and `X-RateLimit-Remaining` returns to `X-RateLimit-Limit`. Emit all three on every response, not only on 429, so clients can monitor their quota consumption proactively.
IETF RateLimit Headers Draft (draft-ietf-httpapi-ratelimit-headers): The IETF HTTPAPI working group is standardizing rate limit headers under the names `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset`, without the `X-` prefix, because RFC 6648 deprecates the `X-` convention for new header names. The draft also introduces `RateLimit-Policy`, which describes the quota policy in a machine-readable format. The draft is not yet an RFC, and its field names and syntax have changed between revisions, so record which revision you implement. Adopt the draft names for new implementations; during the transition period, emit both the `X-RateLimit-*` variants and the draft variants for backward compatibility.
`Retry-After` semantics: `Retry-After` appears on 429 responses and specifies when the client may safely retry. It accepts two formats: a delay in seconds (`Retry-After: 60`) or an HTTP-date (`Retry-After: Wed, 01 Jan 2025 00:00:00 GMT`). The delay format is simpler to implement and parse; the date format pins an absolute time, which suits long windows. A 429 response without `Retry-After` forces clients into guesswork; a 429 with a precise `Retry-After` lets clients sleep exactly the right amount and retry once with confidence.
Multiple quota windows: named resources. Some APIs apply different limits to different endpoint groups. GitHub uses `X-RateLimit-Resource: core`, `X-RateLimit-Resource: search`, and `X-RateLimit-Resource: graphql` to distinguish quota pools. Each resource has its own `Limit`, `Remaining`, and `Reset` values. When an API has multiple quota dimensions (per-user, per-app, per-endpoint-group), use named resource headers so clients know which quota they are consuming and which limit they hit.
`RateLimit-Policy`: machine-readable quota description. The IETF draft's `RateLimit-Policy` header describes the quota policy in a structured format: `RateLimit-Policy: 100;w=60;burst=20` means 100 requests per 60-second window with a burst allowance of 20. The `w` parameter is the window in seconds; `burst` here is an implementation-specific extension parameter that the draft permits but does not define, so document its meaning for your API. This enables SDK authors and monitoring tools to parse quota policies without scraping documentation pages. Include `RateLimit-Policy` on all responses when adopting the IETF draft format.
Header emission timing: on all responses, not just 429. Rate limit headers must be emitted on successful responses (200, 201, 204) in addition to throttled responses (429). Clients need to track their remaining quota on every response to implement proactive throttling, slowing down before they hit the limit rather than reacting after a 429. A client that only sees rate limit headers on 429 responses cannot avoid the 429 in the first place.
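As an illustration of the machine-readable policy described above, here is a minimal sketch that parses a value shaped like `100;w=60;burst=20`. The function name `parse_policy` and the returned dictionary layout are hypothetical, and the sketch assumes a single policy item in exactly that shape; a production client should use a Structured Fields parser matched to the draft revision it targets.

```python
def parse_policy(value: str) -> dict:
    """Parse a policy such as '100;w=60;burst=20' into a dict.

    Assumes one policy item: an integer limit followed by ';key=value'
    parameters. Integer-looking values become int; others stay strings.
    """
    limit_part, *param_parts = [part.strip() for part in value.split(";")]
    policy = {"limit": int(limit_part), "params": {}}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing ';'
        key, _, raw = part.partition("=")
        raw = raw.strip().strip('"')
        policy["params"][key.strip()] = int(raw) if raw.isdigit() else raw
    return policy


print(parse_policy("100;w=60;burst=20"))
# {'limit': 100, 'params': {'w': 60, 'burst': 20}}
```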

Worked Example

GitHub REST API rate limit headers — the most-referenced public implementation.

Successful response (quota tracking):

HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4987
X-RateLimit-Reset: 1704070800
X-RateLimit-Used: 13
X-RateLimit-Resource: core

The client can compute: 13 requests used, 4987 remaining, window resets at Unix timestamp `1704070800`. If the client needs to make 5000 requests, it can calculate that it will exhaust its quota and must spread the work across multiple windows.
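The arithmetic in that last step can be sketched as follows. The function name `windows_needed` is hypothetical, and the sketch assumes a fixed window whose quota refills completely at each reset.

```python
import math


def windows_needed(requests_to_send: int, remaining: int, limit: int) -> int:
    """Return how many quota windows (including the current one) the work spans."""
    if requests_to_send <= remaining:
        return 1
    overflow = requests_to_send - remaining
    return 1 + math.ceil(overflow / limit)


# Values from the response above: 4987 remaining out of 5000.
print(windows_needed(5000, remaining=4987, limit=5000))  # 2
```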

Rate limit exceeded — 429 with Retry-After:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704070800
X-RateLimit-Used: 5000
X-RateLimit-Resource: core
Retry-After: 3587
Content-Type: application/json

{
  "message": "API rate limit exceeded.",
  "documentation_url": "https://docs.github.com/rest/overview/rate-limits"
}

`Retry-After: 3587` tells the client to sleep for 3587 seconds (until the reset timestamp). No guessing and no exponential backoff to a random interval, just a precise sleep.
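A client has to accept both `Retry-After` forms (delay in seconds and HTTP-date). The sketch below shows one way to do that with the Python standard library; the function name `retry_after_seconds` is hypothetical, and the caller is assumed to pass a timezone-aware `now`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str, now: datetime) -> float:
    """Return how many seconds to wait for a Retry-After header value.

    Accepts a delay in seconds ('60') or an HTTP-date
    ('Wed, 01 Jan 2025 00:00:00 GMT'). Never returns a negative delay.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


now = datetime(2024, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
print(retry_after_seconds("60", now))                             # 60.0
print(retry_after_seconds("Wed, 01 Jan 2025 00:00:00 GMT", now))  # 60.0
```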

IETF draft format (forward-compatible):

HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 87
RateLimit-Reset: 60
RateLimit-Policy: 100;w=60;burst=20

`RateLimit-Reset: 60` is a relative delay in seconds in the draft format (unlike `X-RateLimit-Reset`, which is an absolute Unix timestamp). Clients must parse the two formats differently; document which format your API uses.
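One way a client might normalize the two reset conventions into a single "seconds from now" value is sketched below. The function name `seconds_until_reset` is hypothetical, and the sketch assumes the API follows the conventions described here (relative for `RateLimit-Reset`, absolute for `X-RateLimit-Reset`).

```python
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now_unix: int) -> Optional[int]:
    """Return seconds until the window resets, or None if no reset header is present.

    RateLimit-Reset (draft) is already a relative delay.
    X-RateLimit-Reset (de facto) is an absolute Unix timestamp.
    Header names are matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "ratelimit-reset" in lowered:
        return max(0, int(lowered["ratelimit-reset"]))
    if "x-ratelimit-reset" in lowered:
        return max(0, int(lowered["x-ratelimit-reset"]) - now_unix)
    return None


now = 1704067213  # 3587 seconds before the reset in the GitHub example
print(seconds_until_reset({"X-RateLimit-Reset": "1704070800"}, now))  # 3587
print(seconds_until_reset({"RateLimit-Reset": "60"}, now))            # 60
```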

Twilio rate limit headers — per-endpoint differentiation:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704067260
X-Home-Region: us1

Twilio scopes limits per account and region, so regional endpoints can carry higher limits without a single global quota.

Anti-Patterns

Emitting `X-RateLimit-*` headers only on 429 responses. Headers emitted only on throttled responses prevent clients from implementing proactive throttling. The client's first signal of approaching the limit is the 429 itself, at which point the request has already been rejected. Emit rate limit headers on every response.
`X-RateLimit-Reset` as a relative delay in seconds. The `X-RateLimit-Reset` header (de facto standard) is conventionally an absolute Unix timestamp. Using a relative delay (seconds until reset) is inconsistent with implementations such as GitHub's and Twitter's and will break clients that parse it as an absolute timestamp. The IETF draft uses a relative delay, but under a different header name (`RateLimit-Reset`). Do not mix the two conventions.
Omitting `Retry-After` from 429 responses. A 429 without `Retry-After` tells the client it has been throttled but not when it can retry, so the client must fall back to exponential backoff with jitter to avoid thundering herd behavior. `Retry-After` makes the retry safe and deterministic. RFC 6585 says only that a 429 response MAY include `Retry-After`, so make it a requirement of your own API contract whenever the server knows the reset time.
Different `X-RateLimit-Reset` semantics across endpoints. Using Unix timestamps for some endpoints and relative delays for others within the same API creates parsing complexity in clients. Standardize on one format, preferably a Unix timestamp for the `X-RateLimit-*` headers, and document it clearly.
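To make the cost of a missing `Retry-After` concrete, here is a sketch of the client-side fallback. The function name `retry_delay` and the `base` and `cap` defaults are hypothetical; the fallback shown is exponential backoff with full jitter, one common variant among several.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Pick how long to wait before retrying a 429.

    If the server sent Retry-After, honor it exactly. Otherwise fall back to
    exponential backoff with full jitter: a random delay between 0 and
    min(cap, base * 2**attempt), where attempt starts at 0.
    """
    if retry_after is not None:
        return retry_after
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


print(retry_delay(attempt=0, retry_after=3587))  # 3587, deterministic
print(retry_delay(attempt=3))                    # random value between 0 and 8
```

With `Retry-After` the client sleeps once for a known duration; without it, the wait is a guess that may be too short (another 429) or too long (wasted quota).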

Details

Client-Side Proactive Throttling

A well-implemented API client reads `X-RateLimit-Remaining` and `X-RateLimit-Reset` on every response and slows down before hitting the limit. A simple proactive strategy: when `X-RateLimit-Remaining` drops below 10% of `X-RateLimit-Limit`, insert a delay between requests equal to `(X-RateLimit-Reset - now) / X-RateLimit-Remaining`. This spreads the remaining quota evenly across the rest of the window, preventing a burst at the end that triggers 429 errors. Octokit (GitHub's official SDK) provides rate limit handling through its throttling plugin, which makes rate limit management largely transparent to SDK consumers.
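The strategy in this section can be sketched in a few lines. The function name `proactive_delay` and the `threshold` parameter are hypothetical; the 10% default and the formula come from the paragraph above, and the behavior at zero remaining quota (wait for the reset) is an added assumption.

```python
def proactive_delay(limit: int, remaining: int, reset_unix: float,
                    now_unix: float, threshold: float = 0.10) -> float:
    """Return the number of seconds to pause before the next request.

    Returns 0 while remaining quota is at or above `threshold` of the limit.
    Below that, spreads the remaining requests evenly over the time left in
    the window. With no quota left, waits until the reset.
    """
    seconds_left = max(0.0, reset_unix - now_unix)
    if remaining <= 0:
        return seconds_left
    if remaining >= limit * threshold:
        return 0.0
    return seconds_left / remaining


# 100 requests left with 600 seconds until reset: one request every 6 seconds.
print(proactive_delay(5000, 100, reset_unix=1704070800, now_unix=1704070200))  # 6.0
```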

Concurrent Request Challenges

Rate limit headers reflect the server's view at the time the response is sent. In a distributed system, multiple concurrent requests may be in flight simultaneously, and several responses may all report `X-RateLimit-Remaining: 50` even though each request consumes quota. The client-side view of remaining quota can therefore be stale by up to the number of in-flight requests. High-concurrency clients should track the in-flight request count and subtract it from the server-reported remaining value to get a more conservative estimate of true remaining capacity.
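A minimal sketch of that bookkeeping follows. The class name `QuotaEstimate` and its methods are hypothetical. The sketch is deliberately simplified: it is not thread-safe and it trusts the most recent response even if responses arrive out of order, so a real client would add a lock and possibly an ordering check.

```python
class QuotaEstimate:
    """Tracks server-reported remaining quota minus requests still in flight."""

    def __init__(self) -> None:
        self.reported_remaining = None  # last X-RateLimit-Remaining value seen
        self.in_flight = 0

    def request_started(self) -> None:
        self.in_flight += 1

    def response_received(self, remaining_header: str) -> None:
        self.in_flight -= 1
        self.reported_remaining = int(remaining_header)

    def estimated_remaining(self):
        """Conservative estimate: assume every in-flight request will consume quota."""
        if self.reported_remaining is None:
            return None
        return max(0, self.reported_remaining - self.in_flight)


quota = QuotaEstimate()
for _ in range(3):
    quota.request_started()
quota.response_received("50")       # one response back, two still in flight
print(quota.estimated_remaining())  # 48
```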

Real-World Case Study: Twitter API Rate Limit Header Adoption

Twitter's v1.1 API introduced `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` in 2012, and the pattern spread rapidly across the API industry. Before standardized headers, developers had to scrape error message text to determine when a rate limit would reset, leading to fragile string-parsing logic in client libraries. Twitter's headers enabled a generation of Twitter client libraries to implement reliable rate limit tracking. When Twitter moved to API v2, they maintained the same header names for backward compatibility, demonstrating that once a rate limit header contract is published, changing it imposes migration costs on every consumer's client library.

Process

Emit `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` (Unix timestamp) on every API response, not only on 429.
Add `Retry-After` (seconds until reset) to every 429 response; compute it as `X-RateLimit-Reset - current_unix_time`.
For APIs with multiple quota dimensions, add `X-RateLimit-Resource` to identify which quota pool the response consumes.
Add the `RateLimit-Policy` header when adopting the IETF draft format to enable machine-readable quota policy discovery.
Run `harness validate` to confirm skill files are well-formed and related skills are correctly cross-referenced.
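The first three steps above can be sketched as a single server-side helper. The function name `build_rate_limit_headers` and its parameters are hypothetical, and the caller is assumed to already know the quota state from whatever rate limiter it uses; this is an illustration of the header contract, not a rate limiter.

```python
from typing import Dict, Optional


def build_rate_limit_headers(limit: int, remaining: int, reset_unix: int,
                             now_unix: int, resource: Optional[str] = None,
                             throttled: bool = False) -> Dict[str, str]:
    """Build the rate limit headers for one response.

    Call this for every response. Pass throttled=True when returning a 429
    so that Retry-After is added as seconds until the window resets.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_unix),  # absolute Unix timestamp
    }
    if resource is not None:
        headers["X-RateLimit-Resource"] = resource
    if throttled:
        headers["Retry-After"] = str(max(0, reset_unix - now_unix))
    return headers


# Reproduces the 429 example: reset at 1704070800, current time 1704067213.
print(build_rate_limit_headers(5000, 0, 1704070800, 1704067213,
                               resource="core", throttled=True))
```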

Harness Integration

Type: knowledge — this skill is a reference document, not a procedural workflow.
No tools or state — consumed as context by other skills and agents.
related_skills: api-rate-limiting, api-retry-guidance, api-http-methods

Success Criteria

`X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` are present on every API response, including successful 200 responses.
Every 429 response includes `Retry-After` with a value equal to the seconds until the rate limit window resets.
`X-RateLimit-Reset` uses a consistent format (Unix timestamp) across all endpoints in the API.
APIs with multiple quota pools use `X-RateLimit-Resource` to distinguish which limit is being reported.
Client SDK documentation explains how to read rate limit headers and implement proactive throttling.
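Several of these criteria can be spot-checked mechanically during an audit. The sketch below is one possible check, not part of the harness tooling; the function name `audit_response` is hypothetical, and the "looks like a Unix timestamp" test is a heuristic that assumes timestamps in seconds are at least 1,000,000,000.

```python
from typing import Dict, List

REQUIRED = ("x-ratelimit-limit", "x-ratelimit-remaining", "x-ratelimit-reset")


def audit_response(status: int, headers: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the response passes."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = [f"missing {name}" for name in REQUIRED if name not in lowered]
    reset = lowered.get("x-ratelimit-reset", "")
    # Heuristic: a relative delay is far smaller than any current Unix timestamp.
    if reset and (not reset.isdigit() or int(reset) < 1_000_000_000):
        problems.append("x-ratelimit-reset does not look like a Unix timestamp")
    if status == 429 and "retry-after" not in lowered:
        problems.append("429 without retry-after")
    return problems


print(audit_response(429, {"X-RateLimit-Limit": "5000",
                           "X-RateLimit-Remaining": "0",
                           "X-RateLimit-Reset": "60"}))
# ['x-ratelimit-reset does not look like a Unix timestamp', '429 without retry-after']
```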