```sh
git clone https://github.com/Intense-Visions/harness-engineering
```

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/api-retry-guidance" ~/.claude/skills/intense-visions-harness-engineering-api-retry-guidance && rm -rf "$T"
```
`agents/skills/claude-code/api-retry-guidance/SKILL.md`

# API Retry Guidance

Retry guidance signals clients when and how to retry failed requests: classifying errors as transient or permanent, emitting `Retry-After` headers, and requiring idempotency for safe retries prevents both thundering-herd amplification and unnecessary request abandonment under temporary load.
## When to Use
- Designing rate-limiting responses that tell clients exactly when to retry
- Reviewing a `503 Service Unavailable` response that lacks a `Retry-After` header
- Choosing whether to return `429 Too Many Requests` or `503 Service Unavailable` for capacity-related refusals
- Implementing exponential backoff with jitter in a client SDK or HTTP middleware
- Classifying error responses in a client library as retryable vs. non-retryable
- Building a job queue or background worker that must handle transient downstream failures
- Documenting retry behavior expectations in an API style guide or SLA
- Implementing circuit-breaker logic that needs authoritative signals from the server to open and reset
## Instructions

### Key Concepts
- **Transient vs. permanent errors** — A transient error is a temporary condition that may resolve without client-side changes: network timeout, service overload, brief database unavailability. A permanent error will not resolve on retry without action: invalid credentials, missing resource, malformed request. Retrying a permanent error wastes resources and delays diagnosis. Classifying errors correctly in the response allows clients to decide immediately whether to retry. As a rule:
  - `4xx` errors (except `429`) are permanent — they require the client to change the request.
  - `5xx` errors (except `501`) are potentially transient — the server, not the request, is the problem.
- **`Retry-After` header** — The HTTP standard header that tells clients when they may safely retry a request. It accepts two formats:
  - Integer (seconds): `Retry-After: 60` — retry after 60 seconds from now.
  - HTTP-date: `Retry-After: Fri, 11 Apr 2026 09:00:00 GMT` — retry after this absolute timestamp.

  `Retry-After` is mandatory for `429 Too Many Requests` and strongly recommended for `503 Service Unavailable`. Servers that omit it on `429` responses leave clients to implement their own backoff, producing unpredictable retry storms.
- **Exponential backoff** — A retry strategy where the wait time doubles after each failed attempt: 1s, 2s, 4s, 8s, 16s, up to a configured maximum. Exponential backoff reduces the probability that all retrying clients hit the server at the same moment. The base interval and multiplier should be configurable. Backoff should respect the `Retry-After` header as a floor — never retry before the server-specified delay, regardless of the computed backoff value.
- **Jitter** — Randomization added to the backoff interval to desynchronize retry storms. Without jitter, all clients that received the same `429` at the same moment compute the same retry interval and submit simultaneously. With full jitter (`sleep(random_between(0, computed_backoff))`), retries are spread across the backoff window, dramatically reducing peak retry load. AWS's Builders' Library recommends "decorrelated jitter" as the most effective pattern for high-concurrency workloads.
- **`429 Too Many Requests` vs. `503 Service Unavailable`** — These are commonly confused:
  - `429` — The server is healthy, but this client has exceeded its rate limit. The server can still serve other clients. The issue is client-specific. Retry after `Retry-After` elapses; the same client will succeed if the rate is respected.
  - `503` — The server is temporarily unavailable to all clients. The issue is server-wide: maintenance, overload, deployment in progress. Retry after `Retry-After` elapses; success is not guaranteed even after waiting (the outage may continue). Use `503` for circuit-breaker open states; use `429` for rate limiting.
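Both `Retry-After` formats can be normalized to a single seconds-from-now value on the client; a minimal sketch using only the standard library (the helper name `parse_retry_after` is illustrative, not part of any standard API):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value: str) -> float:
    """Normalize a Retry-After header to seconds from now.

    Accepts either the integer-seconds form ("60") or the
    HTTP-date form ("Fri, 11 Apr 2026 09:00:00 GMT").
    """
    try:
        return max(0.0, float(value))            # integer-seconds form
    except ValueError:
        when = parsedate_to_datetime(value)       # HTTP-date form
        delta = when - datetime.now(timezone.utc)
        return max(0.0, delta.total_seconds())    # a past date means "retry now"

print(parse_retry_after("60"))  # 60.0
```

Clamping to zero means a stale HTTP-date (already in the past) degrades gracefully to an immediate retry rather than a negative sleep.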
### Worked Example
AWS API Gateway rate-limiting and retry patterns in production:
**Rate limit exceeded (429 with `Retry-After`):**
```http
GET /v1/metrics?start=2026-01-01&end=2026-04-01
Authorization: Bearer tok_...
```

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744271400

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 100 requests per minute. Retry after 30 seconds.",
  "instance": "/errors/correlation/f1a2-b3c4",
  "limit": 100,
  "remaining": 0,
  "reset": 1744271400
}
```
The client reads `Retry-After: 30` and waits at least 30 seconds. `X-RateLimit-Reset` provides the absolute timestamp for clients that prefer UTC synchronization. The body echoes the policy for logging and debugging.
**Service temporarily unavailable (503 with `Retry-After`):**
```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 120

{
  "type": "https://api.example.com/errors/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is undergoing scheduled maintenance. Expected recovery in 2 minutes.",
  "instance": "/errors/correlation/d5e6-f7a8"
}
```
**Exponential backoff with jitter (pseudo-code):**
```python
import random
import time

class MaxRetriesExceeded(Exception):
    """Raised when all retry attempts are exhausted."""

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_attempts):
        response = fn()
        if response.status_code not in (429, 503):
            return response  # success or permanent error — stop retrying
        # Respect server-specified Retry-After as a floor
        retry_after = float(response.headers.get("Retry-After", 0))
        # Compute exponential backoff with full jitter
        computed = min(base_delay * (2 ** attempt), max_delay)
        jittered = random.uniform(0, computed)
        wait = max(retry_after, jittered)
        time.sleep(wait)
    raise MaxRetriesExceeded(f"Failed after {max_attempts} attempts")
```
The critical line is `wait = max(retry_after, jittered)` — the server's `Retry-After` is always respected as a minimum, and the jittered backoff takes over only when it exceeds that floor, desynchronizing concurrent retrying clients.
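The "decorrelated jitter" pattern mentioned under Key Concepts replaces the fixed exponential schedule with a delay drawn from a range based on the previous delay; a sketch under the same assumptions as the pseudo-code above (function name and defaults are illustrative):

```python
import random

def decorrelated_jitter_delays(base=1.0, cap=60.0, attempts=5):
    """Yield retry delays using AWS-style decorrelated jitter:
    each delay is drawn uniformly from [base, 3 * previous_delay],
    capped at `cap`, so waits grow on average but never synchronize."""
    delay = base
    for _ in range(attempts):
        delay = min(cap, random.uniform(base, delay * 3))
        yield delay

# Example schedule — actual values vary per run because of the randomness.
for d in decorrelated_jitter_delays():
    print(f"{d:.2f}s")
```

Because each client's sequence depends on its own random draws rather than a shared attempt counter, two clients that failed at the same instant diverge immediately.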
### Anti-Patterns
- **Omitting `Retry-After` from `429` responses.** Without `Retry-After`, clients have no authoritative signal for when to retry. Common outcomes: aggressive retrying every few hundred milliseconds (worsening the rate-limit breach), or backing off so conservatively that legitimate requests are delayed for minutes. Fix: always include `Retry-After` on `429` responses, set to the actual reset window in seconds.
- **Retrying `4xx` errors other than `429`.** A `400`, `401`, `403`, `404`, or `422` response will not succeed on retry without a change to the request. Retrying them wastes quota, delays error propagation, and produces confusing logs. Fix: only retry on `429` (rate limit) and `5xx` (server fault) responses. All other `4xx` codes should surface as immediate errors to the caller.
- **Exponential backoff without jitter.** Synchronized backoff (all clients waiting exactly 2, 4, 8 seconds) produces retry bursts at each interval boundary. If 1000 clients all received `503` at the same moment, they will all retry at T+2s, T+4s, T+8s — each interval triggers a new overload spike. Fix: add randomized jitter to desynchronize the retry distribution across the backoff window.
- **Using `503` for per-client rate limiting.** Returning `503 Service Unavailable` when a specific client exceeds its rate limit misleads other clients (and monitoring systems) into thinking the server is globally down. It also prevents clients from distinguishing "my rate limit" from "server outage" — different actions are required. Fix: use `429` for client-specific rate limits and `503` for server-wide unavailability.
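The retryable/permanent rule from the anti-patterns above can be centralized in a single predicate so client code cannot drift from it; a minimal sketch (the `501` carve-out follows the classification under Key Concepts):

```python
def is_retryable(status: int) -> bool:
    """Transient per the rule above: 429 (rate limit) and 5xx
    (server fault), except 501 Not Implemented, which will not
    resolve on retry without a change on the client side."""
    if status == 429:
        return True
    return 500 <= status < 600 and status != 501

# Only the transient codes survive the filter.
print([s for s in (400, 404, 429, 500, 501, 503) if is_retryable(s)])  # [429, 500, 503]
```

Keeping this decision in one function also gives monitoring a single place to log "retry suppressed" events for permanent errors.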
## Details

### Idempotency Requirement for Safe Retries
Retrying a request that is not idempotent may produce duplicate side effects: a charge processed twice, a message sent twice, a record created twice. Before implementing retry logic, classify the endpoint:
- Safe and idempotent: `GET`, `HEAD`, `OPTIONS` — retry freely.
- Idempotent by HTTP semantics: `PUT`, `DELETE` — retry is safe if the server implements idempotency correctly.
- Not inherently idempotent: `POST` — requires explicit idempotency keys (see `api-idempotency-keys`) to retry safely.
Clients must never retry a `POST` without an idempotency key unless the server documents that the endpoint is safe to re-invoke (e.g., a pure query wrapped in `POST`).
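The safe-`POST` rule above hinges on generating the idempotency key once, before the first attempt, and reusing it on every retry; a sketch where `send_post` is a hypothetical transport callable and the `Idempotency-Key` header name follows the convention popularized by Stripe:

```python
import uuid

def post_with_retries(send_post, url, body, max_attempts=3):
    """Retry a POST safely: one idempotency key covers all attempts,
    so the server can deduplicate if a response was lost in transit."""
    key = str(uuid.uuid4())  # generated once, reused on every retry
    headers = {"Idempotency-Key": key}
    last = None
    for _ in range(max_attempts):
        last = send_post(url, body, headers)
        if last["status"] not in (429, 503):  # transient codes only
            return last
    return last
```

A server that supports the key replays the stored response for a repeated key instead of re-executing the side effect, which is what makes the retry safe.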
### Circuit Breaker Integration
Server-side `Retry-After` signals can drive client-side circuit breakers. When consecutive `503` responses arrive with `Retry-After` headers, the circuit breaker should open for at least the server-specified duration. When the `Retry-After` window expires, a single probe request determines whether to close the circuit. This prevents the thundering-herd problem where all clients simultaneously probe the recovering service.
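That integration can be sketched as a small breaker whose open window comes directly from the server's `Retry-After` (the class name and failure threshold are illustrative assumptions, not a named library API):

```python
import time

class RetryAfterBreaker:
    """Opens after `threshold` consecutive 503s, for at least the
    server-specified Retry-After; a probe is allowed once the
    window expires, and any success closes the circuit."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.open_until = 0.0

    def allow_request(self) -> bool:
        # Closed, or the Retry-After window has expired (probe allowed).
        return time.monotonic() >= self.open_until

    def record(self, status: int, retry_after: float = 0.0) -> None:
        if status == 503:
            self.failures += 1
            if self.failures >= self.threshold:
                # Open for at least the server-specified duration.
                self.open_until = time.monotonic() + max(retry_after, 1.0)
        else:
            self.failures = 0      # any success closes the circuit
            self.open_until = 0.0
```

Because `allow_request` returns true again only after `open_until`, all waiting callers except the first probe stay suppressed while the service recovers.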
### Real-World Case Study: Stripe Retry Design
Stripe's client libraries implement a retry strategy that the Stripe engineering team has published: up to 2 automatic retries on `429` and `5xx` responses, with exponential backoff starting at 0.5 seconds and capped at 2 seconds, and `Retry-After` honored as a floor. Stripe found in production analysis that the combination of (a) emitting accurate `Retry-After` values, (b) client jitter, and (c) idempotency keys on all mutating requests reduced duplicate charge incidents by over 95% compared to client implementations that retried blindly on any error. The key insight: the server's `Retry-After` signal is the coordination mechanism that transforms a retry storm into a smooth recovery curve.
## Source
- Timeouts, Retries, and Backoff with Jitter — AWS Builders' Library
- RFC 9110 — HTTP Semantics, Section 10.2.3 (Retry-After)
- Stripe — Error Handling and Retries
- Google Cloud — Exponential Backoff
- Microsoft Azure — Transient Fault Handling
## Process

- Classify all error responses as transient (`429`, `5xx`) or permanent (`4xx` except `429`) and document the classification in the API reference.
- Add `Retry-After` headers to all `429` responses, set to the actual rate-limit reset window in seconds.
- Add `Retry-After` headers to `503` responses when the expected recovery time is known; omit when the outage duration is unknown.
- Ensure all `POST` endpoints that may be retried support idempotency keys (see `api-idempotency-keys`).
- Run `harness validate` to confirm skill files are well-formed and cross-references are correct.
## Harness Integration

- Type: knowledge — this skill is a reference document, not a procedural workflow.
- No tools or state — consumed as context by other skills and agents.
- `related_skills`: `api-rate-limit-headers`, `api-idempotency-keys`, `api-status-codes`, `api-error-contracts`
## Success Criteria

- All `429` responses include a `Retry-After` header set to the rate-limit reset window in seconds.
- `429` is used for client-specific rate limits; `503` is used for server-wide unavailability — never interchanged.
- Client retry logic only retries on `429` and `5xx` responses; `4xx` errors (except `429`) are surfaced immediately.
- Retry implementations include jitter to desynchronize concurrent retry storms.
- Non-idempotent `POST` endpoints require idempotency keys before retries are safe.