
API Retry Guidance

Install

Source · Clone the upstream repo:
git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/api-retry-guidance" ~/.claude/skills/intense-visions-harness-engineering-api-retry-guidance && rm -rf "$T"

Manifest: agents/skills/claude-code/api-retry-guidance/SKILL.md

Source content

API Retry Guidance

Retry guidance signals clients when and how to retry failed requests — classifying errors as transient or permanent, emitting Retry-After headers, and requiring idempotency for safe retries prevents both thundering-herd amplification and unnecessary request abandonment under temporary load.

When to Use

  • Designing rate-limiting responses that tell clients exactly when to retry
  • Reviewing a 503 Service Unavailable response that lacks a Retry-After header
  • Choosing whether to return 429 Too Many Requests or 503 Service Unavailable for capacity-related refusals
  • Implementing exponential backoff with jitter in a client SDK or HTTP middleware
  • Classifying error responses in a client library as retryable vs. non-retryable
  • Building a job queue or background worker that must handle transient downstream failures
  • Documenting retry behavior expectations in an API style guide or SLA
  • Implementing circuit-breaker logic that needs authoritative signals from the server to open and reset

Instructions

Key Concepts

  1. Transient vs. permanent errors — A transient error is a temporary condition that may resolve without client-side changes: network timeout, service overload, brief database unavailability. A permanent error will not resolve on retry without action: invalid credentials, missing resource, malformed request. Retrying a permanent error wastes resources and delays diagnosis. Classifying errors correctly in the response allows clients to decide immediately whether to retry. As a rule:

    • 4xx errors (except 429) are permanent — they require the client to change the request.
    • 5xx errors (except 501) are potentially transient — the server, not the request, is the problem.

  2. Retry-After header — The HTTP standard header that tells clients when they may safely retry a request. It accepts two formats:

    • Integer (seconds): Retry-After: 60 — retry after 60 seconds from now.
    • HTTP-date: Retry-After: Fri, 11 Apr 2026 09:00:00 GMT — retry after this absolute timestamp.

    Retry-After is mandatory for 429 Too Many Requests and strongly recommended for 503 Service Unavailable. Servers that omit it on 429 responses leave clients to implement their own backoff, producing unpredictable retry storms. Parsing both formats is shown in the sketch after this list.

  3. Exponential backoff — A retry strategy where the wait time doubles after each failed attempt: 1s, 2s, 4s, 8s, 16s, up to a configured maximum. Exponential backoff reduces the probability that all retrying clients hit the server at the same moment. The base interval and multiplier should be configurable. Backoff should respect the Retry-After header as a floor — never retry before the server-specified delay, regardless of the computed backoff value.

  4. Jitter — Randomization added to the backoff interval to desynchronize retry storms. Without jitter, all clients that received the same 429 at the same moment compute the same retry interval and submit simultaneously. With full jitter (sleep(random_between(0, computed_backoff))), retries are spread across the backoff window, dramatically reducing peak retry load. AWS's Builders' Library recommends "decorrelated jitter" as the most effective pattern for high-concurrency workloads.

  5. 429 Too Many Requests vs 503 Service Unavailable — These are commonly confused:

    • 429 — The server is healthy, but this client has exceeded its rate limit. The server can still serve other clients. The issue is client-specific. Retry after Retry-After elapses; the same client will succeed if the rate is respected.
    • 503 — The server is temporarily unavailable to all clients. The issue is server-wide: maintenance, overload, deployment in progress. Retry after Retry-After elapses; success is not guaranteed even after waiting (the outage may continue). Use 503 for circuit-breaker open states; use 429 for rate limiting.

Worked Example

Rate-limiting and retry patterns as seen behind production API gateways such as AWS API Gateway:

Rate limit exceeded (429 with Retry-After):

GET /v1/metrics?start=2026-01-01&end=2026-04-01
Authorization: Bearer tok_...
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744271400

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 100 requests per minute. Retry after 30 seconds.",
  "instance": "/errors/correlation/f1a2-b3c4",
  "limit": 100,
  "remaining": 0,
  "reset": 1744271400
}

The client reads Retry-After: 30 and waits at least 30 seconds. X-RateLimit-Reset provides the absolute timestamp for clients that prefer UTC synchronization. The body echoes the policy for logging and debugging.

Service temporarily unavailable (503 with Retry-After):

HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 120

{
  "type": "https://api.example.com/errors/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is undergoing scheduled maintenance. Expected recovery in 2 minutes.",
  "instance": "/errors/correlation/d5e6-f7a8"
}

Exponential backoff with jitter (pseudo-code):

import random
import time


class MaxRetriesExceeded(Exception):
    """Raised when every retry attempt has been exhausted."""


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_attempts):
        response = fn()
        # Retry only on 429 and 5xx; anything else is a success or a
        # permanent error and is returned to the caller immediately.
        if response.status_code != 429 and response.status_code < 500:
            return response

        # Respect the server-specified Retry-After (delta-seconds form) as a floor
        retry_after = float(response.headers.get("Retry-After", 0))

        # Compute exponential backoff with full jitter
        computed = min(base_delay * (2 ** attempt), max_delay)
        jittered = random.uniform(0, computed)

        wait = max(retry_after, jittered)
        time.sleep(wait)

    raise MaxRetriesExceeded(f"Failed after {max_attempts} attempts")

The critical line is wait = max(retry_after, jittered) — the server's Retry-After is always respected as a minimum, and the jittered exponential backoff takes over only when it exceeds that floor, desynchronizing concurrent retrying clients.
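
Key Concept 4 mentions decorrelated jitter as the AWS-recommended variant for high-concurrency workloads. A minimal sketch of how those delays are drawn, following the commonly cited formula sleep = min(cap, random_between(base, previous_sleep * 3)); the function name is illustrative:

import random

def decorrelated_jitter_delays(base=1.0, cap=60.0, attempts=5):
    # Each delay is drawn from [base, previous_delay * 3] and capped, so
    # concurrent clients drift apart instead of retrying in lockstep.
    delay, delays = base, []
    for _ in range(attempts):
        delay = min(cap, random.uniform(base, delay * 3))
        delays.append(delay)
    return delays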

Anti-Patterns

  1. Omitting Retry-After from 429 responses. Without Retry-After, clients have no authoritative signal for when to retry. Common outcomes: aggressive retrying every few hundred milliseconds (worsening the rate-limit breach), or backing off so conservatively that legitimate requests are delayed for minutes. Fix: always include Retry-After on 429 responses, set to the actual reset window in seconds.

  2. Retrying 4xx errors other than 429. A 400, 401, 403, 404, or 422 response will not succeed on retry without a change to the request. Retrying them wastes quota, delays error propagation, and produces confusing logs. Fix: only retry on 429 (rate limit) and 5xx (server fault) responses. All other 4xx codes should surface as immediate errors to the caller.

  3. Exponential backoff without jitter. Synchronized backoff (all clients waiting exactly 2, 4, 8 seconds) produces retry bursts at each interval boundary. If 1000 clients all received 503 at the same moment, they will all retry at T+2s, T+4s, T+8s — each interval triggers a new overload spike. Fix: add randomized jitter to desynchronize the retry distribution across the backoff window.

  4. Using 503 for per-client rate limiting. Returning 503 Service Unavailable when a specific client exceeds its rate limit misleads other clients (and monitoring systems) into thinking the server is globally down. It also prevents clients from distinguishing "my rate limit" from "server outage" — different actions are required. Fix: use 429 for client-specific rate limits and 503 for server-wide unavailability.

Details

Idempotency Requirement for Safe Retries

Retrying a request that is not idempotent may produce duplicate side effects: a charge processed twice, a message sent twice, a record created twice. Before implementing retry logic, classify the endpoint:

  • Safe and idempotent: GET, HEAD, OPTIONS — retry freely.
  • Idempotent by HTTP semantics: PUT, DELETE — retry is safe if the server implements idempotency correctly.
  • Not inherently idempotent: POST — requires explicit idempotency keys (see api-idempotency-keys) to retry safely.

Clients must never retry a POST without an idempotency key unless the server documents that the endpoint is safe to re-invoke (e.g., a pure query wrapped in POST).
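
A minimal sketch of a safe POST retry under these rules, assuming a requests-style session and an Idempotency-Key header (the header name and helper are illustrative; see api-idempotency-keys for the full pattern):

import time
import uuid

def post_with_retry(session, url, payload, max_attempts=3):
    # One key per logical operation, reused on every attempt, so the server
    # can deduplicate if an earlier attempt actually succeeded.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        response = session.post(url, json=payload, headers=headers)
        if response.status_code != 429 and response.status_code < 500:
            return response  # success or permanent error
        time.sleep(float(response.headers.get("Retry-After", 2 ** attempt)))
    return response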

Circuit Breaker Integration

Server-side Retry-After signals can drive client-side circuit breakers. When consecutive 503 responses arrive with Retry-After headers, the circuit breaker should open for at least the server-specified duration. When the Retry-After window expires, a single probe request determines whether to close the circuit. This prevents the thundering-herd problem where all clients simultaneously probe the recovering service.
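
A minimal sketch of a client-side breaker driven by these signals; the class name and failure threshold are illustrative, not a specific library's API:

import time

class RetryAfterCircuitBreaker:
    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.open_until = None       # monotonic deadline while open
        self.probe_in_flight = False

    def allow_request(self):
        if self.open_until is None:
            return True  # circuit closed
        if time.monotonic() < self.open_until:
            return False  # still inside the server-specified window
        if self.probe_in_flight:
            return False  # half-open: only one probe at a time
        self.probe_in_flight = True
        return True

    def record_response(self, status_code, retry_after_seconds=0.0):
        self.probe_in_flight = False
        if status_code == 503:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Open for at least the server's Retry-After duration.
                self.open_until = time.monotonic() + max(retry_after_seconds, 1.0)
        else:
            self.consecutive_failures = 0
            self.open_until = None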

Real-World Case Study: Stripe Retry Design

Stripe's client libraries implement a retry strategy that the Stripe engineering team has published: up to 2 automatic retries on 429 and 5xx responses, with exponential backoff starting at 0.5 seconds and capped at 2 seconds, and with Retry-After honored as a floor. Stripe found in production analysis that the combination of (a) emitting accurate Retry-After values, (b) client jitter, and (c) idempotency keys on all mutating requests reduced duplicate charge incidents by over 95% compared to client implementations that retried blindly on any error. The key insight: the server's Retry-After signal is the coordination mechanism that transforms a retry storm into a smooth recovery curve.

Source

Process

  1. Classify all error responses as transient (429, 5xx) or permanent (4xx except 429) and document the classification in the API reference.
  2. Add Retry-After headers to all 429 responses, set to the actual rate-limit reset window in seconds (a sketch of deriving these headers follows this list).
  3. Add Retry-After headers to 503 responses when the expected recovery time is known; omit them when the outage duration is unknown.
  4. Ensure all POST endpoints that may be retried support idempotency keys (see api-idempotency-keys).
  5. Run harness validate to confirm skill files are well-formed and cross-references are correct.

Harness Integration

  • Type: knowledge — this skill is a reference document, not a procedural workflow.
  • No tools or state — consumed as context by other skills and agents.
  • related_skills: api-rate-limit-headers, api-idempotency-keys, api-status-codes, api-error-contracts

Success Criteria

  • All 429 responses include a Retry-After header set to the rate-limit reset window in seconds.
  • 429 is used for client-specific rate limits; 503 is used for server-wide unavailability — never interchanged.
  • Client retry logic only retries on 429 and 5xx responses; 4xx errors (except 429) are surfaced immediately.
  • Retry implementations include jitter to desynchronize concurrent retry storms.
  • Non-idempotent POST endpoints require idempotency keys before retries are safe.