git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/api-load-tester" ~/.claude/skills/comeonoliver-skillshub-api-load-tester && rm -rf "$T"
skills/TerminalSkills/skills/api-load-tester/SKILL.md
API Load Tester
Overview
This skill generates realistic load test scripts from API definitions and executes them with proper ramp-up patterns, authentication flows, and assertions. It produces clear reports identifying breaking points, bottlenecks, and latency percentiles at each traffic level.
Instructions
Step 1: Choose Tool and Gather API Info
Prefer k6 for complex scenarios (multi-step flows, thresholds, custom metrics). Use wrk for quick single-endpoint benchmarks. Use autocannon if only Node.js is available.
Gather endpoint information from:
- OpenAPI/Swagger spec files
- Route definitions (Express, FastAPI, etc.)
- User-described endpoints
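As one way to gather endpoints from a spec, the sketch below walks the `paths` object of an already-parsed OpenAPI document and lists method/path pairs. The helper name `listEndpoints` and the sample spec are illustrative assumptions, not part of the skill itself.

```javascript
// Hypothetical helper: flatten an OpenAPI `paths` object into a list of
// { method, path } pairs that a load test can iterate over.
const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];

function listEndpoints(spec) {
  const endpoints = [];
  for (const [path, item] of Object.entries(spec.paths || {})) {
    for (const method of HTTP_METHODS) {
      // Path items can also hold non-operation keys such as `parameters`,
      // so only known HTTP methods are treated as operations.
      if (item && item[method]) {
        endpoints.push({ method: method.toUpperCase(), path });
      }
    }
  }
  return endpoints;
}

// Made-up sample spec, for illustration only.
const sampleSpec = {
  paths: {
    '/api/search': { get: {} },
    '/api/projects': { get: {}, post: {}, parameters: [] },
  },
};

console.log(listEndpoints(sampleSpec));
```

Route definitions or user-described endpoints can be normalized into the same `{ method, path }` shape so later steps do not care where the list came from.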
Step 2: Generate Realistic Payloads
Read request/response types from the codebase (TypeScript interfaces, Python dataclasses, Go structs) and generate payloads with:
- Realistic field values (not "test123" or "foo")
- Proper data distributions (varied product IDs, realistic quantities)
- Edge cases mixed in (long strings, special characters at ~5% rate)
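A minimal sketch of such a generator follows. The payload shape (`productId`, `quantity`, `note`) is a hypothetical order-like body chosen for illustration; in practice the fields would come from the types read out of the codebase. The random source is injectable so the edge-case rate can be checked deterministically.

```javascript
// Hypothetical payload generator for an order-like request body.
// Field names (productId, quantity, note) are assumptions for illustration.
const EDGE_CASE_RATE = 0.05; // roughly 5% of payloads carry an edge case

const EDGE_NOTES = [
  'x'.repeat(5000),                 // very long string
  'Ünïcødé ✓ "quotes" <tags> \\ /', // special characters
];

const NORMAL_NOTES = ['Leave at the front desk', 'Gift wrap, please', 'Call on arrival'];

function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

function makeOrderPayload(rng = Math.random) {
  const payload = {
    productId: 1000 + Math.floor(rng() * 9000), // varied IDs in 1000..9999
    quantity: 1 + Math.floor(rng() * 5),        // small quantities, 1..5
    note: pick(NORMAL_NOTES, rng),
  };
  // Swap in an edge-case value for about 5% of payloads.
  if (rng() < EDGE_CASE_RATE) {
    payload.note = pick(EDGE_NOTES, rng);
  }
  return payload;
}

console.log(makeOrderPayload());
```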
Step 3: Design Test Scenarios
Create scenarios appropriate for the goal:
Ramp-up test (finding breaking point):
stages: [
  { duration: '2m', target: 50 },   // warm-up
  { duration: '5m', target: 200 },  // ramp
  { duration: '3m', target: 500 },  // push
  { duration: '2m', target: 500 },  // sustain
  { duration: '2m', target: 0 },    // cool-down
]
Soak test (finding memory leaks, connection exhaustion):
stages: [
  { duration: '5m', target: 100 },   // ramp
  { duration: '60m', target: 100 },  // sustain
  { duration: '5m', target: 0 },     // cool-down
]
Spike test (sudden traffic burst):
stages: [
  { duration: '2m', target: 50 },    // normal
  { duration: '30s', target: 500 },  // spike
  { duration: '5m', target: 500 },   // sustain spike
  { duration: '30s', target: 50 },   // drop back
]
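Before running any of these scenarios it can help to confirm how long the run takes and what peak load it reaches. The helper below is a sketch, not part of k6; `toSeconds` and `summarizeStages` are hypothetical names, and only single-unit durations such as '30s' or '5m' are handled.

```javascript
// Convert a duration such as '30s', '5m', or '1h' to seconds.
// Only single-unit values are handled here; that is a simplification.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const unit = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unit;
}

// Total run time and highest VU target across a list of stages.
function summarizeStages(stages) {
  return {
    totalSeconds: stages.reduce((sum, s) => sum + toSeconds(s.duration), 0),
    peakTarget: Math.max(...stages.map((s) => s.target)),
  };
}

// The spike scenario from above.
const spike = [
  { duration: '2m', target: 50 },
  { duration: '30s', target: 500 },
  { duration: '5m', target: 500 },
  { duration: '30s', target: 50 },
];

console.log(summarizeStages(spike)); // { totalSeconds: 480, peakTarget: 500 }
```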
Step 4: Include Proper Assertions
Always add thresholds:
thresholds: {
  http_req_duration: ['p(95)<800', 'p(99)<2000'],
  http_req_failed: ['rate<0.05'],
  // Custom per-endpoint if needed
}
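The stages and thresholds end up in one k6 options object. The sketch below assembles it with a hypothetical `buildOptions` helper and adds one per-endpoint threshold on a tagged sub-metric. It assumes the search requests are tagged with `name: 'search'`; that tag value and the 400 ms limit are illustrative, not taken from the skill.

```javascript
// Thresholds copied from the step above.
const DEFAULT_THRESHOLDS = {
  http_req_duration: ['p(95)<800', 'p(99)<2000'],
  http_req_failed: ['rate<0.05'],
};

// Hypothetical helper: merge a stage list with the default thresholds,
// letting callers add or override per-metric limits.
function buildOptions(stages, extraThresholds = {}) {
  return {
    stages,
    thresholds: { ...DEFAULT_THRESHOLDS, ...extraThresholds },
  };
}

const options = buildOptions(
  [
    { duration: '2m', target: 50 },
    { duration: '2m', target: 0 },
  ],
  // Per-endpoint threshold on a tagged sub-metric; the tag value is an assumption.
  { 'http_req_duration{name:search}': ['p(95)<400'] },
);

// In a k6 script this object would be exported: `export const options = ...`
console.log(JSON.stringify(options, null, 2));
```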
Step 5: Execute and Report
Run the test and parse output into a structured report showing:
- Latency percentiles (p50, p95, p99) per stage
- Error rate per stage
- Requests per second achieved
- Breaking point (the stage where thresholds are first violated)
- Bottleneck hypothesis based on error patterns
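Locating the breaking point can be sketched as a scan over per-stage metrics. The input shape below (`name`, `vus`, `p95`, `p99`, `errorRate`) is an assumed intermediate format produced by whatever parses the tool output, and the sample numbers are invented for illustration.

```javascript
// Limits mirroring the thresholds used earlier (milliseconds and error ratio).
const LIMITS = { p95: 800, p99: 2000, errorRate: 0.05 };

// Return the first stage whose metrics violate any limit, or null if none do.
// A threshold like 'p(95)<800' fails once the value reaches 800, hence `>=`.
function findBreakingPoint(stageResults, limits = LIMITS) {
  for (const stage of stageResults) {
    const violations = [];
    if (stage.p95 >= limits.p95) violations.push('p95');
    if (stage.p99 >= limits.p99) violations.push('p99');
    if (stage.errorRate >= limits.errorRate) violations.push('errorRate');
    if (violations.length > 0) {
      return { stage: stage.name, vus: stage.vus, violations };
    }
  }
  return null;
}

// Made-up per-stage numbers, for illustration only.
const stageResults = [
  { name: 'warm-up', vus: 50, p95: 210, p99: 480, errorRate: 0.001 },
  { name: 'ramp', vus: 200, p95: 540, p99: 1300, errorRate: 0.004 },
  { name: 'push', vus: 500, p95: 1250, p99: 2600, errorRate: 0.031 },
];

console.log(findBreakingPoint(stageResults));
```

Which limits were violated also feeds the bottleneck hypothesis: latency violations with a low error rate point in a different direction than a rising error rate.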
Examples
Example 1: Quick Endpoint Benchmark
User prompt:
Benchmark our /api/search endpoint handling 100 concurrent connections
Agent output:
# Quick benchmark with k6
k6 run --vus 100 --duration 60s - <<'EOF'
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('https://api.example.com/api/search?q=laptop&page=1');
  check(res, { 'status 200': (r) => r.status === 200 });
}
EOF
Results — /api/search @ 100 VUs for 60s
Requests: 8,423 total (140.4 RPS)
Latency: p50=89ms p95=340ms p99=890ms
Errors: 12 (0.14%)
Data: 24.3 MB received
Verdict: Endpoint handles 140 RPS at 100 VUs. p99 approaching 1s suggests database query optimization needed for sustained higher load.
Example 2: Multi-Step Flow
User prompt:
Create a load test for our signup → login → create-project flow
Agent generates a k6 script with:
- Step 1: POST /api/auth/signup with randomized email/name
- Step 2: POST /api/auth/login to get JWT
- Step 3: POST /api/projects with auth header and realistic project data
- Custom metrics tracking each step's latency separately
- Sleep between steps to simulate real user behavior
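One iteration of that flow might look like the sketch below. To keep it runnable outside k6, the HTTP client, `sleep`, and the metric recorder are passed in; inside a k6 script they would be `http` from 'k6/http', `sleep` from 'k6', and a wrapper around `Trend` objects from 'k6/metrics' created in the init context. The request fields, the password, and the `token` field in the login response are assumptions about the API, not details from the source.

```javascript
// Sketch of one signup -> login -> create-project iteration.
// `http`, `sleep`, and `record` are injected dependencies (see lead-in).
// Request and response field names are assumptions about the API.
function runSignupFlow({ http, sleep, record, baseUrl, userId }) {
  const jsonParams = { headers: { 'Content-Type': 'application/json' } };
  const email = `load-user-${userId}@example.com`;
  const password = 'S3cure-Load-Test!';

  // Step 1: sign up with a unique email/name per virtual user iteration.
  const signup = http.post(
    `${baseUrl}/api/auth/signup`,
    JSON.stringify({ email, name: `Load User ${userId}`, password }),
    jsonParams,
  );
  record('signup', signup.timings.duration);
  sleep(1);

  // Step 2: log in and read the JWT (assumes a body like { "token": "..." }).
  const login = http.post(
    `${baseUrl}/api/auth/login`,
    JSON.stringify({ email, password }),
    jsonParams,
  );
  record('login', login.timings.duration);
  const token = login.json('token');
  sleep(1);

  // Step 3: create a project using the bearer token.
  const project = http.post(
    `${baseUrl}/api/projects`,
    JSON.stringify({ name: `Q3 roadmap ${userId}` }),
    { headers: { ...jsonParams.headers, Authorization: `Bearer ${token}` } },
  );
  record('create_project', project.timings.duration);

  return project.status;
}
```

In k6, `userId` could be built from the built-in `__VU` and `__ITER` variables so each iteration signs up a distinct user, and `record` could call `add` on one `Trend` per step to track each step's latency separately.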
Guidelines
- Never load test production without explicit confirmation — always clarify the target environment
- Start low, ramp gradually — sudden jumps make it hard to identify the exact breaking point
- Realistic think time — add a sleep() of 1-3 seconds between requests to simulate real users; without it, you're testing throughput, not user concurrency
- Authentication matters — many bottlenecks only appear with real auth flows (token validation, session lookups)
- Watch for connection reuse — k6 reuses connections by default, which is realistic for browsers but not for serverless/mobile clients
- Rate limit awareness — if the API has rate limiting, note it in the report; it's not a performance bottleneck, it's intentional
- Report infrastructure context — always note the server specs, pod count, and database size alongside results
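For the think-time guideline above, a randomized pause avoids every virtual user firing in lockstep. The helper below is a sketch with a hypothetical name (`thinkTime`); it returns a value in seconds that could be passed to k6's `sleep`, which accepts fractional seconds.

```javascript
// Return a think time in seconds, uniformly distributed in [min, max).
// The random source is injectable so the function can be checked deterministically.
function thinkTime(min = 1, max = 3, rng = Math.random) {
  return min + rng() * (max - min);
}

// In a k6 script this would be used as: sleep(thinkTime());
console.log(thinkTime().toFixed(2));
```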