**claude-code-production-grade-plugin / qa-engineer**

Clone the repo, or install just this skill:

```sh
git clone https://github.com/nagisanzenin/claude-code-production-grade-plugin
```

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/nagisanzenin/claude-code-production-grade-plugin "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/qa-engineer" ~/.claude/skills/nagisanzenin-claude-code-production-grade-plugin-qa-engineer && rm -rf "$T"
```

`skills/qa-engineer/SKILL.md`:

# QA Engineer Skill
## Protocols

!`cat Claude-Production-Grade-Suite/.protocols/ux-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/input-validation.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/tool-efficiency.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/visual-identity.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/freshness-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/receipt-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/boundary-safety.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/conflict-resolution.md 2>/dev/null || true`
!`cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"`
!`cat Claude-Production-Grade-Suite/.orchestrator/codebase-context.md 2>/dev/null || true`
**Fallback (if protocols not loaded):** Use `AskUserQuestion` with concrete options (never open-ended), "Chat about this" last, the recommended option first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing inputs as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use `smart_outline` before a full `Read`.
## Engagement Mode

!`cat Claude-Production-Grade-Suite/.orchestrator/settings.md 2>/dev/null || echo "No settings — using Standard"`
| Mode | Behavior |
|---|---|
| Express | Fully autonomous. Generate all test suites with sensible coverage targets. Report test plan in output. |
| Standard | Surface 1-2 critical decisions — coverage targets, e2e scope (which flows to test), performance thresholds. |
| Thorough | Show full test plan before implementing. Ask about test data strategy, which edge cases matter most, performance SLAs to validate. Show test results summary per category. |
| Meticulous | Walk through test plan per service. User reviews test scenarios before implementation. Show each test category's results. Ask about flaky test tolerance and retry strategy. |
## Progress Output

Follow `Claude-Production-Grade-Suite/.protocols/visual-identity.md`. Print structured progress throughout execution.
**Skill header (print on start):**

```
━━━ QA Engineer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

**Phase progress (print during execution):**

```
[1/2] Test Planning
  ✓ {N} test cases across {M} categories
  ⧖ building traceability matrix...
  ○ coverage targets
[2/2] Test Implementation
  ✓ unit: {N} tests
  ✓ integration: {N} tests
  ⧖ e2e: writing user flow specs...
  ○ performance: load tests
```

**Completion summary (print on finish — MUST include concrete numbers):**

```
✓ QA Engineer  {N} tests written, {M} passing, {K} failing  ⏱ Xm Ys
```
## Brownfield Awareness

If `Claude-Production-Grade-Suite/.orchestrator/codebase-context.md` exists and mode is brownfield:
- READ existing tests first — understand test framework, patterns, fixtures, helpers
- MATCH existing test framework — if they use pytest, don't introduce jest. If they use Vitest, use Vitest
- ADD tests alongside existing ones — don't restructure their test directory
- Existing tests must still pass — run the full test suite after adding new tests
- Reuse existing fixtures and helpers — don't duplicate test utilities
## Config Paths

Read `.production-grade.yaml` at startup. Use these overrides if defined:

- `paths.services` — default: `services/`
- `paths.frontend` — default: `frontend/`
- `paths.tests` — default: `tests/`
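
For reference, a minimal `.production-grade.yaml` sketch. Only the `paths.*` keys above are documented; the override values shown are hypothetical:

```yaml
# Hypothetical overrides; omit the file entirely to use the defaults.
paths:
  services: backend/services/
  frontend: web/
  tests: tests/
```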
## Context & Position in Pipeline
This skill runs AFTER the Software Engineer and Frontend Engineer skills have completed. It expects:
- `services/` and `libs/` — Backend services, handlers, repositories, domain models, API route definitions
- `frontend/` — UI components, pages, hooks, state management, API client calls
- `api/`, `schemas/`, `docs/architecture/` — API contracts (OpenAPI/AsyncAPI specs), data models, sequence diagrams
- BRD or PRD — Acceptance criteria, user stories, business rules, edge cases
The QA Engineer does NOT modify source code. It writes test files and test infrastructure to `tests/` at the project root, and test documentation (test plan, reports) to `Claude-Production-Grade-Suite/qa-engineer/`.
## Graceful Degradation

At startup, check whether `frontend/` (or `paths.frontend` from config) exists. If the frontend directory is not found:
- Skip all frontend-related test phases (UI E2E, visual regression, frontend contract tests, frontend-specific checks).
- Print: `[DEGRADED: frontend not found — skipping frontend tests]`
- Continue with all backend test phases normally.
## Output Structure

This skill produces output in two locations: test deliverables (code, configs, fixtures) at `tests/` in the project root, and workspace artifacts (test plan, reports, findings) in `Claude-Production-Grade-Suite/qa-engineer/`. Never write test files into `services/` or `frontend/` directly.
### Project Root Output (`tests/`)

```
tests/
├── unit/
│   └── <service>/                        # One folder per backend service
│       ├── handlers/
│       │   └── <handler>.test.ts         # HTTP handler / controller tests
│       ├── services/
│       │   └── <service>.test.ts         # Business logic / domain service tests
│       ├── repositories/
│       │   └── <repo>.test.ts            # Data access layer tests (mocked DB)
│       ├── validators/
│       │   └── <validator>.test.ts       # Input validation tests
│       └── mappers/
│           └── <mapper>.test.ts          # DTO / domain mapper tests
├── integration/
│   ├── docker-compose.test.yml           # Test dependency containers (Postgres, Redis, Kafka, etc.)
│   ├── setup.ts                          # Global integration test setup / teardown
│   └── <service>/
│       ├── db/
│       │   └── <repo>.integration.ts     # Real DB queries via testcontainers
│       ├── cache/
│       │   └── <cache>.integration.ts    # Real Redis / cache operations
│       ├── messaging/
│       │   └── <queue>.integration.ts    # Real message broker publish / consume
│       └── api/
│           └── <endpoint>.integration.ts # HTTP-level integration (supertest / httptest)
├── contract/
│   ├── pacts/
│   │   ├── consumer/
│   │   │   └── <consumer>-<provider>.pact.ts # Consumer-driven contract tests
│   │   └── provider/
│   │       └── <provider>.verify.ts      # Provider verification tests
│   ├── schema/
│   │   └── <api>.schema.test.ts          # OpenAPI schema validation tests
│   └── pact-broker.config.ts             # Pact Broker connection config
├── e2e/
│   ├── api/
│   │   ├── flows/
│   │   │   └── <user-flow>.e2e.ts        # Multi-step API workflow tests
│   │   ├── smoke.e2e.ts                  # Critical-path smoke tests
│   │   └── setup.ts                      # API E2E auth helpers, base URLs
│   └── ui/
│       ├── pages/                        # Page Object Models
│       │   └── <page>.page.ts
│       ├── flows/
│       │   └── <user-flow>.spec.ts       # Playwright / Cypress user flow specs
│       ├── visual/
│       │   └── <component>.visual.ts     # Visual regression snapshot tests
│       └── playwright.config.ts          # Or cypress.config.ts
├── performance/
│   ├── load-tests/
│   │   └── <scenario>.k6.js              # k6 load test scripts (sustained load)
│   ├── stress-tests/
│   │   └── <scenario>.k6.js              # k6 stress test scripts (breaking point)
│   ├── spike-tests/
│   │   └── <scenario>.k6.js              # k6 spike test scripts (sudden burst)
│   ├── baselines/
│   │   └── <scenario>.baseline.json      # Expected p50/p95/p99 latency, throughput
│   └── thresholds.js                     # Shared k6 threshold definitions
├── fixtures/
│   ├── factories/
│   │   └── <entity>.factory.ts           # Test data factories (fishery / factory-girl pattern)
│   ├── seed-data/
│   │   ├── <entity>.seed.json            # Static seed data for integration / E2E
│   │   └── seed-runner.ts                # Script to load seed data into test DBs
│   └── mocks/
│       ├── <external-api>.mock.ts        # External API mock servers (MSW / nock)
│       └── <service>.stub.ts             # Internal service stubs
└── coverage/
    └── thresholds.json                   # Per-service and global coverage gates
```
### Workspace Output (`Claude-Production-Grade-Suite/qa-engineer/`)

```
Claude-Production-Grade-Suite/qa-engineer/
├── test-plan.md          # Master test plan with traceability matrix
├── coverage-report.md    # Coverage analysis and findings
└── findings.md           # QA findings and recommendations
```
## Phases
Execute each phase sequentially. Do NOT skip phases. Each phase builds on the outputs of the previous one.
### Parallel Execution Strategy
After Phase 1 (Test Planning), Phases 2-6 run in parallel — each test type is independent:
```
# After test plan is written, spawn all test types simultaneously:
Agent(prompt="Write unit tests following Phase 2 rules. Read test-plan.md for traceability. Write to tests/unit/.", ...)
Agent(prompt="Write integration tests following Phase 3 rules. Read test-plan.md. Write to tests/integration/.", ...)
Agent(prompt="Write contract tests following Phase 4 rules. Read test-plan.md. Write to tests/contract/.", ...)
Agent(prompt="Write E2E tests following Phase 5 rules. Read test-plan.md. Write to tests/e2e/.", ...)
Agent(prompt="Write performance tests following Phase 6 rules. Read test-plan.md. Write to tests/performance/.", ...)
```
Wait for all 5 agents to complete, then run Phase 7 (Test Infrastructure) sequentially — it needs all test files to configure CI.
Why this works: Each test type reads source code independently and writes to its own directory. No conflicts. The test plan from Phase 1 provides shared context.
Execution order:
- Phase 1: Test Planning (sequential — foundational)
- Phases 2-6: Unit + Integration + Contract + E2E + Performance (PARALLEL)
- Phase 7: Test Infrastructure (sequential — needs all test files)
### Phase 1 — Test Planning
Goal: Produce a traceability matrix linking every BRD acceptance criterion to concrete test cases, categorized by test type.
Inputs to read:
- BRD / PRD acceptance criteria (every GIVEN/WHEN/THEN or equivalent)
- `api/` — API contracts (OpenAPI specs, AsyncAPI specs)
- `schemas/` — data models
- `docs/architecture/` — sequence diagrams
- `services/` — service structure (list all services, handlers, repos)
- `frontend/` — component and page structure (if frontend exists; otherwise skip frontend inputs)
Actions:
- Extract every acceptance criterion and assign a unique ID (AC-001, AC-002, ...).
- For each criterion, determine which test types are required (unit, integration, contract, e2e, performance).
- Identify all services, modules, and components that need test coverage.
- Identify all external dependencies that require mocking or test containers.
- Identify critical user flows for E2E coverage.
- Identify performance-sensitive endpoints for load testing.
- Define coverage thresholds per service (lines, branches, functions).
Output: Write `Claude-Production-Grade-Suite/qa-engineer/test-plan.md` with the following sections:
- Scope — What is being tested, what is explicitly out of scope
- Test Strategy — Test pyramid approach, which test types cover which risk areas
- Traceability Matrix — Table mapping AC-ID to test case IDs, test type, and priority
- Environment Requirements — Containers, external services, env vars needed
- Coverage Targets — Per-service and global coverage gates
- Risk Register — Areas with high complexity or insufficient testability
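
For illustration, two hypothetical traceability matrix rows (the criteria, test IDs, and priorities are invented):

| AC ID | Acceptance Criterion | Test Cases | Types | Priority |
|---|---|---|---|---|
| AC-001 | Registered user can log in with valid credentials | UT-014, IT-006, E2E-001 | unit, integration, e2e | P0 |
| AC-002 | Login with an unknown email returns 401, not 404 | UT-015, CT-003 | unit, contract | P1 |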
### Phase 2 — Unit Tests
Goal: Test each service's business logic, handlers, and repositories in isolation with full mocking of external dependencies.
Inputs to read:
- `services/` — source code for each service
- The test plan from Phase 1
Rules:
- One test file per source file. Mirror the source directory structure under `tests/unit/<service>/`.
- Mock ALL external dependencies: databases, caches, message brokers, HTTP clients, other services.
- Use dependency injection or module mocking — never patch globals.
- Test the happy path, error paths, edge cases, and boundary values for every public function.
- For handlers/controllers: test request parsing, validation error responses, correct status codes, response body shape.
- For services/domain logic: test business rule enforcement, state transitions, calculation correctness.
- For repositories: test query construction, parameter binding, result mapping (with mocked DB driver).
- For validators: test every validation rule, including null, empty, boundary, and malformed inputs.
- Every test must have a descriptive name that reads as a specification: `it("should return 404 when order does not exist for the given user")`.
- Use factories from `tests/fixtures/factories/` for test data — never inline large object literals.
- Assert on specific values, not just truthiness. Prefer `toEqual` over `toBeTruthy`.
- Test error types and messages, not just that an error was thrown.
Output: Write test files to `tests/unit/<service>/`. Also write factories to `tests/fixtures/factories/` as you discover entity shapes.
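
A minimal sketch of the factory-plus-injected-mock pattern these rules describe. `Order`, `OrderRepo`, and `OrderService` are hypothetical stand-ins, not names from this repo; Vitest is shown, but the same shape works in Jest:

```typescript
import { describe, expect, it, vi } from "vitest";
import { Factory } from "fishery";

interface Order { id: string; userId: string; status: "PENDING" | "PAID"; }
interface OrderRepo { findByIdForUser(id: string, userId: string): Promise<Order | null>; }

// In practice this lives in tests/fixtures/factories/order.factory.ts.
const orderFactory = Factory.define<Order>(({ sequence }) => ({
  id: `order-${sequence}`, userId: `user-${sequence}`, status: "PENDING",
}));

class OrderService {
  constructor(private repo: OrderRepo) {} // dependency injection, no global patching
  async getOrder(id: string, userId: string): Promise<Order> {
    const order = await this.repo.findByIdForUser(id, userId);
    if (!order) throw Object.assign(new Error("order not found"), { statusCode: 404 });
    return order;
  }
}

describe("OrderService.getOrder", () => {
  it("should return the order when it exists for the given user", async () => {
    const order = orderFactory.build(); // factory, not an inline object literal
    const repo: OrderRepo = { findByIdForUser: vi.fn().mockResolvedValue(order) };
    await expect(new OrderService(repo).getOrder(order.id, order.userId)).resolves.toEqual(order);
  });

  it("should return 404 when order does not exist for the given user", async () => {
    const repo: OrderRepo = { findByIdForUser: vi.fn().mockResolvedValue(null) };
    await expect(new OrderService(repo).getOrder("order-9", "user-1"))
      .rejects.toMatchObject({ message: "order not found", statusCode: 404 }); // error type and message
  });
});
```

Note the mock is typed against the real `OrderRepo` interface, which is what keeps mocks from silently drifting (Common Mistake #3 below).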
### Phase 3 — Integration Tests
Goal: Test service interactions with real dependencies using testcontainers or docker-compose.
Inputs to read:
- `services/` — database migrations, schemas, connection configs
- `docs/architecture/` — infrastructure requirements (which DBs, caches, brokers)
- The test plan from Phase 1
Rules:
- Write `tests/integration/docker-compose.test.yml` with containers for every real dependency (PostgreSQL, Redis, Kafka, Elasticsearch, etc.). Pin exact image versions.
- Write `tests/integration/setup.ts` with global before/after hooks: start containers, run migrations, seed base data, tear down after suite.
- Each integration test file connects to real containers — no mocks for the dependency under test.
- Test actual SQL queries against a real database with realistic data volumes (not just 1 row).
- Test cache read/write/eviction with a real Redis instance.
- Test message publishing and consumption with a real broker.
- Test API endpoints with real HTTP calls (supertest / httptest) against a running server.
- Each test must clean up its own data. Use transactions with rollback, or truncate tables in afterEach.
- Tests must be parallelizable — use unique identifiers to avoid cross-test data collisions.
- Test failure modes: connection timeouts, constraint violations, concurrent writes, deadlocks.
Output: Write test files to `tests/integration/<service>/`. Write `docker-compose.test.yml` and `setup.ts` to `tests/integration/`.
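
A minimal `setup.ts` sketch using `@testcontainers/postgresql` (an assumed dependency; the migration call is a placeholder for the project's real runner):

```typescript
// tests/integration/setup.ts (sketch)
import { beforeAll, afterAll } from "vitest";
import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";

let pg: StartedPostgreSqlContainer;

beforeAll(async () => {
  // Pinned image version, per the rule above: never "postgres:latest".
  pg = await new PostgreSqlContainer("postgres:16.4").start();
  // Tests read config from env vars, not hardcoded URLs or ports.
  process.env.DATABASE_URL = pg.getConnectionUri();
  // await runMigrations(process.env.DATABASE_URL); // placeholder: wire in the project's migration runner
}, 120_000);

afterAll(async () => {
  await pg.stop();
});
```

Publishing the connection URI through `process.env` also avoids the hardcoded-connection-string trap (Common Mistake #5 below).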
### Phase 4 — Contract Tests
Goal: Verify API consumers and providers agree on request/response schemas and that implementations conform to OpenAPI specifications.
Inputs to read:
- `api/` — OpenAPI specs and AsyncAPI specs
- `services/` — API route definitions, request/response DTOs
- `frontend/` — API client calls and expected response shapes (if frontend exists; otherwise skip consumer-side frontend contracts)
Rules:
- For each API consumer (frontend, other services), write a Pact consumer test that defines the expected interactions.
- For each API provider, write a Pact provider verification test that replays consumer expectations against the real provider.
- Write schema validation tests that load the OpenAPI spec and validate every endpoint's actual response against the schema.
- Test backward compatibility: if there are versioned APIs, verify old consumers still work with new providers.
- For async APIs (events, messages), write contract tests for message schemas using AsyncAPI specs.
- Configure the Pact Broker connection in `pact-broker.config.ts` (even if the broker URL is a placeholder).
- Contract tests must fail if a required field is removed, a type changes, or a new required field is added without consumer agreement.
Output: Write contract tests to `tests/contract/`.
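
A consumer-test sketch using the `@pact-foundation/pact` V3 API; the consumer/provider names, route, and order payload are hypothetical:

```typescript
import { describe, expect, it } from "vitest";
import { PactV3, MatchersV3 } from "@pact-foundation/pact";

const { like } = MatchersV3;
const provider = new PactV3({ consumer: "frontend", provider: "orders-service" });

describe("GET /orders/:id contract", () => {
  it("returns the order with the agreed shape", async () => {
    provider
      .given("an order with id order-1 exists")
      .uponReceiving("a request for order-1")
      .withRequest({ method: "GET", path: "/orders/order-1" })
      .willRespondWith({
        status: 200,
        headers: { "Content-Type": "application/json" },
        // like() pins field names and types without overfitting to exact values.
        body: like({ id: "order-1", status: "PAID", totalCents: 1999 }),
      });

    await provider.executeTest(async (mockServer) => {
      const res = await fetch(`${mockServer.url}/orders/order-1`);
      expect(res.status).toBe(200);
      const order = await res.json();
      expect(typeof order.id).toBe("string"); // validate shape, not just the status code
    });
  });
});
```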
### Phase 5 — E2E Tests
Goal: Test critical user flows end-to-end through the full stack.
Inputs to read:
- BRD / PRD user stories and acceptance criteria (especially the critical path)
- `frontend/` — pages and navigation flow (if frontend exists; otherwise API-only E2E)
- `services/` — API endpoints
- The test plan from Phase 1 (critical user flows identified)
Rules:
- Identify the 5-10 most critical user flows (signup, login, core CRUD, payment, etc.).
- For API E2E: chain multiple API calls that represent a complete user journey. Use real auth tokens. Validate side effects (DB state, emails sent, events published).
- For UI E2E (skip if frontend not found): use the Page Object Model pattern. Each page gets a class in `tests/e2e/ui/pages/`.
- UI tests must use resilient selectors: `data-testid` attributes, ARIA roles — never CSS classes or DOM structure.
- Write a smoke test suite (`smoke.e2e.ts`) that covers the absolute minimum "is the app alive" checks. This runs on every deploy.
- E2E tests must be idempotent — running them twice produces the same result.
- Include setup/teardown that creates test users, seeds required data, and cleans up after.
- Add explicit waits for async operations — never use arbitrary `sleep()` calls.
- For visual regression (skip if frontend not found): capture screenshots of key pages and compare against baselines.
- Configure test timeouts generously (30s+ per test) — E2E is slow by nature.
- Cross-boundary journey testing (boundary-safety protocol pattern 5): For every multi-system flow (auth, payment, email, webhook), write at least one E2E test that traces the COMPLETE journey from user action to final state. Auth test must verify: unauthenticated user visits protected page → redirected to login → authenticates → redirected back to original page → sees authenticated content. Payment test must verify: user clicks pay → payment provider processes → callback fires → order status updates → user sees confirmation. Do NOT just test individual hops — test the full chain.
- Framework navigation correctness: Verify that no `<Link>` or client-side `navigate()` targets API routes, external URLs, or auth endpoints. These must use raw `<a href>` or `window.location` for full HTTP requests.
Output: Write E2E tests and page objects to `tests/e2e/`. Write Playwright or Cypress config.
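
A Playwright sketch combining a page object with the full cross-boundary auth journey described above (the routes, test IDs, and credentials are hypothetical):

```typescript
// tests/e2e/ui/pages/login.page.ts plus one flow spec (sketch).
import { expect, test, type Page } from "@playwright/test";

class LoginPage {
  constructor(private page: Page) {}
  async login(email: string, password: string) {
    // data-testid selectors only: resilient to CSS and DOM refactors.
    await this.page.getByTestId("email-input").fill(email);
    await this.page.getByTestId("password-input").fill(password);
    await this.page.getByTestId("login-submit").click();
  }
}

test("unauthenticated visit to a protected page completes the full auth journey", async ({ page }) => {
  await page.goto("/settings");                               // user action on a protected page
  await expect(page).toHaveURL(/\/login/);                    // hop 1: redirected to login
  await new LoginPage(page).login("qa-user@example.com", "correct-horse");
  await expect(page).toHaveURL(/\/settings/);                 // hop 2: returned to the original page
  await expect(page.getByTestId("user-menu")).toBeVisible();  // final state: authenticated content
});
```

Playwright's `expect` assertions auto-wait, which satisfies the no-`sleep()` rule without explicit polling code.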
### Phase 6 — Performance Tests
Goal: Establish performance baselines and create load/stress test scripts for performance-sensitive endpoints.
Inputs to read:
- `docs/architecture/` — NFRs (latency targets, throughput requirements, SLOs)
- `services/` — API endpoints (especially high-traffic ones)
- The test plan from Phase 1 (performance-sensitive areas)
Rules:
- Write k6 scripts (JavaScript). Each script targets a specific scenario (e.g., "user browsing products", "checkout flow under load").
- Load tests: simulate sustained normal traffic. Define realistic ramp-up patterns (e.g., 0 -> 100 VUs over 2 min, hold 10 min, ramp down).
- Stress tests: find the breaking point. Ramp VUs aggressively until error rate exceeds 5% or p99 exceeds SLO.
- Spike tests: simulate sudden traffic bursts (0 -> 500 VUs in 10 seconds).
- Define thresholds in each script: `http_req_duration['p(95)'] < 500`, `http_req_failed < 0.01`.
- Write baseline JSON files that record expected performance under normal load. CI compares against these.
- Use realistic test data — not the same request repeated. Parameterize with CSV data files or k6 SharedArray.
- Include authentication in test scripts (token generation, session management).
- Test both read-heavy and write-heavy endpoints separately.
- Add custom metrics for business-critical operations (e.g., `order_processing_time`).
Output: Write k6 scripts to `tests/performance/`. Write baseline files to `tests/performance/baselines/`.
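
A load-test sketch in k6's JavaScript; the scenario name, endpoint, and data file are hypothetical, and the thresholds mirror the rule above:

```javascript
// tests/performance/load-tests/browse-products.k6.js (sketch)
import http from "k6/http";
import { check, sleep } from "k6";
import { SharedArray } from "k6/data";

// Varied test data instead of one repeated request.
const users = new SharedArray("users", () => JSON.parse(open("./users.json")));

export const options = {
  stages: [
    { duration: "2m", target: 100 },  // ramp 0 -> 100 VUs
    { duration: "10m", target: 100 }, // hold sustained load
    { duration: "2m", target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500"], // SLO from the test plan
    http_req_failed: ["rate<0.01"],
  },
};

export default function () {
  const user = users[__VU % users.length];
  const res = http.get(`${__ENV.BASE_URL}/products?userId=${user.id}`);
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1); // realistic user think-time between iterations
}
```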
### Phase 7 — Test Infrastructure
Goal: Configure CI test execution, coverage enforcement, and test reliability tooling.
Inputs to read:
- All test files generated in Phases 2-6
- Coverage thresholds from the test plan
- Project CI/CD system (GitHub Actions, GitLab CI, etc.)
Actions:
- Write `tests/coverage/thresholds.json` with per-service and global coverage gates:

  ```json
  {
    "global": { "lines": 80, "branches": 75, "functions": 80, "statements": 80 },
    "services": {
      "<service-name>": { "lines": 85, "branches": 80, "functions": 85, "statements": 85 }
    }
  }
  ```

- Write `.github/workflows/test.yml` (or `ci/test-config.yml`) with:
  - Unit test stage — runs first, fast, no containers. Fails fast on coverage threshold breach.
  - Integration test stage — starts docker-compose dependencies, runs integration suite, tears down.
  - Contract test stage — runs Pact tests, publishes results to broker.
  - E2E test stage — deploys to test environment, runs smoke + full E2E suite.
  - Performance test stage — runs load tests against staging, compares to baselines.
  - Parallel execution: split unit and integration tests across multiple CI runners by service.
  - Test result artifacts: JUnit XML reports, coverage HTML reports, k6 JSON results.
  - Flaky test detection: track test pass/fail history, quarantine tests with >5% flake rate.
  - Retry policy: retry failed E2E tests up to 2 times before marking as failed.
- Write the seed data runner to `tests/fixtures/seed-data/seed-runner.ts`.
- Write external API mock configurations to `tests/fixtures/mocks/`.
Output: Write CI config to `.github/workflows/test.yml`, and coverage thresholds and test infrastructure to `tests/`.
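
An abbreviated `test.yml` sketch of the first two stages; the job layout, service names, and `npm` script names are assumptions, not this repo's actual config:

```yaml
name: test
on: [push, pull_request]
jobs:
  unit:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [orders, payments]   # hypothetical services; shards the suite per runner
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:unit --workspace=${{ matrix.service }} -- --coverage
  integration:
    needs: unit                       # cheap stage gates the expensive one
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker compose -f tests/integration/docker-compose.test.yml up -d
      - run: npm run test:integration
      - if: always()                  # always tear containers down, even on failure
        run: docker compose -f tests/integration/docker-compose.test.yml down -v
```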
## Common Mistakes
| # | Mistake | Why It Fails | What to Do Instead |
|---|---|---|---|
| 1 | Writing tests inside `services/` or `frontend/` source directories | Pollutes source directories; violates pipeline separation | Always write tests to `tests/` at project root exclusively |
| 2 | Testing implementation details instead of behavior | Tests break on every refactor, providing no safety net | Test public interfaces, inputs, and outputs — not private methods or internal state |
| 3 | Using the `any` type or skipping type assertions in test mocks | Mocks drift from real interfaces silently; tests pass but code is broken | Type mocks against the real interface; use `jest.Mocked<T>` or equivalent |
| 4 | Sharing mutable state between tests | Tests pass in isolation but fail when run together; order-dependent results | Reset state in beforeEach; use factory functions that return fresh instances |
| 5 | Hardcoding connection strings, ports, or URLs in test files | Tests break in CI, on other machines, or when container ports change | Use environment variables with sensible defaults; read from docker-compose labels |
| 6 | Writing integration tests that mock the dependency under test | You are just writing unit tests with extra steps; real bugs slip through | If testing DB queries, use a real database. If testing cache, use real Redis. Mock only the things NOT under test |
| 7 | E2E tests that depend on specific database IDs or auto-increment values | Tests break when seed data changes or when run against a non-empty database | Create test data as part of test setup; reference by unique business identifiers, not DB IDs |
| 8 | Performance test scripts with a single hardcoded request | Does not simulate real traffic patterns; results are misleading | Parameterize requests with varied data; simulate realistic user think-time with `sleep()` |
| 9 | Coverage thresholds set to 100% | Encourages meaningless tests written just to hit the number; blocks legitimate PRs | Set realistic thresholds (80-85% lines, 75-80% branches); focus on critical path coverage |
| 10 | Ignoring test execution time | Slow test suites get skipped by developers; CI feedback loops become painful | Parallelize tests by service; keep unit suite under 60 seconds; keep integration suite under 5 minutes |
| 11 | Not testing error paths and failure modes | Happy-path-only tests miss the bugs that actually cause production incidents | For every success test, write at least one failure test: invalid input, timeout, auth failure, conflict |
| 12 | Writing E2E tests with `sleep()` for async waits | Flaky on slow CI runners; wastes time on fast ones | Use explicit wait-for conditions: poll for element visibility, API response, or DB state change |
| 13 | Contract tests that only check status codes | Schema changes, missing fields, and type mismatches go undetected | Validate full response body shape, field types, required fields, and enum values against the contract |
| 14 | No seed data strategy — each test creates its own world from scratch | Integration and E2E suites become extremely slow; redundant setup logic everywhere | Build a shared seed-data layer with factories and a seed runner; tests add only their unique data on top |
| 15 | Generating test files without reading the actual implementation first | Tests reference nonexistent functions, wrong parameter names, or incorrect module paths | Always read the source file before writing its test file; match imports, function signatures, and error types exactly |
| 16 | Auth E2E tests that only check "token returned" | Misses redirect bugs, callback misconfig, and infinite loops that only appear in the full browser flow | Test the complete journey: visit protected page → redirect to login → authenticate → land on original page with authenticated state |
| 17 | Not testing cross-system flows end-to-end | Payment tests that check "Stripe returns success" but never check "order status is updated and user sees confirmation" miss the integration point bugs | For every multi-system flow (auth, payment, webhook), trace from user action to final visible state |
## Execution Checklist
Before marking the skill as complete, verify:
- `Claude-Production-Grade-Suite/qa-engineer/test-plan.md` has a traceability matrix covering every BRD acceptance criterion
- Every service in `services/` has corresponding unit tests in `tests/unit/`
- Every repository/data-access module has integration tests with real database containers
- Every API endpoint has at least one contract test validating its schema
- The top 5-10 critical user flows have E2E tests
- At least 3 performance-sensitive endpoints have k6 load test scripts with baselines
- `tests/integration/docker-compose.test.yml` defines all required test containers with pinned versions
- `tests/coverage/thresholds.json` defines realistic per-service coverage gates
- `.github/workflows/test.yml` orchestrates all test stages with parallelization and artifact collection
- All test factories are in `tests/fixtures/factories/` and reused across test types
- No test file has hardcoded secrets, credentials, or environment-specific values
- All tests can run independently and in any order