EasyPlatform integration-test
[Testing] Generate or review integration tests. Modes: generate (from git changes or prompt), review (quality audit of existing tests), diagnose (analyze test failures). Subcutaneous tests with real DI, no mocks.
```shell
# Clone the full repo:
git clone https://github.com/duc01226/EasyPlatform

# Or copy only this skill into your global skills directory:
T=$(mktemp -d) && git clone --depth=1 https://github.com/duc01226/EasyPlatform "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/integration-test" ~/.claude/skills/duc01226-easyplatform-integration-test && rm -rf "$T"
```
`.claude/skills/integration-test/SKILL.md`

<!-- SYNC:critical-thinking-mindset -->[IMPORTANT] Use `TaskCreate` to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.
<!-- /SYNC:critical-thinking-mindset --> <!-- SYNC:ai-mistake-prevention -->Critical Thinking Mindset — Apply critical thinking and sequential thinking. Every claim needs traced proof and >80% confidence to act. Anti-hallucination: Never present a guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence — certainty without evidence is the root of all hallucination.
<!-- /SYNC:ai-mistake-prevention -->AI Mistake Prevention — Failure modes to avoid on every task:
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
- Trace full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips — not just happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether bug is in caller (wrong data) or callee (wrong handling). Fix at responsible layer — never patch symptom site.
- Assume existing values are intentional — ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
- Holistic-first debugging — resist nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes — apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
- Surface ambiguity before coding — don't pick silently. If request has multiple interpretations, present each with effort estimate and ask. Never assume all-records, file-based, or more complex path.
Prerequisites: MUST ATTENTION READ before executing:
<!-- SYNC:understand-code-first --><!-- /SYNC:understand-code-first -->Understand Code First — HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
- Search 3+ similar patterns (`glob`/`grep`) — cite `file:line` evidence
- Read existing files in target area — understand structure, base classes, conventions
- Run `python .claude/scripts/code_graph trace <file> --direction both --json` when `.code-graph/graph.db` exists
- Map dependencies via `connections` or `callers_of` — know what depends on your target
- Write investigation to `ai/workspace/analysis/` for non-trivial tasks (3+ files)
- Re-read analysis file before implementing — never work from memory alone
- NEVER invent new patterns when existing ones work — match exactly or document deviation

BLOCKED until:
- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence
- `references/integration-test-patterns.md`
- `docs/project-reference/domain-entities-reference.md` — Domain entity catalog, relationships, cross-service sync (read when task involves business entities/models) (content auto-injected by hook — check for [Injected: ...] header before reading)
- `docs/test-specs/` — Test specifications by module (read existing TCs for expected behavior; verify test-to-spec traceability)
<!-- /SYNC:graph-impact-analysis -->Graph Impact Analysis — When `.code-graph/graph.db` exists, run `blast-radius --json` to detect ALL files affected by changes (7 edge types: CALLS, MESSAGE_BUS, API_ENDPOINT, TRIGGERS_EVENT, PRODUCES_EVENT, TRIGGERS_COMMAND_EVENT, INHERITS). Compute gap: impacted_files - changed_files = potentially stale files. Risk: <5 Low, 5-20 Medium, >20 High. Use `trace --direction downstream` for deep chains on high-impact files.
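The gap computation above can be sketched in shell, assuming sorted file lists from the blast-radius output and `git diff` (simulated here with `printf`; file names are illustrative):

```shell
# Hypothetical sketch: compute impacted_files - changed_files = potentially stale files.
printf 'OrderHandler.cs\nOrderDto.cs\nOrderEvents.cs\n' | sort > /tmp/impacted.txt
printf 'OrderHandler.cs\n' | sort > /tmp/changed.txt
# Lines only in impacted.txt are impacted but unchanged, i.e. potentially stale
comm -23 /tmp/impacted.txt /tmp/changed.txt
```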
CRITICAL: Search existing patterns FIRST. Before generating ANY test, grep for existing integration test files in the same service. Read at least 1 existing test file to match conventions (namespace, usings, collection name, base class, helper usage). Never generate tests that contradict established patterns in the codebase.
CRITICAL: NO Smoke/Fake/Useless Tests. Every test MUST execute actual commands/handlers and verify data state in the database — like a QC tester testing the real system. NO DI-resolution-only tests (`GetRequiredService` + `NotBeNull`). NO exception-check-only tests (`exception.Should().BeNull()` alone). Before writing assertions: READ the handler/entity/event source to understand WHAT fields change, WHAT entities are created/updated/deleted, WHAT event handlers fire. Assert specific field values in the database.
<!-- SYNC:repeatable-test-principle -->CRITICAL: Async Polling for ALL Data Assertions. ALWAYS wrap data state assertions in the project's async polling/retry helper. This is the DEFAULT for ALL data verification — not just "async event handlers". Data persistence may be delayed by entity event handlers, message bus consumers, background jobs, or DB write latency. Async polling retries with timeout and is always safe. Rule: If you assert data in the database, use async polling. No exceptions.
<!-- /SYNC:repeatable-test-principle -->Infinitely Repeatable Tests — Tests MUST run N times without failure. Like manual QC — run the suite 100 times, each run just adds more data.
- Unique data per run: Use the project's unique ID generator for ALL entity IDs created in tests. NEVER hardcode IDs.
- Additive only: Tests create data, never delete/reset. Prior test runs MUST NOT interfere with current run.
- No schema rollback dependency: Tests work with current schema only. Never rely on schema rollback or migration reversals.
- Idempotent seeders: Fixture-level seeders use create-if-missing pattern (check existence before insert). Test-level data uses unique IDs per execution.
- No cleanup required: No teardown, no database reset between runs. Each test is isolated by unique seed data, not by cleanup.
- Unique names/codes: When entities require unique names/codes, append a unique suffix using the project's ID generator.
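A minimal C# sketch of the repeatable-data rules above. `UniqueIds.New()`, `IRepository<T>`, and the entity shapes are hypothetical placeholders; substitute your project's actual ID generator and repository API:

```csharp
// Illustrative only: substitute your project's real helpers.
public static class RepeatableTestData
{
    // Unique data per run: never hardcode IDs, names, or codes.
    public static Order NewOrder() => new Order
    {
        Id = UniqueIds.New(),               // hypothetical unique ID generator
        Code = $"ORD-{UniqueIds.New()}"     // unique suffix for unique-constrained fields
    };

    // Idempotent fixture seeder: create-if-missing, never delete or reset.
    public static async Task SeedDefaultCountryAsync(IRepository<Country> countries)
    {
        if (!await countries.AnyAsync(c => c.Code == "US"))
            await countries.CreateAsync(new Country { Code = "US", Name = "United States" });
    }
}
```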
For test specifications and test case generation from PBIs, use the `/tdd-spec` skill (preferred) or the `/test-spec` skill instead.
External Memory: For complex or lengthy work (research, analysis, scan, review), write intermediate findings and final results to a report file in `plans/reports/` — prevents context loss and serves as deliverable.
<!-- SYNC:red-flag-stop-conditions -->Evidence Gate: MANDATORY IMPORTANT MUST ATTENTION — every claim, finding, and recommendation requires `file:line` proof or traced evidence with a confidence percentage (>80% to act, <80% must verify first).
<!-- /SYNC:red-flag-stop-conditions --> <!-- SYNC:rationalization-prevention -->Red Flag Stop Conditions — STOP and escalate to user via AskUserQuestion when:
- Confidence drops below 60% on any critical decision
- Changes would affect >20 files (blast radius too large)
- Cross-service boundary is being crossed
- Security-sensitive code (auth, crypto, PII handling)
- Breaking change detected (interface, API contract, DB schema)
- Test coverage would decrease after changes
- Approach requires technology/pattern not in the project
NEVER proceed past a red flag without explicit user approval.
<!-- /SYNC:rationalization-prevention --> <!-- SYNC:incremental-persistence -->Rationalization Prevention — AI skips steps via these evasions. Recognize and reject:
| Evasion | Rebuttal |
|---|---|
| "Too simple for a plan" | Simple + wrong assumptions = wasted time. Plan anyway. |
| "I'll test after" | RED before GREEN. Write/verify test first. |
| "Already searched" | Show grep evidence with `file:line`. No proof = no search. |
| "Just do it" | Still need TaskCreate. Skip depth, never skip tracking. |
| "Just a small fix" | Small fix in wrong location cascades. Verify file:line first. |
| "Code is self-explanatory" | Future readers need evidence trail. Document anyway. |
| "Combine steps to save time" | Combined steps dilute focus. Each step has distinct purpose. |
<!-- /SYNC:incremental-persistence --> <!-- SYNC:subagent-return-contract -->Incremental Result Persistence — MANDATORY for all sub-agents or heavy inline steps processing >3 files.
- Before starting: Create report file `plans/reports/{skill}-{date}-{slug}.md`
- After each file/section reviewed: Append findings to report immediately — never hold in memory
- Return to main agent: Summary only (per SYNC:subagent-return-contract) with `Full report:` path
- Main agent: Reads report file only when resolving specific blockers
Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction. Partial results are better than no results.
Report naming: `plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md`
<!-- /SYNC:subagent-return-contract -->Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.
```
## Sub-Agent Result: [skill-name]
Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%

### Findings (Critical/High only — max 10 bullets)
- [severity] [file:line] [finding]

### Actions Taken
- [file changed] [what changed]

### Blockers (if any)
- [blocker description]

Full report: plans/reports/[skill-name]-[date]-[slug].md
```

Main agent reads the `Full report` file ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.
Quick Summary
Goal: Generate integration test files for commands/queries using real DI (no mocks).
Project Pattern Discovery
Before implementation, search your codebase for project-specific patterns:
- Search for: `IntegrationTest`, `TestFixture`, `TestUserContext`, `IntegrationTestBase`
- Look for: existing test projects, test collection definitions, service-specific test base classes

MANDATORY IMPORTANT MUST ATTENTION Plan ToDo Task to READ `integration-test-reference.md` for project-specific patterns and code examples. If file not found, continue with search-based discovery above.
Five modes: (1) From git changes (default) — detects uncommitted command/query files and generates matching tests. (2) From prompt — user specifies what to test. (3) Review — audit existing tests for quality, best practices, and flaky patterns. (4) Diagnose — analyze test failures to determine root cause (test bug vs code bug). (5) Verify-traceability — check test code matches test specs and feature docs.
Workflow:
- Detect mode — See Mode Detection section below
- Find targets — Identify test/command/query files
- Gather context — Read relevant files for the detected mode
- Execute — Generate, review, diagnose, or verify depending on mode
- Report — Build check (generate), quality report (review), root cause (diagnose)
Key Rules:
- MUST ATTENTION search for existing test patterns in the same service BEFORE generating
- MUST ATTENTION READ `references/integration-test-patterns.md` before writing any test
- Organize by domain feature, NOT by type — command and query tests for the same domain go in the same folder (e.g., `Orders/OrderCommandIntegrationTests.*` + `Orders/OrderQueryIntegrationTests.*`). NEVER create a `Queries/` or `Commands/` folder.
- Use the project's unique name generator for ALL string test data (search test utilities for unique name helpers)
- Use the project's entity assertion helpers for DB verification with built-in async polling (search test base classes for `AssertEntity*` or equivalent patterns)
- CRITICAL MUST ATTENTION ENSURE: ALWAYS wrap ALL data state assertions in the project's async polling/retry helper. This is the DEFAULT — not just for "async" handlers. Data may be delayed by entity event handlers, message bus consumers, or background jobs. Async polling retries with timeout and is always safe. If you assert data in DB → use async polling. No exceptions.
- CRITICAL MUST ATTENTION ENSURE: Before writing assertions, READ the handler/entity/event source code. Understand WHAT fields change, WHAT entities are created/updated/deleted, WHAT event handlers fire. Assert specific field values, not just non-null. Smoke-only is FORBIDDEN unless the side effect is truly unobservable.
- Minimum 3 test methods: happy path, validation failure, DB state check
- Authorization tests: Include tests with multiple user contexts (use the project's user context factory) — verify authorized access succeeds AND unauthorized access is rejected
- Every test method MUST ATTENTION have a `// TC-{FEATURE}-{NNN}: Description` comment AND a test-spec annotation (e.g., `[Trait("TestSpec", "TC-{FEATURE}-{NNN}")]` in xUnit — adapt to your framework) — placed before the test method, outside the method body
- If no TC exists in feature docs, auto-create it in Section 15 before generating the test
- For comprehensive test spec generation before coding, use `/tdd-spec` first
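Putting the key rules together, a hedged C#/xUnit sketch of one generated test. `UniqueName`, `ExecuteCommandAsync`, `WaitUntilAsync`, and `FindOrderAsync` are stand-ins for whatever your test base class actually provides:

```csharp
// TC-OM-001: Create valid order — happy path
[Trait("TestSpec", "TC-OM-001")]
[Fact]
public async Task CreateOrder_WhenValidData_ShouldCreateSuccessfully()
{
    // Arrange: unique data per run, real DI (no mocks)
    var command = new CreateOrderCommand { CustomerName = UniqueName("customer") };

    // Act: execute the actual handler through the real pipeline
    var result = await ExecuteCommandAsync(command);

    // Assert: async polling on DB state; specific field values, not just non-null
    await WaitUntilAsync(async () =>
    {
        var order = await FindOrderAsync(result.OrderId);
        order.Should().NotBeNull();
        order!.CustomerName.Should().Be(command.CustomerName);
        order.Status.Should().Be(OrderStatus.Created);
    });
}
```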
Be skeptical. Apply critical thinking and sequential thinking. Every claim needs traced proof and a confidence percentage (act only above 80%).
Mandatory Task Ordering (MUST ATTENTION FOLLOW)
When generating integration tests, ALWAYS create and execute tasks in this exact order:
- FIRST task: Verify/upsert test specs in feature docs
  - Read feature doc Section 15 (`docs/business-features/{App}/detailed-features/`) for the target domain
  - Read test-specs doc (`docs/test-specs/{App}/README.md`) if it exists
  - For each test case to generate: verify a matching `TC-{FEATURE}-{NNN}` exists in docs
  - If TC is MISSING: create the TC entry in Section 15 with Priority, Status, GIVEN/WHEN/THEN, Evidence
  - If TC is INCORRECT: update it to reflect current command/query behavior
  - Output: a TC mapping list (TC code → test method name) for subsequent tasks
- MIDDLE tasks: Implement integration tests
  - Generate test files using the TC mapping from task 1
  - Each test method gets a TC annotation before it (outside method body) — adapt to your test framework (C#/xUnit example):

    ```csharp
    // TC-OM-001: Create valid order — happy path
    [Trait("TestSpec", "TC-OM-001")]
    [Fact]
    public async Task CreateOrder_WhenValidData_ShouldCreateSuccessfully()
    ```
  - Follow all existing patterns from the project's test base classes (collection/group, category, unique data helpers, assertion helpers, etc.)
- FINAL task: Verify bidirectional traceability
  - Grep for test-spec annotations in the test project (e.g., `[Trait("TestSpec", ...)]` in xUnit, `@Tag` in JUnit — adapt to your framework)
  - Grep all `TC-{FEATURE}-{NNN}` in feature doc Section 15 / test-specs doc
  - Verify every test method links to a doc TC, and every doc TC links back to a test
  - Flag orphans: tests without doc TCs, or doc TCs without matching tests
  - Update the `IntegrationTest` field in feature doc TCs with `{File}::{MethodName}`
Module Abbreviation Registry
| Module | Abbreviation | Test Folder |
|---|---|---|
| Order Management | OM | |
| Inventory | INV | |
| User Profiles | UP | |
| Notification Management | NM | |
| Report Generation | RG | |
| Feedback | FB | |
| Background Jobs | BJ | — |
TC Code Numbering Rules
When creating new `TC-{FEATURE}-{NNN}` codes:
- Always check the feature doc first — `docs/business-features/{App}/detailed-features/` contains existing TC codes. New codes must not collide.
- Existing docs use decade-based grouping — e.g., OM: 001-004 (CRUD), 011-013 (validation), 021-023 (permissions), 031-033 (events). Find the next free decade.
- If a collision is unavoidable — renumber in the doc side only (e.g., TC-OM-031 → TC-OM-034). Keep the test-spec annotation in the test file unchanged and add a renumbering note in the doc.
- Feature doc is the canonical registry — the test-spec annotation in test files is for traceability, not the source of truth for numbering.
Integration Test Generation
Mode Detection
- Args = command/query name (e.g., `/integration-test CreateOrderCommand`) → FROM-PROMPT mode: generate tests for the specified command/query
- No args (e.g., `/integration-test`) → FROM-CHANGES mode: detect changed command/query files from git
- Args = "review" (e.g., `/integration-test review Orders`) → REVIEW mode: audit existing test quality, find flaky patterns, check best practices
- Args = "diagnose" (e.g., `/integration-test diagnose OrderCommandIntegrationTests`) → DIAGNOSE mode: analyze why tests fail — determine test bug vs code bug
- Args = "verify" (e.g., `/integration-test verify {Service}`) → VERIFY-TRACEABILITY mode: check test code matches specs and feature docs
Step 1: Find Targets
From-Changes Mode (default)
Run via Bash tool:

```shell
git diff --name-only; git diff --cached --name-only
```

Filter for command/query files using the project's naming conventions (e.g., `*Command.*`, `*Query.*`). Path patterns for services and test projects come from `docs/project-config.json` → modules or backendServices. Extract service from path:
| Path pattern | Service | Test project |
|---|---|---|
Per service path pattern | {Service} | (or project equivalent) |
Search your codebase for existing `*.IntegrationTests.*` projects to find the correct mapping.
If no test project exists: inform user "No integration test project for {service}. See CLAUDE.md Integration Testing section to create one."
If test file already exists: ask user overwrite or skip.
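A hedged shell sketch of the filtering step, simulating the `git diff` output with `printf` (the paths are illustrative; adjust the extension pattern to your language):

```shell
# Hypothetical sketch: keep only command/query source files from the changed-file list.
printf '%s\n' \
  'src/Orders/CreateOrderCommand.cs' \
  'src/Orders/OrderDto.cs' \
  'src/Orders/GetOrdersQuery.cs' |
grep -E '(Command|Query)\.[A-Za-z]+$'
```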
From-Prompt Mode
User specifies command/query name. Use Grep tool (NOT bash grep):
Grep pattern="class {CommandName}" path="." glob="*.cs" (adapt path and extension to your project)
Step 2: Gather Context
For each target, read these files (in parallel):
- Command/query file — extract: class name, result type, DTO property, entity type
- Existing test files in same service — use Glob `{Service}.IntegrationTests/**/*IntegrationTests.*`, read 1+ for conventions (collection name, trait, namespace, usings, base class)
- Service integration test base class — grep: `class.*ServiceIntegrationTestBase`
- `references/integration-test-patterns.md` — canonical templates (adapt {Service} placeholders)
Step 2b: Look Up TC Codes
For each target domain, read the matching test spec:
- `docs/business-features/{App}/detailed-features/` Section 15 (primary source of truth)
- `docs/test-specs/{App}/README.md` (secondary reference)

Build a mapping: test case description → TC code (e.g., "create valid order" → TC-OM-001). If no TC exists, CREATE IT in the feature doc Section 15 before generating the test. If TC is outdated or incorrect, UPDATE IT first. This is NOT optional — the doc is the source of truth and must be correct before tests reference it. If no TC exists and feature doc Section 15 is missing, run `/tdd-spec` first to generate test specifications.
Step 3: Generate Test File
File path: `{project-test-dir}/{Service}.IntegrationTests/{Domain}/{CommandName}IntegrationTests{ext}` (adapt path and extension to your project's conventions — see `docs/project-config.json` → integrationTestVerify.testProjectPattern)

Folder = domain feature. `{Domain}` is the business domain (Orders, Inventory, Notifications, UserProfiles, etc.), NOT the CQRS type. Both command and query tests for the same domain live in the same folder.
Structure (C#/xUnit example — adapt namespace, collection/group attribute, category annotation, and base class to your framework):
```csharp
#region using
using FluentAssertions;
// ... service-specific usings (copy from existing tests)
#endregion

namespace {Service}.IntegrationTests.{Domain};

[Collection({Service}IntegrationTestCollection.Name)]
[Trait("Category", "Command")] // or "Query"
public class {CommandName}IntegrationTests : {Service}ServiceIntegrationTestBase
{
    // Minimum 3 tests: happy path, validation failure, DB state verification
}
```
Test method naming: `{CommandName}_When{Condition}_Should{Expectation}`
Required patterns per command type:
| Command type | Required tests |
|---|---|
| Save/Create | Happy path + validation failure + DB state |
| Update | Create-then-update + verify updated fields in DB |
| Delete | Create-then-delete + |
| Query | Filter returns results + pagination + empty result |
Step 4: Verify
Build the test project using your project's build tool (see `/integration-test-verify` for config-driven build and run).
Check:
- Test collection/group attribute present with correct collection name (framework-specific: `[Collection]`, `@Nested`, etc.)
- Test category annotation present (framework-specific: `[Trait("Category", ...)]`, `@Tag`, `@Category`, etc.)
- All string test data uses the project's unique name generator
- User context created via the project's user context factory
- DB assertions use the project's entity assertion helpers with async polling
- No mocks — real DI only
- Every test method has a `// TC-{FEATURE}-{NNN}: Description` comment + test-spec annotation (adapt to your framework)
Example Files to Study
Search your codebase for existing integration test files to use as reference (adapt file extension to your project):
```shell
# Find existing integration test files (adapt path and extension to your project)
find . -name "*IntegrationTests.*" -type f
find . -name "*IntegrationTestBase.*" -type f
find . -name "*IntegrationTestFixture.*" -type f
```
| Pattern | Shows |
|---|---|
| Create + update + validation |
| Query with create-then-query |
| Delete + cascade |
| Service base class pattern |
Related
| Skill | Relationship | When to Use |
|---|---|---|
| TC source — generates test specs this skill consumes | Run FIRST to create TCs in feature doc Section 15 before generating tests |
| Heavyweight planning — feeds test strategies | Use for complex test planning requiring deep investigation |
| Dashboard sync — keeps docs/test-specs/ in sync | Run AFTER generating tests to update the cross-module dashboard |
| Test runner — executes the generated tests | Run AFTER generating tests to verify they pass |
| Change review — reviews uncommitted changes | Run to review test files before committing |
How to Use for Each Case
Case: Generate tests from existing test specs (feature docs Section 15)
/integration-test CreateOrderCommand
→ Reads Section 15 TCs, generates test file with TC annotations
Case: Generate tests from git changes (default)
/integration-test
→ Detects changed command/query files, checks Section 15 for matching TCs, generates tests
Case: Generate tests after /tdd-spec created new TCs
/tdd-spec → /integration-test
→ tdd-spec writes TCs to Section 15, then integration-test generates tests from those TCs
Case: Review existing tests for quality
/integration-test review Orders
→ Audits test quality, finds flaky patterns, checks best practices
Case: Diagnose test failures
/integration-test diagnose OrderCommandIntegrationTests
→ Analyzes failures, determines test bug vs code bug
Case: Verify test-spec traceability
/integration-test verify {Service}
→ Checks test code matches specs and feature docs bidirectionally
REVIEW Mode — Test Quality Audit
When mode = REVIEW, audit existing integration tests for quality, flaky patterns, and best practices.
Review Workflow
- Find test files — Glob `{Service}.IntegrationTests/{Domain}/**/*IntegrationTests.*`
- Read each test file — analyze for quality issues
- Generate quality report — categorized findings with severity
Review Checklist
Flaky Test Detection (CRITICAL)
These patterns cause intermittent failures — flag as HIGH severity:
- Missing async polling — DB assertions after async event handlers without async polling/retry (e.g., `WaitUntilAsync()` or equivalent). Direct assertions on state changed by background threads WILL flake.
- Missing retry for eventual consistency — Any assertion that checks state modified by message bus consumers, event handlers, or background jobs without polling/retry wrapper
- Hardcoded delays — `Thread.Sleep()`, `Task.Delay()` instead of condition-based polling (`WaitUntil`, retry loops with timeout)
- Race conditions — Multiple tests modifying shared state without isolation (e.g., same entity ID, same user context)
- Non-unique test data — Hardcoded strings/IDs instead of unique generators (search your test utilities for a unique name helper, e.g., `IntegrationTestHelper.UniqueName()` or equivalent)
- Time-dependent assertions — Tests that depend on `DateTime.Now` without time abstraction
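An illustrative C# before/after for the highest-risk patterns above; `WaitUntilAsync`, `FindOrderAsync`, and `UniqueName` stand in for your project's actual polling, lookup, and unique-data helpers:

```csharp
// FLAKY: hardcoded delay races the event handler that persists the entity
await Task.Delay(2000);
(await FindOrderAsync(orderId)).Should().NotBeNull();

// STABLE: condition-based polling retries with a timeout until the state appears
await WaitUntilAsync(async () =>
    (await FindOrderAsync(orderId)).Should().NotBeNull());

// FLAKY: hardcoded name collides with data left by prior runs
var flaky = new Order { Name = "Test Order" };

// STABLE: unique per execution, so prior runs cannot interfere
var stable = new Order { Name = UniqueName("Test Order") };
```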
Best Practice Checks
- Collection/group attribute — All test classes have correct collection/group for shared fixture
- Category trait — `[Trait("Category", "Command")]` or equivalent categorization present
- Minimum test coverage — At least 3 tests per command: happy path, validation, DB state
- No mocks — Real DI only, no mock frameworks in integration tests
- Unique test data — All string data uses unique generators
- User context — Test user context via factory, not hardcoded
- DB assertions — Uses entity assertion helpers (not raw DB queries)
- Cleanup — Tests don't leave orphaned data that affects other tests
Code Quality Checks
- Method naming — Follows `{Action}_When{Condition}_Should{Expectation}` pattern
- Arrange-Act-Assert — Clear separation in test methods
- No logic in tests — No conditionals, loops, or complex setup in test methods
- Test independence — Each test can run in isolation
Review Report Format
```markdown
# Integration Test Quality Report — {Domain}

## Summary
- Tests scanned: {N}
- Issues found: {N} (HIGH: {n}, MEDIUM: {n}, LOW: {n})
- Overall quality: {GOOD|NEEDS_WORK|CRITICAL}

## HIGH Severity Issues (Flaky Risk)
| Test | Issue | Fix |
| --- | --- | --- |
| {MethodName} | DB assertion without polling after async handler | Wrap in project's async polling helper |

## MEDIUM Severity Issues (Best Practice)
| Test | Issue | Fix |
| --- | --- | --- |

## LOW Severity Issues (Style)
| Test | Issue | Fix |
| --- | --- | --- |

## Recommendations
1. {Prioritized fix suggestions}
```
DIAGNOSE Mode — Test Failure Root Cause Analysis
When mode = DIAGNOSE, analyze failing tests to determine whether the failure is a test bug or an application code bug.
Diagnose Workflow
- Identify failing tests — User provides test class name or run test suite to collect failures
- Read test code — Understand what the test expects
- Read application code — Trace the command/query handler path
- Compare expected vs actual — Determine root cause
- Classify — Test bug vs code bug vs infrastructure issue
- Report — Root cause + recommended fix
Root Cause Decision Tree
```
Test fails
├── Compilation error?
│   ├── Missing type/method → Code changed, test not updated → TEST BUG
│   └── Wrong import/namespace → TEST BUG
├── Timeout/hang?
│   ├── Missing async/await → TEST BUG
│   ├── Deadlock in handler → CODE BUG
│   └── Infrastructure down → INFRA ISSUE
├── Assertion failure?
│   ├── Expected value wrong?
│   │   ├── Test hardcoded old behavior → TEST BUG
│   │   └── Business logic changed → CODE BUG (if unintended) or TEST BUG (if intended change)
│   ├── Null/empty result?
│   │   ├── Entity not found → Check if create step succeeded → TEST BUG (setup) or CODE BUG (handler)
│   │   └── Query returns empty → Check filters/predicates → CODE BUG
│   ├── Intermittent (passes sometimes)?
│   │   ├── Async assertion without polling → TEST BUG (add async polling/retry)
│   │   ├── Non-unique test data collision → TEST BUG (use unique name generator)
│   │   └── Race condition in handler → CODE BUG
│   └── Wrong count/order?
│       ├── Test data leak from other tests → TEST BUG (isolation)
│       └── Logic error in query → CODE BUG
├── Validation error (expected success)?
│   ├── Test sends invalid data → TEST BUG
│   └── Validation rule too strict → CODE BUG
└── Exception thrown?
    ├── Known exception type in handler → CODE BUG
    └── DI/config error → INFRA ISSUE
```
Diagnose Report Format
```markdown
# Test Failure Diagnosis — {TestClass}

## Failing Tests
| Test Method | Error Type | Root Cause | Classification |
| --- | --- | --- | --- |
| {Method} | {AssertionFailed} | {Description} | TEST BUG / CODE BUG / INFRA |

## Detailed Analysis
### {MethodName}
**Error:** {error message}
**Expected:** {what test expected}
**Actual:** {what happened}
**Root Cause:** {explanation with code evidence}
**Classification:** TEST BUG | CODE BUG | INFRA ISSUE
**Evidence:** `{file}:{line}` — {what the code does}
**Recommended Fix:** {specific fix with code location}

## Summary
- Test bugs: {N} — fix in test code
- Code bugs: {N} — fix in application code
- Infra issues: {N} — fix in configuration/environment
```
VERIFY-TRACEABILITY Mode — Test ↔ Spec ↔ Feature Doc Verification
When mode = VERIFY, perform bidirectional traceability check between test code, test specifications, and feature documentation.
Verify Workflow
- Collect test methods — Grep for test spec annotations in test project
- Collect doc TCs — Read feature doc Section 15 for all TC entries
- Build 3-way matrix — Test code ↔ test-specs/ ↔ feature doc Section 15
- Identify mismatches — Orphans, stale references, behavior drift
- Classify mismatches — Which source is correct?
- Report — Traceability matrix + recommended fixes
Mismatch Classification
When test code and spec disagree, determine which is correct:
| Scenario | Likely Correct Source | Action |
|---|---|---|
| Test passes, spec describes different behavior | Test (reflects current code) | Update spec to match test |
| Test fails, spec describes expected behavior | Spec (test is stale) | Update test to match spec |
| Test exists, no spec | Test (spec was never written) | Create spec from test |
| Spec exists, no test | Spec (test was never written) | Generate test from spec |
| Test and spec agree, but code behaves differently | Spec (code has regression) | Fix code or update spec+test |
Verification Checklist
- Every test method has a matching TC in feature doc Section 15
- Every TC in Section 15 has a matching test method (or is marked `Status: Untested`)
- Evidence file paths in TCs point to current (not stale) code locations
- Test annotations match TC IDs (no typos, no orphaned IDs)
- Priority levels in docs match test categorization
- `docs/test-specs/` dashboard is in sync with feature doc Section 15
Verify Report Format
```markdown
# Traceability Report — {Service}

## Summary
- TCs in feature docs: {N}
- Test methods with TC annotations: {N}
- Fully traced (both directions): {N}
- Orphaned tests (no matching TC): {N}
- Orphaned TCs (no matching test): {N}
- Mismatched behavior: {N}

## Traceability Matrix
| TC ID | Feature Doc? | Test Code? | Dashboard? | Status |
| --- | --- | --- | --- | --- |
| TC-OM-001 | ✅ | ✅ | ✅ | Traced |
| TC-OM-005 | ✅ | ❌ | ✅ | Missing test |
| TC-OM-010 | ❌ | ✅ | ❌ | Missing spec |

## Orphaned Tests (no matching TC in docs)
| Test File | Method | Annotation | Action |
| --- | --- | --- | --- |
| {file} | {method} | TC-OM-010 | Create TC in feature doc |

## Orphaned TCs (no matching test)
| TC ID | Doc Location | Priority | Action |
| --- | --- | --- | --- |
| TC-OM-005 | Section 15 | P0 | Generate test via /integration-test |

## Behavior Mismatches
| TC ID | Doc Says | Test Does | Correct Source | Action |
| --- | --- | --- | --- | --- |

## Recommendations
1. {Prioritized actions}
```
## Test Data Setup Guidelines
| Pattern | When to Use | Example |
|---|---|---|
| Per-test inline | Simple tests, unique data | |
| Factory methods | Repeated entity creation | |
| Builder pattern | Complex entities with many fields | |
| Shared fixture | Reference data needed by all tests | |
Rules:
- Every test creates its own data — no shared mutable state between tests
- Use unique identifiers for ALL string data (search your test utilities for a unique name/data generator helper)
- Factory methods return valid entities by default — tests override only what they test
- Cross-entity dependencies: create parent first, then child (e.g., create User, then create Order for that User)
- Seed data: If the feature requires reference/lookup data, set up seed data in the collection fixture or per-test preconditions
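The factory and unique-identifier rules above can be sketched as follows (a minimal sketch — the `Order`/`User` entity names are hypothetical, and you should search your test utilities for the project's real unique-data generator rather than rolling your own):

```csharp
// Hypothetical factory: returns a valid entity by default;
// tests override only the fields they are exercising.
public static class TestDataFactory
{
    public static Order CreateValidOrder(Action<Order>? customize = null)
    {
        var order = new Order
        {
            // Unique identifier for ALL string data, so parallel tests
            // never collide on shared state.
            Code = $"TEST-ORDER-{Guid.NewGuid():N}",
            Status = OrderStatus.Draft,
        };
        customize?.Invoke(order);
        return order;
    }
}

// Cross-entity dependency: create the parent first, then the child.
// var user = TestDataFactory.CreateValidUser();
// var order = TestDataFactory.CreateValidOrder(o => o.UserId = user.Id);
```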
## Workflow Recommendation

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: If you are NOT already in a workflow, you MUST ATTENTION use `AskUserQuestion` to ask the user. Do NOT judge task complexity or decide this is "simple enough to skip" — the user decides whether to use a workflow, not you:

- Activate `test-to-integration` workflow (Recommended) — scout → integration-test → test
- Execute `/integration-test` directly — run this skill standalone
## Test Execution & Failure Diagnosis (MANDATORY)

IMPORTANT MUST ATTENTION: After generating/modifying integration tests, you MUST:

- Run tests: Use `/integration-test-verify` (reads `quickRunCommand` from `docs/project-config.json`)
- If tests fail: Diagnose root cause — is the failure because (a) test code has wrong setup/assertions → fix test code, or (b) actual service code has a bug → report as finding
- Never mark done until tests pass. Unrun tests have zero value.
- Iterate: Fix → rerun → verify until all tests pass or failures are confirmed as service bugs
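The fix → rerun → verify loop can be sketched as control flow (illustrative only — `RunTests` and `ReportToUser` are hypothetical stand-ins; the real test command comes from `quickRunCommand` in `docs/project-config.json`):

```csharp
// Illustrative control flow only — in practice the agent runs the
// project's quickRunCommand and inspects the real test output.
const int MaxAttempts = 3;
for (var attempt = 1; attempt <= MaxAttempts; attempt++)
{
    if (RunTests()) break; // all green — done

    // Classify the failure before touching anything:
    // (a) wrong test setup/assertions -> fix the test code, rerun
    // (b) actual service bug          -> stop, report as a finding
    if (attempt == MaxAttempts)
        ReportToUser("3 failed fix attempts — stopping for user input");
}
```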
## Next Steps

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: after completing this skill, you MUST ATTENTION use `AskUserQuestion` to present these options. Do NOT skip because the task seems "simple" or "obvious" — the user decides:

- "/integration-test-verify (Recommended)" — Run integration tests to verify they all pass
- "/workflow-review-changes" — Review all changes before committing
- "Skip, continue manually" — user decides
## Closing Reminders

MANDATORY IMPORTANT MUST ATTENTION break work into small todo tasks using `TaskCreate` BEFORE starting.
MANDATORY IMPORTANT MUST ATTENTION validate decisions with user via `AskUserQuestion` — never auto-decide.
MANDATORY IMPORTANT MUST ATTENTION add a final review todo task to verify work quality.
MANDATORY IMPORTANT MUST ATTENTION READ the following files before starting:

<!-- SYNC:understand-code-first:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION search 3+ existing patterns and read code BEFORE any modification. Run graph trace when graph.db exists. <!-- /SYNC:understand-code-first:reminder -->
<!-- SYNC:graph-impact-analysis:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION run `blast-radius` when graph.db exists. Flag impacted files NOT in changeset as potentially stale. <!-- /SYNC:graph-impact-analysis:reminder -->
<!-- SYNC:red-flag-stop-conditions:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION STOP after 3 failed fix attempts. Report all attempts, ask user before continuing. <!-- /SYNC:red-flag-stop-conditions:reminder -->
<!-- SYNC:rationalization-prevention:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION follow ALL steps regardless of perceived simplicity. "Too simple to plan" is an evasion, not a reason. <!-- /SYNC:rationalization-prevention:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION READ `references/integration-test-patterns.md` before starting
<!-- SYNC:critical-thinking-mindset:reminder -->
- MUST ATTENTION apply critical thinking — every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present guess as fact. <!-- /SYNC:critical-thinking-mindset:reminder -->
<!-- SYNC:ai-mistake-prevention:reminder -->
- MUST ATTENTION apply AI mistake prevention — holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction. <!-- /SYNC:ai-mistake-prevention:reminder -->