EasyPlatform integration-test
[Testing] Generate or review integration tests. Modes: generate (from git changes or prompt), review (quality audit of existing tests), diagnose (analyze test failures). Subcutaneous tests with real DI, no mocks.
```shell
# Clone the full repo:
git clone https://github.com/duc01226/EasyPlatform

# Or copy only this skill into your global skills directory:
T=$(mktemp -d) && git clone --depth=1 https://github.com/duc01226/EasyPlatform "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/integration-test" ~/.claude/skills/duc01226-easyplatform-integration-test && rm -rf "$T"
```
`.claude/skills/integration-test/SKILL.md`

<!-- SYNC:critical-thinking-mindset -->[IMPORTANT] Use `TaskCreate` to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.
<!-- /SYNC:critical-thinking-mindset --> <!-- SYNC:ai-mistake-prevention -->Critical Thinking Mindset — Apply critical thinking and sequential thinking. Every claim needs traced proof and >80% confidence to act. Anti-hallucination: Never present a guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence — certainty without evidence is the root of all hallucination.
<!-- /SYNC:ai-mistake-prevention -->AI Mistake Prevention — Failure modes to avoid on every task:
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
- Trace full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips — not just happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether bug is in caller (wrong data) or callee (wrong handling). Fix at responsible layer — never patch symptom site.
- Assume existing values are intentional — ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
- Holistic-first debugging — resist nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes — apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
- Surface ambiguity before coding — don't pick silently. If request has multiple interpretations, present each with effort estimate and ask. Never assume all-records, file-based, or more complex path.
Prerequisites: MUST ATTENTION READ before executing:
<!-- SYNC:understand-code-first --><!-- /SYNC:understand-code-first -->Understand Code First — HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
- Search 3+ similar patterns (`glob`/`grep`) — cite `file:line` evidence
- Read existing files in target area — understand structure, base classes, conventions
- Run `python .claude/scripts/code_graph trace <file> --direction both --json` when `.code-graph/graph.db` exists
- Map dependencies via `connections` or `callers_of` — know what depends on your target
- Write investigation to `ai/workspace/analysis/` for non-trivial tasks (3+ files)
- Re-read analysis file before implementing — never work from memory alone
- NEVER invent new patterns when existing ones work — match exactly or document deviation

BLOCKED until:
- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence
- `references/integration-test-patterns.md`
- `docs/project-reference/domain-entities-reference.md` — Domain entity catalog, relationships, cross-service sync (read when task involves business entities/models) (content auto-injected by hook — check for [Injected: ...] header before reading)
- `docs/test-specs/` — Test specifications by module (read existing TCs for expected behavior; verify test-to-spec traceability)
<!-- /SYNC:graph-impact-analysis -->Graph Impact Analysis — When `.code-graph/graph.db` exists, run `blast-radius --json` to detect ALL files affected by changes (7 edge types: CALLS, MESSAGE_BUS, API_ENDPOINT, TRIGGERS_EVENT, PRODUCES_EVENT, TRIGGERS_COMMAND_EVENT, INHERITS). Compute gap: impacted_files - changed_files = potentially stale files. Risk: <5 Low, 5-20 Medium, >20 High. Use `trace --direction downstream` for deep chains on high-impact files.
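The gap computation above can be sketched in shell, assuming sorted file lists from the blast-radius output and `git diff` (simulated here with `printf`; file names are illustrative):

```shell
# Hypothetical sketch: compute impacted_files - changed_files = potentially stale files.
printf 'OrderHandler.cs\nOrderDto.cs\nOrderEvents.cs\n' | sort > /tmp/impacted.txt
printf 'OrderHandler.cs\n' | sort > /tmp/changed.txt
# Lines only in impacted.txt are impacted but unchanged, i.e. potentially stale
comm -23 /tmp/impacted.txt /tmp/changed.txt
```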
CRITICAL: Search existing patterns FIRST. Before generating ANY test, grep for existing integration test files in the same service. Read at least 1 existing test file to match conventions (namespace, usings, collection name, base class, helper usage). Never generate tests that contradict established patterns in the codebase.
CRITICAL: NO Smoke/Fake/Useless Tests. Every test MUST execute actual commands/handlers and verify data state in the database — like a QC tester testing the real system. NO DI-resolution-only tests (`GetRequiredService` + `NotBeNull`). NO exception-check-only tests (`exception.Should().BeNull()` alone). Before writing assertions: READ the handler/entity/event source to understand WHAT fields change, WHAT entities are created/updated/deleted, WHAT event handlers fire. Assert specific field values in the database.
<!-- SYNC:repeatable-test-principle -->CRITICAL: Async Polling for ALL Data Assertions. ALWAYS wrap data state assertions in the project's async polling/retry helper. This is the DEFAULT for ALL data verification — not just "async event handlers". Data persistence may be delayed by entity event handlers, message bus consumers, background jobs, or DB write latency. Async polling retries with timeout and is always safe. Rule: If you assert data in the database, use async polling. No exceptions.
<!-- /SYNC:repeatable-test-principle -->Infinitely Repeatable Tests — Tests MUST run N times without failure. Like manual QC — run the suite 100 times, each run just adds more data.
- Unique data per run: Use the project's unique ID generator for ALL entity IDs created in tests. NEVER hardcode IDs.
- Additive only: Tests create data, never delete/reset. Prior test runs MUST NOT interfere with current run.
- No schema rollback dependency: Tests work with current schema only. Never rely on schema rollback or migration reversals.
- Idempotent seeders: Fixture-level seeders use create-if-missing pattern (check existence before insert). Test-level data uses unique IDs per execution.
- No cleanup required: No teardown, no database reset between runs. Each test is isolated by unique seed data, not by cleanup.
- Unique names/codes: When entities require unique names/codes, append a unique suffix using the project's ID generator.
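A minimal C# sketch of the repeatable-data rules above. `UniqueIds.New()`, `IRepository<T>`, and the entity shapes are hypothetical placeholders; substitute your project's actual ID generator and repository API:

```csharp
// Illustrative only: substitute your project's real helpers.
public static class RepeatableTestData
{
    // Unique data per run: never hardcode IDs, names, or codes.
    public static Order NewOrder() => new Order
    {
        Id = UniqueIds.New(),               // hypothetical unique ID generator
        Code = $"ORD-{UniqueIds.New()}"     // unique suffix for unique-constrained fields
    };

    // Idempotent fixture seeder: create-if-missing, never delete or reset.
    public static async Task SeedDefaultCountryAsync(IRepository<Country> countries)
    {
        if (!await countries.AnyAsync(c => c.Code == "US"))
            await countries.CreateAsync(new Country { Code = "US", Name = "United States" });
    }
}
```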
For test specifications and test case generation from PBIs, use the `/tdd-spec` skill (preferred) or the `/test-spec` skill instead.
External Memory: For complex or lengthy work (research, analysis, scan, review), write intermediate findings and final results to a report file in `plans/reports/` — prevents context loss and serves as deliverable.
<!-- SYNC:red-flag-stop-conditions -->Evidence Gate: MANDATORY IMPORTANT MUST ATTENTION — every claim, finding, and recommendation requires `file:line` proof or traced evidence with a confidence percentage (>80% to act, <80% must verify first).
<!-- /SYNC:red-flag-stop-conditions --> <!-- SYNC:rationalization-prevention -->Red Flag Stop Conditions — STOP and escalate to user via AskUserQuestion when:
- Confidence drops below 60% on any critical decision
- Changes would affect >20 files (blast radius too large)
- Cross-service boundary is being crossed
- Security-sensitive code (auth, crypto, PII handling)
- Breaking change detected (interface, API contract, DB schema)
- Test coverage would decrease after changes
- Approach requires technology/pattern not in the project
NEVER proceed past a red flag without explicit user approval.
<!-- /SYNC:rationalization-prevention --> <!-- SYNC:incremental-persistence -->Rationalization Prevention — AI skips steps via these evasions. Recognize and reject:
| Evasion | Rebuttal |
|---|---|
| "Too simple for a plan" | Simple + wrong assumptions = wasted time. Plan anyway. |
| "I'll test after" | RED before GREEN. Write/verify test first. |
| "Already searched" | Show grep evidence with `file:line`. No proof = no search. |
| "Just do it" | Still need TaskCreate. Skip depth, never skip tracking. |
| "Just a small fix" | Small fix in wrong location cascades. Verify file:line first. |
| "Code is self-explanatory" | Future readers need evidence trail. Document anyway. |
| "Combine steps to save time" | Combined steps dilute focus. Each step has distinct purpose. |
<!-- /SYNC:incremental-persistence --> <!-- SYNC:subagent-return-contract -->Incremental Result Persistence — MANDATORY for all sub-agents or heavy inline steps processing >3 files.
- Before starting: Create report file `plans/reports/{skill}-{date}-{slug}.md`
- After each file/section reviewed: Append findings to report immediately — never hold in memory
- Return to main agent: Summary only (per SYNC:subagent-return-contract) with `Full report:` path
- Main agent: Reads report file only when resolving specific blockers
Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction. Partial results are better than no results.
Report naming: `plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md`
<!-- /SYNC:subagent-return-contract -->Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.
```
## Sub-Agent Result: [skill-name]
Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%

### Findings (Critical/High only — max 10 bullets)
- [severity] [file:line] [finding]

### Actions Taken
- [file changed] [what changed]

### Blockers (if any)
- [blocker description]

Full report: plans/reports/[skill-name]-[date]-[slug].md
```

Main agent reads the `Full report` file ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.
Quick Summary
Goal: Generate integration test files for commands/queries using real DI (no mocks).
Project Pattern Discovery
Before implementation, search your codebase for project-specific patterns:
- Search for: `IntegrationTest`, `TestFixture`, `TestUserContext`, `IntegrationTestBase`
- Look for: existing test projects, test collection definitions, service-specific test base classes

MANDATORY IMPORTANT MUST ATTENTION Plan ToDo Task to READ `integration-test-reference.md` for project-specific patterns and code examples. If file not found, continue with search-based discovery above.
Five modes: (1) From git changes (default) — detects uncommitted command/query files and generates matching tests. (2) From prompt — user specifies what to test. (3) Review — audit existing tests for quality, best practices, and flaky patterns. (4) Diagnose — analyze test failures to determine root cause (test bug vs code bug). (5) Verify-traceability — check test code matches test specs and feature docs.
Workflow:
- Detect mode — See Mode Detection section below
- Find targets — Identify test/command/query files
- Gather context — Read relevant files for the detected mode
- Execute — Generate, review, diagnose, or verify depending on mode
- Report — Build check (generate), quality report (review), root cause (diagnose)
Key Rules:
- MUST ATTENTION search for existing test patterns in the same service BEFORE generating
- MUST ATTENTION READ `references/integration-test-patterns.md` before writing any test
- Organize by domain feature, NOT by type — command and query tests for the same domain go in the same folder (e.g., `Orders/OrderCommandIntegrationTests.*` + `Orders/OrderQueryIntegrationTests.*`). NEVER create a `Queries/` or `Commands/` folder.
- Use the project's unique name generator for ALL string test data (search test utilities for unique name helpers)
- Use the project's entity assertion helpers for DB verification with built-in async polling (search test base classes for `AssertEntity*` or equivalent patterns)
- CRITICAL MUST ATTENTION ENSURE: ALWAYS wrap ALL data state assertions in the project's async polling/retry helper. This is the DEFAULT — not just for "async" handlers. Data may be delayed by entity event handlers, message bus consumers, or background jobs. Async polling retries with timeout and is always safe. If you assert data in DB → use async polling. No exceptions.
- CRITICAL MUST ATTENTION ENSURE: Before writing assertions, READ the handler/entity/event source code. Understand WHAT fields change, WHAT entities are created/updated/deleted, WHAT event handlers fire. Assert specific field values, not just non-null. Smoke-only is FORBIDDEN unless the side effect is truly unobservable.
- Minimum 3 test methods: happy path, validation failure, DB state check
- Authorization tests: Include tests with multiple user contexts (use the project's user context factory) — verify authorized access succeeds AND unauthorized access is rejected
- Every test method MUST ATTENTION have a `// TC-{FEATURE}-{NNN}: Description` comment AND a test-spec annotation (e.g., `[Trait("TestSpec", "TC-{FEATURE}-{NNN}")]` in xUnit — adapt to your framework) — placed before the test method, outside the method body
- If no TC exists in feature docs, auto-create it in Section 15 before generating the test
- For comprehensive test spec generation before coding, use `/tdd-spec` first
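Putting the key rules together, a hedged C#/xUnit sketch of one generated test. `UniqueName`, `ExecuteCommandAsync`, `WaitUntilAsync`, and `FindOrderAsync` are stand-ins for whatever your test base class actually provides:

```csharp
// TC-OM-001: Create valid order — happy path
[Trait("TestSpec", "TC-OM-001")]
[Fact]
public async Task CreateOrder_WhenValidData_ShouldCreateSuccessfully()
{
    // Arrange: unique data per run, real DI (no mocks)
    var command = new CreateOrderCommand { CustomerName = UniqueName("customer") };

    // Act: execute the actual handler through the real pipeline
    var result = await ExecuteCommandAsync(command);

    // Assert: async polling on DB state; specific field values, not just non-null
    await WaitUntilAsync(async () =>
    {
        var order = await FindOrderAsync(result.OrderId);
        order.Should().NotBeNull();
        order!.CustomerName.Should().Be(command.CustomerName);
        order.Status.Should().Be(OrderStatus.Created);
    });
}
```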
Be skeptical. Apply critical thinking and sequential thinking. Every claim needs traced proof and a confidence percentage (act only above 80%).
Mandatory Task Ordering (MUST ATTENTION FOLLOW)
When generating integration tests, ALWAYS create and execute tasks in this exact order:
- FIRST task: Verify/upsert test specs in feature docs
  - Read feature doc Section 15 (`docs/business-features/{App}/detailed-features/`) for the target domain
  - Read test-specs doc (`docs/test-specs/{App}/README.md`) if it exists
  - For each test case to generate: verify a matching `TC-{FEATURE}-{NNN}` exists in docs
  - If TC is MISSING: create the TC entry in Section 15 with Priority, Status, GIVEN/WHEN/THEN, Evidence
  - If TC is INCORRECT: update it to reflect current command/query behavior
  - Output: a TC mapping list (TC code → test method name) for subsequent tasks
- MIDDLE tasks: Implement integration tests
  - Generate test files using the TC mapping from task 1
  - Each test method gets a TC annotation before it (outside method body) — adapt to your test framework (C#/xUnit example):

    ```csharp
    // TC-OM-001: Create valid order — happy path
    [Trait("TestSpec", "TC-OM-001")]
    [Fact]
    public async Task CreateOrder_WhenValidData_ShouldCreateSuccessfully()
    ```
  - Follow all existing patterns from the project's test base classes (collection/group, category, unique data helpers, assertion helpers, etc.)
- FINAL task: Verify bidirectional traceability
  - Grep for test-spec annotations in the test project (e.g., `[Trait("TestSpec", ...)]` in xUnit, `@Tag` in JUnit — adapt to your framework)
  - Grep all `TC-{FEATURE}-{NNN}` in feature doc Section 15 / test-specs doc
  - Verify every test method links to a doc TC, and every doc TC links back to a test
  - Flag orphans: tests without doc TCs, or doc TCs without matching tests
  - Update the `IntegrationTest` field in feature doc TCs with `{File}::{MethodName}`
Module Abbreviation Registry
| Module | Abbreviation | Test Folder |
|---|---|---|
| Order Management | OM | |
| Inventory | INV | |
| User Profiles | UP | |
| Notification Management | NM | |
| Report Generation | RG | |
| Feedback | FB | |
| Background Jobs | BJ | — |
TC Code Numbering Rules
When creating new `TC-{FEATURE}-{NNN}` codes:
- Always check the feature doc first — `docs/business-features/{App}/detailed-features/` contains existing TC codes. New codes must not collide.
- Existing docs use decade-based grouping — e.g., OM: 001-004 (CRUD), 011-013 (validation), 021-023 (permissions), 031-033 (events). Find the next free decade.
- If a collision is unavoidable — renumber in the doc side only (e.g., TC-OM-031 → TC-OM-034). Keep the test-spec annotation in the test file unchanged and add a renumbering note in the doc.
- Feature doc is the canonical registry — the test-spec annotation in test files is for traceability, not the source of truth for numbering.
Integration Test Generation
Mode Detection
- Args = command/query name (e.g., `/integration-test CreateOrderCommand`) → FROM-PROMPT mode: generate tests for the specified command/query
- No args (e.g., `/integration-test`) → FROM-CHANGES mode: detect changed command/query files from git
- Args = "review" (e.g., `/integration-test review Orders`) → REVIEW mode: audit existing test quality, find flaky patterns, check best practices
- Args = "diagnose" (e.g., `/integration-test diagnose OrderCommandIntegrationTests`) → DIAGNOSE mode: analyze why tests fail — determine test bug vs code bug
- Args = "verify" (e.g., `/integration-test verify {Service}`) → VERIFY-TRACEABILITY mode: check test code matches specs and feature docs
Step 1: Find Targets
From-Changes Mode (default)
Run via Bash tool:

```shell
git diff --name-only; git diff --cached --name-only
```

Filter for command/query files using the project's naming conventions (e.g., `*Command.*`, `*Query.*`). Path patterns for services and test projects come from `docs/project-config.json` → modules or backendServices. Extract service from path:
| Path pattern | Service | Test project |
|---|---|---|
Per service path pattern | {Service} | (or project equivalent) |
Search your codebase for existing `*.IntegrationTests.*` projects to find the correct mapping.
If no test project exists: inform user "No integration test project for {service}. See CLAUDE.md Integration Testing section to create one."
If test file already exists: ask user overwrite or skip.
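A hedged shell sketch of the filtering step, simulating the `git diff` output with `printf` (the paths are illustrative; adjust the extension pattern to your language):

```shell
# Hypothetical sketch: keep only command/query source files from the changed-file list.
printf '%s\n' \
  'src/Orders/CreateOrderCommand.cs' \
  'src/Orders/OrderDto.cs' \
  'src/Orders/GetOrdersQuery.cs' |
grep -E '(Command|Query)\.[A-Za-z]+$'
```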
From-Prompt Mode
User specifies command/query name. Use Grep tool (NOT bash grep):
Grep pattern="class {CommandName}" path="." glob="*.cs" (adapt path and extension to your project)
Step 2: Gather Context
For each target, read these files (in parallel):
- Command/query file — extract: class name, result type, DTO property, entity type
- Existing test files in same service — use Glob `{Service}.IntegrationTests/**/*IntegrationTests.*`, read 1+ for conventions (collection name, trait, namespace, usings, base class)
- Service integration test base class — grep: `class.*ServiceIntegrationTestBase`
- `references/integration-test-patterns.md` — canonical templates (adapt {Service} placeholders)
Step 2b: Look Up TC Codes
For each target domain, read the matching test spec:
- `docs/business-features/{App}/detailed-features/` Section 15 (primary source of truth)
- `docs/test-specs/{App}/README.md` (secondary reference)

Build a mapping: test case description → TC code (e.g., "create valid order" → TC-OM-001). If no TC exists, CREATE IT in the feature doc Section 15 before generating the test. If TC is outdated or incorrect, UPDATE IT first. This is NOT optional — the doc is the source of truth and must be correct before tests reference it. If no TC exists and feature doc Section 15 is missing, run `/tdd-spec` first to generate test specifications.
Step 3: Generate Test File
File path: `{project-test-dir}/{Service}.IntegrationTests/{Domain}/{CommandName}IntegrationTests{ext}` (adapt path and extension to your project's conventions — see `docs/project-config.json` → integrationTestVerify.testProjectPattern)

Folder = domain feature. `{Domain}` is the business domain (Orders, Inventory, Notifications, UserProfiles, etc.), NOT the CQRS type. Both command and query tests for the same domain live in the same folder.
Structure (C#/xUnit example — adapt namespace, collection/group attribute, category annotation, and base class to your framework):
```csharp
#region using
using FluentAssertions;
// ... service-specific usings (copy from existing tests)
#endregion

namespace {Service}.IntegrationTests.{Domain};

[Collection({Service}IntegrationTestCollection.Name)]
[Trait("Category", "Command")] // or "Query"
public class {CommandName}IntegrationTests : {Service}ServiceIntegrationTestBase
{
    // Minimum 3 tests: happy path, validation failure, DB state verification
}
```
Test method naming: `{CommandName}_When{Condition}_Should{Expectation}`
Required patterns per command type:
| Command type | Required tests |
|---|---|
| Save/Create | Happy path + validation failure + DB state |
| Update | Create-then-update + verify updated fields in DB |
| Delete | Create-then-delete + |
| Query | Filter returns results + pagination + empty result |
Step 4: Verify
Build the test project using your project's build tool (see `/integration-test-verify` for config-driven build and run).
Check:
- Test collection/group attribute present with correct collection name (framework-specific: `[Collection]`, `@Nested`, etc.)
- Test category annotation present (framework-specific: `[Trait("Category", ...)]`, `@Tag`, `@Category`, etc.)
- All string test data uses the project's unique name generator
- User context created via the project's user context factory
- DB assertions use the project's entity assertion helpers with async polling
- No mocks — real DI only
- Every test method has a `// TC-{FEATURE}-{NNN}: Description` comment + test-spec annotation (adapt to your framework)
Example Files to Study
Search your codebase for existing integration test files to use as reference (adapt file extension to your project):
```shell
# Find existing integration test files (adapt path and extension to your project)
find . -name "*IntegrationTests.*" -type f
find . -name "*IntegrationTestBase.*" -type f
find . -name "*IntegrationTestFixture.*" -type f
```
| Pattern | Shows |
|---|---|
| Create + update + validation |
| Query with create-then-query |
| Delete + cascade |
| Service base class pattern |
Related
| Skill | Relationship | When to Use |
|---|---|---|
| TC source — generates test specs this skill consumes | Run FIRST to create TCs in feature doc Section 15 before generating tests |
| Heavyweight planning — feeds test strategies | Use for complex test planning requiring deep investigation |
| Dashboard sync — keeps docs/test-specs/ in sync | Run AFTER generating tests to update the cross-module dashboard |
| Test runner — executes the generated tests | Run AFTER generating tests to verify they pass |
| Change review — reviews uncommitted changes | Run to review test files before committing |
How to Use for Each Case
Case: Generate tests from existing test specs (feature docs Section 15)
/integration-test CreateOrderCommand
→ Reads Section 15 TCs, generates test file with TC annotations
Case: Generate tests from git changes (default)
/integration-test
→ Detects changed command/query files, checks Section 15 for matching TCs, generates tests
Case: Generate tests after /tdd-spec created new TCs
/tdd-spec → /integration-test
→ tdd-spec writes TCs to Section 15, then integration-test generates tests from those TCs
Case: Review existing tests for quality
/integration-test review Orders
→ Audits test quality, finds flaky patterns, checks best practices
Case: Diagnose test failures
/integration-test diagnose OrderCommandIntegrationTests
→ Analyzes failures, determines test bug vs code bug
Case: Verify test-spec traceability
/integration-test verify {Service}
→ Checks test code matches specs and feature docs bidirectionally
REVIEW Mode — Test Quality Audit
When mode = REVIEW, audit existing integration tests for quality, flaky patterns, and best practices.
Review Workflow
- Find test files — Glob `{Service}.IntegrationTests/{Domain}/**/*IntegrationTests.*`
- Read each test file — analyze for quality issues
- Generate quality report — categorized findings with severity
Review Checklist
Flaky Test Detection (CRITICAL)
These patterns cause intermittent failures — flag as HIGH severity:
- Missing async polling — DB assertions after async event handlers without async polling/retry (e.g., `WaitUntilAsync()` or equivalent). Direct assertions on state changed by background threads WILL flake.
- Missing retry for eventual consistency — Any assertion that checks state modified by message bus consumers, event handlers, or background jobs without polling/retry wrapper
- Hardcoded delays — `Thread.Sleep()`, `Task.Delay()` instead of condition-based polling (`WaitUntil`, retry loops with timeout)
- Race conditions — Multiple tests modifying shared state without isolation (e.g., same entity ID, same user context)
- Non-unique test data — Hardcoded strings/IDs instead of unique generators (search your test utilities for a unique name helper, e.g., `IntegrationTestHelper.UniqueName()` or equivalent)
- Time-dependent assertions — Tests that depend on `DateTime.Now` without time abstraction
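An illustrative C# before/after for the highest-risk patterns above; `WaitUntilAsync`, `FindOrderAsync`, and `UniqueName` stand in for your project's actual polling, lookup, and unique-data helpers:

```csharp
// FLAKY: hardcoded delay races the event handler that persists the entity
await Task.Delay(2000);
(await FindOrderAsync(orderId)).Should().NotBeNull();

// STABLE: condition-based polling retries with a timeout until the state appears
await WaitUntilAsync(async () =>
    (await FindOrderAsync(orderId)).Should().NotBeNull());

// FLAKY: hardcoded name collides with data left by prior runs
var flaky = new Order { Name = "Test Order" };

// STABLE: unique per execution, so prior runs cannot interfere
var stable = new Order { Name = UniqueName("Test Order") };
```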
Best Practice Checks
- Collection/group attribute — All test classes have correct collection/group for shared fixture
- Category trait — `[Trait("Category", "Command")]` or equivalent categorization present
- Minimum test coverage — At least 3 tests per command: happy path, validation, DB state
- No mocks — Real DI only, no mock frameworks in integration tests
- Unique test data — All string data uses unique generators
- User context — Test user context via factory, not hardcoded
- DB assertions — Uses entity assertion helpers (not raw DB queries)
- Cleanup — Tests don't leave orphaned data that affects other tests
Code Quality Checks
- Method naming — Follows `{Action}_When{Condition}_Should{Expectation}` pattern
- Arrange-Act-Assert — Clear separation in test methods
- No logic in tests — No conditionals, loops, or complex setup in test methods
- Test independence — Each test can run in isolation
Review Report Format
```markdown
# Integration Test Quality Report — {Domain}

## Summary
- Tests scanned: {N}
- Issues found: {N} (HIGH: {n}, MEDIUM: {n}, LOW: {n})
- Overall quality: {GOOD|NEEDS_WORK|CRITICAL}

## HIGH Severity Issues (Flaky Risk)
| Test | Issue | Fix |
| --- | --- | --- |
| {MethodName} | DB assertion without polling after async handler | Wrap in project's async polling helper |

## MEDIUM Severity Issues (Best Practice)
| Test | Issue | Fix |
| --- | --- | --- |

## LOW Severity Issues (Style)
| Test | Issue | Fix |
| --- | --- | --- |

## Recommendations
1. {Prioritized fix suggestions}
```
DIAGNOSE Mode — Test Failure Root Cause Analysis
When mode = DIAGNOSE, analyze failing tests to determine whether the failure is a test bug or an application code bug.
Diagnose Workflow
- Identify failing tests — User provides test class name or run test suite to collect failures
- Read test code — Understand what the test expects
- Read application code — Trace the command/query handler path
- Compare expected vs actual — Determine root cause
- Classify — Test bug vs code bug vs infrastructure issue
- Report — Root cause + recommended fix
Root Cause Decision Tree
```
Test fails
├── Compilation error?
│   ├── Missing type/method → Code changed, test not updated → TEST BUG
│   └── Wrong import/namespace → TEST BUG
├── Timeout/hang?
│   ├── Missing async/await → TEST BUG
│   ├── Deadlock in handler → CODE BUG
│   └── Infrastructure down → INFRA ISSUE
├── Assertion failure?
│   ├── Expected value wrong?
│   │   ├── Test hardcoded old behavior → TEST BUG
│   │   └── Business logic changed → CODE BUG (if unintended) or TEST BUG (if intended change)
│   ├── Null/empty result?
│   │   ├── Entity not found → Check if create step succeeded → TEST BUG (setup) or CODE BUG (handler)
│   │   └── Query returns empty → Check filters/predicates → CODE BUG
│   ├── Intermittent (passes sometimes)?
│   │   ├── Async assertion without polling → TEST BUG (add async polling/retry)
│   │   ├── Non-unique test data collision → TEST BUG (use unique name generator)
│   │   └── Race condition in handler → CODE BUG
│   └── Wrong count/order?
│       ├── Test data leak from other tests → TEST BUG (isolation)
│       └── Logic error in query → CODE BUG
├── Validation error (expected success)?
│   ├── Test sends invalid data → TEST BUG
│   └── Validation rule too strict → CODE BUG
└── Exception thrown?
    ├── Known exception type in handler → CODE BUG
    └── DI/config error → INFRA ISSUE
```
Diagnose Report Format
```markdown
# Test Failure Diagnosis — {TestClass}

## Failing Tests
| Test Method | Error Type | Root Cause | Classification |
| --- | --- | --- | --- |
| {Method} | {AssertionFailed} | {Description} | TEST BUG / CODE BUG / INFRA |

## Detailed Analysis
### {MethodName}
**Error:** {error message}
**Expected:** {what test expected}
**Actual:** {what happened}
**Root Cause:** {explanation with code evidence}
**Classification:** TEST BUG | CODE BUG | INFRA ISSUE
**Evidence:** `{file}:{line}` — {what the code does}
**Recommended Fix:** {specific fix with code location}

## Summary
- Test bugs: {N} — fix in test code
- Code bugs: {N} — fix in application code
- Infra issues: {N} — fix in configuration/environment
```
VERIFY-TRACEABILITY Mode — Test ↔ Spec ↔ Feature Doc Verification
When mode = VERIFY, perform bidirectional traceability check between test code, test specifications, and feature documentation.
Verify Workflow
- Collect test methods — Grep for test spec annotations in test project
- Collect doc TCs — Read feature doc Section 15 for all TC entries
- Build 3-way matrix — Test code ↔ test-specs/ ↔ feature doc Section 15
- Identify mismatches — Orphans, stale references, behavior drift
- Classify mismatches — Which source is correct?
- Report — Traceability matrix + recommended fixes
Mismatch Classification
When test code and spec disagree, determine which is correct:
| Scenario | Likely Correct Source | Action |
|---|---|---|
| Test passes, spec describes different behavior | Test (reflects current code) | Update spec to match test |
| Test fails, spec describes expected behavior | Spec (test is stale) | Update test to match spec |
| Test exists, no spec | Test (spec was never written) | Create spec from test |
| Spec exists, no test | Spec (test was never written) | Generate test from spec |
| Test and spec agree, but code behaves differently | Spec (code has regression) | Fix code or update spec+test |
Verification Checklist
- Every test method has a matching TC in feature doc Section 15
- Every TC in Section 15 has a matching test method (or is marked `Status: Untested`)
- Evidence file paths in TCs point to current (not stale) code locations
- Test annotations match TC IDs (no typos, no orphaned IDs)
- Priority levels in docs match test categorization
- `docs/test-specs/` dashboard is in sync with feature doc Section 15
Verify Report Format
```markdown
# Traceability Report — {Service}

## Summary
- TCs in feature docs: {N}
- Test methods with TC annotations: {N}
- Fully traced (both directions): {N}
- Orphaned tests (no matching TC): {N}
- Orphaned TCs (no matching test): {N}
- Mismatched behavior: {N}

## Traceability Matrix
| TC ID | Feature Doc? | Test Code? | Dashboard? | Status |
| --- | --- | --- | --- | --- |
| TC-OM-001 | ✅ | ✅ | ✅ | Traced |
| TC-OM-005 | ✅ | ❌ | ✅ | Missing test |
| TC-OM-010 | ❌ | ✅ | ❌ | Missing spec |

## Orphaned Tests (no matching TC in docs)
| Test File | Method | Annotation | Action |
| --- | --- | --- | --- |
| {file} | {method} | TC-OM-010 | Create TC in feature doc |

## Orphaned TCs (no matching test)
| TC ID | Doc Location | Priority | Action |
| --- | --- | --- | --- |
| TC-OM-005 | Section 15 | P0 | Generate test via /integration-test |

## Behavior Mismatches
| TC ID | Doc Says | Test Does | Correct Source | Action |
| --- | --- | --- | --- | --- |

## Recommendations
1. {Prioritized actions}
```
## Test Data Setup Guidelines
| Pattern | When to Use | Example |
|---|---|---|
| Per-test inline | Simple tests, unique data | |
| Factory methods | Repeated entity creation | |
| Builder pattern | Complex entities with many fields | |
| Shared fixture | Reference data needed by all tests | |
Rules:
- Every test creates its own data — no shared mutable state between tests
- Use unique identifiers for ALL string data (search your test utilities for a unique name/data generator helper)
- Factory methods return valid entities by default — tests override only what they test
- Cross-entity dependencies: create parent first, then child (e.g., create User, then create Order for that User)
- Seed data: If the feature requires reference/lookup data, set up seed data in the collection fixture or per-test preconditions
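The factory and unique-identifier rules above can be sketched as follows (a minimal sketch — the `Order`/`User` entity names are hypothetical, and you should search your test utilities for the project's real unique-data generator rather than rolling your own):

```csharp
// Hypothetical factory: returns a valid entity by default;
// tests override only the fields they are exercising.
public static class TestDataFactory
{
    public static Order CreateValidOrder(Action<Order>? customize = null)
    {
        var order = new Order
        {
            // Unique identifier for ALL string data, so parallel tests
            // never collide on shared state.
            Code = $"TEST-ORDER-{Guid.NewGuid():N}",
            Status = OrderStatus.Draft,
        };
        customize?.Invoke(order);
        return order;
    }
}

// Cross-entity dependency: create the parent first, then the child.
// var user = TestDataFactory.CreateValidUser();
// var order = TestDataFactory.CreateValidOrder(o => o.UserId = user.Id);
```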
## Workflow Recommendation

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: If you are NOT already in a workflow, you MUST ATTENTION use `AskUserQuestion` to ask the user. Do NOT judge task complexity or decide this is "simple enough to skip" — the user decides whether to use a workflow, not you:

- Activate `test-to-integration` workflow (Recommended) — scout → integration-test → test
- Execute `/integration-test` directly — run this skill standalone
## Test Execution & Failure Diagnosis (MANDATORY)

IMPORTANT MUST ATTENTION: After generating/modifying integration tests, you MUST:

- Run tests: Use `/integration-test-verify` (reads `quickRunCommand` from `docs/project-config.json`)
- If tests fail: Diagnose root cause — is the failure because (a) test code has wrong setup/assertions → fix test code, or (b) actual service code has a bug → report as finding
- Never mark done until tests pass. Unrun tests have zero value.
- Iterate: Fix → rerun → verify until all tests pass or failures are confirmed as service bugs
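The fix → rerun → verify loop can be sketched as control flow (illustrative only — `RunTests` and `ReportToUser` are hypothetical stand-ins; the real test command comes from `quickRunCommand` in `docs/project-config.json`):

```csharp
// Illustrative control flow only — in practice the agent runs the
// project's quickRunCommand and inspects the real test output.
const int MaxAttempts = 3;
for (var attempt = 1; attempt <= MaxAttempts; attempt++)
{
    if (RunTests()) break; // all green — done

    // Classify the failure before touching anything:
    // (a) wrong test setup/assertions -> fix the test code, rerun
    // (b) actual service bug          -> stop, report as a finding
    if (attempt == MaxAttempts)
        ReportToUser("3 failed fix attempts — stopping for user input");
}
```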
## Next Steps

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: after completing this skill, you MUST ATTENTION use `AskUserQuestion` to present these options. Do NOT skip because the task seems "simple" or "obvious" — the user decides:

- "/integration-test-verify (Recommended)" — Run integration tests to verify they all pass
- "/workflow-review-changes" — Review all changes before committing
- "Skip, continue manually" — user decides
## Closing Reminders

MANDATORY IMPORTANT MUST ATTENTION break work into small todo tasks using `TaskCreate` BEFORE starting.
MANDATORY IMPORTANT MUST ATTENTION validate decisions with user via `AskUserQuestion` — never auto-decide.
MANDATORY IMPORTANT MUST ATTENTION add a final review todo task to verify work quality.
MANDATORY IMPORTANT MUST ATTENTION READ the following files before starting:

<!-- SYNC:understand-code-first:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION search 3+ existing patterns and read code BEFORE any modification. Run graph trace when graph.db exists. <!-- /SYNC:understand-code-first:reminder -->
<!-- SYNC:graph-impact-analysis:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION run `blast-radius` when graph.db exists. Flag impacted files NOT in changeset as potentially stale. <!-- /SYNC:graph-impact-analysis:reminder -->
<!-- SYNC:red-flag-stop-conditions:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION STOP after 3 failed fix attempts. Report all attempts, ask user before continuing. <!-- /SYNC:red-flag-stop-conditions:reminder -->
<!-- SYNC:rationalization-prevention:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION follow ALL steps regardless of perceived simplicity. "Too simple to plan" is an evasion, not a reason. <!-- /SYNC:rationalization-prevention:reminder -->
- MANDATORY IMPORTANT MUST ATTENTION READ `references/integration-test-patterns.md` before starting
<!-- SYNC:critical-thinking-mindset:reminder -->
- MUST ATTENTION apply critical thinking — every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present guess as fact. <!-- /SYNC:critical-thinking-mindset:reminder -->
<!-- SYNC:ai-mistake-prevention:reminder -->
- MUST ATTENTION apply AI mistake prevention — holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction. <!-- /SYNC:ai-mistake-prevention:reminder -->