EasyPlatform debug

[Fix & Debug] Systematic debugging with root-cause investigation. Use when the bugfix workflow reaches the debug step.

install

source · Clone the upstream repo

git clone https://github.com/duc01226/EasyPlatform

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/duc01226/EasyPlatform "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/debug-investigate" ~/.claude/skills/duc01226-easyplatform-debug && rm -rf "$T"

manifest: .claude/skills/debug-investigate/SKILL.md

source content

[IMPORTANT] Use `TaskCreate` to break ALL work into small tasks BEFORE starting, including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask the user whether to skip.

Critical Thinking Mindset: Apply critical thinking and sequential thinking. Every claim needs traced proof and confidence >80% to act. Anti-hallucination: never present a guess as fact. Cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence. Certainty without evidence is the root of all hallucination.

AI Mistake Prevention — Failure modes to avoid on every task:

Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.

Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.

Trace full dependency chain after edits. Changing a definition can silently break downstream variables and consumers derived from it. Always trace the full chain.

Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips — not just happy path.

When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer; never patch the symptom site.

Assume existing values are intentional — ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.

Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.

Holistic-first debugging — resist nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.

Surgical changes — apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.

Surface ambiguity before coding; don't pick silently. If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.

Understand Code First — HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
Search 3+ similar patterns (`grep`/`glob`) — cite `file:line` evidence
Read existing files in target area — understand structure, base classes, conventions
Run `python .claude/scripts/code_graph trace <file> --direction both --json` when `.code-graph/graph.db` exists
Map dependencies via `connections` or `callers_of` — know what depends on your target
Write investigation to `.ai/workspace/analysis/` for non-trivial tasks (3+ files)
Re-read analysis file before implementing — never work from memory alone

NEVER invent new patterns when existing ones work — match exactly or document deviation
BLOCKED until:
- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence

Evidence-Based Reasoning — Speculation is FORBIDDEN. Every claim needs proof.
Cite `file:line`, grep results, or framework docs for EVERY claim
Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend

Cross-service validation required for architectural changes

"I don't have enough evidence" is valid and expected output
BLOCKED until:
- [ ] Evidence file path (`file:line`)
- [ ] Grep search performed
- [ ] 3+ similar patterns found
- [ ] Confidence level stated

Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because". If evidence is incomplete, output:
"Insufficient evidence. Verified: [...]. Not verified: [...]."

`docs/project-reference/domain-entities-reference.md` — Domain entity catalog, relationships, cross-service sync (read when task involves business entities/models) (content auto-injected by hook — check for [Injected: ...] header before reading)

Estimation — Modified Fibonacci: 1(trivial) → 2(small) → 3(medium) → 5(large) → 8(very large) → 13(epic, SHOULD split) → 21(MUST ATTENTION split). Output `story_points` and `complexity` in plan frontmatter. Complexity auto-derived: 1-2=Low, 3-5=Medium, 8=High, 13+=Critical.

Red Flag Stop Conditions — STOP and escalate to user via AskUserQuestion when:

Confidence drops below 60% on any critical decision

Changes would affect >20 files (blast radius too large)

Cross-service boundary is being crossed

Security-sensitive code (auth, crypto, PII handling)

Breaking change detected (interface, API contract, DB schema)

Test coverage would decrease after changes

Approach requires technology/pattern not in the project

NEVER proceed past a red flag without explicit user approval.
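Most of these stop conditions are thresholds, so they can be checked before any change is proposed. Below is a minimal Python sketch. It assumes the agent fills in a hypothetical `ChangeAssessment` record; the field names are illustrative, not part of any project API:

```python
from dataclasses import dataclass


@dataclass
class ChangeAssessment:
    """Hypothetical summary of a proposed change (field names are illustrative)."""
    confidence: int                  # percent, for the most critical decision
    files_touched: int
    crosses_service_boundary: bool = False
    security_sensitive: bool = False  # auth, crypto, PII handling
    breaking_change: bool = False     # interface, API contract, DB schema
    coverage_delta: float = 0.0       # negative means coverage would drop
    new_technology: bool = False      # technology/pattern not in the project


def red_flags(a: ChangeAssessment) -> list[str]:
    """Return every triggered stop condition, in the order listed above."""
    checks = [
        (a.confidence < 60, "confidence below 60%"),
        (a.files_touched > 20, "blast radius over 20 files"),
        (a.crosses_service_boundary, "cross-service boundary"),
        (a.security_sensitive, "security-sensitive code"),
        (a.breaking_change, "breaking change"),
        (a.coverage_delta < 0, "test coverage would decrease"),
        (a.new_technology, "technology/pattern not in the project"),
    ]
    return [label for triggered, label in checks if triggered]
```

A non-empty result means stop and escalate via AskUserQuestion. The function only detects flags; it cannot grant the user's approval.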

Fix-Layer Accountability — NEVER fix at the crash site. Trace the full flow, fix at the owning layer.

AI default behavior: see error at Place A → fix Place A. This is WRONG. The crash site is a SYMPTOM, not the cause.

MANDATORY before ANY fix:

Trace full data flow — Map the complete path from data origin to crash site across ALL layers (storage → backend → API → frontend → UI). Identify where the bad state ENTERS, not where it CRASHES.

Identify the invariant owner — Which layer's contract guarantees this value is valid? That layer is responsible. Fix at the LOWEST layer that owns the invariant — not the highest layer that consumes it.

One fix, maximum protection — Ask: "If I fix here, does it protect ALL downstream consumers with ONE change?" If fix requires touching 3+ files with defensive checks, you are at the wrong layer — go lower.

Verify no bypass paths — Confirm all data flows through the fix point. Check for: direct construction skipping factories, clone/spread without re-validation, raw data not wrapped in domain models, mutations outside the model layer.

BLOCKED until:
- [ ] Full data flow traced (origin → crash)
- [ ] Invariant owner identified with `file:line` evidence
- [ ] All access sites audited (grep count)
- [ ] Fix layer justified (lowest layer that protects most consumers)

Anti-patterns (REJECT these):

"Fix it where it crashes" — Crash site ≠ cause site. Trace upstream.

"Add defensive checks at every consumer" — Scattered defense = wrong layer. One authoritative fix > many scattered guards.

"Fixing it in both layers is safer" — Pick ONE authoritative layer. Redundant checks across layers send mixed signals about who owns the invariant.

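Here is a minimal Python sketch of the invariant-owner idea. The project itself may use a different stack, and `Order`, `Customer`, `from_payload`, and `order_label` are hypothetical names. The crash site is `order_label`, which dereferences `order.customer.name`. The sketch does not guard every consumer. It enforces the invariant once in the model layer, where raw data becomes a domain object. The check sits in `__post_init__` rather than only in the factory, so it also covers two bypass paths: direct construction and `dataclasses.replace`, since both go through `__init__`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str


@dataclass(frozen=True)
class Order:
    order_id: str
    customer: Customer

    def __post_init__(self) -> None:
        # Invariant owner: every Order has a Customer with a non-empty name.
        # Runs on direct construction and on dataclasses.replace as well.
        if self.customer is None or not self.customer.name:
            raise ValueError(f"Order {self.order_id} has no valid customer")

    @classmethod
    def from_payload(cls, payload: dict) -> "Order":
        """Single entry point where raw data becomes a domain model."""
        customer = payload.get("customer") or {}
        return cls(order_id=payload["id"], customer=Customer(name=customer.get("name", "")))


def order_label(order: Order) -> str:
    # Former crash site: no `?.`-style guard needed, because the model guarantees it.
    return f"#{order.order_id} for {order.customer.name}"
```

Bad data now fails where it enters, with the order id in the message. It no longer surfaces as a null dereference in whichever consumer happens to run first.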
Quick Summary

Goal: Investigate and identify root cause of a bug with evidence.

Workflow:

Reproduce — Understand expected vs actual behavior
Hypothesize — Form theories about root cause
Trace — Follow code paths with file:line evidence
Confirm — Verify root cause with grep/read evidence
Report — Output root cause with confidence level

Key Rules:

Debug Mindset: every claim needs file:line proof
[ROOT-CAUSE-FIX] Never patch symptoms. Trace full call chain to find WHO is responsible. Fix at correct layer.
Never assume first hypothesis is correct
Output: confirmed root cause OR "hypothesis, not confirmed" with evidence gaps
This is investigation-only — hand off to /fix for implementation

Root Cause Debugging — Systematic approach, never guess-and-check.
Reproduce — Confirm the issue exists with evidence (error message, stack trace, screenshot)

Isolate — Narrow to specific file/function/line using binary search + graph trace

Trace — Follow data flow from input to failure point. Read actual code, don't infer.

Hypothesize — Form theory with confidence %. State what evidence supports/contradicts it

Verify — Test hypothesis with targeted grep/read. One variable at a time.
Fix — Address root cause, not symptoms. Verify fix doesn't break callers via graph `connections`
NEVER: Guess without evidence. Fix symptoms instead of cause. Skip reproduction step.
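The Isolate step's "binary search" can be read as bisection over any ordered search space, such as commits, input sizes, or records in a failing batch. Below is a minimal Python sketch. It assumes the space is monotonic: everything passes up to some point, then everything fails. `first_failing` and the predicate are hypothetical names:

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_failing(candidates: Sequence[T], fails: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate for which `fails` is True, or None if none fail.

    Needs O(log n) predicate calls, assuming candidates are ordered
    pass...pass, fail...fail (the same precondition git bisect relies on).
    """
    lo, hi = 0, len(candidates)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(candidates[mid]):
            hi = mid          # first failure is at mid or earlier
        else:
            lo = mid + 1      # first failure is after mid
    return candidates[lo] if lo < len(candidates) else None
```

For commit history, `git bisect run <command>` applies the same idea, using the command's exit code as the predicate.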

Incremental Result Persistence — MANDATORY for all sub-agents or heavy inline steps processing >3 files.
Before starting: Create report file `plans/reports/{skill}-{date}-{slug}.md`
After each file/section reviewed: Append findings to report immediately — never hold in memory
Return to main agent: Summary only (per SYNC:subagent-return-contract) with `Full report: path`
Main agent: Reads report file only when resolving specific blockers
Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction. Partial results are better than no results.

Report naming: `plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md`
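Below is a minimal Python sketch of the naming convention and the append-as-you-go rule. It assumes the `{skill-name}-{YYMMDD}-{HHmm}-{slug}` form is authoritative and that slugs are lowercased and hyphenated. Both helper names are hypothetical:

```python
import re
from datetime import datetime
from pathlib import Path


def report_path(skill_name: str, slug: str, now: datetime,
                root: Path = Path("plans/reports")) -> Path:
    """Build plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md."""
    safe_slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")
    return root / f"{skill_name}-{now:%y%m%d}-{now:%H%M}-{safe_slug}.md"


def append_finding(report: Path, heading: str, lines: list[str]) -> None:
    """Write one file/section's findings to disk immediately, so they survive a context cutoff."""
    report.parent.mkdir(parents=True, exist_ok=True)
    with report.open("a", encoding="utf-8") as fh:  # append, never rewrite
        fh.write(f"\n## {heading}\n")
        for line in lines:
            fh.write(f"- {line}\n")
```

Opening in append mode after every section keeps each write small. Whatever was written before an interruption stays on disk.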

Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.

```markdown
## Sub-Agent Result: [skill-name]

Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%

### Findings (Critical/High only — max 10 bullets)

- [severity] [file:line] [finding]

### Actions Taken

- [file changed] [what changed]

### Blockers (if any)

- [blocker description]

Full report: plans/reports/[skill-name]-[date]-[slug].md
```

Main agent reads `Full report` file ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.
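The main agent reads only the summary, so it can cheaply reject a malformed one before trusting it. Below is a minimal Python sketch of such a check. It assumes the summary is plain text laid out like the template, and `check_contract` is a hypothetical helper, not an existing script:

```python
import re

# An unfilled template line ("✅ PASS | ⚠️ PARTIAL | ❌ FAIL") deliberately fails this pattern.
STATUS_RE = re.compile(r"^Status: \S+ (PASS|PARTIAL|FAIL)$", re.M)
CONFIDENCE_RE = re.compile(r"^Confidence: (\d{1,3})%$", re.M)
REPORT_RE = re.compile(r"^Full report: plans/reports/\S+\.md$", re.M)


def _section_body(summary: str, title: str) -> str:
    """Text between a '### <title>...' heading and the next heading or Full report line."""
    pattern = rf"^### {re.escape(title)}[^\n]*\n(.*?)(?=^### |^Full report:|\Z)"
    match = re.search(pattern, summary, re.M | re.S)
    return match.group(1) if match else ""


def check_contract(summary: str) -> list[str]:
    """Return contract violations; an empty list means the summary is usable."""
    problems = []
    if not summary.startswith("## Sub-Agent Result: "):
        problems.append("missing '## Sub-Agent Result' heading")
    if not STATUS_RE.search(summary):
        problems.append("missing or invalid Status line")
    conf = CONFIDENCE_RE.search(summary)
    if not conf or int(conf.group(1)) > 100:
        problems.append("missing or out-of-range Confidence")
    findings = [line for line in _section_body(summary, "Findings").splitlines()
                if line.startswith("- ")]
    if len(findings) > 10:
        problems.append(f"{len(findings)} findings; contract allows at most 10")
    if not REPORT_RE.search(summary):
        problems.append("missing Full report line")
    return problems
```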

Debug Mindset (NON-NEGOTIABLE)

Be skeptical. Apply critical thinking and sequential thinking. Every claim needs traced proof and a confidence percentage (ideally above 80%).

Do NOT assume the first hypothesis is correct — verify with actual code traces
Every root cause claim must include `file:line` evidence
If you cannot prove a root cause with a code trace, state "hypothesis, not confirmed"
Question assumptions: "Is this really the cause?" → trace the actual execution path
Challenge completeness: "Are there other contributing factors?" → check related code paths

Confidence & Evidence Gate

MANDATORY IMPORTANT MUST ATTENTION declare `Confidence: X%` with evidence list + `file:line` proof for EVERY claim.

| Confidence | Meaning | Action |
| --- | --- | --- |
| 95-100% | Full trace verified | Report as confirmed root cause |
| 80-94% | Main path verified, edge cases uncertain | Report with caveats |
| 60-79% | Partial trace | Report as hypothesis |
| <60% | Insufficient evidence | DO NOT report — gather more evidence |

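The confidence table reduces to a threshold lookup. Below is a minimal Python sketch, with hypothetical function names, that also renders the "Insufficient evidence" fallback format from the evidence rules:

```python
def confidence_action(confidence: float) -> str:
    """Map a declared confidence percentage to the action in the gate table."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= 95:
        return "Report as confirmed root cause"
    if confidence >= 80:
        return "Report with caveats"
    if confidence >= 60:
        return "Report as hypothesis"
    return "DO NOT report: gather more evidence"


def insufficient_evidence(verified: list[str], not_verified: list[str]) -> str:
    """Render the required output when evidence is incomplete."""
    return (f"Insufficient evidence. Verified: [{', '.join(verified)}]. "
            f"Not verified: [{', '.join(not_verified)}].")
```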
Workflow Details

Step 1: Reproduce

Clarify expected vs actual behavior
Identify trigger conditions (user action, data state, timing)

Step 2: Hypothesize

Form 2-3 theories about root cause
Rank by likelihood based on symptoms

Step 3: Trace

For each hypothesis, trace the code path:
- Find entry point (API, UI, job, event)
- Follow through handlers/services
- Check data transformations and state changes
- Verify error handling paths
Use grep/read to collect `file:line` evidence

Step 4: Confirm

Match evidence to a single root cause
Verify the root cause explains ALL symptoms
Check for secondary contributing factors
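Step 4's "explains ALL symptoms" test can be modeled as a set-coverage check. Below is a minimal Python sketch. It assumes each hypothesis has already been mapped, with `file:line` evidence, to the set of symptoms it explains. The symptom strings and function names are hypothetical:

```python
from typing import Dict, List, Optional, Set, Tuple


def rank_hypotheses(symptoms: Set[str],
                    hypotheses: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Pair each hypothesis with the symptoms it does NOT explain, fewest gaps first."""
    ranked = [(name, symptoms - explained) for name, explained in hypotheses.items()]
    ranked.sort(key=lambda pair: (len(pair[1]), pair[0]))
    return ranked


def single_root_cause(symptoms: Set[str],
                      hypotheses: Dict[str, Set[str]]) -> Optional[str]:
    """Return the one hypothesis that explains every symptom, or None if zero or several do."""
    complete = [name for name, gaps in rank_hypotheses(symptoms, hypotheses) if not gaps]
    return complete[0] if len(complete) == 1 else None
```

`None` means the evidence does not yet single out one cause. Symptoms left unexplained by the leading hypothesis point at secondary contributing factors.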

Dependency Tracing (MANDATORY — DO NOT SKIP when graph.db exists)

When `.code-graph/graph.db` exists, you MUST ATTENTION use structural queries to trace dependencies:

Graph reveals ALL callers and consumers of buggy code — grep alone misses structural relationships.

Who calls the buggy function:

python .claude/scripts/code_graph query callers_of <function> --json

Who imports the buggy module:

python .claude/scripts/code_graph query importers_of <file> --json

What tests exist:

python .claude/scripts/code_graph query tests_for <function> --json

What does this function call:

python .claude/scripts/code_graph query callees_of <function> --json

Graph-Assisted Debugging

After identifying suspect files, use graph trace to understand the full context.

See what calls this code AND what it triggers downstream:

python .claude/scripts/code_graph trace <suspect-file> --direction both --json

Find all callers that could trigger the bug:

python .claude/scripts/code_graph trace <suspect-file> --direction upstream --json

This reveals implicit connections (MESSAGE_BUS, event handlers) that may propagate the issue across services.
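Conceptually, an upstream trace is a reverse breadth-first search over the call graph. Below is a minimal Python sketch over a hypothetical in-memory graph that maps each caller to its callees. It does not read `.code-graph/graph.db`, whose schema is not described here. Implicit edges such as message-bus publish/subscribe are modeled as ordinary edges:

```python
from collections import deque


def upstream_callers(call_graph: dict[str, list[str]], target: str) -> dict[str, int]:
    """Return every transitive caller of `target` with its hop distance.

    `call_graph` maps caller -> callees; a bus topic can appear as an
    ordinary node (publisher -> topic -> subscriber).
    """
    reverse: dict[str, list[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)

    distances: dict[str, int] = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        for caller in reverse.get(node, []):
            if caller not in seen:  # each caller recorded once, at its shortest distance
                seen.add(caller)
                distances[caller] = dist + 1
                queue.append((caller, dist + 1))
    return distances
```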

Step 5: Report

Output: confirmed root cause with evidence chain
Include: affected files, data flow, fix recommendation
Hand off to `/fix` for implementation

⚠️ MANDATORY: Post-Fix Verification

After `/fix` applies changes, `/prove-fix` MUST ATTENTION be run. It builds code proof traces per change with confidence scores. This is non-negotiable in all fix workflows.

Red Flags — STOP (Debugging-Specific)

If you're thinking:

"I see the problem, let me fix it" — Seeing symptoms is not understanding root cause. Investigate first.
"Quick fix for now, investigate later" — Quick fixes mask bugs and create debt. Find root cause.
"Just try changing X and see" — One hypothesis at a time. Scientific method, not trial and error.
"Already tried 2+ fixes, one more" — 3+ failed fixes = STOP. Question the architecture, not the fix.
"The error message is misleading" — Read it again carefully. Error messages are usually right.
"It works on my machine" — Reproduce in the failing environment. Your environment hides bugs.
"This can't be the cause" — Verify with evidence, not intuition. Unlikely causes are still causes.
"It's OOM, must be a large object" — For memory exhaustion, check row count BEFORE row size. An unbounded query loading thousands of records is the more common cause. Triage: (1) Is there a missing DB-level filter for the triggering condition? (2) Is each row excessively large?

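The row-count-first triage for memory exhaustion takes two aggregate queries and can run before anyone inspects object sizes. Below is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the real store. The `jobs` table, `payload` column, and `status` filter are hypothetical. Table and column names are interpolated into the SQL, so pass only trusted identifiers; filter values go through parameters:

```python
import sqlite3
from typing import Optional, Sequence, Tuple


def triage_memory(conn: sqlite3.Connection, table: str, payload_column: str,
                  where_sql: Optional[str] = None, params: Sequence = ()) -> Tuple[int, float]:
    """Return (row_count, avg_payload_length) for the rows a query would load.

    Row count comes first: an unbounded query is the more common OOM cause.
    """
    where = f" WHERE {where_sql}" if where_sql else ""
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}{where}", params).fetchone()
    (avg_len,) = conn.execute(
        f"SELECT AVG(LENGTH({payload_column})) FROM {table}{where}", params
    ).fetchone()
    return row_count, float(avg_len or 0)


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical data: 5,000 jobs, 1 in 100 still pending, 200-char payloads."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO jobs (status, payload) VALUES (?, ?)",
        [("pending" if i % 100 == 0 else "done", "x" * 200) for i in range(5000)],
    )
    return conn
```

Here the rows are equally small with or without the filter, but the unfiltered query loads 100 times as many of them. That points at the missing DB-level filter, not at row size.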
Workflow Recommendation

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: If you are NOT already in a workflow, you MUST ATTENTION use `AskUserQuestion` to ask the user. Do NOT judge task complexity or decide this is "simple enough to skip" — the user decides whether to use a workflow, not you:
Activate `bugfix` workflow (Recommended) — scout → investigate → debug → plan → fix → prove-fix → review → test
Execute `/debug` directly — run this skill standalone

Next Steps (Standalone: MUST ATTENTION ask user via `AskUserQuestion`. Skip if inside workflow.)

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS after completing this skill, you MUST ATTENTION use `AskUserQuestion` to present these options. Do NOT skip because the task seems "simple" or "obvious" — the user decides:

"Proceed with full workflow (Recommended)" — I'll detect the best workflow to continue from here (debug complete, root cause identified). This ensures fix, verification, review, and testing steps aren't skipped.
"/fix" — Apply fix based on debug findings
"/plan" — If fix requires planning
"Skip, continue manually" — user decides

Standalone Review Gate (Non-Workflow Only)

MANDATORY IMPORTANT MUST ATTENTION: If this skill is called outside a workflow (standalone `/debug`), you MUST ATTENTION create a `TaskCreate` todo task for `/review-changes` as the last task in your task list. This ensures all changes are reviewed before commit even without a workflow enforcing it.

If already running inside a workflow (e.g., `bugfix`), skip this — the workflow sequence handles `/review-changes` at the appropriate step.

Closing Reminders

MANDATORY IMPORTANT MUST ATTENTION break work into small todo tasks using `TaskCreate` BEFORE starting. MANDATORY IMPORTANT MUST ATTENTION validate decisions with user via `AskUserQuestion` — never auto-decide. MANDATORY IMPORTANT MUST ATTENTION add a final review todo task to verify work quality. MANDATORY IMPORTANT MUST ATTENTION READ the following files before starting:

MANDATORY IMPORTANT MUST ATTENTION search 3+ existing patterns and read code BEFORE any modification. Run graph trace when graph.db exists.
MANDATORY IMPORTANT MUST ATTENTION cite `file:line` evidence for every claim. Confidence >80% to act, <60% = do NOT recommend.
MANDATORY IMPORTANT MUST ATTENTION include `story_points` and `complexity` in plan frontmatter. SP > 8 = split.
MANDATORY IMPORTANT MUST ATTENTION STOP after 3 failed fix attempts. Report all attempts, ask user before continuing.
MANDATORY IMPORTANT MUST ATTENTION trace full data flow and fix at the owning layer, not the crash site. Audit all access sites before adding `?.`.
MUST ATTENTION apply critical thinking — every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present guess as fact.
MUST ATTENTION apply AI mistake prevention — holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction.