Skills investigate

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ashish797/founderclaw/investigate" ~/.claude/skills/openclaw-skills-investigate && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/ashish797/founderclaw/investigate" ~/.openclaw/skills/openclaw-skills-investigate && rm -rf "$T"
manifest: skills/ashish797/founderclaw/investigate/SKILL.md
source content

Systematic Debugging

Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.

Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find. Find the root cause, then fix it.


Phase 1: Root Cause Investigation

Gather context before forming any hypothesis.

  1. Collect symptoms: Read the error messages, stack traces, and reproduction steps. If the user hasn't provided enough context, ask one question at a time.

  2. Read the code: Trace the code path from the symptom back to potential causes. Use grep to find all references, then read them to understand the logic.

  3. Check recent changes:

    git log --oneline -20 -- <affected-files>
    

    Was this working before? What changed? A regression means the root cause is in the diff.

  4. Reproduce: Can you trigger the bug deterministically? If not, gather more evidence before proceeding.

Output: "Root cause hypothesis: ..." — a specific, testable claim about what is wrong and why.


Phase 2: Pattern Analysis

Check if this bug matches a known pattern:

| Pattern | Signature | Where to look |
| --- | --- | --- |
| Race condition | Intermittent, timing-dependent | Concurrent access to shared state |
| Nil/null propagation | NoMethodError, TypeError | Missing guards on optional values |
| State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks |
| Integration failure | Timeout, unexpected response | External API calls, service boundaries |
| Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state |
| Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache |
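As a minimal illustration of the first pattern in the table, a race condition can be reproduced in a few lines. This is a sketch in Python purely for illustration; the names are invented, not part of this skill:

```python
import threading

# Two groups of threads increment a shared counter. Without a lock, the
# read-modify-write in `counter += 1` is not atomic, so updates can be
# lost intermittently -- the signature of the "Race condition" row above.
counter = 0
lock = threading.Lock()

def increment(times, use_lock):
    global counter
    for _ in range(times):
        if use_lock:
            with lock:
                counter += 1
        else:
            counter += 1  # racy: load, add, store are separate steps

def run(use_lock, times=100_000, workers=4):
    """Run `workers` threads and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times, use_lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print("with lock:   ", run(use_lock=True))   # always 400000
    print("without lock:", run(use_lock=False))  # may intermittently be less
```

Note the diagnostic value: the locked run is deterministic, the unlocked run is not, which is exactly why this class of bug resists casual reproduction in Phase 1.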

Also check:

  • Project TODOs for related known issues
  • git log for prior fixes in the same area — recurring bugs in the same files are an architectural smell, not a coincidence

External pattern search: If the bug doesn't match a known pattern, search for:

  • "{framework} {generic error type}" — sanitize first: strip hostnames, IPs, file paths, SQL, customer data. Search the error category, not the raw message.
  • "{library} {component} known issues"

If a documented solution or known dependency bug surfaces, present it as a candidate hypothesis in Phase 3.
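The sanitization step can be sketched as a small pre-processing pass over the raw error message. The regex rules below are illustrative assumptions, not a complete scrubber; review the output before pasting it into any external search:

```python
import re

# Illustrative sanitization rules: replace identifying details with
# placeholders so only the generic error shape is searched.
# Order matters: emails before hostnames, or "user@example.com" would
# be half-rewritten by the hostname rule first.
RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"[\w.+-]+@[\w.-]+"), "<email>"),
    (re.compile(r"\b[\w.-]+\.(?:com|net|org|internal)\b"), "<host>"),
    (re.compile(r"(?:/[\w.-]+){2,}"), "<path>"),
]

def sanitize(message: str) -> str:
    """Strip hostnames, IPs, paths, and emails from an error message."""
    for pattern, placeholder in RULES:
        message = pattern.sub(placeholder, message)
    return message

print(sanitize(
    "ConnectionError: timeout reaching db01.prod.internal (10.2.3.4) "
    "while reading /var/lib/app/data.db for jane@example.com"
))
# → ConnectionError: timeout reaching <host> (<ip>) while reading <path> for <email>
```

The surviving text — framework, error class, generic phrasing — is what the "{framework} {generic error type}" search above actually needs.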


Phase 3: Hypothesis Testing

Before writing ANY fix, verify your hypothesis.

  1. Confirm the hypothesis: Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match?

  2. If the hypothesis is wrong: Before forming the next hypothesis, search for the error. Sanitize first — strip hostnames, IPs, file paths, SQL fragments, customer identifiers. Search only the generic error type and framework context. Then return to Phase 1. Gather more evidence. Do not guess.

  3. 3-strike rule: If 3 hypotheses fail, STOP. Tell the user:

    3 hypotheses tested, none match. This may be an architectural issue
    rather than a simple bug.
    
    Options:
    A) Continue investigating — I have a new hypothesis: [describe]
    B) Escalate for human review — this needs someone who knows the system
    C) Add logging and wait — instrument the area and catch it next time
    

Red flags — if you see any of these, slow down:

  • "Quick fix for now" — there is no "for now." Fix it right or escalate.
  • Proposing a fix before tracing data flow — you're guessing.
  • Each fix reveals a new problem elsewhere — wrong layer, not wrong code.
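Step 1 above — confirming a hypothesis with temporary instrumentation before writing any fix — can be sketched as follows. The billing bug and all names here are hypothetical, invented only to show the shape of the technique:

```python
# Hypothesis: order totals drift because prices are accumulated as
# floats, so rounding error builds up over many line items.

def subtotal(prices):
    total = 0.0
    for p in prices:
        total += p  # suspected root cause: float accumulation
    # Temporary instrumentation: compare against an exact cents-based sum.
    # If the assertion fires, the evidence matches the hypothesis.
    exact = sum(round(p * 100) for p in prices) / 100
    assert abs(total - exact) < 1e-9, (
        f"hypothesis confirmed: float drift, total={total!r} exact={exact!r}"
    )
    return total

# Reproduction: many small prices trigger the drift and fire the assertion.
try:
    subtotal([0.10] * 1_000_000)
except AssertionError as e:
    print(e)
```

The assertion encodes the *negation* of the hypothesis ("totals never drift"), so seeing it fire under the reproduction is positive evidence, and seeing it hold means the hypothesis is wrong and you return to Phase 1.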

Phase 4: Implementation

Once root cause is confirmed:

  1. Fix the root cause, not the symptom. The smallest change that eliminates the actual problem.

  2. Minimal diff: Fewest files touched, fewest lines changed. Resist the urge to refactor adjacent code.

  3. Write a regression test that:

    • Fails without the fix (proves the test is meaningful)
    • Passes with the fix (proves the fix works)
  4. Run the full test suite. Paste the output. No regressions allowed.

  5. If the fix touches >5 files: Flag the blast radius to the user:

    This fix touches N files. That's a large blast radius for a bug fix.
    A) Proceed — the root cause genuinely spans these files
    B) Split — fix the critical path now, defer the rest
    C) Rethink — maybe there's a more targeted approach
    

Phase 5: Verification & Report

Fresh verification: Reproduce the original bug scenario and confirm it's fixed. This is not optional.

Run the test suite and paste the output.

Output a structured debug report:

DEBUG REPORT
════════════════════════════════════════
Symptom:         [what the user observed]
Root cause:      [what was actually wrong]
Fix:             [what was changed, with file:line references]
Evidence:        [test output, reproduction attempt showing fix works]
Regression test: [file:line of the new test]
Related:         [prior bugs in same area, architectural notes]
Status:          DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════

Important Rules

  • 3+ failed fix attempts → STOP and question the architecture. Repeated failures point to the wrong architecture, not merely a wrong hypothesis.
  • Never apply a fix you cannot verify. If you can't reproduce and confirm, don't ship it.
  • Never say "this should fix it." Verify and prove it. Run the tests.
  • If fix touches >5 files → ask the user about blast radius before proceeding.
  • Completion status:
    • DONE — root cause found, fix applied, regression test written, all tests pass
    • DONE_WITH_CONCERNS — fix works but architectural concern remains
    • BLOCKED — cannot determine root cause, needs human expertise
    • NEEDS_CONTEXT — missing reproduction steps or access