Nexus-agents bug-fix

install
source · Clone the upstream repo
git clone https://github.com/williamzujkowski/nexus-agents
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/williamzujkowski/nexus-agents "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bug-fix" ~/.claude/skills/williamzujkowski-nexus-agents-bug-fix && rm -rf "$T"
manifest: skills/bug-fix/SKILL.md
source content

Bug Fix Skill

<!-- CANONICAL SOURCE: docs/development/CONTRIBUTION_GUIDE.md -->

Full workflow: CONTRIBUTION_GUIDE.md

Stop-the-Line Protocol

When a test, build, or lint fails mid-flow — before making any more changes:

  1. STOP — do not edit, do not retry, do not "just try one more thing"
  2. PRESERVE — capture the raw error, stack trace, last passing commit, and repro command. Paste into the PR description if relevant
  3. DIAGNOSE — run the Triage Sequence below; identify the causal layer, not the first surface that surfaced the error
  4. FIX — change the root cause only. No speculative refactors, no drive-by cleanups
  5. GUARD — add a regression test that would have caught the original failure
  6. RESUME — re-run the full check suite; only then continue the original task

Rationale: failures compound. A bug unresolved at step N makes every change N+1…N+k incorrect. We observed this in #1871 (consensus_vote silent hang) — the symptoms looked like adapter timeouts, the cause was missing overall-deadline race.

Triage Sequence

Run this order before "Implement fix." Skipping steps produces symptom-level patches that regress.

  1. Reproduce reliably. If non-reproducible: log timestamps, add artificial delays to widen race windows, diff CI vs local env. Document the conditions.
  2. Localize to a layer: UI / service / data / build / external dep. Resist "it must be X" — verify with a log, not a hunch.
  3. Reduce to the minimal failing case. Smaller repros make the cause obvious; large repros hide it.
  4. Fix the root cause. If your fix is upstream of where the error surfaced, you're probably right. If it's downstream (dedupe in the view, catch in the caller), you're probably patching the symptom.
  5. Guard with a regression test that fails without the fix.
  6. Verify end-to-end:
    pnpm lint && pnpm typecheck && pnpm test
    plus any integration/smoke test for the affected area.

Anti-Rationalization — Debugging

Debugging tempts four recurring shortcuts. Each is a trap.

ExcuseCounter
"I know what the bug is"Unverified guesses resolve only ~70% of the time. Reproduce first — the cost of confirming is minutes; the cost of a wrong fix is hours plus a regression.
"The failing test must be wrong"Sometimes true, but verify against the spec before changing the test. A test asserting correct-but-inconvenient behavior gets "fixed" to assert the bug.
"It works on my machine"Environment is a variable. Diff Node version, env vars, lockfile drift, CI container, clock/timezone, file system case sensitivity.
"This is flaky, ignore it"Flaky = the test is exposing a real race/ordering/timing bug. Fix the intermittency or understand why it happens.
.skip
with a TODO is a ticking debt.

Workflow

  1. Create/verify GitHub issue with reproduction steps

    gh issue list --state open
    gh issue create --title "bug: [description]" --label "bug"
    
  2. Write failing test that demonstrates the bug

    pnpm test -- --watch
    
  3. Implement fix — minimal changes, don't refactor surrounding code

  4. Verify test passes

    pnpm lint && pnpm typecheck && pnpm test
    
  5. Check for similar bugs elsewhere

    # Search for similar patterns in codebase
    
  6. Create PR

    git commit -m "fix(scope): description
    
    Closes #<issue>
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>"
    
    gh pr create --title "fix(scope): description" --base main
    

Quality Checklist

  • Reproduced reliably (or non-reproducibility documented)
  • Localized to the causal layer (not the surfacing layer)
  • Failing test written before fix
  • Fix is at the root cause, not the symptom
  • Fix is minimal (no unrelated refactors)
  • Similar patterns checked elsewhere
  • Regression test would have caught the original bug
  • pnpm lint && pnpm typecheck && pnpm test
    passes
  • Issue referenced in commit