Agent-almanac build-coherence
git clone https://github.com/pjt222/agent-almanac
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/caveman-ultra/skills/build-coherence" ~/.claude/skills/pjt222-agent-almanac-build-coherence-a286b6 && rm -rf "$T"
i18n/caveman-ultra/skills/build-coherence/SKILL.md
Build Coherence
Evaluate competing approaches → independent assessment, explicit reasoning-out-loud advocacy, confidence-calibrated commit thresholds, structured deadlock resolution → coherent decisions from multi-path reasoning.
Use When
- ID'd many valid approaches, must select (see forage-solutions)
- Oscillating between 2 approaches, no commit
- Need to justify decision w/ structured reasoning (arch, tool, impl strategy)
- Prev decision by gut, needs evidence validation
- Internal reasoning → contradictory conclusions, restore coherence
- Before irreversible action (merge, deploy, delete) where wrong = high cost
In
- Required: ≥2 competing approaches
- Optional: Quality assessments from prior scouting (see forage-solutions)
- Optional: Decision stakes (reversible, moderate, irreversible) for threshold calibration
- Optional: Time budget
- Optional: Known failure mode (oscillation, premature commit, groupthink)
Do
Step 1: Independent Evaluation
Assess each on own merits before comparing. Critical: A's assessment doesn't bias B.
For each approach, evaluate independently:
Approach Evaluation Template:

| Dimension | Assessment |
|---|---|
| Approach name | |
| Core mechanism | How does this approach solve the problem? |
| Strengths (2-3) | What does this approach do well? |
| Risks (2-3) | What could go wrong? What is assumed? |
| Evidence quality | How well-supported is this approach? (verified / inferred / speculated) |
| Quality score (0-100) | Overall assessment |
| Confidence (0-100) | How confident in this assessment? |
Fill out each separately. No comparison until all individual evals complete.
→ Independent evals, each on own terms. B's eval doesn't ref A. Scores = real assessment, not ranking.
If err: Evals contaminated (writing "better than A" while assessing B) → reset. Assess A fully, clear frame, assess B fresh. All scores identical → dimensions too coarse, add domain-specific criteria.
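Sketch of the template as a data structure (Python; field names + example values are illustrative, not part of the skill):

```python
from dataclasses import dataclass, field

@dataclass
class ApproachEvaluation:
    """One filled-out template; complete one instance per approach, in isolation."""
    name: str
    core_mechanism: str                  # how this approach solves the problem
    strengths: list[str] = field(default_factory=list)  # 2-3 items
    risks: list[str] = field(default_factory=list)       # 2-3 items, incl. assumptions
    evidence_quality: str = "inferred"   # "verified" | "inferred" | "speculated"
    quality_score: int = 0               # 0-100 overall assessment
    confidence: int = 0                  # 0-100 confidence in this assessment

# Hypothetical example: assessed on its own terms, no reference to other approaches
eval_a = ApproachEvaluation(
    name="A: incremental migration",
    core_mechanism="Move one module at a time behind a compatibility shim",
    strengths=["small reviewable steps", "easy rollback per module"],
    risks=["long window with two code paths", "assumes shim covers all call sites"],
    evidence_quality="verified",
    quality_score=80,
    confidence=70,
)
```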
Step 2: Waggle Dance — Reason Out Loud
Advocate in proportion to quality. AI equivalent of the bee waggle dance: implicit reasoning → explicit + public.
- Each approach, state case — as if presenting to skeptical user:
- "Approach A strong because [evidence]. Main risk [risk], mitigated by [mitigation]."
- Advocacy intensity proportional to quality score:
- High: detailed advocacy + specific evidence
- Medium: brief advocacy + acknowledged limits
- Low: mentioned for completeness, not actively advocated
- Cross-inspection: After advocating A, actively seek evidence supporting B. After B, seek A. Counters confirmation bias
Point of reasoning-out-loud = decision auditable. Can't articulate → assessment shallower than score suggests.
→ Explicit reasoning per approach, persuasive to neutral observer. Cross-inspection reveals ≥1 initially overlooked consideration.
If err: Advocacy perfunctory (motions) → approaches maybe not genuinely diff, just variations. Differ in mechanism or only impl detail? Latter → decision doesn't matter much, pick either, move on.
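Minimal sketch of score → advocacy intensity (Python; the cut-offs are illustrative, not fixed by this skill):

```python
def advocacy_level(quality_score: int) -> str:
    """Map a 0-100 quality score to how much advocacy it earns (illustrative cut-offs)."""
    if quality_score >= 70:
        return "high: detailed advocacy with specific evidence"
    if quality_score >= 40:
        return "medium: brief advocacy with acknowledged limits"
    return "low: mention for completeness, no active advocacy"
```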
Step 3: Quorum Threshold + Commit
Confidence threshold to commit, calibrated to stakes.
Confidence Thresholds by Stakes:

| Decision Type | Threshold | Rationale |
|---|---|---|
| Easily reversible (can undo) | 60% | Cost of trying and reverting is low. Speed matters more than certainty |
| Moderate stakes (costly to reverse) | 75% | Reverting has cost but is possible. Worth investing in evaluation |
| Irreversible or high-stakes | 90% | Cannot undo. Must be confident. If threshold not met, gather more information before deciding |
- Classify stakes
- Check: leading approach quality × confidence ≥ threshold?
- Yes → commit. State decision, reasoning, key risk accepted
- No → ID additional info that raises confidence to threshold
- Committed → don't revisit unless new disqualifying evidence
→ Clear commit moment + stated reasoning. Decision at right confidence for stakes.
If err: Threshold never met (can't hit 90% on irreversible) → ask: truly irreversible? Decomposable into reversible test + irreversible commit? Most apparently irreversible can be staged. Impossible → tell user uncertainty, ask guidance.
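Sketch of the quorum check, assuming quality x confidence combine multiplicatively on a 0-1 scale (the combination rule is an assumption; thresholds from the table above):

```python
# Commit thresholds by stakes, taken from the table above
THRESHOLDS = {
    "reversible": 0.60,
    "moderate": 0.75,
    "irreversible": 0.90,
}

def quorum_reached(quality_score: int, confidence: int, stakes: str) -> bool:
    """True if the leading approach clears the stakes-calibrated commit threshold."""
    effective = (quality_score / 100) * (confidence / 100)
    return effective >= THRESHOLDS[stakes]

# Example: quality 85 at confidence 90 -> 0.765
# enough to commit at moderate stakes, not enough for an irreversible decision
assert quorum_reached(85, 90, "moderate")
assert not quorum_reached(85, 90, "irreversible")
```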
Step 4: Deadlock Resolution
≥2 approaches w/ similar scores + quorum not met for any.
Deadlock Resolution:

| Deadlock Type | Resolution |
|---|---|
| Genuine tie (scores within 5%) | The approaches are equivalent. Pick one and commit. The cost of deliberating exceeds the cost of picking the "wrong" equivalent option. Flip a coin mentally |
| Information deficit (scores uncertain) | The tie exists because evaluation is incomplete. Invest one more specific investigation — a targeted file read, a quick test — then re-score |
| Oscillation (scores keep changing) | Scoring keeps flip-flopping depending on which dimension gets attention. Time-box: set a timer, evaluate once more, commit to the result regardless |
| Approach merge (compatible strengths) | The best parts of A and B can be combined. Check for compatibility. If merge is coherent, use it. If forced, don't — pick one |
→ Deadlock resolved via mechanism. Decisive — no lingering doubt that undermines execution.
If err: Deadlock persists through all strategies → decision premature. Ask user: "2 equally strong approaches: [A], [B]. [Brief case each.] Which aligns w/ priorities?" Delegating genuine tie = not fail, ack decision depends on values AI can't infer.
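Sketch of the deadlock triage (Python; the 5-point tie margin follows the table, the check order and flags are illustrative):

```python
def resolve_deadlock(score_a: int, score_b: int, *, evidence_complete: bool,
                     scores_stable: bool, strengths_compatible: bool) -> str:
    """Pick one resolution strategy for two similarly scored approaches."""
    if not evidence_complete:
        return "information deficit: one targeted investigation, then re-score"
    if not scores_stable:
        return "oscillation: time-box one more evaluation, commit to its result"
    if strengths_compatible:
        return "merge: combine strengths if the merge is coherent; if forced, pick one"
    if abs(score_a - score_b) <= 5:
        return "genuine tie: pick either and commit"
    return "no deadlock: commit to the higher-scoring approach"
```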
Step 5: Coherence Quality
Post-commit: real coherence or just a decision?
- Evidence-based or rubber-stamped initial pref?
- Test: Pref same before + after eval? Eval changed anything?
- Losing approaches genuinely considered or straw men?
- Test: Can articulate strongest case for losing approach?
- What signal triggers reassess?
- Specific obs that would invalidate ("If API doesn't support X, approach B better")
- Useful info from losing approaches for impl?
- Risk in B may apply to A too
→ Brief quality check that confirms decision OR IDs it as weak. Weak → return to earlier step, not proceed on shaky ground.
If err: Quality check reveals pref-based not evidence-based → ack honestly. Sometimes pref all that's available — label as such, not dressed up as analysis.
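Sketch of the post-commit check as a failure list (Python; question wording is illustrative):

```python
def coherence_gaps(checks: dict[str, bool]) -> list[str]:
    """Return the checks that fail; an empty list means the commitment holds up."""
    return [question for question, ok in checks.items() if not ok]

# Hypothetical self-check; any gap -> return to the step it points at, don't proceed
gaps = coherence_gaps({
    "decision is evidence-based, not a rubber-stamped preference": True,
    "evaluation could have changed the outcome": True,
    "can state the strongest case for the losing approach": False,
    "a specific reassess trigger is defined": True,
})
```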
Check
- Each approach evaluated independently before comparison
- Advocacy proportional to quality (not equal regardless of merit)
- Cross-inspection done (counter-evidence after advocacy)
- Quorum threshold calibrated to stakes
- Deadlocked → specific resolution strategy applied
- Post-decision quality check done
- Reassess trigger defined
Traps
- Premature commit: Decide before evaluating all. First approach has anchoring advantage (more mental attention from being first). Evaluate all before comparing
- Equal advocacy, unequal approaches: A=85, B=45 → equal time = wasted effort + false equivalence
- Rubber-stamp: Going through process to justify already-made decision. Test: could eval have changed outcome? If not = theater
- Threshold avoidance: Lower threshold to ease decision vs gather info needed to meet appropriate threshold
- Ignore losing side: Losing approach often contains warnings applying to winner. Risks in B don't vanish just because A chosen
→
build-consensus — multi-agent consensus model this adapts to single-agent reasoning
forage-solutions — scouts solution space coherence evaluates; typically precedes this
coordinate-reasoning — manages info flow during multi-path eval
center — baseline needed for unbiased eval
meditate — clears assumptions between evaluating diff approaches