Agent-almanac build-coherence
git clone https://github.com/pjt222/agent-almanac
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/caveman-ultra/skills/build-coherence" ~/.claude/skills/pjt222-agent-almanac-build-coherence-a286b6 && rm -rf "$T"
i18n/caveman-ultra/skills/build-coherence/SKILL.md
Build Coherence
Evaluate competing approaches → independent assessment, explicit reasoning-out-loud advocacy, confidence-calibrated commit thresholds, structured deadlock resolution → coherent decisions from multi-path reasoning.
Use When
- ID'd many valid approaches, must select (see forage-solutions)
- Oscillating between 2 approaches, no commit
- Need to justify decision w/ structured reasoning (arch, tool, impl strategy)
- Prev decision by gut, needs evidence validation
- Internal reasoning → contradictory conclusions, restore coherence
- Before irreversible action (merge, deploy, delete) where wrong = high cost
In
- Required: ≥2 competing approaches
- Optional: Quality assessments from prior scouting (see forage-solutions)
- Optional: Decision stakes (reversible, moderate, irreversible) for threshold calibration
- Optional: Time budget
- Optional: Known failure mode (oscillation, premature commit, groupthink)
Do
Step 1: Independent Evaluation
Assess each on own merits before comparing. Critical: A's assessment doesn't bias B.
For each approach, evaluate independently:
Approach Evaluation Template:

| Dimension | Assessment |
|---|---|
| Approach name | |
| Core mechanism | How does this approach solve the problem? |
| Strengths (2-3) | What does this approach do well? |
| Risks (2-3) | What could go wrong? What is assumed? |
| Evidence quality | How well-supported is this approach? (verified / inferred / speculated) |
| Quality score (0-100) | Overall assessment |
| Confidence (0-100) | How confident in this assessment? |
Fill out each separately. No comparison until all individual evals complete.
→ Independent evals, each on own terms. B's eval doesn't ref A. Scores = real assessment, not ranking.
If err: Evals contaminated (writing "better than A" while assessing B) → reset. Assess A fully, clear frame, assess B fresh. All scores identical → dimensions too coarse, add domain-specific criteria.
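Sketch of the template as a data structure (Python; field names + example values are illustrative, not part of the skill):

```python
from dataclasses import dataclass, field

@dataclass
class ApproachEvaluation:
    """One filled-out template; complete one instance per approach, in isolation."""
    name: str
    core_mechanism: str                  # how this approach solves the problem
    strengths: list[str] = field(default_factory=list)  # 2-3 items
    risks: list[str] = field(default_factory=list)       # 2-3 items, incl. assumptions
    evidence_quality: str = "inferred"   # "verified" | "inferred" | "speculated"
    quality_score: int = 0               # 0-100 overall assessment
    confidence: int = 0                  # 0-100 confidence in this assessment

# Hypothetical example: assessed on its own terms, no reference to other approaches
eval_a = ApproachEvaluation(
    name="A: incremental migration",
    core_mechanism="Move one module at a time behind a compatibility shim",
    strengths=["small reviewable steps", "easy rollback per module"],
    risks=["long window with two code paths", "assumes shim covers all call sites"],
    evidence_quality="verified",
    quality_score=80,
    confidence=70,
)
```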
Step 2: Waggle Dance — Reason Out Loud
Advocate in proportion to quality. AI equivalent of the bee waggle dance: implicit reasoning → explicit + public.
- Each approach, state case — as if presenting to skeptical user:
- "Approach A strong because [evidence]. Main risk [risk], mitigated by [mitigation]."
- Advocacy intensity proportional to quality score:
- High: detailed advocacy + specific evidence
- Medium: brief advocacy + acknowledged limits
- Low: mentioned for completeness, not actively advocated
- Cross-inspection: After advocating A, actively seek evidence supporting B. After B, seek A. Counters confirmation bias
Point of reasoning-out-loud = decision auditable. Can't articulate → assessment shallower than score suggests.
→ Explicit reasoning per approach, persuasive to neutral observer. Cross-inspection reveals ≥1 initially overlooked consideration.
If err: Advocacy perfunctory (motions) → approaches maybe not genuinely diff, just variations. Differ in mechanism or only impl detail? Latter → decision doesn't matter much, pick either, move on.
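Minimal sketch of score → advocacy intensity (Python; the cut-offs are illustrative, not fixed by this skill):

```python
def advocacy_level(quality_score: int) -> str:
    """Map a 0-100 quality score to how much advocacy it earns (illustrative cut-offs)."""
    if quality_score >= 70:
        return "high: detailed advocacy with specific evidence"
    if quality_score >= 40:
        return "medium: brief advocacy with acknowledged limits"
    return "low: mention for completeness, no active advocacy"
```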
Step 3: Quorum Threshold + Commit
Confidence threshold to commit, calibrated to stakes.
Confidence Thresholds by Stakes:

| Decision Type | Threshold | Rationale |
|---|---|---|
| Easily reversible (can undo) | 60% | Cost of trying and reverting is low. Speed matters more than certainty |
| Moderate stakes (costly to reverse) | 75% | Reverting has cost but is possible. Worth investing in evaluation |
| Irreversible or high-stakes | 90% | Cannot undo. Must be confident. If threshold not met, gather more information before deciding |
- Classify stakes
- Check: leading approach quality × confidence ≥ threshold?
- Yes → commit. State decision, reasoning, key risk accepted
- No → ID additional info that raises confidence to threshold
- Committed → don't revisit unless new disqualifying evidence
→ Clear commit moment + stated reasoning. Decision at right confidence for stakes.
If err: Threshold never met (can't hit 90% on irreversible) → ask: truly irreversible? Decomposable into reversible test + irreversible commit? Most apparently irreversible can be staged. Impossible → tell user uncertainty, ask guidance.
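Sketch of the quorum check, assuming quality x confidence combine multiplicatively on a 0-1 scale (the combination rule is an assumption; thresholds from the table above):

```python
# Commit thresholds by stakes, taken from the table above
THRESHOLDS = {
    "reversible": 0.60,
    "moderate": 0.75,
    "irreversible": 0.90,
}

def quorum_reached(quality_score: int, confidence: int, stakes: str) -> bool:
    """True if the leading approach clears the stakes-calibrated commit threshold."""
    effective = (quality_score / 100) * (confidence / 100)
    return effective >= THRESHOLDS[stakes]

# Example: quality 85 at confidence 90 -> 0.765
# enough to commit at moderate stakes, not enough for an irreversible decision
assert quorum_reached(85, 90, "moderate")
assert not quorum_reached(85, 90, "irreversible")
```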
Step 4: Deadlock Resolution
≥2 approaches w/ similar scores + quorum not met for any.
Deadlock Resolution:

| Deadlock Type | Resolution |
|---|---|
| Genuine tie (scores within 5%) | The approaches are equivalent. Pick one and commit. The cost of deliberating exceeds the cost of picking the "wrong" equivalent option. Flip a coin mentally |
| Information deficit (scores uncertain) | The tie exists because evaluation is incomplete. Invest one more specific investigation — a targeted file read, a quick test — then re-score |
| Oscillation (scores keep changing) | Scoring keeps flip-flopping depending on which dimension gets attention. Time-box: set a timer, evaluate once more, commit to the result regardless |
| Approach merge (compatible strengths) | The best parts of A and B can be combined. Check for compatibility. If merge is coherent, use it. If forced, don't — pick one |
→ Deadlock resolved via mechanism. Decisive — no lingering doubt that undermines execution.
If err: Deadlock persists through all strategies → decision premature. Ask user: "2 equally strong approaches: [A], [B]. [Brief case each.] Which aligns w/ priorities?" Delegating genuine tie = not fail, ack decision depends on values AI can't infer.
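Sketch of the deadlock triage (Python; the 5-point tie margin follows the table, the check order and flags are illustrative):

```python
def resolve_deadlock(score_a: int, score_b: int, *, evidence_complete: bool,
                     scores_stable: bool, strengths_compatible: bool) -> str:
    """Pick one resolution strategy for two similarly scored approaches."""
    if not evidence_complete:
        return "information deficit: one targeted investigation, then re-score"
    if not scores_stable:
        return "oscillation: time-box one more evaluation, commit to its result"
    if strengths_compatible:
        return "merge: combine strengths if the merge is coherent; if forced, pick one"
    if abs(score_a - score_b) <= 5:
        return "genuine tie: pick either and commit"
    return "no deadlock: commit to the higher-scoring approach"
```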
Step 5: Coherence Quality
Post-commit: real coherence or just a decision?
- Evidence-based or rubber-stamped initial pref?
- Test: Pref same before + after eval? Eval changed anything?
- Losing approaches genuinely considered or straw men?
- Test: Can articulate strongest case for losing approach?
- What signal triggers reassess?
- Specific obs that would invalidate ("If API doesn't support X, approach B better")
- Useful info from losing approaches for impl?
- Risk in B may apply to A too
→ Brief quality check that confirms decision OR IDs it as weak. Weak → return to earlier step, not proceed on shaky ground.
If err: Quality check reveals pref-based not evidence-based → ack honestly. Sometimes pref all that's available — label as such, not dressed up as analysis.
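Sketch of the post-commit check as a failure list (Python; question wording is illustrative):

```python
def coherence_gaps(checks: dict[str, bool]) -> list[str]:
    """Return the checks that fail; an empty list means the commitment holds up."""
    return [question for question, ok in checks.items() if not ok]

# Hypothetical self-check; any gap -> return to the step it points at, don't proceed
gaps = coherence_gaps({
    "decision is evidence-based, not a rubber-stamped preference": True,
    "evaluation could have changed the outcome": True,
    "can state the strongest case for the losing approach": False,
    "a specific reassess trigger is defined": True,
})
```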
Check
- Each approach evaluated independently before comparison
- Advocacy proportional to quality (not equal regardless of merit)
- Cross-inspection done (counter-evidence after advocacy)
- Quorum threshold calibrated to stakes
- Deadlocked → specific resolution strategy applied
- Post-decision quality check done
- Reassess trigger defined
Traps
- Premature commit: Decide before evaluating all. First approach has anchoring advantage (more mental attention from being first). Evaluate all before comparing
- Equal advocacy, unequal approaches: A=85, B=45 → equal time = wasted effort + false equivalence
- Rubber-stamp: Going through process to justify already-made decision. Test: could eval have changed outcome? If not = theater
- Threshold avoidance: Lower threshold to ease decision vs gather info needed to meet appropriate threshold
- Ignore losing side: Losing approach often contains warnings applying to winner. Risks in B don't vanish just because A chosen
→
build-consensus — multi-agent consensus model this adapts to single-agent reasoning
forage-solutions — scouts solution space coherence evaluates; typically precedes this
coordinate-reasoning — manages info flow during multi-path eval
center — baseline needed for unbiased eval
meditate — clears assumptions between evaluating diff approaches