Claude-skill-registry calibrate

Run an evidence-seeking calibration roundtable to realign the plan with the North Star. Use when pausing between phases, when agents disagree, when reviewing work, when the user mentions "calibrate" or "realign", or when making decisions that affect the plan.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/calibrate" ~/.claude/skills/majiayu000-claude-skill-registry-calibrate && rm -rf "$T"
manifest: skills/data/calibrate/SKILL.md
source content

Calibrate — Orchestrator

Hard stop. Evidence-based calibration. Realign to North Star.

Pattern: This skill uses the orchestrator-subagent pattern. Each phase runs in a fresh context for optimal performance. See

docs/guides/ORCHESTRATOR_SUBAGENT_PATTERN.md
.

When This Applies

SignalAction
Phase completionRun scheduled calibration
User says "calibrate" or "realign"Run full protocol
Agents disagree on approachRun challenge/synthesis
Drift detectedAd-hoc calibration
User says "/calibrate"Run full protocol

Tool Reference

File Operations

ToolPurpose
Read(north_star_path)
Read North Star Card
Read(requirements_path)
Read REQ-/AC- specs
Write(file_path, content)
Write phase reports

Beads/BV

CommandPurpose
bd list --json
Get all beads with status
bd view <id>
View specific bead
bv --robot-summary
Dependency overview
bv --robot-alerts
Check for issues

Testing

CommandPurpose
pytest
Run test suite
pytest --cov
Coverage check
ubs --staged
Security scan

Historical Context

CommandPurpose
cass search "calibration" --robot --limit 5
Find past calibration decisions
cass search "drift" --robot --days 30
Find recent drift incidents
cm context "calibration for <phase>" --json
Get learned patterns

Evidence Types

TypeExample
Code
src/auth/validator.ts:42
Test
npm test auth
→ PASS
DocURL + excerpt
Measurement"Response: 150ms"
Discriminating testFails A, passes B

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CALIBRATE ORCHESTRATOR                        │
│  - Creates session: sessions/calibrate-{timestamp}/              │
│  - Manages TodoWrite state                                       │
│  - Spawns subagents with minimal context                         │
│  - Passes report_path + summary between phases                   │
└─────────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Coverage Agent │  │   Drift Agent   │  │ Challenge Agent │
│  agents/coverage│  │  agents/drift   │  │ agents/challenge│
│  Fresh context  │  │  Fresh context  │  │  Fresh context  │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         ▼                    ▼                    ▼
    01_coverage.md      02_drift.md        03_challenge.md
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐
│ Synthesize Agent│  │  Report Agent   │ → Final output to user
│agents/synthesize│  │  agents/report  │
│  Fresh context  │  │  Fresh context  │
└────────┬────────┘  └────────┬────────┘
         │                    │
    04_synthesis.md     05_user_report.md

Subagents

PhaseAgentInputOutput
1
agents/coverage.md
requirements, beadscoverage gaps
2
agents/drift.md
North Star, coverage reportdrift items
3
agents/challenge.md
coverage + drift reportstest results
4
agents/synthesize.md
all reportsdecisions + dissent
5
agents/report.md
synthesisuser-facing report

Philosophy

Tests adjudicate, not rhetoric. Pursue verifiable truth, not persuasive agreement.

Key insight (DebateCoder, 2025): "Tests are the medium of disagreement, not rhetoric." Rhetorical debate degrades outcomes—voting alone beats extended debate (

research/003-debate-or-vote.md
).

PrincipleMeaning
Tests over rhetoricDisagreements resolved by test results, not persuasion
Write discriminating testsTests that PASS for one approach, FAIL for another
No compromiseEvidence decides winner; don't average opinions
Preserve dissentIf tests don't discriminate, present both positions to user
User decides when value-dependentIf the "right" answer depends on user preferences, stop and ask

Execution Flow

1. Setup (Orchestrator)

1. Create session directory:
   mkdir -p sessions/calibrate-{timestamp}

2. Initialize TodoWrite with phases:
   - [ ] Phase 1: Coverage Analysis
   - [ ] Phase 2: Drift Detection
   - [ ] Phase 3: Test-Based Challenge
   - [ ] Phase 4: Synthesis
   - [ ] Phase 5: User Report

3. Gather inputs:
   - phase_name: The phase being calibrated
   - north_star_path: Path to North Star Card
   - requirements_path: Path to REQ-*/AC-* file
   - beads_status: bd list --json

2. Phase 1: Coverage Analysis

Spawn:

agents/coverage.md

Input:

{
  "phase_name": "<phase>",
  "session_dir": "sessions/calibrate-{timestamp}",
  "requirements_path": "PLAN/01_requirements.md",
  "beads_status": "<bd list --json output>"
}

Expected output:

{
  "report_path": "sessions/.../01_coverage_report.md",
  "p0_coverage": "4/5 (80%)",
  "gaps_summary": "1 P0 missing bead, 1 P0 missing tests"
}

3. Phase 2: Drift Detection

Spawn:

agents/drift.md

Input:

{
  "phase_name": "<phase>",
  "session_dir": "sessions/calibrate-{timestamp}",
  "north_star_path": "PLAN/00_north_star.md",
  "coverage_report_path": "<from Phase 1>"
}

Expected output:

{
  "report_path": "sessions/.../02_drift_report.md",
  "alignment_summary": "5/7 ALIGNED, 1 DRIFTING, 1 OFF-TRACK",
  "drift_items": ["NS-1: Auth method", "NS-3: Mobile support"]
}

4. Phase 3: Test-Based Challenge

Spawn:

agents/challenge.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "coverage_report_path": "<from Phase 1>",
  "drift_report_path": "<from Phase 2>"
}

Expected output:

{
  "report_path": "sessions/.../03_challenge_report.md",
  "verified_claims": ["NS-1 drift", "NS-3 mobile gap"],
  "unresolved": ["API rate limit assumption"]
}

5. Phase 4: Synthesis

Spawn:

agents/synthesize.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "coverage_report_path": "<from Phase 1>",
  "drift_report_path": "<from Phase 2>",
  "challenge_report_path": "<from Phase 3>"
}

Expected output:

{
  "report_path": "sessions/.../04_synthesis_report.md",
  "decisions": [{"action": "Implement SSO", "priority": "P0"}],
  "user_questions": ["Load test timing?"],
  "preserved_dissent": ["API rate limit adequacy"]
}

6. Phase 5: User Report

Spawn:

agents/report.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "synthesis_report_path": "<from Phase 4>",
  "north_star_path": "PLAN/00_north_star.md"
}

Expected output:

{
  "report_path": "sessions/.../05_user_report.md",
  "summary": {"alignment": "5/7", "blocking": 2},
  "user_questions": ["Load test timing?", "bd-130 scope creep?"]
}

7. Finalize (Orchestrator)

  1. Update TodoWrite (all phases complete)
  2. Read
    05_user_report.md
  3. Present to user
  4. Log changes to
    .beads/change-log.md
    if decisions made

Context Optimization

Why subagents beat monolithic calibration:

MonolithicSubagent Pattern
All context in one windowEach phase gets fresh 200k
"Lost in middle" riskNo degradation
One failure corrupts allPhases are isolated
~3000 token prompt~500 tokens per phase

Research backing:

  • research/056-multi-agent-orchestrator.md
    : +90.2% over single-agent
  • research/004-context-length-hurts.md
    : Context degradation is real

Evidence Standard (All Subagents)

For any non-trivial claim, include at least one:

Evidence TypeExample
Code evidence
src/auth/validator.ts:42
Test evidence
npm test auth
→ PASS
Doc evidenceURL + relevant excerpt
Measurement"Response time: 150ms"
Discriminating testTest that fails one option, passes another

Status Labels

CategoryValues
Beads
SOUND
/
FLAWED
/
UNCERTAIN
Alignment
ALIGNED
/
DRIFTING
/
OFF-TRACK
Assumptions
VERIFIED
/
UNVERIFIED
/
RISKY
Challenges
ACCEPTED
/
REJECTED

Anti-Patterns

Don'tWhy
Compromise for harmonyTruth > harmony
Soften criticismClarity > comfort
Skip pre-workUnprepared = unproductive
Force agreementPreserve dissent
Argue by rhetoricEvidence only
Pass full content between phasesPass paths + summaries

Templates

Located in

.claude/templates/calibration/
:

  • user-report.md
    — Final output to user
  • broadcast.md
    — Agent analysis broadcast
  • response.md
    — Challenge responses
  • decision.md
    — Falsifiable decisions
  • summary.md
    — Agent-to-agent summary
  • change-log-entry.md
    — Plan change records

See Also

  • agents/
    — Subagent definitions
  • docs/guides/ORCHESTRATOR_SUBAGENT_PATTERN.md
    — Pattern documentation
  • docs/workflow/IDEATION_TO_PRODUCTION.md
    — Complete pipeline
  • docs/workflow/PROTOCOLS.md
    — Protocol cards