Mycelium framework-health

Evaluate Mycelium's own process effectiveness. Measures cycle velocity, discard trends, confidence calibration, gate effectiveness, regression rate. Run quarterly or every 20 cycles.

install
source · Clone the upstream repo
git clone https://github.com/haabe/mycelium
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/haabe/mycelium "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/framework-health" ~/.claude/skills/haabe-mycelium-framework-health && rm -rf "$T"
manifest: .claude/skills/framework-health/SKILL.md
source content

Framework Health Check

Mycelium evaluates its own process. This is triple-loop learning — the framework assessing whether it is getting better at producing good outcomes.

When to Use

  • Quarterly review (scheduled)
  • After 20 completed leaf cycles (triggered by cycle-history.yml count)
  • When process friction is suspected
  • Before major framework changes (baseline measurement)

Workflow

1. Load Cycle Data

Read

canvas/cycle-history.yml
. If fewer than 5 cycles recorded, report: "Insufficient cycle data for framework health assessment. [N] cycles recorded; minimum 5 needed. Continue recording outcomes."

2. Measure Five Dimensions

For each dimension, compute the metric and compare against trend (if prior assessments exist):

Cycle Velocity:

  • Average days from diamond creation to completion, grouped by scale
  • Trend: improving / stable / degrading
  • If degrading: flag for investigation

Discard Rate:

  • Count of discards per lifecycle phase
  • Average discard phase (1-10 scale)
  • Trend: shifting earlier (good) / shifting later (bad) / stable
  • If >50% of discards at Phase 7+: flag "late discard pattern"

Confidence Calibration:

  • For all cycles with predicted confidence and actual outcome:
    • Compute: actual success rate per confidence band (0.3-0.5, 0.5-0.7, 0.7-0.9)
    • Compare with expected rate (confidence 0.7 should succeed ~70%)
    • Report calibration factor: actual/expected
  • If calibration factor < 0.8 or > 1.2: flag miscalibration

Gate Effectiveness:

  • For each theory gate, count: times checked, times passed, times failed
  • Compute hit rate: failures / total checks
  • Flag rubber stamps (0% failure rate) and hard blocks (>80% failure rate)

Regression Rate:

  • Count diamonds that regressed at least once / total diamonds
  • Trend: decreasing (good) / increasing (bad) / stable

3. Run Threshold Calibration

If cycle count ≥ minimum_n for any threshold in

canvas/thresholds.yml
:

  • Apply calibration rules from
    engine/adaptive-thresholds.md
  • Update calibrated values
  • Log changes in decision-log.md

4. Check Goodhart Counter-Metrics

For each dimension, verify the counter-metric is not degrading:

  • Velocity improving BUT outcome quality declining? Flag.
  • Earlier discards BUT false positive rate rising? Flag.
  • Better calibration BUT decision speed dropping? Flag.

5. Generate Dashboard

Output

## Framework Health Dashboard

Assessment date: [date]
Cycles analyzed: [N]
Period: [date range]

### Dimensions

| Dimension | Current | Trend | Status | Counter-Metric |
|-----------|---------|-------|--------|----------------|
| Cycle velocity | [X days avg] | [improving/stable/degrading] | [healthy/warning/critical] | Outcome quality: [OK/degrading] |
| Discard rate | [avg phase X] | [earlier/stable/later] | [healthy/warning/critical] | False positive rate: [OK/rising] |
| Confidence calibration | [factor X.XX] | [improving/stable/diverging] | [healthy/warning/critical] | Decision speed: [OK/slowing] |
| Gate effectiveness | [see detail] | — | [healthy/warning/critical] | Flow speed: [OK/slowing] |
| Regression rate | [X%] | [decreasing/stable/increasing] | [healthy/warning/critical] | Innovation rate: [OK/declining] |

### Threshold Calibration

| Threshold | Default | Calibrated | Based On | Change |
|-----------|---------|-----------|----------|--------|
| ICE advance | 100 | [value or "insufficient data"] | N cycles | [+/-] |
| Confidence factor | 1.0 | [value or "insufficient data"] | N cycles | [+/-] |
| Bakeoff delta | 20% | [value or "insufficient data"] | N bakeoffs | [+/-] |

### Pattern Signals Active

[List any active pattern detector signals from engine/pattern-detector.md]

### Recommendations

[Specific actions based on findings — not generic advice]

Rules

  • Never modify thresholds without sufficient data (respect minimum_n)
  • Always check counter-metrics before celebrating improvement
  • Log all threshold changes in decision-log.md
  • If all dimensions are healthy, say so and suggest next review date

Theory Citations

  • Argyris: Triple-loop learning (learning how to learn)
  • Forsgren: Accelerate (measuring capabilities, not just outputs)
  • Goodhart: Counter-metrics for every metric
  • Deming: Statistical process control (data-driven threshold adjustment)