Mycelium framework-health
Evaluates Mycelium's own process effectiveness, measuring cycle velocity, discard trends, confidence calibration, gate effectiveness, and regression rate. Run quarterly or every 20 completed cycles.
install
source · Clone the upstream repo
git clone https://github.com/haabe/mycelium
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/haabe/mycelium "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/framework-health" ~/.claude/skills/haabe-mycelium-framework-health && rm -rf "$T"
manifest: .claude/skills/framework-health/SKILL.md
Framework Health Check
Mycelium evaluates its own process. This is triple-loop learning — the framework assessing whether it is getting better at producing good outcomes.
When to Use
- Quarterly review (scheduled)
- After 20 completed leaf cycles (triggered by cycle-history.yml count)
- When process friction is suspected
- Before major framework changes (baseline measurement)
Workflow
1. Load Cycle Data
Read canvas/cycle-history.yml. If fewer than 5 cycles are recorded, report:
"Insufficient cycle data for framework health assessment. [N] cycles recorded; minimum 5 needed. Continue recording outcomes."
2. Measure Five Dimensions
For each dimension, compute the metric and compare against trend (if prior assessments exist); an illustrative sketch follows each dimension below:
Cycle Velocity:
- Average days from diamond creation to completion, grouped by scale
- Trend: improving / stable / degrading
- If degrading: flag for investigation
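A sketch of the velocity metric, assuming hypothetical per-cycle fields created, completed (ISO dates), and scale:

```python
# Sketch: average cycle time in days, grouped by scale.
# Assumes hypothetical per-cycle fields "created", "completed"
# (ISO dates) and "scale"; adjust to the real record schema.
from collections import defaultdict
from datetime import date

def velocity_by_scale(cycles):
    buckets = defaultdict(list)
    for c in cycles:
        if c.get("completed"):
            days = (date.fromisoformat(c["completed"])
                    - date.fromisoformat(c["created"])).days
            buckets[c.get("scale", "unknown")].append(days)
    return {scale: sum(d) / len(d) for scale, d in buckets.items()}
```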
Discard Rate:
- Count of discards per lifecycle phase
- Average discard phase (1-10 scale)
- Trend: shifting earlier (good) / shifting later (bad) / stable
- If >50% of discards at Phase 7+: flag "late discard pattern"
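A sketch of the discard metrics, assuming discarded cycles carry a hypothetical discard_phase field (1-10):

```python
# Sketch: discard-phase distribution and the late-discard flag.
# Assumes discarded cycles carry a hypothetical "discard_phase" (1-10).
def discard_stats(cycles):
    phases = [c["discard_phase"] for c in cycles if "discard_phase" in c]
    if not phases:
        return None
    late = sum(1 for p in phases if p >= 7)
    return {
        "avg_phase": sum(phases) / len(phases),
        "late_discard_pattern": late / len(phases) > 0.5,  # >50% at Phase 7+
    }
```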
Confidence Calibration:
- For all cycles with a predicted confidence and a recorded outcome, compute the actual success rate per confidence band (0.3-0.5, 0.5-0.7, 0.7-0.9)
- Compare with the expected rate (cycles at confidence 0.7 should succeed ~70% of the time)
- Report the calibration factor: actual / expected
- If the calibration factor is < 0.8 or > 1.2: flag miscalibration
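A sketch of the calibration check, assuming hypothetical confidence and succeeded fields and reading the expected rate as the band midpoint:

```python
# Sketch: actual success rate per confidence band vs. expected rate.
# Assumes hypothetical per-cycle fields "confidence" (0-1 prediction)
# and "succeeded" (bool outcome); the expected rate is taken as the
# band midpoint, one reasonable reading of "0.7 should succeed ~70%".
BANDS = [(0.3, 0.5), (0.5, 0.7), (0.7, 0.9)]

def calibration(cycles):
    report = {}
    for lo, hi in BANDS:
        band = [c for c in cycles
                if "confidence" in c and "succeeded" in c
                and lo <= c["confidence"] < hi]
        if not band:
            continue
        actual = sum(c["succeeded"] for c in band) / len(band)
        factor = actual / ((lo + hi) / 2)
        report[(lo, hi)] = {
            "factor": round(factor, 2),
            "miscalibrated": factor < 0.8 or factor > 1.2,
        }
    return report
```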
Gate Effectiveness:
- For each theory gate, count: times checked, times passed, times failed
- Compute hit rate: failures / total checks
- Flag rubber stamps (0% failure rate) and hard blocks (>80% failure rate)
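A sketch of the gate tally, assuming hypothetical gate-check records of the form {"gate": name, "passed": bool}:

```python
# Sketch: per-gate hit rate with rubber-stamp / hard-block flags.
# Assumes hypothetical gate-check records: {"gate": name, "passed": bool}.
from collections import Counter

def gate_effectiveness(gate_checks):
    totals, failures = Counter(), Counter()
    for check in gate_checks:
        totals[check["gate"]] += 1
        failures[check["gate"]] += not check["passed"]
    out = {}
    for gate, n in totals.items():
        rate = failures[gate] / n
        out[gate] = {
            "hit_rate": rate,
            "rubber_stamp": rate == 0.0,  # never fails: adds no signal
            "hard_block": rate > 0.8,     # nearly always fails: blocks flow
        }
    return out
```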
Regression Rate:
- Count of diamonds that regressed at least once, divided by total diamonds
- Trend: decreasing (good) / increasing (bad) / stable
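A sketch of the regression metric, assuming each cycle record carries a hypothetical diamond id and regressions count:

```python
# Sketch: share of diamonds that regressed at least once.
# Assumes hypothetical per-cycle fields "diamond" (id) and
# "regressions" (count of observed regressions).
def regression_rate(cycles):
    diamonds = {c["diamond"] for c in cycles if "diamond" in c}
    regressed = {c["diamond"] for c in cycles if c.get("regressions", 0) > 0}
    return len(regressed) / len(diamonds) if diamonds else 0.0
```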
3. Run Threshold Calibration
If the cycle count ≥ minimum_n for any threshold in canvas/thresholds.yml (a minimum_n guard is sketched after this list):
- Apply calibration rules from engine/adaptive-thresholds.md
- Update calibrated values
- Log changes in decision-log.md
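A sketch of the minimum_n guard, assuming a hypothetical thresholds.yml entry shape; the actual calibration rules live in engine/adaptive-thresholds.md and are passed in here as a function:

```python
# Sketch: only touch a threshold once it has enough supporting cycles.
# Assumes a hypothetical thresholds.yml entry shape:
#   ice_advance: {default: 100, calibrated: null, minimum_n: 20}
def maybe_calibrate(name, spec, cycles, calibrate):
    if len(cycles) < spec["minimum_n"]:
        return f"{name}: insufficient data ({len(cycles)}/{spec['minimum_n']} cycles)"
    new_value = calibrate(name, spec, cycles)  # rules from engine/adaptive-thresholds.md
    spec["calibrated"] = new_value
    return f"{name}: calibrated to {new_value} (log in decision-log.md)"
```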
4. Check Goodhart Counter-Metrics
For each dimension, verify that the counter-metric is not degrading (see the sketch after this list):
- Velocity improving BUT outcome quality declining? Flag.
- Earlier discards BUT false positive rate rising? Flag.
- Better calibration BUT decision speed dropping? Flag.
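A sketch of the counter-metric check, pairing each headline metric with its counter-metric; the trend labels are assumed inputs from step 2, and the pairing names here are illustrative:

```python
# Sketch: flag any dimension whose headline metric improved while its
# paired counter-metric degraded. Trend values are assumed inputs
# ("improving" / "stable" / "degrading"); pair names are illustrative.
PAIRS = {
    "cycle_velocity": "outcome_quality",
    "discard_timing": "false_positive_rate",
    "calibration": "decision_speed",
}

def goodhart_flags(trends):
    return [
        f"{metric} improving but {counter} degrading"
        for metric, counter in PAIRS.items()
        if trends.get(metric) == "improving"
        and trends.get(counter) == "degrading"
    ]
```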
5. Generate Dashboard
Output:

## Framework Health Dashboard

Assessment date: [date]
Cycles analyzed: [N]
Period: [date range]

### Dimensions

| Dimension | Current | Trend | Status | Counter-Metric |
|-----------|---------|-------|--------|----------------|
| Cycle velocity | [X days avg] | [improving/stable/degrading] | [healthy/warning/critical] | Outcome quality: [OK/degrading] |
| Discard rate | [avg phase X] | [earlier/stable/later] | [healthy/warning/critical] | False positive rate: [OK/rising] |
| Confidence calibration | [factor X.XX] | [improving/stable/diverging] | [healthy/warning/critical] | Decision speed: [OK/slowing] |
| Gate effectiveness | [see detail] | — | [healthy/warning/critical] | Flow speed: [OK/slowing] |
| Regression rate | [X%] | [decreasing/stable/increasing] | [healthy/warning/critical] | Innovation rate: [OK/declining] |

### Threshold Calibration

| Threshold | Default | Calibrated | Based On | Change |
|-----------|---------|------------|----------|--------|
| ICE advance | 100 | [value or "insufficient data"] | N cycles | [+/-] |
| Confidence factor | 1.0 | [value or "insufficient data"] | N cycles | [+/-] |
| Bakeoff delta | 20% | [value or "insufficient data"] | N bakeoffs | [+/-] |

### Pattern Signals Active

[List any active pattern detector signals from engine/pattern-detector.md]

### Recommendations

[Specific actions based on findings — not generic advice]
Rules
- Never modify thresholds without sufficient data (respect minimum_n)
- Always check counter-metrics before celebrating improvement
- Log all threshold changes in decision-log.md
- If all dimensions are healthy, say so and suggest next review date
Theory Citations
- Argyris: Triple-loop learning (learning how to learn)
- Forsgren: Accelerate (measuring capabilities, not just outputs)
- Goodhart: Counter-metrics for every metric
- Deming: Statistical process control (data-driven threshold adjustment)