Mycelium prompt-optimizer
A/B test CLAUDE.md instruction changes against eval benchmarks. Capture baselines, test variants, compare results.
install
source · Clone the upstream repo
git clone https://github.com/haabe/mycelium
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/haabe/mycelium "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/prompt-optimizer" ~/.claude/skills/haabe-mycelium-prompt-optimizer && rm -rf "$T"
manifest:
.claude/skills/prompt-optimizer/SKILL.md
source content
Prompt Optimizer
Systematically improve Mycelium instructions through measurement. Adapted from n-trax.
Commands
baseline -- Capture current performance
- Run /eval-runner run-split optimization and record the results as optimization scores
- Run /eval-runner run-split holdout and record the results as holdout scores
- Record both to .claude/optimization/baseline.json: timestamp, CLAUDE.md hash, optimization metrics, holdout metrics, overall and per-category metrics
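The baseline record described above could be assembled roughly as follows. This is a minimal sketch, not the skill's actual schema: the field names (claude_md_hash, overall, per_category, pass_rate), the choice of SHA-256, and the helper names build_baseline and save_baseline are all assumptions made for illustration. Only the list of recorded items and the output path come from the text.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_baseline(claude_md_text: str, optimization: dict, holdout: dict) -> dict:
    """Assemble a baseline record. All key names are hypothetical."""
    # Hash the instruction file so later runs can tell which CLAUDE.md was measured.
    digest = hashlib.sha256(claude_md_text.encode("utf-8")).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "claude_md_hash": digest,
        # Each split is assumed to carry overall and per-category metrics.
        "optimization": optimization,
        "holdout": holdout,
    }


def save_baseline(record: dict, out_path: Path) -> None:
    """Write the record as JSON, creating parent directories if needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")


# Example metrics, invented for illustration only.
example = build_baseline(
    "# Example CLAUDE.md\n",
    {"overall": {"pass_rate": 0.80}, "per_category": {"planning": {"pass_rate": 0.75}}},
    {"overall": {"pass_rate": 0.78}, "per_category": {"planning": {"pass_rate": 0.70}}},
)
```

Calling save_baseline(example, Path(".claude/optimization/baseline.json")) would then write the record to the location named above.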
test <variant> -- Test a variant
- Read the variant from .claude/optimization/variants/<variant>.md
- Apply the CLAUDE.md changes described
- Run /eval-runner run-split optimization (this is the hill-climbing signal)
- Run /eval-runner run-split holdout (this validates generalization)
- Store results in .claude/optimization/results/<variant>.json
- Compare against baseline. Flag overfitting if optimization improves but holdout degrades.
- Do NOT auto-revert -- let user decide
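The comparison step above can be sketched as a small function. This assumes each split is summarized by a single pass rate between 0 and 1; the function name compare_to_baseline and the dictionary keys are hypothetical, not taken from the skill. The function only reports; in line with the last step, it never reverts anything.

```python
def compare_to_baseline(baseline: dict, variant: dict) -> dict:
    """Compute per-split deltas and an overfitting flag.

    Both arguments are assumed to look like
    {"optimization": <pass rate>, "holdout": <pass rate>}.
    """
    delta_opt = variant["optimization"] - baseline["optimization"]
    delta_holdout = variant["holdout"] - baseline["holdout"]
    return {
        "delta_opt": delta_opt,
        "delta_holdout": delta_holdout,
        # Overfitting: the hill-climbing split improves while the holdout split degrades.
        "overfit": delta_opt > 0 and delta_holdout < 0,
    }


# Invented numbers: optimization goes up, holdout goes down.
result = compare_to_baseline(
    {"optimization": 0.80, "holdout": 0.75},
    {"optimization": 0.85, "holdout": 0.72},
)
print(result["overfit"])
```

A variant that improves both splits, or improves neither, is not flagged; the decision to keep or revert stays with the user.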
report -- Compare all variants
Generate a comparison table with split-aware columns:
| Variant | Opt Pass Rate | Holdout Pass Rate | Delta Opt | Delta Holdout | Overfit? | Decision |
Flag Overfit? = YES when the optimization delta is positive but the holdout delta is negative.
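One way the table might be rendered, assuming each variant's pass rates, deltas, and overfit flag are already known. The row keys, the render_report name, the percentage formatting, and the separator row are illustrative assumptions; only the column headers come from the text. The Decision cell is passed through unchanged, since that choice belongs to the user.

```python
HEADER = (
    "Variant", "Opt Pass Rate", "Holdout Pass Rate",
    "Delta Opt", "Delta Holdout", "Overfit?", "Decision",
)


def render_report(rows: list) -> str:
    """Render variant results as a pipe-delimited table.

    Each row is assumed to be a dict with the hypothetical keys
    variant, opt, holdout, delta_opt, delta_holdout, overfit, decision.
    """
    lines = [
        "| " + " | ".join(HEADER) + " |",
        "|" + "|".join(["---"] * len(HEADER)) + "|",
    ]
    for row in rows:
        cells = [
            row["variant"],
            f"{row['opt']:.1%}",
            f"{row['holdout']:.1%}",
            f"{row['delta_opt']:+.1%}",      # explicit sign on deltas
            f"{row['delta_holdout']:+.1%}",
            "YES" if row["overfit"] else "no",
            row.get("decision", "pending"),  # left to the user
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Invented example row.
print(render_report([
    {"variant": "v1", "opt": 0.85, "holdout": 0.72,
     "delta_opt": 0.05, "delta_holdout": -0.03, "overfit": True},
]))
```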
exemplar <eval-name> -- Capture winning trajectory
After a clean eval win (1 iteration, fast), save the approach to .claude/optimization/exemplars/.
Workflow
- Capture baseline
- Hypothesize an instruction improvement
- Document in variants/ directory
- Test the variant
- Compare via report
- Keep or revert based on data
- Capture exemplars from clean wins