# code-evolution (claude-skill-registry)

Autonomous multi-agent code evolution system for optimization problems. Use when solving complex optimization problems (packing, geometry, scheduling, search) through evolutionary approaches with multiple independent AI agents. Multi-start hybrid heuristic+SLSQP methods significantly outperform single approaches. Triggers include genetic algorithms, evolutionary optimization, multi-agent problem solving, parameter tuning at scale, AlphaEvolve-style research, or evolving code solutions across generations.
## Installation

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install just this skill into `~/.claude/skills`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/code-evolution" ~/.claude/skills/majiayu000-claude-skill-registry-code-evolution && rm -rf "$T"
```
## Code Evolution (`skills/data/code-evolution/SKILL.md`)
### Architecture

```
orchestrator (you)
├── spawn agents (Task tool, subagent_type='general-purpose')
├── evaluate solutions (run evaluate.py)
├── manage archive (best solutions per generation)
└── plan next generation
```
### Critical Principle: Agent Autonomy
NEVER write solution code yourself. You (the orchestrator) ONLY:
- Create the fixed evaluation harness (read-only for agents)
- Spawn autonomous subagents via Task tool
- Evaluate results using the harness
- Plan next generation based on results
Agents have full autonomy to implement their assigned approach. You don't guide their code - you guide their problem-solving strategy.
### Workflow

#### Phase 0: Setup (Orchestrator Only)
Create the immutable harness - agents can ONLY use, never alter:
- `problems/<name>/problem.md`: problem definition (READ-ONLY for agents)
- `problems/<name>/evaluation/evaluate.py`: evaluation function (FROZEN, not modifiable by agents)
- `problems/<name>/config.json`: benchmark, constraints, metadata
Agents receive paths to these files but cannot modify them.
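As a concrete illustration, a `config.json` for a circle-packing problem might look like the sketch below. The exact keys are an assumption (the skill only specifies "benchmark, constraints, metadata"), so adapt them to whatever your `evaluate.py` actually reads:

```json
{
  "name": "circle-packing-26",
  "benchmark": 2.635,
  "constraints": {
    "n_circles": 26,
    "container": "unit_square"
  },
  "metadata": {
    "objective": "maximize sum of radii",
    "timeout_seconds": 15
  }
}
```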
#### Phase 1: Generation Loop (3-7 generations)
1. Plan Strategies: Design 2-4 different approaches for agents to explore
2. Spawn Agents: Use Task tool with `subagent_type='general-purpose'` (15s timeout per agent)
   - Each agent gets the problem description, their specific approach, and the path to the evaluator
   - Agents write solutions to `generations/gen{N}/agent_{id}.py`
   - Agents run themselves: `subprocess.run([sys.executable, agent_file])`
   - Output: JSON with `"score"` and `"circles"`
3. Evaluate: You run the evaluator on agent outputs (agents cannot run this)
4. Cross-Inspiration: Share winning ideas with next-generation agents for inspiration
5. Prune: Keep only the best 1-2 approaches from the previous generation
6. Archive: Store the best solution to `generations/archive/`
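The spawn-and-parse step of the loop can be sketched as follows. This is a minimal, self-contained illustration of the run-the-agent-as-a-subprocess contract; the inline `AGENT_SOURCE` is a hypothetical stand-in for a file a real agent would write, and the 15-second timeout comes from the workflow above:

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated agent file: a real agent would
# implement its assigned strategy and print {"score": ..., "circles": [...]}.
AGENT_SOURCE = '''
import json
print(json.dumps({"score": 0.42, "circles": [[0.5, 0.5, 0.25]]}))
'''

def run_agent(agent_file: Path, timeout: float = 15.0):
    """Run one agent as a subprocess and parse its JSON stdout.

    Returns None if the agent crashes, times out, or emits invalid JSON,
    so the orchestrator can simply skip it when ranking the generation.
    """
    try:
        result = subprocess.run(
            [sys.executable, str(agent_file)],
            capture_output=True, text=True, timeout=timeout,
        )
        return json.loads(result.stdout)
    except (subprocess.TimeoutExpired, json.JSONDecodeError):
        return None

gen_dir = Path(tempfile.mkdtemp()) / "gen1"
gen_dir.mkdir(parents=True)
agent_file = gen_dir / "agent_0.py"
agent_file.write_text(AGENT_SOURCE)

output = run_agent(agent_file)
print(output)
```

Returning `None` on any failure keeps the orchestrator's ranking loop simple: a broken agent is just absent from the generation's results rather than crashing the whole run.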
#### Phase 2: Cross-Inspiration & Pruning

Between generations:
- Reference winners: Show agents the best previous solution's strategy
- Prune dead approaches: Stop testing approaches that underperform
- Mix winning ideas: Combine best techniques from multiple agents
- Diversify within winners: Vary parameters (seeds, iteration counts, thresholds)
### File Structure

```
problems/<name>/
├── problem.md
├── config.json
├── evaluation/evaluate.py
└── generations/
    ├── gen1/agent_*.py
    └── archive/best_solution.py
```
### Core Design Principles

#### Separation of Concerns
- Orchestrator role: Strategy planning, harness building, result evaluation, pruning
- Agent role: Implementation autonomy within their assigned strategy
- Harness: Frozen, read-only, immutable contract between them
#### Evolution Mechanics
- Diverse exploration (Gen 1-3): Different approaches find different optima
- Cross-inspiration (Gen 2+): Winning ideas inspire next generation
- Pruning (Gen 3+): Kill weak approaches, double down on winners
- Multi-start within winners: Vary parameters of proven strategies (+2-5% improvement)
- Validation first: Invalid solutions score 0 - harness is source of truth
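"Multi-start within winners" means re-running a proven strategy with varied seeds and keeping the best result. The sketch below illustrates the pattern with a toy one-dimensional hill climb so it stays dependency-free; in the real skill the `local_search` slot would hold the winning heuristic (e.g. a greedy packer followed by an SLSQP polish), and the toy objective and parameters here are assumptions for illustration:

```python
import random

def local_search(seed: int, steps: int = 200):
    """One run of a (toy) winning strategy: hill-climb a 1-D objective
    from a seed-dependent start. Returns (score, x)."""
    rng = random.Random(seed)

    def f(x):  # toy objective to maximize, peak at x = 3
        return -(x - 3.0) ** 2

    x = rng.uniform(-10, 10)
    step = 1.0
    for _ in range(steps):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) > f(x):
            x = candidate
        else:
            step *= 0.98  # shrink the neighborhood as progress stalls
    return f(x), x

# Multi-start: same strategy, varied seeds; keep the best run.
best_score, best_x = max(local_search(seed) for seed in range(8))
print(best_score, best_x)
```

The few-percent gains the mechanics above describe come from exactly this: individual runs land in slightly different local optima, and taking the max over seeds recovers the best of them at the cost of a linear factor in compute.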
### Evolution Strategy
| Phase | Generations | Orchestrator Action |
|---|---|---|
| Explore | 1-3 | Spawn 3-4 agents with diverse strategies. Find winners. |
| Prune | After Gen 2-3 | Kill underperforming approaches. Keep 1-2 best. |
| Cross-Inspire | Before Gen 4+ | Share winning solution code/strategy with next agents. |
| Exploit | 4-5 | Spawn agents that refine/combine winning approaches. Vary seeds/params. |
| Polish | 6-7 | Multi-start within best approach. Push toward benchmark. |
### Orchestrator Responsibilities

#### What YOU Do (Never Delegate)
- Create immutable evaluation harness (problem definition, evaluator, config)
- Spawn agents with Task tool
- Analyze results and plan next generation
- Prune: Decide which approaches to continue, which to kill
- Cross-inspire: Extract winning ideas and share with next agents
- Archive best solutions
#### What Agents Do (Full Autonomy)
- Implement their assigned strategy
- Write solution code
- Self-validate before output
- Run themselves and produce JSON output
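An agent's output contract can be sketched as below. The `"score"`/`"circles"` JSON schema comes from the workflow above; the packing logic itself is a deliberately trivial placeholder for whatever strategy the agent was assigned, and the self-validation mirrors the "invalid solutions score 0" rule so the agent never emits a solution the harness would reject:

```python
import json

def solve():
    """Toy stand-in for the agent's assigned strategy: place two
    non-overlapping (x, y, r) circles inside the unit square."""
    return [[0.25, 0.25, 0.25], [0.75, 0.75, 0.25]]

def self_validate(circles) -> bool:
    """Cheap feasibility check before emitting output: circles must stay
    inside the unit square and must not overlap each other."""
    for x, y, r in circles:
        if not (r <= x <= 1 - r and r <= y <= 1 - r):
            return False
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 < (ri + rj) ** 2:
                return False
    return True

circles = solve()
# Report 0 for infeasible output rather than letting the harness reject it.
score = sum(r for _, _, r in circles) if self_validate(circles) else 0.0
print(json.dumps({"score": score, "circles": circles}))
```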
### Cross-Inspiration Strategy
After each generation, extract and communicate:
```markdown
## What Worked
- Agent X achieved Y% with [strategy description]
- Key insight: [what made it work]
- Code reference: [location or snippet]

## What Failed
- Agent Z's [strategy] only achieved W%
- Likely issue: [root cause analysis]
- Don't repeat: [specific thing to avoid]

## Recommended Evolution
- Agents should build on: [winning strategy]
- Vary these parameters: [list of what to try]
- Combine techniques: [which ideas from multiple winners]
```
Agents use this to:
- Understand what works (cross-inspiration)
- Avoid dead ends (prune knowledge)
- Focus effort on proven directions
### References
- Agent spawning: See references/agent-prompts.md
- Evaluator template: See references/evaluator-template.md
### Adding New Problems

1. Create `problems/<name>/problem.md` (objective, constraints, benchmark, format)
2. Create `problems/<name>/config.json` (benchmark value, metadata)
3. Create `problems/<name>/evaluation/evaluate.py` (validate, score, evaluate functions)
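A minimal `evaluate.py` with the three functions named above might look like this. The circle-packing objective (sum of radii in a unit square) is an assumed example, not a prescribed one; the invariant that matters is the last line of `evaluate`: any invalid or unparseable solution scores exactly 0, making the harness the source of truth:

```python
import json

def validate(circles) -> bool:
    """Hard feasibility check: every (x, y, r) circle must lie inside the
    unit square and no two circles may overlap."""
    for x, y, r in circles:
        if r <= 0 or not (r <= x <= 1 - r and r <= y <= 1 - r):
            return False
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            (xi, yi, ri), (xj, yj, rj) = circles[i], circles[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 < (ri + rj) ** 2:
                return False
    return True

def score(circles) -> float:
    """Objective value for a feasible solution: here, sum of radii."""
    return sum(r for _, _, r in circles)

def evaluate(solution_json: str) -> float:
    """Single entry point for the orchestrator: invalid solutions score 0."""
    try:
        circles = json.loads(solution_json)["circles"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0
    return score(circles) if validate(circles) else 0.0

print(evaluate('{"circles": [[0.25, 0.25, 0.25], [0.75, 0.75, 0.25]]}'))
print(evaluate('{"circles": [[0.5, 0.5, 0.9]]}'))
```

Keeping `validate` and `score` as separate functions lets agents import the same feasibility check for self-validation while the orchestrator alone calls `evaluate` for official scoring.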