Claude-skills hyperagent
Run a self-referential self-improving agent loop where a meta-agent iteratively modifies a task-agent's code to optimize for any measurable target. Based on Facebook Research's Hyperagents paper (arXiv:2603.19461). Use when asked to "run hyperagent", "self-improve this", "optimize with self-modification", or "evolve this agent/script".
git clone https://github.com/ckorhonen/claude-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ckorhonen/claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/hyperagent" ~/.claude/skills/ckorhonen-claude-skills-hyperagent && rm -rf "$T"
skills/hyperagent/SKILL.md

Hyperagent
Quick Start — Simple Examples
New to Hyperagent? Try these beginner-friendly tasks before the full setup.
1. Optimize a simple Python script to run faster
Say: "Use hyperagent to optimize this script for speed" and paste something like:
```python
# slow_sort.py
def sort_numbers(nums):
    result = []
    while nums:
        smallest = min(nums)
        result.append(smallest)
        nums.remove(smallest)
    return result
```
Hyperagent will benchmark it, propose a faster implementation, and validate the improvement.
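For a sense of what a winning variant looks like, here is the kind of rewrite the meta-agent could propose (a sketch, not guaranteed output; Python's built-in `sorted` uses Timsort, which is O(n log n) versus the O(n²) selection sort above):

```python
# slow_sort.py (candidate variant the meta-agent might propose)
def sort_numbers(nums):
    # Timsort via the built-in runs in O(n log n),
    # versus O(n^2) for the selection sort above.
    return sorted(nums)
```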
2. Improve a prompt to get better answers
Say: "Run hyperagent on this prompt and improve accuracy" with a prompt like:
Summarize this article in one sentence.
The meta-agent iterates on the prompt, measures quality, and keeps improvements that score higher.
3. Make a sorting function more efficient
Say: "Evolve this function with hyperagent" and paste any function. Hyperagent creates a benchmark, runs generations of improvements, and shows you the performance gain per generation.
4. Self-improve any script
Say: "Self-improve this agent/script" and point to any Python file. Hyperagent wraps it in an evaluation loop, proposes modifications, and tracks what works.
The simplest possible setup: create `task.sh` that prints `METRIC score=0.5`, then run `python3 scripts/init_session.py`. From there the loop is fully automated.
Self-referential self-improvement: a meta-agent that modifies a task-agent (and itself) to optimize any measurable objective.
Inspired by Facebook Research's Hyperagents paper (arXiv:2603.19461), which demonstrated that agents combining a task-solver and a self-modifying meta-level into a single editable program can achieve open-ended, compounding improvements that transfer across domains.
How It Works
A hyperagent is a system with two components in a single editable codebase:
- Task Agent — solves the target task (benchmark, code generation, data processing, etc.)
- Meta Agent — analyzes task performance history and proposes modifications to the task agent's code (and optionally its own code)
The key insight from the paper: when the meta-level modification procedure is itself editable, the system can improve not just task performance but also the mechanism that generates future improvements — enabling compounding, transferable gains.
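As a toy illustration of that architecture, the self-contained Python sketch below collapses both roles into one script: the "task agent" is a single parameter, the "meta agent" is a mutation step, and an archive keeps every evaluated variant as a stepping stone. In the real skill the meta-agent is the LLM editing code and evaluation runs a real benchmark; every name here is illustrative, not the skill's actual code.

```python
import random

def evaluate(params):
    # Stand-in benchmark: higher is better, peak at params == 5.0.
    return -abs(params - 5.0)

# Archive of every evaluated variant (generation 0 is the baseline).
archive = [{"params": 0.0, "score": evaluate(0.0), "children": 0}]
for _ in range(20):
    # Parent selection: favor high scores with few children.
    parent = max(archive, key=lambda v: v["score"] - 0.1 * v["children"])
    parent["children"] += 1
    # "Meta agent" proposes a modification (here: a random perturbation).
    candidate = parent["params"] + random.uniform(-1, 1)
    archive.append({"params": candidate, "score": evaluate(candidate), "children": 0})

best = max(archive, key=lambda v: v["score"])
print(f"best params={best['params']:.2f} score={best['score']:.3f}")
```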
Core Principles
- Self-referential modification — The meta-agent can modify the task-agent's code AND its own strategy. Both live in the same editable workspace. This enables metacognitive self-improvement: improving how you improve.
- Population-based exploration (archive) — Don't just keep the best variant; maintain an archive of all successful variants as stepping stones. Parent selection favors high performers with unexplored potential.
- Empirical evaluation gates everything — No change is accepted without measurement. Every candidate is evaluated against the task benchmark with repeated trials.
- Persistent memory and performance tracking — The system maintains a structured history of all experiments, hypotheses, and outcomes. Later generations build on earlier insights; no rediscovering dead ends.
- Transfer across domains — Meta-level improvements (performance tracking, evaluation strategies, hypothesis generation patterns) are domain-agnostic and can be transferred to new tasks.
Available Scripts
- `scripts/common.py` — shared utilities (archive management, metrics, reporting)
- `scripts/init_session.py` — initialize a hyperagent session, scaffold the workspace
- `scripts/run_task.py` — evaluate a task-agent variant and record metrics
- `scripts/log_variant.py` — log an evaluated record, decide disposition, update archive and reports
- `scripts/render_report.py` — generate an HTML report of the full evolutionary history
- `scripts/select_parent.py` — select a parent from the archive for the next generation
Note: There is no `generate_variant.py` script — the meta-agent role (hypothesis generation and code modification) is performed by the LLM agent itself, not by a script.
All scripts are non-interactive, expose `--help`, emit structured JSON on stdout, and keep diagnostics on stderr.
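Because output is structured JSON on stdout, a driver (or the meta-agent itself) can consume it programmatically. A minimal sketch, assuming the output shape shown in the Parent Selection section below:

```python
import json
import subprocess

# Run a script and parse its structured output.
proc = subprocess.run(
    ["python3", "scripts/select_parent.py"],
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)     # structured JSON on stdout
print(result["selected_parent"])     # e.g. "gen-003"
# Diagnostics, if any, were written to proc.stderr.
```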
Default Workflow
1. Initialize the session after defining the optimization target:

```bash
python3 scripts/init_session.py \
  --goal "Improve prompt accuracy on math benchmark" \
  --metric-name accuracy \
  --unit pct \
  --direction higher \
  --task-command ./task.sh \
  --checks-command ./checks.sh \
  --scope src/agent.py \
  --max-generations 50
```

2. Evaluate the baseline (generation 0):

```bash
python3 scripts/run_task.py \
  --id gen-000 \
  --hypothesis "Control: unmodified task agent" \
  --change-summary "No modifications" \
  --baseline \
  --output .hyperagent/gen-000.json
python3 scripts/log_variant.py --input .hyperagent/gen-000.json
```

3. Selection → Modification → Evaluation loop:

```bash
# Select a parent from the archive
python3 scripts/select_parent.py --output .hyperagent/parent.json

# Generate a variant (meta-agent proposes modifications)
# This is where YOU (the LLM agent) act as the meta-agent:
# - Read the parent's code and performance history
# - Hypothesize an improvement
# - Apply code modifications
# - Record what you changed and why

# Evaluate the variant
python3 scripts/run_task.py \
  --id gen-001 \
  --hypothesis "Add chain-of-thought prompting to improve reasoning" \
  --change-summary "Wrap task prompt in step-by-step reasoning template" \
  --parent gen-000 \
  --output .hyperagent/gen-001.json
python3 scripts/log_variant.py --input .hyperagent/gen-001.json
```

4. Render reports at any time:

```bash
python3 scripts/render_report.py
```
Up-Front Q&A
Before starting, gather or confirm:
- Objective — what are we optimizing?
- Primary metric — exact name, unit, direction (lower/higher)
- Task command — the script that runs the task agent and emits `METRIC name=value` lines
- Correctness gates — tests or checks that must pass for a variant to be kept
- Scope — which files can the meta-agent modify?
- Meta-scope — can the meta-agent modify its own strategy? (default: yes)
- Generation budget — max generations before stopping
- Minimum improvement threshold — default 1%
Workspace Setup
- Prefer a dedicated worktree on a fresh branch:

```bash
git worktree add ../hyperagent-<goal>-<date> -b hyperagent/<goal>-<date>
```

- Create:
  - `hyperagent.md` — checked in, durable session brief with full evolutionary history
  - `task.sh` — checked in, benchmark runner (emits `METRIC name=value`)
  - `checks.sh` — checked in, correctness gates
  - `.hyperagent/` — local artifact directory, NOT checked in
- Ensure artifacts stay untracked:

```bash
rg -qxF '.hyperagent/' .git/info/exclude || printf '\n.hyperagent/\n' >> .git/info/exclude
```
The Meta-Agent Role
You (the LLM) are the meta-agent. Your job each generation is:
- Select parent — use `scripts/select_parent.py` or choose based on the archive
- Analyze — read the parent's code, performance history, and past experiment outcomes
- Hypothesize — propose a specific, testable modification with a causal theory for why it should help
- Modify — apply code changes to the task agent (and optionally to your own strategy notes in `hyperagent.md`)
- Evaluate — run `scripts/run_task.py` to measure the variant
- Log — use `scripts/log_variant.py` to record the result and update the archive
- Reflect — update `hyperagent.md` with what you learned
Meta-Level Self-Modification
The meta-agent can improve its own process by updating:
- Strategy notes in `hyperagent.md` (hypothesis generation patterns, evaluation heuristics)
- Memory entries in `.hyperagent/memory.jsonl` (qualitative insights, correction plans)
- The task evaluation protocol (adding secondary metrics, changing trial counts)
These meta-improvements compound across generations and transfer to new tasks.
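The skill does not prescribe an exact schema for `.hyperagent/memory.jsonl`; as a minimal sketch, appending one entry with illustrative field names could look like this:

```python
import json
from datetime import datetime, timezone

# Hypothetical memory entry; field names are illustrative, not prescribed.
entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "kind": "dead_end",
    "insight": "Longer prompts past ~500 tokens reduced accuracy",
    "correction": "Prefer compact few-shot examples",
}
with open(".hyperagent/memory.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```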
Required Files
hyperagent.md

The durable contract and evolutionary history. A fresh agent can resume from this.

```markdown
# Hyperagent: <goal>

## Objective
<What is being optimized and why.>

## Configuration
- Primary metric:
- Unit:
- Direction:
- Minimum improvement: X%
- Task command:
- Correctness gates:
- Generation budget:

## Scope
- Task agent files:
- Meta-agent can self-modify: yes/no

## Archive
`.hyperagent/archive.jsonl`

## Lineage
<Tree showing parent→child relationships and which variants were kept>

## Meta-Strategy
<Current approach to hypothesis generation — updated as the meta-agent learns>

## What We've Learned
<Key wins, dead ends, transferable insights>

## Performance Tracking
<Best variant, improvement trajectory, current plateau status>
```
task.sh
Bash script that runs the task agent and emits `METRIC name=value` lines:

```bash
#!/bin/bash
set -euo pipefail

# Run the task agent
python3 src/agent.py --input data/test.json 2>/dev/null

# The agent script should emit: METRIC accuracy=0.85
```
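For the agent side of that contract, a minimal `src/agent.py` sketch follows. Only the final METRIC line on stdout is required by the skill; the input format and the stub task logic are assumptions for illustration:

```python
# src/agent.py (minimal sketch; only the METRIC line is required)
import argparse
import json

def solve(case):
    # Domain-specific task logic goes here; echoing the input is a stub.
    return case.get("question")

parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True)
args = parser.parse_args()

with open(args.input) as f:
    cases = json.load(f)

correct = sum(1 for c in cases if solve(c) == c.get("expected"))
print(f"METRIC accuracy={correct / len(cases):.2f}")
```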
Archive Structure
The archive (`.hyperagent/archive.jsonl`) stores every variant ever evaluated:

```json
{
  "id": "gen-007",
  "generation": 7,
  "parent_id": "gen-003",
  "timestamp": "2026-03-27T20:00:00Z",
  "hypothesis": "Add few-shot examples to improve pattern recognition",
  "change_summary": "Inserted 3 domain-specific examples into the task prompt",
  "files_touched": ["src/agent.py"],
  "metric_name": "accuracy",
  "direction": "higher",
  "warmup_trials": [0.82, 0.83],
  "measured_trials": [0.85, 0.86, 0.84, 0.85, 0.87],
  "summary": {"median": 0.85, "mean": 0.854, "min": 0.84, "max": 0.87},
  "checks": "passed",
  "disposition": "keep",
  "children_count": 0,
  "meta_modifications": ["Updated strategy notes with few-shot pattern"],
  "reason": "Improved by 3.2% over parent gen-003 (0.824). Checks passed."
}
```
Parent Selection
Selection probability for a parent is proportional to:
- Performance score (higher-performing variants are favored)
- Inverse of children count (favor unexplored high-performers)
This balances exploitation (good variants) with exploration (understudied variants).
```bash
python3 scripts/select_parent.py
# Output: {"selected_parent": "gen-003", "score": 0.824, "children": 1, "reason": "High performer with few children"}
```
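The actual logic lives in `scripts/select_parent.py`; as a sketch of the rule described above, weighting each kept variant by score times the inverse of (children + 1) and then sampling could look like this (assumes positive, higher-is-better scores):

```python
import json
import random

with open(".hyperagent/archive.jsonl") as f:
    records = [json.loads(line) for line in f]

candidates = [r for r in records if r["disposition"] == "keep"]
# Weight: performance divided by (children_count + 1), favoring
# high performers that are still underexplored.
weights = [r["summary"]["median"] / (r["children_count"] + 1) for r in candidates]
parent = random.choices(candidates, weights=weights, k=1)[0]
print(parent["id"])
```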
Decision Rules
- `keep` — variant beats current best by ≥ threshold, checks pass
- `discard` — variant is worse, equal, or improvement below threshold
- `checks_failed` — metric improved but correctness gates failed
- `crash` — variant could not be evaluated
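A compact sketch of these rules as a function (assumes a higher-is-better metric and a fractional threshold, e.g. 0.01 for the default 1% minimum improvement):

```python
def disposition(median, best_so_far, checks_passed, crashed=False, threshold=0.01):
    """Map a variant's evaluation to one of the four dispositions."""
    if crashed:
        return "crash"
    if median > best_so_far and not checks_passed:
        return "checks_failed"
    if checks_passed and median >= best_so_far * (1 + threshold):
        return "keep"
    return "discard"

print(disposition(0.86, 0.824, checks_passed=True))  # keep (+4.4% > 1%)
```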
Plateau Detection
Track improvement velocity. Stop or pivot when:
- 3+ consecutive generations with no improvement
- Hypothesis diversity drops (recycling ideas)
- Improvement velocity < 0.1% per generation over last 5
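One way to make this concrete (a sketch; hypothesis diversity is harder to quantify and is left out here) is to track per-generation best-so-far gains as fractions and test both stop conditions:

```python
def plateaued(gains, window=5, velocity_floor=0.001):
    """gains[i] is the best-so-far improvement at generation i (fractional)."""
    streak = 0
    for delta in reversed(gains):   # count trailing no-gain generations
        if delta > 0:
            break
        streak += 1
    recent = gains[-window:]
    velocity = sum(recent) / len(recent) if recent else 0.0
    return streak >= 3 or velocity < velocity_floor

print(plateaued([0.03, 0.0, 0.0, 0.0]))  # True: 3 flat generations
```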
Loop Behavior
Run autonomously until:
- Generation budget exhausted
- Plateau detected (3 consecutive non-improvements)
- All promising hypotheses explored
- User interrupts
During the loop:
- One hypothesis per generation
- Record dead ends explicitly
- Keep the worktree clean between variants (revert discarded changes)
- Update `hyperagent.md` after every generation
Common Pitfalls
1. Meta-Agent Overfitting Its Own Strategy
Symptom: Meta-strategy becomes over-specialized to early successes.
Fix: Periodically review and broaden the strategy; try categorically different approaches.
2. Archive Bloat
Symptom: Archive grows large, selection becomes slow.
Fix: Archive old generations after 50 variants; maintain a compact summary.
3. Self-Modification Destabilizing the Loop
Symptom: Meta-agent modifies evaluation or logging in ways that break the loop.
Fix: Keep outer-loop scripts (init, run, log, select) immutable. Only modify task code and strategy notes.
4. Hypothesis Recycling
Symptom: Later generations retry earlier failed ideas.
Fix: Always read `.hyperagent/memory.jsonl` before proposing. Explicitly check against dead ends.
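A quick pre-proposal check along these lines (a sketch using the illustrative memory schema from earlier; real matching may want more than substring comparison):

```python
import json

def already_tried(hypothesis, memory_path=".hyperagent/memory.jsonl"):
    """Return True if a similar dead-end insight is already recorded."""
    with open(memory_path) as f:
        for line in f:
            entry = json.loads(line)
            if (entry.get("kind") == "dead_end"
                    and hypothesis.lower() in entry.get("insight", "").lower()):
                return True
    return False
```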
Transfer Protocol
To transfer meta-improvements to a new domain:
- Extract the meta-strategy from the `hyperagent.md` "What We've Learned" section
- Copy `.hyperagent/memory.jsonl` as starting knowledge
- Initialize the new session with the transferred strategy as initial context
- The meta-agent starts with accumulated wisdom instead of from scratch
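Mechanically, the memory-copy step can be as simple as copying the file into the new worktree (paths here are illustrative):

```python
import shutil
from pathlib import Path

src = Path("../old-session/.hyperagent/memory.jsonl")  # previous workspace
dst = Path(".hyperagent/memory.jsonl")                 # new workspace
dst.parent.mkdir(exist_ok=True)
shutil.copy(src, dst)
```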
Report Generation
```bash
python3 scripts/render_report.py
```

Generates `.hyperagent/report.html` with:
- Lineage tree visualization
- Performance over generations
- Best-so-far trend
- Disposition breakdown
- Per-variant trial distributions
- Meta-strategy evolution timeline