Battle Skill
Install from the claude-skill-registry repository (skill source: skills/data/battle/SKILL.md):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/battle" ~/.claude/skills/majiayu000-claude-skill-registry-battle && rm -rf "$T"
Red vs Blue Team Security Competition Orchestrator
Pits a Red Team (attack) against a Blue Team (defense) in a long-running competitive loop. Each team can draw on all of the skills in .pi/skills to attack or defend a target codebase.
Architecture
Based on research into the RvB framework, DARPA AIxCC, and Microsoft PyRIT:
Battle Orchestrator
- Game loop (RvB pattern)
- Concurrent Red/Blue execution
- Entropy-driven termination
- Checkpointing for overnight runs
The orchestrator runs each team in its own thread:
- Red Team (thread), skills: hack, memory
- Blue Team (thread), skills: anvil, memory
Both teams work against the Digital Twin, which runs in one of three modes:
- git_worktree: Red attacks the arena, Blue patches the workspace, with cherry-picks to test
- docker: isolated containers on a battle network
- qemu: emulated firmware with GDB attach points
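One way to picture the orchestrator's "Concurrent Red/Blue execution" is one thread per team for each round. The following is a minimal sketch using Python's standard concurrent.futures. It assumes each team is a hypothetical callable that takes the round number and returns a list of results. This document does not describe the real orchestrator's thread interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical team turn: takes the round number, returns that round's results.
TeamTurn = Callable[[int], List[str]]

def run_round_concurrently(round_no: int, red: TeamTurn, blue: TeamTurn) -> Dict[str, List[str]]:
    """Run the Red and Blue turns for one round in parallel threads and wait for both."""
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="battle") as pool:
        red_future = pool.submit(red, round_no)
        blue_future = pool.submit(blue, round_no)
        # .result() blocks until the thread finishes and re-raises any exception it raised.
        return {"red": red_future.result(), "blue": blue_future.result()}

if __name__ == "__main__":
    print(run_round_concurrently(1, red=lambda k: [f"finding-r{k}"], blue=lambda k: [f"patch-r{k}"]))
```

Waiting on both futures before returning keeps rounds aligned. Neither team can start round k + 1 while the other is still in round k.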
Digital Twin Modes
The battle skill supports multiple target types through its Digital Twin system:
1. Source Code (git_worktree)
For battling over git repositories. Creates isolated git worktrees for each team.
./run.sh battle /path/to/repo --rounds 100
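A per-team worktree can be created with the standard git worktree add command. The following is a minimal sketch, assuming the Red arena and Blue workspace from the architecture are both detached at the repository's current HEAD. The directory naming scheme and function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

def worktree_commands(repo: str, battle_id: str, base_dir: str) -> List[List[str]]:
    """Build `git worktree add` commands for a Red arena and a Blue workspace.

    Both worktrees start from the same commit but are separate checkouts,
    so neither team writes into the other's files.
    """
    commands = []
    for name in ("arena", "workspace"):  # Red attacks the arena, Blue patches the workspace
        path = str(Path(base_dir) / f"{battle_id}_{name}")  # hypothetical naming scheme
        commands.append(["git", "-C", repo, "worktree", "add", "--detach", path, "HEAD"])
    return commands

def create_worktrees(repo: str, battle_id: str, base_dir: str) -> None:
    for cmd in worktree_commands(repo, battle_id, base_dir):
        subprocess.run(cmd, check=True)  # raise if git reports an error
```

Building the argument lists separately from running them keeps the layout easy to inspect before anything touches the repository.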
2. Docker Container (docker)
For battling over containerized applications. Spins up separate containers for each team.
Using a Docker image:
./run.sh battle --docker-image nginx:latest --rounds 100
Using a Dockerfile in the target directory:
./run.sh battle /path/with/Dockerfile --mode docker
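The Docker mode's "isolated containers" on a "battle network" map naturally onto standard docker CLI commands. The following is a minimal sketch, assuming one container per team attached to a single user-defined network. The network and container names are hypothetical, and so is the choice to put both teams on the same network.

```python
from typing import List

def docker_commands(image: str, battle_id: str) -> List[List[str]]:
    """Build docker CLI commands: one battle network plus one container per team."""
    network = f"{battle_id}_net"  # hypothetical network name
    commands = [["docker", "network", "create", network]]
    for team in ("red", "blue"):
        commands.append([
            "docker", "run", "--detach",
            "--name", f"{battle_id}_{team}",  # hypothetical container name
            "--network", network,
            image,
        ])
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True). Teardown would be docker rm -f for each container followed by docker network rm.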
3. Firmware/Microprocessor (qemu)
For battling over firmware and embedded systems. Boots the firmware in the QEMU emulator.
Auto-detect the architecture from the ELF header:
./run.sh battle firmware.elf --rounds 100
Specify the machine type explicitly:
./run.sh battle firmware.bin --qemu-machine arm
./run.sh battle firmware.bin --qemu-machine riscv64
./run.sh battle bios.rom --qemu-machine x86_64
Supported QEMU machines:
- arm: ARM Cortex-M (STM32, etc.)
- aarch64: ARM64
- riscv32 / riscv64: RISC-V
- x86_64 / i386: x86
- mips: MIPS (routers, embedded)
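Auto-detection works because an ELF file records its word size (e_ident[EI_CLASS]), byte order (e_ident[EI_DATA]), and target CPU (e_machine at offset 18). The following is a minimal sketch, assuming the detector only needs to map those fields onto the machine names above. The function names are hypothetical, and mapping every 32-bit ARM ELF to arm is an assumption. The e_machine constants are the standard ELF values. Raw images such as firmware.bin or bios.rom have no ELF header, which is why they need --qemu-machine.

```python
import struct
from typing import Optional

# Standard ELF e_machine values (as defined in <elf.h>).
EM_386, EM_MIPS, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV = 3, 8, 40, 62, 183, 243

def detect_qemu_machine(header: bytes) -> Optional[str]:
    """Map the first 20 bytes of an ELF file to a QEMU machine name, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not ELF (raw .bin/.rom): caller must pass --qemu-machine
    ei_class = header[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    (e_machine,) = struct.unpack_from(">H" if ei_data == 2 else "<H", header, 18)
    if e_machine == EM_RISCV:
        return "riscv64" if ei_class == 2 else "riscv32"
    return {
        EM_386: "i386",
        EM_MIPS: "mips",
        EM_ARM: "arm",  # assumption: 32-bit ARM targets are treated as Cortex-M
        EM_X86_64: "x86_64",
        EM_AARCH64: "aarch64",
    }.get(e_machine)

def detect_from_file(path: str) -> Optional[str]:
    with open(path, "rb") as f:
        return detect_qemu_machine(f.read(20))
```

Reading the byte order before decoding e_machine matters for big-endian MIPS router firmware, where a little-endian read would return the wrong machine.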
4. Copy Mode (fallback)
For non-git directories. Creates simple file copies for each team.
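Copy mode needs only a full directory copy per team. The following is a minimal sketch using shutil.copytree, assuming one copy per team under a shared base directory. The naming scheme and function name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict

def create_team_copies(target: str, battle_id: str, base_dir: str) -> Dict[str, Path]:
    """Copy mode: give each team its own full copy of a non-git target directory."""
    copies: Dict[str, Path] = {}
    for team in ("red", "blue"):
        dest = Path(base_dir) / f"{battle_id}_{team}"  # hypothetical naming scheme
        shutil.copytree(target, dest)  # raises FileExistsError if dest already exists
        copies[team] = dest
    return copies

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "target"
        src.mkdir()
        (src / "app.py").write_text("print('hello')\n")
        copies = create_team_copies(str(src), "battle_demo", tmp)
        # Blue edits its copy; Red's copy keeps the original contents.
        (copies["blue"] / "app.py").write_text("print('patched')\n")
        print((copies["red"] / "app.py").read_text().strip())
```

Unlike worktrees, plain copies carry no history, so there is nothing to cherry-pick. Comparing Blue's copy against the original target is the only way to see what changed.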
Commands
Start a battle (10 rounds for testing):
./run.sh battle /path/to/codebase --rounds 10
Start an overnight battle (1000 rounds):
./run.sh battle /path/to/codebase --overnight
Battle a Docker container:
./run.sh battle --docker-image myapp:latest --rounds 100
Battle firmware with QEMU:
./run.sh battle firmware.bin --qemu-machine arm --rounds 100
Check battle status:
./run.sh status
Resume an interrupted battle:
./run.sh resume <battle-id>
Generate a report from a completed battle:
./run.sh report <battle-id>
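Resuming by battle ID requires that each round's state be persisted safely, which matters most for 1000-round overnight runs. The following is a minimal sketch of orchestrator-level checkpointing. It assumes a hypothetical checkpoints/<battle_id>.json file and a state dict holding the round number and scores. It covers only the JSON state, not the QEMU snapshots or team memories that a full checkpoint also saves. The battle ID format follows the example output.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional

def new_battle_id(now: Optional[datetime] = None) -> str:
    """Same shape as the example output's IDs, e.g. battle_20250128_221500."""
    return (now or datetime.now()).strftime("battle_%Y%m%d_%H%M%S")

def save_checkpoint(state: dict, checkpoint_dir: str) -> Path:
    """Write state to <checkpoint_dir>/<battle_id>.json without leaving a half-written file."""
    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{state['battle_id']}.json"
    tmp = directory / f"{state['battle_id']}.json.tmp"
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)  # rename within one filesystem: readers see old or new, never partial
    return path

def load_checkpoint(battle_id: str, checkpoint_dir: str) -> dict:
    return json.loads((Path(checkpoint_dir) / f"{battle_id}.json").read_text())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bid = new_battle_id()
        save_checkpoint({"battle_id": bid, "round": 42, "scores": {"red": 120, "blue": 270}}, tmp)
        state = load_checkpoint(bid, tmp)
        print(f"resume {bid} at round {state['round'] + 1}")
```

Under this scheme, resume <battle-id> would load the checkpoint and continue from round + 1. A crash mid-write leaves the previous round's checkpoint intact.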
Scoring System (AIxCC-style)
| Metric | Weight | Description |
|---|---|---|
| Vulnerability Discovery | 1x | Red team finds vulnerability |
| Exploit Proof | +0.5x | Red team proves exploitability |
| Successful Patch | 3x | Blue team patches vulnerability |
| Time Decay | Variable | Faster responses score higher |
| Functionality Preserved | Required | Patches must not break code |
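The table's weights can be turned into per-team points. The following is a minimal sketch that takes the 1x, +0.5x, and 3x weights from the table. The time-decay curve (halving every 5 rounds) and the rounds_elapsed fields are hypothetical, because the document only says "Variable". The Functionality Preserved gate is modeled as a passing-tests flag.

```python
from dataclasses import dataclass
from typing import Iterable

# Weights from the scoring table; everything else here is an assumption.
DISCOVERY_WEIGHT = 1.0      # Vulnerability Discovery: 1x
EXPLOIT_PROOF_BONUS = 0.5   # Exploit Proof: +0.5x on top of the discovery
PATCH_WEIGHT = 3.0          # Successful Patch: 3x

@dataclass
class Finding:
    exploit_proven: bool
    rounds_elapsed: int   # hypothetical: rounds taken to report the finding

@dataclass
class Patch:
    tests_pass: bool      # Functionality Preserved: a patch that breaks the code scores 0
    rounds_elapsed: int   # hypothetical: rounds between the finding and the patch

def time_decay(rounds_elapsed: int, half_life: float = 5.0) -> float:
    """Hypothetical decay curve: an event's value halves every `half_life` rounds."""
    return 0.5 ** (rounds_elapsed / half_life)

def red_points(findings: Iterable[Finding]) -> float:
    return sum(
        (DISCOVERY_WEIGHT + (EXPLOIT_PROOF_BONUS if f.exploit_proven else 0.0))
        * time_decay(f.rounds_elapsed)
        for f in findings
    )

def blue_points(patches: Iterable[Patch]) -> float:
    return sum(PATCH_WEIGHT * time_decay(p.rounds_elapsed) for p in patches if p.tests_pass)
```

With no decay, three unproven findings give Red 3 points and three passing patches give Blue 9 points. That happens to match the round-1 split in the example battle.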
Scores
- TDSR (True Defense Success Rate): Vulnerabilities fixed AND code works
- FDSR (Fake Defense Success Rate): Attack blocked but code broken
- ASC (Attack Success Count): Total unique exploits discovered
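The three scores reduce to counting outcomes. The following is a minimal sketch, assuming TDSR and FDSR are fractions of all attacks Blue responded to. The document names the metrics but not their denominators. The dedup key for ASC (for example, CWE plus location) is a hypothetical choice left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class DefenseOutcome:
    attack_blocked: bool   # the exploit no longer works after Blue's patch
    tests_pass: bool       # regression tests still pass

def defense_rates(outcomes: List[DefenseOutcome]) -> Tuple[float, float]:
    """Return (TDSR, FDSR) as fractions of all defended attacks."""
    if not outcomes:
        return 0.0, 0.0
    true_defenses = sum(1 for o in outcomes if o.attack_blocked and o.tests_pass)
    fake_defenses = sum(1 for o in outcomes if o.attack_blocked and not o.tests_pass)
    return true_defenses / len(outcomes), fake_defenses / len(outcomes)

def attack_success_count(exploit_keys: Iterable[str]) -> int:
    """ASC: number of unique exploits, deduplicated by a caller-chosen key."""
    return len(set(exploit_keys))
```

Keeping FDSR separate from TDSR stops Blue from scoring by simply deleting vulnerable code: such a patch blocks the attack but fails the tests.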
Game Loop (Learning-Based)
Each round follows a learn → act → reflect pattern:
Round k:
1. Research phase (budget: 3 research calls max)
   - Red Team: recall past attack attempts; query /dogpile for new exploitation techniques; review the opponent's patterns
   - Blue Team: recall past defenses; query /dogpile for hardening strategies; analyze how attacks evolve
2. Action phase
   - Red Team attack: execute the learned strategy; run AFL++ fuzzing with coverage; collect crashes and findings; tag findings with /taxonomy
   - Blue Team defense: apply patches via anvil; verify via a QCOW2 overlay; run regression tests; tag patches with /taxonomy
3. Reflection phase (both teams)
   - Archive the round episode (actions, outcomes, learnings)
   - Store successful strategies in /memory
   - Update beliefs about the opponent's capabilities
   - Evolve the strategy for the next round
4. Scoring and checkpoint
   - Calculate AIxCC-style scores
   - Check termination conditions
   - Save a checkpoint (QEMU state + team memories)
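The research → action → reflection cycle can be sketched as a small loop. The following is a minimal sketch, assuming the research budget of 3 calls applies per team per round. An in-memory list stands in for /memory, and a placeholder string stands in for a /dogpile result. The Team class, play_round, and the placeholder action phase are all hypothetical. Scoring and checkpointing (phase 4) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

RESEARCH_BUDGET = 3  # "3 research calls max", assumed to be per team per round

@dataclass
class Team:
    name: str
    memory: List[str] = field(default_factory=list)  # stand-in for the team's /memory store
    research_calls: int = 0

    def research(self, queries: List[str]) -> List[str]:
        """Research phase: recall past episodes, then spend at most RESEARCH_BUDGET queries."""
        notes = list(self.memory)  # recall past attempts or defenses
        for q in queries[:RESEARCH_BUDGET]:
            self.research_calls += 1
            notes.append(f"research:{q}")  # placeholder for a /dogpile result
        return notes

    def reflect(self, round_no: int, outcome: str) -> None:
        """Reflection phase: archive the episode so later rounds can recall it."""
        self.memory.append(f"round_{round_no:03d}:{outcome}")

def play_round(round_no: int, red: Team, blue: Team, queries: List[str]) -> Dict[str, str]:
    for team in (red, blue):
        team.research_calls = 0  # the budget resets every round
    red_notes = red.research(queries)
    blue_notes = blue.research(queries)
    # Action phase placeholders: real teams would fuzz (Red) or patch and test (Blue).
    outcomes = {
        "red": f"attack using {len(red_notes)} notes",
        "blue": f"defense using {len(blue_notes)} notes",
    }
    red.reflect(round_no, outcomes["red"])
    blue.reflect(round_no, outcomes["blue"])
    return outcomes
```

Because reflect writes to the team's own memory, each round starts with one more recalled episode than the last. This is the "learn" half of the pattern.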
Memory Architecture
Each team maintains isolated knowledge:
battle_red_<battle_id>/
- strategies/: successful_attacks, failed_attempts
- research/: dogpile_results
- episodes/: round_001.json, round_002.json
- taxonomy/: cwe_classifications, severity_scores
battle_blue_<battle_id>/
- strategies/: successful_patches, broken_defenses
- research/: dogpile_results
- episodes/: round_001.json, round_002.json
- taxonomy/: mitigation_types, effectiveness_scores
Teams cannot access each other's memory, which keeps the learning genuinely adversarial.
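Isolation can be enforced at the file level by confining each team to its own battle_<team>_<battle_id>/ root. The following is a minimal sketch, assuming file-backed memory laid out as above. The TeamMemory class is hypothetical; the actual memory skill's storage is not described here. Resolving paths before the check blocks "../" tricks and absolute paths that would reach the opponent's directory.

```python
import tempfile
from pathlib import Path

class TeamMemory:
    """File-backed memory rooted at battle_<team>_<battle_id>/ and confined to that root."""

    def __init__(self, root_dir: str, team: str, battle_id: str):
        self.root = (Path(root_dir) / f"battle_{team}_{battle_id}").resolve()

    def _path(self, relative: str) -> Path:
        path = (self.root / relative).resolve()
        # Reject anything outside this team's root, e.g. "../battle_blue_<id>/...".
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"{relative!r} is outside {self.root.name}")
        return path

    def write(self, relative: str, text: str) -> None:
        path = self._path(relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    def read(self, relative: str) -> str:
        return self._path(relative).read_text()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        red = TeamMemory(tmp, "red", "battle_demo")
        red.write("episodes/round_001.json", '{"outcome": "3 findings"}')
        print(red.read("episodes/round_001.json"))
```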
Termination Conditions
Battle ends when ANY condition is met:
- Null Production: Both teams fail to generate new findings for 3 rounds
- Maximum Rounds: Configured limit reached
- Metric Convergence: Scores stable for 5 consecutive rounds
- Kill Switch: Manual termination via ./run.sh stop
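The four termination conditions can be checked after every round with a single function. The following is a minimal sketch that takes the 3-round and 5-round windows from the list above. The 1-point stability tolerance and the input shapes (per-round new-finding counts, cumulative (red, blue) scores) are hypothetical. The order of the checks only decides which reason is reported when several conditions hold at once.

```python
from typing import List, Optional, Tuple

NULL_PRODUCTION_ROUNDS = 3   # no new findings for 3 rounds
CONVERGENCE_ROUNDS = 5       # scores stable for 5 consecutive rounds
CONVERGENCE_TOLERANCE = 1.0  # hypothetical: largest change (in points) that still counts as stable

def termination_reason(
    new_findings: List[int],
    score_history: List[Tuple[float, float]],
    max_rounds: int,
    stop_requested: bool,
) -> Optional[str]:
    """Return the condition that ends the battle, or None to keep going.

    new_findings[i] is the number of new findings from both teams in round i + 1;
    score_history[i] is the cumulative (red, blue) score after round i + 1.
    """
    if stop_requested:
        return "kill_switch"
    if len(new_findings) >= max_rounds:
        return "max_rounds"
    recent = new_findings[-NULL_PRODUCTION_ROUNDS:]
    if len(recent) == NULL_PRODUCTION_ROUNDS and not any(recent):
        return "null_production"
    window = score_history[-CONVERGENCE_ROUNDS:]
    if len(window) == CONVERGENCE_ROUNDS and all(
        max(s[team] for s in window) - min(s[team] for s in window) <= CONVERGENCE_TOLERANCE
        for team in (0, 1)
    ):
        return "metric_convergence"
    return None
```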
Task Monitor Integration
Battles register with task-monitor for overnight progress tracking:
View battle progress in the TUI:
.pi/skills/task-monitor/run.sh tui --filter battle
Report Output
After a battle completes, the skill generates:
- Executive Summary: Winner, key metrics, risk score
- Vulnerability Report: By severity, category, remediation status
- Attack Evolution: How Red team adapted over rounds
- Defense Timeline: Blue team improvements over time
- Recommendations: Prioritized security improvements
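The executive summary's winner line follows directly from the final scores. The following is a minimal sketch that renders the report header in the format of the example output's "Winner:" line. The function name, the markdown heading layout, and the draw wording are hypothetical. The risk score and the other report sections are left out because the document does not define them.

```python
def executive_summary(battle_id: str, red_points: float, blue_points: float) -> str:
    """Render the report header and winner line in the style of the example output."""
    if blue_points > red_points:
        winner = f"Blue Team ({blue_points:g} pts vs {red_points:g} pts)"
    elif red_points > blue_points:
        winner = f"Red Team ({red_points:g} pts vs {blue_points:g} pts)"
    else:
        winner = f"Draw ({red_points:g} pts each)"  # hypothetical tie wording
    return "\n".join([
        f"# Battle Report: {battle_id}",
        "",
        "## Executive Summary",
        f"Winner: {winner}",
    ])
```

The :g format drops a trailing ".0", so whole-number scores print as "847" rather than "847.0".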
Leveraged Skills
| Skill | Team | Purpose |
|---|---|---|
| hack | Red | Scanning, auditing, exploitation |
| anvil | Blue | Multi-agent patching (Thunderdome) |
| memory | Both | Recall prior strategies |
| treesitter | Blue | Code structure analysis |
| taxonomy | Both | Classify findings |
| task-monitor | Orchestrator | Progress tracking |
| ops-docker | Both | Container management |
Example Battle
Start a 100-round battle on the current project:
./run.sh battle --target . --rounds 100
Output:
Battle ID: battle_20250128_221500
Target: /home/user/project
Rounds: 100
Registering with task-monitor...
Starting Round 1/100...
[Red] Scanning target with hack...
[Red] Found 3 potential vulnerabilities
[Blue] Analyzing attack logs...
[Blue] Generating patch for SQL injection...
[Blue] Patch applied, running verification...
Round 1 complete. Red: 3 pts, Blue: 9 pts
...
Battle Complete!
Winner: Blue Team (847 pts vs 423 pts)
Report: ./reports/battle_20250128_221500.md