ccswarm benchmark-runner

Install

Source · Clone the upstream repo:

git clone https://github.com/nwiizo/ccswarm

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/nwiizo/ccswarm "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/benchmark-runner" ~/.claude/skills/nwiizo-ccswarm-benchmark-runner && rm -rf "$T"

Manifest: .claude/skills/benchmark-runner/SKILL.md

Source content

Benchmark Runner Workflow

Multi-step workflow for running performance benchmarks on ccswarm.

Overview

This skill guides you through running, analyzing, and comparing performance benchmarks for the ccswarm system.

Benchmark Types

1. Microbenchmarks

Fine-grained performance measurements for specific functions.
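
A minimal criterion microbenchmark, as a sketch (the parse call is illustrative, not a ccswarm function):

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Measures one small function in isolation; black_box keeps the
// optimizer from deleting the measured work.
fn bench_parse(c: &mut Criterion) {
    c.bench_function("parse_small_input", |b| {
        b.iter(|| black_box("42").parse::<u32>().unwrap())
    });
}

criterion_group!(micro, bench_parse);
criterion_main!(micro);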

2. Integration Benchmarks

End-to-end performance for complete workflows.

3. Load Testing

System behavior under sustained load.

Running Benchmarks

1. Setup

# Install the cargo-criterion CLI (optional; cargo bench also works).
# The criterion crate itself is a dev-dependency (see Cargo.toml below).
cargo install cargo-criterion

# Build in release mode first
cargo build --release --workspace

2. Run All Benchmarks

# Run with criterion
cargo criterion --workspace

# Or standard bench
cargo bench --workspace

3. Run Specific Benchmarks

# By crate
cargo bench -p ccswarm

# By benchmark name
cargo bench --bench session_benchmark

# By filter
cargo bench -- "orchestrator"

Benchmark Configuration

Criterion Config

// benches/orchestrator_bench.rs (file name must match the [[bench]] name below)
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn benchmark_task_delegation(c: &mut Criterion) {
    c.bench_function("delegate_task", |b| {
        b.iter(|| {
            // Replace this placeholder work with the real delegation call;
            // black_box stops the compiler from optimizing it away.
            black_box((0..100u64).sum::<u64>())
        })
    });
}

criterion_group!(benches, benchmark_task_delegation);
criterion_main!(benches);
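
criterion_group! also accepts a custom Criterion configuration, which is useful when a benchmark is noisy. A sketch that swaps a tuned config in for the plain criterion_group! call above, reusing benchmark_task_delegation (sample_size and measurement_time are standard criterion options):

use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(200)                           // more samples per benchmark
        .measurement_time(Duration::from_secs(10)); // longer measurement window
    targets = benchmark_task_delegation
}
criterion_main!(benches);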

Cargo.toml

[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "orchestrator_bench"   # expects benches/orchestrator_bench.rs
harness = false               # let criterion_main! provide main instead of libtest

Analysis

1. View Results

# Criterion writes HTML reports under target/criterion/
open target/criterion/report/index.html   # macOS; use xdg-open on Linux

2. Compare with Baseline

# Save baseline
cargo bench -- --save-baseline before

# Make changes, then compare
cargo bench -- --baseline before

3. Profiling

# With flamegraph; criterion's --profile-time flag loops the benchmark
# without analysis, keeping the profile focused on the measured code
cargo flamegraph --bench orchestrator_bench -- --profile-time 10

# With perf (Linux)
perf record --call-graph dwarf -- cargo bench --bench orchestrator_bench
perf report

Key Metrics

Session Management

  • Session creation time
  • Context compression ratio
  • Message throughput
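
Throughput metrics read best when criterion reports elements per second rather than raw time. A minimal sketch using criterion's Throughput API; the inner loop is a placeholder for the real message path:

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn benchmark_message_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("session_messages");
    for &batch in &[10usize, 100, 1_000] {
        // Tell criterion how many items one iteration processes,
        // so the report shows elements/second.
        group.throughput(Throughput::Elements(batch as u64));
        group.bench_with_input(BenchmarkId::from_parameter(batch), &batch, |b, &n| {
            b.iter(|| (0..n).map(|i| std::hint::black_box(i ^ 1)).sum::<usize>())
        });
    }
    group.finish();
}

criterion_group!(throughput, benchmark_message_throughput);
criterion_main!(throughput);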

Orchestration

  • Task delegation latency
  • Agent assignment time
  • Consensus round time

Memory

  • Peak memory usage
  • Memory per agent
  • Context size
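
Peak memory is not something criterion measures directly; one way to approximate peak heap usage is a counting global allocator. A self-contained sketch (the Vec workload stands in for real agent state):

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

struct PeakAlloc;

unsafe impl GlobalAlloc for PeakAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = System.alloc(layout);
        if !p.is_null() {
            // Track live bytes and remember the high-water mark.
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        p
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static ALLOC: PeakAlloc = PeakAlloc;

fn main() {
    let v: Vec<u64> = (0..1_000_000).collect();
    drop(v);
    println!("peak heap: {} bytes", PEAK.load(Ordering::Relaxed));
}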

Benchmark Checklist

  • Clean build before benchmarking
  • Disable CPU frequency scaling (e.g. set the Linux governor to performance)
  • Close other applications
  • Run multiple iterations
  • Compare with baseline
  • Document results

CI Integration

GitHub Actions

# .github/workflows/bench.yml (job fragment)
benchmark:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
    - run: cargo bench -- --save-baseline ci
    - uses: actions/upload-artifact@v4
      with:
        name: benchmark-results
        path: target/criterion/
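
Shared CI runners differ in hardware and load from run to run, so treat a baseline saved on CI as indicative rather than authoritative: compare only against baselines from the same workflow, and flag regressions on large, repeated deltas rather than single-digit percentage changes.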

Interpreting Results

Good Performance

  • < 1ms for task delegation
  • < 100ms for full workflow
  • < 10MB memory per agent

Warning Signs

  • High variance between runs (> 20%)
  • Memory growth over time
  • Degradation vs baseline
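
If variance stays high even on a quiet, pinned machine, raising criterion's sample_size or measurement_time (as in the configuration sketch earlier) tightens the statistical estimates before you conclude the code itself is unstable.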

Optimization Workflow

  1. Run baseline benchmark
  2. Profile to identify hotspots
  3. Implement optimization
  4. Run comparison benchmark
  5. Document improvements