Asi benchmark

Run and interpret engine-stack benchmarks (Steel, ember, shale)

install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/benchmark" ~/.claude/skills/plurigrid-asi-benchmark && rm -rf "$T"
manifest: skills/benchmark/SKILL.md
source content

Project Engines Benchmark Skill

Run benchmarks for Steel, ember, and shale engines.

CRITICAL: Read Before Benchmarking

ALWAYS read first:

~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md

This document contains hard-won lessons about benchmark fairness. Ignoring it leads to misleading claims.

Pre-Benchmark Checklist

| Check | Why | How |
|---|---|---|
| Read BENCHMARK_FAIRNESS.md | Contains all fairness lessons | cat ~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md |
| Use --batched for LMDB/redb | 7-24x improvement with proper config | Add --batched --batch-size 1000 |
| Scale sled cache | Undersized cache = 17x slower | Add --cache-mb 2048 for 1M+ records |
| Check dataset vs RAM | If data fits in RAM, you're measuring memory | Use larger datasets for I/O testing |
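The "dataset vs RAM" check can be scripted before a run. The following is a minimal sketch, not part of the benchmark suite: `dataset_verdict` is a hypothetical helper, sizes are in MB, and treating "larger than total RAM" as I/O-bound is a simplifying assumption. Reading `/proc/meminfo` only works on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the suite): classify a run by comparing
# the dataset size against available RAM. Both arguments are in MB.
dataset_verdict() {
  local dataset_mb=$1 ram_mb=$2
  if [ "$dataset_mb" -le "$ram_mb" ]; then
    # Data fits in RAM, so the benchmark mostly measures memory speed.
    echo "memory-bound"
  else
    # Assumption: data larger than RAM forces real disk I/O.
    echo "io-bound"
  fi
}

# Example: compare a 50 MB dataset against total RAM on Linux (kB -> MB).
if [ -r /proc/meminfo ]; then
  ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  echo "50 MB dataset on this machine: $(dataset_verdict 50 "$ram_mb")"
fi
```

A "memory-bound" verdict does not make the run invalid; it means the result should not be reported as an I/O measurement.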

Note: Steel uses verify-once checksums (like RocksDB/WiredTiger): it verifies on the first read from disk, then trusts the page cache. Use FileLayoutConfig::fast() to disable checksums entirely for ZFS/ECC storage.

Quick Commands

Steel (Oak engine)

# Build
cd ~/p/benchmark-suite && graft build --release -p ycsb-steel

# Single-threaded
ycsb-steel --fast --data-dir /tmp/bench --workload a --records 50000 --ops 200000

# Multi-threaded with sharding
ycsb-steel --fast --shards 64 --threads 4 --data-dir /tmp/bench --workload a --records 50000 --ops 200000

# Ultimate adversarial benchmark (vs sled)
cd ~/p/engine-stack/engines/steel
graft run --release --example ultimate_adversarial

Fair 4-Engine Comparison

# Use the fair comparison script (includes proper batching for all engines)
RECORDS=50000 OPS=200000 ~/p/benchmark-suite/scripts/steel-fair-compare.sh

Individual Engine Commands (Fair Config)

# Steel
ycsb-steel --fast --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# sled (scaled cache)
ycsb-sled --high-throughput --cache-mb 256 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# LMDB (batched + nosync)
ycsb-lmdb --batched --nosync --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# redb (batched)
ycsb-redb --batched --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

Steel Results (2025-12-23) - Steel Wins All

Steel now beats LMDB on ALL workloads!

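| Workload | Steel | LMDB | redb | sled | Winner |
|---|---|---|---|---|---|
| A (writes) | 2.49M | 2.24M | 687K | 744K | Steel +11% |
| B (reads) | 3.01M | 2.90M | 2.05M | 1.55M | Steel +3.8% |
| C (pure read) | 3.03M | 1.79M | 1.05M | 1.81M | Steel +69% |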
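The "Winner" margins follow from the Steel and LMDB figures in the results above as (Steel / LMDB - 1) x 100. A small sketch for recomputing them, assuming the figures are throughput in millions of operations per second; `margin` is a hypothetical helper name, not a tool from the suite:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage lead of the first throughput over the second.
margin() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (a / b - 1) * 100 }'
}

margin 2.49 2.24   # Workload A: prints +11.2%
margin 3.01 2.90   # Workload B: prints +3.8%
margin 3.03 1.79   # Workload C: prints +69.3%
```

Recomputing margins from raw throughput, rather than copying percentages between reports, makes it harder for a stale or rounded figure to slip into a claim.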

Optimizations That Closed the Gap

Implemented (see docs/STEEL_OPTIMIZATIONS.md):

  • get_ref(): +8.4%, zero-copy reads (KEY WIN)
  • get_cached_epoch(): +1%, thread-local epoch
  • get_fast(): seqlock skip (no gain, kept for API)

Gap closed! Previous 43% gap on Workload B eliminated via zero-copy optimization.

Where Steel Actually Wins

| Scenario | Steel Advantage | Notes |
|---|---|---|
| Write-heavy (Workload A) | 1.07x vs LMDB | COW efficiency |
| Pure reads (Workload C) | 1.52x vs LMDB | Zero-copy mmap |
| Cold reads after restart | 3x vs sled | No log replay |
| Range scans | 3.4x vs sled | COW pages |
| Simplicity | ~6K LOC vs 20K+ | Easier to understand/debug |

Sharded Write Performance (2025-12-25)

With 64 shards, Steel beats sled by 2.3x:

| Writers | Shards | Steel writes/s | vs sled |
|---|---|---|---|
| 1 | 16 | 3.0M | 149% |
| 4 | 64 | 10.8M | 230% |
| 8 | 64 | 16.8M | 237% |
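Aggregate throughput hides how well each writer scales. A rough sketch that divides total writes/s by the writer count, reading the rows above as 1, 4, and 8 writers at 3.0M, 10.8M, and 16.8M writes/s; `per_writer` is a hypothetical helper, and the figures are in millions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-writer throughput (millions of writes/s).
per_writer() {
  awk -v total="$1" -v writers="$2" 'BEGIN { printf "%.2f\n", total / writers }'
}

per_writer 3.0 1    # prints 3.00
per_writer 10.8 4   # prints 2.70
per_writer 16.8 8   # prints 2.10
```

On these numbers, per-writer throughput falls as writers are added, so the total grows sub-linearly even though it keeps rising.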

Where Steel Does NOT Win

| Scenario | Winner | Notes |
|---|---|---|
| Multi-key transactions | redb/LMDB | Steel has single-key atomicity only |
| 10+ years production hardening | LMDB | Ecosystem maturity |

Common Mistakes (Avoid These)

| Mistake | What Happens | Fix |
|---|---|---|
| Benchmark LMDB without --batched | 7.9x slower | Use --batched --batch-size 1000 |
| Benchmark redb without --batched | 24x slower | Use --batched --batch-size 1000 |
| Claim "47x faster than redb" | Misleading | Fair comparison is ~1.9x |
| Small dataset (50MB) | Memory-bound, not I/O | Use 500MB+ for I/O testing |
| Forget to clear between engines | Cache effects | Sleep or clear page cache |
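Several of these mistakes come from typing per-engine flags by hand. One way to reduce that risk is to keep the fair configurations in one place. The sketch below is a dry run under stated assumptions: `engine_cmd` is a hypothetical helper (not the contents of steel-fair-compare.sh), it only prints the command lines listed under "Individual Engine Commands", and the page-cache step shown in the comment is the standard Linux `drop_caches` mechanism, which needs root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fair-config command line for one engine.
# RECORDS, OPS and DATA_DIR fall back to the values used in this document.
engine_cmd() {
  local common="--workload a --records ${RECORDS:-50000} --ops ${OPS:-200000} --data-dir ${DATA_DIR:-/tmp/bench}"
  case "$1" in
    steel) echo "ycsb-steel --fast $common" ;;
    sled)  echo "ycsb-sled --high-throughput --cache-mb 256 $common" ;;
    lmdb)  echo "ycsb-lmdb --batched --nosync --batch-size 1000 $common" ;;
    redb)  echo "ycsb-redb --batched --batch-size 1000 $common" ;;
    *)     echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Dry run: show what would be executed for each engine.
for engine in steel sled lmdb redb; do
  echo "would run: $(engine_cmd "$engine")"
  # Between real runs on Linux, drop the page cache so one engine's
  # warm pages do not benefit the next:
  #   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
done
```

Printing the commands first gives a reviewer, or a competing engine's maintainer, the exact flags to check before any numbers are published.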

Key Files

| Purpose | Location |
|---|---|
| Steel YCSB | ~/p/benchmark-suite/engines/ycsb-steel/ |
| Fair script | ~/p/benchmark-suite/scripts/steel-fair-compare.sh |
| Fairness docs | ~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md |
| Steel benchmarks | ~/p/engine-stack/engines/steel/BENCHMARKS.md |
| Roadmap to #1 | ~/p/engine-stack/engines/steel/ROADMAP_BEST_KV.md |
| Ultimate adversarial | ~/p/engine-stack/engines/steel/examples/ultimate_adversarial.rs |

Dialectical Improvement

When benchmarking, always ask:

  1. "What would a competitor's maintainer criticize about this benchmark?"
  2. "Am I using each engine's recommended configuration?"
  3. "What am I NOT measuring that matters?"
  4. "Is this result surprising? If so, investigate before publishing."