install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/benchmark" ~/.claude/skills/plurigrid-asi-benchmark && rm -rf "$T"
manifest:
skills/benchmark/SKILL.md
description: Run and interpret basin-engines benchmarks (Steel, ember, shale)
model: haiku
Basin Engines Benchmark Skill
Run benchmarks for Steel, ember, and shale engines.
CRITICAL: Read Before Benchmarking
ALWAYS read first: ~/p/basin-bench/docs/BENCHMARK_FAIRNESS.md
This document contains hard-won lessons about benchmark fairness. Ignoring it leads to misleading claims.
Pre-Benchmark Checklist
| Check | Why | How |
|---|---|---|
| Read BENCHMARK_FAIRNESS.md | Contains all fairness lessons | `~/p/basin-bench/docs/BENCHMARK_FAIRNESS.md` |
| Use `--batched` for LMDB/redb | 7-24x improvement with proper config | Add `--batched --batch-size 1000` |
| Scale sled cache | Undersized cache = 17x slower | Add `--cache-mb 2048` for 1M+ records |
| Check dataset vs RAM | If data fits in RAM, you're measuring memory | Use larger datasets for I/O testing |
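The "Check dataset vs RAM" row can be turned into a quick pre-flight calculation. This is a minimal Rust sketch, not part of basin-bench: `classify`, `Regime`, and the per-record size are hypothetical, and it assumes roughly 1,000 bytes per record (the YCSB core-workload default of 10 fields x 100 bytes), ignoring key, page, and COW overhead.

```rust
// Rough pre-flight check for the "Check dataset vs RAM" row.
// ASSUMPTION: each record is ~1,000 bytes (YCSB core default: 10 fields x 100 bytes);
// keys, page headers and copy-on-write overhead are ignored.
const ASSUMED_RECORD_BYTES: u64 = 10 * 100;

#[derive(Debug, PartialEq)]
enum Regime {
    /// Dataset fits in RAM: results mostly reflect memory / page-cache speed.
    MemoryBound,
    /// Dataset exceeds RAM: reads must actually hit storage.
    IoBound,
}

/// Estimated raw dataset size in bytes.
fn estimated_dataset_bytes(records: u64, record_bytes: u64) -> u64 {
    records.saturating_mul(record_bytes)
}

/// Classify a planned run against the host's physical RAM (in bytes).
fn classify(records: u64, record_bytes: u64, ram_bytes: u64) -> Regime {
    if estimated_dataset_bytes(records, record_bytes) < ram_bytes {
        Regime::MemoryBound
    } else {
        Regime::IoBound
    }
}
```

Under that assumption, the 50,000-record runs used throughout this skill come to about 50,000,000 bytes (50 MB), the same size the "Small dataset" mistake warns about, so they measure memory rather than I/O on any modern machine.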
Note: Steel uses verify-once checksums (like RocksDB/WiredTiger): each page is verified on its first read from disk, after which the page-cache copy is trusted. Use `FileLayoutConfig::fast()` to disable checksums entirely on ZFS or ECC-protected storage.
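A minimal sketch of the verify-once idea, assuming nothing about Steel's real page format: `VerifyOnceReader`, the FNV-1a checksum, and the `verify` flag (standing in for what `FileLayoutConfig::fast()` turns off) are all hypothetical. It simplifies by remembering verified page ids for the reader's lifetime; an engine that evicts pages would presumably re-verify on the next disk read.

```rust
use std::collections::HashSet;

/// FNV-1a 64-bit, used here only as a stand-in page checksum.
/// ASSUMPTION: Steel's actual checksum algorithm is not documented in this skill.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical reader: verify a page the first time it is read, then trust it.
struct VerifyOnceReader {
    verify: bool,           // false models a "checksums disabled" fast layout
    verified: HashSet<u64>, // page ids already verified by this reader
    checks_run: usize,      // number of real checksum computations performed
}

impl VerifyOnceReader {
    fn new(verify: bool) -> Self {
        Self { verify, verified: HashSet::new(), checks_run: 0 }
    }

    /// Ok(()) if the page may be used; Err if its stored checksum does not match.
    fn check(&mut self, page_id: u64, data: &[u8], stored: u64) -> Result<(), String> {
        if !self.verify || self.verified.contains(&page_id) {
            return Ok(()); // checksums off, or already verified once: trust it
        }
        self.checks_run += 1;
        if fnv1a(data) != stored {
            return Err(format!("checksum mismatch on page {page_id}"));
        }
        self.verified.insert(page_id);
        Ok(())
    }
}
```

The cost model is the point: with `verify` on, repeated reads of a hot page pay for one checksum, not one per read, and turning verification off removes even that first check.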
Quick Commands
Steel (Oak engine)
# Build
cd ~/p/basin-bench && graft build --release -p ycsb-steel
# Single-threaded
ycsb-steel --fast --data-dir /tmp/bench --workload a --records 50000 --ops 200000
# Multi-threaded with sharding
ycsb-steel --fast --shards 64 --threads 4 --data-dir /tmp/bench --workload a --records 50000 --ops 200000
# Ultimate adversarial benchmark (vs sled)
cd ~/p/basin-engines/engines/steel
graft run --release --example ultimate_adversarial
Fair 4-Engine Comparison
# Use the fair comparison script (includes proper batching for all engines)
RECORDS=50000 OPS=200000 ~/p/basin-bench/scripts/steel-fair-compare.sh
Individual Engine Commands (Fair Config)
# Steel
ycsb-steel --fast --workload a --records 50000 --ops 200000 --data-dir /tmp/bench
# sled (scaled cache)
ycsb-sled --high-throughput --cache-mb 256 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench
# LMDB (batched + nosync)
ycsb-lmdb --batched --nosync --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench
# redb (batched)
ycsb-redb --batched --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench
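A fair run gives every engine the same workload parameters and varies only its engine-specific tuning flags. The Rust helper below is a hypothetical illustration (`fair_args` is not part of basin-bench); the tuning flags are copied verbatim from the four commands above, and nothing else is assumed about the `ycsb-*` binaries.

```rust
/// Hypothetical helper: argv for a ycsb-* binary using the fair config listed above.
/// Engine-specific tuning flags come first; the workload parameters are identical for all.
fn fair_args(engine: &str, workload: &str, records: u64, ops: u64, data_dir: &str) -> Vec<String> {
    let tuning: Vec<&str> = match engine {
        "steel" => vec!["--fast"],
        "sled" => vec!["--high-throughput", "--cache-mb", "256"],
        "lmdb" => vec!["--batched", "--nosync", "--batch-size", "1000"],
        "redb" => vec!["--batched", "--batch-size", "1000"],
        _ => Vec::new(), // unknown engine: no tuning flags
    };
    let mut args: Vec<String> = tuning.iter().map(|s| s.to_string()).collect();
    // Shared workload parameters: these must never differ between engines.
    args.extend([
        "--workload".to_string(), workload.to_string(),
        "--records".to_string(), records.to_string(),
        "--ops".to_string(), ops.to_string(),
        "--data-dir".to_string(), data_dir.to_string(),
    ]);
    args
}
```

The result can be passed straight to `std::process::Command::new("ycsb-lmdb").args(...)`; keeping the shared tail in one place makes it hard to accidentally give one engine a smaller dataset or fewer operations.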
Steel Results (2025-12-23): Steel Wins All
Steel now beats LMDB on all three measured workloads (A, B, C).
| Workload | Steel | LMDB | redb | sled | Winner (Steel gain vs LMDB) |
|---|---|---|---|---|---|
| A (writes) | 2.49M | 2.24M | 687K | 744K | Steel +11% |
| B (reads) | 3.01M | 2.90M | 2.05M | 1.55M | Steel +3.8% |
| C (pure read) | 3.03M | 1.79M | 1.05M | 1.81M | Steel +69% |
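The "Winner" percentages are Steel's relative gain over LMDB, (Steel / LMDB - 1) x 100. A small Rust check using the table's own figures (in millions, as listed; the helper names are illustrative):

```rust
// (workload, Steel, LMDB) throughput from the results table, in millions as listed.
const RESULTS: [(&str, f64, f64); 3] = [("A", 2.49, 2.24), ("B", 3.01, 2.90), ("C", 3.03, 1.79)];

/// Speedup factor of `a` over `b` (1.0 means equal).
fn speedup(a: f64, b: f64) -> f64 {
    a / b
}

/// Relative gain of `a` over `b` in percent: (a / b - 1) * 100.
fn percent_gain(a: f64, b: f64) -> f64 {
    (speedup(a, b) - 1.0) * 100.0
}
```

This reproduces +11%, +3.8%, and +69%. Note that on workload C sled (1.81M) is slightly ahead of LMDB (1.79M), so against the strongest competitor the gain is `percent_gain(3.03, 1.81)`, about 67%.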
Optimizations That Closed the Gap
Implemented (see docs/STEEL_OPTIMIZATIONS.md):
- `get_ref()` +8.4%: zero-copy reads (KEY WIN)
- `get_cached_epoch()` +1%: thread-local epoch
- `get_fast()`: seqlock skip (no gain, kept for API)
Gap closed! Previous 43% gap on Workload B eliminated via zero-copy optimization.
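Why `get_ref()` helps: a copying `get()` allocates and memcpys every value, while a zero-copy read hands back a slice that points into the mapped file. This is a minimal Rust sketch under stated assumptions: `MappedStore` is hypothetical, a `Vec<u8>` stands in for the mmap'd region, and Steel's real signatures and page layout are not shown in this skill.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical stand-in for an mmap'd data file: one immutable byte buffer plus
// an index from key to the byte range holding its value.
struct MappedStore {
    file: Vec<u8>,                         // plays the role of the mapped region
    index: HashMap<Vec<u8>, Range<usize>>, // key -> value's byte range in `file`
}

impl MappedStore {
    /// Lay values out back to back, as a data file would.
    fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        let mut file = Vec::new();
        let mut index = HashMap::new();
        for (k, v) in pairs {
            let start = file.len();
            file.extend_from_slice(v.as_bytes());
            index.insert(k.as_bytes().to_vec(), start..file.len());
        }
        Self { file, index }
    }

    /// Copying read: allocates a new Vec and copies the value on every call.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.get_ref(key).map(|v| v.to_vec())
    }

    /// Zero-copy read: returns a slice borrowed directly from the mapped bytes.
    /// The borrow checker ties the slice to `&self`, so it cannot outlive the mapping.
    fn get_ref(&self, key: &[u8]) -> Option<&[u8]> {
        self.index.get(key).map(|r| &self.file[r.clone()])
    }
}
```

The returned slice's pointer is inside the store's buffer, so a read-heavy workload like B saves one allocation and one copy per operation.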
Where Steel Actually Wins
| Scenario | Steel Advantage | Notes |
|---|---|---|
| Write-heavy (Workload A) | 1.07x vs LMDB | COW efficiency |
| Pure reads (Workload C) | 1.52x vs LMDB | Zero-copy mmap |
| Cold reads after restart | 3x vs sled | No log replay |
| Range scans | 3.4x vs sled | COW pages |
| Simplicity | ~6K LOC vs 20K+ | Easier to understand/debug |
Sharded Write Performance (2025-12-25)
With 64 shards, Steel beats sled by 2.3x:
| Writers | Shards | Steel writes/s | Steel throughput as % of sled |
|---|---|---|---|
| 1 | 16 | 3.0M | 149% |
| 4 | 64 | 10.8M | 230% |
| 8 | 64 | 16.8M | 237% |
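One plausible reason write throughput scales with shards: keys are routed by hash to independent trees, each behind its own lock, so writers only contend when they hit the same shard (roughly 1 in 64 for uniform keys with 64 shards). This Rust sketch is hypothetical; `shard_for` and `ShardedStore` are illustrative, and Steel's actual shard function and per-shard structure are not documented here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Route a key to one of `shards` shards. DefaultHasher::new() always starts from
/// the same state, so routing is stable within a build.
fn shard_for(key: &[u8], shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % shards as u64) as usize
}

/// Hypothetical sharded store: N independent ordered maps, each with its own lock.
struct ShardedStore {
    shards: Vec<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>,
}

impl ShardedStore {
    fn new(n: usize) -> Self {
        Self { shards: (0..n).map(|_| Mutex::new(BTreeMap::new())).collect() }
    }

    /// Locks only the target shard; other shards remain free for other writers.
    fn put(&self, key: &[u8], value: &[u8]) {
        let i = shard_for(key, self.shards.len());
        self.shards[i].lock().unwrap().insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        let i = shard_for(key, self.shards.len());
        self.shards[i].lock().unwrap().get(key).cloned()
    }

    /// Total number of keys across all shards.
    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```

The trade-off matches the "single-key atomicity only" limitation: a write that spans two shards would need cross-shard coordination this design does not provide.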
Where Steel Does NOT Win
| Scenario | Winner | Notes |
|---|---|---|
| Multi-key transactions | redb/LMDB | Steel has single-key atomicity only |
| 10+ years of production hardening | LMDB | Ecosystem maturity |
Common Mistakes (Avoid These)
| Mistake | What Happens | Fix |
|---|---|---|
| Benchmark LMDB without `--batched` | 7.9x slower | Use `--batched` |
| Benchmark redb without `--batched` | 24x slower | Use `--batched` |
| Claim "47x faster than redb" | Misleading | Fair comparison is ~1.9x |
| Small dataset (50MB) | Memory-bound, not I/O | Use 500MB+ for I/O testing |
| Forget to clear between engines | Cache effects | Sleep or clear page cache |
Key Files
| Purpose | Location |
|---|---|
| Steel YCSB | |
| Fair script | `~/p/basin-bench/scripts/steel-fair-compare.sh` |
| Fairness docs | `~/p/basin-bench/docs/BENCHMARK_FAIRNESS.md` |
| Steel benchmarks | |
| Roadmap to #1 | |
| Ultimate adversarial | |
Dialectical Improvement
When benchmarking, always ask:
- "What would a competitor's maintainer criticize about this benchmark?"
- "Am I using each engine's recommended configuration?"
- "What am I NOT measuring that matters?"
- "Is this result surprising? If so, investigate before publishing."