Babysitter · performance-benchmark-suite
SDK performance benchmarking and regression detection
Install
Source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/sdk-platform-development/skills/performance-benchmark-suite" ~/.claude/skills/a5c-ai-babysitter-performance-benchmark-suite && rm -rf "$T"
manifest: library/specializations/sdk-platform-development/skills/performance-benchmark-suite/SKILL.md
Performance Benchmark Suite Skill
Overview
This skill implements comprehensive SDK performance benchmarking: it tracks latency, throughput, and memory usage, and detects performance regressions across versions.
Capabilities
- Measure latency percentiles (p50, p95, p99); see the sketch after this list
- Track memory usage and allocation patterns
- Detect performance regressions automatically
- Generate visual benchmark reports
- Compare performance across SDK versions
- Implement microbenchmarks for critical paths
- Configure continuous benchmarking in CI
- Support load testing scenarios
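Percentile measurement underpins most of the capabilities above. As a minimal sketch (not the skill's actual implementation), the p50/p95/p99 of a timed operation can be computed in plain Python; `fn` here stands in for whatever SDK call is under test:

```python
import statistics
import time

def latency_percentiles(fn, iterations=1000):
    """Time fn() repeatedly and return p50/p95/p99 latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index k-1 approximates the k-th percentile
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Tracking these cut points rather than a single mean is what makes tail-latency regressions visible.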
Target Processes
- Performance Benchmarking
- SDK Testing Strategy
- SDK Versioning and Release Management
Integration Points
- k6 for load testing
- Artillery for HTTP benchmarking
- hyperfine for CLI benchmarking
- Benchmark.js for JavaScript
- pytest-benchmark for Python (see the example after this list)
- Continuous benchmark systems (Bencher)
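As an illustration of the pytest-benchmark integration point, a microbenchmark for a critical path might look like the sketch below; `sdk`, `Client`, `create_item`, and the endpoint URL are hypothetical stand-ins for the SDK under test:

```python
import pytest
from sdk import Client  # hypothetical SDK under test

@pytest.fixture(scope="module")
def client():
    return Client(base_url="http://localhost:8080")  # assumed local test endpoint

def test_create_item(benchmark, client):
    # The benchmark fixture calls the target repeatedly and records timing statistics
    result = benchmark(client.create_item, {"name": "example"})
    assert result is not None
```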
Input Requirements
- Performance requirements (SLOs)
- Benchmark scenarios
- Baseline versions for comparison
- Environment specifications
- Reporting requirements
Output Artifacts
- Benchmark test suite
- Performance baseline data
- Regression detection rules
- Visual benchmark reports
- CI benchmark configuration
- Historical trend analysis
Usage Example
skill:
  name: performance-benchmark-suite
  context:
    tool: k6
    scenarios:
      - name: basic-crud
        operations: ["create", "read", "update", "delete"]
        vus: 10
        duration: "30s"
      - name: high-load
        vus: 100
        duration: "5m"
    slos:
      p95_latency: "100ms"
      p99_latency: "500ms"
      error_rate: "0.1%"
    compareWith: "v1.0.0"
    regressionThreshold: "10%"
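The compareWith/regressionThreshold pair above implies a regression gate. A minimal sketch of such a check, assuming baseline and current results have been exported as flat JSON objects mapping metric names to values (lower is better; the file names are hypothetical):

```python
import json

def check_regressions(baseline_path, current_path, threshold=0.10):
    """Return a list of metrics that regressed by more than `threshold`."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    regressions = []
    for metric, base in baseline.items():
        cur = current.get(metric)
        if cur is not None and cur > base * (1 + threshold):
            regressions.append(f"{metric}: {base:.2f} -> {cur:.2f}")
    return regressions

if __name__ == "__main__":
    failures = check_regressions("baseline-v1.0.0.json", "current.json")
    if failures:
        raise SystemExit("Performance regressions detected:\n" + "\n".join(failures))
```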
Best Practices
- Establish baselines before optimization
- Track percentiles, not just averages
- Run benchmarks in consistent environments
- Automate regression detection in CI (see the commands after this list)
- Monitor memory alongside latency
- Document benchmark methodology
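For the CI automation point, pytest-benchmark's built-in comparison mode is one concrete option; the tests/benchmarks path is an assumption about project layout:

```
# On the baseline version: save a reference run (written under .benchmarks/)
pytest tests/benchmarks --benchmark-only --benchmark-save=baseline

# On later versions: compare against saved run 0001, fail on a >10% mean regression
pytest tests/benchmarks --benchmark-only --benchmark-compare=0001 --benchmark-compare-fail=mean:10%
```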