Babysitter A/B Test Statistical Analyzer

Performs statistical analysis for A/B testing experiments

Install

source · Clone the upstream repo

git clone https://github.com/a5c-ai/babysitter

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-engineering-analytics/skills/ab-test-statistical-analyzer" ~/.claude/skills/a5c-ai-babysitter-a-b-test-statistical-analyzer && rm -rf "$T"

manifest: library/specializations/data-engineering-analytics/skills/ab-test-statistical-analyzer/SKILL.md

A/B Test Statistical Analyzer

Overview

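Performs statistical analysis for A/B testing experiments. The skill applies frequentist, Bayesian, or sequential methods to check whether an experiment is valid (for example, free of sample ratio mismatch) and whether the observed differences between control and variants are statistically significant.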

Capabilities

Sample size calculation
Statistical significance testing
Bayesian analysis
Sequential testing
Multi-armed bandit analysis
Segment analysis
Novelty/primacy effect detection
SRM (Sample Ratio Mismatch) detection
Confidence interval calculation
Power analysis
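
To make the sample size and power analysis capabilities concrete, here is a minimal sketch. It is not the skill's own implementation: it assumes a two-sided two-proportion z-test with equal allocation between arms, and the function name `sample_size_per_arm` and its defaults (alpha of 0.05, power of 0.8) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_control -> p_variant.

    Assumes a two-sided z-test on two proportions with equal arm sizes
    (normal approximation). Hypothetical helper, not part of the skill.
    """
    if p_control == p_variant:
        raise ValueError("the two rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

Under these assumptions, detecting a lift from a 10% to a 12% conversion rate (`sample_size_per_arm(0.10, 0.12)`) needs a little over 3,800 users per arm. Smaller effects or higher power raise that number quickly.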

Input Schema

{
  "experimentData": {
    "control": "object",
    "variants": ["object"]
  },
  "metrics": [{
    "name": "string",
    "type": "conversion|continuous|ratio"
  }],
  "analysisType": "frequentist|bayesian|sequential"
}
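
A request can be checked against this schema before it is sent. The sketch below is an illustration only: `validate_request` is a hypothetical helper, it reads the `a|b|c` notation as "one of these values", and the requirement that `variants` be non-empty is an assumption. The schema does not define the contents of the `control` and `variants` objects, so they are not inspected.

```python
METRIC_TYPES = {"conversion", "continuous", "ratio"}
ANALYSIS_TYPES = {"frequentist", "bayesian", "sequential"}


def validate_request(request):
    """Return a list of problems; an empty list means no problem was found.

    Hypothetical helper that only checks the fields named in the input schema.
    """
    problems = []
    data = request.get("experimentData", {})
    if not isinstance(data.get("control"), dict):
        problems.append("experimentData.control must be an object")
    variants = data.get("variants")
    if not isinstance(variants, list) or not variants:
        # Assumption: at least one variant is required.
        problems.append("experimentData.variants must be a non-empty array")
    for metric in request.get("metrics", []):
        if metric.get("type") not in METRIC_TYPES:
            problems.append(f"metric {metric.get('name')!r} has an unknown type")
    if request.get("analysisType") not in ANALYSIS_TYPES:
        problems.append("analysisType must be frequentist, bayesian, or sequential")
    return problems
```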

Output Schema

{
  "results": [{
    "metric": "string",
    "controlValue": "number",
    "variantValues": ["number"],
    "pValue": "number",
    "confidenceInterval": "object",
    "significant": "boolean"
  }],
  "srmCheck": "object",
  "recommendation": "string"
}
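
As an illustration of how one `results` entry could be produced for a `conversion` metric under a frequentist analysis, the sketch below runs a two-sided pooled z-test on two proportions. It is an assumption-laden example, not the skill's implementation: the input keys `conversions` and `users`, the `lower`/`upper` keys inside `confidenceInterval`, and the function name `analyze_conversion` are all hypothetical, since the schemas leave those objects unspecified.

```python
import math
from statistics import NormalDist


def analyze_conversion(metric, control, variant, alpha=0.05):
    """Compare one variant to control; returns a dict shaped like `results[]`.

    `control` and `variant` use a hypothetical shape:
    {"conversions": int, "users": int}. Rates of exactly 0 or 1 in both
    arms are not handled (the pooled standard error would be zero).
    """
    n_c, n_v = control["users"], variant["users"]
    p_c = control["conversions"] / n_c
    p_v = variant["conversions"] / n_v

    # Pooled standard error for the hypothesis test (H0: equal rates).
    pooled = (control["conversions"] + variant["conversions"]) / (n_c + n_v)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    z = (p_v - p_c) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval on the difference in rates.
    se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_v - p_c
    return {
        "metric": metric,
        "controlValue": p_c,
        "variantValues": [p_v],
        "pValue": p_value,
        "confidenceInterval": {  # key names are an assumption
            "lower": diff - z_crit * se_diff,
            "upper": diff + z_crit * se_diff,
        },
        "significant": p_value < alpha,
    }
```

The interval here is on the absolute difference in conversion rate (variant minus control). An interval that excludes zero agrees with a significant result at the same alpha in most cases, though the pooled test and unpooled interval can disagree slightly near the boundary.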

Target Processes

A/B Testing Pipeline
Feature Store Setup

Usage Guidelines

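Provide complete experiment data for the control and for every variant
Define each metric with the type that matches it (conversion, continuous, or ratio)
Select the analysis methodology (frequentist, Bayesian, or sequential) that fits the experiment's requirements
Review the SRM check before interpreting any metric results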
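
The SRM check mentioned above is commonly done with a chi-square goodness-of-fit test on the number of users per arm. The sketch below is one way to do that for a control and a single variant, using only the standard library. The function name `srm_check`, the returned keys, and the 0.001 threshold are assumptions for illustration; the skill's actual `srmCheck` object is not documented here.

```python
import math


def srm_check(control_users, variant_users, expected_control_share=0.5,
              threshold=0.001):
    """Chi-square goodness-of-fit test for a two-arm sample ratio mismatch.

    Hypothetical helper. With two arms the test has one degree of freedom,
    so the p-value can be computed as erfc(sqrt(chi2 / 2)).
    """
    total = control_users + variant_users
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (variant_users - expected_variant) ** 2 / expected_variant)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chiSquare": chi2, "pValue": p_value, "mismatch": p_value < threshold}
```

A strict threshold is used because a flagged mismatch usually points to a problem in assignment or logging, in which case the metric results should not be trusted until the cause is found.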

Best Practices

Always check for sample ratio mismatch before analysis
Use a statistical test appropriate to each metric type
Consider practical significance (the size of the effect) alongside statistical significance
Apply multiple-comparison corrections when testing several metrics or variants
Document assumptions and limitations
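
For the multiple-comparison practice, one widely used option is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate. The sketch below is an example of that technique, not a statement of which correction the skill applies; `holm_bonferroni` is a hypothetical helper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Holm-Bonferroni step-down: compare the smallest p-value to alpha / m,
    the next to alpha / (m - 1), and so on, stopping at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # every larger p-value is also not rejected
    return rejected
```

With p-values `[0.01, 0.04, 0.03, 0.005]` and alpha of 0.05, only the first and last hypotheses are rejected, even though all four fall below 0.05 when taken individually.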