Babysitter A/B Test Design
Statistical experiment design and analysis capabilities for product experimentation
install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/product-management/skills/ab-test-design" ~/.claude/skills/a5c-ai-babysitter-a-b-test-design && rm -rf "$T"
manifest:
library/specializations/product-management/skills/ab-test-design/SKILL.mdsource content
A/B Test Design Skill
Overview
Specialized skill for statistical experiment design and analysis capabilities. Enables product teams to design rigorous experiments, calculate sample sizes, and interpret results with statistical confidence.
Capabilities
Experiment Design
- Calculate required sample sizes for experiments
- Design experiment variants and hypotheses
- Define success metrics and guardrail metrics
- Create experiment documentation templates
- Design multi-variant tests (A/B/n)
- Plan sequential and Bayesian experiments
Statistical Analysis
- Validate statistical significance of results
- Calculate practical significance and effect sizes
- Detect interaction effects and segments
- Perform power analysis
- Calculate confidence intervals
- Handle multiple comparison corrections
Decision Support
- Recommend ship/iterate/kill decisions
- Identify segment-specific impacts
- Assess long-term vs short-term effects
- Generate experiment reports
- Track experiment velocity metrics
Target Processes
This skill integrates with the following processes:
- Validation experiments for PMF hypothesesproduct-market-fit.js
- Funnel optimization experimentsconversion-funnel-analysis.js
- A/B testing during beta phasesbeta-program.js
Input Schema
{ "type": "object", "properties": { "experimentType": { "type": "string", "enum": ["ab", "multivariate", "sequential", "bandit"], "description": "Type of experiment to design" }, "hypothesis": { "type": "string", "description": "Hypothesis to test" }, "primaryMetric": { "type": "object", "properties": { "name": { "type": "string" }, "baseline": { "type": "number" }, "mde": { "type": "number", "description": "Minimum detectable effect" } } }, "guardrailMetrics": { "type": "array", "items": { "type": "string" }, "description": "Metrics that should not regress" }, "trafficAllocation": { "type": "number", "description": "Percentage of traffic for experiment" }, "confidenceLevel": { "type": "number", "default": 0.95, "description": "Statistical confidence level" } }, "required": ["experimentType", "hypothesis", "primaryMetric"] }
Output Schema
{ "type": "object", "properties": { "experimentPlan": { "type": "object", "properties": { "name": { "type": "string" }, "hypothesis": { "type": "string" }, "variants": { "type": "array", "items": { "type": "object" } }, "sampleSize": { "type": "number" }, "duration": { "type": "string" }, "metrics": { "type": "object" } } }, "powerAnalysis": { "type": "object", "properties": { "requiredSampleSize": { "type": "number" }, "estimatedDuration": { "type": "string" }, "power": { "type": "number" } } }, "implementation": { "type": "object", "properties": { "trackingEvents": { "type": "array", "items": { "type": "string" } }, "segmentation": { "type": "array", "items": { "type": "string" } }, "rolloutPlan": { "type": "string" } } }, "analysisFramework": { "type": "object", "properties": { "primaryAnalysis": { "type": "string" }, "secondaryAnalyses": { "type": "array", "items": { "type": "string" } }, "decisionCriteria": { "type": "object" } } } } }
Usage Example
const experimentDesign = await executeSkill('ab-test-design', { experimentType: 'ab', hypothesis: 'Adding social proof to pricing page increases conversion by 10%', primaryMetric: { name: 'pricing_page_conversion', baseline: 0.05, mde: 0.10 }, guardrailMetrics: ['revenue_per_visitor', 'bounce_rate'], trafficAllocation: 50, confidenceLevel: 0.95 });
Dependencies
- Statistical libraries for power analysis
- Experimentation platform integrations (Optimizely, LaunchDarkly, etc.)