Claude-Skills ab-test-setup
Install
Clone the upstream repo:
```bash
git clone https://github.com/borghei/Claude-Skills
```
Or, for Claude Code, install the skill directly into ~/.claude/skills/:
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/marketing/ab-test-setup" ~/.claude/skills/borghei-claude-skills-ab-test-setup && rm -rf "$T"
```
Manifest: marketing/ab-test-setup/SKILL.md
A/B Test Setup Skill
Overview
Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.
Quick Start
```bash
# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```
Tools Overview
| Tool | Purpose | Input | Output |
|---|---|---|---|
| sample_size_calculator.py | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| test_designer.py | Test plan design | JSON test config | Complete test plan document |
| results_analyzer.py | Results analysis | JSON with test results | Statistical analysis + recommendation |
Workflows
Workflow 1: New A/B Test Setup
- Define hypothesis and success metric
- Run sample_size_calculator.py with the baseline conversion rate and minimum detectable effect (a sketch of the underlying math follows this list)
- Create a test configuration JSON (see Common Patterns)
- Run test_designer.py to generate the complete test plan
- Share the plan with stakeholders for alignment before launch
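The calculator's flags map onto the standard normal-approximation formula for a two-sided two-proportion z-test. Here is a minimal sketch of that math, assuming --mde means a relative lift over the baseline; the actual sample_size_calculator.py may use a different formula or interpretation.

```python
# Minimal sketch of a two-proportion sample-size calculation.
# Assumption: --mde is a *relative* lift over baseline; the real
# sample_size_calculator.py may define it differently.
import math
from scipy.stats import norm

def samples_per_variant(baseline, mde_relative, power=0.80, alpha=0.05):
    """Normal-approximation sample size for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)    # treatment rate at the MDE
    z_alpha = norm.ppf(1 - alpha / 2)     # ~1.96 for alpha = 0.05
    z_power = norm.ppf(power)             # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Mirrors the Quick Start invocation: --baseline 0.05 --mde 0.10 --power 0.80
print(samples_per_variant(0.05, 0.10))    # ~31,231 per variant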
Workflow 2: Test Results Analysis
- Collect test results into JSON format
- Run results_analyzer.py to get statistical significance (a minimal analysis sketch follows this list)
- Review the confidence interval, p-value, and effect size
- Check for segment-level effects if overall result is inconclusive
- Make ship/no-ship decision based on analysis
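For step 2, here is a minimal sketch of the kind of significance test the analyzer could run on the results JSON shown under Common Patterns. It uses a pooled two-proportion z-test; the real results_analyzer.py may use an unpooled standard error, confidence intervals, or a Bayesian model instead.

```python
# Sketch of a pooled two-proportion z-test over the Common Patterns
# results JSON. Output field names are illustrative assumptions.
import json
import math
from scipy.stats import norm

def analyze(path):
    with open(path) as f:
        results = json.load(f)
    c = results["variants"]["control"]
    t = results["variants"]["treatment"]
    p1 = c["conversions"] / c["visitors"]
    p2 = t["conversions"] / t["visitors"]
    pooled = (c["conversions"] + t["conversions"]) / (c["visitors"] + t["visitors"])
    se = math.sqrt(pooled * (1 - pooled) * (1 / c["visitors"] + 1 / t["visitors"]))
    z = (p2 - p1) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))    # two-sided
    return {
        "z": z,
        "p_value": p_value,
        "relative_lift": (p2 - p1) / p1,
        "significant": p_value < results.get("significance_level", 0.05),
    }

print(analyze("results.json"))
```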
Workflow 3: Experimentation Program Review
- Compile results from multiple past tests
- Run results_analyzer.py --batch on all results (a rollup sketch follows this list)
- Review win rate, average effect size, and velocity
- Identify patterns in winning vs losing tests
- Optimize test pipeline based on learnings
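The --batch output format isn't documented here, so the following is only a plausible sketch of the program-level metrics named in step 3, operating on the dicts returned by the Workflow 2 sketch; all field names are assumptions.

```python
# Sketch of a program-level rollup over many analyzed tests. Consumes
# the dicts produced by analyze() in the Workflow 2 sketch; not the
# documented output of results_analyzer.py --batch.
def program_review(analyses, program_days):
    wins = sum(1 for a in analyses if a["significant"] and a["relative_lift"] > 0)
    return {
        "win_rate": wins / len(analyses),
        "avg_relative_lift": sum(a["relative_lift"] for a in analyses) / len(analyses),
        "tests_per_month": len(analyses) / (program_days / 30),  # experiment velocity
    }
```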
Reference Documentation
See references/ab-testing-guide.md for comprehensive methodology covering:
- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches
Common Patterns
Pattern: Test Configuration JSON
{ "test_name": "Homepage CTA Button Color", "hypothesis": "Changing the CTA button from blue to green will increase click-through rate", "metric_primary": "cta_click_rate", "metric_secondary": ["signup_rate", "bounce_rate"], "baseline_rate": 0.045, "minimum_detectable_effect": 0.10, "significance_level": 0.05, "power": 0.80, "variants": [ {"name": "control", "description": "Current blue CTA button"}, {"name": "treatment", "description": "Green CTA button"} ], "daily_traffic": 5000, "allocation": {"control": 0.50, "treatment": 0.50} }
Pattern: Test Results JSON
{ "test_name": "Homepage CTA Button Color", "variants": { "control": {"visitors": 12500, "conversions": 563}, "treatment": {"visitors": 12500, "conversions": 625} }, "metric": "cta_click_rate", "significance_level": 0.05 }
Quick Reference: Common Effect Sizes
| Context | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
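The conversion-rate row states its effects as relative changes, and the other rows appear to follow the same convention. As a concrete anchor, a 10% relative lift on the 4.5% baseline from the config above moves the rate to roughly 4.95%, an absolute change of about 0.45 percentage points.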