Claude-Skills ab-test-setup

Install

Source · Clone the upstream repo:

git clone https://github.com/borghei/Claude-Skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/marketing/ab-test-setup" ~/.claude/skills/borghei-claude-skills-ab-test-setup && rm -rf "$T"

Manifest: marketing/ab-test-setup/SKILL.md

Source content

A/B Test Setup Skill

Overview

Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.

Quick Start

# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
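
The calculator's result can be sanity-checked by hand. The snippet below is a minimal sketch of the standard two-proportion z-test sample-size formula, not the script's actual implementation, and it assumes the --mde flag is a relative lift over the baseline rate.

# Sketch: per-variant sample size for a two-sided, two-proportion z-test
# (illustrative only; assumes the MDE is relative to the baseline rate)
from scipy.stats import norm

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # treatment rate implied by the MDE
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)             # two-sided significance threshold
    z_beta = norm.ppf(power)
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

print(round(sample_size_per_variant(0.05, 0.10)))  # roughly 31,000 visitors per variant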

Tools Overview

| Tool | Purpose | Input | Output |
| --- | --- | --- | --- |
| sample_size_calculator.py | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| test_designer.py | Test plan design | JSON test config | Complete test plan document |
| results_analyzer.py | Results analysis | JSON with test results | Statistical analysis + recommendation |

Workflows

Workflow 1: New A/B Test Setup

  1. Define hypothesis and success metric
  2. Run sample_size_calculator.py with the baseline conversion rate and minimum detectable effect
  3. Create a test configuration JSON (see Common Patterns; a minimal sketch follows this list)
  4. Run test_designer.py to generate the complete test plan
  5. Share the plan with stakeholders for alignment before launch
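
Step 3's configuration file can be produced however is convenient; one minimal Python sketch, writing an abbreviated version of the configuration shown under Common Patterns (the values and field selection are illustrative):

import json

# Abbreviated, illustrative configuration; see the full pattern under Common Patterns.
config = {
    "test_name": "Homepage CTA Button Color",
    "metric_primary": "cta_click_rate",
    "baseline_rate": 0.045,
    "minimum_detectable_effect": 0.10,
    "significance_level": 0.05,
    "power": 0.80,
    "daily_traffic": 5000,
    "allocation": {"control": 0.50, "treatment": 0.50},
}

with open("test_config.json", "w") as f:          # file name matches the Quick Start example
    json.dump(config, f, indent=2)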

Workflow 2: Test Results Analysis

  1. Collect test results into JSON format
  2. Run results_analyzer.py to get statistical significance (an illustrative calculation follows this list)
  3. Review the confidence interval, p-value, and effect size
  4. Check for segment-level effects if the overall result is inconclusive
  5. Make a ship/no-ship decision based on the analysis
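
The quantities in step 3 come from results_analyzer.py. As a rough illustration of the underlying arithmetic, here is a minimal sketch of a pooled two-proportion z-test with a normal-approximation confidence interval; the script's actual method and output format may differ.

# Sketch: two-sided two-proportion z-test (illustrative, not the script's implementation)
from math import sqrt
from scipy.stats import norm

def analyze(n_control, x_control, n_treatment, x_treatment, alpha=0.05):
    p_c, p_t = x_control / n_control, x_treatment / n_treatment
    p_pool = (x_control + x_treatment) / (n_control + n_treatment)
    z = (p_t - p_c) / sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    p_value = 2 * norm.sf(abs(z))                              # two-sided p-value
    se_diff = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    half_width = norm.ppf(1 - alpha / 2) * se_diff
    ci = (p_t - p_c - half_width, p_t - p_c + half_width)      # CI on the absolute lift
    return p_value, ci, (p_t - p_c) / p_c                      # p-value, CI, relative lift

print(analyze(12500, 563, 12500, 625))             # counts from the Results JSON pattern below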

Workflow 3: Experimentation Program Review

  1. Compile results from multiple past tests
  2. Run results_analyzer.py --batch on all results (see the aggregation sketch after this list)
  3. Review win rate, average effect size, and velocity
  4. Identify patterns in winning vs losing tests
  5. Optimize the test pipeline based on learnings
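
A minimal sketch of the program-level metrics in step 3, assuming each analyzed test has already been reduced to a flag for a significant win and a relative lift; the field names here are hypothetical and do not reflect the actual --batch output schema.

# Illustrative aggregation across past tests; "significant_win" and "relative_lift"
# are hypothetical field names, not the results_analyzer.py --batch format.
from statistics import mean

def program_summary(tests, weeks_observed):
    win_rate = mean(1 if t["significant_win"] else 0 for t in tests)
    avg_winning_lift = mean(t["relative_lift"] for t in tests if t["significant_win"])
    velocity = len(tests) / weeks_observed          # tests completed per week
    return {"win_rate": win_rate, "avg_winning_lift": avg_winning_lift, "velocity": velocity}

print(program_summary(
    [{"significant_win": True, "relative_lift": 0.12},
     {"significant_win": False, "relative_lift": 0.02},
     {"significant_win": True, "relative_lift": 0.08}],
    weeks_observed=6,
))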

Reference Documentation

See references/ab-testing-guide.md for comprehensive methodology covering:

  • Statistical foundations (z-tests, confidence intervals)
  • Sample size theory and trade-offs
  • Common experimentation pitfalls
  • Multi-variant and sequential testing
  • Bayesian vs frequentist approaches

Common Patterns

Pattern: Test Configuration JSON

{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
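
To get a rough sense of what this configuration implies, the same sizing arithmetic can be run against its fields. The sketch below is illustrative, uses statsmodels rather than the bundled calculator, and assumes that minimum_detectable_effect is a relative lift and that the test runs until the slower-filling variant reaches its required sample.

import json
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

with open("test_config.json") as f:                # the configuration shown above
    cfg = json.load(f)

p1 = cfg["baseline_rate"]
p2 = p1 * (1 + cfg["minimum_detectable_effect"])   # assumes a relative MDE
n_per_variant = NormalIndPower().solve_power(
    effect_size=proportion_effectsize(p2, p1),
    alpha=cfg["significance_level"], power=cfg["power"], ratio=1.0,
)
days = max(n_per_variant / (cfg["daily_traffic"] * share)
           for share in cfg["allocation"].values())
print(round(n_per_variant), round(days))           # roughly 35,000 per variant over ~14 days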

Pattern: Test Results JSON

{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
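
Run through the z-test sketch above (or through results_analyzer.py itself), these counts work out to a control rate of about 4.5% against 5.0% for treatment, a relative lift of roughly 11%, and a two-sided p-value of roughly 0.065, so this particular example would fall just short of the 0.05 significance threshold; the script's own output may differ in method and format.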

Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
| --- | --- | --- | --- |
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |