Claude-Skills ab-test-setup

Install

Source · Clone the upstream repo:

git clone https://github.com/borghei/Claude-Skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/marketing/ab-test-setup" ~/.claude/skills/borghei-claude-skills-ab-test-setup && rm -rf "$T"

Manifest: marketing/ab-test-setup/SKILL.md

Source content

A/B Test Setup Skill

Overview

Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.

Quick Start

# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
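
The calculator's result can be sanity-checked by hand. The snippet below is a minimal sketch of the standard two-proportion z-test sample-size formula, not the script's actual implementation, and it assumes the --mde flag is a relative lift over the baseline rate.

# Sketch: per-variant sample size for a two-sided, two-proportion z-test
# (illustrative only; assumes the MDE is relative to the baseline rate)
from scipy.stats import norm

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # treatment rate implied by the MDE
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)             # two-sided significance threshold
    z_beta = norm.ppf(power)
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

print(round(sample_size_per_variant(0.05, 0.10)))  # roughly 31,000 visitors per variant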

Tools Overview

| Tool | Purpose | Input | Output |
| --- | --- | --- | --- |
| sample_size_calculator.py | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| test_designer.py | Test plan design | JSON test config | Complete test plan document |
| results_analyzer.py | Results analysis | JSON with test results | Statistical analysis + recommendation |

Workflows

Workflow 1: New A/B Test Setup

  1. Define hypothesis and success metric
  2. Run sample_size_calculator.py with the baseline conversion rate and minimum detectable effect
  3. Create a test configuration JSON (see Common Patterns; a minimal sketch follows this list)
  4. Run test_designer.py to generate the complete test plan
  5. Share the plan with stakeholders for alignment before launch
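
Step 3's configuration file can be produced however is convenient; one minimal Python sketch, writing an abbreviated version of the configuration shown under Common Patterns (the values and field selection are illustrative):

import json

# Abbreviated, illustrative configuration; see the full pattern under Common Patterns.
config = {
    "test_name": "Homepage CTA Button Color",
    "metric_primary": "cta_click_rate",
    "baseline_rate": 0.045,
    "minimum_detectable_effect": 0.10,
    "significance_level": 0.05,
    "power": 0.80,
    "daily_traffic": 5000,
    "allocation": {"control": 0.50, "treatment": 0.50},
}

with open("test_config.json", "w") as f:          # file name matches the Quick Start example
    json.dump(config, f, indent=2)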

Workflow 2: Test Results Analysis

  1. Collect test results into JSON format
  2. Run results_analyzer.py to get statistical significance (an illustrative calculation follows this list)
  3. Review the confidence interval, p-value, and effect size
  4. Check for segment-level effects if the overall result is inconclusive
  5. Make a ship/no-ship decision based on the analysis
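
The quantities in step 3 come from results_analyzer.py. As a rough illustration of the underlying arithmetic, here is a minimal sketch of a pooled two-proportion z-test with a normal-approximation confidence interval; the script's actual method and output format may differ.

# Sketch: two-sided two-proportion z-test (illustrative, not the script's implementation)
from math import sqrt
from scipy.stats import norm

def analyze(n_control, x_control, n_treatment, x_treatment, alpha=0.05):
    p_c, p_t = x_control / n_control, x_treatment / n_treatment
    p_pool = (x_control + x_treatment) / (n_control + n_treatment)
    z = (p_t - p_c) / sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    p_value = 2 * norm.sf(abs(z))                              # two-sided p-value
    se_diff = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    half_width = norm.ppf(1 - alpha / 2) * se_diff
    ci = (p_t - p_c - half_width, p_t - p_c + half_width)      # CI on the absolute lift
    return p_value, ci, (p_t - p_c) / p_c                      # p-value, CI, relative lift

print(analyze(12500, 563, 12500, 625))             # counts from the Results JSON pattern below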

Workflow 3: Experimentation Program Review

  1. Compile results from multiple past tests
  2. Run results_analyzer.py --batch on all results (see the aggregation sketch after this list)
  3. Review win rate, average effect size, and velocity
  4. Identify patterns in winning vs losing tests
  5. Optimize the test pipeline based on learnings
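
A minimal sketch of the program-level metrics in step 3, assuming each analyzed test has already been reduced to a flag for a significant win and a relative lift; the field names here are hypothetical and do not reflect the actual --batch output schema.

# Illustrative aggregation across past tests; "significant_win" and "relative_lift"
# are hypothetical field names, not the results_analyzer.py --batch format.
from statistics import mean

def program_summary(tests, weeks_observed):
    win_rate = mean(1 if t["significant_win"] else 0 for t in tests)
    avg_winning_lift = mean(t["relative_lift"] for t in tests if t["significant_win"])
    velocity = len(tests) / weeks_observed          # tests completed per week
    return {"win_rate": win_rate, "avg_winning_lift": avg_winning_lift, "velocity": velocity}

print(program_summary(
    [{"significant_win": True, "relative_lift": 0.12},
     {"significant_win": False, "relative_lift": 0.02},
     {"significant_win": True, "relative_lift": 0.08}],
    weeks_observed=6,
))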

Reference Documentation

See references/ab-testing-guide.md for comprehensive methodology covering:

  • Statistical foundations (z-tests, confidence intervals)
  • Sample size theory and trade-offs
  • Common experimentation pitfalls
  • Multi-variant and sequential testing
  • Bayesian vs frequentist approaches

Common Patterns

Pattern: Test Configuration JSON

{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
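
To get a rough sense of what this configuration implies, the same sizing arithmetic can be run against its fields. The sketch below is illustrative, uses statsmodels rather than the bundled calculator, and assumes that minimum_detectable_effect is a relative lift and that the test runs until the slower-filling variant reaches its required sample.

import json
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

with open("test_config.json") as f:                # the configuration shown above
    cfg = json.load(f)

p1 = cfg["baseline_rate"]
p2 = p1 * (1 + cfg["minimum_detectable_effect"])   # assumes a relative MDE
n_per_variant = NormalIndPower().solve_power(
    effect_size=proportion_effectsize(p2, p1),
    alpha=cfg["significance_level"], power=cfg["power"], ratio=1.0,
)
days = max(n_per_variant / (cfg["daily_traffic"] * share)
           for share in cfg["allocation"].values())
print(round(n_per_variant), round(days))           # roughly 35,000 per variant over ~14 days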

Pattern: Test Results JSON

{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
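
Run through the z-test sketch above (or through results_analyzer.py itself), these counts work out to a control rate of about 4.5% against 5.0% for treatment, a relative lift of roughly 11%, and a two-sided p-value of roughly 0.065, so this particular example would fall just short of the 0.05 significance threshold; the script's own output may differ in method and format.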

Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
| --- | --- | --- | --- |
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |