Clawfu-skills ab-test-stats
Calculate A/B test statistical significance. Use when: determining if test results are significant; calculating required sample size; estimating test duration; analyzing conversion experiments; making data-driven decisions
install
source · Clone the upstream repo
git clone https://github.com/guia-matthieu/clawfu-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/guia-matthieu/clawfu-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/analytics/ab-test-stats" ~/.claude/skills/guia-matthieu-clawfu-skills-ab-test-stats && rm -rf "$T"
manifest:
skills/analytics/ab-test-stats/SKILL.mdsource content
A/B Test Statistics Calculator
Calculate statistical significance for A/B tests - know when your results are real, not random chance.
When to Use This Skill
- Test analysis - Determine if results are statistically significant
- Sample planning - Calculate required sample size before testing
- Duration estimation - Know how long to run experiments
- Power analysis - Ensure tests can detect meaningful differences
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |
Dependencies
pip install scipy numpy click
Commands
Check Significance
python scripts/main.py significance --control 1000,50 --variant 1000,65 python scripts/main.py significance --control 5000,250 --variant 5000,300 --confidence 0.99
Calculate Sample Size
python scripts/main.py sample-size --baseline 0.05 --mde 0.02 python scripts/main.py sample-size --baseline 0.10 --mde 0.01 --power 0.90
Estimate Duration
python scripts/main.py duration --traffic 1000 --baseline 0.05 --mde 0.02
Examples
Example 1: Analyze Test Results
# Control: 1000 visitors, 50 conversions (5%) # Variant: 1000 visitors, 65 conversions (6.5%) python scripts/main.py significance --control 1000,50 --variant 1000,65 # Output: # A/B Test Results # ───────────────────────── # Control: 5.00% (50/1000) # Variant: 6.50% (65/1000) # Lift: +30.0% # # Statistical Analysis # ───────────────────────── # p-value: 0.089 # Confidence: 91.1% # Result: NOT SIGNIFICANT (need 95%) # # Recommendation: Continue test for more data
Example 2: Plan Sample Size
# Baseline 5% conversion, want to detect 20% relative lift (1% absolute) python scripts/main.py sample-size --baseline 0.05 --mde 0.01 # Output: # Sample Size Calculator # ────────────────────────────── # Baseline conversion: 5.0% # Minimum detectable effect: 1.0% (20% relative) # Target conversion: 6.0% # # Required per variant: 3,842 visitors # Total required: 7,684 visitors # # At 1000 daily visitors: ~8 days
Key Concepts
| Term | Definition |
|---|---|
| p-value | Probability result is due to chance |
| Confidence | 1 - p-value (usually want 95%+) |
| Power | Probability of detecting real effect (usually 80%) |
| MDE | Minimum Detectable Effect - smallest lift worth detecting |
| Lift | Relative improvement (variant - control) / control |
When Results Are Significant
| p-value | Confidence | Verdict |
|---|---|---|
| < 0.01 | > 99% | Highly Significant ✓ |
| < 0.05 | > 95% | Significant ✓ |
| < 0.10 | > 90% | Marginally Significant |
| ≥ 0.10 | < 90% | Not Significant ✗ |
Skill Boundaries
What This Skill Does Well
- Structuring data analysis
- Identifying patterns and trends
- Creating visualization frameworks
- Calculating statistical measures
What This Skill Cannot Do
- Access your actual data
- Replace statistical expertise
- Make business decisions
- Guarantee prediction accuracy
Related Skills
- cohort-analysis - Analyze user cohorts
- funnel-analyzer - Analyze conversion funnels
Skill Metadata
- Mode: centaur
category: analytics subcategory: statistics dependencies: [scipy, numpy] difficulty: intermediate time_saved: 3+ hours/week