Claude-code-templates statistical-analysis
Statistical analysis toolkit. Hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, Bayesian stats, power analysis, assumption checks, APA reporting, for academic research.
git clone https://github.com/davila7/claude-code-templates
T=$(mktemp -d) && git clone --depth=1 https://github.com/davila7/claude-code-templates "$T" && mkdir -p ~/.claude/skills && cp -r "$T/cli-tool/components/skills/scientific/statistical-analysis" ~/.claude/skills/davila7-claude-code-templates-statistical-analysis && rm -rf "$T"
cli-tool/components/skills/scientific/statistical-analysis/SKILL.md
Statistical Analysis
Overview
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting. Apply this skill for academic research.
When to Use This Skill
This skill should be used when:
- Conducting statistical hypothesis tests (t-tests, ANOVA, chi-square)
- Performing regression or correlation analyses
- Running Bayesian statistical analyses
- Checking statistical assumptions and diagnostics
- Calculating effect sizes and conducting power analyses
- Reporting statistical results in APA format
- Analyzing experimental or observational data for research
Core Capabilities
1. Test Selection and Planning
- Choose appropriate statistical tests based on research questions and data characteristics
- Conduct a priori power analyses to determine required sample sizes
- Plan analysis strategies including multiple comparison corrections
2. Assumption Checking
- Automatically verify all relevant assumptions before running tests
- Provide diagnostic visualizations (Q-Q plots, residual plots, box plots)
- Recommend remedial actions when assumptions are violated
3. Statistical Testing
- Hypothesis testing: t-tests, ANOVA, chi-square, non-parametric alternatives
- Regression: linear, multiple, logistic, with diagnostics
- Correlations: Pearson, Spearman, with confidence intervals
- Bayesian alternatives: Bayesian t-tests, ANOVA, regression with Bayes Factors
4. Effect Sizes and Interpretation
- Calculate and interpret appropriate effect sizes for all analyses
- Provide confidence intervals for effect estimates
- Distinguish statistical from practical significance
5. Professional Reporting
- Generate APA-style statistical reports
- Create publication-ready figures and tables
- Provide complete interpretation with all required statistics
Workflow Decision Tree
Use this decision tree to determine your analysis path:
```
START
│
├─ Need to SELECT a statistical test?
│  └─ YES → See "Test Selection Guide"
│  └─ NO → Continue
│
├─ Ready to check ASSUMPTIONS?
│  └─ YES → See "Assumption Checking"
│  └─ NO → Continue
│
├─ Ready to run ANALYSIS?
│  └─ YES → See "Running Statistical Tests"
│  └─ NO → Continue
│
└─ Need to REPORT results?
   └─ YES → See "Reporting Results"
```
Test Selection Guide
Quick Reference: Choosing the Right Test
Use references/test_selection_guide.md for comprehensive guidance. Quick reference:
Comparing Two Groups:
- Independent, continuous, normal → Independent t-test
- Independent, continuous, non-normal → Mann-Whitney U test
- Paired, continuous, normal → Paired t-test
- Paired, continuous, non-normal → Wilcoxon signed-rank test
- Binary outcome → Chi-square or Fisher's exact test
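The two-group rules above can be sketched as a small dispatcher. This is a minimal illustration, not part of the skill's scripts: the function name `choose_two_group_test` is hypothetical, the data are simulated, and using Shapiro-Wilk alone as the normality gate is a simplifying assumption (in practice, also inspect Q-Q plots and consider sample size).

```python
import numpy as np
from scipy import stats


def choose_two_group_test(a, b, paired=False, alpha=0.05):
    """Pick and run a two-group test for continuous data (hypothetical helper).

    Normality is screened with Shapiro-Wilk: on the paired differences for
    paired designs, on each group separately otherwise.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if paired:
        if stats.shapiro(a - b).pvalue > alpha:
            return "Paired t-test", stats.ttest_rel(a, b)
        return "Wilcoxon signed-rank test", stats.wilcoxon(a, b)
    if all(stats.shapiro(g).pvalue > alpha for g in (a, b)):
        return "Independent t-test", stats.ttest_ind(a, b)
    return "Mann-Whitney U test", stats.mannwhitneyu(a, b, alternative="two-sided")


# Simulated example data (assumption: two independent groups)
rng = np.random.default_rng(0)
group_a = rng.normal(50, 10, size=40)
group_b = rng.normal(55, 10, size=40)

name, result = choose_two_group_test(group_a, group_b)
print(f"{name}: statistic = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

Returning the test name alongside the result makes it easy to report which path was taken, which the reporting guidance later in this document requires.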
Comparing 3+ Groups:
- Independent, continuous, normal → One-way ANOVA
- Independent, continuous, non-normal → Kruskal-Wallis test
- Paired, continuous, normal → Repeated measures ANOVA
- Paired, continuous, non-normal → Friedman test
Relationships:
- Two continuous variables → Pearson (normal) or Spearman correlation (non-normal)
- Continuous outcome with predictor(s) → Linear regression
- Binary outcome with predictor(s) → Logistic regression
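For the correlation case, the following sketch computes Pearson's r with a confidence interval based on the Fisher z-transform, and Spearman's rho for comparison. The helper name `pearson_with_ci` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def pearson_with_ci(x, y, confidence=0.95):
    """Pearson r with a Fisher z-transformed confidence interval."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)                       # Fisher z-transform of r
    se = 1.0 / np.sqrt(len(x) - 3)          # approximate standard error of z
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    lo, hi = np.tanh([z - z_crit * se, z + z_crit * se])
    return r, p, (lo, hi)


# Simulated data with a moderate linear relationship
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

r, p, (lo, hi) = pearson_with_ci(x, y)
rho, p_rho = stats.spearmanr(x, y)
print(f"Pearson r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
```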
Bayesian Alternatives: All tests have Bayesian versions that provide:
- Direct probability statements about hypotheses
- Bayes Factors quantifying evidence
- Ability to support null hypothesis
- See references/bayesian_statistics.md
Assumption Checking
Systematic Assumption Verification
ALWAYS check assumptions before interpreting test results.
Use the provided scripts/assumption_checks.py module for automated checking:
```python
from scripts.assumption_checks import comprehensive_assumption_check

# Comprehensive check with visualizations
results = comprehensive_assumption_check(
    data=df,
    value_col='score',
    group_col='group',  # Optional: for group comparisons
    alpha=0.05
)
```
This performs:
- Outlier detection (IQR and z-score methods)
- Normality testing (Shapiro-Wilk test + Q-Q plots)
- Homogeneity of variance (Levene's test + box plots)
- Interpretation and recommendations
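When the bundled module is not available, the same kinds of checks can be approximated directly with SciPy. This is a sketch under stated assumptions, not the implementation in scripts/assumption_checks.py: the function name `basic_assumption_checks`, the 1.5 × IQR and |z| > 3 cut-offs, and the example data are all illustrative choices.

```python
import numpy as np
from scipy import stats


def basic_assumption_checks(groups, alpha=0.05):
    """Outlier, normality, and variance checks on a dict of samples (hypothetical helper)."""
    report = {}
    for name, values in groups.items():
        values = np.asarray(values, dtype=float)
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        # IQR rule: flag points beyond 1.5 * IQR from the quartiles
        iqr_outliers = np.sum((values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr))
        # z-score rule: flag points more than 3 SDs from the mean
        z = (values - values.mean()) / values.std(ddof=1)
        z_outliers = np.sum(np.abs(z) > 3)
        w, p = stats.shapiro(values)
        report[name] = {
            "iqr_outliers": int(iqr_outliers),
            "z_outliers": int(z_outliers),
            "shapiro_W": float(w),
            "shapiro_p": float(p),
            "normal": bool(p > alpha),
        }
    # Levene's test across all groups (SciPy centers on the median by default)
    stat, p = stats.levene(*groups.values())
    report["levene"] = {"statistic": float(stat), "p": float(p),
                        "equal_variances": bool(p > alpha)}
    return report


# Simulated example data
rng = np.random.default_rng(3)
report = basic_assumption_checks({
    "control": rng.normal(70, 8, size=40),
    "treatment": rng.normal(75, 8, size=40),
})
for key, value in report.items():
    print(key, value)
```

The two outlier rules can disagree: a single extreme value inflates the standard deviation and may escape the z-score rule while still being caught by the IQR rule.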
Individual Assumption Checks
For targeted checks, use individual functions:
```python
from scripts.assumption_checks import (
    check_normality,
    check_normality_per_group,
    check_homogeneity_of_variance,
    check_linearity,
    detect_outliers
)

# Example: Check normality with visualization
result = check_normality(
    data=df['score'],
    name='Test Score',
    alpha=0.05,
    plot=True
)
print(result['interpretation'])
print(result['recommendation'])
```
What to Do When Assumptions Are Violated
Normality violated:
- Mild violation + n > 30 per group → Proceed with parametric test (robust)
- Moderate violation → Use non-parametric alternative
- Severe violation → Transform data or use non-parametric test
Homogeneity of variance violated:
- For t-test → Use Welch's t-test
- For ANOVA → Use Welch's ANOVA or Brown-Forsythe ANOVA
- For regression → Use robust standard errors or weighted least squares
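As an illustration of the t-test case, the sketch below compares Student's and Welch's t-tests on simulated groups with clearly unequal variances, and computes the Welch-Satterthwaite degrees of freedom by hand. The data and variable names are assumptions for the example.

```python
import numpy as np
from scipy import stats

# Simulated groups with unequal spread and unequal n
rng = np.random.default_rng(2)
group_a = rng.normal(50, 5, size=30)
group_b = rng.normal(53, 15, size=45)

levene_stat, levene_p = stats.levene(group_a, group_b)

# Student's t-test pools the variances; Welch's t-test does not
student = stats.ttest_ind(group_a, group_b, equal_var=True)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Welch-Satterthwaite degrees of freedom
va = group_a.var(ddof=1) / len(group_a)
vb = group_b.var(ddof=1) / len(group_b)
welch_df = (va + vb) ** 2 / (
    va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
)

print(f"Levene: W = {levene_stat:.2f}, p = {levene_p:.3f}")
print(f"Student: t = {student.statistic:.2f}, p = {student.pvalue:.3f}")
print(f"Welch:   t({welch_df:.1f}) = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```

Welch's degrees of freedom never exceed the pooled value n₁ + n₂ − 2, so the correction is conservative when variances differ.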
Linearity violated (regression):
- Add polynomial terms
- Transform variables
- Use non-linear models or GAM
See references/assumptions_and_diagnostics.md for comprehensive guidance.
Running Statistical Tests
Python Libraries
Primary libraries for statistical analysis:
- scipy.stats: Core statistical tests
- statsmodels: Advanced regression and diagnostics
- pingouin: User-friendly statistical testing with effect sizes
- pymc: Bayesian statistical modeling
- arviz: Bayesian visualization and diagnostics
Example Analyses
T-Test with Complete Reporting
```python
import pingouin as pg

# Run independent t-test; correction='auto' switches to Welch's t-test
# when the sample sizes are unequal
result = pg.ttest(group_a, group_b, correction='auto')

# Extract results
t_stat = result['T'].values[0]
dof = result['dof'].values[0]
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
# CI95% is the confidence interval of the difference in means, not of Cohen's d
ci_lower, ci_upper = result['CI95%'].values[0]

# Report
print(f"t({dof:.0f}) = {t_stat:.2f}, p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}")
print(f"Mean difference 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
```
ANOVA with Post-Hoc Tests
```python
import pingouin as pg

# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)

# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc)

# Effect size
eta_squared = aov['np2'].values[0]  # Partial eta-squared
print(f"Partial η² = {eta_squared:.3f}")
```
Linear Regression with Diagnostics
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Fit model
X = sm.add_constant(X_predictors)  # Add intercept
model = sm.OLS(y, X).fit()

# Summary
print(model.summary())

# Check multicollinearity (VIF); ignore the value reported for the constant
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(X.shape[1])]
print(vif_data)

# Check assumptions
residuals = model.resid
fitted = model.fittedvalues

# Residual plots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Residuals vs fitted
axes[0, 0].scatter(fitted, residuals, alpha=0.6)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_xlabel('Fitted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')

# Q-Q plot
stats.probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Normal Q-Q')

# Scale-Location
axes[1, 0].scatter(fitted, np.sqrt(np.abs(residuals / residuals.std())), alpha=0.6)
axes[1, 0].set_xlabel('Fitted values')
axes[1, 0].set_ylabel('√|Standardized residuals|')
axes[1, 0].set_title('Scale-Location')

# Residuals histogram
axes[1, 1].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Residuals')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Histogram of Residuals')

plt.tight_layout()
plt.show()
```
Bayesian T-Test
```python
import pymc as pm
import arviz as az
import numpy as np

with pm.Model() as model:
    # Priors
    mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
    mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)

    # Derived quantity
    diff = pm.Deterministic('difference', mu1 - mu2)

    # Sample
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Summarize
print(az.summary(trace, var_names=['difference']))

# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(μ₁ > μ₂ | data) = {prob_greater:.3f}")

# Plot posterior
az.plot_posterior(trace, var_names=['difference'], ref_val=0)
```
Effect Sizes
Always Calculate Effect Sizes
Effect sizes quantify the magnitude of an effect, while p-values only indicate how compatible the data are with no effect.
See references/effect_sizes_and_power.md for comprehensive guidance.
Quick Reference: Common Effect Sizes
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
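| Chi-square | Cramér's V (df* = 2) | 0.07 | 0.21 | 0.35 |
Important: Benchmarks are guidelines; context matters. The Cramér's V thresholds depend on the table size: the values shown apply when the smaller table dimension is 3 (df* = 2), while a 2 × 2 table uses 0.10, 0.30, and 0.50.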
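To make the benchmarks concrete, here is a minimal sketch that computes Cohen's d and Cramér's V by hand and labels d against the thresholds in the table. The helper names (`cohens_d`, `cramers_v`, `label_d`), the "negligible" label for values below 0.20, and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V from a contingency table of counts (no continuity correction)."""
    table = np.asarray(table)
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))


def label_d(d):
    """Map |d| onto the benchmark thresholds 0.20 / 0.50 / 0.80."""
    d = abs(d)
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "medium"
    return "large"


d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
v = cramers_v([[10, 20], [20, 10]])
print(f"d = {d:.2f} ({label_d(d)})")   # d = 0.39 (small)
print(f"V = {v:.2f}")                  # V = 0.33
```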
Calculating Effect Sizes
Most effect sizes are automatically calculated by pingouin:
```python
import pingouin as pg

# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]

# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]

# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
```
Confidence Intervals for Effect Sizes
Always report CIs to show precision:
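```python
import pingouin as pg

# For t-test: compute_effsize_from_t returns the effect size only
d = pg.compute_effsize_from_t(
    t_statistic,
    nx=len(group1),
    ny=len(group2),
    eftype='cohen'
)

# compute_esci returns the [lower, upper] confidence bounds for the effect size
ci = pg.compute_esci(
    stat=d,
    nx=len(group1),
    ny=len(group2),
    eftype='cohen',
    confidence=0.95
)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```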
Power Analysis
A Priori Power Analysis (Study Planning)
Determine required sample size before data collection:
```python
import math

from statsmodels.stats.power import (
    tt_ind_solve_power,
    FTestAnovaPower
)

# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
    effect_size=0.5,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Required n per group: {math.ceil(n_required)}")

# ANOVA: What n is needed to detect f = 0.25?
# Note: the keyword is k_groups, and solve_power returns the TOTAL sample size
anova_power = FTestAnovaPower()
n_total = anova_power.solve_power(
    effect_size=0.25,
    k_groups=3,
    alpha=0.05,
    power=0.80
)
n_per_group = math.ceil(n_total / 3)
print(f"Required n per group: {n_per_group}")
```
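As a sanity check on solver output, the two-sample case has a well-known normal approximation, n per group ≈ 2(z₁₋α/₂ + z₁₋β)² / d². The sketch below implements it with SciPy only; the function name `n_per_group_approx` is hypothetical, and the approximation ignores the t-distribution correction, so the exact solver typically returns a slightly larger n (about 64 per group for d = 0.5).

```python
import math

from scipy.stats import norm


def n_per_group_approx(d, alpha=0.05, power=0.80):
    """Normal-approximation n per group for a two-sided, two-sample comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to the target power
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group ≈ {math.ceil(n_per_group_approx(d))}")
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the required sample.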
Sensitivity Analysis (Post-Study)
Determine what effect size you could detect:
```python
from statsmodels.stats.power import tt_ind_solve_power

# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
    effect_size=None,  # Solve for this
    nobs1=50,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Study could detect d ≥ {detectable_d:.2f}")
```
Note: Post-hoc power analysis (calculating power after the study from the observed effect) is generally not recommended. Use sensitivity analysis instead.
See references/effect_sizes_and_power.md for detailed guidance.
Reporting Results
APA Style Statistical Reporting
Follow guidelines in references/reporting_standards.md.
Essential Reporting Elements
- Descriptive statistics: M, SD, n for all groups/variables
- Test statistics: Test name, statistic, df, exact p-value
- Effect sizes: With confidence intervals
- Assumption checks: Which tests were done, results, actions taken
- All planned analyses: Including non-significant findings
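The test-statistic and effect-size elements can be assembled programmatically. The following is a small sketch using only the standard library; the helper names `format_p` and `format_t_test` are hypothetical, and the numbers passed in are taken from the t-test template in this document purely as sample input.

```python
def format_p(p):
    """Format a p-value APA style: no leading zero, floor at '< .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t_test(t, df, p, d, ci):
    """Assemble the statistics portion of an APA-style t-test sentence."""
    return (f"t({df:g}) = {t:.2f}, {format_p(p)}, "
            f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")


print(format_t_test(3.82, 98, 0.0002, 0.77, (0.36, 1.18)))
# t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18]
print(format_p(0.018))
# p = .018
```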
Example Report Templates
Independent T-Test
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk: Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
One-Way ANOVA
A one-way ANOVA revealed a significant main effect of treatment condition on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc comparisons using Tukey's HSD indicated that Condition A (M = 78.2, SD = 7.3) scored significantly higher than Condition B (M = 71.5, SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9, p < .001, d = 1.07). Conditions B and C did not differ significantly (p = .52, d = 0.18).
Multiple Regression
Multiple linear regression was conducted to predict exam scores from study hours, prior GPA, and attendance. The overall model was significant, F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours (B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42]) and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001, 95% CI [4.66, 12.38]) were significant predictors, while attendance was not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]). Multicollinearity was not a concern (all VIF < 1.5).
Bayesian Analysis
A Bayesian independent samples t-test was conducted using weakly informative priors (Normal(0, 1) for mean difference). The posterior distribution indicated that Group A scored higher than Group B (M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor BF₁₀ = 45.3 provided very strong evidence for a difference between groups, with a 99.8% posterior probability that Group A's mean exceeded Group B's mean. Convergence diagnostics were satisfactory (all R̂ < 1.01, ESS > 1000).
Bayesian Statistics
When to Use Bayesian Methods
Consider Bayesian approaches when:
- You have prior information to incorporate
- You want direct probability statements about hypotheses
- Sample size is small, or data will be collected sequentially
- You need to quantify evidence for the null hypothesis
- The model is complex (hierarchical, missing data)
See references/bayesian_statistics.md for comprehensive guidance on:
- Bayes' theorem and interpretation
- Prior specification (informative, weakly informative, non-informative)
- Bayesian hypothesis testing with Bayes Factors
- Credible intervals vs. confidence intervals
- Bayesian t-tests, ANOVA, regression, and hierarchical models
- Model convergence checking and posterior predictive checks
Key Advantages
- Intuitive interpretation: "Given the data, there is a 95% probability the parameter is in this interval"
- Evidence for null: Can quantify support for no effect
- Flexible: Inference does not depend on a fixed stopping rule, so data can be analyzed as it arrives (selective reporting of models or priors is still a concern)
- Uncertainty quantification: Full posterior distribution
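A conjugate Beta-Binomial model gives a compact illustration of these points without any sampling. The sketch below assumes hypothetical data (36 successes in 50 trials) and a uniform Beta(1, 1) prior; both are illustrative choices, not recommendations.

```python
from scipy import stats

# Hypothetical data: 36 successes in 50 trials
successes, trials = 36, 50

# Uniform Beta(1, 1) prior -> Beta(1 + successes, 1 + failures) posterior
posterior = stats.beta(1 + successes, 1 + trials - successes)

# 95% equal-tailed credible interval from the full posterior distribution
ci_low, ci_high = posterior.ppf([0.025, 0.975])

# Direct probability statement about a hypothesis: P(theta > 0.5 | data)
prob_above_half = posterior.sf(0.5)

print(f"Posterior mean = {posterior.mean():.3f}")
print(f"95% credible interval [{ci_low:.3f}, {ci_high:.3f}]")
print(f"P(theta > 0.5 | data) = {prob_above_half:.4f}")
```

The interval statement holds conditional on the model and the prior, so both should be reported alongside it.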
Resources
This skill includes comprehensive reference materials:
References Directory
- test_selection_guide.md: Decision tree for choosing appropriate statistical tests
- assumptions_and_diagnostics.md: Detailed guidance on checking and handling assumption violations
- effect_sizes_and_power.md: Calculating, interpreting, and reporting effect sizes; conducting power analyses
- bayesian_statistics.md: Complete guide to Bayesian analysis methods
- reporting_standards.md: APA-style reporting guidelines with examples
Scripts Directory
- assumption_checks.py: Automated assumption checking with visualizations
  - comprehensive_assumption_check(): Complete workflow
  - check_normality(): Normality testing with Q-Q plots
  - check_homogeneity_of_variance(): Levene's test with box plots
  - check_linearity(): Regression linearity checks
  - detect_outliers(): IQR and z-score outlier detection
Best Practices
- Pre-register analyses when possible to distinguish confirmatory from exploratory
- Always check assumptions before interpreting results
- Report effect sizes with confidence intervals
- Report all planned analyses including non-significant results
- Distinguish statistical from practical significance
- Visualize data before and after analysis
- Check diagnostics for regression/ANOVA (residual plots, VIF, etc.)
- Conduct sensitivity analyses to assess robustness
- Share data and code for reproducibility
- Be transparent about violations, transformations, and decisions
Common Pitfalls to Avoid
- P-hacking: Don't test multiple ways until something is significant
- HARKing: Don't present exploratory findings as confirmatory
- Ignoring assumptions: Check them and report violations
- Confusing significance with importance: p < .05 ≠ meaningful effect
- Not reporting effect sizes: Essential for interpretation
- Cherry-picking results: Report all planned analyses
- Misinterpreting p-values: A p-value is NOT the probability that a hypothesis is true
- Multiple comparisons: Correct for family-wise error when appropriate
- Ignoring missing data: Understand mechanism (MCAR, MAR, MNAR)
- Overinterpreting non-significant results: Absence of evidence ≠ evidence of absence
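For the multiple-comparisons pitfall, one common family-wise correction is Holm's step-down procedure. The sketch below implements it with NumPy; Holm is chosen here as an illustrative example rather than as the method this skill prescribes, and the function name `holm_correction` and the p-values are assumptions.

```python
import numpy as np


def holm_correction(p_values, alpha=0.05):
    """Holm step-down adjustment; returns adjusted p-values and reject flags."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), for k = 0..m-1
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity of the adjusted values and cap them at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted   # restore the original ordering
    return adjusted, adjusted <= alpha


raw = [0.01, 0.04, 0.03, 0.005]
adjusted, reject = holm_correction(raw)
for p_raw, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw p = {p_raw:.3f}, Holm-adjusted p = {p_adj:.3f}, reject = {r}")
```

With these inputs, two of the four tests remain significant after adjustment (adjusted p = 0.03 and 0.02), even though all four raw p-values are below .05.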
Getting Started Checklist
When beginning a statistical analysis:
- Define research question and hypotheses
- Determine appropriate statistical test (use test_selection_guide.md)
- Conduct power analysis to determine sample size
- Load and inspect data
- Check for missing data and outliers
- Verify assumptions using assumption_checks.py
- Run primary analysis
- Calculate effect sizes with confidence intervals
- Conduct post-hoc tests if needed (with corrections)
- Create visualizations
- Write results following reporting_standards.md
- Conduct sensitivity analyses
- Share data and code
Support and Further Reading
For questions about:
- Test selection: See references/test_selection_guide.md
- Assumptions: See references/assumptions_and_diagnostics.md
- Effect sizes: See references/effect_sizes_and_power.md
- Bayesian methods: See references/bayesian_statistics.md
- Reporting: See references/reporting_standards.md
Key textbooks:
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences
- Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics
- Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models
- Kruschke, J. K. (2014). Doing Bayesian Data Analysis
Online resources:
- APA Style Guide: https://apastyle.apa.org/
- Statistical Consulting: Cross Validated (stats.stackexchange.com)