Gsd-skill-creator statistical-computing

Computational tools and algorithms for statistical analysis. Covers simulation, resampling methods (bootstrap, permutation tests), Monte Carlo methods, random number generation, numerical optimization (Newton-Raphson, EM algorithm), cross-validation, and reproducible analysis workflows. Emphasizes the bootstrap revolution and the shift from formula-based to computation-based inference. Use when implementing statistical procedures, running simulations, bootstrapping confidence intervals, performing cross-validation, or building reproducible analysis pipelines.

install

source · Clone the upstream repo

git clone https://github.com/Tibsfox/gsd-skill-creator

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/statistics/statistical-computing" ~/.claude/skills/tibsfox-gsd-skill-creator-statistical-computing && rm -rf "$T"

manifest: examples/skills/statistics/statistical-computing/SKILL.md

source content

Statistical Computing

Statistical computing transformed the discipline. Before 1970, inference depended on mathematical formulas and tables. The bootstrap (Efron, 1979), permutation tests, and Monte Carlo methods showed that a computer could replace analytical derivations with brute-force resampling -- and often provide more accurate answers with fewer assumptions. This skill covers the computational toolkit that modern statistics depends on.

Agent affinity: efron (bootstrap, computational methods), box (simulation for model checking), george (simulation-based pedagogy)

Concept IDs: stat-probability-foundations, stat-hypothesis-testing, stat-descriptive-statistics

The Bootstrap

The idea

Given a sample of n observations, the bootstrap generates new "samples" by resampling with replacement from the original data. Each bootstrap sample has size n. The distribution of a statistic across many bootstrap samples approximates the sampling distribution of that statistic.

Algorithm (nonparametric bootstrap)

Observe data x_1, x_2, ..., x_n.
For b = 1, 2, ..., B (typically B = 1000 to 10000):
  a. Draw a sample of size n with replacement from the original data.
  b. Compute the statistic of interest T_b (e.g., mean, median, regression coefficient).
The distribution of T_1, T_2, ..., T_B approximates the sampling distribution of T.
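The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function name `bootstrap_distribution`, the default of 2000 replicates, and the sample values are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of statistic(data)."""
    rng = random.Random(seed)  # local generator, so the run is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Step a: draw n observations with replacement from the original data.
        resample = rng.choices(data, k=n)
        # Step b: compute the statistic of interest on the resample.
        replicates.append(statistic(resample))
    return replicates


# Illustrative data, made up for this sketch.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
medians = bootstrap_distribution(sample, statistics.median)
print("bootstrap standard error of the median:", statistics.stdev(medians))
```

The spread of the replicates serves as a standard-error estimate for a statistic (here the median) that has no simple textbook formula.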

Bootstrap confidence intervals

Method	Construction	Properties
Percentile	[T_(alpha/2), T_(1-alpha/2)] from bootstrap distribution	Simple; can be biased for skewed distributions
Basic (reverse percentile)	[2T_obs - T_(1-alpha/2), 2T_obs - T_(alpha/2)]	Corrects for some bias
BCa (bias-corrected and accelerated)	Adjusts percentiles for bias and skewness	Gold standard; requires more computation
Studentized	Bootstrap the t-statistic, not the raw estimate	Best coverage properties; most complex
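The first two rows of the table can be illustrated with a short sketch. It covers only the percentile and basic intervals; BCa and studentized intervals need extra machinery and are not shown. The helper names `quantile` and `bootstrap_intervals`, the linear-interpolation quantile rule, and the data are assumptions for this example.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_intervals(data, statistic, alpha=0.05, n_boot=2000, seed=0):
    """Percentile and basic (reverse percentile) bootstrap intervals."""
    rng = random.Random(seed)
    n = len(data)
    t_obs = statistic(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    q_lo = quantile(reps, alpha / 2)
    q_hi = quantile(reps, 1 - alpha / 2)
    return {
        # Percentile interval: quantiles of the bootstrap distribution itself.
        "percentile": (q_lo, q_hi),
        # Basic interval: the same quantiles reflected around T_obs.
        "basic": (2 * t_obs - q_hi, 2 * t_obs - q_lo),
    }


# Illustrative data, made up for this sketch.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
print(bootstrap_intervals(sample, statistics.fmean))
```

For a roughly symmetric bootstrap distribution the two intervals nearly coincide; they diverge when the distribution is skewed, which is where the choice of method starts to matter.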

When to use the bootstrap:

When the sampling distribution of the statistic has no closed-form formula.
When the sample size is too small to trust a CLT-based normal approximation, yet large enough for resampling to be meaningful.
When you want to check whether a formula-based interval is trustworthy.
For statistics that are not simple means (medians, ratios, trimmed means, correlation coefficients).

When the bootstrap fails

Very small samples (n < 10): not enough data to resample meaningfully.
Extreme quantiles: the bootstrap cannot reliably estimate the tails of a distribution.
Non-i.i.d. data: dependent data (time series, clustered data) require block bootstrap or other modifications.
Infinite-variance distributions: the standard bootstrap of the sample mean is inconsistent when the variance is infinite.
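The extreme-quantile failure is easy to see with the sample maximum, a standard counterexample. Each resample contains the observed maximum with probability 1 - (1 - 1/n)^n, roughly 0.63 for moderate n, so the bootstrap distribution of the maximum piles up on a single point instead of resembling a smooth sampling distribution. The sketch below checks this by simulation; the function name and the uniform data are assumptions for illustration.

```python
import random


def max_tie_fraction(n=50, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples whose maximum equals the sample maximum."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]  # illustrative Uniform(0, 1) sample
    observed_max = max(data)
    hits = sum(
        max(rng.choices(data, k=n)) == observed_max for _ in range(n_boot)
    )
    return hits / n_boot


fraction = max_tie_fraction()
theory = 1 - (1 - 1 / 50) ** 50
print(f"simulated: {fraction:.3f}, theoretical: {theory:.3f}")
```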

Permutation Tests

The idea

To test whether two groups differ, randomly permute the group labels many times and compute the test statistic under each permutation. The p-value is the proportion of permutations in which the test statistic is at least as extreme as the observed one.

Algorithm

Observe test statistic T_obs from the original data.
For b = 1, 2, ..., B:
  a. Randomly permute the group labels.
  b. Compute T_b under the permuted labels.
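p-value = (number of T_b >= T_obs) / B (for a one-tailed test). With randomly sampled permutations, (1 + number of T_b >= T_obs) / (B + 1) is often preferred because it can never equal zero.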
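A minimal sketch of the algorithm for a one-sided difference in means follows. The function name `permutation_test` and the two small groups are made up for illustration. Note one deliberate deviation: the sketch returns (count + 1) / (B + 1) rather than count / B, a common variant that avoids reporting a p-value of exactly zero.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """One-sided permutation test of mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    t_obs = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting the labels.
        rng.shuffle(pooled)
        t_b = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if t_b >= t_obs:
            count += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return t_obs, (count + 1) / (n_perm + 1)


# Illustrative data, made up for this sketch.
treated = [5.1, 6.0, 5.8, 6.3, 5.5, 6.1]
control = [4.0, 4.2, 3.9, 4.5, 4.1, 4.3]
diff, p_value = permutation_test(treated, control)
print(f"observed difference: {diff:.2f}, p-value: {p_value:.4f}")
```

Any other statistic (difference in medians, a rank sum) can be swapped in by changing the two lines that compute `t_obs` and `t_b`.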

Advantages

Exact test: under the null hypothesis that the group labels are exchangeable, the permutation distribution is the exact null distribution, with no parametric distributional assumptions.
Valid for any test statistic, not just t or F.
Works with small samples.
Conceptually transparent: "if the groups really don't differ, shuffling labels shouldn't matter."

Limitations

Computationally expensive for large samples (but approximate permutation tests with B = 10000 are fast).
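The null hypothesis is equality of distributions, not just of means, so a rejection can reflect a difference in spread or shape rather than location.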
Does not produce confidence intervals (but can be inverted to do so).

Monte Carlo Simulation

For probability estimation

When a probability is too complex to compute analytically, simulate the experiment many times and estimate the probability as the proportion of times the event occurs.

Example: Estimate P(at least two people in a group of 23 share a birthday).

Simulate 10000 groups of 23 random birthdays.
Count how many groups have at least one shared birthday.
Estimated probability = count / 10000.
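The three steps translate directly into code. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is hypothetical.

```python
import random


def estimate_shared_birthday(group_size=23, n_sims=10_000, seed=0):
    """Monte Carlo estimate of P(at least two people share a birthday)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        birthdays = [rng.randrange(365) for _ in range(group_size)]
        # A shared birthday exists exactly when some value repeats.
        if len(set(birthdays)) < group_size:
            hits += 1
    return hits / n_sims


print("estimated probability:", estimate_shared_birthday())
```

Under the same assumptions the exact answer is about 0.507, so the estimate should land within a percentage point or two of that value at 10000 simulations.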

For statistical properties

Simulation is the standard way to study the properties of statistical procedures:

Bias: Simulate data from a known model, apply the estimator, and compare the average estimate to the true parameter.
Coverage: Generate confidence intervals from many simulated datasets and check what fraction contain the true parameter.
Power: Simulate data under the alternative hypothesis and check what fraction of tests reject H_0.
Robustness: Simulate data that violate assumptions and check how the procedure degrades.
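As one illustration of a coverage study, the sketch below checks a nominal 95% interval for a normal mean that uses the normal critical value with the sample standard deviation. At small n this interval is known to undercover slightly, and the simulation makes that visible. The function name and the chosen parameters (n = 20, mean 10, standard deviation 2) are assumptions for the example.

```python
import math
import random
import statistics


def coverage_of_z_interval(n=20, true_mean=10.0, true_sd=2.0, n_sims=5000, seed=0):
    """Fraction of simulated mean +/- z * s / sqrt(n) intervals covering true_mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    covered = 0
    for _ in range(n_sims):
        data = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        center = statistics.fmean(data)
        half_width = z * statistics.stdev(data) / math.sqrt(n)
        if center - half_width <= true_mean <= center + half_width:
            covered += 1
    return covered / n_sims


print("empirical coverage:", coverage_of_z_interval())
```

The same loop structure serves for bias (average the estimates), power (simulate under the alternative and count rejections), and robustness (swap the data-generating line for one that violates the assumptions).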

Random number generation

All simulation depends on pseudo-random number generators (PRNGs).

Uniform generation: Mersenne Twister (MT19937), with a period of 2^19937 - 1, is the default generator in R and in Python's random module. NumPy's default_rng uses PCG64 instead.
From uniform to any distribution: Inverse CDF method (F^(-1)(U)), acceptance-rejection, Box-Muller (for normals).
Seeds and reproducibility: Always set the random seed before a simulation so results are reproducible.
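The sketch below shows the inverse CDF method for the exponential distribution, the Box-Muller transform for standard normals, and the effect of fixing a seed. It uses Python's `random.Random`, which is a Mersenne Twister generator; the function names are hypothetical. In practice the library's own samplers would normally be used, and these versions exist only to show the mechanics.

```python
import math
import random


def exponential_inverse_cdf(rng, rate):
    """Draw from Exponential(rate) via F^(-1)(U) = -log(1 - U) / rate."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is in (0, 1] and log is safe
    return -math.log(1.0 - u) / rate


def box_muller(rng):
    """Return two independent standard normal draws from two uniforms."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] to avoid log(0)
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


# Two generators with the same seed produce identical streams.
rng_a = random.Random(12345)
rng_b = random.Random(12345)
draws_a = [exponential_inverse_cdf(rng_a, rate=2.0) for _ in range(3)]
draws_b = [exponential_inverse_cdf(rng_b, rate=2.0) for _ in range(3)]
print(draws_a == draws_b)
```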

Cross-Validation

The idea

To estimate how well a model generalizes to new data, partition the data into training and validation sets. Fit the model on training data, evaluate on validation data.

Methods

Method	Procedure	Properties
Holdout	Split data 70/30 or 80/20	Simple but high variance; depends on the split
k-fold CV	Split into k equal parts; rotate which part is held out	Standard choice (k = 5 or 10); lower variance than holdout
Leave-one-out (LOOCV)	k = n; each observation is held out once	Nearly unbiased, but can have high variance; computationally expensive
Repeated k-fold	Run k-fold CV multiple times with different splits	Lower variance than single k-fold
Stratified k-fold	Each fold preserves the class distribution	Essential for imbalanced classification
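A minimal k-fold sketch for a simple linear regression is shown below, using only the standard library. The function names `fit_line` and `k_fold_mse` and the simulated data are assumptions for the example; stratification and repetition are not shown.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)  # randomize which observations land in which fold
    folds = [idx[i::k] for i in range(k)]  # k nearly equal parts
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training part only, then score on the held-out part.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Simulated data for illustration: y = 1 + 2x + noise with variance 0.25.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0, 0.5) for x in xs]
print("5-fold CV estimate of MSE:", k_fold_mse(xs, ys, k=5))
```

Setting `k=len(xs)` turns the same function into leave-one-out cross-validation.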

Bias-variance tradeoff in CV

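Small k (e.g., k = 2): High bias (each model is trained on far less data than the full sample, so the error estimate is pessimistic), low variance.
Large k (e.g., LOOCV): Low bias (training sets are nearly full-sized), but potentially high variance (the training sets overlap almost completely, so the fold estimates are highly correlated).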
k = 10: Widely accepted as a good compromise.

Numerical Optimization

Maximum likelihood estimation

Most statistical models are fit by maximizing the log-likelihood:

theta-hat = argmax_theta log L(theta | data).

For simple models, this has a closed form. For complex models, numerical optimization is required.

Newton-Raphson method

theta_{t+1} = theta_t - [H(theta_t)]^(-1) * g(theta_t)

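where g is the gradient of the log-likelihood (the score) and H is its Hessian; the observed information matrix is -H. Converges quadratically near the maximum, but can diverge or move toward a non-maximum when started far from it.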
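As an illustration of the update rule, the sketch below fits a logistic regression with an intercept and one slope, a model with no closed-form maximum likelihood estimate. The choice of model, the function name `newton_logistic`, and the simulated data are assumptions for the example. The 2x2 Hessian is inverted by hand to keep the code dependency-free, and there are no safeguards such as step halving.

```python
import math
import random


def newton_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Newton-Raphson for the model logit P(y = 1) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # gradient (score) of the log-likelihood
        h00 = h01 = h11 = 0.0   # Hessian entries (negative definite here)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        det = h00 * h11 - h01 * h01
        # step = H^(-1) g, using the closed-form inverse of a 2x2 matrix
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        # theta_{t+1} = theta_t - H^(-1) g
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Simulated data for illustration, true intercept -0.5 and slope 1.5.
data_rng = random.Random(2)
xs = [data_rng.gauss(0, 1) for _ in range(400)]
ys = [
    1 if data_rng.random() < 1.0 / (1.0 + math.exp(0.5 - 1.5 * x)) else 0
    for x in xs
]
print("estimated (intercept, slope):", newton_logistic(xs, ys))
```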

Fisher scoring

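Replace the negative Hessian -H(theta_t) with the expected (Fisher) information matrix I(theta_t), so the update becomes theta_{t+1} = theta_t + [I(theta_t)]^(-1) * g(theta_t). This is often more stable than Newton-Raphson far from the maximum, because the expected information is always positive semidefinite while the Hessian need not be negative definite there. It is the standard method for fitting generalized linear models, where it takes the form of iteratively reweighted least squares.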
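A small example where the difference matters is the location parameter of a Cauchy distribution with unit scale. The observed Hessian of its log-likelihood can change sign, while the expected information is the constant n/2, so the Fisher scoring step always points uphill along the score. The Cauchy model, the function names, and the simulated data are assumptions chosen for this sketch.

```python
import math
import random
import statistics


def cauchy_score(theta, data):
    """Derivative of the unit-scale Cauchy log-likelihood with respect to location."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)


def fisher_scoring_cauchy(data, tol=1e-10, max_iter=500):
    """Fisher scoring for the Cauchy location, started at the sample median."""
    theta = statistics.median(data)
    expected_info = len(data) / 2.0  # I(theta) = n / 2 for unit scale
    for _ in range(max_iter):
        # theta_{t+1} = theta_t + I^(-1) * g(theta_t)
        step = cauchy_score(theta, data) / expected_info
        theta += step
        if abs(step) < tol:
            break
    return theta


# Simulated data for illustration: Cauchy with true location 3, via the inverse CDF.
data_rng = random.Random(4)
data = [3.0 + math.tan(math.pi * (data_rng.random() - 0.5)) for _ in range(300)]
theta_hat = fisher_scoring_cauchy(data)
print("estimated location:", theta_hat)
```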

EM algorithm

For models with latent variables (mixture models, missing data):

E-step: Compute the expected value of the complete-data log-likelihood, given current parameter estimates and observed data.
M-step: Maximize that expected log-likelihood to get new parameter estimates.
Repeat until convergence.

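Each EM iteration never decreases the observed-data likelihood, but the algorithm can converge to a local maximum or a saddle point. Run it from multiple starting points and keep the solution with the highest likelihood.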
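The E-step and M-step can be made concrete with a two-component Gaussian mixture in one dimension. The sketch below is illustrative only: the function names, the crude starting values (component means at the sample minimum and maximum), and the simulated data are assumptions, and there are no safeguards against a component collapsing onto a single point.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, max_iter=200, tol=1e-8):
    """EM for a two-component 1D Gaussian mixture; returns (params, log_liks)."""
    n = len(data)
    pi1 = 0.5
    mu1, mu2 = min(data), max(data)
    s1 = s2 = (max(data) - min(data)) / 4.0
    log_liks = []
    for _ in range(max_iter):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        log_lik = 0.0
        for x in data:
            a = pi1 * normal_pdf(x, mu1, s1)
            b = (1.0 - pi1) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
            log_lik += math.log(a + b)
        log_liks.append(log_lik)
        if len(log_liks) > 1 and log_lik - log_liks[-2] < tol:
            break
        # M-step: weighted proportions, means, and standard deviations.
        n1 = sum(resp)
        n2 = n - n1
        pi1 = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
    params = {"pi1": pi1, "mu1": mu1, "mu2": mu2, "sigma1": s1, "sigma2": s2}
    return params, log_liks


# Simulated data for illustration: 40% from N(0, 1) and 60% from N(5, 1).
data_rng = random.Random(3)
data = [data_rng.gauss(0, 1) for _ in range(120)] + [data_rng.gauss(5, 1) for _ in range(180)]
params, log_liks = em_two_gaussians(data)
print(params)
```

The recorded `log_liks` sequence gives a quick diagnostic: a drop between iterations beyond rounding error indicates a bug in the E-step or M-step.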

Reproducible Analysis

Principles

Script everything. No point-and-click analyses. Every step from data loading to final figure should be in code.
Set seeds. Every simulation and resampling procedure should record its random seed.
Version control. Track analysis code in git.
Document dependencies. Record the versions of all packages used.
Literate programming. Use Jupyter notebooks, R Markdown, or Quarto to interleave code, output, and narrative.
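One lightweight way to act on the seed and dependency principles is to write a small manifest alongside each run. The sketch below is an assumption about how that might look, not a prescribed format: the function name `run_manifest`, the seed value, and the package list are all hypothetical.

```python
import json
import platform
import random
from importlib import metadata


def run_manifest(seed, packages=()):
    """Collect the seed, interpreter version, and package versions for a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


SEED = 20240101  # hypothetical seed, recorded with the results
random.seed(SEED)
manifest = run_manifest(SEED, packages=("numpy", "scipy"))
print(json.dumps(manifest, indent=2))
```

Saving this JSON next to the output files, and committing it with the analysis code, lets a later reader rerun the analysis with the same seed and compare package versions.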

Common tools

Tool	Language	Strengths
R + tidyverse	R	Statistical modeling, visualization (ggplot2), CRAN ecosystem
Python + scipy/statsmodels	Python	General-purpose, integration with ML (scikit-learn), Jupyter
Stan	Stan (called from R/Python)	Bayesian modeling, MCMC, Hamiltonian Monte Carlo
Julia	Julia	Speed for numerical work, growing statistics ecosystem

Common Mistakes

Mistake	Why it fails	Fix
Too few bootstrap replicates	Noisy confidence intervals	Use B >= 1000 for CIs; B >= 10000 for BCa
Not setting random seeds	Results are not reproducible	Set seed before every simulation
Using LOOCV when k-fold suffices	LOOCV has high variance and is expensive	Use 10-fold CV as default
Ignoring bootstrap failures	Small n, extreme quantiles, or dependence violate bootstrap assumptions	Check bootstrap conditions; use block bootstrap for dependent data
Fitting and evaluating on the same data	Overly optimistic performance estimates	Always use held-out data or cross-validation

Cross-References

efron agent: Bootstrap methods, empirical Bayes, computational statistics philosophy.
box agent: Simulation for model checking, iterative model building.
george agent: Simulation-based inference as a teaching approach.
probability-theory skill: Random variables and distributions that simulation draws from.
inferential-statistics skill: The hypothesis tests and intervals that computational methods extend.
bayesian-methods skill: MCMC as a computational Bayesian tool.

References

Efron, B. (1979). "Bootstrap methods: another look at the jackknife." The Annals of Statistics, 7(1), 1-26.
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
Good, P. I. (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses. 3rd edition. Springer.
Rizzo, M. L. (2019). Statistical Computing with R. 2nd edition. CRC Press.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning. 2nd edition. Springer.