Awesome-Agent-Skills-for-Empirical-Research · statspai
Agent-native causal inference & econometrics toolkit for Python. 390+ functions, one import, unified API. Covers OLS, IV, DID, staggered DID, RDD, PSM, SCM, DML, Causal Forest, Meta-Learners, TMLE, neural causal models, and more. Every function returns structured result objects with self-describing schemas for LLM-driven workflows.
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
```shell
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/00-brycewang-stanford-StatsPAI" \
       ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-statspai \
  && rm -rf "$T"
```
manifest: skills/00-brycewang-stanford-StatsPAI/SKILL.md
StatsPAI: Agent-Native Causal Inference & Econometrics
StatsPAI is the agent-native Python package for causal inference and applied econometrics. One import (`import statspai as sp`), 390+ functions, covering the complete empirical research workflow.
Source: https://github.com/brycewang-stanford/StatsPAI
PyPI: `pip install statspai`
Paper: published in the Journal of Open Source Software (JOSS)
Why StatsPAI for Agents?
StatsPAI is the first econometrics toolkit purpose-built for LLM-driven research workflows:
- Self-describing API: `sp.list_functions()`, `sp.describe_function("did")`, `sp.function_schema("rdrobust")` — agents can discover and understand functions without documentation lookups
- Unified result objects: every function returns a `CausalResult` with `.summary()`, `.plot()`, `.to_latex()`, `.to_word()`, `.to_excel()`, `.cite()`
- One import: no need to juggle 20+ packages — `import statspai as sp` covers everything
- Publication-ready output: Word, Excel, LaTeX, and HTML export in every function
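The discovery loop described above can be mocked in plain Python to show what an agent actually consumes. The registry below is a hypothetical stand-in for illustration, not StatsPAI's internal data structure:

```python
# Hypothetical mock of a self-describing function registry: plain data
# an agent can reason over before constructing a call. The entries and
# field names here are invented, not StatsPAI internals.
REGISTRY = {
    "did": {
        "summary": "Difference-in-differences (auto 2x2 / staggered)",
        "required": ["df", "outcome", "treated", "post"],
    },
    "rdrobust": {
        "summary": "Sharp/fuzzy regression discontinuity",
        "required": ["df", "outcome", "running_var"],
    },
}

def list_functions():
    return sorted(REGISTRY)

def describe_function(name):
    return REGISTRY[name]

# An agent can enumerate methods, then check required arguments.
print(list_functions())                      # ['did', 'rdrobust']
print(describe_function("did")["required"])
```

The point of the pattern is that discovery returns plain, serializable data, so an LLM can plan a call without reading documentation.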
Core Methods
Classical Econometrics
```python
sp.regress(df, "y ~ x1 + x2", cluster="firm_id")                     # OLS
sp.ivreg(df, "y ~ x1 | z1 + z2", cluster="state")                    # IV/2SLS
sp.panel(df, "y ~ x1 + x2", entity="firm", time="year", model="fe")  # Panel FE
sp.heckman(df, "y ~ x1", "select ~ z1 + z2")                         # Heckman selection
sp.qreg(df, "y ~ x1 + x2", quantile=0.5)                             # Quantile regression
```
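For reference, the OLS core of the first call above reduces to the normal equations. This is a toy sketch with invented data, not `sp.regress`, which also handles formula parsing and clustered standard errors:

```python
import numpy as np

# Normal-equations OLS on toy data that fits y = 1 + 2x exactly:
# beta = (X'X)^{-1} X'y.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept + x
y = np.array([1.0, 3.0, 5.0, 7.0])

beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # ≈ [1., 2.]
```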
Difference-in-Differences
```python
sp.did(df, "y", "treated", "post")                  # Auto-dispatch (2x2 or staggered)
sp.callaway_santanna(df, "y", "group", "time")      # Staggered DID (CS 2021)
sp.sun_abraham(df, "y", "cohort", "time")           # Interaction-weighted event study
sp.bacon_decomposition(df, "y", "treated", "time")  # TWFE diagnostic
sp.honest_did(result, method="smoothness")          # Sensitivity to PT violations
sp.continuous_did(df, "y", "dose", "time")          # Continuous treatment
```
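In the canonical 2x2 case that `sp.did` auto-dispatches to, the estimate can be checked by hand; the toy numbers below are invented:

```python
# Hand-computed 2x2 DID: (treated post - treated pre)
#                      - (control post - control pre).
treated_pre, treated_post = [10.0, 12.0], [15.0, 17.0]
control_pre, control_post = [8.0, 10.0], [9.0, 11.0]

mean = lambda xs: sum(xs) / len(xs)
att = (mean(treated_post) - mean(treated_pre)) \
    - (mean(control_post) - mean(control_pre))
print(att)  # 4.0
```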
Regression Discontinuity
```python
sp.rdrobust(df, "y", "running_var", cutoff=0)           # Sharp RD (CCT 2014)
sp.rdrobust(df, "y", "running_var", fuzzy="treatment")  # Fuzzy RD
sp.rddensity(df, "running_var")                         # McCrary density test
sp.rdmc(df, "y", "running_var", cutoffs=[0, 5, 10])     # Multi-cutoff RD
sp.rkd(df, "y", "running_var", cutoff=0)                # Regression kink design
```
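Conceptually, a sharp RD estimate is the jump in fitted values at the cutoff. The sketch below fits one line per side on noiseless synthetic data; it omits the bandwidth selection and bias correction that `rdrobust` performs:

```python
import numpy as np

# Toy sharp RD: y = x to the left of the cutoff, y = x + 2 to the
# right, so the true discontinuity is 2.
x = np.linspace(-1, 1, 201)
y = np.where(x >= 0, x + 2.0, x)

left, right = x < 0, x >= 0
bl = np.polyfit(x[left], y[left], 1)    # [slope, intercept], left side
br = np.polyfit(x[right], y[right], 1)  # [slope, intercept], right side
jump = br[1] - bl[1]                    # effect at the cutoff
print(round(jump, 6))  # 2.0
```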
Matching & Reweighting
```python
sp.match(df, "treatment", covariates, method="psm")  # Propensity score matching
sp.match(df, "treatment", covariates, method="cem")  # Coarsened exact matching
sp.ebalance(df, "treatment", covariates)             # Entropy balancing
```
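The nearest-neighbor step behind `method="psm"` can be sketched in plain Python, taking propensity scores as given (real PSM also estimates them). Scores and outcomes below are invented:

```python
# Each treated unit is paired with the control whose propensity score
# is closest; the ATT is the mean matched outcome difference.
treated = [(0.8, 10.0), (0.6, 8.0)]                    # (score, outcome)
controls = [(0.75, 7.0), (0.55, 6.0), (0.2, 3.0)]

diffs = []
for score, y in treated:
    _, y_match = min(controls, key=lambda c: abs(c[0] - score))
    diffs.append(y - y_match)

att = sum(diffs) / len(diffs)
print(att)  # 2.5
```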
Synthetic Control
```python
sp.synth(df, "y", "unit", "time", treated_unit=1, treated_period=2000)  # ADH SCM
sp.sdid(df, "y", "unit", "time", treated_units, treated_periods)        # Synthetic DID
```
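The core of the ADH method is choosing donor weights so the weighted donor pool tracks the treated unit's pre-period path. The sketch below uses a crude grid search on two invented donors rather than the constrained optimization `sp.synth` performs:

```python
# Find the weight w on donor A (and 1 - w on donor B) minimizing
# squared pre-period tracking error against the treated unit. The
# post-period counterfactual would then be the same weighted average.
treated_pre = [4.0, 6.0]
donor_a, donor_b = [2.0, 2.0], [6.0, 10.0]

best_w, best_err = None, float("inf")
for i in range(101):
    w = i / 100
    err = sum((w * a + (1 - w) * b - t) ** 2
              for a, b, t in zip(donor_a, donor_b, treated_pre))
    if err < best_err:
        best_w, best_err = w, err

print(best_w)  # 0.5 — an equal mix tracks the treated unit exactly here
```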
Machine Learning Causal Inference
```python
sp.dml(df, "y", "treatment", controls, model="PLR")           # Double/Debiased ML
sp.causal_forest(df, "y", "treatment", controls)              # Causal Forest (GRF)
sp.metalearner(df, "y", "treatment", controls, learner="dr")  # DR-Learner
sp.tmle(df, "y", "treatment", controls)                       # Targeted MLE
sp.aipw(df, "y", "treatment", controls)                       # Augmented IPW
```
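The partialling-out logic behind the PLR model of `sp.dml` can be sketched with OLS nuisances and no cross-fitting (real DML uses ML learners and sample splitting). Data are simulated here with a true effect of 1.5:

```python
import numpy as np

# Partialling out: residualize y and the treatment d on controls x,
# then regress residual on residual. With OLS nuisances this is just
# the Frisch-Waugh-Lovell decomposition.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
d = x @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = 1.5 * d + x @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

def resid(v, X):
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

ry, rd = resid(y, x), resid(d, x)
theta = (rd @ ry) / (rd @ rd)  # ≈ 1.5, the true effect
print(round(theta, 2))
```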
Neural Causal Models
```python
sp.tarnet(df, "y", "treatment", controls)     # TARNet
sp.cfrnet(df, "y", "treatment", controls)     # CFRNet
sp.dragonnet(df, "y", "treatment", controls)  # DragonNet
```
Robustness & Workflow
```python
sp.spec_curve(df, "y", "treatment", controls, specs)   # Specification curve
sp.robustness_report(result)                           # Automated robustness report
sp.subgroup_analysis(df, "y", "treatment", subgroups)  # Heterogeneity with Wald test
result.to_latex()                                      # Export to LaTeX
result.to_word("output.docx")                          # Export to Word
result.cite()                                          # Auto-generate citation
```
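A specification curve enumerates every subset of optional controls as one specification. The sketch below builds that spec list with invented control names; fitting and plotting, which `sp.spec_curve` would handle, are omitted:

```python
from itertools import combinations

# Every subset of the optional controls (including the empty set)
# defines one specification to estimate.
controls = ["age", "income", "education"]
specs = [list(c) for r in range(len(controls) + 1)
         for c in combinations(controls, r)]
print(len(specs))  # 8 specifications (2**3)
```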
Interactive Visualization (v0.6+)
```python
fig = result.plot()
sp.interactive(fig)  # Stata Graph Editor-style WYSIWYG editing, 29 academic themes
```
Agent Integration Pattern
```python
import statspai as sp

# Step 1: Discover available functions
functions = sp.list_functions()

# Step 2: Understand a specific function
info = sp.describe_function("callaway_santanna")

# Step 3: Get the JSON schema for structured calls
schema = sp.function_schema("callaway_santanna")

# Step 4: Execute and get structured results
result = sp.callaway_santanna(df, "y", "group", "time")
print(result.summary())
result.to_latex("tables/did_results.tex")
```
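Between the schema step and execution, an agent would typically validate its planned arguments against the schema before committing to a call. The sketch below uses an invented schema literal for illustration, not actual `sp.function_schema` output:

```python
import json

# Validate a planned call against a (hypothetical) function schema:
# refuse to execute if any required argument is missing.
schema = json.loads("""{
  "name": "callaway_santanna",
  "required": ["df", "outcome", "group", "time"]
}""")

planned_call = {"df": "panel.csv", "outcome": "y", "group": "g", "time": "t"}
missing = [k for k in schema["required"] if k not in planned_call]
assert not missing, f"missing args: {missing}"
print("call is well-formed")
```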
When to Use StatsPAI vs Other Packages
| Scenario | Use StatsPAI | Alternative |
|---|---|---|
| Agent-driven analysis pipeline | ✅ Best choice — self-describing API | pyfixest (no agent API) |
| Full causal inference workflow | ✅ 390+ functions, one import | Assemble 10+ R/Python packages |
| Publication-ready output needed | ✅ Word/Excel/LaTeX/HTML built-in | statsmodels (no export) |
| Staggered DID with diagnostics | ✅ CS + SA + Bacon + HonestDID | differences (partial) |
| Neural causal models | ✅ TARNet/CFRNet/DragonNet | econml (partial) |
| Stata users migrating to Python | ✅ Stata-equivalent function names | linearmodels (limited) |