Awesome-Agent-Skills-for-Empirical-Research · statspai
Agent-native causal inference & econometrics toolkit for Python. 390+ functions, one import, unified API. Covers OLS, IV, DID, staggered DID, RDD, PSM, SCM, DML, Causal Forest, Meta-Learners, TMLE, neural causal models, and more. Every function returns structured result objects with self-describing schemas for LLM-driven workflows.
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
```shell
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/00-brycewang-stanford-StatsPAI" \
       ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-statspai \
  && rm -rf "$T"
```
manifest: skills/00-brycewang-stanford-StatsPAI/SKILL.md
StatsPAI: Agent-Native Causal Inference & Econometrics
StatsPAI is the agent-native Python package for causal inference and applied econometrics. One import (`import statspai as sp`), 390+ functions, covering the complete empirical research workflow.
Source: https://github.com/brycewang-stanford/StatsPAI
PyPI: `pip install statspai`
Paper: published in the Journal of Open Source Software (JOSS)
Why StatsPAI for Agents?
StatsPAI is the first econometrics toolkit purpose-built for LLM-driven research workflows:
- Self-describing API: `sp.list_functions()`, `sp.describe_function("did")`, `sp.function_schema("rdrobust")` — agents can discover and understand functions without documentation lookups
- Unified result objects: every function returns a `CausalResult` with `.summary()`, `.plot()`, `.to_latex()`, `.to_word()`, `.to_excel()`, `.cite()`
- One import: no need to juggle 20+ packages — `import statspai as sp` covers everything
- Publication-ready output: Word, Excel, LaTeX, and HTML export in every function
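The discovery loop described above can be mocked in plain Python to show what an agent actually consumes. The registry below is a hypothetical stand-in for illustration, not StatsPAI's internal data structure:

```python
# Hypothetical mock of a self-describing function registry: plain data
# an agent can reason over before constructing a call. The entries and
# field names here are invented, not StatsPAI internals.
REGISTRY = {
    "did": {
        "summary": "Difference-in-differences (auto 2x2 / staggered)",
        "required": ["df", "outcome", "treated", "post"],
    },
    "rdrobust": {
        "summary": "Sharp/fuzzy regression discontinuity",
        "required": ["df", "outcome", "running_var"],
    },
}

def list_functions():
    return sorted(REGISTRY)

def describe_function(name):
    return REGISTRY[name]

# An agent can enumerate methods, then check required arguments.
print(list_functions())                      # ['did', 'rdrobust']
print(describe_function("did")["required"])
```

The point of the pattern is that discovery returns plain, serializable data, so an LLM can plan a call without reading documentation.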
Core Methods
Classical Econometrics
```python
sp.regress(df, "y ~ x1 + x2", cluster="firm_id")                     # OLS
sp.ivreg(df, "y ~ x1 | z1 + z2", cluster="state")                    # IV/2SLS
sp.panel(df, "y ~ x1 + x2", entity="firm", time="year", model="fe")  # Panel FE
sp.heckman(df, "y ~ x1", "select ~ z1 + z2")                         # Heckman selection
sp.qreg(df, "y ~ x1 + x2", quantile=0.5)                             # Quantile regression
```
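For reference, the OLS core of the first call above reduces to the normal equations. This is a toy sketch with invented data, not `sp.regress`, which also handles formula parsing and clustered standard errors:

```python
import numpy as np

# Normal-equations OLS on toy data that fits y = 1 + 2x exactly:
# beta = (X'X)^{-1} X'y.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept + x
y = np.array([1.0, 3.0, 5.0, 7.0])

beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # ≈ [1., 2.]
```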
Difference-in-Differences
```python
sp.did(df, "y", "treated", "post")                  # Auto-dispatch (2x2 or staggered)
sp.callaway_santanna(df, "y", "group", "time")      # Staggered DID (CS 2021)
sp.sun_abraham(df, "y", "cohort", "time")           # Interaction-weighted event study
sp.bacon_decomposition(df, "y", "treated", "time")  # TWFE diagnostic
sp.honest_did(result, method="smoothness")          # Sensitivity to PT violations
sp.continuous_did(df, "y", "dose", "time")          # Continuous treatment
```
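In the canonical 2x2 case that `sp.did` auto-dispatches to, the estimate can be checked by hand; the toy numbers below are invented:

```python
# Hand-computed 2x2 DID: (treated post - treated pre)
#                      - (control post - control pre).
treated_pre, treated_post = [10.0, 12.0], [15.0, 17.0]
control_pre, control_post = [8.0, 10.0], [9.0, 11.0]

mean = lambda xs: sum(xs) / len(xs)
att = (mean(treated_post) - mean(treated_pre)) \
    - (mean(control_post) - mean(control_pre))
print(att)  # 4.0
```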
Regression Discontinuity
```python
sp.rdrobust(df, "y", "running_var", cutoff=0)           # Sharp RD (CCT 2014)
sp.rdrobust(df, "y", "running_var", fuzzy="treatment")  # Fuzzy RD
sp.rddensity(df, "running_var")                         # McCrary density test
sp.rdmc(df, "y", "running_var", cutoffs=[0, 5, 10])     # Multi-cutoff RD
sp.rkd(df, "y", "running_var", cutoff=0)                # Regression kink design
```
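Conceptually, a sharp RD estimate is the jump in fitted values at the cutoff. The sketch below fits one line per side on noiseless synthetic data; it omits the bandwidth selection and bias correction that `rdrobust` performs:

```python
import numpy as np

# Toy sharp RD: y = x to the left of the cutoff, y = x + 2 to the
# right, so the true discontinuity is 2.
x = np.linspace(-1, 1, 201)
y = np.where(x >= 0, x + 2.0, x)

left, right = x < 0, x >= 0
bl = np.polyfit(x[left], y[left], 1)    # [slope, intercept], left side
br = np.polyfit(x[right], y[right], 1)  # [slope, intercept], right side
jump = br[1] - bl[1]                    # effect at the cutoff
print(round(jump, 6))  # 2.0
```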
Matching & Reweighting
```python
sp.match(df, "treatment", covariates, method="psm")  # Propensity score matching
sp.match(df, "treatment", covariates, method="cem")  # Coarsened exact matching
sp.ebalance(df, "treatment", covariates)             # Entropy balancing
```
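The nearest-neighbor step behind `method="psm"` can be sketched in plain Python, taking propensity scores as given (real PSM also estimates them). Scores and outcomes below are invented:

```python
# Each treated unit is paired with the control whose propensity score
# is closest; the ATT is the mean matched outcome difference.
treated = [(0.8, 10.0), (0.6, 8.0)]                    # (score, outcome)
controls = [(0.75, 7.0), (0.55, 6.0), (0.2, 3.0)]

diffs = []
for score, y in treated:
    _, y_match = min(controls, key=lambda c: abs(c[0] - score))
    diffs.append(y - y_match)

att = sum(diffs) / len(diffs)
print(att)  # 2.5
```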
Synthetic Control
```python
sp.synth(df, "y", "unit", "time", treated_unit=1, treated_period=2000)  # ADH SCM
sp.sdid(df, "y", "unit", "time", treated_units, treated_periods)        # Synthetic DID
```
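The core of the ADH method is choosing donor weights so the weighted donor pool tracks the treated unit's pre-period path. The sketch below uses a crude grid search on two invented donors rather than the constrained optimization `sp.synth` performs:

```python
# Find the weight w on donor A (and 1 - w on donor B) minimizing
# squared pre-period tracking error against the treated unit. The
# post-period counterfactual would then be the same weighted average.
treated_pre = [4.0, 6.0]
donor_a, donor_b = [2.0, 2.0], [6.0, 10.0]

best_w, best_err = None, float("inf")
for i in range(101):
    w = i / 100
    err = sum((w * a + (1 - w) * b - t) ** 2
              for a, b, t in zip(donor_a, donor_b, treated_pre))
    if err < best_err:
        best_w, best_err = w, err

print(best_w)  # 0.5 — an equal mix tracks the treated unit exactly here
```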
Machine Learning Causal Inference
```python
sp.dml(df, "y", "treatment", controls, model="PLR")           # Double/Debiased ML
sp.causal_forest(df, "y", "treatment", controls)              # Causal Forest (GRF)
sp.metalearner(df, "y", "treatment", controls, learner="dr")  # DR-Learner
sp.tmle(df, "y", "treatment", controls)                       # Targeted MLE
sp.aipw(df, "y", "treatment", controls)                       # Augmented IPW
```
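The partialling-out logic behind the PLR model of `sp.dml` can be sketched with OLS nuisances and no cross-fitting (real DML uses ML learners and sample splitting). Data are simulated here with a true effect of 1.5:

```python
import numpy as np

# Partialling out: residualize y and the treatment d on controls x,
# then regress residual on residual. With OLS nuisances this is just
# the Frisch-Waugh-Lovell decomposition.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
d = x @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = 1.5 * d + x @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

def resid(v, X):
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

ry, rd = resid(y, x), resid(d, x)
theta = (rd @ ry) / (rd @ rd)  # ≈ 1.5, the true effect
print(round(theta, 2))
```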
Neural Causal Models
```python
sp.tarnet(df, "y", "treatment", controls)     # TARNet
sp.cfrnet(df, "y", "treatment", controls)     # CFRNet
sp.dragonnet(df, "y", "treatment", controls)  # DragonNet
```
Robustness & Workflow
```python
sp.spec_curve(df, "y", "treatment", controls, specs)   # Specification curve
sp.robustness_report(result)                           # Automated robustness report
sp.subgroup_analysis(df, "y", "treatment", subgroups)  # Heterogeneity with Wald test
result.to_latex()                                      # Export to LaTeX
result.to_word("output.docx")                          # Export to Word
result.cite()                                          # Auto-generate citation
```
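A specification curve enumerates every subset of optional controls as one specification. The sketch below builds that spec list with invented control names; fitting and plotting, which `sp.spec_curve` would handle, are omitted:

```python
from itertools import combinations

# Every subset of the optional controls (including the empty set)
# defines one specification to estimate.
controls = ["age", "income", "education"]
specs = [list(c) for r in range(len(controls) + 1)
         for c in combinations(controls, r)]
print(len(specs))  # 8 specifications (2**3)
```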
Interactive Visualization (v0.6+)
```python
fig = result.plot()
sp.interactive(fig)  # Stata Graph Editor-style WYSIWYG editing, 29 academic themes
```
Agent Integration Pattern
```python
import statspai as sp

# Step 1: Discover available functions
functions = sp.list_functions()

# Step 2: Understand a specific function
info = sp.describe_function("callaway_santanna")

# Step 3: Get the JSON schema for structured calls
schema = sp.function_schema("callaway_santanna")

# Step 4: Execute and get structured results
result = sp.callaway_santanna(df, "y", "group", "time")
print(result.summary())
result.to_latex("tables/did_results.tex")
```
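Between the schema step and execution, an agent would typically validate its planned arguments against the schema before committing to a call. The sketch below uses an invented schema literal for illustration, not actual `sp.function_schema` output:

```python
import json

# Validate a planned call against a (hypothetical) function schema:
# refuse to execute if any required argument is missing.
schema = json.loads("""{
  "name": "callaway_santanna",
  "required": ["df", "outcome", "group", "time"]
}""")

planned_call = {"df": "panel.csv", "outcome": "y", "group": "g", "time": "t"}
missing = [k for k in schema["required"] if k not in planned_call]
assert not missing, f"missing args: {missing}"
print("call is well-formed")
```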
When to Use StatsPAI vs Other Packages
| Scenario | Use StatsPAI | Alternative |
|---|---|---|
| Agent-driven analysis pipeline | ✅ Best choice — self-describing API | pyfixest (no agent API) |
| Full causal inference workflow | ✅ 390+ functions, one import | Assemble 10+ R/Python packages |
| Publication-ready output needed | ✅ Word/Excel/LaTeX/HTML built-in | statsmodels (no export) |
| Staggered DID with diagnostics | ✅ CS + SA + Bacon + HonestDID | differences (partial) |
| Neural causal models | ✅ TARNet/CFRNet/DragonNet | econml (partial) |
| Stata users migrating to Python | ✅ Stata-equivalent function names | linearmodels (limited) |