Awesome-Agent-Skills-for-Empirical-Research causal-inference-mixtape

This skill should be used when the user asks to "implement a DiD regression", "write a causal inference pipeline", "set up an event study", "implement instrumental variables", "run a regression discontinuity design", "build a synthetic control model", "implement propensity score matching", "write parallel trends test", "implement Bacon decomposition", or needs code templates for causal inference methods in Python, R, or Stata. Based on Scott Cunningham's Causal Inference: The Mixtape.

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/10-Jill0099-causal-inference-mixtape" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-causal-inference- && rm -rf "$T"
manifest: skills/10-Jill0099-causal-inference-mixtape/SKILL.md

Causal Inference: The Mixtape — Code Skill

Practitioner-oriented causal inference skill built from Scott Cunningham's Causal Inference: The Mixtape repository. Covers 10 identification strategies with ready-to-run code templates in Python, R, and Stata.


Methods Covered

| Method | Python | R | Stata | Reference |
|---|---|---|---|---|
| OLS / Regression | statsmodels | estimatr | reg/reghdfe | references/method-patterns.md §1 |
| Difference-in-Differences | statsmodels + C() | lfe/fixest | xtreg/reghdfe | references/method-patterns.md §2 |
| Event Study (Dynamic DiD) | manual lead/lag | estimatr | reghdfe | references/method-patterns.md §3 |
| Staggered DiD / TWFE | statsmodels | bacondecomp | bacondecomp | references/method-patterns.md §4 |
| Regression Discontinuity | statsmodels polynomial | rdrobust | rdplot/rdrobust | references/method-patterns.md §5 |
| Instrumental Variables | linearmodels IV2SLS | AER/ivreg | ivregress 2sls | references/method-patterns.md §6 |
| Synthetic Control | rpy2 → R Synth | Synth + SCtools | synth | references/method-patterns.md §7 |
| Matching / PSM / IPW | manual logit + weights | MatchIt + Zelig | teffects/cem | references/method-patterns.md §8 |
| DAGs / Collider Bias | dagitty (conceptual) | dagitty/ggdag | | references/method-patterns.md §9 |
| Randomization Inference | permutation loop | ri2 | ritest | references/method-patterns.md §10 |
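
The table lists a plain permutation loop as the Python route to randomization inference. A minimal sketch of such a loop follows, assuming a binary treatment and a difference-in-means test statistic. The function name `permutation_pvalue`, the simulated data, and the true effect of 1.0 are illustrative assumptions, not taken from the reference templates.

```python
import numpy as np

def permutation_pvalue(y, d, n_perm=2000, seed=0):
    """Two-sided randomization-inference p-value for a difference in means.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=bool)
    observed = y[d].mean() - y[~d].mean()

    # Re-assign treatment at random, keeping the number of treated units fixed
    draws = np.empty(n_perm)
    for i in range(n_perm):
        fake = rng.permutation(d)
        draws[i] = y[fake].mean() - y[~fake].mean()

    # Share of placebo assignments at least as extreme as the observed one
    p_value = (np.abs(draws) >= abs(observed)).mean()
    return observed, p_value

# Simulated data (assumption): 400 units, half treated, true effect of 1.0
rng = np.random.default_rng(42)
d = rng.permutation(np.repeat([0, 1], 200))
y = 1.0 * d + rng.normal(size=400)

effect, p = permutation_pvalue(y, d)
print(f'difference in means = {effect:.2f}, permutation p-value = {p:.3f}')
```

This version can return a p-value of exactly zero when no placebo draw is as extreme as the observed statistic. Some practitioners add one to both the numerator and the denominator to avoid that.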

Core Workflow

Implement a Causal Method

  1. Identify the method from the table above
  2. Load the appropriate template from references/method-patterns.md
  3. Adapt variable names, fixed effects, and clustering to the user's data
  4. Add robustness checks (parallel trends for DiD, McCrary for RDD, first-stage F for IV)

Choose the Right Language

| Scenario | Recommendation |
|---|---|
| ML pipeline integration | Python (statsmodels + linearmodels) |
| Synthetic Control | R (Synth package) or Stata (synth); Python lacks a mature implementation |
| Bacon decomposition | R (bacondecomp) or Stata; no Python equivalent |
| Publication-ready tables | Stata (outreg2/esttab) or R (stargazer/modelsummary) |
| Coarsened Exact Matching | Stata (cem) or R (MatchIt); no Python equivalent |
| Quick prototyping | Python with statsmodels |

Cross-Language Equivalents

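| Task | Python | R | Stata |
|---|---|---|---|
| OLS with robust SE | `smf.ols().fit(cov_type='HC1')` | `lm_robust()` | `reg y x, robust` |
| Cluster SE | `fit(cov_type='cluster', cov_kwds={'groups': g})` | `felm(y ~ x \| 0 \| 0 \| g)` | `reg y x, vce(cluster g)` |
| Two-way FE | `C(id) + C(time)` in formula | `felm(y ~ x \| id + time)` | `reghdfe y x, absorb(id time)` |
| IV / 2SLS | `IV2SLS.from_formula('y ~ 1 + exog + [endog ~ inst]')` | `ivreg(y ~ endog + exog \| inst + exog)` | `ivregress 2sls y exog (endog = inst)` |
| DiD | `C(treat)*C(post)` | `treat:post` in formula | `did_multiplegt` or interaction |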

Key Python Patterns

DiD with Cluster-Robust SE

import statsmodels.formula.api as smf

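# `controls` is a placeholder for your covariates; with 0/1 coding, the DiD
# estimate is the coefficient on C(treated)[T.1]:C(post)[T.1]
model = smf.ols('y ~ C(treated)*C(post) + controls', data=df)
# Cluster at the unit level; drop rows with missing values first so that
# the groups line up with the estimation sample
results = model.fit(cov_type='cluster', cov_kwds={'groups': df['firm_id']})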
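
To see what the pattern returns, the sketch below runs the same specification on simulated data. Everything about the data is an assumption made for illustration: 400 firms, 6 years, treatment starting in year 3 for half the firms, and a true effect of 2.0. The control variables are omitted to keep the example short.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated firm-year panel (all values are illustrative assumptions)
rng = np.random.default_rng(0)
n_firms, n_years = 400, 6
df = pd.DataFrame({
    'firm_id': np.repeat(np.arange(n_firms), n_years),
    'year': np.tile(np.arange(n_years), n_firms),
})
df['treated'] = (df['firm_id'] < n_firms // 2).astype(int)
df['post'] = (df['year'] >= 3).astype(int)

# Outcome: firm effect + group gap + common post shift + true effect of 2.0
firm_effect = rng.normal(size=n_firms)[df['firm_id'].to_numpy()]
df['y'] = (firm_effect + 0.5 * df['treated'] + 0.3 * df['post']
           + 2.0 * df['treated'] * df['post'] + rng.normal(size=len(df)))

model = smf.ols('y ~ C(treated) * C(post)', data=df)
results = model.fit(cov_type='cluster', cov_kwds={'groups': df['firm_id']})

# The interaction term carries the DiD estimate
term = 'C(treated)[T.1]:C(post)[T.1]'
did, se = results.params[term], results.bse[term]
print(f'DiD estimate = {did:.2f} (cluster-robust SE = {se:.2f})')
```

The estimate should land close to the simulated effect of 2.0, which is a quick way to confirm that the interaction term is the coefficient to report.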

Event Study (Lead/Lag)

# Create relative-time dummies for k = -4..4 and record their names
rel_cols = []
for k in range(-4, 5):
    col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'  # 'rel_m1' means t = -1
    df[col] = (df['relative_time'] == k).astype(int)
    rel_cols.append(col)

# Drop t=-1 as reference
formula = 'y ~ ' + ' + '.join([c for c in rel_cols if c != 'rel_m1']) + ' + C(id) + C(year)'
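
A self-contained run of the lead/lag pattern on simulated data can help check the dummy construction before applying it to real data. The sketch assumes a single treatment date (year 5), never-treated units with missing `relative_time`, and a constant post-treatment effect of 1.0. All of these are illustrative choices, not part of the original templates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel (assumption): 200 units, years 1..9, half treated in year 5
rng = np.random.default_rng(1)
n_units, years, treat_year = 200, np.arange(1, 10), 5
df = pd.DataFrame({
    'id': np.repeat(np.arange(n_units), len(years)),
    'year': np.tile(years, n_units),
})
treated = df['id'] < n_units // 2
# Never-treated units get NaN relative time, so all their dummies are 0
df['relative_time'] = np.where(treated, df['year'] - treat_year, np.nan)

unit_effect = rng.normal(size=n_units)[df['id'].to_numpy()]
df['y'] = (unit_effect + 0.1 * df['year']
           + 1.0 * (df['relative_time'] >= 0)   # true effect after treatment
           + rng.normal(scale=0.5, size=len(df)))

# Relative-time dummies, with t = -1 left out as the reference period
rel_cols = []
for k in range(-4, 5):
    col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'
    df[col] = (df['relative_time'] == k).astype(int)
    rel_cols.append(col)
kept = [c for c in rel_cols if c != 'rel_m1']

formula = 'y ~ ' + ' + '.join(kept) + ' + C(id) + C(year)'
results = smf.ols(formula, data=df).fit(
    cov_type='cluster', cov_kwds={'groups': df['id']})

# Leads should be near 0 (parallel trends); lags should be near 1.0
coefs = results.params[kept]
print(coefs.round(2))
```

Plotting `coefs` with their confidence intervals against relative time gives the usual event study figure. Lead coefficients near zero are the visual evidence for parallel trends.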

IV / 2SLS

from linearmodels.iv import IV2SLS

model = IV2SLS.from_formula('y ~ 1 + exog + [endog ~ instrument]', data=df)
results = model.fit(cov_type='clustered', clusters=df['cluster_var'])
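
The first-stage F-statistic that this skill asks for alongside IV can also be checked by hand. The sketch below is one way to do that with statsmodels alone: regress the endogenous variable on the instrument and the exogenous controls, then test the instrument's coefficient. The simulated data and the variable names are assumptions chosen to mirror the formula above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (assumption): one instrument, one control, one confounder
rng = np.random.default_rng(2)
n = 1000
instrument = rng.normal(size=n)
exog = rng.normal(size=n)
confounder = rng.normal(size=n)  # unobserved in a real application
endog = 0.5 * instrument + 0.3 * exog + confounder + rng.normal(size=n)
df = pd.DataFrame({'instrument': instrument, 'exog': exog, 'endog': endog})

# First stage: endogenous regressor on the instrument plus exogenous controls
first_stage = smf.ols('endog ~ instrument + exog', data=df).fit(cov_type='HC1')

# Robust F-test that the excluded instrument has no effect on endog
f_test = first_stage.f_test('instrument = 0')
first_stage_f = float(np.squeeze(f_test.fvalue))
print(f'First-stage F on the excluded instrument: {first_stage_f:.1f}')
```

With several instruments, the same call takes a joint hypothesis such as `'z1 = 0, z2 = 0'`, where `z1` and `z2` are hypothetical instrument names.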

Robustness Check Patterns

| Method | Required Checks |
|---|---|
| DiD | Parallel trends (event study plot), placebo treatment dates |
| RDD | McCrary density test, bandwidth robustness (half/double IK optimal), polynomial robustness |
| IV | First-stage F > 10, exclusion restriction argument, over-identification test |
| Synthetic Control | Pre-treatment RMSPE, placebo distribution, leave-one-out |
| Matching | Covariate balance table, caliper sensitivity |
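
As an illustration of the RDD bandwidth check, the sketch below estimates a local linear regression at half, one times, and double a baseline bandwidth. It makes several simplifying assumptions: the data are simulated with a true jump of 2.0, the baseline bandwidth of 0.5 is picked by hand and is not the IK optimum, and a uniform kernel is used. The helper `rd_estimate` is hypothetical. For real work, R's or Stata's rdrobust selects the bandwidth for you.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated sharp RDD (assumption): cutoff at 0, true jump of 2.0
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, size=n)
df = pd.DataFrame({'x': x, 'D': (x >= 0).astype(int)})
df['y'] = 0.5 * df['x'] + 2.0 * df['D'] + rng.normal(scale=0.5, size=n)

def rd_estimate(data, h, cutoff=0.0):
    """Local linear RD estimate within bandwidth h (uniform kernel)."""
    local = data[np.abs(data['x'] - cutoff) <= h].copy()
    local['xc'] = local['x'] - cutoff
    # Separate slopes on each side; the coefficient on D is the jump
    fit = smf.ols('y ~ D * xc', data=local).fit(cov_type='HC1')
    return fit.params['D'], fit.bse['D']

base_h = 0.5  # hand-picked baseline, standing in for an optimal bandwidth
estimates = {h: rd_estimate(df, h) for h in (base_h / 2, base_h, base_h * 2)}
for h, (jump, se) in estimates.items():
    print(f'h = {h:.2f}: jump = {jump:.2f} (SE = {se:.2f})')
```

Estimates that stay stable across the three bandwidths support the design. Large swings suggest the result depends on functional form away from the cutoff.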

Common Pitfalls

  1. TWFE with staggered treatment: standard two-way FE can be biased when treatment timing varies and effects differ across cohorts or over time, because already-treated units end up serving as controls. Use the Bacon decomposition to diagnose the problem and the Sun & Abraham or Callaway & Sant'Anna estimators to address it.
  2. Synthetic Control with many treated units: the Synth package handles one treated unit. For multiple treated units, use augmented synthetic control or a stacked approach.
  3. RDD without McCrary test: always test for manipulation at the cutoff before estimating.
  4. IV weak instruments: report the first-stage F-statistic. A value below 10 is the conventional rule-of-thumb signal of weak-instrument bias.
  5. Python Synth gap: no mature Python Synth package exists. Use rpy2 to call R's Synth from Python.

Additional Resources

Reference Files

  • references/method-patterns.md: Detailed code templates for all 10 methods with full examples
  • references/r-stata-comparison.md: Cross-language package comparison and method coverage gaps

Prompt Files

  • prompts/01-implement-method.md: Copy-paste prompt for implementing any causal method
  • prompts/02-robustness-checks.md: Copy-paste prompt for generating robustness check code