Awesome-Agent-Skills-for-Empirical-Research · causal-ml

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/11-James-Traina-compound-science/skills/causal-ml" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-causal-ml && rm -rf "$T"
manifest: skills/11-James-Traina-compound-science/skills/causal-ml/SKILL.md
source content

Causal Machine Learning

Reference for semiparametric ML estimators: DML with cross-fitting, generalized random forests, debiased regularization, and nuisance function approximation. Covers Neyman-orthogonal moment conditions, sample splitting, plug-in bias correction, and heterogeneous treatment effects.

When to Use This Skill

Use when the user is:

  • Estimating treatment effects with high-dimensional controls (p large relative to n)
  • Interested in heterogeneous treatment effects (CATE) as a primary estimand
  • Applying ML for flexible nuisance function estimation within a causal framework
  • Implementing cross-fitting, sample splitting, or Neyman-orthogonal estimators
  • Using econml, DoubleML, or grf packages

Skip when:

  • Sample is small (n < 500 — ML nuisance models need data)
  • A well-specified parametric model is available and defensible
  • The task is standard IV/DiD/RDD without high-dimensional controls (use the causal-inference skill)
  • Structural modeling is needed (use the structural-modeling skill)
  • The task needs a formal identification proof (use the identification-proofs skill)

Where to Start

  • Choosing a method? Jump to Method Selection Guide
  • ATE with many controls? See references/dml.md
  • Heterogeneous treatment effects? See references/grf-meta-learners.md
  • Variable selection for controls? See references/high-dim-cross-fitting.md
  • Reporting HTE results? See references/hte-inference.md
  • Connecting to traditional methods? See references/connections-traditional.md

Causal ML vs Traditional Methods

| Dimension | Traditional (IV, DiD, RDD) | Causal ML |
| --- | --- | --- |
| Functional form | Parametric | Nonparametric / semiparametric |
| High-dimensional controls | Problematic | Native support |
| Heterogeneous effects | Secondary (subgroup analysis) | Primary estimand (CATE) |
| Sample requirements | Moderate n | ML nuisance estimation needs large n |
| Identification | Explicit (IV, DiD, RCT) | Same assumptions; ML is estimation, not identification |

Critical point: Causal ML does not relax identification assumptions. If you need a valid instrument, parallel trends, or no unmeasured confounding, those must still hold.


Double Machine Learning (DML)

DML (Chernozhukov et al. 2018) removes the regularization bias that arises when ML predictions are plugged naively into a regression. Partial out the controls X from both Y and D using separate ML nuisance models, then regress the residuals on each other. Two properties make this work: Neyman orthogonality (the moment condition is locally insensitive to nuisance estimation error) and cross-fitting (sample splitting that prevents overfitting bias).

PLR (Partially Linear Regression): $Y = \theta D + g(X) + \varepsilon$. Workhorse for continuous or binary D with ATE under selection on observables. IRM (Interactive Regression Model): relaxes additive separability for binary D with heterogeneous effects.

Full implementation (Python/R code, cross-fitting from scratch, diagnostics) in references/dml.md.
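
The core recipe is small enough to sketch directly. Below is a minimal DML-PLR with cross-fitting using only numpy/sklearn; the random-forest nuisance models, fold count, and function name are illustrative assumptions, not the reference implementation in references/dml.md.

```python
# Minimal DML-PLR sketch: cross-fit nuisance models, then residual-on-residual OLS.
# Model choices and fold count are illustrative, not prescriptive.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=5, seed=0):
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])  # Y - E-hat[Y|X], out of fold
        d_res[test] = d[test] - m_d.predict(X[test])  # D - E-hat[D|X], out of fold
    theta = (d_res @ y_res) / (d_res @ d_res)         # residual-on-residual OLS
    psi = (y_res - theta * d_res) * d_res             # Neyman-orthogonal score
    se = np.sqrt(np.mean(psi**2) / np.mean(d_res**2)**2 / len(y))
    return theta, se
```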

Causal Forests

Causal forests (Wager-Athey 2018; Athey-Tibshirani-Wager 2019) estimate CATE $\tau(x) = E[Y(1)-Y(0)|X=x]$ using honest forests (structure learned on one subsample, effects estimated on another). Use when CATE is the primary estimand and n $\geq$ 2,000. Always run the calibration test before reporting heterogeneity.

R (grf) and Python (econml) implementations, ATE/ATT extraction, and BLP projections are in references/grf-meta-learners.md.
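
For the Python route, a hedged sketch with econml's CausalForestDML on synthetic data; the data-generating process and model choices are invented for illustration, and exact signatures may vary across econml versions.

```python
# Illustrative causal-forest CATE estimation with econml; synthetic DGP with
# true CATE tau(x) = 1 + x0, so estimates can be sanity-checked.
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))              # heterogeneity drivers
W = rng.normal(size=(n, 5))              # additional controls
T = rng.binomial(1, 0.5, size=n)         # randomized binary treatment
Y = (1 + X[:, 0]) * T + W @ rng.normal(size=5) + rng.normal(size=n)

est = CausalForestDML(
    model_y=RandomForestRegressor(random_state=0),   # nuisance E[Y|X,W]
    model_t=RandomForestClassifier(random_state=0),  # nuisance E[T|X,W]
    discrete_treatment=True,
    cv=5,                                            # cross-fitting folds
    random_state=0,
)
est.fit(Y, T, X=X, W=W)
tau_hat = est.effect(X)                              # point CATE estimates
lb, ub = est.effect_interval(X, alpha=0.05)          # honest confidence intervals
```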

Meta-Learners

Meta-learners decompose CATE estimation into supervised-learning sub-problems. DR-Learner (Kennedy 2023): doubly robust, with the best properties when both nuisance models are well-specified. T-Learner: simplest baseline. X-Learner: designed for imbalanced treatment arms. For applied work, use the DR-Learner as the primary estimator and the T-Learner as a benchmark; large disagreement between the two signals nuisance-model problems.

All implementations are in references/grf-meta-learners.md.
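
Since the DR-Learner is the recommended primary estimator, here is a compact sketch of its two stages: build doubly robust (AIPW) pseudo-outcomes, then regress them on X. Nuisance cross-fitting is omitted for brevity but should be used in practice; the gradient-boosting models and trim threshold are illustrative assumptions.

```python
# DR-Learner sketch: AIPW pseudo-outcomes, then a second-stage CATE regression.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def dr_learner(X, T, Y, X_eval):
    mu1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1]).predict(X)  # E[Y|X,T=1]
    mu0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0]).predict(X)  # E[Y|X,T=0]
    e = GradientBoostingClassifier().fit(X, T).predict_proba(X)[:, 1]       # propensity
    e = np.clip(e, 0.01, 0.99)                                              # enforce overlap
    # AIPW pseudo-outcome: E[phi|X=x] = tau(x) if either nuisance is correct
    phi = mu1 - mu0 + T * (Y - mu1) / e - (1 - T) * (Y - mu0) / (1 - e)
    return GradientBoostingRegressor().fit(X, phi).predict(X_eval)
```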

High-Dimensional Controls

PDS-LASSO (Belloni-Chernozhukov-Hansen 2014): run separate LASSOs of Y on X and of D on X, take the union of the selected controls, then refit by OLS of Y on D plus that union. Works at moderate n (~200 with sparse confounders). See references/high-dim-cross-fitting.md.
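
A compact sketch under stated simplifications: sklearn's cross-validated LASSO stands in for the BCH plug-in penalty (covered in references/high-dim-cross-fitting.md), and statsmodels supplies the post-selection OLS.

```python
# PDS-LASSO sketch: select controls from both reduced forms, refit by OLS.
# LassoCV is a stand-in for the BCH plug-in penalty; see the reference file.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

def pds_lasso(y, d, X):
    sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)  # controls predicting Y
    sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)  # controls predicting D
    union = np.union1d(sel_y, sel_d)                       # post-double-selection set
    Z = sm.add_constant(np.column_stack([d, X[:, union]]))
    fit = sm.OLS(y, Z).fit(cov_type="HC1")                 # OLS of Y on D + union
    return fit.params[1], fit.bse[1]                       # theta-hat, robust SE
```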

HTE Inference

Before reporting CATE estimates, test for genuine heterogeneity using the BLP calibration test, as sketched below. Do not report heterogeneous effects if the calibration test fails (p > 0.10). See references/hte-inference.md.
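
The test's logic fits in a few lines: regress doubly robust scores (e.g., the AIPW pseudo-outcomes above) on demeaned CATE predictions from a held-out model; a significant slope near 1 supports genuine, well-calibrated heterogeneity. This mirrors grf::test_calibration in spirit; the function name and HC1 choice are illustrative.

```python
# BLP-style calibration sketch: the slope on the demeaned CATE prediction tests
# heterogeneity; the intercept recovers the ATE implied by the DR scores.
import statsmodels.api as sm

def blp_calibration(dr_scores, tau_hat):
    Z = sm.add_constant(tau_hat - tau_hat.mean())      # [1, demeaned prediction]
    fit = sm.OLS(dr_scores, Z).fit(cov_type="HC1")
    return fit.params[1], fit.pvalues[1]               # slope and its p-value
```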


Method Selection Guide

Decision Heuristic

1. n < 500? → Use standard methods (causal-inference skill)
2. High-dim controls (p > 20), want ATE? → PDS-LASSO or DML-PLR; binary D → DML-IRM
3. CATE is primary estimand? → Causal Forest (large n) or DR-Learner (doubly robust)
4. Endogenous treatment with instrument? → DML-PLIV
5. Treatment is rare/imbalanced? → X-Learner
6. Quick benchmark? → Always compute T-Learner as baseline

Full Method Comparison

| Method | Estimand | Python | R | Min n | Key diagnostic |
| --- | --- | --- | --- | --- | --- |
| DML-PLR | ATE | doubleml, econml | DoubleML | ~500 | Nuisance R², residual balance |
| DML-IRM | ATE (binary D) | doubleml, econml | DoubleML | ~500 | Propensity AUC, trim threshold |
| DML-PLIV | LATE | doubleml, econml | DoubleML | ~1,000 | Effective F-stat |
| Causal Forest | CATE(x) | econml | grf | ~2,000 | Calibration test, ATE match |
| DR-Learner | CATE(x) | econml.dr | manual/grf | ~1,000 | Propensity calibration |
| PDS-LASSO | ATE (high-dim X) | sklearn + manual | hdm | ~200 | Union size, penalty sensitivity |
| X-Learner | CATE (imbalanced D) | econml | manual | ~1,000 | Compare to DR-Learner |

Limitations to State Explicitly

  • ML needs data: Causal forests need n $\geq$ 2,000; DML needs n $\geq$ 500. Below these, use parametric methods.
  • Identification is not relaxed: ML is better nuisance estimation, not weaker assumptions.
  • CATE inference is hard: Individual-level CIs are conservative; policy targeting requires care.
  • Publication: DML and causal forests are mainstream in top applied micro journals, but report them alongside traditional estimators for comparison.

Connections to Traditional Methods

Causal ML nests traditional estimators: DML with linear nuisance models = OLS (Frisch-Waugh), DML + IV = PLIV, causal forests + an instrument = heterogeneous LATE (grf::instrumental_forest), and post-LASSO + many instruments = sparse instrument selection followed by 2SLS. Details in references/connections-traditional.md.
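
The first equivalence is easy to verify numerically: with OLS nuisance models and no cross-fitting, the DML residual-on-residual regression reproduces the full OLS coefficient on D exactly (the Frisch-Waugh-Lovell theorem). The synthetic data below is purely illustrative.

```python
# Numerical FWL check: partialling-out with linear nuisances equals full OLS.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))
d = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)
y = 2.0 * d + X @ np.array([0.3, 0.3, 0.3, 0.3]) + rng.normal(size=n)

d_res = d - LinearRegression().fit(X, d).predict(X)   # partial X out of D
y_res = y - LinearRegression().fit(X, y).predict(X)   # partial X out of Y
theta_fwl = (d_res @ y_res) / (d_res @ d_res)

Z = np.column_stack([d, X])                           # full OLS of y on [D, X]
theta_ols = np.linalg.lstsq(Z - Z.mean(0), y - y.mean(), rcond=None)[0][0]
assert np.isclose(theta_fwl, theta_ols)               # identical by FWL
```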


Integration with Plugin

Agents: econometric-reviewer (post-estimation review, table/code consistency), identification-critic (IV/PLIV assumptions), numerical-auditor (convergence, seeding, Monte Carlo validation).

Cross-references: empirical-playbook skill → sensitivity-analysis.md (specification curve over ML choices); empirical-playbook skill → diagnostic-battery.md (nuisance R², overlap, calibration); numerical-auditor agent (synthetic data with known CATE).

Relationship to the causal-inference skill: use causal-inference to establish identification; use causal-ml for implementation with high-dimensional controls or when heterogeneity is primary. Complements, not substitutes.

Reference Files

  • references/dml.md — Full DML implementation: PLR, IRM, PLIV with econml/DoubleML, cross-fitting, diagnostics
  • references/grf-meta-learners.md — Causal forests (grf/econml), DR/T/S/X-Learner, calibration tests
  • references/high-dim-cross-fitting.md — PDS-LASSO, Belloni-Chernozhukov-Hansen, cross-fitting protocols
  • references/hte-inference.md — Calibration tests, individual CATE CIs, BLP projections, subgroup analysis
  • references/connections-traditional.md — DML-OLS equivalence, PLIV, instrumental forests, post-LASSO