Awesome-Agent-Skills-for-Empirical-Research causal-inference

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/23-Learning-Bayesian-Statistics-baygent-skills/causal-inference" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-causal-inference-be790c && rm -rf "$T"
manifest: skills/23-Learning-Bayesian-Statistics-baygent-skills/causal-inference/SKILL.md
source content

Causal Inference

Dependencies

This skill requires the bayesian-workflow skill for all PyMC modeling steps (priors, sampling, diagnostics, calibration, reporting).

Detect it:

ls ~/.claude/skills/bayesian-workflow/SKILL.md 2>/dev/null || ls .claude/skills/bayesian-workflow/SKILL.md 2>/dev/null

If not found, install it:

git clone https://github.com/Learning-Bayesian-Statistics/baygent-skills.git /tmp/baygent-skills
cp -r /tmp/baygent-skills/bayesian-workflow ~/.claude/skills/

Workflow overview

Every causal analysis follows this sequence. Steps 1-4 are the thinking phase (no code). Steps 5-8 are the doing phase. Think before you do.

  1. Formulate the causal question — Propose precise estimand (ATE, ATT, LATE, etc.). ⚠️ ASK USER TO CONFIRM.
  2. Draw the DAG — Propose the causal graph with nodes, edges, and explicit non-edges. ⚠️ ASK USER TO CONFIRM. See references/dags-and-identification.md (a minimal sketch follows this list)
  3. Identify — Determine identification strategy (backdoor, front-door, IV, RDD, DiD). ⚠️ ASK USER TO CONFIRM untestable assumptions. See references/dags-and-identification.md
  4. Choose design — Match problem to method using table below. ⚠️ ASK USER TO CONFIRM. See references/quasi-experiments.md or references/structural-models.md
  5. Estimate — Build and fit the model. Delegate all PyMC mechanics to the bayesian-workflow skill.
  6. Refute — MANDATORY. Run design-specific robustness checks. See references/refutation.md
  7. Interpret — Effect size + decision-relevant HDIs + probability of direction.
  8. Report — Generate causal analysis report. See references/reporting.md
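
As a concrete instance of steps 2-3, the sketch below declares a toy DAG and asks DoWhy for the identification strategy before any estimation code is written. The data and all variable names are invented for illustration, and parsing a DOT-string graph assumes pydot or pygraphviz is installed alongside DoWhy.

```python
# Hypothetical sketch of steps 2-3: declare the DAG, then check identification.
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: age confounds treatment and outcome (all names invented).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(50.0, 10.0, n)
treated = (age / 100.0 + rng.normal(0.0, 0.3, n) > 0.5).astype(int)
outcome = 2.0 * treated + 0.1 * age + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"age": age, "treated": treated, "outcome": outcome})

# Step 2: the proposed DAG as a DOT string. A latent confounder would be an
# extra node with no matching dataframe column (see the DoWhy gotcha below).
graph = """
digraph {
    age -> treated;
    age -> outcome;
    treated -> outcome;
}
"""

# Step 3: ask DoWhy for the identification strategy implied by the DAG.
model = CausalModel(data=df, treatment="treated", outcome="outcome", graph=graph)
estimand = model.identify_effect()
print(estimand)  # should report a backdoor adjustment set containing age
```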

Design selection guide

| Design | Use when | Key assumption | Tool |
|---|---|---|---|
| DiD | Treatment at known time, control group available | Parallel trends | CausalPy |
| Staggered DiD | Treatment rolls out at different times | Parallel trends per cohort | CausalPy |
| Synthetic Control | Single treated unit, donor pool available | Weighted donors approximate counterfactual | CausalPy |
| ITS | Time series, intervention at known time, no control | No confounding event at treatment time | CausalPy |
| RDD | Treatment by threshold on running variable | No manipulation at threshold | CausalPy |
| IV | Endogenous treatment, valid instrument | Exclusion restriction, relevance | CausalPy |
| IPSW | Observational data, treatment modeled | No unmeasured confounders, positivity | CausalPy |
| Structural (do/observe) | Full causal theory, model mechanisms | Correct DAG specification | PyMC |
| Counterfactual | "What would Y have been if X differed?" | Correct structural model | PyMC |
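
To make the table concrete, here is a minimal DiD fit against CausalPy's bundled demo dataset. Class and argument names follow recent CausalPy releases; in older versions the experiment classes live under cp.pymc_experiments rather than the top-level namespace.

```python
# Minimal Bayesian DiD with CausalPy, using its bundled "did" demo data.
import causalpy as cp

df = cp.load_data("did")  # columns include y, t, group, post_treatment, unit
result = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",  # interaction term = causal impact
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(),
)
result.summary()  # posterior for the treatment effect
result.plot()
```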

Critical rules

  • No estimation without a confirmed DAG. A causal graph is not optional decoration — it makes assumptions explicit and determines the adjustment set. If the user resists, explain why the DAG is non-negotiable before proceeding.
  • No causal claims without refutation. Every design has failure modes. Run at minimum one design-specific robustness check (placebo test, sensitivity analysis, falsification test) before reporting results. See references/refutation.md.
  • State assumptions before results. Lead with what must be true for the estimate to be causal; present the estimate only after the assumptions, never before. This is not optional politeness — it prevents misuse of results.
  • Adapt HDIs to the decision context. The bayesian-workflow skill's 94% HDI is a sensible default; adapt it with explicit explanation when the decision stakes warrant it (e.g., 89% for exploratory, 97% for high-stakes policy). Report multiple intervals when the decision threshold matters (see the sketch after this list).
  • Downgrade causal language when warranted. If identification assumptions are unverifiable or refutation raises flags, soften claims: "consistent with a causal effect" not "causes", "estimated effect" not "true effect". Flag uncertainty loudly in the report.
  • Ask the user when domain knowledge is needed. You cannot know whether an instrument is valid, whether parallel trends holds, or whether a confounder exists without domain expertise. Ask before assuming.
  • Delegate PyMC mechanics to bayesian-workflow. This skill handles causal structure and design. The bayesian-workflow skill handles priors, sampling, diagnostics, calibration, and reporting format. Don't duplicate those rules here.
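
A minimal sketch of the HDI rule above. The stand-in posterior and the parameter name "effect" are invented; in practice `idata` comes from the fitted model.

```python
# Decision-adapted interval reporting with ArviZ ("effect" is a stand-in name).
import arviz as az
import numpy as np

# Stand-in posterior for illustration only -- use your fitted idata instead.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"effect": rng.normal(1.0, 0.5, (4, 1000))})

for prob in (0.89, 0.94, 0.97):  # exploratory / default / high-stakes
    hdi = az.hdi(idata, var_names=["effect"], hdi_prob=prob)
    print(f"{prob:.0%} HDI:", hdi["effect"].values)

# Probability of direction: posterior mass on one side of zero.
draws = idata.posterior["effect"].values.ravel()
print("P(effect > 0) =", (draws > 0).mean())
```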

Common gotchas

These are battle-tested lessons that save hours of debugging:

  • CausalPy formula syntax uses `C()` for categoricals. Passing a string column directly without `C()` will silently produce wrong dummy coding. Always wrap categorical treatment and group variables: `"y ~ C(treatment) + C(group)"`.
  • DoWhy requires explicit `U` nodes for unobserved confounders. Omitting them from the graph will make DoWhy treat your model as fully identified when it isn't. Add latent nodes explicitly and mark them as unobserved.
  • CausalPy's PyMC models don't auto-store log-likelihood. Same issue as bayesian-workflow: nutpie silently drops it. Call `pm.compute_log_likelihood(idata, model=model)` after sampling if you need it for model comparison.
  • Parallel trends is untestable in the post-treatment period. Pre-treatment trend tests are necessary but not sufficient — passing them doesn't prove the assumption holds after treatment. State this explicitly in every DiD report.
  • Synthetic control requires the treated unit to lie within the convex hull of donors. If the treated unit is an outlier (highest GDP, largest city), no weighted combination of donors can approximate its counterfactual. Check this before running — if violated, the design is invalid.
  • DiD group variable must be dummy-coded (0/1). CausalPy rejects string labels like "treatment"/"control". Use integers: 1 = treatment, 0 = control. The data must also include a `unit` column.
  • SyntheticControl expects wide-format data. Index = time, columns = unit names, values = outcome. If your data is in long format, pivot first: `df.pivot(index="date", columns="unit", values="outcome")` (see the sketch after this list).
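
A small data-prep sketch for the last two gotchas. The column names (arm, date, unit, outcome) are hypothetical:

```python
import pandas as pd

# Toy long-format panel with hypothetical column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01"]),
    "unit": ["A", "B", "A", "B"],
    "arm": ["treatment", "control", "treatment", "control"],
    "outcome": [1.0, 0.8, 1.4, 0.9],
})

# Dummy-code the group variable: CausalPy rejects string labels.
df["group"] = (df["arm"] == "treatment").astype(int)

# SyntheticControl expects wide format: index = time, columns = unit names.
wide = df.pivot(index="date", columns="unit", values="outcome")
print(wide)
```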

When things go wrong

| Symptom | Likely cause | Fix |
|---|---|---|
| Refutation fails | Assumption violated | Diagnose which assumption; try an alternative design or sensitivity bounds |
| DiD effect at placebo time | Parallel trends violated | Try synthetic control or add group-specific time trends |
| RDD: bunching at threshold | Manipulation of running variable | Design is invalid for this threshold — report and stop |
| SC: poor pre-treatment fit | Donors don't span treated unit | Expand the donor pool or reconsider the design |
| DoWhy says "not identifiable" | Insufficient adjustment set | Revise DAG, add measured variables, or change design |
| CausalPy formula error | Wrong formula syntax | Use `C()` for categoricals; check variable names match dataframe columns |
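
For the "DiD effect at placebo time" row, a sketch of the check itself: re-fit the DiD on pre-treatment data with an invented treatment time. The data is synthetic and the CausalPy usage follows the same version-dependent API as the example above.

```python
# Placebo-time DiD check (synthetic data; all names hypothetical).
import numpy as np
import pandas as pd
import causalpy as cp

# Panel: 6 units x 10 periods, true treatment at t = 5 for group 1.
rng = np.random.default_rng(1)
rows = []
for unit, group in enumerate([0, 0, 0, 1, 1, 1]):
    for t in range(10):
        y = 1.0 + 0.2 * t + 0.5 * group + 1.5 * group * (t >= 5) + rng.normal(0, 0.1)
        rows.append({"unit": unit, "t": t, "group": group,
                     "post_treatment": int(t >= 5), "y": y})
df = pd.DataFrame(rows)

# Placebo: pretend treatment happened at t = 2, using pre-treatment data only.
pre = df[df["t"] < 5].copy()
pre["post_treatment"] = (pre["t"] >= 2).astype(int)

placebo = cp.DifferenceInDifferences(
    pre,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(),
)
placebo.summary()  # interaction near zero supports parallel trends
```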