Awesome-Agent-Skills-for-Empirical-Research causal-inference

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/23-Learning-Bayesian-Statistics-baygent-skills/causal-inference" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-causal-inference-be790c && rm -rf "$T"
manifest: skills/23-Learning-Bayesian-Statistics-baygent-skills/causal-inference/SKILL.md
source content

Causal Inference

Dependencies

This skill requires the bayesian-workflow skill for all PyMC modeling steps (priors, sampling, diagnostics, calibration, reporting).

Detect it:

ls ~/.claude/skills/bayesian-workflow/SKILL.md 2>/dev/null || ls .claude/skills/bayesian-workflow/SKILL.md 2>/dev/null

If not found, install it:

git clone https://github.com/Learning-Bayesian-Statistics/baygent-skills.git /tmp/baygent-skills
cp -r /tmp/baygent-skills/bayesian-workflow ~/.claude/skills/

Workflow overview

Every causal analysis follows this sequence. Steps 1-4 are the thinking phase (no code). Steps 5-8 are the doing phase. Think before you do.

  1. Formulate the causal question — Propose precise estimand (ATE, ATT, LATE, etc.). ⚠️ ASK USER TO CONFIRM.
  2. Draw the DAG — Propose the causal graph with nodes, edges, and explicit non-edges. ⚠️ ASK USER TO CONFIRM. See references/dags-and-identification.md (a minimal sketch follows this list)
  3. Identify — Determine identification strategy (backdoor, front-door, IV, RDD, DiD). ⚠️ ASK USER TO CONFIRM untestable assumptions. See references/dags-and-identification.md
  4. Choose design — Match problem to method using table below. ⚠️ ASK USER TO CONFIRM. See references/quasi-experiments.md or references/structural-models.md
  5. Estimate — Build and fit the model. Delegate all PyMC mechanics to the bayesian-workflow skill.
  6. Refute — MANDATORY. Run design-specific robustness checks. See references/refutation.md
  7. Interpret — Effect size + decision-relevant HDIs + probability of direction.
  8. Report — Generate causal analysis report. See references/reporting.md
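
As a concrete instance of steps 2-3, the sketch below declares a toy DAG and asks DoWhy for the identification strategy before any estimation code is written. The data and all variable names are invented for illustration, and parsing a DOT-string graph assumes pydot or pygraphviz is installed alongside DoWhy.

```python
# Hypothetical sketch of steps 2-3: declare the DAG, then check identification.
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: age confounds treatment and outcome (all names invented).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(50.0, 10.0, n)
treated = (age / 100.0 + rng.normal(0.0, 0.3, n) > 0.5).astype(int)
outcome = 2.0 * treated + 0.1 * age + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"age": age, "treated": treated, "outcome": outcome})

# Step 2: the proposed DAG as a DOT string. A latent confounder would be an
# extra node with no matching dataframe column (see the DoWhy gotcha below).
graph = """
digraph {
    age -> treated;
    age -> outcome;
    treated -> outcome;
}
"""

# Step 3: ask DoWhy for the identification strategy implied by the DAG.
model = CausalModel(data=df, treatment="treated", outcome="outcome", graph=graph)
estimand = model.identify_effect()
print(estimand)  # should report a backdoor adjustment set containing age
```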

Design selection guide

| Design | Use when | Key assumption | Tool |
|---|---|---|---|
| DiD | Treatment at known time, control group available | Parallel trends | CausalPy |
| Staggered DiD | Treatment rolls out at different times | Parallel trends per cohort | CausalPy |
| Synthetic Control | Single treated unit, donor pool available | Weighted donors approximate counterfactual | CausalPy |
| ITS | Time series, intervention at known time, no control | No confounding event at treatment time | CausalPy |
| RDD | Treatment by threshold on running variable | No manipulation at threshold | CausalPy |
| IV | Endogenous treatment, valid instrument | Exclusion restriction, relevance | CausalPy |
| IPSW | Observational data, treatment modeled | No unmeasured confounders, positivity | CausalPy |
| Structural (do/observe) | Full causal theory, model mechanisms | Correct DAG specification | PyMC |
| Counterfactual | "What would Y have been if X differed?" | Correct structural model | PyMC |
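
To make the table concrete, here is a minimal DiD fit against CausalPy's bundled demo dataset. Class and argument names follow recent CausalPy releases; in older versions the experiment classes live under cp.pymc_experiments rather than the top-level namespace.

```python
# Minimal Bayesian DiD with CausalPy, using its bundled "did" demo data.
import causalpy as cp

df = cp.load_data("did")  # columns include y, t, group, post_treatment, unit
result = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",  # interaction term = causal impact
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(),
)
result.summary()  # posterior for the treatment effect
result.plot()
```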

Critical rules

  • No estimation without a confirmed DAG. A causal graph is not optional decoration — it makes assumptions explicit and determines the adjustment set. If the user resists, explain why the DAG is non-negotiable before proceeding.
  • No causal claims without refutation. Every design has failure modes. Run at minimum one design-specific robustness check (placebo test, sensitivity analysis, falsification test) before reporting results. See references/refutation.md.
  • State assumptions before results. Lead with what must be true for the estimate to be causal; present the estimate only after the assumptions, never before. This is not optional politeness — it prevents misuse of results.
  • Adapt HDIs to the decision context. The bayesian-workflow skill's 94% HDI is a sensible default; adapt it with explicit explanation when the decision stakes warrant it (e.g., 89% for exploratory, 97% for high-stakes policy). Report multiple intervals when the decision threshold matters (see the sketch after this list).
  • Downgrade causal language when warranted. If identification assumptions are unverifiable or refutation raises flags, soften claims: "consistent with a causal effect" not "causes", "estimated effect" not "true effect". Flag uncertainty loudly in the report.
  • Ask the user when domain knowledge is needed. You cannot know whether an instrument is valid, whether parallel trends holds, or whether a confounder exists without domain expertise. Ask before assuming.
  • Delegate PyMC mechanics to bayesian-workflow. This skill handles causal structure and design. The bayesian-workflow skill handles priors, sampling, diagnostics, calibration, and reporting format. Don't duplicate those rules here.
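
A minimal sketch of the HDI rule above. The stand-in posterior and the parameter name "effect" are invented; in practice `idata` comes from the fitted model.

```python
# Decision-adapted interval reporting with ArviZ ("effect" is a stand-in name).
import arviz as az
import numpy as np

# Stand-in posterior for illustration only -- use your fitted idata instead.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"effect": rng.normal(1.0, 0.5, (4, 1000))})

for prob in (0.89, 0.94, 0.97):  # exploratory / default / high-stakes
    hdi = az.hdi(idata, var_names=["effect"], hdi_prob=prob)
    print(f"{prob:.0%} HDI:", hdi["effect"].values)

# Probability of direction: posterior mass on one side of zero.
draws = idata.posterior["effect"].values.ravel()
print("P(effect > 0) =", (draws > 0).mean())
```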

Common gotchas

These are battle-tested lessons that save hours of debugging:

  • CausalPy formula syntax uses `C()` for categoricals. Passing a string column directly without `C()` will silently produce wrong dummy coding. Always wrap categorical treatment and group variables: `"y ~ C(treatment) + C(group)"`.
  • DoWhy requires explicit `U` nodes for unobserved confounders. Omitting them from the graph will make DoWhy treat your model as fully identified when it isn't. Add latent nodes explicitly and mark them as unobserved.
  • CausalPy's PyMC models don't auto-store log-likelihood. Same issue as bayesian-workflow: nutpie silently drops it. Call `pm.compute_log_likelihood(idata, model=model)` after sampling if you need it for model comparison.
  • Parallel trends is untestable in the post-treatment period. Pre-treatment trend tests are necessary but not sufficient — passing them doesn't prove the assumption holds after treatment. State this explicitly in every DiD report.
  • Synthetic control requires the treated unit to lie within the convex hull of donors. If the treated unit is an outlier (highest GDP, largest city), no weighted combination of donors can approximate its counterfactual. Check this before running — if violated, the design is invalid.
  • DiD group variable must be dummy-coded (0/1). CausalPy rejects string labels like "treatment"/"control". Use integers: 1 = treatment, 0 = control. The data must also include a `unit` column.
  • SyntheticControl expects wide-format data. Index = time, columns = unit names, values = outcome. If your data is in long format, pivot first: `df.pivot(index="date", columns="unit", values="outcome")` (see the sketch after this list).
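
A small data-prep sketch for the last two gotchas. The column names (arm, date, unit, outcome) are hypothetical:

```python
import pandas as pd

# Toy long-format panel with hypothetical column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01"]),
    "unit": ["A", "B", "A", "B"],
    "arm": ["treatment", "control", "treatment", "control"],
    "outcome": [1.0, 0.8, 1.4, 0.9],
})

# Dummy-code the group variable: CausalPy rejects string labels.
df["group"] = (df["arm"] == "treatment").astype(int)

# SyntheticControl expects wide format: index = time, columns = unit names.
wide = df.pivot(index="date", columns="unit", values="outcome")
print(wide)
```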

When things go wrong

| Symptom | Likely cause | Fix |
|---|---|---|
| Refutation fails | Assumption violated | Diagnose which assumption; try an alternative design or sensitivity bounds |
| DiD effect at placebo time | Parallel trends violated | Try synthetic control or add group-specific time trends |
| RDD: bunching at threshold | Manipulation of running variable | Design is invalid for this threshold — report and stop |
| SC: poor pre-treatment fit | Donors don't span treated unit | Expand the donor pool or reconsider the design |
| DoWhy says "not identifiable" | Insufficient adjustment set | Revise DAG, add measured variables, or change design |
| CausalPy formula error | Wrong formula syntax | Use `C()` for categoricals; check variable names match dataframe columns |
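
For the "DiD effect at placebo time" row, a sketch of the check itself: re-fit the DiD on pre-treatment data with an invented treatment time. The data is synthetic and the CausalPy usage follows the same version-dependent API as the example above.

```python
# Placebo-time DiD check (synthetic data; all names hypothetical).
import numpy as np
import pandas as pd
import causalpy as cp

# Panel: 6 units x 10 periods, true treatment at t = 5 for group 1.
rng = np.random.default_rng(1)
rows = []
for unit, group in enumerate([0, 0, 0, 1, 1, 1]):
    for t in range(10):
        y = 1.0 + 0.2 * t + 0.5 * group + 1.5 * group * (t >= 5) + rng.normal(0, 0.1)
        rows.append({"unit": unit, "t": t, "group": group,
                     "post_treatment": int(t >= 5), "y": y})
df = pd.DataFrame(rows)

# Placebo: pretend treatment happened at t = 2, using pre-treatment data only.
pre = df[df["t"] < 5].copy()
pre["post_treatment"] = (pre["t"] >= 2).astype(int)

placebo = cp.DifferenceInDifferences(
    pre,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(),
)
placebo.summary()  # interaction near zero supports parallel trends
```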