git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/28-maxwell2732-paper-replicate-agent-demo/dot-claude/skills/replicate-paper" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-replicate-paper && rm -rf "$T"
skills/28-maxwell2732-paper-replicate-agent-demo/dot-claude/skills/replicate-paper/SKILL.md

Skill: /replicate-paper
Trigger:
/replicate-paper [paper.pdf] [data.csv|dta] or "replicate this paper"
Purpose: Full 6-phase autonomous replication of a biomedical/epidemiology paper using UK Biobank or similar data. Produces Python and R scripts plus a polished validation report.
Invocation
/replicate-paper papers/AuthorYear.pdf data/ukb_extract.csv
Or with just: "replicate this paper" (Claude will ask for paths if not provided).
The 6-Phase Pipeline
Phase 1: Intake
Goal: Understand exactly what needs to be replicated.
- Read the paper PDF (all sections: Abstract, Methods, Results, Supplementary)
- Identify every table and figure that presents empirical results
- For each: record the gold standard values, SEs/CIs, sample sizes, and source location (one example record is sketched below)
- Save targets to quality_reports/[paper_name]_replication_targets.md
- Summarize: original software, data source, sample N, key methods, any replication package available
Output:
quality_reports/[paper_name]_replication_targets.md
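Before writing the targets file, it helps to capture each target as a structured record. A minimal sketch in Python; every field value is an invented placeholder, not data from any real paper:

```python
# One replication target as it might be captured before being written out to
# quality_reports/[paper_name]_replication_targets.md.
# All values below are illustrative placeholders.
target = {
    "id": "T2_M1_HR_exposure",     # stable key: Table 2, Model 1, exposure HR
    "source": "Table 2, Model 1",  # where the gold standard value appears
    "estimate": 1.42,              # point estimate as printed in the paper
    "ci_95": (1.18, 1.71),         # 95% CI as printed
    "se": None,                    # record the SE instead if that is what is reported
    "n": 182431,                   # analytic sample size for this model
    "type": "hazard_ratio",
}
```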
Phase 2: Data Audit
Goal: Confirm what we can and cannot replicate given the available data.
- Load the provided dataset (data/[filename])
- Compare to paper's described sample:
- Total N, exposed N, event counts
- Key variable distributions
- Missing data patterns
- Apply inclusion/exclusion criteria as stated in Methods; document each step's effect on N (a sketch of this cascade follows this phase)
- If variables are missing or differently named: document the gap; flag as a known discrepancy
- Save audit summary to quality_reports/[paper_name]_data_audit.md
Output:
quality_reports/[paper_name]_data_audit.md
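A minimal sketch of the exclusion cascade, assuming a pandas workflow; the column names and criteria are placeholders for whatever the paper's Methods section specifies:

```python
import pandas as pd

# Apply each inclusion/exclusion step in sequence and log the resulting N,
# so the audit file can show exactly where our sample diverges from the
# paper's flowchart. Criteria and column names are hypothetical.
df = pd.read_csv("data/ukb_extract.csv")
steps = [
    ("age 40-69 at recruitment", lambda d: d[d["age"].between(40, 69)]),
    ("no prevalent disease", lambda d: d[d["prevalent_case"] == 0]),
    ("complete exposure data", lambda d: d.dropna(subset=["exposure"])),
]
print(f"baseline extract: N = {len(df):,}")
for label, rule in steps:
    df = rule(df)
    print(f"after '{label}': N = {len(df):,}")  # compare to the paper's N at each step
```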
Phase 3: Code Analysis
Goal: Map the paper's methods to our dataset before writing a single line of code.
- Read original Stata/R code (if provided in replication package)
- Map each variable name in original code → corresponding variable in our dataset (an example map follows this phase)
- Identify methodological steps: sample construction, covariate coding, model fitting, SE clustering
- Flag any steps where original code differs from Methods text (use the paper, not the code, as ground truth)
- Document the mapping in quality_reports/[paper_name]_variable_map.md
Output:
quality_reports/[paper_name]_variable_map.md
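The map itself can be as simple as a dictionary (or a table in the markdown file). A sketch with invented entries:

```python
# Original Stata variable names on the left; our dataset's columns and a
# status note on the right. All entries are hypothetical examples.
variable_map = {
    "sbp_mean":  {"ours": "systolic_bp_avg", "status": "exact match"},
    "smok_stat": {"ours": "smoking_status",  "status": "recoded: 3 -> 4 levels"},
    "townsend":  {"ours": None,              "status": "MISSING -- known discrepancy"},
}
```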
Phase 4: Translation
Goal: Produce clean, reproducible Python and R scripts that implement the paper's analysis.
Rules:
- Line-by-line translation first; no improvements during replication
- Follow python-code-conventions.md and r-code-conventions.md exactly
- Set seed: random.seed(YYYYMMDD) + numpy.random.seed(YYYYMMDD) (Python); set.seed(YYYYMMDD) (R)
- Use pathlib.Path for all Python paths; here::here() for all R paths
- Comment every non-obvious Stata→Python or Stata→R translation decision (one example follows this list)
- Refer to the translation pitfall tables in replication-protocol.md
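For instance, if the paper's main model is a Cox regression, a translation-decision comment might look like the sketch below. This assumes lifelines and placeholder column names; it illustrates the commenting rule, not any actual paper's model:

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_parquet("analytic_sample.parquet")  # placeholder input

# Stata original: stcox exposure age sex, vce(cluster center)
# Translation decisions:
#   - stcox prints hazard ratios; lifelines' `coef` is the log-HR and
#     `exp(coef)` is the HR, so paper values are compared against exp(coef).
#   - vce(cluster center) maps to cluster_col="center", which makes lifelines
#     use cluster-robust (sandwich) standard errors.
cph = CoxPHFitter()
cph.fit(
    df,
    duration_col="follow_up_years",
    event_col="event",
    formula="exposure + age + sex",
    cluster_col="center",
)
print(cph.summary[["coef", "exp(coef)", "se(coef)"]])
```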
Python script:
replications/[paper_name]/python/replicate.py
Structure:
```python
# Replication: [Paper Author (Year)]
# Date: YYYY-MM-DD
# Original: Stata / R
# Python version: X.Y.Z
# Key packages: pandas X.X, statsmodels X.X, lifelines X.X

from pathlib import Path
import random

import numpy as np
import pandas as pd
# ... other imports

random.seed(YYYYMMDD)
np.random.seed(YYYYMMDD)

DATA_DIR = Path(__file__).parents[3] / "data"
RESULTS_DIR = Path(__file__).parent / "results"
RESULTS_DIR.mkdir(exist_ok=True)

# --- 1. Load Data ---
# --- 2. Sample Construction ---
# --- 3. Model Fitting ---
# --- 4. Save Results ---
```
R script:
replications/[paper_name]/R/replicate.R
Structure:
```r
# Replication: [Paper Author (Year)]
# Date: YYYY-MM-DD
# Original: Stata / Python
# R version: X.Y.Z
# Key packages: survival X.X, fixest X.X

library(here)
library(tidyverse)
library(survival)
# ... other packages

set.seed(YYYYMMDD)

data_dir <- here("data")
results_dir <- here("replications", "[paper_name]", "R", "results")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# --- 1. Load Data ---
# --- 2. Sample Construction ---
# --- 3. Model Fitting ---
# --- 4. Save Results ---
```
Outputs:
- replications/[paper_name]/python/replicate.py
- replications/[paper_name]/R/replicate.R
- replications/[paper_name]/python/results/ (parquet/pkl files)
- replications/[paper_name]/R/results/ (rds files)
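One way the "4. Save Results" step can make Phase 5 mechanical is to persist estimates as a tidy frame keyed by target id. A sketch with placeholder values:

```python
from pathlib import Path

import pandas as pd

# Persist results keyed by the same target ids recorded in Phase 1, so the
# validation step can join and diff them directly. Values are placeholders.
RESULTS_DIR = Path("results")
RESULTS_DIR.mkdir(exist_ok=True)
results = pd.DataFrame({
    "target_id": ["T2_M1_HR_exposure"],
    "estimate": [1.44],
    "se": [0.09],
    "n": [182431],
})
results.to_parquet(RESULTS_DIR / "estimates.parquet", index=False)
```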
Phase 5: Validation
Goal: Run both scripts and compare results to gold standard targets.
- Execute Python script: python replications/[paper_name]/python/replicate.py
- Execute R script: Rscript replications/[paper_name]/R/replicate.R
- Load results; compare to targets using tolerance thresholds from replication-protocol.md (a sketch of this check follows this phase):
  - Integers: exact
  - Point estimates: ±0.01
  - SEs: ±0.05
  - P-values: same significance bracket
  - Percentages: ±0.1pp
- For each mismatch: investigate root cause before proceeding
- Save replications/[paper_name]/validation_report.md
Output:
replications/[paper_name]/validation_report.md
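A minimal sketch of the tolerance check, assuming targets and results are dicts with `type` and `value` fields and conventional significance cutpoints (0.05, 0.01, 0.001); the authoritative thresholds and brackets are whatever replication-protocol.md specifies:

```python
# Count how many conventional significance cutpoints a p-value falls below:
# 0 (n.s.), 1 (<0.05), 2 (<0.01), 3 (<0.001).
def sig_bracket(p: float) -> int:
    return sum(p < cut for cut in (0.05, 0.01, 0.001))

def matches(target: dict, ours: dict) -> bool:
    """Apply the tolerance rule for this target's type."""
    kind, want, got = target["type"], target["value"], ours["value"]
    if kind == "integer":
        return got == want                      # exact
    if kind == "estimate":
        return abs(got - want) <= 0.01          # ±0.01
    if kind == "se":
        return abs(got - want) <= 0.05          # ±0.05
    if kind == "p_value":
        return sig_bracket(got) == sig_bracket(want)
    if kind == "percentage":
        return abs(got - want) <= 0.1           # ±0.1 percentage points
    raise ValueError(f"unknown target type: {kind}")
```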
Phase 6: Report
Goal: Produce a polished, self-contained replication report.
Report structure:
```markdown
# Replication Report: [Paper Author (Year)]

**Date:** [YYYY-MM-DD]
**Replicator:** Claude (domain-reviewer verified)

## Paper Summary
[1 paragraph: research question, population, exposure, outcome, key finding]

## Methods Summary
[Bullet list: sample, exclusions, covariates, model, SEs, software]

## Data
[Bullet list: our dataset, N after exclusions, any discrepancies vs. paper sample]

## Results Comparison
| Target | Table/Fig | Paper Value | Our Value (Python) | Our Value (R) | Diff | Status |
|--------|-----------|-------------|--------------------|---------------|------|--------|

## Discrepancies
[Each discrepancy: what, investigated how, resolved or not]

## Corrective Steps Taken
[Any adjustments made during validation and why]

## Verdict
**[REPLICATED / PARTIAL / FAILED]**
- Targets matched: N / Total
- Remaining discrepancies: [list or "none"]

## Reproducibility
- Python: X.Y.Z | pandas X.X | statsmodels X.X | lifelines X.X
- R: X.Y.Z | survival X.X | fixest X.X
- Data: [filename, UKB application ID if applicable]
- Seed: YYYYMMDD
```
Save to:
reports/[paper_name]_replication_report.md
After saving: run domain-reviewer agent on the report.
Quality Gate
After Phase 6, score the output. Minimum 80/100 to commit.
Auto-commit if score >= 80:
```bash
git add replications/[paper_name]/ reports/[paper_name]_replication_report.md quality_reports/[paper_name]_*.md
git commit -m "Replicate [Paper Author (Year)] -- [VERDICT]: N/Total targets matched"
```
Failure Modes & Recovery
| Failure | Recovery |
|---|---|
| Script syntax error | Fix before proceeding |
| N mismatch > 5% | Stop, audit inclusion/exclusion criteria |
| All point estimates off by same factor | Check unit conversion (HR vs. log-HR, OR vs. log-OR); see the diagnostic sketch below |
| SEs systematically too large | Check clustering level |
| Cannot install package | Document, note in report, use closest alternative |
| Data variable missing | Document gap; attempt proxy; flag as ASSUMED in report |
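For the "off by the same factor" row, a quick diagnostic (with placeholder numbers) is to test whether the ratio of our values to the paper's is constant, and whether exponentiating our values reproduces the paper's, which would indicate log-HRs reported against printed HRs:

```python
import numpy as np

paper = np.array([1.42, 0.88, 1.15])     # HRs as printed (placeholders)
ours = np.array([0.351, -0.128, 0.140])  # suspiciously small -> maybe log scale

ratios = ours / paper
print("ratio spread:", np.std(ratios))   # tight spread => constant-factor/unit issue
print("exp(ours):", np.exp(ours))        # ~[1.42, 0.88, 1.15] => we reported log-HRs
```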