Awesome-Agent-Skills-for-Empirical-Research data-analysis
End-to-end R data analysis for the sewage project. Writes analysis scripts following project conventions (here::here, arrow/parquet, fixest, modelsummary, native pipe), runs code review, and produces publication-ready tables and figures. This skill should be used when asked to "run an analysis", "estimate the model", "add a specification", or "write an R script".
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/41-sticerd-eee-sewage-econometrics-check/skills/data-analysis" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-data-analysis-d2a28c && rm -rf "$T"
manifest: skills/41-sticerd-eee-sewage-econometrics-check/skills/data-analysis/SKILL.md
source content
Data Analysis
Run an end-to-end data analysis following sewage project conventions.
Input:
$ARGUMENTS — a dataset path, analysis goal description, or specification to estimate.
Project-Specific Context
Analysis Organisation
Scripts in scripts/R/09_analysis/ by approach:
- 01_descriptive/: Maps, scatter plots, Google Trends
- 02_hedonic/: Cross-sectional hedonic regressions
- 03_repeat_sales/: Repeat-transaction regressions
- 04_long_difference/: 250m grid-level long differences
- 05_news/: DiD and event studies with media coverage
- 06_upstream_downstream/: Directional spillover
- 07_dry_spills/: Dry spill analysis
Datasets
- data/final/: Analysis-ready datasets
- data/processed/: Intermediate pipeline outputs (parquet)
- All data loaded via arrow::read_parquet() or arrow::open_dataset()
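The difference between the two loaders can be sketched as follows. This is a minimal illustration, not project code: the toy data frame, its column names, and the temporary file are assumptions made so the example runs anywhere. In the project the path would come from here::here("data", "final", ...).

```r
library(arrow)
library(dplyr)

# Hypothetical toy data written to a temporary parquet file (assumption:
# the real datasets live under data/final/ and have different columns).
toy <- data.frame(
  lsoa = c("a", "a", "b", "b"),
  year = c(2020L, 2021L, 2020L, 2021L),
  spill_count = c(3, 5, 0, 2)
)
path <- tempfile(fileext = ".parquet")
write_parquet(toy, path)

# Eager load: the whole file is read into memory as a data frame.
df_full <- read_parquet(path)

# Lazy load: open_dataset() builds a query; rows are only read into memory
# when collect() is called, after the filter and column selection.
df_2021 <- open_dataset(path) |>
  filter(year == 2021L) |>
  select(lsoa, spill_count) |>
  collect()
```

read_parquet() is usually enough for analysis-ready files that fit in memory. open_dataset() tends to pay off when only a subset of rows or columns from a large intermediate file is needed.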
Output Destinations
- Tables: output/tables/*.tex (modelsummary → LaTeX with tabularray)
- Figures: output/figures/*.pdf or *.png
- Regression objects: output/regs/*.rds
- HTML interactive: output/html_plots/
Required R Conventions
- here::here() for all paths
- Native pipe |>
- fixest::feols() for regressions with vcov = "hetero"
- modelsummary for table output (tabularray format, [H] placement)
- arrow for parquet I/O
- snake_case naming
- forcats::as_factor() for factors
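As an illustration of the naming, pipe, and factor conventions together, here is a small data-preparation sketch. The raw column names and the to_snake_case() helper are hypothetical and exist only for this example. The native pipe requires R 4.1 or later.

```r
library(dplyr)
library(forcats)

# Hypothetical raw input with non-conforming column names.
raw_sales <- data.frame(
  "Log Price" = c(12.1, 11.9, 12.3),
  "Spill Count" = c(2, 5, 0),
  "LSOA" = c("E01", "E02", "E01"),
  check.names = FALSE
)

# Hypothetical helper: lower-case and replace runs of other characters with "_".
to_snake_case <- function(x) {
  gsub("[^a-z0-9]+", "_", tolower(x))
}

clean_sales <- raw_sales |>
  rename_with(to_snake_case) |>
  # as_factor() keeps levels in order of first appearance, unlike
  # base::factor(), which sorts them.
  mutate(lsoa = as_factor(lsoa))
```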
Workflow
Step 1: Context Gathering
- Understand the analysis goal from $ARGUMENTS
- Read existing analysis scripts in the relevant subdirectory for patterns
- Read scripts/R/utils/spill_aggregation_utils.R if spill metrics are involved
- Check data/final/ for available datasets
- Read the relevant manuscript section in docs/overleaf/ if the analysis feeds into the paper
Step 2: Write Analysis Script
Follow the analysis script structure:
```r
# ================================================================
# [Descriptive Title]
# Purpose: [What this script does]
# Inputs: [Data files]
# Outputs: [Figures, tables, RDS files]
# ================================================================

# === 1. Setup ============================================
library(tidyverse)
library(fixest)
library(modelsummary)
library(arrow)
library(here)

# === 2. Data Loading =====================================
df <- read_parquet(here("data", "final", "dataset.parquet"))

# === 3. Main Analysis ====================================
model <- feols(
  log_price ~ spill_count | lsoa + year_quarter,
  data = df,
  vcov = "hetero"
)

# === 4. Tables and Figures ================================
modelsummary(
  list("Main" = model),
  output = here("output", "tables", "table_name.tex"),
  fmt = 3
)

# === 5. Export ============================================
saveRDS(model, here("output", "regs", "model_name.rds"))
```
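When the request is to "add a specification", one possible pattern is to keep the models in a named list and pass the list to modelsummary. The sketch below runs on simulated data. The floor_area control, the simulated coefficients, and the model labels are assumptions for illustration and do not come from the project. It returns a data frame instead of writing a .tex file so the result can be inspected directly.

```r
library(fixest)
library(modelsummary)

# Simulated data (assumption: real data would come from data/final/).
set.seed(7)
n <- 200
toy <- data.frame(
  lsoa = rep(paste0("lsoa_", 1:20), each = 10),
  year_quarter = rep(paste0("2021Q", 1:4), length.out = n),
  spill_count = rpois(n, 3),
  floor_area = rnorm(n, mean = 90, sd = 15) # hypothetical control
)
toy$log_price <- 12 - 0.02 * toy$spill_count +
  0.005 * toy$floor_area + rnorm(n, sd = 0.1)

# Each list element is one column of the final table.
models <- list(
  "Main" = feols(
    log_price ~ spill_count | lsoa + year_quarter,
    data = toy, vcov = "hetero"
  ),
  "With control" = feols(
    log_price ~ spill_count + floor_area | lsoa + year_quarter,
    data = toy, vcov = "hetero"
  )
)

# output = "data.frame" returns the table as a data frame for inspection;
# a file path ending in .tex would write it to disk instead.
comparison <- modelsummary(models, output = "data.frame", fmt = 3)
```

Keeping every specification in one list means the same object can feed both the table and a loop that saves each model as RDS.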
Step 3: Code Review
After writing the script, review it against the 9 categories from /review-r:
- Script structure, console hygiene, reproducibility
- Function design, figure quality, data persistence
- Comments, error handling, polish
Fix any Critical or Major issues before presenting.
Step 4: Run the Script
If the user wants execution:
```shell
cd /Users/jacopoolivieri/Library/CloudStorage/Dropbox/01_projects/sewage
Rscript scripts/R/09_analysis/[subdir]/[script_name].R
```
Step 5: Present Results
- Results summary — Key estimates with SEs and economic interpretation
- Script created — Path and description
- Output files — Tables and figures generated
- Code review notes — Any conventions to flag
- TODO items — Missing data, additional specifications needed
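A results summary of this kind could be assembled from a saved regression object roughly as below. This is a sketch on simulated data: the variable names, the temporary RDS path, and the percent-effect column are illustrative assumptions. The percent conversion applies only because the outcome in this toy model is in logs.

```r
library(fixest)

# Simulated stand-in for a project regression (assumption).
set.seed(42)
toy <- data.frame(
  lsoa = rep(paste0("l", 1:20), each = 10),
  spill_count = rpois(200, 3)
)
toy$log_price <- 12 - 0.02 * toy$spill_count + rnorm(200, sd = 0.1)
model <- feols(log_price ~ spill_count | lsoa, data = toy, vcov = "hetero")

# Round-trip through RDS, as the workflow saves models to output/regs/.
rds_path <- tempfile(fileext = ".rds")
saveRDS(model, rds_path)
saved_model <- readRDS(rds_path)

# Point estimates and heteroskedasticity-robust standard errors.
estimate <- coef(saved_model)
std_error <- se(saved_model)

results <- data.frame(
  term = names(estimate),
  estimate = unname(estimate),
  std_error = unname(std_error),
  # With a log outcome, 100 * (exp(b) - 1) is the percent change in price
  # associated with one additional spill.
  pct_effect = 100 * (exp(unname(estimate)) - 1)
)
```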
Principles
- Reproduce, don't guess. If a specific regression is requested, implement exactly that.
- Strategy alignment. If an analysis feeds into a manuscript section, the code must implement what the paper claims.
- Publication-ready output. Tables and figures should be directly includable in the paper.
- Follow existing patterns. Read neighbouring scripts in the same subdirectory for style consistency.
- Save everything. Every regression object saved as RDS, every table as LaTeX, every figure as PDF.
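The "save everything" principle might look like this in practice. The sketch writes to a temporary directory so it runs anywhere. In the project the root would be here::here("output"). The file names and toy data are assumptions.

```r
library(fixest)
library(ggplot2)

# Hypothetical output root (assumption: the project uses here::here("output")).
output_root <- file.path(tempdir(), "output")
for (subdir in c("regs", "figures")) {
  dir.create(file.path(output_root, subdir),
             recursive = TRUE, showWarnings = FALSE)
}

toy <- data.frame(
  spill_count = c(0, 1, 2, 3, 4, 5),
  log_price = c(12.10, 12.02, 12.01, 11.93, 11.85, 11.84)
)
model <- feols(log_price ~ spill_count, data = toy, vcov = "hetero")

# Regression object as RDS.
rds_path <- file.path(output_root, "regs", "toy_model.rds")
saveRDS(model, rds_path)

# Figure as PDF.
fig <- ggplot(toy, aes(x = spill_count, y = log_price)) +
  geom_point()
pdf_path <- file.path(output_root, "figures", "toy_scatter.pdf")
ggsave(pdf_path, plot = fig, width = 6, height = 4)
```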