Awesome-Agent-Skills-for-Empirical-Research review-r

R code review for the sewage project. Checks script structure, reproducibility, function design, figure quality, and professional polish against project conventions (here::here, arrow/parquet, fixest, modelsummary, native pipe). This skill should be used when asked to "review the code", "check my script", or "code review".

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/41-sticerd-eee-sewage-econometrics-check/skills/review-r" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-review-r-fe5743 && rm -rf "$T"
manifest: skills/41-sticerd-eee-sewage-econometrics-check/skills/review-r/SKILL.md
source content

Review R Code

Run a code quality review on R scripts in the sewage project. Produces a report — does NOT edit source files.

Input:

$ARGUMENTS
— a
.R
filename, directory path, or
all
.


Project-Specific Conventions

Script Structure

Pipeline scripts (layers 01-06):

Header block (########) → roxygen description → initialise_environment() → setup_logging() → CONFIG list → functions → main() → conditional execution (sys.nframe() == 0)

Analysis scripts (09_analysis):

Numbered sections with # === separators, inline package loading, direct execution (no main() wrapper)

Required Conventions

  • Paths:
    here::here()
    — never relative paths or
    setwd()
  • Data formats:
    arrow
    (parquet) for intermediate/final; DuckDB for large joins
  • Pipe: Native R pipe
    |>
    (not
    %>%
    )
  • Naming:
    snake_case
    for functions/variables,
    UPPER_SNAKE_CASE
    for constants
  • Econometrics:
    fixest::feols()
    for regressions
  • Tables:
    modelsummary
    → LaTeX with
    tabularray
    format
  • SE:
    vcov = "hetero"
    for heteroskedasticity-robust
  • Factors:
    forcats::as_factor()
    /
    forcats::fct_drop()

Key Files

  • Scripts:
    scripts/R/01_data_ingestion/
    through
    scripts/R/09_analysis/
  • Utilities:
    scripts/R/utils/
  • Output:
    output/{figures,tables,html_plots,regs,log}/

Workflow

Step 1: Identify Scripts

  • If
    $ARGUMENTS
    is a specific
    .R
    file: review that file
  • If
    $ARGUMENTS
    is a directory: review all
    .R
    files in that directory
  • If
    $ARGUMENTS
    is
    all
    : review all scripts in
    scripts/R/
  • If no argument: ask which scripts to review

Step 2: Review Each Script (9 Categories)

Category 1: Script Structure

  • Pipeline scripts: header block, roxygen,
    initialise_environment()
    ,
    CONFIG
    list,
    main()
    , conditional execution
  • Analysis scripts: numbered sections with
    # ===
    separators
  • Clear separation of concerns

Category 2: Console Hygiene

  • No unnecessary
    print()
    /
    cat()
    pollution
  • Logging via
    setup_logging()
    in pipeline scripts
  • Clean output — only meaningful messages

Category 3: Reproducibility

  • set.seed()
    where randomness is involved
  • All paths via
    here::here()
    — no
    setwd()
    , no relative paths, no hardcoded absolute paths
  • No hardcoded values that should be in CONFIG

Category 4: Function Design

  • DRY — no duplicated code blocks
  • Functions at appropriate abstraction level
  • Shared utilities in
    scripts/R/utils/
    where reuse is needed

Category 5: Figure Quality

  • Axis labels present and readable
  • Appropriate dimensions (not too wide/narrow)
  • Consistent theme across plots
  • Alpha transparency for overlapping points
  • Saved to
    output/figures/
    via
    here::here()

Category 6: Data Persistence

  • Intermediate results saved as parquet (
    arrow::write_parquet()
    )
  • saveRDS()
    for R-specific objects
  • No orphaned temporary files

Category 7: Comments

  • Explain why, not what
  • Section headers for navigation
  • No commented-out dead code

Category 8: Error Handling

  • Graceful failures with informative messages in pipeline scripts
  • File existence checks before reading
  • Data validation where appropriate

Category 9: Polish

  • Consistent style (indentation, spacing)
  • No dead code or unused imports
  • Clean namespace —
    library()
    calls at top
  • Native pipe
    |>
    (not magrittr
    %>%
    )

Step 3: Present Summary

## Code Review: [filename/directory]
**Date:** YYYY-MM-DD
**Scripts reviewed:** N

### Issues by Severity
| Script | Critical | Major | Minor |
|--------|----------|-------|-------|
| ... | ... | ... | ... |

### Top 3 Critical Issues
1. ...
2. ...
3. ...

### Conventions Compliance
- [ ] here::here() for all paths
- [ ] Native pipe |>
- [ ] arrow/parquet for data
- [ ] fixest for regressions
- [ ] modelsummary for tables

### Score: XX / 100

Save report to

output/log/code_review_[target].md
.

Step 4: IMPORTANT

Do NOT edit any source files. Only produce reports. Fixes are applied after user review.


Principles

  • Report only, never edit. The reviewer is a critic, not a creator.
  • Project conventions matter. Flag deviations from the conventions above, not personal preferences.
  • Proportional severity. A missing
    set.seed()
    is Major. A missing comment is Minor. Using
    setwd()
    is Critical.
  • Language-specific. Check R idioms — vectorised operations over loops, tidyverse patterns, proper use of factors.
  • Pipeline vs analysis distinction. Different structure expectations for pipeline scripts vs analysis scripts.