Awesome-Agent-Skills-for-Empirical-Research r-python-translation

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/r-python-translation" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-r-python-translat && rm -rf "$T"
manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/r-python-translation/SKILL.md
source content

R-to-Python Translation Skill

R-to-Python translation reference for quantitative social science data analysis. Maps R ecosystem packages (tidyverse/dplyr, ggplot2, fixest, survey, sf, plm, lme4, marginaleffects, rdrobust) to DAAF Python equivalents (polars, plotnine, pyfixest, statsmodels, linearmodels, svy, geopandas). Use when user mentions R/RStudio background, requests R-equivalent code comments, needs to understand Python analysis code from an R perspective, or wants to translate R data analysis concepts to Python. Covers paradigm differences, verb-by-verb operation translations, regression modeling, causal inference, visualization, and workflow adaptation.

Cross-language translation reference for researchers moving between the R and Python data analysis ecosystems. This skill maps R packages, idioms, and workflows to their DAAF Python equivalents so that R-background users can audit, understand, and learn from DAAF-produced code, and so that code-producing agents can annotate their output with R equivalents when directed.

This skill is a routing hub — it provides overview tables, decision trees, and directs readers to the detailed reference files listed below. The reference files contain the exhaustive verb-by-verb mappings, code examples, and edge-case documentation.

What This Skill Does

  • Maps the R data analysis ecosystem to DAAF's Python stack across data wrangling, modeling, visualization, causal inference, surveys, spatial analysis, and workflow tooling
  • Provides a structured annotation protocol for agents to add inline R-equivalent comments to Python code
  • Identifies paradigm gaps where R and Python diverge fundamentally, so users know where to expect friction

Use cases:

  1. R user auditing DAAF Python code and needing to understand what operations are being performed
  2. Agent annotating code with R-equivalent comments for an R-background researcher
  3. R user learning Python for data analysis and needing a conceptual bridge
  4. Translating a specific R operation or idiom to its Python equivalent
  5. Understanding where R tools have no direct Python equivalent (and what the workaround is)

How to Use This Skill

Reference File Structure

Each topic in ./references/ contains focused documentation:

| File | Purpose | When to Read |
| --- | --- | --- |
| paradigm-differences.md | Core language and paradigm differences | Encountering fundamental R-vs-Python confusion |
| polars-dplyr.md | Core dplyr/tidyr to polars verb mapping (select, filter, mutate, joins, reshaping, window functions, lazy eval) | Reading or writing data manipulation code |
| polars-strings-dates-factors.md | String, date/time, and factor operations (stringr, lubridate, forcats to polars) | Working with string/date/categorical columns |
| regression-modeling.md | fixest/stats/plm to pyfixest/statsmodels/linearmodels | Reading or writing regression code |
| visualization.md | ggplot2/plotly R to plotnine/plotly Python | Reading or writing visualization code |
| causal-inference.md | R causal inference ecosystem to Python equivalents | Working with DiD, RDD, IV, event studies |
| survey-spatial-ml.md | survey/sf/tidymodels to svy/geopandas/scikit-learn | Working with surveys, spatial data, or ML |
| workflow-environment.md | RStudio/Quarto workflow to DAAF/marimo workflow | Adapting to DAAF's execution model |
| external-resources.md | Curated guides and tutorials with provenance | Seeking additional learning materials |
| gotchas.md | Common R-user mistakes in Python | Debugging or reviewing code from R perspective |

Reading Order

  1. R user auditing DAAF code: paradigm-differences.md, then the relevant domain file (e.g., polars-dplyr.md for data wrangling, regression-modeling.md for models), then gotchas.md
  2. Agent annotating code with R equivalents: Agent Code Annotation Protocol section below, then the relevant domain file for the code being annotated
  3. Learning Python from R background: paradigm-differences.md, then polars-dplyr.md, then workflow-environment.md, then external-resources.md
  4. Looking up a specific translation: Quick Decision Trees below, then the relevant reference file

Quick Decision Trees

"How do I do X from R in Python?"

What kind of R operation?
├─ Data wrangling (filter, mutate, join, pivot, summarise)
│   └─ ./references/polars-dplyr.md
├─ Regression / statistical modeling
│   └─ ./references/regression-modeling.md
├─ Plotting / visualization
│   └─ ./references/visualization.md
├─ Causal inference (DiD, RDD, IV, event studies)
│   └─ ./references/causal-inference.md
├─ Surveys / spatial / machine learning
│   └─ ./references/survey-spatial-ml.md
└─ Fundamental language differences (types, syntax, environment)
    └─ ./references/paradigm-differences.md

"Why does this Python code look different from R?"

What looks unfamiliar?
├─ Expression syntax (pl.col().method().alias())
│   └─ ./references/paradigm-differences.md
├─ Missing values (None vs NaN vs null vs NA)
│   └─ ./references/paradigm-differences.md
├─ Formula interface (~) behaves differently
│   └─ ./references/regression-modeling.md
├─ Import patterns and namespacing
│   └─ ./references/gotchas.md
└─ No interactive REPL / console workflow
    └─ ./references/workflow-environment.md

"I want to translate an R script to Python"

What does the R script do?
├─ Loads and wrangles data (read_csv, dplyr verbs)
│   └─ ./references/polars-dplyr.md
├─ Runs regressions (lm, feols, plm)
│   └─ ./references/regression-modeling.md
├─ Creates plots (ggplot, plotly)
│   └─ ./references/visualization.md
├─ Uses survey weights (svydesign, svymean)
│   └─ ./references/survey-spatial-ml.md
├─ Spatial operations (sf, st_join)
│   └─ ./references/survey-spatial-ml.md
├─ Multiple of the above
│   └─ Start with ./references/paradigm-differences.md, then each relevant file
└─ Uses a package not listed above
    └─ ./references/external-resources.md for broader ecosystem guidance

"Something isn't working and I think it's an R habit"

What went wrong?
├─ 1-indexed access gave wrong element
│   └─ ./references/gotchas.md
├─ Factor/categorical behaves differently
│   └─ ./references/gotchas.md
├─ NA handling surprised me
│   └─ ./references/paradigm-differences.md
├─ Pipe operator (|> or %>%) not available
│   └─ ./references/paradigm-differences.md
├─ library() vs import confusion
│   └─ ./references/gotchas.md
└─ Model output structure is different
    └─ ./references/regression-modeling.md

"Which Python package replaces my R package?"

Which R package?
├─ dplyr / tidyr / readr / tibble → polars
│   └─ ./references/polars-dplyr.md
├─ ggplot2 → plotnine
│   └─ ./references/visualization.md
├─ plotly (R) → plotly (Python)
│   └─ ./references/visualization.md
├─ fixest → pyfixest
│   └─ ./references/regression-modeling.md
├─ stats (lm, glm) → statsmodels
│   └─ ./references/regression-modeling.md
├─ plm / lme4 / estimatr → linearmodels
│   └─ ./references/regression-modeling.md
├─ survey → svy
│   └─ ./references/survey-spatial-ml.md
├─ sf / terra → geopandas
│   └─ ./references/survey-spatial-ml.md
├─ tidymodels / caret → scikit-learn
│   └─ ./references/survey-spatial-ml.md
├─ marginaleffects → marginaleffects (Python)
│   └─ ./references/regression-modeling.md
├─ rdrobust / did / synthdid → rdrobust / pyfixest DiD
│   └─ ./references/causal-inference.md
└─ Quarto / RMarkdown → marimo
    └─ ./references/workflow-environment.md
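
For agents that prefer a programmatic lookup, the package tree above can be sketched as a small dictionary. This is only an illustration: the names `R_TO_PYTHON` and `lookup` are hypothetical and not part of the skill, and the fallback to external-resources.md follows the "package not listed above" branch of the script-translation tree.

```python
from typing import Optional

REFERENCE_DIR = "./references/"

# (R packages, Python target, reference file), transcribed from the tree above.
_GROUPS = [
    (("dplyr", "tidyr", "readr", "tibble"), "polars", "polars-dplyr.md"),
    (("ggplot2",), "plotnine", "visualization.md"),
    (("plotly",), "plotly", "visualization.md"),
    (("fixest",), "pyfixest", "regression-modeling.md"),
    (("stats",), "statsmodels", "regression-modeling.md"),
    (("plm", "lme4", "estimatr"), "linearmodels", "regression-modeling.md"),
    (("survey",), "svy", "survey-spatial-ml.md"),
    (("sf", "terra"), "geopandas", "survey-spatial-ml.md"),
    (("tidymodels", "caret"), "scikit-learn", "survey-spatial-ml.md"),
    (("marginaleffects",), "marginaleffects", "regression-modeling.md"),
    (("rdrobust", "did", "synthdid"), "rdrobust / pyfixest DiD", "causal-inference.md"),
    (("quarto", "rmarkdown"), "marimo", "workflow-environment.md"),
]

# Flatten the groups into one R-package -> (target, file) mapping.
R_TO_PYTHON = {
    pkg: (target, ref) for pkgs, target, ref in _GROUPS for pkg in pkgs
}


def lookup(r_package: str) -> tuple[Optional[str], str]:
    """Return (Python target or None, reference file path) for an R package."""
    key = r_package.strip().lower()
    # Unlisted packages are routed to the broader ecosystem guidance.
    target, ref = R_TO_PYTHON.get(key, (None, "external-resources.md"))
    return target, REFERENCE_DIR + ref


print(lookup("dplyr"))
print(lookup("RMarkdown"))
```

A target of `None` signals that the skill names no direct replacement, so the caller should read the returned reference file rather than assume a package exists.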

Package Mapping Overview

| Python Package | R Equivalent | Fidelity | Key Difference |
| --- | --- | --- | --- |
| polars | dplyr + tidyr + data.table | Low | Expression system vs verb grammar; method chaining vs pipe |
| pyfixest | fixest | High | Near-identical formula syntax; minor SE default differences |
| plotnine | ggplot2 | High | Same grammar of graphics; Python string quoting for aes |
| plotly | plotly (R) | High | px.scatter() vs plot_ly(); similar output |
| statsmodels | base R stats + lmtest + sandwich | Medium | Three formula dialects; manual vcov specification |
| linearmodels | plm + lme4 + estimatr | Medium | Requires pandas MultiIndex for panel structure |
| scikit-learn | tidymodels / caret | Medium | Imperative fit/predict vs declarative recipe pipeline |
| geopandas | sf + terra | Medium | shapely geometries vs sfc; different CRS handling |
| svy | survey (Lumley) | Medium | Limited GLM family coverage (gaussian/binomial/Poisson only) |
| marimo | Quarto / RMarkdown | Medium | Reactive cells vs knit-based linear execution |

Fidelity key: High = near-direct translation, same mental model. Medium = same capability, different API patterns. Low = fundamentally different paradigm requiring conceptual remapping.

Library Versions

Translations in this skill reference specific library versions. Python versions are pinned in DAAF's Docker environment (Python 3.12). R versions reference CRAN releases as of March 2026. When syntax or behavior has changed between versions, the reference files note the change.

| Python Package | DAAF Version | R Equivalent | R Version (CRAN) |
| --- | --- | --- | --- |
| polars | 1.38.1 | dplyr + tidyr + data.table | dplyr 1.2.0, tidyr 1.3.2, data.table 1.18.2 |
| pyfixest | 0.40.0 | fixest | 0.14.0 |
| plotnine | 0.15.3 | ggplot2 | 4.0.2 |
| plotly | 6.5.2 | plotly (R) | 4.12.0 |
| statsmodels | 0.14.6 | base R stats + lmtest + sandwich | lmtest 0.9-40, sandwich 3.1-1 |
| linearmodels | unpinned | plm + lme4 + estimatr | plm 2.6-7, lme4 2.0-1 |
| scikit-learn | 1.8.0 | tidymodels / caret | tidymodels 1.4.1, caret 7.0-1 |
| geopandas | 1.1.3 | sf + terra | sf 1.1-0, terra 1.9-11 |
| svy | 0.13.0 | survey | survey 4.5 |
| marginaleffects | unpinned | marginaleffects (R) | 0.32.0 |
| rdrobust | unpinned | rdrobust (R) | 3.0.0 |
| marimo | 0.19.11 | Quarto / RMarkdown | Quarto 1.6.x |

Unpinned packages: linearmodels, marginaleffects, and rdrobust install the latest version at Docker build time. Translations for these packages reference their documented API as of March 2026.

R version note: R package versions are from CRAN as of March 2026 (R 4.5.3). Run packageVersion("pkg") in your R installation to verify that your local version matches.
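
On the Python side, a comparable check can be sketched with the standard library's `importlib.metadata`. The names `DAAF_PINS` and `check_pins` are hypothetical, the pins are copied from the table above, and the sketch assumes each package's distribution name matches the name shown in the table.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Library Versions table (unpinned packages omitted).
DAAF_PINS = {
    "polars": "1.38.1",
    "pyfixest": "0.40.0",
    "plotnine": "0.15.3",
    "plotly": "6.5.2",
    "statsmodels": "0.14.6",
    "scikit-learn": "1.8.0",
    "geopandas": "1.1.3",
    "svy": "0.13.0",
    "marimo": "0.19.11",
}


def check_pins(pins=DAAF_PINS):
    """Return {package: (pinned, installed or None, versions match)}."""
    report = {}
    for name, pinned in pins.items():
        try:
            # Rough Python counterpart of R's packageVersion("pkg")
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (pinned, installed, installed == pinned)
    return report


for name, (pinned, installed, ok) in check_pins().items():
    status = "ok" if ok else "differs"
    print(f"{name}: pinned {pinned}, installed {installed} ({status})")
```

A mismatch does not necessarily mean a translation is wrong; it only flags that syntax or behavior may differ from what the reference files describe.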

Top 10 Paradigm Differences

These are the friction points R users encounter most frequently when reading or writing DAAF Python code. Each is covered in depth in the referenced file.

| # | Friction Point | R Way | Python Way | Reference |
| --- | --- | --- | --- | --- |
| 1 | Expression system | df %>% mutate(x = a + b) | df.with_columns((pl.col("a") + pl.col("b")).alias("x")) | paradigm-differences.md |
| 2 | Formula fragmentation | One universal ~ syntax | Three dialects (pyfixest, statsmodels, linearmodels) | regression-modeling.md |
| 3 | Missing values | Single NA type | None, NaN, and null (context-dependent) | paradigm-differences.md |
| 4 | mutate equivalent | mutate(new = expr) | with_columns(expr.alias("new")) | polars-dplyr.md |
| 5 | No row index | Tibbles have row numbers | Polars has no row index; use with_row_index() | paradigm-differences.md |
| 6 | Polars-to-pandas bridge | Data frames go directly into models | Must call .to_pandas() before statsmodels/pyfixest | paradigm-differences.md |
| 7 | Factor vs Categorical | factor() with ordered levels | pl.Categorical / pd.Categorical (different semantics) | gotchas.md |
| 8 | Package fragmentation | One package per domain (fixest does it all) | Multiple packages per domain (statsmodels + linearmodels + pyfixest) | paradigm-differences.md |
| 9 | 1-indexed vs 0-indexed | x[1] is first element | x[0] is first element | gotchas.md |
| 10 | Namespace model | library() exports all names | import requires explicit namespacing | gotchas.md |

Agent Code Annotation Protocol

This section defines when and how code-producing agents add inline R-equivalent comments to DAAF Python scripts.

When to Annotate

Annotations are added only when the orchestrator explicitly passes an R-background directive to the agent. This is not a default behavior.

Trigger conditions (orchestrator activates this when any apply):

  • User states they have an R / RStudio background
  • User requests R-equivalent comments in code
  • User asks to understand Python code from an R perspective

How the orchestrator passes the directive: The orchestrator adds the following to the agent prompt:

"User has R background. Load r-python-translation skill. Add inline R-equivalent comments for non-trivial data operations."

Comment Format

# R: df %>% filter(year == 2020)
filtered = df.filter(pl.col("year") == 2020)

# R: df %>% mutate(pct = count / sum(count))
result = df.with_columns(
    (pl.col("count") / pl.col("count").sum()).alias("pct")
)

# pdf is the pandas copy of df (pdf = df.to_pandas()); pf is pyfixest (import pyfixest as pf)
# R: feols(y ~ x1 + x2 | state + year, data = df, cluster = ~state)
fit = pf.feols("y ~ x1 + x2 | state + year", data=pdf, vcov={"CRV1": "state"})

What to Annotate

  • Annotate: Data wrangling (polars operations), modeling calls (pyfixest, statsmodels, linearmodels), visualization layer construction (plotnine, plotly), causal inference method calls
  • Do NOT annotate: Import statements, print() / assert validation lines, file I/O boilerplate (pl.read_parquet, df.write_parquet), config sections, section separator comments

Rules

  • One # R: comment per logical operation, placed on the line immediately above the Python code
  • Keep annotations to a single line; abbreviate complex R pipelines if needed
  • R annotations are in addition to standard IAT comments (# INTENT:, # REASONING:, # ASSUMES:), not a replacement
  • Consumer agents: research-executor, code-reviewer, debugger, data-ingest
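
The placement rules above can be checked mechanically. The following is a hedged sketch of such a check, not part of DAAF: the function name `check_r_annotations` and the prefix list are assumptions, and it covers only placement, the single-line rule, and a few of the "do not annotate" cases.

```python
# Line prefixes that the protocol says should not carry an R annotation (partial list).
SKIP_PREFIXES = ("import ", "from ", "print(", "assert ")


def check_r_annotations(source: str) -> list[str]:
    """Return a list of rule violations for '# R:' comments in a script."""
    problems = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if not line.strip().startswith("# R:"):
            continue
        lineno = i + 1
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if not following or following.startswith("#"):
            # Blank line, another comment, or a multi-line annotation follows.
            problems.append(
                f"line {lineno}: '# R:' comment is not immediately above Python code"
            )
        elif following.startswith(SKIP_PREFIXES):
            problems.append(
                f"line {lineno}: annotates a line that should not be annotated"
            )
    return problems


# IAT comments sit above the R annotation, which sits directly above the code.
SAMPLE = '''\
# INTENT: keep only the 2020 rows
# R: df %>% filter(year == 2020)
filtered = df.filter(pl.col("year") == 2020)
'''

print(check_r_annotations(SAMPLE))
```

The check works on text only, so it can run over a script before execution without importing polars or pyfixest.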

Related Skills

| Skill | Relationship |
| --- | --- |
| polars | Python-side data wrangling: detailed API reference for the dplyr/tidyr equivalent |
| pyfixest | Python-side fixed effects regression: detailed API for the fixest equivalent |
| plotnine | Python-side static visualization: detailed API for the ggplot2 equivalent |
| plotly | Python-side interactive visualization: detailed API for the plotly R equivalent |
| statsmodels | Python-side general modeling: covers base R stats, lmtest, sandwich equivalents |
| linearmodels | Python-side panel/IV models: covers plm, lme4, estimatr equivalents |
| scikit-learn | Python-side ML: covers tidymodels/caret equivalents |
| geopandas | Python-side spatial data: covers sf/terra equivalents |
| svy | Python-side survey analysis: covers survey (Lumley) equivalents |
| marimo | Python-side notebooks: covers Quarto/RMarkdown workflow equivalents |
| stata-python-translation | Parallel skill for Stata-background users: shares the same Python target stack |

Note: Individual tool skills contain library-specific usage guidance (syntax, gotchas, performance). This skill provides the R-to-Python conceptual bridge — use both together when an R-background user is working with a specific library.

Topic Index

| Topic | Reference File |
| --- | --- |
| Pipe operator (%>% / \|>) equivalents | ./references/paradigm-differences.md |
| Expression system (pl.col, .alias) | ./references/paradigm-differences.md |
| Missing value semantics (NA vs None/NaN/null) | ./references/paradigm-differences.md |
| Type system differences | ./references/paradigm-differences.md |
| Package/namespace model | ./references/paradigm-differences.md |
| 0-indexing vs 1-indexing | ./references/paradigm-differences.md |
| Polars-to-pandas conversion for modeling | ./references/paradigm-differences.md |
| Row index differences | ./references/paradigm-differences.md |
| dplyr verb mapping (filter, select, mutate, arrange) | ./references/polars-dplyr.md |
| summarise / group_by equivalents | ./references/polars-dplyr.md |
| tidyr verbs (pivot_longer, pivot_wider, separate, unite) | ./references/polars-dplyr.md |
| Join operations (left_join, inner_join, anti_join) | ./references/polars-dplyr.md |
| String operations (stringr vs polars .str) | ./references/polars-strings-dates-factors.md |
| Date operations (lubridate vs polars .dt) | ./references/polars-strings-dates-factors.md |
| across() / where() equivalents | ./references/polars-dplyr.md |
| case_when equivalent | ./references/polars-dplyr.md |
| readr I/O equivalents | ./references/polars-dplyr.md |
| fixest formula syntax in pyfixest | ./references/regression-modeling.md |
| lm() / glm() in statsmodels | ./references/regression-modeling.md |
| Formula interface comparison (three Python dialects) | ./references/regression-modeling.md |
| Standard error specification differences | ./references/regression-modeling.md |
| plm panel models in linearmodels | ./references/regression-modeling.md |
| lme4 mixed effects equivalents | ./references/regression-modeling.md |
| marginaleffects (R to Python) | ./references/regression-modeling.md |
| Model summary / tidy output | ./references/regression-modeling.md |
| Sandwich / robust SE equivalents | ./references/regression-modeling.md |
| ggplot2 layer mapping to plotnine | ./references/visualization.md |
| aes() string quoting in plotnine | ./references/visualization.md |
| Theme customization | ./references/visualization.md |
| Scale functions | ./references/visualization.md |
| Faceting (facet_wrap, facet_grid) | ./references/visualization.md |
| plotly R vs plotly Python | ./references/visualization.md |
| ggsave equivalent | ./references/visualization.md |
| Difference-in-differences (did, did2s) | ./references/causal-inference.md |
| Regression discontinuity (rdrobust) | ./references/causal-inference.md |
| Instrumental variables (ivreg vs pyfixest IV) | ./references/causal-inference.md |
| Event study designs | ./references/causal-inference.md |
| Synthetic control | ./references/causal-inference.md |
| Matching / propensity scores | ./references/causal-inference.md |
| survey package to svy | ./references/survey-spatial-ml.md |
| svydesign / svymean / svyglm equivalents | ./references/survey-spatial-ml.md |
| sf spatial operations to geopandas | ./references/survey-spatial-ml.md |
| CRS / projection handling | ./references/survey-spatial-ml.md |
| Spatial joins (st_join vs sjoin) | ./references/survey-spatial-ml.md |
| tidymodels pipeline to scikit-learn | ./references/survey-spatial-ml.md |
| RStudio vs DAAF workflow | ./references/workflow-environment.md |
| Quarto / RMarkdown vs marimo | ./references/workflow-environment.md |
| Interactive console vs file-first execution | ./references/workflow-environment.md |
| Package management (renv vs pip/uv) | ./references/workflow-environment.md |
| Project structure conventions | ./references/workflow-environment.md |
| Curated R-to-Python migration guides | ./references/external-resources.md |
| Package documentation links | ./references/external-resources.md |
| Tutorial recommendations with provenance | ./references/external-resources.md |
| 1-indexed list/vector access | ./references/gotchas.md |
| Factor vs Categorical pitfalls | ./references/gotchas.md |
| library() vs import habits | ./references/gotchas.md |
| T/F vs True/False | ./references/gotchas.md |
| Assignment operator (<- vs =) | ./references/gotchas.md |
| Vectorized operations expectations | ./references/gotchas.md |
| NULL vs None differences | ./references/gotchas.md |
| apply family vs map/list comprehension | ./references/gotchas.md |
| Copying semantics (R copy-on-modify vs Python references) | ./references/gotchas.md |
| Logical operators (& / \| vs and / or) | ./references/gotchas.md |
| String interpolation (glue vs f-strings) | ./references/gotchas.md |
| data.table vs polars | ./references/polars-strings-dates-factors.md |
| Lazy evaluation (polars LazyFrame vs R lazy tibble) | ./references/polars-dplyr.md |
| nest/unnest equivalents | ./references/polars-dplyr.md |
| Window functions (over vs mutate + group_by) | ./references/polars-dplyr.md |
| Coordinate systems (coord_flip, coord_polar) | ./references/visualization.md |
| Stat layers (stat_smooth, stat_summary) | ./references/visualization.md |
| Color palette mapping (viridis, brewer) | ./references/visualization.md |
| Multi-panel layouts (patchwork vs subplot) | ./references/visualization.md |
| Staggered DiD estimators | ./references/causal-inference.md |
| Parallel trends testing | ./references/causal-inference.md |
| BRR / jackknife replication weights | ./references/survey-spatial-ml.md |
| Raster data handling (terra vs rasterio) | ./references/survey-spatial-ml.md |
| Feature engineering (recipes vs sklearn Pipeline) | ./references/survey-spatial-ml.md |
| Cross-validation (rsample vs sklearn) | ./references/survey-spatial-ml.md |
| Environment/workspace differences (.RData vs nothing) | ./references/workflow-environment.md |
| Debugging workflow (browser() vs breakpoint()) | ./references/workflow-environment.md |
| R help system (?func) vs Python help(func) | ./references/workflow-environment.md |
| Cheat sheet and quick-reference links | ./references/external-resources.md |
| Community resources (Stack Overflow tags, forums) | ./references/external-resources.md |