Claude-skill-registry r-anti-slop
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/anti-slop" ~/.claude/skills/majiayu000-claude-skill-registry-r-anti-slop && rm -rf "$T"
skills/data/anti-slop/SKILL.md
R Anti-Slop: Stop Writing `df <- data`
When to Use This
Use this for:
- ✓ Any R code leaving your machine (analysis, packages, scripts)
- ✓ AI-generated code review (catches `df`, `result`, missing `::`)
- ✓ CRAN submissions (they'll reject generic code anyway)
- ✓ Team code standards
Skip for:
- Quick console experiments (though habits form fast)
- Legacy code you can't touch
- Bioconductor or other style guides that override this
Quick Example
Before (AI Slop):

    # Load the library
    library(dplyr)

    # Read the data
    df <- read.csv("data.csv")

    # Filter the data
    result <- df %>% filter(x > 0)

After (Anti-Slop):

    # Inside a function body (return() is an error at top level)
    customer_data <- readr::read_csv("data/customers.csv")

    active_customers <- customer_data |>
      dplyr::filter(status == "active", revenue > 0)

    return(active_customers)

What changed:
- ✓ Descriptive names (`customer_data` not `df`)
- ✓ Namespace qualification (`dplyr::`, `readr::`)
- ✓ Native pipe (`|>` not `%>%`)
- ✓ No obvious comments
- ✓ Explicit return
When to Use What
| If you need to... | Do this | Details |
|---|---|---|
| Name variables | Use `snake_case`, no `df`/`data`/`result` | reference/naming.md |
| Call tidyverse functions | Always use `::` (e.g., `dplyr::filter()`) | reference/tidyverse.md |
| Return from function | Always an explicit `return()` statement | reference/naming.md |
| Write pipe chains | Use `|>`, break at 8+ operations | reference/tidyverse.md |
| Document functions | Specific `@param`, `@return`, no circular text | reference/documentation.md |
| Handle missing data | Explicit strategy + report data loss | reference/statistical-rigor.md |
| Validate data | Check assumptions explicitly | reference/statistical-rigor.md |
| Format code | Use `styler::style_file()` | reference/tidyverse.md |
| Check code quality | Use `lintr::lint()` | reference/tidyverse.md |
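
The "Handle missing data" row asks for an explicit strategy plus a report of what was lost. A minimal base R sketch of that idea follows; the `clinic_visits` data frame and its columns are hypothetical, and complete-case filtering is only one possible strategy, chosen here for illustration.

```r
# Hypothetical data: `clinic_visits` and its columns are illustrative only.
clinic_visits <- data.frame(
  patient_id = 1:6,
  systolic_bp = c(120, NA, 135, 128, NA, 142)
)

# Explicit strategy: complete-case analysis on systolic_bp.
is_bp_observed <- !is.na(clinic_visits$systolic_bp)
complete_visits <- clinic_visits[is_bp_observed, ]

# Report how much data the strategy discarded.
rows_dropped <- nrow(clinic_visits) - nrow(complete_visits)
percent_dropped <- 100 * rows_dropped / nrow(clinic_visits)
message(sprintf(
  "Dropped %d of %d rows (%.1f%%) with missing systolic_bp",
  rows_dropped, nrow(clinic_visits), percent_dropped
))

# Validate the assumption that downstream code relies on.
stopifnot(!anyNA(complete_visits$systolic_bp))
```

Writing the loss to the log makes a silent `na.omit()`-style drop visible to whoever reads the analysis output.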
Core Workflow
5-Step Quality Check
1. Namespace qualification - All external functions use `::`

       # Good
       dplyr::filter(data, x > 0)

       # Bad
       filter(data, x > 0)

2. Explicit returns - Every function has `return()`

       # Good
       my_function <- function(x) {
         result <- x + 1
         return(result)
       }

       # Bad
       my_function <- function(x) {
         x + 1
       }

3. Naming conventions - All objects use `snake_case`

       # Good
       customer_lifetime_value <- calculate_clv(data)

       # Bad
       df <- calculate_clv(data)
       customerLifetimeValue <- calculate_clv(data)

4. Documentation quality - No generic descriptions

       # Good
       #' @param deaths Data frame with `age_group` and `count` columns

       # Bad
       #' @param data The data

5. Code formatting - Run styler and lintr

       styler::style_file("script.R")
       lintr::lint("script.R")

Quick Reference Checklist
Before committing R code, verify:
- All external functions qualified with `::`
- All functions have explicit `return()`
- All objects use `snake_case`
- No generic names (`df`, `data`, `result`, `temp`)
- Pipes (`|>`) have a space before them and end the line
- Long pipelines (>8 ops) broken into named steps
- Complex operations have WHY comments
- Data validated after transformations
- Seeds set before random operations
- Uncertainty reported (SE, CI) for statistical models
- No `attach()` calls
- No right-hand assignment (`->`)
- Roxygen documentation is specific
- Examples are realistic and run
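
Three of the checklist items (seeds, validation, uncertainty) are easier to see together in one short script. The sketch below uses only base R and stats; the simulated `trial_data` and all variable names are hypothetical, chosen for illustration.

```r
# Seed set before any random operation so the simulation is reproducible.
set.seed(2024)

# Hypothetical simulated data; variable names are illustrative only.
dose_mg <- runif(50, min = 0, max = 10)
response_score <- 2 + 0.5 * dose_mg + rnorm(50, sd = 1)
trial_data <- data.frame(dose_mg, response_score)

# Validate right after constructing the data.
stopifnot(nrow(trial_data) == 50, !anyNA(trial_data))

dose_model <- lm(response_score ~ dose_mg, data = trial_data)

# Report the estimate together with its standard error and 95% CI.
coefficient_table <- summary(dose_model)$coefficients
slope_estimate <- coefficient_table["dose_mg", "Estimate"]
slope_se <- coefficient_table["dose_mg", "Std. Error"]
slope_ci <- confint(dose_model, "dose_mg", level = 0.95)

print(c(
  estimate = slope_estimate,
  se = slope_se,
  ci_lower = slope_ci[1],
  ci_upper = slope_ci[2]
))
```

Reporting the interval next to the point estimate keeps a reader from treating the slope as exact.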
Common Workflows
Workflow 1: Clean Up AI-Generated R Script
Context: AI generated an analysis script with generic patterns.
Steps:
1. Run detection script

       Rscript toolkit/scripts/detect_slop.R analysis.R --verbose

2. Fix high-priority issues first

       # Replace df, data, result with descriptive names
       # Before
       df <- readr::read_csv("data.csv")
       result <- df %>% filter(x > 0)

       # After
       customer_data <- readr::read_csv("data/customers.csv")
       active_customers <- customer_data |>
         dplyr::filter(status == "active")

3. Add namespace qualification

       # Before
       data %>%
         filter(x > 0) %>%
         summarize(mean(y))

       # After
       data |>
         dplyr::filter(x > 0) |>
         dplyr::summarize(mean_y = mean(y))

4. Add explicit returns

       # Before
       calculate_rate <- function(numerator, denominator) {
         numerator / denominator
       }

       # After
       calculate_rate <- function(numerator, denominator) {
         rate <- numerator / denominator
         return(rate)
       }

5. Break long pipes

       # Before (12 operations in one chain)
       result <- data |>
         filter(...) |>
         mutate(...) |>
         group_by(...) |>
         summarize(...) |>
         arrange(...) |>
         [7 more ops]

       # After
       clean_data <- data |>
         dplyr::filter(!is.na(value)) |>
         dplyr::mutate(category = categorize(value))

       summary_stats <- clean_data |>
         dplyr::group_by(category) |>
         dplyr::summarize(mean_val = mean(value))

6. Format and validate

       styler::style_file("analysis.R")
       lintr::lint("analysis.R")

Expected outcome: The detection score drops from 60+ to below 20
Workflow 2: Fix Generic Package Documentation
Context: R package has generic roxygen documentation.
Steps:
1. Identify generic patterns

       # Bad
       #' Process Data
       #'
       #' @description This function processes the data.
       #' @param data The data.
       #' @return The result.

2. Make description specific

       # Good
       #' Calculate age-adjusted mortality rates
       #'
       #' Computes mortality rates per 100,000 population, standardized to the
       #' 2000 US Census age distribution using direct standardization.

3. Describe parameter structure

       # Good
       #' @param deaths Data frame with columns `age_group` and `count`.
       #' @param population Data frame with columns `age_group` and `pop_size`.

4. Specify return value

       # Good
       #' @return A tibble with columns:
       #' \describe{
       #'   \item{county}{County FIPS code}
       #'   \item{rate}{Age-adjusted rate per 100,000}
       #'   \item{se}{Standard error of the rate}
       #' }

5. Add realistic examples

       # Good
       #' @examples
       #' counties <- data.frame(
       #'   county = c("A", "B"),
       #'   deaths = c(150, 200),
       #'   population = c(50000, 80000)
       #' )
       #'
       #' adjust_rates(counties, rate_per = 100000)
       #' #> # A tibble: 2 x 3
       #' #>   county  rate    se
       #' #> 1 A       312.  25.4
       #' #> 2 B       258.  18.2

Expected outcome: Documentation that teaches, not restates
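
To show how the pieces from the steps above fit around a working function, here is a small sketch. `calculate_crude_rate()` is a hypothetical helper written for this illustration (it is not the `adjust_rates()` function from the example, and it computes crude rather than age-adjusted rates). The standard error assumes Poisson-distributed death counts, which is an assumption of this sketch.

```r
#' Calculate crude mortality rates per population unit
#'
#' Divides death counts by population size and scales the result to
#' `rate_per`. The standard error assumes Poisson-distributed deaths.
#'
#' @param county_counts Data frame with columns `county`, `deaths`, and
#'   `population`, one row per county.
#' @param rate_per Population unit for the rate, e.g. 100000.
#' @return A data frame with columns `county`, `rate`, and `se`.
#' @examples
#' counties <- data.frame(
#'   county = c("A", "B"),
#'   deaths = c(150, 200),
#'   population = c(50000, 80000)
#' )
#' calculate_crude_rate(counties, rate_per = 100000)
calculate_crude_rate <- function(county_counts, rate_per = 100000) {
  # Fail early if the documented structure is not met.
  stopifnot(
    all(c("county", "deaths", "population") %in% names(county_counts)),
    all(county_counts$population > 0)
  )

  crude_rates <- data.frame(
    county = county_counts$county,
    rate = county_counts$deaths / county_counts$population * rate_per,
    se = sqrt(county_counts$deaths) / county_counts$population * rate_per
  )
  return(crude_rates)
}

counties <- data.frame(
  county = c("A", "B"),
  deaths = c(150, 200),
  population = c(50000, 80000)
)
county_rates <- calculate_crude_rate(counties, rate_per = 100000)
print(county_rates)
```

The `@param` text and the `stopifnot()` check state the same contract, so the documentation cannot drift far from what the function actually accepts.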
Workflow 3: Prepare Package for CRAN
Context: Final checks before CRAN submission.
Steps:
1. Run all quality checks

       # Standard checks
       devtools::check()

       # Anti-slop checks
       lapply(list.files("R", full.names = TRUE), function(f) {
         system(paste("Rscript toolkit/scripts/detect_slop.R", f))
       })

2. Fix documentation
   - Check all `@param` descriptions are specific
   - Verify `@examples` run and are realistic
   - Ensure `@return` describes structure

3. Validate code quality

       # Format all files
       styler::style_dir("R/")

       # Check lints
       lintr::lint_package()

4. Check CRAN-specific requirements
   - Validate URLs in DESCRIPTION and documentation
   - Check examples run in < 5 seconds
   - Verify package structure meets CRAN standards

Expected outcome: Clean `R CMD check` with no slop patterns
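
The "examples run in < 5 seconds" item from step 4 can be spot-checked locally with `system.time()`. This is a rough sketch: the timed body is a hypothetical stand-in for an `@examples` block, and timings on CRAN's machines may differ from a local run.

```r
# Hypothetical stand-in for the body of an @examples block.
example_timing <- system.time({
  sorted_values <- sort(rev(seq_len(100000)))
  value_summary <- summary(sorted_values)
})

elapsed_seconds <- example_timing[["elapsed"]]
message(sprintf("Example ran in %.2f seconds", elapsed_seconds))

# Threshold taken from the checklist above.
stopifnot(elapsed_seconds < 5)
```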
Mandatory Rules Summary
1. Namespace Qualification
ALWAYS use `::` for external packages
Exceptions (don't need `::`):
- Base R: `mean()`, `sum()`, `log()`, etc.
- stats: `lm()`, `glm()`, `t.test()`, etc.
- utils: `head()`, `tail()`, `str()`, etc.
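
One reason these exceptions are safe is that base, stats, and utils are attached in a default R session, so their functions resolve without `library()` calls. A quick way to confirm this on a given machine is sketched below; a session started with a customized `R_DEFAULT_PACKAGES` may report something different.

```r
# Packages the exceptions list relies on being attached.
exempt_packages <- c("base", "stats", "utils")

# search() lists attached packages as "package:<name>" entries.
attached_entries <- grep("^package:", search(), value = TRUE)
attached_packages <- sub("^package:", "", attached_entries)

is_attached <- exempt_packages %in% attached_packages
names(is_attached) <- exempt_packages
print(is_attached)
```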
2. Explicit Returns
ALWAYS use `return()` - never implicit
3. Naming: snake_case
All objects use snake_case
- Variables: `customer_data` not `customerData` or `df`
- Functions: `calculate_rate` not `calculateRate`
- Arguments: `input_data` not `inputData`
4. Native Pipe
Prefer `|>` over `%>%` (unless R < 4.1)
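
As a small illustration, the native pipe passes its left-hand side as the first argument of the call on its right, so a pipe chain is equivalent to the nested call. The sketch below needs R 4.1 or later even to parse, which is why a runtime version check cannot guard it; the variable names are illustrative.

```r
measurement_values <- c(1, 4, 9, 16)

# Pipe chain: take square roots, then add them up.
root_total <- measurement_values |>
  sqrt() |>
  sum()

# Equivalent nested call, read inside-out.
nested_total <- sum(sqrt(measurement_values))

stopifnot(identical(root_total, nested_total))
print(root_total)
```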
5. No Generic Names
Never use:
df, data, result, temp, x, n (except standard math notation)
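
A crude check for this rule can be sketched in a few lines of base R. `find_generic_assignments()` is a hypothetical helper written for this illustration; it is not the toolkit's `detect_slop.R`, and it only catches `<-` assignments at the start of a line.

```r
# Hypothetical helper, not part of the skill's toolkit.
find_generic_assignments <- function(script_lines) {
  generic_names <- c("df", "data", "result", "temp")
  assignment_pattern <- paste0(
    "^[[:space:]]*(", paste(generic_names, collapse = "|"), ")[[:space:]]*<-"
  )
  flagged_line_numbers <- grep(assignment_pattern, script_lines)
  return(flagged_line_numbers)
}

example_lines <- c(
  "df <- read.csv(\"data.csv\")",
  "customer_data <- readr::read_csv(\"data/customers.csv\")",
  "result <- df[df$x > 0, ]"
)

flagged_lines <- find_generic_assignments(example_lines)
print(flagged_lines)  # lines 1 and 3 assign to generic names
```

In practice the lines would come from `readLines("analysis.R")`; a parser-based check would be more robust than a regular expression, which misses multi-line or right-hand assignments.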
Tidyverse Philosophy
Follow the Tidyverse Style Guide as the primary reference:
- Design for humans - Code should be readable and intuitive
- Reuse existing data structures - Work with tibbles and data frames
- Compose simple functions with pipes - Build complexity through composition
- Embrace functional programming - Functions are first-class objects
See reference/tidyverse.md for complete tidyverse conventions.
Resources & Advanced Topics
Reference Files
- reference/naming.md - Complete naming conventions and forbidden patterns
- reference/tidyverse.md - Pipe conventions, formatting, ggplot2 standards
- reference/documentation.md - Roxygen2, vignettes, README quality
- reference/statistical-rigor.md - Validation, uncertainty, reproducibility
- reference/forbidden-patterns.md - Complete antipattern catalog
Related Skills
- text/anti-slop - For cleaning prose in documentation
- quarto/anti-slop - For cleaning vignettes and documentation
Tools
- `styler::style_file()` - Auto-format code
- `lintr::lint()` - Check code quality
- `Rscript toolkit/scripts/detect_slop.R` - Detect AI patterns
Integration with Posit Skills
This skill focuses on code quality and avoiding generic patterns.
Use together with Posit skills for complete coverage:
| Task | Use This Skill | + Posit Skill |
|---|---|---|
| Write error messages | r/anti-slop (quality) | + r-lib/cli (structure) |
| Write tests | r/anti-slop (code quality) | + r-lib/testing (test patterns) |
| Prepare for CRAN | r/anti-slop (no slop) | + r-lib/cran-extrachecks (requirements) |
| Document lifecycle | r/anti-slop (doc quality) | + r-lib/lifecycle (deprecation) |