Marketplace r-development
Modern R development practices emphasizing tidyverse patterns (dplyr 1.1 and later, native pipe, join_by, .by grouping), rlang metaprogramming, performance optimization, and package development. Use when Claude needs to write R code, create R packages, optimize R performance, or provide R programming guidance.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/codingkaiser/r-development" ~/.claude/skills/aiskillstore-marketplace-r-development && rm -rf "$T"
skills/codingkaiser/r-development/SKILL.md
R Development
This skill provides comprehensive guidance for modern R development, emphasizing current best practices with tidyverse, performance optimization, and professional package development.
Core Principles
- Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs
- Profile before optimizing - Use profvis and bench to identify real bottlenecks
- Write readable code first - Optimize only when necessary and after profiling
- Follow tidyverse style guide - Consistent naming, spacing, and structure
Modern Tidyverse Essentials
Native Pipe (|> not %>%)
Always use the native pipe |> instead of magrittr %>% (R 4.1+):
# Modern
data |>
  filter(year >= 2020) |>
  summarise(mean_value = mean(value))

# Avoid legacy pipe
data %>%
  filter(year >= 2020)
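The native pipe is not a drop-in copy of %>% in every respect. The sketch below uses base R only and made-up values to show the three forms that come up most often; it assumes R 4.2 or later for the `_` placeholder.

```r
# Base R only. The native pipe needs R >= 4.1, the `_` placeholder R >= 4.2.
values <- c(4, 9, 16)

# The left-hand side becomes the first argument of the call on the right.
root_total <- values |> sqrt() |> sum()

# Anonymous functions must be wrapped in parentheses and called explicitly.
doubled <- values |> (\(x) x * 2)()

# The `_` placeholder must be passed to a named argument.
fit <- mtcars |> lm(mpg ~ wt, data = _)
```

Unlike magrittr's `.`, the `_` placeholder can appear only once per call and only as a named argument.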
Join Syntax (dplyr 1.1+)
Use join_by() for all joins:
# Modern join syntax with equality
transactions |>
  inner_join(companies, by = join_by(company == id))

# Inequality joins
transactions |>
  inner_join(companies, join_by(company == id, year >= since))

# Rolling joins (closest match)
transactions |>
  inner_join(companies, join_by(company == id, closest(year >= since)))
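To make the difference between an inequality join and a rolling join concrete, here is a small sketch. The `transactions` and `companies` tibbles are invented for illustration and are not part of the skill.

```r
library(dplyr)

# Hypothetical example data
transactions <- tibble(
  company = c("A", "A", "B"),
  year    = c(2019, 2021, 2020)
)
companies <- tibble(
  id    = c("A", "A", "B"),
  since = c(2018, 2020, 2021)
)

# Inequality join: keeps every companies row whose `since` is <= `year`
ineq <- transactions |>
  inner_join(companies, join_by(company == id, year >= since))

# Rolling join: keeps only the latest `since` at or before `year`
rolling <- transactions |>
  inner_join(companies, join_by(company == id, closest(year >= since)))
```

With this data the 2021 transaction for company A matches both `since` values in the inequality join but only 2020 in the rolling join, and the company B transaction drops out of both because its `since` is later than the transaction year.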
Control match behavior:
# Expect 1:1 matches (dplyr 1.1.1+; multiple = "error" is deprecated in favor of relationship)
inner_join(x, y, by = join_by(id), relationship = "one-to-one")

# Ensure all rows match
inner_join(x, y, by = join_by(id), unmatched = "error")
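A quick way to see these checks fire is to join two small tables that violate them. The sketch below uses invented `x` and `y` tibbles and assumes dplyr 1.1.1 or later, where the `relationship` argument expresses a one-to-one expectation.

```r
library(dplyr)

# Hypothetical data: id 2 is duplicated in y, id 3 is missing from y
x <- tibble(id = c(1, 2, 3), value = c("a", "b", "c"))
y <- tibble(id = c(1, 2, 2), label = c("p", "q", "r"))

# id 2 matches two rows of y, so the one-to-one expectation fails
one_to_one_failed <- tryCatch(
  {
    inner_join(x, y, by = join_by(id), relationship = "one-to-one")
    FALSE
  },
  error = function(e) TRUE
)

# id 3 has no match in y, so unmatched = "error" fails
unmatched_failed <- tryCatch(
  {
    inner_join(x, y, by = join_by(id), unmatched = "error")
    FALSE
  },
  error = function(e) TRUE
)
```

Failing loudly at the join is usually preferable to discovering silently duplicated or dropped rows later in the pipeline.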
Per-Operation Grouping with .by
Use .by instead of group_by() |> ... |> ungroup():
# Modern approach (always returns ungrouped)
data |>
  summarise(mean_value = mean(value), .by = category)

# Multiple grouping variables
data |>
  summarise(total = sum(revenue), .by = c(company, year))
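As a sanity check, the two styles can be run side by side. This sketch uses a made-up three-row tibble; the column names simply mirror the snippet above.

```r
library(dplyr)

# Hypothetical example data
data <- tibble(
  category = c("a", "a", "b"),
  value    = c(1, 3, 10)
)

# Per-operation grouping: the result is never grouped
with_by <- data |>
  summarise(mean_value = mean(value), .by = category)

# Persistent grouping: needs an explicit ungroup() afterwards
with_group_by <- data |>
  group_by(category) |>
  summarise(mean_value = mean(value)) |>
  ungroup()
```

One difference to keep in mind: `.by` returns groups in order of first appearance, while `group_by()` sorts the group keys, so row order can differ on unsorted data.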
Column Operations
Use modern column selection and transformation functions:
# pick() for column selection in data-masking contexts
data |>
  summarise(
    n_x_cols = ncol(pick(starts_with("x"))),
    n_y_cols = ncol(pick(starts_with("y")))
  )

# across() for applying functions to multiple columns
data |>
  summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)

# reframe() for multi-row results per group
data |>
  reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)
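The same three functions can be tried on a small table to see what shape each result takes. The tibble and its column names (`group`, `x1`, `x2`, `y1`) are invented for this sketch.

```r
library(dplyr)

# Hypothetical example data
data <- tibble(
  group = c("g1", "g1", "g2", "g2"),
  x1 = c(1, 2, 3, 4),
  x2 = c(10, 20, 30, 40),
  y1 = c(0, 0, 1, 1)
)

# pick() returns a data frame of the selected columns
col_counts <- data |>
  summarise(
    n_x_cols = ncol(pick(starts_with("x"))),
    n_y_cols = ncol(pick(starts_with("y")))
  )

# across() yields one output column per input column
means <- data |>
  summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)

# reframe() may return any number of rows per group (three here)
quartiles <- data |>
  reframe(q = quantile(x1, c(0.25, 0.5, 0.75)), .by = group)
```

`summarise()` expects exactly one row per group, which is why the quantile call belongs in `reframe()`.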
rlang Metaprogramming
For comprehensive rlang patterns, see references/rlang-patterns.md.
Quick Reference
- {{}}: Forward function arguments to data-masking functions
- !!: Inject single expressions or values
- !!!: Inject multiple arguments from a list
- .data[[]]: Access columns by name (character vectors)
- pick(): Select columns inside data-masking functions
Example function with embracing:
my_summary <- function(data, group_var, summary_var) {
  data |>
    summarise(mean_val = mean({{ summary_var }}), .by = {{ group_var }})
}
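A short usage sketch may help tie the quick reference together. It calls `my_summary()` on the built-in `mtcars` data, then shows a string-based variant and `!!!` splicing. `my_summary_chr()` is a hypothetical helper written for this example, not part of the skill.

```r
library(dplyr)

my_summary <- function(data, group_var, summary_var) {
  data |>
    summarise(mean_val = mean({{ summary_var }}), .by = {{ group_var }})
}

# Hypothetical helper: column names arrive as strings, so use .data[[ ]]
# in the data-masking argument and all_of() in the tidy-select .by argument.
my_summary_chr <- function(data, group_col, summary_col) {
  data |>
    summarise(mean_val = mean(.data[[summary_col]]), .by = all_of(group_col))
}

by_cyl     <- my_summary(mtcars, cyl, mpg)
by_cyl_chr <- my_summary_chr(mtcars, "cyl", "mpg")

# !!! splices a list of expressions into separate named arguments
stats   <- rlang::exprs(mean_mpg = mean(mpg), max_hp = max(hp))
spliced <- mtcars |> summarise(!!!stats)
```

Embracing suits interactive-style calls with bare column names; the `.data[[ ]]` form suits cases where column names come from configuration or user input as strings.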
Performance Optimization
For detailed performance guidance, see references/performance.md.
Key Strategies
- Profile first: Use profvis::profvis() and bench::mark()
- Vectorize operations: Avoid loops when vectorized alternatives exist
- Use dtplyr: For large data operations (lazy evaluation with data.table backend)
- Parallel processing: Use furrr::future_map() for parallelizable work
- Memory efficiency: Pre-allocate, use appropriate data types
Quick example:
# Profile code
profvis::profvis({
  result <- data |>
    complex_operation() |>
    another_operation()
})

# Benchmark alternatives
bench::mark(
  approach_1 = method1(data),
  approach_2 = method2(data),
  check = FALSE
)
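For the vectorization and pre-allocation strategies, a base R sketch like the following gives three equivalent implementations to compare. The function names and the input vector are illustrative only.

```r
# Base R only; the input is made-up illustration data.
x <- seq_len(1e4)

# Growing a vector inside a loop copies it on every iteration
grow <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)
  out
}

# Pre-allocating the result avoids the repeated copies
prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# The vectorized form is usually both the simplest and the fastest
vectorized <- function(x) x^2

# All three compute the same result
results_agree <- isTRUE(all.equal(grow(x), prealloc(x))) &&
  isTRUE(all.equal(prealloc(x), vectorized(x)))
```

Passing the three calls to `bench::mark()` would show the actual timing gap on a given machine; measure it rather than assume it.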
Package Development
For complete package development guidance, see references/package-development.md.
Quick Guidelines
API Design:
- Use .by parameter for per-operation grouping
- Use {{}} for column arguments
- Return tibbles consistently
- Validate user-facing function inputs thoroughly
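These API guidelines might combine into a function like the one sketched here. `summarise_mean()` is a hypothetical name, and the validation shown is deliberately minimal.

```r
library(dplyr)

# Hypothetical exported function illustrating the API guidelines
summarise_mean <- function(data, value, .by = NULL) {
  # Validate user-facing input early, with a clear message
  if (!is.data.frame(data)) {
    rlang::abort("`data` must be a data frame.")
  }

  data |>
    # {{ }} forwards the caller's column expressions; .by = NULL means no grouping
    summarise(mean = mean({{ value }}, na.rm = TRUE), .by = {{ .by }}) |>
    # Return a tibble regardless of the input class
    as_tibble()
}

by_cyl  <- summarise_mean(mtcars, mpg, .by = cyl)
overall <- summarise_mean(mtcars, mpg)
```

Exposing `.by` keeps the function consistent with dplyr's own verbs and means callers never get a grouped result back.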
Dependencies:
- Add dependencies for significant functionality gains
- Core tidyverse packages usually worth including: dplyr, purrr, stringr, tidyr
- Minimize dependencies for widely-used packages
Testing:
- Unit tests for individual functions
- Integration tests for workflows
- Test edge cases and error conditions
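A unit test file following these points could look roughly like this sketch, which assumes testthat. `scale_to_unit()` is a hypothetical function defined inline so the example is self-contained; in a package it would live under R/ and the tests under tests/testthat/.

```r
library(testthat)

# Hypothetical function under test
scale_to_unit <- function(x) {
  if (!is.numeric(x)) stop("`x` must be numeric.")
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("scale_to_unit() maps values onto [0, 1]", {
  expect_equal(scale_to_unit(c(2, 4, 6)), c(0, 0.5, 1))
})

test_that("scale_to_unit() handles edge cases and errors", {
  # Missing values are preserved in place
  expect_equal(scale_to_unit(c(1, NA, 3)), c(0, NA, 1))
  # Non-numeric input is rejected with an informative message
  expect_error(scale_to_unit("a"), "must be numeric")
})
```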
Documentation:
- Document all exported functions
- Provide usage examples
- Explain non-obvious parameter interactions
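In a package that uses roxygen2, those documentation points might translate into a block like the following. The function and its `na.rm` argument are hypothetical and serve only to show where a parameter interaction gets explained.

```r
#' Rescale a numeric vector to the unit interval
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when computing the range?
#'   If `FALSE` and `x` contains `NA`, every element of the result is `NA`.
#' @return A numeric vector the same length as `x`.
#' @examples
#' scale_to_unit(c(2, 4, 6))
#' @export
scale_to_unit <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}
```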
Common Migration Patterns
Base R → Tidyverse
# Data manipulation
subset(data, condition)       → filter(data, condition)
data[order(data$x), ]         → arrange(data, x)
aggregate(x ~ y, data, mean)  → summarise(data, mean(x), .by = y)

# Functional programming
sapply(x, f)                  → map_dbl(x, f)  # type-stable; or map_chr(), map_lgl(), map_int()
lapply(x, f)                  → map(x, f)

# Strings
grepl("pattern", text)        → str_detect(text, "pattern")
gsub("old", "new", text)      → str_replace_all(text, "old", "new")
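To check that a migration preserves behavior, the base R and tidyverse forms can be run side by side. This sketch uses the built-in `mtcars` data and a made-up character vector, and assumes dplyr, purrr, and stringr are installed.

```r
library(dplyr)
library(purrr)
library(stringr)

# Filtering and sorting
base_filtered <- subset(mtcars, mpg > 25)
tidy_filtered <- filter(mtcars, mpg > 25)
base_sorted   <- mtcars[order(mtcars$mpg), ]
tidy_sorted   <- arrange(mtcars, mpg)

# sapply() guesses the output type; map_dbl() always returns a double vector
base_sq <- sapply(1:3, function(i) i^2)
tidy_sq <- map_dbl(1:3, \(i) i^2)

# String detection and replacement
text     <- c("old car", "new car")
base_hit <- grepl("old", text)
tidy_hit <- str_detect(text, "old")
base_sub <- gsub("old", "new", text)
tidy_sub <- str_replace_all(text, "old", "new")
```

Note that stringr puts the string first and the pattern second, the reverse of `grepl()` and `gsub()`, which is what makes the functions pipe-friendly.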
Old → New Tidyverse
# Pipes
%>% → |>

# Grouping
group_by() |> ... |> ungroup() → summarise(..., .by = x)

# Joins
by = c("a" = "b") → by = join_by(a == b)

# Reshaping
gather()/spread() → pivot_longer()/pivot_wider()
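For the reshaping migration, a round trip through `pivot_longer()` and `pivot_wider()` shows the argument names that replace the older `key`/`value` pair. The `wide` tibble below is invented for illustration.

```r
library(tidyr)

# Hypothetical wide data: one revenue column per quarter
wide <- tibble::tibble(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# Replaces gather(): column names go to `quarter`, cell values to `revenue`
long <- wide |>
  pivot_longer(cols = c(q1, q2), names_to = "quarter", values_to = "revenue")

# Replaces spread(): rebuilds one column per quarter
wide_again <- long |>
  pivot_wider(names_from = quarter, values_from = revenue)
```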
Additional Resources
- rlang patterns: See references/rlang-patterns.md for comprehensive data-masking and metaprogramming guidance
- Performance optimization: See references/performance.md for profiling, benchmarking, and optimization strategies
- Package development: See references/package-development.md for complete package creation guidance
- Object systems: See references/object-systems.md for S3, S4, S7, R6, and vctrs guidance