Learn-skills.dev tidyverse-patterns
Modern tidyverse patterns for R including pipes, joins, grouping, purrr, and stringr. Use when writing tidyverse R code.
install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/ab604/claude-code-r-skills/tidyverse-patterns" ~/.claude/skills/neversight-learn-skills-dev-tidyverse-patterns && rm -rf "$T"
manifest:
data/skills-md/ab604/claude-code-r-skills/tidyverse-patterns/SKILL.md
source content
Modern Tidyverse Patterns
Best practices for modern tidyverse development with dplyr 1.1+ and R 4.3+
Core Principles
- Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs
- Profile before optimizing - Use profvis and bench to identify real bottlenecks
- Write readable code first - Optimize only when necessary and after profiling
- Follow tidyverse style guide - Consistent naming, spacing, and structure
Pipe Usage (|> not %>%)
- Always use the native pipe |> instead of the magrittr pipe %>%
- R 4.3+ provides all needed features
# Good - Modern native pipe
data |>
  filter(year >= 2020) |>
  summarise(mean_value = mean(value))

# Avoid - Legacy magrittr pipe
data %>%
  filter(year >= 2020) %>%
  summarise(mean_value = mean(value))
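Where the piped value should not land in the first argument, the native pipe's `_` placeholder can help. A minimal sketch using the built-in `mtcars` data; the object names are illustrative only, and it assumes R 4.3+ (the placeholder itself needs R 4.2+, extraction with `_$name` needs R 4.3+):

```r
# Pass the piped data frame to a named argument via the `_` placeholder (R 4.2+)
fit <- mtcars |> lm(mpg ~ wt, data = _)

# Extract a component from the piped value with `_$name` (R 4.3+)
coefs <- mtcars |>
  lm(mpg ~ wt, data = _) |>
  _$coefficients

# Anonymous functions use the \(x) shorthand (R 4.1+)
squares <- 1:3 |> sapply(\(x) x^2)
```

The placeholder must be passed to a named argument, which is why `data = _` is spelled out.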
Join Syntax (dplyr 1.1+)
- Use join_by() instead of character vectors for joins
- Support for inequality, rolling, and overlap joins
# Good - Modern join syntax
transactions |>
  inner_join(companies, by = join_by(company == id))

# Good - Inequality joins
transactions |>
  inner_join(companies, join_by(company == id, year >= since))

# Good - Rolling joins (closest match)
transactions |>
  inner_join(companies, join_by(company == id, closest(year >= since)))

# Avoid - Old character vector syntax
transactions |>
  inner_join(companies, by = c("company" = "id"))
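To make the difference between an inequality join and a rolling join concrete, here is a small self-contained sketch. The `transactions` and `companies` tables are made-up toy data, not taken from the source:

```r
library(dplyr)

# Hypothetical toy data: company "A" changed its name in 2020
companies <- tibble(
  id    = c("A", "A", "B"),
  since = c(2015, 2020, 2018),
  name  = c("Acme", "Acme Corp", "Bolt")
)
transactions <- tibble(
  company = c("A", "A", "B"),
  year    = c(2017, 2021, 2019),
  amount  = c(10, 20, 30)
)

# Inequality join: every company record that started on or before the year
all_prior <- transactions |>
  inner_join(companies, by = join_by(company == id, year >= since))

# Rolling join: only the most recent company record for each transaction
rolled <- transactions |>
  inner_join(companies, by = join_by(company == id, closest(year >= since)))
```

The 2021 transaction matches both "A" records in the inequality join, while `closest()` keeps only the 2020 record.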
Multiple Match Handling
- Use the relationship, multiple, and unmatched arguments for quality control
# Expect 1:1 matches, error otherwise
# (dplyr 1.1.1+; replaces the deprecated multiple = "error")
inner_join(x, y, by = join_by(id), relationship = "one-to-one")

# Allow multiple matches explicitly
inner_join(x, y, by = join_by(id), multiple = "all")

# Ensure all rows match
inner_join(x, y, by = join_by(id), unmatched = "error")
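The sketch below shows these checks failing on purpose, so the behaviour is visible. It assumes dplyr 1.1.1+, where the `relationship` argument is available; the `orders` and `customers` tables are invented for illustration:

```r
library(dplyr)

# Hypothetical toy data
orders    <- tibble(id = c(1, 2, 2), item = c("a", "b", "c"))
customers <- tibble(id = c(1, 2), name = c("Ann", "Bob"))

# Many orders per customer is expected, so this succeeds
ok <- inner_join(orders, customers, by = join_by(id),
                 relationship = "many-to-one")

# A duplicated customer id violates many-to-one and raises an error
dup_customers <- tibble(id = c(1, 2, 2), name = c("Ann", "Bob", "Bobby"))
dup_result <- tryCatch(
  inner_join(orders, dup_customers, by = join_by(id),
             relationship = "many-to-one"),
  error = function(e) "relationship violated"
)

# Orders without a matching customer raise an error with unmatched = "error"
missing_result <- tryCatch(
  inner_join(orders, tibble(id = 1, name = "Ann"), by = join_by(id),
             unmatched = "error"),
  error = function(e) "unmatched rows"
)
```

Catching the error with `tryCatch()` is only for demonstration; in a pipeline the error would normally be left to stop execution.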
Data Masking and Tidy Selection
- Understand the difference between data masking and tidy selection
- Use {{}} (embrace) for function arguments
- Use .data[[]] for character vectors
# Data masking functions: arrange(), filter(), mutate(), summarise()
# Tidy selection functions: select(), relocate(), across()

# Function arguments - embrace with {{}}
my_summary <- function(data, group_var, summary_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(mean_val = mean({{ summary_var }}))
}

# Character vectors - use .data[[]]
for (var in names(mtcars)) {
  mtcars |> count(.data[[var]]) |> print()
}

# Multiple columns - use across()
data |>
  summarise(across({{ summary_vars }}, ~ mean(.x, na.rm = TRUE)))
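A runnable version of the same ideas, using the built-in `mtcars` data. The helper names `summarise_means()` and `filter_above()` are hypothetical and exist only for this sketch:

```r
library(dplyr)

# Embrace ({{ }}) forwards unquoted column names supplied by the caller
summarise_means <- function(data, group_var, summary_vars) {
  data |>
    summarise(
      across({{ summary_vars }}, \(x) mean(x, na.rm = TRUE)),
      .by = {{ group_var }}
    )
}

# .data[[ ]] looks a column up by a name held in a character string
filter_above <- function(data, var, threshold) {
  data |> filter(.data[[var]] > threshold)
}

by_cyl <- summarise_means(mtcars, cyl, c(mpg, hp))
heavy  <- filter_above(mtcars, "wt", 5)
```

The caller writes bare column names for the embraced arguments and a quoted string for the `.data[[ ]]` argument, which mirrors the two bullets above.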
Modern Grouping and Column Operations
- Use .by for per-operation grouping (dplyr 1.1+)
- Use pick() for column selection inside data-masking functions
- Use across() for applying functions to multiple columns
- Use reframe() for multi-row summaries
# Good - Per-operation grouping (always returns ungrouped)
data |>
  summarise(mean_value = mean(value), .by = category)

# Good - Multiple grouping variables
data |>
  summarise(total = sum(revenue), .by = c(company, year))

# Good - pick() for column selection
data |>
  summarise(
    n_x_cols = ncol(pick(starts_with("x"))),
    n_y_cols = ncol(pick(starts_with("y")))
  )

# Good - across() for applying functions
data |>
  summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)

# Good - reframe() for multi-row results
data |>
  reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)

# Avoid - Old persistent grouping pattern
data |>
  group_by(category) |>
  summarise(mean_value = mean(value)) |>
  ungroup()
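The following sketch runs the same four tools on a tiny made-up `sales` table (the table and its column names are assumptions for illustration), so the shape of each result can be inspected:

```r
library(dplyr)

# Hypothetical toy data
sales <- tibble(
  region = c("N", "N", "S", "S"),
  q1     = c(1, 2, 3, 4),
  q2     = c(10, 20, 30, 40)
)

# .by groups for this one call only; the result is not a grouped data frame
totals <- sales |>
  summarise(across(c(q1, q2), sum), .by = region)

# pick() returns the selected columns as a data frame inside mutate()
with_total <- sales |>
  mutate(row_total = rowSums(pick(q1, q2)))

# reframe() may return any number of rows per group
ranges <- sales |>
  reframe(stat = c("min", "max"), value = range(q1), .by = region)
```

Because `.by` never leaves groups behind, no `ungroup()` call is needed after `totals`.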
Modern purrr Patterns
- Use map() |> list_rbind() instead of the superseded map_dfr()
- Use walk() for side effects (file writing, plotting)
- Use in_parallel() for scaling across cores
# Modern data frame row binding (purrr 1.0+)
models <- data_splits |>
  map(\(split) train_model(split)) |>
  list_rbind()  # Replaces map_dfr()

# Column binding
summaries <- data_list |>
  map(\(df) get_summary_stats(df)) |>
  list_cbind()  # Replaces map_dfc()

# Side effects with walk2() (returns its input invisibly, so no assignment is needed)
walk2(data_list, plot_names, \(df, name) {
  p <- ggplot(df, aes(x, y)) + geom_point()
  ggsave(name, p)
})

# Parallel processing (purrr 1.1.0+)
library(mirai)
daemons(4)
results <- large_datasets |>
  map(in_parallel(
    \(x) expensive_computation(x),
    expensive_computation = expensive_computation  # pass dependencies explicitly
  ))
daemons(0)
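A self-contained sketch of the row-binding and side-effect patterns, assuming purrr 1.0+. The `splits` list is toy data, and files are written to a temporary directory so nothing persists:

```r
library(purrr)

# Hypothetical toy data: a named list of data frames
splits <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:6))

# map() returns a list of one-row data frames; list_rbind() stacks them
# and names_to records which list element each row came from
summaries <- splits |>
  map(\(df) data.frame(n = nrow(df), mean_x = mean(df$x))) |>
  list_rbind(names_to = "split")

# walk2() is called for its side effect (writing files), not its value
out_dir <- tempfile("purrr-demo-")
dir.create(out_dir)
paths <- file.path(out_dir, paste0(names(splits), ".csv"))
walk2(splits, paths, \(df, path) write.csv(df, path, row.names = FALSE))
```

Using `names_to` replaces the `.id` argument that `map_dfr()` used to provide.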
String Manipulation with stringr
- Use stringr over base R string functions
- Consistent str_ prefix and string-first argument order
- Pipe-friendly and vectorized by design
# Good - stringr (consistent, pipe-friendly)
text |>
  str_to_lower() |>
  str_trim() |>
  str_replace_all("pattern", "replacement") |>
  str_extract("\\d+")

# Common patterns
str_detect(text, "pattern")       # vs grepl("pattern", text)
str_extract(text, "pattern")      # vs complex regmatches()
str_replace_all(text, "a", "b")   # vs gsub("a", "b", text)
str_split(text, ",")              # vs strsplit(text, ",")
str_length(text)                  # vs nchar(text)
str_sub(text, 1, 5)               # vs substr(text, 1, 5)

# String combination and formatting
str_c("a", "b", "c")              # vs paste0()
str_glue("Hello {name}!")         # templating
str_pad(text, 10, "left")         # padding
str_wrap(text, width = 80)        # text wrapping

# Case conversion
str_to_lower(text)                # vs tolower()
str_to_upper(text)                # vs toupper()
str_to_title(text)                # vs tools::toTitleCase()

# Pattern helpers for clarity
str_detect(text, fixed("$"))                # literal match
str_detect(text, regex("\\d+"))             # explicit regex
str_detect(text, coll("e", locale = "fr"))  # collation

# Avoid - inconsistent base R functions
grepl("pattern", text)            # argument order varies
regmatches(text, regexpr(...))    # complex extraction
gsub("a", "b", text)              # different arg order
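As a small worked example of chaining these functions, the sketch below cleans a made-up vector of order labels (the `raw` vector is illustrative only):

```r
library(stringr)

# Hypothetical messy input
raw <- c("  Order #123 ", "ORDER #45", "no id")

# Normalise case and whitespace in one pipeline
clean <- raw |>
  str_to_lower() |>
  str_squish()

# Extract the first run of digits; strings without a match give NA
ids <- clean |> str_extract("\\d+")

# fixed() treats the pattern as a literal string rather than a regex
has_hash <- str_detect(raw, fixed("#"))
```

Every function takes the string first, which is what lets the steps chain through `|>` without placeholders.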
Vectorization and Performance
# Good - vectorized operations
result <- x + y

# Good - Type-stable purrr functions
map_dbl(data, mean)    # always returns double
map_chr(data, class)   # always returns character

# Avoid - Type-unstable base functions
sapply(data, mean)     # might return list or vector

# Avoid - explicit loops for simple operations
result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- x[i] + y[i]
}
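The type-stability point is easiest to see on edge cases. A minimal sketch, assuming purrr is installed, comparing `sapply()` with the typed `map_*()` variants on empty and irregular input:

```r
library(purrr)

empty <- list()

# sapply() changes its return type on empty input: it gives back a list
base_result <- sapply(empty, length)

# map_int() always returns an integer vector, even when it is empty
purrr_result <- map_int(empty, length)

# map_dbl() refuses results that are not length 1 instead of changing shape
bad_shape <- tryCatch(
  map_dbl(list(1:2), \(x) x),
  error = function(e) "error"
)
```

Failing loudly on an unexpected shape is usually preferable to silently returning a different type.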
Common Anti-Patterns to Avoid
Legacy Patterns
# Avoid - Old pipe
data %>% f()

# Avoid - Old join syntax
inner_join(x, y, by = c("a" = "b"))

# Avoid - Implicit type conversion
sapply()  # Use map_*() instead

# Avoid - String manipulation in data masking
mutate(data, !!paste0("new_", var) := value)
# Use across() or other approaches instead
Performance Anti-Patterns
# Avoid - Growing objects in loops
result <- c()
for (i in 1:n) {
  result <- c(result, compute(i))  # Slow!
}

# Good - Pre-allocate
result <- vector("list", n)
for (i in 1:n) {
  result[[i]] <- compute(i)
}

# Better - Use purrr
result <- map(1:n, compute)
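In line with the "profile before optimizing" principle, the cost of growing a vector can be measured rather than assumed. A rough sketch using `bench::mark()`, assuming the bench package is installed; `grow()` and `prealloc()` are hypothetical helpers written for this comparison, and timings will vary by machine:

```r
library(bench)

# Anti-pattern: the vector is copied on every iteration
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocated version of the same computation
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# mark() also checks that both expressions return equal results
timings <- bench::mark(grow(1e3), prealloc(1e3), iterations = 20)
timings[, c("expression", "median", "mem_alloc")]
```

If the measured difference is negligible at realistic sizes, the more readable version is the better choice.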
Migration from Old Patterns
From Base R to Modern Tidyverse
# Data manipulation
subset(data, condition)       -> filter(data, condition)
data[order(data$x), ]         -> arrange(data, x)
aggregate(x ~ y, data, mean)  -> summarise(data, mean(x), .by = y)

# Functional programming
sapply(x, f)                  -> map_*(x, f)  # type-stable, e.g. map_dbl()
lapply(x, f)                  -> map(x, f)

# String manipulation
grepl("pattern", text)        -> str_detect(text, "pattern")
gsub("old", "new", text)      -> str_replace_all(text, "old", "new")
substr(text, 1, 5)            -> str_sub(text, 1, 5)
nchar(text)                   -> str_length(text)
strsplit(text, ",")           -> str_split(text, ",")
paste0(a, b)                  -> str_c(a, b)
tolower(text)                 -> str_to_lower(text)
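When migrating, it can be reassuring to confirm that the old and new forms agree. A short sketch on the built-in `mtcars` data, checking two of the data manipulation translations above:

```r
library(dplyr)

# subset() and filter() keep the same rows
base_rows <- subset(mtcars, mpg > 25)
tidy_rows <- mtcars |> filter(mpg > 25)

# aggregate() and summarise(.by = ) compute the same group means;
# aggregate() sorts by group, so the dplyr result is arranged to match
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
tidy_means <- mtcars |>
  summarise(mpg = mean(mpg), .by = cyl) |>
  arrange(cyl)

same_means <- isTRUE(all.equal(base_means$mpg, tidy_means$mpg))
```

Note that `.by` keeps groups in order of first appearance, which is why the explicit `arrange()` is needed for a like-for-like comparison.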
From Old to New Tidyverse Patterns
# Pipes
data %>% f()  -> data |> f()

# Grouping (dplyr 1.1+)
group_by(data, x) |> summarise(mean(y)) |> ungroup()  -> summarise(data, mean(y), .by = x)

# Column selection
across(starts_with("x"))  -> pick(starts_with("x"))  # for selection only

# Joins
by = c("a" = "b")  -> by = join_by(a == b)

# Multi-row summaries
summarise(data, x, .groups = "drop")  -> reframe(data, x)

# Data reshaping
gather()/spread()  -> pivot_longer()/pivot_wider()

# String separation (tidyr 1.3+)
separate(col, into = c("a", "b"))  -> separate_wider_delim(col, delim = "_", names = c("a", "b"))
extract(col, into = "x", regex)    -> separate_wider_regex(col, patterns = c(x = regex))
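The reshaping and string-separation replacements often appear together. A minimal sketch, assuming tidyr 1.3+; the `wide` table and its column names are made up for illustration:

```r
library(tidyr)

# Hypothetical wide data with the year encoded in the column names
wide <- data.frame(
  id         = 1:2,
  score_2023 = c(10, 20),
  score_2024 = c(11, 21)
)

long <- wide |>
  # pivot_longer() replaces gather()
  pivot_longer(starts_with("score_"), names_to = "key", values_to = "score") |>
  # separate_wider_delim() replaces separate()
  separate_wider_delim(key, delim = "_", names = c("measure", "year"))
```

The new `year` column is character; convert it explicitly (for example with `as.integer()`) if a numeric year is needed.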
Superseded purrr Functions (purrr 1.0+)
map_dfr(x, f)      -> map(x, f) |> list_rbind()
map_dfc(x, f)      -> map(x, f) |> list_cbind()
map2_dfr(x, y, f)  -> map2(x, y, f) |> list_rbind()
pmap_dfr(list, f)  -> pmap(list, f) |> list_rbind()
imap_dfr(x, f)     -> imap(x, f) |> list_rbind()

# For side effects
walk(x, write_file)            # instead of for loops
walk2(data, paths, write_csv)  # multiple arguments