Effective communication strategies for statistical methods
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
skills/26-Data-Wise-scholar/skills/writing/methods-communicator/skill.md
Methods Communicator
Translating complex statistical methodology for applied researchers, practitioners, and students
Use this skill when writing: package vignettes, tutorial materials, workshop content, applied journal articles, interpretation guides, FAQ documentation, or any communication targeting non-methodological audiences.
Audience Adaptation
Audience Profiles
| Audience | Statistical Background | Primary Needs | Communication Style |
|---|---|---|---|
| Methods Researchers | Advanced | Theory, proofs, efficiency | Technical, precise |
| Applied Statisticians | Intermediate-Advanced | Implementation, assumptions | Technical with examples |
| Quantitative Researchers | Intermediate | When to use, interpretation | Practical, guided |
| Graduate Students | Developing | Step-by-step, intuition | Pedagogical, scaffolded |
| Practitioners | Variable | Point-and-click, templates | Simplified, checklist-based |
Audience Detection Questions
- What statistical training has this person likely had?
- What is their primary goal (understanding vs. applying)?
- How much mathematical notation is appropriate?
- What prior knowledge can I assume?
- What examples would resonate with their field?
Plain Language Translations
Core Mediation Concepts
| Technical Term | Plain Language | Analogy |
|---|---|---|
| Natural Indirect Effect | How much of treatment's effect works through the mediator | "The portion of medicine that helps by reducing inflammation" |
| Natural Direct Effect | Treatment's effect through all other pathways | "All other ways the medicine helps beyond reducing inflammation" |
| Sequential Ignorability | No unmeasured confounding at each step | "Apples-to-apples comparison at each stage" |
| Positivity | All treatment combinations are possible | "Everyone had a real chance of getting either treatment" |
| Identification | Can estimate causal effect from data | "The data can answer our causal question" |
Statistical Concepts
| Technical | Applied Researcher Version |
|---|---|
| "The estimator is consistent" | "With more data, estimates get closer to the truth" |
| "Asymptotically normal" | "For large samples, you can use normal-theory confidence intervals" |
| "Efficiency bound" | "The best precision you can possibly achieve" |
| "Doubly robust" | "Still trustworthy if either of the two models is right (doesn't need both)" |
| "Bootstrapped confidence interval" | "We resampled the data many times to estimate uncertainty" |
Effect Size Interpretation
## Template: Interpreting Indirect Effects

**For a standardized indirect effect of 0.15:**

"The treatment increases the outcome by 0.15 standard deviations through its effect on the mediator.

In practical terms: for every 100 people treated, we would expect approximately [X] additional positive outcomes that can be attributed specifically to the pathway through the mediator.

This effect size is considered [small/medium/large] by conventional standards in [field]."
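The template above can also be filled in programmatically so that the wording stays consistent across reports. The following is a minimal sketch in base R; the helper name `describe_indirect()` and its arguments are hypothetical and not part of any package.

```r
# Hypothetical helper: turns a standardized indirect effect and its
# confidence limits into one plain-language sentence.
describe_indirect <- function(estimate, ci_lower, ci_upper) {
  sprintf(
    paste0(
      "The treatment changes the outcome by %+.2f standard deviations ",
      "through its effect on the mediator (95%% CI: %.2f to %.2f)."
    ),
    estimate, ci_lower, ci_upper
  )
}

# Illustrative numbers only, not results from a real study
sentence <- describe_indirect(0.15, 0.04, 0.27)
cat(sentence, "\n")
```

The explicit sign (`%+.2f`) is a design choice: it keeps the direction of the effect visible even when readers skim the sentence.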
Vignette Writing Framework
Structure Template
# Package Vignette: [Feature Name]

## Overview

[1-2 sentence description of what this vignette covers]

**You will learn:**
- [Learning objective 1]
- [Learning objective 2]
- [Learning objective 3]

**Prerequisites:**
- [Required knowledge 1]
- [Required package 2]

## Quick Start

[Minimal working example - copy-pasteable code that runs immediately]

## Detailed Tutorial

### Step 1: [First Action]

[Explanation of what we're doing and why]

```r
# Annotated code
result <- function_name(
  data = my_data,    # Your dataset
  mediator = "M",    # Name of mediator variable
  outcome = "Y"      # Name of outcome variable
)
```
What this does: [Plain language explanation]
Common issues:
- [Issue 1 and how to resolve]
- [Issue 2 and how to resolve]
Step 2: [Second Action]
[Continue pattern...]
Interpretation Guide
Understanding the Output
    # Example output
    print(result)
Key values to look at:
| Output | What it means | What's "good" |
|---|---|---|
| [Estimate] | The indirect effect | Depends on your context |
| [CI lower], [CI upper] | 95% confidence interval | Doesn't include 0 = significant |
| [p-value] | Probability of a result at least this extreme if there were no indirect effect | < 0.05 conventionally significant |
Real-World Interpretation
[Walk through interpretation in words someone would actually say]
Troubleshooting
Frequently Asked Questions
Q: Why is my confidence interval so wide? A: [Clear, actionable explanation]
Q: What if my mediator is binary? A: [Clear, actionable explanation]
Next Steps
- For more complex models, see `vignette("advanced-models")`
- For sensitivity analysis, see `vignette("sensitivity")`
- For theoretical background, see [paper citation]
References
Pedagogical Techniques
The "Build-Up" Approach
Start simple, add complexity gradually:

```markdown
## Understanding Mediation: A Graduated Approach

### Level 1: The Basic Idea (No Math)

Think of a drug that treats depression. It might work in two ways:
1. **Directly** affecting brain chemistry → improved mood
2. **Indirectly** by improving sleep → which then improves mood

Mediation analysis asks: "How much of the drug's benefit comes from each pathway?"

### Level 2: With Diagrams (Minimal Math)

    Treatment (X) ──────→ Outcome (Y)
         │                    ↑
         └────→ Mediator (M) ─┘

- **Direct effect**: X → Y arrow
- **Indirect effect**: X → M → Y pathway

### Level 3: With Simple Formulas

Total Effect = Direct Effect + Indirect Effect

- Direct: $c'$ (effect with M held constant)
- Indirect: $a \times b$ (X→M effect × M→Y effect)

### Level 4: Full Formal Notation

[For those who want the technical version]
```
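For readers at Level 3, a short simulation can make the decomposition concrete. The sketch below assumes linear models fit by ordinary least squares with no treatment-mediator interaction; the data are simulated and the coefficient values (0.5, 0.3, 0.4) are arbitrary choices for illustration.

```r
set.seed(42)
n <- 500

# Simulated data: binary treatment, continuous mediator and outcome
x <- rbinom(n, 1, 0.5)
m <- 0.5 * x + rnorm(n)            # X -> M path
y <- 0.3 * x + 0.4 * m + rnorm(n)  # X -> Y and M -> Y paths

fit_m     <- lm(m ~ x)      # mediator model
fit_y     <- lm(y ~ x + m)  # outcome model, adjusting for the mediator
fit_total <- lm(y ~ x)      # total-effect model

a       <- coef(fit_m)[["x"]]      # X -> M effect
b       <- coef(fit_y)[["m"]]      # M -> Y effect, holding X constant
c_prime <- coef(fit_y)[["x"]]      # direct effect
c_total <- coef(fit_total)[["x"]]  # total effect

indirect <- a * b

# With least-squares fits on the same rows, the identity holds
# up to floating-point error
decomposition_holds <- isTRUE(all.equal(c_total, c_prime + indirect))
print(c(total = c_total, direct = c_prime, indirect = indirect))
print(decomposition_holds)
```

The exact equality is a property of linear least squares; with nonlinear models (for example, a logistic outcome model) the product $a \times b$ generally no longer equals the difference between total and direct effects.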
The "Running Example" Technique
Use one consistent example throughout:
    # Example dataset used throughout tutorials
    # Intervention study: Exercise program for depression
    # - treatment: exercise (1) vs. waitlist (0)
    # - mediator: self_efficacy (continuous, 1-10)
    # - outcome: depression_score (continuous, 0-63 BDI)
    # - covariates: age, gender, baseline_depression

    # "exercise_depression" is a placeholder name; substitute a dataset
    # that your package actually ships
    data("exercise_depression", package = "mediation")

    # We'll use this data for all examples in this vignette
Common Misconceptions Section
## Common Misconceptions

### Misconception 1: "If the indirect effect is significant, mediation is proven"

**Why it's wrong:** Mediation analysis shows *statistical* association through the mediator path, not *proof* of causal mediation.

**Better framing:** "Our data are consistent with a mediation process, assuming our causal assumptions hold."

### Misconception 2: "A non-significant indirect effect means no mediation"

**Why it's wrong:** We may lack power to detect the effect, or the effect may be small but real.

**Better framing:** "We did not find statistically significant evidence of mediation (indirect effect = X, 95% CI: [L, U])."

### Misconception 3: "The bootstrapped CI is always better"

**Why it's wrong:** Bootstrap is better for *asymmetric* sampling distributions (like products). For normally-distributed effects, delta-method works fine.

**When to use which:** [Decision guide]
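To ground Misconception 3, it can help to show both intervals side by side on the same data. The sketch below uses simulated data and base R only; it compares a percentile bootstrap interval with a first-order delta-method (Sobel) interval for the product-of-coefficients indirect effect. The sample size, effect sizes, and number of resamples are arbitrary assumptions.

```r
set.seed(123)
n <- 200
dat <- data.frame(x = rbinom(n, 1, 0.5))
dat$m <- 0.4 * dat$x + rnorm(n)
dat$y <- 0.2 * dat$x + 0.5 * dat$m + rnorm(n)

# Product-of-coefficients estimate of the indirect effect
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ x + m, data = d))[["m"]]
  a * b
}
estimate <- indirect_effect(dat)

# Percentile bootstrap: resample rows, re-estimate, take quantiles
boot_estimates <- replicate(1000, {
  idx <- sample.int(n, replace = TRUE)
  indirect_effect(dat[idx, , drop = FALSE])
})
boot_ci <- quantile(boot_estimates, c(0.025, 0.975))

# Delta method (Sobel): symmetric interval from a first-order standard error
fit_m <- lm(m ~ x, data = dat)
fit_y <- lm(y ~ x + m, data = dat)
a <- coef(fit_m)[["x"]]
b <- coef(fit_y)[["m"]]
se_a <- sqrt(vcov(fit_m)["x", "x"])
se_b <- sqrt(vcov(fit_y)["m", "m"])
sobel_se <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)
delta_ci <- estimate + c(-1, 1) * qnorm(0.975) * sobel_se

print(rbind(bootstrap = unname(boot_ci), delta_method = delta_ci))
```

The delta-method interval is symmetric around the estimate by construction, while the bootstrap interval can be lopsided; printing both makes that difference visible to readers without any formulas.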
Workshop Content Design
Workshop Module Template
# Module: [Topic Name]

## Duration: [X] minutes

### Learning Objectives

By the end of this module, participants will be able to:
1. [Measurable objective 1]
2. [Measurable objective 2]

### Pre-Assessment (2 min)

[Quick poll or question to gauge prior knowledge]

### Lecture Content (15 min)

#### Slide 1: Motivating Question
[Real-world question that motivates the topic]

#### Slides 2-5: Core Concept
[Building up the idea with visuals]

#### Slides 6-7: Worked Example
[Step-by-step with actual data]

### Hands-On Exercise (20 min)

**Setup:**

```r
# Load packages and data
library(mediation)
# "exercise_depression" is a placeholder; use your workshop dataset
data("exercise_depression")
```
Task 1: [Specific task with expected output]
Task 2: [Build on Task 1]
Discussion: [Question to discuss with neighbor]
Common Pitfalls (5 min)
[Mistakes you see people make, and how to avoid them]
Wrap-Up (3 min)
- Key takeaways: [3 bullet points]
- For more practice: [Resources]
- Questions?
Applied Journal Translation
Adapting Methods for Applied Journals
| Methodological Paper | Applied Paper |
|---|---|
| "We employ a semiparametric efficient estimator that achieves the efficiency bound under the nonparametric model" | "We used an efficient estimation approach that provides optimal precision" |
| "Under the assumption of sequential ignorability (Assumptions 1-3)..." | "Assuming no unmeasured confounding at each step of the mediation process..." |
| "The influence function takes the form..." | [Omit; put in supplement] |
| "Monte Carlo simulations with 1000 replications" | "We verified performance through simulation studies (see Supplementary Materials)" |
Applied Methods Section Template

```markdown
## Statistical Analysis

### Mediation Model

We examined whether [mediator] explained the relationship between [treatment] and [outcome] using [method name] (Author, Year). This approach decomposes the total treatment effect into:

- **Direct effect**: The portion of the effect that operates independently of [mediator]
- **Indirect effect**: The portion operating through [mediator]

### Assumptions

This analysis requires that:
1. [Plain language assumption 1]
2. [Plain language assumption 2]
3. [Plain language assumption 3]

We assessed the sensitivity of our findings to potential violations using [sensitivity analysis approach].

### Implementation

Analyses were conducted in R (version X.X) using the [package] package (Author, Year). Confidence intervals were computed using [method] with [N] bootstrap resamples. Code for all analyses is available at [URL].
```
FAQ Templates
General FAQ Structure
## Frequently Asked Questions

### Getting Started

**Q: What type of data do I need for mediation analysis?**

A: You need:
- A treatment/exposure variable (X)
- A potential mediator variable (M)
- An outcome variable (Y)
- Ideally, covariates that might confound these relationships

The mediator should be measured *after* the treatment but *before* (or contemporaneously with) the outcome.

---

**Q: How large should my sample be?**

A: For detecting medium-sized indirect effects (standardized ~ 0.26):
- N ≈ 150-200 for good power
- N ≈ 75 minimum for very large effects
- N ≈ 500+ for small effects

Use power analysis tools like `pwr.med` to determine your specific needs.

---

### Interpretation Questions

**Q: My indirect effect is significant but my direct effect is not. What does this mean?**

A: This pattern suggests "full mediation" - the treatment's effect appears to operate entirely through the mediator. However:
1. "Full" mediation is rare and often reflects low power for the direct effect
2. Focus on effect sizes, not just significance
3. Report both effects with confidence intervals

---

**Q: Can the indirect effect be larger than the total effect?**

A: Yes! This happens when direct and indirect effects have opposite signs. For example:
- Direct effect: -0.20 (treatment directly *reduces* outcome)
- Indirect effect: +0.35 (treatment increases mediator, which increases outcome)
- Total effect: +0.15

This is called "inconsistent mediation" or "suppression."

---

### Troubleshooting

**Q: I'm getting an error about convergence. What should I do?**

A: Common solutions:
1. Check for missing data: `sum(is.na(your_data))`
2. Scale your variables: `scale(variable)`
3. Remove outliers or influential observations
4. Simplify your model (fewer covariates)
5. Increase bootstrap iterations

If problems persist, check the package's GitHub issues.
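For the sample-size question in the FAQ, a simulation can give readers a feel for how power grows with N in their own setting. The sketch below is one possible approach under stated assumptions: linear models, a binary treatment, standard-normal errors, and the joint-significance test (both the X→M and M→Y paths significant) as the detection rule. The function name `simulate_power()` and all parameter values are hypothetical.

```r
# Hypothetical helper: estimates power to detect an indirect effect
# by simulation, using the joint-significance test.
simulate_power <- function(n, a, b, c_prime = 0, n_sims = 500, alpha = 0.05) {
  detected <- replicate(n_sims, {
    x <- rbinom(n, 1, 0.5)
    m <- a * x + rnorm(n)
    y <- c_prime * x + b * m + rnorm(n)
    p_a <- summary(lm(m ~ x))$coefficients["x", "Pr(>|t|)"]
    p_b <- summary(lm(y ~ x + m))$coefficients["m", "Pr(>|t|)"]
    p_a < alpha && p_b < alpha
  })
  mean(detected)  # share of simulated studies that detect the effect
}

set.seed(2024)
power_small_n <- simulate_power(n = 50,  a = 0.4, b = 0.4)
power_large_n <- simulate_power(n = 200, a = 0.4, b = 0.4)
print(c(n_50 = power_small_n, n_200 = power_large_n))
```

Because the path values `a` and `b` drive the answer, this kind of sketch is best presented with values taken from pilot data or prior literature in the reader's field rather than generic defaults.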
Error Message Humanization
Improving Error Messages in R Packages
    #' User-Friendly Error Messages
    #'
    #' @examples
    #' # Instead of:
    #' stop("non-conformable arguments")
    #'
    #' # Use:
    #' stop(paste0(
    #'   "The mediator and outcome variables have different lengths.\n",
    #'   " - mediator has ", length(mediator), " observations\n",
    #'   " - outcome has ", length(outcome), " observations\n",
    #'   "Check for missing data or subsetting issues."
    #' ))

    # Wrapper for common checks: collects every problem, then reports them together
    check_input <- function(data, treatment, mediator, outcome) {
      errors <- character()

      # Check variables exist
      if (!treatment %in% names(data)) {
        errors <- c(errors, sprintf(
          "Treatment variable '%s' not found in data.\nAvailable columns: %s",
          treatment, paste(names(data), collapse = ", ")
        ))
      }
      if (!mediator %in% names(data)) {
        errors <- c(errors, sprintf(
          "Mediator variable '%s' not found in data.\nAvailable columns: %s",
          mediator, paste(names(data), collapse = ", ")
        ))
      }
      if (!outcome %in% names(data)) {
        errors <- c(errors, sprintf(
          "Outcome variable '%s' not found in data.\nAvailable columns: %s",
          outcome, paste(names(data), collapse = ", ")
        ))
      }

      # Check for missing data (only meaningful once all columns exist)
      if (length(errors) == 0) {
        n_missing <- sum(is.na(data[[treatment]]) |
                           is.na(data[[mediator]]) |
                           is.na(data[[outcome]]))
        if (n_missing > 0) {
          # Build ONE format string; passing two strings to sprintf() would
          # treat the second as a value for %d and fail
          errors <- c(errors, sprintf(
            paste0(
              "Found %d observations with missing data in key variables.\n",
              "Use `na.omit(data[c('%s', '%s', '%s')])` to remove, ",
              "or consider multiple imputation."
            ),
            n_missing, treatment, mediator, outcome
          ))
        }
      }

      if (length(errors) > 0) {
        stop(paste(errors, collapse = "\n\n"), call. = FALSE)
      }
    }
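One way to keep error messages helpful over time is to check their wording the same way you would check numeric output. The sketch below is self-contained and uses a simplified, hypothetical validator (`validate_columns()`, not the function above) to show how `tryCatch()` and `conditionMessage()` can capture a message for inspection.

```r
# Hypothetical, simplified validator used only for this demonstration
validate_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop(sprintf(
      "Variable(s) not found in data: %s\nAvailable columns: %s",
      paste(missing_cols, collapse = ", "),
      paste(names(data), collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy data with a deliberately mismatched column name
toy <- data.frame(treat = c(0, 1), self_efficacy = c(4, 7), bdi = c(30, 18))

# Capture the message instead of halting, so its wording can be checked
msg <- tryCatch(
  validate_columns(toy, c("treat", "mediator", "bdi")),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

Listing the available columns in the message is the key design choice: the user can usually spot a typo or a renamed variable without opening the data.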
Print Method Design
Creating Informative Print Methods
    #' Print Method for Mediation Results
    #'
    #' Designed for applied researchers who need clear interpretation
    print.mediation_result <- function(x, ...) {
      cat("\n")
      cat("======================================\n")
      cat("      MEDIATION ANALYSIS RESULTS      \n")
      cat("======================================\n\n")

      # Effect estimates
      cat("EFFECT DECOMPOSITION:\n")
      cat(sprintf("  Total Effect:    %6.3f  95%% CI [%6.3f, %6.3f]\n",
                  x$total, x$total_ci[1], x$total_ci[2]))
      cat(sprintf("  Direct Effect:   %6.3f  95%% CI [%6.3f, %6.3f]\n",
                  x$direct, x$direct_ci[1], x$direct_ci[2]))
      # Star marks an indirect-effect interval that excludes zero
      star <- if (x$indirect_ci[1] > 0 || x$indirect_ci[2] < 0) "*" else ""
      cat(sprintf("  Indirect Effect: %6.3f  95%% CI [%6.3f, %6.3f] %s\n",
                  x$indirect, x$indirect_ci[1], x$indirect_ci[2], star))
      cat("\n")

      # Proportion mediated
      if (x$total != 0) {
        prop_med <- x$indirect / x$total * 100
        cat(sprintf("  Proportion Mediated: %.1f%%\n", prop_med))
      }
      cat("\n")

      # Plain language interpretation
      cat("INTERPRETATION:\n")
      if (x$indirect_ci[1] > 0) {
        cat("  There is evidence of positive mediation (p < .05).\n")
        cat(sprintf("  The treatment increases the outcome by %.3f through\n",
                    x$indirect))
        cat("  its effect on the mediator.\n")
      } else if (x$indirect_ci[2] < 0) {
        cat("  There is evidence of negative mediation (p < .05).\n")
      } else {
        cat("  The indirect effect is not statistically significant.\n")
        cat("  We cannot conclude that mediation is present.\n")
      }
      cat("\n")

      # Caveats
      cat("IMPORTANT CAVEATS:\n")
      cat("  • Results assume no unmeasured confounding\n")
      cat("  • See sensitivity analysis with sensitivityAnalysis()\n")
      cat("  • Report effect sizes, not just p-values\n")
      cat("\n")

      invisible(x)
    }
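A print method only runs when an object carries the matching class, which is easy to overlook when documenting it. The sketch below shows the dispatch mechanism with a deliberately tiny, hypothetical class (`toy_mediation`); the numbers are made up for illustration, and `capture.output()` is used so the printed text can be inspected or tested.

```r
# Hypothetical minimal class, kept separate from the fuller print method above
print.toy_mediation <- function(x, ...) {
  cat(sprintf("Indirect effect: %.3f, 95%% CI [%.3f, %.3f]\n",
              x$indirect, x$indirect_ci[1], x$indirect_ci[2]))
  invisible(x)  # return the object invisibly so it can still be piped or assigned
}

# structure() attaches the class that triggers S3 dispatch
result <- structure(
  list(indirect = 0.152, indirect_ci = c(0.041, 0.278)),
  class = "toy_mediation"
)

# print() finds print.toy_mediation via the class attribute
printed <- capture.output(print(result))
cat(printed, sep = "\n")
```

Capturing the output this way also makes it practical to snapshot-test the wording of a print method, so that later edits do not silently change what applied users see.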
Communication Checklist
Before Sharing with Applied Audience
- Removed or defined all jargon
- Provided concrete examples for abstract concepts
- Included worked example with real (or realistic) data
- Added interpretation template for output
- Listed common pitfalls and how to avoid them
- Tested code examples actually run
- Had someone from target audience review
Before Publishing Vignette
- Quick start section works in under 5 minutes
- All code chunks run without error
- Output is formatted readably
- Links to other vignettes for advanced topics
- References included for those wanting more depth
- Spell-checked and grammar-checked
References
Science Communication
- Katz, Y. (2013). Against storytelling of scientific results. Nature Methods
- Fischhoff, B. (2013). The sciences of science communication. PNAS
- Doumont, J. L. (2009). Trees, Maps, and Theorems
Statistical Communication
- Gelman, A., & Nolan, D. (2002). Teaching Statistics: A Bag of Tricks
- Wickham, H. (2010). A layered grammar of graphics. JCGS
- Wilke, C. O. (2019). Fundamentals of Data Visualization
R Package Documentation
- Wickham, H., & Bryan, J. (2023). R Packages (vignette chapter)
- rOpenSci Packages Guide: https://devguide.ropensci.org/
Version: 1.0.0
Created: 2025-12-08
Domain: Statistical communication for diverse audiences
Target Outputs: Vignettes, tutorials, workshops, applied papers