# claude-skill-registry: literature-gap-finder

Method × Setting matrices and systematic gap identification

Install the full registry:

```sh
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy only this skill:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/literature-gap-finder" ~/.claude/skills/majiayu000-claude-skill-registry-literature-gap-finder && rm -rf "$T"
```

`skills/data/literature-gap-finder/SKILL.md`

# Literature Gap Finder

Systematic framework for identifying research opportunities in statistical methodology
**Use this skill when:** positioning research contributions, finding gaps in methodology literature, identifying unexplored combinations of methods and settings, building literature reviews, or deciding on research directions.
## The Gap-Finding Framework

### What Makes a Good Research Gap?

A publishable gap must be:
- **Real** - Not already addressed (check thoroughly!)
- **Important** - Solves a problem researchers actually face
- **Tractable** - Can be solved with available tools
- **Novel** - Provides new insight, not just a recombination of existing pieces
- **Timely** - Relevant to current research trends
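The five criteria work as a yes/no screen before any detailed scoring. The code elsewhere in this skill is R; as a quick language-agnostic illustration, here is the screen sketched in Python (the helper name and gap representation are hypothetical):

```python
# Hypothetical screen: score a gap in detail only if it clears
# all five yes/no criteria listed above.
CRITERIA = ("real", "important", "tractable", "novel", "timely")

def passes_screen(gap: dict) -> bool:
    """True only when every criterion is affirmed."""
    return all(gap.get(c, False) for c in CRITERIA)

candidate = {"real": True, "important": True, "tractable": True,
             "novel": True, "timely": False}
print(passes_screen(candidate))  # a stale topic fails the screen
```

A gap that fails any single criterion is usually not worth the detailed priority scoring described below.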
### Types of Gaps
| Gap Type | Description | Example |
|---|---|---|
| Method Gap | No method exists for setting | No mediation analysis for network data |
| Theory Gap | Method exists but lacks theory | Bootstrap for mediation lacks consistency proof |
| Efficiency Gap | Methods exist but are inefficient | More efficient doubly robust mediation estimators |
| Robustness Gap | Methods fail under violations | Mediation under measurement error |
| Computational Gap | Existing methods don't scale | Mediation with high-dimensional confounders |
| Extension Gap | Existing method needs generalization | Binary → continuous mediator |
## Method-Setting Matrix

### Systematic Gap Identification Framework

The method-setting matrix is the core tool for finding research gaps systematically:

```r
# Build a method-setting matrix programmatically
create_gap_matrix <- function() {
  methods <- c("Regression", "Weighting/IPW", "DR/AIPW", "TMLE", "ML-based")
  settings <- c("Binary treatment", "Continuous treatment", "Time-varying",
                "Clustered", "High-dimensional", "Measurement error",
                "Missing data", "Network")
  matrix_data <- expand.grid(method = methods, setting = settings)
  matrix_data$status <- "unknown"  # To be filled: "developed", "partial", "gap"
  matrix_data$priority <- NA
  matrix_data$references <- ""
  matrix_data
}

# Visualize the gap matrix
visualize_gaps <- function(gap_matrix) {
  library(ggplot2)
  ggplot(gap_matrix, aes(x = method, y = setting, fill = status)) +
    geom_tile(color = "white") +
    scale_fill_manual(values = c(
      "developed" = "#2ecc71",
      "partial"   = "#f39c12",
      "gap"       = "#e74c3c",
      "unknown"   = "#95a5a6"
    )) +
    theme_minimal() +
    labs(title = "Method × Setting Gap Matrix", x = "Method", y = "Setting") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```
## Verification Process

### Confirming a Gap is Real

Before claiming a gap, verify systematically:
| Step | Action | Tools |
|---|---|---|
| 1 | Search major databases | Google Scholar, Web of Science, Scopus |
| 2 | Search preprint servers | arXiv, bioRxiv, SSRN |
| 3 | Search R packages | CRAN, GitHub, R-universe |
| 4 | Check conference proceedings | ICML, NeurIPS, JSM, ENAR |
| 5 | Search dissertations | ProQuest, university repositories |
| 6 | Email domain experts | 2-3 experts for confirmation |
```r
# Systematic verification checklist
verify_gap <- function(topic, keywords) {
  list(
    topic = topic,
    databases_searched = c("google_scholar", "web_of_science", "pubmed", "scopus"),
    search_terms = keywords,
    date_range = paste(Sys.Date() - 365 * 5, "to", Sys.Date()),
    results = list(
      papers_found = 0,
      closest_related = c(),
      why_not_the_same = ""
    ),
    expert_consultation = list(
      experts_contacted = c(),
      responses = c()
    ),
    verification_status = "pending"  # pending, confirmed, rejected
  )
}

# Document the verification
document_verification <- function(gap_description, search_log) {
  cat("## Gap Verification Report\n\n")
  cat("**Gap:**", gap_description, "\n\n")
  cat("**Search Date:**", as.character(Sys.Date()), "\n\n")
  cat("**Databases Searched:**\n")
  for (db in search_log$databases_searched) {
    cat("- ", db, "\n")
  }
  cat("\n**Search Terms:**", paste(search_log$search_terms, collapse = ", "), "\n")
  cat("\n**Conclusion:**", search_log$verification_status, "\n")
}
```
## Priority Ranking

### Scoring Research Gaps
| Criterion | Weight | Score 1-5 |
|---|---|---|
| Impact (how many benefit?) | 0.25 | ___ |
| Novelty (how new?) | 0.20 | ___ |
| Tractability (can we solve it?) | 0.20 | ___ |
| Timeliness (is it hot now?) | 0.15 | ___ |
| Fit (matches our expertise?) | 0.10 | ___ |
| Publication potential | 0.10 | ___ |
Priority Score = Σ(weight × score)
```r
# Priority scoring function (case_when() comes from dplyr)
library(dplyr)

score_research_gap <- function(
    impact,       # 1-5: How many researchers would benefit
    novelty,      # 1-5: How new/original is this
    tractability, # 1-5: How likely can we solve it
    timeliness,   # 1-5: Is this currently hot
    fit,          # 1-5: Matches our expertise
    publication   # 1-5: Publication potential
) {
  weights <- c(0.25, 0.20, 0.20, 0.15, 0.10, 0.10)
  scores <- c(impact, novelty, tractability, timeliness, fit, publication)
  priority <- sum(weights * scores)
  list(
    priority_score = priority,
    interpretation = case_when(
      priority >= 4.0 ~ "High priority - pursue immediately",
      priority >= 3.0 ~ "Medium priority - develop further",
      priority >= 2.0 ~ "Low priority - back burner",
      TRUE ~ "Skip - not worth pursuing"
    ),
    breakdown = data.frame(
      criterion = c("Impact", "Novelty", "Tractability",
                    "Timeliness", "Fit", "Publication"),
      weight = weights,
      score = scores,
      weighted = weights * scores
    )
  )
}

# Compare multiple gaps: returns indices from highest to lowest priority
rank_gaps <- function(gaps_list) {
  scores <- sapply(gaps_list, function(g) g$priority_score)
  order(scores, decreasing = TRUE)
}
```
## Method × Setting Matrix

### The Core Framework

Systematically map methods against settings to find gaps:

| Setting ↓ / Method → | Regression | Weighting | DR/TMLE | ML-based |
|---|---|---|---|---|
| Binary A | ✓ | ✓ | ✓ | ✓ |
| Continuous A | ✓ | ? | ✓ | ? |
| Time-varying | ? | ✓ | ✓ | ✗ |
| Clustered | ✓ | ? | ? | ✗ |
| High-dimensional | ✗ | ✗ | ? | ✓ |

✓ = Well-developed, ? = Partial/emerging, ✗ = Gap
### Building Your Matrix

**Step 1: Identify Dimensions**

For mediation analysis:
| Dimension | Variations |
|---|---|
| Treatment | Binary, continuous, multi-level, time-varying |
| Mediator | Single, multiple, high-dimensional, latent |
| Outcome | Continuous, binary, count, survival, longitudinal |
| Confounding | Measured, unmeasured, time-varying |
| Structure | Single mediator, parallel, sequential, moderated |
| Data | Cross-sectional, longitudinal, clustered, network |
| Assumptions | Standard, relaxed positivity, measurement error |
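Crossing these dimensions enumerates every candidate setting for the matrix, and the count grows multiplicatively. A small Python sketch (the skill's other snippets are R; dimension lists here are abbreviated from the table above):

```python
from itertools import product

# Variations from the table above (first four dimensions only)
dimensions = {
    "treatment":   ["binary", "continuous", "multi-level", "time-varying"],
    "mediator":    ["single", "multiple", "high-dimensional", "latent"],
    "outcome":     ["continuous", "binary", "count", "survival", "longitudinal"],
    "confounding": ["measured", "unmeasured", "time-varying"],
}

# Each combination is one candidate setting (row) of the gap matrix
settings = list(product(*dimensions.values()))
print(len(settings))  # 4 * 4 * 5 * 3 = 240 candidate settings
```

Because the full cross-product is huge, in practice you prune to the combinations that arise in real applications before filling in cells.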
**Step 2: List Methods**
| Method Family | Specific Methods |
|---|---|
| Regression | Baron-Kenny, product of coefficients, difference |
| Weighting | IPW, MSM, sequential g-estimation |
| Doubly Robust | AIPW, TMLE, cross-fitted |
| Semiparametric | Influence function-based |
| Bayesian | MCMC, variational |
| Machine Learning | Causal forests, DML, neural |
| Bounds | Partial identification, sensitivity |
**Step 3: Fill and Analyze**

Mark each cell:
- ✓ (green): Well-established with theory + software
- ? (yellow): Emerging or partial coverage
- ✗ (red): Clear gap
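Once cells are marked, simple bookkeeping shows where the red cells cluster. A Python sketch with illustrative, made-up cell values (not a real assessment):

```python
# Illustrative filled cells: (method, setting) -> status
matrix = {
    ("DR",        "3+ mediators"):      "gap",
    ("DR",        "2 mediators"):       "developed",
    ("Weighting", "3+ mediators"):      "partial",
    ("Bounds",    "measurement error"): "gap",
}

# List the red (gap) cells and group statuses by method
gaps = [cell for cell, status in matrix.items() if status == "gap"]
per_method = {}
for (method, _), status in matrix.items():
    per_method.setdefault(method, []).append(status)

print(gaps)
print(per_method)
```

Rows or columns that are mostly red point to a whole family of related gaps rather than a single isolated one.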
**Example: Sequential Mediation Matrix**
| | Product | Weighting | DR | Bounds |
|---|---|---|---|---|
| 2 mediators, linear | ✓ | ✓ | ✓ | ? |
| 2 mediators, nonlinear | ? | ✓ | ? | ✗ |
| 3+ mediators, linear | ? | ? | ✗ | ✗ |
| 3+ mediators, nonlinear | ✗ | ? | ✗ | ✗ |
| With measurement error | ✗ | ✗ | ✗ | ✗ |
| With unmeasured confounding | ✗ | ✗ | ✗ | ? |
Gaps identified:
- DR methods for 3+ mediators
- Any method with measurement error
- Bounds approach underdeveloped
## Assumption Relaxation Trees

### The Framework

Map how assumptions have been relaxed over time:
```
Standard Mediation (Baron-Kenny 1986)
├── Assumes: no unmeasured confounding
│   ├── Sensitivity analysis (Imai 2010)
│   │   └── E-value (Ding 2016) ──→ [YOUR GAP?]
│   └── Bounds (partial identification)
│       └── Sharp bounds? ──→ [YOUR GAP?]
├── Assumes: linearity
│   └── Nonparametric (Imai 2010)
│       └── Multiple mediators? Longitudinal? Measurement error? ──→ [YOUR GAP?]
└── Assumes: no interaction
    └── 4-way decomposition (VanderWeele)
```
### Building the Tree

**Step 1: Identify Original Assumptions**

For a classic method, list ALL assumptions:
- Explicit assumptions (stated in paper)
- Implicit assumptions (unstated but required)
- Computational assumptions (required for implementation)
**Step 2: Trace Relaxation History**
For each assumption, find papers that:
- Relax it partially
- Relax it completely
- Replace it with different assumption
- Show consequences of violation
**Step 3: Find Unexplored Branches**
Look for:
- Combinations of relaxations not yet explored
- Relaxations in one method not applied to another
- Partial relaxations that could be completed
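The branch search can be made mechanical by storing the tree as nested mappings and enumerating paths that end without further relaxations; those terminal paths are the candidate gaps. A Python sketch over a toy tree (contents are an abridged illustration, not a literature survey):

```python
# Toy relaxation tree: each node maps a relaxation to its children;
# an empty dict marks a branch no one has extended yet.
tree = {
    "no unmeasured confounding": {
        "sensitivity analysis": {"e-value": {}},
        "partial-identification bounds": {},
    },
    "linearity": {"nonparametric identification": {}},
}

def leaf_paths(node, path=()):
    """Yield the path to every unextended branch."""
    if not node:
        yield path
    for name, child in node.items():
        yield from leaf_paths(child, path + (name,))

for path in leaf_paths(tree):
    print(" -> ".join(path))
```

Each printed path is a "[YOUR GAP?]" candidate: a chain of relaxations whose next step has not yet been taken.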
**Example: Positivity Assumption**
```
Positivity: P(A=a|X=x) > ε > 0 for all a, x
├── Near-violations
│   └── Trimming weights ──→ Truncation?
├── Practical positivity
│   └── Overlap assessment ──→ Diagnostics?
└── Structural violations
    └── Extrapolation methods ──→ Bounds under violations?
```
## Citation Network Analysis

### Forward and Backward Searching

**Backward:** From a recent key paper, trace the works it cites:
- What foundational papers are cited?
- What parallel developments exist?
- What's the intellectual lineage?
**Forward:** Using Google Scholar's "Cited by":
- Who has built on this work?
- What extensions were made?
- What gaps remain unaddressed?
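Backward and forward passes amount to graph traversal over the citation network. A Python sketch over a toy citation graph (paper keys are illustrative placeholders, not a real bibliography):

```python
from collections import deque

# Toy backward edges: paper -> papers it cites
cites = {
    "paperC": ["paperA", "paperB"],
    "paperD": ["paperB", "paperC"],
}

# Derive the forward ("cited by") index from the backward one
cited_by = {}
for paper, refs in cites.items():
    for ref in refs:
        cited_by.setdefault(ref, []).append(paper)

def reachable(start, edges):
    """BFS: all papers reachable from `start` along `edges`."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(sorted(reachable("paperB", cited_by)))  # everyone building on paperB
```

Running `reachable` on `cites` gives the intellectual lineage of a paper; running it on `cited_by` gives everything built on top of it.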
### Key Paper Identification
For any topic, identify:
| Category | Description | How to Find |
|---|---|---|
| Foundational | Original method papers | Most-cited, oldest |
| Textbook | Comprehensive treatments | Citations across subfields |
| Recent reviews | State-of-the-art summaries | "Review" in title, last 5 years |
| Frontier | Latest developments | Top journals, last 2 years |
| Your competition | Groups working on same gap | Recent similar titles |
### Building a Citation Map

```
1986: Baron & Kenny [foundations]
│
├──→ 1990s: SEM extensions
│
├──→ 2004: Robins & Greenland [causal foundations]
│     │
│     ├──→ 2010: Imai et al. [sensitivity]
│     │
│     ├──→ 2010: VanderWeele [4-way]
│     │      │
│     │      └──→ 2015: Book [comprehensive]
│     │
│     └──→ 2014: Tchetgen [semiparametric]
│
└──→ 2020s: ML integration [frontier]
```
## Gap Verification Checklist
Before claiming a gap, verify:
**1. Literature Search**
- Searched Google Scholar with multiple keyword combinations
- Searched arXiv stat.ME and stat.TH
- Searched JSTOR for older statistics journals
- Searched bioRxiv/medRxiv for preprints
- Checked reference lists of review papers
- Checked "cited by" for key papers
**2. Terminology Check**
- Same concept might have different names in different fields
- Checked econometrics terminology
- Checked biostatistics terminology
- Checked machine learning terminology
- Checked psychology/SEM terminology
**3. Adjacent Literature**
- Checked related but not identical settings
- Method might exist for similar problem
- Checked if general framework applies
**4. Working Papers**
- Checked key authors' websites
- Checked conference proceedings (JSM, ENAR)
- Asked collaborators/experts
**5. Final Verification**
- Gap is not addressed in supplementary materials
- Gap is not an "obvious" extension that reviewers will dismiss
- Gap is important enough to publish
## Gap Characterization Template

When you identify a gap:

```markdown
## Gap: [Brief Title]

### Setting
[Precise description of the setting where the gap exists]

### Current State
- **What exists**: [Methods that partially address this]
- **What works**: [Aspects of the problem already solved]
- **What fails**: [Where current methods break down]

### The Gap
- **Precise statement**: [What is missing]
- **Why it matters**: [Who needs this, for what applications]
- **Why it's hard**: [Technical challenges]

### Evidence of Gap
- [ ] Literature search documented
- [ ] No existing solution found
- [ ] Experts consulted (optional)

### Potential Approaches
1. [Approach 1]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]
2. [Approach 2]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]

### Related Work
- [Paper 1]: [How it relates, why it doesn't solve gap]
- [Paper 2]: [How it relates, why it doesn't solve gap]

### Contribution Positioning
"While [existing work] addresses [related problem], no method currently
handles [specific gap]. We propose [approach] which provides [properties]."
```
## Common Gap Patterns in Mediation

### Pattern 1: Data Structure Mismatch
Gap template: "[Method] assumes [simple structure], but in [application] data has [complex structure]"
Examples:
- Methods assume iid, but data is clustered
- Methods assume cross-sectional, but data is longitudinal
- Methods assume complete data, but missingness exists
### Pattern 2: Assumption Violation
Gap template: "[Method] requires [assumption], which is violated when [situation]"
Examples:
- Unmeasured mediator-outcome confounding
- Measurement error in mediator
- Treatment-mediator interaction
### Pattern 3: Estimand Ambiguity
Gap template: "When [complication], standard estimands [NDE/NIE] are not well-defined or interpretable"
Examples:
- Post-treatment confounding
- Time-varying treatments/mediators
- Multiple versions of treatment
### Pattern 4: Efficiency vs Robustness
Gap template: "Efficient methods require [strong assumptions], while robust methods are inefficient"
Examples:
- Doubly robust methods for mediation
- Semiparametric efficiency in complex settings
- Adaptive methods
### Pattern 5: Computational Barrier
Gap template: "Theoretically valid approach exists but [computational limitation]"
Examples:
- High-dimensional settings
- Continuous mediators requiring integration
- Bootstrap in complex models
## Research Positioning Strategies

### The Contribution Statement

Strong positioning formula:
"Although [Author Year] developed [method] for [setting], their approach [limitation]. In contrast, our method [advantage] while maintaining [property]. Specifically, we contribute: (1) [theoretical contribution], (2) [methodological contribution], (3) [practical contribution]."
### Positioning Types
| Position | When to Use | Example Language |
|---|---|---|
| Extension | Build on existing | "We extend [method] to [new setting]" |
| Synthesis | Combine approaches | "We unify [method A] and [method B]" |
| Alternative | Different approach | "We propose an alternative that [advantage]" |
| Correction | Fix limitation | "We address the limitation of [method]" |
| Generalization | Broader framework | "We develop a general framework that includes [special cases]" |
### Differentiation Matrix
| Dimension | Competitor 1 | Competitor 2 | Our Method |
|---|---|---|---|
| Setting | Binary A only | Any A | Any A |
| Theory | Consistency | + Normality | + Efficiency |
| Assumptions | Strong | Medium | Weaker |
| Computation | Fast | Slow | Medium |
| Software | R package | None | R + Python |
## Integration with Other Skills

This skill works with:
- `cross-disciplinary-ideation` - Find solutions from other fields for identified gaps
- `method-transfer-engine` - Transfer methods to fill gaps
- `identification-theory` - Understand what assumptions are needed
- `methods-paper-writer` - Write up the gap and solution
## Key References

### On Finding Research Gaps
- Alvesson, M. & Sandberg, J. (2011). Generating research questions through problematization. Academy of Management Review.
- Sandberg, J. & Alvesson, M. (2011). Ways of constructing research questions: gap-spotting or problematization? Organization.
### Mediation Reviews (for gap identification)
- VanderWeele, T.J. (2016). Mediation analysis: A practitioner's guide. Annual Review of Public Health.
- Nguyen, T.Q. et al. (2021). Clarifying causal mediation analysis. Psychological Methods.
### Causal Inference Reviews
- Hernán, M.A. (2018). The C-word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health.
- Imbens, G.W. (2020). Potential outcome and directed acyclic graph approaches to causality. Journal of Economic Literature.
Version: 1.0 | Created: 2025-12-08 | Domain: Research Strategy, Literature Review