Awesome-Agent-Skills-for-Empirical-Research method-transfer-engine
Six-phase protocol for adapting methods across research domains
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/26-Data-Wise-scholar/skills/research/method-transfer-engine" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-method-transfer-e && rm -rf "$T"
skills/26-Data-Wise-scholar/skills/research/method-transfer-engine/SKILL.md

Method Transfer Engine
Rigorous framework for adapting statistical methods across domains and settings
Use this skill when: adapting a method from one field to another, extending a method to a new setting, formalizing an intuitive connection between methods, or verifying that a transferred method retains its properties.
The Transfer Framework
What is Method Transfer?
Taking a technique that works in Setting A and adapting it to work in Setting B, while:
- Preserving desirable theoretical properties
- Identifying what changes are needed
- Understanding what can and cannot transfer
Transfer Quality Spectrum
```
Direct Application → Minor Adaptation → Major Modification → Inspired-By
        │                  │                   │                  │
   Same theory         Adjust for         Rewrite theory      New method,
     applies           new setting       for new setting    similar spirit
```
Transfer Success Criteria
A successful transfer must:
- Solve the target problem - Method actually helps in new setting
- Preserve key properties - Consistency, efficiency, robustness transfer
- Have clear assumptions - Know what's required in new setting
- Be verifiable - Can prove/simulate that it works
- Add value - Better than existing approaches
The 6-Phase Protocol
This protocol provides a systematic approach to method transfer, covering all critical steps from source extraction through validation.
Source Extraction
Goal: Extract the core mathematical and algorithmic essence of the source method
```r
# Template for source method extraction
extract_source_method <- function(method_name, reference) {
  list(
    name        = method_name,
    reference   = reference,
    estimand    = "formal expression of what is estimated",
    estimator   = "formula for the estimator",
    assumptions = c("A1: condition", "A2: condition"),
    properties  = c("consistency", "asymptotic normality"),
    algorithm   = c("Step 1: ...", "Step 2: ..."),
    complexity  = "O(n^2) or similar"
  )
}

# Example: Extract Lasso from signal processing
lasso_extraction <- list(
  name        = "Lasso/Basis Pursuit",
  field       = "Signal Processing / Compressed Sensing",
  estimand    = "argmin ||y - Xb||_2^2 + lambda * ||b||_1",
  key_insight = "L1 penalty induces sparsity via soft thresholding",
  assumptions = c("RIP condition", "Incoherence"),
  properties  = c("Sparse solution", "Variable selection consistency")
)
```
Abstraction
Goal: Identify the abstract mathematical structure that enables the method
```r
# Abstract structure identification
identify_abstraction <- function(source_method) {
  list(
    mathematical_structure = "e.g., M-estimation, U-statistics, kernels",
    core_operation         = "e.g., reweighting, regularization, projection",
    information_used       = "e.g., first moments, covariance, distributional",
    key_invariance         = "what property makes it work",
    generalization_path    = "how to extend beyond original setting"
  )
}

# Example: Abstraction of propensity score methods
propensity_abstraction <- list(
  mathematical_structure = "Reweighting to balance distributions",
  core_operation         = "Inverse probability weighting",
  invariance             = "Balances covariate distribution across groups",
  generalization         = "Any selection mechanism with known probabilities"
)
```
Phase 1: Source Method Analysis
Goal: Deeply understand what you're transferring
```markdown
## Source Method Profile

### Basic Information
- Name: [Method name]
- Source field: [Domain/area]
- Key reference: [Citation]
- What it does: [One sentence]

### Problem Solved
- Input: [What data/information goes in]
- Output: [What estimate/inference comes out]
- Setting: [When it applies]

### Mathematical Structure
- Estimand: [What it estimates, formally]
- Estimator: [How it estimates, formula]
- Loss/objective: [What it optimizes]

### Assumptions Required
1. [Assumption 1]: [Mathematical statement]
   - Why needed: [Role in proof/method]
   - When violated: [Failure mode]
2. [Assumption 2]: ...

### Theoretical Properties
- Consistency: [When/how proved]
- Rate: [Convergence rate]
- Asymptotic distribution: [If known]
- Efficiency: [Relative to what]
- Robustness: [To what violations]

### Computational Aspects
- Algorithm: [How implemented]
- Complexity: [Time/space]
- Software: [Available implementations]
```
Phase 2: Target Problem Analysis
Goal: Understand where you want to apply it
```markdown
## Target Problem Profile

### Basic Information
- Problem name: [Description]
- Target field: [Domain/area]
- Motivation: [Why solve this]

### Problem Structure
- Data available: [What's observed]
- Estimand: [What you want to estimate]
- Challenges: [Why existing methods are inadequate]

### Current Approaches
- Method 1: [Name, limitations]
- Method 2: [Name, limitations]
- Gap: [What's missing]

### Constraints
- Assumptions willing to make: [List]
- Assumptions NOT willing to make: [List]
- Computational constraints: [If any]
```
Target Mapping
Goal: Map source concepts to their target domain counterparts
```r
# Target mapping framework
create_target_mapping <- function(source, target) {
  list(
    objects = data.frame(
      source       = c("treatment", "outcome", "confounder"),
      target       = c("mediator", "effect", "moderator"),
      relationship = c("direct", "indirect", "modifies")
    ),
    assumptions = data.frame(
      source_assumption = c("SUTVA", "Ignorability"),
      target_version    = c("Consistency", "Sequential ignorability"),
      status            = c("transfers", "needs modification")
    )
  )
}

# Example: IV to Mendelian randomization mapping
iv_to_mr <- list(
  price_instrument = "genetic_variant",
  demand           = "biomarker_exposure",
  endogeneity      = "unmeasured_confounding",
  exclusion        = "pleiotropic_effects",
  key_difference   = "biological vs economic mechanisms"
)
```
Phase 3: Structure Mapping
Goal: Identify correspondences between source and target
```markdown
## Structure Map

### Object Correspondence
| Source | Target | Notes |
|--------|--------|-------|
| [Source object 1] | [Target object 1] | [How they relate] |
| [Source object 2] | [Target object 2] | [How they relate] |
| ... | ... | ... |

### Assumption Correspondence
| Source Assumption | Target Version | Status |
|-------------------|----------------|--------|
| [Source A1] | [Target A1'] | ✓ Transfers / ✗ Fails / ? Modify |
| [Source A2] | [Target A2'] | ... |
| ... | ... | ... |

### What Transfers Directly
- [Property 1]: Because [reason]
- [Property 2]: Because [reason]

### What Needs Modification
- [Element 1]: From [source version] to [target version]
  - Why: [Reason for change]
  - How: [Specific modification]

### What Doesn't Transfer
- [Element 1]: Because [reason]
  - Impact: [What we lose]
  - Alternative: [How to address]
```
Gap Analysis
Goal: Identify what doesn't transfer and what modifications are needed
```r
# Gap analysis framework
analyze_transfer_gaps <- function(source, target, mapping) {
  list(
    assumption_gaps = list(
      violated     = c("iid assumption in clustered data"),
      modified     = c("independence -> conditional independence"),
      new_required = c("mediator positivity")
    ),
    property_gaps = list(
      lost      = c("efficiency under misspecification"),
      weakened  = c("convergence rate n^{-1/2} -> n^{-1/4}"),
      preserved = c("consistency", "asymptotic normality")
    ),
    computational_gaps = list(
      new_challenges = c("non-convex optimization"),
      workarounds    = c("ADMM algorithm", "approximate methods")
    ),
    bridging_strategies = c(
      "Add regularization for new setting",
      "Derive modified variance estimator",
      "Implement robustness check"
    )
  )
}
```
Phase 4: Adaptation Design
Goal: Design the transferred method
```markdown
## Adapted Method Design

### Overview
[One paragraph describing the adapted method]

### Formal Definition
**Estimand**:
$$\psi = [target estimand formula]$$

**Estimator**:
$$\hat{\psi}_n = [adapted estimator formula]$$

**Algorithm**:
1. [Step 1]
2. [Step 2]
3. ...

### Modified Assumptions
1. [Assumption A1']: [New statement for target setting]
   - Analogous to: [Source assumption]
   - Modified because: [Reason]

### Expected Properties
- Consistency: [Conjecture/claim]
- Rate: [Expected]
- Efficiency: [Expected]

### Key Differences from Source
1. [Difference 1]: [Explanation]
2. [Difference 2]: [Explanation]
```
Validation
Goal: Systematically verify the transferred method works correctly
```r
# Comprehensive validation framework for method transfer.
# The run_*/compare_*/test_* helpers are placeholders to be implemented
# for the specific method under study.
validate_transfer <- function(adapted_method, n_sims = 1000) {
  results <- list()

  # 1. Bias check: is the estimator unbiased at the truth?
  results$bias <- run_bias_simulation(adapted_method, n_sims)

  # 2. Coverage check: do CIs achieve nominal coverage?
  results$coverage <- run_coverage_simulation(adapted_method, n_sims)

  # 3. Efficiency check: compare to alternatives
  results$efficiency <- compare_to_alternatives(adapted_method)

  # 4. Robustness check: behavior under assumption violations
  results$robustness <- test_assumption_violations(adapted_method)

  # 5. Edge cases: extreme scenarios
  results$edge_cases <- test_edge_cases(adapted_method)

  # Validation report
  list(
    passed          = all(sapply(results, function(x) x$passed)),
    details         = results,
    recommendations = generate_recommendations(results)
  )
}

# Simulation template for validation (generate_dgp, adapted_method, and
# true_value are method-specific placeholders)
run_transfer_validation <- function(n = 500, n_sims = 1000) {
  estimates <- replicate(n_sims, {
    data <- generate_dgp(n)        # generate data under the true model
    est  <- adapted_method(data)   # apply the transferred method
    c(estimate = est$point, se = est$se)
  })
  list(
    bias     = mean(estimates["estimate", ]) - true_value,
    rmse     = sqrt(mean((estimates["estimate", ] - true_value)^2)),
    coverage = mean(abs(estimates["estimate", ] - true_value) <
                      1.96 * estimates["se", ])
  )
}
```
Phase 5: Verification
Goal: Prove/demonstrate the transfer works
```markdown
## Verification Plan

### Theoretical Verification
- [ ] Consistency proof
  - Approach: [Proof strategy]
  - Key lemma: [What needs to be shown]
- [ ] Asymptotic normality
  - Approach: [Proof strategy]
  - Influence function: [If applicable]
- [ ] Efficiency (if claiming)
  - Approach: [Efficiency bound derivation]

### Simulation Verification
- [ ] Scenario 1: [Description]
  - DGP: [Data generating process]
  - Expected result: [What should happen]
- [ ] Scenario 2: Comparison to oracle
  - Purpose: [Verify optimality]
- [ ] Scenario 3: Stress test
  - Purpose: [Find failure modes]

### Empirical Verification
- [ ] Benchmark dataset: [If available]
- [ ] Real application: [Domain]
```
Phase 6: Documentation
Goal: Document for publication
```markdown
## Transfer Documentation

### Contribution Statement
"We adapt [source method] from [source field] to [target setting] by
[key modification]. Our adapted method [key property]. Unlike
[alternative], our approach [advantage]."

### Theoretical Contribution
- New result 1: [Theorem statement]
- New result 2: [If applicable]

### Methodological Contribution
- Adaptation insight: [What's novel about the transfer]
- Practical guidance: [When to use]

### What We Learned
- About source method: [New understanding]
- About target problem: [New understanding]
- General principle: [Broader insight]
```
Common Transfer Patterns
Pattern 1: Estimator Family Transfer
Template: Estimator type from one setting to another
Example: IPW from survey sampling → causal inference
```
Source: Horvitz-Thompson estimator (survey sampling)
  E[Y] ≈ (1/N) Σᵢ Yᵢ/πᵢ        where πᵢ = P(selected)

Target: IPW for the ATE
  E[Y(1)] ≈ (1/n) Σᵢ Yᵢ·Aᵢ/e(Xᵢ)   where e(x) = P(A=1|X=x)

Mapping:
- Selection indicator → Treatment indicator
- Selection probability → Propensity score
- Survey weights → Inverse propensity weights

Key insight: Both correct for selection bias via reweighting
```
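This mapping can be sketched in a toy simulation. Everything here (the data-generating process, coefficients, and the true ATE of 2) is illustrative, not taken from any particular source method:

```r
# Minimal sketch: IPW for the ATE, mirroring the Horvitz-Thompson structure
set.seed(1)
n <- 5000
X <- rnorm(n)
e <- plogis(0.5 * X)          # true propensity score P(A = 1 | X)
A <- rbinom(n, 1, e)          # treatment assignment
Y <- 2 * A + X + rnorm(n)     # outcome; true ATE = 2

# IPW estimate of E[Y(1)] - E[Y(0)], using the known propensity
ate_ipw <- mean(A * Y / e) - mean((1 - A) * Y / (1 - e))
ate_ipw                       # close to 2 in large samples
```

In practice the propensity `e` would be estimated, which is exactly the point where the survey-sampling analogy (known inclusion probabilities) breaks down and new theory is needed.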
Pattern 2: Robustness Property Transfer
Template: Robustness technique from one method to another
Example: Double robustness from missing data → causal inference
```
Source: Augmented IPW for missing data
  DR = IPW + Imputation − (IPW × Imputation)

Target: AIPW for causal effects
  Same structure, but for counterfactual outcomes

Mapping:
- Missing indicator → Treatment indicator
- Missingness model → Propensity model
- Imputation model → Outcome model

Key insight: Product-form bias enables robustness to one misspecification
```
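A minimal sketch of the AIPW estimator under the same illustrative data-generating process as above (both working models happen to be correctly specified here, so this only demonstrates the structure, not the double-robustness guarantee):

```r
# Minimal sketch: AIPW combines an outcome model with an IPW correction
set.seed(2)
n <- 5000
X <- rnorm(n)
e <- plogis(0.5 * X)
A <- rbinom(n, 1, e)
Y <- 2 * A + X + rnorm(n)     # true ATE = 2

# Working models (propensity and outcome)
ps_fit  <- glm(A ~ X, family = binomial)
out_fit <- lm(Y ~ A + X)
ehat <- fitted(ps_fit)
mu1  <- predict(out_fit, newdata = data.frame(A = 1, X = X))
mu0  <- predict(out_fit, newdata = data.frame(A = 0, X = X))

# AIPW: outcome-model prediction plus inverse-weighted residual correction
psi1 <- mean(mu1 + A * (Y - mu1) / ehat)
psi0 <- mean(mu0 + (1 - A) * (Y - mu0) / (1 - ehat))
psi1 - psi0                   # close to 2
```

The "product-form bias" claim can be checked by deliberately misspecifying one of the two working models and confirming the estimate stays near the truth.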
Pattern 3: Asymptotic Result Transfer
Template: Asymptotic theory from simpler to complex setting
Example: Influence function theory → semiparametric mediation
```
Source: IF for a smooth functional of the CDF
  √n (T(Fₙ) − T(F)) → N(0, E[φ²])

Target: IF for the mediation effect functional
  Requires a mediation-specific tangent space

Mapping:
- General functional → Mediation estimand
- CDF → Joint distribution of (Y, M, A, X)
- Generic IF → Mediation-specific IF

Key insight: EIF theory applies to any pathwise differentiable functional
```
Pattern 4: Identification Strategy Transfer
Template: Identification approach from one causal setting to another
Example: IV from economics → Mendelian randomization
```
Source: Instrumental variables for demand estimation
  Z → A → Y,   Z ⫫ U

Target: MR for causal effects of exposures
  Gene → Biomarker → Outcome

Mapping:
- Price instrument → Genetic variant
- Demand → Exposure level
- Endogeneity → Confounding

Key insight: The exogenous-variation strategy is general
```
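The shared identification logic can be sketched with a toy Wald-ratio estimator. All numbers (coefficients, the true effect of 1.5) are illustrative:

```r
# Minimal sketch: the IV idea shared by demand estimation and MR
set.seed(3)
n <- 5000
Z <- rbinom(n, 1, 0.5)        # instrument (price shock / genetic variant)
U <- rnorm(n)                 # unmeasured confounder
A <- 0.8 * Z + U + rnorm(n)   # endogenous exposure
Y <- 1.5 * A + U + rnorm(n)   # true causal effect = 1.5

# Naive OLS is confounded by U; the Wald ratio recovers the effect
ols <- coef(lm(Y ~ A))["A"]
iv  <- cov(Z, Y) / cov(Z, A)  # Wald/ratio estimator
c(ols = ols, iv = iv)         # iv close to 1.5; ols biased upward
```

What does *not* transfer automatically is the exclusion restriction: pleiotropy in MR has no clean analogue of the economic argument for why price shocks affect demand only through price.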
Pattern 5: Computational Method Transfer
Template: Algorithm from optimization → statistical estimation
Example: SGD from ML → online causal estimation
```
Source: Stochastic gradient descent for ERM
  θₜ₊₁ = θₜ − ηₜ ∇L(θₜ; Xₜ)

Target: Online updating for streaming causal data
  Sequential estimation as data arrive

Mapping:
- Loss function → Estimating equation
- Gradient → Score contribution
- Learning rate → Weighting scheme

Key insight: Streaming updates are possible for M-estimators
```
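A minimal sketch of the streaming update for the simplest M-estimator, the sample mean (squared-error loss). With a Robbins-Monro step size of 1/t this recursion reproduces the running mean exactly:

```r
# Minimal sketch: SGD-style streaming update for an M-estimator (the mean)
set.seed(4)
theta <- 0
for (t in 1:10000) {
  x_t   <- rnorm(1, mean = 3)              # one streaming observation
  eta_t <- 1 / t                           # Robbins-Monro step size
  theta <- theta - eta_t * (theta - x_t)   # gradient step on (theta - x)^2/2
}
theta                                      # close to the true mean, 3
```

For a transferred causal estimator, the gradient would be replaced by the score contribution of the estimating equation, and the step-size schedule becomes part of the theory that must be re-derived.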
Transfer Verification Checklist
Theoretical Checks
- Identification preserved: Estimand still identified under adapted assumptions
- Consistency maintained: Proof carries over or new proof provided
- Rate preserved: Convergence rate same or characterized
- Variance characterized: Influence function derived if applicable
- Efficiency understood: Know if/when efficient
Practical Checks
- Computable: Can actually implement the adapted method
- Stable: Numerical issues don't prevent use
- Scalable: Works at relevant data sizes
Simulation Checks
- Correct at truth: Estimator unbiased when DGP matches assumptions
- Proper coverage: CIs achieve nominal coverage
- Efficiency comparison: Compared to alternatives
- Robustness: Behavior under assumption violations
Documentation Checks
- Assumptions clear: All requirements stated
- Limitations stated: Known failure modes documented
- Guidance provided: When to use/not use
Common Transfer Pitfalls
Pitfall 1: Hidden Assumption Dependence
Problem: Source method relies on assumption not explicit in exposition
Example: Many ML methods implicitly assume iid data
- Transfer to clustered data fails silently
- Variance underestimated, inference invalid
Prevention:
- Read proofs, not just statements
- Check what each step requires
- Simulate under violations
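The "simulate under violations" advice can be made concrete. The sketch below (all settings illustrative: 50 clusters of 20, cluster-effect SD of 1) shows iid-based confidence intervals undercovering badly when observations share a cluster effect:

```r
# Minimal sketch: iid-based SEs undercover when data are clustered
set.seed(5)
sim_once <- function(n_clusters = 50, m = 20, cluster_sd = 1) {
  cl  <- rep(seq_len(n_clusters), each = m)
  u   <- rnorm(n_clusters, sd = cluster_sd)[cl]  # shared cluster effect
  y   <- 1 + u + rnorm(length(cl))               # true mean = 1
  fit <- lm(y ~ 1)
  se  <- summary(fit)$coefficients[1, 2]         # iid SE, ignores clustering
  abs(coef(fit)[1] - 1) < 1.96 * se              # does the 95% CI cover?
}
coverage <- mean(replicate(500, sim_once()))
coverage                                         # far below the nominal 0.95
```

A method transferred to clustered data silently inherits this failure unless its variance estimator is re-derived (e.g., cluster-robust or block-bootstrap standard errors).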
Pitfall 2: Changed Meaning
Problem: Same symbol/concept means different things
Example: "Independence" in different fields
- Statistical independence: P(A,B) = P(A)P(B)
- Causal independence: No causal pathway
- Conditional independence: Given covariates
Prevention:
- Define all terms explicitly
- Verify mathematical equivalence
- Don't assume same word = same concept
Pitfall 3: Lost Efficiency
Problem: Method transfers but loses optimality properties
Example: MLE transferred to semiparametric setting
- Parametric MLE is efficient
- Plugging into semiparametric problem: no longer efficient
- Need to derive new efficient estimator
Prevention:
- Re-derive efficiency in target setting
- Don't assume optimality transfers
- Compare to efficiency bound
Pitfall 4: Computational Invalidity
Problem: Algorithm doesn't work in new setting
Example: Newton-Raphson for optimization
- Works when Hessian well-behaved
- In ill-conditioned problems: numerical disaster
Prevention:
- Test on representative problems
- Check condition numbers, stability
- Have fallback algorithms
Pitfall 5: False Generalization
Problem: Transfer works for one case, claimed general
Example: Method for binary → continuous
- Test case: continuous Y is approximately binary
- Claim: works for all continuous Y
- Reality: fails for skewed/heavy-tailed
Prevention:
- Test diverse scenarios
- Characterize where it works
- State limitations clearly
Transfer Feasibility Assessment
Quick Assessment Questions
| Question | If No | If Yes |
|---|---|---|
| Same mathematical structure? | Major adaptation needed | Direct transfer possible |
| All assumptions translatable? | Some properties lost | Full transfer possible |
| Same data requirements? | Additional modeling needed | Straightforward application |
| Existing theory applicable? | New proofs required | Theory transfers |
| Similar computational structure? | Algorithm redesign | Code adaptation |
Feasibility Score
For each dimension, score 1-5:
| Dimension | Score | Interpretation |
|---|---|---|
| Structural similarity | __ /5 | 5 = identical structure |
| Assumption compatibility | __ /5 | 5 = all assumptions transfer |
| Theoretical portability | __ /5 | 5 = proofs carry over |
| Computational similarity | __ /5 | 5 = same algorithm works |
| Value added | __ /5 | 5 = major improvement |
Total: __/25
- 20-25: Strong transfer candidate
- 15-19: Feasible with moderate effort
- 10-14: Significant adaptation required
- <10: May need different approach
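The scoring rubric above is mechanical enough to encode directly. This is a hypothetical helper (the function name and argument names are not part of the skill), shown only to make the thresholds unambiguous:

```r
# Minimal sketch: total the five feasibility dimensions and map to a verdict
assess_feasibility <- function(structural, assumptions, theory,
                               computation, value) {
  scores <- c(structural, assumptions, theory, computation, value)
  stopifnot(all(scores >= 1 & scores <= 5))      # each dimension scored 1-5
  total <- sum(scores)
  verdict <- if (total >= 20) "Strong transfer candidate"
    else if (total >= 15) "Feasible with moderate effort"
    else if (total >= 10) "Significant adaptation required"
    else "May need different approach"
  list(total = total, verdict = verdict)
}

assess_feasibility(5, 4, 4, 5, 3)   # total 21: "Strong transfer candidate"
```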
Integration with Other Skills
This skill works with:
- cross-disciplinary-ideation - Find candidate methods to transfer
- literature-gap-finder - Identify where transfer would be valuable
- proof-architect - Verify transferred properties
- identification-theory - Ensure identification in target setting
- asymptotic-theory - Derive properties in target setting
- simulation-architect - Validate the transfer
Key References
On Method Transfer
- Box, G.E.P. (1976). Science and statistics (on borrowing strength)
- Breiman, L. (2001). Statistical modeling: The two cultures
Successful Transfer Examples
- Rosenbaum & Rubin (1983). Central role of propensity score [survey → causal]
- Tibshirani (1996). Regression shrinkage via lasso [signals → regression]
- Robins et al. (1994). Estimation of regression coefficients [missing → causal]
Transfer in Causal Inference
- Pearl, J. (2009). Causality [AI → statistics]
- Hernán & Robins (2020). Causal Inference: What If
Version: 1.0
Created: 2025-12-08
Domain: Method Development, Research Innovation