Awesome-Agent-Skills-for-Empirical-Research method-transfer-engine
Six-phase protocol for adapting methods across research domains
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/26-Data-Wise-scholar/skills/research/method-transfer-engine" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-method-transfer-e && rm -rf "$T"
skills/26-Data-Wise-scholar/skills/research/method-transfer-engine/SKILL.md

Method Transfer Engine
Rigorous framework for adapting statistical methods across domains and settings
Use this skill when: adapting a method from one field to another, extending a method to a new setting, formalizing an intuitive connection between methods, or verifying that a transferred method retains its properties.
The Transfer Framework
What is Method Transfer?
Taking a technique that works in Setting A and adapting it to work in Setting B, while:
- Preserving desirable theoretical properties
- Identifying what changes are needed
- Understanding what can and cannot transfer
Transfer Quality Spectrum
```
Direct Application → Minor Adaptation → Major Modification → Inspired-By
        │                  │                   │                  │
   Same theory         Adjust for         Rewrite theory      New method,
     applies           new setting       for new setting    similar spirit
```
Transfer Success Criteria
A successful transfer must:
- Solve the target problem - Method actually helps in new setting
- Preserve key properties - Consistency, efficiency, robustness transfer
- Have clear assumptions - Know what's required in new setting
- Be verifiable - Can prove/simulate that it works
- Add value - Better than existing approaches
The 6-Phase Protocol
This protocol provides a systematic approach to method transfer, covering all critical steps from source extraction through validation.
Source Extraction
Goal: Extract the core mathematical and algorithmic essence of the source method
```r
# Template for source method extraction
extract_source_method <- function(method_name, reference) {
  list(
    name        = method_name,
    reference   = reference,
    estimand    = "formal expression of what is estimated",
    estimator   = "formula for the estimator",
    assumptions = c("A1: condition", "A2: condition"),
    properties  = c("consistency", "asymptotic normality"),
    algorithm   = c("Step 1: ...", "Step 2: ..."),
    complexity  = "O(n^2) or similar"
  )
}

# Example: Extract Lasso from signal processing
lasso_extraction <- list(
  name        = "Lasso/Basis Pursuit",
  field       = "Signal Processing / Compressed Sensing",
  estimand    = "argmin ||y - Xb||_2^2 + lambda * ||b||_1",
  key_insight = "L1 penalty induces sparsity via soft thresholding",
  assumptions = c("RIP condition", "Incoherence"),
  properties  = c("Sparse solution", "Variable selection consistency")
)
```
Abstraction
Goal: Identify the abstract mathematical structure that enables the method
```r
# Abstract structure identification
identify_abstraction <- function(source_method) {
  list(
    mathematical_structure = "e.g., M-estimation, U-statistics, kernels",
    core_operation         = "e.g., reweighting, regularization, projection",
    information_used       = "e.g., first moments, covariance, distributional",
    key_invariance         = "what property makes it work",
    generalization_path    = "how to extend beyond original setting"
  )
}

# Example: Abstraction of propensity score methods
propensity_abstraction <- list(
  mathematical_structure = "Reweighting to balance distributions",
  core_operation         = "Inverse probability weighting",
  invariance             = "Balances covariate distribution across groups",
  generalization         = "Any selection mechanism with known probabilities"
)
```
Phase 1: Source Method Analysis
Goal: Deeply understand what you're transferring
```markdown
## Source Method Profile

### Basic Information
- Name: [Method name]
- Source field: [Domain/area]
- Key reference: [Citation]
- What it does: [One sentence]

### Problem Solved
- Input: [What data/information goes in]
- Output: [What estimate/inference comes out]
- Setting: [When it applies]

### Mathematical Structure
- Estimand: [What it estimates, formally]
- Estimator: [How it estimates, formula]
- Loss/objective: [What it optimizes]

### Assumptions Required
1. [Assumption 1]: [Mathematical statement]
   - Why needed: [Role in proof/method]
   - When violated: [Failure mode]
2. [Assumption 2]: ...

### Theoretical Properties
- Consistency: [When/how proved]
- Rate: [Convergence rate]
- Asymptotic distribution: [If known]
- Efficiency: [Relative to what]
- Robustness: [To what violations]

### Computational Aspects
- Algorithm: [How implemented]
- Complexity: [Time/space]
- Software: [Available implementations]
```
Phase 2: Target Problem Analysis
Goal: Understand where you want to apply it
```markdown
## Target Problem Profile

### Basic Information
- Problem name: [Description]
- Target field: [Domain/area]
- Motivation: [Why solve this]

### Problem Structure
- Data available: [What's observed]
- Estimand: [What you want to estimate]
- Challenges: [Why existing methods are inadequate]

### Current Approaches
- Method 1: [Name, limitations]
- Method 2: [Name, limitations]
- Gap: [What's missing]

### Constraints
- Assumptions willing to make: [List]
- Assumptions NOT willing to make: [List]
- Computational constraints: [If any]
```
Target Mapping
Goal: Map source concepts to their target domain counterparts
```r
# Target mapping framework
create_target_mapping <- function(source, target) {
  list(
    objects = data.frame(
      source       = c("treatment", "outcome", "confounder"),
      target       = c("mediator", "effect", "moderator"),
      relationship = c("direct", "indirect", "modifies")
    ),
    assumptions = data.frame(
      source_assumption = c("SUTVA", "Ignorability"),
      target_version    = c("Consistency", "Sequential ignorability"),
      status            = c("transfers", "needs modification")
    )
  )
}

# Example: IV to Mendelian randomization mapping
iv_to_mr <- list(
  price_instrument = "genetic_variant",
  demand           = "biomarker_exposure",
  endogeneity      = "unmeasured_confounding",
  exclusion        = "pleiotropic_effects",
  key_difference   = "biological vs economic mechanisms"
)
```
Phase 3: Structure Mapping
Goal: Identify correspondences between source and target
```markdown
## Structure Map

### Object Correspondence
| Source | Target | Notes |
|--------|--------|-------|
| [Source object 1] | [Target object 1] | [How they relate] |
| [Source object 2] | [Target object 2] | [How they relate] |
| ... | ... | ... |

### Assumption Correspondence
| Source Assumption | Target Version | Status |
|-------------------|----------------|--------|
| [Source A1] | [Target A1'] | ✓ Transfers / ✗ Fails / ? Modify |
| [Source A2] | [Target A2'] | ... |
| ... | ... | ... |

### What Transfers Directly
- [Property 1]: Because [reason]
- [Property 2]: Because [reason]

### What Needs Modification
- [Element 1]: From [source version] to [target version]
  - Why: [Reason for change]
  - How: [Specific modification]

### What Doesn't Transfer
- [Element 1]: Because [reason]
  - Impact: [What we lose]
  - Alternative: [How to address]
```
Gap Analysis
Goal: Identify what doesn't transfer and what modifications are needed
```r
# Gap analysis framework
analyze_transfer_gaps <- function(source, target, mapping) {
  list(
    assumption_gaps = list(
      violated     = c("iid assumption in clustered data"),
      modified     = c("independence -> conditional independence"),
      new_required = c("mediator positivity")
    ),
    property_gaps = list(
      lost      = c("efficiency under misspecification"),
      weakened  = c("convergence rate n^{-1/2} -> n^{-1/4}"),
      preserved = c("consistency", "asymptotic normality")
    ),
    computational_gaps = list(
      new_challenges = c("non-convex optimization"),
      workarounds    = c("ADMM algorithm", "approximate methods")
    ),
    bridging_strategies = c(
      "Add regularization for new setting",
      "Derive modified variance estimator",
      "Implement robustness check"
    )
  )
}
```
Phase 4: Adaptation Design
Goal: Design the transferred method
```markdown
## Adapted Method Design

### Overview
[One paragraph describing the adapted method]

### Formal Definition
**Estimand**:
$$\psi = [target estimand formula]$$

**Estimator**:
$$\hat{\psi}_n = [adapted estimator formula]$$

**Algorithm**:
1. [Step 1]
2. [Step 2]
3. ...

### Modified Assumptions
1. [Assumption A1']: [New statement for target setting]
   - Analogous to: [Source assumption]
   - Modified because: [Reason]

### Expected Properties
- Consistency: [Conjecture/claim]
- Rate: [Expected]
- Efficiency: [Expected]

### Key Differences from Source
1. [Difference 1]: [Explanation]
2. [Difference 2]: [Explanation]
```
Validation
Goal: Systematically verify the transferred method works correctly
```r
# Comprehensive validation framework for method transfer.
# The run_*/compare_*/test_* helpers are placeholders to be implemented
# for the specific method under study.
validate_transfer <- function(adapted_method, n_sims = 1000) {
  results <- list()

  # 1. Bias check: is the estimator unbiased at the truth?
  results$bias <- run_bias_simulation(adapted_method, n_sims)

  # 2. Coverage check: do CIs achieve nominal coverage?
  results$coverage <- run_coverage_simulation(adapted_method, n_sims)

  # 3. Efficiency check: compare to alternatives
  results$efficiency <- compare_to_alternatives(adapted_method)

  # 4. Robustness check: behavior under assumption violations
  results$robustness <- test_assumption_violations(adapted_method)

  # 5. Edge cases: extreme scenarios
  results$edge_cases <- test_edge_cases(adapted_method)

  # Validation report
  list(
    passed          = all(sapply(results, function(x) x$passed)),
    details         = results,
    recommendations = generate_recommendations(results)
  )
}

# Simulation template for validation (generate_dgp, adapted_method, and
# true_value are method-specific placeholders)
run_transfer_validation <- function(n = 500, n_sims = 1000) {
  estimates <- replicate(n_sims, {
    data <- generate_dgp(n)        # generate data under the true model
    est  <- adapted_method(data)   # apply the transferred method
    c(estimate = est$point, se = est$se)
  })
  list(
    bias     = mean(estimates["estimate", ]) - true_value,
    rmse     = sqrt(mean((estimates["estimate", ] - true_value)^2)),
    coverage = mean(abs(estimates["estimate", ] - true_value) <
                      1.96 * estimates["se", ])
  )
}
```
Phase 5: Verification
Goal: Prove/demonstrate the transfer works
```markdown
## Verification Plan

### Theoretical Verification
- [ ] Consistency proof
  - Approach: [Proof strategy]
  - Key lemma: [What needs to be shown]
- [ ] Asymptotic normality
  - Approach: [Proof strategy]
  - Influence function: [If applicable]
- [ ] Efficiency (if claiming)
  - Approach: [Efficiency bound derivation]

### Simulation Verification
- [ ] Scenario 1: [Description]
  - DGP: [Data generating process]
  - Expected result: [What should happen]
- [ ] Scenario 2: Comparison to oracle
  - Purpose: [Verify optimality]
- [ ] Scenario 3: Stress test
  - Purpose: [Find failure modes]

### Empirical Verification
- [ ] Benchmark dataset: [If available]
- [ ] Real application: [Domain]
```
Phase 6: Documentation
Goal: Document for publication
```markdown
## Transfer Documentation

### Contribution Statement
"We adapt [source method] from [source field] to [target setting] by
[key modification]. Our adapted method [key property]. Unlike
[alternative], our approach [advantage]."

### Theoretical Contribution
- New result 1: [Theorem statement]
- New result 2: [If applicable]

### Methodological Contribution
- Adaptation insight: [What's novel about the transfer]
- Practical guidance: [When to use]

### What We Learned
- About source method: [New understanding]
- About target problem: [New understanding]
- General principle: [Broader insight]
```
Common Transfer Patterns
Pattern 1: Estimator Family Transfer
Template: Estimator type from one setting to another
Example: IPW from survey sampling → causal inference
```
Source: Horvitz-Thompson estimator (survey sampling)
  E[Y] ≈ (1/N) Σᵢ Yᵢ/πᵢ        where πᵢ = P(selected)

Target: IPW for the ATE
  E[Y(1)] ≈ (1/n) Σᵢ Yᵢ·Aᵢ/e(Xᵢ)   where e(x) = P(A=1|X=x)

Mapping:
- Selection indicator → Treatment indicator
- Selection probability → Propensity score
- Survey weights → Inverse propensity weights

Key insight: Both correct for selection bias via reweighting
```
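This mapping can be sketched in a toy simulation. Everything here (the data-generating process, coefficients, and the true ATE of 2) is illustrative, not taken from any particular source method:

```r
# Minimal sketch: IPW for the ATE, mirroring the Horvitz-Thompson structure
set.seed(1)
n <- 5000
X <- rnorm(n)
e <- plogis(0.5 * X)          # true propensity score P(A = 1 | X)
A <- rbinom(n, 1, e)          # treatment assignment
Y <- 2 * A + X + rnorm(n)     # outcome; true ATE = 2

# IPW estimate of E[Y(1)] - E[Y(0)], using the known propensity
ate_ipw <- mean(A * Y / e) - mean((1 - A) * Y / (1 - e))
ate_ipw                       # close to 2 in large samples
```

In practice the propensity `e` would be estimated, which is exactly the point where the survey-sampling analogy (known inclusion probabilities) breaks down and new theory is needed.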
Pattern 2: Robustness Property Transfer
Template: Robustness technique from one method to another
Example: Double robustness from missing data → causal inference
```
Source: Augmented IPW for missing data
  DR = IPW + Imputation − (IPW × Imputation)

Target: AIPW for causal effects
  Same structure, but for counterfactual outcomes

Mapping:
- Missing indicator → Treatment indicator
- Missingness model → Propensity model
- Imputation model → Outcome model

Key insight: Product-form bias enables robustness to one misspecification
```
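A minimal sketch of the AIPW estimator under the same illustrative data-generating process as above (both working models happen to be correctly specified here, so this only demonstrates the structure, not the double-robustness guarantee):

```r
# Minimal sketch: AIPW combines an outcome model with an IPW correction
set.seed(2)
n <- 5000
X <- rnorm(n)
e <- plogis(0.5 * X)
A <- rbinom(n, 1, e)
Y <- 2 * A + X + rnorm(n)     # true ATE = 2

# Working models (propensity and outcome)
ps_fit  <- glm(A ~ X, family = binomial)
out_fit <- lm(Y ~ A + X)
ehat <- fitted(ps_fit)
mu1  <- predict(out_fit, newdata = data.frame(A = 1, X = X))
mu0  <- predict(out_fit, newdata = data.frame(A = 0, X = X))

# AIPW: outcome-model prediction plus inverse-weighted residual correction
psi1 <- mean(mu1 + A * (Y - mu1) / ehat)
psi0 <- mean(mu0 + (1 - A) * (Y - mu0) / (1 - ehat))
psi1 - psi0                   # close to 2
```

The "product-form bias" claim can be checked by deliberately misspecifying one of the two working models and confirming the estimate stays near the truth.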
Pattern 3: Asymptotic Result Transfer
Template: Asymptotic theory from simpler to complex setting
Example: Influence function theory → semiparametric mediation
```
Source: IF for a smooth functional of the CDF
  √n (T(Fₙ) − T(F)) → N(0, E[φ²])

Target: IF for the mediation effect functional
  Requires a mediation-specific tangent space

Mapping:
- General functional → Mediation estimand
- CDF → Joint distribution of (Y, M, A, X)
- Generic IF → Mediation-specific IF

Key insight: EIF theory applies to any pathwise differentiable functional
```
Pattern 4: Identification Strategy Transfer
Template: Identification approach from one causal setting to another
Example: IV from economics → Mendelian randomization
```
Source: Instrumental variables for demand estimation
  Z → A → Y,   Z ⫫ U

Target: MR for causal effects of exposures
  Gene → Biomarker → Outcome

Mapping:
- Price instrument → Genetic variant
- Demand → Exposure level
- Endogeneity → Confounding

Key insight: The exogenous-variation strategy is general
```
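The shared identification logic can be sketched with a toy Wald-ratio estimator. All numbers (coefficients, the true effect of 1.5) are illustrative:

```r
# Minimal sketch: the IV idea shared by demand estimation and MR
set.seed(3)
n <- 5000
Z <- rbinom(n, 1, 0.5)        # instrument (price shock / genetic variant)
U <- rnorm(n)                 # unmeasured confounder
A <- 0.8 * Z + U + rnorm(n)   # endogenous exposure
Y <- 1.5 * A + U + rnorm(n)   # true causal effect = 1.5

# Naive OLS is confounded by U; the Wald ratio recovers the effect
ols <- coef(lm(Y ~ A))["A"]
iv  <- cov(Z, Y) / cov(Z, A)  # Wald/ratio estimator
c(ols = ols, iv = iv)         # iv close to 1.5; ols biased upward
```

What does *not* transfer automatically is the exclusion restriction: pleiotropy in MR has no clean analogue of the economic argument for why price shocks affect demand only through price.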
Pattern 5: Computational Method Transfer
Template: Algorithm from optimization → statistical estimation
Example: SGD from ML → online causal estimation
```
Source: Stochastic gradient descent for ERM
  θₜ₊₁ = θₜ − ηₜ ∇L(θₜ; Xₜ)

Target: Online updating for streaming causal data
  Sequential estimation as data arrive

Mapping:
- Loss function → Estimating equation
- Gradient → Score contribution
- Learning rate → Weighting scheme

Key insight: Streaming updates are possible for M-estimators
```
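A minimal sketch of the streaming update for the simplest M-estimator, the sample mean (squared-error loss). With a Robbins-Monro step size of 1/t this recursion reproduces the running mean exactly:

```r
# Minimal sketch: SGD-style streaming update for an M-estimator (the mean)
set.seed(4)
theta <- 0
for (t in 1:10000) {
  x_t   <- rnorm(1, mean = 3)              # one streaming observation
  eta_t <- 1 / t                           # Robbins-Monro step size
  theta <- theta - eta_t * (theta - x_t)   # gradient step on (theta - x)^2/2
}
theta                                      # close to the true mean, 3
```

For a transferred causal estimator, the gradient would be replaced by the score contribution of the estimating equation, and the step-size schedule becomes part of the theory that must be re-derived.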
Transfer Verification Checklist
Theoretical Checks
- Identification preserved: Estimand still identified under adapted assumptions
- Consistency maintained: Proof carries over or new proof provided
- Rate preserved: Convergence rate same or characterized
- Variance characterized: Influence function derived if applicable
- Efficiency understood: Know if/when efficient
Practical Checks
- Computable: Can actually implement the adapted method
- Stable: Numerical issues don't prevent use
- Scalable: Works at relevant data sizes
Simulation Checks
- Correct at truth: Estimator unbiased when DGP matches assumptions
- Proper coverage: CIs achieve nominal coverage
- Efficiency comparison: Compared to alternatives
- Robustness: Behavior under assumption violations
Documentation Checks
- Assumptions clear: All requirements stated
- Limitations stated: Known failure modes documented
- Guidance provided: When to use/not use
Common Transfer Pitfalls
Pitfall 1: Hidden Assumption Dependence
Problem: Source method relies on assumption not explicit in exposition
Example: Many ML methods implicitly assume iid data
- Transfer to clustered data fails silently
- Variance underestimated, inference invalid
Prevention:
- Read proofs, not just statements
- Check what each step requires
- Simulate under violations
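The "simulate under violations" advice can be made concrete. The sketch below (all settings illustrative: 50 clusters of 20, cluster-effect SD of 1) shows iid-based confidence intervals undercovering badly when observations share a cluster effect:

```r
# Minimal sketch: iid-based SEs undercover when data are clustered
set.seed(5)
sim_once <- function(n_clusters = 50, m = 20, cluster_sd = 1) {
  cl  <- rep(seq_len(n_clusters), each = m)
  u   <- rnorm(n_clusters, sd = cluster_sd)[cl]  # shared cluster effect
  y   <- 1 + u + rnorm(length(cl))               # true mean = 1
  fit <- lm(y ~ 1)
  se  <- summary(fit)$coefficients[1, 2]         # iid SE, ignores clustering
  abs(coef(fit)[1] - 1) < 1.96 * se              # does the 95% CI cover?
}
coverage <- mean(replicate(500, sim_once()))
coverage                                         # far below the nominal 0.95
```

A method transferred to clustered data silently inherits this failure unless its variance estimator is re-derived (e.g., cluster-robust or block-bootstrap standard errors).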
Pitfall 2: Changed Meaning
Problem: Same symbol/concept means different things
Example: "Independence" in different fields
- Statistical independence: P(A,B) = P(A)P(B)
- Causal independence: No causal pathway
- Conditional independence: Given covariates
Prevention:
- Define all terms explicitly
- Verify mathematical equivalence
- Don't assume same word = same concept
Pitfall 3: Lost Efficiency
Problem: Method transfers but loses optimality properties
Example: MLE transferred to semiparametric setting
- Parametric MLE is efficient
- Plugging into semiparametric problem: no longer efficient
- Need to derive new efficient estimator
Prevention:
- Re-derive efficiency in target setting
- Don't assume optimality transfers
- Compare to efficiency bound
Pitfall 4: Computational Invalidity
Problem: Algorithm doesn't work in new setting
Example: Newton-Raphson for optimization
- Works when Hessian well-behaved
- In ill-conditioned problems: numerical disaster
Prevention:
- Test on representative problems
- Check condition numbers, stability
- Have fallback algorithms
Pitfall 5: False Generalization
Problem: Transfer works for one case, claimed general
Example: Method for binary → continuous
- Test case: continuous Y is approximately binary
- Claim: works for all continuous Y
- Reality: fails for skewed/heavy-tailed
Prevention:
- Test diverse scenarios
- Characterize where it works
- State limitations clearly
Transfer Feasibility Assessment
Quick Assessment Questions
| Question | If No | If Yes |
|---|---|---|
| Same mathematical structure? | Major adaptation needed | Direct transfer possible |
| All assumptions translatable? | Some properties lost | Full transfer possible |
| Same data requirements? | Additional modeling needed | Straightforward application |
| Existing theory applicable? | New proofs required | Theory transfers |
| Similar computational structure? | Algorithm redesign | Code adaptation |
Feasibility Score
For each dimension, score 1-5:
| Dimension | Score | Interpretation |
|---|---|---|
| Structural similarity | __ /5 | 5 = identical structure |
| Assumption compatibility | __ /5 | 5 = all assumptions transfer |
| Theoretical portability | __ /5 | 5 = proofs carry over |
| Computational similarity | __ /5 | 5 = same algorithm works |
| Value added | __ /5 | 5 = major improvement |
Total: __/25
- 20-25: Strong transfer candidate
- 15-19: Feasible with moderate effort
- 10-14: Significant adaptation required
- <10: May need different approach
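The scoring rubric above is mechanical enough to encode directly. This is a hypothetical helper (the function name and argument names are not part of the skill), shown only to make the thresholds unambiguous:

```r
# Minimal sketch: total the five feasibility dimensions and map to a verdict
assess_feasibility <- function(structural, assumptions, theory,
                               computation, value) {
  scores <- c(structural, assumptions, theory, computation, value)
  stopifnot(all(scores >= 1 & scores <= 5))      # each dimension scored 1-5
  total <- sum(scores)
  verdict <- if (total >= 20) "Strong transfer candidate"
    else if (total >= 15) "Feasible with moderate effort"
    else if (total >= 10) "Significant adaptation required"
    else "May need different approach"
  list(total = total, verdict = verdict)
}

assess_feasibility(5, 4, 4, 5, 3)   # total 21: "Strong transfer candidate"
```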
Integration with Other Skills
This skill works with:
- cross-disciplinary-ideation - Find candidate methods to transfer
- literature-gap-finder - Identify where transfer would be valuable
- proof-architect - Verify transferred properties
- identification-theory - Ensure identification in target setting
- asymptotic-theory - Derive properties in target setting
- simulation-architect - Validate the transfer
Key References
On Method Transfer
- Box, G.E.P. (1976). Science and statistics (on borrowing strength)
- Breiman, L. (2001). Statistical modeling: The two cultures
Successful Transfer Examples
- Rosenbaum & Rubin (1983). Central role of propensity score [survey → causal]
- Tibshirani (1996). Regression shrinkage via lasso [signals → regression]
- Robins et al. (1994). Estimation of regression coefficients [missing → causal]
Transfer in Causal Inference
- Pearl, J. (2009). Causality [AI → statistics]
- Hernán & Robins (2020). Causal Inference: What If
Version: 1.0
Created: 2025-12-08
Domain: Method Development, Research Innovation