Claude-skill-registry clinical-research-pitfalls

Avoid common methodological mistakes in clinical research with MIMIC-IV and eICU databases. Covers immortal time bias, information leakage, selection bias, and other critical pitfalls.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/clinical-research-pitfalls" ~/.claude/skills/majiayu000-claude-skill-registry-clinical-research-pitfalls && rm -rf "$T"
manifest: skills/data/clinical-research-pitfalls/SKILL.md

Clinical Research Pitfalls

This skill documents common methodological mistakes in ICU database research and how to avoid them. These errors can invalidate study conclusions.

When to Use This Skill

  • Designing research studies
  • Reviewing analysis plans
  • Debugging unexpected results
  • Peer review of methods

1. Immortal Time Bias

Definition

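A period of follow-up during which the outcome cannot occur by design, typically because a patient has to survive until the exposure is assigned or identified in order to be classified as exposed. Counting this time as exposed makes the treatment look protective.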

Common Mistake

-- WRONG: Patients who "received Drug X during ICU stay"
-- Immortal time: patients must survive long enough to receive the drug
SELECT stay_id
FROM mimiciv_derived.antibiotic
WHERE LOWER(antibiotic) LIKE '%vancomycin%';  -- LIKE is case-sensitive

Correct Approach

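-- CORRECT: Define exposure within a fixed window (e.g., first 24h)
-- Follow-up should then start at the end of that window (landmark),
-- among patients still alive and in the cohort at that time
SELECT DISTINCT ie.stay_id  -- qualified: stay_id exists in both tables
FROM mimiciv_derived.antibiotic ab
INNER JOIN mimiciv_icu.icustays ie ON ab.stay_id = ie.stay_id
WHERE LOWER(ab.antibiotic) LIKE '%vancomycin%'
    AND ab.starttime <= DATETIME_ADD(ie.intime, INTERVAL 24 HOUR);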

Key Principle

  • Define exposure status at a fixed time point (e.g., ICU admission, 24 hours, 48 hours)
  • Time zero should be the same for exposed and unexposed groups
  • Consider landmark analysis or time-varying covariates
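
The landmark idea in the last bullet can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not a full survival analysis: the `Stay` record, its field names, and the 24-hour landmark are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

LANDMARK_HOURS = 24.0  # assumed landmark; set this from the study protocol


@dataclass
class Stay:
    """Hypothetical per-stay record; times are hours since ICU admission."""
    stay_id: int
    drug_start_h: Optional[float]  # None if the drug was never given
    death_h: Optional[float]       # None if the patient survived


def naive_exposed(stay: Stay) -> bool:
    """Biased definition: 'ever treated', no matter how late."""
    return stay.drug_start_h is not None


def landmark_cohort(stays, landmark_h=LANDMARK_HOURS):
    """Keep patients still alive at the landmark; follow-up starts there."""
    return [s for s in stays if s.death_h is None or s.death_h > landmark_h]


def landmark_exposed(stay: Stay, landmark_h=LANDMARK_HOURS) -> bool:
    """Exposure fixed using only information available by the landmark."""
    return stay.drug_start_h is not None and stay.drug_start_h <= landmark_h


stays = [
    Stay(1, drug_start_h=6.0, death_h=None),
    Stay(2, drug_start_h=72.0, death_h=200.0),  # treated late: unexposed at landmark
    Stay(3, drug_start_h=None, death_h=10.0),   # died before landmark: excluded
    Stay(4, drug_start_h=None, death_h=None),
]

cohort = landmark_cohort(stays)
exposure = {s.stay_id: landmark_exposed(s) for s in cohort}
print(exposure)
```

Stay 2 is "exposed" under the naive definition but unexposed at the landmark, which is exactly the time that would otherwise be credited to the drug.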

2. Information Leakage (Future Data)

Definition

Using information that would not be available at the time of prediction/decision.

Common Mistake

-- WRONG: Using diagnosis codes for prediction at admission
-- ICD codes are assigned at discharge!
SELECT hadm_id, icd_code
FROM mimiciv_hosp.diagnoses_icd
WHERE icd_code LIKE 'I21%';  -- MI diagnosis

Correct Approach

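-- CORRECT: Use only information recorded at admission
-- (e.g., admission_type here; chief complaint lives in the MIMIC-IV-ED triage data)
-- Or clearly acknowledge this is retrospective phenotyping
SELECT hadm_id
FROM mimiciv_hosp.admissions
WHERE LOWER(admission_type) LIKE '%emer%';  -- values are abbreviated, e.g. 'EW EMER.'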

Common Sources of Leakage

  • Diagnosis codes: Assigned at discharge
  • Procedure codes: May be coded after completion
  • Length of stay: Only known at discharge
  • Discharge disposition: Future information
  • Labs ordered later: Not available at admission
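
One way to guard against the sources listed above is to store, for every candidate feature, the time at which its value became known, and to filter on that time before modeling. The sketch below assumes a hypothetical `(name, value, known_at)` record layout and a prediction time of 24 hours after admission; neither comes from MIMIC-IV itself.

```python
from datetime import datetime, timedelta


def available_features(records, prediction_time):
    """Keep only features whose value was already known at prediction time."""
    return {
        name: value
        for name, value, known_at in records
        if known_at <= prediction_time
    }


admit = datetime(2150, 1, 1, 8, 0)
prediction_time = admit + timedelta(hours=24)

# Hypothetical feature records: (name, value, time the value became known)
records = [
    ("heart_rate_max", 118, admit + timedelta(hours=3)),
    ("lactate_first", 3.1, admit + timedelta(hours=5)),
    ("icd_code", "I21", admit + timedelta(days=6)),  # coded at discharge
    ("los_days", 6.0, admit + timedelta(days=6)),    # known only at discharge
]

usable = available_features(records, prediction_time)
print(sorted(usable))
```

Here only the two early measurements survive the filter; the diagnosis code and length of stay are dropped because they did not exist yet at prediction time.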

3. Selection Bias

Definition

Systematic differences between study groups due to how subjects were selected.

Common Mistakes

Survivor Bias:

-- WRONG: Selecting patients who have labs on day 7 or later
-- Excludes early deaths and early discharges
SELECT DISTINCT ie.stay_id
FROM mimiciv_icu.icustays ie
INNER JOIN mimiciv_derived.chemistry ch
    ON ch.hadm_id = ie.hadm_id  -- chemistry is keyed by hadm_id, not stay_id
WHERE ch.charttime >= DATETIME_ADD(ie.intime, INTERVAL 7 DAY);

Data Availability Bias:

-- WRONG: Patients with complete data
-- Complete cases may be systematically different
-- Note: in the mimic-code derived table the *_24hours columns are already
-- imputed with 0 when missing, so check the hourly component columns instead
SELECT *
FROM mimiciv_derived.sofa
WHERE respiration IS NOT NULL
    AND coagulation IS NOT NULL
    AND liver IS NOT NULL
    AND cardiovascular IS NOT NULL
    AND cns IS NOT NULL
    AND renal IS NOT NULL;

Correct Approach

  • Report every exclusion step, with counts, in a flow diagram (CONSORT-style; STROBE or RECORD guidance applies to observational studies)
  • Analyze whether excluded patients differ
  • Consider imputation for missing data
  • Use intention-to-treat principles
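
The first two bullets can be made routine by applying exclusions through a helper that records the count at each step and keeps the excluded rows for comparison. The following is a minimal sketch on invented data; the field names (`age`, `los_hours`, `died`) and the two criteria are assumptions for illustration.

```python
def apply_exclusions(cohort, criteria):
    """Apply exclusion criteria in order; record (label, n_remaining, n_removed)."""
    flow = [("Initial cohort", len(cohort), 0)]
    remaining = list(cohort)
    for label, should_exclude in criteria:
        kept = [row for row in remaining if not should_exclude(row)]
        flow.append((label, len(kept), len(remaining) - len(kept)))
        remaining = kept
    return remaining, flow


def mortality(rows):
    """Crude mortality proportion; None for an empty group."""
    return sum(r["died"] for r in rows) / len(rows) if rows else None


# Hypothetical cohort rows
cohort = [
    {"stay_id": 1, "age": 67, "los_hours": 80, "died": 0},
    {"stay_id": 2, "age": 16, "los_hours": 40, "died": 0},
    {"stay_id": 3, "age": 54, "los_hours": 10, "died": 1},
    {"stay_id": 4, "age": 72, "los_hours": 30, "died": 1},
    {"stay_id": 5, "age": 45, "los_hours": 12, "died": 0},
]

criteria = [
    ("Age < 18", lambda r: r["age"] < 18),
    ("ICU stay < 24 h", lambda r: r["los_hours"] < 24),
]

included, flow = apply_exclusions(cohort, criteria)
included_ids = {r["stay_id"] for r in included}
excluded = [r for r in cohort if r["stay_id"] not in included_ids]

for label, n_remaining, n_removed in flow:
    print(f"{label}: {n_remaining} remaining ({n_removed} removed)")
print("Mortality included vs excluded:", mortality(included), mortality(excluded))
```

The `flow` list maps directly onto the boxes of a flow diagram, and the final comparison is a first check on whether the excluded patients differ from those analyzed.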

4. Confounding by Indication

Definition

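The clinical reasons for giving a treatment (the indication) are themselves associated with prognosis, so treated and untreated patients differ in baseline risk and crude comparisons produce spurious treatment effects.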

Example

Sicker patients receive more aggressive treatment, making treatment appear harmful:

-- WRONG: Comparing mortality by vasopressor use
-- Vasopressors given to sicker patients
SELECT
    CASE WHEN v.stay_id IS NOT NULL THEN 'Vasopressor' ELSE 'No Vasopressor' END AS treatment,
    AVG(a.hospital_expire_flag) AS mortality
FROM mimiciv_icu.icustays ie
-- vasoactive_agent has many rows per stay; deduplicate so each stay counts once
LEFT JOIN (SELECT DISTINCT stay_id FROM mimiciv_derived.vasoactive_agent) v
    ON ie.stay_id = v.stay_id
INNER JOIN mimiciv_hosp.admissions a ON ie.hadm_id = a.hadm_id
GROUP BY 1;
-- This will show higher mortality in vasopressor group (confounding!)

Correct Approaches

  • Propensity score matching/weighting
  • Instrumental variables
  • Regression discontinuity
  • Target trial emulation
  • Clearly state observational limitations
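
To see how much a single measured confounder can distort a crude comparison, the sketch below contrasts the crude risk difference with one standardized over severity strata. Note that this is plain stratification and standardization, a simpler stand-in for the propensity-based methods listed above, and the counts are invented so that treatment has no effect within either stratum.

```python
# Hypothetical counts per severity stratum:
# (treated_n, treated_deaths, untreated_n, untreated_deaths)
strata = {
    "low_severity": (20, 2, 80, 8),     # 10% mortality in both arms
    "high_severity": (80, 32, 20, 8),   # 40% mortality in both arms
}


def crude_risk_difference(strata):
    """Risk difference ignoring severity (pools all strata together)."""
    t_n = sum(s[0] for s in strata.values())
    t_d = sum(s[1] for s in strata.values())
    u_n = sum(s[2] for s in strata.values())
    u_d = sum(s[3] for s in strata.values())
    return t_d / t_n - u_d / u_n


def standardized_risk_difference(strata):
    """Stratum-specific risk differences, weighted by stratum size."""
    total = sum(s[0] + s[2] for s in strata.values())
    rd = 0.0
    for t_n, t_d, u_n, u_d in strata.values():
        weight = (t_n + u_n) / total
        rd += weight * (t_d / t_n - u_d / u_n)
    return rd


print("Crude risk difference:", round(crude_risk_difference(strata), 3))
print("Standardized risk difference:", round(standardized_risk_difference(strata), 3))
```

The crude estimate suggests treatment raises mortality by 18 percentage points, while the standardized estimate is zero: the whole apparent effect comes from sicker patients being treated more often. Real analyses have many confounders, some unmeasured, which is why the approaches above and an explicit statement of limitations are still needed.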

5. Multiple Comparisons

Definition

Testing many hypotheses inflates the chance of at least one false positive. With 20 independent tests at α = 0.05, that chance is 1 − 0.95^20 ≈ 64%.

Common Mistake

  • Testing 20 lab values without adjustment
  • Subgroup analyses without pre-specification
  • Feature selection on the full dataset, before a test set is held out

Correct Approach

  • Pre-specify primary outcome
  • Use Bonferroni or FDR correction
  • Hold out test set for final evaluation
  • Register analysis plan prospectively
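
The two corrections named above can be written out directly. This is a small standard-library sketch with invented p-values; in practice an established statistics package would normally be used instead of hand-rolled code.

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Reject H0 for p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


p_values = [0.001, 0.012, 0.025, 0.039, 0.20]  # hypothetical test results

print("FWER for 20 tests:", round(family_wise_error_rate(20), 3))
print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

On these values Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps the first four, which illustrates the trade-off between controlling any false positive and controlling the proportion of false discoveries.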

6. Time-Related Errors

Aggregation Window Mismatch

-- WRONG: Mixing 24h and 48h windows
SELECT
    s.sofa_24hours,     -- 24-hour worst
    lab.creatinine_max  -- first_day_lab uses 24h
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
    ON s.stay_id = lab.stay_id
WHERE s.hr = 48;  -- SOFA at 48h, but lab is day 1!

Temporal Alignment

-- CORRECT: Align time windows
SELECT
    s.sofa_24hours,
    lab.creatinine_max
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
    ON s.stay_id = lab.stay_id
WHERE s.hr = 24;  -- Both cover roughly the first 24 hours of the ICU stay

7. Handling Missing Data

Wrong Approaches

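  • Complete case analysis (biased unless data are missing completely at random; also discards power)
  • Single imputation (treats imputed values as observed, so variance is underestimated)
  • Zero imputation for labs (a value of zero is not clinically meaningful)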

Better Approaches

  • Multiple imputation
  • Maximum likelihood estimation
  • Sensitivity analyses
  • Pattern-mixture models
  • Report missingness rates
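
Reporting missingness is the cheapest item on this list and a natural first step. The sketch below computes per-variable missingness rates and splits complete from incomplete cases so their outcomes can be compared; the rows and column names are invented for illustration.

```python
def missingness_rates(rows, columns):
    """Fraction of rows with a missing (None) value, per column."""
    n = len(rows)
    return {c: sum(1 for r in rows if r.get(c) is None) / n for c in columns}


def split_complete_cases(rows, columns):
    """Return (complete, incomplete) rows with respect to the given columns."""
    complete = [r for r in rows if all(r.get(c) is not None for c in columns)]
    incomplete = [r for r in rows if any(r.get(c) is None for c in columns)]
    return complete, incomplete


# Hypothetical first-day lab rows
rows = [
    {"stay_id": 1, "lactate": 2.1, "creatinine": 0.9, "died": 0},
    {"stay_id": 2, "lactate": None, "creatinine": 1.4, "died": 0},
    {"stay_id": 3, "lactate": 4.8, "creatinine": None, "died": 1},
    {"stay_id": 4, "lactate": None, "creatinine": 1.1, "died": 0},
]
lab_columns = ["lactate", "creatinine"]

rates = missingness_rates(rows, lab_columns)
complete, incomplete = split_complete_cases(rows, lab_columns)

print(rates)
print("Complete cases:", len(complete), "of", len(rows))
```

If outcomes differ between `complete` and `incomplete`, a complete case analysis is likely to be biased, which motivates the imputation and sensitivity analyses listed above.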

8. Outcome Definition

Ambiguous Mortality

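-- Be specific about which mortality outcome is used
SELECT
    a.hadm_id,
    a.hospital_expire_flag,  -- In-hospital death only
    -- vs. death within 30 days of hospital discharge
    -- (dod is a DATE, so compare against a DATE rather than a DATETIME)
    CASE WHEN p.dod IS NOT NULL
         AND p.dod <= DATE_ADD(DATE(a.dischtime), INTERVAL 30 DAY)
         THEN 1 ELSE 0 END AS mortality_30d
FROM mimiciv_hosp.admissions a
INNER JOIN mimiciv_hosp.patients p ON a.subject_id = p.subject_id;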

Time Zero Definition

  • ICU admission? Hospital admission? First abnormal vital?
  • Be explicit and consistent
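
The choice of time zero changes the outcome itself, not just its label. As a minimal sketch, the hypothetical helper `died_within` below computes a fixed-horizon mortality flag from an explicit time zero; it assumes death is recorded as a calendar date and that a missing date means the patient survived, which only holds if follow-up is complete.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def died_within(time_zero: datetime, date_of_death: Optional[date], days: int = 30) -> int:
    """1 if death falls within `days` calendar days after time zero, else 0."""
    if date_of_death is None:
        return 0  # assumes complete follow-up; otherwise this is censoring
    delta = date_of_death - time_zero.date()
    return int(timedelta(0) <= delta <= timedelta(days=days))


# Hypothetical patient: admitted to hospital, transferred to ICU 9 days later
hospital_admit = datetime(2150, 2, 20, 9, 30)
icu_admit = datetime(2150, 3, 1, 14, 0)
dod = date(2150, 3, 25)

print("30-day mortality from ICU admission:", died_within(icu_admit, dod))
print("30-day mortality from hospital admission:", died_within(hospital_admit, dod))
```

The same patient counts as a 30-day death when time zero is ICU admission (24 days) and as a survivor when it is hospital admission (33 days), so the definition has to be stated and applied identically to every group.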

Checklist for Study Design

  • Time zero clearly defined
  • Exposure determined at fixed time point
  • No future information used as predictors
  • Selection criteria reported with flow diagram
  • Missing data handling specified
  • Confounders identified and addressed
  • Primary outcome pre-specified
  • Multiple comparison correction planned
  • Sensitivity analyses planned
  • External validation considered

References

  • Suissa S. "Immortal time bias in observational studies of drug effects." Pharmacoepidemiology and Drug Safety. 2007.
  • Hernán MA, Robins JM. "Causal Inference: What If." Chapman & Hall/CRC. 2020.
  • Johnson AEW et al. "Machine Learning and Decision Support in Critical Care." Proceedings of the IEEE. 2016.