Awesome-Agent-Skills-for-Empirical-Research epidemiology-guide

Epidemiological study designs, measures of association, and public health ana...

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/biomedical/epidemiology-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-epidemiology-guid && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/domains/biomedical/epidemiology-guide/SKILL.md

Epidemiology Guide

A skill for designing and analyzing epidemiological studies. Covers study design selection, measures of disease frequency and association, bias assessment, and public health data analysis methods.

Study Design Selection

Design Hierarchy

                    Evidence Strength
                         |
    Systematic Review / Meta-Analysis   (Highest)
                         |
         Randomized Controlled Trial
                         |
              Cohort Study (Prospective)
                         |
            Case-Control Study
                         |
         Cross-Sectional Study
                         |
         Case Report / Case Series       (Lowest)

When to Use Each Design

| Design | Research question | Typical duration | Cost | Bias risk |
|---|---|---|---|---|
| RCT | Does intervention X prevent outcome Y? | Years | Very high | Lowest |
| Prospective Cohort | Does exposure X increase risk of Y? | Years | High | Moderate |
| Retrospective Cohort | Historical exposure-outcome relationship? | Months | Moderate | Moderate-High |
| Case-Control | What exposures are associated with rare disease? | Months | Low | High |
| Cross-Sectional | What is the prevalence of X? | Weeks | Low | High |
| Ecological | Do population-level factors correlate with disease? | Weeks | Very low | Very high |

Measures of Disease Frequency

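import numpy as np
from typing import Optional

def compute_measures(cases: int, population: int,
                     person_time: Optional[float] = None,
                     period_years: float = 1.0) -> dict:
    """
    Compute basic measures of disease frequency.

    Every measure uses the same `cases` count, so read the result that
    matches your input: existing cases -> prevalence; new cases during
    the period -> cumulative incidence and incidence rate.

    Args:
        cases: Number of new cases (for incidence) or existing cases (for prevalence)
        population: Total population (for prevalence) or population at risk
            at the start of the period (for cumulative incidence)
        person_time: Person-years at risk during follow-up (for incidence rate)
        period_years: Length of the risk period in years (for cumulative incidence)
    """
    if population <= 0:
        raise ValueError("population must be positive")

    measures = {}

    # Point prevalence: existing cases / total population at one time point
    measures['prevalence'] = {
        'value': cases / population,
        'per_1000': (cases / population) * 1000,
        'formula': 'existing cases / population at a point in time'
    }

    # Cumulative incidence (risk) over period_years
    measures['cumulative_incidence'] = {
        'value': cases / population,
        'per_1000': (cases / population) * 1000,
        'period_years': period_years,
        'formula': 'new cases / population at risk during time period'
    }

    # Incidence rate, computed only when positive person-time is supplied
    if person_time is not None and person_time > 0:
        measures['incidence_rate'] = {
            'value': cases / person_time,
            'per_1000_py': (cases / person_time) * 1000,
            'formula': 'new cases / person-time at risk'
        }

    return measures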
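The incidence-rate branch needs total person-time at risk. A minimal sketch of tallying it from individual follow-up records, assuming each record already holds the years a participant was at risk and whether the outcome occurred; the `FollowUp` record layout and the `incidence_rate` helper are hypothetical, not part of this skill.

```python
from dataclasses import dataclass

@dataclass
class FollowUp:
    """One participant's follow-up (hypothetical record layout)."""
    years_at_risk: float  # entry until first event, censoring, or end of study
    had_event: bool       # True if the outcome occurred during follow-up

def incidence_rate(records: list[FollowUp], per: float = 1000.0) -> dict:
    """Incidence rate = new cases / total person-years at risk."""
    person_years = sum(r.years_at_risk for r in records)
    if person_years <= 0:
        raise ValueError("total person-time must be positive")
    new_cases = sum(1 for r in records if r.had_event)
    return {
        'new_cases': new_cases,
        'person_years': person_years,
        'rate': new_cases / person_years * per,  # cases per `per` person-years
        'per_person_years': per,
    }

# Hypothetical cohort: participants stop contributing time at their event
cohort = [
    FollowUp(5.0, False),
    FollowUp(2.5, True),
    FollowUp(4.0, False),
    FollowUp(1.5, True),
]
print(incidence_rate(cohort))
```

Here 2 cases over 13 person-years give roughly 154 cases per 1,000 person-years. Because people leave the risk set at different times, the denominator is person-time rather than a headcount.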

Measures of Association

Risk Ratio, Odds Ratio, and Attributable Risk

def measures_of_association(a: int, b: int, c: int, d: int) -> dict:
    """
    Compute epidemiological measures of association from a 2x2 table.

                    Disease+    Disease-
    Exposed+          a           b        a+b
    Exposed-          c           d        c+d
                     a+c         b+d        N

    Args:
        a: Exposed with disease
        b: Exposed without disease
        c: Unexposed with disease
        d: Unexposed without disease

    Risks, the risk ratio, and the PAF assume cohort or cross-sectional
    sampling; with case-control data only the odds ratio is directly estimable.
    """
    if min(a, b, c, d) == 0:
        # Log-scale CIs are undefined with an empty cell; apply a continuity
        # correction (e.g., add 0.5 to every cell) before calling this function.
        raise ValueError("all cells must be > 0")

    z = 1.96  # standard normal quantile for a 95% CI
    n_exposed, n_unexposed = a + b, c + d

    # Risk in exposed and unexposed
    risk_exposed = a / n_exposed
    risk_unexposed = c / n_unexposed

    # Risk Ratio (Relative Risk), CI on the log scale
    rr = risk_exposed / risk_unexposed
    ln_rr = np.log(rr)
    se_ln_rr = np.sqrt(1/a - 1/n_exposed + 1/c - 1/n_unexposed)
    rr_ci = (np.exp(ln_rr - z*se_ln_rr), np.exp(ln_rr + z*se_ln_rr))

    # Odds Ratio, Woolf (log-scale) CI
    or_val = (a * d) / (b * c)
    ln_or = np.log(or_val)
    se_ln_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
    or_ci = (np.exp(ln_or - z*se_ln_or), np.exp(ln_or + z*se_ln_or))

    # Attributable Risk (Risk Difference), Wald CI
    ar = risk_exposed - risk_unexposed
    se_ar = np.sqrt(risk_exposed*(1-risk_exposed)/n_exposed +
                    risk_unexposed*(1-risk_unexposed)/n_unexposed)
    ar_ci = (ar - z*se_ar, ar + z*se_ar)

    # Attributable Fraction in the Exposed
    af_exposed = (rr - 1) / rr

    # Population Attributable Fraction (Levin's formula)
    prevalence_exposure = n_exposed / (a + b + c + d)
    paf = prevalence_exposure * (rr - 1) / (prevalence_exposure * (rr - 1) + 1)

    def _r(x, digits):
        # Cast NumPy scalars to float so results print as plain numbers
        return round(float(x), digits)

    return {
        'risk_ratio': {'value': _r(rr, 3), 'ci_95': tuple(_r(x, 3) for x in rr_ci)},
        'odds_ratio': {'value': _r(or_val, 3), 'ci_95': tuple(_r(x, 3) for x in or_ci)},
        'risk_difference': {'value': _r(ar, 4), 'ci_95': tuple(_r(x, 4) for x in ar_ci)},
        'attributable_fraction_exposed': _r(af_exposed, 3),
        'population_attributable_fraction': _r(paf, 3),
        # Only defined for a harmful exposure (positive risk difference)
        'number_needed_to_harm': _r(1/ar, 1) if ar > 0 else None
    }

# Example with hypothetical counts: smoking and lung cancer
result = measures_of_association(a=80, b=920, c=10, d=990)
print(f"RR: {result['risk_ratio']['value']} ({result['risk_ratio']['ci_95']})")
print(f"OR: {result['odds_ratio']['value']} ({result['odds_ratio']['ci_95']})")
print(f"PAF: {result['population_attributable_fraction']}")
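The function reports both the RR and the OR from the same table. The two diverge as the outcome becomes common, which is why the OR stands in for the RR mainly when the disease is rare (the setting the design table associates with case-control studies). A small sketch, assuming hypothetical cohorts of 1,000 exposed and 1,000 unexposed people with a true RR of 2 at several baseline risks:

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in exposed divided by risk in unexposed (2x2 layout above)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)

# Hypothetical cohorts: 1,000 exposed and 1,000 unexposed, true RR fixed at 2
for baseline_risk in (0.01, 0.05, 0.20):
    c = round(1000 * baseline_risk)  # unexposed cases
    a = 2 * c                        # exposed cases (twice the risk)
    b, d = 1000 - a, 1000 - c        # non-cases in each group
    print(f"baseline risk {baseline_risk:.2f}: "
          f"RR = {risk_ratio(a, b, c, d):.2f}, OR = {odds_ratio(a, b, c, d):.2f}")
```

With these counts the RR stays at 2.00 while the OR rises from about 2.02 to 2.11 to 2.67, so reporting an OR as if it were an RR overstates the association once the outcome is no longer rare.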

Bias Assessment

Types of Bias and Mitigation

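| Bias type | Description | Mitigation strategy |
|---|---|---|
| Selection bias | Who enters or stays in the study depends on both exposure and outcome | Random or population-based sampling, minimizing loss to follow-up |
| Information bias | Systematic error (misclassification) in measuring exposure or outcome | Validated instruments, blinding |
| Recall bias | Cases recall past exposures differently from non-cases | Use records, not self-report |
| Confounding | A third variable is associated with the exposure and independently affects the outcome | Stratification, regression, matching |
| Lead-time bias | Earlier detection (e.g., by screening) lengthens apparent survival without delaying death | Compare mortality, not survival from diagnosis |
| Healthy worker effect | Employed people are healthier than the general population, so occupational cohorts look protected | Use an employed comparison group |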
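The lead-time row is easiest to see with numbers. A toy sketch, assuming screening moves every diagnosis 3 years earlier but changes no one's age at death; all ages and the helper names are hypothetical:

```python
# Hypothetical patients: age at death is identical with or without screening
ages_at_death = [70, 72, 75, 68]
clinical_dx_ages = [67, 69, 71, 65]   # diagnosis from symptoms
lead_time_years = 3                   # assumed screening lead time
screened_dx_ages = [age - lead_time_years for age in clinical_dx_ages]

def mean_survival(dx_ages, death_ages):
    """Mean years survived after diagnosis."""
    return sum(d - x for x, d in zip(dx_ages, death_ages)) / len(dx_ages)

def deaths_by_age(death_ages, cutoff_age):
    """Number of deaths by a fixed age; independent of when diagnosis happened."""
    return sum(age <= cutoff_age for age in death_ages)

print(mean_survival(clinical_dx_ages, ages_at_death))  # → 3.25
print(mean_survival(screened_dx_ages, ages_at_death))  # → 6.25 (inflated by the lead time)
print(deaths_by_age(ages_at_death, 72))                # → 3 in both scenarios
```

Survival from diagnosis nearly doubles even though nobody lives longer, while the mortality count is unchanged.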

Confounding Assessment

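def assess_confounding(crude_rr: float, adjusted_rr: float,
                       threshold: float = 0.10) -> dict:
    """
    Flag potential confounding with the change-in-estimate heuristic.

    A variable is flagged when adjusting for it changes the risk ratio by
    more than `threshold` (default 10%) relative to the crude estimate.
    This is a rule of thumb, not a statistical test; confounder selection
    should also rest on subject-matter knowledge.
    """
    if crude_rr <= 0:
        raise ValueError("crude_rr must be positive")
    pct_change = abs(crude_rr - adjusted_rr) / crude_rr * 100
    flagged = pct_change > threshold * 100

    return {
        'crude_RR': crude_rr,
        'adjusted_RR': adjusted_rr,
        'percent_change': round(pct_change, 1),
        'is_confounder': flagged,
        'interpretation': (
            f"{'Confounding likely' if flagged else 'No important confounding'}: "
            f"adjusting changed the RR by {pct_change:.1f}% "
            f"(threshold: {threshold*100:.0f}%)"
        )
    }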
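Stratification, one of the mitigation strategies listed for confounding, can supply the `adjusted_rr` that `assess_confounding` expects. A minimal sketch of the Mantel-Haenszel pooled risk ratio, assuming cohort data split into one 2x2 table per level of the suspected confounder; the function names and stratum counts are hypothetical:

```python
def mantel_haenszel_rr(strata: list[tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled risk ratio across strata.

    Each stratum is (a, b, c, d) in the 2x2 layout used above:
    a/b = exposed with/without disease, c/d = unexposed with/without disease.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n    # exposed cases, weighted by unexposed size
        denominator += c * (a + b) / n  # unexposed cases, weighted by exposed size
    return numerator / denominator

def crude_rr(strata: list[tuple[int, int, int, int]]) -> float:
    """Risk ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a / (a + b)) / (c / (c + d))

# Hypothetical strata (e.g., two age groups); the RR is 2 within each stratum
strata = [
    (20, 80, 40, 360),   # risk 0.20 exposed vs 0.10 unexposed
    (20, 380, 5, 195),   # risk 0.05 exposed vs 0.025 unexposed
]
crude = crude_rr(strata)
adjusted = mantel_haenszel_rr(strata)
pct_change = abs(crude - adjusted) / crude * 100
print(f"crude RR {crude:.2f}, MH RR {adjusted:.2f}, change {pct_change:.1f}%")
# → crude RR 1.07, MH RR 2.00, change 87.5%
```

The exposure is concentrated in the low-risk stratum, so collapsing the strata hides most of the association; the pooled estimate recovers the within-stratum RR.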

Survival Analysis

For time-to-event data, use Kaplan-Meier estimators for descriptive analysis, log-rank tests for group comparisons, and Cox proportional hazards regression for multivariable analysis. Always check the proportional hazards assumption using Schoenfeld residuals and report median survival times with 95% confidence intervals.
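The Kaplan-Meier step can be sketched without a survival library, assuming right-censored data given as follow-up times plus an event indicator; the function names and example data are hypothetical. In practice a maintained package (for example, `lifelines`, whose `KaplanMeierFitter` also reports confidence intervals) is the usual choice. This sketch omits CIs, the log-rank test, and Cox regression.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    durations: follow-up time per subject
    events: 1 if the event was observed, 0 if right-censored
    Returns (event_times, survival) as NumPy arrays.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])  # sorted distinct event times

    survival = []
    s = 1.0
    for t in event_times:
        # Subjects censored at t still count as at risk at t (usual convention)
        at_risk = np.sum(durations >= t)
        deaths = np.sum((durations == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

def median_survival(event_times, survival):
    """Smallest time at which S(t) <= 0.5, or None if never reached."""
    reached = np.nonzero(survival <= 0.5)[0]
    return float(event_times[reached[0]]) if reached.size else None

# Hypothetical data: 8 subjects, event indicator 0 = censored
durations = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
times, surv = kaplan_meier(durations, events)
for t, s in zip(times, surv):
    print(f"t={t:g}  S(t)={s:.3f}")
print("median survival:", median_survival(times, surv))
```

For these data the estimate steps through 0.875, 0.750, 0.600, 0.400, and 0.200, so the median survival time is 8.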

Reporting Standards

Follow STROBE (observational studies), CONSORT (trials), or RECORD (routinely collected data) reporting guidelines. Report all measures with 95% confidence intervals. Present both crude and adjusted estimates to show the impact of confounding adjustment.