Awesome-Agent-Skills-for-Empirical-Research pharmacovigilance-guide

Adverse drug event detection, safety signal mining, and drug monitoring

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/pharma/pharmacovigilance-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-pharmacovigilance && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/domains/pharma/pharmacovigilance-guide/SKILL.md

Pharmacovigilance Guide

A skill for computational pharmacovigilance research, covering adverse drug event (ADE) databases, signal detection algorithms, disproportionality analysis, and safety surveillance methods used in post-market drug monitoring.

Adverse Event Data Sources

Key Databases

Database | Operator | Coverage | Access
--- | --- | --- | ---
FAERS (FDA Adverse Event Reporting System) | FDA | Spontaneous reports submitted to FDA (mostly US, plus foreign reports from manufacturers) | Free quarterly downloads
EudraVigilance | EMA | European reports | Research access via application
VigiBase | WHO-UMC | Global (150+ countries) | Research license
VAERS | CDC/FDA | US vaccine adverse events | Free download
MAUDE | FDA | Medical device reports | Free download

Loading FAERS Data

import os
import pandas as pd

def load_faers_quarter(data_dir: str, year: int, quarter: int) -> dict:
    """
    Load FAERS quarterly ASCII files into DataFrames.
    Downloads available from: fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html
    Expects the .txt files already extracted from faers_ascii_{year}Q{quarter}.zip
    into data_dir. File names use a two-digit year, e.g. DEMO24Q3.txt.
    Returns a dict of DataFrames keyed by table name; missing files are skipped.
    """
    yy = year % 100
    tables = {}

    file_map = {
        "DEMO": "demographics",    # Patient demographics (one row per report version)
        "DRUG": "drugs",           # Drug information
        "REAC": "reactions",       # Adverse reactions (MedDRA Preferred Terms)
        "OUTC": "outcomes",        # Patient outcomes
        "INDI": "indications",     # Drug indications
        "THER": "therapy",         # Therapy dates
        "RPSR": "report_sources",  # Report sources
    }

    for prefix, name in file_map.items():
        filepath = os.path.join(data_dir, f"{prefix}{yy:02d}Q{quarter}.txt")
        if os.path.exists(filepath):
            tables[name] = pd.read_csv(
                filepath, sep="$", encoding="latin-1",
                low_memory=False, on_bad_lines="warn",  # skip malformed rows with a warning
            )

    return tables

# Example: load and inspect
faers = load_faers_quarter("./faers_data", 2024, 3)
print(f"Reports (case versions): {len(faers['demographics']):,}")
print(f"Reaction records: {len(faers['reactions']):,}")
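
The DEMO table holds one row per report version, so a case that was updated several times appears more than once. A common cleaning step keeps only the latest version of each case before drugs are paired with reactions. The sketch below assumes specific FAERS column names: primaryid, caseid, and caseversion in DEMO; drugname and role_cod in DRUG; and pt in REAC. It also assumes "PS" marks the primary suspect drug. Verify these against the documentation shipped with each quarter. The helper names latest_case_versions and drug_event_pairs are hypothetical, and the toy frames are made up.

```python
import pandas as pd

def latest_case_versions(demo: pd.DataFrame) -> pd.DataFrame:
    """Keep one DEMO row per caseid: the one with the highest caseversion."""
    return (demo.sort_values(["caseid", "caseversion"])
                .drop_duplicates("caseid", keep="last"))

def drug_event_pairs(demo: pd.DataFrame, drug: pd.DataFrame, reac: pd.DataFrame,
                     primary_suspect_only: bool = True) -> pd.DataFrame:
    """Return unique (primaryid, drugname, pt) rows for the latest case versions."""
    keep_ids = latest_case_versions(demo)["primaryid"]
    d = drug[drug["primaryid"].isin(keep_ids)]
    if primary_suspect_only:
        d = d[d["role_cod"] == "PS"]  # assumed code for the primary suspect drug
    r = reac[reac["primaryid"].isin(keep_ids)]
    pairs = d[["primaryid", "drugname"]].merge(r[["primaryid", "pt"]], on="primaryid")
    # Minimal normalization of free-text drug names
    pairs["drugname"] = pairs["drugname"].str.upper().str.strip()
    return pairs.drop_duplicates().reset_index(drop=True)

# Toy frames (made up): case 1 has two versions; only version 2 (primaryid 102) is kept
demo = pd.DataFrame({"primaryid": [101, 102, 201], "caseid": [1, 1, 2],
                     "caseversion": [1, 2, 1]})
drug = pd.DataFrame({"primaryid": [101, 102, 201, 201],
                     "drugname": ["drug x", "Drug X ", "drug y", "drug z"],
                     "role_cod": ["PS", "PS", "PS", "C"]})
reac = pd.DataFrame({"primaryid": [101, 102, 102, 201],
                     "pt": ["Nausea", "Nausea", "Headache", "Rash"]})
pairs = drug_event_pairs(demo, drug, reac)
print(pairs)
```

FAERS drug names are free text, so upper-casing is only a first step. Resources such as AEOLUS map names to standard vocabularies.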

Signal Detection Methods

Disproportionality Analysis

Disproportionality measures compare the observed frequency of a drug-event pair against the expected frequency under independence:

import numpy as np

def compute_disproportionality(a: int, b: int, c: int, d: int) -> dict:
    """
    Compute disproportionality measures from a 2x2 contingency table:

              Event+    Event-
    Drug+       a          b
    Drug-       c          d

    a: reports with both the drug and the event
    b: reports with the drug but not the event
    c: reports with the event but not the drug
    d: reports with neither
    """
    a, b, c, d = (int(x) for x in (a, b, c, d))  # Python ints avoid overflow in the margin product
    n = a + b + c + d
    # Expected count for cell a if drug and event were reported independently
    expected = (a + b) * (a + c) / n if n > 0 else 0.0

    # Reporting Odds Ratio (ROR) with log-normal 95% CI; undefined (NaN) if any cell is 0
    if min(a, b, c, d) > 0:
        ror = (a * d) / (b * c)
        se_ln_ror = np.sqrt(1/a + 1/b + 1/c + 1/d)
        ror_lower = float(np.exp(np.log(ror) - 1.96 * se_ln_ror))
    else:
        ror = ror_lower = float("nan")

    # Proportional Reporting Ratio (PRR); undefined if the drug has no reports or c == 0
    prr = (a / (a + b)) / (c / (c + d)) if (a + b) > 0 and c > 0 else float("nan")

    # Information Component (IC), simplified form with +0.5 shrinkage of observed and expected
    ic = float(np.log2((a + 0.5) / (expected + 0.5)))

    # Chi-squared with Yates continuity correction (correction floored at zero)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    chi2_val = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / margins if margins > 0 else 0.0

    return {
        "a": a, "b": b, "c": c, "d": d,
        "expected": round(expected, 2),
        "ROR": round(ror, 3),
        "ROR_lower_95": round(ror_lower, 3),
        "PRR": round(prr, 3),
        "IC": round(ic, 3),
        "chi2": round(chi2_val, 3),
        # Combined heuristic: ROR lower 95% bound > 1, at least 3 cases, and
        # chi-squared > 3.84 (p < 0.05 with 1 df); a NaN bound never signals
        "signal": bool(ror_lower > 1 and a >= 3 and chi2_val > 3.84),
    }
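
compute_disproportionality takes the four cell counts as input. The minimal sketch below derives them at the report level from a table of (report, drug, event) rows, such as the deduplicated FAERS pairs. The column names primaryid, drugname, and pt and the function name contingency_counts are assumptions. The background population here is every report in the table. That is a design choice: restricting the background (for example, to one therapeutic area) changes b, c, and d.

```python
import pandas as pd

def contingency_counts(pairs: pd.DataFrame, drug: str, event: str,
                       report_col: str = "primaryid") -> tuple:
    """Report-level 2x2 counts (a, b, c, d) for one drug-event combination.

    The background population is every report present in `pairs`.
    """
    all_reports = set(pairs[report_col])
    drug_reports = set(pairs.loc[pairs["drugname"] == drug, report_col])
    event_reports = set(pairs.loc[pairs["pt"] == event, report_col])
    a = len(drug_reports & event_reports)                # drug and event
    b = len(drug_reports - event_reports)                # drug, other events only
    c = len(event_reports - drug_reports)                # event, other drugs only
    d = len(all_reports - drug_reports - event_reports)  # neither
    return a, b, c, d

# Toy report-level table (made up)
pairs = pd.DataFrame({
    "primaryid": [1, 1, 2, 3, 3, 4, 5],
    "drugname":  ["A", "A", "A", "B", "B", "C", "A"],
    "pt":        ["Rash", "Nausea", "Rash", "Rash", "Fever", "Fever", "Fever"],
})
print(contingency_counts(pairs, "A", "Rash"))  # (2, 1, 1, 1)
```

Passing these four numbers to compute_disproportionality yields the ROR, PRR, IC, and chi-squared values for the pair.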

Multi-Item Gamma Poisson Shrinker (MGPS)

The MGPS method (used by FDA) applies empirical Bayesian shrinkage to stabilize estimates for rare events:

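import numpy as np

def empirical_bayes_geometric_mean(observed: np.ndarray,
                                   expected: np.ndarray) -> np.ndarray:
    """
    Simplified, EBGM-like shrinkage of observed/expected ratios.
    Pulls each ratio toward the overall mean ratio, pulling harder when the
    expected count is small, which reduces false positives from small counts.
    This is a linear approximation, not the gamma-Poisson posterior
    geometric mean that full MGPS computes.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    # Raw relative reporting ratio (expected floored to avoid division by zero)
    rr = observed / np.maximum(expected, 0.01)

    # Weight on the raw ratio is E/(E+1): small expected counts get more shrinkage.
    # Full MGPS instead fits a two-gamma mixture prior by maximizing the marginal likelihood.
    global_mean = np.mean(rr)
    weight = expected / (expected + 1)
    ebgm = weight * rr + (1 - weight) * global_mean

    return ebgm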
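
For comparison, the quantity that full MGPS reports can be computed once the two-gamma mixture prior is known. This sketch of the gamma-Poisson model assumes N ~ Poisson(λE), with λ drawn from p·Gamma(α1, β1) + (1 − p)·Gamma(α2, β2) using rate parameters. It returns EBGM = exp(E[ln λ | N = n]). The five hyperparameters are taken as given; fitting them across all drug-event pairs is omitted. The values in the usage line are arbitrary placeholders, not fitted values, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, gammaln

def log_marginal(n, e, alpha, beta):
    """Log P(N = n) when N ~ Poisson(lambda * e) and lambda ~ Gamma(alpha, rate=beta)."""
    return (gammaln(alpha + n) - gammaln(alpha) - gammaln(n + 1)
            + alpha * np.log(beta / (beta + e)) + n * np.log(e / (beta + e)))

def gps_ebgm(n, e, alpha1, beta1, alpha2, beta2, p):
    """EBGM = exp(E[ln lambda | N = n]) under a two-component gamma mixture prior."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    # Posterior probability that the pair belongs to the first prior component
    log_w1 = np.log(p) + log_marginal(n, e, alpha1, beta1)
    log_w2 = np.log1p(-p) + log_marginal(n, e, alpha2, beta2)
    q = np.exp(log_w1 - np.logaddexp(log_w1, log_w2))
    # Each component's posterior is Gamma(alpha + n, rate = beta + e),
    # whose expected log is digamma(alpha + n) - ln(beta + e)
    e_log_lambda = (q * (digamma(alpha1 + n) - np.log(beta1 + e))
                    + (1 - q) * (digamma(alpha2 + n) - np.log(beta2 + e)))
    return np.exp(e_log_lambda)

# Placeholder hyperparameters (not fitted to any database)
prior = dict(alpha1=0.2, beta1=0.1, alpha2=2.0, beta2=4.0, p=1/3)
print(gps_ebgm([1, 1000], [0.01, 100.0], **prior))
```

A pair with one report and 0.01 expected has a raw ratio of 100, which is pulled far down. A pair with 1000 reports and 100 expected stays close to its raw ratio of 10.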

MedDRA Terminology

Medical Dictionary for Regulatory Activities

MedDRA provides the standardized terminology for adverse event coding:

Hierarchy (5 levels):
  System Organ Class (SOC)      -- e.g., "Cardiac disorders"
    High Level Group Term (HLGT) -- e.g., "Cardiac arrhythmias"
      High Level Term (HLT)      -- e.g., "Supraventricular arrhythmias"
        Preferred Term (PT)      -- e.g., "Atrial fibrillation"
          Lowest Level Term (LLT) -- e.g., "Auricular fibrillation"
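
Because each PT sits under higher-level terms, report counts can be aggregated up the hierarchy to pool related terms. The sketch below counts distinct reports, so a report coding two PTs under the same group is counted once. The PT_ANCESTORS mapping is a hypothetical excerpt; real mappings come from the licensed MedDRA files. MedDRA is also multiaxial: a PT can link to several SOCs, one of them primary. This sketch assumes a single primary path per PT.

```python
from collections import Counter

# Hypothetical excerpt of a PT-to-ancestor mapping (primary path only)
PT_ANCESTORS = {
    "Atrial fibrillation": {"HLGT": "Cardiac arrhythmias", "SOC": "Cardiac disorders"},
    "Atrial flutter": {"HLGT": "Cardiac arrhythmias", "SOC": "Cardiac disorders"},
}

def reports_per_group(report_pts, level):
    """Count distinct reports per term at `level` ("HLGT" or "SOC").

    PTs missing from the mapping are ignored.
    """
    pairs = {
        (report_id, PT_ANCESTORS[pt][level])
        for report_id, pt in report_pts
        if pt in PT_ANCESTORS
    }
    return Counter(group for _, group in pairs)

report_pts = [(1, "Atrial fibrillation"), (1, "Atrial flutter"),
              (2, "Atrial flutter"), (3, "Nausea")]
print(reports_per_group(report_pts, "HLGT"))
```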

Standardized MedDRA Queries (SMQs)

Pre-defined search strategies for known safety topics:

  • Anaphylactic reaction (SMQ): Broad and narrow search terms
  • Drug related hepatic disorders (SMQ): Hepatic adverse events; potential Hy's Law cases additionally require laboratory criteria
  • Torsade de pointes/QT prolongation (SMQ): Cardiac safety signals
  • Rhabdomyolysis/myopathy (SMQ): Muscle-related adverse events
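
The difference between narrow and broad SMQ scope can be sketched as set membership over coded PTs. A narrow search uses only the narrow terms. A broad search adds the broad-only terms, trading specificity for sensitivity. The term lists below are hypothetical, abbreviated examples, not official SMQ content, which ships with MedDRA releases. Some SMQs, including anaphylactic reaction, also define an algorithmic search that this sketch ignores.

```python
# Hypothetical, abbreviated term lists for illustration only
SMQ_NARROW = {"Anaphylactic reaction", "Anaphylactic shock"}
SMQ_BROAD_ONLY = {"Urticaria", "Hypotension"}

def smq_matching_reports(report_pts, scope="narrow"):
    """Return IDs of reports coding at least one PT in the SMQ at the given scope."""
    if scope == "narrow":
        terms = SMQ_NARROW
    elif scope == "broad":
        terms = SMQ_NARROW | SMQ_BROAD_ONLY  # broad = narrow terms plus broad-only terms
    else:
        raise ValueError("scope must be 'narrow' or 'broad'")
    return {report_id for report_id, pt in report_pts if pt in terms}

report_pts = [(1, "Anaphylactic shock"), (2, "Urticaria"), (3, "Nausea")]
print(smq_matching_reports(report_pts, "narrow"), smq_matching_reports(report_pts, "broad"))
```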

Temporal Pattern Analysis

Time-to-Onset Analysis

import pandas as pd

def time_to_onset_analysis(drug_start_dates: pd.Series,
                           event_dates: pd.Series) -> dict:
    """
    Summarize the time-to-onset (TTO) distribution for a drug-event pair.
    Inputs must be datetime64 Series aligned by report (convert FAERS
    YYYYMMDD fields with pd.to_datetime and an explicit format first).
    Onsets clustered in a characteristic window support a causal signal;
    a diffuse distribution is more consistent with coincidental reports.
    """
    tto = (event_dates - drug_start_dates).dt.days
    tto = tto[tto >= 0]  # drop negative intervals (data-quality errors) and missing dates

    return {
        "n_reports": len(tto),
        "median_days": tto.median(),
        "mean_days": tto.mean(),
        "q25_days": tto.quantile(0.25),
        "q75_days": tto.quantile(0.75),
        "within_30_days_pct": (tto <= 30).mean() * 100,
        "within_90_days_pct": (tto <= 90).mean() * 100,
    }
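
Summary quantiles describe where onsets fall but not how the risk evolves over time. One approach used for this is to fit a Weibull distribution to the onset times and read the shape parameter. A shape below 1 indicates a hazard that decreases with time (onsets concentrated early). A shape near 1 indicates a roughly constant hazard, and a shape above 1 an increasing one. This sketch makes several assumptions. It excludes same-day (0-day) onsets. The fixed tolerance band tol is a stand-in for a proper confidence-interval check. The function name weibull_tto_shape is hypothetical.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_tto_shape(tto_days, tol: float = 0.1) -> dict:
    """Fit a two-parameter Weibull (location fixed at 0) to time-to-onset values.

    `tol` is an arbitrary band around shape = 1, not a statistical test.
    """
    x = np.asarray(tto_days, dtype=float)
    x = x[x > 0]  # assumption: same-day (0-day) and missing onsets are excluded
    shape, _, scale = weibull_min.fit(x, floc=0)
    if shape < 1 - tol:
        pattern = "decreasing hazard (early onset)"
    elif shape > 1 + tol:
        pattern = "increasing hazard (late onset)"
    else:
        pattern = "roughly constant hazard"
    return {"n": int(x.size), "shape": float(shape),
            "scale": float(scale), "pattern": pattern}

# Simulated onsets with a true shape of 0.5 (made-up data)
rng = np.random.default_rng(0)
result = weibull_tto_shape(30 * rng.weibull(0.5, size=2000))
print(result)
```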

Causality Assessment

Standard frameworks for evaluating whether a drug caused an adverse event:

Method | Type | Key Criteria
--- | --- | ---
WHO-UMC | Categorical (expert judgment) | Temporal relationship, dechallenge, rechallenge, alternative causes
Naranjo Score | Scoring scale | 10 questions; total score -4 to +13 (definite ≥9, probable 5-8, possible 1-4, doubtful ≤0)
Bradford Hill | Principles | Strength, consistency, specificity, temporality, biological gradient
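
Because the Naranjo scale is a fixed questionnaire, its scoring can be written down directly. This sketch encodes the ten questions (wording abbreviated) with their yes/no/do-not-know points and the usual category cut-offs, giving totals from -4 to +13. The helper name naranjo and the answer keys are hypothetical. Check the weights against the original scale before relying on them.

```python
# Points for each Naranjo question as (yes, no, do not know)
NARANJO_POINTS = [
    (1, 0, 0),    # 1. Previous conclusive reports on this reaction?
    (2, -1, 0),   # 2. Event appeared after the suspected drug was given?
    (1, 0, 0),    # 3. Improved when drug discontinued or specific antagonist given?
    (2, -1, 0),   # 4. Reappeared when the drug was readministered?
    (-1, 2, 0),   # 5. Alternative causes that could on their own have caused it?
    (-1, 1, 0),   # 6. Reappeared when a placebo was given?
    (1, 0, 0),    # 7. Drug detected in blood or other fluids in toxic concentrations?
    (1, 0, 0),    # 8. More severe with higher dose or less severe with lower dose?
    (1, 0, 0),    # 9. Similar reaction to same or similar drugs in a previous exposure?
    (1, 0, 0),    # 10. Confirmed by any objective evidence?
]
ANSWER_INDEX = {"yes": 0, "no": 1, "unknown": 2}

def naranjo(answers):
    """Return (score, category) for ten answers of 'yes', 'no', or 'unknown'."""
    if len(answers) != len(NARANJO_POINTS):
        raise ValueError("expected 10 answers")
    score = sum(points[ANSWER_INDEX[a]] for points, a in zip(NARANJO_POINTS, answers))
    if score >= 9:
        category = "definite"
    elif score >= 5:
        category = "probable"
    elif score >= 1:
        category = "possible"
    else:
        category = "doubtful"
    return score, category

print(naranjo(["yes"] * 4 + ["no", "no"] + ["yes"] * 4))
```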

Tools and Resources

  • openFDA API: Direct access to FAERS data via REST
  • OHDSI / OMOP CDM: Standardized observational health data for pharmacoepidemiology
  • PhViD (R package): Pharmacovigilance signal detection methods
  • EHRtemporalVariability: R package for temporal data quality in EHR
  • VigiRank: WHO-UMC signal prioritization algorithm
  • AEOLUS: Standardized and cleaned version of FAERS data
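
As a starting point with the openFDA API, the standard-library sketch below builds a count query against the drug adverse event endpoint (https://api.fda.gov/drug/event.json). The query tallies MedDRA reaction terms among reports that mention one generic name. The field names patient.drug.openfda.generic_name and patient.reaction.reactionmeddrapt.exact are assumptions to verify against the openFDA field reference. The helper names are hypothetical. The network call is kept separate so the query builder and parser can be checked offline.

```python
import json
import urllib.parse
import urllib.request

OPENFDA_EVENT_URL = "https://api.fda.gov/drug/event.json"

def reaction_count_url(generic_name: str, limit: int = 10) -> str:
    """Build a count query for reaction terms among reports naming one drug."""
    params = {
        "search": f'patient.drug.openfda.generic_name:"{generic_name}"',
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": str(limit),
    }
    return OPENFDA_EVENT_URL + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)

def parse_counts(payload: dict) -> list:
    """Turn a count response into (term, count) tuples."""
    return [(r["term"], r["count"]) for r in payload.get("results", [])]

def fetch_reaction_counts(generic_name: str, limit: int = 10, timeout: float = 30.0) -> list:
    """Perform the HTTP request (requires network access)."""
    with urllib.request.urlopen(reaction_count_url(generic_name, limit), timeout=timeout) as resp:
        return parse_counts(json.load(resp))

print(reaction_count_url("ATORVASTATIN", limit=5))
```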