Medical-research-skills scikit-survival
A comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival; use it when you need to model censored time-to-event outcomes, fit Cox/RSF/GB models or Survival SVMs, evaluate with C-index/Brier score, or handle competing risks.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/scikit-survival" ~/.claude/skills/aipoch-medical-research-skills-scikit-survival && rm -rf "$T"
manifest:
scientific-skills/Data Analysis/scikit-survival/SKILL.md
When to Use
Use this skill when you need to:
- Model time-to-event outcomes with censoring (right/left/interval censored observations).
- Fit and interpret Cox Proportional Hazards models (including penalized Cox for high-dimensional data).
- Train non-linear survival models such as Random Survival Forests or Gradient Boosting survival models.
- Use Survival SVMs for margin-based survival prediction (linear or kernel).
- Evaluate survival predictions with censoring-aware metrics (Uno/Harrell C-index, time-dependent AUC, Brier/Integrated Brier Score) and/or perform competing risks analysis.
Key Features
- Survival target construction via `sksurv.util.Surv` (arrays or DataFrame).
- Model families:
  - Cox models: `CoxPHSurvivalAnalysis`, `CoxnetSurvivalAnalysis`
  - Ensembles: `RandomSurvivalForest`, `GradientBoostingSurvivalAnalysis`, `ExtraSurvivalTrees`
  - SVM-based: `FastSurvivalSVM`, `FastKernelSurvivalSVM`
- Non-parametric estimators: Kaplan–Meier and Nelson–Aalen.
- Competing risks: cumulative incidence estimation.
- scikit-learn compatibility: pipelines, cross-validation, and `GridSearchCV` with survival scorers.
- Evaluation utilities: IPCW-based metrics (e.g., Uno’s C-index) and calibration-aware scores (IBS).
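To make the non-parametric estimator concrete, here is a minimal pure-NumPy sketch of the Kaplan–Meier product-limit estimator; in practice use the library's `kaplan_meier_estimator`, which also handles ties and confidence intervals:

```python
import numpy as np

def kaplan_meier(event, time):
    """Minimal Kaplan-Meier product-limit estimator.

    event: boolean array (True = event observed, False = censored)
    time:  observed event/censoring times
    Returns (distinct event times, survival probability at each).
    """
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)           # still under observation just before t
        deaths = np.sum(event & (time == t))  # events exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# One censored subject at t=4.0; survival steps down at 2.0, 3.0, 5.0
t, s = kaplan_meier(event=[True, True, False, True], time=[2.0, 3.0, 4.0, 5.0])
# s == [0.75, 0.5, 0.0]
```

Censored subjects leave the risk set without forcing a step down, which is why the curve above only drops at observed event times.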
Additional topic guides may exist under:
- references/cox-models.md
- references/ensemble-models.md
- references/svm-models.md
- references/data-handling.md
- references/evaluation-metrics.md
- references/competing-risks.md
Dependencies
- scikit-survival (recommended: >=0.22)
- scikit-learn (recommended: >=1.2)
- numpy (recommended: >=1.23)
- pandas (recommended: >=1.5)
Example Usage
A complete, runnable example using a scikit-survival built-in dataset, a scikit-learn pipeline, and Uno’s C-index (IPCW):
```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.datasets import load_breast_cancer
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_ipcw
from sksurv.preprocessing import OneHotEncoder

# 1) Load data (X: features, y: structured array with event/time fields)
X, y = load_breast_cancer()

# 2) Split (keep y_train: IPCW metrics estimate the censoring distribution from it)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3) Build a pipeline (encode categoricals, then scale; scaling matters
#    for many survival models)
pipe = Pipeline([
    ("encode", OneHotEncoder()),
    ("scaler", StandardScaler()),
    ("model", CoxPHSurvivalAnalysis()),
])

# 4) Optional: hyperparameter tuning over the ridge penalty.
#    GridSearchCV uses the estimator's default score (Harrell's C-index);
#    wrap the pipeline in as_concordance_index_ipcw_scorer to tune on Uno's C.
param_grid = {"model__alpha": [1e-4, 1e-2, 1.0]}
search = GridSearchCV(pipe, param_grid=param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)
best = search.best_estimator_

# 5) Predict risk scores (higher typically means higher risk / shorter survival)
risk_scores = best.predict(X_test)

# 6) Evaluate with Uno's C-index (IPCW)
c_uno = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"Uno's C-index (IPCW): {c_uno:.3f}")
```
Implementation Details
1) Survival Target Representation (`Surv`)
scikit-survival expects outcomes as a structured array with at least:
- an event indicator (boolean)
- a time value (float/int)
Common construction patterns:
```python
from sksurv.util import Surv

y = Surv.from_arrays(event=event_array, time=time_array)
# or
y = Surv.from_dataframe("event", "time", df)
```
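For context, `Surv.from_arrays` produces a plain NumPy structured array; a hand-rolled equivalent (assuming the default field names `event` and `time`) is just:

```python
import numpy as np

event = np.array([True, False, True])
time = np.array([12.0, 30.5, 7.2])

# Structured array equivalent to Surv.from_arrays(event=event, time=time),
# assuming the default field names "event" and "time"
y = np.empty(len(time), dtype=[("event", "?"), ("time", "<f8")])
y["event"] = event
y["time"] = time
```

Field access then works as expected, e.g. `y[y["event"]]` selects the uncensored rows.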
2) Model Selection Heuristics
- High-dimensional (p > n): prefer `CoxnetSurvivalAnalysis` (Elastic Net) for stability and feature selection.
- Interpretability required: prefer `CoxPHSurvivalAnalysis` (coefficients are log hazard ratios).
- Strong non-linearities / interactions: prefer `RandomSurvivalForest` or `GradientBoostingSurvivalAnalysis`.
- Kernelized decision boundaries: consider `FastKernelSurvivalSVM` (ensure scaling).
3) Preprocessing Requirements
- Scaling: strongly recommended for SVMs and often beneficial for penalized Cox models.
- Categoricals: encode (e.g., one-hot) before fitting most estimators.
- Data validation: ensure non-negative times; verify enough events relative to feature count.
4) Evaluation Under Censoring
- Harrell’s C-index (`concordance_index_censored`): common, but can be less robust with heavy censoring.
- Uno’s C-index (`concordance_index_ipcw`): uses inverse probability of censoring weights and requires `y_train` to estimate the censoring distribution.
```python
from sksurv.metrics import concordance_index_censored, concordance_index_ipcw

c_harrell = concordance_index_censored(y_test["event"], y_test["time"], risk_scores)[0]
c_uno = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
```
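To make the concordance definition concrete, here is a naive O(n²) Harrell's C in pure NumPy; in practice use `concordance_index_censored`, which handles ties and edge cases properly:

```python
import numpy as np

def harrell_c(event, time, risk):
    """Naive Harrell's C: among comparable pairs, the fraction where the
    subject with the shorter survival time received the higher risk score.
    A pair (i, j) is comparable only if i's earlier time is an observed event."""
    event, time, risk = map(np.asarray, (event, time, risk))
    concordant = ties = comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored earlier time -> ordering unknown
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# Risk order exactly reverses survival order -> perfect concordance
c = harrell_c([True, True, True], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
# c == 1.0
```

Flipping the risk scores to match the survival order would drive the score to 0.0, which is why an anti-predictive model can be "fixed" by negating its output.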
5) Time-dependent Metrics (AUC, Brier/IBS)
- Time-dependent AUC evaluates discrimination at specific time horizons.
- Brier score / Integrated Brier Score (IBS) evaluates calibration + discrimination over time and requires survival probabilities/functions.
```python
import numpy as np
from sksurv.metrics import cumulative_dynamic_auc

times = np.array([365, 730, 1095])  # example horizons (days)
auc, mean_auc = cumulative_dynamic_auc(y_train, y_test, risk_scores, times)
```
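Ignoring censoring weights for brevity, the Brier score at a single horizon reduces to a mean squared error between survival status and predicted survival probability; the real metric (sksurv's `brier_score` / `integrated_brier_score`) adds IPCW weights to handle censoring:

```python
import numpy as np

def brier_uncensored(time, surv_prob, horizon):
    """Brier score at one horizon for fully observed (uncensored) data.

    time:      observed event times
    surv_prob: model's predicted probability of surviving past `horizon`
    """
    alive = (np.asarray(time) > horizon).astype(float)  # true status at horizon
    return float(np.mean((alive - np.asarray(surv_prob)) ** 2))

# Perfect probabilities score 0; maximally wrong ones score 1
b = brier_uncensored(time=[5.0, 1.0], surv_prob=[1.0, 0.0], horizon=3.0)
# b == 0.0
```

Lower is better; a model that always predicts 0.5 scores 0.25 regardless of outcomes, which makes 0.25 a useful "uninformative" reference point.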
6) Competing Risks (Cumulative Incidence)
Use competing risks methods when multiple mutually exclusive event types exist and one event prevents the others.
```python
from sksurv.nonparametric import cumulative_incidence_competing_risks

# event: integer codes per subject (0 = censored, 1..k = competing event types)
# time:  observed event/censoring times
# Requires a recent scikit-survival release
time_points, cum_inc = cumulative_incidence_competing_risks(event, time)
# rows of cum_inc hold the cumulative incidence curves per event type
```
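A minimal pure-NumPy, Aalen–Johansen-style sketch shows what the cumulative incidence function computes (the library helper is the production version):

```python
import numpy as np

def cumulative_incidence(event_type, time, k):
    """Naive cumulative incidence for competing event type k.

    event_type: 0 = censored, 1..K = competing event codes
    time:       observed event/censoring times
    At each event time, the CIF gains (overall survival just before t)
    * (type-k hazard at t); overall survival then steps down for any event.
    """
    event_type = np.asarray(event_type)
    time = np.asarray(time, dtype=float)
    ts = np.unique(time[event_type > 0])
    s_prev, cif, out = 1.0, 0.0, []
    for t in ts:
        n_at_risk = np.sum(time >= t)
        d_any = np.sum((time == t) & (event_type > 0))
        d_k = np.sum((time == t) & (event_type == k))
        cif += s_prev * d_k / n_at_risk
        s_prev *= 1.0 - d_any / n_at_risk
        out.append(cif)
    return ts, np.array(out)

# Two competing event types plus one censored subject
ts, cif1 = cumulative_incidence([1, 1, 0, 2], [2.0, 3.0, 4.0, 5.0], k=1)
_, cif2 = cumulative_incidence([1, 1, 0, 2], [2.0, 3.0, 4.0, 5.0], k=2)
```

Unlike 1 − Kaplan–Meier applied per cause, the per-type CIFs share one overall survival term, so they never sum past 1 even when competing events are common.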