OpenClaw-Medical-Skills bio-research-tools-biomarker-signature-studio
<!--
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-research-tools-biomarker-signature-studio" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-research-tools-biomarker-signatu && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-research-tools-biomarker-signature-studio" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-research-tools-biomarker-signatu && rm -rf "$T"
skills/bio-research-tools-biomarker-signature-studio/SKILL.md
name: bio-research-tools-biomarker-signature-studio
description: Multi-omic biomarker discovery studio that ingests expression + metadata, performs QC, multi-strategy feature selection, nested CV model training, survival analysis hooks, and SHAP-based interpretation. Use to design translational biomarker panels with documented evidence.
tool_type: python
primary_tool: scikit-learn
depends_on:
- machine-learning/biomarker-discovery
- machine-learning/model-validation
- machine-learning/omics-classifiers
- differential-expression/de-results
- workflow-management/biomarker-pipeline
measurable_outcome: Run biomarker_signature_studio.py end-to-end on provided data within 20 minutes and produce metrics + feature rankings JSON artifacts.
allowed-tools:
- read_file
- run_shell_command
Biomarker Signature Studio
Design validated biomarker panels that are explainable, stable, and ready for translational follow-up. This skill stitches together the existing biomarker pipeline tooling, adds configurable feature-selection ensembles, a small survival-analysis hook, and artifact export so downstream lab teams can review QC outputs.
What This Skill Does
- QC + Harmonization: Align expression matrices (samples x features) with metadata, check label balance, and compute summary stats.
- Feature Selection Ensemble: Supports Boruta, elastic-net stability, mutual-information top-K, and mRMR with optional intersection voting.
- Model Factory: Trains multiple estimators (Logistic L1, RandomForest, XGBoost if present) under nested CV, picks champion by AUC.
- Explainability + Export: Produces SHAP tables/plots when packages are available, exports feature rankings and model weights.
- Survival Hook: If metadata contains `time_to_event` and `event`, the skill computes concordance for selected features via a Cox model.
All logic lives in `scripts/biomarker_signature_studio.py`.
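The intersection voting mentioned under Feature Selection Ensemble can be pictured as a vote count across selector outputs. The following is a minimal sketch under stated assumptions, not the code in the entry-point script: the function name `vote_features`, the `min_votes` parameter, and the example gene sets are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def vote_features(selections: Dict[str, Set[str]], min_votes: int) -> List[str]:
    """Return features chosen by at least `min_votes` selectors, most-voted first."""
    votes = Counter(f for chosen in selections.values() for f in chosen)
    kept = [f for f, n in votes.items() if n >= min_votes]
    # Sort by descending vote count, then by name for a deterministic order.
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical selector outputs; a real run would take these from each selector.
selections = {
    "boruta": {"TP53", "EGFR", "MYC", "KRAS"},
    "lasso": {"TP53", "EGFR", "BRCA1"},
    "mrmr": {"TP53", "MYC", "BRCA1"},
}

union = vote_features(selections, min_votes=1)
intersection = vote_features(selections, min_votes=len(selections))
majority = vote_features(selections, min_votes=2)

print("union:", union)
print("intersection:", intersection)
print("majority:", majority)
```

A strict intersection tends to shrink quickly as selectors are added, so a majority threshold is one possible middle ground between the union and intersection lists.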
Inputs
- Expression matrix (`--expression`): CSV/TSV genes x samples or samples x genes (auto-detected by metadata match).
- Metadata (`--metadata`): Must contain `--label-column`. Optional `--id-column` (default `sample_id`), `time_to_event`, `event`.
- Optional gene list for filtering (`--feature-list`).
- Output directory (`--output-dir`), created if missing.
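Orientation auto-detection by metadata match could work by counting how many metadata sample IDs appear on each axis of the matrix. The sketch below illustrates that idea only; `infer_orientation`, `transpose`, and the example IDs are hypothetical, and the actual script may use a different rule.

```python
from typing import List, Sequence


def infer_orientation(row_ids: Sequence[str], col_ids: Sequence[str],
                      sample_ids: Sequence[str]) -> str:
    """Guess matrix orientation from where the metadata sample IDs appear."""
    known = set(sample_ids)
    row_hits = sum(1 for r in row_ids if r in known)
    col_hits = sum(1 for c in col_ids if c in known)
    if row_hits == col_hits:
        # Covers both "no match at all" and a genuinely ambiguous matrix.
        raise ValueError("cannot infer orientation from metadata sample IDs")
    return "samples_x_genes" if row_hits > col_hits else "genes_x_samples"


def transpose(matrix: List[List[float]]) -> List[List[float]]:
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical genes x samples input with three samples and two genes.
row_ids = ["TP53", "EGFR"]
col_ids = ["S1", "S2", "S3"]
values = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
metadata_sample_ids = ["S1", "S2", "S3"]

orientation = infer_orientation(row_ids, col_ids, metadata_sample_ids)
# Normalize to samples x genes so each row lines up with one metadata record.
samples_by_genes = transpose(values) if orientation == "genes_x_samples" else values
print(orientation, samples_by_genes)
```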
Quick CLI Usage
python Skills/Research_Tools/Biomarker_Signature_Studio/scripts/biomarker_signature_studio.py \
  --expression data/expression.csv \
  --metadata data/metadata.csv \
  --label-column phenotype \
  --selectors boruta,lasso,mrmr \
  --models rf,logit \
  --output-dir outputs/biomarkers_run1
Key flags:
| Flag | Description |
|---|---|
| `--selectors` | Comma-separated list of selection strategies (e.g. `boruta`, `lasso`, `mrmr`). |
| `--models` | Models to evaluate (e.g. `rf`, `logit`). |
| | Target number of features for the top-K selectors. |
| | Enable Cox evaluation when survival columns exist. |
| | Reproducibility. |
| | Outer CV folds (default 5). |
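The Cox evaluation reports concordance, which can be read as the share of comparable sample pairs that a risk score orders correctly. The sketch below is a plain-Python version of Harrell's concordance index, assuming the risk scores have already been produced (for example as the linear predictor of a fitted Cox model). It ignores pairs with tied event times, and the function name and toy data are hypothetical rather than taken from the script.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs where higher risk fails earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: time_to_event, event indicator (1 = observed), and a risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risk)
print(f"concordance: {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the number is read much like an AUC.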
Workflow
- Load + align inputs, infer orientation, impute missing values.
- Standardize features (fit on train set only).
- Run requested selectors; create intersection + union candidate lists.
- For each selector output run nested CV training across requested models.
- Export champion metrics (`metrics.json`), feature table (`selected_features.csv`), SHAP summary (`shap_summary.csv` when available), and survival stats (`survival.json`).
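Two workflow steps, fitting the standardizer on the training set only and nested CV, can be combined in scikit-learn by placing the scaler inside a pipeline and wrapping a grid search in an outer cross-validation loop. The sketch below shows that pattern on synthetic data. It is an assumption about how such a step might look, not the script's actual implementation, and the hyperparameter grid and fold counts are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an aligned samples x features matrix and binary labels.
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           random_state=0)

# The scaler sits inside the pipeline, so it is refit on each training fold
# and never sees the held-out samples.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000)),
])

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop tunes the regularization strength; outer loop estimates AUC.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0]},
                      scoring="roc_auc", cv=inner_cv)
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

print(f"nested CV AUC: {outer_auc.mean():.3f} +/- {outer_auc.std():.3f}")
```

Keeping tuning in the inner loop means the outer AUC is not inflated by hyperparameter selection. The same reasoning applies to feature selection, which would also need to run inside the training folds.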
QC Expectations
- Class count ratio ≤3:1; warnings logged otherwise.
- Selected features between 5 and 250 unless user overrides.
- Nested CV AUC ≥0.70 or flagged in report.
- SHAP overlap with selected features ≥60% (reported).
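The four expectations above translate directly into threshold checks. The sketch below is one hedged reading of them: `qc_report` and its argument names are hypothetical, and SHAP overlap is assumed to mean the fraction of top SHAP features that also appear in the selected set, which the source does not define.

```python
from collections import Counter
from typing import Dict, Sequence


def qc_report(labels: Sequence[str], selected: Sequence[str],
              shap_top: Sequence[str], nested_auc: float) -> Dict[str, bool]:
    """Evaluate the QC thresholds and return a pass/fail flag for each one."""
    counts = Counter(labels)
    if len(counts) < 2 or not shap_top:
        raise ValueError("need at least two classes and one SHAP feature")
    ratio = max(counts.values()) / min(counts.values())
    # Assumed definition: share of top SHAP features that were also selected.
    overlap = len(set(selected) & set(shap_top)) / len(set(shap_top))
    return {
        "class_balance_ok": ratio <= 3.0,
        "feature_count_ok": 5 <= len(selected) <= 250,
        "auc_ok": nested_auc >= 0.70,
        "shap_overlap_ok": overlap >= 0.60,
    }


# Hypothetical run: 30 cases vs 70 controls, 12 selected genes, AUC below target.
labels = ["case"] * 30 + ["control"] * 70
selected = [f"g{i}" for i in range(12)]
shap_top = ["g0", "g1", "g2", "g3", "x1"]

report = qc_report(labels, selected, shap_top, nested_auc=0.66)
flagged = [name for name, ok in report.items() if not ok]
print("flagged checks:", flagged)
```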
Related Assets
- `examples/configs/biomarker_studio_template.yaml` (scaffold for teams)
- `scripts/biomarker_signature_studio.py` (entry point)
- Existing biomarker workflow skill for orchestrated runs.
Use this skill whenever you need a ready-to-review biomarker dossier (data QC, model metrics, explainability artifacts) before moving to validation cohorts or lab assays.
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->