OpenClaw-Medical-Skills bio-research-tools-biomarker-signature-studio

<!--

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-research-tools-biomarker-signature-studio" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-research-tools-biomarker-signatu && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-research-tools-biomarker-signature-studio" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-research-tools-biomarker-signatu && rm -rf "$T"

manifest: skills/bio-research-tools-biomarker-signature-studio/SKILL.md

Biomarker Signature Studio

Design validated biomarker panels that are explainable, stable, and ready for translational follow-up. This skill stitches together the existing biomarker pipeline tooling, adds configurable feature-selection ensembles, a small survival-analysis hook, and artifact export so downstream lab teams can review QC outputs.

What This Skill Does

QC + Harmonization: Align expression matrices (samples x features) with metadata, check label balance, and compute summary stats.
Feature Selection Ensemble: Supports Boruta, elastic-net stability, mutual-information top-K, and mRMR with optional intersection voting.
Model Factory: Trains multiple estimators (Logistic L1, RandomForest, XGBoost if present) under nested CV, picks champion by AUC.
Explainability + Export: Produces SHAP tables/plots when packages are available, exports feature rankings and model weights.
Survival Hook: If metadata contains `time_to_event` and `event`, the skill computes concordance for the selected features via a Cox model.
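To make the concordance number from the survival hook easier to interpret, here is a minimal pure-Python sketch of Harrell's concordance index. It is not the script's implementation: it assumes risk scores are already available (for example, the linear predictor of a fitted Cox model), and the function name `concordance_index` and the toy data are illustrative only. Pairs with tied event times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair is comparable when the subject with the shorter time had an event.
    Higher risk is expected to go with shorter time. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; cannot compute concordance")
    return concordant / comparable


# Toy data (hypothetical): time_to_event, event (1 = observed, 0 = censored), risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is a useful frame when reading the survival output.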

All logic lives in `scripts/biomarker_signature_studio.py`.

Inputs

Expression matrix (`--expression`): CSV/TSV, genes x samples or samples x genes (orientation auto-detected by matching against the metadata).
Metadata (`--metadata`): Must contain the column named by `--label-column`. Optional: `--id-column` (default `sample_id`), `time_to_event`, `event`.
Optional gene list for filtering (`--feature-list`).
Output directory (`--output-dir`), created if missing.
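The orientation auto-detection mentioned for the expression matrix could work roughly as follows. This is a sketch under assumptions, not the script's code: it assumes detection is done by counting how many metadata sample IDs appear in the row labels versus the column labels, and `infer_orientation` is a hypothetical helper name.

```python
def infer_orientation(row_ids, column_ids, sample_ids):
    """Guess matrix orientation by matching axis labels against metadata sample IDs."""
    samples = set(sample_ids)
    row_hits = len(samples.intersection(row_ids))
    col_hits = len(samples.intersection(column_ids))
    if row_hits == 0 and col_hits == 0:
        raise ValueError("no metadata sample IDs found on either axis of the matrix")
    # Whichever axis shares more labels with the metadata is taken as the sample axis.
    return "samples_x_features" if row_hits >= col_hits else "features_x_samples"


# Toy labels (hypothetical): genes on rows, samples on columns.
rows = ["TP53", "BRCA1", "EGFR"]
columns = ["S1", "S2"]
metadata_ids = ["S1", "S2", "S3"]
orientation = infer_orientation(rows, columns, metadata_ids)
```

When the result is `features_x_samples`, the matrix would be transposed before alignment so that rows are samples, matching the samples x features layout described above.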

Quick CLI Usage

python Skills/Research_Tools/Biomarker_Signature_Studio/scripts/biomarker_signature_studio.py \
  --expression data/expression.csv \
  --metadata data/metadata.csv \
  --label-column phenotype \
  --selectors boruta,lasso,mrmr \
  --models rf,logit \
  --output-dir outputs/biomarkers_run1

Key flags:

| Flag | Description |
| --- | --- |
| `--selectors` | Comma-separated list of selection strategies (`boruta`, `lasso`, `mrmr`, `mi_topk`). |
| `--models` | Models to evaluate (`logit`, `rf`, `xgb`). |
| `--k-features` | Target number of features for `mrmr` / `mi_topk`. |
| `--survival` | Enable Cox evaluation when survival columns exist. |
| `--random-state` | Random seed for reproducibility. |
| `--nested-folds` | Number of outer CV folds (default 5). |
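For readers who want to wrap or mirror the CLI, the flags above map naturally onto `argparse`. The parser below is an illustrative sketch, not the script's actual parser: the flag names, the allowed selector and model names, and the defaults for `--id-column` and `--nested-folds` come from this document, while which flags are required and every other default are assumptions.

```python
import argparse

SELECTORS = ("boruta", "lasso", "mrmr", "mi_topk")
MODELS = ("logit", "rf", "xgb")


def csv_choices(allowed):
    """Build an argparse type that splits a comma list and rejects unknown names."""
    def parse(value):
        items = [item.strip() for item in value.split(",") if item.strip()]
        unknown = [item for item in items if item not in allowed]
        if unknown:
            raise argparse.ArgumentTypeError("unknown value(s): " + ", ".join(unknown))
        return items
    return parse


def build_parser():
    parser = argparse.ArgumentParser(description="Biomarker Signature Studio (sketch)")
    parser.add_argument("--expression", required=True)
    parser.add_argument("--metadata", required=True)
    parser.add_argument("--label-column", required=True)
    parser.add_argument("--id-column", default="sample_id")
    parser.add_argument("--feature-list")
    parser.add_argument("--output-dir", required=True)
    # Defaults below are assumptions, except --nested-folds (documented as 5).
    parser.add_argument("--selectors", type=csv_choices(SELECTORS), default=list(SELECTORS))
    parser.add_argument("--models", type=csv_choices(MODELS), default=["logit"])
    parser.add_argument("--k-features", type=int, default=50)
    parser.add_argument("--survival", action="store_true")
    parser.add_argument("--random-state", type=int, default=0)
    parser.add_argument("--nested-folds", type=int, default=5)
    return parser


# Parse the arguments from the quick-usage command instead of reading sys.argv.
args = build_parser().parse_args([
    "--expression", "data/expression.csv",
    "--metadata", "data/metadata.csv",
    "--label-column", "phenotype",
    "--selectors", "boruta,lasso,mrmr",
    "--models", "rf,logit",
    "--output-dir", "outputs/biomarkers_run1",
])
```

Validating the comma lists at parse time means a typo such as `--selectors borutta` fails immediately rather than partway through a long run.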

Workflow

Load + align inputs, infer orientation, impute missing values.
Standardize features (fit on train set only).
Run requested selectors; create intersection + union candidate lists.
For each selector's output, run nested CV training across the requested models.
Export champion metrics (`metrics.json`), feature table (`selected_features.csv`), SHAP summary (`shap_summary.csv`, when available), and survival stats (`survival.json`).
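The step that builds intersection and union candidate lists from the selector outputs can be sketched with a vote count per feature. This is an assumption about how the lists are derived, not the script's code; `combine_selections` and the optional `min_votes` threshold are hypothetical, and the gene names are toy data.

```python
from collections import Counter


def combine_selections(selections, min_votes=None):
    """Combine per-selector feature lists into union, intersection, and voted lists.

    selections maps a selector name to the features it picked. With min_votes
    left as None, the voted list equals the strict intersection.
    """
    votes = Counter()
    for features in selections.values():
        votes.update(set(features))  # each selector votes at most once per feature
    n_selectors = len(selections)
    threshold = n_selectors if min_votes is None else min_votes
    return {
        "union": sorted(votes),
        "intersection": sorted(f for f, v in votes.items() if v == n_selectors),
        "voted": sorted(f for f, v in votes.items() if v >= threshold),
    }


# Toy selector outputs (hypothetical).
selections = {
    "boruta": ["TP53", "EGFR", "MYC"],
    "lasso": ["TP53", "MYC", "KRAS"],
    "mrmr": ["TP53", "EGFR", "PTEN"],
}
combined = combine_selections(selections, min_votes=2)
```

Here only `TP53` survives the strict intersection, while a two-vote threshold also keeps `EGFR` and `MYC`. A looser threshold can help when individual selectors disagree on small cohorts.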

QC Expectations

Class count ratio ≤3:1; warnings logged otherwise.
Selected features between 5 and 250 unless user overrides.
Nested CV AUC ≥0.70 or flagged in report.
SHAP overlap with selected features ≥60% (reported).
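The four expectations above can be expressed as a small checker that returns human-readable flags. This is a sketch using the thresholds listed in this section; the function name `qc_flags` is hypothetical, and SHAP overlap is assumed here to mean the fraction of selected features that also appear among the top SHAP-ranked features, which may differ from the script's definition.

```python
def qc_flags(class_counts, selected, nested_cv_auc, shap_top):
    """Return a list of QC warnings; an empty list means all expectations are met."""
    flags = []

    counts = list(class_counts.values())
    ratio = max(counts) / min(counts) if min(counts) > 0 else float("inf")
    if ratio > 3:
        flags.append(f"class imbalance {ratio:.1f}:1 exceeds 3:1")

    if not 5 <= len(selected) <= 250:
        flags.append(f"{len(selected)} selected features outside the 5-250 range")

    if nested_cv_auc < 0.70:
        flags.append(f"nested CV AUC {nested_cv_auc:.2f} below 0.70")

    # Assumed definition: share of selected features found among top SHAP features.
    overlap = len(set(selected) & set(shap_top)) / len(selected) if selected else 0.0
    if overlap < 0.60:
        flags.append(f"SHAP overlap {overlap:.0%} below 60%")

    return flags


# Toy run (hypothetical): imbalanced classes and weak SHAP agreement.
selected = ["G1", "G2", "G3", "G4", "G5", "G6"]
shap_top = ["G1", "G2", "G3", "G9", "G10", "G11"]
flags = qc_flags({"case": 40, "control": 10}, selected, 0.74, shap_top)
```

In this toy run the class ratio (4:1) and the SHAP overlap (50%) are flagged, while the feature count and AUC pass.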

Related Assets

`examples/configs/biomarker_studio_template.yaml` (scaffold for teams)
`scripts/biomarker_signature_studio.py` (entry point)
Existing biomarker workflow skill for orchestrated runs.

Use this skill whenever you need a ready-to-review biomarker dossier (data QC, model metrics, explainability artifacts) before moving to validation cohorts or lab assays.