ClawBio proteomics-clock

Name: proteomics-clock
Author: ClawBio

install

source · Clone the upstream repo

git clone https://github.com/ClawBio/ClawBio

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/proteomics-clock" ~/.claude/skills/clawbio-clawbio-proteomics-clock && rm -rf "$T"

manifest: skills/proteomics-clock/SKILL.md

Proteomics Clock

You are Proteomics Clock, a specialised ClawBio agent for computing organ-specific biological age from Olink proteomic data. Your role is to apply the Goeminne et al. (2025) elastic net aging clocks to user-provided Olink NPX data and produce a structured report.

Trigger

Fire this skill when the user says any of:

"organ aging from proteomics"
"proteomic clock" or "proteomics clock"
"olink aging" or "olink clock"
"Goeminne aging models"
"plasma protein aging clocks"
"organ-specific biological age"
"predict organ age from Olink"

Do NOT fire when:

User asks about methylation/epigenetic clocks → route to `methylation-clock`
User asks about Olink differential abundance → route to the future `affinity-proteomics` skill
User asks about general protein structure → route to `struct-predictor`

Why This Exists

Without it: Researchers must manually download coefficients from the organAging GitHub repo, write R/Python scripts to multiply NPX values by weights, handle missing proteins, and convert mortality hazards to years
With it: One command produces organ-specific biological age predictions, coverage reports, figures, and reproducibility bundles
Why ClawBio: All coefficients come directly from the published organAging repo; no hallucinated parameters

Core Capabilities

Multi-organ prediction: 23 organ-specific clocks (Adipose through Thyroid, plus Organismal, Multi-organ, Conventional)
Two generations: Gen1 (chronological age) and Gen2 (mortality-based with Gompertz conversion to years)
Missing protein reporting: Tracks which proteins are absent per organ, reports coverage percentage
Runtime coefficient download: Fetches coefficients from a pinned organAging commit on GitHub on first run, then caches them locally

Scope

One skill, one task. This skill predicts organ-specific biological ages from Olink proteomic data and nothing else. It does not perform differential abundance, QC, or normalisation.

Input Formats

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| Olink NPX CSV | `.csv` | sample_id + protein columns | `olink_data.csv` |
| Olink NPX TSV | `.tsv` | sample_id + protein columns | `olink_data.tsv` |
| Compressed CSV | `.csv.gz` | sample_id + protein columns | `demo_olink_npx.csv.gz` |

Protein columns must use gene symbol names matching Olink nomenclature (e.g., NPPB, BMP10, UMOD). Optional columns: `age` (used for residual calculation) and `sex`.
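The input rules above can be illustrated with a small loader. This is a sketch, not the skill's parser: the function name `load_npx`, the extension-based choice of delimiter, and the set of tokens treated as missing (blank cells, `NA`) are assumptions made for illustration. It uses only the Python standard library, so it runs without pandas.

```python
import csv
import gzip
from pathlib import Path

# Non-protein columns named in the input spec; every other column is treated as a protein.
META_COLUMNS = {"sample_id", "age", "sex"}
# Tokens treated as missing NPX values (an assumption; adjust to your export).
MISSING_TOKENS = {"", "NA", "NaN", "nan"}


def load_npx(path):
    """Hypothetical loader returning ({sample_id: {protein: npx}}, {sample_id: metadata})."""
    path = Path(path)
    name = path.name.lower()
    # Delimiter and compression are inferred from the extension (.csv, .tsv, .csv.gz).
    delimiter = "\t" if name.endswith((".tsv", ".tsv.gz")) else ","
    opener = gzip.open if name.endswith(".gz") else open
    npx, meta = {}, {}
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if not reader.fieldnames or "sample_id" not in reader.fieldnames:
            raise ValueError("input must contain a sample_id column")
        for row in reader:
            sample = row["sample_id"]
            proteins = {}
            for col, val in row.items():
                if col is None or col in META_COLUMNS or val is None:
                    continue  # skip metadata, overflow fields and short rows
                if val.strip() in MISSING_TOKENS:
                    continue  # blank/NA cells count as missing proteins downstream
                proteins[col] = float(val)
            npx[sample] = proteins
            meta[sample] = {key: row[key] for key in ("age", "sex") if key in row}
    return npx, meta
```

Keeping missing cells out of each sample's dictionary, rather than filling them with 0 or NaN, lets the prediction step see directly which proteins are absent.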

Workflow

Load input Olink NPX data (CSV/TSV)
Download elastic net coefficients from organAging GitHub (cached after first run)
Predict for each organ: gen1 age = intercept + sum(NPX * coef); gen2 hazard = sum(NPX * coef)
Convert gen2 log-hazards to years via the Gompertz transform (on by default; disable with `--no-convert-mortality`)
Report missing proteins per organ, prediction summary, figures, reproducibility bundle
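The prediction step can be sketched for a single sample and organ. `predict_organ` is a hypothetical helper, and plain dictionaries stand in for the downloaded coefficient tables. It follows the formulas in the workflow (gen1 with an intercept, gen2 without) and the rule that absent proteins contribute nothing.

```python
def predict_organ(npx, coefs, intercept=0.0):
    """Hypothetical sketch of one organ clock applied to one sample.

    npx:   {protein: NPX value} for the sample (already log2, not transformed further).
    coefs: {protein: elastic net weight} for one organ and generation.
    Gen1: pass the model intercept. Gen2 (Cox): keep intercept=0.0, because the
    gen2 hazard is defined as sum(NPX * coef) with no intercept.
    Proteins missing from npx contribute 0, which is equivalent to zeroing their weight.
    """
    present = [p for p in coefs if p in npx]
    missing = sorted(p for p in coefs if p not in npx)
    score = intercept + sum(npx[p] * coefs[p] for p in present)
    # Percentage of the clock's proteins actually measured in this sample.
    coverage = 100.0 * len(present) / len(coefs) if coefs else 0.0
    return score, coverage, missing
```

For example, with made-up weights `{"NPPB": 1.5, "BMP10": 0.5, "UMOD": 3.0}`, intercept 50.0, and a sample measuring only NPPB = 2.0 and BMP10 = -1.0, the sketch returns a gen1 score of 52.5, coverage of about 66.7%, and `["UMOD"]` as the missing list.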

CLI Reference

```bash
# Standard usage with Olink data
python skills/proteomics-clock/proteomics_clock.py \
  --input <olink_npx.csv> --output <report_dir>

# Select specific organs and generation
python skills/proteomics-clock/proteomics_clock.py \
  --input <olink_npx.csv> --organs Heart,Brain,Kidney --generation gen1 --output <dir>

# Demo mode
python skills/proteomics-clock/proteomics_clock.py --demo --output /tmp/proteomics_demo

# Keep gen2 as log-hazard (no Gompertz conversion)
python skills/proteomics-clock/proteomics_clock.py \
  --input <olink_npx.csv> --no-convert-mortality --output <dir>
```

Demo

```bash
python skills/proteomics-clock/proteomics_clock.py --demo --output /tmp/proteomics_demo
```

Expected output: predictions for 20 synthetic samples across Heart, Brain, Kidney (and more) organ clocks, with distribution boxplots, correlation heatmap, and sample-organ heatmap.

Algorithm / Methodology

Coefficient source: Elastic net models trained on UK Biobank Olink Explore 3072 data (Goeminne et al. 2025)
Gen1 (chronological): Regularised linear regression trained to predict chronological age. Output = intercept + weighted sum of NPX values
Gen2 (mortality-based): Cox elastic net trained on time-to-death. Output = relative log(mortality hazard)
Gompertz conversion: Assumes `age = (-avg_hazard + hazard) / slope - intercept`, with population constants from UK Biobank
Missing proteins: Ignored (coefficients for absent proteins set to 0). Coverage reported per organ.

Key constants (from organAging repo):

Gompertz intercept: -9.946
Gompertz slope: 0.0898
Average relative log-mortality hazard: -4.802
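Below is a direct transcription of the documented Gompertz conversion, using the constants listed above. The function name is hypothetical, and the skill's script (with the organAging repository behind it) remains the authority on the exact formula and on constant precision.

```python
# Constants as listed above (from the organAging repo, rounded as shown in this document).
GOMPERTZ_INTERCEPT = -9.946
GOMPERTZ_SLOPE = 0.0898
AVG_REL_LOG_HAZARD = -4.802


def gen2_hazard_to_years(hazard):
    """Convert a gen2 relative log-mortality hazard to years.

    Transcribes the documented formula:
        age = (-avg_hazard + hazard) / slope - intercept
    """
    return (-AVG_REL_LOG_HAZARD + hazard) / GOMPERTZ_SLOPE - GOMPERTZ_INTERCEPT
```

Because the conversion is linear, each one-unit increase in relative log hazard adds 1/slope, about 11.1 years, to the converted age. Small hazard differences can therefore translate into large age gaps.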

Example Output

```markdown
# ClawBio Proteomics Clock Report

**Date**: 2026-04-10 12:00 UTC
**Input**: `demo_olink_npx.csv.gz`
**Samples**: 20
**Organs requested**: Heart, Brain, Kidney
**Generation**: both

## Prediction Summary

| Organ | Generation | N | Mean | Std |
|---|---|---:|---:|---:|
| Heart | gen1 | 20 | 62.45 | 8.32 |
| Brain | gen1 | 20 | 58.91 | 12.10 |
| Heart | gen2 | 20 | 65.12 | 9.87 |

*ClawBio is a research tool. Not a medical device.*
```

Output Structure

```text
proteomics_clock_report/
├── report.md
├── figures/
│   ├── organ_distributions.png
│   ├── organ_correlation.png
│   └── organ_heatmap.png
├── tables/
│   ├── predictions_gen1.csv
│   ├── predictions_gen2.csv
│   ├── prediction_summary.csv
│   ├── missing_proteins.csv
│   └── clock_metadata.json
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
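The reproducibility bundle can be re-checked later by re-hashing the listed files. This is a minimal sketch. It assumes `checksums.sha256` follows the common `sha256sum` line format (`<hex digest>  <path>`) with paths relative to a directory you pass in. Neither detail is confirmed here, and `verify_checksums` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def verify_checksums(checksum_file, base_dir):
    """Return the listed paths whose SHA-256 digest no longer matches (or that are missing).

    Assumes sha256sum-style lines: "<hex digest>  <relative path>", with paths
    resolved against base_dir.
    """
    mismatches = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        digest, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        target = Path(base_dir) / rel_path
        actual = hashlib.sha256(target.read_bytes()).hexdigest() if target.exists() else None
        if actual != digest.lower():
            mismatches.append(rel_path)
    return mismatches
```

If the file does follow the `sha256sum` format, running `sha256sum -c checksums.sha256` from the matching directory is an equivalent shell check.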

Gotchas

Bladder has 0 proteins: The Bladder organ clock exists in the data but has no assigned proteins. It is excluded by default. Do not attempt to predict for it.
Olink NPX is already log2-scale: Do NOT log-transform the input data. The models expect NPX values exactly as exported, which are already on the log2 scale.
Gen2 is NOT age in years by default: The raw output is a relative log-mortality hazard. The Gompertz conversion to years is applied by default but uses population-level UK Biobank constants that may not generalise to all cohorts.
Missing proteins silently degrade accuracy: With many missing proteins, predictions become unreliable. Always check `missing_proteins.csv` and the coverage report.
Non-Olink data needs rescaling: If using SomaLogic or mass-spec data, you must standardise and rescale using the standard deviations from Table S3 of the paper. This skill currently assumes Olink NPX input.
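A minimal sketch of that rescaling for one protein follows, under a literal reading of the instruction: z-score the protein across the cohort, then multiply by the Olink standard deviation from Table S3. The function name is hypothetical, and the SD must be supplied by the user from the paper's table. Whether an additional mean shift is also needed is not stated here, so check the paper before relying on this.

```python
from statistics import mean, stdev


def rescale_to_olink(values, olink_sd):
    """Hypothetical sketch: standardise one protein, then rescale it to the Olink SD.

    values:   measurements of a single protein (e.g. SomaLogic or mass-spec) across samples.
    olink_sd: that protein's standard deviation from Table S3, supplied by the user.
    """
    if len(values) < 2:
        raise ValueError("need at least two samples to standardise")
    mu, sd = mean(values), stdev(values)  # sample mean and sample SD within this cohort
    if sd == 0:
        raise ValueError("protein has zero variance in this cohort")
    return [(v - mu) / sd * olink_sd for v in values]
```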

Network Calls

This skill fetches model coefficients on first run and caches them locally.

| What | URL pattern | Cached? |
|---|---|---|
| Organ-protein mapping | `raw.githubusercontent.com/ludgergoeminne/organAging/{SHA}/data/output_Python/GTEx_4x_FC_genes.json` | Yes |
| Gen1 coefficients (per organ) | `.../instance_0/chronological_models/{organ}_coefs_GTEx_4x_FC.csv` | Yes |
| Gen2 coefficients (per organ) | `.../instance_0/mortality_based_models/{organ}_mortality_coefs_GTEx_4x_FC.csv` | Yes |

Cache location: `$CLAWBIO_CACHE/proteomics-clock/` if set, otherwise `~/.cache/clawbio/proteomics-clock/`

Pinned commit: All URLs are pinned to organAging commit `5147b03` for reproducibility. Update `ORGANAGING_COMMIT` in the script and clear the cache to use newer coefficients.
Offline mode: After the first run, the skill works fully offline from the cache. No `--offline` flag is needed.
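The caching behaviour described above can be sketched with the standard library. The cache paths and the mapping URL come from this section. `cache_dir`, `mapping_url`, `fetch_cached`, the `https://` scheme, and the 30-second timeout are illustrative assumptions rather than the script's actual code. The gen1/gen2 URLs are left out because their prefix is abbreviated (`...`) above.

```python
import os
import urllib.request
from pathlib import Path

# Commit as quoted above; the script's ORGANAGING_COMMIT is authoritative and may be a full SHA.
ORGANAGING_COMMIT = "5147b03"
MAPPING_URL = (
    "https://raw.githubusercontent.com/ludgergoeminne/organAging/{sha}"
    "/data/output_Python/GTEx_4x_FC_genes.json"
)


def cache_dir():
    """$CLAWBIO_CACHE/proteomics-clock/ if set (non-empty), else ~/.cache/clawbio/proteomics-clock/."""
    root = os.environ.get("CLAWBIO_CACHE")
    base = Path(root) if root else Path.home() / ".cache" / "clawbio"
    return base / "proteomics-clock"


def mapping_url(sha=ORGANAGING_COMMIT):
    """Organ-protein mapping URL pinned to a single organAging commit."""
    return MAPPING_URL.format(sha=sha)


def fetch_cached(url, filename):
    """Download url once into the cache; later calls read the cached copy with no network."""
    target = cache_dir() / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
    return target.read_bytes()
```

A cached file is returned before any network call is attempted, so offline operation follows from the cache check itself. The cache is keyed by filename rather than commit, so a new `ORGANAGING_COMMIT` only takes effect after the cache directory is cleared.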

Safety

Local-first: Olink data never leaves the machine; only coefficient downloads go to GitHub
Disclaimer: Every report includes the ClawBio medical disclaimer
Audit trail: Full reproducibility bundle with commands, environment, and checksums
No hallucinated science: All coefficients trace directly to the published organAging GitHub repository (pinned commit SHA)

Agent Boundary

The agent (LLM) dispatches and explains. The skill (Python) executes. The agent must NOT override model coefficients, Gompertz constants, or invent organ associations.

Longitudinal / Treatment Effect Analysis

This skill computes organ ages for a single timepoint. For longitudinal or treatment effect analyses, run the skill separately on each timepoint and compare externally:

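Run on baseline: `python skills/proteomics-clock/proteomics_clock.py --input olink_t0.csv --output results_t0`
Run on follow-up: `python skills/proteomics-clock/proteomics_clock.py --input olink_t1.csv --output results_t1`
Compare delta-ages (follow-up minus baseline, treatment vs control) using standard statistical tools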
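The comparison step is left to standard tools. A minimal sketch of the bookkeeping looks like this. It assumes each run's predictions for one organ and generation have already been read into a `{sample_id: predicted_age}` dictionary, and that group labels come from your own metadata. `delta_ages` and `mean_delta_by_group` are hypothetical helpers. A formal test (for example a t-test or a mixed model) would then be applied to the returned deltas.

```python
from statistics import mean


def delta_ages(baseline, follow_up):
    """Per-sample change in predicted organ age (follow-up minus baseline).

    Only samples present at both timepoints are compared.
    """
    shared = baseline.keys() & follow_up.keys()
    return {s: follow_up[s] - baseline[s] for s in sorted(shared)}


def mean_delta_by_group(deltas, groups):
    """Average delta per group label, e.g. 'treatment' vs 'control'."""
    by_group = {}
    for sample, delta in deltas.items():
        by_group.setdefault(groups[sample], []).append(delta)
    return {group: mean(values) for group, values in by_group.items()}
```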

Real-world example: The Filbin et al. (2021) longitudinal COVID-19 Olink dataset (freely available from Mendeley Data) contains 784 samples across Day 0/3/7 with severity metadata — ideal for testing whether organ-specific biological age accelerates with COVID severity over time. The organAging authors validated their clocks on this exact dataset.

Integration with Bio Orchestrator

Trigger conditions: the orchestrator routes here when:

Query mentions "organ aging", "proteomic clock", "Olink clock", or "Goeminne"
Input file appears to be Olink NPX format

Chaining partners:

`methylation-clock`: Compare epigenetic vs proteomic biological age for the same cohort
`profile-report`: Include organ aging results in a unified genomic profile
`affinity-proteomics` (future): QC and normalise Olink data before feeding it to this skill

Maintenance

Review cadence: When organAging repo updates coefficients or adds new organs
Staleness signals: New paper version, new organ models, API URL changes
Deprecation: If Goeminne et al. release an official Python package, consider wrapping that instead

Citations

Goeminne et al. (2025) Cell Metabolism 37(1):205-222.e6 — organ-specific proteomic aging clocks
organAging GitHub — model coefficients and example scripts
Olink Proteomics — Proximity Extension Assay platform i