LLMs-Universal-Life-Science-and-Clinical-Skills- proteomics-de
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Proteomics/proteomics-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-proteomics-de && rm -rf "$T"
manifest:
Skills/Proteomics/proteomics-de/SKILL.mdsource content
⚖️ Differential Protein Abundance
Statistical testing for differentially abundant proteins between experimental conditions.
Core Capabilities
- MSstats: Feature-level mixed models with group comparison
- limma: Empirical Bayes moderated t-tests on protein-level data
- proDA: Probabilistic handling of missing values
- scipy/statsmodels (Python): T-test with FDR correction
- Visualization: Volcano plots, heatmaps
CLI Reference
python omicsclaw.py run differential-abundance --demo python omicsclaw.py run differential-abundance --input <protein_matrix.csv> --output <dir>
Algorithm / Methodology
MSstats Group Comparison (R)
library(MSstats) comparison_matrix <- matrix(c(1, -1, 0, 0, 1, 0, -1, 0, 0, 1, -1, 0), nrow = 3, byrow = TRUE) rownames(comparison_matrix) <- c('Treatment1-Control', 'Treatment2-Control', 'Treatment1-Treatment2') colnames(comparison_matrix) <- c('Control', 'Treatment1', 'Treatment2', 'Treatment3') results <- groupComparison(contrast.matrix = comparison_matrix, data = processed) sig_proteins <- results$ComparisonResult[results$ComparisonResult$adj.pvalue < 0.05 & abs(results$ComparisonResult$log2FC) > 1, ]
limma for Proteomics (R)
library(limma) design <- model.matrix(~ 0 + condition, data = sample_info) colnames(design) <- levels(sample_info$condition) fit <- lmFit(protein_matrix, design) contrast_matrix <- makeContrasts(Treatment - Control, levels = design) fit2 <- contrasts.fit(fit, contrast_matrix) fit2 <- eBayes(fit2) results <- topTable(fit2, number = Inf, adjust.method = 'BH') sig_results <- results[results$adj.P.Val < 0.05 & abs(results$logFC) > 1, ]
proDA (Missing Value-Aware)
library(QFeatures) library(proDA) fit <- proDA(protein_matrix, design = ~ condition, data = sample_info) results <- test_diff(fit, contrast = 'conditionTreatment') results$adj_pval <- p.adjust(results$pval, method = 'BH') sig_results <- results[results$adj_pval < 0.05 & abs(results$diff) > 1, ]
Python: scipy/statsmodels
import pandas as pd import numpy as np from scipy import stats from statsmodels.stats.multitest import multipletests def differential_test(intensities, group1_cols, group2_cols): results = [] for protein in intensities.index: g1 = intensities.loc[protein, group1_cols].dropna() g2 = intensities.loc[protein, group2_cols].dropna() if len(g1) >= 2 and len(g2) >= 2: stat, pval = stats.ttest_ind(g1, g2) log2fc = g2.mean() - g1.mean() results.append({'protein': protein, 'log2FC': log2fc, 'pvalue': pval}) df = pd.DataFrame(results) df['adj_pvalue'] = multipletests(df['pvalue'], method='fdr_bh')[1] return df sig = results[(results['adj_pvalue'] < 0.05) & (abs(results['log2FC']) > 1)]
Volcano Plot (R)
library(ggplot2) ggplot(results, aes(x = log2FC, y = -log10(adj.P.Val))) + geom_point(aes(color = significant), alpha = 0.6) + geom_hline(yintercept = -log10(0.05), linetype = 'dashed') + geom_vline(xintercept = c(-1, 1), linetype = 'dashed') + scale_color_manual(values = c('grey', 'red')) + theme_minimal()
Parameters
| Parameter | Default | Description |
|---|---|---|
| | limma, msstats, proda, ttest |
| | FDR threshold |
| | Log2FC threshold |
Why This Exists
- Without it: T-tests assume independence and fail on complex experimental designs or missing MS values
- With it: Validated mixed-models (MSstats) and empirical Bayes (Limma) correctly handle imputation and variances
- Why OmicsClaw: Standardizes notoriously strict R packages into a simple Python wrapper
Workflow
- Calculate: Prepare contrast matrix and condition vectors.
- Execute: Calculate log ratios and moderated t-statistics per protein.
- Assess: Perform Benjamini-Hochberg FDR correction.
- Generate: Output structured significant hits.
- Report: Synthesize Volcano plots and heatmaps.
Example Queries
- "Run MSstats differential abundance on this data"
- "Find differentially expressed proteins using limma"
Output Structure
output_directory/ ├── report.md ├── result.json ├── statistics.csv ├── figures/ │ └── volcano_plot.png ├── tables/ │ └── differential_results.csv └── reproducibility/ ├── commands.sh ├── environment.yml └── checksums.sha256
Safety
- Local-first: Strict offline processing without external upload.
- Disclaimer: Requires OmicsClaw reporting structures and disclaimers.
- Audit trail: Hyperparameters and operational flow states are logged fully.
Integration with Orchestrator
Trigger conditions:
- Automatically invoked dynamically based on tool metadata and user intent matching.
Chaining partners:
— Upstream normalized expressionquantification
— Downstream pathway enrichmentprot-enrichment
Version Compatibility
Reference examples tested with: limma 3.58+, MSstats 4.8+, scipy 1.12+, statsmodels 0.14+
Dependencies
Required: numpy, pandas, scipy, statsmodels Optional: MSstats (R), limma (R), proDA (R), QFeatures (R)
Citations
- MSstats — Choi et al., MCP 2014
- limma — Ritchie et al., Nucleic Acids Research 2015
- proDA — Ahlmann-Eltze & Anders, bioRxiv 2019
Related Skills
— Prepare normalized data for testingquantification
— Pathway enrichment of significant proteinsprot-enrichment
— PTM-level differential analysisptm