BioClaw proteomics

Mass spectrometry proteomics QC, quantification, comparative analysis, and export for DDA, DIA, and protein-level result tables.

install
source · Clone the upstream repo
git clone https://github.com/Runchuan-BU/BioClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Runchuan-BU/BioClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/container/skills/proteomics" ~/.claude/skills/runchuan-bu-bioclaw-proteomics && rm -rf "$T"
manifest: container/skills/proteomics/SKILL.md
source content

Proteomics

Version Compatibility

Reference examples assume:

  • pyopenms
    3.0+
  • pandas
    2.2+
  • numpy
    1.26+
  • seaborn
    0.13+

Overview

Use this skill when the user needs:

  • proteomics QC
  • protein table cleanup
  • replicate review
  • differential abundance analysis
  • publication-ready proteomics figures

When To Use This Skill

  • MaxQuant, FragPipe, DIA-NN, or similar outputs exist
  • the task is protein-level quantification or comparative proteomics
  • missingness, batch effects, and replicate quality need review before interpretation

Quick Route

  • DDA and DIA should not be treated identically
  • protein-level tables should remain distinct from peptide-level tables
  • QC comes before differential analysis

Progressive Disclosure

Expected Inputs

  • protein or peptide result table
  • sample metadata
  • assay context: DDA, DIA, PTM-enriched, or targeted

Expected Outputs

  • results/protein_abundance.tsv
  • qc/proteomics_qc_summary.tsv
  • figures/correlation_heatmap.pdf
  • figures/missingness.pdf
  • results/differential_proteins.tsv

Starter Pattern

import pandas as pd

protein_df = pd.read_csv("protein_groups.tsv", sep="\t")
sample_cols = [c for c in protein_df.columns if c.startswith("LFQ intensity")]
matrix = protein_df[sample_cols].replace(0, pd.NA)
qc = pd.DataFrame({
    "n_proteins": matrix.notna().sum(),
    "missing_pct": matrix.isna().mean() * 100,
})
qc.to_csv("qc/proteomics_qc_summary.tsv", sep="\t")

Workflow

1. Clarify assay and table level

  • DDA versus DIA
  • peptide versus protein table
  • PTM-enriched versus unenriched data

2. Run QC before comparisons

Inspect:

  • missingness
  • replicate correlation
  • batch effects
  • intensity distributions

3. Normalize and summarize consistently

Keep the normalization approach explicit and do not collapse peptides into proteins without documenting the rule.

4. Perform comparative analysis

Use replicate-aware differential abundance with clear filtering and missingness policy.

5. Export interpretable artifacts

Save both the cleaned abundance matrix and the differential results table.

Output Artifacts

results/
├── protein_abundance.tsv
└── differential_proteins.tsv
qc/
└── proteomics_qc_summary.tsv
figures/
├── correlation_heatmap.pdf
├── missingness.pdf
└── intensity_density.pdf

Quality Review

  • overall missingness
    > 30%
    should trigger caution
  • technical replicate correlation should usually be
    > 0.9
  • biological replicate correlation much below
    0.8
    deserves review
  • do not trust differential calls before batch structure and missingness are understood

Anti-Patterns

  • mixing peptide and protein tables in one downstream matrix
  • running differential abundance before QC
  • ignoring missingness patterns
  • hiding whether values are raw, normalized, or imputed

Related Skills

  • Metabolomics
  • Structural Biology

Optional Supplements

  • pyopenms