LLMs-Universal-Life-Science-and-Clinical-Skills- metabolomics-xcms-preprocessing
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Metabolomics/metabolomics-xcms-preprocessing" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-metabolomics-xcms- && rm -rf "$T"
manifest: Skills/Metabolomics/metabolomics-xcms-preprocessing/SKILL.md
🧪 XCMS Metabolomics Preprocessing
XCMS3 workflow for untargeted LC-MS/GC-MS metabolomics. Requires Bioconductor 3.18+ with xcms 4.0+ and MSnbase 2.28+.
Core Capabilities
- Peak detection: CentWave (centroided) or MatchedFilter (profile data)
- RT alignment: Obiwarp dynamic time warping for retention time correction
- Peak correspondence: Density-based grouping across samples
- Gap filling: Recover missing values by region integration
- CAMERA annotation: Isotope pattern and adduct group identification
CLI Reference
    python omicsclaw.py run xcms-preprocess --demo
    python omicsclaw.py run xcms-preprocess --input <raw_data/> --output <dir>
Algorithm / Methodology
Load Raw Data
    library(xcms)
    library(MSnbase)

    # Collect mzML/mzXML files and read them in on-disk mode
    raw_files <- list.files('raw_data', pattern = '\\.(mzML|mzXML)$', full.names = TRUE)
    raw_data <- readMSData(raw_files, mode = 'onDisk')
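An empty or mistyped input directory can surface later as a less obvious reader error, so it may help to fail fast when no matching files are found. A minimal base-R sketch; `find_raw_files` is a hypothetical helper, not part of xcms or MSnbase, and the demo uses empty placeholder files rather than real spectra:

```r
# Hypothetical helper: list mzML/mzXML files and stop early if none are found.
find_raw_files <- function(dir) {
  files <- list.files(dir, pattern = '\\.(mzML|mzXML)$', full.names = TRUE)
  if (length(files) == 0) {
    stop('No .mzML or .mzXML files found in: ', dir)
  }
  files
}

# Demonstration on a temporary directory with empty placeholder files
demo_dir <- file.path(tempdir(), 'raw_data_demo')
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c('ctrl_01.mzML', 'trt_01.mzXML', 'notes.txt')))

raw_files_demo <- find_raw_files(demo_dir)
print(basename(raw_files_demo))
```

The returned vector can be passed to `readMSData()` in place of the bare `list.files()` call.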
Define Sample Groups
    # Group labels must match the number and order of raw_files (13 files in this example)
    sample_info <- data.frame(
        sample_name = basename(raw_files),
        sample_group = c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3)),
        injection_order = seq_along(raw_files)
    )
    pData(raw_data) <- sample_info
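The `rep()` counts in the sample table only fit a run of exactly 13 files in that order. One alternative is to derive the group from each file name. The sketch below assumes a hypothetical naming convention in which every file starts with its group label followed by an underscore; adapt the pattern to your own files.

```r
# Assumed (hypothetical) naming convention: <Group>_<index>.mzML
example_files <- c('raw_data/Control_01.mzML', 'raw_data/QC_01.mzML',
                   'raw_data/Treatment_01.mzML', 'raw_data/Treatment_02.mzML')

# Take everything before the first underscore as the group label
group_from_name <- function(files) {
  sub('_.*$', '', basename(files))
}

sample_info_demo <- data.frame(
  sample_name = basename(example_files),
  sample_group = group_from_name(example_files),
  injection_order = seq_along(example_files)  # assumes files are listed in run order
)
print(sample_info_demo)
```

Because the labels are computed from the file vector itself, the table always has one row per file.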
Peak Detection (CentWave — Centroided Data)
    cwp <- CentWaveParam(
        peakwidth = c(5, 30),     # Peak width range in seconds
        ppm = 15,                 # m/z tolerance (ppm)
        snthresh = 10,            # Signal-to-noise threshold
        prefilter = c(3, 1000),   # Keep mass traces with >= 3 centroids of intensity >= 1000
        mzdiff = 0.01,            # Minimum m/z difference between overlapping peaks
        noise = 1000,             # Centroids below this intensity are ignored
        integrate = 1L            # Integration method
    )
    xdata <- findChromPeaks(raw_data, param = cwp)
    cat('Peaks found:', nrow(chromPeaks(xdata)), '\n')
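The `ppm` tolerance is relative, so the absolute m/z window it allows grows with mass. A small base-R illustration of that arithmetic; the helper name `ppm_to_da` is hypothetical and not an xcms function:

```r
# Convert a relative tolerance in ppm to an absolute m/z tolerance in Da
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

mz_values <- c(100, 300, 900)
tolerance <- ppm_to_da(mz_values, ppm = 15)
print(data.frame(mz = mz_values, tol_da = tolerance))
```

At 15 ppm the window is 0.0045 Da at m/z 300 but three times wider at m/z 900, which is why `ppm` is usually chosen to reflect the instrument's mass accuracy rather than a fixed Da value.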
Peak Detection (MatchedFilter — Profile Data)
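    # binSize is the m/z bin width; it takes the place of the legacy `step` argument,
    # which MatchedFilterParam does not define
    mfp <- MatchedFilterParam(
        binSize = 0.1,
        fwhm = 30,
        snthresh = 10,
        mzdiff = 0.8
    )
    xdata_profile <- findChromPeaks(raw_data, param = mfp)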
Retention Time Alignment (Obiwarp)
    obp <- ObiwarpParam(
        binSize = 0.5,
        response = 1,
        distFun = 'cor_opt',
        gapInit = 0.3,
        gapExtend = 2.4
    )
    xdata <- adjustRtime(xdata, param = obp)
    plotAdjustedRtime(xdata)
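Alongside the deviation plot, a per-sample summary of how far retention times were shifted can flag runs that needed unusually large corrections. The sketch below uses toy numbers in base R so it runs without raw data. With real data, the two vectors would presumably come from `rtime(xdata, adjusted = FALSE)` and `rtime(xdata, adjusted = TRUE)`, and the sample index from `fromFile(xdata)`. The helper `rt_shift_summary` is hypothetical.

```r
# Hypothetical helper: largest absolute retention time shift (seconds) per sample
rt_shift_summary <- function(rt_raw, rt_adjusted, sample_idx) {
  shift <- rt_adjusted - rt_raw
  tapply(abs(shift), sample_idx, max)
}

# Toy data: three scans in each of two samples
rt_raw      <- c(10.0, 20.0, 30.0, 10.0, 20.0, 30.0)
rt_adjusted <- c(10.0, 20.0, 30.0, 11.5, 22.0, 31.0)
sample_idx  <- c(1, 1, 1, 2, 2, 2)

max_shift <- rt_shift_summary(rt_raw, rt_adjusted, sample_idx)
print(max_shift)
```

In this toy case the first sample is unchanged and the second is shifted by at most 2 seconds.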
Peak Correspondence (Grouping)
    pdp <- PeakDensityParam(
        sampleGroups = pData(xdata)$sample_group,
        bw = 5,              # RT bandwidth
        minFraction = 0.5,   # Min fraction of samples
        minSamples = 1,      # Min samples per group
        binSize = 0.025      # m/z bin size
    )
    xdata <- groupChromPeaks(xdata, param = pdp)
    cat('Features:', nrow(featureDefinitions(xdata)), '\n')
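`minFraction` and `minSamples` are evaluated per sample group: a candidate feature is kept when at least one group has the required share of its samples contributing a peak. The toy function below illustrates that rule in base R. It is a simplified sketch, not the xcms implementation, and `passes_min_fraction` is a hypothetical name.

```r
# Hypothetical helper: TRUE if at least one group meets both thresholds
passes_min_fraction <- function(peak_present, groups,
                                min_fraction = 0.5, min_samples = 1) {
  n_present <- tapply(peak_present, groups, sum)
  n_total <- tapply(peak_present, groups, length)
  any(n_present / n_total >= min_fraction & n_present >= min_samples)
}

groups <- c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3))

# Peak found in 3 of 5 Control samples only
present_a <- c(TRUE, TRUE, TRUE, FALSE, FALSE, rep(FALSE, 5), rep(FALSE, 3))

# Peak found in 5 samples overall, but never in half of any single group
present_b <- c(TRUE, TRUE, FALSE, FALSE, FALSE,
               TRUE, TRUE, FALSE, FALSE, FALSE,
               TRUE, FALSE, FALSE)

keep_a <- passes_min_fraction(present_a, groups)
keep_b <- passes_min_fraction(present_b, groups)
print(c(a = keep_a, b = keep_b))
```

The second case shows why the group assignment matters: five detections spread thinly across groups are rejected, while three detections concentrated in one group pass.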
Gap Filling
    fpp <- ChromPeakAreaParam()
    xdata <- fillChromPeaks(xdata, param = fpp)
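To judge how much gap filling recovered, compare the share of missing values before and after. With xcms this could be computed on `featureValues(xdata)` taken before and after `fillChromPeaks()`. The sketch below uses toy matrices so it runs without raw data, and `missing_fraction` is a hypothetical helper.

```r
# Hypothetical helper: share of missing entries in a feature-by-sample matrix
missing_fraction <- function(values) {
  mean(is.na(values))
}

# Toy intensity matrix before gap filling (rows = features, columns = samples)
before <- matrix(c(100, NA, 250,
                   NA,  NA, 300,
                   120, 80, 270),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c('FT1', 'FT2', 'FT3'), c('S1', 'S2', 'S3')))

# Toy result after gap filling: two of the three gaps were integrated
after <- before
after['FT1', 'S2'] <- 40
after['FT2', 'S1'] <- 15

frac_before <- missing_fraction(before)
frac_after <- missing_fraction(after)
print(c(before = frac_before, after = frac_after))
```

Values that stay missing after filling correspond to regions where no signal could be integrated.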
Extract Feature Table
    # Intensity matrix: one row per feature, one column per sample
    feature_values <- featureValues(xdata, method = 'maxint', value = 'into')

    # Attach feature m/z and retention time medians
    feature_defs <- as.data.frame(featureDefinitions(xdata))
    feature_defs$feature_id <- rownames(feature_defs)
    feature_table <- cbind(feature_defs[, c('feature_id', 'mzmed', 'rtmed')], feature_values)
    write.csv(feature_table, 'feature_table.csv', row.names = FALSE)
CAMERA Annotation (Isotopes/Adducts)
    library(CAMERA)

    # CAMERA works on the legacy xcmsSet class
    xsa <- xsAnnotate(as(xdata, 'xcmsSet'))
    xsa <- groupFWHM(xsa, perfwhm = 0.6)
    xsa <- findIsotopes(xsa, mzabs = 0.01, ppm = 10)
    xsa <- findAdducts(xsa, polarity = 'positive')
    camera_results <- getPeaklist(xsa)
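Isotope annotation rests on a simple mass relationship: a 13C isotopologue sits about 1.003355 Da divided by the charge above the monoisotopic peak. The base-R check below illustrates that arithmetic with a combined absolute and ppm tolerance, similar in spirit to the `mzabs` and `ppm` arguments. It is a simplified illustration, not CAMERA's algorithm, and `is_isotope_pair` and the example m/z values are hypothetical.

```r
C13_DELTA <- 1.003355  # mass difference between 13C and 12C in Da

# Hypothetical helper: does mz_candidate sit one 13C spacing above mz_mono?
is_isotope_pair <- function(mz_mono, mz_candidate, charge = 1,
                            mzabs = 0.01, ppm = 10) {
  expected <- mz_mono + C13_DELTA / charge
  tolerance <- mzabs + expected * ppm / 1e6
  abs(mz_candidate - expected) <= tolerance
}

singly_charged <- is_isotope_pair(181.0707, 182.0741)              # spacing ~1.0034
doubly_charged <- is_isotope_pair(181.0707, 181.5724, charge = 2)  # spacing ~0.5017
unrelated      <- is_isotope_pair(181.0707, 182.1500)              # spacing ~1.0793

print(c(singly = singly_charged, doubly = doubly_charged, unrelated = unrelated))
```

The halved spacing for the doubly charged case is what lets isotope patterns also indicate charge state.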
Quality Control
    # TIC for each sample
    tic <- chromatogram(raw_data, aggregationFun = 'sum')
    plot(tic)

    # Peak count per sample
    peak_counts <- table(chromPeaks(xdata)[, 'sample'])
    barplot(peak_counts, main = 'Peaks per sample')

    # PCA of features (log2-transformed; remaining missing values set to 0)
    library(pcaMethods)
    log_values <- log2(feature_values + 1)
    log_values[is.na(log_values)] <- 0
    pca_res <- pca(t(log_values), nPcs = 3, method = 'ppca')  # avoid masking pca()
    plotPcs(pca_res, col = as.factor(pData(xdata)$sample_group))
Export for MetaboAnalyst
    # MetaboAnalyst expects samples in rows and features in columns
    export_data <- t(feature_values)
    colnames(export_data) <- paste0('M', round(feature_defs$mzmed, 4), 'T', round(feature_defs$rtmed, 1))
    export_df <- data.frame(
        Sample = rownames(export_data),
        Group = pData(xdata)$sample_group,
        export_data
    )
    write.csv(export_df, 'metaboanalyst_input.csv', row.names = FALSE)
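Rounding m/z to four decimals and retention time to one can give two close features the same `M<mz>T<rt>` label. A base-R sketch of making the labels unique explicitly, so the renaming is visible rather than left to `data.frame()`; the helper `feature_label` and the example values are hypothetical:

```r
# Hypothetical helper: build M<mz>T<rt> labels and disambiguate duplicates
feature_label <- function(mz, rt) {
  labels <- paste0('M', round(mz, 4), 'T', round(rt, 1))
  make.unique(labels, sep = '_')
}

# Toy features: the first two collapse to the same label after rounding
mz <- c(181.07066, 181.07068, 255.2325)
rt <- c(65.43, 65.44, 410.08)

labels <- feature_label(mz, rt)
print(labels)
```

The second feature receives a `_1` suffix, so each exported column can still be traced back to its own row in the feature table.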
Parameters
| Parameter | Default | Description |
|---|---|---|
| | | centwave or matchedfilter |
| | | m/z tolerance (ppm) |
| | | Peak width range (seconds) |
| | | Signal-to-noise threshold |
| | | RT alignment method |
Why This Exists
- Without it: Raw mzML/mzXML files are unaligned point clouds of m/z, retention time, and intensity, with no notion of peaks or of features shared across samples
- With it: CentWave or MatchedFilter detects chromatographic peaks, Obiwarp corrects retention time drift, and density-based grouping matches features across runs
- Why OmicsClaw: Replaces a verbose R script with a single parameterized command that wraps XCMS3
Workflow
- Calculate: Detect chromatographic peaks by wavelet fitting (CentWave).
- Execute: Align retention times across all mass spectrometry runs (Obiwarp).
- Assess: Group corresponding peaks into features and fill gaps by integrating the raw signal where a peak is missing.
- Generate: Write the feature table (features by samples intensity matrix).
- Report: Plot the Total Ion Chromatogram (TIC) overlay and the retention time deviation after alignment.
Example Queries
- "Preprocess these mzML files using XCMS"
- "Detect peaks and align retention times"
Output Structure
    output_directory/
    ├── report.md
    ├── result.json
    ├── feature_table.csv
    ├── figures/
    │   ├── tic_overlay.png
    │   └── retention_deviation.png
    ├── tables/
    │   └── grouped_features.csv
    └── reproducibility/
        ├── commands.sh
        ├── environment.yml
        └── checksums.sha256
Safety
- Local-first: Strict offline processing without external upload.
- Disclaimer: Requires OmicsClaw reporting structures and disclaimers.
- Audit trail: All parameters and executed workflow steps are logged.
Integration with Orchestrator
Trigger conditions:
- Invoked automatically when the tool metadata matches the user's intent.
Chaining partners:
- met-normalize: Downstream data scaling
- met-annotate: Downstream explicit matching to spectra
Version Compatibility
Reference examples tested with: xcms 4.0+, MSnbase 2.28+
Dependencies
Required: xcms, MSnbase (R/Bioconductor)
Optional: CAMERA, pcaMethods
Citations
- XCMS — Smith et al., Analytical Chemistry 2006
- CentWave — Tautenhahn et al., BMC Bioinformatics 2008
- CAMERA — Kuhl et al., Analytical Chemistry 2012
Related Skills
- met-annotate: Identify metabolites
- met-normalize: Normalize feature table
- met-diff: Differential analysis