LLMs-Universal-Life-Science-and-Clinical-Skills- bulkrna-deconvolution
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/bulkrna-deconvolution" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-bulkrna-deconvolut && rm -rf "$T"
manifest: Skills/Transcriptomics/bulkrna-deconvolution/SKILL.md
Bulk RNA-seq Cell Type Deconvolution
Estimate cell type proportions from bulk RNA-seq expression data using non-negative least squares (NNLS). Given a bulk count matrix and a cell type signature matrix (marker gene expression profiles), the skill solves for the mixture coefficients that best reconstruct each sample's expression profile and normalizes them to proportions summing to 1.
CLI Reference
python omicsclaw.py run bulkrna-deconvolution --demo
python omicsclaw.py run bulkrna-deconvolution --input <counts.csv> --output <dir> --reference <signature.csv>
python bulkrna_deconvolution.py --input counts.csv --output results/ --reference signature.csv
python bulkrna_deconvolution.py --demo --output /tmp/deconv_demo
Why This Exists
- Without it: Researchers must install and configure external deconvolution tools (CIBERSORTx web portal, MuSiC R package), each with its own data format requirements and authentication hurdles, just to get cell type proportions from bulk RNA-seq.
- With it: A single Python command runs NNLS-based deconvolution locally, produces publication-ready proportion charts, and exports per-sample cell type tables ready for downstream analysis.
- Why OmicsClaw: Wraps the mathematically principled NNLS approach into the OmicsClaw reporting framework using only the standard scientific Python stack (numpy, pandas, scipy, matplotlib), while documenting how to bridge to CIBERSORTx or MuSiC when higher accuracy is needed.
Workflow
- Load: Read the bulk count matrix (genes x samples CSV) and the cell type signature matrix (genes x cell_types CSV).
- Intersect: Find shared genes between bulk data and signature matrix; subset both to shared features.
- Deconvolve: For each sample, solve the NNLS problem $\min_{x \ge 0} \lVert Ax - b \rVert_2$, where A is the signature matrix, b is the sample expression vector, and x is the non-negative proportion vector.
- Normalize: Scale each sample's proportion vector to sum to 1.
- Summarize: Identify the dominant cell type per sample and compute mean proportions across all samples.
- Visualize: Generate stacked bar chart, heatmap, and pie chart of cell type proportions.
- Report: Write markdown report, result.json, proportion tables, and a reproducibility script.
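The Intersect, Deconvolve, Normalize, and Summarize steps can be illustrated with a minimal sketch built on `scipy.optimize.nnls`. This is not the skill's actual source: the function name `deconvolve`, the toy gene and cell type names, and the zero-sum fallback are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import nnls


def deconvolve(bulk: pd.DataFrame, signature: pd.DataFrame) -> pd.DataFrame:
    """Estimate cell type proportions (samples x cell_types) via NNLS.

    bulk:      genes x samples expression matrix
    signature: genes x cell_types reference matrix
    """
    # Intersect: keep only genes present in both matrices, in the same order.
    shared = bulk.index.intersection(signature.index)
    if shared.empty:
        raise ValueError("No shared genes between bulk data and signature matrix")
    A = signature.loc[shared].to_numpy(dtype=float)

    rows = {}
    for sample in bulk.columns:
        b = bulk.loc[shared, sample].to_numpy(dtype=float)
        # Deconvolve: solve min ||Ax - b||_2 subject to x >= 0.
        x, _residual = nnls(A, b)
        # Normalize: scale to sum to 1 (assumption: leave zeros if all are 0).
        total = x.sum()
        rows[sample] = x / total if total > 0 else x
    return pd.DataFrame.from_dict(rows, orient="index", columns=signature.columns)


# Toy data: 4 genes, 2 cell types, 2 samples mixed at known proportions.
signature = pd.DataFrame(
    {"T_cell": [10.0, 0.0, 5.0, 1.0], "B_cell": [0.0, 8.0, 5.0, 2.0]},
    index=["G1", "G2", "G3", "G4"],
)
true_props = np.array([[0.75, 0.25], [0.2, 0.8]])  # samples x cell_types
bulk = pd.DataFrame(
    signature.to_numpy() @ true_props.T, index=signature.index, columns=["S1", "S2"]
)

proportions = deconvolve(bulk, signature)
dominant = proportions.idxmax(axis=1)  # Summarize: dominant cell type per sample
mean_props = proportions.mean(axis=0)  # Summarize: mean proportion per cell type
print(proportions.round(3))
```

Because the toy bulk matrix is an exact, noise-free mixture of the signature columns, the recovered proportions should match `true_props`. Real data will leave a non-zero residual.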
Example Queries
- "Deconvolve my bulk RNA-seq data to get cell type proportions"
- "Run NNLS deconvolution with this signature matrix"
- "Estimate cell type fractions from bulk expression"
- "What cell types are in my bulk RNA samples?"
Output Structure
output_directory/
├── report.md
├── result.json
├── figures/
│   ├── proportions_stacked.png
│   ├── proportions_heatmap.png
│   └── mean_proportions_pie.png
├── tables/
│   ├── proportions.csv
│   └── dominant_types.csv
└── reproducibility/
    └── commands.sh
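As a rough illustration of how a figure such as proportions_stacked.png could be produced, the sketch below draws a stacked bar chart from a samples x cell_types table using pandas and matplotlib. The table values, axis labels, and figure size are made up, and the skill's own plotting code may differ.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical proportions table (samples x cell_types); values are made up.
proportions = pd.DataFrame(
    {"T_cell": [0.6, 0.2], "B_cell": [0.3, 0.5], "Monocyte": [0.1, 0.3]},
    index=["S1", "S2"],
)

out_dir = Path(tempfile.mkdtemp()) / "figures"
out_dir.mkdir(parents=True, exist_ok=True)

# Each sample is one bar; its cell type proportions stack to a height of 1.
ax = proportions.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_xlabel("Sample")
ax.set_ylabel("Proportion")
ax.legend(title="Cell type", bbox_to_anchor=(1.02, 1), loc="upper left")

fig_path = out_dir / "proportions_stacked.png"
ax.figure.savefig(fig_path, dpi=150, bbox_inches="tight")
plt.close(ax.figure)
```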
Safety
- Local-first: All computation runs locally via scipy NNLS; no data is uploaded to external services.
- Disclaimer: Every report includes the standard OmicsClaw disclaimer.
- Audit trail: Parameters, gene intersection size, and input checksums are recorded in result.json.
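One way an audit record of this kind could be assembled is sketched below, using only the Python standard library. The key names in `record` are hypothetical; the actual result.json schema is not shown in this document.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


work = Path(tempfile.mkdtemp())
counts = work / "counts.csv"
counts.write_bytes(b"gene,S1\nG1,10\n")  # stand-in input file

# Hypothetical record layout (assumption, not the skill's real schema).
record = {
    "parameters": {"input": str(counts), "reference": None, "demo": False},
    "n_shared_genes": 1,
    "input_checksums": {counts.name: sha256_of(counts)},
}
result_path = work / "result.json"
result_path.write_text(json.dumps(record, indent=2))
```

Reading the file in chunks keeps memory use flat for large count matrices.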
Integration with Orchestrator
Trigger conditions:
- Automatically invoked when user intent matches bulk deconvolution, cell type proportion, or NNLS keywords.
Chaining partners:
- bulkrna-alignment -- Upstream: count matrix generation from aligned reads
- bulkrna-de -- Upstream/parallel: differential expression identifies condition-specific cell type shifts
- bulkrna-enrichment -- Downstream: pathway enrichment on cell-type-specific gene sets
Parameters
| Parameter | Default | Description |
|---|---|---|
| --input | (required) | Path to bulk count matrix CSV (genes x samples) |
| --output | (required) | Output directory |
| --reference | (none) | Path to signature matrix CSV (genes x cell_types) |
| --demo | false | Run with built-in demo data |
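For orientation, here is a minimal `argparse` sketch of a command-line interface with these four flags. It is an illustration rather than the script's real parser. In particular, treating `--input` as optional when `--demo` is set is an assumption inferred from the `--demo --output /tmp/deconv_demo` invocation in the CLI reference.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bulkrna_deconvolution.py")
    parser.add_argument("--input", help="Path to bulk count matrix CSV (genes x samples)")
    parser.add_argument("--output", required=True, help="Output directory")
    parser.add_argument(
        "--reference",
        default=None,
        help="Path to signature matrix CSV (genes x cell_types)",
    )
    parser.add_argument("--demo", action="store_true", help="Run with built-in demo data")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: --input is only mandatory when --demo is not given.
    if not args.demo and not args.input:
        parser.error("--input is required unless --demo is given")
    return args


args = parse_args(["--demo", "--output", "/tmp/deconv_demo"])
print(args.demo, args.output)
```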
Signature Matrix Format
The reference signature matrix must be a CSV with:
- Rows: genes (first column is gene identifiers)
- Columns: cell types (each column header is a cell type name)
- Values: average expression levels for each gene in each cell type
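A small sketch of loading and sanity-checking a signature matrix in this layout with pandas follows. The gene names, cell type names, and values are placeholders, and the checks (unique genes, numeric columns, no missing or negative values) are assumptions about what a reasonable validator might enforce, not the skill's documented behavior.

```python
import io

import pandas as pd

# Made-up example values; gene and cell type names are placeholders.
csv_text = """gene,T_cell,B_cell,Monocyte
CD3E,120.5,2.1,1.0
CD19,1.3,98.7,0.8
CD14,0.9,1.2,150.4
"""

# The first column holds gene identifiers, so use it as the row index.
signature = pd.read_csv(io.StringIO(csv_text), index_col=0)


def check_signature(sig: pd.DataFrame) -> None:
    """Raise ValueError if the matrix breaks the assumed format rules."""
    if sig.index.has_duplicates:
        raise ValueError("Duplicate gene identifiers in signature matrix")
    non_numeric = [c for c in sig.columns if not pd.api.types.is_numeric_dtype(sig[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric cell type columns: {non_numeric}")
    if sig.isna().any().any() or (sig < 0).any().any():
        raise ValueError("Expression values must be present and non-negative")


check_signature(signature)
print(signature.shape)  # (genes, cell_types)
```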
Bridging to External Tools
While NNLS provides a solid baseline, more sophisticated methods exist:
- CIBERSORTx: Upload signature and mixture matrices to the CIBERSORTx web portal for support-vector-regression-based deconvolution with batch correction. Requires free academic registration.
- MuSiC: R package that uses multi-subject single-cell reference data with variance weighting. Requires rpy2 bridge (not bundled).
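When preparing files for an external tool, a format conversion step is often needed. The sketch below rewrites a genes x samples matrix as tab-delimited text with gene identifiers in the first column, which is the layout CIBERSORTx is generally understood to expect. The file name `mixture.txt`, the `Gene` header, and the data are assumptions; check the portal's current upload instructions before relying on this.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up bulk matrix (genes x samples) standing in for counts.csv.
bulk = pd.DataFrame(
    {"S1": [10, 0], "S2": [3, 7]},
    index=pd.Index(["G1", "G2"], name="Gene"),
)

out_path = Path(tempfile.mkdtemp()) / "mixture.txt"  # hypothetical file name
# Write tab-separated text; the index (gene identifiers) becomes column 1.
bulk.to_csv(out_path, sep="\t")

first_line = out_path.read_text().splitlines()[0]
print(first_line.split("\t"))
```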
Version Compatibility
Reference examples tested with: scipy 1.11+, pandas 2.0+, numpy 1.24+, matplotlib 3.7+
Dependencies
Required: numpy, pandas, scipy, matplotlib
Optional: none (NNLS is built into scipy)
Citations
- NNLS -- Lawson & Hanson, Solving Least Squares Problems, 1995
- CIBERSORTx -- Newman et al., Nature Biotechnology 2019
- MuSiC -- Wang et al., Nature Communications 2019
Related Skills
- bulkrna-alignment -- Count matrix QC upstream
- bulkrna-de -- Differential expression analysis
- bulkrna-enrichment -- Pathway enrichment of gene sets downstream
- spatial-deconvolution -- Spatial transcriptomics deconvolution (CARD, Cell2Location)