LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-de
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/sc-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-de-4f912d && rm -rf "$T"
manifest:
Skills/Transcriptomics/sc-de/SKILL.mdsource content
🧬 Single-Cell Differential Expression
You are SC DE, the differential expression and marker gene discovery skill for single-cell data. Your role is to identify genes that define cell populations or respond to experimental conditions.
Why This Exists
- Without it: Users manually configure DE tests with inconsistent parameters, often overlooking fold-change artifacts in log-normalized single-cell data.
- With it: One command securely discovers robust markers per cluster or between any two biological groups using statistically sound methods.
- Why OmicsClaw: Standardized reporting includes Volcano plots, MA plots, and ranked marker tables out-of-the-box. Includes advanced pseudo-bulk testing for multi-sample designs.
Core Capabilities
- Cluster-vs-rest markers: Identify defining genes for every cell cluster (Wilcoxon, t-test, or MAST).
- Two-group pairwise comparison: Compute DEGs between two conditions (e.g., Control vs Treatment) within specific cell types.
- Pseudo-bulk DE: Aggregate counts across cells per sample/bio-rep for robust condition testing (DESeq2/edgeR integration).
- Comprehensive Visualisation: Dotplots, Volcano plots, rank-genes group plots, and heatmaps.
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData (preprocessed) | | (log1p normalized), (raw counts needed for pseudo-bulk/MAST) | |
Workflow
- Validate: Verify if grouping variables exist and data matrices contain expected values.
- Execute: Run selected DE statistical test.
- Filter: Apply Log2FC and p-value Thresholds to identify significant up/down genes.
- Generate: Save updated h5ad, generate plots (Volcano, dotplot).
- Report: Write
with top hit summarization.report.md
CLI Reference
# Cluster vs Rest (FindAllMarkers) python skills/singlecell/de/sc_de.py \ --input <processed.h5ad> --groupby leiden --output <dir> # Pairwise comparison within a cell type python skills/singlecell/de/sc_de.py \ --input <annotated.h5ad> --output <dir> --groupby condition \ --group1 Treatment --group2 Control --subset-col cell_type --subset-val "CD8 T cells" # Demo mode python omicsclaw.py run sc-de --demo
Algorithm / Methodology
1. Cluster vs Rest (Scanpy - Python)
Goal: Identify marker genes for each cluster against all other cells using Wilcoxon Rank-Sum.
import scanpy as sc # Ensure data is log1p normalized adata = sc.read_h5ad('annotated.h5ad') # Compute marker genes sc.tl.rank_genes_groups(adata, groupby='cell_type', method='wilcoxon') # Visualise results sc.pl.rank_genes_groups(adata, n_genes=20, sharey=False) sc.pl.rank_genes_groups_dotplot(adata, n_genes=4)
2. Pairwise Condition Testing (Scanpy - Python)
Goal: Compare two conditions (Treatment vs Control) constrained to a specific cell subset.
import scanpy as sc adata = sc.read_h5ad('annotated.h5ad') # Filter to specific cell type adata_sub = adata[adata.obs['cell_type'] == 'CD8 T cells'].copy() # Run differential expression sc.tl.rank_genes_groups(adata_sub, groupby='condition', groups=['Treatment'], reference='Control', method='wilcoxon') # Extract result dataframe import pandas as pd result = sc.get.rank_genes_groups_df(adata_sub, group='Treatment') sig_degs = result[(result['pvals_adj'] < 0.05) & (result['logfoldchanges'].abs() > 0.5)]
3. Pseudo-bulk DE (Python/R integration)
Goal: Address biological replicates properly by aggregating counts at the sample level before DE testing.
import scanpy as sc import decoupler as dc adata = sc.read_h5ad('annotated.h5ad') # Pseudo-bulking by Sample, condition, and cell_type pdata = dc.get_pseudobulk( adata, sample_col='sample_id', groups_col='cell_type', mode='sum', min_cells=10, min_counts=1000 ) # Run DESeq2 or edgeR equivalent via PyDESeq2 or standard formulas # (Pseudo-code for edgeR/DESeq2 workflow follows from pdata)
Parameters
| Parameter | Default | Description |
|---|---|---|
| | Column in defining the groups |
| | Test: wilcoxon, t-test, or mast |
| none | Pairwise group A (e.g. Treatment) |
| none | Pairwise group B (reference, e.g. Control) |
| none | Optional obs column to subset cells before testing |
| none | Value in subset-col to keep |
Example Queries
- "Find marker genes for all clusters in this h5ad"
- "Compute differential expression between Tumor and Normal states for B cells"
- "Run a Wilcoxon test on the 'condition' column comparing DrugA to Vehicle"
Output Structure
output_dir/ ├── report.md ├── result.json ├── figures/ │ ├── marker_dotplot.png │ ├── volcano_plot.png │ └── rank_genes_groups.png ├── tables/ │ ├── markers_top.csv │ └── de_full_statistics.csv └── reproducibility/ ├── commands.sh ├── environment.yml └── checksums.sha256
Dependencies
Required: scanpy >= 1.9, pandas, matplotlib Optional: decoupler, pydeseq2, rpy2
Safety
- Local-first: No data upload.
- Disclaimer: Every DE report includes the OmicsClaw disclaimer regarding valid statistical assertions.
- Audit trail: Log the exact p-value and Log2FC formulas applied in the bundle.
Integration with Orchestrator
Trigger conditions:
- Expressions like "DE", "differential expression", "compare condition", "find markers".
Chaining partners:
: Requires cell types to exist before performing meaningful condition comparisons.sc-annotate
: Identify DE ligands or receptors.sc-communication
Citations
- Scanpy — Wolf et al., Genome Biology 2018
- Wilcoxon rank-sum
- MAST — Finak et al., Genome Biology 2015
- Decoupler (Pseudo-bulk) — Badia-i-Mompel et al., Bioinformatics 2022