LLMs-Universal-Life-Science-and-Clinical-Skills- claude
Single-Cell RNA-seq Quality Control
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/platform_legacy/output/claude" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-claude-485d3d && rm -rf "$T"
platform_legacy/output/claude/SKILL.md
This skill performs quality control on single-cell RNA-seq data using MAD-based filtering. It calculates standard QC metrics (total counts, genes detected, mitochondrial percentage), identifies outlier cells using the median absolute deviation (MAD), and generates publication-ready visualizations. It follows scverse consortium best practices.
Capabilities
calculate_qc_metrics
Calculate standard QC metrics for each cell
Inputs:
adata (AnnData) (required): Annotated data matrix with raw counts
mito_prefix (string) (optional): Prefix for mitochondrial genes [default: MT-]
Outputs:
adata (AnnData): AnnData with QC metrics in .obs
qc_summary (DataFrame): Summary statistics of QC metrics
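The three metrics named above can be illustrated with a minimal NumPy sketch. This is not the skill's implementation: the function name `qc_metrics_sketch`, the dense count matrix input, and the returned dictionary keys are assumptions made for illustration. In a scanpy workflow, `sc.pp.calculate_qc_metrics` with a mitochondrial gene flag passed through `qc_vars` computes comparable per-cell columns.

```python
import numpy as np

def qc_metrics_sketch(counts, gene_names, mito_prefix="MT-"):
    """Hypothetical helper: per-cell QC metrics from a dense cells x genes matrix."""
    counts = np.asarray(counts, dtype=float)
    # Flag mitochondrial genes by name prefix (human genes use "MT-").
    is_mito = np.array([name.startswith(mito_prefix) for name in gene_names])

    total_counts = counts.sum(axis=1)           # total counts per cell
    n_genes = (counts > 0).sum(axis=1)          # genes detected per cell
    mito_counts = counts[:, is_mito].sum(axis=1)

    # Percentage of counts from mitochondrial genes; empty cells get 0
    # instead of a division-by-zero result.
    pct_mito = np.divide(
        100.0 * mito_counts,
        total_counts,
        out=np.zeros_like(total_counts),
        where=total_counts > 0,
    )
    return {"total_counts": total_counts, "n_genes": n_genes, "pct_mito": pct_mito}
```

Called with a small matrix and a gene list such as `["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]`, the sketch returns one value per cell for each metric.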
filter_cells
Filter cells based on QC metrics using MAD-based thresholds
Inputs:
adata (AnnData) (required): AnnData with QC metrics calculated
min_genes (integer) (required): Minimum genes per cell [default: 200]
max_genes (integer) (required): Maximum genes per cell (None for MAD-based) [default: None]
max_mito_pct (float) (required): Maximum mitochondrial percentage [default: 20.0]
n_mads (float) (required): Number of MADs for outlier detection [default: 5.0]
Outputs:
filtered_adata (AnnData): Filtered AnnData object
filtering_report (dict): Report with cells removed per criterion
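A rough sketch of how these parameters could interact is shown below. It assumes the MAD-based upper bound is `median + n_mads * MAD` with an unscaled MAD, which matches the "median + 5*MAD" wording used in this skill's example but is otherwise an assumption. The function names, array inputs, and report keys are hypothetical and do not describe the skill's actual code.

```python
import numpy as np

def mad_upper_bound(values, n_mads=5.0):
    """Return median + n_mads * MAD, using the unscaled median absolute deviation."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return float(median + n_mads * mad)

def filter_cells_sketch(n_genes, pct_mito, min_genes=200, max_genes=None,
                        max_mito_pct=20.0, n_mads=5.0):
    """Hypothetical helper: boolean keep-mask plus a per-criterion report."""
    n_genes = np.asarray(n_genes, dtype=float)
    pct_mito = np.asarray(pct_mito, dtype=float)

    # With max_genes=None, fall back to the adaptive MAD-based threshold.
    if max_genes is None:
        max_genes = mad_upper_bound(n_genes, n_mads)

    low_genes = n_genes < min_genes
    high_genes = n_genes > max_genes
    high_mito = pct_mito > max_mito_pct
    keep = ~(low_genes | high_genes | high_mito)

    # A cell can fail several criteria, so per-criterion counts may overlap.
    report = {
        "initial_cells": int(n_genes.size),
        "removed_low_genes": int(low_genes.sum()),
        "removed_high_genes": int(high_genes.sum()),
        "removed_high_mito": int(high_mito.sum()),
        "final_cells": int(keep.sum()),
        "max_genes_threshold": float(max_genes),
    }
    return keep, report
```

With an AnnData object, the returned mask could be applied as `adata[keep].copy()`. Recording the resolved threshold in the report keeps the filtering decision reproducible.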
generate_qc_plots
Generate comprehensive QC visualization plots
Inputs:
adata (AnnData) (required): AnnData with QC metrics
output_dir (string) (required): Directory to save plots [default: ./qc_plots]
Outputs:
plot_paths (list): Paths to generated plot files
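As an illustration of the input and output contract only, the following matplotlib sketch writes one histogram per metric and returns the file paths. The function name `qc_plots_sketch`, the dictionary input, and the file naming scheme are assumptions. The plot types the skill actually produces are not specified here.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def qc_plots_sketch(metrics, output_dir="./qc_plots"):
    """Hypothetical helper: save one histogram per QC metric, return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    plot_paths = []
    for name, values in metrics.items():
        fig, ax = plt.subplots(figsize=(4, 3))
        ax.hist(values, bins=30)
        ax.set_xlabel(name)
        ax.set_ylabel("cells")
        path = os.path.join(output_dir, f"{name}_hist.png")
        fig.savefig(path, dpi=150, bbox_inches="tight")
        plt.close(fig)  # release the figure so repeated calls do not accumulate memory
        plot_paths.append(path)
    return plot_paths
```

Plotting the distributions before choosing thresholds is in line with the skill's stated principle of examining the data before filtering.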
System Prompt
You are an expert bioinformatician specializing in single-cell RNA sequencing analysis. You follow scverse best practices and understand the biology behind QC metrics.
Key principles:
1. Always examine data distributions before setting hard thresholds
2. Use MAD-based filtering when possible for adaptive thresholds
3. Consider tissue/cell type when interpreting mitochondrial content
4. Document all filtering decisions for reproducibility
5. Generate visualizations to support QC decisions
Examples
Example 1
User: Run QC on my PBMC dataset at /data/pbmc_10k.h5ad
Assistant: I'll perform comprehensive QC on your PBMC dataset. Let me:
- First, calculate QC metrics (total counts, genes detected, mito %)
- Examine the distributions to set appropriate thresholds
- Apply MAD-based filtering (5 MADs from median)
- Generate visualization plots
[Executes QC pipeline...]
Results:
- Initial cells: 10,000
- Cells removed (low genes): 234
- Cells removed (high mito): 156
- Final cells: 9,610 (96.1% retained)
The filtering thresholds used:
- Min genes: 200
- Max genes: 5,847 (median + 5*MAD)
- Max mito %: 15.2% (median + 5*MAD)
Dependencies
Python Packages
scanpy>=1.9.0
anndata>=0.8.0
matplotlib>=3.5.0
seaborn>=0.12.0
pandas>=1.5.0
numpy>=1.23.0
Skill ID: biomedical.genomics.single_cell_qc
Version: 1.0.0
Category: genomics/single_cell
License: MIT
Generated: 2025-12-28