git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/External_Collections/life-sciences_Claudeai-main/single-cell-rna-qc" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-single-cell-rna-qc-63dcd2 && rm -rf "$T"
Skills/External_Collections/life-sciences_Claudeai-main/single-cell-rna-qc/SKILL.md
name: single-cell-rna-qc
description: Perform quality control on single-cell RNA-seq data (.h5ad, 10x .h5, or 10x directories) using scverse best practices, MAD-based filtering (log1p counts/genes, high-tail MT%), and generate filtered AnnData plus QC plots and summary JSON. Use when users request scRNA-seq QC, filtering low-quality cells, data quality assessment, or QC visualizations.
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Single-Cell RNA-seq Quality Control
Automated QC workflow for single-cell RNA-seq data following scverse best practices.
Approach 1: Complete QC Pipeline (Recommended)
Use the convenience script scripts/qc_analysis.py for end-to-end QC:
    python3 scripts/qc_analysis.py input.h5ad
    python3 scripts/qc_analysis.py raw_feature_bc_matrix.h5
    python3 scripts/qc_analysis.py /path/to/10x_directory/
When to use this approach:
- Standard QC workflow with dataset-adaptive thresholds
- Batch processing or quick exploratory analysis
- Users want a reproducible, one-command pipeline
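The batch-processing case can be sketched as a small wrapper that issues the same command once per input. This is a minimal sketch, not part of the skill: `build_qc_command` and `run_batch` are hypothetical helper names, and it assumes the wrapper is run from the skill directory so that `scripts/qc_analysis.py` resolves.

```python
import subprocess
import sys


def build_qc_command(input_path, script="scripts/qc_analysis.py"):
    """Return the argv list for one QC run: <python> <script> <input>."""
    return [sys.executable, str(script), str(input_path)]


def run_batch(inputs, dry_run=True):
    """Build one command per input; execute them unless dry_run is True."""
    commands = [build_qc_command(p) for p in inputs]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on the first failing sample.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical sample names, printed rather than executed.
for cmd in run_batch(["sample_a.h5ad", "sample_b.h5"], dry_run=True):
    print(" ".join(cmd))
```

With `dry_run=True` the function only returns the commands, which is a cheap way to review the list before launching long runs.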
Key parameters:
- --mad-counts, --mad-genes, --mad-mt - MAD thresholds
- --mt-threshold - Hard mitochondrial % cutoff
- --min-cells - Gene filtering threshold
- --mt-pattern, --ribo-pattern, --hb-pattern - Gene patterns
- --no-log1p - Disable log1p transform for MAD on counts/genes
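To make the MAD thresholds concrete, the sketch below flags cells whose value lies more than `n_mads` median absolute deviations from the median, optionally on the log1p scale. It is an illustration only, not the script's implementation: `mad_outliers` is a hypothetical name, the MAD is left unscaled, and the two-sided test is an assumption about how counts and genes are handled.

```python
import math
import statistics


def mad_outliers(values, n_mads=5.0, log1p=True):
    """Flag values further than n_mads * MAD from the median (two-sided)."""
    x = [math.log1p(v) for v in values] if log1p else [float(v) for v in values]
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])  # unscaled MAD
    lower, upper = med - n_mads * mad, med + n_mads * mad
    return [v < lower or v > upper for v in x]


# Toy total counts per cell: one near-empty droplet and one extreme cell.
counts = [5, 1200, 1500, 1800, 2100, 2400, 150000]

with_log = mad_outliers(counts, n_mads=5, log1p=True)
without_log = mad_outliers(counts, n_mads=5, log1p=False)

print(with_log)     # flags both the 5-count and the 150000-count cell
print(without_log)  # flags only the 150000-count cell
```

On the raw scale the lower bound falls below zero for this toy data, so the near-empty cell can never be flagged; on the log1p scale both tails are caught.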
Outputs (in <input_basename>_qc_results/ by default):
- qc_metrics_before_filtering.png
- qc_filtering_thresholds.png
- qc_metrics_after_filtering.png
- <input_basename>_filtered.h5ad
- <input_basename>_with_qc.h5ad
- qc_summary.json
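A quick way to confirm that a run completed is to check for the files listed above. The sketch below builds the expected paths and reports any that are missing. Two points are assumptions rather than documented behavior: that `<input_basename>` is the file name without its `.h5ad` or `.h5` extension (or the directory name for a 10x directory), and that the results directory is created under `root`. The helper names are hypothetical.

```python
from pathlib import Path

# File names taken from the output list; {base} stands for <input_basename>.
OUTPUT_FILES = [
    "qc_metrics_before_filtering.png",
    "qc_filtering_thresholds.png",
    "qc_metrics_after_filtering.png",
    "{base}_filtered.h5ad",
    "{base}_with_qc.h5ad",
    "qc_summary.json",
]


def expected_outputs(input_path, root="."):
    """Return the output paths a QC run is expected to produce."""
    p = Path(input_path)
    # Assumption: strip a known extension, otherwise use the directory name.
    base = p.stem if p.suffix in {".h5ad", ".h5"} else p.name
    out_dir = Path(root) / f"{base}_qc_results"
    return [out_dir / name.format(base=base) for name in OUTPUT_FILES]


def missing_outputs(input_path, root="."):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(input_path, root) if not p.exists()]


for path in expected_outputs("input.h5ad"):
    print(path)
```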
Approach 2: Modular Building Blocks (Custom Workflows)
For custom analysis workflows, use functions from scripts/qc_core.py and scripts/qc_plotting.py:
    import anndata as ad
    from qc_core import calculate_qc_metrics, build_qc_masks, filter_cells

    adata = ad.read_h5ad('input.h5ad')

    # Compute per-cell QC metrics in place
    calculate_qc_metrics(adata, inplace=True)

    # Build QC masks from the MAD thresholds and the hard MT% cutoff
    masks = build_qc_masks(
        adata,
        mad_counts=5,
        mad_genes=5,
        mad_mt=3,
        mt_threshold=8,
        counts_transform='log1p',
        genes_transform='log1p'
    )

    # Keep only the cells that pass QC
    adata_filtered = filter_cells(adata, masks['pass_qc'])
When to use this approach:
- Non-standard filtering logic (subset-specific thresholds)
- Partial execution (metrics only, plots only)
- Integration with larger pipelines
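As an illustration of subset-specific thresholds, the sketch below computes MAD bounds separately within each group (for example, each sample) instead of over the pooled cells. It does not use `qc_core`; `per_group_pass` is a hypothetical helper, and the unscaled, two-sided MAD on log1p values is an assumption made for this example.

```python
import math
import statistics
from collections import defaultdict


def per_group_pass(values, groups, n_mads=5.0):
    """Return one boolean per cell: True if within its own group's MAD bounds."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(math.log1p(v))

    bounds = {}
    for g, x in by_group.items():
        med = statistics.median(x)
        mad = statistics.median([abs(v - med) for v in x])
        bounds[g] = (med - n_mads * mad, med + n_mads * mad)

    return [bounds[g][0] <= math.log1p(v) <= bounds[g][1]
            for v, g in zip(values, groups)]


# Two toy samples sequenced at different depths; the last cell in B is shallow.
counts = [1000, 1100, 1200, 1300, 1400, 8000, 9000, 10000, 11000, 12000, 300]
sample = ["A"] * 5 + ["B"] * 6

print(per_group_pass(counts, sample))  # only the 300-count cell in B fails
```

Grouping first keeps a deeply sequenced sample from setting the bounds for a shallow one, and the other way around.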
Best Practices
- Use log1p for counts/genes - Stabilizes MAD thresholds for heavy-tailed distributions.
- Filter high MT% only - Low MT% is usually not problematic.
- Inspect plots - Validate that filtering aligns with biology and tissue context.
- Be permissive by default - Preserve rare cell populations; filter further later if needed.
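The "filter high MT% only" practice can be sketched as a one-sided test: a cell is flagged only when its mitochondrial percentage is above the upper MAD bound or above the hard cutoff. The defaults mirror the `mad_mt=3` and `mt_threshold=8` values used in the example call earlier; combining the two criteria with OR is an assumption, and `high_mt_outliers` is a hypothetical name.

```python
import statistics


def high_mt_outliers(pct_mt, n_mads=3.0, hard_cutoff=8.0):
    """Flag cells with high MT% only; low MT% is never flagged."""
    med = statistics.median(pct_mt)
    mad = statistics.median([abs(v - med) for v in pct_mt])
    upper = med + n_mads * mad  # upper tail only, no lower bound
    return [v > upper or v > hard_cutoff for v in pct_mt]


# Toy mitochondrial percentages per cell.
pct_mt = [0.0, 1.0, 2.0, 2.5, 3.0, 4.0, 9.5, 25.0]

print(high_mt_outliers(pct_mt))  # flags only the 9.5% and 25% cells
```

The cell with 0% mitochondrial reads passes, which is the intended behavior of a high-tail filter.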
Reference Materials
For deeper rationale, parameter guidance, and troubleshooting, see:
references/scverse_qc_guidelines.md
Next Steps After QC
- Ambient RNA correction (SoupX, CellBender)
- Doublet detection (scDblFinder, scrublet)
- Normalization and downstream analysis