LLMs-Universal-Life-Science-and-Clinical-Skills- single-cell-rna-qc

<!--

install

source · Clone the upstream repo

git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/External_Collections/life-sciences_Claudeai-main/single-cell-rna-qc" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-single-cell-rna-qc-63dcd2 && rm -rf "$T"

manifest: Skills/External_Collections/life-sciences_Claudeai-main/single-cell-rna-qc/SKILL.md

source content

name: single-cell-rna-qc
description: Perform quality control on single-cell RNA-seq data (.h5ad, 10x .h5, or 10x directories) using scverse best practices, MAD-based filtering (log1p counts/genes, high-tail MT%), and generate filtered AnnData plus QC plots and summary JSON. Use when users request scRNA-seq QC, filtering low-quality cells, data quality assessment, or QC visualizations.
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
read_file
run_shell_command

Single-Cell RNA-seq Quality Control

Automated QC workflow for single-cell RNA-seq data following scverse best practices.

Approach 1: Complete QC Pipeline (Recommended)

Use the convenience script scripts/qc_analysis.py for end-to-end QC:

python3 scripts/qc_analysis.py input.h5ad
python3 scripts/qc_analysis.py raw_feature_bc_matrix.h5
python3 scripts/qc_analysis.py /path/to/10x_directory/

When to use this approach:

Standard QC workflow with dataset-adaptive thresholds
Batch processing or quick exploratory analysis
Users want a reproducible, one-command pipeline

Key parameters:

`--mad-counts`, `--mad-genes`, `--mad-mt` - MAD thresholds
`--mt-threshold` - Hard mitochondrial % cutoff
`--min-cells` - Gene filtering threshold
`--mt-pattern`, `--ribo-pattern`, `--hb-pattern` - Gene patterns
`--no-log1p` - Disable log1p transform for MAD on counts/genes
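
For batch processing, the one-command pipeline can be driven from a small wrapper. The sketch below only assembles and runs the command line; the helper names `build_qc_command` and `run_batch` are hypothetical, and it assumes each numeric flag takes its value as the next argument (for example `--mad-counts 5`), which the script's own `--help` output should confirm.

```python
import subprocess
import sys


def build_qc_command(input_path, mad_counts=None, mad_genes=None, mad_mt=None,
                     mt_threshold=None, min_cells=None, no_log1p=False,
                     script="scripts/qc_analysis.py"):
    """Assemble the argv list for one QC run (hypothetical helper)."""
    cmd = [sys.executable, script, str(input_path)]
    # Assumption: every numeric flag is passed as "--flag value".
    options = {
        "--mad-counts": mad_counts,
        "--mad-genes": mad_genes,
        "--mad-mt": mad_mt,
        "--mt-threshold": mt_threshold,
        "--min-cells": min_cells,
    }
    for flag, value in options.items():
        if value is not None:  # omit the flag to keep the script's default
            cmd += [flag, str(value)]
    if no_log1p:
        cmd.append("--no-log1p")
    return cmd


def run_batch(inputs, **kwargs):
    """Run the QC script once per input and stop on the first failure."""
    for path in inputs:
        subprocess.run(build_qc_command(path, **kwargs), check=True)
```

Calling `run_batch(["sample1.h5ad", "sample2.h5ad"], mad_mt=3)` would then process both files with the same settings, leaving every other parameter at the script's default.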

Outputs (in <input_basename>_qc_results/ by default):

`qc_metrics_before_filtering.png`
`qc_filtering_thresholds.png`
`qc_metrics_after_filtering.png`
`<input_basename>_filtered.h5ad`
`<input_basename>_with_qc.h5ad`
`qc_summary.json`
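
After a run it can help to confirm that every expected file was written and to look at the summary. The following is a minimal sketch using only the standard library; `check_qc_outputs` is a hypothetical helper, the caller supplies the input basename, and no particular key is assumed inside `qc_summary.json`.

```python
import json
from pathlib import Path


def check_qc_outputs(results_dir, input_basename):
    """Return (missing file names, parsed summary or None) for a results folder."""
    results_dir = Path(results_dir)
    expected = [
        "qc_metrics_before_filtering.png",
        "qc_filtering_thresholds.png",
        "qc_metrics_after_filtering.png",
        f"{input_basename}_filtered.h5ad",
        f"{input_basename}_with_qc.h5ad",
        "qc_summary.json",
    ]
    missing = [name for name in expected if not (results_dir / name).exists()]

    summary_path = results_dir / "qc_summary.json"
    summary = None
    if summary_path.exists():
        # The JSON layout is not documented here, so it is returned as-is.
        summary = json.loads(summary_path.read_text())
    return missing, summary
```

For an input named `input.h5ad`, `check_qc_outputs("input_qc_results", "input")` would report any file the pipeline failed to produce.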

Approach 2: Modular Building Blocks (Custom Workflows)

For custom analysis workflows, use functions from scripts/qc_core.py and scripts/qc_plotting.py:

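# Assumes the scripts/ directory is on the Python path (e.g. run from inside it)
import anndata as ad
from qc_core import calculate_qc_metrics, build_qc_masks, filter_cells

# Load the dataset and annotate it with QC metrics in place
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)

# Build the filtering masks from MAD thresholds and the hard MT% cutoff
masks = build_qc_masks(
    adata,
    mad_counts=5,
    mad_genes=5,
    mad_mt=3,
    mt_threshold=8,
    counts_transform='log1p',
    genes_transform='log1p'
)

# Keep only the cells that pass QC
adata_filtered = filter_cells(adata, masks['pass_qc'])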

When to use this approach:

Non-standard filtering logic (subset-specific thresholds)
Partial execution (metrics only, plots only)
Integration with larger pipelines

Best Practices

Use log1p for counts/genes - Stabilizes MAD thresholds for heavy-tailed distributions.
Filter high MT% only - Low MT% is usually not problematic.
Inspect plots - Validate that filtering aligns with biology and tissue context.
Be permissive by default - Preserve rare cell populations; filter further later if needed.
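
The first two practices can be illustrated with a small, dependency-free sketch of MAD-based outlier flagging. This is not the code in scripts/qc_core.py: `flag_outliers` is a hypothetical helper, and it assumes an unscaled MAD (the median of absolute deviations from the median) with bounds at median ± n_mads × MAD.

```python
import math
from statistics import median


def flag_outliers(values, n_mads, transform=math.log1p, high_tail_only=False):
    """Flag values lying more than n_mads MADs from the median.

    transform: applied before computing the median and MAD (None for raw values).
    high_tail_only: flag only values above the upper bound, as for MT%.
    """
    x = [transform(v) if transform else v for v in values]
    med = median(x)
    mad = median([abs(v - med) for v in x])  # unscaled MAD (assumption)
    lower = med - n_mads * mad
    upper = med + n_mads * mad
    if high_tail_only:
        return [v > upper for v in x]
    return [v < lower or v > upper for v in x]


# Total counts per cell: log1p keeps the long right tail from inflating the MAD.
counts = [1000, 1100, 900, 1050, 950, 10]
low_quality = flag_outliers(counts, n_mads=5)

# Mitochondrial %: only the high tail is treated as problematic.
mt_pct = [4, 5, 4.5, 5.5, 6, 30, 0]
high_mt = flag_outliers(mt_pct, n_mads=3, transform=None, high_tail_only=True)
```

With the toy values above, only the cell with 10 counts and the cell at 30% MT are flagged; the cell at 0% MT is kept because low MT% is not filtered.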

Reference Materials

For deeper rationale, parameter guidance, and troubleshooting, see:

```
references/scverse_qc_guidelines.md
```

Next Steps After QC

Ambient RNA correction (SoupX, CellBender)
Doublet detection (scDblFinder, scrublet)
Normalization and downstream analysis