install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aipoch-ai/single-cell-rnaseq-pipeline" ~/.claude/skills/clawdbot-skills-single-cell-rnaseq-pipeline && rm -rf "$T"
manifest: skills/aipoch-ai/single-cell-rnaseq-pipeline/SKILL.md
Single-Cell RNA-seq Pipeline
Overview
Generate comprehensive single-cell RNA-seq analysis code templates for Seurat (R) and Scanpy (Python). This skill provides ready-to-use code frameworks for preprocessing, quality control, normalization, clustering, marker identification, visualization, and advanced analyses like batch correction and trajectory inference.
Technical Difficulty: High
When to Use
- Building scRNA-seq analysis pipelines from raw count matrices
- Standardizing QC and preprocessing workflows
- Performing batch correction across multiple samples/datasets
- Running dimensionality reduction and clustering
- Identifying cell type-specific marker genes
- Creating publication-ready visualizations (UMAP, violin plots, heatmaps)
- Conducting trajectory inference (pseudotime analysis)
- Comparing cell populations between conditions
Core Features
Seurat (R) Templates
- Data Loading: 10x Genomics, H5AD, Cell Ranger outputs
- QC Metrics: Mitochondrial content, gene counts, doublet detection
- Normalization: Log-normalization, SCTransform
- Integration: Harmony, RPCA, CCA for batch correction
- Clustering: Graph-based clustering with optimization
- Visualization: UMAP, t-SNE, feature plots, dot plots
- Marker Analysis: Wilcoxon tests, conserved markers
- Differential Expression: FindAllMarkers, FindConservedMarkers
- Cell Typing: Reference-based annotation with SingleR/Azimuth
Scanpy (Python) Templates
- Data Loading: AnnData, 10x, CSV, loom files
- QC Workflow: Comprehensive filtering and metrics
- Normalization: Log1p, scran, ComBat batch correction
- Integration: scVI, Scanorama, BBKNN
- Clustering: Leiden/Louvain with resolution sweep
- Visualization: UMAP, PAGA, embeddings
- Marker Analysis: rank_genes_groups, filter markers
- Trajectory: PAGA, diffusion pseudotime (DPT)
- CellChat/CellPhoneDB: Cell-cell communication
Usage
Generate Seurat Template
python scripts/main.py --tool seurat --output seurat_analysis.R --species human
Generate Scanpy Template
python scripts/main.py --tool scanpy --output scanpy_analysis.py --species mouse
Generate Both Templates
python scripts/main.py --tool both --output scrna_pipeline --species human --batch-correction harmony --trajectory true
Command-Line Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| --tool | string | Yes | Analysis tool: seurat, scanpy, or both |
| --output | string | Yes | Output file or directory path |
| --species | string | No | Species: human or mouse (default: human) |
| --batch-correction | string | No | Method: harmony, rpca, cca, scanorama, scvi |
| --trajectory | bool | No | Include trajectory analysis (default: false) |
| --cell-communication | bool | No | Include cell-cell communication (default: false) |
| --de-analysis | bool | No | Include differential expression (default: false) |
| --spatial | bool | No | Include spatial transcriptomics (default: false) |
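The parameter table can be sketched as an argparse parser. This is a hypothetical illustration, not the contents of scripts/main.py: the choice lists are taken from the usage examples and from the error message shown under Error Handling, and the helper names (str2bool, build_parser) are assumptions.

```python
import argparse

# Allowed values, inferred from the usage examples and the error message
# in this document (assumption: the real script accepts exactly these).
TOOLS = ("seurat", "scanpy", "both")
SPECIES = ("human", "mouse")
BATCH_METHODS = ("harmony", "rpca", "cca", "scanorama", "scvi")


def str2bool(value: str) -> bool:
    """Parse 'true'/'false' style values, as in `--trajectory true`."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented parameter table."""
    parser = argparse.ArgumentParser(description="Generate scRNA-seq templates")
    parser.add_argument("--tool", required=True, choices=TOOLS)
    parser.add_argument("--output", required=True)
    parser.add_argument("--species", choices=SPECIES, default="human")
    parser.add_argument("--batch-correction", choices=BATCH_METHODS, default=None)
    for flag in ("--trajectory", "--cell-communication", "--de-analysis", "--spatial"):
        parser.add_argument(flag, type=str2bool, default=False)
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--tool", "both", "--output", "scrna_pipeline",
     "--batch-correction", "harmony", "--trajectory", "true"]
)
print(args.tool, args.species, args.batch_correction, args.trajectory)
```

Using `choices` lets argparse reject unsupported values before any template is generated; the actual script may validate differently.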
Output Structure
    output/
    ├── seurat/
    │   ├── 01_load_and_qc.R
    │   ├── 02_normalize_integrate.R
    │   ├── 03_cluster_annotate.R
    │   ├── 04_visualize.R
    │   └── 05_de_analysis.R (if --de-analysis)
    ├── scanpy/
    │   ├── 01_load_qc.py
    │   ├── 02_normalize_integrate.py
    │   ├── 03_cluster_annotate.py
    │   ├── 04_visualize.py
    │   └── 05_trajectory.py (if --trajectory)
    └── README.md
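As a rough illustration of how the flags map to the files in this layout, the sketch below lists the paths one would expect for a given configuration. The function name expected_files is hypothetical, and it assumes that choosing a single tool produces only that tool's subdirectory, which this document does not state.

```python
from pathlib import PurePosixPath

# File names copied from the output layout in this document.
SEURAT_STEPS = ["01_load_and_qc.R", "02_normalize_integrate.R",
                "03_cluster_annotate.R", "04_visualize.R"]
SCANPY_STEPS = ["01_load_qc.py", "02_normalize_integrate.py",
                "03_cluster_annotate.py", "04_visualize.py"]


def expected_files(tool, de_analysis=False, trajectory=False, root="output"):
    """Return the relative paths expected for the given options.

    Assumption: `tool` is one of "seurat", "scanpy", or "both", and a single
    tool yields only its own subdirectory.
    """
    base = PurePosixPath(root)
    files = []
    if tool in ("seurat", "both"):
        names = SEURAT_STEPS + (["05_de_analysis.R"] if de_analysis else [])
        files += [base / "seurat" / name for name in names]
    if tool in ("scanpy", "both"):
        names = SCANPY_STEPS + (["05_trajectory.py"] if trajectory else [])
        files += [base / "scanpy" / name for name in names]
    files.append(base / "README.md")
    return [str(path) for path in files]


for path in expected_files("both", trajectory=True):
    print(path)
```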
Technical Details
Supported Input Formats
- 10x Genomics Cell Ranger outputs (barcodes.tsv, features.tsv, matrix.mtx)
- H5AD (AnnData h5 format)
- Seurat RDS objects
- CSV/TSV count matrices
- HDF5 files
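A minimal sketch of how an input path could be mapped to one of the formats listed above. The function name detect_input_format, the returned labels, and the extension mapping are all assumptions for illustration; the generated templates may identify inputs differently.

```python
import tempfile
from pathlib import Path

# Files that make up a Cell Ranger matrix directory (often gzip-compressed).
TENX_FILES = {"barcodes.tsv", "features.tsv", "matrix.mtx"}

# Hypothetical mapping from file extension to a format label.
SUFFIX_TO_FORMAT = {
    ".h5ad": "h5ad",
    ".rds": "seurat_rds",
    ".csv": "csv",
    ".tsv": "tsv",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def detect_input_format(path) -> str:
    """Guess the input format from a directory's contents or a file suffix."""
    p = Path(path)
    if p.is_dir():
        # Treat "matrix.mtx.gz" the same as "matrix.mtx".
        present = {f.name[:-3] if f.name.endswith(".gz") else f.name
                   for f in p.iterdir()}
        return "10x_mtx" if TENX_FILES <= present else "unknown"
    return SUFFIX_TO_FORMAT.get(p.suffix.lower(), "unknown")


# Demonstrate on a temporary directory that mimics Cell Ranger output.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz"):
        (Path(tmp) / name).touch()
    tenx_format = detect_input_format(tmp)

print(tenx_format, detect_input_format("sample.h5ad"))
```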
QC Parameters (Default)
| Metric | Human | Mouse |
|---|---|---|
| min_genes | 200 | 200 |
| max_genes | 25000 | 25000 |
| min_cells | 3 | 3 |
| max_mt_percent | 20% | 20% |
| doublet_threshold | Auto | Auto |
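To make the per-cell thresholds concrete, here is a small pure-Python sketch that applies min_genes, max_genes, and max_mt_percent to one cell's counts. The names QCThresholds, percent_mt, and passes_qc are hypothetical, and whether the bounds are inclusive is an assumption. The mitochondrial prefixes reflect the usual gene-symbol convention (MT- for human, mt- for mouse). min_cells is a gene-level filter and doublet detection needs a dedicated tool, so neither is covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    """Per-cell defaults from the table above (same for human and mouse)."""
    min_genes: int = 200
    max_genes: int = 25000
    max_mt_percent: float = 20.0


# Conventional mitochondrial gene-symbol prefixes.
MITO_PREFIX = {"human": "MT-", "mouse": "mt-"}


def percent_mt(counts: dict, species: str = "human") -> float:
    """Percentage of a cell's counts that come from mitochondrial genes.

    `counts` maps gene symbol -> count for a single cell.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    prefix = MITO_PREFIX[species]
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def passes_qc(counts: dict, species: str = "human",
              thresholds: QCThresholds = QCThresholds()) -> bool:
    """Check one cell against the thresholds (inclusive bounds assumed)."""
    n_genes = sum(1 for c in counts.values() if c > 0)
    return (thresholds.min_genes <= n_genes <= thresholds.max_genes
            and percent_mt(counts, species) <= thresholds.max_mt_percent)


# Two synthetic cells with 300 detected genes each.
healthy = {f"GENE{i}": 1 for i in range(270)}
healthy.update({f"MT-G{i}": 1 for i in range(30)})    # 10% mitochondrial
stressed = {f"GENE{i}": 1 for i in range(210)}
stressed.update({f"MT-G{i}": 1 for i in range(90)})   # 30% mitochondrial

print(passes_qc(healthy), passes_qc(stressed))
```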
Clustering Resolution Guidelines
- 0.4-0.6: Broad cell types
- 0.8-1.2: Subtypes
- 1.5-2.0: Fine populations
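The guideline ranges can be encoded as a small lookup, which is handy when labelling the results of a resolution sweep. This is an illustrative sketch: describe_resolution is a hypothetical helper, and values that fall between the documented ranges are reported as such rather than guessed.

```python
# (low, high, label) tuples copied from the guideline ranges above.
RESOLUTION_GUIDE = [
    (0.4, 0.6, "broad cell types"),
    (0.8, 1.2, "subtypes"),
    (1.5, 2.0, "fine populations"),
]


def describe_resolution(resolution: float) -> str:
    """Return the granularity the guidelines associate with a resolution."""
    for low, high, label in RESOLUTION_GUIDE:
        if low <= resolution <= high:
            return label
    return "outside the documented ranges"


for value in (0.5, 1.0, 1.8, 0.7):
    print(value, "->", describe_resolution(value))
```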
Batch Correction Recommendations
| Scenario | Seurat | Scanpy |
|---|---|---|
| Small batches (<5) | Harmony | Harmony |
| Large batches | RPCA | Scanorama |
| Complex variation | CCA | scVI |
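The recommendation table can be read as a simple lookup. The sketch below assumes that "Small batches (<5)" refers to the number of batches and that complex variation takes precedence over batch count; both are interpretations, and recommend_method is a hypothetical helper.

```python
# Recommendations copied from the table above, keyed by scenario and tool.
RECOMMENDATIONS = {
    "small_batches": {"seurat": "harmony", "scanpy": "harmony"},
    "large_batches": {"seurat": "rpca", "scanpy": "scanorama"},
    "complex_variation": {"seurat": "cca", "scanpy": "scvi"},
}


def recommend_method(tool: str, n_batches: int,
                     complex_variation: bool = False) -> str:
    """Pick a batch correction method for "seurat" or "scanpy".

    Assumptions: fewer than 5 batches counts as "small", and complex
    variation overrides the batch-count rule.
    """
    if complex_variation:
        scenario = "complex_variation"
    elif n_batches < 5:
        scenario = "small_batches"
    else:
        scenario = "large_batches"
    return RECOMMENDATIONS[scenario][tool]


print(recommend_method("seurat", 3))
print(recommend_method("scanpy", 12))
print(recommend_method("scanpy", 3, complex_variation=True))
```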
Code Examples
Seurat Quick Start
    library(Seurat)

    # Load data (raw_data is a gene-by-cell count matrix, e.g. from Read10X())
    seurat_obj <- CreateSeuratObject(counts = raw_data, project = "Sample")

    # QC ("^MT-" matches human mitochondrial genes; use "^mt-" for mouse)
    seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
    seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 200 & percent.mt < 20)

    # Normalize
    seurat_obj <- NormalizeData(seurat_obj)
    seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

    # Scale and PCA
    seurat_obj <- ScaleData(seurat_obj)
    seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

    # Cluster
    seurat_obj <- FindNeighbors(seurat_obj, dims = 1:30)
    seurat_obj <- FindClusters(seurat_obj, resolution = 1.0)
    seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

    # Visualize
    DimPlot(seurat_obj, reduction = "umap", label = TRUE)
    FeaturePlot(seurat_obj, features = c("CD3E", "CD14", "CD79A"))
Scanpy Quick Start
    import scanpy as sc

    # Load data
    adata = sc.read_10x_mtx("filtered_gene_bc_matrices/")

    # QC ('MT-' matches human mitochondrial genes; use 'mt-' for mouse)
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.filter_genes(adata, min_cells=3)
    adata.var['mt'] = adata.var_names.str.startswith('MT-')
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, inplace=True)
    # Copy so later in-place steps operate on real data, not a view
    adata = adata[adata.obs.pct_counts_mt < 20, :].copy()

    # Normalize
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)

    # Scale, PCA, neighbors, UMAP, and clustering
    sc.pp.scale(adata)
    sc.tl.pca(adata, svd_solver='arpack')
    sc.pp.neighbors(adata, n_neighbors=15, n_pcs=30)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=1.0)

    # Visualize
    sc.pl.umap(adata, color=['leiden', 'total_counts'])
    sc.pl.dotplot(adata, var_names=['CD3E', 'CD14', 'CD79A'], groupby='leiden')
References
- Complete Seurat analysis template: references/seurat_template.R
- Complete Scanpy analysis template: references/scanpy_template.py
- Batch correction comparison: references/batch_correction_guide.md
- Python dependencies: requirements.txt
Dependencies
Seurat (R)
    install.packages(c("Seurat", "SeuratObject", "tidyverse", "patchwork"))

    # Optional
    remotes::install_github("satijalab/seurat-wrappers")
    remotes::install_github("immunogenomics/harmony")
    BiocManager::install("SingleR")
Scanpy (Python)
pip install scanpy leidenalg scvi-tools cellchatpy
Testing
Run basic validation:
    cd scripts
    python test_main.py
Error Handling
Errors are returned as structured JSON with a type, a message, and a suggested fix:

    {
      "status": "error",
      "error": {
        "type": "invalid_parameter",
        "message": "Unsupported batch correction method: 'xyz'",
        "suggestion": "Use one of: harmony, rpca, cca, scanorama, scvi"
      }
    }
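A payload of this shape could be produced as follows. This is a sketch under the assumption that validation happens before template generation; the function names invalid_batch_method_error and validate_batch_method are hypothetical, and only the JSON fields come from the example in this section.

```python
import json

# Methods named in the error message's suggestion text.
SUPPORTED_BATCH_METHODS = ("harmony", "rpca", "cca", "scanorama", "scvi")


def invalid_batch_method_error(method: str) -> dict:
    """Build the structured error for an unsupported batch method."""
    return {
        "status": "error",
        "error": {
            "type": "invalid_parameter",
            "message": f"Unsupported batch correction method: '{method}'",
            "suggestion": "Use one of: " + ", ".join(SUPPORTED_BATCH_METHODS),
        },
    }


def validate_batch_method(method: str):
    """Return None for a supported method, otherwise the error payload."""
    if method in SUPPORTED_BATCH_METHODS:
        return None
    return invalid_batch_method_error(method)


print(json.dumps(validate_batch_method("xyz"), indent=2))
```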
Safety & Compliance
- No external API calls
- All code templates are self-contained
- No hardcoded credentials or paths
- Templates use relative paths for data
- Default parameters are conservative for safety
Citation
If using generated templates in publications:
- Seurat: Satija et al., Nature Biotechnology 2015
- Scanpy: Wolf et al., Genome Biology 2018
- scVI: Lopez et al., Nature Methods 2018
- Harmony: Korsunsky et al., Nature Methods 2019
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
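The path-related items in this checklist (no ../ traversal, output restricted to the workspace) can be enforced with a check along these lines. It is a minimal sketch, not the skill's actual validation code; resolve_inside is a hypothetical helper and the workspace path in the demo is arbitrary.

```python
from pathlib import Path


def resolve_inside(workspace, user_path) -> Path:
    """Resolve user_path against workspace and reject paths that escape it.

    Raises ValueError for "../" traversal and for absolute paths that point
    outside the workspace.
    """
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes workspace: {user_path!r}") from None
    return candidate


# A path inside the workspace is accepted.
safe_path = resolve_inside("/tmp/ws", "out/seurat_analysis.R")

# A traversal attempt is rejected.
try:
    resolve_inside("/tmp/ws", "../etc/passwd")
    traversal_blocked = False
except ValueError:
    traversal_blocked = True

print(safe_path.name, traversal_blocked)
```

Resolving both paths before comparing them means symlinks and redundant ".." segments are normalized first, so the check compares real locations rather than raw strings.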
Prerequisites
    # Python dependencies
    pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support