Claude-skill-registry clustermarkers

Finds differentially expressed genes (markers) for clusters of T/B cells using Seurat's FindMarkers function. Performs statistical testing between clusters, identifies cluster-defining genes, and automatically runs pathway enrichment analysis (via Enrichr) on significant markers. Generates publication-ready visualizations including volcano plots, dot plots, heatmaps, and enrichment plots.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/clustermarkers" ~/.claude/skills/majiayu000-claude-skill-registry-clustermarkers && rm -rf "$T"

manifest: skills/data/clustermarkers/SKILL.md

ClusterMarkers Process Configuration

Purpose

When to Use

After SeuratClustering: Essential for cluster interpretation and annotation
Cluster annotation: Identify marker genes to assign biological meaning to clusters
Publication preparation: Generate marker tables, volcano plots, and enrichment figures
Cell type characterization: Understand functional differences between cell populations
Comparative analysis: Compare clusters to find unique gene expression signatures

Configuration Structure

Process Enablement

[ClusterMarkers]
cache = true  # Cache results for faster re-runs with different visualizations

Input Specification

[ClusterMarkers.in]
srtobj = ["SeuratClustering"]  # Seurat object with cluster assignments

Environment Variables

Core Parameters

[ClusterMarkers.envs]
# Number of cores for parallel computation
ncores = 1  # int; Parallelize Seurat procedures

# Subset cells before marker finding (R expression)
subset = "seurat_clusters %in% c('c1', 'c2', 'c3')"  # Optional

# Cache location for intermediate results
cache = "/tmp"  # Path; Set to false to disable caching

# Assay to use for marker finding
assay = "RNA"  # Default: uses active assay

# Error on no markers found
error = false  # bool; If true, fail if no markers found

Statistical Test Selection

[ClusterMarkers.envs]
# Statistical test for differential expression
test.use = "wilcox"  # Default

Available tests:

`"wilcox"`: Wilcoxon rank sum test (default, fast)
`"wilcox_limma"`: Limma implementation (Seurat v4 compatibility)
`"MAST"`: GLM with cellular detection rate covariate (recommended)
`"DESeq2"`: Negative binomial model (robust, requires counts)
`"roc"`: ROC analysis (AUC-based classification)
`"t"`: Student's t-test
`"tobit"`: Tobit test for censored data
`"bimod"`: Likelihood-ratio test for bimodal expression
`"poisson"`: Poisson distribution (UMI datasets only)
`"negbinom"`: Negative binomial (UMI datasets only)
`"LR"`: Logistic regression (latent.vars supported)

Test selection guidelines:

Default: `"wilcox"` for speed and reliability
Publication-quality: `"MAST"` for single-cell-specific modeling
Bulk-like DE: `"DESeq2"` for rigorous statistical testing
UMI data: `"negbinom"` or `"poisson"` for count-based models
Classification: `"roc"` for AUC-based marker ranking

Threshold Parameters (Seurat FindMarkers)

[ClusterMarkers.envs]
# Minimum log2 fold change threshold
logfc.threshold = 0.25  # float; Default: 0.25

# Minimum percentage of cells expressing gene
min.pct = 0.1  # float; Range: 0.0-1.0

# Minimum difference in detection percentage
min.diff.pct = -Inf  # float; Default: no limit

# Only positive markers (higher in ident.1 group)
only.pos = false  # bool; Default: false (both directions)

# Maximum cells per identity (downsampling)
max.cells.per.ident = Inf  # int; No downsampling by default

# Minimum cells expressing gene (poisson/negbinom tests)
min.cells.feature = 3  # int

# Minimum cells per group
min.cells.group = 3  # int

Note: Use … to replace … in parameter names (e.g., logfc.threshold, not logfc.threshold)
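The way these thresholds interact can be sketched in a few lines of Python. This is a simplified illustration of the pre-filtering idea, not Seurat's actual code: the function name `passes_prefilter` and its argument names are hypothetical, and the exact order and form of the checks inside FindMarkers may differ.

```python
import math

def passes_prefilter(pct1, pct2, avg_log2fc,
                     logfc_threshold=0.25, min_pct=0.1,
                     min_diff_pct=-math.inf, only_pos=False):
    """Return True if a gene would still be tested under these thresholds.

    Hypothetical helper: mirrors the documented meaning of the parameters,
    not the library's implementation.
    """
    # The gene must be detected in at least min_pct of cells in either group.
    if max(pct1, pct2) < min_pct:
        return False
    # The detection rates of the two groups must differ by at least min_diff_pct.
    if abs(pct1 - pct2) < min_diff_pct:
        return False
    # only_pos keeps genes higher in the first group; otherwise both directions count.
    effect = avg_log2fc if only_pos else abs(avg_log2fc)
    return effect >= logfc_threshold

print(passes_prefilter(0.40, 0.05, 0.8))                  # True
print(passes_prefilter(0.40, 0.05, -0.8, only_pos=True))  # False
print(passes_prefilter(0.04, 0.02, 2.0))                  # False
```

Raising any of the three thresholds shrinks the set of genes that reach the statistical test, which is why they affect both sensitivity and run time.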

Significant Markers Filter (for Enrichment)

[ClusterMarkers.envs]
# Filter markers for enrichment analysis (R expression)
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Default

# Variables available: p_val, avg_log2FC, pct.1, pct.2, p_val_adj
# Example: "p_val_adj < 0.05 & abs(avg_log2FC) > 1" (both directions)
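To make the filter semantics concrete, here is a small Python analogue of the two expressions above. The pipeline itself evaluates `sigmarkers` as an R expression; the gene names and values below are invented purely for illustration, and `select` is a hypothetical helper.

```python
# Hypothetical marker rows; keys follow the variables listed above.
markers = [
    {"gene": "GENE_A", "avg_log2FC": 1.8, "p_val_adj": 0.001},
    {"gene": "GENE_B", "avg_log2FC": -1.4, "p_val_adj": 0.003},
    {"gene": "GENE_C", "avg_log2FC": 0.3, "p_val_adj": 0.04},
    {"gene": "GENE_D", "avg_log2FC": 2.1, "p_val_adj": 0.2},
]

def select(rows, predicate):
    """Return the gene names of rows for which the predicate holds."""
    return [row["gene"] for row in rows if predicate(row)]

# Analogue of "p_val_adj < 0.05 & avg_log2FC > 0" (upregulated only).
default_hits = select(
    markers, lambda r: r["p_val_adj"] < 0.05 and r["avg_log2FC"] > 0
)

# Analogue of "p_val_adj < 0.05 & abs(avg_log2FC) > 1" (both directions).
both_directions = select(
    markers, lambda r: r["p_val_adj"] < 0.05 and abs(r["avg_log2FC"]) > 1
)

print(default_hits)     # ['GENE_A', 'GENE_C']
print(both_directions)  # ['GENE_A', 'GENE_B']
```

Note that the second filter drops the weak marker GENE_C but picks up the downregulated GENE_B, so the choice of filter changes which genes are passed to enrichment.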

Enrichment Analysis

[ClusterMarkers.envs]
# Databases for pathway enrichment
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]  # Default

# Enrichment style
enrich_style = "enrichr"  # Options: "enrichr", "clusterprofiler", "clusterProfiler"

Available databases (enrichit):

`"KEGG_2021_Human"`, `"KEGG"`: KEGG pathways
`"MSigDB_Hallmark_2020"`, `"Hallmark"`: MSigDB Hallmark gene sets
`"GO_Biological_Process_2025"`: Gene Ontology Biological Process
`"GO_Cellular_Component_2025"`: Gene Ontology Cellular Component
`"GO_Molecular_Function_2025"`: Gene Ontology Molecular Function
`"Reactome_Pathways_2024"`, `"Reactome"`: Reactome pathways
`"WikiPathways_2024_Human"`, `"WikiPathways"`: WikiPathways
`"BioCarta_2016"`: BioCarta pathways

More databases: https://maayanlab.cloud/Enrichr/#libraries
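Over-representation tools of this kind typically ask whether a marker list overlaps a gene set more than expected by chance. The sketch below computes a generic hypergeometric upper-tail p-value for that question. It is an illustration of the statistical idea only, not the enrichit or Enrichr implementation, and the function and argument names are hypothetical.

```python
from math import comb

def overrepresentation_pvalue(universe, set_size, n_markers, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    universe  : number of genes considered in total
    set_size  : number of those genes that belong to the pathway
    n_markers : number of significant markers submitted
    overlap   : number of markers that fall in the pathway
    """
    total = comb(universe, n_markers)
    upper = min(set_size, n_markers)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, n_markers - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 genes, 5 in the pathway, 5 markers, all 5 in the pathway.
print(overrepresentation_pvalue(10, 5, 5, 5))  # 1/252
```

A stricter `sigmarkers` filter reduces `n_markers`, which is one reason the same pathway can move in or out of significance when the filter changes.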

Visualization Parameters

[ClusterMarkers.envs]
# Marker plots configuration
marker_plots_defaults = {order_by = "desc(avg_log2FC)"}

# All markers plots (across clusters)
allmarker_plots = {"Top 10 markers of all clusters" = {plot_type = "heatmap"}}

# Enrichment plots (all clusters)
allenrich_plots = {}  # Empty by default

# Marker plots (per cluster)
marker_plots = {}  # Default: volcano plots and dot plots

# Enrichment plots (per cluster)
enrich_plots = {}  # Default: bar plot

# Overlap analysis (venn/upset)
overlaps = {}  # Empty by default
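What an overlap analysis summarizes can be shown with plain set operations. The sketch below assumes one set of significant marker genes per cluster; the cluster labels and gene names are invented, and `pairwise_overlaps` is a hypothetical helper rather than anything the pipeline exposes.

```python
from itertools import combinations

# Hypothetical significant markers per cluster.
sig_markers = {
    "c1": {"GENE_A", "GENE_B", "GENE_C"},
    "c2": {"GENE_B", "GENE_C", "GENE_D"},
    "c3": {"GENE_C", "GENE_E"},
}

def pairwise_overlaps(marker_sets):
    """Map each pair of clusters to the sorted list of markers they share."""
    return {
        (a, b): sorted(marker_sets[a] & marker_sets[b])
        for a, b in combinations(sorted(marker_sets), 2)
    }

overlaps_by_pair = pairwise_overlaps(sig_markers)
shared_by_all = set.intersection(*sig_markers.values())

print(overlaps_by_pair[("c1", "c2")])  # ['GENE_B', 'GENE_C']
print(shared_by_all)                   # {'GENE_C'}
```

Genes shared by every cluster, such as GENE_C here, are usually poor candidates for annotation because they do not distinguish one cluster from another.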

External References

Seurat FindMarkers

https://satijalab.org/seurat/reference/findmarkers

Core differential expression function
Statistical tests: wilcox, MAST, DESeq2, ROC, t-test, etc.
Threshold parameters control sensitivity and speed

Enrichr Databases

https://maayanlab.cloud/Enrichr/#libraries

Comprehensive gene set enrichment collection
KEGG, GO, Reactome, MSigDB, WikiPathways

biopipen MarkersFinder

https://pwwang.github.io/biopipen/api/biopipen.ns.scrna/#biopipen.ns.scrna.MarkersFinder

Parent process with extended functionality
Visualization: biopipen.utils::VizDEGs, scplotter::EnrichmentPlot

Configuration Examples

Minimal Configuration

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

Result: Default wilcox test, standard thresholds, hallmark + KEGG enrichment

Standard Marker Finding (Wilcoxon)

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

[ClusterMarkers.envs]
test.use = "wilcox"
logfc.threshold = 0.25
min.pct = 0.1
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

Publication-Ready MAST Analysis

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

[ClusterMarkers.envs]
test.use = "MAST"
logfc.threshold = 0.25
min.pct = 0.1
sigmarkers = "p_val_adj < 0.01 & abs(avg_log2FC) > 1"
ncores = 4

DESeq2 for Robust Analysis

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

[ClusterMarkers.envs]
test.use = "DESeq2"
logfc.threshold = 0.5  # More stringent
min.pct = 0.15
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.5"

Note: DESeq2 requires count data in the Seurat object

Stringent Thresholds for High-Confidence Markers

[ClusterMarkers.envs]
logfc.threshold = 0.58  # 1.5-fold change (2^0.58)
min.pct = 0.25  # Expressed in >25% cells
min.diff.pct = 0.1  # 10% difference in detection
only.pos = true  # Positive markers only
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"
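The threshold values are on a log2 scale, so it helps to convert them back to plain fold changes when choosing a cutoff. A minimal sketch of the arithmetic (the helper names are illustrative):

```python
import math

def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc

def log2fc_for(fold):
    """Convert a linear fold change to the log2 scale used by the thresholds."""
    return math.log2(fold)

for threshold in (0.25, 0.58, 1.0):
    print(f"log2FC {threshold} -> {fold_change(threshold):.2f}-fold")
# log2FC 0.25 -> 1.19-fold
# log2FC 0.58 -> 1.49-fold
# log2FC 1.0 -> 2.00-fold

print(round(log2fc_for(1.5), 3))  # 0.585
```

So `logfc.threshold = 0.58` corresponds to roughly a 1.5-fold change, while the `avg_log2FC > 1` filter requires at least a 2-fold change.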

Subset Specific Clusters

[ClusterMarkers.envs]
# Only analyze clusters c1, c2, c3 to save computation
subset = "seurat_clusters %in% c('c1', 'c2', 'c3')"

Custom Enrichment Databases

[ClusterMarkers.envs]
# Use different pathway databases
dbs = ["Reactome_Pathways_2024", "GO_Biological_Process_2025"]
enrich_style = "clusterprofiler"

Positive Markers Only (Cluster-Specific)

[ClusterMarkers.envs]
only.pos = true
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

Downsample Large Clusters

[ClusterMarkers.envs]
max.cells.per.ident = 5000  # Limit to 5000 cells per cluster
random.seed = 42  # Reproducible downsampling
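The effect of a fixed seed on downsampling can be illustrated as follows. This is a conceptual sketch, not Seurat's sampling code: `downsample_per_ident` and the cell barcodes are hypothetical, and the cells Seurat actually picks for a given seed will differ from Python's.

```python
import random

def downsample_per_ident(cells_by_ident, max_cells, seed=42):
    """Keep at most max_cells cells per identity, reproducibly for a given seed."""
    rng = random.Random(seed)
    sampled = {}
    for ident in sorted(cells_by_ident):  # fixed order keeps the draw reproducible
        cells = cells_by_ident[ident]
        if len(cells) > max_cells:
            sampled[ident] = rng.sample(cells, max_cells)
        else:
            sampled[ident] = list(cells)  # small clusters are left untouched
    return sampled

# Hypothetical barcodes: one large cluster and one small one.
cells = {
    "c1": [f"c1_cell{i}" for i in range(10)],
    "c2": [f"c2_cell{i}" for i in range(3)],
}

first = downsample_per_ident(cells, max_cells=5)
second = downsample_per_ident(cells, max_cells=5)
print(len(first["c1"]), len(first["c2"]))  # 5 3
print(first == second)                     # True
```

Only clusters above the limit are subsampled, and repeating the run with the same seed selects the same cells, which keeps marker tables stable between runs.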

Common Patterns

Pattern 1: Quick Wilcoxon Test (Default)

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

Use case: Initial exploration, speed priority

Pattern 2: Publication-Quality MAST

[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]

[ClusterMarkers.envs]
test.use = "MAST"
logfc.threshold = 0.25
min.pct = 0.1
ncores = 8

Use case: Single-cell publication, accounts for detection rate

Pattern 3: Both Positive and Negative Markers

[ClusterMarkers.envs]
only.pos = false
sigmarkers = "p_val_adj < 0.05 & abs(avg_log2FC) > 0.5"

Use case: Find genes upregulated and downregulated in each cluster

Pattern 4: Stringent Top Markers

[ClusterMarkers.envs]
logfc.threshold = 1.0  # 2-fold change
min.pct = 0.3
sigmarkers = "p_val_adj < 0.001 & avg_log2FC > 1"
only.pos = true

Use case: High-confidence cluster markers for annotation

Pattern 5: Custom Enrichment with Multiple DBs

[ClusterMarkers.envs]
dbs = [
  "KEGG_2021_Human",
  "MSigDB_Hallmark_2020",
  "GO_Biological_Process_2025",
  "Reactome_Pathways_2024"
]
enrich_style = "enrichr"

Pattern 6: ROC Analysis for Classification

[ClusterMarkers.envs]
test.use = "roc"
logfc.threshold = 0.1
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

Use case: Find markers with highest AUC for classification
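For intuition, the AUC of a single gene can be computed directly as the probability that a randomly chosen cell inside the cluster expresses the gene more highly than a randomly chosen cell outside it. The sketch below is a naive pairwise version for illustration, not Seurat's implementation. The `predictive_power` helper follows the definition given in the Seurat documentation, abs(AUC - 0.5) * 2; the expression values are invented.

```python
def auc(expr_in_cluster, expr_elsewhere):
    """Probability that an in-cluster cell outranks an out-of-cluster cell.

    Ties count as one half. Quadratic in the number of cells, so only
    suitable for small illustrative inputs.
    """
    wins = 0.0
    for x in expr_in_cluster:
        for y in expr_elsewhere:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(expr_in_cluster) * len(expr_elsewhere))

def predictive_power(auc_value):
    """Rescale AUC so 0 means uninformative and 1 means a perfect classifier."""
    return abs(auc_value - 0.5) * 2

perfect = auc([3, 4, 5], [0, 1, 2])
uninformative = auc([1, 2], [1, 2])
print(perfect, predictive_power(perfect))              # 1.0 1.0
print(uninformative, predictive_power(uninformative))  # 0.5 0.0
```

A gene with AUC near 0 is as informative as one near 1 (it marks the cluster by its absence), which is why the rescaled power is symmetric around 0.5.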

Dependencies

Upstream Processes

Required: `SeuratClustering` (provides cluster assignments)
Alternative: `SeuratSubClustering` (for sub-clustering analysis)
Context: Runs after `TOrBCellSelection` if T/B cell selection is enabled

Downstream Processes

CellTypeAnnotation: Uses markers for automated cell type assignment
SeuratMap2Ref: Reference-based annotation may use marker profiles
ScFGSEA: Gene set enrichment on identified markers
ModuleScoreCalculator: Score marker genes across cells

Validation Rules

Statistical Test Constraints

`test.use` must be one of: wilcox, wilcox_limma, MAST, DESeq2, roc, t, tobit, bimod, poisson, negbinom, LR
DESeq2 requires count data (automatically uses counts slot)
MAST, LR, poisson, and negbinom support `latent.vars` for additional covariates

Threshold Validation

`logfc.threshold`: ≥ 0 (typical range: 0.1-1.0)
`min.pct`: 0.0-1.0 (typical: 0.1-0.3)
`min.diff.pct`: ≥ -Inf (typical: 0.05-0.2)
`min.cells.feature`: ≥ 1 (default: 3)
`min.cells.group`: ≥ 1 (default: 3)

sigmarkers Expression

Must be valid R/dplyr expression
Available variables: p_val, avg_log2FC, pct.1, pct.2, p_val_adj
Use `&` for AND, `|` for OR, `!` for NOT

Database Constraints

`dbs` must be valid enrichit database names or GMT file paths
Custom GMT files: use absolute paths or paths relative to config file
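The rules above can be checked before launching a run. The following is a hypothetical pre-flight helper, not part of the pipeline: `validate_envs` and its error messages are assumptions, the defaults are the ones listed in this document, and it takes a plain dict keyed by the parameter names used here.

```python
ALLOWED_TESTS = {
    "wilcox", "wilcox_limma", "MAST", "DESeq2", "roc", "t",
    "tobit", "bimod", "poisson", "negbinom", "LR",
}

def validate_envs(envs):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    test = envs.get("test.use", "wilcox")
    if test not in ALLOWED_TESTS:
        errors.append(f"test.use: unknown test {test!r}")
    if envs.get("logfc.threshold", 0.25) < 0:
        errors.append("logfc.threshold must be >= 0")
    if not 0.0 <= envs.get("min.pct", 0.1) <= 1.0:
        errors.append("min.pct must be between 0.0 and 1.0")
    for key in ("min.cells.feature", "min.cells.group"):
        if envs.get(key, 3) < 1:
            errors.append(f"{key} must be >= 1")
    return errors

print(validate_envs({"test.use": "MAST", "min.pct": 0.1}))  # []
print(validate_envs({"test.use": "anova", "min.pct": 1.5}))
```

Collecting every problem in a list, instead of failing on the first one, lets a single pass report all misconfigured parameters at once.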

Troubleshooting

Issue: Too Many Markers Found

Symptoms: Thousands of markers, low statistical power

Solutions:

[ClusterMarkers.envs]
logfc.threshold = 0.5  # Increase fold change threshold
min.pct = 0.25  # Increase expression percentage
min.diff.pct = 0.15  # Increase detection difference
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"  # Stricter filter

Issue: No Markers Found

Symptoms: Empty marker tables, no enrichment results

Solutions:

[ClusterMarkers.envs]
logfc.threshold = 0.1  # Lower threshold
min.pct = 0.05  # Lower expression requirement
min.diff.pct = -Inf  # Remove detection difference
sigmarkers = "p_val_adj < 0.1 & avg_log2FC > 0.1"  # Looser filter

Issue: Slow Performance

Symptoms: Marker finding takes hours

Solutions:

[ClusterMarkers.envs]
ncores = 8  # Use more cores
logfc.threshold = 0.5  # Higher threshold reduces genes tested
max.cells.per.ident = 5000  # Downsample large clusters

Issue: DESeq2 Fails with Integrated Data

Symptoms: DESeq2 error on integrated Seurat object

Cause: DESeq2 requires count data, and the integrated assay of an integrated object has an empty counts slot

Solution:

# Use SCTransform counts instead of integrated data
[SeuratPreparing.envs]
method = "SCTransform"
integration_method = null  # Skip integration for DESeq2

[ClusterMarkers.envs]
test.use = "DESeq2"

Alternative: Use MAST or wilcox on integrated data

Issue: Enrichment Analysis Returns No Results

Symptoms: Empty enrichment tables/plots

Solutions:

[ClusterMarkers.envs]
# Loosen the sigmarkers filter if it is too strict
sigmarkers = "p_val_adj < 0.1 & avg_log2FC > 0"

# Add more databases
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020", "Reactome_Pathways_2024"]

Issue: NA p-values in Results

Symptoms: Some markers have NA p-values

Cause: Insufficient cells per group or low expression variance

Solutions:

[ClusterMarkers.envs]
min.cells.group = 10  # Increase minimum cells
min.cells.feature = 5  # Increase minimum expressing cells

Issue: Different Test Methods Return Similar Results

Symptoms: wilcox and MAST return nearly identical gene lists

Cause: Strong markers are robust across methods

Solution: Use ROC analysis for alternative ranking:

[ClusterMarkers.envs]
test.use = "roc"

Issue: Computationally Expensive Enrichment

Symptoms: Enrichment step takes very long

Solutions:

[ClusterMarkers.envs]
# Limit markers for enrichment
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

# Use fewer databases
dbs = ["MSigDB_Hallmark_2020"]

# Subset clusters for analysis
subset = "seurat_clusters %in% c('c1', 'c2')"

Best Practices

Start with default wilcox test for initial exploration
Use MAST for publications (single-cell-specific modeling)
Set appropriate thresholds: logfc.threshold = 0.25-0.5, min.pct = 0.1-0.2
Filter for enrichment: Use sigmarkers to limit to high-confidence markers
Customize enrichment databases: Choose databases relevant to your study
Use only.pos = false to see both upregulated and downregulated genes
Parallelize with ncores for large datasets
Subset clusters when analyzing many clusters to save computation
Validate markers: Check expression patterns in visualization
Reproducibility: Set random.seed for downsampling

Related Processes

ClusterMarkersOfAllCells: Marker finding before T/B cell selection
MarkersFinder: Extended parent process with more flexibility
TopExpressingGenes: Top expressed genes per cluster (non-DE)
SeuratClustering: Required upstream process for cluster assignments
CellTypeAnnotation: Uses markers for automated annotation