ClusterMarkersOfAllCells Process Configuration
Purpose
Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.
When to Use
- After SeuratClusteringOfAllCells: Runs on all cells before T/B selection
- Before TOrBCellSelection: Provides markers to identify which clusters are T/B cells
- Broad cell type identification: Distinguish major immune cell types from mixed populations
- Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
- Initial cell typing: First-pass identification before detailed annotation
- Data quality check: Verify expected cell types are present in your data
Configuration Structure
Process Enablement
    [ClusterMarkersOfAllCells]
    cache = true
Input Specification
    [ClusterMarkersOfAllCells.in]
    srtobj = ["SeuratClusteringOfAllCells"]  # Accepts output from SeuratClusteringOfAllCells process
Environment Variables
All parameters are inherited from ClusterMarkers and MarkersFinder:
    [ClusterMarkersOfAllCells.envs]
    # Parallel computing
    ncores = 1

    # Grouping: leave group_by unset to use Seurat::Idents() (usually "seurat_clusters").
    # TOML has no null value, so the default is shown commented out.
    # group_by = null

    # Statistical test parameters (passed to Seurat::FindMarkers())
    test-use = "wilcox"        # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson
    min-pct = 0.1              # Only test genes detected in >=10% of cells
    logfc-threshold = 0.25     # Minimum log2 fold change

    # Marker filtering
    sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Filter for significant markers

    # Enrichment analysis
    dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
    enrich_style = "enrichr"   # enrichr or clusterprofiler

    # Error handling
    error = false              # Don't error out if no markers found

    # Visualization
    marker_plots_defaults = {order_by = "desc(avg_log2FC)"}
    allmarker_plots = {"Top 10 markers of all clusters" = {plot_type = "heatmap"}}
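To make the sigmarkers filter concrete, the sketch below applies the default expression to a few rows in plain Python. The rows and their statistics are invented for illustration, and the function name keep_significant is hypothetical; the pipeline itself evaluates the expression in R.

```python
# Illustration only: mimics sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"
# on hand-written rows. The values are invented, not real results.

def keep_significant(rows, max_padj=0.05, min_log2fc=0.0):
    """Return rows with p_val_adj < max_padj and avg_log2FC > min_log2fc."""
    return [
        r for r in rows
        if r["p_val_adj"] < max_padj and r["avg_log2FC"] > min_log2fc
    ]

rows = [
    {"gene": "CD3E", "p_val_adj": 1e-30, "avg_log2FC": 2.1},    # passes both conditions
    {"gene": "MS4A1", "p_val_adj": 1e-12, "avg_log2FC": -1.8},  # fails: negative fold change
    {"gene": "ACTB", "p_val_adj": 0.40, "avg_log2FC": 0.3},     # fails: not significant
]

kept = [r["gene"] for r in keep_significant(rows)]
print(kept)
```

Raising the fold-change cutoff (as in the patterns further down) simply tightens the second condition.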
External References
Seurat FindMarkers Parameters
- Full reference: https://satijalab.org/seurat/reference/findmarkers
- Statistical tests (test.use parameter):
  - "wilcox": Wilcoxon Rank Sum test (default, recommended)
  - "roc": Receiver Operating Characteristic
  - "t": Student's t-test
  - "negbinom": Negative binomial generalized linear model (UMI-based data only)
  - "poisson": Poisson test
- Common arguments (use "-" instead of "." in TOML):
  - min-pct: Minimum detection percentage in either group
  - logfc-threshold: Minimum log2 fold change threshold
  - only-pos: Only return positive markers
  - min-diff-pct: Minimum difference in detection percentage
Enrichment Databases
- MSigDB: https://www.gsea-msigdb.org/gsea/msigdb/
- KEGG: https://www.genome.jp/kegg/
- Reactome: https://reactome.org/
- GO: http://geneontology.org/
Configuration Examples
Minimal Configuration
    [SeuratClusteringOfAllCells]

    [ClusterMarkersOfAllCells]

    [ClusterMarkersOfAllCells.in]
    srtobj = ["SeuratClusteringOfAllCells"]
Standard Marker Finding
    [SeuratClusteringOfAllCells]

    [ClusterMarkersOfAllCells]

    [ClusterMarkersOfAllCells.in]
    srtobj = ["SeuratClusteringOfAllCells"]

    [ClusterMarkersOfAllCells.envs]
    # Find markers for broad cell type identification
    dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
    sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

    # Generate key visualizations
    [ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
    plot_type = "volcano_log2fc"

    [ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
    plot_type = "heatmap"

    [ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
    plot_type = "bar"
    top_term = 10
Common Patterns
Pattern 1: Broad Cell Type Markers
    [ClusterMarkersOfAllCells.envs]
    # Optimized for distinguishing T/B/Myeloid/NK cells
    min-pct = 0.1              # Require detection in >=10% of cells
    logfc-threshold = 0.25     # Minimum log2 fold change
    test-use = "wilcox"        # Fast and robust
    sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

    # Visualize markers to identify cell types
    [ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
    plot_type = "heatmap"

    # Check for expected markers in outputs
    # T cells:  CD3D, CD3E, CD3G, CD4, CD8A
    # B cells:  CD19, MS4A1 (CD20), CD79A, CD79B
    # Myeloid:  CD14, LYZ, FCGR3A, CD68
    # NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7
Pattern 2: Quick Wilcoxon for Large Datasets
    [ClusterMarkersOfAllCells.envs]
    # Fast analysis for large datasets (>50k cells)
    ncores = 8                 # Use multiple cores
    test-use = "wilcox"
    min-pct = 0.15             # More stringent to reduce noise
    logfc-threshold = 0.3
    sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"

    # Skip enrichment to save time
    dbs = []

    # Generate only essential plots
    [ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
    plot_type = "heatmap"
Pattern 3: Identify T/B Cell Clusters
    [ClusterMarkersOfAllCells.envs]
    # Focus on finding T and B cell markers for selection
    sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

    # Will help identify which clusters express:
    # T cell markers: CD3D, CD3E, CD3G
    # B cell markers: CD19, MS4A1, CD79A
    [ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
    plot_type = "heatmap"
Difference from ClusterMarkers
| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
|---|---|---|
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data Scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |
Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.
Dependencies
Upstream Processes
- SeuratClusteringOfAllCells: Required - provides clustered object with seurat_clusters metadata
- SeuratPreparing: Indirect - provides normalized Seurat object
- SampleInfo or LoadingRNAFromSeurat: Entry point for data
Downstream Processes
- TOrBCellSelection: Primary consumer - uses marker results to select T/B cells
- TopExpressingGenesOfAllCells: Optional complementary analysis
Validation Rules
Required Inputs
    [ClusterMarkersOfAllCells.in]
    srtobj = ["SeuratClusteringOfAllCells"]  # Must be specified
Process Enablement
- Process automatically enabled when SeuratClusteringOfAllCells is in config
- No need to explicitly set [ClusterMarkersOfAllCells] if SeuratClusteringOfAllCells is enabled
Parameter Constraints
- test-use (test.use in Seurat): Must be a test supported by Seurat::FindMarkers(), such as "wilcox", "roc", "t", "negbinom", or "poisson"
- min-pct: Should be between 0 and 1 (e.g., 0.1 = 10%)
- logfc-threshold: Numeric value (log2 scale)
- sigmarkers: Valid dplyr filter expression
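The constraints above can be checked before a run with a small script. The following is a minimal sketch, not part of the pipeline: check_envs and TESTS_NAMED_HERE are hypothetical names, the input is assumed to be the already-parsed [ClusterMarkersOfAllCells.envs] table as a Python dict with hyphenated keys, and the set of tests covers only those named in this document. It does not attempt to validate the dplyr expression itself, only that sigmarkers is a string.

```python
# Hypothetical pre-flight check for the envs table; not part of the pipeline.
# Only the tests named in this document are listed; Seurat supports others.
TESTS_NAMED_HERE = {"wilcox", "bimod", "roc", "t", "negbinom", "poisson"}

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def check_envs(envs):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    test_use = envs.get("test-use", "wilcox")
    if test_use not in TESTS_NAMED_HERE:
        problems.append(f"test-use: unexpected value {test_use!r}")

    min_pct = envs.get("min-pct", 0.1)
    if not _is_number(min_pct) or not 0 <= min_pct <= 1:
        problems.append(f"min-pct: expected a fraction between 0 and 1, got {min_pct!r}")

    logfc = envs.get("logfc-threshold", 0.25)
    if not _is_number(logfc):
        problems.append(f"logfc-threshold: expected a number, got {logfc!r}")

    sigmarkers = envs.get("sigmarkers", "")
    if not isinstance(sigmarkers, str):
        problems.append("sigmarkers: expected a filter expression as a string")

    return problems

for problem in check_envs({"test-use": "anova", "min-pct": 10}):
    print(problem)
```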
Common Errors
- Missing clustering: Ensure SeuratClusteringOfAllCells runs first
- No markers found: Adjust sigmarkers or logfc-threshold if too stringent
- Memory issues: Reduce ncores or subset data with large datasets
Troubleshooting
Issue: No significant markers found
Symptoms: Empty output directory or warning about no markers
Solutions:
    [ClusterMarkersOfAllCells.envs]
    # Less stringent thresholds
    logfc-threshold = 0.1           # Lower fold change requirement
    min-pct = 0.05                  # Lower detection percentage
    sigmarkers = "p_val_adj < 0.1"  # More relaxed p-value

    # Or check data quality
    # - Are cells properly clustered?
    # - Is expression matrix normalized?
    # - Are there enough cells per cluster (>30 recommended)?
Issue: Too many markers (slow enrichment)
Symptoms: Process takes very long, memory issues
Solutions:
    [ClusterMarkersOfAllCells.envs]
    # More stringent filtering
    logfc-threshold = 0.5
    min-pct = 0.2
    sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

    # Reduce enrichment databases
    dbs = ["MSigDB_Hallmark_2020"]

    # Or skip enrichment entirely (use instead of the line above; a key may appear only once)
    # dbs = []
Issue: Can't identify T/B cell clusters
Symptoms: Markers don't show clear T/B cell signatures
Solutions:
- Check marker gene presence:

      # Verify expected markers are in your data
      # Use SeuratClusterStats to visualize:
      [SeuratClusterStats.envs.features_defaults]
      features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]

- Adjust clustering parameters:

      [SeuratClusteringOfAllCells.envs]
      res = 0.5  # Try different resolutions (0.2-1.5)

- Check data quality:
  - Are genes properly normalized?
  - Are there enough cells per cluster?
  - Is species correct (human vs mouse gene symbols)?
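For the species check, a rough heuristic is that human gene symbols are conventionally written in upper case (CD3E) while mouse symbols are capitalized (Cd3e). The sketch below classifies a list of symbols on that basis. The function name guess_symbol_style and the 80% threshold are assumptions made for illustration, and the heuristic will misjudge some symbols (for example human names such as C1orf112), so treat the result as a hint only.

```python
def guess_symbol_style(genes, threshold=0.8):
    """Heuristically label gene symbols as 'human-like', 'mouse-like', 'mixed', or 'unknown'."""
    # Ignore entries without letters; they carry no case information
    symbols = [g for g in genes if any(ch.isalpha() for ch in g)]
    if not symbols:
        return "unknown"

    # All upper case, e.g. CD3E
    upper = sum(1 for g in symbols if g == g.upper())
    # First letter upper case, the rest lower case, e.g. Cd3e
    title = sum(
        1 for g in symbols
        if g[0].isupper() and g[1:] == g[1:].lower() and g != g.upper()
    )

    if upper / len(symbols) >= threshold:
        return "human-like"
    if title / len(symbols) >= threshold:
        return "mouse-like"
    return "mixed"

print(guess_symbol_style(["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]))
print(guess_symbol_style(["Cd3d", "Cd3e", "Cd19", "Ms4a1", "Cd14", "Lyz2"]))
```

If the symbols look mouse-like, the human marker names used throughout this document will not match and the marker lists need to be adapted.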
Issue: Process not running
Symptoms: Process skipped in workflow
Solutions:
- Verify SeuratClusteringOfAllCells is in config
- Check dependencies are running correctly
- Ensure your TCR data actually requires T/B selection (that is, the cells are not already all T cells)
Typical Marker Genes for Identification
| Cell Type | Positive Markers | Negative Markers |
|---|---|---|
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |
Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.
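As a first pass, the positive markers in the table can also be matched against each cluster's top markers programmatically. The following is a minimal sketch under stated assumptions: guess_cell_type is a hypothetical helper, the scoring (fraction of a type's positive markers found) is a simple choice made here rather than the pipeline's method, negative markers are ignored, and the example clusters are invented. Visual inspection of the heatmaps remains the reference.

```python
# Positive markers copied from the table above
POSITIVE_MARKERS = {
    "T cells": {"CD3D", "CD3E", "CD3G", "CD4", "CD8A"},
    "B cells": {"CD19", "MS4A1", "CD79A", "CD79B"},
    "Monocytes": {"CD14", "LYZ", "FCGR3A", "S100A8"},
    "NK cells": {"NCAM1", "KLRD1", "NKG7"},
    "Dendritic cells": {"FCER1A", "CST3"},
    "Megakaryocytes": {"PPBP", "PF4"},
}

def guess_cell_type(top_markers):
    """Return the cell type whose positive markers overlap most with top_markers."""
    found = set(top_markers)
    # Score each type by the fraction of its positive markers that were found
    scores = {
        cell_type: len(found & genes) / len(genes)
        for cell_type, genes in POSITIVE_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

# Invented example: top markers per cluster, as read off a marker table
clusters = {
    "0": ["CD3D", "CD3E", "IL7R", "CD8A"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["ACTB", "MALAT1"],
}
for cluster, markers in clusters.items():
    print(cluster, guess_cell_type(markers))
```

Clusters labeled T cells or B cells this way are candidates for TOrBCellSelection; clusters that come back as "unknown" or score close between two types deserve a manual look.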