Claude-skill-registry clustermarkersofallcells


Install

Clone the upstream repo:
git clone https://github.com/majiayu000/claude-skill-registry

Install into ~/.claude/skills/ for Claude Code:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/clustermarkersofallcells" ~/.claude/skills/majiayu000-claude-skill-registry-clustermarkersofallcells && rm -rf "$T"

Manifest: skills/data/clustermarkersofallcells/SKILL.md

ClusterMarkersOfAllCells Process Configuration

Purpose

Finds marker genes for clusters of ALL cells before T/B cell selection. For each unsupervised cluster, this process finds genes that are differentially expressed relative to the other cells. These markers help you assign broad cell types (T cells, B cells, myeloid cells, NK cells, etc.) in mixed immune cell populations.

When to Use

  • After SeuratClusteringOfAllCells: Runs on all cells before T/B selection
  • Before TOrBCellSelection: Provides markers to identify which clusters are T/B cells
  • Broad cell type identification: Distinguish major immune cell types from mixed populations
  • Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
  • Initial cell typing: First-pass identification before detailed annotation
  • Data quality check: Verify expected cell types are present in your data

Configuration Structure

Process Enablement

[ClusterMarkersOfAllCells]
cache = true

Input Specification

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
# Accepts output from SeuratClusteringOfAllCells process

Environment Variables

All parameters are inherited from ClusterMarkers and MarkersFinder:

[ClusterMarkersOfAllCells.envs]
# Parallel computing
ncores = 1

# Grouping: leave group_by unset to use Seurat::Idents() (usually "seurat_clusters").
# TOML has no null value, so omit the key instead of writing group_by = null.

# Statistical test parameters (passed to Seurat::FindMarkers(); write "-" instead of "." in argument names)
test-use = "wilcox"            # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson, ...
min-pct = 0.1                  # Only test genes detected in >=10% of cells in either group
logfc-threshold = 0.25         # Minimum log2 fold change

# Marker filtering
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Filter for significant markers

# Enrichment analysis
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
enrich_style = "enrichr"       # enrichr or clusterprofiler

# Error handling
error = false                  # Don't error out if no markers found

# Visualization (TOML inline tables use key = value)
marker_plots_defaults = {order_by = "desc(avg_log2FC)"}
allmarker_plots = {"Top 10 markers of all clusters" = {plot_type = "heatmap"}}
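The sigmarkers value is a dplyr filter expression. It is evaluated against the marker table produced by Seurat::FindMarkers(), which has columns such as p_val_adj and avg_log2FC. To illustrate what the default expression keeps, the minimal Python sketch below applies the same condition to a few made-up rows. The pipeline itself evaluates the expression in R. The Marker and sig_markers names are hypothetical helpers, and the gene values are invented.

```python
from typing import List, NamedTuple


class Marker(NamedTuple):
    """One row of a FindMarkers()-style table (only the columns used here)."""
    gene: str
    p_val_adj: float
    avg_log2FC: float


def sig_markers(rows: List[Marker], max_padj: float = 0.05,
                min_log2fc: float = 0.0) -> List[Marker]:
    """Python analogue of the dplyr filter 'p_val_adj < max_padj & avg_log2FC > min_log2fc'."""
    return [r for r in rows if r.p_val_adj < max_padj and r.avg_log2FC > min_log2fc]


# Made-up rows for illustration only.
ROWS = [
    Marker("CD3E", 1e-40, 2.1),
    Marker("LYZ", 1e-30, -3.0),   # lower in this cluster: dropped by avg_log2FC > 0
    Marker("IL7R", 0.20, 0.8),    # not significant after p-value adjustment
    Marker("CD2", 0.01, 0.4),
]

print([m.gene for m in sig_markers(ROWS)])                  # ['CD3E', 'CD2']
print([m.gene for m in sig_markers(ROWS, min_log2fc=1.0)])  # ['CD3E']
```

Because the condition requires avg_log2FC > 0, only genes that are higher in the cluster than in the rest are kept. Those are usually the genes you want for naming a cluster.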

External References

Seurat FindMarkers Parameters

  • Full reference: https://satijalab.org/seurat/reference/findmarkers
  • Statistical tests (test.use parameter):
    • "wilcox": Wilcoxon rank sum test (default, recommended)
    • "bimod": Likelihood-ratio test for single-cell gene expression
    • "roc": Receiver operating characteristic (ROC) analysis
    • "t": Student's t-test
    • "negbinom": Negative binomial generalized linear model (UMI-based data only)
    • "poisson": Poisson generalized linear model (UMI-based data only)
  • Common arguments (use - instead of . in TOML):
    • min-pct: Minimum fraction of cells expressing the gene in either group
    • logfc-threshold: Minimum log2 fold change threshold
    • only-pos: Only return positive markers
    • min-diff-pct: Minimum difference in detection fraction between the two groups
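The default "wilcox" test compares one gene's expression in one group of cells against the other group using a Wilcoxon rank sum (Mann-Whitney U) test. The minimal Python sketch below shows that test for a single gene, one cluster versus all other cells. It is an illustration only, not Seurat's implementation. It uses the tie-corrected normal approximation without a continuity correction, so its p-values will differ slightly from R's wilcox.test(). The function names and expression values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the run of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one cell")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts: Dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value identical: nothing to test
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# One gene, log-normalized: cells in the cluster vs all other cells (made-up values).
cluster = [2.1, 1.8, 2.5, 1.9, 2.2]
rest = [0.0, 0.1, 0.0, 0.3, 0.2, 0.0]
print(wilcoxon_rank_sum(cluster, rest))
```

A rank-based test only asks whether cluster cells tend to rank higher than the rest. That is why it is robust to the skewed, zero-heavy distributions typical of single-cell data.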

Enrichment Databases

Configuration Examples

Minimal Configuration

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

Standard Marker Finding

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

[ClusterMarkersOfAllCells.envs]
# Find markers for broad cell type identification
dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

# Generate key visualizations
[ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
plot_type = "volcano_log2fc"

[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
plot_type = "heatmap"

[ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
plot_type = "bar"
top_term = 10

Common Patterns

Pattern 1: Broad Cell Type Markers

[ClusterMarkersOfAllCells.envs]
# Optimized for distinguishing T/B/Myeloid/NK cells
min-pct = 0.1              # Require detection in >=10% of cells
logfc-threshold = 0.25     # Minimum log2 fold change
test-use = "wilcox"        # Fast and robust
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

# Visualize markers to identify cell types
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
plot_type = "heatmap"

# Check for expected markers in outputs
# T cells: CD3D, CD3E, CD3G, CD4, CD8A
# B cells: CD19, MS4A1 (CD20), CD79A, CD79B
# Myeloid: CD14, LYZ, FCGR3A, CD68
# NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7
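The min-pct and logfc-threshold settings above act as prefilters: genes that fail them are not tested at all. This is why raising them speeds up the run and shrinks the marker list. The simplified Python sketch below applies both prefilters to one gene. It assumes log1p-normalized values and approximates the log2 fold change as log2(mean(expm1) + 1) per group. Seurat's exact fold-change formula differs between versions, so treat the numbers as approximate. All function names here are hypothetical.

```python
import math
from typing import Sequence


def pct_expressed(values: Sequence[float]) -> float:
    """Fraction of cells with non-zero expression (like pct.1 / pct.2)."""
    return sum(1 for v in values if v > 0) / len(values)


def approx_log2fc(cluster: Sequence[float], rest: Sequence[float],
                  pseudocount: float = 1.0) -> float:
    """Approximate log2 fold change from log1p-normalized values (assumed formula)."""
    def mean_expm1(vals: Sequence[float]) -> float:
        return sum(math.expm1(v) for v in vals) / len(vals)
    return math.log2(mean_expm1(cluster) + pseudocount) - math.log2(mean_expm1(rest) + pseudocount)


def passes_prefilters(cluster: Sequence[float], rest: Sequence[float], min_pct: float = 0.1,
                      logfc_threshold: float = 0.25, only_pos: bool = False) -> bool:
    """Would this gene be tested at all under min-pct / logfc-threshold (/ only-pos)?"""
    # min-pct: detected in at least min_pct of cells in EITHER group.
    if max(pct_expressed(cluster), pct_expressed(rest)) < min_pct:
        return False
    fc = approx_log2fc(cluster, rest)
    # only-pos keeps genes that are up in the cluster; otherwise either direction counts.
    return fc >= logfc_threshold if only_pos else abs(fc) >= logfc_threshold


cluster = [1.2, 0.9, 0.0, 1.5]
rest = [0.0, 0.0, 0.1, 0.0, 0.0]
print(passes_prefilters(cluster, rest))                       # True
print(passes_prefilters(cluster, rest, logfc_threshold=2.0))  # False
```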

Pattern 2: Quick Wilcoxon for Large Datasets

[ClusterMarkersOfAllCells.envs]
# Fast analysis for large datasets (>50k cells)
ncores = 8                  # Use multiple cores
test-use = "wilcox"
min-pct = 0.15              # More stringent to reduce noise
logfc-threshold = 0.3
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"

# Skip enrichment to save time
dbs = []

# Generate only essential plots
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
plot_type = "heatmap"

Pattern 3: Identify T/B Cell Clusters

[ClusterMarkersOfAllCells.envs]
# Focus on finding T and B cell markers for selection
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# Will help identify which clusters express:
# T cell markers: CD3D, CD3E, CD3G
# B cell markers: CD19, MS4A1, CD79A

[ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
plot_type = "heatmap"

Difference from ClusterMarkers

| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
| --- | --- | --- |
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | SeuratClustering (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |

Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.

Dependencies

Upstream Processes

  • SeuratClusteringOfAllCells: Required; provides the clustered object with seurat_clusters metadata
  • SeuratPreparing: Indirect; provides the normalized Seurat object
  • SampleInfo or LoadingRNAFromSeurat: Entry point for data

Downstream Processes

  • TOrBCellSelection: Primary consumer; uses the marker results to select T/B cells
  • TopExpressingGenesOfAllCells: Optional complementary analysis

Validation Rules

Required Inputs

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]  # Must be specified

Process Enablement

  • The process is enabled automatically when SeuratClusteringOfAllCells is in the config
  • There is no need to set [ClusterMarkersOfAllCells] explicitly if SeuratClusteringOfAllCells is enabled

Parameter Constraints

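  • test-use (Seurat's test.use): Must be a test supported by Seurat::FindMarkers(), e.g. "wilcox" (default), "bimod", "roc", "t", "negbinom", "poisson", "LR", "MAST", or "DESeq2" ("MAST" and "DESeq2" require the corresponding R packages)
  • min-pct: Fraction between 0 and 1 (e.g., 0.1 = 10%)
  • logfc-threshold: Numeric value on the log2 scale
  • sigmarkers: Valid dplyr filter expression over the marker table columns (e.g., p_val_adj, avg_log2FC)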
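The constraints above are easy to check before launching a long run. The minimal Python sketch below assumes the envs table has already been parsed into a dict, for example with Python 3.11's standard-library tomllib. It checks the FindMarkers-related keys and maps the TOML-style hyphenated names back to Seurat's dotted argument names. It also flags a nested "test" table, which is what TOML produces if you write test.use = ... because TOML treats dotted keys as nested tables. The helper names and the non-negative threshold rule are assumptions for illustration, not the pipeline's own validation code.

```python
from typing import Any, Dict, List

# Tests accepted by Seurat::FindMarkers(test.use = ...).
SEURAT_TESTS = {"wilcox", "bimod", "roc", "t", "negbinom", "poisson", "LR", "MAST", "DESeq2"}
FINDMARKERS_KEYS = {"test-use", "min-pct", "logfc-threshold", "only-pos", "min-diff-pct"}


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def validate_envs(envs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the keys look OK."""
    problems = []
    test = envs.get("test-use", "wilcox")
    if test not in SEURAT_TESTS:
        problems.append(f"test-use={test!r} is not a Seurat::FindMarkers() test")
    min_pct = envs.get("min-pct", 0.1)
    if not _is_number(min_pct) or not 0 <= min_pct <= 1:
        problems.append(f"min-pct={min_pct!r} must be a fraction between 0 and 1")
    lfc = envs.get("logfc-threshold", 0.25)
    if not _is_number(lfc) or lfc < 0:
        problems.append(f"logfc-threshold={lfc!r} must be a non-negative number")
    if isinstance(envs.get("test"), dict):
        # `test.use = "wilcox"` parses as {"test": {"use": "wilcox"}} in TOML.
        problems.append("found a nested 'test' table; write test-use, not test.use")
    return problems


def to_seurat_args(envs: Dict[str, Any]) -> Dict[str, Any]:
    """Map keys such as 'min-pct' to R argument names such as 'min.pct'."""
    return {k.replace("-", "."): v for k, v in envs.items() if k in FINDMARKERS_KEYS}


envs = {"test-use": "wilcox", "min-pct": 0.1, "logfc-threshold": 0.25, "ncores": 8}
print(validate_envs(envs))   # []
print(to_seurat_args(envs))  # {'test.use': 'wilcox', 'min.pct': 0.1, 'logfc.threshold': 0.25}
```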

Common Errors

  • Missing clustering: Ensure SeuratClusteringOfAllCells runs first
  • No markers found: Relax sigmarkers or logfc-threshold if they are too stringent
  • Memory issues: Reduce ncores, or subset the data for large datasets

Troubleshooting

Issue: No significant markers found

Symptoms: Empty output directory or warning about no markers

Solutions:

[ClusterMarkersOfAllCells.envs]
# Less stringent thresholds
logfc-threshold = 0.1           # Lower fold change requirement
min-pct = 0.05                 # Lower detection percentage
sigmarkers = "p_val_adj < 0.1"  # More relaxed p-value

# Or check data quality
# - Are cells properly clustered?
# - Is expression matrix normalized?
# - Are there enough cells per cluster (>30 recommended)?
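To check the last point, count the cells in each cluster from the cluster labels, for example the seurat_clusters metadata column exported as one label per cell. The minimal Python sketch below flags clusters below the 30-cell guideline. The small_clusters function is hypothetical, and the labels are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def small_clusters(labels: Iterable[str], min_cells: int = 30) -> Dict[str, int]:
    """Return {cluster: n_cells} for clusters with fewer than min_cells cells."""
    counts = Counter(labels)
    return {cluster: n for cluster, n in counts.items() if n < min_cells}


# One seurat_clusters label per cell (made-up counts).
labels = ["0"] * 120 + ["1"] * 45 + ["2"] * 12 + ["3"] * 3
print(small_clusters(labels))  # {'2': 12, '3': 3}
```

Very small clusters give the test little power. If many clusters are flagged, a lower clustering resolution may be worth trying.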

Issue: Too many markers (slow enrichment)

Symptoms: Process takes very long, memory issues

Solutions:

[ClusterMarkersOfAllCells.envs]
# More stringent filtering
logfc-threshold = 0.5
min-pct = 0.2
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

# Reduce enrichment databases
dbs = ["MSigDB_Hallmark_2020"]

# Or skip enrichment entirely (use instead of the line above; a TOML table cannot set dbs twice)
# dbs = []

Issue: Can't identify T/B cell clusters

Symptoms: Markers don't show clear T/B cell signatures

Solutions:

  1. Check marker gene presence:

    # Verify expected markers are in your data
    # Use SeuratClusterStats to visualize:
    [SeuratClusterStats.envs.features_defaults]
    features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]
    
  2. Adjust clustering parameters:

    [SeuratClusteringOfAllCells.envs]
    res = 0.5  # Try different resolutions (0.2-1.5)
    
  3. Check data quality:

    • Are genes properly normalized?
    • Are there enough cells per cluster?
    • Is species correct (human vs mouse gene symbols)?
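Step 1 and the species question in step 3 can be checked quickly against the gene names in your object, for example the row names of the Seurat object written to a text file. The hypothetical Python helper below reports which expected markers are found, which appear only with different capitalization, and which are missing. Different capitalization, such as mouse-style Cd3e instead of CD3E, usually means human symbols are being matched against mouse data. The helper does plain string matching only, so it cannot detect orthologs that have different names.

```python
from typing import Dict, Iterable, List


def check_markers(expected: Iterable[str], genes: Iterable[str]) -> Dict[str, List[str]]:
    """Split expected markers into found / case-mismatched / missing."""
    gene_set = set(genes)
    by_upper = {g.upper(): g for g in gene_set}
    report: Dict[str, List[str]] = {"found": [], "case_mismatch": [], "missing": []}
    for marker in expected:
        if marker in gene_set:
            report["found"].append(marker)
        elif marker.upper() in by_upper:
            # e.g. human-style CD3E requested, mouse-style Cd3e present
            report["case_mismatch"].append(by_upper[marker.upper()])
        else:
            report["missing"].append(marker)
    return report


expected = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]
mouse_genes = ["Cd3d", "Cd3e", "Cd19", "Ms4a1", "Lyz2", "Actb"]  # made-up gene list
print(check_markers(expected, mouse_genes))
```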

Issue: Process not running

Symptoms: Process skipped in workflow

Solutions:

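  • Verify that SeuratClusteringOfAllCells is in the config
  • Check that upstream dependencies run correctly
  • Make sure your data actually needs T/B cell selection: if all cells are already T cells, the all-cells clustering and marker steps may not be part of the workflow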

Typical Marker Genes for Identification

| Cell Type | Positive Markers | Negative Markers |
| --- | --- | --- |
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |

Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.
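One way to turn the marker table into a first-pass guess is to score each cluster's significant markers against the positive and negative panels. The Python sketch below is a hypothetical helper, not part of the pipeline. The panels are copied from the table, and each cell type is scored as positive-marker hits minus negative-marker hits. It returns "unknown" when no panel scores above zero. Treat the result as a hint and confirm it in the heatmaps.

```python
from typing import Dict, Iterable, Set, Tuple

# (positive markers, negative markers), copied from the marker table.
PANELS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "T cells": ({"CD3D", "CD3E", "CD3G", "CD4", "CD8A"}, {"CD19", "MS4A1", "CD14"}),
    "B cells": ({"CD19", "MS4A1", "CD79A", "CD79B"}, {"CD3E", "CD3D", "CD14"}),
    "Monocytes": ({"CD14", "LYZ", "FCGR3A", "S100A8"}, {"CD3E", "CD19"}),
    "NK cells": ({"NCAM1", "KLRD1", "NKG7"}, {"CD3E", "CD19", "CD14"}),
    "Dendritic cells": ({"FCER1A", "CST3"}, {"CD3E", "CD19", "CD14"}),
    "Megakaryocytes": ({"PPBP", "PF4"}, {"CD3E", "CD19", "CD14"}),
}


def guess_cell_type(markers: Iterable[str]) -> str:
    """Return the panel with the highest (positive hits - negative hits) score."""
    genes = set(markers)
    best, best_score = "unknown", 0
    for cell_type, (pos, neg) in PANELS.items():
        score = len(genes & pos) - len(genes & neg)
        if score > best_score:  # ties keep the earlier panel
            best, best_score = cell_type, score
    return best


# Significant markers per cluster (made-up example).
clusters = {"0": ["CD3D", "CD3E", "IL7R"], "1": ["MS4A1", "CD79A"], "2": ["MALAT1"]}
for cluster, genes in clusters.items():
    print(cluster, guess_cell_type(genes))
```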