Claude-skill-registry clonalstats

Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.

Install

Clone the upstream repo:
git clone https://github.com/majiayu000/claude-skill-registry

Install into ~/.claude/skills/ (Claude Code):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/clonalstats" ~/.claude/skills/majiayu000-claude-skill-registry-clonalstats && rm -rf "$T"

Manifest: skills/data/clonalstats/SKILL.md

ClonalStats Process Configuration

Purpose

Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.

When to Use

  • To quantify clonal expansion patterns in TCR/BCR data
  • For diversity analysis comparing multiple samples or conditions
  • To identify hyperexpanded clones and their distribution
  • For rarefaction analysis to assess sampling depth
  • After ScRepCombiningExpression, to analyze integrated TCR + RNA data

Configuration Structure

Process Enablement

[ClonalStats]
cache = true

Input Specification

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

Core Environment Variables

[ClonalStats.envs]
# Clone definition: "gene" (VDJC), "aa" (CDR3 amino acid), "nt" (CDR3 nucleotide)
clone_call = "aa"
# Chain analysis: "both", "TRA", "TRB", "TRG", "IGH", "IGL"
chain = "both"
# Data transformations (dplyr::mutate syntax)
mutaters = {}
# Data filtering (dplyr::filter syntax)
subset = null
# Output device parameters
devpars = {width = 800, height = 600, res = 100}
# Save code and data (large files - use with caution)
save_code = false
save_data = false
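The following minimal Python sketch shows how a clone identifier could be built for one cell, which makes the clone_call and chain settings concrete. The per-cell record layout and the clone_key helper are hypothetical. Joining paired chains with "_" follows the style of scRepertoire's combined CDR3 columns, but the process's internals may differ.

```python
from collections import Counter

def clone_key(cell, clone_call="aa", chain="both"):
    """Return a clone identifier for one cell.

    cell: dict of chain name -> {"gene": ..., "aa": ..., "nt": ...}
    (a hypothetical layout, for illustration only).
    """
    if clone_call not in ("gene", "aa", "nt"):
        raise ValueError(f"unsupported clone_call: {clone_call!r}")
    # "both" uses every chain present (sorted, e.g. TRA then TRB); otherwise one chain
    chains = sorted(cell) if chain == "both" else [chain]
    # A missing chain is written as "NA" so partial clones stay distinguishable
    return "_".join(cell[c][clone_call] if c in cell else "NA" for c in chains)

# Two cells whose TRB CDR3 differs only by a synonymous codon (TGT vs TGC, both Cys)
cell1 = {
    "TRA": {"gene": "TRAV1-2.TRAJ33.TRAC", "aa": "CAV", "nt": "TGTGCTGTG"},
    "TRB": {"gene": "TRBV6-4.TRBJ2-1.TRBC2", "aa": "CASS", "nt": "TGTGCCAGCAGC"},
}
cell2 = {
    "TRA": {"gene": "TRAV1-2.TRAJ33.TRAC", "aa": "CAV", "nt": "TGTGCTGTG"},
    "TRB": {"gene": "TRBV6-4.TRBJ2-1.TRBC2", "aa": "CASS", "nt": "TGCGCCAGCAGC"},
}

for call in ("gene", "aa", "nt"):
    n_clones = len(Counter(clone_key(c, call) for c in (cell1, cell2)))
    print(call, n_clones)
```

Under "gene" and "aa" the two cells collapse into one clone. Under "nt" they count as two, which is why "nt" gives the finest clone definition.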

Case-Based Plot Generation

[ClonalStats.envs.cases."Case Name"]
viz_type = "volume"  # volume, abundance, length, residency, stat,
                    # composition, overlap, diversity, geneusage,
                    # positional, kmer, rarefaction

Diversity Metrics

| Metric | Range | Interpretation | Best for |
|---|---|---|---|
| shannon | 0 to ∞ | Higher = more diversity | General comparison |
| inv.simpson | 1 to ∞ | Higher = more diversity | Emphasizing common clones |
| gini.coeff | 0 to 1 | 0 = equality, 1 = inequality | Clonal dominance |
| norm.entropy | 0 to 1 | Higher = more even | Evenness |
| chao1 | ≥ observed richness | Estimates total richness | Small samples |
| d50 | Count | Number of top clones making up 50% of the repertoire | Practical dominance |

Interpretation:

  • High diversity: many unique clones, evenly distributed (typical of a healthy repertoire)
  • Low diversity: a few dominant clones (e.g. antigen-specific responses in infection or cancer)
  • Gini ≈ 1: highly skewed; a few clones dominate
  • Gini ≈ 0: clone sizes are evenly distributed
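The minimal Python sketch below computes the six metrics in the table from a list of clone sizes, which shows what each index measures. It uses textbook formulas: natural-log Shannon, bias-corrected Chao1, and d50 as a clone count. The diversity_metrics helper is hypothetical. The tool computes metrics through its own backend, whose exact definitions and variants may differ.

```python
import math
from collections import Counter

def diversity_metrics(clone_sizes):
    """Diversity indices from clone sizes (number of cells per clone)."""
    sizes = [s for s in clone_sizes if s > 0]
    n = sum(sizes)      # total cells
    s_obs = len(sizes)  # observed richness (number of clones)
    p = [s / n for s in sizes]

    shannon = -sum(pi * math.log(pi) for pi in p)
    inv_simpson = 1.0 / sum(pi * pi for pi in p)
    # Shannon entropy divided by its maximum possible value, log(richness)
    norm_entropy = shannon / math.log(s_obs) if s_obs > 1 else 0.0

    # Gini coefficient of the clone-size distribution (sizes in ascending order)
    xs = sorted(sizes)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    gini = 2 * weighted / (s_obs * n) - (s_obs + 1) / s_obs

    # Bias-corrected Chao1 from singletons (f1) and doubletons (f2)
    freq = Counter(sizes)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

    # d50: fewest largest clones whose cells cover at least half of all cells
    covered, d50 = 0, 0
    for x in sorted(sizes, reverse=True):
        covered += x
        d50 += 1
        if covered >= n / 2:
            break

    return {"shannon": shannon, "inv.simpson": inv_simpson,
            "norm.entropy": norm_entropy, "gini.coeff": gini,
            "chao1": chao1, "d50": d50}

print(diversity_metrics([10, 5, 2, 1, 1, 1]))
```

A perfectly even repertoire such as [1, 1, 1, 1] gives a Gini coefficient of 0 and a normalized entropy of 1. A single dominant clone pushes Gini toward 1 and d50 down to 1.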

Visualization Types

viz_type options:

  • volume: Number of clones per sample/group
  • abundance: Clone abundance distribution (trend/histogram/density)
  • length: CDR3 sequence length distribution
  • residency: Clones present across groups (venn/upset)
  • stat: Expanded clone analysis (pies/sankey)
  • diversity: Diversity metrics (bar/box/violin)
  • geneusage: V/D/J gene usage frequency
  • rarefaction: Sampling depth assessment
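The two simplest types, volume and abundance, reduce to counting. The Python sketch below computes the quantities those plots summarize; it does not reproduce the plots themselves. It assumes per-cell (group, clone_id) pairs, a hypothetical input layout, and the helper names are also hypothetical.

```python
from collections import Counter, defaultdict

def clonal_volume(cells):
    """Number of unique clones per group; cells is an iterable of (group, clone_id)."""
    clones = defaultdict(set)
    for group, clone in cells:
        clones[group].add(clone)
    return {g: len(c) for g, c in clones.items()}

def clonal_abundance(cells):
    """Per group: mapping of clone size -> number of clones with that size."""
    sizes = defaultdict(Counter)
    for group, clone in cells:
        sizes[group][clone] += 1
    return {g: dict(Counter(cnt.values())) for g, cnt in sizes.items()}

example = [("A", "c1"), ("A", "c1"), ("A", "c2"), ("B", "c3")]
print(clonal_volume(example))
print(clonal_abundance(example))
```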

Configuration Examples

Minimal Configuration

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

Standard Diversity Analysis

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
comparisons = true

[ClonalStats.envs.cases."Gini Coeff"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "violin"
group_by = "Diagnosis"
add_box = true

Expanded Clone Analysis

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Expanded Clones"]
viz_type = "stat"
plot_type = "pies"
group_by = "Diagnosis"
subgroup_by = "seurat_clusters"
clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
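We read the selector sel(Colitis > 2) as "clones with more than 2 cells in the Colitis group", with Colitis being a value of the group_by column (Diagnosis). This is an assumption about the selector's semantics; check the tool's documentation. Under that reading, the selection amounts to the Python sketch below. The select_expanded helper and its (group, clone_id) input are hypothetical.

```python
from collections import Counter

def select_expanded(cells, group, threshold):
    """Clones with more than `threshold` cells in `group`.

    cells: iterable of (group_value, clone_id) pairs, one per cell.
    """
    counts = Counter(clone for g, clone in cells if g == group)
    return {clone for clone, n in counts.items() if n > threshold}

cells = ([("Colitis", "c1")] * 3 + [("Colitis", "c2")] * 2
         + [("Control", "c1")] * 5)
print(select_expanded(cells, "Colitis", 2))
```

Clone c2 has exactly 2 cells in Colitis, so a strict "> 2" excludes it. This is why the thresholds should be documented.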

Rarefaction Analysis

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
q = 1  # Hill number order: 0 = richness, 1 = exp(Shannon entropy), 2 = inverse Simpson
n_boots = 20
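The minimal Python sketch below shows what q and n_boots control in a bootstrap rarefaction curve. Cells are subsampled without replacement at increasing depths, and the Hill number of order q is averaged over n_boots draws. This is plain random subsampling. The actual process may use a different estimator (for example, iNEXT-style interpolation and extrapolation), so treat this only as an illustration of the idea. Both helper names are hypothetical.

```python
import math
import random
from collections import Counter

def hill_number(sizes, q):
    """Hill number of order q: 0 = richness, 1 = exp(Shannon), 2 = inverse Simpson."""
    n = sum(sizes)
    p = [s / n for s in sizes if s > 0]
    if q == 0:
        return float(len(p))
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

def rarefaction_curve(clone_sizes, depths, q=1, n_boots=20, seed=0):
    """Mean Hill number of order q at each depth, subsampling cells without replacement."""
    rng = random.Random(seed)
    # One entry per cell, labelled by the index of its clone
    cells = [i for i, s in enumerate(clone_sizes) for _ in range(s)]
    curve = []
    for depth in depths:
        values = [hill_number(list(Counter(rng.sample(cells, depth)).values()), q)
                  for _ in range(n_boots)]
        curve.append((depth, sum(values) / n_boots))
    return curve

print(rarefaction_curve([30, 10, 5, 3, 1, 1], depths=[5, 10, 25, 50], q=1, n_boots=20))
```

A curve that is still rising at the largest depth suggests the sample has not been sequenced deeply enough to saturate diversity. More draws (a larger n_boots) smooth the curve at the cost of runtime.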

Complete Analysis Suite

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Volume"]
viz_type = "volume"

[ClonalStats.envs.cases."Abundance"]
viz_type = "abundance"
plot_type = "density"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"

Common Patterns

Disease vs Healthy

[ClonalStats.envs.cases."Comparison"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "box"
group_by = "Condition"
comparisons = true

Time Course

[ClonalStats.envs.cases."Timepoint"]
viz_type = "volume"
x = "Timepoint"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
group_by = "Timepoint"

Treatment Response

[ClonalStats.envs.cases."Response"]
viz_type = "diversity"
method = "gini.coeff"
group_by = "Response"
plot_type = "box"
comparisons = true

Dependencies

  • Upstream: ScRepCombiningExpression (required)
  • Related: ScRepLoading, CDR3Clustering, TESSA (optional)

Validation Rules

  • Input must be a valid scRepertoire object
  • For viz_type = "diversity", method must be a supported metric
  • For rarefaction, n_boots should be ≥ 10
  • Use sel() syntax in the clones parameter for filtering
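The rules above can be checked before launching a run. This Python sketch applies them to one parsed case table. The check_case helper and its sets are hypothetical. The method and viz_type lists come from this document, not the tool's source, so the real process may accept more values. The sel() check is only a syntactic test.

```python
import re

# Values listed in this document; the real process may accept more
SUPPORTED_METHODS = {"shannon", "inv.simpson", "gini.coeff", "norm.entropy", "chao1", "d50"}
VIZ_TYPES = {"volume", "abundance", "length", "residency", "stat", "composition",
             "overlap", "diversity", "geneusage", "positional", "kmer", "rarefaction"}
SEL_PATTERN = re.compile(r"^sel\(.+\)$")

def check_case(case):
    """Return a list of problems found in one [ClonalStats.envs.cases."..."] table."""
    problems = []
    viz = case.get("viz_type")
    if viz not in VIZ_TYPES:
        problems.append(f"unknown viz_type: {viz!r}")
    # A missing method is left to the tool's default; only explicit values are checked
    if viz == "diversity" and "method" in case and case["method"] not in SUPPORTED_METHODS:
        problems.append(f"unsupported diversity method: {case['method']!r}")
    # A missing n_boots is likewise left to the tool's default
    if viz == "rarefaction" and case.get("n_boots", 10) < 10:
        problems.append("n_boots should be >= 10")
    for label, expr in case.get("clones", {}).items():
        if not SEL_PATTERN.match(expr.strip()):
            problems.append(f"clones[{label!r}] should use sel(...) syntax")
    return problems

# A case table as it would look after parsing the TOML (e.g. with tomllib on Python 3.11+)
print(check_case({"viz_type": "rarefaction", "n_boots": 5}))
```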

Troubleshooting

Sample column not found: The input must have a Sample column; otherwise, specify the x parameter.

Strange diversity values: Small repertoire sizes bias diversity estimates. Use plot_type = "box", and check rarefaction curves to confirm that sampling depth is sufficient.

Rarefaction curves noisy: Increase n_boots (try 50-100).

Too many clones in stat plots: Use subset or stricter clones thresholds.

Plot generation slow: Use clone_call = "gene" for speed, and apply subset to reduce the data.

Missing comparisons: Set comparisons = true to add significance tests.

Best Practices

  1. Start with default cases to see standard visualizations
  2. Use multiple diversity metrics: Shannon + Gini
  3. Check rarefaction curves to ensure sufficient sampling
  4. Document clone thresholds when defining expanded clones
  5. Use clone_call = "gene" for speed, "aa" for granularity
  6. Set save_data = true for debugging (watch disk space)
  7. Validate findings with complementary diversity indices
  8. Consider sample size: small samples underestimate richness