Claude-skill-registry cdr3aaphyschem

Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cdr3aaphyschem" ~/.claude/skills/majiayu000-claude-skill-registry-cdr3aaphyschem && rm -rf "$T"
manifest: skills/data/cdr3aaphyschem/SKILL.md

CDR3AAPhyschem Process Configuration

Purpose

Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).

When to Use

  • To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
  • For feature engineering in TCR machine learning models
  • To identify sequence features that distinguish cell subsets
  • After ScRepCombiningExpression (requires combined TCR + RNA data)
  • When investigating T cell fate determination (regulatory vs conventional T cells)

Configuration Structure

Process Enablement

[CDR3AAPhyschem]
cache = true

Input Specification

[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
  • scrfile: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
  • Must contain both TRA and TRB chains
  • Generated by scRepertoire::combineExpression()

Environment Variables

[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"

# Chain selection
chain = "TRB"

Key Parameters:

  • group: Column name in metadata defining groups to compare (e.g., CellType, seurat_clusters)
  • comparison: Two-group specification for regression analysis
    • Format 1 (dict): Group1 = ["cell1", "cell2"], Group2 = "cell3"
    • Format 2 (list): ["Group1", "Group2"] (when groups exist in column)
  • target: Which group to label as 1 in regression (default: first group in comparison)
  • each: Column(s) to split data for separate analyses
    • Single column: "Sample"
    • Multiple columns: ["Sample", "Patient"]
    • Comma-separated: "Sample,Patient"
    • If not provided, all cells are used together
  • chain: TCR chain to analyze (TRA or TRB)
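
The accepted forms of each and comparison can be easier to reason about once they are reduced to one canonical shape. The Python sketch below is only an illustration of that normalization, not the process's own code; the function names (normalize_each, normalize_comparison, resolve_target) are hypothetical.

```python
from collections.abc import Mapping
from typing import Dict, List, Optional


def normalize_each(each) -> List[str]:
    """Turn any accepted `each` form into a list of column names."""
    if not each:
        return []  # not provided: all cells are analyzed together
    if isinstance(each, str):
        # Covers both "Sample" and "Sample,Patient"
        return [col.strip() for col in each.split(",") if col.strip()]
    return list(each)


def normalize_comparison(comparison) -> Dict[str, List[str]]:
    """Map each group name to the labels it covers in the `group` column."""
    if isinstance(comparison, Mapping):
        groups = {
            name: [labels] if isinstance(labels, str) else list(labels)
            for name, labels in comparison.items()
        }
    else:
        # List form: the group names are the labels themselves
        groups = {name: [name] for name in comparison}
    if len(groups) != 2:
        raise ValueError(f"comparison needs exactly two groups, got {len(groups)}")
    return groups


def resolve_target(groups: Dict[str, List[str]], target: Optional[str] = None) -> str:
    """Default to the first group in `comparison` when no target is given."""
    if target is None:
        return next(iter(groups))
    if target not in groups:
        raise ValueError(f"target {target!r} is not a group in comparison")
    return target


groups = normalize_comparison({"Treg": ["Treg", "CD4+Treg"], "Tconv": "Tconv"})
print(groups, resolve_target(groups), normalize_each("Sample,Patient"))
```

Rejecting anything other than two groups early mirrors the two-group requirement stated above and avoids a less obvious failure inside the regression step.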

Configuration Examples

Minimal Configuration

[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]

Standard Treg vs Tconv Analysis

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"

Multi-Sample Analysis

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"

Custom Group Definition

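[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
target = "HighQuality"
chain = "TRB"

# Define clusters to compare. TOML 1.0 inline tables must stay on one line,
# so a multi-line definition is written as a sub-table instead.
[CDR3AAPhyschem.envs.comparison]
HighQuality = ["c1", "c2", "c5"]
LowQuality = ["c3", "c4"]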

Physicochemical Properties

Available Properties

The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:

| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand average of hydropathy (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
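
To make the definitions above concrete, here is a minimal Python sketch that computes four of the listed properties (length, gravy, acidic, aromatic) for a single CDR3 sequence. It uses the published Kyte-Doolittle hydropathy values and simple residue proportions. The function name cdr3_properties is hypothetical, and the process itself may normalize or round differently.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cdr3_properties(seq: str) -> dict:
    """Compute a subset of the tabulated properties for one CDR3 sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty CDR3 sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return {
        "length": n,
        # GRAVY: mean hydropathy over all residues
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Proportion of acidic side chains (D, E)
        "acidic": sum(aa in "DE" for aa in seq) / n,
        # Proportion of aromatic side chains (F, W, Y)
        "aromatic": sum(aa in "FWY" for aa in seq) / n,
    }


print(cdr3_properties("CASSLGQAYEQYF"))
```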

Property Calculation Methods

  • Default scales: Standard biophysical scales from peer-reviewed literature
  • GRAVY: Kyte & Doolittle (1982) hydropathy scale
  • Bulkiness: Zimmerman et al. (1968) bulkiness parameters
  • Polarity: Grantham (1974) amino acid difference index
  • Aliphatic index: Ikai (1980) thermodynamic stability scale
  • Charge: Normalized based on pKa values (EMBOSS database)
  • Acidic/Basic/Aromatic: Direct residue counting proportions

Regression Analysis

  • Performed for each physicochemical property independently
  • Compares properties across CDR3 length distributions
  • Binary classification: target group (1) vs non-target (0)
  • Output: Statistical significance of property differences
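
The per-length, per-property setup described above can be sketched in plain Python. This document does not say which regression model the process fits; the sketch assumes a one-variable logistic regression because the labels are binary (target = 1, non-target = 0), and fits it with simple gradient ascent. The names fit_logistic and regress_by_length are hypothetical.

```python
import math
from collections import defaultdict


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ sigmoid(b0 + b1 * x) by gradient ascent; return (b0, b1)."""
    n = len(xs)
    mean = sum(xs) / n  # center x for better-conditioned updates
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, b0 + b1 * (x - mean)))  # avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g0 += y - p
            g1 += (y - p) * (x - mean)
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0 - b1 * mean, b1  # intercept converted back to uncentered x


def regress_by_length(cells, target):
    """cells: iterable of (group_name, cdr3_length, property_value).

    Returns {cdr3_length: slope}; a positive slope means higher property
    values are associated with the target group at that length.
    """
    by_length = defaultdict(lambda: ([], []))
    for group, length, value in cells:
        xs, ys = by_length[length]
        xs.append(value)
        ys.append(1 if group == target else 0)
    slopes = {}
    for length, (xs, ys) in sorted(by_length.items()):
        if len(set(ys)) < 2:
            continue  # both groups must be present at this length
        slopes[length] = fit_logistic(xs, ys)[1]
    return slopes


cells = [
    ("Treg", 13, 0.5), ("Treg", 13, 0.3), ("Tconv", 13, -0.2), ("Tconv", 13, 0.4),
    ("Treg", 14, -0.3), ("Treg", 14, -0.1), ("Tconv", 14, 0.2), ("Tconv", 14, 0.0),
    ("Treg", 15, 0.1),
]
print(regress_by_length(cells, target="Treg"))
```

In practice this loop would run once per property (and once per each split), and a statistics package would also report standard errors and p-values, which this sketch omits.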

Common Patterns

Pattern 1: Treg vs Tconv (TRB Chain)

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = ""  # Analyze all samples together

Pattern 2: Selected Properties Only

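[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# The parameters documented above include no property selector, so every
# property is regressed; focus on hydrophobicity (key Treg feature) in the output
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"
# To analyze another chain, run a separate process with a different chain value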

Pattern 3: Multi-Chain Analysis

Run separate processes for different chains:

# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]

# Note: Create separate config for TRA analysis if needed

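Pattern 4: Multi-Group Comparisons

comparison accepts exactly two groups, so merge the remaining cell types into a single reference group and run one process per contrast:

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
target = "Naive"
chain = "TRB"

[CDR3AAPhyschem.envs.comparison]
Naive = ["CD4 Naive", "CD8 Naive"]
NonNaive = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM", "CD4 CTL", "CD8 CTL"]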

Dependencies

  • Upstream: ScRepCombiningExpression (required)
  • Downstream: Feature analysis, ML model training, publication figures
  • Required data: Both TRA and TRB chains in combined object

Validation Rules

  • CDR3 sequence requirements: Must have valid amino acid sequences (no missing or ambiguous residues; note that N is a valid residue, asparagine)
  • Chain requirement: Data must contain specified chain (TRA or TRB)
  • Group specification: Groups must exist in metadata
  • Minimum cells: Sufficient cells per group for statistical regression
  • Length distribution: CDR3 length range must be adequate for regression
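
The rules above can be checked before running the process. The Python sketch below is one way to pre-screen exported metadata. It assumes each cell is a dict with a "label" (the value in the group column) and a "cdr3" amino acid sequence for the chosen chain. The min_cells threshold and the function name validate_cells are hypothetical, since this document does not state the cutoffs the process applies.

```python
from collections import Counter

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_cells(cells, groups, min_cells=10):
    """Return a list of human-readable problems (empty if all checks pass).

    cells:  list of {"label": str, "cdr3": str}
    groups: {group_name: [labels]} with exactly two groups
    """
    label_to_group = {
        label: name for name, labels in groups.items() for label in labels
    }
    counts = Counter()
    lengths = set()
    for cell in cells:
        name = label_to_group.get(cell["label"])
        if name is None:
            continue  # cell is outside the comparison
        seq = (cell.get("cdr3") or "").upper()
        if not seq or set(seq) - STANDARD_AA:
            continue  # missing or non-standard sequence: not usable
        counts[name] += 1
        lengths.add(len(seq))

    problems = []
    for name in groups:
        if counts[name] == 0:
            problems.append(f"group {name!r} has no usable cells")
        elif counts[name] < min_cells:
            problems.append(
                f"group {name!r} has only {counts[name]} usable cells (< {min_cells})"
            )
    if len(lengths) < 2:
        problems.append("CDR3 lengths do not vary across usable cells")
    return problems


cells = [
    {"label": "Treg", "cdr3": "CASSLGQAYEQYF"},
    {"label": "Treg", "cdr3": "CASSXGF"},  # contains a non-standard residue
    {"label": "Tconv", "cdr3": "CASSPGTGGYEQYF"},
]
print(validate_cells(cells, {"Treg": ["Treg"], "Tconv": ["Tconv"]}, min_cells=2))
```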

Troubleshooting

Issue: "Missing chain in data"

Cause: Specified chain (TRA/TRB) not found in combined object

Solution:

# Change to available chain
[CDR3AAPhyschem.envs]
chain = "TRA"  # or "TRB"

Issue: "Group not found in metadata"

Cause: The group column or the comparison values don't exist in the metadata

Solution:

  1. Check available metadata columns in ScRepCombiningExpression output
  2. Verify group names match exactly (case-sensitive)
[CDR3AAPhyschem.envs]
group = "seurat_clusters"  # If CellType not available
comparison = ["0", "1"]  # Use cluster IDs
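
Because matching is case-sensitive, a quick comparison of the requested labels against the values actually present in the group column often finds the mismatch. The Python sketch below uses the standard-library difflib module to suggest near matches. The function name suggest_labels is hypothetical, and it assumes the column's unique values have already been exported to a list.

```python
import difflib


def suggest_labels(requested, available):
    """Map each requested label missing from `available` to likely matches."""
    available = [str(v) for v in available]
    by_lower = {v.lower(): v for v in available}
    suggestions = {}
    for label in requested:
        if label in available:
            continue  # exact match: nothing to fix
        if label.lower() in by_lower:
            # Same text, different case
            suggestions[label] = [by_lower[label.lower()]]
        else:
            suggestions[label] = difflib.get_close_matches(
                label, available, n=3, cutoff=0.6
            )
    return suggestions


print(suggest_labels(["treg", "Tconv", "Tregs"], ["Treg", "Tconv", "Tfh"]))
```

An empty suggestion list for a label means nothing similar exists in that column, which usually points to the wrong group column rather than a typo.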

Issue: "Insufficient cells for regression"

Cause: Too few cells in one or more groups

Solution:

  1. Drop each (or split on fewer columns) so that cells are pooled across samples; splitting reduces the number of cells per regression
  2. Combine similar cell types in comparison
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}

Issue: "No significant property differences"

Cause: Groups may not differ in physicochemical properties

Solution:

  1. Check whether the comparison groups are biologically distinct
  2. Consider a different group column (e.g., gene expression clusters)
  3. Verify CDR3 sequences are high-quality
  3. Verify CDR3 sequences are high-quality

Scientific Context

Key Publications

  1. Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
  2. Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
  3. Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research

Interpretation Guidelines

  • High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
  • High charge: Electrostatic potential may affect binding affinity
  • High aromaticity: Increased π-π interactions, structural stability
  • Length distribution: Longer CDR3s may provide broader specificity

Feature Engineering Applications

Use properties as features for:

  • TCR specificity prediction models
  • T cell fate classification (Treg vs Tconv)
  • Antigen binding affinity estimation
  • Cross-reactivity assessment

Output Format

  • Directory: {{in.scrfile | stem}}.cdr3aaphyschem/
  • Files:
    • Regression plots per property (hydrophobicity, volume, pI)
    • Statistical tables comparing groups
    • CDR3 length distributions
    • Property correlation matrices
  • Visualizations:
    • Property vs length scatter plots
    • Group-wise property boxplots
    • Regression curves with confidence intervals

Advanced Usage

Custom Property Scales

If using non-default scales (requires modifying underlying R script):

# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"

Length-Based Stratification

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
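
The configuration above presupposes that a column such as CDR3_Length_Bin already exists in the metadata. The Python sketch below shows only the labeling logic for such a column; the bin edges (12 and 15) and the function name length_bin are hypothetical, and the column would still need to be added to the combined object upstream.

```python
def length_bin(cdr3: str, edges=(12, 15)) -> str:
    """Label a CDR3 by length bin, e.g. '<12', '12-14', '>=15'.

    `edges` must be non-empty; each edge starts a new bin.
    """
    n = len(cdr3)
    lower = None
    for edge in sorted(edges):
        if n < edge:
            return f"<{edge}" if lower is None else f"{lower}-{edge - 1}"
        lower = edge
    return f">={lower}"


for seq in ("CASSF", "CASSLGQAYEQYF", "CASSPGTGGSYEQYF"):
    print(seq, len(seq), length_bin(seq))
```

Keeping the bins wide helps each split retain enough cells in both groups for the regression.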

Publication-Ready Plots

[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"