```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/data/cdr3aaphyschem" ~/.claude/skills/majiayu000-claude-skill-registry-cdr3aaphyschem \
  && rm -rf "$T"
```
CDR3AAPhyschem Process Configuration
Purpose
Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).
When to Use
- To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
- For feature engineering in TCR machine learning models
- To identify sequence features that distinguish cell subsets
- After ScRepCombiningExpression (requires combined TCR + RNA data)
- When investigating T cell fate determination (regulatory vs conventional T cells)
Configuration Structure
Process Enablement
```toml
[CDR3AAPhyschem]
cache = true
```
Input Specification
```toml
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
```
- `scrfile`: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
  - Must contain both TRA and TRB chains
  - Generated by `scRepertoire::combineExpression()`
Environment Variables
```toml
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"
# Chain selection
chain = "TRB"
```
Key Parameters:
- `group`: Column name in metadata defining the groups to compare (e.g., `CellType`, `seurat_clusters`)
- `comparison`: Two-group specification for the regression analysis
  - Format 1 (dict): `{Group1 = ["cell1", "cell2"], Group2 = "cell3"}`
  - Format 2 (list): `["Group1", "Group2"]` (when the groups already exist as values in the `group` column)
- `target`: Which group to label as 1 in the regression (default: the first group in `comparison`)
- `each`: Column(s) used to split the data into separate analyses
  - Single column: `"Sample"`
  - Multiple columns: `["Sample", "Patient"]`
  - Comma-separated: `"Sample,Patient"`
  - If not provided, all cells are analyzed together
- `chain`: TCR chain to analyze (`TRA` or `TRB`)
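The accepted shapes of `comparison`, `target` and `each` amount to a normalization step before any regression is run. The sketch below is a hypothetical Python illustration of that step; the helpers `normalize_comparison`, `resolve_target`, `normalize_each` and `encode_label` are not part of the process (whose logic lives in an R script). It assumes that in the list format each entry is both the group name and its label, and that cells matching neither group are simply excluded.

```python
from typing import Dict, List, Optional, Union

# Hypothetical type mirroring the two TOML shapes of `comparison`
Comparison = Union[Dict[str, Union[str, List[str]]], List[str]]


def normalize_comparison(comparison: Comparison) -> Dict[str, List[str]]:
    """Map both accepted formats to {group_name: [labels in the group column]}."""
    if isinstance(comparison, dict):
        groups = {
            name: [labels] if isinstance(labels, str) else list(labels)
            for name, labels in comparison.items()
        }
    else:
        # List format: each entry is both the group name and its label
        groups = {label: [label] for label in comparison}
    if len(groups) != 2:
        raise ValueError(f"comparison needs exactly two groups, got {len(groups)}")
    return groups


def resolve_target(groups: Dict[str, List[str]], target: Optional[str] = None) -> str:
    """Return the group coded as 1; defaults to the first group."""
    if target is None:
        return next(iter(groups))
    if target not in groups:
        raise ValueError(f"target {target!r} is not one of {list(groups)}")
    return target


def normalize_each(each: Union[None, str, List[str]]) -> List[str]:
    """Turn `each` into a list of column names; empty means no splitting."""
    if not each:
        return []
    if isinstance(each, str):
        return [col.strip() for col in each.split(",") if col.strip()]
    return list(each)


def encode_label(cell_label: str, groups: Dict[str, List[str]], target: str) -> Optional[int]:
    """1 for the target group, 0 for the other group, None if in neither."""
    for name, labels in groups.items():
        if cell_label in labels:
            return 1 if name == target else 0
    return None
```

For example, `normalize_comparison({"Treg": ["CD4 CTL", "CD4 Naive"], "Tconv": "Tconv"})` wraps the bare string `"Tconv"` into a one-element list, so both formats end up with the same shape.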
Configuration Examples
Minimal Configuration
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
```
Standard Treg vs Tconv Analysis
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"
```
Multi-Sample Analysis
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"
```
Custom Group Definition
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare
comparison = {HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"]}
target = "HighQuality"
chain = "TRB"
```
Physicochemical Properties
Available Properties
The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:
| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand Average of Hydrophobicity (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
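To make the per-residue averaging behind `length`, `gravy`, `acidic` and `aromatic` concrete, here is a minimal Python sketch using the Kyte-Doolittle hydropathy values. It is an illustration, not the pipeline's implementation: it assumes the CDR3 string is used exactly as given (including any conserved flanking residues) and contains only standard residues. Bulkiness and polarity would follow the same averaging with the Zimmerman and Grantham scales, which are omitted here.

```python
# Kyte & Doolittle (1982) hydropathy value per residue
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def proportion(seq: str, residues: str) -> float:
    """Fraction of positions occupied by any residue listed in `residues`."""
    return sum(aa in residues for aa in seq) / len(seq)


def simple_properties(cdr3: str) -> dict:
    """A subset of the table's properties for one CDR3 amino acid sequence."""
    seq = cdr3.upper()
    return {
        "length": len(seq),
        "gravy": gravy(seq),
        "acidic": proportion(seq, "DE"),
        "aromatic": proportion(seq, "FWY"),
    }


# Illustrative CDR3-beta-like sequence (not taken from any dataset)
props = simple_properties("CASSLGQAYEQYF")
```

A more hydrophobic CDR3 shifts `gravy` upward, which is the direction the table links to self-reactivity and Treg fate.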
Property Calculation Methods
- Default scales: Standard biophysical scales from peer-reviewed literature
- GRAVY: Kyte & Doolittle (1982) hydropathy scale
- Bulkiness: Zimmerman et al. (1968) bulkiness parameters
- Polarity: Grantham (1974) amino acid difference index
- Aliphatic index: Ikai (1980) index of the relative volume occupied by aliphatic side chains (A, V, I, L), correlated with thermostability
- Charge: Normalized based on pKa values (EMBOSS database)
- Acidic/Basic/Aromatic: Direct residue counting proportions
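The aliphatic index and normalized charge involve a little more arithmetic. Ikai's index is AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where the X values are mole percents; the sketch below uses mole fractions (AI / 100) as one possible reading of "normalized". The net charge sums Henderson-Hasselbalch ionization fractions of side chains, with pKa values assumed to match the commonly listed EMBOSS set and pH 7.4 assumed as "physiological"; termini are ignored and the result is divided by length. All of these are assumptions for illustration; the underlying R code may normalize differently.

```python
# pKa values assumed from the EMBOSS set; verify against the data file
# actually used by the pipeline before relying on exact numbers.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def aliphatic_index(seq: str) -> float:
    """Ikai (1980) aliphatic index computed with mole fractions (AI / 100)."""
    n = len(seq)
    x = {aa: seq.count(aa) / n for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


def net_charge(seq: str, ph: float = 7.4, normalize: bool = True) -> float:
    """Side-chain net charge via Henderson-Hasselbalch; termini are ignored."""
    # Fraction of each basic side chain that is protonated (+1)
    pos = sum(1 / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in seq if aa in PKA_POSITIVE)
    # Fraction of each acidic side chain that is deprotonated (-1)
    neg = sum(1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in seq if aa in PKA_NEGATIVE)
    charge = pos - neg
    return charge / len(seq) if normalize else charge
```

Dividing by length keeps charge comparable across CDR3s of different lengths, which matters when the same property is compared at several lengths.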
Regression Analysis
- Performed for each physicochemical property independently
- Performed separately at each CDR3 length, so that group differences in a property are not simply driven by differences in CDR3 length
- Binary classification: target group (1) vs non-target (0)
- Output: Statistical significance of property differences
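A minimal sketch of what "regression at each CDR3 length" can look like, assuming a plain logistic regression of group membership (target = 1) on a single property, fitted separately for every length. The function names and the `min_cells` threshold are hypothetical, and the process's R script may use a different model (for example with covariates or random effects). A positive slope means higher property values are associated with the target group.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: List[float], y: List[int], iters: int = 25) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; return (b0, b1).

    Assumes the two groups overlap in x: perfectly separated data has no
    finite maximum-likelihood estimate.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p  # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w  # information matrix X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton step: beta <- beta + (X^T W X)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def per_length_slopes(
    records: Iterable[Tuple[int, float, int]], min_cells: int = 10
) -> Dict[int, float]:
    """records: (cdr3_length, property_value, label), label 1 = target, 0 = other."""
    by_length: Dict[int, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for length, value, label in records:
        xs, ys = by_length[length]
        xs.append(value)
        ys.append(label)
    slopes = {}
    for length, (xs, ys) in sorted(by_length.items()):
        # Skip lengths with too few cells or with only one group present
        if len(xs) < min_cells or len(set(ys)) < 2:
            continue
        slopes[length] = fit_logistic(xs, ys)[1]
    return slopes
```

Fitting one model per length is the simplest reading of the description; a single model with length as a covariate would be an alternative design.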
Common Patterns
Pattern 1: Treg vs Tconv (TRB Chain)
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = ""  # Analyze all samples together
```
Pattern 2: Selected Properties Only
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
# Focus on hydrophobicity (key Treg feature)
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"  # To analyze specific chains separately
```
Pattern 3: Multi-Chain Analysis
Run separate processes for different chains:
```toml
# TRB analysis
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]
# Note: Create a separate config for TRA analysis if needed
```
Pattern 4: Multi-Group Comparisons
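```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
group = "CellType"
# comparison takes exactly two groups, so pool the Memory and Effector
# subsets into a single contrast group against Naive
comparison = {Naive = ["CD4 Naive", "CD8 Naive"], NonNaive = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM", "CD4 CTL", "CD8 CTL"]}
target = "Naive"
chain = "TRB"
```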
Dependencies
- Upstream: ScRepCombiningExpression (required)
- Downstream: Feature analysis, ML model training, publication figures
- Required data: Both TRA and TRB chains in combined object
Validation Rules
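- CDR3 sequence requirements: Must be valid amino acid sequences built from the 20 standard one-letter residue codes (N is asparagine and is valid; ambiguous symbols such as X or * are not standard residues)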
- Chain requirement: Data must contain specified chain (TRA or TRB)
- Group specification: Groups must exist in metadata
- Minimum cells: Sufficient cells per group for statistical regression
- Length distribution: CDR3 length range must be adequate for regression
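The sequence, group and cell-count rules can be checked before launching the process. The helpers below and the `min_cells=10` default are hypothetical illustrations, not thresholds taken from the process itself.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# 20 standard one-letter amino acid codes; N (asparagine) is valid.
VALID_CDR3 = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def invalid_cdr3s(seqs: Iterable[str]) -> List[str]:
    """Return sequences that contain anything other than standard residues."""
    return [s for s in seqs if not VALID_CDR3.fullmatch(s)]


def check_groups(
    labels: Iterable[str], groups: Dict[str, List[str]], min_cells: int = 10
) -> List[str]:
    """Return human-readable problems: missing labels or undersized groups."""
    counts = Counter(labels)
    problems = []
    for name, members in groups.items():
        missing = [m for m in members if m not in counts]
        if missing:
            problems.append(f"{name}: labels not found in metadata: {missing}")
        total = sum(counts[m] for m in members)
        if total < min_cells:
            problems.append(f"{name}: only {total} cells (< {min_cells})")
    return problems
```

Because label matching is exact, a check like this also surfaces the case-sensitivity problems described under Troubleshooting.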
Troubleshooting
Issue: "Missing chain in data"
Cause: Specified chain (TRA/TRB) not found in combined object
Solution:
```toml
# Change to an available chain
[CDR3AAPhyschem.envs]
chain = "TRA"  # or "TRB"
```
Issue: "Group not found in metadata"
Cause: The `group` column or the `comparison` values don't exist in the metadata
Solution:
- Check the available metadata columns in the ScRepCombiningExpression output
- Verify that group names match exactly (case-sensitive)
```toml
[CDR3AAPhyschem.envs]
group = "seurat_clusters"  # If CellType is not available
comparison = ["0", "1"]  # Use cluster IDs
```
Issue: "Insufficient cells for regression"
Cause: Too few cells in one or more groups
Solution:
- If `each` splits the data into small subsets, remove it (or set `each = ""`) so that samples are pooled
- Combine similar cell types in `comparison`
```toml
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}
```
Issue: "No significant property differences"
Cause: Groups may not differ in physicochemical properties
Solution:
- Check whether the `comparison` groups are biologically distinct
- Consider a different `group` column (e.g., gene expression clusters)
- Verify that CDR3 sequences are high-quality
Scientific Context
Key Publications
- Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
- Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
- Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research
Interpretation Guidelines
- High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
- High charge: Electrostatic potential may affect binding affinity
- High aromaticity: Increased π-π interactions, structural stability
- Length distribution: Longer CDR3s may provide broader specificity
Feature Engineering Applications
Use properties as features for:
- TCR specificity prediction models
- T cell fate classification (Treg vs Tconv)
- Antigen binding affinity estimation
- Cross-reactivity assessment
Output Format
- Directory: `{{in.scrfile | stem}}.cdr3aaphyschem/`
- Files:
  - Regression plots per property (hydrophobicity, volume, pI)
  - Statistical tables comparing groups
  - CDR3 length distributions
  - Property correlation matrices
- Visualizations:
  - Property vs length scatter plots
  - Group-wise property boxplots
  - Regression curves with confidence intervals
Advanced Usage
Custom Property Scales
If using non-default scales (requires modifying underlying R script):
```toml
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"
```
Length-Based Stratification
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
```
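Because `each` here points to a metadata column, `CDR3_Length_Bin` has to exist before the process runs. A minimal sketch of deriving such a bin label from a CDR3 amino acid string, assuming hypothetical cut points of 12 and 15 residues; writing the labels back into the object's metadata depends on your upstream tooling and is not shown.

```python
from typing import Optional, Sequence


def length_bin(cdr3: Optional[str], edges: Sequence[int] = (12, 15)) -> str:
    """Assign a CDR3 to a length bin; `edges` are hypothetical cut points."""
    if not cdr3:
        return "NA"  # cells without a CDR3 for the selected chain
    n = len(cdr3)
    lower = None
    for edge in edges:
        if n < edge:
            return f"<{edge}" if lower is None else f"{lower}-{edge - 1}"
        lower = edge
    return f">={edges[-1]}"
```

With the default edges, a 13-residue CDR3 such as `CASSLGQAYEQYF` falls into the `12-14` bin.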
Publication-Ready Plots
```toml
[CDR3AAPhyschem]

[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"
```