Awesome-omni-skill screpcombiningexpression
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/screpcombiningexpression" ~/.claude/skills/diegosouzapw-awesome-omni-skill-screpcombiningexpression && rm -rf "$T"
ScRepCombiningExpression Process Configuration
Purpose
Combine scTCR/BCR repertoire data with scRNA-seq expression data using `scRepertoire::combineExpression()`. This process integrates immune receptor information (CDR3 sequences, V(D)J genes, clonotypes) into a Seurat object's metadata, enabling clonotype-aware gene expression analysis.
When to Use
- When analyzing paired scTCR-seq + scRNA-seq data or paired scBCR-seq + scRNA-seq data
- Required for all downstream TCR/BCR analyses (CDR3Clustering, TESSA, ClonalStats)
- After ScRepLoading and SeuratClustering (or TOrBCellSelection)
- When you need to visualize or analyze clonotype distribution across RNA clusters
Configuration Structure
Process Enablement
```toml
[ScRepCombiningExpression]
cache = true
```
Input Specification
```toml
[ScRepCombiningExpression.in]
# Required inputs
screpfile = ["ScRepLoading"]   # scRepertoire object from ScRepLoading
srtobj = ["SeuratClustering"]  # Seurat object with RNA data
```
Note: Barcodes in `srtobj` must match barcodes in `screpfile`. Ensure both datasets were generated from the same cell calling run.
Environment Variables
```toml
[ScRepCombiningExpression.envs]
# Clonotype definition
cloneCall = "aa"     # How to define a clonotype (default: aa)
# Chain selection
chain = "both"       # Which TCR/BCR chains to use (default: both)
# Frequency calculation
group_by = "Sample"  # Group for frequency/proportion calculation (default: Sample)
proportion = true    # Use proportion (true) or total frequency (false) (default: true)
# Filtering options
filterNA = false     # Remove cells without TCR/BCR data (default: false)
# Clone size bins
cloneSize = {Rare = 0.0001, Small = 0.001, Medium = 0.01, Large = 0.1, Hyperexpanded = 1}
# Label customization
addLabel = false     # Add label to frequency header (default: false)
```
Environment Variables Explained
cloneCall: Defines clonotype grouping strategy
- `gene`: Group by V(D)JC gene usage (e.g., `TRBV12-3*01`, `TRBJ2-7*01`)
- `nt`: Group by CDR3 nucleotide sequence
- `aa`: Group by CDR3 amino acid sequence (default)
- `strict`: Group by V(D)JC gene + CDR3 nucleotide (most specific)
- Custom: Use a custom column from the data
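To make the grouping strategies concrete, here is a minimal base-R sketch. It counts how many clonotypes each `cloneCall` option would produce on four hypothetical cells. It is not scRepertoire's internal code. The toy data, the `clonotype_key()` helper, and the `|` separator used for `strict` are all illustrative assumptions.

```r
# Hypothetical toy data: one row per cell, paired chains joined with "_".
cells <- data.frame(
  barcode = c("c1", "c2", "c3", "c4"),
  CTgene  = c("TRAV1_TRBV2", "TRAV1_TRBV2", "TRAV1_TRBV2", "TRAV1_TRBV2"),
  CTnt    = c("TGTGCC_TGTGCA", "TGTGCT_TGTGCA", "TGTGCC_TGTGCA", "TGTAAA_TGTCCC"),
  CTaa    = c("CA_CA", "CA_CA", "CA_CA", "CK_CP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the per-cell key that defines a clonotype.
clonotype_key <- function(df, cloneCall) {
  switch(cloneCall,
    gene   = df$CTgene,
    nt     = df$CTnt,
    aa     = df$CTaa,
    # In this sketch, "strict" requires gene usage AND CDR3 nt to both match.
    strict = paste(df$CTgene, df$CTnt, sep = "|"),
    stop("unknown cloneCall: ", cloneCall)
  )
}

# Number of distinct clonotypes under each strategy.
n_clones <- vapply(c("gene", "nt", "aa", "strict"),
                   function(cc) length(unique(clonotype_key(cells, cc))),
                   integer(1))
print(n_clones)
```

The strategies split the four cells differently:
- `gene` collapses all four cells into one clonotype.
- `aa` merges c1 and c2, because their nucleotide CDR3s (`TGTGCC` vs `TGTGCT`) are synonymous.
- `nt` and `strict` keep c1 and c2 apart.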
chain: Which receptor chains to include
- `both`: Include both chains (e.g., TRA + TRB for TCR, IGH + IGL/IGK for BCR)
- `TRA`: T cell receptor alpha chain only
- `TRB`: T cell receptor beta chain only
- `TRG`: T cell receptor gamma chain only
- `TRD`: T cell receptor delta chain only
- `IGH`: B cell immunoglobulin heavy chain only
- `IGL`: B cell immunoglobulin light chain (lambda/kappa) only
group_by: Column for frequency calculation
- `"Sample"`: Calculate clonotype frequency per sample (default)
- `"seurat_clusters"`: Calculate per RNA cluster
- `NULL` or `"none"`: Keep input format without grouping
- Custom: Use any metadata column name
proportion:
- `true`: Calculate as proportion (0-1 scale) within group
- `false`: Calculate as absolute frequency (counts)
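The interaction between `group_by` and `proportion` can be sketched in base R. The sketch counts cells per (group, clonotype). For proportions, it divides that count by the number of cells in the group. The data, the `clone_stats()` helper, and the `clone_count`/`clone_prop` column names are hypothetical. scRepertoire's own implementation and output column names may differ.

```r
# Hypothetical per-cell metadata with a grouping column and a clonotype key.
meta <- data.frame(
  Sample = c("S1", "S1", "S1", "S1", "S2", "S2"),
  CTaa   = c("CA_CA", "CA_CA", "CA_CA", "CK_CP", "CA_CA", "CW_CY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-cell clonotype count (proportion = FALSE) or
# proportion of the cells in the same group (proportion = TRUE).
clone_stats <- function(meta, group_by = "Sample", clone_col = "CTaa",
                        proportion = TRUE) {
  ones  <- rep(1L, nrow(meta))
  # Cells sharing both the group and the clonotype.
  count <- ave(ones, meta[[group_by]], meta[[clone_col]], FUN = sum)
  if (!proportion) return(count)
  # Cells in the same group, used as the denominator.
  group_total <- ave(ones, meta[[group_by]], FUN = sum)
  count / group_total
}

meta$clone_count <- clone_stats(meta, proportion = FALSE)
meta$clone_prop  <- clone_stats(meta, proportion = TRUE)
print(meta)
```

The same `CA_CA` sequence is scored separately in S1 (3 of 4 cells) and S2 (1 of 2 cells). Changing `group_by` therefore changes the reported expansion.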
filterNA:
- `true`: Remove cells without V(D)J data from the Seurat object
- `false`: Keep all cells and set `VDJ_Presence = FALSE` for cells without productive V(D)J data
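The following sketch shows the two `filterNA` behaviours on hypothetical metadata. It assumes a cell "has V(D)J data" exactly when its `CTaa` value is not `NA`. That rule is a simplification for illustration, not immunopipe's documented logic, and `apply_filter_na()` is a hypothetical helper.

```r
# Hypothetical metadata: CTaa is NA for cells with no matched V(D)J contigs.
meta <- data.frame(
  cell = paste0("c", 1:5),
  CTaa = c("CA_CA", NA, "CK_CP", NA, "CA_CA"),
  stringsAsFactors = FALSE
)

apply_filter_na <- function(meta, filterNA = FALSE) {
  # Assumption: V(D)J presence is inferred from a non-NA clonotype string.
  meta$VDJ_Presence <- !is.na(meta$CTaa)
  # filterNA = TRUE drops cells without V(D)J data; FALSE keeps and flags them.
  if (filterNA) meta[meta$VDJ_Presence, , drop = FALSE] else meta
}

kept_all <- apply_filter_na(meta, filterNA = FALSE)
kept_vdj <- apply_filter_na(meta, filterNA = TRUE)
print(table(kept_all$VDJ_Presence))
```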
cloneSize: Bins for categorizing clone sizes
- Values are thresholds for proportion or frequency
- Keys become the category labels stored in a new `cloneSize` metadata column
- If `proportion = false`, the upper limit may auto-adjust based on the data
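The binning can be sketched with base R's `cut()`. This assumes each threshold is an inclusive upper bound and the previous threshold (or 0) is the lower bound. That is an approximation of scRepertoire's behaviour, not its code. The example covers both proportion-scale bins (the defaults) and count-scale bins such as the BCR pattern further down.

```r
# Hypothetical helper: label each value with the first bin whose threshold it
# does not exceed. Values above the last threshold become NA.
assign_clone_size <- function(value, bins) {
  breaks <- c(0, unname(bins))
  cut(value, breaks = breaks, labels = names(bins),
      right = TRUE, include.lowest = TRUE)
}

# Default proportion-scale bins (use with proportion = true).
prop_bins <- c(Rare = 1e-4, Small = 1e-3, Medium = 0.01,
               Large = 0.1, Hyperexpanded = 1)
prop_labels <- as.character(assign_clone_size(c(5e-5, 5e-4, 0.02, 0.1, 0.6),
                                              prop_bins))

# Count-scale bins (use with proportion = false).
count_bins <- c(Single = 1, Small = 3, Medium = 10,
                Large = 50, Hyperexpanded = 500)
count_labels <- as.character(assign_clone_size(c(1, 2, 3, 11, 120),
                                               count_bins))
print(prop_labels)
print(count_labels)
```

If count-scale bins are applied to proportions (all at most 1), every clone falls into the first bin. Keep the bin scale consistent with `proportion`.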
addLabel: Adds a label to the frequency column header. Useful when running multiple configurations with different `group_by` or `cloneCall` settings.
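For readers who want to reproduce a configuration interactively, this sketch maps the `envs` keys above onto `scRepertoire::combineExpression()` arguments. It assumes scRepertoire 2.x, where the grouping argument is spelled `group.by` and the remaining names match the config keys. `envs_to_args()` and `run_combine()` are hypothetical helpers. `run_combine()` is only defined here, not called, because it needs scRepertoire and real input objects.

```r
# Hypothetical envs list mirroring the [ScRepCombiningExpression.envs] defaults.
envs <- list(
  cloneCall  = "aa",
  chain      = "both",
  group_by   = "Sample",
  proportion = TRUE,
  filterNA   = FALSE,
  cloneSize  = c(Rare = 1e-4, Small = 1e-3, Medium = 0.01,
                 Large = 0.1, Hyperexpanded = 1),
  addLabel   = FALSE
)

# Rename config keys to R argument names; only group_by differs (group.by).
envs_to_args <- function(envs) {
  args <- envs
  names(args)[names(args) == "group_by"] <- "group.by"
  args
}

# Not run here: assumes scRepertoire >= 2.0 plus a combined contig list
# (screp) and a Seurat object (srtobj) already loaded in the session.
run_combine <- function(screp, srtobj, envs) {
  do.call(scRepertoire::combineExpression,
          c(list(screp, srtobj), envs_to_args(envs)))
}

args <- envs_to_args(envs)
str(args)
```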
Configuration Examples
Minimal Configuration
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]
```
Use when: Default CDR3aa clonotype definition is sufficient.
Standard TCR Integration
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]

[ScRepCombiningExpression.envs]
# Use CDR3 amino acid for clonotype definition
cloneCall = "aa"
# Include both TRA and TRB chains
chain = "both"
# Calculate frequency per sample
group_by = "Sample"
# Use proportional frequency
proportion = true
```
Use when: Analyzing TCR data with standard parameters.
BCR Heavy+Light Chain Integration
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]

[ScRepCombiningExpression.envs]
# BCR data - analyze both heavy and light chains
chain = "both"
# Use the CDR3 amino acid sequence for clonotype definition
cloneCall = "aa"
# Calculate frequency per sample
group_by = "Sample"
```
Use when: Analyzing paired IGH + IGL/IGK BCR data.
Clonotype by RNA Cluster
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]

[ScRepCombiningExpression.envs]
# Calculate clonotype frequency within each RNA cluster
group_by = "seurat_clusters"
# Use absolute counts instead of proportions
proportion = false
```
Use when: Need to track which RNA clusters contain expanded clones.
Remove Non-Productive Cells
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]

[ScRepCombiningExpression.envs]
# Remove cells without productive V(D)J rearrangements
filterNA = true
```
Use when: Analysis should only include cells with receptor data.
Gene-Based Clonotype Definition
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]

[ScRepCombiningExpression.envs]
# Group by V(D)JC gene usage only
cloneCall = "gene"
```
Use when: Interested in V gene bias rather than CDR3 specificity.
Common Patterns
Pattern 1: Simple TCR Addition (Default)
```toml
[ScRepCombiningExpression]

[ScRepCombiningExpression.in]
screpfile = ["ScRepLoading"]
srtobj = ["SeuratClustering"]
```
Best for: Initial exploratory analysis with default CDR3aa clonotypes.
Pattern 2: TCR Beta Chain Only
```toml
[ScRepCombiningExpression.envs]
# Analyze only TRB chain
chain = "TRB"
```
Best for: TRB-focused analyses when TRA data is noisy or unavailable.
Pattern 3: BCR with Custom Clone Bins
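```toml
[ScRepCombiningExpression.envs]
chain = "both"
cloneCall = "aa"
# Count-based bins below need absolute counts, not proportions
proportion = false
# Custom bins for BCR clonal expansion (thresholds in cell counts)
cloneSize = {Single = 1, Small = 3, Medium = 10, Large = 50, Hyperexpanded = 500}
```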
Best for: BCR analysis where expansion patterns differ from TCR.
Pattern 4: Frequency by Condition
```toml
[ScRepCombiningExpression.envs]
# Calculate frequency per experimental condition
group_by = "Diagnosis"
# Use absolute counts
proportion = false
```
Best for: Comparing clonal expansion across treatment groups.
Pattern 5: Strict Clonotype Definition
```toml
[ScRepCombiningExpression.envs]
# Most specific: V(D)JC gene + CDR3 nucleotide
cloneCall = "strict"
```
Best for: High-resolution clonotype analysis where gene + CDR3 matter.
Metadata Added to Seurat Object
After running `ScRepCombiningExpression`, the Seurat object's metadata will include:
Core Columns (from scRepertoire)
- `CTgene`: V(D)JC gene names (e.g., `TRBV12-3*01_TRBJ2-7*01` for paired chains)
- `CTnt`: CDR3 nucleotide sequences
- `CTaa`: CDR3 amino acid sequences (separated by `_` for paired chains)
- `CTcount`: Count of cells in each clonotype
- `CTfrequency`: Frequency of clonotype within group (if `group_by` set)
- `CTproportion`: Proportion of clonotype within group (if `group_by` set)
- `cloneSize`: Category from `cloneSize` bins (Rare, Small, Medium, Large, Hyperexpanded)
Custom Column (immunopipe-specific)
- `VDJ_Presence`: Boolean indicating whether the cell has a TCR/BCR sequence
  - `TRUE`: Productive V(D)J rearrangement detected
  - `FALSE`: No productive V(D)J data
Chain-Specific Columns (when applicable)
- `TRA_1`, `TRA_2`: Individual CDR3aa sequences for TRA chains
- `TRB_1`, `TRB_2`: Individual CDR3aa sequences for TRB chains
- `IGH_1`, `IGH_2`: Individual CDR3aa sequences for IGH chains
- `IGL_1`, `IGL_2`: Individual CDR3aa sequences for IGL/IGK chains
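The chain-specific columns can be derived from a paired `CTaa` string. The sketch below assumes paired chains are joined with `_` and multiple sequences of the same chain with `;` (scRepertoire's usual convention). It also treats a literal `"NA"` as a missing chain. `split_ctaa()` is a hypothetical helper, not the function immunopipe uses.

```r
# Return the n-th element, or NA when there is none.
nth <- function(x, n) if (length(x) >= n) x[[n]] else NA_character_

# Hypothetical helper: split "TRA;TRA_TRB" strings into <chain>_1/<chain>_2.
split_ctaa <- function(ctaa, chains = c("TRA", "TRB")) {
  by_chain <- strsplit(ctaa, "_", fixed = TRUE)
  out <- list()
  for (k in seq_along(chains)) {
    chain_seq <- vapply(by_chain, nth, character(1), n = k)
    chain_seq[chain_seq %in% "NA"] <- NA_character_  # missing chain marker
    seqs <- strsplit(chain_seq, ";", fixed = TRUE)
    out[[paste0(chains[k], "_1")]] <- vapply(seqs, nth, character(1), n = 1)
    out[[paste0(chains[k], "_2")]] <- vapply(seqs, nth, character(1), n = 2)
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

ctaa <- c("CAVRDSNYQLIW;CAASGGSYIPTF_CASSLGQAYEQYF", "NA_CASSPGTEAFF", NA)
chains_df <- split_ctaa(ctaa)
print(chains_df)
```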
Dependencies
Upstream Processes
- ScRepLoading (required): Loads TCR/BCR data from raw files
- SeuratClustering or TOrBCellSelection (required): Provides RNA data with clustering
Downstream Processes
- CDR3Clustering: Clusters TCR/BCR clones by CDR3 similarity
- TESSA: TCR-specific analysis using clonotype information
- ClonalStats: Visualizes clonal statistics and diversity
- CDR3AAPhyschem: Analyzes CDR3 physicochemical properties
- SeuratClusterStats: Uses combined metadata for cluster statistics
Data Flow
```
ScRepLoading (TCR/BCR data)
        ↓
SeuratClustering (RNA clusters)
        ↓
ScRepCombiningExpression (integrate)
        ↓
CDR3Clustering / TESSA / ClonalStats
```
Validation Rules
Barcode Matching
- Critical: Cell barcodes in `screpfile` must exactly match barcodes in `srtobj`
- Common cause: Different cell calling algorithms or filtering thresholds
- Solution: Re-run cell calling with same parameters or manually filter to common barcodes
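A frequent reason for zero exact overlap is a naming difference rather than different cells. One example is a sample prefix on one side only. This base-R sketch compares exact matches with matches after stripping everything up to the last `_`. The barcodes and the `barcode_overlap_report()` helper are hypothetical, and the prefix convention is an assumption.

```r
# Hypothetical barcodes: RNA cells carry a "<sample>_" prefix, V(D)J do not.
rna_cells <- c("S1_AAACCTGAGAAGGCCT-1", "S1_AAACCTGAGACAGACC-1",
               "S2_AAACCTGCACGGTAAG-1")
vdj_cells <- c("AAACCTGAGAAGGCCT-1", "AAACCTGCACGGTAAG-1")

barcode_overlap_report <- function(rna, vdj) {
  # Drop everything up to the last "_" (assumed sample prefix). This is for
  # diagnosis only: identical barcodes from different samples would collide.
  strip <- function(x) sub("^.*_", "", x)
  list(
    n_rna                = length(rna),
    n_vdj                = length(vdj),
    n_exact              = length(intersect(rna, vdj)),
    n_after_prefix_strip = length(intersect(strip(rna), strip(vdj)))
  )
}

report <- barcode_overlap_report(rna_cells, vdj_cells)
str(report)
```

A large gap between `n_exact` and `n_after_prefix_strip` points to a barcode naming mismatch rather than a cell calling difference.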
Clonotype Definition Constraints
- `cloneCall = "strict"` requires both V(D)J genes AND CDR3 nt to match
- `cloneCall = "gene"` only uses gene usage (may over-group clones)
- `cloneCall = "aa"` is recommended for most analyses (balances specificity and grouping)
Chain Selection Rules
- `chain = "both"` requires paired data (TRA+TRB for TCR, IGH+IGL/IGK for BCR)
- If single-chain data, specify which chain: `chain = "TRB"` or `chain = "IGH"`
- Unpaired chains will result in NA values in the `CTaa` column
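Before picking `chain`, it helps to see how many cells actually have both chains. The sketch below tabulates this from contig-level data. It assumes a 10x-style contig table with `barcode`, `chain`, and `productive` columns (`productive` as `"True"`/`"False"` strings). The toy rows and the `chain_summary()` helper are hypothetical.

```r
# Hypothetical contig table in a 10x-like layout.
contigs <- data.frame(
  barcode    = c("c1", "c1", "c2", "c3", "c3", "c4"),
  chain      = c("TRA", "TRB", "TRB", "TRA", "TRB", "TRB"),
  productive = c("True", "True", "True", "True", "False", "True"),
  stringsAsFactors = FALSE
)

# Count cells with both chains vs only one, using productive contigs only.
chain_summary <- function(contigs, pair = c("TRA", "TRB")) {
  prod  <- contigs[contigs$productive %in% c("True", "true"), , drop = FALSE]
  cells <- unique(prod$barcode)
  has_first  <- cells %in% prod$barcode[prod$chain == pair[1]]
  has_second <- cells %in% prod$barcode[prod$chain == pair[2]]
  setNames(
    c(sum(has_first & has_second),
      sum(has_first & !has_second),
      sum(!has_first & has_second)),
    c("paired", paste0(pair[1], "_only"), paste0(pair[2], "_only"))
  )
}

summary_counts <- chain_summary(contigs)
print(summary_counts)
```

Here c3's TRB contig is non-productive, so c3 counts as TRA-only.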
Troubleshooting
Issue: No metadata columns added
Possible causes:
- Barcode mismatch between RNA and TCR/BCR data
- All cells filtered out (low QC or missing V(D)J data)
- Empty TCR/BCR input file
Diagnosis:
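```r
# Check barcode overlap. Assumes screpfile is the combined contig list from
# combineTCR()/combineBCR(): one data frame per sample, each with a `barcode` column.
screp_barcodes <- unlist(lapply(screpfile, function(df) df$barcode))
length(intersect(Cells(srtobj), screp_barcodes))
```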
Solution: Ensure both datasets come from same cell calling run, or manually filter to common barcodes.
Issue: Many cells with `VDJ_Presence = FALSE`
Possible causes:
- Low sequencing depth for V(D)J library
- Stringent V(D)J calling filters
- Non-T/B cells in dataset (e.g., myeloid cells)
Solution:
- Check V(D)J library quality metrics
- Verify cell type composition
- Consider `filterNA = true` to remove cells without productive V(D)J data
Issue: Empty `CTaa` column
Possible causes:
- Wrong chain selected (e.g., `chain = "TRB"` for alpha-only T cells)
- Paired chain data but single-chain selection
- Non-productive rearrangements filtered out
Solution:
- Use `chain = "both"` if paired data are available
- Verify which chains are present in the raw contig file
- Check `ScRepLoading` output for chain distribution
Issue: Clonotype frequency doesn't sum to 1
Possible causes:
- `group_by` includes cells without V(D)J data
- `proportion = false` (using counts, not proportions)
- Multiple samples with different cell counts
Solution:
- Check whether `filterNA = true` should be enabled
- Verify the `group_by` column is appropriate
- Use `proportion = true` for normalized frequencies
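A related pitfall is summing the proportion column over cells rather than over clonotypes. Every cell repeats its clonotype's proportion, so the per-cell sum exceeds 1 whenever clones are expanded. This sketch uses hypothetical metadata with a hypothetical `prop` column. Each clonotype's proportion is counted once per group, and that sum is 1.

```r
# Hypothetical per-cell metadata with a per-group clonotype proportion.
meta <- data.frame(
  Sample = c("S1", "S1", "S1", "S1", "S2", "S2"),
  CTaa   = c("A", "A", "A", "B", "A", "C"),
  prop   = c(0.75, 0.75, 0.75, 0.25, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Wrong: summing over cells counts each clonotype once per cell.
per_cell_sum <- tapply(meta$prop, meta$Sample, sum)

# Right: keep one row per (group, clonotype) before summing.
uniq <- unique(meta[, c("Sample", "CTaa", "prop")])
per_clone_sum <- tapply(uniq$prop, uniq$Sample, sum)

print(per_cell_sum)
print(per_clone_sum)
```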
Issue: All clones in "Rare" category
Possible causes:
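- `cloneSize` thresholds too high for the data
- Low clonal expansion in the dataset
- Bin thresholds on a different scale than the values being binned (e.g., count-based bins such as `Single = 1, Small = 3, ...` with `proportion = true`)
Solution:
- Adjust `cloneSize` bins to match the data distribution
- Check whether the dataset truly has expanded clones
- Use `proportion = true` for relative clone sizes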
Best Practices
Data Preparation
- Run same cell calling for RNA and V(D)J libraries
- Apply consistent QC filters before integration
- Verify sample matching between RNA and TCR/BCR metadata
Parameter Selection
- Use `cloneCall = "aa"` for most analyses (default)
- Use `chain = "both"` for paired receptor data
- Set `group_by = "Sample"` for frequency calculations (default)
- Keep `filterNA = false` to distinguish productive vs non-productive cells
Validation
- Check barcode overlap before running: `sum(colnames(srtobj) %in% unlist(lapply(screpfile, function(df) df$barcode)))`
- Verify VDJ_Presence distribution: `table(srtobj$VDJ_Presence)`
- Inspect clone size distribution: `table(srtobj$cloneSize)`
- Check clonotype counts: `length(unique(na.omit(srtobj$CTaa)))`
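The post-run checks above can be bundled into one helper that takes the metadata data frame (for a Seurat object, `srtobj@meta.data`). `validate_combined_meta()` and the toy metadata are hypothetical. Column names follow this document (`VDJ_Presence`, `cloneSize`, `CTaa`), and `NA` clonotypes are excluded from the count.

```r
# Hypothetical helper: summarize key columns added by ScRepCombiningExpression.
validate_combined_meta <- function(meta) {
  list(
    vdj_presence = table(meta$VDJ_Presence, useNA = "ifany"),
    clone_size   = table(meta$cloneSize, useNA = "ifany"),
    # Count distinct clonotypes, ignoring cells without V(D)J data.
    n_clonotypes = length(unique(na.omit(meta$CTaa)))
  )
}

# Hypothetical metadata standing in for srtobj@meta.data.
meta <- data.frame(
  VDJ_Presence = c(TRUE, TRUE, FALSE, TRUE),
  cloneSize    = c("Small", "Small", NA, "Rare"),
  CTaa         = c("A", "A", NA, "B"),
  stringsAsFactors = FALSE
)

checks <- validate_combined_meta(meta)
print(checks)
```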
External References
scRepertoire Documentation
- combineExpression: https://www.borch.dev/uploads/screpertoire/reference/combineexpression
- combineTCR/combineBCR: https://www.borch.dev/uploads/screpertoire/articles/combining_contigs
- Working with Single-Cell Objects: https://www.borch.dev/uploads/screpertoire/articles/attaching_sc
Clonotype Definition
- CDR3aa: Most common - uses amino acid sequence of CDR3 region
- CDR3nt: More specific - uses nucleotide sequence (higher resolution)
- V(D)JC gene: Broader - groups by gene usage only
- Strict: Most specific - requires both gene usage and CDR3nt match
Chain Pairing
- TCR: Typically TRA (alpha) + TRB (beta) paired
- BCR: IGH (heavy) + IGL/IGK (light) paired
- Gamma-delta T cells: TRG + TRD chains