Claude-skill-registry-data metabolicpathwayheterogeneity
Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.
git clone https://github.com/majiayu000/claude-skill-registry-data
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/metabolicpathwayheterogeneity" ~/.claude/skills/majiayu000-claude-skill-registry-data-metabolicpathwayheterogeneity && rm -rf "$T"
data/metabolicpathwayheterogeneity/SKILL.mdMetabolicPathwayHeterogeneity Process Configuration
Purpose
Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.
When to Use
- Assess metabolic diversity: When you need to quantify metabolic heterogeneity within clusters or conditions
- Identify variable pathways: To find which metabolic pathways show high vs low heterogeneity across groups
- Compare metabolic variability: To compare heterogeneity between treatments, timepoints, or cell types
- After pathway activity analysis: Complements MetabolicPathwayActivity by adding heterogeneity dimension
- Final metabolic analysis step: Typically the last process in the ScrnaMetabolicLandscape workflow
- Subpopulation discovery: When metabolic variability suggests hidden substructure
Configuration Structure
Process Enablement
MetabolicPathwayHeterogeneity is part of the ScrnaMetabolicLandscape group. Enable it by enabling the group:
[ScrnaMetabolicLandscape] cache = true
Input Specification
MetabolicPathwayHeterogeneity receives input automatically from MetabolicInput (or MetabolicExprImputation if imputation enabled):
[ScrnaMetabolicLandscape.in] srtobj = ["SeuratClustering"] # Input from upstream clustering process
Environment Variables
All configuration is done at the ScrnaMetabolicLandscape group level:
[ScrnaMetabolicLandscape.envs] # Core configuration (inherited by all metabolic processes) gmtfile = "KEGG_2021_Human" # Metabolic pathways database group_by = "seurat_clusters" # Column to group cells (e.g., "cluster") subset_by = "treatment" # Optional: Subset by metadata column ncores = 1 # Number of cores for parallelization
MetabolicPathwayHeterogeneity-Specific Configuration
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] # Principal component selection for heterogeneity analysis select_pcs = 0.8 # Fraction or number of PCs to use (0-1 = fraction, >1 = number) # Pathway significance filtering pathway_pval_cutoff = 0.01 # P-value cutoff to select enriched pathways # Parallelization ncores = 1 # Cores for parallel processing (inherited from group if not set) # fgsea parameters [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] scoreType = "std" # Options: "std", "pos", "neg" nproc = 1 # fgsea internal parallelization minSize = 15 # Minimum pathway size maxSize = 500 # Maximum pathway size # Plots configuration [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots] "Pathway Heterogeneity" = { plot_type = "dot", # Options: dot, heatmap devpars = { res = 100 } # Plot resolution } # Multiple analysis cases (advanced) [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases] "Treatment" = { subset_by = "treatment", group_by = "seurat_clusters", select_pcs = 0.8, pathway_pval_cutoff = 0.01 }
Heterogeneity Analysis Method
Algorithm Overview
- PCA on pathway scores: Principal component analysis on pathway activity scores per group
- PC selection: Select top PCs explaining variance (controlled by
)select_pcs - GSEA on PCs: Run fgsea for each PC to identify pathways correlating with variance
- NES calculation: Normalized Enrichment Score quantifies pathway-PC association
- Heterogeneity metric: Pathways with high |NES| show high heterogeneity across groups
PC Selection (select_pcs
)
select_pcs| Value | Interpretation | Use Case |
|---|---|---|
| 0.8 (default) | Use PCs explaining 80% variance | Balanced approach |
| 0.5 | Use PCs explaining 50% variance | Focus on major variation sources |
| 0.95 | Use PCs explaining 95% variance | Comprehensive analysis (slower) |
| 5 (integer) | Use exactly 5 PCs | Manual control |
| 10 | Use exactly 10 PCs | Fine-grained heterogeneity |
Recommendation: Start with
0.8 (default), increase to 0.9-0.95 if you suspect hidden heterogeneity.
Pathway P-value Cutoff
pathway_pval_cutoff = 0.01 # Only pathways with p < 0.01 are analyzed
- Purpose: Filter to significantly enriched pathways before heterogeneity analysis
- Lower values (0.001): Strict, only highly significant pathways
- Higher values (0.05): Permissive, include more pathways
- Default (0.01): Good balance for most analyses
FGSEA Score Type
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] scoreType = "std" # Options: "std", "pos", "neg"
| Score Type | Description | Use Case |
|---|---|---|
| std | Standard GSEA (both directions) | Default; detects heterogeneity in both high/low activity |
| pos | Only positive enrichment | Focus on pathways with increased activity variation |
| neg | Only negative enrichment | Focus on pathways with decreased activity variation |
GMT File Sources
The
gmtfile parameter accepts:
- Built-in databases:
,"KEGG_2021_Human"
,"Reactome_Pathways_2024"
,"BioCarta_2016""MSigDB_Hallmark_2020" - Custom files: Local paths or URLs to GMT format files
- See
for detailed database options/skills/processes/metabolicinput.md
Configuration Examples
Minimal Configuration (Default Settings)
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.in] srtobj = ["SeuratClustering"] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters"
Comprehensive Heterogeneity Analysis
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters" ncores = 8 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.95 # Capture 95% of variance pathway_pval_cutoff = 0.01 # Significant pathways only ncores = 8 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] scoreType = "std" nproc = 8 minSize = 10 maxSize = 500
Treatment Comparison Heterogeneity
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "Reactome_Pathways_2024" group_by = "seurat_clusters" subset_by = "treatment" # Compare heterogeneity between treatments [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.8 pathway_pval_cutoff = 0.05 # More permissive for exploratory analysis [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots] "Treatment Heterogeneity" = { plot_type = "dot", devpars = { width = 1200, height = 800, res = 150 } }
High-Resolution Publication Plots
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 10 # Use exactly 10 PCs pathway_pval_cutoff = 0.01 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots] "Pathway Heterogeneity" = { plot_type = "dot", devpars = { width = 1600, height = 1200, res = 300 } }
Multiple Analysis Cases
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" ncores = 8 # Case 1: Overall cluster heterogeneity [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Clusters] group_by = "seurat_clusters" select_pcs = 0.8 pathway_pval_cutoff = 0.01 plots = { "Cluster Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } } # Case 2: Treatment-specific heterogeneity [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Treatment] subset_by = "treatment" group_by = "seurat_clusters" select_pcs = 0.9 pathway_pval_cutoff = 0.01 plots = { "Treatment Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } }
Conservative Analysis (Major Variance Only)
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.5 # Only major variance (50%) pathway_pval_cutoff = 0.001 # Very strict significance [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] scoreType = "std" minSize = 20 # Larger pathways only maxSize = 300
Common Patterns
Pattern 1: Standard Heterogeneity Analysis
Assess metabolic variability across clusters:
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.8 pathway_pval_cutoff = 0.01
Pattern 2: Treatment Response Variability
Compare heterogeneity between responders and non-responders:
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "Reactome_Pathways_2024" subset_by = "response" # "responder" vs "nonresponder" group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.85 pathway_pval_cutoff = 0.01
Pattern 3: Timepoint Progression
Analyze heterogeneity changes over time:
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" subset_by = "timepoint" # e.g., "day0", "day7", "day14" group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.9 # Comprehensive to detect gradual changes pathway_pval_cutoff = 0.01
Pattern 4: Energy Metabolism Focus
Analyze heterogeneity in glycolysis and OXPHOS:
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "/data/pathways/energy_metabolism.gmt" # Custom GMT group_by = "seurat_clusters" [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.8 pathway_pval_cutoff = 0.05 # More permissive for focused analysis [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] minSize = 5 # Allow smaller custom pathways
Pattern 5: High-Throughput Parallel Execution
Large dataset with many groups:
[ScrnaMetabolicLandscape] [ScrnaMetabolicLandscape.envs] gmtfile = "KEGG_2021_Human" group_by = "seurat_clusters" ncores = 16 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.8 ncores = 16 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] nproc = 1 # Parallelize at group level, not within fgsea
Dependencies
Upstream Processes
- Required:
(part of ScrnaMetabolicLandscape group)MetabolicInput - Optional:
(if imputation enabled withMetabolicExprImputation
)noimpute = false - Root:
→ requiresCombinedInput
or similar clustering processSeuratClustering
Downstream Processes
- Parallel: Runs alongside
andMetabolicPathwayActivity
(same group)MetabolicFeatures - Typically final: Usually the last metabolic analysis step
- Optional: Can feed into visualization or reporting processes
Data Requirements
- Seurat object with normalized expression data
- Metadata column specified in
(e.g., cluster assignments)group_by - Optional metadata column in
for subset analysissubset_by - GMT file with metabolic pathway gene sets matching Seurat object gene names
- Multiple groups required (>2) for meaningful heterogeneity analysis
Output Format
Output Files
MetabolicPathwayHeterogeneity generates the following outputs in the
outdir directory (default: {{in.sobjfile | stem}}.pathwayhetero):
- Heterogeneity scores: TSV files with NES per pathway and PC
- Columns: pathway, PC1_NES, PC1_pval, PC2_NES, PC2_pval, ...
- Dot plots: Visualization of pathway heterogeneity across groups and subsets
- PCA results: PC variance explained, loadings
- Pathway rankings: Pathways ranked by heterogeneity magnitude
Result Interpretation
- High |NES|: Pathway shows high heterogeneity (variable activity across groups)
- Low |NES|: Pathway shows low heterogeneity (uniform activity across groups)
- Positive NES: Pathway correlated with PC (increases along PC axis)
- Negative NES: Pathway anti-correlated with PC (decreases along PC axis)
- P-value: Statistical significance of pathway-PC association
Biological Interpretation
- High heterogeneity pathways: Indicate metabolic subpopulations or transitions
- Low heterogeneity pathways: Indicate core/constitutive metabolism
- Group-specific patterns: Suggest differential metabolic regulation
- PC loadings: Reveal metabolic axes of variation
Validation Rules
Input Validation
must be a valid enrichit database name OR accessible GMT filegmtfile- Gene names in GMT file must match Seurat object (case-sensitive)
column must exist in Seurat object metadatagroup_by- If
specified, column must exist and NA values will be removedsubset_by
Parameter Validation
: If 0 < value <= 1, interpreted as fraction; if > 1, interpreted as numberselect_pcs
: Must be between 0 and 1 (typically 0.001-0.1)pathway_pval_cutoff
: Must be positive integerncores- At least 2 groups required in
for heterogeneity analysisgroup_by
FGSEA Validation
must be one of: "std", "pos", "neg"scoreType
<minSizemaxSize
>= 1minSize- At least one pathway must meet size and significance criteria
Troubleshooting
Issue: All heterogeneity scores are zero or very low
Cause: Groups have similar metabolic profiles or insufficient variance Solution:
# Increase PC capture [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.95 # Capture more variance # Relax pathway significance pathway_pval_cutoff = 0.05 # Check if groups are truly different # May need to refine group_by or subset_by columns
Issue: Process too slow
Cause: High
select_pcs or insufficient parallelization
Solution:
# Reduce PCs [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 0.7 # Use fewer PCs # Increase parallelization ncores = 8 [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] nproc = 8
Issue: No significant pathways after filtering
Cause:
pathway_pval_cutoff too strict or weak biological signal
Solution:
# Relax cutoff [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] pathway_pval_cutoff = 0.05 # Increase from 0.01 # Or reduce pathway size filters [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args] minSize = 10 maxSize = 800
Issue: Memory errors during PCA
Cause: Too many pathways or groups Solution:
# Reduce number of PCs [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] select_pcs = 5 # Use fixed small number # Or filter pathways more aggressively pathway_pval_cutoff = 0.001 # Reduce parallelization ncores = 2
Issue: Results don't make biological sense
Cause: Inappropriate PC selection or wrong grouping variable Solution:
# Try different PC thresholds select_pcs = 0.8 # Default # vs select_pcs = 10 # Fixed number # Verify grouping variable is biologically meaningful group_by = "refined_clusters" # More biologically relevant than raw seurat_clusters # Check if subset_by is creating meaningful comparisons subset_by = "cell_type" # More specific than "treatment"
Issue: Dot plot unreadable (too many pathways)
Cause: Too many significant pathways Solution:
# Stricter filtering [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs] pathway_pval_cutoff = 0.001 # Very strict # Or use custom GMT with fewer pathways [ScrnaMetabolicLandscape.envs] gmtfile = "/data/pathways/core_metabolism.gmt" # Increase plot size [ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots] "Pathway Heterogeneity" = { plot_type = "dot", devpars = { width = 2000, height = 1500, res = 150 } }
Issue: Gene name mismatch errors
Cause: GMT file gene names don't match Seurat object Solution:
- Check gene format: Human (UPPERCASE), Mouse (TitleCase)
- Verify GMT file format:
name\tdescription\tgene1\tgene2\tgene3 - Ensure gene IDs match (e.g., both ENSEMBL or both symbols)
Issue: Insufficient groups for heterogeneity analysis
Cause: Only 1-2 groups in
group_by column
Solution:
# Use different grouping variable with more groups [ScrnaMetabolicLandscape.envs] group_by = "seurat_clusters" # Should have >2 clusters # Or add subset_by to create comparisons subset_by = "treatment" # Creates multiple group sets
Issue: PCA explains very little variance
Cause: Pathways have little variation across groups Solution:
- Verify upstream pathway activity analysis completed successfully
- Check if groups are truly metabolically distinct
- Try different pathway database (e.g., KEGG → Reactome)
- Consider different
variablegroup_by
External References
Original Paper
Metabolic landscape methodology:
- Xiao, Z. et al. (2019). Metabolic landscape of the tumor microenvironment at single cell resolution. Nature Communications, 10, 3763. https://www.nature.com/articles/s41467-019-11738-0
Analytical Methods
PCA (Principal Component Analysis):
- Standard dimensionality reduction technique for identifying axes of variation
- https://en.wikipedia.org/wiki/Principal_component_analysis
GSEA (Gene Set Enrichment Analysis):
- Subramanian, A. et al. (2005). Gene set enrichment analysis. PNAS, 102(43), 15545-15550. https://www.pnas.org/doi/10.1073/pnas.0506580102
fgsea (Fast GSEA):
- Korotkevich, G. et al. (2021). Fast gene set enrichment analysis. bioRxiv. https://www.biorxiv.org/content/10.1101/060012v3
Tool Documentation
- fgsea package: https://rdrr.io/bioc/fgsea/man/fgsea.html
- biopipen VizGSEA: https://pwwang.github.io/biopipen.utils.R/reference/VizGSEA.html
- biopipen metabolic pipeline: https://pwwang.github.io/biopipen/pipelines/scrna_metabolic/
GMT Databases
- MSigDB: http://www.gsea-msigdb.org/gsea/msigdb/
- KEGG: https://www.genome.jp/kegg/pathway.html
- Reactome: https://reactome.org/
- enrichit database list: See
/skills/processes/metabolicinput.md
Related Skills
- ScrnaMetabolicLandscape:
- Full metabolic analysis group/skills/processes/scrnametaboliclandscape.md - MetabolicInput:
- Input preparation and GMT databases/skills/processes/metabolicinput.md - MetabolicPathwayActivity:
- Pathway activity scoring (AUCell-based)/skills/processes/metabolicpathwayactivity.md - MetabolicFeatures:
- Pathway enrichment analysis (FGSEA-based)/skills/processes/metabolicfeatures.md
Decision Tree for Heterogeneity Analysis
Start: MetabolicPathwayHeterogeneity │ ├─ Do groups show obvious metabolic differences? │ ├─ YES → Use default settings (select_pcs = 0.8) │ └─ NO → Increase PC capture (select_pcs = 0.95) │ ├─ How many groups in analysis? │ ├─ 2-5 groups → Use select_pcs = 0.8 (major variance) │ ├─ 6-10 groups → Use select_pcs = 0.85 (balanced) │ └─ >10 groups → Use select_pcs = 0.9 (comprehensive) │ ├─ Exploratory vs confirmatory analysis? │ ├─ Exploratory → pathway_pval_cutoff = 0.05 (permissive) │ └─ Confirmatory → pathway_pval_cutoff = 0.01 (default) │ └─ Computational resources available? ├─ Limited → select_pcs = 0.7, ncores = 2 ├─ Moderate → select_pcs = 0.8, ncores = 4 └─ Abundant → select_pcs = 0.95, ncores = 16
Interpretation Guide
High Heterogeneity Pathways (|NES| > 2)
- Biological significance: Indicates metabolic subpopulations or transitional states
- Follow-up: Investigate which groups drive heterogeneity (check PC loadings)
- Examples: Glycolysis (Warburg effect heterogeneity), fatty acid oxidation (metabolic flexibility)
Low Heterogeneity Pathways (|NES| < 1)
- Biological significance: Core/constitutive metabolism shared across groups
- Follow-up: May serve as normalizing factors or housekeeping metabolism
- Examples: Basic nucleotide metabolism, core TCA cycle
PC Interpretation
- PC1: Usually captures major metabolic axis (e.g., proliferative vs quiescent)
- PC2-3: Secondary axes (e.g., treatment response, differentiation state)
- Higher PCs: Fine-grained variation or technical noise
Comparing Cases
When using multiple cases (e.g., treatment vs control):
- Similar heterogeneity: Pathway variability intrinsic to cell populations
- Differential heterogeneity: Treatment-induced metabolic plasticity or restriction