Claude-skill-registry-data metabolicpathwayheterogeneity

Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/metabolicpathwayheterogeneity" ~/.claude/skills/majiayu000-claude-skill-registry-data-metabolicpathwayheterogeneity && rm -rf "$T"
manifest: data/metabolicpathwayheterogeneity/SKILL.md
source content

MetabolicPathwayHeterogeneity Process Configuration

Purpose

Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.

When to Use

  • Assess metabolic diversity: When you need to quantify metabolic heterogeneity within clusters or conditions
  • Identify variable pathways: To find which metabolic pathways show high vs low heterogeneity across groups
  • Compare metabolic variability: To compare heterogeneity between treatments, timepoints, or cell types
  • After pathway activity analysis: Complements MetabolicPathwayActivity by adding a heterogeneity dimension
  • Final metabolic analysis step: Typically the last process in the ScrnaMetabolicLandscape workflow
  • Subpopulation discovery: When metabolic variability suggests hidden substructure

Configuration Structure

Process Enablement

MetabolicPathwayHeterogeneity is part of the ScrnaMetabolicLandscape group. Enable it by enabling the group:

[ScrnaMetabolicLandscape]
cache = true

Input Specification

MetabolicPathwayHeterogeneity receives input automatically from MetabolicInput (or from MetabolicExprImputation if imputation is enabled):

[ScrnaMetabolicLandscape.in]
srtobj = ["SeuratClustering"]  # Input from upstream clustering process

Environment Variables

All configuration is done at the ScrnaMetabolicLandscape group level:

[ScrnaMetabolicLandscape.envs]
# Core configuration (inherited by all metabolic processes)
gmtfile = "KEGG_2021_Human"           # Metabolic pathways database
group_by = "seurat_clusters"          # Column to group cells (e.g., "cluster")
subset_by = "treatment"               # Optional: Subset by metadata column
ncores = 1                            # Number of cores for parallelization

MetabolicPathwayHeterogeneity-Specific Configuration

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
# Principal component selection for heterogeneity analysis
select_pcs = 0.8                      # Fraction or number of PCs to use (0-1 = fraction, >1 = number)

# Pathway significance filtering
pathway_pval_cutoff = 0.01            # P-value cutoff to select enriched pathways

# Parallelization
ncores = 1                            # Cores for parallel processing (inherited from group if not set)

# fgsea parameters
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"                     # Options: "std", "pos", "neg"
nproc = 1                             # fgsea internal parallelization
minSize = 15                          # Minimum pathway size
maxSize = 500                         # Maximum pathway size

# Plots configuration
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot"                     # Options: dot, heatmap
devpars = { res = 100 }               # Plot resolution

# Multiple analysis cases (advanced)
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Treatment]
subset_by = "treatment"
group_by = "seurat_clusters"
select_pcs = 0.8
pathway_pval_cutoff = 0.01
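The inheritance mentioned in the comments above (a process-level value wins, otherwise the group-level value applies) can be pictured as a plain dictionary merge. This is only a sketch of the rule as this document describes it, not the pipeline's actual code; `effective_envs` is a hypothetical helper introduced for illustration.

```python
def effective_envs(group_envs, process_envs):
    """Merge group-level envs with process-level overrides (illustrative only)."""
    merged = dict(group_envs)      # start from the group-level values
    merged.update(process_envs)    # process-level keys win when both are set
    return merged


group_envs = {"gmtfile": "KEGG_2021_Human", "group_by": "seurat_clusters", "ncores": 8}
process_envs = {"select_pcs": 0.8, "pathway_pval_cutoff": 0.01}

envs = effective_envs(group_envs, process_envs)
print(envs["ncores"])      # taken from the group level
print(envs["select_pcs"])  # set at the process level
```

Setting `ncores` in `process_envs` as well would replace the group-level value for this process only.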

Heterogeneity Analysis Method

Algorithm Overview

  1. PCA on pathway scores: Principal component analysis on pathway activity scores per group
  2. PC selection: Select top PCs explaining variance (controlled by select_pcs)
  3. GSEA on PCs: Run fgsea for each PC to identify pathways correlating with variance
  4. NES calculation: Normalized Enrichment Score quantifies pathway-PC association
  5. Heterogeneity metric: Pathways with high |NES| show high heterogeneity across groups

PC Selection (select_pcs)

| Value | Interpretation | Use Case |
|---|---|---|
| 0.8 (default) | Use PCs explaining 80% variance | Balanced approach |
| 0.5 | Use PCs explaining 50% variance | Focus on major variation sources |
| 0.95 | Use PCs explaining 95% variance | Comprehensive analysis (slower) |
| 5 (integer) | Use exactly 5 PCs | Manual control |
| 10 | Use exactly 10 PCs | Fine-grained heterogeneity |

Recommendation: Start with 0.8 (default); increase to 0.9-0.95 if you suspect hidden heterogeneity.
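To make the two readings of select_pcs concrete, here is a minimal sketch of how a value could be turned into a PC count. It assumes the fraction form means "smallest number of leading PCs whose cumulative explained variance reaches the target", and that an explicit count is capped at the number of PCs available; both are assumptions, and `resolve_n_pcs` is a hypothetical name, not a function of the pipeline.

```python
def resolve_n_pcs(select_pcs, explained_variance_ratio):
    """Return how many leading PCs to keep for a given select_pcs value."""
    n_available = len(explained_variance_ratio)
    if select_pcs <= 0:
        raise ValueError("select_pcs must be positive")
    if select_pcs > 1:
        # Values above 1 are read as an explicit PC count (capped: assumption).
        return min(int(select_pcs), n_available)
    # Values in (0, 1] are read as a target fraction of explained variance.
    cumulative = 0.0
    for n_pcs, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= select_pcs:
            return n_pcs
    return n_available


ratios = [0.50, 0.20, 0.15, 0.10, 0.05]  # made-up variance ratios for PC1..PC5
print(resolve_n_pcs(0.8, ratios))  # 3: PC1-PC3 together explain about 85%
print(resolve_n_pcs(0.5, ratios))  # 1: PC1 alone reaches 50%
print(resolve_n_pcs(10, ratios))   # 5: only five PCs exist
```

With these made-up ratios, raising the target from 0.5 to 0.8 adds two PCs, which is why higher fractions cost more downstream enrichment runs.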

Pathway P-value Cutoff

pathway_pval_cutoff = 0.01  # Only pathways with p < 0.01 are analyzed
  • Purpose: Filter to significantly enriched pathways before heterogeneity analysis
  • Lower values (0.001): Strict, only highly significant pathways
  • Higher values (0.05): Permissive, include more pathways
  • Default (0.01): Good balance for most analyses

FGSEA Score Type

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"  # Options: "std", "pos", "neg"

| Score Type | Description | Use Case |
|---|---|---|
| std | Standard GSEA (both directions) | Default; detects heterogeneity in both high/low activity |
| pos | Only positive enrichment | Focus on pathways with increased activity variation |
| neg | Only negative enrichment | Focus on pathways with decreased activity variation |

GMT File Sources

The gmtfile parameter accepts:

  • Built-in databases: "KEGG_2021_Human", "Reactome_Pathways_2024", "BioCarta_2016", "MSigDB_Hallmark_2020"
  • Custom files: Local paths or URLs to GMT format files
  • See /skills/processes/metabolicinput.md for detailed database options

Configuration Examples

Minimal Configuration (Default Settings)

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.in]
srtobj = ["SeuratClustering"]

[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"

Comprehensive Heterogeneity Analysis

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
ncores = 8

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.95                      # Capture 95% of variance
pathway_pval_cutoff = 0.01             # Significant pathways only
ncores = 8

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"
nproc = 8
minSize = 10
maxSize = 500

Treatment Comparison Heterogeneity

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
group_by = "seurat_clusters"
subset_by = "treatment"                # Compare heterogeneity between treatments

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.05             # More permissive for exploratory analysis

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Treatment Heterogeneity"]
plot_type = "dot"
devpars = { width = 1200, height = 800, res = 150 }

High-Resolution Publication Plots

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 10                        # Use exactly 10 PCs
pathway_pval_cutoff = 0.01

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot"
devpars = { width = 1600, height = 1200, res = 300 }

Multiple Analysis Cases

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
ncores = 8

# Case 1: Overall cluster heterogeneity
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Clusters]
group_by = "seurat_clusters"
select_pcs = 0.8
pathway_pval_cutoff = 0.01
plots = { "Cluster Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } }

# Case 2: Treatment-specific heterogeneity
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Treatment]
subset_by = "treatment"
group_by = "seurat_clusters"
select_pcs = 0.9
pathway_pval_cutoff = 0.01
plots = { "Treatment Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } }

Conservative Analysis (Major Variance Only)

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.5                       # Only major variance (50%)
pathway_pval_cutoff = 0.001            # Very strict significance

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"
minSize = 20                           # Larger pathways only
maxSize = 300

Common Patterns

Pattern 1: Standard Heterogeneity Analysis

Assess metabolic variability across clusters:

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.01

Pattern 2: Treatment Response Variability

Compare heterogeneity between responders and non-responders:

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
subset_by = "response"                 # "responder" vs "nonresponder"
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.85
pathway_pval_cutoff = 0.01

Pattern 3: Timepoint Progression

Analyze heterogeneity changes over time:

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
subset_by = "timepoint"                # e.g., "day0", "day7", "day14"
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.9                       # Comprehensive to detect gradual changes
pathway_pval_cutoff = 0.01

Pattern 4: Energy Metabolism Focus

Analyze heterogeneity in glycolysis and OXPHOS:

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/energy_metabolism.gmt"  # Custom GMT
group_by = "seurat_clusters"

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.05             # More permissive for focused analysis

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
minSize = 5                            # Allow smaller custom pathways

Pattern 5: High-Throughput Parallel Execution

Large dataset with many groups:

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
ncores = 16

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
ncores = 16

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
nproc = 1  # Parallelize at group level, not within fgsea

Dependencies

Upstream Processes

  • Required: MetabolicInput (part of ScrnaMetabolicLandscape group)
  • Optional: MetabolicExprImputation (if imputation enabled with noimpute = false)
  • Root: CombinedInput → requires SeuratClustering or similar clustering process

Downstream Processes

  • Parallel: Runs alongside MetabolicPathwayActivity and MetabolicFeatures (same group)
  • Typically final: Usually the last metabolic analysis step
  • Optional: Can feed into visualization or reporting processes

Data Requirements

  • Seurat object with normalized expression data
  • Metadata column specified in group_by (e.g., cluster assignments)
  • Optional metadata column in subset_by for subset analysis
  • GMT file with metabolic pathway gene sets matching Seurat object gene names
  • At least 2 groups in group_by; more than 2 are recommended for meaningful heterogeneity analysis

Output Format

Output Files

MetabolicPathwayHeterogeneity generates the following outputs in the outdir directory (default: {{in.sobjfile | stem}}.pathwayhetero):

  • Heterogeneity scores: TSV files with NES per pathway and PC
    • Columns: pathway, PC1_NES, PC1_pval, PC2_NES, PC2_pval, ...
  • Dot plots: Visualization of pathway heterogeneity across groups and subsets
  • PCA results: PC variance explained, loadings
  • Pathway rankings: Pathways ranked by heterogeneity magnitude
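As a rough illustration of how a scores table with the columns listed above could be turned into a pathway ranking, the sketch below reads tab-separated text and orders pathways by their largest |NES| over all PC columns. Taking the maximum across PCs as the "heterogeneity magnitude" is an assumption made here, the values are made up, and `rank_pathways` is a hypothetical helper; the real output files may differ in layout.

```python
import csv
import io


def rank_pathways(tsv_text):
    """Rank pathways by their largest absolute NES across all *_NES columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ranked = []
    for row in reader:
        nes_values = [float(v) for k, v in row.items() if k.endswith("_NES")]
        ranked.append((row["pathway"], max(abs(v) for v in nes_values)))
    # Highest magnitude first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up example values in the column layout described above.
example = (
    "pathway\tPC1_NES\tPC1_pval\tPC2_NES\tPC2_pval\n"
    "Glycolysis\t2.4\t0.001\t-0.6\t0.40\n"
    "TCA cycle\t0.7\t0.30\t-0.9\t0.25\n"
    "Fatty acid oxidation\t-1.1\t0.12\t2.1\t0.004\n"
)
for name, magnitude in rank_pathways(example):
    print(name, magnitude)
```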

Result Interpretation

  • High |NES|: Pathway shows high heterogeneity (variable activity across groups)
  • Low |NES|: Pathway shows low heterogeneity (uniform activity across groups)
  • Positive NES: Pathway correlated with PC (increases along PC axis)
  • Negative NES: Pathway anti-correlated with PC (decreases along PC axis)
  • P-value: Statistical significance of pathway-PC association

Biological Interpretation

  • High heterogeneity pathways: Indicate metabolic subpopulations or transitions
  • Low heterogeneity pathways: Indicate core/constitutive metabolism
  • Group-specific patterns: Suggest differential metabolic regulation
  • PC loadings: Reveal metabolic axes of variation

Validation Rules

Input Validation

  • gmtfile must be a valid enrichit database name OR accessible GMT file
  • Gene names in GMT file must match Seurat object (case-sensitive)
  • group_by column must exist in Seurat object metadata
  • If subset_by is specified, the column must exist; NA values will be removed

Parameter Validation

  • select_pcs: If 0 < value <= 1, interpreted as fraction; if > 1, interpreted as number
  • pathway_pval_cutoff: Must be between 0 and 1 (typically 0.001-0.1)
  • ncores: Must be positive integer
  • At least 2 groups required in group_by for heterogeneity analysis

FGSEA Validation

  • scoreType must be one of: "std", "pos", "neg"
  • minSize < maxSize
  • minSize >= 1
  • At least one pathway must meet size and significance criteria
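The parameter and FGSEA rules above can be collected into a small pre-flight check. The sketch below is not part of the pipeline; `validate_envs` is a hypothetical helper, the fallback values are the ones shown in the configuration earlier in this document, and the checks that need data (group count, pathways surviving the filters) are left out.

```python
VALID_SCORE_TYPES = {"std", "pos", "neg"}


def validate_envs(envs):
    """Return a list of rule violations; an empty list means every check passed."""
    errors = []

    select_pcs = envs.get("select_pcs", 0.8)
    if select_pcs <= 0:
        errors.append("select_pcs must be > 0 (fraction if <= 1, count if > 1)")

    cutoff = envs.get("pathway_pval_cutoff", 0.01)
    if not 0 <= cutoff <= 1:
        errors.append("pathway_pval_cutoff must be between 0 and 1")

    ncores = envs.get("ncores", 1)
    if isinstance(ncores, bool) or not isinstance(ncores, int) or ncores < 1:
        errors.append("ncores must be a positive integer")

    fgsea_args = envs.get("fgsea_args", {})
    if fgsea_args.get("scoreType", "std") not in VALID_SCORE_TYPES:
        errors.append('scoreType must be one of "std", "pos", "neg"')
    min_size = fgsea_args.get("minSize", 15)
    max_size = fgsea_args.get("maxSize", 500)
    if min_size < 1:
        errors.append("minSize must be >= 1")
    if min_size >= max_size:
        errors.append("minSize must be < maxSize")

    return errors


print(validate_envs({"select_pcs": 0.8, "ncores": 4}))              # no violations
print(validate_envs({"ncores": 0, "fgsea_args": {"minSize": 600}}))  # two violations
```

Returning a list instead of raising on the first problem lets all violations be reported in one pass.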

Troubleshooting

Issue: All heterogeneity scores are zero or very low

Cause: Groups have similar metabolic profiles or insufficient variance
Solution:

# Increase PC capture
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.95  # Capture more variance

# Relax pathway significance
pathway_pval_cutoff = 0.05

# Check if groups are truly different
# May need to refine group_by or subset_by columns

Issue: Process too slow

Cause: High select_pcs or insufficient parallelization
Solution:

# Reduce PCs
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.7  # Use fewer PCs

# Increase parallelization
ncores = 8

[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
nproc = 8

Issue: No significant pathways after filtering

Cause: pathway_pval_cutoff too strict or weak biological signal
Solution:

# Relax cutoff
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
pathway_pval_cutoff = 0.05  # Increase from 0.01

# Or reduce pathway size filters
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
minSize = 10
maxSize = 800

Issue: Memory errors during PCA

Cause: Too many pathways or groups
Solution:

# Reduce number of PCs
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 5  # Use fixed small number

# Or filter pathways more aggressively
pathway_pval_cutoff = 0.001

# Reduce parallelization
ncores = 2

Issue: Results don't make biological sense

Cause: Inappropriate PC selection or wrong grouping variable
Solution:

# Try different PC thresholds (set one value at a time)
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8     # Default
# select_pcs = 10    # Alternative: fixed number

[ScrnaMetabolicLandscape.envs]
# Verify grouping variable is biologically meaningful
group_by = "refined_clusters"  # More biologically relevant than raw seurat_clusters

# Check if subset_by is creating meaningful comparisons
subset_by = "cell_type"  # More specific than "treatment"

Issue: Dot plot unreadable (too many pathways)

Cause: Too many significant pathways
Solution:

# Stricter filtering
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
pathway_pval_cutoff = 0.001  # Very strict

# Or use custom GMT with fewer pathways
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/core_metabolism.gmt"

# Increase plot size
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot"
devpars = { width = 2000, height = 1500, res = 150 }

Issue: Gene name mismatch errors

Cause: GMT file gene names don't match Seurat object
Solution:

  • Check gene format: Human (UPPERCASE), Mouse (TitleCase)
  • Verify GMT file format: name\tdescription\tgene1\tgene2\tgene3
  • Ensure gene IDs match (e.g., both ENSEMBL or both symbols)
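One quick way to run the checks above is to parse the GMT text and measure how many genes of each pathway appear in the expression data. The sketch below assumes the tab-separated layout just described; `parse_gmt` and `overlap_fraction` are hypothetical helpers, and the gene lists stand in for the row names of a real Seurat object.

```python
def parse_gmt(text):
    """Parse GMT text into {pathway_name: set_of_genes}."""
    pathways = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        pathways[name] = {g for g in genes if g}
    return pathways


def overlap_fraction(pathway_genes, object_genes):
    """Fraction of a pathway's genes found in the data (case-sensitive match)."""
    if not pathway_genes:
        return 0.0
    return len(pathway_genes & set(object_genes)) / len(pathway_genes)


gmt_text = "Glycolysis\tdemo set\tHK2\tPFKP\tPKM\tLDHA\n"
pathways = parse_gmt(gmt_text)

human_style = ["HK2", "PFKP", "PKM", "LDHA", "ACTB"]  # uppercase symbols
mouse_style = ["Hk2", "Pfkp", "Pkm", "Ldha", "Actb"]  # title-case symbols
print(overlap_fraction(pathways["Glycolysis"], human_style))  # 1.0
print(overlap_fraction(pathways["Glycolysis"], mouse_style))  # 0.0
```

An overlap near zero for every pathway usually points to a naming or ID-type mismatch and not to missing biology.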

Issue: Insufficient groups for heterogeneity analysis

Cause: Only 1-2 groups in group_by column
Solution:

# Use different grouping variable with more groups
[ScrnaMetabolicLandscape.envs]
group_by = "seurat_clusters"  # Should have >2 clusters

# Or add subset_by to create comparisons
subset_by = "treatment"  # Creates multiple group sets

Issue: PCA explains very little variance

Cause: Pathways have little variation across groups
Solution:

  • Verify upstream pathway activity analysis completed successfully
  • Check if groups are truly metabolically distinct
  • Try a different pathway database (e.g., KEGG → Reactome)
  • Consider a different group_by variable

Related Skills

  • ScrnaMetabolicLandscape: /skills/processes/scrnametaboliclandscape.md - Full metabolic analysis group
  • MetabolicInput: /skills/processes/metabolicinput.md - Input preparation and GMT databases
  • MetabolicPathwayActivity: /skills/processes/metabolicpathwayactivity.md - Pathway activity scoring (AUCell-based)
  • MetabolicFeatures: /skills/processes/metabolicfeatures.md - Pathway enrichment analysis (FGSEA-based)

Decision Tree for Heterogeneity Analysis

Start: MetabolicPathwayHeterogeneity
│
├─ Do groups show obvious metabolic differences?
│  ├─ YES → Use default settings (select_pcs = 0.8)
│  └─ NO → Increase PC capture (select_pcs = 0.95)
│
├─ How many groups in analysis?
│  ├─ 2-5 groups → Use select_pcs = 0.8 (major variance)
│  ├─ 6-10 groups → Use select_pcs = 0.85 (balanced)
│  └─ >10 groups → Use select_pcs = 0.9 (comprehensive)
│
├─ Exploratory vs confirmatory analysis?
│  ├─ Exploratory → pathway_pval_cutoff = 0.05 (permissive)
│  └─ Confirmatory → pathway_pval_cutoff = 0.01 (default)
│
└─ Computational resources available?
   ├─ Limited → select_pcs = 0.7, ncores = 2
   ├─ Moderate → select_pcs = 0.8, ncores = 4
   └─ Abundant → select_pcs = 0.95, ncores = 16
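Two branches of the tree above (group count and exploratory vs confirmatory) translate directly into a lookup. The sketch below encodes only those two branches with the thresholds shown; the other branches can suggest different select_pcs values, so treat the result as a starting point. `suggest_settings` is a hypothetical helper, not part of the pipeline.

```python
def suggest_settings(n_groups, exploratory=False):
    """Suggest starting values from the group-count and analysis-type branches."""
    if n_groups < 2:
        raise ValueError("at least 2 groups are needed for heterogeneity analysis")
    if n_groups <= 5:
        select_pcs = 0.8    # major variance
    elif n_groups <= 10:
        select_pcs = 0.85   # balanced
    else:
        select_pcs = 0.9    # comprehensive
    cutoff = 0.05 if exploratory else 0.01
    return {"select_pcs": select_pcs, "pathway_pval_cutoff": cutoff}


print(suggest_settings(4))                     # few groups, confirmatory
print(suggest_settings(12, exploratory=True))  # many groups, exploratory
```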

Interpretation Guide

High Heterogeneity Pathways (|NES| > 2)

  • Biological significance: Indicates metabolic subpopulations or transitional states
  • Follow-up: Investigate which groups drive heterogeneity (check PC loadings)
  • Examples: Glycolysis (Warburg effect heterogeneity), fatty acid oxidation (metabolic flexibility)

Low Heterogeneity Pathways (|NES| < 1)

  • Biological significance: Core/constitutive metabolism shared across groups
  • Follow-up: May serve as normalizing factors or housekeeping metabolism
  • Examples: Basic nucleotide metabolism, core TCA cycle
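The two thresholds above (|NES| > 2 and |NES| < 1) leave a middle band that this guide does not name. The sketch below labels that band "intermediate", which is an assumption made here for illustration; `heterogeneity_level` is a hypothetical helper.

```python
def heterogeneity_level(nes):
    """Label a pathway by the magnitude of its normalized enrichment score."""
    magnitude = abs(nes)
    if magnitude > 2:
        return "high"
    if magnitude < 1:
        return "low"
    return "intermediate"  # band between the two thresholds (label is an assumption)


for value in (2.4, -2.1, 1.5, -0.6):
    print(value, heterogeneity_level(value))
```

The sign is ignored on purpose: a strongly negative NES is as much a sign of heterogeneity as a strongly positive one.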

PC Interpretation

  • PC1: Usually captures major metabolic axis (e.g., proliferative vs quiescent)
  • PC2-3: Secondary axes (e.g., treatment response, differentiation state)
  • Higher PCs: Fine-grained variation or technical noise

Comparing Cases

When using multiple cases (e.g., treatment vs control):

  • Similar heterogeneity: Pathway variability intrinsic to cell populations
  • Differential heterogeneity: Treatment-induced metabolic plasticity or restriction