Claude-skill-registry-data metabolicfeatures
git clone https://github.com/majiayu000/claude-skill-registry-data
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/metabolicfeatures" ~/.claude/skills/majiayu000-claude-skill-registry-data-metabolicfeatures && rm -rf "$T"
MetabolicFeatures Process Configuration
Purpose
Performs enrichment analysis (GSEA-based) for metabolic pathways across different cell groups to identify significantly enriched pathways. Uses fast gene set enrichment analysis (fgsea package) to rank pathways by their association with specific clusters, conditions, or cell states. Generates summary plots and enrichment visualizations for biological interpretation.
When to Use
- Identify differentially active pathways: When you need to find which metabolic pathways are enriched in specific cell groups
- Compare pathway enrichment: To identify metabolic differences between clusters, treatments, or conditions
- After pathway activity scoring: Complements MetabolicPathwayActivity by providing statistical enrichment (p-values, FDR)
- Part of ScrnaMetabolicLandscape: Runs in parallel with MetabolicPathwayActivity and MetabolicPathwayHeterogeneity
- GSEA-based analysis: When you want enrichment scores based on ranked gene lists (signal-to-noise, t-test, fold change)
Configuration Structure
Process Enablement
MetabolicFeatures is part of the ScrnaMetabolicLandscape group. Enable it by enabling the group:
    [ScrnaMetabolicLandscape]
    cache = true
Input Specification
MetabolicFeatures receives input automatically from MetabolicInput (or MetabolicExprImputation if imputation enabled):
    [ScrnaMetabolicLandscape.in]
    srtobj = ["SeuratClustering"]  # Input from upstream clustering process
Environment Variables
All configuration is done at the ScrnaMetabolicLandscape group level:
    [ScrnaMetabolicLandscape.envs]
    # Core configuration (inherited by all metabolic processes)
    gmtfile = "KEGG_2021_Human"    # Metabolic pathways database
    group_by = "seurat_clusters"   # Column to group cells (e.g., "cluster")
    subset_by = "treatment"        # Optional: Subset by metadata column
    ncores = 1                     # Number of cores for parallelization
MetabolicFeatures-Specific Configuration
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    # Gene ranking method for GSEA
    # Options: signal_to_noise, abs_signal_to_noise, t_test, ratio_of_classes, diff_of_classes, log2_ratio_of_classes
    prerank_method = "signal_to_noise"
    ncores = 1  # Cores for parallel fgsea execution
    # Comparison groups (optional - defaults to each group vs all others)
    comparisons = []  # e.g., ["1", "2"] or ["1:2", "1:3"]

    # fgsea parameters
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 15   # Minimum pathway size
    maxSize = 500  # Maximum pathway size
    nproc = 1      # fgsea internal parallelization

    # Plots configuration
    # plot_type options: summary, gsea, dot
    # top_term: number of top pathways to show; devpars.res: plot resolution
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.plots]
    "Summary Plot" = { plot_type = "summary", top_term = 10, devpars = { res = 100 } }
    "Enrichment Plots" = { plot_type = "gsea", top_term = 10, devpars = { res = 100 } }  # GSEA enrichment plot

    # Multiple analysis cases (advanced)
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.cases]
    "Treatment" = { subset_by = "treatment", group_by = "seurat_clusters", prerank_method = "signal_to_noise", comparisons = [] }
Gene Ranking Methods (prerank_method)
Available Methods
| Method | Code | Description | Use Case |
|---|---|---|---|
| Signal to Noise | signal_to_noise or s2n | (mean1 - mean2) / (sd1 + sd2) | Default; balanced approach |
| Absolute S2N | abs_signal_to_noise or abs_s2n | abs(signal_to_noise) | Magnitude-focused ranking |
| T-test | t_test | (mean1 - mean2) / SE | Statistical significance focus |
| Ratio of Classes | ratio_of_classes | mean1 / mean2 | Fold change (natural scale) |
| Diff of Classes | diff_of_classes | mean1 - mean2 | Difference of means (fold change for log-scale data) |
| Log2 Ratio | log2_ratio_of_classes | log2(mean1 / mean2) | Log2 fold change (for natural-scale data) |
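The formulas in the table can be sketched for a single gene as follows. This is a minimal illustration using only the Python standard library, not the pipeline's own implementation: the function name `rank_metric` is hypothetical, the t-test uses a Welch-style standard error as an assumption about what "SE" means, and no adjustment is made for very small standard deviations, which real GSEA implementations may apply.

```python
import math
from statistics import mean, stdev


def rank_metric(x1, x2, method="signal_to_noise"):
    """Score one gene from its expression in group 1 (x1) and group 2 (x2)."""
    m1, m2 = mean(x1), mean(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample standard deviations
    if method in ("signal_to_noise", "s2n"):
        return (m1 - m2) / (s1 + s2)
    if method in ("abs_signal_to_noise", "abs_s2n"):
        return abs((m1 - m2) / (s1 + s2))
    if method == "t_test":
        # Assumption: SE is the Welch standard error of the mean difference
        se = math.sqrt(s1 ** 2 / len(x1) + s2 ** 2 / len(x2))
        return (m1 - m2) / se
    if method == "ratio_of_classes":
        return m1 / m2
    if method == "diff_of_classes":
        return m1 - m2
    if method == "log2_ratio_of_classes":
        return math.log2(m1 / m2)
    raise ValueError(f"unknown prerank_method: {method}")


# Toy expression values for one gene, for illustration only
group1 = [4.0, 5.0, 6.0]
group2 = [1.0, 2.0, 3.0]
s2n = rank_metric(group1, group2, "signal_to_noise")   # (5 - 2) / (1 + 1)
diff = rank_metric(group1, group2, "diff_of_classes")  # 5 - 2
```

Scoring every gene this way and sorting by the score gives the ranked list that the enrichment step consumes.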
Method Selection Guide
Signal to Noise (Default): Best for most analyses
- Accounts for both mean difference and variance
- Down-weights genes with high within-group variability
- Recommended starting point
T-test: When statistical significance is the priority
- Incorporates sample size
- Good for unbalanced groups
Diff of Classes: For log-scale data, such as log-normalized expression (Seurat default)
- The difference of log-scale means is already a log-scale fold change
- Avoids taking a second logarithm of values that are already logged
Ratio of Classes / Log2 Ratio: For natural-scale data
- Use if data is NOT log-normalized
- Direct fold change interpretation; the log2 form is symmetric around zero
FGSEA Parameters
Core Parameters
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 15   # Minimum genes in pathway (filter small pathways)
    maxSize = 500  # Maximum genes in pathway (filter very broad pathways)
    nproc = 1      # Internal fgsea parallelization (set to ncores for speedup)
    eps = 1e-50    # Lower boundary for p-value estimation (smaller = very small p-values resolved more precisely)
FGSEA Algorithm
MetabolicFeatures uses the fgsea R package for fast GSEA:
- Kolmogorov-Smirnov-like test: Tests if pathway genes are enriched at top/bottom of ranked list
- Permutation-based: Generates null distribution by permuting gene labels
- Normalized Enrichment Score (NES): Accounts for pathway size differences
- FDR correction: Multiple testing correction for significance
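The Kolmogorov-Smirnov-like statistic in the first bullet can be sketched as a weighted running sum over the ranked gene list. This is a simplified illustration of the classic GSEA enrichment score only; it is not fgsea's implementation, and it omits the null distribution, NES normalization, and FDR steps. The function name `enrichment_score` and the `weight` parameter are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return the maximum-deviation running-sum score for one gene set.

    ranked_genes: gene names sorted by descending ranking score
    scores: ranking score of each gene, in the same order
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up on a pathway gene
        else:
            running -= 1.0 / n_miss                  # step down otherwise
        if abs(running) > abs(best):
            best = running                           # keep the largest deviation
    return best


# Toy ranked list, for illustration only
genes = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, scores, {"A", "B"})     # set at the top
es_bottom = enrichment_score(genes, scores, {"E", "F"})  # set at the bottom
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which is the sign convention used for NES in the results.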
Reference
fgsea documentation: https://rdrr.io/bioc/fgsea/man/fgsea.html
Comparison Groups
Automatic Comparisons (Default)
    # Empty comparisons = each group vs all other groups
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    comparisons = []  # Each group compared to all other groups
If group_by = "seurat_clusters" has clusters 1, 2, 3:
- Cluster 1 vs (2+3)
- Cluster 2 vs (1+3)
- Cluster 3 vs (1+2)
Specific Groups
    # Only analyze specific groups
    comparisons = ["1", "2"]  # Only clusters 1 and 2 vs rest
Results:
- Cluster 1 vs (2+3+...)
- Cluster 2 vs (1+3+...)
Pairwise Comparisons
    # Explicit pairwise comparisons
    comparisons = ["1:2", "1:3", "2:3"]
Results:
- Cluster 1 vs Cluster 2
- Cluster 1 vs Cluster 3
- Cluster 2 vs Cluster 3
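The three forms of the comparisons setting described in this section can be summarized in a short sketch. It only mirrors the behavior documented here (empty list, single group names, and "a:b" pairs); the function `expand_comparisons` is hypothetical and is not part of the pipeline.

```python
def expand_comparisons(groups, comparisons):
    """Turn a comparisons list into (group1, [group2 members]) pairs."""
    groups = [str(g) for g in groups]
    if not comparisons:
        comparisons = groups                 # [] -> every group vs the rest
    pairs = []
    for comp in comparisons:
        if ":" in comp:                      # "1:2" -> explicit pairwise
            g1, g2 = comp.split(":", 1)
            others = [g2]
        else:                                # "1" -> group vs all others
            g1 = comp
            others = [g for g in groups if g != g1]
        unknown = [g for g in [g1, *others] if g not in groups]
        if unknown:
            raise ValueError(f"unknown group(s): {unknown}")
        pairs.append((g1, others))
    return pairs


clusters = ["1", "2", "3"]
default_pairs = expand_comparisons(clusters, [])
pairwise = expand_comparisons(clusters, ["1:2", "1:3", "2:3"])
```

With three clusters, the default produces three one-vs-rest comparisons, while the pairwise form produces exactly the pairs listed.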
GMT File Sources
The gmtfile parameter accepts:
- Built-in databases: "KEGG_2021_Human", "Reactome_Pathways_2024", "BioCarta_2016", "MSigDB_Hallmark_2020"
- Custom files: Local paths or URLs to GMT format files
- See /skills/processes/metabolicinput.md for detailed database options
Configuration Examples
Minimal Configuration (Default Settings)
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.in]
    srtobj = ["SeuratClustering"]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    group_by = "seurat_clusters"
Custom GSEA Parameters
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    group_by = "seurat_clusters"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "log2_ratio_of_classes"  # Log2 fold change of class means
    ncores = 4

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 10   # Allow smaller pathways
    maxSize = 300  # Restrict to core pathways
    nproc = 4      # Parallel fgsea
Pairwise Treatment Comparison
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "Reactome_Pathways_2024"
    group_by = "treatment"
    ncores = 8

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "t_test"
    comparisons = ["control:treated", "control:resistant"]

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 15
    maxSize = 500
    nproc = 8
High-Resolution Publication Plots
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    group_by = "seurat_clusters"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "signal_to_noise"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.plots]
    "Top Enriched Pathways" = { plot_type = "summary", top_term = 20, devpars = { width = 1600, height = 1200, res = 300 } }
    "GSEA Enrichment Curves" = { plot_type = "gsea", top_term = 10, devpars = { width = 1400, height = 1000, res = 300 } }
Multiple Analysis Cases
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    ncores = 8

    # Case 1: Cluster-based enrichment
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.cases.Clusters]
    group_by = "seurat_clusters"
    prerank_method = "signal_to_noise"
    comparisons = []
    plots = { "Cluster Enrichment" = { plot_type = "summary", top_term = 15, devpars = { res = 150 } } }

    # Case 2: Treatment response
    # comparisons must name values of the group_by column, so response is the grouping column here
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.cases.Treatment]
    subset_by = "treatment"
    group_by = "response"
    prerank_method = "log2_ratio_of_classes"
    comparisons = ["responder:nonresponder"]
    plots = { "Response Enrichment" = { plot_type = "gsea", top_term = 10, devpars = { res = 150 } } }
Common Patterns
Pattern 1: Standard Pathway Enrichment
Identify enriched pathways per cluster:
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    group_by = "seurat_clusters"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "signal_to_noise"
Pattern 2: Treatment vs Control
Compare metabolic enrichment between conditions:
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "Reactome_Pathways_2024"
    group_by = "treatment"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "log2_ratio_of_classes"
    comparisons = ["control:treated"]
Pattern 3: Specific Pathway Focus
Analyze only glycolysis and OXPHOS:
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "/data/pathways/glycolysis_oxphos.gmt"
    group_by = "seurat_clusters"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "signal_to_noise"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 5  # Allow smaller custom pathways
Pattern 4: Subset-Specific Analysis
Compare enrichment within T cell subsets only:
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    subset_by = "celltype"  # Metadata column to subset by
    group_by = "activation_state"

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "t_test"
    # Comparisons run within the subsets defined by celltype (e.g., celltype == "T cell")
Pattern 5: High-Throughput Parallel Execution
Large dataset with many comparisons:
    [ScrnaMetabolicLandscape]

    [ScrnaMetabolicLandscape.envs]
    gmtfile = "KEGG_2021_Human"
    group_by = "seurat_clusters"
    ncores = 16

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "signal_to_noise"
    ncores = 16

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    nproc = 1  # Parallelize at comparison level, not within fgsea
Dependencies
Upstream Processes
- Required: MetabolicInput (part of ScrnaMetabolicLandscape group)
- Optional: MetabolicExprImputation (if imputation enabled with noimpute = false)
- Root: CombinedInput → requires SeuratClustering or similar clustering process
Downstream Processes
- Parallel: Runs alongside MetabolicPathwayActivity and MetabolicPathwayHeterogeneity (same group)
- Optional: Can feed into visualization or reporting processes
Data Requirements
- Seurat object with normalized expression data
- Metadata column specified in group_by (e.g., cluster assignments)
- Optional metadata column in subset_by for subset analysis
- GMT file with metabolic pathway gene sets matching Seurat object gene names
Output Format
Output Files
MetabolicFeatures generates the following outputs in the outdir directory (default: {{in.sobjfile | stem}}.pathwayfeatures):
- GSEA results tables: TSV files with pathway enrichment statistics (NES, p-value, FDR)
- Columns: pathway, NES, pval, padj, ES, leading_edge, size
- Summary plots: Bar/dot plots showing top enriched pathways per comparison
- GSEA enrichment plots: Classic GSEA running enrichment score plots for top pathways
- Dot plots: Multi-comparison overviews (case/subset level)
Result Interpretation
- NES (Normalized Enrichment Score): Direction and magnitude of enrichment
- Positive NES: Pathway enriched in group 1 (upregulated)
- Negative NES: Pathway enriched in group 2 (downregulated)
- Magnitude: Strength of enrichment (|NES| > 1.5 is a common rule of thumb, but judge significance by padj)
- P-value: Statistical significance (raw)
- FDR (padj): Multiple testing corrected p-value (use this for significance)
- Leading edge: Core genes driving enrichment
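One way to apply these rules to a results table is sketched below. It assumes a tab-separated table with the pathway, NES, and padj columns listed under Output Files; the rows are made-up toy values for illustration, and the 0.05 cutoff and the function name `significant_pathways` are assumptions rather than pipeline defaults.

```python
import csv
import io

# Toy results for illustration only; these are not real pipeline output
toy_tsv = "\n".join([
    "pathway\tNES\tpval\tpadj",
    "Glycolysis\t2.1\t0.001\t0.01",
    "Oxidative phosphorylation\t-1.8\t0.002\t0.02",
    "Fatty acid metabolism\t1.6\t0.04\t0.2",
])


def significant_pathways(tsv_text, alpha=0.05):
    """Split pathways with padj < alpha into group 1 (up) and group 2 (down)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    up, down = [], []
    for row in reader:
        if float(row["padj"]) >= alpha:
            continue  # not significant after multiple-testing correction
        if float(row["NES"]) > 0:
            up.append(row["pathway"])    # enriched in group 1
        else:
            down.append(row["pathway"])  # enriched in group 2
    return up, down


up, down = significant_pathways(toy_tsv)
```

In the toy table, the third pathway has |NES| above 1.5 but is dropped because its padj exceeds the cutoff, which is why padj rather than NES magnitude decides significance.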
Validation Rules
Input Validation
- gmtfile must be a valid enrichit database name OR accessible GMT file
- Gene names in GMT file must match Seurat object (case-sensitive)
- group_by column must exist in Seurat object metadata
- If subset_by specified, column must exist and NA values will be removed
Parameter Validation
- prerank_method must be one of: signal_to_noise, s2n, abs_signal_to_noise, abs_s2n, t_test, ratio_of_classes, diff_of_classes, log2_ratio_of_classes
- ncores must be positive integer
- comparisons must be valid group names from group_by column
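The parameter rules above could be checked before launching a run with something like the following sketch. It works on a plain dictionary of settings and a list of known group names; the function `validate_envs` is hypothetical and does not reflect how the pipeline validates its configuration internally.

```python
VALID_PRERANK_METHODS = {
    "signal_to_noise", "s2n", "abs_signal_to_noise", "abs_s2n",
    "t_test", "ratio_of_classes", "diff_of_classes", "log2_ratio_of_classes",
}


def validate_envs(envs, groups):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    method = envs.get("prerank_method", "signal_to_noise")
    if method not in VALID_PRERANK_METHODS:
        problems.append(f"prerank_method {method!r} is not supported")

    ncores = envs.get("ncores", 1)
    if isinstance(ncores, bool) or not isinstance(ncores, int) or ncores < 1:
        problems.append(f"ncores must be a positive integer, got {ncores!r}")

    known = {str(g) for g in groups}
    for comp in envs.get("comparisons", []):
        for name in str(comp).split(":"):  # "1:2" names two groups
            if name not in known:
                problems.append(f"comparison {comp!r} names unknown group {name!r}")

    return problems


ok = validate_envs(
    {"prerank_method": "s2n", "ncores": 4, "comparisons": ["1:2"]},
    groups=["1", "2", "3"],
)
bad = validate_envs(
    {"prerank_method": "fold_change", "ncores": 0, "comparisons": ["1:9"]},
    groups=["1", "2", "3"],
)
```

Collecting all problems in a list, rather than stopping at the first one, lets a user fix a configuration in a single pass.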
FGSEA Validation
- minSize < maxSize
- minSize >= 1
- At least one pathway must meet size criteria after filtering
Troubleshooting
Issue: No significant pathways found
Cause: Weak biological signal or stringent filtering
Solution:
    # Relax pathway size filters
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    minSize = 5
    maxSize = 1000

    # Try different ranking method
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    prerank_method = "log2_ratio_of_classes"
Issue: Process too slow
Cause: Many comparisons or insufficient parallelization
Solution:
    # Increase parallelization
    [ScrnaMetabolicLandscape.envs]
    ncores = 8

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    ncores = 8
    # Or reduce comparison scope
    comparisons = ["1:2", "1:3"]  # Only specific comparisons

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    nproc = 8  # Parallelize fgsea internally
Issue: Gene name mismatch errors
Cause: GMT file gene names don't match Seurat object
Solution:
- Check gene format: Human (UPPERCASE), Mouse (TitleCase)
- Verify GMT file format: name\tdescription\tgene1\tgene2\tgene3
- Ensure gene IDs match (e.g., both ENSEMBL or both symbols)
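A quick overlap check can show whether a mismatch is present before running the analysis. The sketch below parses GMT-formatted lines as described above and reports, per pathway, the fraction of genes found in a list of gene names taken from the Seurat object. The helper names `read_gmt` and `overlap_report` are hypothetical, and the gene lists are toy values.

```python
def read_gmt(lines):
    """Parse GMT lines (name, description, gene1, gene2, ...) into sets."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def overlap_report(gene_sets, object_genes):
    """Fraction of each pathway's genes present in the object (case-sensitive)."""
    object_genes = set(object_genes)
    return {
        name: len(genes & object_genes) / len(genes)
        for name, genes in gene_sets.items()
        if genes
    }


# Toy example: human-style symbols in the GMT, mouse-style symbols in the object
gmt_lines = ["Glycolysis\tna\tHK1\tPFKM\tPKM\tLDHA"]
gene_sets = read_gmt(gmt_lines)
mouse_style = overlap_report(gene_sets, ["Hk1", "Pfkm", "Pkm", "Ldha"])
human_style = overlap_report(gene_sets, ["HK1", "PFKM", "PKM", "LDHA"])
```

An overlap near zero across all pathways usually points to a case or identifier-type mismatch rather than a biological absence of the genes.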
Issue: Empty or NA results for some comparisons
Cause: Insufficient cells in comparison groups
Solution:
    # Check group sizes first
    # Ensure each group has >30 cells for reliable statistics
    # Or remove small groups
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    comparisons = ["1", "2"]  # Exclude small cluster 3
Issue: Plots are unreadable (too many pathways)
Cause: top_term too high or too many significant pathways
Solution:
    # Reduce number of pathways shown (top_term = 5 shows only the top 5 pathways)
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.plots]
    "Summary Plot" = { plot_type = "summary", top_term = 5, devpars = { width = 1200, height = 600, res = 150 } }
Issue: Enrichment scores don't match expectations
Cause: Wrong ranking method for data type
Solution:
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    # For log-normalized Seurat data (default), the difference of means is already a log-scale fold change:
    prerank_method = "diff_of_classes"
    # For raw/natural scale data, use instead:
    # prerank_method = "log2_ratio_of_classes"
Issue: Memory errors during fgsea
Cause: Too many pathways or parallel processes
Solution:
    # Reduce parallelization
    [ScrnaMetabolicLandscape.MetabolicFeatures.envs]
    ncores = 2

    [ScrnaMetabolicLandscape.MetabolicFeatures.envs.fgsea_args]
    nproc = 1
    # Or filter pathways more aggressively
    minSize = 20
    maxSize = 300
External References
Original Papers
fgsea algorithm:
- Korotkevich, G. et al. (2021). Fast gene set enrichment analysis. bioRxiv. https://www.biorxiv.org/content/10.1101/060012v3
GSEA methodology:
- Subramanian, A. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43), 15545-15550. https://www.pnas.org/doi/10.1073/pnas.0506580102
Tool Documentation
- fgsea package: https://rdrr.io/bioc/fgsea/man/fgsea.html
- biopipen VizGSEA: https://pwwang.github.io/biopipen.utils.R/reference/VizGSEA.html
- biopipen metabolic pipeline: https://pwwang.github.io/biopipen/pipelines/scrna_metabolic/
GMT Databases
- MSigDB: http://www.gsea-msigdb.org/gsea/msigdb/
- KEGG: https://www.genome.jp/kegg/pathway.html
- Reactome: https://reactome.org/
- enrichit database list: See /skills/processes/metabolicinput.md
Related Skills
- ScrnaMetabolicLandscape: /skills/processes/scrnametaboliclandscape.md - Full metabolic analysis group
- MetabolicInput: /skills/processes/metabolicinput.md - Input preparation and GMT databases
- MetabolicPathwayActivity: /skills/processes/metabolicpathwayactivity.md - Pathway activity scoring (AUCell-based)
- MetabolicPathwayHeterogeneity: /skills/processes/metabolicpathwayheterogeneity.md - Heterogeneity analysis