Claude-skill-registry-data metabolicinput

Pass-through process that prepares a Seurat object for metabolic landscape analysis. It routes the processed Seurat object to the downstream metabolic analysis processes (MetabolicExprImputation, MetabolicPathwayActivity, MetabolicFeatures, MetabolicPathwayHeterogeneity). **Note**: This process requires no direct configuration.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry-data

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/metabolicinput" ~/.claude/skills/majiayu000-claude-skill-registry-data-metabolicinput && rm -rf "$T"

manifest: data/metabolicinput/SKILL.md

source content

MetabolicInput Process Configuration

Purpose

Note: This process requires no direct configuration. All metabolic analysis parameters are configured at the ScrnaMetabolicLandscape group level.

When to Use

First step in modular metabolic analysis workflow
When you want to perform metabolic pathway analysis on single-cell RNA-seq data
Alternative to ScrnaMetabolicLandscape (same group, modular approach)
After clustering is complete (SeuratClustering or related processes)
When investigating metabolic heterogeneity across cell types or conditions

Configuration Structure

Process Enablement

[ScrnaMetabolicLandscape]
# This enables the entire metabolic analysis group
# MetabolicInput is automatically included as part of this group

[ScrnaMetabolicLandscape.envs]
# Configure metabolic analysis parameters here

Input Specification

MetabolicInput automatically receives input from upstream processes:

Requires: Seurat object from CombinedInput (includes RNA + optional VDJ data)
Typically follows: `SeuratClustering`, `TESSA`, or other clustering/annotation processes

Environment Variables (Group Level)

All metabolic analysis configuration is done at the ScrnaMetabolicLandscape group level:

[ScrnaMetabolicLandscape.envs]
# Metabolic pathway database file
gmtfile = "KEGG_2021_Human"

# Skip imputation (if data already complete)
noimpute = false

# Number of cores for parallelization
ncores = 4

# Optional: Subset data by metadata column
# subset_by = "Response"  # Remove NA values in this column

# Optional: Group data by metadata column
# group_by = "cluster"

# Optional: Add metadata columns for grouping/subsetting
# mutaters = {timepoint = "if_else(treatment == 'control', 'pre', 'post')"}

Metabolic Pathway Databases

Available Databases (via enrichit)

The `gmtfile` parameter accepts either:

Built-in database names (auto-downloaded):
- `"KEGG_2021_Human"` - KEGG pathways (human, default)
- `"KEGG"` - KEGG pathways (latest)
- `"Reactome_Pathways_2024"` - Reactome pathways
- `"Reactome"` - Reactome pathways (latest)
- `"BioCarta_2016"` - BioCarta pathways
- `"MSigDB_Hallmark_2020"` - MSigDB Hallmark gene sets
- See full list: https://pwwang.github.io/enrichit/reference/FetchGMT.html
Custom GMT files (local paths or URLs):
- Local file: `/path/to/custom.gmt`
- URL: `https://example.com/pathways.gmt`

Database Descriptions

KEGG: Kyoto Encyclopedia of Genes and Genomes - manually curated metabolic pathways. Comprehensive coverage of metabolism, including carbohydrate, energy, lipid, nucleotide, amino acid, xenobiotics, and other pathways. Species-specific versions available.
Reactome: Curated pathway database covering cellular processes, signal transduction, metabolic pathways, and more. More comprehensive than KEGG for signaling and regulatory pathways. Good for human/mouse.
BioCarta: Curated pathways focusing on cell signaling, metabolic, and disease pathways. Older database but still useful for classic pathways.
Custom GMT: Your own gene sets in GMT (Gene Matrix Transposed) format. Each line holds one gene set, with every field separated by a tab: `name\tdescription\tgene1\tgene2\tgene3`.
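Before pointing `gmtfile` at a custom file, it can help to check that the file parses as GMT. The following is a minimal Python sketch, not part of the pipeline: `parse_gmt` is a hypothetical helper name, and the two gene sets are made-up illustration data. It assumes the layout from the Broad Institute file-format page, where the name, the description, and each gene are separate tab-delimited fields.

```python
import io
from typing import Dict, List, TextIO


def parse_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT text into {gene_set_name: [genes]}.

    Each non-empty line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets: Dict[str, List[str]] = {}
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected name, description and at least one gene"
            )
        name, _description, *genes = fields
        # Drop empty fields left behind by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical two-pathway GMT used only for illustration.
example = io.StringIO(
    "Glycolysis\tillustrative set\tHK1\tPFKL\tPKM\n"
    "TCA cycle\tillustrative set\tCS\tIDH1\n"
)
pathways = parse_gmt(example)
print(pathways["Glycolysis"])  # ['HK1', 'PFKL', 'PKM']
```

For a real file, the same function can be called on `open("/path/to/custom.gmt", encoding="utf-8")`.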

Species-Specific Considerations

Human data: Use `"KEGG_2021_Human"`, `"Reactome_Pathways_2024"`, or species-specific GMT files
Mouse data: Use KEGG with mouse gene IDs or download mouse-specific GMT from MSigDB
Other species: Provide custom GMT file with appropriate gene identifiers matching your Seurat object
Gene name matching: Ensure gene names in Seurat object match GMT file (case-sensitive, human: UPPERCASE, mouse: TitleCase)
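The case-sensitivity point above can be checked outside the pipeline. The sketch below is an illustration only: `pathway_coverage` and `looks_like_case_mismatch` are hypothetical helper names, and the gene lists are made up. It assumes the gene names have already been exported from the Seurat object into a plain list.

```python
from typing import Dict, Iterable, List


def pathway_coverage(
    gene_sets: Dict[str, List[str]], dataset_genes: Iterable[str]
) -> Dict[str, float]:
    """Fraction of each pathway's genes found in the dataset (case-sensitive)."""
    present = set(dataset_genes)
    return {
        name: (sum(g in present for g in genes) / len(genes)) if genes else 0.0
        for name, genes in gene_sets.items()
    }


def looks_like_case_mismatch(
    gene_sets: Dict[str, List[str]], dataset_genes: Iterable[str]
) -> bool:
    """True when more genes match after ignoring case (e.g. HK1 vs Hk1)."""
    dataset = set(dataset_genes)
    pathway_genes = {g for genes in gene_sets.values() for g in genes}
    exact = pathway_genes & dataset
    folded = {g.upper() for g in pathway_genes} & {g.upper() for g in dataset}
    return len(folded) > len(exact)


# Illustrative data: a human-style gene set against mouse-style gene names.
human_sets = {"Glycolysis": ["HK1", "PFKL", "PKM"]}
mouse_genes = ["Hk1", "Pfkl", "Pkm", "Cd3d"]

print(pathway_coverage(human_sets, mouse_genes))          # {'Glycolysis': 0.0}
print(looks_like_case_mismatch(human_sets, mouse_genes))  # True
```

Zero coverage combined with a positive case-mismatch check suggests the GMT file and the data come from different species or naming conventions, rather than the pathway being absent.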

Configuration Examples

Minimal Configuration (Default KEGG)

[ScrnaMetabolicLandscape]

KEGG Human Pathways (Explicit)

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
ncores = 4
noimpute = false

Reactome Pathways

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
ncores = 8

Custom Metabolic Pathway GMT File

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/custom_metabolism.gmt"
ncores = 4

Subset Analysis by Response Group

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
subset_by = "Response"  # Analyze responders vs non-responders
group_by = "cluster"
ncores = 4

Multiple Pathway Databases (Via Cases)

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
ncores = 4

# Analyze with KEGG
[ScrnaMetabolicLandscape.envs.cases.KEGG]
gmtfile = "KEGG_2021_Human"
group_by = "cluster"

# Analyze with Reactome
[ScrnaMetabolicLandscape.envs.cases.Reactome]
gmtfile = "Reactome_Pathways_2024"
group_by = "cluster"

Adding Custom Metadata for Grouping

[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
ncores = 4
# Create timepoint column based on treatment
mutaters = {timepoint = "if_else(treatment == 'control', 'pre', 'post')"}
subset_by = "timepoint"
group_by = "cluster"
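One consequence of combining `mutaters` with `subset_by` is easy to miss: in R, comparing a missing value with `'control'` yields `NA`, and `if_else()` then returns `NA`, so cells with a missing `treatment` receive no timepoint and are dropped by the `subset_by` NA removal. The Python sketch below only mimics that behaviour for illustration; `timepoint_for` is a hypothetical name and `None` stands in for `NA`.

```python
from typing import List, Optional


def timepoint_for(treatment: Optional[str]) -> Optional[str]:
    """Python analogue of if_else(treatment == 'control', 'pre', 'post')."""
    if treatment is None:  # R: NA == 'control' is NA, so if_else() returns NA
        return None
    return "pre" if treatment == "control" else "post"


# Made-up treatment labels for four cells; None marks a missing value.
treatments: List[Optional[str]] = ["control", "drugA", None, "control"]
timepoints = [timepoint_for(t) for t in treatments]
kept = [t for t in timepoints if t is not None]  # subset_by drops missing values

print(timepoints)  # ['pre', 'post', None, 'pre']
print(kept)        # ['pre', 'post', 'pre']
```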

Common Patterns

Pattern 1: Standard Metabolic Analysis

# Basic setup with KEGG pathways
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
ncores = 4

Pattern 2: Skip Imputation (Clean Data)

# If data is already complete, skip imputation step
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
noimpute = true
ncores = 4

Pattern 3: Disease vs Control Comparison

# Compare metabolic pathways between conditions
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
subset_by = "diagnosis"  # e.g., "disease", "control"
group_by = "cluster"
ncores = 4

Pattern 4: Time Series Analysis

# Analyze metabolic changes across timepoints
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
subset_by = "timepoint"  # e.g., "day0", "day7", "day14"
group_by = "cluster"
ncores = 8

Pattern 5: Species-Specific Analysis

# Non-human data with custom pathways
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/mouse_metabolism.gmt"
ncores = 4

Dependencies

Upstream Processes

Required: Seurat object from `CombinedInput`
- CombinedInput can be: `ScRepCombiningExpression` (RNA + VDJ) or `RNAInput` (RNA only)
- RNAInput typically: `SeuratClustering`, `SeuratMap2Ref`, `CellTypeAnnotation`, or `TESSA`
Preceding: Clustering must be complete before metabolic analysis

Downstream Processes (In ScrnaMetabolicLandscape Group)

MetabolicExprImputation (optional): Impute missing expression values (ALRA, scImpute, or MAGIC)
MetabolicPathwayActivity: Calculate pathway activity scores per group
MetabolicFeatures: Enrichment analysis of metabolic pathways per group
MetabolicPathwayHeterogeneity: Calculate metabolic heterogeneity across groups

Validation Rules

Database Validation

`gmtfile` must be a valid enrichit database name OR accessible GMT file path/URL
For custom GMT files:
- File must exist (absolute path or relative to config file)
- Format must be GMT: `name\tdescription\tgene1\tgene2\tgene3` (all fields tab-separated)
- Gene identifiers must match Seurat object (case-sensitive)
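These rules can be approximated with a small pre-flight check before launching the pipeline. The sketch below is an assumption about how one might do this, not the pipeline's own validation: `classify_gmtfile` is a hypothetical helper, and it does not verify that a database name is actually known to enrichit.

```python
import os
from urllib.parse import urlparse


def classify_gmtfile(value: str, config_dir: str = ".") -> str:
    """Return 'url', 'file', or 'database-name' for a gmtfile setting."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # Relative paths are resolved against the config file's directory.
    candidate = value if os.path.isabs(value) else os.path.join(config_dir, value)
    if os.path.isfile(candidate):
        return "file"
    # Anything that looks like a path but does not exist is reported early.
    if value.endswith(".gmt") or os.sep in value or "/" in value:
        raise FileNotFoundError(f"GMT file not found: {candidate}")
    # Otherwise assume a built-in database name; the pipeline validates it later.
    return "database-name"


for value in ("KEGG_2021_Human", "https://example.com/pathways.gmt"):
    print(value, "->", classify_gmtfile(value))
```

Passing the directory of the config file as `config_dir` mirrors the rule that relative paths are resolved from the config file location.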

Species Validation

Gene names in Seurat object must match GMT file:
- Human: UPPERCASE (e.g., `CD3D`, `IFNG`)
- Mouse: TitleCase (e.g., `Cd3d`, `Ifng`)
- Verify with: `sobj@assays$RNA@features` (Seurat R command)

Metadata Validation

If `subset_by` specified: column must exist in Seurat object metadata
If `group_by` specified: column must exist in Seurat object metadata
NA values in `subset_by` column are automatically removed
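The metadata rules can be illustrated with a plain Python stand-in for the Seurat metadata table. This is a sketch under stated assumptions: `check_and_subset` is a hypothetical helper, each cell is a dict of column values, and `None` represents `NA`. The pipeline itself operates on the R object.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def check_and_subset(
    metadata: List[Row],
    subset_by: Optional[str] = None,
    group_by: Optional[str] = None,
) -> List[Row]:
    """Validate column names, then drop rows whose subset_by value is missing."""
    columns = set().union(*(row.keys() for row in metadata)) if metadata else set()
    for option, column in (("subset_by", subset_by), ("group_by", group_by)):
        if column is not None and column not in columns:
            raise KeyError(f"{option} column not found in metadata: {column}")
    if subset_by is None:
        return metadata
    return [row for row in metadata if row.get(subset_by) is not None]


# Made-up metadata for three cells; the second has no Response value.
cells: List[Row] = [
    {"cluster": "c1", "Response": "R"},
    {"cluster": "c2", "Response": None},
    {"cluster": "c1", "Response": "NR"},
]
kept = check_and_subset(cells, subset_by="Response", group_by="cluster")
print(len(kept))  # 2
```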

Troubleshooting

Common Pathway Loading Issues

Issue: "GMT file not found"

Cause: Invalid path to custom GMT file
Solution:

# Use absolute path
gmtfile = "/full/path/to/pathways.gmt"

# Or path relative to config file location
gmtfile = "./data/pathways.gmt"

Issue: "Gene names not found in Seurat object"

Cause: Gene identifier mismatch between GMT and Seurat object
Solution:

Check gene format in Seurat: `sobj@assays$RNA@features[1:10,]`
Ensure case matches: Human (UPPERCASE) vs Mouse (TitleCase)
Consider using gene symbol conversion tools if needed

Issue: "Empty pathway results"

Cause: Too few genes matching between pathways and data
Solution:

Verify species compatibility (human GMT with mouse data won't work)
Try different database: Switch from KEGG to Reactome or vice versa
Use custom GMT with species-specific pathways

Issue: "No enriched pathways found"

Cause: Statistical thresholds too strict or no biological differences
Solution:

Relax p-value cutoff in downstream processes (e.g., `pathway_pval_cutoff`)
Check grouping: Ensure groups have distinct biological differences
Use more comprehensive database (Reactome often has more pathways than KEGG)

Performance Issues

Issue: Metabolic analysis too slow

Cause: Insufficient cores for parallelization
Solution:

# Increase cores for metabolic analysis
[ScrnaMetabolicLandscape.envs]
ncores = 8  # Increase based on available CPU
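How far to raise `ncores` depends on the machine. As a rough illustration, the sketch below picks a value from the CPU count reported by the operating system; `suggest_ncores`, the reserved core, and the cap of 8 are all assumptions for this example, not pipeline defaults.

```python
import os


def suggest_ncores(reserve: int = 1, cap: int = 8) -> int:
    """Suggest an ncores value: available CPUs minus a reserve, within [1, cap]."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cap, available - reserve))


print(f"ncores = {suggest_ncores()}")
```

On shared clusters the scheduler's allocation, not the node's total CPU count, is the relevant limit, so the suggested value should be treated as an upper bound.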

Issue: Memory errors during imputation

Cause: Large dataset with imputation enabled
Solution:

# Skip imputation if data is complete
[ScrnaMetabolicLandscape.envs]
noimpute = true

Integration Issues

Issue: Process not running

Cause: ScrnaMetabolicLandscape not enabled in config
Solution:

# Ensure the group is enabled
[ScrnaMetabolicLandscape]

Issue: Wrong input data

Cause: Clustering not complete or incorrect upstream process
Solution:

Ensure `SeuratClustering` or similar process runs before metabolic analysis
Check that Seurat object has cluster assignments: `sobj@meta.data$seurat_clusters`
Verify no missing values in metadata columns used for grouping

Reference

Original Paper: Xiao, Z. et al. "Metabolic landscape of the tumor microenvironment at single cell resolution." Nature Communications 10, 1-12 (2019)
Pipeline: https://github.com/LocasaleLab/Single-Cell-Metabolic-Landscape
KEGG: https://www.genome.jp/kegg/pathway.html
Reactome: https://reactome.org/
enrichit Databases: https://pwwang.github.io/enrichit/reference/FetchGMT.html
GMT Format: http://www.broadinstitute.org/gsea/msigdb/file_formats.jsp