BioSkills bio-workflows-expression-to-pathways

Workflow from differential expression results to functional enrichment analysis. Covers GO, KEGG, Reactome enrichment with clusterProfiler and visualization. Use when taking DE results to pathway enrichment.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/workflows/expression-to-pathways" ~/.claude/skills/gptomics-bioskills-bio-workflows-expression-to-pathways && rm -rf "$T"
manifest: workflows/expression-to-pathways/SKILL.md
source content

Version Compatibility

Reference examples tested with: DESeq2 1.42+, R stats (base), ReactomePA 1.46+, clusterProfiler 4.10+, ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:

  • R:
    packageVersion('<pkg>')
    then
    ?function_name
    to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Expression to Pathways Workflow

"Find enriched pathways from my differential expression results" → Orchestrate GO enrichment (clusterProfiler), GSEA, KEGG/Reactome pathway mapping, and enrichment visualization from DE gene lists or ranked gene lists.

Convert differential expression results into biological insights through functional enrichment analysis.

Method Selection

ScenarioMethodWhy
Have DE results with Wald stat / t-stat for all genesGSEA (Step 4)Uses full ranking; no arbitrary cutoff; ~35% higher F1 than ORA
Clear gene list from non-DE source (co-expression, GWAS)ORA (Steps 1-3)No ranking available
RNA-seq with known gene length biasGOseq (goseq package)Standard ORA ignores length bias
Bacterial / prokaryotic dataKEGG with locus tagsNo org.*.eg.db; use keyType='kegg'
Multiple conditions to comparecompareCluster or mitchNever compare p-values across separate enrichments

When in doubt, run both ORA and GSEA and compare. Concordant results are more trustworthy.

Workflow Overview

DE Results (gene list or ranked list)
    |
    v
[1. Gene ID Conversion] --> Convert to Entrez/Ensembl
    |
    v
[2. Over-representation Analysis]
    |
    +---> GO Enrichment (BP, MF, CC)
    |
    +---> KEGG Pathways
    |
    +---> Reactome Pathways
    |
    v
[3. GSEA (ranked genes)]
    |
    v
[4. Visualization] -----> Dot plots, networks, bar plots
    |
    v
Functional annotations and pathway insights

Input Preparation

From DESeq2 Results

library(DESeq2)
library(clusterProfiler)
library(org.Hs.eg.db)

# Load DE results
res <- read.csv('deseq2_results.csv', row.names = 1)

# Significant genes for ORA
sig_genes <- rownames(subset(res, padj < 0.05 & abs(log2FoldChange) > 1))

# Background = all tested genes (NOT the full genome)
# Pre-filtering and independent filtering reduce the tested set; use only genes that were tested
background_genes <- rownames(res[!is.na(res$pvalue), ])

# Ranked list for GSEA — prefer Wald statistic (combines magnitude + precision)
# Alternatives: shrunken LFC, or sign(logFC) * -log10(PValue) for edgeR
ranked_genes <- res$stat
names(ranked_genes) <- rownames(res)
ranked_genes <- sort(ranked_genes[!is.na(ranked_genes)], decreasing = TRUE)

Gene ID Conversion

# Convert gene symbols to Entrez IDs
sig_entrez <- bitr(sig_genes, fromType = 'SYMBOL', toType = 'ENTREZID',
                   OrgDb = org.Hs.eg.db)

# For ranked list
ranked_entrez <- bitr(names(ranked_genes), fromType = 'SYMBOL', toType = 'ENTREZID',
                      OrgDb = org.Hs.eg.db)
ranked_list <- ranked_genes[ranked_entrez$SYMBOL]
names(ranked_list) <- ranked_entrez$ENTREZID

Step 1: GO Over-representation Analysis

# Convert background genes too
bg_entrez <- bitr(background_genes, fromType = 'SYMBOL', toType = 'ENTREZID', OrgDb = org.Hs.eg.db)

# Biological Process — always specify universe (background)
go_bp <- enrichGO(gene = sig_entrez$ENTREZID,
                  universe = bg_entrez$ENTREZID,
                  OrgDb = org.Hs.eg.db,
                  ont = 'BP',
                  pAdjustMethod = 'BH',
                  pvalueCutoff = 0.05,
                  qvalueCutoff = 0.1,
                  readable = TRUE)

# Molecular Function
go_mf <- enrichGO(gene = sig_entrez$ENTREZID,
                  universe = bg_entrez$ENTREZID,
                  OrgDb = org.Hs.eg.db,
                  ont = 'MF',
                  pAdjustMethod = 'BH',
                  pvalueCutoff = 0.05,
                  readable = TRUE)

# Cellular Component
go_cc <- enrichGO(gene = sig_entrez$ENTREZID,
                  universe = bg_entrez$ENTREZID,
                  OrgDb = org.Hs.eg.db,
                  ont = 'CC',
                  pAdjustMethod = 'BH',
                  pvalueCutoff = 0.05,
                  readable = TRUE)

# Simplify redundant terms
go_bp_simple <- simplify(go_bp, cutoff = 0.7, by = 'p.adjust')

Step 2: KEGG Pathway Enrichment

kegg <- enrichKEGG(gene = sig_entrez$ENTREZID,
                   organism = 'hsa',
                   pvalueCutoff = 0.05,
                   qvalueCutoff = 0.1)

# Convert KEGG IDs to readable names
kegg <- setReadable(kegg, OrgDb = org.Hs.eg.db, keyType = 'ENTREZID')

Step 3: Reactome Pathway Enrichment

library(ReactomePA)

reactome <- enrichPathway(gene = sig_entrez$ENTREZID,
                          organism = 'human',
                          pvalueCutoff = 0.05,
                          readable = TRUE)

Step 4: Gene Set Enrichment Analysis (GSEA)

# GO GSEA
gsea_go <- gseGO(geneList = ranked_list,
                 OrgDb = org.Hs.eg.db,
                 ont = 'BP',
                 minGSSize = 10,
                 maxGSSize = 500,
                 pvalueCutoff = 0.05,
                 verbose = FALSE)

# KEGG GSEA
gsea_kegg <- gseKEGG(geneList = ranked_list,
                     organism = 'hsa',
                     minGSSize = 10,
                     maxGSSize = 500,
                     pvalueCutoff = 0.05,
                     verbose = FALSE)

Step 5: Visualization

library(enrichplot)
library(ggplot2)

# Dot plot
dotplot(go_bp_simple, showCategory = 20) +
    ggtitle('GO Biological Process Enrichment')
ggsave('go_bp_dotplot.pdf', width = 10, height = 8)

# Bar plot
barplot(kegg, showCategory = 15) +
    ggtitle('KEGG Pathway Enrichment')
ggsave('kegg_barplot.pdf', width = 9, height = 6)

# Enrichment map (network of related terms)
go_bp_simple <- pairwise_termsim(go_bp_simple)
emapplot(go_bp_simple, showCategory = 30) +
    ggtitle('GO Term Similarity Network')
ggsave('go_network.pdf', width = 10, height = 10)

# Concept network (gene-term connections)
cnetplot(go_bp, showCategory = 5, categorySize = 'pvalue') +
    ggtitle('Gene-Concept Network')
ggsave('cnet_plot.pdf', width = 12, height = 10)

# GSEA plot for specific pathway
gseaplot2(gsea_kegg, geneSetID = 1:3, pvalue_table = TRUE)
ggsave('gsea_plot.pdf', width = 10, height = 8)

# Ridge plot for GSEA
ridgeplot(gsea_go, showCategory = 15)
ggsave('gsea_ridge.pdf', width = 8, height = 10)

Step 6: Export Results

# Export enrichment results
write.csv(as.data.frame(go_bp), 'go_bp_enrichment.csv', row.names = FALSE)
write.csv(as.data.frame(kegg), 'kegg_enrichment.csv', row.names = FALSE)
write.csv(as.data.frame(reactome), 'reactome_enrichment.csv', row.names = FALSE)
write.csv(as.data.frame(gsea_go), 'gsea_go_results.csv', row.names = FALSE)

# Combine key results
combined <- rbind(
    data.frame(Database = 'GO_BP', as.data.frame(go_bp_simple)[1:10,]),
    data.frame(Database = 'KEGG', as.data.frame(kegg)[1:10,]),
    data.frame(Database = 'Reactome', as.data.frame(reactome)[1:10,])
)
write.csv(combined, 'top_enriched_pathways.csv', row.names = FALSE)

Parameter Recommendations

AnalysisParameterValue
enrichGOpvalueCutoff0.05
enrichGOqvalueCutoff0.1
simplifycutoff0.7
gseGOminGSSize10
gseGOmaxGSSize500
GSEAperm1000 (default)

Troubleshooting

IssueLikely CauseSolution
No enriched termsToo few genes, wrong IDsCheck gene IDs, relax thresholds
All terms significantToo many genesBe more stringent with DE cutoffs
Gene ID conversion failsWrong organism, formatCheck OrgDb package, gene format
GSEA no resultsPoor ranking, small gene setsCheck ranked list, adjust minGSSize

Complete Workflow Script

library(clusterProfiler)
library(org.Hs.eg.db)
library(ReactomePA)
library(enrichplot)
library(ggplot2)

# Configuration
de_file <- 'deseq2_results.csv'
output_dir <- 'pathway_analysis'
dir.create(output_dir, showWarnings = FALSE)

# Load and prepare data
res <- read.csv(de_file, row.names = 1)
sig_genes <- rownames(subset(res, padj < 0.05 & abs(log2FoldChange) > 1))
cat('Significant genes:', length(sig_genes), '\n')

# Convert IDs
sig_entrez <- bitr(sig_genes, fromType = 'SYMBOL', toType = 'ENTREZID', OrgDb = org.Hs.eg.db)
cat('Converted to Entrez:', nrow(sig_entrez), '\n')

# Ranked list for GSEA (Wald statistic preferred over LFC)
ranked <- res$stat
names(ranked) <- rownames(res)
ranked <- sort(ranked[!is.na(ranked)], decreasing = TRUE)
ranked_entrez <- bitr(names(ranked), fromType = 'SYMBOL', toType = 'ENTREZID', OrgDb = org.Hs.eg.db)
ranked_list <- ranked[ranked_entrez$SYMBOL]
names(ranked_list) <- ranked_entrez$ENTREZID

# Background genes (all tested, not full genome)
bg_entrez <- bitr(rownames(res[!is.na(res$pvalue), ]), fromType = 'SYMBOL', toType = 'ENTREZID', OrgDb = org.Hs.eg.db)

# GO enrichment with background
go_bp <- enrichGO(sig_entrez$ENTREZID, universe = bg_entrez$ENTREZID, OrgDb = org.Hs.eg.db, ont = 'BP', readable = TRUE)
go_bp_simple <- simplify(go_bp, cutoff = 0.7)

# KEGG
kegg <- enrichKEGG(sig_entrez$ENTREZID, organism = 'hsa')
kegg <- setReadable(kegg, OrgDb = org.Hs.eg.db, keyType = 'ENTREZID')

# Reactome
reactome <- enrichPathway(sig_entrez$ENTREZID, organism = 'human', readable = TRUE)

# GSEA
gsea_go <- gseGO(ranked_list, OrgDb = org.Hs.eg.db, ont = 'BP', verbose = FALSE)

# Plots
pdf(file.path(output_dir, 'enrichment_plots.pdf'), width = 10, height = 8)
print(dotplot(go_bp_simple, showCategory = 20) + ggtitle('GO Biological Process'))
print(barplot(kegg, showCategory = 15) + ggtitle('KEGG Pathways'))
if (nrow(as.data.frame(reactome)) > 0) {
    print(dotplot(reactome, showCategory = 15) + ggtitle('Reactome Pathways'))
}
dev.off()

# Export
write.csv(as.data.frame(go_bp_simple), file.path(output_dir, 'go_bp.csv'), row.names = FALSE)
write.csv(as.data.frame(kegg), file.path(output_dir, 'kegg.csv'), row.names = FALSE)
write.csv(as.data.frame(reactome), file.path(output_dir, 'reactome.csv'), row.names = FALSE)

cat('\nResults saved to:', output_dir, '\n')
cat('GO BP terms:', nrow(as.data.frame(go_bp_simple)), '\n')
cat('KEGG pathways:', nrow(as.data.frame(kegg)), '\n')
cat('Reactome pathways:', nrow(as.data.frame(reactome)), '\n')

Prokaryotic Organisms

For bacteria/archaea, standard org.db annotation packages are unavailable. Use KEGG directly with strain-specific organism codes:

# Find organism code
search_kegg_organism('Pseudomonas aeruginosa', by = 'scientific_name')

# KEGG ORA with bacterial organism code (e.g., 'pae' for P. aeruginosa PAO1)
kegg_bac <- enrichKEGG(gene = sig_gene_ids, organism = 'pae', keyType = 'kegg',
                       pvalueCutoff = 0.05)

# For organisms without KEGG annotation, use KEGG Orthology
# Map genes to KO IDs via eggNOG-mapper or KOALA, then:
kegg_ko <- enrichKEGG(gene = ko_ids, organism = 'ko', keyType = 'kegg')

GO enrichment for prokaryotes: use

enricher()
with custom GO-to-gene mapping from eggNOG-mapper or InterProScan output, rather than org.db packages.

Multi-Condition Enrichment Comparison

When comparing enrichment across conditions (e.g., treatment A vs B vs C):

# compareCluster: run ORA across multiple gene lists
gene_clusters <- list(
    ConditionA = sig_genes_A,
    ConditionB = sig_genes_B,
    ConditionC = sig_genes_C
)
cc <- compareCluster(gene_clusters, fun = 'enrichKEGG', organism = 'hsa')
dotplot(cc, showCategory = 10) + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Do not compare raw -log10(p-values) across conditions — they scale with sample size. Compare NES (normalized enrichment scores) for GSEA, or use compareCluster for ORA.

Related Skills

  • pathway-analysis/go-enrichment - GO enrichment details
  • pathway-analysis/kegg-pathways - KEGG analysis
  • pathway-analysis/reactome-pathways - Reactome analysis
  • pathway-analysis/gsea - GSEA methods
  • pathway-analysis/enrichment-visualization - Visualization options
  • differential-expression/de-results - Preparing gene lists for enrichment