OpenClaw-Medical-Skills tooluniverse-multiomic-disease-characterization
Comprehensive multi-omics disease characterization integrating genomics, transcriptomics, proteomics, pathway, and therapeutic layers for systems-level understanding. Produces a detailed multi-omics report with quantitative confidence scoring (0-100), cross-layer gene concordance analysis, biomarker candidates, therapeutic opportunities, and mechanistic hypotheses. Uses 80+ ToolUniverse tools across 8 analysis layers. Use when users ask about disease mechanisms, multi-omics analysis, systems biology of disease, biomarker discovery, or therapeutic target identification from a disease perspective.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-multiomic-disease-characterization" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-multiomic-disease-chara && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-multiomic-disease-characterization" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-multiomic-disease-chara && rm -rf "$T"
skills/tooluniverse-multiomic-disease-characterization/SKILL.mdMulti-Omics Disease Characterization Pipeline
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence as T1 (human/clinical) to T4 (computational)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use
tooluniverse-drug-target-validation - Drug safety profiling -> Use
tooluniverse-adverse-event-detection - General disease overview -> Use
tooluniverse-disease-research - Variant interpretation -> Use
tooluniverse-variant-interpretation - GWAS-specific analysis -> Use
skillstooluniverse-gwas-* - Pathway-only analysis -> Use
tooluniverse-systems-biology
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | , |
| tissue | No | Tissue/organ of interest | , , |
| focus_layers | No | Specific omics layers to emphasize | , , |
Multi-Omics Confidence Score (0-100)
Score Components
Data Availability (0-40 points):
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points
Score Interpretation
| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
Report Template
Create this file structure at the start:
{disease_name}_multiomic_report.md
# Multi-Omics Disease Characterization: {Disease Name} **Report Generated**: {date} **Disease Identifiers**: (to be filled) **Multi-Omics Confidence Score**: (to be calculated) --- ## Executive Summary (2-3 sentence disease mechanism synthesis - fill after all layers complete) --- ## 1. Disease Definition & Context ### Disease Identifiers | System | ID | Source | |--------|-----|--------| ### Description ### Synonyms ### Disease Hierarchy (parents/children) ### Affected Tissues/Organs ### Therapeutic Areas **Sources**: (tools used) --- ## 2. Genomics Layer ### 2.1 GWAS Associations | SNP | P-value | Effect | Gene | Study | Source | |-----|---------|--------|------|-------|--------| ### 2.2 GWAS Studies Summary | Study ID | Trait | Sample Size | Year | Source | |----------|-------|-------------|------|--------| ### 2.3 Associated Genes (Genetic Evidence) | Gene | Ensembl ID | Association Score | Evidence Type | Source | |------|------------|-------------------|---------------|--------| ### 2.4 Rare Variants (ClinVar) | Variant | Gene | Clinical Significance | Source | |---------|------|-----------------------|--------| ### Genomics Layer Summary - Total GWAS hits: - Top genes by genetic evidence: - Genetic architecture: **Sources**: (tools used) --- ## 3. Transcriptomics Layer ### 3.1 Differential Expression Studies | Experiment | Condition | Up-regulated | Down-regulated | Source | |------------|-----------|--------------|----------------|--------| ### 3.2 Expression Atlas Disease Evidence | Gene | Score | Source | |------|-------|--------| ### 3.3 Tissue Expression Patterns (GTEx/HPA) | Gene | Tissue | Expression Level | Source | |------|--------|-----------------|--------| ### 3.4 Biomarker Candidates (Expression-Based) | Gene | Tissue Specificity | Fold Change | Evidence | Source | |------|-------------------|-------------|----------|--------| ### Transcriptomics Layer Summary - Differential expression datasets: - Top DEGs: - Tissue-specific patterns: **Sources**: (tools used) --- ## 4. Proteomics & Interaction Layer ### 4.1 Protein-Protein Interactions (STRING) | Protein A | Protein B | Score | Source | |-----------|-----------|-------|--------| ### 4.2 Hub Genes (Network Centrality) | Gene | Degree | Betweenness | Role | Source | |------|--------|-------------|------|--------| ### 4.3 Protein Complexes (IntAct) | Complex | Members | Function | Source | |---------|---------|----------|--------| ### 4.4 Tissue-Specific PPI Network | Gene | Interaction Score | Tissue | Source | |------|-------------------|--------|--------| ### Proteomics Layer Summary - Total PPIs: - Hub genes: - Network modules: **Sources**: (tools used) --- ## 5. Pathway & Network Layer ### 5.1 Enriched Pathways (Enrichr/Reactome) | Pathway | Database | P-value | Genes | Source | |---------|----------|---------|-------|--------| ### 5.2 Reactome Pathway Details | Pathway ID | Name | Genes Involved | Source | |------------|------|----------------|--------| ### 5.3 KEGG Pathways | Pathway ID | Name | Description | Source | |------------|------|-------------|--------| ### 5.4 WikiPathways | Pathway ID | Name | Organism | Source | |------------|------|----------|--------| ### Pathway Layer Summary - Top enriched pathways: - Key pathway nodes: - Cross-pathway connections: **Sources**: (tools used) --- ## 6. Gene Ontology & Functional Annotation ### 6.1 Biological Processes | GO Term | Name | P-value | Genes | Source | |---------|------|---------|-------|--------| ### 6.2 Molecular Functions | GO Term | Name | P-value | Genes | Source | |---------|------|---------|-------|--------| ### 6.3 Cellular Components | GO Term | Name | P-value | Genes | Source | |---------|------|---------|-------|--------| **Sources**: (tools used) --- ## 7. Therapeutic Landscape ### 7.1 Approved Drugs | Drug | ChEMBL ID | Mechanism | Target | Phase | Source | |------|-----------|-----------|--------|-------|--------| ### 7.2 Druggable Targets | Gene | Tractability | Modality | Clinical Precedent | Source | |------|-------------|----------|-------------------|--------| ### 7.3 Drug Repurposing Candidates | Drug | Original Indication | Mechanism | Target | Source | |------|---------------------|-----------|--------|--------| ### 7.4 Clinical Trials | NCT ID | Title | Phase | Status | Intervention | Source | |--------|-------|-------|--------|--------------|--------| ### Therapeutic Summary - Approved drugs: - Clinical pipeline: - Novel targets: **Sources**: (tools used) --- ## 8. Multi-Omics Integration ### 8.1 Cross-Layer Gene Concordance | Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier | |------|----------|-----------------|------------|----------|--------|---------------| ### 8.2 Multi-Omics Hub Genes (Top 20) | Rank | Gene | Layers Found | Key Evidence | Druggable | Source | |------|------|-------------|--------------|-----------|--------| ### 8.3 Biomarker Candidates | Biomarker | Type | Evidence Layers | Confidence | Source | |-----------|------|-----------------|------------|--------| ### 8.4 Mechanistic Hypotheses 1. (Hypothesis with supporting evidence from multiple layers) 2. ... ### 8.5 Systems-Level Insights - Key disrupted processes: - Critical pathway nodes: - Therapeutic intervention points: - Testable hypotheses: --- ## Multi-Omics Confidence Score | Component | Points | Max | Details | |-----------|--------|-----|---------| | Genomics data | | 10 | | | Transcriptomics data | | 10 | | | Protein data | | 5 | | | Pathway data | | 10 | | | Clinical data | | 5 | | | Multi-layer genes | | 20 | | | Direction concordance | | 10 | | | Pathway-gene concordance | | 10 | | | Genetic evidence quality | | 10 | | | Clinical validation | | 10 | | | **TOTAL** | | **100** | | **Score**: XX/100 - [Tier] --- ## Data Availability Checklist | Omics Layer | Data Available | Tools Used | Findings | |-------------|---------------|------------|----------| | Genomics (GWAS) | Yes/No | | | | Genomics (Rare Variants) | Yes/No | | | | Transcriptomics (DEGs) | Yes/No | | | | Transcriptomics (Expression) | Yes/No | | | | Proteomics (PPI) | Yes/No | | | | Proteomics (Expression) | Yes/No | | | | Pathways (Enrichment) | Yes/No | | | | Pathways (KEGG/Reactome) | Yes/No | | | | Gene Ontology | Yes/No | | | | Drugs/Therapeutics | Yes/No | | | | Clinical Trials | Yes/No | | | | Literature | Yes/No | | | --- ## Completeness Checklist - [ ] Disease disambiguation complete (IDs resolved) - [ ] Genomics layer analyzed (GWAS + variants) - [ ] Transcriptomics layer analyzed (DEGs + expression) - [ ] Proteomics layer analyzed (PPI + interactions) - [ ] Pathway layer analyzed (enrichment + mapping) - [ ] Gene Ontology analyzed (BP + MF + CC) - [ ] Therapeutic landscape analyzed (drugs + targets + trials) - [ ] Cross-layer integration complete (concordance analysis) - [ ] Multi-Omics Confidence Score calculated - [ ] Biomarker candidates identified - [ ] Hub genes identified - [ ] Mechanistic hypotheses generated - [ ] Executive summary written - [ ] All sections have source citations --- ## References ### Data Sources Used | # | Tool | Parameters | Section | Items Retrieved | |---|------|------------|---------|-----------------| ### Database Versions - OpenTargets: (current) - GWAS Catalog: (current) - STRING: (current) - Reactome: (current)
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Objective: Resolve disease to standard identifiers for all downstream queries.
Tools Used
OpenTargets_get_disease_id_description_by_name (primary):
- Input:
(string) - Disease namediseaseName - Output:
{data: {search: {hits: [{id, name, description}]}}} - Use: Get MONDO/EFO IDs and description
- CRITICAL: Disease IDs from OpenTargets use underscore format (e.g.,
), NOT colon formatMONDO_0004975
OSL_get_efo_id_by_disease_name (secondary):
- Input:
(string) - Disease namedisease - Output:
{efo_id, name} - Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
- Input:
(string) - Disease ID (e.g.,efoId
)MONDO_0004975 - Output:
{data: {disease: {id, name, description, dbXRefs}}} - Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
- Input:
(string)efoId - Output:
{data: {disease: {id, name, synonyms: [{relation, terms}]}}}
OpenTargets_get_disease_therapeutic_areas_by_efoId:
- Input:
(string)efoId - Output:
{data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}
OpenTargets_get_disease_ancestors_parents_by_efoId:
- Input:
(string)efoId - Output:
{data: {disease: {id, name, ancestors: [{id, name}]}}}
OpenTargets_get_disease_descendants_children_by_efoId:
- Input:
(string)efoId - Output:
{data: {disease: {id, name, descendants: [{id, name}]}}}
OpenTargets_map_any_disease_id_to_all_other_ids:
- Input:
(string) - Any known disease ID (e.g.,inputId
,OMIM:104300
)UMLS:C0002395 - Output:
{data: {disease: {id, name, dbXRefs: [str], ...}}} - Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
Workflow
- Search by disease name to get primary ID (OpenTargets)
- Get full description and cross-references
- Get synonyms for search term expansion
- Get therapeutic areas for context
- Get disease hierarchy (parents/children)
- If user provided OMIM/other ID, map to MONDO/EFO first
Collision-Aware Search
When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"
Key Disease IDs to Track
After disambiguation, store these for all downstream queries:
- Primary ID for OpenTargets queries (e.g.,efo_id
)MONDO_0004975
- Canonical name (e.g.,disease_name
)Alzheimer disease
- For literature search expansionsynonyms
- For contexttherapeutic_areas
- Cross-references (OMIM, UMLS, DOID, etc.)dbXRefs
Phase 1: Genomics Layer
Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.
Tools Used
OpenTargets_get_associated_targets_by_disease_efoId (primary):
- Input:
(string) - Disease EFO/MONDO IDefoId - Output:
{data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}} - Use: Get ALL disease-associated genes ranked by overall evidence score
- NOTE: Returns top 25 by default. For comprehensive analysis, note the total
count
OpenTargets_get_evidence_by_datasource:
- Input:
(string),efoId
(string), optionalensemblId
(array),datasourceIds
(int, default 50)size - Output:
{data: {disease: {evidences: {count, rows: [{...evidence details}]}}}} - Use: Get specific evidence types. Key datasourceIds for genomics:
- GWAS/genetics['ot_genetics_portal']
- Rare variants['gene2phenotype', 'genomics_england', 'orphanet']
- ClinVar variants['eva']
gwas_search_associations (GWAS Catalog):
- Input:
(string),disease_trait
(int, default 20)size - Output:
{data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}} - Use: Get genome-wide significant associations
- NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
- Input:
(string),disease_trait
(int)size - Output:
{data: [...studies], metadata: {pagination}} - NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
- Input:
(string),disease_trait
(int)size - Output:
{data: [...variants], metadata: {pagination}}
GWAS_search_associations_by_gene:
- Input:
(string)gene_name - Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
- Input:
(array of strings),diseaseIds
(bool, default true),enableIndirect
(int, default 10)size - Output:
{data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}} - Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
- Input:
(string) orcondition
(string), optionalgene
(int)max_results - Output: List of ClinVar variants with clinical significance
- Use: Rare variant / monogenic disease evidence
Workflow
- Get associated genes from OpenTargets (overall scores)
- For top 10-15 genes, get genetic evidence specifically via
OpenTargets_get_evidence_by_datasource - Search GWAS Catalog for associations
- Search OpenTargets GWAS studies
- Search ClinVar for rare variants
- For top GWAS genes, check
GWAS_search_associations_by_gene
Gene Tracking
Maintain a dictionary of genes found in genomics layer:
genomics_genes = { 'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'}, 'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'}, # ... }
Phase 2: Transcriptomics Layer
Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
Tools Used
ExpressionAtlas_search_differential:
- Input: optional
(string),gene
(string),condition
(string, default 'homo sapiens')species - Output: Differential expression studies and results
- Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
- Input: optional
(string),gene
(string),condition
(string)species - Output: Expression experiments relevant to condition
- Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
- Input:
(string),efoId
(int, required)pageSize - Output: Genes scored by expression evidence for the disease
- Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
- Input:
(string),efoId
(int, required)pageSize - Output: Genes scored by literature evidence for the disease
- Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
- Input:
(string),gene_name
(string: 'tissue', 'blood', 'brain'),source_type
(string: e.g., 'brain', 'liver')source_name - Output:
{status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}} - NOTE: ALL 3 params required.
options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'source_type
HPA_get_rna_expression_in_specific_tissues:
- Input:
(string),gene_name
(array of strings)tissues - Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
- Input:
(string)gene_name - Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
- Input:
(string)gene_name - Output: Subcellular localization data
HPA_search_genes_by_query:
- Input:
(string)query - Output: Matching genes in HPA
Workflow
- Search Expression Atlas for differential expression studies
- Get expression-based disease scores
- Get literature-based disease scores (EuropePMC)
- For top 10-15 genes from genomics layer, check tissue expression via HPA
- Check disease-relevant tissue expression patterns
- For cancer: check prognostic biomarkers
Gene Tracking
Add transcriptomics genes to tracking:
transcriptomics_genes = { 'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'}, # ... }
Phase 3: Proteomics & Interaction Layer
Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.
Tools Used
STRING_get_interaction_partners (primary PPI):
- Input:
(array of strings - gene names work),protein_ids
(int, default 9606),species
(float, default 0.4),confidence_score
(int, default 20)limit - Output:
{status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]} - Use: Get interaction partners for disease genes
- NOTE:
is an array, NOT string. Gene symbols likeprotein_ids
work['APOE']
STRING_get_network:
- Input:
(array),protein_ids
(int),species
(float)confidence_score - Output: Network of interactions between input proteins
- Use: Build disease-specific PPI network
STRING_functional_enrichment:
- Input:
(array),protein_ids
(int)species - Output: Functional enrichment results (GO, KEGG, etc.)
- Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
- Input:
(array),protein_ids
(int)species - Output: Statistical test for PPI enrichment (more interactions than expected)
- Use: Test if disease genes form a connected module
intact_get_interactions:
- Input:
(string - UniProt ID or gene name)identifier - Output: Molecular interaction data from IntAct
intact_search_interactions:
- Input:
(string),query
(int, default 0),first
(int, default 25)max - Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
- Input:
(string)gene_name - Output:
{gene, interactions, interactor_count, interactors: [...]}
humanbase_ppi_analysis:
- Input:
(array),gene_list
(string),tissue
(int),max_node
(string),interaction
(bool)string_mode - Output: Tissue-specific PPI network
- NOTE: ALL params required.
options: 'coexpression', 'interaction', 'coexpression_and_interaction'.interaction
: true/falsestring_mode
Workflow
- Take top 15-20 genes from genomics + transcriptomics layers
- Query STRING for interaction partners of each gene
- Build composite PPI network using STRING_get_network
- Test PPI enrichment (are genes more connected than random?)
- Get functional enrichment from STRING
- For disease-relevant tissue, get tissue-specific network (HumanBase)
- Identify hub genes (highest degree centrality)
- Check IntAct for experimentally validated interactions
Hub Gene Analysis
Calculate network centrality metrics:
- Degree: Number of interaction partners
- Betweenness: Number of shortest paths through node
- Hub score: Genes with degree > mean + 1 SD are hubs
Phase 4: Pathway & Network Layer
Objective: Identify enriched biological pathways and cross-pathway connections.
Tools Used
enrichr_gene_enrichment_analysis (primary enrichment):
- Input:
(array of gene symbols, min 2),gene_list
(array of library names)libs - Output:
{status: 'success', data: '{...JSON string with enrichment results...}'} - Key libraries:
,['KEGG_2021_Human']
,['Reactome_2022']
,['WikiPathway_2023_Human']
,['GO_Biological_Process_2023']
,['GO_Molecular_Function_2023']['GO_Cellular_Component_2023'] - NOTE:
field is a JSON string, needs parsing. Containsdata
and per-library resultsconnected_paths - NOTE:
is REQUIRED as arraylibs
ReactomeAnalysis_pathway_enrichment:
- Input:
(string - space-separated gene list), optionalidentifiers
(int, default 20),page_size
(bool),include_disease
(bool)projection - Output:
{data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}} - Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
- Input:
(string - UniProt accession)id - Output: List of Reactome pathways containing this protein
- Use: Map individual proteins to pathways
Reactome_get_pathway:
- Input:
(string - Reactome stable ID, e.g., 'R-HSA-73817')stId - Output: Pathway details
Reactome_get_pathway_reactions:
- Input:
(string)stId - Output: Reactions within pathway
kegg_search_pathway:
- Input:
(string)keyword - Output: Array of KEGG pathway matches
kegg_get_pathway_info:
- Input:
(string, e.g., 'hsa04930')pathway_id - Output: Detailed pathway information
WikiPathways_search:
- Input:
(string), optionalquery
(string, e.g., 'Homo sapiens')organism - Output: Matching community-curated pathways
Workflow
- Collect all genes from genomics + transcriptomics layers (top 20-30)
- Run Enrichr enrichment for KEGG, Reactome, WikiPathways
- Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
- Search KEGG for disease-specific pathways
- Search WikiPathways for disease pathways
- For top Reactome pathways, get detailed reactions
- Identify cross-pathway connections (genes in multiple pathways)
Phase 5: Gene Ontology & Functional Annotation
Objective: Characterize biological processes, molecular functions, and cellular components.
Tools Used
enrichr_gene_enrichment_analysis (GO enrichment):
- Use with
for BPlibs=['GO_Biological_Process_2023'] - Use with
for MFlibs=['GO_Molecular_Function_2023'] - Use with
for CClibs=['GO_Cellular_Component_2023']
GO_get_annotations_for_gene:
- Input:
(string - gene symbol or UniProt ID)gene_id - Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
- Input:
(string)query - Output: Matching GO terms
QuickGO_annotations_by_gene:
- Input:
(string - UniProt accession, e.g., 'UniProtKB:P02649'), optionalgene_product_id
(string: 'biological_process', 'molecular_function', 'cellular_component'),aspect
(int: 9606),taxon_id
(int: 25)limit - Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
- Input:
(string)ensemblId - Output: GO terms associated with target
Workflow
- Run Enrichr GO enrichment for all 3 aspects using combined gene list
- For top 5 genes, get detailed GO annotations from QuickGO
- For top genes, get OpenTargets GO terms
- Summarize key biological processes, molecular functions, cellular components
Phase 6: Therapeutic Landscape
Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
Tools Used
OpenTargets_get_associated_drugs_by_disease_efoId (primary):
- Input:
(string),efoId
(int, REQUIRED - use 100)size - Output:
{data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}} - Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
- Input:
(string)ensemblId - Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
- Input:
(string),ensemblId
(int, REQUIRED)size - Output: Drugs targeting this gene/protein
search_clinical_trials:
- Input:
(string, REQUIRED), optionalquery_term
(string),condition
(string),intervention
(int, default 10)pageSize - Output: Clinical trial results
- NOTE:
is REQUIRED even ifquery_term
is providedcondition
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
- Input:
(string)chemblId - Output: Mechanism of action details
Workflow
- Get all drugs for disease from OpenTargets
- For top disease-associated genes, check tractability
- For top genes with no approved drugs, identify repurposing candidates
- Search clinical trials for disease
- For top approved drugs, get mechanism of action
Drug Tracking
drug_targets = { 'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3}, 'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4}, # ... }
Phase 7: Multi-Omics Integration
Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.
Cross-Layer Gene Concordance Analysis
This is the core integrative step. For each gene found in the analysis:
-
Count layers: In how many omics layers does this gene appear?
- Genomics (GWAS, rare variants, genetic association)
- Transcriptomics (DEGs, expression score)
- Proteomics (PPI hub, protein expression)
- Pathways (enriched pathway member)
- Therapeutics (drug target)
-
Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
-
Direction concordance: Do genetics and expression agree?
- Risk allele + upregulated = concordant gain-of-function
- Risk allele + downregulated = concordant loss-of-function
- Discordant = needs investigation
Biomarker Identification
For each multi-omics hub gene, assess biomarker potential:
- Diagnostic: Gene expression distinguishes disease vs healthy
- Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
- Predictive: Variant/expression predicts treatment response (pharmacogenomics)
- Evidence level: Number of supporting omics layers
Mechanistic Hypothesis Generation
From the integrated data:
- Identify the most supported biological processes (GO + pathways)
- Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
- Identify intervention points (druggable nodes in the causal chain)
- Generate testable hypotheses
Confidence Score Calculation
Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation
Phase 8: Report Finalization
Executive Summary
Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities
Final Report Quality Checklist
Before presenting to user, verify:
- All 8 sections have content (or marked as "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- Multi-Omics Confidence Score calculated
- Top 20 genes ranked by multi-omics evidence
- Top 10 enriched pathways listed
- Biomarker candidates identified
- Cross-layer concordance table complete
- Therapeutic opportunities summarized
- Mechanistic hypotheses generated
- Data Availability Checklist complete
- Completeness Checklist complete
- References section lists all tools used
Tool Parameter Quick Reference
| Tool | Key Parameters | Notes |
|---|---|---|
| | Primary disambiguation |
| | Secondary disambiguation |
| | Returns top 25 genes |
| , , , | Per-gene evidence |
| , | GWAS studies |
| , | GWAS Catalog |
| or , | Rare variants |
| , | DEGs |
| , (REQUIRED) | Expression scores |
| , (REQUIRED) | Literature scores |
| , , (ALL REQUIRED) | Tissue expression |
| , (9606), | PPI partners |
| , | PPI network |
| , | Functional enrichment |
| , | Network significance |
| , | Experimental PPIs |
| , , , , (ALL REQ) | Tissue PPI |
| , (BOTH REQUIRED) | Pathway/GO enrichment |
| (space-sep string) | Reactome enrichment |
| (UniProt accession) | Protein-pathway mapping |
| | KEGG pathway search |
| , | WikiPathways search |
| | GO annotations |
| (e.g., 'UniProtKB:P02649') | Detailed GO |
| , (REQUIRED) | Disease drugs |
| | Druggability |
| (REQUIRED), , | Clinical trials |
| , | Literature |
| , ('homo_sapiens' REQUIRED) | Gene lookup |
| , , , | Gene info |
| , , (ALL REQUIRED) | Similar diseases |
Response Format Notes (Verified)
OpenTargets Associated Targets
{ "data": { "disease": { "id": "MONDO_0004975", "name": "Alzheimer disease", "associatedTargets": { "count": 2456, "rows": [ { "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"}, "score": 0.87 } ] } } } }
GWAS Catalog Associations
{ "data": [ { "association_id": 216440893, "p_value": 2e-09, "or_per_copy_num": 0.94, "or_value": "0.94", "efo_traits": [{"..."}], "risk_frequency": "NR" } ], "metadata": {"pagination": {"totalElements": 1061816}} }
STRING Interactions
{ "status": "success", "data": [ { "stringId_A": "9606.ENSP00000252486", "stringId_B": "9606.ENSP00000466775", "preferredName_A": "APOE", "preferredName_B": "APOC2", "score": 0.999 } ] }
Reactome Enrichment
{ "data": { "token": "...", "pathways_found": 154, "pathways": [ { "pathway_id": "R-HSA-1251985", "name": "Nuclear signaling by ERBB4", "species": "Homo sapiens", "is_disease": false, "is_lowest_level": true, "entities_found": 3, "entities_total": 47, "entities_ratio": 0.00291, "p_value": 4.0e-06, "fdr": 0.00068, "reactions_found": 3, "reactions_total": 34 } ] } }
HPA RNA Expression
{ "status": "success", "data": { "gene_name": "APOE", "source_type": "tissue", "source_name": "brain", "expression_value": "2714.9", "expression_level": "very high", "expression_unit": "nTPM" } }
Enrichr Results
{ "status": "success", "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}" }
NOTE: The
data field is a JSON string that needs parsing.
Common Use Patterns
1. Comprehensive Disease Profiling
User: "Characterize Alzheimer's disease across omics layers" -> Run all 8 phases -> Produce full multi-omics report
2. Therapeutic Target Discovery
User: "What are druggable targets for rheumatoid arthritis?" -> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration) -> Focus on tractability and clinical precedent
3. Biomarker Identification
User: "Find diagnostic biomarkers for pancreatic cancer" -> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers) -> Focus on tissue-specific expression and diagnostic potential
4. Mechanism Elucidation
User: "What pathways are dysregulated in Crohn's disease?" -> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses) -> Focus on pathway enrichment and cross-pathway connections
5. Drug Repurposing
User: "What existing drugs could be repurposed for ALS?" -> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing) -> Focus on drugs targeting disease-associated genes
6. Systems Biology
User: "What are the hub genes and key pathways in type 2 diabetes?" -> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis) -> Focus on hub genes and network modules
Edge Case Handling
Rare Diseases (limited data)
- Genomics layer may dominate (single gene)
- Limited GWAS data (monogenic)
- Focus on ClinVar variants, pathway consequences
- Confidence score will be lower (less cross-layer data)
Common Diseases (overwhelming data)
- Thousands of GWAS associations
- Prioritize by effect size and significance
- Focus on top 20-30 genes for downstream analysis
- Use strict significance thresholds (p < 5e-8)
Cancer
- Include somatic mutations (if CIViC/cBioPortal available)
- Check cancer prognostics via HPA
- Include tumor-specific expression patterns
- Clinical trial landscape may be extensive
Monogenic Diseases
- Single gene dominates
- ClinVar/OMIM evidence is primary
- Pathway analysis reveals downstream effects
- Therapeutic landscape may be limited (gene therapy, enzyme replacement)
Polygenic Diseases
- Many weak genetic signals
- GWAS provides the gene list
- Pathway enrichment reveals convergent biology
- Network analysis identifies hub genes
Tissue Ambiguity
- Diseases affecting multiple tissues
- Query HPA for all relevant tissues
- Compare tissue-specific expression patterns
- Use tissue context from disease ontology
Fallback Strategies
If disease name not found
- Try synonyms
- Try broader disease category
- Try OMIM/UMLS ID mapping
- Report disambiguation failure and ask user
If no GWAS data
- Check ClinVar for rare variants
- Use OpenTargets genetic evidence
- Note in report as "Limited genetic data"
- Adjust confidence score accordingly
If no expression data
- Try different disease name/synonym
- Check HPA for individual gene expression
- Use OpenTargets expression evidence
- Note as "Limited transcriptomics data"
If no pathway enrichment
- Reduce gene list stringency
- Try different pathway databases
- Map individual genes to pathways via Reactome
- Note as "No significant pathway enrichment"
If no drugs found
- Check if disease is rare/orphan
- Look for drugs targeting individual genes
- Check clinical trials for investigational therapies
- Note as "No approved drugs - novel therapeutic opportunity"