OpenClaw-Medical-Skills tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-target-research" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-target-research && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-target-research" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-target-research && rm -rf "$T"
skills/tooluniverse-target-research/SKILL.mdComprehensive Target Intelligence Gatherer
Gather complete target intelligence by exploring 9 parallel research paths. Supports targets identified by gene symbol, UniProt accession, Ensembl ID, or gene name.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Tool parameter verification - Verify params via
before calling unfamiliar toolsget_tool_info - Evidence grading - Grade all claims by evidence strength (T1-T4)
- Citation requirements - Every fact must have inline source attribution
- Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
- Disambiguation first - Resolve all identifiers before research
- Negative results documented - "No drugs found" is data; empty sections are failures
- Collision-aware literature search - Detect and filter naming collisions
- English-first queries - Always use English terms in tool calls, even if the user writes in another language. Translate gene names, disease names, and search terms to English. Only try original-language terms as a fallback if English returns no results. Respond in the user's language
Phase 0: Tool Parameter Verification (CRITICAL)
BEFORE calling ANY tool for the first time, verify its parameters:
# Always check tool params to prevent silent failures tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways") # Reveals: takes `id` not `uniprot_id`
Known Parameter Corrections (Updated)
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| | |
| | |
| only | + |
| | (camelCase) |
GTEx Versioned ID Fallback (CRITICAL)
GTEx often requires versioned Ensembl IDs. If
ENSG00000123456 returns empty:
# Step 1: Get gene info with version gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human") version = gene_info.get('version', 1) # Step 2: Try versioned ID versioned_id = f"{ensembl_id}.{version}" # e.g., "ENSG00000123456.12" result = tu.tools.GTEx_get_median_gene_expression( gencode_id=versioned_id, operation="median" )
When to Use This Skill
Apply when users:
- Ask about a drug target, protein, or gene
- Need target validation or assessment
- Request druggability analysis
- Want comprehensive target profiling
- Ask "what do we know about [target]?"
- Need target-disease associations
- Request safety profile for a target
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
DO NOT show the search process or tool outputs to the user. Instead:
-
Create the report file FIRST - Before any data collection:
- File name:
[TARGET]_target_report.md - Initialize with all 14 section headers
- Add placeholder:
in each section[Researching...]
- File name:
-
Progressively update the report - As you gather data:
- Update each section immediately after retrieving data
- Replace
with actual content[Researching...] - Include "No data returned" when tools return empty results
-
Methodology in appendix only - If user requests methodology details, create separate
[TARGET]_methods_appendix.md
2. Evidence Grading System (MANDATORY)
CRITICAL: Grade every claim by evidence strength.
Evidence Tiers
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | ★★★ | Direct mechanistic evidence, human genetic proof | CRISPR KO, patient mutations, crystal structure with mechanism |
| T2 | ★★☆ | Functional studies, model organism validation | siRNA phenotype, mouse KO, biochemical assay |
| T3 | ★☆☆ | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation |
| T4 | ☆☆☆ | Mention, review, text-mined, predicted | Review article, database annotation, computational prediction |
Required Evidence Grading Locations
Evidence grades MUST appear in:
- Executive Summary - Key disease claims graded
- Section 8.2 Disease Associations - Every disease link graded with source type
- Section 11 Literature - Key papers table with evidence tier
- Section 13 Recommendations - Scorecard items reference evidence quality
Per-Section Evidence Summary
--- **Evidence Quality for this Section**: Strong - Mechanistic (T1): 12 papers - Functional (T2): 8 papers - Association (T3): 15 papers - Mention (T4): 23 papers **Data Gaps**: No CRISPR data; mouse KO phenotypes limited ---
3. Citation Requirements (MANDATORY)
Every piece of information MUST include its source:
EGFR mutations cause lung adenocarcinoma [★★★: PMID:15118125, activating mutations in patients]. *Source: ClinVar, CIViC*
Core Strategy: 9 Research Paths
Execute 9 research paths (Path 0 is always first):
Target Query (e.g., "EGFR" or "P00533") │ ├─ IDENTIFIER RESOLUTION (always first) │ └─ Check if GPCR → GPCRdb_get_protein │ ├─ PATH 0: Open Targets Foundation (ALWAYS FIRST - fills gaps in all other paths) │ ├─ PATH 1: Core Identity (names, IDs, sequence, organism) │ └─ InterProScan_scan_sequence for novel domain prediction (NEW) ├─ PATH 2: Structure & Domains (3D structure, domains, binding sites) │ └─ If GPCR: GPCRdb_get_structures (active/inactive states) ├─ PATH 3: Function & Pathways (GO terms, pathways, biological role) ├─ PATH 4: Protein Interactions (PPI network, complexes) ├─ PATH 5: Expression Profile (tissue expression, single-cell) ├─ PATH 6: Variants & Disease (mutations, clinical significance) │ └─ DisGeNET_search_gene for curated gene-disease associations ├─ PATH 7: Drug Interactions (known drugs, druggability, safety) │ ├─ Pharos_get_target for TDL classification (Tclin/Tchem/Tbio/Tdark) │ ├─ BindingDB_get_ligands_by_uniprot for known ligands (NEW) │ ├─ PubChem_search_assays_by_target_gene for HTS data (NEW) │ ├─ If GPCR: GPCRdb_get_ligands (curated agonists/antagonists) │ └─ DepMap_get_gene_dependencies for target essentiality └─ PATH 8: Literature & Research (publications, trends)
Identifier Resolution (Phase 1)
CRITICAL: Resolve ALL identifiers before any research path.
def resolve_target_ids(tu, query): """ Resolve target query to ALL needed identifiers. Returns dict with: query, uniprot, ensembl, ensembl_version, symbol, entrez, chembl_target, hgnc """ ids = { 'query': query, 'uniprot': None, 'ensembl': None, 'ensembl_versioned': None, # For GTEx 'symbol': None, 'entrez': None, 'chembl_target': None, 'hgnc': None, 'full_name': None, 'synonyms': [] } # [Resolution logic based on input type] # ... (see current implementation) # CRITICAL: Get versioned Ensembl ID for GTEx if ids['ensembl']: gene_info = tu.tools.ensembl_lookup_gene(id=ids['ensembl'], species="human") if gene_info and gene_info.get('version'): ids['ensembl_versioned'] = f"{ids['ensembl']}.{gene_info['version']}" # Also get synonyms for literature collision detection ids['full_name'] = gene_info.get('description', '').split(' [')[0] # Get UniProt alternative names for synonyms if ids['uniprot']: alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=ids['uniprot']) if alt_names: ids['synonyms'].extend(alt_names) return ids
GPCR Target Detection (NEW)
~35% of approved drugs target GPCRs. After identifier resolution, check if target is a GPCR:
def check_gpcr_target(tu, ids): """ Check if target is a GPCR and retrieve specialized data. Call after identifier resolution. """ symbol = ids.get('symbol', '') # Build GPCRdb entry name entry_name = f"{symbol.lower()}_human" gpcr_info = tu.tools.GPCRdb_get_protein( operation="get_protein", protein=entry_name ) if gpcr_info.get('status') == 'success': # Target is a GPCR - get specialized data # Get structures with receptor state structures = tu.tools.GPCRdb_get_structures( operation="get_structures", protein=entry_name ) # Get known ligands (critical for binder projects) ligands = tu.tools.GPCRdb_get_ligands( operation="get_ligands", protein=entry_name ) # Get mutation data mutations = tu.tools.GPCRdb_get_mutations( operation="get_mutations", protein=entry_name ) return { 'is_gpcr': True, 'gpcr_family': gpcr_info['data'].get('family'), 'gpcr_class': gpcr_info['data'].get('receptor_class'), 'structures': structures.get('data', {}).get('structures', []), 'ligands': ligands.get('data', {}).get('ligands', []), 'mutations': mutations.get('data', {}).get('mutations', []), 'ballesteros_numbering': True # GPCRdb provides this } return {'is_gpcr': False}
GPCRdb Report Section (add to Section 2 for GPCR targets):
### 2.x GPCR-Specific Data (GPCRdb) **Receptor Class**: Class A (Rhodopsin-like) **GPCR Family**: Adrenoceptors **Structures by State**: | PDB ID | State | Resolution | Ligand | Year | |--------|-------|------------|--------|------| | 3SN6 | Active | 3.2Å | Agonist (BI-167107) | 2011 | | 2RH1 | Inactive | 2.4Å | Antagonist (carazolol) | 2007 | **Known Ligands**: 45 agonists, 32 antagonists, 8 allosteric modulators **Key Binding Site Residues** (Ballesteros-Weinstein): 3.32, 5.42, 6.48, 7.39
Collision Detection for Literature Search
Before literature search, detect naming collisions:
def detect_collisions(tu, symbol, full_name): """ Detect if gene symbol has naming collisions in literature. Returns negative filter terms if collisions found. """ # Search by symbol in title results = tu.tools.PubMed_search_articles( query=f'"{symbol}"[Title]', limit=20 ) # Check if >20% are off-topic off_topic_terms = [] for paper in results.get('articles', []): title = paper.get('title', '').lower() # Check if title mentions biology/protein/gene context bio_terms = ['protein', 'gene', 'cell', 'expression', 'mutation', 'kinase', 'receptor'] if not any(term in title for term in bio_terms): # Extract potential collision terms # e.g., "JAK" might collide with "Just Another Kinase" jokes # e.g., "WDR7" might collide with other WDR family members in certain contexts pass # Build negative filter collision_filter = "" if off_topic_terms: collision_filter = " NOT " + " NOT ".join(off_topic_terms) return collision_filter
PATH 0: Open Targets Foundation (ALWAYS FIRST)
Objective: Populate baseline data for Sections 5, 8, 9, 10, 11 before specialized queries.
CRITICAL: Open Targets provides the most comprehensive aggregated data. Query ALL these endpoints:
| Endpoint | Section | Data Type |
|---|---|---|
| 8 | Diseases/phenotypes |
| 9 | Druggability assessment |
| 10 | Safety liabilities |
| 6 | PPI network |
| 5 | GO annotations |
| 11 | Literature |
| 8/10 | Mouse KO phenotypes |
| 9 | Chemical probes |
| 9 | Known drugs |
Path 0 Implementation
def path_0_open_targets(tu, ids): """ Open Targets foundation data - fills gaps for sections 5, 6, 8, 9, 10, 11. ALWAYS run this first. """ ensembl_id = ids['ensembl'] if not ensembl_id: return {'status': 'skipped', 'reason': 'No Ensembl ID'} results = {} # 1. Diseases & Phenotypes (Section 8) diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensemblId( ensemblId=ensembl_id ) results['diseases'] = diseases if diseases else {'note': 'No disease associations returned'} # 2. Tractability (Section 9) tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblId( ensemblId=ensembl_id ) results['tractability'] = tractability if tractability else {'note': 'No tractability data returned'} # 3. Safety Profile (Section 10) safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblId( ensemblId=ensembl_id ) results['safety'] = safety if safety else {'note': 'No safety liabilities identified'} # 4. Interactions (Section 6) interactions = tu.tools.OpenTargets_get_target_interactions_by_ensemblId( ensemblId=ensembl_id ) results['interactions'] = interactions if interactions else {'note': 'No interactions returned'} # 5. GO Annotations (Section 5) go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblId( ensemblId=ensembl_id ) results['go_terms'] = go_terms if go_terms else {'note': 'No GO annotations returned'} # 6. Publications (Section 11) publications = tu.tools.OpenTargets_get_publications_by_target_ensemblId( ensemblId=ensembl_id ) results['publications'] = publications if publications else {'note': 'No publications returned'} # 7. Mouse Models (Section 8/10) mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblId( ensemblId=ensembl_id ) results['mouse_models'] = mouse_models if mouse_models else {'note': 'No mouse model data returned'} # 8. Chemical Probes (Section 9) probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblId( ensemblId=ensembl_id ) results['chemical_probes'] = probes if probes else {'note': 'No chemical probes available'} # 9. Associated Drugs (Section 9) drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblId( ensemblId=ensembl_id ) results['drugs'] = drugs if drugs else {'note': 'No approved/trial drugs found'} return results
Negative Results Are Data
CRITICAL: Always document when a query returns empty:
### 9.3 Chemical Probes **Status**: No validated chemical probes available for this target. *Source: OpenTargets_get_chemical_probes_by_target_ensemblId returned empty* **Implication**: Tool compound development would be needed for chemical biology studies.
PATH 2: Structure & Domains (Enhanced)
Objective: Robust structure coverage using 3-step chain.
3-Step Structure Search Chain
Do NOT rely solely on PDB text search. Use this chain:
def path_structure_robust(tu, ids): """ Robust structure search using 3-step chain. """ structures = {'pdb': [], 'alphafold': None, 'domains': [], 'method_notes': []} # STEP 1: UniProt PDB Cross-References (most reliable) if ids['uniprot']: entry = tu.tools.UniProt_get_entry_by_accession(accession=ids['uniprot']) pdb_xrefs = [x for x in entry.get('uniProtKBCrossReferences', []) if x.get('database') == 'PDB'] for xref in pdb_xrefs: pdb_id = xref.get('id') # Get details for each PDB pdb_info = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id) if pdb_info: structures['pdb'].append(pdb_info) structures['method_notes'].append(f"Step 1: {len(pdb_xrefs)} PDB cross-refs from UniProt") # STEP 2: Sequence-based PDB Search (catches missing annotations) if ids['uniprot'] and len(structures['pdb']) < 5: sequence = tu.tools.UniProt_get_sequence_by_accession(accession=ids['uniprot']) if sequence and len(sequence) < 1000: # Reasonable length for search similar = tu.tools.PDB_search_similar_structures( sequence=sequence[:500], # Use first 500 AA if long identity_cutoff=0.7 ) if similar: for hit in similar[:10]: # Top 10 similar if hit['pdb_id'] not in [s.get('pdb_id') for s in structures['pdb']]: structures['pdb'].append(hit) structures['method_notes'].append(f"Step 2: Sequence search (identity ≥70%)") # STEP 3: Domain-based Search (for multi-domain proteins) if ids['uniprot']: domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=ids['uniprot']) structures['domains'] = domains if domains else [] # For large proteins with domains, search by domain sequence windows if len(structures['pdb']) < 3 and domains: for domain in domains[:3]: # Top 3 domains domain_name = domain.get('name', '') # Could search PDB by domain name domain_hits = tu.tools.PDB_search_by_keyword(query=domain_name, limit=5) if domain_hits: structures['method_notes'].append(f"Step 3: Domain '{domain_name}' search") # AlphaFold (always check) alphafold = tu.tools.alphafold_get_prediction(uniprot_accession=ids['uniprot']) structures['alphafold'] = alphafold if alphafold else {'note': 'No AlphaFold prediction'} # IMPORTANT: Document limitations if not structures['pdb']: structures['limitation'] = "No direct PDB hit does NOT mean no structure exists. Check: (1) structures under different UniProt entries, (2) homolog structures, (3) domain-only structures." return structures
Structure Section Output Format
### 4.1 Experimental Structures (PDB) **Total PDB Entries**: 23 structures *(Source: UniProt cross-references)* **Search Method**: 3-step chain (UniProt xrefs → sequence search → domain search) | PDB ID | Resolution | Method | Ligand | Coverage | Year | |--------|------------|--------|--------|----------|------| | 1M17 | 2.6Å | X-ray | Erlotinib | 672-998 | 2002 | | 3POZ | 2.8Å | X-ray | Gefitinib | 696-1022 | 2010 | **Note**: "No direct PDB hit" ≠ "no structure exists". Check homologs and domain structures.
PATH 5: Expression Profile (Enhanced)
GTEx with Versioned ID Fallback
def path_expression(tu, ids): """ Expression data with GTEx versioned ID fallback. """ results = {'gtex': None, 'hpa': None, 'failed_tools': []} # GTEx with fallback ensembl_id = ids['ensembl'] versioned_id = ids.get('ensembl_versioned') # Try unversioned first gtex_result = tu.tools.GTEx_get_median_gene_expression( gencode_id=ensembl_id, operation="median" ) # Fallback to versioned if empty if not gtex_result or gtex_result.get('data') == []: if versioned_id: gtex_result = tu.tools.GTEx_get_median_gene_expression( gencode_id=versioned_id, operation="median" ) if gtex_result and gtex_result.get('data'): results['gtex'] = gtex_result results['gtex_note'] = f"Used versioned ID: {versioned_id}" if not results.get('gtex'): results['failed_tools'].append({ 'tool': 'GTEx_get_median_gene_expression', 'tried': [ensembl_id, versioned_id], 'fallback': 'See HPA data below' }) else: results['gtex'] = gtex_result # HPA (always query as backup) hpa_result = tu.tools.HPA_get_rna_expression_by_source(ensembl_id=ensembl_id) results['hpa'] = hpa_result if hpa_result else {'note': 'No HPA RNA data'} return results
Human Protein Atlas - Extended Expression (NEW)
HPA provides comprehensive protein expression data including tissue-level, cell-level, and cell line expression.
def get_hpa_comprehensive_expression(tu, gene_symbol): """ Get comprehensive expression data from Human Protein Atlas. Provides: - Tissue expression (protein and RNA) - Subcellular localization - Cell line expression comparison - Tissue specificity """ # 1. Search for gene to get IDs gene_info = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol) if not gene_info: return {'error': f'Gene {gene_symbol} not found in HPA'} # 2. Get tissue expression with specificity tissue_search = tu.tools.HPA_generic_search( search_query=gene_symbol, columns="g,gs,rnat,rnatsm,scml,scal", # Gene, synonyms, tissue specificity, subcellular format="json" ) # 3. Compare expression in cancer cell lines vs normal tissue cell_lines = ['a549', 'mcf7', 'hela', 'hepg2', 'pc3'] cell_line_expression = {} for cell_line in cell_lines: try: expr = tu.tools.HPA_get_comparative_expression_by_gene_and_cellline( gene_name=gene_symbol, cell_line=cell_line ) cell_line_expression[cell_line] = expr except: continue return { 'gene_info': gene_info, 'tissue_data': tissue_search, 'cell_line_expression': cell_line_expression, 'source': 'Human Protein Atlas' }
HPA Expression Output for Report:
### Tissue Expression Profile (Human Protein Atlas) | Tissue | Protein Level | RNA nTPM | Specificity | |--------|---------------|----------|-------------| | Brain | High | 45.2 | Enriched | | Liver | Medium | 23.1 | Enhanced | | Kidney | Low | 8.4 | Not detected | **Subcellular Localization**: Cytoplasm, Plasma membrane ### Cancer Cell Line Expression | Cell Line | Cancer Type | Expression | vs Normal | |-----------|-------------|------------|-----------| | A549 | Lung | High | Elevated | | MCF7 | Breast | Medium | Similar | | HeLa | Cervical | High | Elevated | *Source: Human Protein Atlas via `HPA_search_genes_by_query`, `HPA_get_comparative_expression_by_gene_and_cellline`*
Why HPA for Target Research:
- Drug target validation - Confirm expression in target tissue
- Safety assessment - Expression in essential organs
- Biomarker potential - Tissue-specific expression
- Cell line selection - Choose appropriate models
PATH 6: Variants & Disease (Enhanced)
6.1 ClinVar SNV vs CNV Separation
### 8.3 Clinical Variants (ClinVar) #### Single Nucleotide Variants (SNVs) | Variant | Clinical Significance | Condition | Review Status | PMID | |---------|----------------------|-----------|---------------|------| | p.L858R | Pathogenic | Lung cancer | 4 stars | 15118125 | | p.T790M | Pathogenic | Drug resistance | 4 stars | 15737014 | **Total Pathogenic SNVs**: 47 #### Copy Number Variants (CNVs) - Reported Separately | Type | Region | Clinical Significance | Frequency | |------|--------|----------------------|-----------| | Amplification | 7p11.2 | Pathogenic | Common in cancer | *Note: CNV data separated as it represents different mutation mechanism*
6.2 DisGeNET Integration (NEW)
DisGeNET provides curated gene-disease associations with evidence scores. Requires:
DISGENET_API_KEY
def get_disgenet_associations(tu, ids): """ Get gene-disease associations from DisGeNET. Complements Open Targets with curated association scores. """ symbol = ids.get('symbol') if not symbol: return {'status': 'skipped', 'reason': 'No gene symbol'} # Get all disease associations for gene gda = tu.tools.DisGeNET_search_gene( operation="search_gene", gene=symbol, limit=50 ) if gda.get('status') != 'success': return {'status': 'error', 'message': 'DisGeNET query failed'} associations = gda.get('data', {}).get('associations', []) # Categorize by evidence strength strong = [] # score >= 0.7 moderate = [] # score 0.4-0.7 weak = [] # score < 0.4 for assoc in associations: score = assoc.get('score', 0) disease_name = assoc.get('disease_name', '') umls_cui = assoc.get('disease_id', '') entry = { 'disease': disease_name, 'umls_cui': umls_cui, 'score': score, 'evidence_index': assoc.get('ei'), 'dsi': assoc.get('dsi'), # Disease Specificity Index 'dpi': assoc.get('dpi') # Disease Pleiotropy Index } if score >= 0.7: strong.append(entry) elif score >= 0.4: moderate.append(entry) else: weak.append(entry) return { 'total_associations': len(associations), 'strong_associations': strong, 'moderate_associations': moderate, 'weak_associations': weak[:10], # Limit weak 'disease_pleiotropy': len(associations) # How many diseases linked }
DisGeNET Report Section (add to Section 8 - Disease Associations):
### 8.x DisGeNET Gene-Disease Associations (NEW) **Total Diseases Associated**: 47 **Disease Pleiotropy Index**: High (gene linked to many disease types) #### Strong Associations (Score ≥0.7) | Disease | UMLS CUI | Score | Evidence Index | |---------|----------|-------|----------------| | Non-small cell lung cancer | C0007131 | 0.85 | 0.92 | | Glioblastoma | C0017636 | 0.78 | 0.88 | #### Moderate Associations (Score 0.4-0.7) | Disease | UMLS CUI | Score | DSI | |---------|----------|-------|-----| | Breast cancer | C0006142 | 0.62 | 0.45 | *Note: DisGeNET score integrates curated databases, GWAS, animal models, and literature*
Evidence Tier Assignment:
- DisGeNET Score ≥0.7 → Consider T2 evidence (multiple validated sources)
- DisGeNET Score 0.4-0.7 → Consider T3 evidence
- DisGeNET Score <0.4 → T4 evidence only
PATH 7: Druggability & Target Validation (ENHANCED)
7.1 Pharos/TCRD - Target Development Level (NEW)
NIH's Illuminating the Druggable Genome (IDG) portal provides TDL classification for all human proteins:
def get_pharos_target_info(tu, ids): """ Get Pharos/TCRD target development level and druggability. TDL Classification: - Tclin: Approved drug targets - Tchem: Targets with small molecule activities (IC50 < 30nM) - Tbio: Targets with biological annotations - Tdark: Understudied proteins """ gene_symbol = ids.get('symbol') uniprot = ids.get('uniprot') # Try by gene symbol first if gene_symbol: result = tu.tools.Pharos_get_target( gene=gene_symbol ) elif uniprot: result = tu.tools.Pharos_get_target( uniprot=uniprot ) else: return {'status': 'error', 'message': 'Need gene symbol or UniProt'} if result.get('status') == 'success' and result.get('data'): target = result['data'] return { 'name': target.get('name'), 'symbol': target.get('sym'), 'tdl': target.get('tdl'), # Tclin/Tchem/Tbio/Tdark 'family': target.get('fam'), # Kinase, GPCR, etc. 'novelty': target.get('novelty'), 'description': target.get('description'), 'publications': target.get('publicationCount'), 'interpretation': interpret_tdl(target.get('tdl')) } return None def interpret_tdl(tdl): """Interpret Target Development Level for druggability.""" interpretations = { 'Tclin': 'Approved drug target - highest confidence for druggability', 'Tchem': 'Small molecule active - good chemical tractability', 'Tbio': 'Biologically characterized - may require novel modalities', 'Tdark': 'Understudied - limited data, high novelty potential' } return interpretations.get(tdl, 'Unknown') def search_disease_targets(tu, disease_name): """Find targets associated with a disease via Pharos.""" result = tu.tools.Pharos_get_disease_targets( disease=disease_name, top=50 ) if result.get('status') == 'success': targets = result['data'].get('targets', []) # Group by TDL for prioritization by_tdl = {'Tclin': [], 'Tchem': [], 'Tbio': [], 'Tdark': []} for t in targets: tdl = t.get('tdl', 'Unknown') if tdl in by_tdl: by_tdl[tdl].append(t) return by_tdl return None
Pharos Report Section (add to Section 9 - Druggability):
### 9.x Pharos/TCRD Target Classification (NEW) **Target Development Level**: Tchem **Protein Family**: Kinase **Novelty Score**: 0.35 (moderately studied) **Publication Count**: 12,456 **TDL Interpretation**: Target has validated small molecule activities with IC50 < 30nM. Good chemical starting points exist. **Disease Targets Analysis** (for disease-centric queries): | TDL | Count | Examples | |-----|-------|----------| | Tclin | 12 | EGFR, ALK, RET | | Tchem | 45 | KRAS, SHP2, CDK4 | | Tbio | 78 | Novel kinases | | Tdark | 23 | Understudied | *Source: Pharos/TCRD via `Pharos_get_target`*
7.2 DepMap - Target Essentiality Validation (NEW)
CRISPR knockout data from cancer cell lines to validate target essentiality:
def assess_target_essentiality(tu, ids): """ Is this target essential for cancer cell survival? Negative effect scores = gene is essential (cells die upon KO) """ gene_symbol = ids.get('symbol') if not gene_symbol: return {'status': 'error', 'message': 'Need gene symbol'} deps = tu.tools.DepMap_get_gene_dependencies( gene_symbol=gene_symbol ) if deps.get('status') == 'success': return { 'gene': gene_symbol, 'data': deps.get('data', {}), 'interpretation': 'Negative scores indicate gene is essential for cell survival', 'note': 'Score < -0.5 is strongly essential, < -1.0 is extremely essential' } return None def get_cancer_type_essentiality(tu, gene_symbol, cancer_type): """Check if gene is essential in specific cancer type.""" # Get cell lines for cancer type cell_lines = tu.tools.DepMap_get_cell_lines( cancer_type=cancer_type, page_size=20 ) return { 'gene': gene_symbol, 'cancer_type': cancer_type, 'cell_lines': cell_lines.get('data', {}).get('cell_lines', []), 'note': 'Query individual cell lines for dependency scores via DepMap portal' }
DepMap Report Section (add to Section 9 - Druggability):
### 9.x Target Essentiality (DepMap) (NEW) **Gene Essentiality Assessment**: | Context | Effect Score | Interpretation | |---------|--------------|----------------| | Pan-cancer | -0.42 | Moderately essential | | Lung cancer | -0.78 | Strongly essential | | Breast cancer | -0.21 | Weakly essential | **Selectivity**: Differential essentiality suggests cancer-type selective target **Cell Lines Tested**: 1,054 cancer cell lines from DepMap *Interpretation*: Score < -0.5 indicates strong dependency. This target is more essential in lung cancer than other cancer types - suggesting lung-selective targeting may be feasible. *Source: DepMap via `DepMap_get_gene_dependencies`*
7.3 InterProScan - Novel Domain Prediction (NEW)
For uncharacterized proteins, run InterProScan to predict domains and function:
def predict_protein_domains(tu, sequence, title="Query protein"): """ Run InterProScan for de novo domain prediction. Use when: - Protein has no InterPro annotations - Novel/uncharacterized protein - Custom sequence analysis """ result = tu.tools.InterProScan_scan_sequence( sequence=sequence, title=title, go_terms=True, pathways=True ) if result.get('status') == 'success': data = result.get('data', {}) # Job may still be running if data.get('job_status') == 'RUNNING': return { 'job_id': data.get('job_id'), 'status': 'running', 'note': 'Use InterProScan_get_job_results to retrieve when ready' } # Parse completed results return { 'domains': data.get('domains', []), 'domain_count': data.get('domain_count', 0), 'go_annotations': data.get('go_annotations', []), 'pathways': data.get('pathways', []), 'sequence_length': data.get('sequence_length') } return None def check_interproscan_job(tu, job_id): """Check status and get results for InterProScan job.""" status = tu.tools.InterProScan_get_job_status(job_id=job_id) if status.get('data', {}).get('is_finished'): results = tu.tools.InterProScan_get_job_results(job_id=job_id) return results.get('data', {}) return status.get('data', {})
When to use InterProScan:
- Novel/uncharacterized proteins (Tdark in Pharos)
- Custom sequences (e.g., protein variants)
- Proteins with outdated/sparse InterPro annotations
- Validating domain predictions
InterProScan Report Section (for novel proteins):
### Domain Prediction (InterProScan) (NEW) *Used for uncharacterized protein analysis* **Predicted Domains**: | Domain | Database | Start-End | E-value | InterPro Entry | |--------|----------|-----------|---------|----------------| | Protein kinase domain | Pfam | 45-305 | 1.2e-89 | IPR000719 | | SH2 domain | SMART | 320-410 | 3.4e-45 | IPR000980 | **Predicted GO Terms**: - GO:0004672 protein kinase activity - GO:0005524 ATP binding **Predicted Pathways**: - Reactome: Signal Transduction *Source: InterProScan via `InterProScan_scan_sequence`*
7.4 BindingDB - Known Ligands & Binding Data (NEW)
BindingDB provides experimental binding affinity data (Ki, IC50, Kd) for target-ligand pairs:
def get_bindingdb_ligands(tu, uniprot_id, affinity_cutoff=10000): """ Get ligands with measured binding affinities from BindingDB. Critical for: - Identifying chemical starting points - Understanding existing chemical matter - Assessing tractability with small molecules Args: uniprot_id: UniProt accession (e.g., P00533 for EGFR) affinity_cutoff: Maximum affinity in nM (lower = more potent) """ # Get ligands by UniProt result = tu.tools.BindingDB_get_ligands_by_uniprot( uniprot=uniprot_id, affinity_cutoff=affinity_cutoff ) if result: ligands = [] for entry in result: ligands.append({ 'smiles': entry.get('smile'), 'affinity_type': entry.get('affinity_type'), # Ki, IC50, Kd 'affinity_nM': entry.get('affinity'), 'monomer_id': entry.get('monomerid'), 'pmid': entry.get('pmid') }) # Sort by affinity (most potent first) ligands.sort(key=lambda x: float(x['affinity_nM']) if x['affinity_nM'] else float('inf')) return { 'total_ligands': len(ligands), 'ligands': ligands[:20], # Top 20 most potent 'best_affinity': ligands[0]['affinity_nM'] if ligands else None } return {'total_ligands': 0, 'ligands': [], 'note': 'No ligands found in BindingDB'} def get_ligands_by_structure(tu, pdb_id, affinity_cutoff=10000): """Get ligands for a protein by PDB structure ID.""" result = tu.tools.BindingDB_get_ligands_by_pdb( pdb_ids=pdb_id, affinity_cutoff=affinity_cutoff, sequence_identity=100 ) return result def find_compound_targets(tu, smiles, similarity_cutoff=0.85): """Find other targets for a compound (polypharmacology).""" result = tu.tools.BindingDB_get_targets_by_compound( smiles=smiles, similarity_cutoff=similarity_cutoff ) return result
BindingDB Report Section (add to Section 9 - Druggability):
### Known Ligands (BindingDB) (NEW) **Total Ligands with Binding Data**: 156 **Best Reported Affinity**: 0.3 nM (Ki) #### Most Potent Ligands | SMILES | Affinity Type | Value (nM) | Source PMID | |--------|---------------|------------|-------------| | CC(=O)Nc1ccc(cc1)c2... | Ki | 0.3 | 15737014 | | CN(C)C/C=C/C(=O)Nc1... | IC50 | 0.8 | 15896103 | | COc1cc2ncnc(Nc3ccc... | Kd | 2.1 | 16460808 | **Chemical Tractability Assessment**: - ✅ **Tchem-level target**: Multiple ligands with <30nM affinity - ✅ **Diverse chemotypes**: Multiple scaffolds identified - ✅ **Published literature**: Ligands have PMID references *Source: BindingDB via `BindingDB_get_ligands_by_uniprot`*
Affinity Interpretation for Druggability:
| Affinity Range | Interpretation | Drug Development Potential |
|---|---|---|
| <1 nM | Ultra-potent | Clinical compound likely |
| 1-10 nM | Highly potent | Drug-like |
| 10-100 nM | Potent | Good starting point |
| 100-1000 nM | Moderate | Needs optimization |
| >1000 nM | Weak | Early hit only |
7.5 PubChem BioAssay - Screening Data (NEW)
PubChem BioAssay provides HTS screening data and dose-response curves:
def get_pubchem_assays_for_target(tu, gene_symbol): """ Get bioassays targeting a gene from PubChem. Provides: - HTS screening results - Dose-response data (IC50/EC50) - Active compound counts """ # Search assays by target gene assays = tu.tools.PubChem_search_assays_by_target_gene( gene_symbol=gene_symbol ) assay_info = [] if assays.get('data', {}).get('aids'): for aid in assays['data']['aids'][:10]: # Top 10 assays # Get assay details summary = tu.tools.PubChem_get_assay_summary(aid=aid) targets = tu.tools.PubChem_get_assay_targets(aid=aid) assay_info.append({ 'aid': aid, 'summary': summary.get('data', {}), 'targets': targets.get('data', {}) }) return { 'total_assays': len(assays.get('data', {}).get('aids', [])), 'assay_details': assay_info } def get_active_compounds_from_assay(tu, aid): """Get active compounds from a specific bioassay.""" actives = tu.tools.PubChem_get_assay_active_compounds(aid=aid) return { 'aid': aid, 'active_cids': actives.get('data', {}).get('cids', []), 'count': len(actives.get('data', {}).get('cids', [])) }
PubChem BioAssay Report Section:
### PubChem BioAssay Data (NEW) **Assays Targeting This Gene**: 45 | AID | Assay Type | Active Compounds | Target Info | |-----|------------|------------------|-------------| | 1053104 | Dose-response | 12 | EGFR kinase | | 504526 | HTS | 234 | EGFR binding | | 651564 | Confirmatory | 8 | EGFR cellular | **Total Active Compounds Across Assays**: ~500 *Source: PubChem via `PubChem_search_assays_by_target_gene`, `PubChem_get_assay_active_compounds`*
PATH 8: Literature & Research (Collision-Aware)
Collision-Aware Query Strategy
def path_literature_collision_aware(tu, ids): """ Literature search with collision detection and filtering. """ symbol = ids['symbol'] full_name = ids.get('full_name', '') uniprot = ids['uniprot'] synonyms = ids.get('synonyms', []) # Step 1: Detect collisions collision_filter = detect_collisions(tu, symbol, full_name) # Step 2: Build high-precision seed queries seed_queries = [ f'"{symbol}"[Title] AND (protein OR gene OR expression)', # Symbol in title f'"{full_name}"[Title]' if full_name else None, # Full name in title f'"UniProt:{uniprot}"' if uniprot else None, # UniProt accession ] seed_queries = [q for q in seed_queries if q] # Add key synonyms for syn in synonyms[:3]: seed_queries.append(f'"{syn}"[Title]') # Step 3: Execute seed queries and collect PMIDs seed_pmids = set() for query in seed_queries: if collision_filter: query = f"({query}){collision_filter}" results = tu.tools.PubMed_search_articles(query=query, limit=30) for article in results.get('articles', []): seed_pmids.add(article.get('pmid')) # Step 4: Expand via citation network (for sparse targets) if len(seed_pmids) < 30: expanded_pmids = set() for pmid in list(seed_pmids)[:10]: # Top 10 seeds # Get related articles related = tu.tools.PubMed_get_related(pmid=pmid, limit=20) for r in related.get('articles', []): expanded_pmids.add(r.get('pmid')) # Get citing articles citing = tu.tools.EuropePMC_get_citations(pmid=pmid, limit=20) for c in citing.get('citations', []): expanded_pmids.add(c.get('pmid')) seed_pmids.update(expanded_pmids) # Step 5: Classify papers by evidence tier papers_by_tier = {'T1': [], 'T2': [], 'T3': [], 'T4': []} # ... classification logic based on title/abstract keywords return { 'total_papers': len(seed_pmids), 'collision_filter_applied': collision_filter if collision_filter else 'None needed', 'seed_queries': seed_queries, 'papers_by_tier': papers_by_tier }
Retry Logic & Fallback Chains
Retry Policy
For each critical tool, implement retry with exponential backoff:
def call_with_retry(tu, tool_name, params, max_retries=3): """ Call tool with retry logic. """ for attempt in range(max_retries): try: result = getattr(tu.tools, tool_name)(**params) if result and not result.get('error'): return result except Exception as e: if attempt < max_retries - 1: time.sleep(2 ** attempt) # Exponential backoff else: return {'error': str(e), 'tool': tool_name, 'attempts': max_retries} return None
Fallback Chains (CRITICAL)
| Primary Tool | Fallback 1 | Fallback 2 | Failure Action |
|---|---|---|---|
| | | Note in report |
| | | Note in report |
| | | Note in report |
| | Note as unavailable | Document in report |
| | - | Note in report |
| | | Note in report |
Failure Surfacing Rule
NEVER silently skip failed tools. Always document:
### 7.1 Tissue Expression **GTEx Data**: Unavailable (API timeout after 3 attempts) **Fallback Data (HPA)**: | Tissue | Expression Level | Specificity | |--------|-----------------|-------------| | Liver | High | Enhanced | | Kidney | Medium | - | *Note: For complete GTEx data, query directly at gtexportal.org*
Per-Section Data Minimums & Completeness Audit
Minimum Data Requirements (Enforced)
| Section | Minimum Data | If Not Met |
|---|---|---|
| 6. PPIs | ≥20 interactors | Document which tools failed + why |
| 7. Expression | Top 10 tissues with TPM + HPA RNA summary | Note "limited data" with specific gaps |
| 8. Disease | Top 10 OT diseases + gnomAD constraints + ClinVar summary | Separate SNV/CNV; note if constraint unavailable |
| 9. Druggability | OT tractability + probes + drugs + DGIdb + GtoPdb fallback | "No drugs/probes" is valid data |
| 11. Literature | Total count + 5-year trend + 3-5 key papers with evidence tiers | Note if sparse (<50 papers) |
Post-Run Completeness Audit
Before finalizing the report, run this checklist:
## Completeness Audit (REQUIRED) ### Data Minimums Check - [ ] PPIs: ≥20 interactors OR explanation why fewer - [ ] Expression: Top 10 tissues with values OR explicit "unavailable" - [ ] Diseases: Top 10 associations with scores OR "no associations" - [ ] Constraints: All 4 scores (pLI, LOEUF, missense Z, pRec) OR "unavailable" - [ ] Druggability: All modalities assessed; probes + drugs listed OR "none" ### Negative Results Documented - [ ] Empty tool results noted explicitly (not left blank) - [ ] Failed tools with fallbacks documented - [ ] "No data" sections have implications noted ### Evidence Quality - [ ] T1-T4 grades in Executive Summary disease claims - [ ] T1-T4 grades in Disease Associations table - [ ] Key papers table has evidence tiers - [ ] Per-section evidence summaries included ### Source Attribution - [ ] Every data point has source tool/database cited - [ ] Section-end source summaries present
Data Gap Table (Required if minimums not met)
## 15. Data Gaps & Limitations | Section | Expected Data | Actual | Reason | Alternative Source | |---------|---------------|--------|--------|-------------------| | 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed | | 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data | | 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev | **Recommendations for Data Gaps**: 1. For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies 2. For Expression: Query GEO directly for tissue-specific datasets
Report Template (Initial File)
File:
[TARGET]_target_report.md
# Target Intelligence Report: [TARGET NAME] **Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress --- ## 1. Executive Summary [Researching...] <!-- REQUIRED: 2-3 sentences, disease claims must have T1-T4 grades --> ## 2. Target Identifiers [Researching...] <!-- REQUIRED: UniProt, Ensembl (versioned), Entrez, ChEMBL, HGNC, Symbol --> ## 3. Basic Information ### 3.1 Protein Description [Researching...] ### 3.2 Protein Function [Researching...] ### 3.3 Subcellular Localization [Researching...] ## 4. Structural Biology ### 4.1 Experimental Structures (PDB) [Researching...] <!-- METHOD: 3-step chain (UniProt xrefs → sequence search → domain search) --> ### 4.2 AlphaFold Prediction [Researching...] ### 4.3 Domain Architecture [Researching...] ### 4.4 Key Structural Features [Researching...] ## 5. Function & Pathways ### 5.1 Gene Ontology Annotations [Researching...] <!-- REQUIRED: Evidence codes mapped to T1-T4 --> ### 5.2 Pathway Involvement [Researching...] ## 6. Protein-Protein Interactions [Researching...] <!-- MINIMUM: ≥20 interactors OR explanation --> ## 7. Expression Profile ### 7.1 Tissue Expression (GTEx/HPA) [Researching...] <!-- NOTE: Use versioned Ensembl ID for GTEx if needed --> ### 7.2 Tissue Specificity [Researching...] <!-- MINIMUM: Top 10 tissues with TPM values --> ## 8. Genetic Variation & Disease ### 8.1 Constraint Scores [Researching...] <!-- REQUIRED: pLI, LOEUF, missense Z, pRec with interpretations --> ### 8.2 Disease Associations [Researching...] <!-- REQUIRED: Top 10 with OT scores; T1-T4 evidence grades --> ### 8.3 Clinical Variants (ClinVar) [Researching...] <!-- REQUIRED: Separate SNV and CNV tables --> ### 8.4 Mouse Model Phenotypes [Researching...] ## 9. Druggability & Pharmacology ### 9.1 Tractability Assessment [Researching...] <!-- REQUIRED: All modalities (SM, Ab, PROTAC, other) --> ### 9.2 Known Drugs [Researching...] ### 9.3 Chemical Probes [Researching...] <!-- NOTE: "No probes" is valid data - document explicitly --> ### 9.4 Clinical Pipeline [Researching...] ### 9.5 ChEMBL Bioactivity [Researching...] ## 10. Safety Profile ### 10.1 Safety Liabilities [Researching...] ### 10.2 Expression-Based Toxicity Risk [Researching...] ### 10.3 Mouse KO Phenotypes [Researching...] ## 11. Literature & Research Landscape ### 11.1 Publication Metrics [Researching...] <!-- REQUIRED: Total, 5y, 1y, drug-related, clinical --> ### 11.2 Research Trend [Researching...] ### 11.3 Key Publications [Researching...] <!-- REQUIRED: Table with PMID, title, year, evidence tier --> ### 11.4 Evidence Summary by Theme [Researching...] <!-- REQUIRED: T1-T4 breakdown per research theme --> ## 12. Competitive Landscape [Researching...] ## 13. Summary & Recommendations ### 13.1 Target Validation Scorecard [Researching...] <!-- REQUIRED: 6 criteria, 1-5 scores, evidence quality noted --> ### 13.2 Strengths [Researching...] ### 13.3 Challenges & Risks [Researching...] ### 13.4 Recommendations [Researching...] <!-- REQUIRED: ≥3 prioritized (HIGH/MEDIUM/LOW) --> ## 14. Data Sources & Methodology [Will be populated as research progresses...] ## 15. Data Gaps & Limitations [To be populated post-audit...]
Quick Reference: Tool Parameters
| Tool | Parameter | Notes |
|---|---|---|
| | NOT |
| | NOT |
| , | Try versioned ID if empty |
| | camelCase, not |
| , | List format for IDs |
| | UniProt accession |
When NOT to Use This Skill
- Simple protein lookup → Use
directlyUniProt_get_entry_by_accession - Drug information only → Use drug-focused tools
- Disease-centric query → Use disease-intelligence-gatherer skill
- Sequence retrieval → Use sequence-retrieval skill
- Structure download → Use protein-structure-retrieval skill
Use this skill for comprehensive, multi-angle target analysis with guaranteed data completeness.