Claude-scientific-skills database-lookup

Search 78 public scientific, biomedical, materials science, and economic databases via REST APIs. Covers physics/astronomy (NASA, NIST, SDSS, SIMBAD), earth/environment (USGS, NOAA, EPA), chemistry/drugs (PubChem, ChEMBL, DrugBank, FDA, KEGG, ZINC, BindingDB), materials (Materials Project, COD), biology/genomics (Reactome, UniProt, STRING, Ensembl, NCBI Gene, GEO, GTEx, PDB, AlphaFold, InterPro, BioGRID, Gene Ontology, dbSNP, gnomAD, ENCODE, Human Protein Atlas, Human Cell Atlas), disease/clinical (COSMIC, Open Targets, ClinicalTrials.gov, OMIM, ClinVar, GDC/TCGA, cBioPortal, DisGeNET, GWAS Catalog), regulatory (FDA, USPTO, SEC EDGAR), economics/finance (FRED, World Bank, US Treasury), demographics (US Census, Eurostat, WHO). Use when looking up compounds, genes, proteins, pathways, variants, clinical trials, patents, economic indicators, or any public database API query.

install
source · Clone the upstream repo
git clone https://github.com/K-Dense-AI/scientific-agent-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/K-Dense-AI/scientific-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/database-lookup" ~/.claude/skills/k-dense-ai-claude-scientific-skills-database-lookup && rm -rf "$T"
manifest: scientific-skills/database-lookup/SKILL.md
source content

Database Lookup

You have access to 78 public databases through their REST APIs. Your job is to figure out which database(s) are relevant to the user's question, query them, and return the raw JSON results along with which databases you used.

Core Workflow

  1. Understand the query — What is the user looking for? A compound? A gene? A pathway? A patent? Expression data? An economic indicator? This determines which database(s) to hit.

  2. Select database(s) — Use the database selection guide below. When in doubt, search multiple databases — it's better to cast a wide net than to miss relevant data.

  3. Read the reference file — Each database has a reference file in `references/` with endpoint details, query formats, and example calls. Read the relevant file(s) before making API calls.

  4. Make the API call(s) — See the Making API Calls section below for which HTTP fetch tool to use on your platform.

  5. Return results — Always return:

    • The raw JSON response from each database
    • A list of databases queried with the specific endpoints used
    • If a query returned no results, say so explicitly rather than omitting it

Database Selection Guide

Match the user's intent to the right database(s). Many queries benefit from hitting multiple databases.

Physics & Astronomy

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Near-Earth objects, asteroids | NASA (NeoWs) | |
| Mars rover images | NASA (Mars Rover Photos) | |
| Exoplanets, orbital parameters | NASA Exoplanet Archive | |
| Astronomical objects by name/coordinates | SIMBAD | SDSS |
| Galaxy/star spectra, photometry | SDSS | SIMBAD |
| Physical constants | NIST | |
| Atomic spectra, spectral lines | NIST (ASD) | |

Earth & Environmental Sciences

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Earthquakes, seismic events | USGS Earthquakes | |
| Water data, streamflow, groundwater | USGS Water Services | |
| Weather (current, forecast, historical) | OpenWeatherMap | NOAA |
| Climate data, historical weather stations | NOAA (CDO) | |
| Air quality, toxic releases | EPA (Envirofacts) | |

Chemistry & Drugs

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Chemical compounds, molecules | PubChem | ChEMBL |
| Molecular properties (weight, formula, SMILES) | PubChem | |
| Drug synonyms, CAS numbers | PubChem (synonyms) | DrugBank |
| Bioactivity data, IC50, binding assays | ChEMBL | BindingDB, PubChem |
| Drug binding affinities (Ki, IC50, Kd) | ChEMBL, BindingDB | PubChem |
| Drug-target interactions | ChEMBL, DrugBank | BindingDB, Open Targets |
| Ligands for a protein target (by UniProt) | BindingDB | ChEMBL |
| Target identification from compound structure | BindingDB (SMILES similarity) | ChEMBL |
| Drug labels, adverse events, recalls | FDA (OpenFDA) | DailyMed |
| Drug labels (structured product labels) | DailyMed | FDA (OpenFDA) |
| Drug pharmacology, indications | DrugBank | FDA |
| Chemical cross-referencing | PubChem (xrefs) | ChEMBL |
| Commercially available compounds for screening | ZINC | PubChem |
| Similarity/substructure search (purchasable) | ZINC | PubChem, ChEMBL |
| Drug-like compound libraries, building blocks | ZINC | |
| FDA-approved drug structures | ZINC (fda subset) | PubChem, FDA |
| Compound purchasability, vendor catalogs | ZINC | |

Materials Science & Crystallography

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Materials by formula or elements | Materials Project | COD |
| Band gap, electronic structure | Materials Project | |
| Crystal structures, CIF files | COD | Materials Project |
| Elastic/mechanical properties | Materials Project | |
| Formation energy, thermodynamics | Materials Project | |
| Cell parameters, space groups | COD | Materials Project |

Biology & Genomics

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Biological pathways | Reactome, KEGG | |
| What pathways a gene/protein is in | Reactome (mapping), KEGG | |
| Enzyme kinetics, catalytic activity | BRENDA | KEGG |
| Metabolomics studies, metabolite profiles | Metabolomics Workbench | PubChem |
| m/z or exact mass lookup | Metabolomics Workbench (moverz/exactmass) | PubChem |
| Protein sequence, function, annotation | UniProt | Ensembl |
| Protein-protein interactions | STRING | BioGRID |
| Gene information, genomic location | NCBI Gene | Ensembl |
| Genome sequences, variants, transcripts | Ensembl | NCBI Gene |
| Gene expression datasets | GEO (NCBI E-utilities) | |
| Gene expression across tissues | GTEx | Human Protein Atlas |
| Gene expression signatures (CMap/L1000) | LINCS L1000 | GEO |
| Gene set enrichment vs GEO | RummaGEO | GEO |
| Protein sequences (NCBI) | NCBI Protein | UniProt |
| Taxonomic classification | NCBI Taxonomy | |
| SNP/variant data (dbSNP) | dbSNP | ClinVar, gnomAD |
| Population variant frequencies | gnomAD | dbSNP |
| Sequencing run metadata | SRA | ENA, GEO |
| Nucleotide sequences (European archive) | ENA | SRA, NCBI Gene |
| Genome assemblies, raw reads (European) | ENA | SRA, Ensembl |
| Cross-references from sequence accessions | ENA (xref) | NCBI Gene, UniProt |
| Genome annotations, tracks | UCSC Genome Browser | Ensembl |
| 3D protein structures (experimental) | PDB (RCSB) | EMDB |
| 3D protein structures (predicted) | AlphaFold DB | PDB |
| EM maps, cryo-EM structures | EMDB | PDB |
| Protein families, domains | InterPro | UniProt |
| Chemical entities (biological) | ChEBI | PubChem |
| Protein/genetic interactions | BioGRID | STRING |
| Gene function annotations (GO terms) | QuickGO | Gene Ontology |
| Regulatory elements, ChIP-seq, ATAC-seq | ENCODE | |
| TF binding profiles/motifs | JASPAR | ENCODE |
| Protein expression across tissues | Human Protein Atlas | UniProt |
| Single-cell atlas projects | Human Cell Atlas | |
| Proteomics datasets | PRIDE | |
| Mouse gene data | MouseMine | NCBI Gene |
| Plasmid repository | Addgene | |

Organism/species matters. Most biology databases cover multiple organisms. If the user's query is about a specific organism, pass it explicitly — don't assume human. Common patterns: Ensembl uses `{species}` in the URL path (e.g. `homo_sapiens`), STRING/BioGRID/QuickGO use NCBI taxon IDs (`species=9606` for human, `10090` for mouse), UniProt uses `organism_id:9606` in search queries, KEGG uses organism codes (`hsa`, `mmu`). GTEx and Human Protein Atlas are human-only. Check the reference file for each database's specific parameter.
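
The organism conventions above can be sketched as a small lookup. This is only an illustration: `ORGANISMS` and `organism_param` are hypothetical names, `mus_musculus` is assumed from Ensembl's naming pattern, and the exact query parameter name for each database still has to come from its reference file.

```python
# Hypothetical lookup table; the values follow the conventions described above.
ORGANISMS = {
    "human": {"ensembl": "homo_sapiens", "taxon": 9606, "kegg": "hsa"},
    "mouse": {"ensembl": "mus_musculus", "taxon": 10090, "kegg": "mmu"},
}

HUMAN_ONLY = {"gtex", "human protein atlas"}
TAXON_ID_DATABASES = {"string", "biogrid", "quickgo"}


def organism_param(database: str, organism: str) -> str:
    """Return the organism token one database expects (empty if none is needed)."""
    db = database.lower()
    if db in HUMAN_ONLY:
        if organism != "human":
            raise ValueError(f"{database} only covers human data")
        return ""  # human-only resource: no organism parameter to pass
    entry = ORGANISMS[organism]
    if db == "ensembl":
        return entry["ensembl"]  # goes into the URL path
    if db in TAXON_ID_DATABASES:
        return str(entry["taxon"])  # NCBI taxon ID; parameter name varies by API
    if db == "uniprot":
        return f"organism_id:{entry['taxon']}"  # fragment of the search query
    if db == "kegg":
        return entry["kegg"]
    raise KeyError(f"no organism convention recorded for {database}")
```

Raising on an unknown database, rather than defaulting to human, mirrors the advice not to assume a species.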

Disease & Clinical

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Somatic mutations in cancer | COSMIC | Open Targets, cBioPortal |
| Cancer genomics (TCGA) | GDC (TCGA) | COSMIC, cBioPortal |
| Cancer study mutations, CNA, expression | cBioPortal | GDC (TCGA), COSMIC |
| Tumor clinical data (survival, staging) | cBioPortal | GDC (TCGA) |
| Drug-target-disease associations | Open Targets | ChEMBL |
| Gene-disease associations | DisGeNET | Open Targets, Monarch |
| Mendelian disease-gene relationships | OMIM | NCBI Gene |
| Variant clinical significance | ClinVar (NCBI) | OMIM |
| GWAS SNP-trait associations | GWAS Catalog | |
| Disease-phenotype-gene links | Monarch Initiative | HPO |
| Phenotype ontology, HPO terms | HPO | Monarch |
| Pharmacogenomics, drug-gene interactions | ClinPGx (PharmGKB) | DrugBank |
| Clinical trials for a drug/disease | ClinicalTrials.gov | FDA |
| Disease-related expression data | GEO | Open Targets |

Patents & Regulatory

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Patents by keyword or technology | USPTO (PatentsView) | |
| Patents by inventor or assignee | USPTO (PatentsView) | |
| Patent prosecution status | USPTO (PEDS) | |
| Trademark lookup | USPTO (TSDR) | |
| SEC company filings, 10-K, 10-Q | SEC EDGAR | |

Economics & Finance

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| US economic time series (GDP, CPI, rates) | FRED | BEA |
| Employment, wages, labor statistics | BLS | FRED |
| GDP, national accounts | BEA | FRED, World Bank |
| International development indicators | World Bank | FRED |
| Interest rates, money supply | Federal Reserve | FRED |
| Euro exchange rates, ECB monetary stats | ECB | |
| US debt, yield curves, fiscal data | US Treasury | FRED |
| Stock prices, forex, crypto | Alpha Vantage | |
| Statistical data across many topics | Data Commons | |

Social Sciences & Demographics

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| US population, housing, income data | US Census | Data Commons |
| EU statistics (economy, trade, health) | Eurostat | World Bank |
| Global health indicators (mortality, disease) | WHO GHO | World Bank |

Cross-domain queries

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Everything about a compound | PubChem + ChEMBL + DrugBank | BindingDB, ZINC, Reactome, FDA |
| Everything about a gene | NCBI Gene + UniProt + Ensembl | Reactome, STRING, COSMIC, cBioPortal, ENA |
| Everything about a variant | dbSNP + ClinVar + gnomAD | GWAS Catalog, COSMIC, cBioPortal |
| Drug target pathways | ChEMBL + Reactome | Open Targets, GEO |
| Prior art for a chemical invention | USPTO + PubChem | ChEMBL |
| Everything about a material | Materials Project + COD | |
| US economic overview | FRED + BLS + BEA | Federal Reserve |

When the user's query spans multiple domains (e.g. "what do we know about aspirin" or "find everything about BRCA1"), query all relevant databases in parallel.
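
One way to picture the parallel fan-out is a thread pool that runs one fetcher per database and keeps every outcome, including failures, so none is silently dropped. This is a minimal sketch: `query_in_parallel` is a hypothetical helper, and each fetcher is assumed to be a zero-argument callable wrapping whatever HTTP tool the platform provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def query_in_parallel(fetchers: Dict[str, Callable[[], object]]) -> Dict[str, dict]:
    """Run one zero-argument fetcher per database and collect every outcome."""

    def run(fetch: Callable[[], object]) -> dict:
        try:
            return {"ok": True, "data": fetch()}
        except Exception as exc:  # keep the failure so it can be reported
            return {"ok": False, "error": str(exc)}

    # One worker per database; max() guards against an empty mapping.
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(run, fetch) for name, fetch in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}
```

Rate-limited APIs (the NCBI family, for instance) would still need to be serialized within their own fetcher rather than split across workers.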

Common Identifier Formats

Different databases use different identifier systems. If a query fails, the identifier format may be wrong. Here's a quick reference:

| Identifier | Format | Example | Used by |
| --- | --- | --- | --- |
| UniProt accession | `P#####` or `Q#####` | `P04637` (TP53) | UniProt, STRING, AlphaFold, Reactome mapping |
| Ensembl gene ID | `ENSG###########` | `ENSG00000141510` | Ensembl, Open Targets, GTEx |
| NCBI Gene ID | Integer | `7157` (TP53) | NCBI Gene, GEO, DisGeNET, HPO |
| HGNC ID | `HGNC:#####` | `HGNC:11998` | Monarch |
| PubChem CID | Integer | `2244` (aspirin) | PubChem |
| ZINC ID | `ZINC` + 12 digits | `ZINC000000000053` (aspirin) | ZINC |
| ENA Project | `PRJEB` + digits | `PRJEB40665` | ENA |
| ENA Run | `ERR` + digits | `ERR1234567` | ENA |
| ENA Experiment | `ERX` + digits | `ERX1234567` | ENA |
| ENA Sample | `ERS` + digits | `ERS1234567` | ENA |
| ChEMBL ID | `CHEMBL####` | `CHEMBL25` (aspirin) | ChEMBL |
| Reactome stable ID | `R-HSA-######` | `R-HSA-109581` | Reactome |
| HP term | `HP:#######` | `HP:0001250` (seizure) | HPO (URL-encode colon as `%3A`) |
| MONDO disease | `MONDO:#######` | `MONDO:0007947` | Monarch |
| GO term | `GO:#######` | `GO:0008150` | QuickGO, Gene Ontology |
| dbSNP rsID | `rs########` | `rs334` | dbSNP, GWAS Catalog, gnomAD |
| GENCODE ID | `ENSG###.##` (versioned) | `ENSG00000139618.17` | GTEx (requires version suffix) |
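
A quick format check before calling an API can catch a mismatched identifier early. The sketch below turns the formats listed above into regular expressions. `ID_PATTERNS` and `classify_identifier` are hypothetical names, and the patterns are deliberately loose approximations (real UniProt accessions, for example, are not limited to a `P` or `Q` prefix).

```python
import re
from typing import Optional

# Order matters: the versioned GENCODE form must be tried before the plain Ensembl one.
ID_PATTERNS = [
    ("GENCODE ID", re.compile(r"ENSG\d{11}\.\d+")),
    ("Ensembl gene ID", re.compile(r"ENSG\d{11}")),
    ("UniProt accession", re.compile(r"[PQ][0-9][A-Z0-9]{3}[0-9]")),
    ("HGNC ID", re.compile(r"HGNC:\d+")),
    ("ZINC ID", re.compile(r"ZINC\d+")),
    ("ENA Project", re.compile(r"PRJEB\d+")),
    ("ENA Run", re.compile(r"ERR\d+")),
    ("ENA Experiment", re.compile(r"ERX\d+")),
    ("ENA Sample", re.compile(r"ERS\d+")),
    ("ChEMBL ID", re.compile(r"CHEMBL\d+")),
    ("Reactome stable ID", re.compile(r"R-HSA-\d+")),
    ("HP term", re.compile(r"HP:\d{7}")),
    ("MONDO disease", re.compile(r"MONDO:\d{7}")),
    ("GO term", re.compile(r"GO:\d{7}")),
    ("dbSNP rsID", re.compile(r"rs\d+")),
    # A bare integer is ambiguous between the two integer-keyed databases.
    ("NCBI Gene ID or PubChem CID", re.compile(r"\d+")),
]


def classify_identifier(value: str) -> Optional[str]:
    """Return the first identifier type whose pattern matches the whole string."""
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(value):
            return label
    return None  # probably a name or symbol that needs resolving first
```

A `None` result for something like `TP53` is the cue to treat the input as a symbol or name rather than an accession.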

Identifier Resolution

When a database doesn't recognize an identifier, convert it using these workflows:

Genes: Symbol (e.g. "TP53") → look up in NCBI Gene (esearch by symbol) → get NCBI Gene ID → convert to Ensembl ID via Ensembl `/xrefs/symbol/homo_sapiens/{symbol}`, or to UniProt accession via UniProt search (`gene_exact:{symbol} AND organism_id:9606`).

Compounds: Name → PubChem `/compound/name/{name}/cids/JSON` → get CID → convert to ChEMBL ID via UniChem or ChEMBL molecule search. If name lookup fails, try SMILES, InChIKey, or CAS number.

Variants: rsID (e.g. "rs334") works directly in dbSNP, ClinVar, GWAS Catalog, gnomAD. For genomic coordinates, use Ensembl VEP to get consequence annotations and linked rsIDs.

Diseases: Name → Open Targets or Monarch search → get EFO or MONDO ID → use in downstream queries.
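
The gene and compound workflows above boil down to building a few URLs. The sketch below assembles them without sending anything. The three base URLs are assumptions not stated in this document (confirm them in the matching file under `references/`), and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base URLs; verify against the reference files before use.
ENSEMBL_REST = "https://rest.ensembl.org"
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"
PUBCHEM_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def ensembl_xref_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Gene symbol -> Ensembl cross-reference lookup."""
    return f"{ENSEMBL_REST}/xrefs/symbol/{species}/{quote(symbol, safe='')}"


def uniprot_search_url(symbol: str, taxon_id: int = 9606) -> str:
    """Gene symbol -> UniProt search restricted to one organism."""
    query = f"gene_exact:{symbol} AND organism_id:{taxon_id}"
    params = urlencode({"query": query, "format": "json"})
    return f"{UNIPROT_SEARCH}?{params}"


def pubchem_cid_url(name: str) -> str:
    """Compound name -> PubChem CID lookup."""
    return f"{PUBCHEM_REST}/compound/name/{quote(name, safe='')}/cids/JSON"
```

Keeping URL construction separate from the HTTP call makes it easy to list the exact endpoints used when reporting results.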

POST-Only APIs

These databases require HTTP POST (or, in the case of SEC EDGAR, a custom User-Agent header) and will not work with WebFetch (GET-only). Use `curl` via your platform's shell tool instead:

| Database | Why POST needed | Example |
| --- | --- | --- |
| Open Targets | GraphQL endpoint | `curl -X POST -H "Content-Type: application/json" -d '{"query":"..."}' https://api.platform.opentargets.org/api/v4/graphql` |
| gnomAD | GraphQL endpoint | `curl -X POST -H "Content-Type: application/json" -d '{"query":"..."}' https://gnomad.broadinstitute.org/api` |
| RummaGEO | POST-only enrichment | `curl -X POST -H "Content-Type: application/json" -d '{"genes":["..."]}' https://rummageo.com/api/enrich` |
| GDC/TCGA | Complex filter queries | `curl -X POST -H "Content-Type: application/json" -d '{"filters":...}' https://api.gdc.cancer.gov/ssms` |
| SEC EDGAR | Requires User-Agent header | `curl -H "User-Agent: YourApp you@email.com" https://efts.sec.gov/LATEST/search-index?q=...` |
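
Where a script is more convenient than `curl`, the same GraphQL POST can be sketched with Python's standard library. `build_graphql_request` and `send` are hypothetical helpers, and the query text is left to the caller because each database has its own schema (see its reference file).

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(url: str, query: str,
                          variables: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying a GraphQL query."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting construction from sending means the request can be inspected or logged before any network traffic happens.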

API Keys and Access Restrictions

Some databases require API keys or have access restrictions. When an API key is needed:

  1. Check the current environment first — the key may already be exported as a shell environment variable (e.g. `$FRED_API_KEY`). Read it directly from the environment.
  2. Fall back to `.env` — if the variable isn't in the environment, check the `.env` file in the current working directory.
  3. If neither has it — proceed without the key (most APIs still work at lower rate limits) and tell the user which key is missing and how to get one.

Databases requiring API keys (free registration)

| Database | Env Variable | Registration URL |
| --- | --- | --- |
| FRED | `FRED_API_KEY` | https://fred.stlouisfed.org/docs/api/api_key.html |
| BEA | `BEA_API_KEY` | https://apps.bea.gov/API/signup/ |
| BLS | `BLS_API_KEY` | https://data.bls.gov/registrationEngine/ |
| NCBI (GEO, Gene) | `NCBI_API_KEY` | https://www.ncbi.nlm.nih.gov/account/settings/ |
| OpenFDA | `OPENFDA_API_KEY` | https://open.fda.gov/apis/authentication/ |
| USPTO (PatentsView) | `PATENTSVIEW_API_KEY` | https://patentsview.org/apis/keyrequest |
| Data Commons | `DATACOMMONS_API_KEY` | Google Cloud Console |
| Materials Project | `MP_API_KEY` | https://materialsproject.org (free account) |
| NASA | `NASA_API_KEY` | https://api.nasa.gov (free, DEMO_KEY available) |
| NOAA (CDO) | `NOAA_API_KEY` | https://www.ncdc.noaa.gov/cdo-web/token |
| OpenWeatherMap | `OPENWEATHERMAP_API_KEY` | https://openweathermap.org/appid |
| OMIM | `OMIM_API_KEY` | https://omim.org/api (free academic) |
| BioGRID | `BIOGRID_API_KEY` | https://webservice.thebiogrid.org (free) |
| Alpha Vantage | `ALPHAVANTAGE_API_KEY` | https://www.alphavantage.co/support/#api-key |
| US Census | `CENSUS_API_KEY` | https://api.census.gov/data/key_signup.html |
| DisGeNET | `DISGENET_API_KEY` | https://www.disgenet.org (free academic) |
| Addgene | `ADDGENE_API_KEY` | https://www.addgene.org (free account) |
| LINCS L1000 (CLUE) | `CLUE_API_KEY` | https://clue.io (free academic) |

These are all free to obtain. Most of the APIs still work without a key, at lower rate limits. Always try with a key first; if the env variable isn't set, proceed without the key and note in your response that rate limits may be lower.

Databases with paid or restricted access

| Database | Restriction | Free alternative |
| --- | --- | --- |
| DrugBank | Paid API license required | Use ChEMBL + PubChem + OpenFDA instead |
| COSMIC | Free academic registration required (JWT auth) | Use Open Targets for cancer mutation data |
| BRENDA | Free registration required (SOAP, not REST) | Use KEGG for enzyme/pathway data |

When a database requires paid access or registration the user hasn't set up:

  1. Fall back to a free alternative that can answer the same question
  2. Tell the user which database you couldn't access, why, and what you used instead
  3. If the user specifically requests a restricted database, explain the access requirements so they can set it up

Loading API keys

Step 1 — Check the current environment. The key may already be exported as a shell variable. For example, in Claude Code you can check with Bash: `echo $FRED_API_KEY`. If the variable is set and non-empty, use it.

Step 2 — Check `.env` file. If the environment variable isn't set, read `.env` from the current working directory. Format:

FRED_API_KEY=your_key_here
BEA_API_KEY=your_key_here

Step 3 — Proceed without. If neither source has the key, proceed without it (most APIs still work at lower rate limits) and mention this to the user.
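
The three steps can be sketched as a single lookup function. This is a minimal illustration, not a full `.env` parser: `load_api_key` is a hypothetical helper that handles only plain `NAME=value` lines, skips comments, and strips surrounding quotes.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_api_key(name: str, environ: Mapping[str, str] = os.environ,
                 dotenv_path: str = ".env") -> Optional[str]:
    """Return the key from the environment, then from .env, else None."""
    # Step 1: the exported environment variable wins if set and non-empty.
    value = environ.get(name, "").strip()
    if value:
        return value
    # Step 2: fall back to NAME=value lines in the .env file.
    path = Path(dotenv_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, raw = line.partition("=")
            if key.strip() == name:
                return raw.strip().strip("'\"") or None
    # Step 3: no key found; the caller proceeds without one and tells the user.
    return None
```

Returning `None` instead of raising leaves the caller free to continue without a key and mention the lower rate limits.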

Making API Calls

Use your environment's HTTP fetch tool to call REST endpoints. The tool name varies by platform:

| Platform | HTTP Fetch Tool | Fallback |
| --- | --- | --- |
| Claude Code | `WebFetch` | `curl` via Bash |
| Gemini CLI | `web_fetch` | `curl` via shell |
| Windsurf | `read_url_content` | `curl` via terminal |
| Cursor | No dedicated fetch tool | `curl` via `run_terminal_cmd` |
| Codex CLI | No dedicated fetch tool | `curl` via `shell` |
| Cline | No dedicated fetch tool | `curl` via `execute_command` |

If you don't recognize your platform or the fetch tool fails, fall back to `curl` via whatever shell/terminal tool is available. Example:

curl -s -H "Accept: application/json" "https://api.example.com/endpoint"

Request guidelines

  • Set `Accept: application/json` header where supported
  • URL-encode special characters in query parameters — SMILES strings (`/`, `#`, `=`, `@`), compound names with parentheses, and ontology terms with colons (`HP:0001250` → `HP%3A0001250`) are common sources of failures. With `curl`, use `--data-urlencode` for safety.
  • Parallel OK: When querying different databases (e.g., PubChem + ChEMBL + Reactome), run them in parallel — most APIs have generous rate limits.
  • Serialize requests to rate-limited APIs: NCBI APIs (Gene, GEO, Protein, Taxonomy, dbSNP, SRA) at 3 req/sec without key, 10 with key. Also watch: Ensembl (15 req/sec), BLS v1 (25 req/day without key), SEC EDGAR (10 req/sec), NOAA (5 req/sec with token).
  • If you get a rate-limit error (HTTP 429 or 503), wait briefly and retry once
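
Two of the guidelines above, URL-encoding and the single retry on a rate-limit error, can be sketched in a few lines. Both helper names are hypothetical; `fetch` is assumed to be any zero-argument callable returning a `(status, body)` pair, and the two-second wait is an arbitrary placeholder.

```python
import time
from urllib.parse import quote

RETRYABLE_STATUSES = {429, 503}


def encode_value(value: str) -> str:
    """Percent-encode every reserved character, including '/', '#', '=', '@' and ':'."""
    return quote(value, safe="")


def fetch_with_one_retry(fetch, wait_seconds: float = 2.0, sleep=time.sleep):
    """Call fetch() -> (status, body); on HTTP 429/503 wait, then retry exactly once."""
    status, body = fetch()
    if status in RETRYABLE_STATUSES:
        sleep(wait_seconds)
        status, body = fetch()
    return status, body
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would break a SMILES string placed in a URL path.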

Error recovery

If an API returns an error or empty results:

  1. Check the identifier format — use the Common Identifier Formats table above. A gene symbol may need to be converted to NCBI Gene ID or Ensembl ID first.
  2. Try alternative identifiers — if a compound name fails in PubChem, try SMILES, InChIKey, or CID. If a gene symbol fails, try the NCBI Gene ID.
  3. Try a different database — if one database is down or returns nothing, check the "Also consider" column in the selection guide for alternatives.
  4. Report the failure — tell the user which database failed, the error, and what you tried instead.
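
The recovery steps amount to trying alternatives in order while remembering what failed, so the failures can be reported. A minimal sketch, assuming each attempt is a labelled zero-argument callable (`first_successful` is a hypothetical helper):

```python
def first_successful(attempts):
    """Try (label, callable) pairs in order.

    Returns (label, result, failures) for the first non-empty result,
    or (None, None, failures) if every attempt failed or came back empty.
    """
    failures = []
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception as exc:
            failures.append((label, str(exc)))
            continue
        if result:  # an empty list, dict, or None counts as "no results"
            return label, result, failures
        failures.append((label, "no results"))
    return None, None, failures
```

The `failures` list is what feeds the final step: telling the user which lookups failed and what was tried instead.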

Pagination

Many APIs return paginated results — if you only read the first page, you may miss data. Common patterns:

  • Offset/Limit: `offset=0&limit=100` → increment offset by limit for the next page (ChEMBL, FRED, NOAA, USGS, NCBI E-utilities, ENA, GDC, FDA)
  • Cursor-based: Response includes a `nextPageToken` or `cursor` value — pass it in the next request (ClinicalTrials.gov, UniProt)
  • Page number: `page=1&per_page=50` → increment page (World Bank, cBioPortal, ZINC)

Check the reference file for each database's specific pagination parameters. If a response includes `total`, `totalCount`, or `next` and the number of returned results is less than the total, there are more pages.

For targeted lookups (single gene, single compound), the first page is usually sufficient. Paginate when the user needs comprehensive results (e.g., "all clinical trials for X" or "all known variants in gene Y").
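
The offset/limit and cursor patterns can each be sketched as a short loop. These are generic illustrations with hypothetical names: `fetch_page` stands in for the real API call, and `max_pages` is an added safety cap, not something any of the APIs require.

```python
def paginate_offset(fetch_page, limit=100, max_pages=50):
    """fetch_page(offset, limit) -> list of records; stops after a short page."""
    records = []
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        records.extend(batch)
        if len(batch) < limit:  # a short (or empty) page means nothing is left
            break
    return records


def paginate_cursor(fetch_page, max_pages=50):
    """fetch_page(cursor) -> (records, next_cursor); stops when next_cursor is falsy."""
    records, cursor = [], None
    for _ in range(max_pages):
        batch, cursor = fetch_page(cursor)
        records.extend(batch)
        if not cursor:
            break
    return records
```

Page-number APIs fit the first function with a thin adapter that converts the offset into `page = offset // limit + 1`.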

Output Format

Structure your response like this:

## Databases Queried
- **PubChem** — /compound/name/aspirin/property/...
- **Reactome** — /search/query?query=aspirin

## Results

### PubChem
[raw JSON response]

### Reactome
[raw JSON response]

If results are very large, present the most relevant portion and note that additional data is available. But default to showing the full raw JSON — the user asked for it.
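
For scripted use, the response template above could be assembled by a small formatter. A sketch under stated assumptions: `format_report` is a hypothetical helper, results arrive as `(database, endpoint, payload)` tuples, and a `None` payload marks a query that returned nothing.

```python
import json

DASH = "\u2014"  # the separator character used in the template above


def format_report(results):
    """results: list of (database, endpoint, payload); payload None means no results."""
    lines = ["## Databases Queried"]
    for database, endpoint, _ in results:
        lines.append(f"- **{database}** {DASH} {endpoint}")
    lines += ["", "## Results"]
    for database, _, payload in results:
        lines += ["", f"### {database}"]
        if payload is None:
            lines.append("No results returned.")  # say so explicitly
        else:
            lines.append(json.dumps(payload, indent=2))
    return "\n".join(lines)
```

Empty queries still get their own section, in line with the rule that a query with no results should be reported rather than omitted.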

Adding New Databases

This skill is designed to grow. Each database is a self-contained reference file in `references/`. To add a new database:

  1. Create `references/<database-name>.md` following the same format as existing files
  2. Add an entry to the database selection guide above
  3. The reference file should include: base URL, key endpoints, query parameter formats, example calls, rate limits, and response structure

Available Databases

Read the relevant reference file before making any API call.

Physics & Astronomy

| Database | Reference File | What it covers |
| --- | --- | --- |
| NASA | `references/nasa.md` | NEO asteroids, Mars rover, APOD |
| NASA Exoplanet Archive | `references/nasa-exoplanet-archive.md` | Exoplanets, orbital parameters |
| NIST | `references/nist.md` | Physical constants, atomic spectra |
| SDSS | `references/sdss.md` | Galaxy/star spectra, photometry |
| SIMBAD | `references/simbad.md` | Astronomical object catalog |

Earth & Environmental Sciences

| Database | Reference File | What it covers |
| --- | --- | --- |
| USGS | `references/usgs.md` | Earthquakes, water data |
| NOAA | `references/noaa.md` | Climate, weather station data |
| EPA | `references/epa.md` | Air quality, toxic releases |
| OpenWeatherMap | `references/openweathermap.md` | Weather current/forecast |

Chemistry & Drugs

| Database | Reference File | What it covers |
| --- | --- | --- |
| PubChem | `references/pubchem.md` | Compounds, properties, synonyms |
| ChEMBL | `references/chembl.md` | Bioactivity, drug discovery |
| DrugBank | `references/drugbank.md` | Drug data, interactions (paid) |
| FDA (OpenFDA) | `references/fda.md` | Drug labels, adverse events, recalls |
| DailyMed | `references/dailymed.md` | Drug labels (NIH/NLM) |
| KEGG | `references/kegg.md` | Pathways, genes, compounds |
| ChEBI | `references/chebi.md` | Chemical entities of biological interest |
| ZINC | `references/zinc.md` | Commercially available compounds, virtual screening |
| BindingDB | `references/bindingdb.md` | Experimentally measured binding affinities |

Materials Science

| Database | Reference File | What it covers |
| --- | --- | --- |
| Materials Project | `references/materials-project.md` | Band gaps, elastic properties, crystal structures |
| COD | `references/cod.md` | Crystal structures, CIF files |

Biology & Genomics

| Database | Reference File | What it covers |
| --- | --- | --- |
| Reactome | `references/reactome.md` | Biological pathways, reactions |
| BRENDA | `references/brenda.md` | Enzyme kinetics, catalysis (SOAP) |
| UniProt | `references/uniprot.md` | Protein sequences, function |
| STRING | `references/string.md` | Protein-protein interactions |
| Ensembl | `references/ensembl.md` | Genomes, variants, sequences |
| NCBI Gene | `references/ncbi-gene.md` | Gene information, links |
| NCBI Protein | `references/ncbi-protein.md` | Protein sequences, records |
| NCBI Taxonomy | `references/ncbi-taxonomy.md` | Taxonomic classification |
| GEO (NCBI) | `references/geo.md` | Gene expression datasets |
| GTEx | `references/gtex.md` | Gene expression across tissues |
| PDB | `references/pdb.md` | Protein 3D structures |
| AlphaFold DB | `references/alphafold.md` | Predicted protein structures |
| EMDB | `references/emdb.md` | Electron microscopy maps |
| InterPro | `references/interpro.md` | Protein families, domains |
| BioGRID | `references/biogrid.md` | Protein/genetic interactions |
| Gene Ontology | `references/gene-ontology.md` | GO terms, gene annotations |
| QuickGO | `references/quickgo.md` | GO annotations (EBI, recommended) |
| dbSNP | `references/dbsnp.md` | SNP/variant data |
| SRA | `references/sra.md` | Sequencing run metadata |
| gnomAD | `references/gnomad.md` | Population variant frequencies (POST) |
| UCSC Genome Browser | `references/ucsc-genome.md` | Genome annotations, tracks |
| ENCODE | `references/encode.md` | DNA elements, ChIP-seq, ATAC-seq |
| JASPAR | `references/jaspar.md` | TF binding profiles/motifs |
| Human Protein Atlas | `references/human-protein-atlas.md` | Protein expression across tissues |
| Human Cell Atlas | `references/hca.md` | Single-cell atlas data |
| LINCS L1000 | `references/lincs-l1000.md` | Gene expression signatures (CMap) |
| RummaGEO | `references/rummageo.md` | GEO gene set enrichment (POST) |
| PRIDE | `references/pride.md` | Proteomics data repository |
| Metabolomics Workbench | `references/metabolomics-workbench.md` | Metabolomics studies, metabolites |
| MouseMine | `references/mousemine.md` | Mouse genome informatics |
| ENA | `references/ena.md` | Nucleotide sequences, reads, assemblies, taxonomy (EMBL-EBI) |
| Addgene | `references/addgene.md` | Plasmid repository |

Disease & Clinical

| Database | Reference File | What it covers |
| --- | --- | --- |
| Open Targets | `references/opentargets.md` | Target-disease associations (POST) |
| COSMIC | `references/cosmic.md` | Somatic mutations in cancer |
| ClinPGx (PharmGKB) | `references/clinpgx.md` | Pharmacogenomics |
| ClinicalTrials.gov | `references/clinicaltrials.md` | Clinical trial registry |
| OMIM | `references/omim.md` | Mendelian disease-gene data |
| ClinVar | `references/clinvar.md` | Variant clinical significance |
| GDC (TCGA) | `references/tcga-gdc.md` | Cancer genomics, mutations (POST) |
| cBioPortal | `references/cbioportal.md` | Cancer study mutations, CNA, expression, clinical data |
| DisGeNET | `references/disgenet.md` | Gene-disease associations |
| GWAS Catalog | `references/gwas-catalog.md` | GWAS SNP-trait associations |
| Monarch Initiative | `references/monarch.md` | Disease-phenotype-gene links |
| HPO | `references/hpo.md` | Human Phenotype Ontology |

Patents & Regulatory

| Database | Reference File | What it covers |
| --- | --- | --- |
| USPTO | `references/uspto.md` | Patents, trademarks |
| SEC EDGAR | `references/sec-edgar.md` | Company filings (needs User-Agent header) |

Economics & Finance

| Database | Reference File | What it covers |
| --- | --- | --- |
| FRED | `references/fred.md` | US economic time series |
| Federal Reserve | `references/federal-reserve.md` | Monetary/financial data |
| BEA | `references/bea.md` | GDP, national accounts |
| BLS | `references/bls.md` | Employment, wages, CPI |
| World Bank | `references/worldbank.md` | Development indicators |
| ECB | `references/ecb.md` | Euro exchange rates, monetary stats |
| US Treasury | `references/treasury.md` | Debt, yield curves, fiscal data |
| Alpha Vantage | `references/alphavantage.md` | Stocks, forex, crypto |
| Data Commons | `references/datacommons.md` | Statistical knowledge graph |

Social Sciences & Demographics

| Database | Reference File | What it covers |
| --- | --- | --- |
| US Census | `references/census.md` | Population, housing, economic surveys |
| Eurostat | `references/eurostat.md` | EU statistics |
| WHO GHO | `references/who.md` | Global health indicators |