BioClaw query-geo
Query NCBI GEO for gene expression datasets. Use when user asks about RNA-seq datasets, microarray data, expression data, GEO accessions, or finding public datasets. Triggers on "geo", "gene expression omnibus", "expression dataset", "RNA-seq dataset", "microarray dataset", "GSE", "GDS".
install
source · Clone the upstream repo
git clone https://github.com/Runchuan-BU/BioClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Runchuan-BU/BioClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/container/skills/query-geo" ~/.claude/skills/runchuan-bu-bioclaw-query-geo && rm -rf "$T"
manifest:
container/skills/query-geo/SKILL.mdsource content
NCBI GEO Database Query
Query Gene Expression Omnibus for public expression datasets.
When to Use
- User wants to find RNA-seq or microarray datasets
- User asks about gene expression studies for a disease/tissue
- User provides a GEO accession (GSE/GDS) to look up
- User wants to download expression data
How to Execute
from Bio import Entrez import json Entrez.email = "bioclaw@example.com" # 1. Search GEO datasets def search_geo(query, max_results=10, db="gds"): handle = Entrez.esearch(db=db, term=query, retmax=max_results, sort="relevance") record = Entrez.read(handle) handle.close() return record # 2. Get dataset summaries def geo_summary(id_list, db="gds"): ids = ",".join(str(i) for i in id_list) handle = Entrez.esummary(db=db, id=ids, retmode="json") result = json.loads(handle.read()) handle.close() return result # 3. Search for Series (GSE) def search_gse(keyword, organism="Homo sapiens", max_results=10): query = f'"{keyword}" AND "{organism}"[Organism] AND gse[ETYP]' return search_geo(query, max_results) # Example: Find breast cancer RNA-seq datasets search = search_gse("breast cancer RNA-seq", max_results=5) print(f"Found {search['Count']} datasets") if search['IdList']: summaries = geo_summary(search['IdList']) for uid in search['IdList']: info = summaries['result'].get(str(uid), {}) title = info.get('title', 'N/A') gse = info.get('accession', 'N/A') gpl = info.get('gpl', 'N/A') n_samples = info.get('n_samples', 'N/A') summary = info.get('summary', 'N/A')[:200] print(f"\n{gse}: {title}") print(f" Platform: {gpl}, Samples: {n_samples}") print(f" Summary: {summary}...") print(f" URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={gse}")
Search Syntax
- By keyword:
"CRISPR" AND gse[ETYP] - By organism:
"Homo sapiens"[Organism] - By platform:
"Illumina"[Platform] - By date:
"2024/01:2026/12"[PDAT] - Combine:
"breast cancer" AND "RNA-seq" AND "Homo sapiens"[Organism] AND gse[ETYP]
Follow-up Suggestions
- "Want me to download the expression matrix for this dataset?"
- "Should I do differential expression analysis?"
- "Want me to check what genes are differentially expressed?"