install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-geo-data" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-geo-data && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-geo-data" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-geo-data && rm -rf "$T"
manifest:
skills/bio-geo-data/SKILL.mdsafety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- pip install
- downloads files (wget)
- references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-geo-data description: Query NCBI Gene Expression Omnibus (GEO) for expression datasets using Biopython Bio.Entrez. Use when finding microarray/RNA-seq datasets, downloading expression data, or linking GEO series to SRA runs. tool_type: python primary_tool: Bio.Entrez measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
GEO Data
Query and access Gene Expression Omnibus datasets using Biopython's Entrez module.
Required Setup
from Bio import Entrez Entrez.email = 'your.email@example.com' # Required by NCBI Entrez.api_key = 'your_api_key' # Optional
GEO Database Types
| Database | db value | Description |
|---|---|---|
| GEO DataSets | | Curated datasets (GDS*) |
| GEO Profiles | | Individual gene profiles |
GEO Record Types:
| Prefix | Type | Description |
|---|---|---|
| GSE | Series | Complete study/experiment |
| GSM | Sample | Individual sample |
| GPL | Platform | Array/sequencing platform |
| GDS | DataSet | Curated, normalized dataset |
Searching GEO
Search GEO DataSets (GDS)
from Bio import Entrez Entrez.email = 'your.email@example.com' # Search curated datasets handle = Entrez.esearch(db='gds', term='breast cancer AND Homo sapiens[orgn]', retmax=10) record = Entrez.read(handle) handle.close() print(f"Found {record['Count']} datasets") print(f"IDs: {record['IdList']}")
Search GEO Series (GSE)
# Search GEO Series via gds database # Use entry_type filter handle = Entrez.esearch(db='gds', term='RNA-seq[title] AND human[orgn] AND gse[entry_type]', retmax=10) record = Entrez.read(handle) handle.close()
Common Search Fields
| Field | Description | Example |
|---|---|---|
| Organism | |
| Dataset title | |
| Description text | |
| Platform GPL | |
| Record type | , |
| Study type | |
| PubMed ID | |
| Publication date | |
GDS Types
# Expression profiling by array term = 'expression profiling by array[gdstype] AND cancer' # RNA-seq expression term = 'expression profiling by high throughput sequencing[gdstype]' # ChIP-seq term = 'genome binding/occupancy profiling[gdstype]'
Fetching GEO Information
Get GEO DataSet Summary
# Fetch summary for GDS records handle = Entrez.esummary(db='gds', id='200024320') record = Entrez.read(handle) handle.close() summary = record[0] print(f"Accession: {summary['Accession']}") print(f"Title: {summary['title']}") print(f"Summary: {summary['summary'][:200]}...") print(f"Organism: {summary['taxon']}") print(f"Platform: {summary['GPL']}") print(f"Samples: {summary['n_samples']}")
Summary Fields
summary['Accession'] # GSE/GDS accession summary['title'] # Dataset title summary['summary'] # Description summary['taxon'] # Organism summary['GPL'] # Platform ID summary['n_samples'] # Number of samples summary['FTPLink'] # FTP download link summary['PubMedIds'] # Associated publications summary['gdsType'] # Dataset type summary['ptechType'] # Platform technology
Code Patterns
Search and List GEO Series
from Bio import Entrez Entrez.email = 'your.email@example.com' def search_geo(term, entry_type='gse', max_results=20): full_term = f'{term} AND {entry_type}[entry_type]' handle = Entrez.esearch(db='gds', term=full_term, retmax=max_results) search = Entrez.read(handle) handle.close() if not search['IdList']: return [] handle = Entrez.esummary(db='gds', id=','.join(search['IdList'])) summaries = Entrez.read(handle) handle.close() results = [] for s in summaries: results.append({ 'accession': s['Accession'], 'title': s['title'], 'organism': s['taxon'], 'samples': s['n_samples'], 'platform': s['GPL'] }) return results datasets = search_geo('breast cancer RNA-seq AND human[orgn]') for ds in datasets: print(f"{ds['accession']}: {ds['title'][:60]}... ({ds['samples']} samples)")
Find RNA-Seq Datasets
def find_rnaseq_datasets(organism, keywords, max_results=20): term = f'{keywords} AND {organism}[orgn] AND expression profiling by high throughput sequencing[gdstype] AND gse[entry_type]' handle = Entrez.esearch(db='gds', term=term, retmax=max_results) search = Entrez.read(handle) handle.close() if not search['IdList']: return [] handle = Entrez.esummary(db='gds', id=','.join(search['IdList'])) summaries = Entrez.read(handle) handle.close() return summaries datasets = find_rnaseq_datasets('Homo sapiens', 'COVID-19') for ds in datasets: print(f"{ds['Accession']}: {ds['n_samples']} samples - {ds['title'][:50]}...")
Get GSE Download Link
def get_geo_ftp(gse_accession): '''Get FTP download link for a GSE''' handle = Entrez.esearch(db='gds', term=f'{gse_accession}[accn]') search = Entrez.read(handle) handle.close() if not search['IdList']: return None handle = Entrez.esummary(db='gds', id=search['IdList'][0]) summary = Entrez.read(handle)[0] handle.close() return summary.get('FTPLink') ftp_link = get_geo_ftp('GSE123456') print(f"Download from: {ftp_link}")
Link GEO to SRA
Many GEO RNA-seq datasets have associated SRA data.
def geo_to_sra(gse_accession): '''Find SRA runs associated with a GEO series''' # Search GEO handle = Entrez.esearch(db='gds', term=f'{gse_accession}[accn]') search = Entrez.read(handle) handle.close() if not search['IdList']: return [] # Link to SRA handle = Entrez.elink(dbfrom='gds', db='sra', id=search['IdList'][0]) links = Entrez.read(handle) handle.close() if not links[0]['LinkSetDb']: return [] sra_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link']] # Get SRA accessions handle = Entrez.esummary(db='sra', id=','.join(sra_ids[:50])) summaries = Entrez.read(handle) handle.close() runs = [] for s in summaries: expxml = s.get('ExpXml', '') if 'SRR' in str(expxml) or 'SRX' in str(expxml): runs.append(s) return runs sra_data = geo_to_sra('GSE123456') print(f"Found {len(sra_data)} SRA records")
Search by PubMed ID
def geo_from_pubmed(pmid): '''Find GEO datasets associated with a publication''' handle = Entrez.elink(dbfrom='pubmed', db='gds', id=pmid) links = Entrez.read(handle) handle.close() if not links[0]['LinkSetDb']: return [] gds_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link']] handle = Entrez.esummary(db='gds', id=','.join(gds_ids)) summaries = Entrez.read(handle) handle.close() return summaries datasets = geo_from_pubmed('35412348') for ds in datasets: print(f"{ds['Accession']}: {ds['title']}")
Download GEO Data (GEOparse)
For full data parsing, use the GEOparse library:
# pip install GEOparse import GEOparse # Download and parse GSE gse = GEOparse.get_GEO('GSE123456') # Access metadata print(f"Title: {gse.metadata['title'][0]}") print(f"Samples: {len(gse.gsms)}") # Get sample metadata for gsm_name, gsm in gse.gsms.items(): print(f"{gsm_name}: {gsm.metadata['title'][0]}") # Get expression table if gse.gpls: gpl_name = list(gse.gpls.keys())[0] expression_table = gse.pivot_samples('VALUE')
Download Options
Direct FTP Download
# Download entire GSE wget -r -np -nd ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123456/ # Download specific file types wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123456/suppl/*counts*.txt.gz
Series Matrix Files
import gzip import urllib.request def download_series_matrix(gse): '''Download series matrix file''' gse_prefix = gse[:len(gse)-3] + 'nnn' url = f'https://ftp.ncbi.nlm.nih.gov/geo/series/{gse_prefix}/{gse}/matrix/{gse}_series_matrix.txt.gz' filename = f'{gse}_series_matrix.txt.gz' urllib.request.urlretrieve(url, filename) return filename
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Empty results | Wrong entry_type | Add or |
| No FTPLink | Superseries or no data | Check if series has supplementary files |
| No SRA link | Microarray data | SRA only for sequencing data |
Decision Tree
Need GEO expression data? ├── Looking for curated datasets? │ └── Search gds with [entry_type]=gds ├── Looking for any experiment? │ └── Search gds with [entry_type]=gse ├── Want RNA-seq specifically? │ └── Add 'expression profiling by high throughput sequencing[gdstype]' ├── Have a publication? │ └── Link pubmed -> gds ├── Need raw sequencing data? │ └── Link gds -> sra, then use sra-data skill ├── Need processed expression matrix? │ └── Download series matrix or use GEOparse └── Need full metadata? └── Use GEOparse library
Related Skills
- entrez-search - General database searching
- entrez-link - Link GEO to SRA and other databases
- sra-data - Download raw sequencing data from linked SRA
- batch-downloads - Download multiple GEO records