install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-blast-searches" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-blast-searches && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-blast-searches" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-blast-searches && rm -rf "$T"
manifest:
skills/bio-blast-searches/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-blast-searches description: Run remote BLAST searches against NCBI databases using Biopython Bio.Blast. Use when identifying unknown sequences, finding homologs, or searching for sequence similarity against NCBI's nr/nt databases. tool_type: python primary_tool: Bio.Blast.NCBIWWW measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
BLAST Searches
Run BLAST searches against NCBI databases using Biopython's Bio.Blast module.
Required Import
from Bio.Blast import NCBIWWW, NCBIXML from Bio import SeqIO
BLAST Programs
| Program | Query | Database | Use Case |
|---|---|---|---|
| Nucleotide | Nucleotide | DNA/RNA sequence similarity |
| Protein | Protein | Protein sequence similarity |
| Nucleotide | Protein | Find protein hits for DNA query |
| Protein | Nucleotide | Find DNA encoding protein-like |
| Nucleotide | Nucleotide | Translated vs translated |
Core Function
NCBIWWW.qblast()
Submit a BLAST query to NCBI servers.
from Bio.Blast import NCBIWWW # Simple BLASTN search result_handle = NCBIWWW.qblast('blastn', 'nt', sequence)
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| BLAST program | , |
| Target database | , , |
| Query sequence | String or SeqRecord |
| Limit by Entrez query | |
| Max hits to return | |
| E-value threshold | |
| Word size | for blastn |
| Gap penalties | (open, extend) |
| Output format | (default), |
Common Databases
Nucleotide:
| Database | Description |
|---|---|
| All GenBank + EMBL + DDBJ |
| RefSeq RNA sequences |
| RefSeq genomic sequences |
Protein:
| Database | Description |
|---|---|
| Non-redundant protein |
| RefSeq proteins |
| SwissProt (curated) |
| Protein structures |
Parsing Results
NCBIXML Parser
from Bio.Blast import NCBIWWW, NCBIXML # Run BLAST result_handle = NCBIWWW.qblast('blastn', 'nt', sequence) # Parse XML results blast_record = NCBIXML.read(result_handle) result_handle.close() # Iterate hits for alignment in blast_record.alignments: print(f"Hit: {alignment.title}") for hsp in alignment.hsps: print(f" E-value: {hsp.expect}") print(f" Score: {hsp.score}") print(f" Identity: {hsp.identities}/{hsp.align_length}")
Alignment/HSP Attributes
# Alignment (hit) attributes alignment.title # Hit description alignment.accession # Accession number alignment.length # Subject sequence length alignment.hsps # List of HSPs # HSP (High-scoring Segment Pair) attributes hsp.score # Raw score hsp.bits # Bit score hsp.expect # E-value hsp.identities # Number of identical positions hsp.positives # Number of positive-scoring positions hsp.gaps # Number of gaps hsp.align_length # Alignment length hsp.query # Aligned query sequence hsp.match # Match line (| for identity) hsp.sbjct # Aligned subject sequence hsp.query_start # Query start position hsp.query_end # Query end position hsp.sbjct_start # Subject start position hsp.sbjct_end # Subject end position hsp.strand # Strand (blastn) hsp.frame # Reading frame (blastx/tblastn)
Code Patterns
Basic BLASTN
from Bio.Blast import NCBIWWW, NCBIXML sequence = '''ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA''' print("Running BLASTN (this may take a minute)...") result_handle = NCBIWWW.qblast('blastn', 'nt', sequence) blast_record = NCBIXML.read(result_handle) result_handle.close() print(f"\nFound {len(blast_record.alignments)} hits") for alignment in blast_record.alignments[:5]: hsp = alignment.hsps[0] print(f"\n{alignment.title[:70]}...") print(f" E-value: {hsp.expect:.2e}") print(f" Identity: {hsp.identities}/{hsp.align_length} ({100*hsp.identities/hsp.align_length:.1f}%)")
BLASTP with Organism Filter
from Bio.Blast import NCBIWWW, NCBIXML protein_seq = '''MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH''' result_handle = NCBIWWW.qblast( 'blastp', 'nr', protein_seq, entrez_query='Mammalia[organism]', hitlist_size=20, expect=0.001 ) blast_record = NCBIXML.read(result_handle) result_handle.close() for alignment in blast_record.alignments[:10]: hsp = alignment.hsps[0] print(f"{alignment.accession}: E={hsp.expect:.2e} - {alignment.title[:50]}...")
BLAST from FASTA File
from Bio import SeqIO from Bio.Blast import NCBIWWW, NCBIXML record = SeqIO.read('query.fasta', 'fasta') result_handle = NCBIWWW.qblast('blastn', 'nt', record.seq) blast_record = NCBIXML.read(result_handle) result_handle.close() for alignment in blast_record.alignments[:5]: print(f"{alignment.accession}: {alignment.title[:60]}...")
Save Results to File
from Bio.Blast import NCBIWWW result_handle = NCBIWWW.qblast('blastn', 'nt', sequence) # Save XML for later parsing with open('blast_results.xml', 'w') as out: out.write(result_handle.read()) result_handle.close() # Parse saved file from Bio.Blast import NCBIXML with open('blast_results.xml') as f: blast_record = NCBIXML.read(f)
Extract Top Hits
def get_top_hits(sequence, program='blastn', database='nt', num_hits=10, evalue=0.01): result_handle = NCBIWWW.qblast( program, database, sequence, hitlist_size=num_hits, expect=evalue ) blast_record = NCBIXML.read(result_handle) result_handle.close() hits = [] for alignment in blast_record.alignments: hsp = alignment.hsps[0] hits.append({ 'accession': alignment.accession, 'title': alignment.title, 'evalue': hsp.expect, 'identity': hsp.identities / hsp.align_length, 'coverage': hsp.align_length / blast_record.query_length }) return hits hits = get_top_hits(my_sequence, num_hits=5) for hit in hits: print(f"{hit['accession']}: {hit['identity']:.1%} identity, E={hit['evalue']:.2e}")
BLASTX (DNA to Protein)
from Bio.Blast import NCBIWWW, NCBIXML dna_sequence = '''ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA''' result_handle = NCBIWWW.qblast('blastx', 'nr', dna_sequence) blast_record = NCBIXML.read(result_handle) result_handle.close() for alignment in blast_record.alignments[:5]: hsp = alignment.hsps[0] print(f"{alignment.accession}: frame {hsp.frame}, E={hsp.expect:.2e}") print(f" {alignment.title[:60]}...")
Multiple Sequences
from Bio import SeqIO from Bio.Blast import NCBIWWW, NCBIXML import time def blast_multiple(fasta_file, program='blastn', database='nt'): results = {} for record in SeqIO.parse(fasta_file, 'fasta'): print(f"BLASTing {record.id}...") result_handle = NCBIWWW.qblast(program, database, str(record.seq), hitlist_size=5) blast_record = NCBIXML.read(result_handle) result_handle.close() results[record.id] = blast_record time.sleep(5) # Be nice to NCBI servers return results
Filter by Identity/Coverage
def filter_blast_hits(blast_record, min_identity=0.9, min_coverage=0.8): query_length = blast_record.query_length filtered = [] for alignment in blast_record.alignments: for hsp in alignment.hsps: identity = hsp.identities / hsp.align_length coverage = hsp.align_length / query_length if identity >= min_identity and coverage >= min_coverage: filtered.append({ 'accession': alignment.accession, 'title': alignment.title, 'identity': identity, 'coverage': coverage, 'evalue': hsp.expect }) return filtered
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Timeout | NCBI servers busy | Retry later, use smaller query |
| No hits | Wrong database or program | Check program/database match |
| Empty results | E-value too stringent | Increase expect parameter |
| URLError | Network issue | Check connection, retry |
Timing Notes
- Remote BLAST can take 30 seconds to several minutes
- Longer queries take longer
- Peak times (US business hours) may be slower
- For many queries, consider local BLAST
Decision Tree
Need to BLAST a sequence? ├── DNA query? │ ├── Against DNA database? → blastn │ └── Against protein database? → blastx ├── Protein query? │ ├── Against protein database? → blastp │ └── Against DNA database? → tblastn ├── Want to limit organisms? │ └── Use entrez_query parameter ├── Need many searches? │ └── Consider local-blast skill └── Need results fast? └── Use local-blast skill
Related Skills
- local-blast - Run BLAST locally for faster, unlimited searches
- entrez-fetch - Fetch full records for BLAST hits
- sequence-io - Read query sequences from files