AutoResearchClaw biology-biopython
Bioinformatics with Biopython for sequence manipulation, file parsing, BLAST, and phylogenetics. Use when working with DNA/RNA/protein sequences or biological databases.
install
source · Clone the upstream repo
git clone https://github.com/aiming-lab/AutoResearchClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/domain/biology-biopython" ~/.claude/skills/aiming-lab-autoresearchclaw-biology-biopython-82978a && rm -rf "$T"
manifest: researchclaw/skills/builtin/domain/biology-biopython/SKILL.md
source content
Biopython Bioinformatics Best Practice
Sequence Manipulation
- Create sequences: `from Bio.Seq import Seq; seq = Seq("ATGCGA")`
- Complement: `seq.complement()`; Reverse complement: `seq.reverse_complement()`
- Transcription: `seq.transcribe()` (DNA to RNA)
- Translation: `seq.translate()` (DNA/RNA to protein)
- GC content: `from Bio.SeqUtils import gc_fraction; gc_fraction(seq)`
- Molecular weight: `from Bio.SeqUtils import molecular_weight`
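The operations above can be combined into one short script. This is a minimal sketch that assumes a Biopython release recent enough to provide `gc_fraction`; the variable names are illustrative only.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

seq = Seq("ATGCGA")

# Base-pairing partners, read in the same and in the opposite direction.
complement = seq.complement()
reverse_complement = seq.reverse_complement()

# DNA -> RNA (T becomes U), then DNA -> protein using the standard codon table.
rna = seq.transcribe()
protein = seq.translate()

# Fraction of G and C bases, as a float between 0 and 1.
gc = gc_fraction(seq)

# Molecular weight of the single-stranded DNA; seq_type is stated explicitly.
weight = molecular_weight(seq, seq_type="DNA")

print(complement, reverse_complement, rna, protein, gc, weight)
```

`Seq` objects are immutable, so each call returns a new object and leaves `seq` unchanged.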
File Parsing (SeqIO)
- Read FASTA: `for rec in SeqIO.parse("file.fasta", "fasta"): ...`
- Read GenBank: `for rec in SeqIO.parse("file.gb", "genbank"): ...`
- Read single record: `rec = SeqIO.read("file.fasta", "fasta")`
- Write sequences: `SeqIO.write(records, "output.fasta", "fasta")`
- Convert formats: `SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")`
- Index large files: `idx = SeqIO.index("large.fasta", "fasta")` for random access
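As a rough illustration of the SeqIO calls above, the sketch below writes two made-up records to a temporary FASTA file and then parses, indexes, and converts it. The record contents and file names are placeholders, and the `tab` output format is used only because it needs no extra annotations.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two small in-memory records (hypothetical example data).
records = [
    SeqRecord(Seq("ATGCGA"), id="seq1", description="example record 1"),
    SeqRecord(Seq("GGGTTTAAA"), id="seq2", description="example record 2"),
]

workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "example.fasta")

# SeqIO.write returns the number of records written.
written = SeqIO.write(records, fasta_path, "fasta")

# SeqIO.parse yields records one at a time.
ids = [rec.id for rec in SeqIO.parse(fasta_path, "fasta")]

# SeqIO.index gives dictionary-style random access without loading every record.
idx = SeqIO.index(fasta_path, "fasta")
seq2_length = len(idx["seq2"].seq)
idx.close()

# SeqIO.convert reads one format and writes another, returning the record count.
tab_path = os.path.join(workdir, "example.tab")
converted = SeqIO.convert(fasta_path, "fasta", tab_path, "tab")

print(written, ids, seq2_length, converted)
```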
BLAST Operations
- Online BLAST: `from Bio.Blast import NCBIWWW; result = NCBIWWW.qblast("blastn", "nt", seq)`
- Parse results: `from Bio.Blast import NCBIXML; records = NCBIXML.parse(result)`
- Local BLAST: run via subprocess, parse XML output with NCBIXML
- Always set `Entrez.email` before any NCBI access
- Filter results by e-value (typically < 1e-5) and coverage
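The filtering step can be sketched as a small function over parsed records. This is an illustration rather than a prescribed method: `filter_hits`, the 0.5 coverage cutoff, and the `SimpleNamespace` stand-ins are assumptions made so the example runs without a network call. The attributes it reads (`alignments`, `hsps`, `title`, `expect`, `query_start`, `query_end`) are those found on records produced by `NCBIXML.parse`.

```python
from types import SimpleNamespace


def filter_hits(record, query_length, max_evalue=1e-5, min_coverage=0.5):
    """Return (title, e-value, query coverage) for HSPs passing both cutoffs."""
    kept = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # Fraction of the query spanned by this HSP (coordinates are 1-based).
            coverage = (abs(hsp.query_end - hsp.query_start) + 1) / query_length
            if hsp.expect < max_evalue and coverage >= min_coverage:
                kept.append((alignment.title, hsp.expect, coverage))
    return kept


# Stand-in objects that mimic a parsed BLAST record (hypothetical data).
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="hit_good", hsps=[
        SimpleNamespace(expect=1e-30, query_start=1, query_end=90)]),
    SimpleNamespace(title="hit_weak", hsps=[
        SimpleNamespace(expect=0.5, query_start=1, query_end=95)]),
    SimpleNamespace(title="hit_short", hsps=[
        SimpleNamespace(expect=1e-12, query_start=10, query_end=29)]),
])

demo_hits = filter_hits(demo_record, query_length=100)
print(demo_hits)
```

With real output, the same function would be called once per record, for example `for record in NCBIXML.parse(result): filter_hits(record, query_length=len(seq))`.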
NCBI Database Access (Entrez)
- Always set email: `Entrez.email = "your@email.com"`
- Search: `handle = Entrez.esearch(db="pubmed", term="query")`
- Fetch records: `handle = Entrez.efetch(db="nucleotide", id="ID", rettype="fasta")`
- Use an API key for higher rate limits (10 req/s vs. 3 req/s)
- Respect NCBI rate limits; add delays between batch requests
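A search-then-fetch loop with explicit pacing might look like the sketch below. The helper names (`request_delay`, `search_ids`, `fetch_fasta`) are hypothetical, the email address is the placeholder from the list above, and nothing contacts NCBI until one of the functions is called.

```python
import time

from Bio import Entrez

Entrez.email = "your@email.com"  # placeholder; use a real contact address


def request_delay(api_key=None):
    """Seconds to wait between requests: 10 req/s with an API key, 3 req/s without."""
    return 1 / 10 if api_key else 1 / 3


def search_ids(term, db="pubmed", retmax=20):
    """Return the list of record IDs matching a search term."""
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])


def fetch_fasta(ids, api_key=None):
    """Fetch nucleotide records one by one as FASTA text, pausing between requests."""
    if api_key:
        Entrez.api_key = api_key
    texts = []
    for record_id in ids:
        handle = Entrez.efetch(db="nucleotide", id=record_id,
                               rettype="fasta", retmode="text")
        texts.append(handle.read())
        handle.close()
        time.sleep(request_delay(api_key))
    return texts
```

A typical call would be `fetch_fasta(search_ids("query", db="nucleotide"))`, with the search term replaced by a real query.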
Phylogenetics (Bio.Phylo)
- Read trees: `from Bio import Phylo; tree = Phylo.read("tree.nwk", "newick")`
- Draw trees: `Phylo.draw(tree)` or `Phylo.draw_ascii(tree)`
- Supported formats: newick, nexus, phyloxml
- Traverse clades: `for clade in tree.find_clades(): ...`
- Calculate distances: `tree.distance(clade1, clade2)`
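For a quick check of the tree operations above, a Newick string can be read from memory instead of a file. The three-leaf tree below is invented for illustration; the distances are sums of branch lengths along the path between the two clades.

```python
from io import StringIO

from Bio import Phylo

# A small made-up tree: A and B share an internal node, C branches from the root.
newick = "((A:1.0,B:2.0):0.5,C:3.0);"
tree = Phylo.read(StringIO(newick), "newick")

# Traverse only the terminal clades (leaves) and collect their names.
leaf_names = [clade.name for clade in tree.find_clades(terminal=True)]

# Look up clades by name, then measure the path length between them.
clade_a = tree.find_any(name="A")
clade_b = tree.find_any(name="B")
clade_c = tree.find_any(name="C")
dist_ab = tree.distance(clade_a, clade_b)  # 1.0 + 2.0
dist_ac = tree.distance(clade_a, clade_c)  # 1.0 + 0.5 + 3.0

# Text rendering that needs no plotting library.
Phylo.draw_ascii(tree)
print(leaf_names, dist_ab, dist_ac)
```

`Phylo.draw(tree)` produces a graphical plot instead and requires matplotlib.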
Structure Analysis (Bio.PDB)
- Parse PDB: `parser = PDBParser(); structure = parser.get_structure("id", "file.pdb")`
- Hierarchy: Structure > Model > Chain > Residue > Atom
- Get atoms: iterate through `structure.get_atoms()`
- Calculate distances: use atom coordinate vectors
- For mmCIF files: use `MMCIFParser()` instead of `PDBParser()`
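The hierarchy walk and a coordinate-based distance can be sketched as follows. To keep the example self-contained it builds a two-atom PDB snippet in memory; `atom_line` is a hypothetical helper and the coordinates are invented, so a real workflow would pass a file path to `get_structure` instead.

```python
from io import StringIO

import numpy as np
from Bio.PDB import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (illustrative helper)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


# Two alpha carbons placed 5 angstroms apart (made-up coordinates).
pdb_text = "\n".join([
    atom_line(1, " CA ", "GLY", "A", 1, 0.0, 0.0, 0.0, "C"),
    atom_line(2, " CA ", "ALA", "A", 2, 3.0, 4.0, 0.0, "C"),
    "END",
]) + "\n"

parser = PDBParser(QUIET=True)
structure = parser.get_structure("demo", StringIO(pdb_text))

# Walk the hierarchy: Structure > Model > Chain > Residue > Atom.
residue_names = [residue.get_resname()
                 for model in structure
                 for chain in model
                 for residue in chain]

# get_atoms() flattens the hierarchy into a single iterator of atoms.
ca_atoms = [atom for atom in structure.get_atoms() if atom.get_id() == "CA"]

# Distance from the coordinate vectors (NumPy arrays of x, y, z).
distance = float(np.linalg.norm(ca_atoms[0].coord - ca_atoms[1].coord))
print(residue_names, distance)
```

Subtracting two atoms directly (`ca_atoms[0] - ca_atoms[1]`) also returns their distance.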
Common Pitfalls
- Always handle `SeqIO.parse` as an iterator: it is exhausted after one pass
- Check that the sequence type (DNA, RNA, or protein) matches the operation; recent Biopython releases no longer attach an alphabet to `Seq` objects, so mismatches are not caught automatically
- Large files: use `SeqIO.index()`, not `SeqIO.to_dict()`, to avoid memory issues
- Set an appropriate timeout for remote BLAST queries (they can take minutes)
- Validate parsed data: missing annotations are common in public databases
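Two of these pitfalls are easy to reproduce on a small in-memory FASTA. The snippet below is a sketch with invented data; the `"unknown"` fallback is just one possible way to handle a missing annotation.

```python
from io import StringIO

from Bio import SeqIO

fasta_text = ">seq1\nATGCGA\n>seq2\nGGGTTTAAA\n"

# Pitfall: the parser is a one-shot iterator, so a second loop sees nothing.
records = SeqIO.parse(StringIO(fasta_text), "fasta")
first_pass = [rec.id for rec in records]
second_pass = [rec.id for rec in records]

# For small inputs, materialise a list when several passes are needed.
record_list = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Pitfall: annotations may be absent, so read them with a default value.
organisms = [rec.annotations.get("organism", "unknown") for rec in record_list]

print(first_pass, second_pass, organisms)
```

For large files, re-opening the file with a fresh `SeqIO.parse` call or using `SeqIO.index()` avoids holding every record in memory.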