AutoResearchClaw biology-biopython

Bioinformatics with Biopython for sequence manipulation, file parsing, BLAST, and phylogenetics. Use when working with DNA/RNA/protein sequences or biological databases.

install

source · Clone the upstream repo

git clone https://github.com/aiming-lab/AutoResearchClaw

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/domain/biology-biopython" ~/.claude/skills/aiming-lab-autoresearchclaw-biology-biopython-82978a && rm -rf "$T"

manifest: researchclaw/skills/builtin/domain/biology-biopython/SKILL.md

source content

Biopython Bioinformatics Best Practice

Sequence Manipulation

Create sequences:

from Bio.Seq import Seq; seq = Seq("ATGCGA")

Complement:

seq.complement()

Reverse complement:

seq.reverse_complement()

Transcription (DNA to RNA):
```
seq.transcribe()
```

Translation (DNA/RNA to protein):
```
seq.translate()
```

GC content:

from Bio.SeqUtils import gc_fraction; gc_fraction(seq)

Molecular weight:

from Bio.SeqUtils import molecular_weight; molecular_weight(seq)
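
The one-liners above can be combined into a short script. This is a minimal sketch, assuming Biopython 1.80 or later (the release that introduced `gc_fraction`); the variable names are illustrative only.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

seq = Seq("ATGCGA")

complement = seq.complement()          # base-by-base complement
rev_comp = seq.reverse_complement()    # complement, read in the opposite direction
rna = seq.transcribe()                 # DNA to RNA (T becomes U)
protein = seq.translate()              # standard genetic code unless a table is given
gc = gc_fraction(seq)                  # a fraction between 0 and 1, not a percentage
mass = molecular_weight(seq, seq_type="DNA")  # seq_type stated explicitly for clarity

print(complement, rev_comp, rna, protein, gc, round(mass, 1))
```

Note that `gc_fraction` returns a fraction, so multiply by 100 if a percentage is needed.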

File Parsing (SeqIO)

Read FASTA:

from Bio import SeqIO
for rec in SeqIO.parse("file.fasta", "fasta"): ...

Read GenBank:

for rec in SeqIO.parse("file.gb", "genbank"): ...

Read single record:
```
rec = SeqIO.read("file.fasta", "fasta")
```

Write sequences:

SeqIO.write(records, "output.fasta", "fasta")

Convert formats:

SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")

Index large files for random access:

idx = SeqIO.index("large.fasta", "fasta")
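
The parsing, writing, and indexing calls can be tried without any input files. The sketch below uses an in-memory FASTA string and a temporary file; `FASTA_TEXT` and the file name `example.fasta` are made up for illustration.

```python
import os
import tempfile
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA used only for this demonstration.
FASTA_TEXT = ">seq1 first example\nATGCGA\n>seq2 second example\nGGGTTTAAA\n"

# SeqIO.parse accepts a filename or an open text handle.
records = list(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))
ids = [rec.id for rec in records]

# SeqIO.write returns the number of records it wrote.
out = StringIO()
written = SeqIO.write(records, out, "fasta")

# SeqIO.index works on a file path and loads each record only when requested.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fasta")
    with open(path, "w") as fh:
        fh.write(FASTA_TEXT)
    idx = SeqIO.index(path, "fasta")
    second_seq = str(idx["seq2"].seq)
    idx.close()  # release the file handle before the directory is removed

print(ids, written, second_seq)
```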

BLAST Operations

Online BLAST:

from Bio.Blast import NCBIWWW; result = NCBIWWW.qblast("blastn", "nt", seq)

Parse results:

from Bio.Blast import NCBIXML; records = NCBIXML.parse(result)

Local BLAST: run BLAST+ via subprocess with XML output (-outfmt 5), then parse the result with NCBIXML.

Always set
```
Entrez.email
```
before any NCBI access.

Filter results by e-value (typically < 1e-5) and query coverage.
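
The filtering step might look like the sketch below. It assumes record objects shaped like those produced by `NCBIXML.parse` (attributes `alignments`, `hsps`, `expect`, `query_start`, `query_end`, `query_letters`). The function name `filter_hits`, the 0.8 coverage threshold, and the definition of coverage as HSP query span divided by query length are illustrative assumptions, not part of Biopython. A stand-in record built with `SimpleNamespace` replaces real BLAST output so the example runs offline.

```python
from types import SimpleNamespace


def filter_hits(blast_record, max_evalue=1e-5, min_coverage=0.8):
    """Return (title, e-value, coverage) for HSPs that pass both thresholds.

    Coverage here is the HSP's span on the query divided by the query length;
    other definitions exist, so adjust to your own convention.
    """
    kept = []
    query_len = blast_record.query_letters
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            span = abs(hsp.query_end - hsp.query_start) + 1
            coverage = span / query_len
            if hsp.expect < max_evalue and coverage >= min_coverage:
                kept.append((alignment.title, hsp.expect, coverage))
    return kept


def _hit(title, expect, start, end):
    """Build a fake alignment with a single HSP (demo data only)."""
    hsp = SimpleNamespace(expect=expect, query_start=start, query_end=end)
    return SimpleNamespace(title=title, hsps=[hsp])


# Stand-in for one parsed BLAST record; the values are invented.
demo_record = SimpleNamespace(
    query_letters=100,
    alignments=[
        _hit("hit_good", 1e-30, 1, 95),    # significant and long
        _hit("hit_short", 1e-20, 1, 40),   # significant but covers only 40%
        _hit("hit_weak", 0.5, 1, 100),     # full length but not significant
    ],
)

passing = filter_hits(demo_record)
print(passing)
```

With real data, the same function would be called on each record yielded by `NCBIXML.parse(result)`.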

NCBI Database Access (Entrez)

Always set email:
```
from Bio import Entrez
Entrez.email = "your@email.com"
```

Search:

handle = Entrez.esearch(db="pubmed", term="query")

Fetch records:

handle = Entrez.efetch(db="nucleotide", id="ID", rettype="fasta")

Set Entrez.api_key for the higher rate limit (10 requests/s with a key vs 3 requests/s without).
Respect NCBI rate limits; add delays between batch requests.
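
A batch fetch with explicit pauses could be sketched as follows. The function `fetch_fasta_batch`, its `fetch` and `sleep` hooks, and the delay values (about 0.34 s without a key, 0.1 s with one) are assumptions chosen to stay under the limits quoted above. The hooks exist only so the pacing logic can be exercised without network access.

```python
import time


def fetch_fasta_batch(ids, email, api_key=None, fetch=None, sleep=time.sleep):
    """Fetch nucleotide records one at a time, pausing between requests.

    `fetch` and `sleep` are hypothetical injection points for offline testing;
    by default the function calls Entrez.efetch and time.sleep.
    """
    if fetch is None:
        from Bio import Entrez  # imported lazily so offline use needs no network setup

        Entrez.email = email
        if api_key:
            Entrez.api_key = api_key

        def fetch(record_id):
            handle = Entrez.efetch(
                db="nucleotide", id=record_id, rettype="fasta", retmode="text"
            )
            try:
                return handle.read()
            finally:
                handle.close()

    # Assumed pacing: just under 3 req/s without a key, 10 req/s with one.
    delay = 0.1 if api_key else 0.34
    results = {}
    for position, record_id in enumerate(ids):
        if position > 0:
            sleep(delay)  # pause between consecutive requests
        results[record_id] = fetch(record_id)
    return results


# Offline demonstration with a fake fetcher and a recorded sleep.
pauses = []
demo = fetch_fasta_batch(
    ["ID1", "ID2", "ID3"],
    email="your@email.com",
    fetch=lambda record_id: f">{record_id}\nATGC\n",
    sleep=pauses.append,
)
print(sorted(demo), pauses)
```

For real use, drop the `fetch` and `sleep` arguments and pass actual accession IDs.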

Phylogenetics (Bio.Phylo)

Read trees:

from Bio import Phylo; tree = Phylo.read("tree.nwk", "newick")

Draw trees (the graphical version requires matplotlib):
```
Phylo.draw(tree)
```
or, as plain text:
```
Phylo.draw_ascii(tree)
```

Supported formats include newick, nexus, phyloxml, and nexml.

Traverse clades:
```
for clade in tree.find_clades(): ...
```

Calculate distances:
```
tree.distance(clade1, clade2)
```
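
The tree operations above can be tried on a small in-memory tree. This is a minimal sketch; the Newick string and its branch lengths are invented for illustration.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical three-leaf tree with explicit branch lengths.
NEWICK = "((A:1,B:2):1,C:4);"
tree = Phylo.read(StringIO(NEWICK), "newick")

# Leaves in the order they appear in the tree.
leaf_names = [leaf.name for leaf in tree.get_terminals()]

# Look up clades by name, then measure path length through the common ancestor.
clade_a = tree.find_any(name="A")
clade_b = tree.find_any(name="B")
clade_c = tree.find_any(name="C")
dist_ab = tree.distance(clade_a, clade_b)  # 1 + 2
dist_ac = tree.distance(clade_a, clade_c)  # 1 + 1 + 4

# draw_ascii needs no plotting library; here it writes into a string buffer.
buffer = StringIO()
Phylo.draw_ascii(tree, file=buffer)
print(buffer.getvalue())
print(leaf_names, dist_ab, dist_ac)
```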

Structure Analysis (Bio.PDB)

Parse PDB:

from Bio.PDB import PDBParser; parser = PDBParser(); structure = parser.get_structure("id", "file.pdb")

Hierarchy: Structure > Model > Chain > Residue > Atom
Get atoms: iterate through
```
structure.get_atoms()
```

Calculate distances: subtract two Atom objects (atom1 - atom2) or work with their coordinate vectors directly.

For mmCIF files: use
```
MMCIFParser()
```
instead of
```
PDBParser()
```
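
Navigating the hierarchy and measuring a distance could look like the sketch below. To stay self-contained it builds a two-atom PDB fragment in memory; the helper `atom_line`, the coordinates, and the structure ID `demo` are made up, and a real workflow would pass a file path to `get_structure` instead.

```python
from io import StringIO

from Bio.PDB import PDBParser


def atom_line(serial, resseq, x, y, z):
    """Format one fixed-column ATOM record for an alanine CA atom in chain A."""
    return (
        f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"{'':10s}{'C':>2s}\n"
    )


# Hypothetical fragment: two CA atoms placed 5 angstroms apart.
pdb_text = atom_line(1, 1, 0.0, 0.0, 0.0) + atom_line(2, 2, 3.0, 4.0, 0.0) + "END\n"

parser = PDBParser(QUIET=True)  # QUIET suppresses construction warnings
structure = parser.get_structure("demo", StringIO(pdb_text))

# Structure > Model > Chain > Residue > Atom
chain = structure[0]["A"]
ca_first = chain[1]["CA"]
ca_second = chain[2]["CA"]

# Subtracting two Atom objects yields the distance between them.
distance = ca_first - ca_second
atom_count = sum(1 for _ in structure.get_atoms())

print(round(float(distance), 3), atom_count)
```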

Common Pitfalls

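Always handle
```
SeqIO.parse
```
as an iterator: it is exhausted after one pass, so wrap it in list() if you need to loop more than once.

Biopython 1.78 removed sequence alphabets; check the molecule type yourself (for example DNA vs protein) before operations such as translation.
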
Large files: use
```
SeqIO.index()
```
not
```
SeqIO.to_dict()
```
to avoid memory issues.

Allow a generous timeout for remote BLAST queries; they can take minutes.

Validate parsed data; missing annotations are common in public databases.
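
Two of these pitfalls are easy to reproduce. The sketch below shows the one-pass behaviour of `SeqIO.parse` and a defensive lookup for an annotation that may be absent; the FASTA text and the fallback value `"unknown"` are illustrative choices.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical input used only for this demonstration.
FASTA_TEXT = ">seq1\nATGCGA\n>seq2\nGGGTTT\n"

# The parser is a one-shot iterator: the second loop sees nothing.
records = SeqIO.parse(StringIO(FASTA_TEXT), "fasta")
first_pass = [rec.id for rec in records]
second_pass = [rec.id for rec in records]

# Materialize once if several passes are needed (reasonable for small inputs only).
record_list = list(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))

# FASTA carries no organism annotation, so use .get() with a fallback
# instead of indexing the annotations dictionary directly.
organisms = [rec.annotations.get("organism", "unknown") for rec in record_list]

print(first_pass, second_pass, organisms)
```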