Medical-research-skills biopython-advanced
Advanced Biopython modules for motifs, population genetics, sequence utilities, restriction analysis, clustering, and GenomeDiagram visualization; use when you need extended bioinformatics analysis beyond basic sequence I/O and alignment.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/biopython-advanced" ~/.claude/skills/aipoch-medical-research-skills-biopython-advanced && rm -rf "$T"
manifest:
scientific-skills/Data Analysis/biopython-advanced/SKILL.md
source content
biopython-advanced
When to Use
- You need motif discovery/statistics (e.g., PWM/consensus, motif counts across multiple sequences).
- You want restriction enzyme site analysis (e.g., find cut sites for specific enzymes in a DNA sequence).
- You need codon usage / sequence utility calculations (e.g., codon frequency from CDS, GC content, basic sequence stats).
- You are working with population genetics (PopGen) utilities for advanced analyses.
- You need advanced visualization such as GenomeDiagram-style plots for genomic features.
Key Features
- Motif analysis using Biopython's Bio.motifs (counts, consensus, simple statistics).
- Restriction analysis using Bio.Restriction (enzyme lookup, cut site detection).
- Sequence utilities via Bio.SeqUtils (codon usage and related helpers).
- Access to additional advanced tools such as CodonTable, SeqFeature, and IUPACData when needed.
- Standardized workflow conventions:
  - Write configuration to config/task_config.json as an intermediate artifact.
  - Run tasks uniformly via python scripts/<task_name>.py.
  - Avoid stacking many CLI flags; keep parameters in config files.
  - Always use encoding="utf-8" for file I/O; JSON output uses ensure_ascii=False.
Dependencies
Required:
- biopython (>=1.80)
- numpy (>=1.21)
Optional (for reporting/plotting):
- reportlab (>=3.6)
- matplotlib (>=3.5)
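The version floors above can be checked before running any task. The following is a minimal sketch using only the standard library; the helper names (parse_version, check) are hypothetical, and the version parser is deliberately naive (it stops at the first non-numeric component), so treat it as an illustration rather than a replacement for a real dependency resolver.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
REQUIRED = {"biopython": (1, 80), "numpy": (1, 21)}
OPTIONAL = {"reportlab": (3, 6), "matplotlib": (3, 5)}


def parse_version(text):
    """Turn '1.81' or '1.26.4' into a tuple of ints.

    Parsing stops at the first non-numeric component, so '2.0.0rc1'
    becomes (2, 0). This is a simplification, not full PEP 440 parsing.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check(requirements):
    """Return a list of problems; an empty list means every floor is met."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name}: found {installed}, need >= {wanted}")
    return problems


if __name__ == "__main__":
    for label, reqs in (("required", REQUIRED), ("optional", OPTIONAL)):
        for problem in check(reqs):
            print(f"[{label}] {problem}")
```

Running the script prints nothing when all listed packages satisfy their minimum versions.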
Example Usage
The following examples are complete runnable scripts that follow the conventions:
- configuration stored in config/task_config.json
- invoked as python scripts/<task_name>.py
- explicit UTF-8 encoding and ensure_ascii=False for JSON output
1) Motif Statistics
config/task_config.json
{ "task": "motif_stats", "sequences": ["ATGCATGCATGC", "ATGCGTGCATGC", "ATGCATGTATGC"] }
scripts/motif_stats.py
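import json
import os

from Bio import motifs
from Bio.Seq import Seq


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # motifs.create expects sequences of equal length.
    seqs = [Seq(s) for s in cfg["sequences"]]
    m = motifs.create(seqs)

    result = {
        "alphabet": str(m.alphabet),
        "length": m.length,
        # m.counts[letter] holds the per-position counts for that letter;
        # convert to plain ints so the values are JSON-serializable.
        "counts": {letter: [int(n) for n in m.counts[letter]] for letter in m.alphabet},
        "consensus": str(m.consensus),
        "degenerate_consensus": str(m.degenerate_consensus),
    }

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/motif_stats.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()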
Run:
python scripts/motif_stats.py
2) Restriction Enzyme Cleavage Sites
config/task_config.json
{ "task": "restriction_sites", "sequence": "GAATTCGCGGAATTC", "enzymes": ["EcoRI", "BamHI"] }
scripts/restriction_sites.py
import json
import os

from Bio.Seq import Seq
from Bio.Restriction import RestrictionBatch


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    seq = Seq(cfg["sequence"])
    # Enzyme names from the config are resolved to enzyme objects by RestrictionBatch.
    batch = RestrictionBatch(cfg["enzymes"])
    analysis = batch.search(seq)

    # Convert enzyme keys to strings for JSON serialization.
    result = {str(enzyme): positions for enzyme, positions in analysis.items()}

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/restriction_sites.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
Run:
python scripts/restriction_sites.py
3) Codon Usage Frequency (CDS)
config/task_config.json
{ "task": "codon_usage", "cds": "ATGGCTGCTGCTGCTTAA" }
scripts/codon_usage.py
import json
import os
from collections import Counter


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # Remove all whitespace and normalize case.
    cds = "".join(cfg["cds"].split()).upper()

    # Split into in-frame triplets; trailing bases that do not fill a codon are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - (len(cds) % 3), 3)]
    counts = Counter(codons)
    total = sum(counts.values())

    result = {
        "total_codons": total,
        "codon_counts": dict(sorted(counts.items())),
        # With an empty CDS, counts is empty and no division takes place.
        "codon_frequencies": {k: v / total for k, v in sorted(counts.items())},
        "note": "This example computes raw codon frequencies from the provided CDS. Validate CDS frame and stop codons for your use case.",
    }

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/codon_usage.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
Run:
python scripts/codon_usage.py
Implementation Details
- Configuration-first execution
  - All task parameters are stored in config/task_config.json to keep CLI invocation stable and reproducible.
  - Scripts read the config as the single source of truth and write results to outputs/*.json.
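One way to make the config the single source of truth is to validate it as soon as it is read. The sketch below is an assumption about how that could look: the REQUIRED_KEYS mapping and the load_config helper are hypothetical names, and the key lists are simply copied from the three example configs.

```python
import json

# Hypothetical mapping from task name to the keys its script reads;
# the key names come from the example configs in this document.
REQUIRED_KEYS = {
    "motif_stats": ["sequences"],
    "restriction_sites": ["sequence", "enzymes"],
    "codon_usage": ["cds"],
}


def load_config(expected_task, path="config/task_config.json"):
    """Read the config file and fail early if it does not match the task."""
    with open(path, "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # The "task" field guards against running a script on another task's config.
    if cfg.get("task") != expected_task:
        raise ValueError(
            f"config is for task {cfg.get('task')!r}, not {expected_task!r}"
        )

    missing = [key for key in REQUIRED_KEYS[expected_task] if key not in cfg]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    return cfg
```

A script would then start with cfg = load_config("codon_usage"), so a stale config left over from another task is reported immediately instead of surfacing later as a confusing KeyError.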
- Motif statistics (Bio.motifs)
  - A motif is created from aligned sequences of equal length.
  - Outputs typically include:
    - counts: per-position nucleotide counts
    - consensus and degenerate_consensus: derived consensus sequences
  - If sequences differ in length, you must align/trim/pad them before motif creation.
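To make the equal-length requirement and the meaning of per-position counts concrete, here is a standard-library sketch. It is not how Bio.motifs is implemented; position_counts, majority_consensus, and trim_to_shortest are hypothetical helpers, and ties in the consensus are broken by alphabet order, which may differ from Biopython's behavior.

```python
def position_counts(seqs, alphabet="ACGT"):
    """Count each letter at each position; all sequences must share one length."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError(f"sequences must share one length, got {sorted(lengths)}")
    length = lengths.pop()
    counts = {letter: [0] * length for letter in alphabet}
    for s in seqs:
        for i, base in enumerate(s.upper()):
            counts[base][i] += 1  # raises KeyError for letters outside the alphabet
    return counts


def majority_consensus(counts):
    """Pick the most frequent letter per position (ties go to alphabet order)."""
    length = len(next(iter(counts.values())))
    return "".join(
        max(counts, key=lambda letter: counts[letter][i]) for i in range(length)
    )


def trim_to_shortest(seqs):
    """Naive fix for unequal lengths: keep only the shared left-anchored prefix.

    Only appropriate when the sequences are already aligned at their start.
    """
    shortest = min(len(s) for s in seqs)
    return [s[:shortest] for s in seqs]


if __name__ == "__main__":
    seqs = ["ATGCATGCATGC", "ATGCGTGCATGC", "ATGCATGTATGC"]
    counts = position_counts(seqs)
    print(counts["A"])
    print(majority_consensus(counts))
```

For the three sequences from the motif example, the fifth position holds two A's and one G, so the count list for A has a 2 at index 4.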
- Restriction analysis (Bio.Restriction)
  - RestrictionBatch(enzymes).search(seq) returns cut positions per enzyme.
  - Enzyme objects are converted to strings for JSON serialization.
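To show what a list of cut positions per enzyme can mean, the sketch below scans for recognition sites with plain string search. It is an illustration, not Bio.Restriction's implementation: the SITES table is hand-written for two well-known enzymes (EcoRI, G^AATTC; BamHI, G^GATCC), find_cuts is a hypothetical helper, and the reported coordinate (1-based index of the first base after the cut) is an assumption about the convention, so verify it against the Bio.Restriction documentation before comparing numbers.

```python
# Hand-written illustration data: recognition site and the number of bases
# of the site that lie upstream of the top-strand cut.
SITES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def find_cuts(seq, site, bases_before_cut):
    """Return 1-based positions of the first base after each cut.

    Scans the given strand only, which is sufficient for palindromic
    sites such as the two listed above. Overlapping sites are found.
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        # start is 0-based; +bases_before_cut moves past the cut, +1 makes it 1-based.
        positions.append(start + bases_before_cut + 1)
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    sequence = "GAATTCGCGGAATTC"
    result = {name: find_cuts(sequence, site, n) for name, (site, n) in SITES.items()}
    print(result)
```

Under these assumptions, the example sequence yields two EcoRI positions and an empty list for BamHI, which is the same dictionary shape the restriction script writes to JSON.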
- Codon usage
  - The example computes codon frequencies by splitting the CDS into triplets in-frame.
  - Practical considerations:
    - Ensure the CDS length is a multiple of 3 (or decide how to handle remainder bases).
    - Confirm the correct reading frame and whether to include terminal stop codons.
  - For organism-specific genetic codes (for example, alternative stop codons), integrate Bio.Data.CodonTable as needed; it provides translation tables rather than codon usage frequencies.
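The practical considerations above can be turned into a small pre-flight check. This is a minimal sketch assuming the standard genetic code (stop codons TAA, TAG, TGA) and an ATG start; check_cds is a hypothetical helper, and organisms with alternative codes would need a different stop set.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def check_cds(cds):
    """Report basic frame and stop-codon facts about a coding sequence."""
    cds = "".join(cds.split()).upper()
    remainder = len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, len(cds) - remainder, 3)]
    return {
        "length": len(cds),
        # Bases left over after the last complete codon.
        "remainder_bases": remainder,
        "starts_with_atg": cds.startswith("ATG"),
        # Only meaningful when the sequence ends exactly on a codon boundary.
        "ends_with_stop": bool(codons) and remainder == 0 and codons[-1] in STOP_CODONS,
        # 0-based codon indices of stop codons before the final codon.
        "internal_stops": [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS],
    }


if __name__ == "__main__":
    print(check_cds("ATGGCTGCTGCTGCTTAA"))
```

The CDS from the codon usage example passes all checks: 18 bases, no remainder, an ATG start, a terminal TAA, and no internal stops. Whether the terminal stop should be counted in the frequencies remains a per-analysis decision.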
- I/O requirements
  - Always open files with encoding="utf-8".
  - Use json.dump(..., ensure_ascii=False) to preserve non-ASCII characters in outputs.
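The effect of ensure_ascii=False is easiest to see side by side with the default. The sketch below uses a made-up record containing a Greek letter; dump_utf8 and load_utf8 are hypothetical helper names that simply bundle the two I/O rules above.

```python
import json

# Made-up record for illustration; the note contains a non-ASCII character.
record = {"task": "demo", "note": "β-globin"}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the character is written as-is, so the file must
# be opened with an encoding that can represent it (UTF-8 here).
readable = json.dumps(record, ensure_ascii=False)


def dump_utf8(obj, path):
    """Write JSON following the conventions: UTF-8 file, no ASCII escaping."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def load_utf8(path):
    """Read JSON back with the matching encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Both forms parse back to the same data; the difference is only in how readable the output file is for a person inspecting it.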
- Further reference
  - See references/advanced.md for additional notes and module coverage (motifs/PopGen/SeqUtils/Restriction/Cluster, GenomeDiagram, CodonTable/SeqFeature/IUPACData).