OpenClaw-Medical-Skills bio-format-conversion
Convert between sequence file formats (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when changing file formats or preparing data for different tools.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-format-conversion" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-format-conversion && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-format-conversion" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-format-conversion && rm -rf "$T"
skills/bio-format-conversion/SKILL.md
Version Compatibility
Reference examples tested with: BioPython 1.83+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package>, then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
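The version check described above can also be scripted. The following is a minimal sketch using only the standard library; the helper names installed_version, version_tuple, and meets_minimum are illustrative, and it assumes the package is registered with pip under the distribution name 'biopython'.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.83' into (1, 83); stops at the first non-numeric component."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`, comparing numeric components."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == '__main__':
    found = installed_version('biopython')  # assumed distribution name
    if found is None:
        print('biopython is not installed')
    elif not meets_minimum(found, '1.83'):
        print(f'biopython {found} is older than the tested 1.83; check signatures')
    else:
        print(f'biopython {found} is within the tested range')
```

If the reported version is older than the tested one, inspecting the function with help() before adapting an example remains the safer route.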
Format Conversion
"Convert this file to a different format" → Read records in one format, optionally add missing annotations, and write in the target format.
- Python: SeqIO.convert() for direct conversion, or SeqIO.parse() + SeqIO.write() when modifications are needed (BioPython)
- CLI: seqkit seq (SeqKit) for FASTA/FASTQ; samtools view for SAM/BAM/CRAM
Convert sequence files between formats using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
Core Function
SeqIO.convert() - Direct Conversion
Convert between formats in a single call. This is usually the most efficient method, because Bio.SeqIO can take format-specific shortcuts for some format pairs.
count = SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
print(f'Converted {count} records')
Parameters:
- in_file - Input filename or handle
- in_format - Input format string
- out_file - Output filename or handle
- out_format - Output format string
Returns: Number of records converted
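SeqIO.convert() takes the format names as explicit strings and does not infer them from filenames, so a small lookup can help when many files are involved. The sketch below maps common file extensions to Bio.SeqIO format names; the guess_format helper and the extension table are assumptions for illustration and are not part of Biopython.

```python
from pathlib import Path

# Hypothetical extension table; extend it to match local naming habits.
EXTENSION_TO_FORMAT = {
    '.fasta': 'fasta', '.fa': 'fasta', '.fna': 'fasta', '.faa': 'fasta',
    '.fastq': 'fastq', '.fq': 'fastq',
    '.gb': 'genbank', '.gbk': 'genbank',
    '.embl': 'embl',
}


def guess_format(filename):
    """Return the Bio.SeqIO format name suggested by the file extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f'No known sequence format for extension {suffix!r}') from None
```

A call such as SeqIO.convert(path, guess_format(path), 'output.fasta', 'fasta') would then pick the input format from the name. Compressed files (for example a name ending in .gz) are deliberately rejected by this table.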
Common Conversions
| From | To | Notes |
|---|---|---|
| GenBank | FASTA | Loses annotations, keeps sequence |
| FASTA | GenBank | Need to add molecule_type |
| FASTQ | FASTA | Loses quality scores |
| FASTA | FASTQ | Need to add quality scores |
| GenBank | EMBL | Usually works directly |
| Stockholm | FASTA | Alignment to sequences |
Code Patterns
Simple Conversion
SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
GenBank to FASTA
SeqIO.convert('sequence.gb', 'genbank', 'sequence.fasta', 'fasta')
FASTQ to FASTA (drop quality)
SeqIO.convert('reads.fastq', 'fastq', 'reads.fasta', 'fasta')
FASTA to GenBank (requires molecule_type)
Goal: Convert FASTA to GenBank format, which requires molecule_type annotation.
Approach: Stream records through a generator that injects the missing annotation, then write.
Reference (BioPython 1.83+):
records = SeqIO.parse('input.fasta', 'fasta')

def add_molecule_type(records):
    for record in records:
        record.annotations['molecule_type'] = 'DNA'
        yield record

SeqIO.write(add_molecule_type(records), 'output.gb', 'genbank')
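As an alternative sketch, SeqIO.convert() accepts an optional molecule_type argument in recent Biopython releases, which sets the annotation on every record during conversion. The example below uses in-memory handles and a made-up sequence so that it runs without input files. It is worth confirming with help(SeqIO.convert) that the installed version has this argument before relying on it.

```python
from io import StringIO

from Bio import SeqIO

# Made-up single-record FASTA used only for demonstration.
fasta_text = '>seq1 example sequence\nATGGCCATTGTAATGGGCCGC\n'

in_handle = StringIO(fasta_text)
out_handle = StringIO()

# molecule_type is applied to each record before it is written as GenBank.
count = SeqIO.convert(in_handle, 'fasta', out_handle, 'genbank', molecule_type='DNA')

genbank_text = out_handle.getvalue()
print(f'Converted {count} record(s)')
print(genbank_text.splitlines()[0])  # the LOCUS line, which carries the molecule type
```

The generator approach remains useful when different records need different annotations, or when other fields must be filled in at the same time.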
FASTA to FASTQ (add dummy quality)
Goal: Convert FASTA to FASTQ by assigning uniform placeholder quality scores.
Approach: Stream records through a generator that adds phred_quality to each, then write as FASTQ.
Reference (BioPython 1.83+):
def add_quality(records, quality=30):
    for record in records:
        record.letter_annotations['phred_quality'] = [quality] * len(record.seq)
        yield record

records = SeqIO.parse('input.fasta', 'fasta')
SeqIO.write(add_quality(records), 'output.fastq', 'fastq')
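To see what the placeholder scores look like in the output file, the following sketch encodes Phred values the way the Sanger FASTQ variant does (ASCII offset 33). It is a standalone illustration using only the standard library, not Biopython's writer, and the function names are hypothetical.

```python
SANGER_OFFSET = 33  # ASCII offset used by the Sanger FASTQ variant


def encode_sanger(phred_scores):
    """Encode integer Phred scores as a Sanger FASTQ quality string."""
    return ''.join(chr(score + SANGER_OFFSET) for score in phred_scores)


def error_probability(phred):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


sequence = 'ACGTACGT'
quality_line = encode_sanger([30] * len(sequence))
print(quality_line)  # ????????
print(error_probability(30))
```

A uniform score of 30 therefore shows up as a run of '?' characters, one per base. Downstream tools will treat these as real qualities, so a converted file of this kind should be labeled as carrying placeholder scores.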
Batch Convert Multiple Files
Goal: Convert all files of one format in a directory to another format.
Approach: Glob for input files, apply SeqIO.convert() to each, and report per-file counts.
Reference (BioPython 1.83+):
from pathlib import Path

for gb_file in Path('.').glob('*.gb'):
    fasta_file = gb_file.with_suffix('.fasta')
    count = SeqIO.convert(str(gb_file), 'genbank', str(fasta_file), 'fasta')
    print(f'{gb_file.name}: {count} records')
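A slightly more defensive variant of the loop above might skip inputs whose output already exists and collect the counts in a dictionary. In this sketch, batch_convert and its parameters are hypothetical. The converter is passed in as a callable with the same argument order as SeqIO.convert(), which keeps the file-handling logic testable on its own.

```python
from pathlib import Path


def batch_convert(directory, pattern, in_format, out_suffix, out_format,
                  convert, overwrite=False):
    """Convert every file matching `pattern`; return {input name: record count}.

    `convert` is any callable taking (in_file, in_format, out_file, out_format)
    and returning a record count, for example Bio.SeqIO.convert.
    """
    counts = {}
    for in_path in sorted(Path(directory).glob(pattern)):
        out_path = in_path.with_suffix(out_suffix)
        if out_path.exists() and not overwrite:
            continue  # leave existing outputs untouched
        counts[in_path.name] = convert(str(in_path), in_format,
                                       str(out_path), out_format)
    return counts
```

With Biopython installed, this could be called as batch_convert('.', '*.gb', 'genbank', '.fasta', 'fasta', SeqIO.convert).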
Convert with Modifications
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Build a new record with an uppercased sequence; only id and description are kept
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
Alignment Format Conversion
from Bio import AlignIO

AlignIO.convert('alignment.sto', 'stockholm', 'alignment.phy', 'phylip')
Format Compatibility Matrix
Can convert directly (no modifications needed):
- GenBank <-> EMBL
- FASTA -> any format that needs only an ID and a sequence (targets such as FASTQ or GenBank need data added first)
- Any format -> FASTA (always works, may lose data)
- FASTQ -> FASTA
Requires adding data:
- FASTA -> FASTQ (need quality scores)
- FASTA -> GenBank (need molecule_type)
May lose data:
- GenBank -> FASTA (loses features, annotations)
- FASTQ -> FASTA (loses quality scores)
- Any rich format -> FASTA
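The matrix above can be captured in a small lookup that reports what a conversion needs before any file is touched. This is a sketch that assumes only the format pairs listed in this document matter; REQUIRED_DATA, RICH_FORMATS, and describe_conversion are illustrative names, not part of Biopython.

```python
# Data that must be added before writing, keyed by (input, output) format.
REQUIRED_DATA = {
    ('fasta', 'fastq'): "letter_annotations['phred_quality']",
    ('fasta', 'genbank'): "annotations['molecule_type']",
}

# Formats that carry more than an ID and a sequence.
RICH_FORMATS = {'genbank', 'embl', 'fastq'}


def describe_conversion(in_format, out_format):
    """Return (strategy, note) for a conversion between two format names."""
    needed = REQUIRED_DATA.get((in_format, out_format))
    if needed is not None:
        return 'parse-modify-write', f'add {needed} to each record'
    if out_format == 'fasta' and in_format in RICH_FORMATS:
        return 'convert', 'works directly but drops data FASTA cannot hold'
    return 'convert', 'no changes expected to be needed'


for pair in [('genbank', 'fasta'), ('fasta', 'fastq'), ('genbank', 'embl')]:
    print(pair, describe_conversion(*pair))
```

Pairs that are not covered by the table fall through to the plain convert strategy, so a failure on an unlisted pair is a cue to extend REQUIRED_DATA instead of retrying.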
Common Errors
| Cause | Solution |
|---|---|
| FASTA to GenBank | Add molecule_type annotation |
| FASTA to FASTQ | Add phred_quality to letter_annotations |
| Wrong FASTQ variant | Try 'fastq-sanger', 'fastq-illumina' |
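When the FASTQ variant is unknown, one common heuristic is to look at the lowest quality character across many reads. The sketch below applies that idea with the standard library only. guess_fastq_variant is a hypothetical helper, and its answer is a hint, not a guarantee, since a file containing only high-quality reads is ambiguous.

```python
def guess_fastq_variant(quality_strings):
    """Guess a Bio.SeqIO FASTQ format name from raw quality lines.

    Sanger encoding starts at ASCII 33, Solexa at 59, and the older
    Illumina encoding at 64, so the smallest character seen narrows it down.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(ch) for line in quality_strings for ch in line)
    if lowest < 59:
        return 'fastq-sanger'
    if lowest < 64:
        return 'fastq-solexa'
    # Ambiguous: could also be Sanger data with uniformly high scores.
    return 'fastq-illumina'


print(guess_fastq_variant(['IIII#', 'FFFF']))  # fastq-sanger
```

Sampling quality lines from a few thousand reads tends to make the guess more reliable than looking at a single record.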
Decision Tree
Converting formats?
├── Simple conversion (no data changes)?
│   └── Use SeqIO.convert() directly
├── Need to add annotations?
│   └── Parse, modify records, then write
├── Need to transform sequences?
│   └── Parse, apply transformation, then write
└── Multiple files?
    └── Loop with SeqIO.convert() or batch generator
Related Skills
- read-sequences - Parse sequences for custom conversion logic
- write-sequences - Write converted sequences with modifications
- batch-processing - Convert multiple files at once
- compressed-files - Handle compressed input/output during conversion
- alignment-files - For SAM/BAM/CRAM conversion, use samtools view