Claude-skill-registry bio-format-conversion
Convert between sequence file formats (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when changing file formats or preparing data for different tools.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/format-conversion" ~/.claude/skills/majiayu000-claude-skill-registry-bio-format-conversion && rm -rf "$T"
manifest:
skills/data/format-conversion/SKILL.mdsource content
Format Conversion
Convert sequence files between formats using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
Core Function
SeqIO.convert() - Direct Conversion
Convert between formats in a single call. Most efficient method.
count = SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta') print(f'Converted {count} records')
Parameters:
- Input filename or handlein_file
- Input format stringin_format
- Output filename or handleout_file
- Output format stringout_format
Returns: Number of records converted
Common Conversions
| From | To | Notes |
|---|---|---|
| GenBank | FASTA | Loses annotations, keeps sequence |
| FASTA | GenBank | Need to add molecule_type |
| FASTQ | FASTA | Loses quality scores |
| FASTA | FASTQ | Need to add quality scores |
| GenBank | EMBL | Usually works directly |
| Stockholm | FASTA | Alignment to sequences |
Code Patterns
Simple Conversion
SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
GenBank to FASTA
SeqIO.convert('sequence.gb', 'genbank', 'sequence.fasta', 'fasta')
FASTQ to FASTA (drop quality)
SeqIO.convert('reads.fastq', 'fastq', 'reads.fasta', 'fasta')
FASTA to GenBank (requires molecule_type)
records = SeqIO.parse('input.fasta', 'fasta') def add_molecule_type(records): for record in records: record.annotations['molecule_type'] = 'DNA' yield record SeqIO.write(add_molecule_type(records), 'output.gb', 'genbank')
FASTA to FASTQ (add dummy quality)
def add_quality(records, quality=30): for record in records: record.letter_annotations['phred_quality'] = [quality] * len(record.seq) yield record records = SeqIO.parse('input.fasta', 'fasta') SeqIO.write(add_quality(records), 'output.fastq', 'fastq')
Batch Convert Multiple Files
from pathlib import Path for gb_file in Path('.').glob('*.gb'): fasta_file = gb_file.with_suffix('.fasta') count = SeqIO.convert(str(gb_file), 'genbank', str(fasta_file), 'fasta') print(f'{gb_file.name}: {count} records')
Convert with Modifications
from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord def uppercase_record(rec): return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description) records = SeqIO.parse('input.fasta', 'fasta') modified = (uppercase_record(rec) for rec in records) SeqIO.write(modified, 'output.fasta', 'fasta')
Alignment Format Conversion
from Bio import AlignIO AlignIO.convert('alignment.sto', 'stockholm', 'alignment.phy', 'phylip')
Format Compatibility Matrix
Can convert directly (no modifications needed):
- GenBank <-> EMBL
- FASTA -> any format (may need annotations added)
- Any format -> FASTA (always works, may lose data)
- FASTQ -> FASTA
Requires adding data:
- FASTA -> FASTQ (need quality scores)
- FASTA -> GenBank (need molecule_type)
May lose data:
- GenBank -> FASTA (loses features, annotations)
- FASTQ -> FASTA (loses quality scores)
- Any rich format -> FASTA
Common Errors
| Error | Cause | Solution |
|---|---|---|
| FASTA to GenBank | Add molecule_type annotation |
| FASTA to FASTQ | Add phred_quality to letter_annotations |
| Wrong FASTQ variant | Try 'fastq-sanger', 'fastq-illumina' |
Decision Tree
Converting formats? ├── Simple conversion (no data changes)? │ └── Use SeqIO.convert() directly ├── Need to add annotations? │ └── Parse, modify records, then write ├── Need to transform sequences? │ └── Parse, apply transformation, then write └── Multiple files? └── Loop with SeqIO.convert() or batch generator
Related Skills
- read-sequences - Parse sequences for custom conversion logic
- write-sequences - Write converted sequences with modifications
- batch-processing - Convert multiple files at once
- compressed-files - Handle compressed input/output during conversion
- alignment-files - For SAM/BAM/CRAM conversion, use samtools view