OpenClaw-Medical-Skills bio-write-sequences
Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-write-sequences" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-write-sequences && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-write-sequences" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-write-sequences && rm -rf "$T"
skills/bio-write-sequences/SKILL.md
Version Compatibility
Reference examples tested with: BioPython 1.83+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package>, then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
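The version and signature check described above can also be done from inside Python. The following is a minimal sketch, not part of the original skill: the helper names installed_version and describe_callable are hypothetical, and the distribution name "biopython" is assumed to be the name the package was installed under.

```python
from importlib import metadata
import inspect


def installed_version(package):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper, shown for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_callable(func):
    """Return 'name(signature)' for a callable, falling back to 'name(...)'."""
    try:
        return f"{func.__name__}{inspect.signature(func)}"
    except (TypeError, ValueError):
        # Some builtins do not expose an introspectable signature
        return f"{func.__name__}(...)"


print("biopython:", installed_version("biopython"))

try:
    from Bio import SeqIO
    # Show the signature that the installed version actually exposes
    print(describe_callable(SeqIO.write))
except ImportError:
    print("Biopython is not installed in this environment")
```

Comparing the printed signature with the example being adapted is usually quicker than retrying a failing call.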
Write Sequences
"Write sequences to a file" → Serialize SeqRecord objects into a formatted sequence file.
- Python: SeqIO.write() (BioPython)
- R: writeXStringSet() (Biostrings)
Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
Core Functions
SeqIO.write() - Write Records to File
Write one or more SeqRecord objects to a file.
SeqIO.write(records, 'output.fasta', 'fasta')
Parameters:
- records: Single SeqRecord, list, or iterator of SeqRecords
- handle: Filename (string) or file handle
- format: Output format string
Returns: Number of records written (integer)
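To make the parameter and return-value description concrete, here is a small sketch that passes each accepted kind of records argument and inspects the returned count. It writes to in-memory io.StringIO handles instead of files; make_records is a hypothetical helper, and the empty description is an assumption made only to keep the FASTA headers short.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_records():
    """Yield two tiny illustrative records (hypothetical data)."""
    yield SeqRecord(Seq("ATGC"), id="seq1", description="")
    yield SeqRecord(Seq("GCTA"), id="seq2", description="")


# 1) A single SeqRecord
single_handle = StringIO()
single_count = SeqIO.write(next(make_records()), single_handle, "fasta")

# 2) A list of SeqRecords
list_handle = StringIO()
list_count = SeqIO.write(list(make_records()), list_handle, "fasta")

# 3) A generator (any iterator of SeqRecords)
gen_handle = StringIO()
gen_count = SeqIO.write(make_records(), gen_handle, "fasta")

print(single_count, list_count, gen_count)
print(gen_handle.getvalue())
```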
record.format() - Get Formatted String
Get a string representation without writing to file.
formatted = record.format('fasta')
print(formatted)
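As a quick illustration of what format() returns, the sketch below renders one record as FASTA and as tab-separated text. The record contents are made up for the example.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative record; id and description are arbitrary
record = SeqRecord(Seq("ATGC"), id="seq1", description="demo")

fasta_text = record.format("fasta")  # header line plus sequence line
tab_text = record.format("tab")      # id, a tab character, then the sequence

print(repr(fasta_text))
print(repr(tab_text))
```

Because the result is a plain string, it can be logged, compared in tests, or embedded in other output without touching the filesystem.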
Creating SeqRecord Objects
Goal: Construct in-memory sequence records suitable for writing to any format.
Approach: Create SeqRecord with at minimum a Seq and id. Add letter_annotations for FASTQ, annotations['molecule_type'] for GenBank/EMBL.
"Create a sequence record from scratch" → Wrap a Seq string in a SeqRecord with metadata fields.
- Python: SeqRecord(Seq(...), id=...) (BioPython)
Minimal SeqRecord
record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
Full SeqRecord
record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)
With Annotations (for GenBank output)
from Bio.SeqFeature import SeqFeature, FeatureLocation

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
Common Formats
| Format | String | Notes |
|---|---|---|
| FASTA | fasta | Most universal, sequence + header only |
| FASTQ | fastq | Requires quality scores in letter_annotations |
| GenBank | genbank | Requires annotations and molecule_type |
| EMBL | embl | Similar requirements to GenBank |
| Tab | tab | Simple ID + sequence tabular format |
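One way to see how the formats differ is to write a single record in each of them. The sketch below assumes the Bio.SeqIO format strings 'fasta', 'fastq', 'genbank', 'embl', and 'tab', and a record that carries both molecule_type and phred_quality so that every writer's requirements are met. The record contents are illustrative.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# One record annotated richly enough for all five writers
record = SeqRecord(
    Seq("ATGCGATCG"), id="SEQ001", name="example", description="format demo"
)
record.annotations["molecule_type"] = "DNA"  # needed for GenBank/EMBL
record.letter_annotations["phred_quality"] = [30] * len(record.seq)  # needed for FASTQ

outputs = {}
for fmt in ("fasta", "fastq", "genbank", "embl", "tab"):
    handle = StringIO()
    SeqIO.write(record, handle, fmt)
    outputs[fmt] = handle.getvalue()
    # Print only the first line of each rendering
    print(f"{fmt}: {outputs[fmt].splitlines()[0]}")
```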
Code Patterns
Write Single Record
record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')
Write Multiple Records
records = [
    SeqRecord(Seq('ATGC'), id='seq1'),
    SeqRecord(Seq('GCTA'), id='seq2'),
    SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')
Write to File Handle
with open('output.fasta', 'w') as handle:
    SeqIO.write(records, handle, 'fasta')
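Since the second argument can be any text-mode handle, the same call can also target a compressed stream. This is a minimal sketch that assumes gzip-compressed FASTA output is wanted; the file name, temporary directory, and records are illustrative and not from the original.

```python
import gzip
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq("ATGC"), id="seq1", description=""),
    SeqRecord(Seq("GCTA"), id="seq2", description=""),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.fasta.gz")

    # 'wt' opens the gzip stream in text mode, which SeqIO.write needs for FASTA
    with gzip.open(path, "wt") as handle:
        written = SeqIO.write(records, handle, "fasta")

    # Read the file back to confirm the round trip
    with gzip.open(path, "rt") as handle:
        read_back_ids = [rec.id for rec in SeqIO.parse(handle, "fasta")]

print(written, read_back_ids)
```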
Write Modified Records
Goal: Transform sequences in-memory and write the modified versions to a new file.
Approach: Parse input, apply transformation via generator, write output. Using a generator avoids loading all records into memory.
"Modify sequences and save" → Parse records, transform each, write to new file with SeqIO.write().
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Build a new record so the parsed original is left untouched
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

records = SeqIO.parse('input.fasta', 'fasta')
# Generator expression: each record is transformed only when the writer asks for it
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
Append to Existing File
with open('output.fasta', 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')
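Appending works for FASTA because the file is just a concatenation of records. The sketch below checks this by writing an initial batch, appending one more record, and parsing the file back. The batch contents and the temporary path are illustrative assumptions.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

first_batch = [
    SeqRecord(Seq("ATGC"), id="seq1", description=""),
    SeqRecord(Seq("GCTA"), id="seq2", description=""),
]
new_records = [SeqRecord(Seq("TTAA"), id="seq3", description="")]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.fasta")

    # Passing a filename creates (or overwrites) the file
    SeqIO.write(first_batch, path, "fasta")

    # Opening the handle in 'a' mode adds records after the existing ones
    with open(path, "a") as handle:
        SeqIO.write(new_records, handle, "fasta")

    all_ids = [rec.id for rec in SeqIO.parse(path, "fasta")]

print(all_ids)
```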
Write FASTQ with Quality Scores
record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
Write GenBank Format
record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')
Common Errors
| Error | Cause | Solution |
|---|---|---|
| AttributeError | Passed raw string/Seq | Wrap in SeqRecord object |
| ValueError | GenBank without annotations | Add annotations['molecule_type'] |
| ValueError | FASTQ without phred_quality | Add quality scores to letter_annotations |
| ValueError | PHYLIP with unequal lengths | Pad or trim sequences first |
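The failure causes listed above can be reproduced in isolation, which helps when deciding what to fix. This is a hedged sketch: try_write is a hypothetical helper, and the exact exception class and message may differ between Biopython versions, so it reports whatever is raised without assuming a particular message.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def try_write(records, fmt):
    """Return the name of the exception raised by SeqIO.write, or None on success.

    Hypothetical helper for illustration.
    """
    try:
        SeqIO.write(records, StringIO(), fmt)
    except (AttributeError, TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


# A record with no molecule_type and no quality scores
bare = SeqRecord(Seq("ATGC"), id="seq1")
unequal = [SeqRecord(Seq("ATGC"), id="a"), SeqRecord(Seq("AT"), id="b")]

outcomes = {
    "raw string as fasta": try_write("ATGC", "fasta"),
    "genbank without molecule_type": try_write(bare, "genbank"),
    "fastq without phred_quality": try_write(bare, "fastq"),
    "phylip with unequal lengths": try_write(unequal, "phylip"),
}

for case, error_name in outcomes.items():
    print(f"{case}: {error_name}")
```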
Format-Specific Requirements
FASTQ
Must have quality scores:
record.letter_annotations['phred_quality'] = [30] * len(record.seq)
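A common case for the line above is turning FASTA records, which carry no qualities, into FASTQ. The sketch below attaches a uniform placeholder score to each record as it streams through. The helper with_placeholder_quality and the score of 30 are assumptions for illustration; placeholder scores do not reflect real base-call confidence.

```python
from io import StringIO

from Bio import SeqIO


def with_placeholder_quality(records, score=30):
    """Yield each record with a uniform, made-up Phred score attached."""
    for rec in records:
        rec.letter_annotations["phred_quality"] = [score] * len(rec.seq)
        yield rec


# In-memory FASTA input standing in for a real file
fasta_input = StringIO(">read1\nATGC\n")
fastq_output = StringIO()

count = SeqIO.write(
    with_placeholder_quality(SeqIO.parse(fasta_input, "fasta")),
    fastq_output,
    "fastq",
)

print(count)
print(fastq_output.getvalue())
```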
GenBank/EMBL
Must have molecule_type:
record.annotations['molecule_type'] = 'DNA' # or 'RNA', 'protein'
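Records parsed from FASTA have no molecule_type, so the annotation has to be added before GenBank output. A minimal sketch, assuming all input sequences are DNA; with_molecule_type is a hypothetical helper and the input is an in-memory stand-in for a real file.

```python
from io import StringIO

from Bio import SeqIO


def with_molecule_type(records, molecule_type="DNA"):
    """Yield each record with annotations['molecule_type'] set."""
    for rec in records:
        rec.annotations["molecule_type"] = molecule_type
        yield rec


fasta_input = StringIO(">seq1\nATGCGATCGATCG\n")
genbank_output = StringIO()

count = SeqIO.write(
    with_molecule_type(SeqIO.parse(fasta_input, "fasta")),
    genbank_output,
    "genbank",
)
genbank_text = genbank_output.getvalue()

# Parse the result back to confirm the sequence survived the conversion
round_trip = SeqIO.read(StringIO(genbank_text), "genbank")

print(genbank_text.splitlines()[0])
print(round_trip.seq)
```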
PHYLIP
All sequences must be same length. IDs truncated to 10 characters.
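To satisfy the equal-length requirement, shorter sequences can be padded before writing. The sketch below pads on the right with gap characters; pad_to_equal_length is a hypothetical helper, and right-padding is an assumption that only makes sense when the sequences already start at the same position. It is not a substitute for a real alignment.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def pad_to_equal_length(records, gap="-"):
    """Return new records right-padded with gaps to the longest sequence."""
    records = list(records)
    width = max(len(rec.seq) for rec in records)
    return [
        SeqRecord(Seq(str(rec.seq).ljust(width, gap)), id=rec.id, description="")
        for rec in records
    ]


# Illustrative input with unequal lengths; IDs are kept under 10 characters
raw = [
    SeqRecord(Seq("ATGCGA"), id="seqA"),
    SeqRecord(Seq("ATGC"), id="seqB"),
]

padded = pad_to_equal_length(raw)
phylip_output = StringIO()
count = SeqIO.write(padded, phylip_output, "phylip")
phylip_text = phylip_output.getvalue()

print(phylip_text)
```

Keeping IDs unique within their first 10 characters avoids ambiguity once the writer truncates them.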
Related Skills
- read-sequences - Read sequences before modifying and writing
- format-conversion - Direct format conversion without intermediate processing
- filter-sequences - Filter sequences before writing subset
- sequence-manipulation/seq-objects - Create SeqRecord objects to write
- alignment-files - For SAM/BAM output, use samtools/pysam