LLMs-Universal-Life-Science-and-Clinical-Skills- write-sequences

<!--

install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Sequence_Analysis/sequence-io/write-sequences" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-write-sequences && rm -rf "$T"
manifest: Skills/Sequence_Analysis/sequence-io/write-sequences/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-write-sequences
description: Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
tool_type: python
primary_tool: Bio.SeqIO
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Write Sequences

Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.

Required Import

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

Core Functions

SeqIO.write() - Write Records to File

Write one or more SeqRecord objects to a file.

SeqIO.write(records, 'output.fasta', 'fasta')

Parameters:

  • records
    - Single SeqRecord, list, or iterator of SeqRecords
  • handle
    - Filename (string) or file handle
  • format
    - Output format string

Returns: Number of records written (integer)
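
Because SeqIO.write() accepts any iterable of records and returns a count, the return value can double as a quick sanity check. The following is a minimal sketch, assuming an in-memory io.StringIO handle stands in for a real file; the make_records helper and the record IDs are hypothetical.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_records(pairs):
    """Yield one SeqRecord per (id, letters) pair (hypothetical helper)."""
    for seq_id, letters in pairs:
        # An empty description keeps the FASTA header to just the ID.
        yield SeqRecord(Seq(letters), id=seq_id, description='')


pairs = [('seq1', 'ATGC'), ('seq2', 'GCTA'), ('seq3', 'TTAA')]

handle = StringIO()  # in-memory stand-in for an open text file
count = SeqIO.write(make_records(pairs), handle, 'fasta')

# The returned count should match the number of input pairs.
print(f'Wrote {count} of {len(pairs)} records')
```

Passing a generator rather than a list means records are produced one at a time as they are written, which helps with large inputs.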

record.format() - Get Formatted String

Get a string representation without writing to file.

formatted = record.format('fasta')
print(formatted)
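
One use for record.format() is previewing exactly what would be written before touching the filesystem. The sketch below compares the preview with what SeqIO.write() produces on an in-memory handle; the empty description is an assumption made here to keep the header short.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1', description='')

# String preview, no file involved
preview = record.format('fasta')

# Same record written through SeqIO.write to an in-memory handle
handle = StringIO()
SeqIO.write(record, handle, 'fasta')

same = preview == handle.getvalue()
print(preview, end='')
print(f'Matches SeqIO.write output: {same}')
```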

Creating SeqRecord Objects

Minimal SeqRecord

record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')

Full SeqRecord

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)
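
How these fields show up in the output depends on the format. In FASTA, the header is built from id and description, while name is not written. A record created without a description carries Biopython's placeholder text, which can end up in the header, so passing description='' is a common workaround. The sketch below illustrates this; fasta_header is a hypothetical helper.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def fasta_header(record):
    """Return only the header line of the record's FASTA form (hypothetical helper)."""
    return record.format('fasta').splitlines()[0]


full = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)
# Explicitly empty description: the header is just the ID
bare = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1', description='')

print(fasta_header(full))
print(fasta_header(bare))
```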

With Annotations (for GenBank output)

from Bio.SeqFeature import SeqFeature, FeatureLocation

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
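
To check that annotations and features survive the trip to disk, a round trip through the GenBank writer and parser is a reasonable test. This is a sketch under the assumption that an in-memory handle is an acceptable substitute for a file; it reuses the record built above.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)

# Write to memory, rewind, and parse the text back into a SeqRecord
handle = StringIO()
SeqIO.write(record, handle, 'genbank')
handle.seek(0)
parsed = SeqIO.read(handle, 'genbank')

gene = parsed.features[0]
print(gene.type, int(gene.location.start), int(gene.location.end), gene.qualifiers['gene'])
```

Feature locations use zero-based, end-exclusive coordinates in Python, so FeatureLocation(0, 9) covers the first nine bases.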

Common Formats

| Format | String | Notes |
|--------|--------|-------|
| FASTA | 'fasta' | Most universal; sequence and header only |
| FASTQ | 'fastq' | Requires quality scores in letter_annotations |
| GenBank | 'genbank' | Requires annotations and molecule_type |
| EMBL | 'embl' | Similar requirements to GenBank |
| Tab | 'tab' | Simple tabular format: ID and sequence |
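
The format string is the only thing that changes between output types, so one record can be rendered several ways in a loop. A minimal sketch for the two formats that need no extra annotations, using in-memory handles (an assumption made to keep the example free of files):

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGC'), id='seq1', description='')

outputs = {}
for fmt in ('fasta', 'tab'):
    handle = StringIO()
    SeqIO.write(record, handle, fmt)
    outputs[fmt] = handle.getvalue()

for fmt, text in outputs.items():
    # repr() makes the tab and newline characters visible
    print(fmt, repr(text))
```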

Code Patterns

Write Single Record

record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')

Write Multiple Records

records = [
    SeqRecord(Seq('ATGC'), id='seq1'),
    SeqRecord(Seq('GCTA'), id='seq2'),
    SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')

Write to File Handle

with open('output.fasta', 'w') as handle:
    SeqIO.write(records, handle, 'fasta')
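
Passing a handle instead of a filename is what makes compressed output possible, since any text-mode file object works. The sketch below writes gzip-compressed FASTA and reads it back; the temporary directory and file name are assumptions for illustration.

```python
import gzip
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]

path = os.path.join(tempfile.mkdtemp(), 'output.fasta.gz')

# 'wt' opens the gzip stream in text mode, which SeqIO.write expects
with gzip.open(path, 'wt') as handle:
    count = SeqIO.write(records, handle, 'fasta')

# Read it back in text mode to confirm the contents
with gzip.open(path, 'rt') as handle:
    ids = [rec.id for rec in SeqIO.parse(handle, 'fasta')]

print(count, ids)
```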

Write Modified Records

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Build a new record so the parsed input record is left untouched
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

# SeqIO.parse returns an iterator, so records stream through one at a time
records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')

Append to Existing File

with open('output.fasta', 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')
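
Appending works for FASTA because each record is self-contained and the file has no header or footer; for other formats, check that the output still parses before relying on it. A sketch that writes one batch, appends another, and re-reads the file to confirm, with a temporary path and batch names chosen here for illustration:

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

first_batch = [SeqRecord(Seq('ATGC'), id='seq1', description='')]
new_records = [
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
    SeqRecord(Seq('TTAA'), id='seq3', description=''),
]

path = os.path.join(tempfile.mkdtemp(), 'output.fasta')

# Initial write creates (or overwrites) the file
SeqIO.write(first_batch, path, 'fasta')

# Mode 'a' adds to the end instead of truncating
with open(path, 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')

ids = [rec.id for rec in SeqIO.parse(path, 'fasta')]
print(ids)
```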

Write FASTQ with Quality Scores

record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
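
The quality list must contain one score per base. A round trip through the FASTQ writer and parser is a simple way to confirm the scores were stored as intended. This sketch assumes an in-memory handle and an empty description:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCG'), id='read1', description='')
qualities = [30, 30, 28, 25, 30, 30, 28, 25, 30]

# One Phred score per base is required
if len(qualities) != len(record.seq):
    raise ValueError('quality list and sequence differ in length')
record.letter_annotations['phred_quality'] = qualities

handle = StringIO()
SeqIO.write(record, handle, 'fastq')
handle.seek(0)

parsed = SeqIO.read(handle, 'fastq')
roundtrip = parsed.letter_annotations['phred_quality']
print(roundtrip == qualities)
```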

Write GenBank Format

record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| TypeError: SeqRecord expected | Passed raw string/Seq | Wrap in SeqRecord object |
| ValueError: missing molecule_type | GenBank without annotations | Add record.annotations['molecule_type'] = 'DNA' |
| ValueError: missing quality scores | FASTQ without phred_quality | Add quality scores to letter_annotations |
| ValueError: Sequences must all be the same length | PHYLIP with unequal lengths | Pad or trim sequences first |
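
These failures can be caught and reported rather than left to crash a pipeline. The sketch below triggers the GenBank and FASTQ cases on purpose; try_write is a hypothetical helper, and the exact wording of the messages may differ between Biopython versions.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def try_write(record, fmt):
    """Return None on success, or a short error string on failure (hypothetical helper)."""
    try:
        SeqIO.write(record, StringIO(), fmt)
    except (TypeError, ValueError) as err:
        return f'{type(err).__name__}: {err}'
    return None


record = SeqRecord(Seq('ATGC'), id='seq1')

# GenBank output fails until molecule_type is set
before = try_write(record, 'genbank')
record.annotations['molecule_type'] = 'DNA'
after = try_write(record, 'genbank')

# FASTQ output fails because no quality scores are attached
fastq_error = try_write(record, 'fastq')

print('genbank before:', before)
print('genbank after:', after)
print('fastq:', fastq_error)
```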

Format-Specific Requirements

FASTQ

Must have quality scores:

record.letter_annotations['phred_quality'] = [30] * len(record.seq)

GenBank/EMBL

Must have molecule_type:

record.annotations['molecule_type'] = 'DNA'  # or 'RNA', 'protein'

PHYLIP

All sequences must be the same length. With the 'phylip' format, IDs are truncated to 10 characters, so make sure they stay unique after truncation.
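
A sketch of the "pad first" approach, assuming trailing gap characters are an acceptable filler (this only equalizes lengths; it is not a real alignment). The record IDs are made up, with one deliberately longer than 10 characters to show the truncation.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGCGA'), id='sample_alpha_001'),
    SeqRecord(Seq('ATGC'), id='short'),
]

# Pad every sequence with '-' up to the longest length
target = max(len(rec.seq) for rec in records)
padded = [
    SeqRecord(rec.seq + '-' * (target - len(rec.seq)), id=rec.id, description='')
    for rec in records
]

handle = StringIO()
SeqIO.write(padded, handle, 'phylip')
text = handle.getvalue()
print(text)
```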

Related Skills

  • read-sequences - Read sequences before modifying and writing
  • format-conversion - Direct format conversion without intermediate processing
  • filter-sequences - Filter sequences before writing subset
  • sequence-manipulation/seq-objects - Create SeqRecord objects to write
  • alignment-files - For SAM/BAM output, use samtools/pysam
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->