BioSkills bio-seq-objects
Create and manipulate Seq, MutableSeq, and SeqRecord objects using Biopython. Use when creating sequences from strings, modifying sequence data in-place, or building annotated sequence records.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/sequence-manipulation/seq-objects" ~/.claude/skills/gptomics-bioskills-bio-seq-objects && rm -rf "$T"
sequence-manipulation/seq-objects/SKILL.mdVersion Compatibility
Reference examples tested with: BioPython 1.83+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
thenpip show <package>
to check signatureshelp(module.function)
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Seq Objects
Create and manipulate biological sequence objects using Biopython.
"Create a sequence object" → Wrap a raw string in a typed sequence container for biological operations.
- Immutable:
(BioPython) — string-like, supports complement/translateSeq('ATGC') - Mutable:
(BioPython) — supports in-place editsMutableSeq('ATGC') - Annotated:
(BioPython) — adds metadata for file I/OSeqRecord(Seq(...), id=...)
Required Imports
from Bio.Seq import Seq, MutableSeq from Bio.SeqRecord import SeqRecord
Core Objects
Seq - Immutable Sequence
The basic sequence object. Immutable like Python strings.
seq = Seq('ATGCGATCGATCG')
Seq objects support string-like operations:
len(seq) # Length seq[0] # First base seq[-1] # Last base seq[0:10] # Slice (returns Seq) str(seq) # Convert to string 'ATG' in seq # Membership test seq.count('G') # Count occurrences seq.find('ATG') # Find position (-1 if not found) seq.upper() # Uppercase seq.lower() # Lowercase seq * 3 # Repeat sequence seq.strip() # Remove leading/trailing whitespace
MutableSeq - Mutable Sequence
For in-place modifications when performance matters.
mut_seq = MutableSeq('ATGCGATCG') mut_seq[0] = 'C' # Modify single position mut_seq[0:3] = 'GGG' # Replace slice mut_seq.append('A') # Add to end mut_seq.insert(0, 'G') # Insert at position mut_seq.pop() # Remove and return last mut_seq.remove('G') # Remove first occurrence mut_seq.reverse() # Reverse in place
Convert between types:
seq = Seq(mut_seq) # MutableSeq to Seq mut_seq = MutableSeq(seq) # Seq to MutableSeq
SeqRecord - Annotated Sequence
Sequence with metadata for file I/O and analysis.
record = SeqRecord( Seq('ATGCGATCG'), id='gene1', name='example_gene', description='An example gene sequence' )
SeqRecord attributes:
record.seq # The Seq object record.id # Identifier string record.name # Name string record.description # Description string record.features # List of SeqFeature objects record.annotations # Dict of annotations record.letter_annotations # Per-letter annotations (quality scores) record.dbxrefs # Database cross-references
SeqRecord Methods
Goal: Transform entire records while preserving metadata.
Approach: Use SeqRecord methods that return new records with features remapped to new coordinates.
# Reverse complement (preserves ID, updates features) rc_record = record.reverse_complement(id='gene1_rc', description='reverse complement') # Translate to protein (creates new SeqRecord with protein) protein_record = record.translate(id='gene1_protein') # Quick format output (returns string in file format) fasta_str = record.format('fasta') genbank_str = record.format('genbank')
Slicing preserves features (adjusted to new coordinates):
# Slice SeqRecord - features are clipped/adjusted automatically subset = record[10:50] # Features outside range are dropped
Code Patterns
Create Seq from String
dna = Seq('ATGCGATCGATCG') rna = Seq('AUGCGAUCGAUCG') protein = Seq('MRCRS')
Create SeqRecord for File Output
record = SeqRecord(Seq('ATGCGATCG'), id='seq1', description='My sequence')
Create SeqRecord with Annotations
record = SeqRecord(Seq('ATGCGATCG'), id='gene1', description='Example') record.annotations['organism'] = 'Homo sapiens' record.annotations['molecule_type'] = 'DNA'
Build SeqRecord from Parsed Data
from Bio.SeqFeature import SeqFeature, FeatureLocation record = SeqRecord(Seq('ATGCGATCGATCG'), id='gene1') feature = SeqFeature(FeatureLocation(0, 9), type='CDS', qualifiers={'product': ['Example protein']}) record.features.append(feature)
Batch Create SeqRecords
sequences = ['ATGC', 'GCTA', 'TTAA'] records = [SeqRecord(Seq(s), id=f'seq_{i}') for i, s in enumerate(sequences)]
Copy a SeqRecord
from copy import deepcopy new_record = deepcopy(record) new_record.id = 'modified_copy'
Modify SeqRecord Sequence
record = SeqRecord(Seq('ATGCGATCG'), id='seq1') record.seq = Seq('GGGGGATCG') # Replace entire sequence
Join Sequences into One SeqRecord
combined_seq = seq1 + Seq('NNNN') + seq2 # With linker combined_record = SeqRecord(combined_seq, id='combined')
Transform SeqRecord with reverse_complement
# Reverse complement a gene sequence record = SeqRecord(Seq('ATGCGATCGATCG'), id='gene1', description='Forward strand') rc_record = record.reverse_complement(id=f'{record.id}_rc', description='Reverse complement') # Features are remapped to new coordinates
Translate SeqRecord to Protein
# Translate coding sequence cds_record = SeqRecord(Seq('ATGCGATCGATCGTAA'), id='cds1', description='Coding sequence') protein_record = cds_record.translate(id=f'{cds_record.id}_protein', to_stop=True)
Quick Output with format()
record = SeqRecord(Seq('ATGCGATCG'), id='seq1', description='Example sequence') print(record.format('fasta')) # >seq1 Example sequence # ATGCGATCG
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Trying to modify immutable Seq | Use MutableSeq instead |
| Passed string instead of Seq | Wrap string in Seq() |
| Missing annotations in output | Didn't set required annotations | Add to annotations for GenBank output |
Decision Tree
Need to work with sequence data? ├── Just doing string-like operations? │ └── Use Seq ├── Need to modify sequence in-place? │ └── Use MutableSeq ├── Need metadata (ID, description, features)? │ └── Use SeqRecord └── Need to write to file? └── Use SeqRecord with appropriate annotations
Related Skills
- sequence-io/read-sequences - Parse files to get SeqRecord objects
- sequence-io/write-sequences - Write SeqRecord objects to files
- transcription-translation - Transform Seq objects (DNA to protein)
- reverse-complement - Get reverse complement of Seq
- sequence-slicing - Slice and extract from Seq/SeqRecord
- database-access/entrez-fetch - Fetch sequences from NCBI as SeqRecords