BioSkills bio-transcription-translation

Transcribe DNA to RNA and translate to protein using Biopython. Use when converting between DNA, RNA, and protein sequences, finding ORFs, or using alternative codon tables.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/sequence-manipulation/transcription-translation" ~/.claude/skills/gptomics-bioskills-bio-transcription-translation && rm -rf "$T"
manifest: sequence-manipulation/transcription-translation/SKILL.md
source content

Version Compatibility

Reference examples tested with: BioPython 1.83+

Before using code patterns, verify installed versions match. If versions differ:

  • Python:
    pip show <package>
    then
    help(module.function)
    to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Transcription and Translation

"Translate my DNA sequence to protein" → Transcribe DNA to RNA and translate to protein, handling alternative codon tables and six-frame translation.

  • Python:
    Seq.translate()
    ,
    Seq.transcribe()
    (BioPython)

Convert between DNA, RNA, and protein sequences using Biopython.

Required Import

from Bio.Seq import Seq

Core Methods

Transcription (DNA to RNA)

dna = Seq('ATGCGATCGATCG')
rna = dna.transcribe()  # Returns Seq('AUGCGAUCGAUCG')

Transcription replaces T with U. Works on coding strand (5' to 3').

Back Transcription (RNA to DNA)

rna = Seq('AUGCGAUCGAUCG')
dna = rna.back_transcribe()  # Returns Seq('ATGCGATCGATCG')

Translation (RNA/DNA to Protein)

# From coding DNA (includes ATG start)
coding_dna = Seq('ATGTTTGGT')
protein = coding_dna.translate()  # Returns Seq('MFG')

# From RNA
rna = Seq('AUGUUUGGU')
protein = rna.translate()  # Returns Seq('MFG')

Translation Options

Stop at First Stop Codon

seq = Seq('ATGTTTGGTTAAGGG')
protein = seq.translate(to_stop=True)  # Stops at TAA, excludes stop

Include Stop Codon Symbol

seq = Seq('ATGTTTGGTTAA')
protein = seq.translate()  # Returns Seq('MFG*')

Alternative Codon Tables

Biopython supports NCBI codon tables. Common tables:

IDNameUse Case
1StandardDefault, most organisms
2Vertebrate MitochondrialHuman/vertebrate mitochondria
4Mold MitochondrialFungi, protozoa mitochondria
5Invertebrate MitochondrialInsects, worms mitochondria
6Ciliate NuclearTetrahymena, Paramecium
11Bacterial/ArchaealProkaryotes, plastids
# Bacterial translation
seq = Seq('ATGTTTGGT')
protein = seq.translate(table=11)

# Mitochondrial translation
protein = seq.translate(table=2)

# By name
protein = seq.translate(table='Vertebrate Mitochondrial')

CDS Translation (Complete Coding Sequence)

For validated coding sequences with proper start/stop:

cds = Seq('ATGTTTGGTTAA')  # Must start with start codon, end with stop
protein = cds.translate(cds=True)  # Validates and removes stop

The

cds=True
option:

  • Validates start codon (ATG or alternative)
  • Validates stop codon at end
  • Removes stop codon from result
  • Raises error if invalid CDS

Code Patterns

Basic Transcription and Translation Pipeline

dna = Seq('ATGTTTGGTCATTAA')
rna = dna.transcribe()
protein = rna.translate()
print(f'DNA: {dna}')
print(f'RNA: {rna}')
print(f'Protein: {protein}')

Translate All Six Reading Frames

Goal: Translate a DNA sequence in all six frames (three forward, three reverse) to find all possible protein products.

Approach: For each strand, offset by 0, 1, and 2 bases, trim to a multiple of 3, and translate.

Reference (BioPython 1.83+):

def six_frame_translation(seq):
    frames = []
    for strand, s in [('+', seq), ('-', seq.reverse_complement())]:
        for frame in range(3):
            length = 3 * ((len(s) - frame) // 3)
            fragment = s[frame:frame + length]
            frames.append((strand, frame, fragment.translate()))
    return frames

seq = Seq('ATGCGATCGATCGATCGATCG')
for strand, frame, protein in six_frame_translation(seq):
    print(f'{strand}{frame}: {protein}')

Find All ORFs (Start to Stop)

Goal: Identify all open reading frames (M to stop codon) across both strands and all three frames.

Approach: Translate each of the six frames, then scan for Met-to-stop segments meeting the minimum length.

Reference (BioPython 1.83+):

def find_orfs(seq, min_length=30):
    orfs = []
    for strand, s in [('+', seq), ('-', seq.reverse_complement())]:
        for frame in range(3):
            trans = s[frame:].translate()
            aa_start = 0
            while True:
                start = trans.find('M', aa_start)
                if start == -1:
                    break
                stop = trans.find('*', start)
                if stop == -1:
                    stop = len(trans)
                orf = trans[start:stop]
                if len(orf) * 3 >= min_length:
                    orfs.append((strand, frame, start * 3 + frame, str(orf)))
                aa_start = start + 1
    return orfs

seq = Seq('ATGCGATCGATCGATCGATCGTAA')
for strand, frame, pos, orf in find_orfs(seq, min_length=3):
    print(f'{strand} frame {frame} pos {pos}: {orf}')

Translate with Quality Check

def translate_cds_safe(seq):
    try:
        return seq.translate(cds=True)
    except Exception as e:
        return seq.translate(to_stop=True)  # Fallback

Get Codon Table Info

from Bio.Data import CodonTable

table = CodonTable.unambiguous_dna_by_id[1]
print(f'Start codons: {table.start_codons}')
print(f'Stop codons: {table.stop_codons}')

Common Errors

ErrorCauseSolution
TranslationError: First codon is not a start codon
Used
cds=True
without valid start
Remove
cds=True
or fix sequence
TranslationError: Final codon is not a stop codon
Used
cds=True
without stop codon
Remove
cds=True
or add stop codon
TranslationError: Sequence length not multiple of 3
Partial codons at endTrim sequence to multiple of 3
Unexpected amino acidsWrong codon tableSpecify correct table for organism

Decision Tree

Need to convert sequence?
├── DNA to RNA?
│   └── Use seq.transcribe()
├── RNA to DNA?
│   └── Use seq.back_transcribe()
├── DNA/RNA to protein?
│   ├── Complete CDS with start/stop?
│   │   └── Use translate(cds=True)
│   ├── Stop at first stop codon?
│   │   └── Use translate(to_stop=True)
│   ├── Non-standard organism?
│   │   └── Use translate(table=N)
│   └── Get all including stop symbol?
│       └── Use translate()
└── Find all ORFs?
    └── Translate all six frames, search for M...*

Related Skills

  • seq-objects - Create Seq objects for translation
  • reverse-complement - Translate both strands (six-frame translation)
  • codon-usage - Analyze codon bias in coding sequences
  • sequence-io/read-sequences - Parse GenBank files with CDS features
  • database-access/entrez-fetch - Fetch CDS sequences from NCBI for translation