BioClaw sequence-analysis

Analyze DNA/RNA/protein sequences. Use when the user provides a sequence and asks for analysis, translation, GC content, ORFs, motifs, restriction sites, or primer design. Triggers on "sequence", "translate", "GC content", "ORF", "primer", "restriction", "complement", "reverse complement".

install

source · Clone the upstream repo

git clone https://github.com/Runchuan-BU/BioClaw

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Runchuan-BU/BioClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/container/skills/sequence-analysis" ~/.claude/skills/runchuan-bu-bioclaw-sequence-analysis && rm -rf "$T"

manifest: container/skills/sequence-analysis/SKILL.md

source content

Sequence Analysis

Comprehensive sequence analysis using Biopython and command-line tools.

When to Use

User provides a DNA/RNA/protein sequence for analysis
User asks about sequence properties (GC%, length, composition)
User wants to translate DNA to protein
User asks for ORF finding, primer design, restriction site analysis

Analysis Workflows

1. Basic Sequence Properties

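from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction

# Placeholder input; substitute the user's full sequence
seq = Seq("ATGCGATCGATCGATCG...")

print(f"Length: {len(seq)} bp")
# gc_fraction returns a value between 0 and 1
print(f"GC Content: {gc_fraction(seq)*100:.1f}%")
print(f"Complement: {seq.complement()}")
print(f"Reverse Complement: {seq.reverse_complement()}")
# Translates frame +1 with the standard genetic code; stop codons appear as "*"
print(f"Protein: {seq.translate()}")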
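Because the input may be DNA, RNA, or protein, it can help to guess the sequence type before calling DNA-specific functions such as translation. The following is a minimal, dependency-free sketch; the helper names `guess_sequence_type` and `gc_percent` are hypothetical, and the alphabet test is only a heuristic (a short peptide made solely of A, C, G, and T would be misread as DNA).

```python
DNA_LETTERS = set("ACGTN")
RNA_LETTERS = set("ACGUN")


def guess_sequence_type(seq: str) -> str:
    """Guess 'DNA', 'RNA', or 'protein' from the letters present (heuristic)."""
    letters = set(seq.upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= DNA_LETTERS:
        return "DNA"
    if letters <= RNA_LETTERS:
        return "RNA"
    return "protein"


def gc_percent(seq: str) -> float:
    """GC content as a percentage, counting only unambiguous G and C."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


for example in ("ATGCGATCG", "AUGCGAUCG", "MRSSIDLK"):
    print(example, guess_sequence_type(example))
```

A sequence with no T or U is reported as DNA by default, so ambiguous inputs are worth confirming with the user.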

2. ORF Finding

from Bio.Seq import Seq

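def find_orfs(sequence, min_length=100):
    """Find ATG-to-stop ORFs in all six frames; min_length is in nucleotides."""
    orfs = []
    seq = Seq(str(sequence))

    for strand, nuc in [("+", seq), ("-", seq.reverse_complement())]:
        for frame in range(3):
            # Trim to whole codons so translate() does not warn about a partial codon
            sub = nuc[frame:]
            sub = sub[:len(sub) - len(sub) % 3]
            aa_seq = str(sub.translate())

            start = 0
            while start < len(aa_seq):
                m_pos = aa_seq.find("M", start)
                if m_pos == -1:
                    break
                stop_pos = aa_seq.find("*", m_pos)
                if stop_pos == -1:
                    # No stop codon: the ORF runs to the end of the sequence
                    stop_pos = len(aa_seq)

                orf_len = (stop_pos - m_pos) * 3  # excludes the stop codon
                if orf_len >= min_length:
                    # 0-based offset on the scanned strand; for "-" this is
                    # relative to the reverse complement, not the input
                    nt_start = frame + m_pos * 3
                    orfs.append({
                        "strand": strand,
                        "frame": frame + 1,
                        "start": nt_start,
                        "length_aa": stop_pos - m_pos,
                        "length_nt": orf_len,
                        "protein": aa_seq[m_pos:stop_pos]
                    })
                start = stop_pos + 1

    # Longest ORFs first
    return sorted(orfs, key=lambda x: x["length_nt"], reverse=True)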
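For minus-strand ORFs, the `start` value in each result is an offset into the reverse complement rather than into the sequence the user supplied. The sketch below shows one way to map a result back to forward-strand coordinates; the helper name `orf_forward_span` is hypothetical, and it assumes `start` is 0-based and `length_nt` excludes the stop codon, as in the function above.

```python
def orf_forward_span(orf: dict, seq_len: int) -> tuple:
    """Return the 0-based, half-open (begin, end) span on the input sequence."""
    start = orf["start"]
    length = orf["length_nt"]
    if orf["strand"] == "+":
        return start, start + length
    # Index i on the reverse complement corresponds to seq_len - 1 - i
    # on the forward strand, so the span is mirrored around the sequence.
    return seq_len - start - length, seq_len - start


plus_orf = {"strand": "+", "start": 3, "length_nt": 9}
minus_orf = {"strand": "-", "start": 3, "length_nt": 9}
print(orf_forward_span(plus_orf, 30))
print(orf_forward_span(minus_orf, 30))
```

The returned span slices the original input directly; for a minus-strand ORF, the reverse complement of that slice is the coding sequence.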

3. Restriction Site Analysis

from Bio.Restriction import RestrictionBatch, Analysis
from Bio.Seq import Seq

seq = Seq("ATGCGATCGATCG...")
rb = RestrictionBatch(["EcoRI", "BamHI", "HindIII", "NotI", "XhoI"])
ana = Analysis(rb, seq)
results = ana.full()

for enzyme, sites in results.items():
    if sites:
        print(f"{enzyme}: cuts at positions {sites}")
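As a quick cross-check that does not need Biopython, the recognition sequences of the five enzymes above can be located with plain string search. This is only a sketch: the helper `find_recognition_sites` is hypothetical, it reports the 1-based start of each recognition site, and Bio.Restriction reports cut positions, so the two sets of numbers are expected to differ by an enzyme-specific offset. All five sites are palindromic, so scanning one strand is enough.

```python
# Recognition sequences (5'->3') for the enzymes used above
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "XhoI": "CTCGAG",
}


def find_recognition_sites(seq: str, sites: dict = RECOGNITION_SITES) -> dict:
    """Map enzyme name to the 1-based start positions of its recognition site."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i + 1)  # convert the 0-based index to 1-based
            i = seq.find(motif, i + 1)  # step by one to allow overlapping hits
        hits[name] = positions
    return hits


for enzyme, positions in find_recognition_sites("AAGAATTCTTGGATCCGAATTC").items():
    print(enzyme, positions)
```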

4. Primer Design (basic)

from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction
from Bio.SeqUtils import MeltingTemp as mt

def design_primers(seq_str, product_size_range=(200, 800)):
    """Use the first and last 20 nt as primers; product_size_range is not yet enforced."""
    seq = Seq(seq_str)

    # Forward primer (first 20bp)
    fwd = seq[:20]
    fwd_tm = mt.Tm_NN(fwd)

    # Reverse primer (last 20bp, reverse complement)
    rev = seq[-20:].reverse_complement()
    rev_tm = mt.Tm_NN(rev)

    print(f"Forward: 5'-{fwd}-3' (Tm={fwd_tm:.1f}°C, GC={gc_fraction(fwd)*100:.0f}%)")
    print(f"Reverse: 5'-{rev}-3' (Tm={rev_tm:.1f}°C, GC={gc_fraction(rev)*100:.0f}%)")
    print(f"Product size: {len(seq)} bp")
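Since the basic approach takes the sequence ends as primers without checking them, a simple sanity check can flag poor candidates. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough estimate for short oligos and will not match the nearest-neighbor value from `Tm_NN`. The helper names and the default GC and Tm windows are assumptions chosen for illustration, not values from the skill.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def check_primer(primer: str, gc_range=(40.0, 60.0), tm_range=(50.0, 65.0)) -> dict:
    """Summarize a primer and flag GC% or Tm outside the assumed windows."""
    p = primer.upper()
    gc_pct = 100.0 * (p.count("G") + p.count("C")) / len(p)
    tm = wallace_tm(p)
    return {
        "length": len(p),
        "gc_percent": gc_pct,
        "tm_wallace": tm,
        "gc_ok": gc_range[0] <= gc_pct <= gc_range[1],
        "tm_ok": tm_range[0] <= tm <= tm_range[1],
    }


print(check_primer("ATGCATGCATGCATGCATGC"))
```

A primer that fails either check is a cue to shift the window along the template or to ask the user for a target region.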

5. Multiple Sequence Alignment (using command-line)

If user provides multiple sequences:

# Write sequences to FASTA file
cat > /tmp/sequences.fa << 'EOF'
>seq1
ATGCGATCG...
>seq2
ATGCAATCG...
EOF

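# If clustalw/muscle is available, use it for a true multiple alignment
# Otherwise use Biopython's pairwise aligner (Bio.pairwise2 is deprecated)

from Bio import Align, SeqIO

# Read the first two records from the FASTA file written above
records = list(SeqIO.parse("/tmp/sequences.fa", "fasta"))
seq1, seq2 = records[0].seq, records[1].seq

# Scores chosen to match pairwise2's globalxx: match = 1, no penalties
aligner = Align.PairwiseAligner(mode="global", match_score=1, mismatch_score=0)
aligner.open_gap_score = 0
aligner.extend_gap_score = 0
alignments = aligner.align(seq1, seq2)
print(alignments[0])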
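To make the scoring concrete: `globalxx`-style alignment awards 1 per identical pair and applies no mismatch or gap penalties, so the optimal score equals the length of the longest common subsequence. The sketch below computes that score with plain dynamic programming; the function name `globalxx_score` is hypothetical, and it returns only the score, not the alignment itself.

```python
def globalxx_score(a: str, b: str) -> int:
    """Optimal global score with match = 1 and no mismatch or gap penalties."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the diagonal on a match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # free gap in a or b
        prev = curr
    return prev[-1]


print(globalxx_score("ATGCGATCG", "ATGCAATCG"))
```

Because gaps are free under this scheme, the score says how many residues can be matched in order, not how similar the sequences are overall; penalized scoring is usually preferable for real comparisons.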

6. Output format for WhatsApp

*Sequence Analysis Results*

• Length: 1,234 bp
• GC Content: 52.3%
• ORFs found: 3 (longest: 456 aa)

*Protein Translation (frame +1):*
```MRSSIDLK...STOP```

*Restriction Sites:*
• EcoRI: positions 123, 456
• BamHI: position 789
• HindIII: no sites found
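The template above can be filled in programmatically. The following is a minimal sketch of one way to do so; the function name `format_whatsapp_report` and its parameters are hypothetical, and the triple-backtick wrapper is built with string multiplication only to keep this example readable.

```python
def format_whatsapp_report(length_bp, gc_percent, orf_lengths_aa, protein, sites):
    """Build the message text from already-computed analysis results."""
    mono = "`" * 3  # delimiter for the monospace block in the template
    lines = [
        "*Sequence Analysis Results*",
        "",
        f"• Length: {length_bp:,} bp",
        f"• GC Content: {gc_percent:.1f}%",
    ]
    if orf_lengths_aa:
        lines.append(
            f"• ORFs found: {len(orf_lengths_aa)} (longest: {max(orf_lengths_aa)} aa)"
        )
    else:
        lines.append("• ORFs found: 0")
    lines += ["", "*Protein Translation (frame +1):*", f"{mono}{protein}{mono}", ""]
    lines.append("*Restriction Sites:*")
    for enzyme, positions in sites.items():
        if not positions:
            lines.append(f"• {enzyme}: no sites found")
        else:
            label = "position" if len(positions) == 1 else "positions"
            lines.append(f"• {enzyme}: {label} {', '.join(map(str, positions))}")
    return "\n".join(lines)


print(format_whatsapp_report(
    1234, 52.3, [456, 120, 80], "MRSSIDLK",
    {"EcoRI": [123, 456], "BamHI": [789], "HindIII": []},
))
```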

7. Follow-up suggestions

"Want me to BLAST this sequence?"
"Should I design primers for a specific region?"
"Want a detailed ORF map?"
"Should I check for conserved domains?"