BioClaw sequence-analysis

Analyze DNA/RNA/protein sequences. Use when the user provides a sequence and asks for analysis, translation, GC content, ORFs, motifs, restriction sites, or primer design. Triggers on "sequence", "translate", "GC content", "ORF", "primer", "restriction", "complement", "reverse complement".

install
source · Clone the upstream repo
git clone https://github.com/Runchuan-BU/BioClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Runchuan-BU/BioClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/container/skills/sequence-analysis" ~/.claude/skills/runchuan-bu-bioclaw-sequence-analysis && rm -rf "$T"
manifest: container/skills/sequence-analysis/SKILL.md
source content

Sequence Analysis

Comprehensive sequence analysis using BioPython and command-line tools.

When to Use

  • User provides a DNA/RNA/protein sequence for analysis
  • User asks about sequence properties (GC%, length, composition)
  • User wants to translate DNA to protein
  • User asks for ORF finding, primer design, restriction site analysis

Analysis Workflows

1. Basic Sequence Properties

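from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

# Placeholder: replace with the user's full sequence (the "..." is not valid input)
seq = Seq("ATGCGATCGATCGATCG...")

print(f"Length: {len(seq)} bp")
print(f"GC Content: {gc_fraction(seq)*100:.1f}%")
# Single-stranded DNA weight; pass seq_type="RNA" or "protein" for other inputs
print(f"Molecular weight: {molecular_weight(seq, seq_type='DNA'):.1f} Da")
print(f"Complement: {seq.complement()}")
print(f"Reverse Complement: {seq.reverse_complement()}")
# translate() warns if the length is not a multiple of three
print(f"Protein: {seq.translate()}")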
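The "When to Use" list mentions base composition, which the snippet above does not print. A minimal dependency-free sketch, assuming a plain DNA string; the `composition` and `gc_percent` helper names are hypothetical and not part of BioPython:

```python
from collections import Counter

def composition(seq_str):
    """Count each symbol in the sequence (case-insensitive)."""
    return Counter(seq_str.upper())

def gc_percent(seq_str):
    """GC content as a percentage of all symbols; 0.0 for an empty string."""
    counts = composition(seq_str)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total

example = "ATGCGATCGATCGATCG"  # toy input, not a real gene
counts = composition(example)
for base in "ACGT":
    print(f"{base}: {counts[base]}")
print(f"GC Content: {gc_percent(example):.1f}%")
```

Ambiguity codes such as N are counted in the total here, which lowers the reported GC percentage; whether that is the desired behaviour depends on the analysis.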

2. ORF Finding

from Bio.Seq import Seq

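def find_orfs(sequence, min_length=100):
    """Return ORFs (M to stop) in all six frames, longest first.

    min_length is in nucleotides. "start" is a 0-based offset on the strand
    that was scanned, so "-" strand values index the reverse complement.
    """
    orfs = []
    seq = Seq(str(sequence))

    for strand, nuc in [("+", seq), ("-", seq.reverse_complement())]:
        for frame in range(3):
            # Trim to whole codons so translate() does not warn about a partial codon
            end = frame + (len(nuc) - frame) // 3 * 3
            aa_seq = str(nuc[frame:end].translate())

            start = 0
            while start < len(aa_seq):
                m_pos = aa_seq.find("M", start)
                if m_pos == -1:
                    break
                stop_pos = aa_seq.find("*", m_pos)
                if stop_pos == -1:
                    # No stop codon: treat the ORF as running to the end of the sequence
                    stop_pos = len(aa_seq)

                # Length excludes the stop codon
                orf_len = (stop_pos - m_pos) * 3
                if orf_len >= min_length:
                    nt_start = frame + m_pos * 3
                    orfs.append({
                        "strand": strand,
                        "frame": frame + 1,
                        "start": nt_start,
                        "length_aa": stop_pos - m_pos,
                        "length_nt": orf_len,
                        "protein": aa_seq[m_pos:stop_pos]
                    })
                # Resume the search after this stop codon
                start = stop_pos + 1

    return sorted(orfs, key=lambda x: x["length_nt"], reverse=True)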
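The "start" values that find_orfs reports for the "-" strand are offsets into the reverse complement, not into the sequence the user supplied. A small sketch of converting them back, assuming 0-based, end-exclusive coordinates; `minus_to_forward` is a hypothetical helper, not a BioPython function:

```python
def minus_to_forward(start, length_nt, seq_len):
    """Map a 0-based span on the reverse complement to forward-strand coordinates.

    Returns (fwd_start, fwd_end) as a 0-based, end-exclusive interval.
    """
    fwd_end = seq_len - start
    fwd_start = fwd_end - length_nt
    return fwd_start, fwd_end

# Self-check on a toy sequence using a plain-Python reverse complement
COMP = str.maketrans("ACGT", "TGCA")
forward = "AAACCCGGGTTTAC"
rc = forward.translate(COMP)[::-1]

start, length = 2, 6
fs, fe = minus_to_forward(start, length, len(forward))
# The forward-strand slice, reverse-complemented, must equal the span found on rc
assert forward[fs:fe].translate(COMP)[::-1] == rc[start:start + length]
print(fs, fe)
```

With an ORF dict from find_orfs, the call would be `minus_to_forward(orf["start"], orf["length_nt"], len(sequence))` for entries whose "strand" is "-".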

3. Restriction Site Analysis

from Bio.Restriction import RestrictionBatch, Analysis
from Bio.Seq import Seq

seq = Seq("ATGCGATCGATCG...")
rb = RestrictionBatch(["EcoRI", "BamHI", "HindIII", "NotI", "XhoI"])
ana = Analysis(rb, seq)
results = ana.full()

for enzyme, sites in results.items():
    if sites:
        print(f"{enzyme}: cuts at positions {sites}")
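When Bio.Restriction is not installed, recognition sites for these five enzymes can be located with a plain string search. This is a rough sketch, not a replacement for the Analysis class: it assumes a linear, unambiguous DNA string and reports the 1-based start of each recognition site, whereas Bio.Restriction reports cut positions, so the numbers are expected to differ. `find_recognition_sites` is a hypothetical helper name.

```python
# Recognition sequences for the five enzymes used above (all palindromic,
# so searching one strand is sufficient)
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "XhoI": "CTCGAG",
}

def find_recognition_sites(seq_str, sites=SITES):
    """Return {enzyme: [1-based start of each recognition site]}."""
    seq_str = seq_str.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        i = seq_str.find(motif)
        while i != -1:
            positions.append(i + 1)
            # Advance by one base so overlapping sites are not skipped
            i = seq_str.find(motif, i + 1)
        hits[enzyme] = positions
    return hits

demo = "AAGAATTCTTGGATCCAAGAATTC"  # toy sequence with two EcoRI sites and one BamHI site
for enzyme, positions in find_recognition_sites(demo).items():
    print(f"{enzyme}: {positions if positions else 'no sites found'}")
```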

4. Primer Design (basic)

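from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction
from Bio.SeqUtils import MeltingTemp as mt

def design_primers(seq_str, product_size_range=(200, 800)):
    """Use the two ends of seq_str as primers; assumes len(seq_str) >= 20."""
    seq = Seq(seq_str)

    # The amplicon is the whole input, so check it against the requested size range
    min_size, max_size = product_size_range
    if not min_size <= len(seq) <= max_size:
        print(f"Warning: product size {len(seq)} bp is outside {min_size}-{max_size} bp")

    # Forward primer (first 20bp)
    fwd = seq[:20]
    fwd_tm = mt.Tm_NN(fwd)

    # Reverse primer (last 20bp, reverse complement)
    rev = seq[-20:].reverse_complement()
    rev_tm = mt.Tm_NN(rev)

    print(f"Forward: 5'-{fwd}-3' (Tm={fwd_tm:.1f}°C, GC={gc_fraction(fwd)*100:.0f}%)")
    print(f"Reverse: 5'-{rev}-3' (Tm={rev_tm:.1f}°C, GC={gc_fraction(rev)*100:.0f}%)")
    print(f"Product size: {len(seq)} bp")
    return fwd, rev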
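The function above takes the first and last 20 bp without checking whether they make reasonable primers. One possible sanity check, sketched without BioPython: it uses the Wallace rule (Tm = 2(A+T) + 4(G+C)), which is only a rough estimate and is not the nearest-neighbor model behind `mt.Tm_NN`. The `check_primer` helper and its default thresholds are illustrative assumptions, not values from the source.

```python
def wallace_tm(primer):
    """Wallace rule estimate in °C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def check_primer(primer, gc_range=(40.0, 60.0), tm_range=(50, 65)):
    """Return a list of warning strings; an empty list means no check failed.

    The default ranges are assumed example thresholds; adjust them per protocol.
    """
    p = primer.upper()
    if not p:
        return ["primer is empty"]
    warnings = []
    gc = 100.0 * (p.count("G") + p.count("C")) / len(p)
    tm = wallace_tm(p)
    if not gc_range[0] <= gc <= gc_range[1]:
        warnings.append(f"GC {gc:.0f}% outside {gc_range[0]:.0f}-{gc_range[1]:.0f}%")
    if not tm_range[0] <= tm <= tm_range[1]:
        warnings.append(f"Tm {tm}°C outside {tm_range[0]}-{tm_range[1]}°C")
    return warnings

for primer in ("ATGCATGCATGCATGCATGC", "AAAAAAAAAAAAAAAAAAAA"):
    issues = check_primer(primer)
    print(primer, "OK" if not issues else "; ".join(issues))
```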

5. Multiple Sequence Alignment (using command-line)

If the user provides multiple sequences:

# Write sequences to FASTA file
cat > /tmp/sequences.fa << 'EOF'
>seq1
ATGCGATCG...
>seq2
ATGCAATCG...
EOF

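# If clustalw/muscle are available, use them for three or more sequences.
# Otherwise fall back to BioPython's pairwise alignment (two sequences at a time).
# Bio.pairwise2 is deprecated, so this uses Bio.Align.PairwiseAligner instead.
from Bio import SeqIO
from Bio.Align import PairwiseAligner

records = list(SeqIO.parse("/tmp/sequences.fa", "fasta"))
seq1, seq2 = records[0].seq, records[1].seq

aligner = PairwiseAligner()
aligner.mode = "global"
# Match = 1, mismatch and gaps = 0: the same scoring as pairwise2's globalxx
aligner.match_score = 1
aligner.mismatch_score = 0
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align(seq1, seq2)
print(alignments[0])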
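Once two sequences are aligned, percent identity is a common summary to report. A minimal sketch over two already-aligned, gapped strings, assuming "-" as the gap character; `percent_identity` is a hypothetical helper, and counting a one-sided gap as a mismatch is only one of several conventions in use:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both rows hold the same residue.

    Columns with a gap in both rows are ignored; a gap in only one row
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy rows: nine columns with a single mismatch
print(f"{percent_identity('ATGCGATCG', 'ATGCAATCG'):.1f}%")
```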

6. Output format for WhatsApp

*Sequence Analysis Results*

• Length: 1,234 bp
• GC Content: 52.3%
• ORFs found: 3 (longest: 456 aa)

*Protein Translation (frame +1):*
```MRSSIDLK...STOP```

*Restriction Sites:*
• EcoRI: positions 123, 456
• BamHI: position 789
• HindIII: no sites found
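The template above could be filled in programmatically. A sketch of one way to do it, assuming the analysis results have already been collected; the `format_whatsapp` function and its parameter names are hypothetical:

```python
def format_whatsapp(length_bp, gc_percent, orf_lengths_aa, protein, sites):
    """Build the message text; `sites` maps enzyme name to a list of positions."""
    fence = "`" * 3  # WhatsApp monospace delimiter
    lines = ["*Sequence Analysis Results*", ""]
    lines.append(f"• Length: {length_bp:,} bp")
    lines.append(f"• GC Content: {gc_percent:.1f}%")
    if orf_lengths_aa:
        lines.append(
            f"• ORFs found: {len(orf_lengths_aa)} (longest: {max(orf_lengths_aa)} aa)"
        )
    else:
        lines.append("• ORFs found: 0")
    lines += ["", "*Protein Translation (frame +1):*", f"{fence}{protein}{fence}"]
    lines += ["", "*Restriction Sites:*"]
    for enzyme, positions in sites.items():
        if not positions:
            lines.append(f"• {enzyme}: no sites found")
        else:
            label = "position" if len(positions) == 1 else "positions"
            lines.append(f"• {enzyme}: {label} {', '.join(map(str, positions))}")
    return "\n".join(lines)

# Example values taken from the template above
print(format_whatsapp(
    1234, 52.3, [456, 120, 40], "MRSSIDLK",
    {"EcoRI": [123, 456], "BamHI": [789], "HindIII": []},
))
```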

7. Follow-up suggestions

  • "Want me to BLAST this sequence?"
  • "Should I design primers for a specific region?"
  • "Want a detailed ORF map?"
  • "Should I check for conserved domains?"