OpenClaw-Medical-Skills bio-alignment-pairwise
Perform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-pairwise" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-pairwise && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-pairwise" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-pairwise && rm -rf "$T"
skills/bio-alignment-pairwise/SKILL.mdVersion Compatibility
Reference examples tested with: BioPython 1.83+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
thenpip show <package>
to check signatureshelp(module.function)
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Pairwise Sequence Alignment
"Align two sequences" → Compute an optimal alignment between a pair of sequences using dynamic programming.
- Python:
(BioPython Bio.Align)PairwiseAligner() - CLI:
(global) orneedle
(local) from EMBOSSwater - R:
(Biostrings)pairwiseAlignment()
Align two sequences using dynamic programming algorithms (Needleman-Wunsch for global, Smith-Waterman for local).
Required Import
Goal: Load modules needed for pairwise alignment operations.
Approach: Import the PairwiseAligner class along with sequence and I/O utilities from Biopython.
from Bio.Align import PairwiseAligner from Bio.Seq import Seq from Bio import SeqIO
Core Concepts
| Mode | Algorithm | Use Case |
|---|---|---|
| Needleman-Wunsch | Full-length alignment, similar-length sequences |
| Smith-Waterman | Find best matching regions, different-length sequences |
Creating an Aligner
Goal: Configure a PairwiseAligner with appropriate scoring for the sequence type.
Approach: Instantiate PairwiseAligner with mode, scoring parameters, or a substitution matrix depending on DNA vs protein input.
# Basic aligner with defaults aligner = PairwiseAligner() # Configure mode and scoring aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5) # For protein alignment with substitution matrix from Bio.Align import substitution_matrices aligner = PairwiseAligner(mode='global', substitution_matrix=substitution_matrices.load('BLOSUM62'))
Performing Alignments
"Align two sequences" → Compute optimal alignment(s) between a pair of sequences, returning alignment objects or a score.
Goal: Align two sequences and retrieve the optimal alignment(s) or score.
Approach: Call
aligner.align() for full alignment objects or aligner.score() for score-only (faster for large sequences).
seq1 = Seq('ACCGGTAACGTAG') seq2 = Seq('ACCGTTAACGAAG') # Get all optimal alignments alignments = aligner.align(seq1, seq2) print(f'Found {len(alignments)} optimal alignments') print(alignments[0]) # Print first alignment # Get score only (faster for large sequences) score = aligner.score(seq1, seq2)
Alignment Output Format
target 0 ACCGGTAACGTAG 13 0 |||||.||||.|| 13 query 0 ACCGTTAACGAAG 13
Accessing Alignment Data
Goal: Extract alignment properties including score, shape, aligned sequences, and coordinate mappings.
Approach: Access alignment object attributes and indexing to retrieve per-sequence aligned strings and coordinate arrays.
alignment = alignments[0] # Basic properties print(alignment.score) # Alignment score print(alignment.shape) # (num_seqs, alignment_length) print(len(alignment)) # Alignment length # Get aligned sequences with gaps target_aligned = alignment[0, :] # First sequence (target) with gaps query_aligned = alignment[1, :] # Second sequence (query) with gaps # Get coordinate mapping print(alignment.aligned) # Array of aligned segment coordinates print(alignment.coordinates) # Full coordinate array
Alignment Counts (Identities, Mismatches, Gaps)
Goal: Quantify identities, mismatches, and gaps in an alignment to calculate percent identity.
Approach: Use the
.counts() method on the alignment object and derive percent identity from identity and mismatch totals.
alignment = alignments[0] counts = alignment.counts() print(f'Identities: {counts.identities}') print(f'Mismatches: {counts.mismatches}') print(f'Gaps: {counts.gaps}') # Calculate percent identity total_aligned = counts.identities + counts.mismatches percent_identity = counts.identities / total_aligned * 100 print(f'Percent identity: {percent_identity:.1f}%')
Common Scoring Configurations
DNA/RNA Alignment
aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)
Protein Alignment
from Bio.Align import substitution_matrices blosum62 = substitution_matrices.load('BLOSUM62') aligner = PairwiseAligner(mode='global', substitution_matrix=blosum62, open_gap_score=-11, extend_gap_score=-1)
Local Alignment (Find Best Region)
aligner = PairwiseAligner(mode='local', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)
Semiglobal (Overlap/Extension)
# Allow free end gaps on query (useful for primer alignment) aligner = PairwiseAligner(mode='global') aligner.query_left_open_gap_score = 0 aligner.query_left_extend_gap_score = 0 aligner.query_right_open_gap_score = 0 aligner.query_right_extend_gap_score = 0
Available Substitution Matrices
Goal: Load and select substitution matrices for protein alignment scoring.
Approach: List available matrices with
substitution_matrices.load() and load specific ones (BLOSUM62 for general, BLOSUM80 for close homologs, PAM250 for distant).
from Bio.Align import substitution_matrices print(substitution_matrices.load()) # List all available matrices # Common matrices blosum62 = substitution_matrices.load('BLOSUM62') # General protein blosum80 = substitution_matrices.load('BLOSUM80') # Closely related proteins pam250 = substitution_matrices.load('PAM250') # Distantly related proteins
Working with SeqRecord Objects
Goal: Align sequences loaded from FASTA files rather than hardcoded strings.
Approach: Parse SeqRecord objects from a FASTA file and pass their
.seq attributes to the aligner.
from Bio import SeqIO records = list(SeqIO.parse('sequences.fasta', 'fasta')) seq1, seq2 = records[0].seq, records[1].seq aligner = PairwiseAligner(mode='global', match_score=1, mismatch_score=-1) alignments = aligner.align(seq1, seq2)
Iterating Over Multiple Alignments
# Limit number of alignments returned (memory efficient) aligner.max_alignments = 100 for i, alignment in enumerate(alignments): print(f'Alignment {i+1}: score={alignment.score}') if i >= 4: break
Substitution Matrix from Alignment
Goal: Extract observed substitution frequencies from a completed alignment.
Approach: Access the
.substitutions property to get a matrix of observed base/residue substitution counts.
alignment = alignments[0] substitutions = alignment.substitutions # View as array (rows=target, cols=query) print(substitutions) # Access specific substitution counts # substitutions['A', 'T'] gives count of A aligned to T
Export Alignment to Different Formats
Goal: Convert an alignment to standard bioinformatics file formats for downstream tools.
Approach: Use Python's
format() function with format specifiers (fasta, clustal, psl, sam) on the alignment object.
alignment = alignments[0] # Various output formats print(format(alignment, 'fasta')) # FASTA format print(format(alignment, 'clustal')) # Clustal format print(format(alignment, 'psl')) # PSL format (BLAT) print(format(alignment, 'sam')) # SAM format
Quick Reference: Scoring Parameters
| Parameter | Description | Typical DNA | Typical Protein |
|---|---|---|---|
| Score for identical bases | 1-2 | Use matrix |
| Penalty for mismatches | -1 to -3 | Use matrix |
| Cost to start a gap | -5 to -15 | -10 to -12 |
| Cost per gap extension | -0.5 to -2 | -0.5 to -1 |
| Scoring matrix | N/A | BLOSUM62 |
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Too many optimal alignments | Set |
| Low scores | Wrong scoring scheme | Use substitution matrix for proteins |
| No alignments in local mode | Scores all negative | Ensure > 0 |
Decision Tree: Choosing Alignment Mode
Need full-length comparison? ├── Yes → Use mode='global' │ └── Sequences similar length? │ ├── Yes → Standard global │ └── No → Consider semiglobal (free end gaps) └── No → Use mode='local' └── Find best matching regions only
Related Skills
- alignment-io - Save alignments to files in various formats
- msa-parsing - Work with multiple sequence alignments
- msa-statistics - Calculate identity, similarity metrics
- sequence-manipulation/motif-search - Pattern matching in sequences