install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Sequence_Analysis/alignment/pairwise-alignment" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-pairwise-alignment && rm -rf "$T"
manifest:
Skills/Sequence_Analysis/alignment/pairwise-alignment/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-alignment-pairwise description: Perform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences. tool_type: python primary_tool: Bio.Align measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Pairwise Sequence Alignment
Align two sequences using dynamic programming algorithms (Needleman-Wunsch for global, Smith-Waterman for local).
Required Import
from Bio.Align import PairwiseAligner from Bio.Seq import Seq from Bio import SeqIO
Core Concepts
| Mode | Algorithm | Use Case |
|---|---|---|
| Needleman-Wunsch | Full-length alignment, similar-length sequences |
| Smith-Waterman | Find best matching regions, different-length sequences |
Creating an Aligner
# Basic aligner with defaults aligner = PairwiseAligner() # Configure mode and scoring aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5) # For protein alignment with substitution matrix from Bio.Align import substitution_matrices aligner = PairwiseAligner(mode='global', substitution_matrix=substitution_matrices.load('BLOSUM62'))
Performing Alignments
seq1 = Seq('ACCGGTAACGTAG') seq2 = Seq('ACCGTTAACGAAG') # Get all optimal alignments alignments = aligner.align(seq1, seq2) print(f'Found {len(alignments)} optimal alignments') print(alignments[0]) # Print first alignment # Get score only (faster for large sequences) score = aligner.score(seq1, seq2)
Alignment Output Format
target 0 ACCGGTAACGTAG 13 0 |||||.||||.|| 13 query 0 ACCGTTAACGAAG 13
Accessing Alignment Data
alignment = alignments[0] # Basic properties print(alignment.score) # Alignment score print(alignment.shape) # (num_seqs, alignment_length) print(len(alignment)) # Alignment length # Get aligned sequences with gaps target_aligned = alignment[0, :] # First sequence (target) with gaps query_aligned = alignment[1, :] # Second sequence (query) with gaps # Get coordinate mapping print(alignment.aligned) # Array of aligned segment coordinates print(alignment.coordinates) # Full coordinate array
Alignment Counts (Identities, Mismatches, Gaps)
alignment = alignments[0] counts = alignment.counts() print(f'Identities: {counts.identities}') print(f'Mismatches: {counts.mismatches}') print(f'Gaps: {counts.gaps}') # Calculate percent identity total_aligned = counts.identities + counts.mismatches percent_identity = counts.identities / total_aligned * 100 print(f'Percent identity: {percent_identity:.1f}%')
Common Scoring Configurations
DNA/RNA Alignment
aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)
Protein Alignment
from Bio.Align import substitution_matrices blosum62 = substitution_matrices.load('BLOSUM62') aligner = PairwiseAligner(mode='global', substitution_matrix=blosum62, open_gap_score=-11, extend_gap_score=-1)
Local Alignment (Find Best Region)
aligner = PairwiseAligner(mode='local', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)
Semiglobal (Overlap/Extension)
# Allow free end gaps on query (useful for primer alignment) aligner = PairwiseAligner(mode='global') aligner.query_left_open_gap_score = 0 aligner.query_left_extend_gap_score = 0 aligner.query_right_open_gap_score = 0 aligner.query_right_extend_gap_score = 0
Available Substitution Matrices
from Bio.Align import substitution_matrices print(substitution_matrices.load()) # List all available matrices # Common matrices blosum62 = substitution_matrices.load('BLOSUM62') # General protein blosum80 = substitution_matrices.load('BLOSUM80') # Closely related proteins pam250 = substitution_matrices.load('PAM250') # Distantly related proteins
Working with SeqRecord Objects
from Bio import SeqIO records = list(SeqIO.parse('sequences.fasta', 'fasta')) seq1, seq2 = records[0].seq, records[1].seq aligner = PairwiseAligner(mode='global', match_score=1, mismatch_score=-1) alignments = aligner.align(seq1, seq2)
Iterating Over Multiple Alignments
# Limit number of alignments returned (memory efficient) aligner.max_alignments = 100 for i, alignment in enumerate(alignments): print(f'Alignment {i+1}: score={alignment.score}') if i >= 4: break
Substitution Matrix from Alignment
alignment = alignments[0] substitutions = alignment.substitutions # View as array (rows=target, cols=query) print(substitutions) # Access specific substitution counts # substitutions['A', 'T'] gives count of A aligned to T
Export Alignment to Different Formats
alignment = alignments[0] # Various output formats print(format(alignment, 'fasta')) # FASTA format print(format(alignment, 'clustal')) # Clustal format print(format(alignment, 'psl')) # PSL format (BLAT) print(format(alignment, 'sam')) # SAM format
Quick Reference: Scoring Parameters
| Parameter | Description | Typical DNA | Typical Protein |
|---|---|---|---|
| Score for identical bases | 1-2 | Use matrix |
| Penalty for mismatches | -1 to -3 | Use matrix |
| Cost to start a gap | -5 to -15 | -10 to -12 |
| Cost per gap extension | -0.5 to -2 | -0.5 to -1 |
| Scoring matrix | N/A | BLOSUM62 |
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Too many optimal alignments | Set |
| Low scores | Wrong scoring scheme | Use substitution matrix for proteins |
| No alignments in local mode | Scores all negative | Ensure > 0 |
Decision Tree: Choosing Alignment Mode
Need full-length comparison? ├── Yes → Use mode='global' │ └── Sequences similar length? │ ├── Yes → Standard global │ └── No → Consider semiglobal (free end gaps) └── No → Use mode='local' └── Find best matching regions only
Related Skills
- alignment-io - Save alignments to files in various formats
- msa-parsing - Work with multiple sequence alignments
- msa-statistics - Calculate identity, similarity metrics
- sequence-manipulation/motif-search - Pattern matching in sequences