OpenClaw-Medical-Skills bio-read-alignment-bwa-alignment

<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-read-alignment-bwa-alignment
description: Align DNA short reads to reference genomes using bwa-mem2, the faster successor to BWA-MEM. Use when aligning DNA short reads to a reference genome.
tool_type: cli
primary_tool: bwa-mem2
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

BWA-MEM2 Alignment

Build Index

# Index reference genome (required once)
bwa-mem2 index reference.fa

# Creates: reference.fa.0123, reference.fa.amb, reference.fa.ann, reference.fa.bwt.2bit.64, reference.fa.pac
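
Since the index only needs to be built once, a script can check for the five files listed above before deciding whether to run `bwa-mem2 index` again. The following is a minimal sketch; `index_is_complete` is a hypothetical helper name, not part of bwa-mem2.

```shell
#!/usr/bin/env bash
# Return 0 only if every bwa-mem2 index file listed above exists next to
# the reference. index_is_complete is a hypothetical helper name.
index_is_complete() {
    local ref=$1 ext
    for ext in 0123 amb ann bwt.2bit.64 pac; do
        [ -e "${ref}.${ext}" ] || return 1
    done
}

# Usage: build the index only when it is missing or incomplete.
#   index_is_complete reference.fa || bwa-mem2 index reference.fa
```

The check only tests that the files exist; it does not verify that they match the current contents of the reference.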

Basic Alignment

# Paired-end reads
bwa-mem2 mem -t 8 reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

# Single-end reads
bwa-mem2 mem -t 8 reference.fa reads.fq.gz > aligned.sam

Alignment with Read Groups

# Add read group information (required for GATK)
bwa-mem2 mem -t 8 \
    -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tLB:lib1' \
    reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam
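
When many samples are aligned in a loop, the `-R` string can be generated rather than typed by hand. The sketch below assumes, as in the example above, that the read group ID is the same as the sample name; real projects often use a flowcell or lane identifier for ID instead. `make_read_group` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Build an @RG header string for `bwa-mem2 mem -R`.
# make_read_group is a hypothetical helper. Tabs are emitted as the two
# characters backslash + t, the same form used in the example above.
make_read_group() {
    local sample=$1 library=$2 platform=${3:-ILLUMINA}
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s' \
        "$sample" "$sample" "$platform" "$library"
}

rg=$(make_read_group sample1 lib1)
printf '%s\n' "$rg"   # prints @RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tLB:lib1

# Usage:
#   bwa-mem2 mem -t 8 -R "$rg" reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam
```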

Direct to Sorted BAM

# Pipe to samtools for sorted BAM output
bwa-mem2 mem -t 8 \
    -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' \
    reference.fa reads_1.fq.gz reads_2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

# Index the BAM
samtools index aligned.sorted.bam

Mark Duplicates Pipeline

# Full pipeline: align, fixmate, sort, markdup
bwa-mem2 mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' \
    reference.fa reads_1.fq.gz reads_2.fq.gz | \
    samtools fixmate -m -@ 4 - - | \
    samtools sort -@ 4 - | \
    samtools markdup -@ 4 - aligned.markdup.bam

samtools index aligned.markdup.bam
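
By default a shell pipeline reports only the exit status of its last command, so a failure early in a multi-stage pipeline like the one above can go unnoticed when a later stage exits cleanly. The sketch below demonstrates the effect of bash's `pipefail` option with stand-in commands: `false` plays a failing upstream step and `cat` a downstream step that succeeds.

```shell
#!/usr/bin/env bash
# Stand-in pipeline: `false` is a failing upstream command,
# `cat` is a downstream command that exits 0.

status_default=0
( set +o pipefail; false | cat ) || status_default=$?

status_pipefail=0
( set -o pipefail; false | cat ) || status_pipefail=$?

printf 'without pipefail: %s\n' "$status_default"    # prints 0
printf 'with pipefail:    %s\n' "$status_pipefail"   # prints 1
```

Placing `set -o pipefail` (often together with `set -eu`) at the top of an alignment script is one way to make it stop when any stage fails.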

Common Options

# -t  Threads
# -M  Mark shorter split hits as secondary (Picard compatible)
# -Y  Use soft clipping for supplementary alignments
# -K  Process INT input bases in each batch, regardless of thread count (for reproducibility)
# -R  Read group
bwa-mem2 mem -t 8 \
    -M \
    -Y \
    -K 100000000 \
    -R '@RG\tID:s1\tSM:s1' \
    reference.fa r1.fq r2.fq
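
A comment cannot follow a line-continuation backslash in shell, because the backslash must be the last character on the line. One way to keep a comment beside each option is to collect the arguments in a bash array, as in this sketch. The array name `mem_args` is an illustrative choice, and the file names are the placeholders used above.

```shell
#!/usr/bin/env bash
# Collect `bwa-mem2 mem` options in a bash array so that each option
# can carry its own comment.
mem_args=(
    mem
    -t 8                      # Threads
    -M                        # Mark shorter split hits as secondary
    -Y                        # Soft-clip supplementary alignments
    -K 100000000              # Input bases per batch
    -R '@RG\tID:s1\tSM:s1'    # Read group
    reference.fa r1.fq r2.fq
)

# Print the arguments for inspection only (quoting is not preserved).
printf '%s\n' "bwa-mem2 ${mem_args[*]}"

# Run it with:
#   bwa-mem2 "${mem_args[@]}" > aligned.sam
```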

Key Parameters

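| Parameter | Default | Description |
|-----------|---------|-------------|
| -t | 1 | Number of threads |
| -k | 19 | Minimum seed length |
| -w | 100 | Band width for extension |
| -r | 1.5 | Re-seeding trigger ratio |
| -c | 500 | Skip seeds with more than INT hits |
| -A | 1 | Match score |
| -B | 4 | Mismatch penalty |
| -O | 6 | Gap open penalty |
| -E | 1 | Gap extension penalty |
| -M | off | Mark secondary alignments |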

Output Filters

# Keep only mapped reads (-F 4 drops unmapped) with MAPQ >= 20 (-q 20)
bwa-mem2 mem -t 8 reference.fa r1.fq r2.fq | \
    samtools view -@ 4 -b -q 20 -F 4 - | \
    samtools sort -@ 4 -o aligned.filtered.bam -
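
To make the two filters concrete, the sketch below applies the same rules to plain-text SAM with awk: a record is dropped if FLAG bit 0x4 (unmapped) is set or if MAPQ (column 5) is below the threshold. `filter_sam` and `demo_sam` are hypothetical helpers for illustration, and the demo records are truncated to the first six SAM columns; in a real pipeline `samtools view` remains the tool to use.

```shell
#!/usr/bin/env bash
# Plain-text illustration of `samtools view -q 20 -F 4` for SAM on stdin.
filter_sam() {
    awk -F '\t' -v min_mapq="${1:-20}" '
        /^@/ { print; next }                   # keep header lines
        int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4 set: unmapped
        ($5 + 0) < (min_mapq + 0) { next }     # MAPQ below threshold
        { print }
    '
}

# Tiny synthetic input (truncated records, illustration only).
demo_sam() {
    printf '@HD\tVN:1.6\n'
    printf 'r1\t99\tchr1\t100\t60\t50M\n'    # mapped, MAPQ 60: kept
    printf 'r2\t77\t*\t0\t0\t*\n'            # FLAG 77 includes 0x4: dropped
    printf 'r3\t163\tchr1\t200\t5\t50M\n'    # mapped, MAPQ 5: dropped
}

demo_sam | filter_sam 20
```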

Split Read Alignment

# For SV detection, use -Y so supplementary alignments are soft-clipped instead of hard-clipped
bwa-mem2 mem -t 8 -Y reference.fa r1.fq r2.fq > aligned.sam
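
Split reads show up in the output as extra records whose FLAG field has the supplementary bit (0x800) set, or the secondary bit (0x100) when `-M` is used. The sketch below classifies a FLAG value using those two bits from the SAM specification; `classify_flag` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Classify a SAM FLAG value: 0x800 (2048) marks a supplementary
# alignment and 0x100 (256) a secondary one.
classify_flag() {
    local flag=$1
    if [ $(( flag & 2048 )) -ne 0 ]; then
        echo supplementary
    elif [ $(( flag & 256 )) -ne 0 ]; then
        echo secondary
    else
        echo primary
    fi
}

classify_flag 99     # prints primary
classify_flag 2147   # prints supplementary (2048 + 99)
classify_flag 355    # prints secondary (256 + 99)
```

On a real BAM file, `samtools view -f 2048` selects the supplementary records directly.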

Memory Requirements

  • Index loading: ~10GB for human genome
  • Per thread: ~1-2GB
  • Typical human WGS: 30-50GB RAM with 8 threads
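
The first two figures can be combined into a rough estimate for a given thread count. The sketch below simply adds the index size to the per-thread range; `estimate_ram_gb` is a hypothetical helper. The result for 8 threads comes out below the typical 30-50GB quoted in the list, so it is best read as a floor rather than a sizing recommendation.

```shell
#!/usr/bin/env bash
# Rough lower-bound RAM estimate from the figures listed above:
# ~10 GB for the human index plus ~1-2 GB per thread.
# estimate_ram_gb is a hypothetical helper, not part of bwa-mem2.
estimate_ram_gb() {
    local threads=$1 index_gb=10
    printf '%d-%d GB\n' "$(( index_gb + threads ))" "$(( index_gb + 2 * threads ))"
}

estimate_ram_gb 8    # prints "18-26 GB"
```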

BWA-MEM (Alternative)

# Build index
bwa index reference.fa

# Paired-end alignment
bwa mem -t 8 reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

# With read groups
bwa mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' \
    reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

# Direct to sorted BAM
bwa mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' \
    reference.fa reads_1.fq.gz reads_2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

BWA-MEM vs BWA-MEM2

| Feature | BWA-MEM | BWA-MEM2 |
|---------|---------|----------|
| Status | Active | Archived |
| Speed | 1x | 2-3x faster |
| Index format | .bwt | .bwt.2bit.64 |
| Results | Baseline | Nearly identical |
| Memory | ~5GB | ~10GB |
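
Because the two tools use different index files, a wrapper script can inspect what is already on disk and choose the matching aligner. This is a minimal sketch that relies only on the index suffixes in the table; `pick_aligner` is a hypothetical helper, and preferring bwa-mem2 when both indexes exist is an assumption made here.

```shell
#!/usr/bin/env bash
# Print the aligner whose index exists next to the reference.
# Prefers bwa-mem2 when both indexes are present (an assumption of
# this sketch). Returns 1 when no index is found.
pick_aligner() {
    local ref=$1
    if [ -e "${ref}.bwt.2bit.64" ]; then
        echo bwa-mem2
    elif [ -e "${ref}.bwt" ]; then
        echo bwa
    else
        echo none
        return 1
    fi
}

# Usage:
#   aligner=$(pick_aligner reference.fa) &&
#       "$aligner" mem -t 8 reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam
```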

Related Skills

  • read-qc/fastp-workflow - Preprocess reads before alignment
  • alignment-files/alignment-sorting - Post-alignment processing
  • alignment-files/duplicate-handling - Mark duplicates
  • variant-calling/variant-calling - Call variants from BAM
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->