install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/long-read-sequencing/long-read-alignment" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-long-read-alignmen && rm -rf "$T"
manifest:
Skills/Genomics/long-read-sequencing/long-read-alignment/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-longread-alignment description: Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis. tool_type: cli primary_tool: minimap2 measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Long-Read Alignment with minimap2
Oxford Nanopore Alignment
# Basic ONT alignment minimap2 -ax map-ont reference.fa reads.fastq.gz | \ samtools sort -o aligned.bam samtools index aligned.bam
PacBio HiFi Alignment
# PacBio HiFi reads (high accuracy) minimap2 -ax map-hifi reference.fa reads.fastq.gz | \ samtools sort -o aligned.bam samtools index aligned.bam
PacBio CLR Alignment
# PacBio CLR (continuous long reads, lower accuracy) minimap2 -ax map-pb reference.fa reads.fastq.gz | \ samtools sort -o aligned.bam samtools index aligned.bam
Pre-Build Index for Multiple Runs
# Build index once minimap2 -d reference.mmi reference.fa # Use index for alignment minimap2 -ax map-ont reference.mmi reads.fastq.gz | samtools sort -o aligned.bam
Common Options
minimap2 -ax map-ont \ -t 8 \ # Threads -R '@RG\tID:sample\tSM:sample' \ # Read group --secondary=no \ # No secondary alignments --MD \ # Generate MD tag for variants -Y \ # Use soft clipping for supplementary reference.fa reads.fastq.gz | \ samtools sort -@ 4 -o aligned.bam
Splice-Aware Alignment (RNA)
# For direct RNA or cDNA sequencing minimap2 -ax splice reference.fa reads.fastq.gz | \ samtools sort -o aligned.bam
With Junction BED (Known Splice Sites)
# Provide known splice junctions minimap2 -ax splice --junc-bed junctions.bed \ reference.fa reads.fastq.gz | samtools sort -o aligned.bam
Assembly to Reference Alignment
# Assembly with ~0.1% divergence minimap2 -ax asm5 reference.fa assembly.fa > aligned.sam # Assembly with higher divergence (~5%) minimap2 -ax asm20 reference.fa assembly.fa > aligned.sam
Output PAF (Faster, No BAM)
# PAF format (faster, for quick analysis) minimap2 -x map-ont reference.fa reads.fastq.gz > alignments.paf
Keep Secondary and Supplementary
# Keep all alignments (for SV calling) minimap2 -ax map-ont \ --secondary=yes \ -N 5 \ # Max secondary alignments reference.fa reads.fastq.gz | samtools sort -o aligned.bam
Filter Alignments
# During alignment pipeline minimap2 -ax map-ont reference.fa reads.fastq.gz | \ samtools view -b -q 10 | \ # Min mapping quality 10 samtools sort -o aligned.bam
Multiple FASTQ Files
# Concatenate inputs minimap2 -ax map-ont reference.fa reads1.fastq.gz reads2.fastq.gz | \ samtools sort -o aligned.bam # Or use file list cat file_list.txt | xargs minimap2 -ax map-ont reference.fa | \ samtools sort -o aligned.bam
Output Statistics
# Get alignment statistics samtools flagstat aligned.bam # Detailed stats samtools stats aligned.bam | grep ^SN
Convert PAF to BED
# Extract alignments to BED awk 'OFS="\t" {print $6, $8, $9, $1, $12, ($5=="+")?"+":"-"}' alignments.paf > alignments.bed
Key Presets
| Preset | Description | Best For |
|---|---|---|
| map-ont | ONT reads | Nanopore genomic |
| map-hifi | PacBio HiFi | PacBio genomic |
| map-pb | PacBio CLR | PacBio CLR |
| splice | Long RNA reads | cDNA, direct RNA |
| asm5 | Low divergence | Same species assembly |
| asm20 | High divergence | Cross-species assembly |
| sr | Short reads | Illumina (basic) |
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| -t | 3 | CPU threads |
| -k | 15 | K-mer size |
| -w | 10 | Minimizer window |
| -a | off | Output SAM |
| -x | none | Preset |
| --secondary | yes | Output secondary |
| -N | 5 | Max secondary alignments |
| --MD | off | Generate MD tag |
| -R | none | Read group header |
| -Y | off | Soft clipping for supplementary |
Output Formats
| Format | Flag | Description |
|---|---|---|
| PAF | (default) | Pairwise Alignment Format |
| SAM | -a | Sequence Alignment Map |
| BAM | -a | samtools | Binary SAM |
Related Skills
- medaka-polishing - Polish consensus with medaka
- structural-variants - Call SVs from alignments
- alignment-files/sam-bam-basics - BAM manipulation