install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-files-bam-statistics" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-files-bam-statistics && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-files-bam-statistics" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-files-bam-statistics && rm -rf "$T"
manifest:
skills/bio-alignment-files-bam-statistics/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-alignment-files-bam-statistics description: Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
BAM Statistics
Generate alignment statistics using samtools and pysam.
Quick Summary Commands
| Command | Output | Speed |
|---|---|---|
| Read counts by category | Very fast |
| Per-chromosome counts | Very fast (needs index) |
| Comprehensive statistics | Moderate |
| Per-position depth | Slow (full scan) |
| Per-region coverage | Fast (needs index) |
samtools flagstat
Fast summary of alignment flags.
samtools flagstat input.bam
Output:
10000000 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 50000 + 0 supplementary 0 + 0 duplicates 9800000 + 0 mapped (98.00% : N/A) 9950000 + 0 paired in sequencing 4975000 + 0 read1 4975000 + 0 read2 9700000 + 0 properly paired (97.49% : N/A) 9750000 + 0 with itself and mate mapped 100000 + 0 singletons (1.01% : N/A) 25000 + 0 with mate mapped to a different chr 10000 + 0 with mate mapped to a different chr (mapQ>=5)
Multi-threaded
samtools flagstat -@ 4 input.bam
Output to File
samtools flagstat input.bam > flagstat.txt
samtools idxstats
Per-chromosome read counts (requires index).
samtools idxstats input.bam
Output format:
chrom length mapped unmapped
chr1 248956422 5000000 1000 chr2 242193529 4800000 800 chrM 16569 50000 100 * 0 0 150000
Parse idxstats
# Total mapped reads samtools idxstats input.bam | awk '{sum += $3} END {print sum}' # Mitochondrial percentage samtools idxstats input.bam | awk ' /^chrM/ {mt = $3} {total += $3} END {print mt/total*100 "% mitochondrial"}'
samtools stats
Comprehensive statistics including insert size, base quality, and more.
samtools stats input.bam > stats.txt
View Summary Numbers
samtools stats input.bam | grep "^SN"
Key summary fields:
- Total readsraw total sequences
- Mapped readsreads mapped
- Properly pairedreads mapped and paired
- Mean insert sizeinsert size average
- Insert size spreadinsert size standard deviation
- Mean read lengthaverage length
- Mismatch rateerror rate
Generate Plots (with plot-bamstats)
samtools stats input.bam > stats.txt plot-bamstats -p plots/ stats.txt
Stats for Specific Region
samtools stats input.bam chr1:1000000-2000000 > region_stats.txt
samtools depth
Per-position read depth.
Basic Depth
samtools depth input.bam > depth.txt
Output:
chrom position depth
Depth at Specific Positions
samtools depth -r chr1:1000-2000 input.bam
Include Zero-Depth Positions
samtools depth -a input.bam > depth_with_zeros.txt
Maximum Depth Cap
samtools depth -d 0 input.bam # No cap (default 8000)
Depth from BED Regions
samtools depth -b regions.bed input.bam
Calculate Mean Depth
samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'
samtools coverage
Per-chromosome or per-region coverage statistics (faster than depth).
samtools coverage input.bam
Output columns:
- Reference name#rname
- Start positionstartpos
- End positionendpos
- Number of readsnumreads
- Bases with coveragecovbases
- Percentage of bases coveredcoverage
- Mean depthmeandepth
- Mean base qualitymeanbaseq
- Mean mapping qualitymeanmapq
Coverage for Specific Region
samtools coverage -r chr1:1000000-2000000 input.bam
Coverage from BED
samtools coverage -b regions.bed input.bam
Histogram Output
samtools coverage -m input.bam
pysam Python Alternative
Count Reads
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: total = mapped = paired = proper = 0 for read in bam: total += 1 if not read.is_unmapped: mapped += 1 if read.is_paired: paired += 1 if read.is_proper_pair: proper += 1 print(f'Total: {total}') print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)') print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')
Per-Chromosome Counts
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: for stat in bam.get_index_statistics(): print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
Calculate Depth at Position
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: for pileup in bam.pileup('chr1', 1000000, 1000001): print(f'Position {pileup.pos}: depth {pileup.n}')
Mean Depth in Region
import pysam def mean_depth(bam_path, chrom, start, end): depths = [] with pysam.AlignmentFile(bam_path, 'rb') as bam: for pileup in bam.pileup(chrom, start, end, truncate=True): depths.append(pileup.n) if depths: return sum(depths) / len(depths) return 0 depth = mean_depth('input.bam', 'chr1', 1000000, 2000000) print(f'Mean depth: {depth:.1f}x')
Coverage Statistics
import pysam def coverage_stats(bam_path, chrom, start, end): covered = 0 total_depth = 0 with pysam.AlignmentFile(bam_path, 'rb') as bam: for pileup in bam.pileup(chrom, start, end, truncate=True): covered += 1 total_depth += pileup.n length = end - start pct_covered = covered / length * 100 mean_depth = total_depth / length if length > 0 else 0 return { 'length': length, 'covered_bases': covered, 'pct_covered': pct_covered, 'mean_depth': mean_depth } stats = coverage_stats('input.bam', 'chr1', 1000000, 2000000) print(f'Coverage: {stats["pct_covered"]:.1f}%') print(f'Mean depth: {stats["mean_depth"]:.1f}x')
Insert Size Distribution
import pysam from collections import Counter insert_sizes = Counter() with pysam.AlignmentFile('input.bam', 'rb') as bam: for read in bam: if read.is_proper_pair and read.is_read1 and read.template_length > 0: insert_sizes[read.template_length] += 1 sizes = list(insert_sizes.keys()) mean_insert = sum(s * c for s, c in insert_sizes.items()) / sum(insert_sizes.values()) print(f'Mean insert size: {mean_insert:.0f}') print(f'Min: {min(sizes)}, Max: {max(sizes)}')
Quick Reference
| Task | Command |
|---|---|
| Quick counts | |
| Per-chrom counts | |
| Full stats | |
| Coverage summary | |
| Per-position depth | |
| Mean depth | |
Common Metrics
| Metric | Good | Concerning |
|---|---|---|
| Mapping rate | >95% | <80% |
| Proper pair rate | >90% | <70% |
| Duplicate rate | <20% | >40% |
| Error rate | <1% | >2% |
| Coverage uniformity | <2x CV | >3x CV |
Related Skills
- sam-bam-basics - View alignment files
- alignment-indexing - idxstats requires index
- duplicate-handling - Check duplicate rates
- alignment-filtering - Filter before stats
- sequence-io/sequence-statistics - FASTA/FASTQ statistics