OpenClaw-Medical-Skills bio-alignment-files-bam-statistics

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-files-bam-statistics" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-files-bam-statistics && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-files-bam-statistics" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-files-bam-statistics && rm -rf "$T"
manifest: skills/bio-alignment-files-bam-statistics/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-alignment-files-bam-statistics description: Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

BAM Statistics

Generate alignment statistics using samtools and pysam.

Quick Summary Commands

CommandOutputSpeed
flagstat
Read counts by categoryVery fast
idxstats
Per-chromosome countsVery fast (needs index)
stats
Comprehensive statisticsModerate
depth
Per-position depthSlow (full scan)
coverage
Per-region coverageFast (needs index)

samtools flagstat

Fast summary of alignment flags.

samtools flagstat input.bam

Output:

10000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
50000 + 0 supplementary
0 + 0 duplicates
9800000 + 0 mapped (98.00% : N/A)
9950000 + 0 paired in sequencing
4975000 + 0 read1
4975000 + 0 read2
9700000 + 0 properly paired (97.49% : N/A)
9750000 + 0 with itself and mate mapped
100000 + 0 singletons (1.01% : N/A)
25000 + 0 with mate mapped to a different chr
10000 + 0 with mate mapped to a different chr (mapQ>=5)

Multi-threaded

samtools flagstat -@ 4 input.bam

Output to File

samtools flagstat input.bam > flagstat.txt

samtools idxstats

Per-chromosome read counts (requires index).

samtools idxstats input.bam

Output format:

chrom length mapped unmapped

chr1    248956422    5000000    1000
chr2    242193529    4800000    800
chrM    16569        50000      100
*       0            0          150000

Parse idxstats

# Total mapped reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'

# Mitochondrial percentage
samtools idxstats input.bam | awk '
    /^chrM/ {mt = $3}
    {total += $3}
    END {print mt/total*100 "% mitochondrial"}'

samtools stats

Comprehensive statistics including insert size, base quality, and more.

samtools stats input.bam > stats.txt

View Summary Numbers

samtools stats input.bam | grep "^SN"

Key summary fields:

  • raw total sequences
    - Total reads
  • reads mapped
    - Mapped reads
  • reads mapped and paired
    - Properly paired
  • insert size average
    - Mean insert size
  • insert size standard deviation
    - Insert size spread
  • average length
    - Mean read length
  • error rate
    - Mismatch rate

Generate Plots (with plot-bamstats)

samtools stats input.bam > stats.txt
plot-bamstats -p plots/ stats.txt

Stats for Specific Region

samtools stats input.bam chr1:1000000-2000000 > region_stats.txt

samtools depth

Per-position read depth.

Basic Depth

samtools depth input.bam > depth.txt

Output:

chrom position depth

Depth at Specific Positions

samtools depth -r chr1:1000-2000 input.bam

Include Zero-Depth Positions

samtools depth -a input.bam > depth_with_zeros.txt

Maximum Depth Cap

samtools depth -d 0 input.bam  # No cap (default 8000)

Depth from BED Regions

samtools depth -b regions.bed input.bam

Calculate Mean Depth

samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'

samtools coverage

Per-chromosome or per-region coverage statistics (faster than depth).

samtools coverage input.bam

Output columns:

  • #rname
    - Reference name
  • startpos
    - Start position
  • endpos
    - End position
  • numreads
    - Number of reads
  • covbases
    - Bases with coverage
  • coverage
    - Percentage of bases covered
  • meandepth
    - Mean depth
  • meanbaseq
    - Mean base quality
  • meanmapq
    - Mean mapping quality

Coverage for Specific Region

samtools coverage -r chr1:1000000-2000000 input.bam

Coverage from BED

samtools coverage -b regions.bed input.bam

Histogram Output

samtools coverage -m input.bam

pysam Python Alternative

Count Reads

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    total = mapped = paired = proper = 0
    for read in bam:
        total += 1
        if not read.is_unmapped:
            mapped += 1
        if read.is_paired:
            paired += 1
        if read.is_proper_pair:
            proper += 1

    print(f'Total: {total}')
    print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)')
    print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')

Per-Chromosome Counts

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')

Calculate Depth at Position

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for pileup in bam.pileup('chr1', 1000000, 1000001):
        print(f'Position {pileup.pos}: depth {pileup.n}')

Mean Depth in Region

import pysam

def mean_depth(bam_path, chrom, start, end):
    depths = []
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup in bam.pileup(chrom, start, end, truncate=True):
            depths.append(pileup.n)

    if depths:
        return sum(depths) / len(depths)
    return 0

depth = mean_depth('input.bam', 'chr1', 1000000, 2000000)
print(f'Mean depth: {depth:.1f}x')

Coverage Statistics

import pysam

def coverage_stats(bam_path, chrom, start, end):
    covered = 0
    total_depth = 0

    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup in bam.pileup(chrom, start, end, truncate=True):
            covered += 1
            total_depth += pileup.n

    length = end - start
    pct_covered = covered / length * 100
    mean_depth = total_depth / length if length > 0 else 0

    return {
        'length': length,
        'covered_bases': covered,
        'pct_covered': pct_covered,
        'mean_depth': mean_depth
    }

stats = coverage_stats('input.bam', 'chr1', 1000000, 2000000)
print(f'Coverage: {stats["pct_covered"]:.1f}%')
print(f'Mean depth: {stats["mean_depth"]:.1f}x')

Insert Size Distribution

import pysam
from collections import Counter

insert_sizes = Counter()

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam:
        if read.is_proper_pair and read.is_read1 and read.template_length > 0:
            insert_sizes[read.template_length] += 1

sizes = list(insert_sizes.keys())
mean_insert = sum(s * c for s, c in insert_sizes.items()) / sum(insert_sizes.values())
print(f'Mean insert size: {mean_insert:.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')

Quick Reference

TaskCommand
Quick counts
samtools flagstat input.bam
Per-chrom counts
samtools idxstats input.bam
Full stats
samtools stats input.bam
Coverage summary
samtools coverage input.bam
Per-position depth
samtools depth input.bam
Mean depth
samtools depth input.bam | awk '{sum+=$3;n++}END{print sum/n}'

Common Metrics

MetricGoodConcerning
Mapping rate>95%<80%
Proper pair rate>90%<70%
Duplicate rate<20%>40%
Error rate<1%>2%
Coverage uniformity<2x CV>3x CV

Related Skills

  • sam-bam-basics - View alignment files
  • alignment-indexing - idxstats requires index
  • duplicate-handling - Check duplicate rates
  • alignment-filtering - Filter before stats
  • sequence-io/sequence-statistics - FASTA/FASTQ statistics
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->