BioSkills bio-alignment-files-bam-statistics

Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/alignment-files/bam-statistics" ~/.claude/skills/gptomics-bioskills-bio-alignment-files-bam-statistics && rm -rf "$T"
manifest: alignment-files/bam-statistics/SKILL.md
source content

Version Compatibility

Reference examples tested with: pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

  • Python:
    pip show <package>
    then
    help(module.function)
    to check signatures
  • CLI:
    <tool> --version
    then
    <tool> --help
    to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

BAM Statistics

"Get alignment statistics and coverage from my BAM file" → Generate read counts, mapping rates, per-chromosome statistics, depth profiles, and coverage summaries.

  • CLI:
    samtools flagstat
    ,
    samtools stats
    ,
    samtools depth
    ,
    samtools coverage
    (samtools)
  • Python:
    pysam.AlignmentFile
    with
    pileup()
    and
    get_index_statistics()
    (pysam)

Generate alignment statistics using samtools and pysam.

Quick Summary Commands

CommandOutputSpeed
flagstat
Read counts by categoryVery fast
idxstats
Per-chromosome countsVery fast (needs index)
stats
Comprehensive statisticsModerate
depth
Per-position depthSlow (full scan)
coverage
Per-region coverageFast (needs index)

samtools flagstat

Fast summary of alignment flags.

samtools flagstat input.bam

Output:

10000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
50000 + 0 supplementary
0 + 0 duplicates
9800000 + 0 mapped (98.00% : N/A)
9950000 + 0 paired in sequencing
4975000 + 0 read1
4975000 + 0 read2
9700000 + 0 properly paired (97.49% : N/A)
9750000 + 0 with itself and mate mapped
100000 + 0 singletons (1.01% : N/A)
25000 + 0 with mate mapped to a different chr
10000 + 0 with mate mapped to a different chr (mapQ>=5)

Multi-threaded

samtools flagstat -@ 4 input.bam

Output to File

samtools flagstat input.bam > flagstat.txt

samtools idxstats

Per-chromosome read counts (requires index).

samtools idxstats input.bam

Output format:

chrom length mapped unmapped

chr1    248956422    5000000    1000
chr2    242193529    4800000    800
chrM    16569        50000      100
*       0            0          150000

Parse idxstats

# Total mapped reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'

# Mitochondrial percentage
samtools idxstats input.bam | awk '
    /^chrM/ {mt = $3}
    {total += $3}
    END {print mt/total*100 "% mitochondrial"}'

samtools stats

Comprehensive statistics including insert size, base quality, and more.

samtools stats input.bam > stats.txt

View Summary Numbers

samtools stats input.bam | grep "^SN"

Key summary fields:

  • raw total sequences
    - Total reads
  • reads mapped
    - Mapped reads
  • reads mapped and paired
    - Properly paired
  • insert size average
    - Mean insert size
  • insert size standard deviation
    - Insert size spread
  • average length
    - Mean read length
  • error rate
    - Mismatch rate

Generate Plots (with plot-bamstats)

samtools stats input.bam > stats.txt
plot-bamstats -p plots/ stats.txt

Stats for Specific Region

samtools stats input.bam chr1:1000000-2000000 > region_stats.txt

samtools depth

Per-position read depth.

Basic Depth

samtools depth input.bam > depth.txt

Output:

chrom position depth

Depth at Specific Positions

samtools depth -r chr1:1000-2000 input.bam

Include Zero-Depth Positions

samtools depth -a input.bam > depth_with_zeros.txt

Maximum Depth Cap

samtools depth -d 0 input.bam  # No cap (default 8000)

Depth from BED Regions

samtools depth -b regions.bed input.bam

Calculate Mean Depth

samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'

samtools coverage

Per-chromosome or per-region coverage statistics (faster than depth).

samtools coverage input.bam

Output columns:

  • #rname
    - Reference name
  • startpos
    - Start position
  • endpos
    - End position
  • numreads
    - Number of reads
  • covbases
    - Bases with coverage
  • coverage
    - Percentage of bases covered
  • meandepth
    - Mean depth
  • meanbaseq
    - Mean base quality
  • meanmapq
    - Mean mapping quality

Coverage for Specific Region

samtools coverage -r chr1:1000000-2000000 input.bam

Coverage from BED

samtools coverage -b regions.bed input.bam

Histogram Output

samtools coverage -m input.bam

pysam Python Alternative

Count Reads

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    total = mapped = paired = proper = 0
    for read in bam:
        total += 1
        if not read.is_unmapped:
            mapped += 1
        if read.is_paired:
            paired += 1
        if read.is_proper_pair:
            proper += 1

    print(f'Total: {total}')
    print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)')
    print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')

Per-Chromosome Counts

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')

Calculate Depth at Position

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for pileup in bam.pileup('chr1', 1000000, 1000001):
        print(f'Position {pileup.pos}: depth {pileup.n}')

Mean Depth in Region

import pysam

def mean_depth(bam_path, chrom, start, end):
    depths = []
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup in bam.pileup(chrom, start, end, truncate=True):
            depths.append(pileup.n)

    if depths:
        return sum(depths) / len(depths)
    return 0

depth = mean_depth('input.bam', 'chr1', 1000000, 2000000)
print(f'Mean depth: {depth:.1f}x')

Coverage Statistics

Goal: Compute coverage breadth and depth for a genomic region from a BAM file.

Approach: Iterate pileup columns in the region, count covered positions and accumulate depth, then derive percentages and means.

Reference (pysam 0.22+):

import pysam

def coverage_stats(bam_path, chrom, start, end):
    covered = 0
    total_depth = 0

    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup in bam.pileup(chrom, start, end, truncate=True):
            covered += 1
            total_depth += pileup.n

    length = end - start
    pct_covered = covered / length * 100
    mean_depth = total_depth / length if length > 0 else 0

    return {
        'length': length,
        'covered_bases': covered,
        'pct_covered': pct_covered,
        'mean_depth': mean_depth
    }

stats = coverage_stats('input.bam', 'chr1', 1000000, 2000000)
print(f'Coverage: {stats["pct_covered"]:.1f}%')
print(f'Mean depth: {stats["mean_depth"]:.1f}x')

Insert Size Distribution

Goal: Compute the insert size distribution to assess library preparation quality.

Approach: Iterate properly paired read1 records, accumulate template lengths into a Counter, then compute summary statistics.

Reference (pysam 0.22+):

import pysam
from collections import Counter

insert_sizes = Counter()

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam:
        if read.is_proper_pair and read.is_read1 and read.template_length > 0:
            insert_sizes[read.template_length] += 1

sizes = list(insert_sizes.keys())
mean_insert = sum(s * c for s, c in insert_sizes.items()) / sum(insert_sizes.values())
print(f'Mean insert size: {mean_insert:.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')

Quick Reference

TaskCommand
Quick counts
samtools flagstat input.bam
Per-chrom counts
samtools idxstats input.bam
Full stats
samtools stats input.bam
Coverage summary
samtools coverage input.bam
Per-position depth
samtools depth input.bam
Mean depth
samtools depth input.bam | awk '{sum+=$3;n++}END{print sum/n}'

Common Metrics

MetricGoodConcerning
Mapping rate>95%<80%
Proper pair rate>90%<70%
Duplicate rate<20%>40%
Error rate<1%>2%
Coverage uniformity<2x CV>3x CV

Related Skills

  • sam-bam-basics - View alignment files
  • alignment-indexing - idxstats requires index
  • duplicate-handling - Check duplicate rates
  • alignment-filtering - Filter before stats
  • sequence-io/sequence-statistics - FASTA/FASTQ statistics