OpenClaw-Medical-Skills bio-alignment-indexing

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"
manifest: skills/bio-alignment-indexing/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-alignment-indexing description: Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

Alignment Indexing

Create indices for random access to alignment files using samtools and pysam.

Index Types

IndexExtensionUse Case
BAI
.bai
Standard BAM index, chromosomes < 512 Mbp
CSI
.csi
Large chromosomes, custom bin sizes
CRAI
.crai
CRAM index

samtools index

Create BAI Index

samtools index input.bam
# Creates input.bam.bai

Create CSI Index

samtools index -c input.bam
# Creates input.bam.csi

Specify Output Name

samtools index input.bam output.bai

Multi-threaded Indexing

samtools index -@ 4 input.bam

Index CRAM

samtools index input.cram
# Creates input.cram.crai

Index Requirements

Indexing requires coordinate-sorted files:

# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate

# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam

Using Indices for Region Access

samtools view with Region

# Requires index file present
samtools view input.bam chr1:1000000-2000000

Multiple Regions

samtools view input.bam chr1:1000-2000 chr2:3000-4000

Regions from BED File

samtools view -L regions.bed input.bam

pysam Python Alternative

Create Index

import pysam

pysam.index('input.bam')
# Creates input.bam.bai

Create CSI Index

pysam.index('input.bam', 'input.bam.csi', csi=True)

Fetch with Index

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)

Check if Indexed

import pysam
from pathlib import Path

def is_indexed(bam_path):
    bam_path = Path(bam_path)
    return (bam_path.with_suffix('.bam.bai').exists() or
            Path(str(bam_path) + '.bai').exists() or
            bam_path.with_suffix('.bam.csi').exists())

if not is_indexed('input.bam'):
    pysam.index('input.bam')

Fetch Multiple Regions

regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')

Count Reads in Region

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')

Get Reads Covering Position

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')

Index File Locations

samtools looks for indices in two locations:

input.bam.bai   # Standard location
input.bai       # Alternative location

For CRAM:

input.cram.crai

idxstats - Index Statistics

Get Per-Chromosome Counts

samtools idxstats input.bam

Output format:

chr1    248956422    5000000    0
chr2    242193529    4500000    0
*       0            0          10000

Columns: reference name, length, mapped reads, unmapped reads

Sum Total Mapped Reads

samtools idxstats input.bam | awk '{sum += $3} END {print sum}'

pysam idxstats

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')

FASTA Index (faidx)

Related but different - index reference FASTA for random access:

samtools faidx reference.fa
# Creates reference.fa.fai

# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000

pysam FastaFile

with pysam.FastaFile('reference.fa') as ref:
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)

Quick Reference

Tasksamtoolspysam
Create BAI
samtools index file.bam
pysam.index('file.bam')
Create CSI
samtools index -c file.bam
pysam.index('file.bam', csi=True)
Fetch region
samtools view file.bam chr1:1-1000
bam.fetch('chr1', 0, 1000)
Count in region
samtools view -c file.bam chr1:1-1000
bam.count('chr1', 0, 1000)
Index stats
samtools idxstats file.bam
bam.get_index_statistics()
Index FASTA
samtools faidx ref.fa
Automatic with FastaFile

Common Errors

ErrorCauseSolution
random alignment retrieval only works for indexed BAM
Missing indexRun
samtools index file.bam
file is not sorted
Unsorted BAMSort first with
samtools sort
chromosome not found
Wrong chromosome nameCheck names with
samtools view -H

Related Skills

  • sam-bam-basics - View and convert alignment files
  • alignment-sorting - Sort BAM files (required before indexing)
  • alignment-filtering - Filter by regions using index
  • bam-statistics - Use idxstats for quick counts
  • sequence-io/read-sequences - Index FASTA with SeqIO.index_db()
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->