OpenClaw-Medical-Skills bio-alignment-indexing

<!--

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"

manifest: skills/bio-alignment-indexing/SKILL.md

Alignment Indexing

Create indices for random access to alignment files using samtools and pysam.

Index Types

Index	Extension	Use Case
BAI	`.bai`	Standard BAM index, chromosomes < 512 Mbp
CSI	`.csi`	Large chromosomes, custom bin sizes
CRAI	`.crai`	CRAM index

samtools index

Create BAI Index

samtools index input.bam
# Creates input.bam.bai

Create CSI Index

samtools index -c input.bam
# Creates input.bam.csi

Specify Output Name

samtools index input.bam output.bai

Multi-threaded Indexing

samtools index -@ 4 input.bam

Index CRAM

samtools index input.cram
# Creates input.cram.crai

Index Requirements

Indexing requires coordinate-sorted files:

# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate

# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam

Using Indices for Region Access

samtools view with Region

# Requires index file present
samtools view input.bam chr1:1000000-2000000

Multiple Regions

samtools view input.bam chr1:1000-2000 chr2:3000-4000

Regions from BED File

samtools view -L regions.bed input.bam

pysam Python Alternative

Create Index

import pysam

pysam.index('input.bam')
# Creates input.bam.bai

Create CSI Index

pysam.index('input.bam', 'input.bam.csi', csi=True)

Fetch with Index

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)

Check if Indexed

import pysam
from pathlib import Path

def is_indexed(bam_path):
    bam_path = Path(bam_path)
    return (bam_path.with_suffix('.bam.bai').exists() or
            Path(str(bam_path) + '.bai').exists() or
            bam_path.with_suffix('.bam.csi').exists())

if not is_indexed('input.bam'):
    pysam.index('input.bam')

Fetch Multiple Regions

regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')

Count Reads in Region

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')

Get Reads Covering Position

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')

Index File Locations

samtools looks for indices in two locations:

input.bam.bai   # Standard location
input.bai       # Alternative location

For CRAM:

input.cram.crai

idxstats - Index Statistics

Get Per-Chromosome Counts

samtools idxstats input.bam

Output format:

chr1    248956422    5000000    0
chr2    242193529    4500000    0
*       0            0          10000

Columns: reference name, length, mapped reads, unmapped reads

Sum Total Mapped Reads

samtools idxstats input.bam | awk '{sum += $3} END {print sum}'

pysam idxstats

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')

FASTA Index (faidx)

Related but different - index reference FASTA for random access:

samtools faidx reference.fa
# Creates reference.fa.fai

# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000

pysam FastaFile

with pysam.FastaFile('reference.fa') as ref:
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)

Quick Reference

Task	samtools	pysam
Create BAI	`samtools index file.bam`	`pysam.index('file.bam')`
Create CSI	`samtools index -c file.bam`	`pysam.index('file.bam', csi=True)`
Fetch region	`samtools view file.bam chr1:1-1000`	`bam.fetch('chr1', 0, 1000)`
Count in region	`samtools view -c file.bam chr1:1-1000`	`bam.count('chr1', 0, 1000)`
Index stats	`samtools idxstats file.bam`	`bam.get_index_statistics()`
Index FASTA	`samtools faidx ref.fa`	Automatic with FastaFile

Common Errors

Error	Cause	Solution
`random alignment retrieval only works for indexed BAM`	Missing index	Run `samtools index file.bam`
`file is not sorted`	Unsorted BAM	Sort first with `samtools sort`
`chromosome not found`	Wrong chromosome name	Check names with `samtools view -H`

Related Skills

sam-bam-basics - View and convert alignment files
alignment-sorting - Sort BAM files (required before indexing)
alignment-filtering - Filter by regions using index
bam-statistics - Use idxstats for quick counts
sequence-io/read-sequences - Index FASTA with SeqIO.index_db()