install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-alignment-indexing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-alignment-indexing && rm -rf "$T"
manifest:
skills/bio-alignment-indexing/SKILL.md
source content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-alignment-indexing
description: Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Alignment Indexing
Create indices for random access to alignment files using samtools and pysam.
Index Types
| Index | Extension | Use Case |
|---|---|---|
| BAI | `.bai` | Standard BAM index, chromosomes < 512 Mbp |
| CSI | `.csi` | Large chromosomes, custom bin sizes |
| CRAI | `.crai` | CRAM index |
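The 512 Mbp limit in the table follows from BAI's fixed binning scheme, which addresses coordinates below 2^29 bp. A minimal sketch of choosing an index type, assuming contig lengths are already known (for example from the `@SQ` header lines); `needs_csi` and `big_contig` are hypothetical names used for illustration, not part of samtools or pysam:

```python
# BAI can only address coordinates below 2**29 (536,870,912 bp, i.e. 512 Mbp).
BAI_MAX_CONTIG_LENGTH = 2 ** 29


def needs_csi(contig_lengths):
    """Return True if any contig is too long for a BAI index."""
    return any(length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths.values())


# A 248,956,422 bp chr1 fits in BAI; a hypothetical 600 Mbp contig does not.
print(needs_csi({"chr1": 248956422}))
print(needs_csi({"chr1": 248956422, "big_contig": 600_000_000}))
```

When the check returns True, `samtools index -c` is the option to reach for.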
samtools index
Create BAI Index
samtools index input.bam # Creates input.bam.bai
Create CSI Index
samtools index -c input.bam # Creates input.bam.csi
Specify Output Name
samtools index input.bam output.bai
Multi-threaded Indexing
samtools index -@ 4 input.bam
Index CRAM
samtools index input.cram # Creates input.cram.crai
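The options above can be combined in one call. The sketch below only assembles the argument list and does not run samtools; `build_index_command` is a hypothetical helper, and it uses only the `-c` and `-@` flags shown in this section:

```python
import shlex


def build_index_command(path, csi=False, threads=None):
    """Assemble a `samtools index` argument list from the options shown above."""
    cmd = ["samtools", "index"]
    if csi:
        cmd.append("-c")             # request a CSI index instead of BAI
    if threads:
        cmd += ["-@", str(threads)]  # thread count, as in `-@ 4`
    cmd.append(path)
    return cmd


cmd = build_index_command("input.bam", csi=True, threads=4)
print(shlex.join(cmd))
```

To execute the command, the list can be passed to `subprocess.run(cmd, check=True)`, assuming samtools is on the PATH.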
Index Requirements
Indexing requires coordinate-sorted files:
# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate

# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam
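The same sort-order check can be scripted. This is a minimal sketch that assumes the header text has already been captured (for example from `samtools view -H`); `sort_order` is a hypothetical helper and the sample header is made up for illustration:

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Made-up header text for illustration.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
print(sort_order(header) == "coordinate")
```

A return value other than `coordinate` (including `None`) means the file should be sorted before indexing.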
Using Indices for Region Access
samtools view with Region
# Requires index file present
samtools view input.bam chr1:1000000-2000000
Multiple Regions
samtools view input.bam chr1:1000-2000 chr2:3000-4000
Regions from BED File
samtools view -L regions.bed input.bam
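Region strings on the samtools command line are 1-based and inclusive, whereas BED files are 0-based and half-open, so the start coordinate shifts by one when moving regions into a BED file. A small sketch of that conversion; `region_to_bed_line` is a hypothetical helper:

```python
def region_to_bed_line(contig, start, end):
    """Convert a 1-based inclusive region to a tab-separated BED line."""
    # BED start is 0-based; BED end is exclusive, so it equals the inclusive 1-based end.
    return f"{contig}\t{start - 1}\t{end}"


# The two regions from the "Multiple Regions" example above.
regions = [("chr1", 1000, 2000), ("chr2", 3000, 4000)]
bed_text = "\n".join(region_to_bed_line(*r) for r in regions) + "\n"
print(bed_text, end="")
```

Writing `bed_text` to `regions.bed` gives a file usable with `samtools view -L`.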
pysam Python Alternative
Create Index
import pysam
pysam.index('input.bam')  # Creates input.bam.bai
Create CSI Index
# pysam.index() takes samtools command-line arguments as strings
pysam.index('-c', 'input.bam')  # Creates input.bam.csi
Fetch with Index
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)
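Note that `fetch()` takes 0-based, half-open coordinates, while samtools region strings are 1-based and inclusive. A minimal sketch of translating one into the other; `region_to_fetch_args` is a hypothetical helper, and it assumes contig names contain no colon:

```python
import re

# contig:start-end, with optional thousands separators in the numbers.
REGION_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_fetch_args(region):
    """Turn 'chr1:1000-2000' (1-based, inclusive) into (contig, start, stop) for fetch()."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"Unrecognized region string: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    # Shift the start to 0-based; the inclusive end equals the exclusive stop.
    return match["contig"], start - 1, end


print(region_to_fetch_args("chr1:1000000-2000000"))
```

The resulting tuple can be unpacked directly, as in `bam.fetch(*region_to_fetch_args(region))`.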
Check if Indexed
import pysam
from pathlib import Path

def is_indexed(bam_path):
    bam_path = Path(bam_path)
    return (Path(str(bam_path) + '.bai').exists()      # input.bam.bai
            or bam_path.with_suffix('.bai').exists()   # input.bai
            or Path(str(bam_path) + '.csi').exists())  # input.bam.csi

if not is_indexed('input.bam'):
    pysam.index('input.bam')
Fetch Multiple Regions
regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')
Count Reads in Region
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')
Get Reads Covering Position
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')
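The coverage test above relies on `reference_start` being 0-based and `reference_end` being exclusive. A small sketch of the same predicate on plain integers, using made-up alignment spans rather than real reads; `covers` is a hypothetical helper:

```python
def covers(read_start, read_end, pos):
    """True if the 0-based half-open interval [read_start, read_end) contains pos."""
    return read_start <= pos < read_end


# Made-up (reference_start, reference_end) pairs for illustration.
spans = [(999_950, 1_000_000), (999_990, 1_000_040), (1_000_000, 1_000_050)]
for start, end in spans:
    print(start, end, covers(start, end, 1_000_000))
```

The first span ends exactly at the position and therefore does not cover it, which is the edge case the exclusive end is meant to handle.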
Index File Locations
samtools looks for indices in two locations:
input.bam.bai  # Standard location
input.bai      # Alternative location
For CRAM:
input.cram.crai
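The lookup order can be mirrored in a script that needs to report which index file is present. This is a sketch under the naming conventions listed in this document (including the `.csi` name produced by `samtools index -c`); `find_index` is a hypothetical helper, and the demonstration uses empty placeholder files rather than real alignments:

```python
import tempfile
from pathlib import Path


def find_index(alignment_path):
    """Return the first existing index file for a BAM/CRAM path, or None."""
    p = Path(alignment_path)
    if p.suffix == ".cram":
        candidates = [Path(str(p) + ".crai")]
    else:
        candidates = [
            Path(str(p) + ".bai"),   # input.bam.bai
            p.with_suffix(".bai"),   # input.bai
            Path(str(p) + ".csi"),   # input.bam.csi
        ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


# Demonstrate with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bam = Path(tmp) / "input.bam"
    bam.touch()
    print(find_index(bam))            # no index file exists yet
    (Path(tmp) / "input.bai").touch()
    print(find_index(bam).name)       # the alternative location is found
```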
idxstats - Index Statistics
Get Per-Chromosome Counts
samtools idxstats input.bam
Output format:
chr1	248956422	5000000	0
chr2	242193529	4500000	0
*	0	0	10000
Columns: reference name, length, mapped reads, unmapped reads
Sum Total Mapped Reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
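For downstream use in Python, the four tab-separated columns can be parsed directly. A minimal sketch, assuming the idxstats output has been captured as text; `parse_idxstats` is a hypothetical helper and the sample reuses the example output shown above:

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into (name, length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


sample = (
    "chr1\t248956422\t5000000\t0\n"
    "chr2\t242193529\t4500000\t0\n"
    "*\t0\t0\t10000\n"
)
rows = parse_idxstats(sample)
total_mapped = sum(row[2] for row in rows)
total_unmapped = sum(row[3] for row in rows)
print(total_mapped, total_unmapped)
```

The `*` row holds reads with no reference assigned, so it contributes only to the unmapped total.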
pysam idxstats
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
FASTA Index (faidx)
Related but different - index reference FASTA for random access:
samtools faidx reference.fa  # Creates reference.fa.fai

# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000
pysam FastaFile
with pysam.FastaFile('reference.fa') as ref:
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)
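Random access into a FASTA works because each `.fai` record stores the byte offset of the sequence's first base plus the number of bases and bytes per line. The sketch below shows that arithmetic on a tiny in-memory FASTA; `fai_byte_offset` is a hypothetical helper and the sequence is made up for illustration:

```python
def fai_byte_offset(offset, linebases, linewidth, pos):
    """Byte offset in the FASTA file of 0-based position `pos` on a contig.

    offset, linebases and linewidth are the third, fourth and fifth columns
    of the contig's .fai record.
    """
    return offset + (pos // linebases) * linewidth + pos % linebases


# Made-up FASTA: 4 bases per line, 5 bytes per line including the newline.
fasta = ">chr1\nACGT\nTGCA\nGG\n"
offset = len(">chr1\n")  # the first base starts right after the header line

for pos in (0, 4, 9):
    print(pos, fasta[fai_byte_offset(offset, 4, 5, pos)])
```

Each lookup is a constant-time seek, which is why fetching a region does not require reading the whole file.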
Quick Reference
| Task | samtools | pysam |
|---|---|---|
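| Create BAI | `samtools index input.bam` | `pysam.index('input.bam')` |
| Create CSI | `samtools index -c input.bam` | `pysam.index('-c', 'input.bam')` |
| Fetch region | `samtools view input.bam chr1:1000-2000` | `bam.fetch('chr1', 1000, 2000)` |
| Count in region | `samtools view -c input.bam chr1:1000-2000` | `bam.count('chr1', 1000, 2000)` |
| Index stats | `samtools idxstats input.bam` | `bam.get_index_statistics()` |
| Index FASTA | `samtools faidx reference.fa` | Automatic with FastaFile |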
Common Errors
| Error | Cause | Solution |
|---|---|---|
| | Missing index | Run `samtools index input.bam` |
| | Unsorted BAM | Sort first with `samtools sort` |
| | Wrong chromosome name | Check names with `samtools idxstats input.bam` |
Related Skills
- sam-bam-basics - View and convert alignment files
- alignment-sorting - Sort BAM files (required before indexing)
- alignment-filtering - Filter by regions using index
- bam-statistics - Use idxstats for quick counts
- sequence-io/read-sequences - Index FASTA with SeqIO.index_db()