install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-pileup-generation" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-pileup-generation && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-pileup-generation" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-pileup-generation && rm -rf "$T"
manifest:
skills/bio-pileup-generation/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-pileup-generation description: Generate pileup data for variant calling using samtools mpileup and pysam. Use when preparing data for variant calling, analyzing per-position read data, or calculating allele frequencies. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Pileup Generation
Generate pileup data for variant calling and position-level analysis.
What is Pileup?
Pileup shows all reads covering each position in the reference, used for:
- Variant calling (with bcftools)
- Coverage analysis
- Allele frequency calculation
- SNP/indel detection
samtools mpileup
Basic Pileup
samtools mpileup -f reference.fa input.bam > pileup.txt
Pileup for Variant Calling (Output BCF)
samtools mpileup -f reference.fa -g input.bam -o output.bcf
Pileup Specific Region
samtools mpileup -f reference.fa -r chr1:1000000-2000000 input.bam
Regions from BED
samtools mpileup -f reference.fa -l targets.bed input.bam
Multiple BAM Files
samtools mpileup -f reference.fa sample1.bam sample2.bam sample3.bam > pileup.txt
Output Format
Text pileup format (6 columns per sample):
chr1 1000 A 15 ............... FFFFFFFFFFF chr1 1001 T 12 ............ FFFFFFFFFFFF
| Column | Description |
|---|---|
| 1 | Chromosome |
| 2 | Position (1-based) |
| 3 | Reference base |
| 4 | Read depth |
| 5 | Read bases |
| 6 | Base qualities |
Read Bases Encoding
| Symbol | Meaning |
|---|---|
| Match on forward strand |
| Match on reverse strand |
| Mismatch (uppercase = forward) |
| Mismatch (lowercase = reverse) |
| Start of read (Q = MAPQ as ASCII) |
| End of read |
| Insertion of N bases |
| Deletion of N bases |
| Deleted base |
/ | Reference skip (intron) |
Quality Filtering Options
Minimum Mapping Quality
samtools mpileup -f reference.fa -q 20 input.bam
Minimum Base Quality
samtools mpileup -f reference.fa -Q 20 input.bam
Combined Quality Filters
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam
Maximum Depth
# Prevent memory issues with high coverage samtools mpileup -f reference.fa -d 1000 input.bam
Variant Calling Pipeline
mpileup to bcftools call
samtools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf
Direct BCF Output
samtools mpileup -f reference.fa -g -o output.bcf input.bam bcftools call -mv output.bcf -o variants.vcf
Full Pipeline
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam | \ bcftools call -mv -Oz -o variants.vcf.gz bcftools index variants.vcf.gz
pysam Python Alternative
Basic Pileup
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: for pileup_column in bam.pileup('chr1', 1000000, 1001000): print(f'{pileup_column.reference_name}:{pileup_column.pos} depth={pileup_column.n}')
Access Reads at Position
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: for pileup_column in bam.pileup('chr1', 1000000, 1000001, truncate=True): print(f'Position: {pileup_column.pos}') print(f'Depth: {pileup_column.n}') for pileup_read in pileup_column.pileups: if pileup_read.is_del: print(' Deletion') elif pileup_read.is_refskip: print(' Reference skip') else: qpos = pileup_read.query_position base = pileup_read.alignment.query_sequence[qpos] qual = pileup_read.alignment.query_qualities[qpos] print(f' {base} (Q{qual})')
Count Alleles at Position
import pysam from collections import Counter def allele_counts(bam_path, chrom, pos): counts = Counter() with pysam.AlignmentFile(bam_path, 'rb') as bam: for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True): if pileup_column.pos != pos: continue for pileup_read in pileup_column.pileups: if pileup_read.is_del: counts['DEL'] += 1 elif pileup_read.is_refskip: continue else: qpos = pileup_read.query_position base = pileup_read.alignment.query_sequence[qpos] counts[base.upper()] += 1 return dict(counts) counts = allele_counts('input.bam', 'chr1', 1000000) print(counts) # {'A': 45, 'G': 5}
Calculate Allele Frequency
import pysam from collections import Counter def allele_frequency(bam_path, chrom, pos, min_qual=20): counts = Counter() with pysam.AlignmentFile(bam_path, 'rb') as bam: for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True, min_base_quality=min_qual): if pileup_column.pos != pos: continue for pileup_read in pileup_column.pileups: if pileup_read.is_del or pileup_read.is_refskip: continue qpos = pileup_read.query_position base = pileup_read.alignment.query_sequence[qpos] counts[base.upper()] += 1 total = sum(counts.values()) if total == 0: return {} return {base: count / total for base, count in counts.items()} freq = allele_frequency('input.bam', 'chr1', 1000000) for base, f in sorted(freq.items(), key=lambda x: -x[1]): print(f'{base}: {f:.1%}')
Pileup with Quality Filtering
import pysam with pysam.AlignmentFile('input.bam', 'rb') as bam: for pileup_column in bam.pileup('chr1', 1000000, 1001000, truncate=True, min_mapping_quality=20, min_base_quality=20): print(f'{pileup_column.pos}: {pileup_column.n}')
Generate Pileup Text
import pysam def pileup_text(bam_path, ref_path, chrom, start, end): with pysam.AlignmentFile(bam_path, 'rb') as bam: with pysam.FastaFile(ref_path) as ref: for pileup_column in bam.pileup(chrom, start, end, truncate=True): pos = pileup_column.pos ref_base = ref.fetch(chrom, pos, pos + 1) depth = pileup_column.n bases = [] for pileup_read in pileup_column.pileups: if pileup_read.is_del: bases.append('*') elif pileup_read.is_refskip: bases.append('>') else: qpos = pileup_read.query_position base = pileup_read.alignment.query_sequence[qpos] if base.upper() == ref_base.upper(): bases.append('.' if not pileup_read.alignment.is_reverse else ',') else: bases.append(base.upper() if not pileup_read.alignment.is_reverse else base.lower()) print(f'{chrom}\t{pos+1}\t{ref_base}\t{depth}\t{"".join(bases)}') pileup_text('input.bam', 'reference.fa', 'chr1', 1000000, 1000100)
Pileup Options Summary
| Option | Description |
|---|---|
| Reference FASTA (required) |
| Restrict to region |
| BED file of regions |
| Min mapping quality |
| Min base quality |
| Max depth (default 8000) |
| Output BCF format |
| Uncompressed BCF output |
Quick Reference
| Task | Command |
|---|---|
| Basic pileup | |
| Quality filter | |
| Region | |
| BCF output | |
| To bcftools | |
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Missing -f option | Add |
| Wrong reference | Use same reference as alignment |
| Out of memory | High coverage region | Use to cap depth |
Related Skills
- alignment-filtering - Filter BAM before pileup
- reference-operations - Index reference for pileup
- bam-statistics - depth command for coverage
- variant-calling - bcftools call from pileup