LLMs-Universal-Life-Science-and-Clinical-Skills- genomics-epigenomics
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/genomics-epigenomics" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-genomics-epigenomi && rm -rf "$T"
manifest:
Skills/Genomics/genomics-epigenomics/SKILL.mdsource content
🧬 Epigenomics Analysis
Peak calling and chromatin accessibility analysis for ATAC-seq and ChIP-seq data.
Core Capabilities
- MACS3 peak calling: ATAC-specific and ChIP-seq peak detection
- Motif analysis: Homer, MEME motif enrichment in open chromatin regions
- Quality control: Fragment size distribution, TSS enrichment, FRiP scores
- Differential accessibility: Condition comparison of chromatin regions
- IDR replicate analysis: Irreproducible Discovery Rate for peak reproducibility
CLI Reference
python omicsclaw.py run epigenomics --demo python omicsclaw.py run epigenomics --input <data.bam> --output <dir>
Algorithm / Methodology
MACS3 for ATAC-seq
Goal: Identify open chromatin regions from ATAC-seq data using ATAC-specific parameters.
Approach: Run MACS3 in paired-end mode with Tn5 shift correction, no model building, and duplicate retention.
macs3 callpeak \ -t sample.bam \ -f BAMPE \ -g hs \ -n sample \ --outdir peaks/ \ -q 0.05 \ --nomodel \ --shift -75 \ --extsize 150 \ --keep-dup all \ -B \ --call-summits
Why These Parameters?
| Parameter | Reason |
|---|---|
| ATAC doesn't have control, can't build model |
| Centers on Tn5 insertion site |
| Smooths signal around cut sites |
| Tn5 creates duplicate cuts at accessible sites |
| Uses actual fragment size from paired-end |
Call Peaks on NFR Only
# Filter to nucleosome-free reads (<100bp fragments) samtools view -h sample.bam | \ awk 'substr($0,1,1)=="@" || ($9>0 && $9<100) || ($9<0 && $9>-100)' | \ samtools view -b > nfr.bam # Call peaks on NFR macs3 callpeak -t nfr.bam -f BAMPE -g hs -n sample_nfr \ --nomodel --shift -37 --extsize 75 --keep-dup all -q 0.01
Broad Peaks (Optional)
macs3 callpeak -t sample.bam -f BAMPE -g hs -n sample_broad \ --nomodel --shift -75 --extsize 150 --broad --broad-cutoff 0.1
Batch Processing
#!/bin/bash GENOME=hs # hs for human, mm for mouse OUTDIR=peaks mkdir -p $OUTDIR for bam in *.bam; do sample=$(basename $bam .bam) macs3 callpeak -t $bam -f BAMPE -g $GENOME -n $sample --outdir $OUTDIR \ --nomodel --shift -75 --extsize 150 --keep-dup all -q 0.05 -B --call-summits done
IDR for Replicate Consistency
# Call peaks on each replicate separately macs3 callpeak -t rep1.bam -f BAMPE -g hs -n rep1 ... macs3 callpeak -t rep2.bam -f BAMPE -g hs -n rep2 ... # Run IDR idr --samples rep1_peaks.narrowPeak rep2_peaks.narrowPeak \ --input-file-type narrowPeak --output-file idr_peaks.txt --plot # Filter by IDR threshold awk '$5 >= 540' idr_peaks.txt > reproducible_peaks.bed
Convert to BigWig
sort -k1,1 -k2,2n sample_treat_pileup.bdg > sample.sorted.bdg bedGraphToBigWig sample.sorted.bdg chrom.sizes sample.bw
Output Files
| File | Description |
|---|---|
| Peak locations (BED-like) |
| Peak summit positions |
| Peak statistics |
| Signal track (bedGraph) |
narrowPeak Format
Columns: chrom, start, end, name, score, strand, signalValue, pValue, qValue, summit_offset
Parameters
| Parameter | Default | Description |
|---|---|---|
| | macs3, homer, genrich |
| | atac, chipseq |
| | hs (human), mm (mouse) |
| | FDR threshold |
Why This Exists
- Without it: Open chromatin peaks are modeled with poor shifting logic or non-reproducible parameters
- With it: ATAC-specific logic correctly infers true transposase insertions vs nucleosomes
- Why OmicsClaw: Exposes high-level biology queries directly mapped to strictly modeled peak calling routines
Workflow
- Calculate: Prepare cross-correlation and fragment histograms.
- Execute: Test local enrichment over poisson distribution null.
- Assess: Perform IDR replicates and FDR significance tests.
- Generate: Output structured summit maps and bedGraph.
- Report: Tabulate core peaks and TF motifs.
Example Queries
- "Run ATAC peak calling with MACS3 on this bam file"
- "Perform motif analysis on epigenomics data BED"
Output Structure
output_directory/ ├── report.md ├── result.json ├── peaks.narrowPeak ├── figures/ │ └── footprint_plot.png ├── tables/ │ └── motif_enrichment.csv └── reproducibility/ ├── commands.sh ├── environment.yml └── checksums.sha256
Safety
- Local-first: Strict offline processing without external upload.
- Disclaimer: Requires OmicsClaw reporting structures and disclaimers.
- Audit trail: Hyperparameters and operational flow states are logged fully.
Integration with Orchestrator
Trigger conditions:
- Automatically invoked dynamically based on tool metadata and user intent matching.
Chaining partners:
— Upstream generation of BAMalign
— Upstream filteringgenomics-qc
Version Compatibility
Reference examples tested with: MACS3 3.0+, samtools 1.19+
Dependencies
Required: MACS3 (macs3), samtools Optional: Homer, Genrich, IDR, bedtools, pyGenomeTracks
Citations
- MACS3 — Zhang et al., Genome Biology 2008
- Homer — Heinz et al., Molecular Cell 2010
- IDR — Li et al., Annals of Applied Statistics 2011
- ATAC-seq — Buenrostro et al., Nature Methods 2013
Related Skills
— QC before peak callinggenomics-qc
— Read alignment upstreamalign