BioSkills bio-clip-seq-clip-peak-calling
Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/clip-seq/clip-peak-calling" ~/.claude/skills/gptomics-bioskills-bio-clip-seq-clip-peak-calling && rm -rf "$T"
clip-seq/clip-peak-calling/SKILL.mdVersion Compatibility
Reference examples tested with: MACS3 3.0+, bedtools 2.31+, pandas 2.2+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
thenpip show <package>
to check signatureshelp(module.function) - CLI:
then<tool> --version
to confirm flags<tool> --help
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
CLIP-seq Peak Calling
"Call binding sites from my CLIP-seq data" → Identify protein-RNA interaction sites from aligned CLIP reads using specialized peak callers that model crosslink-induced signatures.
- CLI:
(CLIPper)clipper -b aligned.bam -s hg38 -o peaks.bed - CLI:
(PureCLIP)pureclip -i aligned.bam -bai aligned.bam.bai -g genome.fa -o sites.bed
CLIPper (Recommended)
Goal: Identify protein-RNA binding sites from aligned CLIP-seq reads.
Approach: Run CLIPper on deduplicated BAM with species-specific gene annotations to call peaks with FDR control and superlocal background correction.
# Basic peak calling clipper \ -b deduped.bam \ -s hg38 \ -o peaks.bed \ --save-pickle # With FDR threshold clipper \ -b deduped.bam \ -s hg38 \ -o peaks.bed \ --FDR 0.05 \ --superlocal # Specify gene annotations clipper \ -b deduped.bam \ -s hg38 \ --gene genes.bed \ -o peaks.bed
CLIPper Options
| Option | Description |
|---|---|
| -b | Input BAM file |
| -s | Species (hg38, mm10) |
| -o | Output BED file |
| --FDR | FDR threshold (default 0.05) |
| --superlocal | Use superlocal background |
| --gene | Custom gene annotation BED |
| --save-pickle | Save intermediate data |
PureCLIP (HMM-Based)
Goal: Identify single-nucleotide crosslink sites using a hidden Markov model that jointly models enrichment and truncation signals.
Approach: Run PureCLIP with aligned BAM and genome FASTA to produce both single-nucleotide crosslink sites and broader binding regions, optionally using an input control BAM.
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation conda install -c bioconda pureclip # Basic peak calling pureclip \ -i deduped.bam \ -bai deduped.bam.bai \ -g genome.fa \ -o crosslink_sites.bed \ -or binding_regions.bed \ -nt 4 # -nt 4: Number of threads. Adjust based on CPU cores. # -o: Single-nucleotide crosslink sites # -or: Broader binding regions
PureCLIP Options
| Option | Description |
|---|---|
| -i | Input BAM file |
| -bai | BAM index file |
| -g | Reference genome FASTA |
| -o | Crosslink sites output |
| -or | Binding regions output |
| -nt | Number of threads |
| -iv | Interval file to restrict analysis |
| -dm | Min distance for merging |
PureCLIP with Input Control
# With SMInput control BAM pureclip \ -i clip.bam \ -bai clip.bam.bai \ -g genome.fa \ -ibam sminput.bam \ -ibai sminput.bam.bai \ -o crosslinks.bed \ -or regions.bed \ -nt 8 # -ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
# Crosslink sites BED contains: # chr start end name score strand # Score interpretation: # Higher scores = more confident crosslink sites # Filter by score # score>=3: Medium confidence. Use 5+ for high confidence. awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
# eCLIP (recommended settings) pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \ -o sites.bed -or regions.bed -nt 4 -dm 8 # iCLIP (single-nucleotide resolution) pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \ -o sites.bed -or regions.bed -nt 4 # PAR-CLIP (T-to-C transitions) pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \ -o sites.bed -or regions.bed -nt 4
Piranha
# Basic usage Piranha -s deduped.bam -o peaks.bed # With p-value threshold Piranha -s deduped.bam -o peaks.bed -p 0.01 # Stranded analysis Piranha -s deduped.bam -o peaks.bed -p 0.01 -u # Zero-truncated negative binomial Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
# PAR-CLIP specific caller peakachu adaptive \ -c control.bam \ -t treatment.bam \ -r reference.fa \ -o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
# Use narrow peak calling mode macs3 callpeak \ -t deduped.bam \ -f BAM \ -g hs \ -n clip_peaks \ --nomodel \ --extsize 50 \ -q 0.01
Strand-Specific Peak Calling
Goal: Call CLIP-seq peaks separately on each strand for strand-specific binding analysis.
Approach: Split the BAM into plus and minus strand reads using samtools flag filtering, call peaks on each strand independently, and merge.
# Split BAM by strand samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam # Call peaks on each strand clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed # Combine cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
# By score awk '$5 >= 10' peaks.bed > peaks_filtered.bed # By size awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed # By read count (if in name field) awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
# Use bedtools to find consensus peaks bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \ sort -u > consensus_peaks.bed # Require overlap in N replicates bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \ awk '$4 >= 2' | \ bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd def load_clip_peaks(bed_path): peaks = pd.read_csv(bed_path, sep='\t', header=None, names=['chrom', 'start', 'end', 'name', 'score', 'strand']) return peaks def peak_stats(peaks): stats = { 'n_peaks': len(peaks), 'mean_width': (peaks['end'] - peaks['start']).mean(), 'median_score': peaks['score'].median(), 'peaks_per_chrom': peaks.groupby('chrom').size().to_dict() } return stats peaks = load_clip_peaks('peaks.bed') print(peak_stats(peaks))
Quality Metrics
| Metric | Good Value | Description |
|---|---|---|
| Peak count | 1,000-50,000 | Depends on RBP |
| Peak width | 20-100 nt | Typical for RBP footprint |
| FRiP | >0.1 | Fraction reads in peaks |
Calculate FRiP
# Reads in peaks reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -) # Total reads total_reads=$(samtools view -c deduped.bam) # FRiP frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc) echo "FRiP: $frip"
Related Skills
- clip-alignment - Generate aligned BAM
- clip-preprocessing - UMI deduplication
- binding-site-annotation - Annotate peaks with gene features
- clip-motif-analysis - Find enriched motifs in peaks