BioSkills bio-bedgraph-handling
Create, manipulate, and convert bedGraph files for genome browser visualization. Covers bedGraph format, conversion to/from bigWig, normalization, and signal processing. Use when handling coverage and signal tracks from ChIP-seq, ATAC-seq, or RNA-seq.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/genome-intervals/bedgraph-handling" ~/.claude/skills/gptomics-bioskills-bio-bedgraph-handling && rm -rf "$T"
genome-intervals/bedgraph-handling/SKILL.mdVersion Compatibility
Reference examples tested with: bedtools 2.31+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
thenpip show <package>
to check signatureshelp(module.function) - CLI:
then<tool> --version
to confirm flags<tool> --help
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
bedGraph Handling
"Work with bedGraph signal tracks" → Create, manipulate, and convert bedGraph files for displaying coverage or signal intensity on genome browsers.
- CLI:
to generate,bedtools genomecov -bg
to convertbedGraphToBigWig - Python:
,pyBigWigpybedtools
bedGraph is a text format for displaying continuous-valued data on genome browsers. Common for coverage, signal intensity, and scores.
bedGraph Format
track type=bedGraph name="Sample" description="Coverage" chr1 0 100 1.5 chr1 100 200 2.3 chr1 200 300 0.8
Four columns: chrom, start, end, value (0-based, half-open)
Create bedGraph from BAM
Using bedtools genomecov
bedtools genomecov -ibam sample.bam -bg > sample.bedgraph bedtools genomecov -ibam sample.bam -bg -split > sample.bedgraph bedtools genomecov -ibam sample.bam -bg -scale 1.5 > sample.scaled.bedgraph
Strand-Specific
bedtools genomecov -ibam sample.bam -bg -strand + > sample.plus.bedgraph bedtools genomecov -ibam sample.bam -bg -strand - > sample.minus.bedgraph
5' End Coverage (ChIP-seq)
bedtools genomecov -ibam sample.bam -bg -5 > sample.5prime.bedgraph
Normalize by Library Size (CPM)
total_reads=$(samtools view -c -F 260 sample.bam) scale=$(echo "scale=10; 1000000 / $total_reads" | bc) bedtools genomecov -ibam sample.bam -bg -scale $scale > sample.cpm.bedgraph
Sort bedGraph
bedGraph must be sorted for conversion to bigWig.
sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph LC_ALL=C sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph
Convert bedGraph to bigWig
Using UCSC bedGraphToBigWig
bedGraphToBigWig sample.sorted.bedgraph chrom.sizes sample.bw fetchChromSizes hg38 > hg38.chrom.sizes bedGraphToBigWig sample.sorted.bedgraph hg38.chrom.sizes sample.bw
Generate chrom.sizes
samtools faidx reference.fa cut -f1,2 reference.fa.fai > chrom.sizes fetchChromSizes hg38 > hg38.chrom.sizes mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -e \ "select chrom, size from hg38.chromInfo" > hg38.chrom.sizes
Clip to Chromosome Boundaries
bedClip sample.bedgraph chrom.sizes sample.clipped.bedgraph bedGraphToBigWig sample.clipped.bedgraph chrom.sizes sample.bw
Convert bigWig to bedGraph
bigWigToBedGraph sample.bw sample.bedgraph bigWigToBedGraph sample.bw sample.chr1.bedgraph -chrom=chr1 bigWigToBedGraph sample.bw sample.region.bedgraph -chrom=chr1 -start=1000 -end=2000
Merge bedGraph Files
Using bedtools unionbedg
bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph \ -header -names sample1 sample2 sample3 > merged.bedgraph
Average Across Samples
bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph | \ awk '{sum=0; for(i=4;i<=NF;i++) sum+=$i; print $1,$2,$3,sum/(NF-3)}' OFS='\t' \ > average.bedgraph
Mathematical Operations
bedtools map for Region Statistics
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o mean > region_means.bed bedtools map -a regions.bed -b sample.bedgraph -c 4 -o sum > region_sums.bed bedtools map -a regions.bed -b sample.bedgraph -c 4 -o max > region_max.bed
Subtract Background
bedtools unionbedg -i treatment.bedgraph input.bedgraph | \ awk '{diff=$4-$5; if(diff<0) diff=0; print $1,$2,$3,diff}' OFS='\t' \ > subtracted.bedgraph
Log Transform
awk '{print $1,$2,$3,log($4+1)/log(2)}' OFS='\t' sample.bedgraph > sample.log2.bedgraph
Smooth Signal
bedtools slop -i sample.bedgraph -g chrom.sizes -b 50 | \ bedtools merge -i - -c 4 -o mean > smoothed.bedgraph
Python with pyBigWig
Write bedGraph
import pyBigWig bw = pyBigWig.open('output.bedgraph', 'w') bw.addHeader([('chr1', 248956422), ('chr2', 242193529)]) chroms = ['chr1', 'chr1', 'chr1'] starts = [0, 100, 200] ends = [100, 200, 300] values = [1.5, 2.3, 0.8] bw.addEntries(chroms, starts, ends=ends, values=values) bw.close()
Read bigWig to bedGraph Format
import pyBigWig bw = pyBigWig.open('sample.bw') for chrom, size in bw.chroms().items(): intervals = bw.intervals(chrom) if intervals: for start, end, value in intervals: print(f'{chrom}\t{start}\t{end}\t{value}') bw.close()
Convert bigWig Region to bedGraph
import pyBigWig bw = pyBigWig.open('sample.bw') intervals = bw.intervals('chr1', 1000000, 2000000) with open('region.bedgraph', 'w') as f: for start, end, value in intervals: f.write(f'chr1\t{start}\t{end}\t{value}\n') bw.close()
deepTools for Normalization
bamCoverage (BAM to bedGraph/bigWig)
bamCoverage -b sample.bam -o sample.bw --normalizeUsing RPKM bamCoverage -b sample.bam -o sample.bw --normalizeUsing CPM bamCoverage -b sample.bam -o sample.bw --normalizeUsing BPM bamCoverage -b sample.bam -o sample.bedgraph --outFileFormat bedgraph --normalizeUsing CPM
bamCompare (Treatment vs Control)
bamCompare -b1 treatment.bam -b2 input.bam -o log2ratio.bw --scaleFactorsMethod readCount bamCompare -b1 treatment.bam -b2 input.bam -o subtracted.bw --ratio subtract
bigwigCompare
bigwigCompare -b1 treatment.bw -b2 input.bw -o ratio.bw --ratio log2 bigwigCompare -b1 sample1.bw -b2 sample2.bw -o diff.bw --ratio subtract
Filtering and Subsetting
Filter by Value
awk '$4 >= 1.0' sample.bedgraph > high_signal.bedgraph awk '$4 > 0' sample.bedgraph > nonzero.bedgraph
Extract Regions
bedtools intersect -a sample.bedgraph -b regions.bed > subset.bedgraph
Remove Specific Chromosomes
grep -v "^chrM" sample.bedgraph | grep -v "_random" > filtered.bedgraph awk '$1 ~ /^chr[0-9XY]+$/' sample.bedgraph > standard_chroms.bedgraph
Aggregate to Bins
Fixed-Size Bins
bedtools makewindows -g chrom.sizes -w 1000 > bins.bed bedtools map -a bins.bed -b sample.bedgraph -c 4 -o mean > binned.bedgraph
Gene Bodies
bedtools map -a genes.bed -b sample.bedgraph -c 4 -o mean > gene_signal.bed
Quality Control
Check for Overlapping Intervals
bedtools merge -i sample.bedgraph -c 4 -o collapse | \ awk 'index($4,",") > 0' | head
Verify Sorted Order
sort -c -k1,1 -k2,2n sample.bedgraph && echo "Sorted" || echo "Not sorted"
Check Value Range
awk 'NR==1 {min=$4; max=$4} {if($4<min) min=$4; if($4>max) max=$4} END {print "Min:", min, "Max:", max}' sample.bedgraph
Complete Pipeline
Goal: Generate a CPM-normalized bigWig track from a BAM file in a single automated workflow.
Approach: Count total mapped reads with samtools, compute a CPM scale factor, generate a scaled bedGraph with bedtools genomecov, sort and clip to chromosome boundaries, then convert to bigWig format.
#!/bin/bash BAM=$1 NAME=$(basename $BAM .bam) CHROM_SIZES=$2 total_reads=$(samtools view -c -F 260 $BAM) scale=$(echo "scale=10; 1000000 / $total_reads" | bc) bedtools genomecov -ibam $BAM -bg -scale $scale > ${NAME}.bedgraph sort -k1,1 -k2,2n ${NAME}.bedgraph > ${NAME}.sorted.bedgraph bedClip ${NAME}.sorted.bedgraph $CHROM_SIZES ${NAME}.clipped.bedgraph bedGraphToBigWig ${NAME}.clipped.bedgraph $CHROM_SIZES ${NAME}.bw rm ${NAME}.bedgraph ${NAME}.sorted.bedgraph ${NAME}.clipped.bedgraph echo "Created ${NAME}.bw (CPM normalized)"
Track Header for UCSC
echo 'track type=bedGraph name="Sample" description="CPM normalized" visibility=full color=0,0,255 altColor=255,0,0 autoScale=on graphType=bar' > track.bedgraph cat sample.bedgraph >> track.bedgraph
Related Skills
- coverage-analysis - Generate coverage from alignments
- bigwig-tracks - Work with bigWig format
- chip-seq/chipseq-visualization - Visualize signal tracks
- alignment-files/sam-bam-basics - BAM file processing