BioSkills bio-genome-intervals-proximity-operations
Find nearest features, search within windows, and extend intervals using closest, window, flank, and slop operations. Use when performing TSS proximity analysis, assigning enhancers to genes, defining promoter regions, or finding nearby genomic features.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/genome-intervals/proximity-operations" ~/.claude/skills/gptomics-bioskills-bio-genome-intervals-proximity-operations && rm -rf "$T"
genome-intervals/proximity-operations/SKILL.mdVersion Compatibility
Reference examples tested with: bedtools 2.31+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
thenpip show <package>
to check signatureshelp(module.function) - CLI:
then<tool> --version
to confirm flags<tool> --help
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Proximity Operations
"Find nearest features or extend intervals" → Identify the closest genomic feature to each interval, or expand intervals by a fixed flank size.
- CLI:
,bedtools closest -a peaks.bed -b genes.bedbedtools slop -b 1000 - Python:
,a.closest(b)
(pybedtools)a.slop(b=1000, g=genome)
Operations for finding nearby features and extending intervals using bedtools and pybedtools.
Closest - Find Nearest Feature
CLI
# Find nearest gene to each peak bedtools closest -a peaks.bed -b genes.bed > peaks_with_nearest.bed # Report distance to nearest feature bedtools closest -a peaks.bed -b genes.bed -d > with_distance.bed # Ignore overlapping features (find next nearest) bedtools closest -a peaks.bed -b genes.bed -io > nearest_non_overlap.bed # Ignore features on different strands bedtools closest -a peaks.bed -b genes.bed -s > same_strand.bed # Ignore features on same strand (opposite strand only) bedtools closest -a peaks.bed -b genes.bed -S > opposite_strand.bed # Only upstream features (5' direction relative to A strand) bedtools closest -a peaks.bed -b genes.bed -D a -iu > upstream_only.bed # Only downstream features bedtools closest -a peaks.bed -b genes.bed -D a -id > downstream_only.bed # Report multiple ties bedtools closest -a peaks.bed -b genes.bed -t all > all_ties.bed # First tie only bedtools closest -a peaks.bed -b genes.bed -t first > first_tie.bed
Python
import pybedtools a = pybedtools.BedTool('peaks.bed') b = pybedtools.BedTool('genes.bed') # Basic closest result = a.closest(b) # With distance result = a.closest(b, d=True) # Ignore overlaps result = a.closest(b, io=True) # Same strand only result = a.closest(b, s=True) # Report all ties result = a.closest(b, t='all') result.saveas('closest.bed')
Window - Find Features Within Distance
CLI
# Find genes within 10kb of peaks bedtools window -a peaks.bed -b genes.bed -w 10000 > genes_within_10kb.bed # Asymmetric window (5kb upstream, 2kb downstream of A) bedtools window -a peaks.bed -b genes.bed -l 5000 -r 2000 > asymmetric.bed # Same strand only bedtools window -a peaks.bed -b genes.bed -w 10000 -sm > same_strand.bed # Strand-aware window (upstream/downstream relative to strand) bedtools window -a peaks.bed -b genes.bed -l 5000 -r 2000 -sw > strand_aware.bed
Python
import pybedtools a = pybedtools.BedTool('peaks.bed') b = pybedtools.BedTool('genes.bed') # Symmetric window result = a.window(b, w=10000) # Asymmetric window result = a.window(b, l=5000, r=2000) # Same strand result = a.window(b, w=10000, sm=True) result.saveas('window.bed')
Slop - Extend Interval Boundaries
CLI
# Extend both ends by 100bp (requires genome file) bedtools slop -i peaks.bed -g genome.txt -b 100 > extended.bed # Extend 5' end by 500bp, 3' end by 100bp bedtools slop -i peaks.bed -g genome.txt -l 500 -r 100 > asymmetric.bed # Strand-aware extension (upstream/downstream) bedtools slop -i peaks.bed -g genome.txt -l 500 -r 100 -s > strand_aware.bed # Extend by percentage bedtools slop -i peaks.bed -g genome.txt -b 0.5 -pct > extend_50pct.bed # Header passthrough bedtools slop -i peaks.bed -g genome.txt -b 100 -header > with_header.bed
Python
import pybedtools bed = pybedtools.BedTool('peaks.bed') # Symmetric extension result = bed.slop(g='genome.txt', b=100) # Asymmetric extension result = bed.slop(g='genome.txt', l=500, r=100) # Strand-aware result = bed.slop(g='genome.txt', l=500, r=100, s=True) # Percentage result = bed.slop(g='genome.txt', b=0.5, pct=True) result.saveas('extended.bed')
Flank - Get Flanking Regions
CLI
# Get 100bp flanks on both sides (not original interval) bedtools flank -i peaks.bed -g genome.txt -b 100 > flanks.bed # Get upstream flank only bedtools flank -i peaks.bed -g genome.txt -l 100 -r 0 > upstream.bed # Get downstream flank only bedtools flank -i peaks.bed -g genome.txt -l 0 -r 100 > downstream.bed # Strand-aware flanking bedtools flank -i peaks.bed -g genome.txt -l 500 -r 0 -s > upstream_strand.bed # Percentage of interval size bedtools flank -i peaks.bed -g genome.txt -b 0.5 -pct > flank_50pct.bed
Python
import pybedtools bed = pybedtools.BedTool('peaks.bed') # Both flanks result = bed.flank(g='genome.txt', b=100) # Upstream only (left) result = bed.flank(g='genome.txt', l=100, r=0) # Strand-aware upstream result = bed.flank(g='genome.txt', l=500, r=0, s=True) result.saveas('flanks.bed')
Shift - Move Intervals
CLI
# Shift all intervals downstream by 100bp bedtools shift -i peaks.bed -g genome.txt -s 100 > shifted.bed # Shift upstream (negative) bedtools shift -i peaks.bed -g genome.txt -s -100 > shifted_up.bed # Shift by percentage bedtools shift -i peaks.bed -g genome.txt -s 0.5 -pct > shift_50pct.bed # Shift with chromosome-specific values bedtools shift -i peaks.bed -g genome.txt -s 100 -p 200 > shifted.bed # plus strand +100, minus +200
Python
import pybedtools bed = pybedtools.BedTool('peaks.bed') # Shift downstream result = bed.shift(g='genome.txt', s=100) # Shift upstream result = bed.shift(g='genome.txt', s=-100) result.saveas('shifted.bed')
Common Patterns
Find Peaks Within 10kb of TSS
# Get TSS from genes (assumes BED6+ with strand) awk -v OFS='\t' '{ if ($6 == "+") print $1, $2, $2+1, $4, $5, $6; else print $1, $3-1, $3, $4, $5, $6; }' genes.bed > tss.bed # Find peaks within 10kb of TSS bedtools window -a peaks.bed -b tss.bed -w 10000 > peaks_near_tss.bed
Create Promoter Regions
# 2kb upstream, 500bp downstream of TSS (strand-aware) bedtools flank -i tss.bed -g genome.txt -l 2000 -r 0 -s | \ bedtools slop -i stdin -g genome.txt -l 0 -r 500 -s > promoters.bed # Or simpler with slop from TSS bedtools slop -i tss.bed -g genome.txt -l 2000 -r 500 -s > promoters.bed
Find Nearest Gene Within 100kb
import pybedtools peaks = pybedtools.BedTool('peaks.bed') genes = pybedtools.BedTool('genes.bed') # Find closest gene closest = peaks.closest(genes, d=True) # Filter to within 100kb within_100kb = closest.filter(lambda x: abs(int(x.fields[-1])) <= 100000) within_100kb.saveas('peaks_with_nearby_genes.bed')
Enhancer-Gene Assignment
Goal: Link putative enhancers to potential target genes based on genomic proximity within a defined distance window.
Approach: Use bedtools window with a large symmetric window (e.g., 1Mb) to find all TSS sites near each enhancer, producing an enhancer-gene pair list for downstream regulatory analysis.
import pybedtools enhancers = pybedtools.BedTool('enhancers.bed') tss = pybedtools.BedTool('tss.bed') # Find all genes within 1Mb window assignments = enhancers.window(tss, w=1000000) # Convert to DataFrame for analysis df = assignments.to_dataframe()
Genome File Format
# genome.txt format: chromosome<TAB>size chr1 248956422 chr2 242193529 chr3 198295559 ... # Create from FASTA index cut -f1,2 reference.fa.fai > genome.txt # Download UCSC chromosome sizes wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
Key Parameters
| Operation | Parameter | Description |
|---|---|---|
| closest -d | Distance | Report distance in last column |
| closest -io | Ignore overlap | Skip overlapping features |
| closest -D | Direction | Report signed distance (a/b/ref) |
| window -w | Window | Symmetric window size |
| window -l/-r | Left/Right | Asymmetric window |
| slop -b | Both | Extend both ends |
| slop -s | Strand | Strand-aware extension |
| flank -l/-r | Left/Right | Flank size by side |
Related Skills
- bed-file-basics - BED format fundamentals
- interval-arithmetic - intersect, subtract, merge
- gtf-gff-handling - Extract TSS from annotations
- chip-seq/peak-annotation - Peak annotation