SciAgent-Skills homer-motif-analysis

De novo motif discovery and known TF motif enrichment in ChIP-seq and ATAC-seq peak sets using HOMER. findMotifsGenome.pl searches for over-represented sequence patterns against a background; annotatePeaks.pl assigns genomic context (TSS distance, gene, repeat). Use after MACS3 peak calling to identify which transcription factors are enriched in your peaks, annotate peaks with nearest genes, and validate ChIP-seq quality by checking the target TF's own motif.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/homer-motif-analysis" ~/.claude/skills/jaechang-hits-sciagent-skills-homer-motif-analysis && rm -rf "$T"
manifest: skills/genomics-bioinformatics/homer-motif-analysis/SKILL.md
source content

HOMER — Motif Analysis and Peak Annotation

Overview

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of Perl/C++ tools for analyzing genomic regulatory elements. Its two primary commands are

findMotifsGenome.pl
, which performs de novo motif discovery and known motif enrichment against JASPAR/HOMER databases, and
annotatePeaks.pl
, which maps each peak to the nearest gene, distance to TSS, and genomic feature class (promoter, intron, intergenic, repeat). HOMER takes BED-format peak files from MACS3 or similar peak callers and a reference genome assembly as input, and outputs HTML/text reports ranking enriched motifs by p-value and fold enrichment over a matched background.

When to Use

  • Identifying which transcription factors are bound in a ChIP-seq peak set by enriching their known motifs from JASPAR or the HOMER motif library
  • Discovering novel sequence motifs de novo in open chromatin regions from ATAC-seq without prior knowledge of the binding TF
  • Comparing motif landscapes between two conditions (e.g., treated vs. untreated peak sets) by running HOMER with one set as target and the other as background
  • Annotating genomic peaks with nearest genes and distance to TSS for downstream functional analysis or integration with DESeq2 results
  • Validating ChIP-seq experiment quality: a successful pull-down should show the target TF's canonical motif as the top hit
  • Use
    macs3-peak-calling
    first to generate the peak BED files that serve as input to HOMER
  • Use
    jaspar-database
    to cross-reference HOMER-discovered motifs with JASPAR IDs and additional TF metadata
  • Use
    MEME-CHIP
    (web or local) when you need a more probabilistic ZOOPS/TCM model or the MEME Suite ecosystem
  • Use
    AME
    (part of MEME Suite) as a faster alternative for known motif scanning without de novo discovery

Prerequisites

  • Software: HOMER (Perl + compiled binaries), conda or manual install
  • Genomes: must download genome sequence via
    installGenome.pl
    after HOMER install
  • Input: BED file of peaks (at minimum: chr, start, end columns); ideally summit-centered peaks from MACS3
  • Python packages (for parsing/visualization):
    pandas
    ,
    matplotlib
    ,
    seaborn

Check before installing: The tool may already be available in the current environment (e.g., inside a

pixi
/
conda
env). Run
command -v findMotifsGenome.pl
first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via
pixi run findMotifsGenome.pl
rather than bare
findMotifsGenome.pl
.

# Install HOMER via conda (recommended — handles Perl dependencies)
conda install -c bioconda homer

# Verify installation
findMotifsGenome.pl 2>&1 | head -3
# Usage: findMotifsGenome.pl <peak/BED file> <genome> <output directory> [options]

annotatePeaks.pl 2>&1 | head -3
# Usage: annotatePeaks.pl <peak/BED file> <genome> [options]

# Install reference genomes (downloads 2-way masker + sequence; ~3–10 GB each)
installGenome.pl hg38
installGenome.pl mm10

# Install Python parsing dependencies
pip install pandas matplotlib seaborn

Quick Start

# Run de novo + known motif enrichment on TF ChIP-seq peaks (hg38, 200 bp window)
findMotifsGenome.pl peaks/tf_chip_summits.bed hg38 motif_output/ \
    -size 200 -mask -p 4

# Annotate peaks with nearest genes and genomic features
annotatePeaks.pl peaks/tf_chip_peaks.narrowPeak hg38 > annotated_peaks.txt

echo "Top known motif:"
head -2 motif_output/knownResults.txt | tail -1 | cut -f1-4

echo "Annotated peaks: $(wc -l < annotated_peaks.txt) lines"

Workflow

Step 1: Installation and Genome Setup

Install HOMER and download the reference genome sequence required for motif analysis.

# Activate conda environment (or use existing env)
conda create -n homer_env -c bioconda homer python=3.10 -y
conda activate homer_env

# List available genomes
installGenome.pl list

# Install human (hg38) and mouse (mm10) genomes
# Downloads masked genome sequence and annotation files
installGenome.pl hg38
# Output: Installing hg38... Done. (3-5 min, ~3 GB)

installGenome.pl mm10
# Output: Installing mm10... Done. (3-5 min, ~2.5 GB)

# Verify genome is installed
ls ~/.homer/data/genomes/hg38/
# genome.fa  chrom.sizes  ...

# Check HOMER motif database
ls ~/.homer/data/knownTFs/
# vertebrates.motifs  jaspar.motifs  ...

Step 2: Prepare Input Peak File

Prepare a summit-centered BED file from MACS3 output for optimal motif resolution.

# Option A: Use MACS3 summit file directly (already 1 bp summit positions)
# Expand summits to ±100 bp (200 bp total) centered on summit
awk 'BEGIN{OFS="\t"} {
    start = ($2 - 100 < 0) ? 0 : $2 - 100;
    print $1, start, $2 + 100, $4, $5
}' peaks/tf_chip_summits.bed > peaks/tf_chip_200bp.bed

echo "Summit-centered peaks: $(wc -l < peaks/tf_chip_200bp.bed)"
# Summit-centered peaks: 12453

# Option B: Use narrowPeak file directly (HOMER accepts multi-column BED)
# HOMER uses columns 1-3 (chr, start, end) and centers internally with -size
cp peaks/tf_chip_peaks.narrowPeak peaks/input_peaks.bed

# Option C: Prepare a custom background region file (matched GC content)
# HOMER auto-generates background if not provided, but explicit background
# is recommended when comparing two peak sets
# Use the control peak set or random genomic regions as background:
bedtools shuffle -i peaks/tf_chip_peaks.narrowPeak \
    -g ~/.homer/data/genomes/hg38/chrom.sizes \
    -excl peaks/tf_chip_peaks.narrowPeak > peaks/background_regions.bed

echo "Background regions: $(wc -l < peaks/background_regions.bed)"
# Background regions: 12453

Step 3: De Novo Motif Discovery

Run

findMotifsGenome.pl
for de novo motif discovery and known motif enrichment simultaneously.

mkdir -p motif_output/

# Full run: de novo + known motif enrichment
# -size 200: use 200 bp window centered on peak midpoint
# -mask: mask repetitive elements (recommended for clean motifs)
# -p 4: use 4 CPU threads
# -S 25: find top 25 de novo motifs (default)
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output/ \
    -size 200 \
    -mask \
    -p 4 \
    -S 25

# Check progress output:
# Reading genome sizes for hg38 ...
# Scanning for motifs...
# Optimizing 25 motifs...
# Done! Output in motif_output/

echo "Known results: $(wc -l < motif_output/knownResults.txt) motifs"
echo "De novo motifs: $(ls motif_output/homerResults/*.motif 2>/dev/null | wc -l) motifs"
# Known results: 392 motifs
# De novo motifs: 25 motifs

# For mouse peaks (mm10)
# findMotifsGenome.pl peaks/atac_peaks.bed mm10 motif_output_mm10/ \
#     -size 200 -mask -p 4

Step 4: Known Motif Enrichment Only

Scan peaks for occurrences of a specific known motif or skip de novo discovery for speed.

# Skip de novo discovery (faster when you only need known motifs)
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output_known/ \
    -size 200 \
    -mask \
    -p 4 \
    -nomotif

echo "Known motif results: $(wc -l < motif_output_known/knownResults.txt)"
# Known motif results: 392

# Find occurrences of a specific motif across peaks (outputs peak-level annotation)
# Extract the motif matrix file for the TF of interest from homerResults/
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_scan_out/ \
    -size 200 \
    -mask \
    -find motif_output/homerResults/motif1.motif \
    > peaks_with_motif1.txt

echo "Peaks containing motif1: $(wc -l < peaks_with_motif1.txt)"
# Peaks containing motif1: 8941

# Custom background: compare treated vs. control peak sets
findMotifsGenome.pl peaks/treated_peaks.bed hg38 motif_treated_vs_ctrl/ \
    -size 200 \
    -mask \
    -p 4 \
    -bg peaks/control_peaks.bed

Step 5: Peak Annotation

Use

annotatePeaks.pl
to assign each peak to a genomic feature and nearest gene.

# Annotate peaks with nearest gene and TSS distance
# Outputs a tab-delimited file with genomic context for each peak
annotatePeaks.pl peaks/tf_chip_peaks.narrowPeak hg38 \
    > annotated_peaks.txt

echo "Annotated peaks: $(($(wc -l < annotated_peaks.txt) - 1)) peaks"
# Annotated peaks: 12453 peaks

# Preview column headers and first peak
head -2 annotated_peaks.txt | cut -f1-10

# Annotate ATAC-seq peaks (same command, different input)
annotatePeaks.pl peaks/atac_sample_peaks.narrowPeak hg38 \
    > annotated_atac.txt

# Generate TSS-distance histogram (for tag density plots)
# annotatePeaks.pl can compute read density around peaks with -d flag
# annotatePeaks.pl tss hg38 -size 4000 -hist 10 \
#     -d chip_tagdir/ > tss_histogram.txt

Step 6: Parse HOMER Results with Python

Read

knownResults.txt
and de novo motif files into pandas for downstream analysis.

import pandas as pd
import subprocess
import io

# --- Parse known motif enrichment results ---
# knownResults.txt columns:
# Motif Name | Consensus | P-value | Log P-value | q-value | # Target Seqs w/ motif | % Target | # Bg Seqs w/ motif | % Bg
known_cols = [
    "motif_name", "consensus", "pvalue", "log_pvalue",
    "qvalue", "n_target_seqs", "pct_target",
    "n_bg_seqs", "pct_bg"
]
known = pd.read_csv(
    "motif_output/knownResults.txt",
    sep="\t", header=0, names=known_cols
)

# Convert string percentages to floats
known["pct_target"] = known["pct_target"].str.rstrip("%").astype(float)
known["pct_bg"] = known["pct_bg"].str.rstrip("%").astype(float)
known["fold_enrichment"] = known["pct_target"] / known["pct_bg"].replace(0, 0.001)
known["-log10_pvalue"] = -known["log_pvalue"] / 2.303  # log10 from natural log

print(f"Total known motifs tested: {len(known)}")
print(f"Significant (p < 1e-5): {(known['pvalue'].astype(float) < 1e-5).sum()}")
print("\nTop 5 enriched motifs:")
print(
    known[["motif_name", "pct_target", "pct_bg", "fold_enrichment", "-log10_pvalue"]]
    .head(5)
    .to_string(index=False)
)
# Total known motifs tested: 391
# Significant (p < 1e-5): 47
# Top 5 enriched motifs:
#  motif_name  pct_target  pct_bg  fold_enrichment  -log10_pvalue
#  CTCF(Zf)/GM12878-CTCF-ChIP-Seq...   78.21    8.32           9.40         312.4
#  ZNF143(Zf)/...                        42.11    5.10           8.26         188.7
#  ...

Step 7: Parse Peak Annotations and Visualize

Read annotated peak output, summarize genomic feature distribution, and plot motif enrichments.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

# --- Parse annotated peaks ---
# annotatePeaks.pl output header: PeakID chr start end strand score ...
# Key columns: "Annotation", "Distance to TSS", "Nearest RefSeq", "Gene Name"
annot = pd.read_csv("annotated_peaks.txt", sep="\t", header=0, low_memory=False)
annot.columns = annot.columns.str.strip()

# Simplify annotation categories
def simplify_annotation(ann):
    if pd.isna(ann):
        return "Other"
    ann = str(ann)
    if "promoter" in ann.lower():
        return "Promoter (<2 kb)"
    elif "exon" in ann.lower():
        return "Exon"
    elif "intron" in ann.lower():
        return "Intron"
    elif "tts" in ann.lower() or "downstream" in ann.lower():
        return "Downstream"
    elif "intergenic" in ann.lower():
        return "Intergenic"
    else:
        return "Other"

annot["simple_ann"] = annot["Annotation"].apply(simplify_annotation)
ann_counts = annot["simple_ann"].value_counts()

print("Peak annotation summary:")
print(ann_counts.to_string())
# Peak annotation summary:
# Intron             5832
# Intergenic         3102
# Promoter (<2 kb)   2218
# Exon                891
# Downstream          287
# Other               123

# --- Plot 1: Annotation pie chart ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

colors = ["#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#A65628"]
axes[0].pie(
    ann_counts.values,
    labels=ann_counts.index,
    colors=colors[:len(ann_counts)],
    autopct="%1.1f%%",
    startangle=140,
    textprops={"fontsize": 9}
)
axes[0].set_title("Peak Genomic Annotation\n(n=12,453 peaks)", fontsize=11)

# --- Plot 2: Top known motif enrichments ---
known = pd.read_csv("motif_output/knownResults.txt", sep="\t", header=0)
known.columns = [
    "motif_name", "consensus", "pvalue", "log_pvalue",
    "qvalue", "n_target_seqs", "pct_target", "n_bg_seqs", "pct_bg"
]
known["pct_target"] = known["pct_target"].str.rstrip("%").astype(float)
known["pct_bg"] = known["pct_bg"].str.rstrip("%").astype(float)
known["-log10_p"] = (-known["log_pvalue"].astype(float)) / 2.303
known["short_name"] = known["motif_name"].str.split("/").str[0]

top_motifs = known.head(15).copy()
sns.barplot(
    data=top_motifs,
    x="-log10_p",
    y="short_name",
    palette="Blues_r",
    ax=axes[1]
)
axes[1].set_xlabel("-log10(p-value)", fontsize=10)
axes[1].set_ylabel("Motif", fontsize=10)
axes[1].set_title("Top 15 Enriched Known Motifs\n(HOMER/JASPAR database)", fontsize=11)
axes[1].axvline(x=5, color="red", linestyle="--", linewidth=0.8, label="p=1e-5")
axes[1].legend(fontsize=8)

plt.tight_layout()
plt.savefig("homer_summary.png", dpi=150, bbox_inches="tight")
plt.close()
print("Saved: homer_summary.png")

Key Parameters

ParameterDefaultRange / OptionsEffect
-size
200
50–1000Window size (bp) centered on peak midpoint for motif search; 200 for TF ChIP-seq, 150 for ATAC-seq
-mask
offflagMask repetitive elements (N-mask); strongly recommended to reduce false positives
-p
1
1–64Number of parallel CPU threads; use 4–8 for typical datasets
-S
25
1–100Number of de novo motifs to find; 25 is usually sufficient
-nomotif
offflagSkip de novo motif discovery; only perform known motif enrichment (10× faster)
-bg
auto-generatedBED file pathCustom background region file; auto-background is GC-matched if omitted
-find
.motif
file path
Scan peaks for occurrences of a specific motif matrix; outputs per-peak annotation
-mis
2
0–4Maximum mismatches allowed when matching known motifs
-len
8,10,12
comma-separated integersMotif lengths to try for de novo search; adding
6
finds shorter motifs
genome
required
hg38
,
mm10
,
hg19
,
dm6
,
ce11
, etc.
Reference genome assembly; must be installed with
installGenome.pl

Common Recipes

Recipe 1: Python subprocess Wrapper for findMotifsGenome.pl

Run HOMER from a Python script and capture completion status.

import subprocess
import sys
from pathlib import Path

def run_homer_motifs(
    peaks_bed: str,
    genome: str,
    output_dir: str,
    size: int = 200,
    mask: bool = True,
    cpus: int = 4,
    n_motifs: int = 25,
    denovo: bool = True,
    bg_bed: str = None
) -> int:
    """Run findMotifsGenome.pl and return exit code."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "findMotifsGenome.pl", peaks_bed, genome, output_dir,
        "-size", str(size),
        "-p", str(cpus),
        "-S", str(n_motifs),
    ]
    if mask:
        cmd.append("-mask")
    if not denovo:
        cmd.append("-nomotif")
    if bg_bed:
        cmd.extend(["-bg", bg_bed])

    print(f"Running: {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"HOMER stderr:\n{result.stderr[-2000:]}", file=sys.stderr)
        return result.returncode
    print(f"Done. Output in {output_dir}/")
    return 0

# Usage
exit_code = run_homer_motifs(
    peaks_bed="peaks/tf_chip_200bp.bed",
    genome="hg38",
    output_dir="motif_output/",
    size=200, mask=True, cpus=4, n_motifs=25
)
print(f"Exit code: {exit_code}")
# Running: findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output/ -size 200 -p 4 -S 25 -mask
# Done. Output in motif_output/
# Exit code: 0

Recipe 2: Batch Motif Analysis Across Multiple Peak Sets

Run HOMER on multiple ChIP-seq samples in a loop.

#!/bin/bash
# Batch motif analysis for several ChIP-seq experiments (same genome)
GENOME="hg38"
PEAK_DIR="peaks"
MOTIF_DIR="motif_results"
mkdir -p "$MOTIF_DIR"

SAMPLES=(CTCF H3K4me3 FOXA2 RUNX1)

for sample in "${SAMPLES[@]}"; do
    echo "=== Processing $sample ==="
    peak_file="${PEAK_DIR}/${sample}_summits.bed"

    if [ ! -f "$peak_file" ]; then
        echo "  Skipping $sample: $peak_file not found"
        continue
    fi

    # Center on summit ±100 bp
    awk 'BEGIN{OFS="\t"} {s=$2-100<0?0:$2-100; print $1,s,$2+100,$4,$5}' \
        "$peak_file" > "${PEAK_DIR}/${sample}_200bp.bed"

    findMotifsGenome.pl "${PEAK_DIR}/${sample}_200bp.bed" "$GENOME" \
        "${MOTIF_DIR}/${sample}/" \
        -size 200 -mask -p 4 -nomotif \
        2> "${MOTIF_DIR}/${sample}.log"

    n=$(wc -l < "${MOTIF_DIR}/${sample}/knownResults.txt")
    echo "  $sample: $n motifs tested. Top hit: $(sed -n '2p' "${MOTIF_DIR}/${sample}/knownResults.txt" | cut -f1)"
done
# === Processing CTCF ===
#   CTCF: 392 motifs tested. Top hit: CTCF(Zf)/GM12878-CTCF-ChIP-Seq(GSE32465)/Homer
# === Processing FOXA2 ===
#   FOXA2: 392 motifs tested. Top hit: Foxa2(Forkhead)/Liver-Foxa2-ChIP-Seq(GSE25694)/Homer

Recipe 3: Parse De Novo Motif Matrices from homerResults/

Load de novo motif PWM matrices for downstream comparison or plotting.

import os
import re
import pandas as pd

def parse_homer_motif(motif_file: str) -> dict:
    """Parse a HOMER .motif file into name, log_odds_threshold, and PWM."""
    with open(motif_file) as f:
        header = f.readline().strip()  # >motif_name\tlog_odds\tlog_p-value\t0\tnucs
        rows = []
        for line in f:
            line = line.strip()
            if line:
                rows.append([float(x) for x in line.split("\t")])
    parts = header.lstrip(">").split("\t")
    name = parts[0]
    log_odds = float(parts[1]) if len(parts) > 1 else 0.0
    log_p = float(parts[2]) if len(parts) > 2 else 0.0
    pwm = pd.DataFrame(rows, columns=["A", "C", "G", "T"])
    return {"name": name, "log_odds": log_odds, "log_p": log_p, "pwm": pwm}

# Load all de novo motifs
motif_dir = "motif_output/homerResults/"
motifs = []
for fn in sorted(os.listdir(motif_dir)):
    if fn.endswith(".motif"):
        motif = parse_homer_motif(os.path.join(motif_dir, fn))
        motifs.append(motif)
        print(f"{fn}: {motif['name']} ({len(motif['pwm'])} positions, log_p={motif['log_p']:.1f})")

print(f"\nLoaded {len(motifs)} de novo motifs")
# motif1.motif: CTCF-motif (19 positions, log_p=-8234.1)
# motif2.motif: CTCFL-motif (17 positions, log_p=-3421.7)
# ...
# Loaded 25 de novo motifs

Recipe 4: Annotate Peaks and Join with Differential Expression Results

Combine peak annotations with RNA-seq DE results to find regulated genes near peaks.

import pandas as pd

# Load annotated peaks
annot = pd.read_csv("annotated_peaks.txt", sep="\t", header=0, low_memory=False)
annot.columns = annot.columns.str.strip()

# Key columns from HOMER annotation
peak_genes = annot[["PeakID (cmd=annotatePeaks.pl peaks.bed hg38)",
                     "Chr", "Start", "End",
                     "Annotation", "Distance to TSS",
                     "Nearest RefSeq", "Gene Name"]].copy()
peak_genes.columns = ["peak_id", "chr", "start", "end",
                      "annotation", "tss_dist", "refseq", "gene_name"]

# Filter promoter-proximal peaks (within 2 kb of TSS)
promoter_peaks = peak_genes[peak_genes["tss_dist"].abs() < 2000].copy()
print(f"Promoter-proximal peaks (|TSS| < 2kb): {len(promoter_peaks)}")
# Promoter-proximal peaks (|TSS| < 2kb): 2218

# Load DESeq2 results (gene_name, log2FC, padj)
de_results = pd.read_csv("deseq2_results.csv")
de_sig = de_results[de_results["padj"] < 0.05].copy()

# Merge: find DE genes with a nearby peak
merged = promoter_peaks.merge(de_sig, on="gene_name", how="inner")
print(f"DE genes with promoter-proximal peak: {len(merged)}")
print(merged[["gene_name", "tss_dist", "annotation", "log2FC", "padj"]].head())
# DE genes with promoter-proximal peak: 347

Expected Outputs

OutputFormatDescription
motif_output/knownResults.txt
TSVAll known motifs tested: name, p-value, q-value, % target, % background; primary result file
motif_output/knownResults.html
HTMLInteractive HTML report with motif logos, statistics, and links
motif_output/homerResults/
DirectoryDe novo motifs:
motifN.motif
(PWM matrix),
motifN.logo.png
,
similar.motifs.txt
motif_output/homerResults.html
HTMLInteractive HTML report for de novo motifs
annotated_peaks.txt
TSVOne row per peak: chr, start, end, annotation, TSS distance, nearest gene, RefSeq ID
peaks_with_motif1.txt
TSVPer-peak motif occurrence scores (from
-find
mode)

Troubleshooting

ProblemCauseSolution
ERROR: Genome not found
Genome not installed via
installGenome.pl
Run
installGenome.pl hg38
; confirm
~/.homer/data/genomes/hg38/
exists
command not found: findMotifsGenome.pl
HOMER binaries not in
$PATH
Add HOMER bin directory to PATH:
export PATH=~/.homer/bin:$PATH
(or activate conda env)
Poor de novo motif quality (GC-rich artifacts)Unmasked repetitive elementsAlways use
-mask
; also try increasing
-size
to 300 or reducing to 150
No significant known motifs (all p > 0.01)Too few peaks or wrong peak sizeEnsure ≥500 peaks; check
-size
matches expected binding footprint (150–200 for TFs)
Conda HOMER conflicts (Perl module errors)HOMER's Perl scripts conflict with conda environment PerlInstall HOMER in a dedicated conda env:
conda create -n homer_only -c bioconda homer
annotatePeaks.pl
gives all "Intergenic"
Genome annotation not installedHOMER annotation requires
installGenome.pl
which downloads GTF; check
~/.homer/data/genomes/hg38/
contains
*.ann
files
Very slow run time (>2 hr)Single-threaded or large peak setAdd
-p 8
; reduce
-S
to 10 for speed; use
-nomotif
for known-only runs
-bg
custom background gives poor motifs
Background regions have very different GC contentUse a GC-matched background; run
bedtools shuffle
with
-noOverlapping
to generate random same-size regions

References