Claude-skill-registry bio-chip-seq-motif-analysis

De novo motif discovery and known motif enrichment analysis using HOMER and MEME-ChIP. Identify transcription factor binding motifs in ChIP-seq, ATAC-seq, or other genomic peak data. Use when finding enriched DNA motifs in peak sequences.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bio-chip-seq-motif-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-bio-chip-seq-motif-analysis && rm -rf "$T"
manifest: skills/data/bio-chip-seq-motif-analysis/SKILL.md

Motif Analysis

Identify DNA sequence motifs enriched in ChIP-seq or ATAC-seq peaks to discover transcription factor binding sites.

Tool Comparison

| Tool | Strengths | Use Case |
|------|-----------|----------|
| HOMER | Fast, comprehensive, built-in databases | General motif analysis |
| MEME-ChIP | Multiple algorithms, web interface | Publication-quality |
| MEME | De novo discovery only | Simple discovery |
| FIMO | Known motif scanning | Genome-wide scanning |

HOMER

Installation

conda install -c bioconda homer

# Configure genome (required once)
perl /path/to/homer/configureHomer.pl -install hg38
perl /path/to/homer/configureHomer.pl -install mm10

De Novo Motif Discovery

# Basic motif finding
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200

# With background regions
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -bg background.bed

# Specify motif lengths to search
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -len 8,10,12

Key Options

| Option | Description |
|--------|-------------|
| -size <#> | Fragment size for analysis (default 200) |
| -size given | Use actual peak sizes |
| -bg <file> | Background regions (BED) |
| -len <#,#,...> | Motif lengths to search |
| -mask | Mask repeats |
| -p <#> | Number of CPUs |
| -S <#> | Number of motifs to find (default 25) |
| -mis <#> | Mismatches allowed (default 2) |
| -noweight | Don't adjust for GC content |
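
The options above can be combined on one command line. As a minimal sketch, the helper below assembles an argument list from a few of them. The function name `build_homer_command` and its parameters are hypothetical and not part of HOMER; only flags listed in the table are used.

```python
def build_homer_command(peaks, genome, outdir, size=200, lengths=(8, 10, 12),
                        background=None, cpus=1, mask=False):
    """Assemble a findMotifsGenome.pl argument list (hypothetical helper)."""
    cmd = ["findMotifsGenome.pl", peaks, genome, outdir, "-size", str(size)]
    cmd += ["-len", ",".join(str(n) for n in lengths)]
    if background is not None:
        cmd += ["-bg", background]  # BED file of background regions
    if mask:
        cmd.append("-mask")         # mask repeats
    cmd += ["-p", str(cpus)]
    return cmd


cmd = build_homer_command("peaks.bed", "hg38", "output_dir/",
                          size="given", mask=True, cpus=8)
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where HOMER is installed.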

Output Files

output_dir/
├── homerResults.html      # Main results page
├── knownResults.html      # Known motif enrichment
├── homerMotifs.all.motifs # All discovered motifs
├── knownResults.txt       # Known motif statistics
└── homerResults/
    └── motif1.motif       # Individual motif files

Known Motif Enrichment Only

# Skip de novo, only check known motifs
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -nomotif

Scan for Specific Motifs

# Find instances of motif in peaks
annotatePeaks.pl peaks.bed hg38 -m motif.motif > annotated.txt

# Scan genome for motif occurrences (-bed writes BED format)
scanMotifGenomeWide.pl motif.motif hg38 -bed > motif_sites.bed
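
Motif scanning scores every window of a sequence against the motif matrix and keeps the windows that score well. The sketch below illustrates the idea with a log2-odds score against a uniform background on the forward strand only. It is a simplified illustration, not HOMER's exact scoring; the function name and the toy E-box matrix are assumptions, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import math


def best_match(sequence, pfm, background=0.25):
    """Return (score, offset) of the best log2-odds match on the forward strand.

    pfm is a list of dicts mapping each base to a non-zero probability.
    """
    width = len(pfm)
    best = (float("-inf"), -1)
    for offset in range(len(sequence) - width + 1):
        score = sum(
            math.log2(pfm[i][sequence[offset + i]] / background)
            for i in range(width)
        )
        best = max(best, (score, offset))
    return best


# Toy E-box matrix (CACGTG) with a small probability for non-consensus bases
ebox = [{b: (0.97 if b == c else 0.01) for b in "ACGT"} for c in "CACGTG"]
score, offset = best_match("TTCACGTGAA", ebox)
print(f"best offset={offset}, score={score:.2f}")
```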

Motif Comparison

# Compare discovered motifs to HOMER's known motif collection (the default);
# pass -known <motif file> to compare against a different set
compareMotifs.pl motifs.motif output_dir/

Create Custom Motif

# From consensus sequence (arguments: consensus, mismatches allowed, motif name)
seq2profile.pl CACGTG 0 MYC > MYC.motif

# From aligned sequences
cat aligned_seqs.txt | alignAndConvert.pl - > custom.motif
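
A motif built from aligned sites is, at its core, a table of per-position base frequencies. The following sketch shows that calculation for equal-length sites. It prints only the matrix rows in A, C, G, T order and does not write a HOMER header line; the function name and the example sites are made up for illustration.

```python
def position_frequencies(aligned):
    """Return per-position A/C/G/T frequencies for equal-length aligned sites."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for i in range(width):
        column = [s[i].upper() for s in aligned]
        matrix.append([column.count(b) / len(aligned) for b in "ACGT"])
    return matrix


sites = ["CACGTG", "CACGTG", "CATGTG", "CACGTT"]  # illustrative sites only
for row in position_frequencies(sites):
    print("\t".join(f"{p:.3f}" for p in row))
```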

MEME Suite

Installation

conda install -c bioconda meme

Extract Sequences from Peaks

# Get FASTA sequences under peaks
bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa

# Extend each peak by 100 bp on both sides, then extract sequences
bedtools slop -i peaks.bed -g genome.sizes -b 100 | \
    bedtools getfasta -fi genome.fa -bed - -fo peaks_centered.fa
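
`bedtools slop` pads both ends of each interval, so peaks of different widths stay different widths. If fixed-width windows around each peak midpoint are wanted instead, a sketch along these lines can rewrite the BED coordinates first. The function name and the 100 bp half-width are assumptions, and the window end is not clipped to the chromosome length here.

```python
def center_windows(bed_lines, half_width=100):
    """Yield (chrom, start, end) windows around each peak midpoint."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split("\t")[:3]
        center = (int(start) + int(end)) // 2
        # Clamp the start so windows near a chromosome start stay valid
        yield chrom, max(0, center - half_width), center + half_width


peaks = ["chr1\t1000\t1500\tpeak1", "chr2\t20\t80\tpeak2"]
for chrom, start, end in center_windows(peaks):
    print(f"{chrom}\t{start}\t{end}")
```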

MEME (De Novo Discovery)

# Basic de novo discovery
meme peaks.fa -dna -oc meme_output -mod zoops -nmotifs 10 -minw 6 -maxw 20

# With Markov background
fasta-get-markov peaks.fa > background.model
meme peaks.fa -dna -oc meme_output -bfile background.model -mod zoops -nmotifs 10

MEME Options

| Option | Description |
|--------|-------------|
| -mod zoops | Zero or one per sequence (default for ChIP) |
| -mod oops | Exactly one per sequence |
| -mod anr | Any number of repeats |
| -nmotifs <#> | Number of motifs to find |
| -minw <#> | Minimum motif width |
| -maxw <#> | Maximum motif width |
| -revcomp | Search both strands |
| -bfile <file> | Background model file |

MEME-ChIP (Comprehensive Pipeline)

# All-in-one ChIP-seq motif analysis
meme-chip -oc meme_chip_output -db motif_database.meme peaks.fa

MEME-ChIP runs:

  1. MEME - De novo discovery on the central region of the input sequences
  2. DREME - Short motif discovery (newer MEME Suite releases run STREME for this step)
  3. CentriMo - Central enrichment analysis
  4. TOMTOM - Compare to known motifs
  5. FIMO - Find motif instances

DREME (Short Motifs)

# Find short enriched motifs
dreme -oc dreme_output -p peaks.fa -n background.fa

CentriMo (Central Enrichment)

# Test for central enrichment of known motifs
centrimo -oc centrimo_output peaks.fa motif_database.meme

TOMTOM (Motif Comparison)

# Compare discovered motifs to database
tomtom -oc tomtom_output discovered.meme database.meme

FIMO (Motif Scanning)

# Scan sequences for motif matches
fimo --oc fimo_output motif.meme sequences.fa

# Scan genome
fimo --oc fimo_output --max-stored-scores 1000000 motif.meme genome.fa
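
FIMO writes its matches to a tab-separated file in the output directory. The sketch below filters such a file by q-value. The column names are assumed to match the `fimo.tsv` header written by recent MEME Suite releases, so check the header of your own file; the function name and the two example rows are invented for illustration.

```python
import csv
import io


def read_fimo_hits(handle, max_q=0.05):
    """Return FIMO hits with q-value <= max_q from an open fimo.tsv handle."""
    # Skip blank lines and the trailing '#' comment lines
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    hits = []
    for row in csv.DictReader(rows, delimiter="\t"):
        if row["q-value"] and float(row["q-value"]) <= max_q:
            hits.append((row["sequence_name"], int(row["start"]),
                         int(row["stop"]), row["strand"], float(row["score"])))
    return hits


# Invented example content in the assumed fimo.tsv layout
example = io.StringIO(
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore"
    "\tp-value\tq-value\tmatched_sequence\n"
    "motif1\tTF1\tchr1:100-300\t40\t49\t+\t15.2\t1.1e-06\t0.004\tAGCACGTGGT\n"
    "motif1\tTF1\tchr1:500-700\t12\t21\t-\t9.8\t8.0e-05\t0.21\tTACACGTGCC\n"
    "\n# example trailing comment line\n"
)
print(read_fimo_hits(example))
```

For a real run, open `fimo_output/fimo.tsv` and pass the file handle instead of the in-memory example.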

Motif Databases

HOMER Built-in

# List available motif sets
ls /path/to/homer/data/knownTFs/

# Vertebrate, known motifs (default)
findMotifsGenome.pl peaks.bed hg38 output/ -mknown /path/to/homer/data/knownTFs/vertebrates/known.motifs

JASPAR

# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2024/CORE/JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt

# Use with MEME suite
meme-chip -db JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt peaks.fa

HOCOMOCO

# Download HOCOMOCO
wget https://hocomoco11.autosome.org/final_bundle/hocomoco11/core/HUMAN/mono/HOCOMOCOv11_core_HUMAN_mono_meme_format.meme

# Use with MEME suite
tomtom discovered.meme HOCOMOCOv11_core_HUMAN_mono_meme_format.meme

Python: Parse HOMER Results

import pandas as pd

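def parse_homer_known(results_file):
    '''Parse HOMER knownResults.txt into a DataFrame, most significant first.'''
    df = pd.read_csv(results_file, sep='\t')
    # Replace HOMER's long headers (9 columns) with short names
    df.columns = ['Motif', 'Consensus', 'P-value', 'Log P-value',
                  'q-value', 'Targets', 'Target%', 'Background', 'Background%']
    df['P-value'] = df['P-value'].astype(float)
    # Percentages arrive as strings such as '25.00%'; convert them to floats
    for col in ('Target%', 'Background%'):
        df[col] = df[col].astype(str).str.rstrip('%').astype(float)
    # Sort on the log p-value: very small p-values can underflow to 0.0 and tie
    return df.sort_values('Log P-value')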

known = parse_homer_known('output_dir/knownResults.txt')
print(known[['Motif', 'P-value', 'Target%']].head(20))

Python: Parse MEME Results

from Bio import motifs

def parse_meme_file(meme_file):
    '''Parse MEME XML output (meme.xml) with Biopython.'''
    # Recent Biopython releases read MEME's XML output with the 'meme' format
    with open(meme_file) as f:
        record = motifs.parse(f, 'meme')
    return record

record = parse_meme_file('meme_output/meme.xml')
for m in record:
    print(f'{m.name}: {m.consensus}')
    print(m.counts)

Complete Workflows

ChIP-seq Motif Analysis

#!/bin/bash
set -euo pipefail

PEAKS=$1  # narrowPeak or BED file
GENOME=$2  # hg38, mm10, etc.
OUTDIR=$3

mkdir -p "$OUTDIR"

# HOMER analysis
echo "Running HOMER..."
findMotifsGenome.pl "$PEAKS" "$GENOME" "${OUTDIR}/homer" \
    -size 200 -p 8 -mask

# Extract 200 bp windows around peak midpoints for MEME
# (expects ${GENOME}.fa and ${GENOME}.chrom.sizes in the working directory)
echo "Extracting sequences..."
awk 'BEGIN{OFS="\t"} {center=int(($2+$3)/2); start=center-100; if (start<0) start=0; print $1,start,center+100}' "$PEAKS" | \
    bedtools slop -i - -g "${GENOME}.chrom.sizes" -b 0 | \
    bedtools getfasta -fi "${GENOME}.fa" -bed - -fo "${OUTDIR}/peaks.fa"

# MEME-ChIP analysis
echo "Running MEME-ChIP..."
meme-chip -oc "${OUTDIR}/meme_chip" \
    -db /path/to/JASPAR.meme \
    "${OUTDIR}/peaks.fa"

echo "Done. Results in ${OUTDIR}/"

ATAC-seq Footprint Motifs

# Analyze motifs in footprint regions
findMotifsGenome.pl footprints.bed hg38 footprint_motifs/ \
    -size given -mask -p 8

# Compare to accessible regions background
# (separate output directory so the first run is not overwritten)
findMotifsGenome.pl footprints.bed hg38 footprint_motifs_bg/ \
    -size given -bg accessible_peaks.bed -mask -p 8

Visualization

HOMER Logo

# Generate sequence logo
motif2Logo.pl motif.motif > logo.eps

Plot with Python

import logomaker
import pandas as pd
import matplotlib.pyplot as plt

def plot_motif(pwm_file):
    '''Plot sequence logo from HOMER PWM.'''
    pwm = pd.read_csv(pwm_file, sep='\t', skiprows=1, header=None)
    pwm.columns = ['A', 'C', 'G', 'T']
    logo = logomaker.Logo(pwm, shade_below=0.5, fade_below=0.5)
    plt.show()
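
Sequence logos are usually drawn in bits of information rather than raw probabilities, so conserved positions stand tall and uninformative positions shrink. A minimal sketch of that conversion for a 4-letter alphabet follows, without any small-sample correction. The function name is hypothetical; the resulting rows could be placed in a DataFrame with A, C, G, T columns before plotting.

```python
import math


def information_content(prob_rows):
    """Convert rows of A/C/G/T probabilities into per-base heights in bits."""
    heights = []
    for row in prob_rows:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        ic = 2.0 - entropy  # maximum is 2 bits for a 4-letter alphabet
        heights.append([p * ic for p in row])
    return heights


rows = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved position
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
]
for heights in information_content(rows):
    print(heights)
```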

Quality Metrics

| Metric | Good | Concerning |
|--------|------|------------|
| P-value | < 1e-10 | > 1e-5 |
| Target % | > 20% | < 5% |
| Background % | < Target/2 | Similar to Target |
| Bit score | > 10 | < 5 |
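
The rule-of-thumb thresholds above can be applied programmatically when triaging many motifs. The sketch below grades one motif from its p-value and target/background percentages. The function name is hypothetical, and reading "similar to Target" as "background at least 80% of target" is an assumption chosen for illustration.

```python
def grade_motif(p_value, target_pct, background_pct):
    """Grade a motif against the rule-of-thumb thresholds in the table."""
    flags = []
    if p_value > 1e-5:
        flags.append("weak p-value")
    if target_pct < 5:
        flags.append("low target %")
    if background_pct >= 0.8 * target_pct:  # assumption: "similar" = within 20%
        flags.append("background close to target")
    if flags:
        return "concerning: " + ", ".join(flags)
    if p_value < 1e-10 and target_pct > 20 and background_pct < target_pct / 2:
        return "good"
    return "borderline"


print(grade_motif(1e-50, 35.0, 8.0))
print(grade_motif(1e-8, 15.0, 5.0))
print(grade_motif(1e-3, 3.0, 2.9))
```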

Common Issues

No Significant Motifs

  • Check peak quality (too few peaks?)
  • Try different peak sizes (-size)
  • Ensure genome build matches
  • Check for repeat masking issues

Too Many Motifs

  • Apply a more stringent significance threshold
  • Use -S to limit the number of motifs
  • Filter by target percentage

Wrong Background

  • Use matched GC content background
  • Consider using input/control peaks
  • Try shuffled sequences
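
For the shuffled-sequence option, one simple approach is to shuffle the bases within each peak sequence so that length and base composition are preserved while any motif is destroyed. The sketch below does a mononucleotide shuffle, which does not preserve dinucleotide frequencies; the function name and the fixed seed are assumptions.

```python
import random


def shuffle_sequences(sequences, seed=0):
    """Return each sequence with its bases shuffled (composition preserved)."""
    rng = random.Random(seed)  # fixed seed keeps the background reproducible
    shuffled = []
    for seq in sequences:
        bases = list(seq)
        rng.shuffle(bases)
        shuffled.append("".join(bases))
    return shuffled


originals = ["CACGTGAATT", "GGGCCCATAT"]
background = shuffle_sequences(originals)
for before, after in zip(originals, background):
    print(before, "->", after)
```

The shuffled sequences can be written to a FASTA file and supplied as the background set to a discriminative tool.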

Related Skills

  • peak-calling - Generate input peaks
  • peak-annotation - Annotate peaks with genes
  • atac-seq/footprinting - TF footprint analysis
  • genome-intervals - BED file operations