install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Epigenomics/chip-seq/motif-analysis" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-motif-analysis && rm -rf "$T"
manifest:
Skills/Epigenomics/chip-seq/motif-analysis/SKILL.md
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-chipseq-motif-analysis
description: De novo motif discovery and known motif enrichment analysis using HOMER and MEME-ChIP. Identify transcription factor binding motifs in ChIP-seq, ATAC-seq, or other genomic peak data. Use when finding enriched DNA motifs in peak sequences.
tool_type: cli
primary_tool: HOMER
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
Motif Analysis
Identify DNA sequence motifs enriched in ChIP-seq or ATAC-seq peaks to discover transcription factor binding sites.
Tool Comparison
| Tool | Strengths | Use Case |
|---|---|---|
| HOMER | Fast, comprehensive, built-in databases | General motif analysis |
| MEME-ChIP | Multiple algorithms, web interface | Publication-quality |
| MEME | De novo discovery only | Simple discovery |
| FIMO | Known motif scanning | Genome-wide scanning |
HOMER
Installation
```bash
conda install -c bioconda homer

# Configure genome (required once)
perl /path/to/homer/configureHomer.pl -install hg38
perl /path/to/homer/configureHomer.pl -install mm10
```
De Novo Motif Discovery
```bash
# Basic motif finding
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200

# With background regions
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -bg background.bed

# Specify motif lengths to search
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -len 8,10,12
```
Key Options
| Option | Description |
|---|---|
| `-size <#>` | Fragment size for analysis (default 200) |
| `-size given` | Use actual peak sizes |
| `-bg <file>` | Background regions (BED) |
| `-len <#,#,...>` | Motif lengths to search |
| `-mask` | Mask repeats |
| `-p <#>` | Number of CPUs |
| `-S <#>` | Number of motifs to find (default 25) |
| `-mis <#>` | Mismatches allowed (default 2) |
| `-noweight` | Don't adjust for GC content |
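When the same options are reused across many peak sets, it can help to assemble the command in one place. The following is a minimal sketch, not part of HOMER: `build_homer_command` is a hypothetical helper that only uses flags listed in the table above, and it builds the argument list without running anything.

```python
import shlex


def build_homer_command(peaks, genome, outdir, size=200, lengths=(8, 10, 12),
                        background=None, mask=False, cpus=1):
    """Assemble a findMotifsGenome.pl argument list (hypothetical helper)."""
    cmd = [
        "findMotifsGenome.pl", peaks, genome, outdir,
        "-size", str(size),                          # an integer or "given"
        "-len", ",".join(str(n) for n in lengths),   # motif lengths to search
        "-p", str(cpus),                             # number of CPUs
    ]
    if background is not None:
        cmd += ["-bg", background]                   # custom background BED
    if mask:
        cmd.append("-mask")                          # mask repeats
    return cmd


cmd = build_homer_command("peaks.bed", "hg38", "output_dir/", mask=True, cpus=8)
print(shlex.join(cmd))  # shell-quoted string, handy for logging
```

The list form can be passed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting problems with unusual file names.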
Output Files
```
output_dir/
├── homerResults.html        # Main results page
├── knownResults.html        # Known motif enrichment
├── homerMotifs.all.motifs   # All discovered motifs
├── knownResults.txt         # Known motif statistics
└── motif1.motif             # Individual motif files
```
Known Motif Enrichment Only
```bash
# Skip de novo, only check known motifs
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -nomotif
```
Scan for Specific Motifs
```bash
# Find instances of motif in peaks
annotatePeaks.pl peaks.bed hg38 -m motif.motif > annotated.txt

# Scan genome for motif occurrences (-bed writes BED instead of HOMER's peak format)
scanMotifGenomeWide.pl motif.motif hg38 -bed > motif_sites.bed
```
Motif Comparison
```bash
# Compare discovered motifs to HOMER's default known-motif library
# (pass -known <motif file> to compare against a different library)
compareMotifs.pl motifs.motif output_dir/
```
Create Custom Motif
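```bash
# From consensus sequence: <consensus> <mismatches allowed> <name>
# (0 mismatches here; a high mismatch count on a 6-mer matches almost any sequence)
seq2profile.pl CACGTG 0 MYC > MYC.motif

# From aligned sequences
cat aligned_seqs.txt | alignAndConvert.pl - > custom.motif
```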
MEME Suite
Installation
```bash
conda install -c bioconda meme
```
Extract Sequences from Peaks
```bash
# Get FASTA sequences under peaks
bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa

# Extend each peak by 100 bp on both sides (this pads; it does not re-center)
bedtools slop -i peaks.bed -g genome.sizes -b 100 | \
    bedtools getfasta -fi genome.fa -bed - -fo peaks_extended.fa
```
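To get fixed-width windows around each peak midpoint before running `bedtools getfasta`, something like the following could be used. It is a sketch under assumptions: the input is tab-separated BED with at least three columns, the helper names are hypothetical, and windows are clamped at position 0 but not at chromosome ends.

```python
def center_interval(chrom, start, end, half_width=100):
    """Return a window of 2 * half_width bp around the interval midpoint."""
    center = (start + end) // 2
    # Clamp the start at 0; chromosome ends are NOT checked here
    return chrom, max(0, center - half_width), center + half_width


def center_bed_lines(lines, half_width=100):
    """Yield centered 3-column BED lines, skipping blank and header lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        yield "\t".join(map(str, center_interval(chrom, start, end, half_width)))


# Made-up intervals for illustration only
example = ["chr1\t1000\t1500\tpeak1\n", "chr2\t20\t80\tpeak2\n"]
centered = list(center_bed_lines(example))
print("\n".join(centered))
```

For narrowPeak input, the summit offset column would usually be a better center than the midpoint; this sketch does not handle that case.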
MEME (De Novo Discovery)
```bash
# Basic de novo discovery
meme peaks.fa -dna -oc meme_output -mod zoops -nmotifs 10 -minw 6 -maxw 20

# With Markov background
fasta-get-markov peaks.fa > background.model
meme peaks.fa -dna -oc meme_output -bfile background.model -mod zoops -nmotifs 10
```
MEME Options
| Option | Description |
|---|---|
| `-mod zoops` | Zero or one per sequence (default for ChIP) |
| `-mod oops` | Exactly one per sequence |
| `-mod anr` | Any number of repeats |
| `-nmotifs <n>` | Number of motifs to find |
| `-minw <n>` | Minimum motif width |
| `-maxw <n>` | Maximum motif width |
| `-revcomp` | Search both strands |
| `-bfile <file>` | Background model file |
MEME-ChIP (Comprehensive Pipeline)
```bash
# All-in-one ChIP-seq motif analysis
meme-chip -oc meme_chip_output -db motif_database.meme peaks.fa
```
MEME-ChIP runs:
- MEME - De novo discovery (run on the central region of each sequence)
- DREME - Short motif discovery (replaced by STREME in MEME Suite 5.3.0 and later)
- CentriMo - Central enrichment analysis
- TOMTOM - Compare to known motifs
- FIMO - Find motif instances
DREME (Short Motifs)
```bash
# Find short enriched motifs
dreme -oc dreme_output -p peaks.fa -n background.fa
```
CentriMo (Central Enrichment)
```bash
# Test for central enrichment of known motifs
centrimo -oc centrimo_output peaks.fa motif_database.meme
```
TOMTOM (Motif Comparison)
```bash
# Compare discovered motifs to database
tomtom -oc tomtom_output discovered.meme database.meme
```
FIMO (Motif Scanning)
```bash
# Scan sequences for motif matches
fimo --oc fimo_output motif.meme sequences.fa

# Scan genome
fimo --oc fimo_output --max-stored-scores 1000000 motif.meme genome.fa
```
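FIMO writes its hits to a tab-separated `fimo.tsv` in the output directory. The sketch below filters such a table by q-value using only the standard library. It assumes the column names of recent MEME Suite releases (`motif_id`, `sequence_name`, `start`, `stop`, `strand`, `score`, `q-value`) and that q-values were computed; `read_fimo_tsv` is a hypothetical helper and the rows in the example are made up.

```python
import csv
import io


def read_fimo_tsv(handle, max_q=0.05):
    """Yield FIMO hits with q-value <= max_q from a fimo.tsv-style handle."""
    # Drop blank lines and the '#' comment lines that follow the table
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        if float(row["q-value"]) <= max_q:
            yield {
                "motif": row["motif_id"],
                "sequence": row["sequence_name"],
                "start": int(row["start"]),
                "stop": int(row["stop"]),
                "strand": row["strand"],
                "score": float(row["score"]),
            }


# Illustrative table, not real FIMO output
example = io.StringIO(
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "M1\tTF_A\tchr1:100-300\t12\t17\t+\t11.2\t1e-05\t0.01\tCACGTG\n"
    "M1\tTF_A\tchr2:500-700\t40\t45\t-\t6.3\t4e-04\t0.20\tCACGTT\n"
    "\n# trailing comment line\n"
)
hits = list(read_fimo_tsv(example))
print(hits)
```

With a real run, the handle would come from `open('fimo_output/fimo.tsv')`.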
Motif Databases
HOMER Built-in
```bash
# List available motif sets
ls /path/to/homer/data/knownTFs/

# Vertebrate, known motifs (default); -mknown takes the path to a motif file
findMotifsGenome.pl peaks.bed hg38 output/ -mknown /path/to/homer/data/knownTFs/vertebrates/known.motifs
```
JASPAR
```bash
# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2024/CORE/JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt

# Use with MEME suite
meme-chip -db JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt peaks.fa
```
HOCOMOCO
```bash
# Download HOCOMOCO
wget https://hocomoco11.autosome.org/final_bundle/hocomoco11/core/HUMAN/mono/HOCOMOCOv11_core_HUMAN_mono_meme_format.meme

# Use with MEME suite
tomtom discovered.meme HOCOMOCOv11_core_HUMAN_mono_meme_format.meme
```
Python: Parse HOMER Results
```python
import pandas as pd

def parse_homer_known(results_file):
    '''Parse HOMER knownResults.txt into a DataFrame sorted by significance.'''
    df = pd.read_csv(results_file, sep='\t')
    # Replace HOMER's verbose headers with short column names
    df.columns = ['Motif', 'Consensus', 'P-value', 'Log P-value', 'q-value',
                  'Targets', 'Target%', 'Background', 'Background%']
    df['P-value'] = df['P-value'].astype(float)
    # Sort on the log p-value: very small p-values underflow to 0.0 as floats
    return df.sort_values('Log P-value')

known = parse_homer_known('output_dir/knownResults.txt')
print(known[['Motif', 'P-value', 'Target%']].head(20))
```
Python: Parse MEME Results
```python
from Bio import motifs

def parse_meme_file(meme_file):
    '''Parse MEME output file.'''
    with open(meme_file) as f:
        record = motifs.parse(f, 'meme')
    return record

record = parse_meme_file('meme_output/meme.txt')
for m in record:
    print(f'{m.name}: {m.consensus}')
    print(m.counts)
```
Complete Workflows
ChIP-seq Motif Analysis
```bash
#!/bin/bash
set -euo pipefail

PEAKS=$1    # narrowPeak or BED file
GENOME=$2   # hg38, mm10, etc.; expects ${GENOME}.fa and ${GENOME}.chrom.sizes in the working directory
OUTDIR=$3

mkdir -p "$OUTDIR"

# HOMER analysis
echo "Running HOMER..."
findMotifsGenome.pl "$PEAKS" "$GENOME" "${OUTDIR}/homer" \
    -size 200 -p 8 -mask

# Extract 200 bp windows centered on each peak midpoint for MEME
# (start is clamped at 0; slop -b 0 clips windows that run past chromosome ends)
echo "Extracting sequences..."
awk 'BEGIN{OFS="\t"} {center=int(($2+$3)/2); start=center-100; if (start<0) start=0; print $1,start,center+100}' "$PEAKS" | \
    bedtools slop -i - -g "${GENOME}.chrom.sizes" -b 0 | \
    bedtools getfasta -fi "${GENOME}.fa" -bed - -fo "${OUTDIR}/peaks.fa"

# MEME-ChIP analysis
echo "Running MEME-ChIP..."
meme-chip -oc "${OUTDIR}/meme_chip" \
    -db /path/to/JASPAR.meme \
    "${OUTDIR}/peaks.fa"

echo "Done. Results in ${OUTDIR}/"
```
ATAC-seq Footprint Motifs
```bash
# Analyze motifs in footprint regions
findMotifsGenome.pl footprints.bed hg38 footprint_motifs/ \
    -size given -mask -p 8

# Compare to accessible regions background
# (separate output directory so the first run is not overwritten)
findMotifsGenome.pl footprints.bed hg38 footprint_motifs_vs_accessible/ \
    -size given -bg accessible_peaks.bed -mask -p 8
```
Visualization
HOMER Logo
```bash
# Generate sequence logo
motif2Logo.pl motif.motif > logo.eps
```
Plot with Python
```python
import logomaker
import pandas as pd
import matplotlib.pyplot as plt

def plot_motif(pwm_file):
    '''Plot sequence logo from HOMER PWM.'''
    # Skip the '>' header line; remaining rows are per-position A/C/G/T values
    pwm = pd.read_csv(pwm_file, sep='\t', skiprows=1, header=None)
    pwm.columns = ['A', 'C', 'G', 'T']
    logo = logomaker.Logo(pwm, shade_below=0.5, fade_below=0.5)
    plt.show()
```
Quality Metrics
| Metric | Good | Concerning |
|---|---|---|
| P-value | < 1e-10 | > 1e-5 |
| Target % | > 20% | < 5% |
| Background % | < Target/2 | Similar to Target |
| Bit score | > 10 | < 5 |
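The thresholds in this table can be applied programmatically when triaging many motifs. The sketch below is one possible reading of the table, not an established scoring scheme: `rate_motif` is a hypothetical helper, values between the "Good" and "Concerning" cutoffs are labeled "borderline", and "similar to Target" is assumed to mean a background percentage of at least 80% of the target percentage (`similar_ratio`). The bit score row is left out because it is not a column in `knownResults.txt`.

```python
def rate_motif(p_value, target_pct, background_pct, similar_ratio=0.8):
    """Rate one motif against the Quality Metrics thresholds (illustrative)."""
    ratings = {}

    # P-value: good below 1e-10, concerning above 1e-5
    ratings["p_value"] = ("good" if p_value < 1e-10
                          else "concerning" if p_value > 1e-5
                          else "borderline")

    # Target %: good above 20, concerning below 5
    ratings["target_pct"] = ("good" if target_pct > 20
                             else "concerning" if target_pct < 5
                             else "borderline")

    # Background %: good when under half the target percentage;
    # "similar to target" is an assumed cutoff (similar_ratio)
    if background_pct < target_pct / 2:
        ratings["background_pct"] = "good"
    elif background_pct >= similar_ratio * target_pct:
        ratings["background_pct"] = "concerning"
    else:
        ratings["background_pct"] = "borderline"

    return ratings


# Made-up numbers for illustration
print(rate_motif(1e-50, 35.0, 6.0))
print(rate_motif(1e-4, 3.0, 2.9))
```

Percentages from HOMER arrive as strings such as `25.00%`, so they would need converting with something like `float(s.rstrip('%'))` first.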
Common Issues
No Significant Motifs
- Check peak quality (too few peaks?)
- Try different peak sizes (`-size`)
- Ensure genome build matches
- Check for repeat masking issues
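A quick look at peak count and width distribution often explains a run with no significant motifs. This is a minimal sketch assuming tab-separated BED input; `summarize_peaks` is a hypothetical helper and the example intervals are made up.

```python
import statistics


def summarize_peaks(bed_lines):
    """Return the peak count and basic width statistics for BED lines."""
    widths = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        widths.append(int(fields[2]) - int(fields[1]))
    if not widths:
        return {"n_peaks": 0, "min_width": None, "median_width": None, "max_width": None}
    return {
        "n_peaks": len(widths),
        "min_width": min(widths),
        "median_width": statistics.median(widths),
        "max_width": max(widths),
    }


example = ["chr1\t1000\t1500\n", "chr1\t2000\t2060\n", "chr2\t300\t500\n"]
summary = summarize_peaks(example)
print(summary)
```

A very small `n_peaks`, or widths far from the `-size` value used, would be worth checking before changing other settings.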
Too Many Motifs
- Increase significance threshold
- Use `-S` to limit number of motifs
- Filter by target percentage
Wrong Background
- Use matched GC content background
- Consider using input/control peaks
- Try shuffled sequences
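For the shuffled-sequence option, a background set can be generated by permuting the bases of each peak sequence. The sketch below is a simple mononucleotide shuffle with hypothetical helper names: it preserves each sequence's length and base composition (and therefore GC content) but not dinucleotide frequencies, so it is a rough control rather than a matched background.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of the bases in seq."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def shuffled_records(records, seed=0):
    """Yield (name, shuffled sequence) pairs; seeded for reproducibility."""
    rng = random.Random(seed)
    for name, seq in records:
        yield f"{name}_shuffled", shuffle_sequence(seq, rng)


# Made-up sequences for illustration
peaks = [("peak1", "ACGTGCACGTGGGCCA"), ("peak2", "TTTTACACGTGAAAAC")]
background = list(shuffled_records(peaks))
for name, seq in background:
    print(f">{name}\n{seq}")
```

The printed records are in FASTA layout, so the output could be redirected to a file and passed as a negative set, for example to the `-n` option of `dreme`.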
Related Skills
- peak-calling - Generate input peaks
- peak-annotation - Annotate peaks with genes
- atac-seq/footprinting - TF footprint analysis
- genome-intervals - BED file operations