SciAgent-Skills remap-database

Query the ReMap 2022 TF ChIP-seq binding peak database via REST API and BED file downloads. Retrieve all TF binding peaks overlapping a genomic region (chr:start-end), find TF peaks near a gene by name, list TFs available for a species, filter peaks by regulatory biotype (promoter, enhancer), and download peak BED files for a TF-cell type pair. Use for TF co-occupancy analysis, regulatory region annotation, and building TF binding atlases. For JASPAR motif matrices use jaspar-database; for ENCODE regulatory tracks use encode-database.

install

source · Clone the upstream repo

git clone https://github.com/jaechang-hits/SciAgent-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/remap-database" ~/.claude/skills/jaechang-hits-sciagent-skills-remap-database && rm -rf "$T"

manifest: skills/genomics-bioinformatics/remap-database/SKILL.md

source content

ReMap Database

Overview

ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at

https://remap2022.univ-amu.fr/api/

and bulk BED file downloads; no authentication required.

When to Use

Finding all TFs with ChIP-seq peaks overlapping a genomic region of interest (e.g., a GWAS SNP locus or candidate enhancer)
Retrieving TF peaks near a gene's transcription start site to map its proximal regulatory landscape
Listing all TFs available in ReMap for human or mouse with their peak and dataset counts
Filtering ChIP-seq peaks by regulatory biotype annotation (promoter, enhancer, exon, intron, intergenic) for a TF in a specific cell line
Downloading a BED file of all binding peaks for a TF across all cell types for offline analysis
Identifying co-binding TFs at a locus by querying all overlapping peaks and grouping by TF name
Use
```
jaspar-database
```
instead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locations
For ENCODE-specific regulatory tracks and accessibility data use
```
encode-database
```
; ReMap aggregates TF binding peaks from many sources including ENCODE

Prerequisites

Python packages:
```
requests
```
,
```
pandas
```
,
```
matplotlib
```
Data requirements: genomic coordinates (GRCh38/hg38 or hg19), gene names, or TF names
Environment: internet connection; no API key required
Rate limits: no official published limits; use
```
time.sleep(0.5)
```
between batch requests to avoid server overload
Note: The ReMap API is a research API; endpoint availability may vary. All examples include a BED download fallback.

pip install requests pandas matplotlib

Quick Start

import requests

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

# Query TF peaks overlapping a genomic region
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
    "chr": "chr17",
    "start": 7_670_000,
    "end": 7_690_000,
    "assembly": "hg38"
}, timeout=30)
r.raise_for_status()
peaks = r.json()
print(f"Peaks overlapping TP53 locus: {len(peaks)}")
tfs = set(p.get("name", "").split(":")[0] for p in peaks)
print(f"Unique TFs: {len(tfs)}")
print(f"TF names (first 10): {sorted(tfs)[:10]}")

Core API

Query 1: Region Overlap

Find all TF ChIP-seq peaks overlapping a specified genomic window. Returns peak records including TF name, cell type, coordinates, and score.

import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    """Return all ReMap peaks overlapping [chrom:start-end]."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Query 100 kb window on chr17 around TP53
peaks = query_region("chr17", 7_670_000, 7_690_000, assembly="hg38")
print(f"Total peaks: {len(peaks)}")

# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    tf   = parts[0] if len(parts) > 0 else ""
    exp  = parts[1] if len(parts) > 1 else ""
    cell = parts[2] if len(parts) > 2 else ""
    rows.append({
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "tf_name": tf,
        "experiment_id": exp,
        "cell_type": cell,
        "score": p.get("score", 0),
    })

df = pd.DataFrame(rows)
print(f"\nUnique TFs: {df['tf_name'].nunique()}")
print(f"Top TFs by peak count:\n{df['tf_name'].value_counts().head(10).to_string()}")

# Fallback: if API is unavailable, use a locally downloaded BED file
# Download from: https://remap2022.univ-amu.fr/download_page
# e.g., remap2022_all_macs2_hg38_v1_0.bed.gz

import pandas as pd

def query_region_from_bed(bed_file, chrom, start, end):
    """Filter a ReMap BED file for overlapping peaks."""
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]
    df = pd.read_csv(bed_file, sep="\t", header=None, names=cols,
                     compression="infer")
    mask = (df["chr"] == chrom) & (df["end"] > start) & (df["start"] < end)
    return df[mask].reset_index(drop=True)

# Usage (requires downloaded BED):
# df = query_region_from_bed("remap2022_all_macs2_hg38_v1_0.bed.gz",
#                             "chr17", 7_670_000, 7_690_000)

Query 2: Gene-Centric Query

Retrieve all TF ChIP-seq peaks near a gene's TSS, providing a promoter-proximal regulatory landscape for the gene.

import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    """Return all ReMap peaks near a gene TSS."""
    r = requests.get(f"{REMAP_API}/peaks/gene/", params={
        "gene": gene_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_gene_peaks("MYC", assembly="hg38")
print(f"Peaks near MYC TSS: {len(peaks)}")

rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"\nTFs near MYC TSS ({df['tf_name'].nunique()} unique):")
print(df["tf_name"].value_counts().head(10).to_string())
print(f"\nCell types represented: {df['cell_type'].nunique()}")

Query 3: TF Browser

List all TFs available in ReMap for a given genome assembly, with peak and experiment counts.

import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def list_tfs(assembly="hg38", timeout=30):
    """Return all TFs in ReMap for the given assembly with statistics."""
    r = requests.get(f"{REMAP_API}/tfbs/list/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

def get_database_stats(assembly="hg38", timeout=30):
    """Return overall database statistics for the assembly."""
    r = requests.get(f"{REMAP_API}/stats/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Database overview
try:
    stats = get_database_stats("hg38")
    print(f"ReMap 2022 hg38 statistics:")
    for k, v in stats.items():
        print(f"  {k}: {v}")
except Exception as e:
    print(f"Stats endpoint unavailable: {e}")
    print("ReMap 2022 hg38: 165M peaks, 1,210 TFs, 8,113 datasets (from publication)")

# TF list
try:
    tfs = list_tfs("hg38")
    df_tfs = pd.DataFrame(tfs)
    print(f"\nTFs available (hg38): {len(df_tfs)}")
    if "peak_count" in df_tfs.columns:
        top = df_tfs.nlargest(10, "peak_count")[["name", "peak_count", "dataset_count"]]
        print("Top 10 TFs by peak count:")
        print(top.to_string(index=False))
except Exception as e:
    print(f"TF list endpoint unavailable: {e}")
    print("Use TF name queries directly (Query 4) or download TF-specific BED files.")

Query 4: TF-Specific Peak Query

Retrieve all peaks for a named TF in a given assembly, optionally filtered by cell type.

import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
    """Return all ChIP-seq peaks for a TF across all cell types."""
    r = requests.get(f"{REMAP_API}/tfbs/name/", params={
        "name": tf_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_tf_peaks("CTCF", assembly="hg38")
print(f"CTCF peaks (all cell types): {len(peaks)}")

# Parse and summarize
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr":   p.get("chr",   p.get("chrom", "")),
        "start": p.get("start", 0),
        "end":   p.get("end",   0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"Cell types: {df['cell_type'].nunique()}")
print(f"Chromosomes: {df['chr'].nunique()}")
print(f"Peak width stats (bp):")
df["width"] = df["end"] - df["start"]
print(f"  Median: {df['width'].median():.0f}  Mean: {df['width'].mean():.0f}  "
      f"Min: {df['width'].min()}  Max: {df['width'].max()}")

Query 5: Biotype Filter and Regulatory Annotation

Filter peaks by regulatory biotype annotation to identify binding at promoters, enhancers, or intergenic regions.

import requests, pandas as pd, matplotlib.pyplot as plt

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def get_biotypes(assembly="hg38", timeout=30):
    """List all regulatory biotype categories available."""
    r = requests.get(f"{REMAP_API}/biotypes/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

def query_tf_by_biotype(tf_name, biotype, assembly="hg38", timeout=30):
    """Retrieve TF peaks filtered by regulatory biotype."""
    r = requests.get(f"{REMAP_API}/peaks/biotype/", params={
        "name": tf_name, "biotype": biotype, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# List available biotypes
try:
    biotypes = get_biotypes("hg38")
    print(f"Available biotypes: {biotypes}")
except Exception:
    biotypes = ["promoter", "enhancer", "exon", "intron", "intergenic", "UTR"]
    print(f"Using known biotypes: {biotypes}")

# Query CTCF peaks and plot biotype distribution
peaks = query_tf_peaks("CTCF", assembly="hg38")  # from Query 4 function above

def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
    r = requests.get(f"https://remap2022.univ-amu.fr/api/v1/tfbs/name/",
                     params={"name": tf_name, "assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_tf_peaks("CTCF")
rows = [{"biotype": p.get("biotype", "unknown"),
         "cell_type": p.get("name", "::").split(":")[2] if len(p.get("name","").split(":")) > 2 else ""}
        for p in peaks]
df = pd.DataFrame(rows)

biotype_counts = df["biotype"].value_counts()
biotype_counts = biotype_counts[biotype_counts > 0]
print(f"\nCTCF peak biotype distribution:")
print(biotype_counts.to_string())

# Stacked bar chart across top 5 cell types
top_cells = df["cell_type"].value_counts().head(5).index.tolist()
pivot = (df[df["cell_type"].isin(top_cells)]
         .groupby(["cell_type", "biotype"])
         .size()
         .unstack(fill_value=0))

fig, ax = plt.subplots(figsize=(9, 5))
pivot.plot(kind="bar", stacked=True, ax=ax, colormap="tab10", edgecolor="white")
ax.set_xlabel("Cell Type")
ax.set_ylabel("Peak Count")
ax.set_title("CTCF ChIP-seq Peak Biotype Distribution by Cell Type (ReMap 2022, hg38)")
ax.legend(title="Biotype", bbox_to_anchor=(1.01, 1), loc="upper left", fontsize=8)
plt.tight_layout()
plt.savefig("CTCF_biotype_distribution.png", dpi=150, bbox_inches="tight")
print("Saved CTCF_biotype_distribution.png")

Key Concepts

Peak Name Field Format

The

name

field in every ReMap peak record encodes three pieces of information as a colon-separated string:

TF_NAME:EXPERIMENT_ID:CELL_TYPE

For example:

CTCF:GSE30263.SRX028592:GM12878

Always parse with

.split(":")

and guard against missing parts. Some records may have fewer than three components if metadata is incomplete.

Assemblies

Assembly code	Organism	Notes
`hg38`	Homo sapiens (GRCh38)	Primary human assembly in ReMap 2022
`hg19`	Homo sapiens (GRCh37)	Legacy human assembly; fewer datasets
`mm10`	Mus musculus	Primary mouse assembly
`dm6`	Drosophila melanogaster	Smaller dataset collection
`tair10`	Arabidopsis thaliana	Plant TF dataset

BED File Download (API Fallback)

When the REST API is unavailable or for offline bulk analysis, ReMap provides pre-built BED files at

https://remap2022.univ-amu.fr/download_page

. Key files:

```
remap2022_all_macs2_hg38_v1_0.bed.gz
```
— all peaks, hg38 (large, ~5 GB)
```
remap2022_{TF}_macs2_hg38_v1_0.bed.gz
```
— per-TF peak files
```
remap2022_crm_macs2_hg38_v1_0.bed.gz
```
— cis-regulatory modules (merged peaks)

import pandas as pd

def load_remap_bed(bed_path, chrom=None, start=None, end=None):
    """
    Load a ReMap BED file with optional region filter.
    Columns: chr, start, end, name (TF:exp:cell), score, strand,
             thick_start, thick_end, itemRgb
    """
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "itemRgb"]
    df = pd.read_csv(bed_path, sep="\t", header=None, names=cols,
                     compression="infer", low_memory=False)
    if chrom:
        df = df[df["chr"] == chrom]
    if start is not None and end is not None:
        df = df[(df["end"] > start) & (df["start"] < end)]
    # Parse name field
    parts = df["name"].str.split(":", expand=True)
    df["tf_name"]       = parts[0]
    df["experiment_id"] = parts[1] if 1 in parts.columns else ""
    df["cell_type"]     = parts[2] if 2 in parts.columns else ""
    return df.reset_index(drop=True)

# Usage example (offline):
# df = load_remap_bed("remap2022_CTCF_macs2_hg38_v1_0.bed.gz",
#                     chrom="chr17", start=7_670_000, end=7_690_000)
# print(df.head())

Common Workflows

Workflow 1: TF Co-occupancy Analysis at a Locus

Goal: Identify all TFs with ChIP-seq evidence at a genomic locus and rank by peak count, then export a co-occupancy matrix.

import requests, time, pandas as pd, matplotlib.pyplot as plt

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

def parse_peaks(peaks):
    rows = []
    for p in peaks:
        parts = p.get("name", "::").split(":")
        rows.append({
            "tf_name":  parts[0] if len(parts) > 0 else "unknown",
            "cell_type": parts[2] if len(parts) > 2 else "unknown",
            "chr":   p.get("chr",   p.get("chrom", "")),
            "start": p.get("start", 0),
            "end":   p.get("end",   0),
            "score": p.get("score", 0),
        })
    return pd.DataFrame(rows)

# BRCA1 promoter region (GRCh38)
peaks = query_region("chr17", 43_044_000, 43_050_000, assembly="hg38")
df = parse_peaks(peaks)
print(f"Peaks at BRCA1 promoter: {len(df)}")

# TF occupancy summary
tf_summary = (df.groupby("tf_name")
                .agg(peak_count=("tf_name", "count"),
                     cell_types=("cell_type", "nunique"),
                     mean_score=("score", "mean"))
                .sort_values("peak_count", ascending=False))
print(f"\nTop TFs at BRCA1 promoter:")
print(tf_summary.head(15).to_string())
tf_summary.to_csv("BRCA1_promoter_TF_occupancy.csv")

# Horizontal bar chart
top = tf_summary.head(20)
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(top.index[::-1], top["peak_count"][::-1], color="#1f77b4", edgecolor="white")
ax.set_xlabel("Number of ChIP-seq Peaks")
ax.set_title("TF Co-occupancy at BRCA1 Promoter (ReMap 2022, hg38)")
plt.tight_layout()
plt.savefig("BRCA1_promoter_TF_cooccupancy.png", dpi=150, bbox_inches="tight")
print("Saved BRCA1_promoter_TF_cooccupancy.png")

Workflow 2: Gene Regulatory Profile — TSS-Proximal TF Binding Atlas

Goal: For a list of genes, retrieve their promoter-proximal TF binding profiles and compare the TF repertoires across genes.

import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    try:
        r = requests.get(f"{REMAP_API}/peaks/gene/", params={
            "gene": gene_name, "assembly": assembly
        }, timeout=timeout)
        r.raise_for_status()
        return r.json()
    except Exception as e:
        print(f"  Warning: {gene_name} failed — {e}")
        return []

genes_of_interest = ["MYC", "TP53", "BRCA1", "EGFR", "CDK4"]
gene_tf_profiles = {}

for gene in genes_of_interest:
    peaks = query_gene_peaks(gene, assembly="hg38")
    if peaks:
        tfs = set()
        for p in peaks:
            parts = p.get("name", "").split(":")
            if parts:
                tfs.add(parts[0])
        gene_tf_profiles[gene] = tfs
        print(f"{gene}: {len(peaks)} peaks, {len(tfs)} unique TFs")
    time.sleep(0.5)

# Build binary TF presence matrix
all_tfs = sorted(set().union(*gene_tf_profiles.values()))
matrix = pd.DataFrame(
    {gene: [1 if tf in gene_tf_profiles.get(gene, set()) else 0 for tf in all_tfs]
     for gene in genes_of_interest},
    index=all_tfs
)
print(f"\nTF × Gene matrix: {matrix.shape}")
print(f"TFs shared by all genes: {(matrix.sum(axis=1) == len(genes_of_interest)).sum()}")
matrix.to_csv("gene_TF_binding_atlas.csv")
print("Saved gene_TF_binding_atlas.csv")

Workflow 3: Download and Analyze TF Peak BED File

Goal: Download a TF-specific ReMap BED file and analyze its genomic distribution with pandas.

import requests, gzip, io, pandas as pd, time

# ReMap provides per-TF BED files. For large-scale offline analysis:
REMAP_DOWNLOAD_BASE = "https://remap2022.univ-amu.fr/storage/remap2022/hg38/MACS2"

def download_tf_bed(tf_name, assembly="hg38", save_path=None):
    """
    Attempt to download TF-specific BED file from ReMap.
    Falls back to API region query if download unavailable.
    """
    filename = f"remap2022_{tf_name}_macs2_{assembly}_v1_0.bed.gz"
    url = f"{REMAP_DOWNLOAD_BASE}/{filename}"
    print(f"Attempting download: {url}")
    r = requests.get(url, stream=True, timeout=60)
    if r.status_code == 200:
        if save_path:
            with open(save_path, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Saved: {save_path}")
            return save_path
        else:
            # Read directly into DataFrame
            content = b"".join(r.iter_content(chunk_size=8192))
            cols = ["chr", "start", "end", "name", "score", "strand",
                    "thick_start", "thick_end", "itemRgb"]
            with gzip.open(io.BytesIO(content), "rt") as gz:
                df = pd.read_csv(gz, sep="\t", header=None, names=cols)
            return df
    else:
        print(f"Download returned {r.status_code}; use API query as fallback")
        return None

# Analyze a downloaded BED file
def analyze_remap_bed(df):
    """Compute summary statistics for a ReMap peak DataFrame."""
    parts = df["name"].str.split(":", expand=True)
    df = df.copy()
    df["tf_name"]   = parts[0]
    df["cell_type"] = parts[2] if 2 in parts.columns else "unknown"
    df["width"] = df["end"] - df["start"]

    print(f"Total peaks: {len(df):,}")
    print(f"Unique TFs: {df['tf_name'].nunique()}")
    print(f"Unique cell types: {df['cell_type'].nunique()}")
    print(f"\nPeak width (bp): median={df['width'].median():.0f}  "
          f"mean={df['width'].mean():.0f}  range=[{df['width'].min()}, {df['width'].max()}]")
    print(f"\nChromosome distribution:")
    chr_counts = df["chr"].value_counts().head(5)
    print(chr_counts.to_string())
    return df

# Example usage (requires BED download or substitute with API results):
# df_raw = download_tf_bed("CTCF", save_path="CTCF_hg38.bed.gz")
# if df_raw is not None:
#     df_analyzed = analyze_remap_bed(df_raw)

Key Parameters

Parameter	Endpoint	Default	Range / Options	Effect
`chr`	`/peaks/overlap/`	—	`chr1` – `chrX` , `chrY` , `chrM`	Chromosome for region query (include `chr` prefix)
`start`	`/peaks/overlap/`	—	Integer genomic coordinate	Region start (0-based)
`end`	`/peaks/overlap/`	—	Integer genomic coordinate	Region end (exclusive)
`assembly`	All endpoints	—	`hg38` , `hg19` , `mm10` , `dm6` , `tair10`	Genome assembly for coordinates and peak lookup
`gene`	`/peaks/gene/`	—	HGNC gene symbol (e.g., `TP53` , `MYC` )	Queries peaks near the gene's annotated TSS
`name`	`/tfbs/name/`	—	TF name as in ReMap (e.g., `CTCF` , `SP1` )	TF name is case-sensitive; match ReMap TF naming
`biotype`	`/peaks/biotype/`	—	`promoter` , `enhancer` , `exon` , `intron` , `intergenic` , `UTR`	Filters peaks by Ensembl regulatory biotype
`timeout`	All requests	30	Integer seconds	Increase to 60–120 for large gene/TF queries

Best Practices

Parse the
```
name
```
field defensively: The
```
TF:experiment:cell_type
```
format may have fewer than three components for some records. Always guard with
```
parts[n] if len(parts) > n else ""
```
.
Use BED downloads for genome-wide analyses: Querying large genomic regions or all peaks for a TF via the REST API can time out. For whole-genome or per-chromosome scans, download the per-TF or per-assembly BED files from the ReMap download page and filter locally with pandas or bedtools.
Cross-reference with JASPAR for sequence evidence: ReMap peaks show where TF binding was detected by ChIP-seq (positional evidence); JASPAR PWMs show what sequence the TF prefers (motif evidence). For robust regulatory annotation, require both: a ReMap peak in the region AND a JASPAR motif hit within the peak.
Use
```
time.sleep(0.5)
```
in batch loops: The ReMap API serves a research community; polite request pacing prevents throttling.
Validate assembly coordinates: ReMap 2022 hg38 peaks use 0-based half-open BED coordinates (
```
[start, end)
```
). When comparing with VCF or 1-based GFF coordinates, add 1 to
```
start
```
.

Common Recipes

Recipe: Find TFs Binding at a GWAS SNP

When to use: Prioritize functional candidates from a GWAS hit by identifying which TFs bind at the SNP location.

import requests

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def tfs_at_snp(chrom, pos, window=500, assembly="hg38"):
    """Find TFs with ChIP-seq peaks overlapping a SNP position ± window bp."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": pos - window, "end": pos + window,
        "assembly": assembly
    }, timeout=30)
    r.raise_for_status()
    peaks = r.json()
    tfs = {}
    for p in peaks:
        parts = p.get("name", "::").split(":")
        tf = parts[0] if parts else "unknown"
        tfs[tf] = tfs.get(tf, 0) + 1
    return dict(sorted(tfs.items(), key=lambda x: -x[1]))

# Example: rs2736100 (TERT locus, chr5:1,286,401)
snp_tfs = tfs_at_snp("chr5", 1_286_401, window=500, assembly="hg38")
print(f"TFs at TERT GWAS SNP (±500 bp): {len(snp_tfs)}")
for tf, count in list(snp_tfs.items())[:10]:
    print(f"  {tf:<20s} {count:3d} peaks")

Recipe: Compare TF Binding Profiles of Two Genes

When to use: Check whether two co-regulated genes share the same upstream TF binding landscape.

import requests, time

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def get_gene_tfs(gene, assembly="hg38"):
    try:
        r = requests.get(f"{REMAP_API}/peaks/gene/", params={"gene": gene, "assembly": assembly}, timeout=30)
        r.raise_for_status()
        peaks = r.json()
        return set(p.get("name", "").split(":")[0] for p in peaks if p.get("name", ""))
    except Exception as e:
        print(f"Warning: {gene} → {e}")
        return set()

gene_a, gene_b = "MYC", "MYCN"
tfs_a = get_gene_tfs(gene_a)
time.sleep(0.5)
tfs_b = get_gene_tfs(gene_b)

shared = tfs_a & tfs_b
only_a = tfs_a - tfs_b
only_b = tfs_b - tfs_a

print(f"{gene_a} TFs: {len(tfs_a)}  |  {gene_b} TFs: {len(tfs_b)}")
print(f"Shared: {len(shared)}  |  {gene_a}-only: {len(only_a)}  |  {gene_b}-only: {len(only_b)}")
print(f"\nShared TFs (first 15): {sorted(shared)[:15]}")
print(f"\n{gene_a}-only (first 10): {sorted(only_a)[:10]}")

Recipe: Export Region Peaks as BED

When to use: Export ReMap query results to BED format for downstream bedtools intersection or IGV visualization.

import requests, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def export_region_as_bed(chrom, start, end, outfile, assembly="hg38"):
    """Query ReMap region and save as 6-column BED file."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=30)
    r.raise_for_status()
    peaks = r.json()
    rows = [{
        "chr":   p.get("chr",   p.get("chrom", "")),
        "start": p.get("start", 0),
        "end":   p.get("end",   0),
        "name":  p.get("name",  "."),
        "score": p.get("score", 0),
        "strand": p.get("strand", "."),
    } for p in peaks]
    df = pd.DataFrame(rows)
    df = df.sort_values(["chr", "start"])
    df.to_csv(outfile, sep="\t", header=False, index=False)
    print(f"Saved {len(df)} peaks to {outfile}")
    return df

export_region_as_bed("chr17", 7_670_000, 7_690_000, "TP53_locus_remap.bed")

Troubleshooting

Problem	Cause	Solution
`404 Not Found` from API	Endpoint path changed or unavailable	Check `https://remap2022.univ-amu.fr/api/` for current endpoint list; fall back to BED download
Empty JSON list `[]` from region query	No peaks in region, or assembly mismatch	Verify coordinates are on the correct assembly; try a wider window (±10 kb)
Gene query returns empty	Gene symbol not recognized by ReMap	Try Ensembl gene symbol; some aliases are not mapped — verify with HGNC
`requests.exceptions.Timeout`	Large region or slow server	Increase `timeout=60` ; for regions >1 Mb use BED file download instead
`name` field has only one component	Incomplete metadata in ReMap for that experiment	Guard with `parts[n] if len(parts) > n else "unknown"`
BED download 404	Per-TF files use exact ReMap TF naming	Check TF name case and spelling at `https://remap2022.univ-amu.fr/download_page`
Duplicate peaks for same TF	Multiple experiments per TF in a cell type	Group by `tf_name` and count unique experiments; deduplicate peaks with bedtools merge

Related Skills

```
jaspar-database
```
— TF binding motif matrices (PWMs/PFMs); use alongside ReMap peak evidence for sequence-level validation
```
encode-database
```
— ENCODE regulatory tracks including TF ChIP-seq, DNase-seq, and ATAC-seq; partially overlaps with ReMap
```
homer-motif-analysis
```
— de novo motif discovery in ChIP-seq peak sets from ReMap or MACS3
```
macs3-peak-calling
```
— call peaks from raw ChIP-seq BAM files; ReMap provides pre-called peaks from the same approach
```
regulomedb-database
```
— regulatory variant scoring that integrates TF binding evidence similar to ReMap

References

ReMap 2022 API documentation — REST API endpoint reference and interactive explorer
Hammal et al., Nucleic Acids Research 2022 — ReMap 2022 paper describing the 2022 release (165M peaks, 1,210 TFs)
ReMap portal and download page — web browser, download page for BED files and cis-regulatory modules
Chèneby et al., Nucleic Acids Research 2020 — ReMap 2020 paper describing the reprocessing pipeline and quality control methodology