SciAgent-Skills jaspar-database

Query the JASPAR 2024 TF binding profile database via REST API and pyJASPAR. Retrieve position frequency matrices (PFMs) and position weight matrices (PWMs) by TF name, JASPAR ID, species, or structural class. Scan DNA sequences for transcription factor binding sites (TFBS). Browse profiles by taxon (Homo sapiens, Mus musculus) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak-based motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/jaspar-database" ~/.claude/skills/jaechang-hits-sciagent-skills-jaspar-database && rm -rf "$T"
manifest: skills/genomics-bioinformatics/jaspar-database/SKILL.md
source content

JASPAR Database

Overview

JASPAR is a curated, open-access database of transcription factor (TF) binding profiles represented as position frequency matrices (PFMs). The 2024 release contains 1,209 profiles in the CORE vertebrate collection, covering 783 TFs with experimentally validated binding data from SELEX, ChIP-seq, and PBM experiments. Access is free via the JASPAR REST API at

https://jaspar.elixir.no/api/v1/
— no authentication required — and through the
pyJASPAR
Python library for matrix retrieval and manipulation.

When to Use

  • Looking up the PWM or PFM for a specific TF by name (e.g., CTCF, SP1, GATA1) to use as motif input for a scanning tool
  • Retrieving all JASPAR profiles for a species (e.g., Homo sapiens, Mus musculus) to build a motif library for enrichment analysis
  • Scanning a DNA promoter sequence for predicted TF binding sites using a known PWM
  • Finding all TFs of a given structural class (bHLH, zinc finger, homeodomain) to build a TF family binding profile set
  • Getting metadata for a JASPAR matrix: number of binding sites, information content, GC content, experiment type
  • Downloading complete JASPAR collection sets (CORE, UNVALIDATED, CNE) in JASPAR or MEME format for batch analysis
  • Use
    homer-motif-analysis
    instead when you need de novo motif discovery from ChIP-seq peaks; JASPAR is for retrieving known matrices
  • For regulatory element annotations tied to a genomic region use
    encode-database
    or
    regulomedb-database

Prerequisites

  • Python packages:
    requests
    ,
    pandas
    ,
    matplotlib
    ,
    numpy
  • Optional:
    pyJASPAR
    (Python library wrapping JASPAR REST API with BIOPYTHON motif objects)
  • Data requirements: TF gene symbols, JASPAR matrix IDs (e.g.,
    MA0139.1
    ), or DNA sequences (string or FASTA)
  • Environment: internet connection; no API key required
  • Rate limits: no official published limits; use
    time.sleep(0.5)
    between batch requests
pip install requests pandas matplotlib numpy
pip install pyJASPAR   # optional; pulls in biopython

Quick Start

import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

# Search for CTCF profile in the CORE vertebrate collection
r = requests.get(f"{JASPAR_API}/matrix/", params={
    "search": "CTCF",
    "collection": "CORE",
    "tax_group": "vertebrates",
    "format": "json"
}, timeout=15)
r.raise_for_status()
results = r.json()
print(f"Profiles found: {results['count']}")
for m in results["results"][:3]:
    print(f"  {m['matrix_id']}  {m['name']}  sites={m['sites']}  type={m['type']}")
# Profiles found: 2
#   MA0139.1  CTCF  sites=190  type=ChIP-seq
#   MA1929.1  CTCF  sites=2135  type=ChIP-seq

Core API

Query 1: Matrix Search

Search for TF profiles by TF name, species, collection, or taxonomic group. Returns a paginated list of matching profile records.

import requests, time

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def jaspar_search(search=None, collection="CORE", tax_id=None, tax_group=None,
                  tf_class=None, tf_family=None, page_size=50):
    """Search JASPAR matrices. Returns list of result dicts."""
    params = {"format": "json", "page_size": page_size}
    if search:      params["search"]     = search
    if collection:  params["collection"] = collection
    if tax_id:      params["tax_id"]     = tax_id
    if tax_group:   params["tax_group"]  = tax_group
    if tf_class:    params["tf_class"]   = tf_class
    if tf_family:   params["tf_family"]  = tf_family

    all_results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        all_results.extend(data["results"])
        url = data.get("next")   # follow pagination
        time.sleep(0.3)
    return all_results

# Example: all CORE vertebrate profiles for GATA family
gata_profiles = jaspar_search(search="GATA", collection="CORE", tax_group="vertebrates")
print(f"GATA profiles: {len(gata_profiles)}")
for m in gata_profiles[:4]:
    print(f"  {m['matrix_id']}  {m['name']:12s}  {m.get('tf_class','')}  sites={m['sites']}")

Query 2: Matrix Retrieval

Fetch the full profile record for a specific matrix ID, including the raw PFM counts, metadata, and TF annotations.

import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_matrix(matrix_id):
    """Return full matrix record for a JASPAR ID (e.g. 'MA0139.1')."""
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    return r.json()

m = get_matrix("MA0139.1")   # CTCF
print(f"ID: {m['matrix_id']}  Name: {m['name']}")
print(f"Collection: {m['collection']}  Type: {m['type']}")
print(f"Species: {[s['name'] for s in m.get('species', [])]}")
print(f"UniProt: {m.get('uniprot_ids', [])}")
print(f"Sites: {m['sites']}  Binding sites used to build matrix")
print(f"TF class: {m.get('class_name', 'n/a')}  Family: {m.get('family_name', 'n/a')}")

# PFM structure: dict mapping position (as str) -> {A, C, G, T: count}
pfm = m["pfm"]
n_positions = len(pfm)
print(f"\nPFM length: {n_positions} positions")
print(f"Position 0: {pfm['0']}")   # {A: x, C: y, G: z, T: w}
# Position 0: {'A': 87, 'C': 12, 'G': 22, 'T': 69}

Query 3: PWM Computation from PFM

Convert a raw PFM (count matrix) to a position weight matrix (PWM) using log-odds scoring. The PWM is used for binding site scanning.

import requests, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def pfm_to_pwm(pfm_dict, pseudocount=0.8, background=None):
    """
    Convert JASPAR PFM dict to PWM (log2 odds).
    pfm_dict: dict of str(position) -> {A, C, G, T: float}
    Returns: numpy array shape (4, L), rows = [A, C, G, T]
    """
    if background is None:
        background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    bases = ["A", "C", "G", "T"]
    L = len(pfm_dict)
    counts = np.array([[pfm_dict[str(i)][b] for i in range(L)] for b in bases], dtype=float)
    counts += pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    bg = np.array([background[b] for b in bases])[:, None]
    pwm = np.log2(freqs / bg)
    return pwm   # shape (4, L)

r = requests.get(f"{JASPAR_API}/matrix/MA0139.1/", params={"format": "json"}, timeout=15)
pfm = r.json()["pfm"]
pwm = pfm_to_pwm(pfm)

print(f"PWM shape: {pwm.shape}  (4 bases × {pwm.shape[1]} positions)")
print(f"Min score per position: {pwm.min(axis=0)[:5]}")
print(f"Max score per position: {pwm.max(axis=0)[:5]}")
# PWM shape: (19, ) -> transposed view: (4, 19)
# Min score per position: [-2.32 -3.64 -3.64 -1.32 -3.64]
# Max score per position: [ 1.87  1.82  1.87  1.71  1.90]

Query 4: Sequence Scanning

Scan a DNA sequence for TFBS matches by sliding the PWM across the sequence and computing log-odds scores at each position.

import requests, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def pfm_to_pwm(pfm_dict, pseudocount=0.8):
    bases = ["A", "C", "G", "T"]
    L = len(pfm_dict)
    counts = np.array([[pfm_dict[str(i)][b] for i in range(L)] for b in bases], dtype=float)
    counts += pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(freqs / 0.25)

def scan_sequence(seq, pwm, threshold_pct=0.80):
    """
    Slide pwm over seq, return hits above threshold_pct of max possible score.
    Returns list of (position, score, strand).
    """
    seq = seq.upper()
    L = pwm.shape[1]
    max_score = pwm.clip(min=0).sum(axis=0).sum()
    min_score = pwm.clip(max=0).sum(axis=0).sum()
    threshold = min_score + threshold_pct * (max_score - min_score)
    hits = []
    for i in range(len(seq) - L + 1):
        window = seq[i:i+L]
        if "N" in window:
            continue
        score = sum(pwm[BASE_IDX[window[j]], j] for j in range(L))
        if score >= threshold:
            hits.append((i, round(score, 3), "+"))
    return hits, max_score, threshold

# Fetch CTCF matrix and scan a synthetic CTCF-like sequence
r = requests.get(f"{JASPAR_API}/matrix/MA0139.1/", params={"format": "json"}, timeout=15)
pfm = r.json()["pfm"]
pwm = pfm_to_pwm(pfm)

# 100 bp sequence with known CTCF consensus embedded
seq = ("GCAGGTTTAAGCTTCCTGGCATTTAAGCTTCCTGGCATTTCCCCAGGGGGCGGAGGCAGAG"
       "CCGCGAGCCGCGAGCCGCGAGCCGCGAGCCGCGAGTTTAAG")
hits, max_score, thresh = scan_sequence(seq, pwm, threshold_pct=0.80)
print(f"Max PWM score: {max_score:.2f}  |  Threshold (80%): {thresh:.2f}")
print(f"Hits: {len(hits)}")
for pos, score, strand in hits:
    print(f"  pos={pos}  score={score:.2f}  {strand}  seq={seq[pos:pos+pwm.shape[1]]}")

Query 5: Taxon Browser

List all JASPAR CORE profiles for a specific organism, identified by NCBI taxonomy ID.

import requests, time, pandas as pd

JASPAR_API = "https://jaspar.elixir.no/api/v1"

# Common taxonomy IDs
TAX_IDS = {
    "Homo sapiens":    9606,
    "Mus musculus":    10090,
    "Rattus norvegicus": 10116,
    "Drosophila melanogaster": 7227,
    "Saccharomyces cerevisiae": 4932,
}

def get_species_profiles(tax_id, collection="CORE"):
    """Return all matrices for a species tax ID."""
    params = {"tax_id": tax_id, "collection": collection, "format": "json", "page_size": 100}
    results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

human_profiles = get_species_profiles(TAX_IDS["Homo sapiens"])
print(f"Human CORE profiles: {len(human_profiles)}")
df = pd.DataFrame([{
    "matrix_id": m["matrix_id"],
    "name": m["name"],
    "tf_class": m.get("class_name", ""),
    "tf_family": m.get("family_name", ""),
    "sites": m["sites"],
    "type": m["type"],
} for m in human_profiles])
print(df.head(5).to_string(index=False))
print(f"\nExperiment types:\n{df['type'].value_counts().to_string()}")

Query 6: TF Class and Family Browser

Find all profiles belonging to a specific TF structural class or family, useful for building class-specific motif libraries.

import requests, time

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_class_profiles(tf_class=None, tf_family=None, collection="CORE", tax_group="vertebrates"):
    """Return all profiles for a TF structural class or family."""
    params = {"collection": collection, "tax_group": tax_group, "format": "json", "page_size": 100}
    if tf_class:  params["tf_class"]  = tf_class
    if tf_family: params["tf_family"] = tf_family
    results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

# Common TF classes: "Zinc-coordinating", "Basic leucine zipper", "Helix-turn-helix"
# Common TF families: "C2H2 ZF", "bHLH", "bZIP", "Homeodomain"
bhlh = get_class_profiles(tf_family="bHLH", collection="CORE", tax_group="vertebrates")
print(f"bHLH vertebrate profiles: {len(bhlh)}")
for m in bhlh[:5]:
    print(f"  {m['matrix_id']:12s} {m['name']:15s} sites={m['sites']:5d}  {m['type']}")
# bHLH vertebrate profiles: 47
#   MA0006.1     Ahr::Arnt      sites=6    ChIP-seq
#   MA0010.1     Tal1::Gata1    sites=5    SELEX

Key Concepts

JASPAR Collections

JASPAR organizes profiles into curated collections:

CollectionDescriptionProfile count
CORE
Manually curated, non-redundant, high-quality profiles~1,200 (2024)
CNE
Profiles derived from conserved noncoding elements~100
UNVALIDATED
Profiles not yet manually curated~1,000
PHYLOFACTS
Profiles from phylogenetically constrained sites~100
POLII
RNA polymerase II binding profiles~10

For most analyses use

CORE
. The JASPAR CORE 2024 vertebrate collection is the default reference for motif enrichment tools.

Matrix ID Versioning

JASPAR matrix IDs have the format

MA{number}.{version}
(e.g.,
MA0139.1
). A new version is released when the binding data is updated. When scripting, use the versioned ID for reproducibility. Searching by name (e.g.,
CTCF
) returns all versions; select the highest-numbered version for the most up-to-date matrix.

Information Content

The information content (IC) at each position (in bits) measures binding site specificity:

  • IC = 2 - H(position), where H is the Shannon entropy
  • High IC (close to 2 bits) = near-invariant base (e.g., always A)
  • Low IC (close to 0) = little positional preference
  • Total IC = sum over all positions; high total IC = more specific binding
import numpy as np

def information_content(pfm_dict, pseudocount=0.8):
    """Compute per-position IC and total IC for a JASPAR PFM."""
    bases = ["A", "C", "G", "T"]
    L = len(pfm_dict)
    counts = np.array([[pfm_dict[str(i)][b] for i in range(L)] for b in bases], dtype=float)
    counts += pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    entropy = -np.sum(freqs * np.log2(freqs + 1e-12), axis=0)
    ic_per_pos = 2 - entropy
    return ic_per_pos, ic_per_pos.sum()

import requests
r = requests.get("https://jaspar.elixir.no/api/v1/matrix/MA0139.1/",
                 params={"format": "json"}, timeout=15)
pfm = r.json()["pfm"]
ic_pos, total_ic = information_content(pfm)
print(f"Total IC: {total_ic:.2f} bits")
print(f"Max IC position: {ic_pos.argmax()}  ({ic_pos.max():.2f} bits)")
# Total IC: 19.47 bits
# Max IC position: 9  (1.99 bits)

Common Workflows

Workflow 1: Build a Human TF Motif Library and Export to MEME Format

Goal: Download all human CORE profiles and write them in MEME minimal format for use with FIMO, AME, or TOMTOM.

import requests, time, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_all_profiles(tax_id=9606, collection="CORE"):
    params = {"tax_id": tax_id, "collection": collection, "format": "json", "page_size": 100}
    results, url = [], f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

def pfm_to_freq(pfm_dict, pseudocount=0.1):
    """Return (4, L) frequency matrix [A, C, G, T]."""
    bases = ["A", "C", "G", "T"]
    L = len(pfm_dict)
    counts = np.array([[pfm_dict[str(i)][b] for i in range(L)] for b in bases], dtype=float)
    counts += pseudocount
    return counts / counts.sum(axis=0, keepdims=True)

profiles = get_all_profiles(tax_id=9606, collection="CORE")
print(f"Downloaded {len(profiles)} human CORE profiles")

meme_lines = [
    "MEME version 4",
    "ALPHABET= ACGT",
    "strands: + -",
    "Background letter frequencies",
    "A 0.25 C 0.25 G 0.25 T 0.25",
    "",
]
for m in profiles:
    pfm = m["pfm"]
    freq = pfm_to_freq(pfm)
    L = freq.shape[1]
    meme_lines.append(f"MOTIF {m['matrix_id']} {m['name']}")
    meme_lines.append(f"letter-probability matrix: alength= 4 w= {L} nsites= {m['sites']}")
    for j in range(L):
        row = " ".join(f"{freq[b, j]:.4f}" for b in range(4))
        meme_lines.append(f"  {row}")
    meme_lines.append("")

output_path = "jaspar_human_core_2024.meme"
with open(output_path, "w") as f:
    f.write("\n".join(meme_lines))
print(f"Written: {output_path}  ({len(profiles)} motifs)")

Workflow 2: Promoter Scan for Multiple TFs and Visualize PWM Logos

Goal: Download PWMs for a set of TFs and scan a promoter sequence, then visualize the best-scoring hit as a bar logo.

import requests, numpy as np, time
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

JASPAR_API = "https://jaspar.elixir.no/api/v1"
BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE_COLORS = {"A": "#2ca02c", "C": "#1f77b4", "G": "#ff7f0e", "T": "#d62728"}

def fetch_pwm(matrix_id, pseudocount=0.8):
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    pfm = r.json()["pfm"]
    L = len(pfm)
    counts = np.array([[pfm[str(i)][b] for i in range(L)] for b in ["A","C","G","T"]], dtype=float)
    counts += pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(freqs / 0.25), freqs

def scan(seq, pwm, pct=0.80):
    seq = seq.upper()
    L = pwm.shape[1]
    max_s = pwm.clip(min=0).sum()
    min_s = pwm.clip(max=0).sum()
    thresh = min_s + pct * (max_s - min_s)
    return [(i, sum(pwm[BASE_IDX[seq[i+j]], j] for j in range(L)))
            for i in range(len(seq)-L+1) if "N" not in seq[i:i+L]
            and sum(pwm[BASE_IDX[seq[i+j]], j] for j in range(L)) >= thresh]

def plot_logo(freqs, title, outfile):
    """Bar-chart sequence logo from frequency matrix (4, L)."""
    L = freqs.shape[1]
    entropy = -np.sum(freqs * np.log2(freqs + 1e-12), axis=0)
    ic = 2 - entropy
    bases = ["A", "C", "G", "T"]
    fig, ax = plt.subplots(figsize=(max(6, L * 0.5), 2.5))
    bottom = np.zeros(L)
    for idx, base in enumerate(bases):
        heights = freqs[idx] * ic
        ax.bar(range(L), heights, bottom=bottom, color=BASE_COLORS[base], label=base, width=0.8)
        bottom += heights
    ax.set_xticks(range(L))
    ax.set_xticklabels([str(i+1) for i in range(L)], fontsize=7)
    ax.set_ylabel("Information Content (bits)")
    ax.set_title(title, fontsize=10)
    ax.legend(handles=[mpatches.Patch(color=BASE_COLORS[b], label=b) for b in bases],
              loc="upper right", fontsize=7, ncol=4)
    plt.tight_layout()
    plt.savefig(outfile, dpi=150, bbox_inches="tight")
    plt.close()
    print(f"Saved {outfile}")

# TP53 promoter-region fragment (GRCh38 chr17:7,687,000-7,687,100)
promoter = ("GCAGAGGCGGAGGATTTGCCTTTTTTCGAGTTGGTGAGAGATCTGGGGCGGGGCAGGGCC"
            "CTGGAACGGCAGGACGGAGAGCAAGGCCGGGGAAGGGCGGGAGCGGGCGGG")

# Scan for SP1 (MA0080.4) and TP53 (MA0106.3) hits
for matrix_id, tf_name in [("MA0080.4", "SP1"), ("MA0106.3", "TP53")]:
    pwm, freqs = fetch_pwm(matrix_id)
    hits = scan(promoter, pwm, pct=0.75)
    print(f"{tf_name} ({matrix_id}): {len(hits)} hits at >=75% threshold")
    for pos, score in hits[:3]:
        print(f"   pos={pos}  score={score:.2f}  {promoter[pos:pos+pwm.shape[1]]}")
    plot_logo(freqs, f"{tf_name} ({matrix_id}) PWM logo", f"{tf_name}_logo.png")
    time.sleep(0.5)

Workflow 3: TF Co-binding Partner Discovery via Shared Matrix Families

Goal: Find all TF families that have profiles in JASPAR, then retrieve family members to identify potential co-binding partners of a TF of interest.

import requests, time, pandas as pd
from collections import Counter

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_profiles_df(tax_group="vertebrates", collection="CORE"):
    params = {"tax_group": tax_group, "collection": collection, "format": "json", "page_size": 100}
    results, url = [], f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return pd.DataFrame([{
        "matrix_id": m["matrix_id"],
        "name": m["name"],
        "tf_class": m.get("class_name", "Unknown"),
        "tf_family": m.get("family_name", "Unknown"),
        "sites": m["sites"],
        "type": m["type"],
        "uniprot": ";".join(m.get("uniprot_ids", [])),
    } for m in results])

df = get_profiles_df(tax_group="vertebrates", collection="CORE")
print(f"Total CORE vertebrate profiles: {len(df)}")

# Top TF families
family_counts = df["tf_family"].value_counts()
print(f"\nTop 10 TF families:")
print(family_counts.head(10).to_string())

# Co-binding: find all TFs in the same family as CTCF
ctcf_row = df[df["name"] == "CTCF"].iloc[0]
ctcf_family = ctcf_row["tf_family"]
co_family = df[df["tf_family"] == ctcf_family][["matrix_id", "name", "sites", "type"]]
print(f"\nTFs in {ctcf_family} family (potential CTCF co-binders by class):")
print(co_family.to_string(index=False))
df.to_csv("jaspar_core_vertebrates.csv", index=False)
print(f"\nSaved jaspar_core_vertebrates.csv")

Key Parameters

ParameterEndpointDefaultRange / OptionsEffect
search
/matrix/
Any string (TF name, gene symbol)Full-text search across name and aliases
collection
/matrix/
CORE
,
UNVALIDATED
,
CNE
,
POLII
,
PHYLOFACTS
Restricts to a JASPAR sub-collection
tax_group
/matrix/
vertebrates
,
insects
,
plants
,
fungi
,
nematodes
,
urochordates
Filter by broad taxonomic group
tax_id
/matrix/
NCBI taxonomy ID integer (e.g.,
9606
)
Restrict to a single species
tf_class
/matrix/
"Zinc-coordinating"
,
"Basic leucine zipper"
,
"Helix-turn-helix"
, etc.
Filter by TF structural class
tf_family
/matrix/
"C2H2 ZF"
,
"bHLH"
,
"bZIP"
,
"Homeodomain"
, etc.
Filter by TF structural family
page_size
/matrix/
101–100Results per page; use 100 for batch downloads
pseudocount
PFM→PWM (local)0.80.01–1.0Smooths zero-count positions; higher = less extreme PWM values
threshold_pct
scan (local)0.800.50–0.99Fraction of max score required to call a hit; lower = more permissive

Best Practices

  1. Pin matrix ID versions for reproducibility: Use

    MA0139.1
    (not just
    CTCF
    ) in scripts and manuscripts so results do not silently change across JASPAR releases.

  2. Always add pseudocounts when computing PWMs: Raw PFMs contain zero counts for rare positions. A zero count produces -inf in log space, eliminating any sequence with that nucleotide. Use

    pseudocount = 0.8
    (JASPAR recommendation) or
    pseudocount = sqrt(sites) / 4
    .

  3. Use the CORE collection for standard analyses:

    UNVALIDATED
    profiles have not been manually curated and may contain lower-confidence motifs. Reserve
    UNVALIDATED
    for exploratory analyses.

  4. Respect pagination: JASPAR returns at most 100 results per page. Always follow the

    next
    URL in responses when building complete profile sets.

  5. For batch scanning, use FIMO (MEME suite) rather than manual sliding window: The manual scanning above is educational. For production use, export matrices to MEME format (Workflow 1) and use

    fimo --thresh 1e-4 motifs.meme sequence.fa
    .

Common Recipes

Recipe: Fetch All Versions of a TF's Matrix

When to use: Compare old and new profile versions of the same TF to check for changes.

import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_all_versions(tf_name, collection="CORE"):
    r = requests.get(f"{JASPAR_API}/matrix/", params={
        "search": tf_name, "collection": collection, "format": "json", "page_size": 50
    }, timeout=15)
    r.raise_for_status()
    return r.json()["results"]

versions = get_all_versions("SP1")
print(f"SP1 matrix versions in CORE: {len(versions)}")
for m in sorted(versions, key=lambda x: x["matrix_id"]):
    print(f"  {m['matrix_id']:12s}  sites={m['sites']:5d}  type={m['type']}")
# SP1 matrix versions in CORE: 4
#   MA0080.1      sites=   18  type=SELEX
#   MA0080.2      sites=   20  type=SELEX
#   MA0080.3      sites=  146  type=ChIP-seq
#   MA0080.4      sites=  481  type=ChIP-seq

Recipe: Retrieve Matrix in JASPAR Flat-File Format

When to use: Download a matrix in the legacy JASPAR text format for tools that accept it directly.

import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_jaspar_format(matrix_id):
    """Return matrix as JASPAR flat-file string."""
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "jaspar"}, timeout=15)
    r.raise_for_status()
    return r.text

jaspar_str = get_jaspar_format("MA0139.1")
print(jaspar_str[:200])
# >MA0139.1  CTCF
# A [ 87  167  281  56  8  32  15  4  55 ...]
# C [ 12   30   98  42  111  30  ...]
# G [ 22   36  160  40  66  21  ...]
# T [ 69   67  ...  ]

with open("CTCF_MA0139.1.jaspar", "w") as f:
    f.write(jaspar_str)
print("Saved CTCF_MA0139.1.jaspar")

Recipe: pyJASPAR Quick Motif Retrieval

When to use: Retrieve a motif as a BioPython

motifs.Motif
object when downstream tools expect that interface.

# Requires: pip install pyJASPAR
import pyJASPAR

db = pyJASPAR.JASPAR2024(auto_reverse_complement=True)
motif = db.fetch_motif_by_id("MA0139.1")   # CTCF
print(f"Name: {motif.name}")
print(f"Matrix ID: {motif.matrix_id}")
print(f"Length: {len(motif)}")
print(f"Consensus: {motif.consensus}")
# Name: CTCF
# Matrix ID: MA0139.1
# Length: 19
# Consensus: CCGCGNGGNGGCAG

# Fetch multiple motifs for a TF family
motifs = db.fetch_motifs(collection="CORE", tax_id=9606, tf_family="bHLH")
print(f"Human bHLH motifs: {len(motifs)}")
for m in motifs[:3]:
    print(f"  {m.matrix_id}  {m.name}  len={len(m)}")

Troubleshooting

ProblemCauseSolution
Empty
results
list from search
TF not in JASPAR, wrong collection, or wrong
tax_group
Try
collection=None
to search all collections; check TF alias (e.g., NF-kB → RELA)
404 Not Found
for matrix ID
Invalid or misspelled matrix IDVerify ID format:
MA
+ 4 digits +
.
+ version (e.g.,
MA0139.1
); search by name first
pfm['0']
missing key
Some JASPAR profiles have 0-indexed positions as integers, not stringsCast position keys:
pfm_dict = {str(k): v for k, v in pfm.items()}
PWM scan produces no hitsThreshold too strict or sequence too shortLower
threshold_pct
to 0.70; check sequence length vs motif length
pyJASPAR
install fails
Requires Python ≥3.8 and C extensions for BIOPYTHONUse
requests
-based API directly; pyJASPAR is optional
Pagination stops early
next
field is
null
before expected total
Check
count
field in first response vs
len(results)
after loop
High IC positions show wrong basePFM row order assumed incorrectlyJASPAR always returns
{A, C, G, T}
keys; never assume positional ordering

Related Skills

  • homer-motif-analysis
    — de novo motif discovery from ChIP-seq or ATAC-seq peak sets; complements JASPAR known-motif library
  • regulomedb-database
    — regulatory variant scoring using TF binding evidence overlapping JASPAR motifs
  • encode-database
    — download TF ChIP-seq peak files that can be cross-referenced with JASPAR profiles
  • remap-database
    — TF binding peak sets from ChIP-seq experiments for binding site validation
  • macs3-peak-calling
    — produce ChIP-seq peak BED files for downstream JASPAR motif enrichment

References