SciAgent-Skills archs4-database
Query uniformly processed RNA-seq gene expression profiles, tissue-specific expression patterns, and co-expression networks from the ARCHS4 database REST API. Retrieve z-score normalized expression across 1M+ human and mouse samples, find co-expressed genes, search samples by metadata, and download HDF5 expression matrices. For variant-level population genetics use gnomad-database; for pathway enrichment from gene lists use gget-genomic-databases (Enrichr).
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/archs4-database" ~/.claude/skills/jaechang-hits-sciagent-skills-archs4-database && rm -rf "$T"
ARCHS4 Database
Overview
ARCHS4 (All RNA-seq and ChIP-seq Sample and Signature Search) is a resource of uniformly aligned and processed human and mouse RNA-seq data from NCBI GEO and SRA, covering 1 million+ samples. The REST API at
https://maayanlab.cloud/archs4/api/ provides gene-level expression profiles, z-score normalized tissue expression, co-expression networks, and sample metadata search — all without authentication. Large-scale bulk queries can also use the downloadable HDF5 expression matrices.
When to Use
- Retrieving tissue-specific or cell-type-specific expression z-scores for a gene of interest across hundreds of tissue types
- Finding genes co-expressed with a query gene (co-expression network construction or guilt-by-association analysis)
- Searching for RNA-seq samples by tissue, disease, or metadata keyword to identify candidate datasets for reanalysis
- Comparing expression profiles of multiple genes across tissues to prioritize candidates for wet-lab follow-up
- Accessing uniformly processed gene expression matrices (HDF5 format) for large-scale cross-study analysis
- Validating differential expression results by checking whether a gene's expression direction matches population-level tissue profiles
- For variant-level population allele frequencies use `gnomad-database`; ARCHS4 provides expression evidence only
- For Enrichr pathway enrichment from a gene list use `gget-genomic-databases` (`gget enrichr`); ARCHS4 is for expression lookups
Prerequisites
- Python packages: `requests`, `pandas`, `matplotlib`, `seaborn`
- Data requirements: gene symbols (HGNC format, e.g., `TP53`, `BRCA1`); sample GEO/SRA IDs for direct sample queries
- Environment: internet connection; no API key or account required
- Rate limits: ~10 requests/second; add `time.sleep(0.1)` between sequential gene queries to avoid throttling
pip install requests pandas matplotlib seaborn
Quick Start
```python
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def archs4_get(endpoint: str, params: dict = None) -> dict:
    """Send a GET request to the ARCHS4 API and return parsed JSON."""
    r = requests.get(f"{ARCHS4_BASE}/{endpoint}", params=params, timeout=30)
    r.raise_for_status()
    return r.json()

# Quick check: top tissues expressing TP53
data = archs4_get("meta/genes/TP53/zscore")
tissues = data.get("values", [])
print(f"TP53 tissue expression entries: {len(tissues)}")
top5 = sorted(tissues, key=lambda x: x.get("zscore", 0), reverse=True)[:5]
for t in top5:
    print(f"  {t['tissue']:<40} z={t['zscore']:.2f}")
# TP53 tissue expression entries: 200
#   thymus                                   z=2.81
#   testis                                   z=2.44
```
Core API
Query 1: Gene Expression Z-Scores Across Tissues
Retrieve z-score normalized expression for a gene across all available tissue types. Z-scores are computed per-sample relative to the population distribution; positive values indicate above-average expression.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_gene_tissue_zscore(gene_symbol: str, species: str = "human") -> pd.DataFrame:
    """Return tissue z-score expression profile for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol (e.g., 'TP53').
    species : str
        'human' or 'mouse' (default: 'human').
    """
    endpoint = f"meta/genes/{gene_symbol}/zscore"
    r = requests.get(
        f"{ARCHS4_BASE}/{endpoint}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # An unknown or wrongly cased symbol yields no records; skip sorting
        return df
    return df.sort_values("zscore", ascending=False).reset_index(drop=True)

df = get_gene_tissue_zscore("MYC")
print(f"MYC tissue z-scores: {len(df)} tissue types")
print(df[["tissue", "zscore"]].head(10).to_string(index=False))
# MYC tissue z-scores: 200
#           tissue  zscore
#            colon    3.12
#  small intestine    2.98
#         placenta    2.74
```
```python
# Query mouse tissues for a gene
df_mouse = get_gene_tissue_zscore("Myc", species="mouse")
print("Mouse Myc: top 5 tissues")
print(df_mouse[["tissue", "zscore"]].head(5).to_string(index=False))
```
Query 2: Co-expressed Genes
Find genes whose expression is most correlated with a query gene across all ARCHS4 samples. Useful for identifying pathway partners, regulators, or candidate targets.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_coexpressed_genes(gene_symbol: str, top_n: int = 50,
                          species: str = "human") -> pd.DataFrame:
    """Return genes co-expressed with the query gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol.
    top_n : int
        Number of correlated genes to return (default: 50).
    species : str
        'human' or 'mouse' (default: 'human').
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # An unknown or wrongly cased symbol yields no records; skip sorting
        return df
    return df.sort_values("correlation", ascending=False).reset_index(drop=True)

coexp = get_coexpressed_genes("PCNA", top_n=20)
print(f"Top co-expressed genes with PCNA (n={len(coexp)}):")
print(coexp[["gene", "correlation"]].head(10).to_string(index=False))
# Top co-expressed genes with PCNA (n=20):
#  gene  correlation
#  RFC4         0.91
#  RFC2         0.89
#  MCM6         0.87
```
```python
# Extract gene list for downstream enrichment
gene_list = coexp["gene"].tolist()
print(f"Co-expression gene list: {gene_list[:10]}")
# Pass gene_list to Enrichr or pathway analysis tools
```
Query 3: Sample Search
Search for RNA-seq samples by metadata keyword (tissue, disease condition, cell type, treatment). Returns GEO/SRA sample identifiers with metadata fields.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def search_samples(keyword: str, species: str = "human", limit: int = 100) -> pd.DataFrame:
    """Search ARCHS4 samples by metadata keyword.

    Parameters
    ----------
    keyword : str
        Search term (e.g., 'breast cancer', 'liver', 'HeLa').
    species : str
        'human' or 'mouse'.
    limit : int
        Maximum number of samples to return.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("samples", [])
    return pd.DataFrame(records)

samples = search_samples("pancreatic cancer", limit=50)
print(f"Samples matching 'pancreatic cancer': {len(samples)}")
if len(samples) > 0:
    print(samples[["sample_id", "series_id", "title"]].head(5).to_string(index=False))
# Samples matching 'pancreatic cancer': 50
#   sample_id  series_id                                       title
#  GSM2345678  GSE123456  Pancreatic ductal adenocarcinoma - sample 1
```
Query 4: Gene-Level Metadata Summary
Retrieve summary statistics and metadata for a gene including the number of samples expressing it, expression percentile, and available annotation.
```python
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_gene_metadata(gene_symbol: str, species: str = "human") -> dict:
    """Return metadata and expression summary for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    return r.json()

meta = get_gene_metadata("GAPDH")
print(f"Gene: {meta.get('gene_symbol', 'N/A')}")
print(f"Species: {meta.get('species', 'N/A')}")
print(f"Ensembl ID: {meta.get('ensembl_gene_id', 'N/A')}")
print(f"Description: {meta.get('description', 'N/A')[:80]}")
```
```python
# Compare metadata for a panel of housekeeping genes
import time

housekeeping = ["GAPDH", "ACTB", "B2M", "HPRT1", "RPLP0"]
for gene in housekeeping:
    meta = get_gene_metadata(gene)
    print(f"  {gene:<8} {meta.get('ensembl_gene_id', 'N/A')}")
    time.sleep(0.1)
```
Query 5: Visualization — Tissue Expression Barplot
Generate a publication-ready barplot of z-score expression across the top tissues for a gene.
```python
import requests
import pandas as pd
import matplotlib.pyplot as plt

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def plot_tissue_expression(gene_symbol: str, top_n: int = 20,
                           species: str = "human",
                           output_file: str = None) -> None:
    """Plot top tissue z-score expression for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol.
    top_n : int
        Number of top tissues to display.
    species : str
        'human' or 'mouse'.
    output_file : str
        If provided, save figure to this path.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    if not records:
        # Nothing to plot for an unknown or wrongly cased symbol
        print(f"No tissue z-scores returned for {gene_symbol} ({species})")
        return
    df = pd.DataFrame(records).sort_values("zscore", ascending=False).head(top_n)

    fig, ax = plt.subplots(figsize=(10, 6))
    # Red for above-average expression, blue for below-average
    colors = ["#D73027" if z > 0 else "#4575B4" for z in df["zscore"]]
    # Reverse so the highest z-score is drawn at the top of the horizontal bars
    bars = ax.barh(df["tissue"][::-1], df["zscore"][::-1], color=colors[::-1])
    ax.axvline(0, color="black", linewidth=0.8, linestyle="--")
    ax.set_xlabel("Expression Z-Score")
    ax.set_title(f"ARCHS4 Tissue Expression: {gene_symbol} ({species})\nTop {top_n} tissues")
    ax.bar_label(bars, fmt="%.2f", padding=3, fontsize=8)
    plt.tight_layout()
    fname = output_file or f"{gene_symbol}_tissue_expression.png"
    plt.savefig(fname, dpi=150, bbox_inches="tight")
    print(f"Saved {fname} ({len(df)} tissues plotted)")

plot_tissue_expression("BRCA1", top_n=15, output_file="BRCA1_tissue_expression.png")
```
Query 6: HDF5 Bulk Data Access
Download or stream from ARCHS4's precomputed HDF5 expression matrices for large-scale cross-sample analysis. The HDF5 files contain gene × sample count matrices for human and mouse.
```python
import requests

# HDF5 files are available for bulk download from the ARCHS4 data portal
# URL pattern: https://maayanlab.cloud/archs4/download#expression
# Human gene-level: human_gene_v2.6.h5
# Mouse gene-level: mouse_gene_v2.6.h5

def get_h5_download_urls() -> dict:
    """Return download URLs for ARCHS4 HDF5 expression matrices."""
    base = "https://maayanlab.cloud/archs4"
    return {
        "human_gene": f"{base}/files/human_gene_v2.6.h5",
        "mouse_gene": f"{base}/files/mouse_gene_v2.6.h5",
        "human_transcript": f"{base}/files/human_transcript_v2.6.h5",
        "mouse_transcript": f"{base}/files/mouse_transcript_v2.6.h5",
    }

urls = get_h5_download_urls()
for key, url in urls.items():
    print(f"  {key:<22} {url}")

# To work with a downloaded HDF5 file:
try:
    import h5py
    import numpy as np

    h5_path = "human_gene_v2.6.h5"  # after download

    def extract_gene_from_h5(h5_path: str, gene_symbol: str,
                             n_samples: int = 1000) -> dict:
        """Extract expression values for a gene from the HDF5 matrix."""
        with h5py.File(h5_path, "r") as f:
            genes = [g.decode() for g in f["meta"]["genes"]["gene_symbol"][:]]
            if gene_symbol not in genes:
                raise ValueError(f"{gene_symbol} not found in HDF5")
            idx = genes.index(gene_symbol)
            # Slice one gene row and the first n_samples columns
            expr = f["data"]["expression"][idx, :n_samples]
            sample_ids = [s.decode() for s in
                          f["meta"]["samples"]["geo_accession"][:n_samples]]
        return {"gene": gene_symbol, "expression": expr, "sample_ids": sample_ids}

    result = extract_gene_from_h5(h5_path, "TP53", n_samples=500)
    print(f"TP53 expression: mean={result['expression'].mean():.2f},"
          f" max={result['expression'].max():.2f} (n={len(result['expression'])} samples)")
except ImportError:
    print("h5py not installed. Install with: pip install h5py")
except FileNotFoundError:
    print("HDF5 file not downloaded yet. Use the URLs above to download first.")
```
Key Concepts
Z-Score Normalization
ARCHS4 reports gene expression as z-scores computed relative to all samples for that gene. A z-score of 0 means expression at the population mean; a z-score of 2.0 means expression 2 standard deviations above the mean. Z-scores are easier to compare across datasets than raw counts because each value is expressed relative to the gene's own distribution, which reduces the influence of library-size differences. Uniform alignment removes processing differences between studies, although residual batch effects can remain.
```python
# Example: Positive z-score = above-average expression for that gene
# z > 2.0  → top ~2.5% of samples for that gene
# z < -2.0 → bottom ~2.5% of samples for that gene
# Use absolute z-score thresholds consistently when comparing across genes
```
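The definition above can be made concrete with a small worked example. This is a minimal sketch using only the standard library; the expression values are hypothetical, and the choice of population standard deviation is an assumption, since the exact transform ARCHS4 applies before computing z-scores is not specified here.

```python
from statistics import mean, pstdev

def zscores(values):
    """Return (x - mean) / std for each value, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant expression: every sample sits exactly at the mean
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical expression values for one gene in five samples
expression = [2.0, 4.0, 4.0, 4.0, 6.0]
z = zscores(expression)
print([round(v, 2) for v in z])
# [-1.58, 0.0, 0.0, 0.0, 1.58]
```

The three samples at the mean get a z-score of 0, and the two outer samples sit about 1.58 standard deviations below and above it.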
HDF5 vs REST API
| Access method | Best for | Limitations |
|---|---|---|
| REST API (`zscore`, `correlations`) | Quick single-gene queries, exploration | Aggregated profiles only, no per-sample access |
| REST API (`samples/search`) | Discovering relevant datasets | Returns metadata, not expression values |
| HDF5 download | Bulk analysis, custom co-expression, ML | Requires 30–60 GB disk; download once |
Species and Gene Symbol Conventions
ARCHS4 indexes human samples using HGNC gene symbols (uppercase, e.g., `TP53`) and mouse samples using MGI symbols (first letter uppercase, e.g., `Trp53`). The `species` parameter accepts `"human"` or `"mouse"`. Symbols in the wrong case and Ensembl IDs return empty results.
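Because a wrongly cased symbol fails silently, it can help to normalize symbols before querying. The helper below is a hypothetical sketch (`normalize_symbol` is not part of ARCHS4 or any library) that applies the simple casing rule described above. Real nomenclature has exceptions, for example human symbols such as `C1orf112` contain lowercase letters, so treat the result as a first guess rather than a validated symbol.

```python
def normalize_symbol(symbol: str, species: str = "human") -> str:
    """Apply the simple species casing rule to a gene symbol (heuristic)."""
    symbol = symbol.strip()
    if species == "human":
        # HGNC convention: all uppercase (with some exceptions, see lead-in)
        return symbol.upper()
    if species == "mouse":
        # MGI convention: first letter uppercase, remainder lowercase
        return symbol[:1].upper() + symbol[1:].lower()
    raise ValueError(f"species must be 'human' or 'mouse', got {species!r}")

print(normalize_symbol("tp53"))            # TP53
print(normalize_symbol("TRP53", "mouse"))  # Trp53
```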
Common Workflows
Workflow 1: Multi-Gene Tissue Expression Heatmap
Goal: Compare tissue expression profiles of a gene panel and visualize as a heatmap to identify tissue-specific vs ubiquitous expression patterns.
```python
import requests, time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

gene_panel = ["MYC", "TP53", "BRCA1", "EGFR", "KRAS", "CDK4"]
top_n_tissues = 25

def get_tissue_zscores(gene: str) -> pd.Series:
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    df = pd.DataFrame(records).set_index("tissue")["zscore"]
    return df

# Build expression matrix (genes × tissues)
all_data = {}
for gene in gene_panel:
    try:
        all_data[gene] = get_tissue_zscores(gene)
        print(f"  Fetched {gene}")
    except Exception as e:
        print(f"  Warning: {gene} failed: {e}")
    time.sleep(0.1)

matrix = pd.DataFrame(all_data).T  # genes × tissues

# Select top tissues by max absolute z-score
tissue_importance = matrix.abs().max(axis=0).sort_values(ascending=False)
top_tissues = tissue_importance.head(top_n_tissues).index
matrix_subset = matrix[top_tissues]

# Plot heatmap
fig, ax = plt.subplots(figsize=(14, 5))
sns.heatmap(
    matrix_subset,
    cmap="RdBu_r",
    center=0,
    vmin=-3,
    vmax=3,
    ax=ax,
    cbar_kws={"label": "Z-Score"},
    linewidths=0.5
)
ax.set_title("ARCHS4 Tissue Expression Profiles: Gene Panel")
ax.set_xlabel("Tissue")
ax.set_ylabel("Gene")
plt.xticks(rotation=45, ha="right", fontsize=8)
plt.tight_layout()
plt.savefig("archs4_panel_heatmap.png", dpi=150, bbox_inches="tight")
print(f"Saved archs4_panel_heatmap.png ({matrix_subset.shape})")
```
Workflow 2: Co-expression Network Seed Expansion
Goal: Start from a seed gene, retrieve co-expressed partners, then query their co-expressed genes in turn to build a two-hop co-expression neighborhood.
```python
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_coexp(gene: str, top_n: int = 20, species: str = "human") -> list:
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    return [rec["gene"] for rec in r.json().get("values", [])]

seed_gene = "PCNA"
# Note: get_coexp() returns gene names only, so this threshold is not applied below
min_correlation = 0.80

# Hop 1: direct co-expressed partners
hop1_genes = get_coexp(seed_gene, top_n=30)
print(f"Hop 1 partners of {seed_gene}: {len(hop1_genes)}")
time.sleep(0.1)

# Hop 2: co-expressed genes of each partner
edges = set()
for gene in hop1_genes[:10]:  # limit for demonstration
    partners = get_coexp(gene, top_n=20)
    for partner in partners:
        if partner != seed_gene:
            edges.add((gene, partner))
    time.sleep(0.1)

# Summarize the network
network_df = pd.DataFrame(list(edges), columns=["source", "target"])
hub_counts = network_df["source"].value_counts()
print(f"\nTwo-hop network: {len(edges)} edges")
print("Top hub genes:")
print(hub_counts.head(5))

network_df.to_csv(f"{seed_gene}_coexp_network.csv", index=False)
print(f"\nSaved {seed_gene}_coexp_network.csv")
```
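Workflow 2 defines `min_correlation` but adds every returned partner to the network. One way to apply the threshold is to filter the raw response records before extracting gene names. The sketch below assumes each record carries `gene` and `correlation` fields, as in the Query 2 output; `filter_partners` and the low-correlation entry `GENE_X` are hypothetical, added for illustration.

```python
def filter_partners(records, min_correlation=0.80):
    """Keep gene names whose correlation meets the threshold."""
    kept = []
    for rec in records:
        corr = rec.get("correlation")
        # Skip records with no correlation value or one below the threshold
        if corr is not None and corr >= min_correlation:
            kept.append(rec["gene"])
    return kept

# Records shaped like the Query 2 response; GENE_X is a made-up weak partner
records = [
    {"gene": "RFC4", "correlation": 0.91},
    {"gene": "RFC2", "correlation": 0.89},
    {"gene": "GENE_X", "correlation": 0.42},
]
print(filter_partners(records))
# ['RFC4', 'RFC2']
```

Filtering at each hop keeps the two-hop neighborhood from being dominated by weak edges, at the cost of a smaller network.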
Workflow 3: Sample Discovery and Dataset Summary
Goal: Search for samples by disease keyword, summarize how many GEO series are available, and export sample metadata for downstream reanalysis selection.
```python
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def search_and_summarize(keyword: str, species: str = "human",
                         limit: int = 200) -> pd.DataFrame:
    """Search samples and return a tidy metadata DataFrame."""
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("samples", [])
    return pd.DataFrame(records)

keyword = "colorectal cancer"
df = search_and_summarize(keyword, limit=150)
print(f"Samples matching '{keyword}': {len(df)}")

if len(df) > 0:
    # Summarize by GEO series
    series_counts = df["series_id"].value_counts()
    print("\nTop GEO series (by sample count):")
    print(series_counts.head(8).to_string())

    # Export sample list
    df.to_csv(f"{keyword.replace(' ', '_')}_samples.csv", index=False)
    print(f"\nSaved {keyword.replace(' ', '_')}_samples.csv ({len(df)} samples)")
    print(f"Unique GEO series: {df['series_id'].nunique()}")
```
Key Parameters
| Parameter | Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
| `species` | All gene endpoints | `"human"` | `"human"`, `"mouse"` | Selects the species-specific sample index |
| `limit` | `correlations`, `samples/search` | | – | Number of results returned |
| `gene_symbol` (path) | `zscore`, `correlations` | - | HGNC symbol (human) or MGI symbol (mouse) | Query gene; case-sensitive |
| `query` | `samples/search` | - | free-text string | Metadata keyword search across title, tissue, source fields |
| | | | integer | Pagination offset for large result sets |
| `correlation` (response field) | `correlations` | - | – | Pearson correlation coefficient; filter for high co-expression |
| `zscore` (response field) | `zscore` | - | continuous float | Expression z-score; higher values indicate higher expression |
| `n_samples` (HDF5) | HDF5 slice | all | any integer | Number of samples to extract per read from HDF5 |
Best Practices
- Use z-score thresholds consistently: Because z-scores are gene-specific, a z-score of 2.0 for a ubiquitous gene (GAPDH) and a tissue-restricted gene (TTR, liver) have different interpretive meaning. Always annotate which gene you are comparing and the tissue background.
- Sleep between batch queries: ARCHS4 enforces a soft rate limit of ~10 requests/second. Add `time.sleep(0.1)` between sequential gene queries to avoid `429 Too Many Requests` errors.
- Download HDF5 for large-scale analyses: For queries covering 50+ genes or requiring per-sample expression values, the REST API is impractical. Download the HDF5 file once and use `h5py` slicing for fast matrix access; this avoids hitting rate limits and is 100× faster for bulk extraction.
- Match gene symbol conventions by species: Human queries require HGNC uppercase symbols (e.g., `TP53`); mouse queries require MGI-style symbols (e.g., `Trp53`). Using the wrong case returns empty results without an error.
- Validate co-expression findings across datasets: ARCHS4 co-expression aggregates across all tissue types. A high correlation may be driven by a single tissue or study. Cross-check with tissue-specific queries or manually inspect the top contributing GEO series.
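The sleep-between-queries practice can be wrapped in a small generator so that batch loops do not need a manual `time.sleep` call. This is a sketch under stated assumptions: `throttled` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the pacing can be tested without waiting.

```python
import time

def throttled(items, min_interval=0.1, sleep=time.sleep, clock=time.monotonic):
    """Yield items so that consecutive yields are at least min_interval apart."""
    last = None
    for item in items:
        if last is not None:
            # Time the caller already spent on the previous item counts
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item

# Usage: pace a batch of gene queries at roughly 10 requests/second
for gene in throttled(["MYC", "TP53", "BRCA1"], min_interval=0.1):
    print(f"querying {gene}")  # replace with the actual request
```

Because the interval is measured from the previous yield, a slow request does not add an extra pause on top of its own latency.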
Common Recipes
Recipe: Quick Tissue Specificity Check
When to use: Rapidly determine whether a gene is broadly expressed (housekeeping) or tissue-restricted before designing experiments.
```python
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def tissue_specificity_summary(gene_symbol: str) -> None:
    """Print a summary of high and low expression tissues for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    zscores = [rec["zscore"] for rec in records if rec.get("zscore") is not None]
    if not zscores:
        # min()/max() would fail on an empty list (e.g., wrongly cased symbol)
        print(f"\n{gene_symbol}: no tissue z-scores returned")
        return
    top_high = sorted(records, key=lambda x: x.get("zscore", 0), reverse=True)[:5]
    top_low = sorted(records, key=lambda x: x.get("zscore", float("inf")))[:3]

    print(f"\n{gene_symbol}: {len(zscores)} tissues")
    print(f"  Range: [{min(zscores):.2f}, {max(zscores):.2f}] "
          f"Mean: {sum(zscores)/len(zscores):.2f}")
    print("  High expression:")
    for t in top_high:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")
    print("  Low expression:")
    for t in top_low:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")

tissue_specificity_summary("TTR")  # Transthyretin, liver-specific
```
Recipe: Batch Gene Co-Expression Table
When to use: Generate a pairwise correlation table for a gene panel from a list of differentially expressed genes.
```python
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def batch_coexpr_table(gene_list: list, top_n: int = 10) -> pd.DataFrame:
    """For each gene in gene_list, return its top co-expressed genes."""
    rows = []
    for gene in gene_list:
        try:
            r = requests.get(
                f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
                params={"species": "human", "limit": top_n},
                timeout=30
            )
            r.raise_for_status()
            for rec in r.json().get("values", []):
                rows.append({
                    "query_gene": gene,
                    "coexp_gene": rec.get("gene"),
                    "correlation": rec.get("correlation"),
                })
            time.sleep(0.1)
        except Exception as e:
            print(f"Warning: {gene} skipped: {e}")
    return pd.DataFrame(rows)

deg_list = ["MYC", "CCND1", "CDK4", "RB1", "E2F1"]
coexp_table = batch_coexpr_table(deg_list, top_n=10)
print(f"Co-expression entries: {len(coexp_table)}")
print(coexp_table.groupby("query_gene")["coexp_gene"].count())
coexp_table.to_csv("deg_coexpression_table.csv", index=False)
print("Saved deg_coexpression_table.csv")
```
Recipe: Export Sample IDs for GEO Download
When to use: Identify relevant GEO accessions to download raw count matrices for a meta-analysis.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

keyword = "glioblastoma"
r = requests.get(
    f"{ARCHS4_BASE}/samples/search",
    params={"query": keyword, "species": "human", "limit": 200},
    timeout=30
)
r.raise_for_status()
samples = pd.DataFrame(r.json().get("samples", []))

if len(samples) > 0:
    # Get unique GEO series accessions
    series = samples["series_id"].dropna().unique()
    print(f"Unique GEO series for '{keyword}': {len(series)}")
    for s in series[:10]:
        n = (samples["series_id"] == s).sum()
        print(f"  {s} ({n} samples)")

    # Export series list for GEO download script
    pd.Series(series, name="geo_series").to_csv(
        f"{keyword}_geo_series.txt", index=False
    )
    print(f"\nSaved {keyword}_geo_series.txt")
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Error for gene query | Gene symbol not found in ARCHS4 index | Verify HGNC symbol spelling; check that the `species` parameter matches the gene convention (human: uppercase, mouse: first-letter-upper) |
| `429 Too Many Requests` | Exceeded ~10 req/s rate limit | Add `time.sleep(0.1)` between requests; for batch queries use a 0.5 s delay |
| Empty list in z-score response | Gene is not expressed in any indexed tissue, or wrong species | Switch species; verify gene is protein-coding and has GEO coverage |
| Empty list from search | Keyword not matched in metadata fields | Try broader or alternative keywords |
| HDF5 gene not found | Symbol mismatch between HDF5 version and query | Check available genes in `f["meta"]["genes"]["gene_symbol"]`; try Ensembl ID or alias |
| Request timeout | Slow API response under load | Increase `timeout`; retry with exponential backoff |
| Z-scores all near zero | Gene has very low or absent expression across tissues | Check the gene's expression in raw counts; the gene may be non-coding or very lowly expressed |
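The "retry with exponential backoff" advice in the table can be sketched as a small wrapper around any request callable. `with_backoff` is a hypothetical helper, not part of `requests` or ARCHS4; the 0.5 s base delay follows the batch-query suggestion above, and the retry count is an arbitrary choice. In real use, pass the exception types you want to retry on (for example `requests.exceptions.RequestException`) as `retry_on`.

```python
import time

def with_backoff(fetch, retries=4, base_delay=0.5, retry_on=(Exception,),
                 sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                # Out of attempts: surface the last error to the caller
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...

# Usage with a stand-in callable that fails twice before succeeding
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated slow response")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))
# ok
```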
Related Skills
- `gnomad-database`: Population variant frequencies; use after ARCHS4 to identify variants in highly expressed genes
- `gget-genomic-databases`: Enrichr pathway enrichment for ARCHS4 co-expression gene lists (`gget enrichr`)
- `pydeseq2-differential-expression`: Differential expression analysis on bulk RNA-seq; ARCHS4 HDF5 matrices can serve as reference cohorts
References
- ARCHS4 web portal — Interactive expression browser and dataset download
- ARCHS4 REST API documentation — Endpoint reference and parameters
- Lachmann et al., Nature Communications 2018 — ARCHS4 original publication describing uniform alignment pipeline
- ARCHS4 GitHub — Source code and HDF5 schema documentation