Awesome-Agent-Skills-for-Empirical-Research hal-archive-api
Access French and European research via the HAL open archive API
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/fulltext/hal-archive-api" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-hal-archive-api && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/literature/fulltext/hal-archive-api/SKILL.mdsource content
HAL Open Archive API
Overview
HAL (Hyper Articles en Ligne) is France's national open archive for scholarly deposits. Managed by CNRS, it hosts 4M+ full-text documents from French research institutions and international collaborators. The API provides Solr-based search with full metadata, PDF links, and OAI-PMH harvesting. Free, no authentication required.
API Endpoints
Search API
# Keyword search curl "https://api.archives-ouvertes.fr/search/?q=machine+learning&rows=20&wt=json" # Search specific fields curl "https://api.archives-ouvertes.fr/search/?q=title_s:\"deep learning\"&wt=json" # Filter by document type curl "https://api.archives-ouvertes.fr/search/?q=neural+networks&\ fq=docType_s:ART&rows=20&wt=json" # Filter by year and language curl "https://api.archives-ouvertes.fr/search/?q=climate+change&\ fq=producedDateY_i:[2023 TO 2026]&fq=language_s:en&wt=json" # Filter by institution curl "https://api.archives-ouvertes.fr/search/?q=robotics&\ fq=structId_i:441569&wt=json" # Return specific fields curl "https://api.archives-ouvertes.fr/search/?q=CRISPR&\ fl=halId_s,title_s,authFullName_s,producedDateY_i,uri_s,files_s&wt=json"
Search Fields
| Field | Description | Example |
|---|---|---|
| Title | |
| Author name | |
| Abstract | |
| Keywords | |
| Year | |
| Document type | |
| Language | |
| Domain/subject | |
| Journal name | |
| Institution ID | Lab/university ID |
Document Types
| Code | Type |
|---|---|
| Journal article |
| Conference paper |
| PhD thesis |
| Habilitation thesis |
| Report |
| Book chapter |
| Book |
| Poster |
| Preprint/other |
Query Parameters
| Parameter | Description |
|---|---|
| Solr query |
| Filter query |
| Fields to return |
| Results per page (max 10000) |
| Pagination offset |
| Sort order (e.g., ) |
| Format: , , |
Response Structure
{ "response": { "numFound": 12500, "start": 0, "docs": [ { "halId_s": "hal-01234567", "title_s": ["Deep Learning for Climate Modeling"], "authFullName_s": ["Marie Dupont", "Jean Martin"], "producedDateY_i": 2024, "docType_s": "ART", "journalTitle_s": "Environmental Modelling", "uri_s": "https://hal.science/hal-01234567", "files_s": ["https://hal.science/hal-01234567/document"], "domain_s": ["sde.es", "info.info-ai"], "abstract_s": ["We propose a novel deep learning approach..."], "language_s": ["en"] } ] } }
Python Usage
import requests BASE_URL = "https://api.archives-ouvertes.fr/search/" def search_hal(query: str, rows: int = 20, doc_type: str = None, from_year: int = None, language: str = None) -> list: """Search HAL open archive.""" params = { "q": query, "wt": "json", "rows": rows, "fl": "halId_s,title_s,authFullName_s,producedDateY_i," "uri_s,files_s,docType_s,journalTitle_s,abstract_s", "sort": "producedDateY_i desc", } fq = [] if doc_type: fq.append(f"docType_s:{doc_type}") if from_year: fq.append(f"producedDateY_i:[{from_year} TO 2030]") if language: fq.append(f"language_s:{language}") if fq: params["fq"] = fq resp = requests.get(BASE_URL, params=params) resp.raise_for_status() data = resp.json() results = [] for doc in data.get("response", {}).get("docs", []): title = doc.get("title_s", [""])[0] if isinstance( doc.get("title_s"), list) else doc.get("title_s", "") results.append({ "hal_id": doc.get("halId_s"), "title": title, "authors": doc.get("authFullName_s", []), "year": doc.get("producedDateY_i"), "type": doc.get("docType_s"), "journal": doc.get("journalTitle_s"), "url": doc.get("uri_s"), "pdf": doc.get("files_s", [None])[0], }) return results def search_theses(topic: str, from_year: int = 2020) -> list: """Find French PhD theses on a topic.""" return search_hal(topic, rows=50, doc_type="THESE", from_year=from_year) def get_institution_publications(struct_id: int, from_year: int = 2023) -> list: """Get publications from a specific institution.""" params = { "q": "*:*", "fq": [f"structId_i:{struct_id}", f"producedDateY_i:[{from_year} TO 2030]"], "wt": "json", "rows": 100, "fl": "halId_s,title_s,authFullName_s,producedDateY_i,docType_s", "sort": "producedDateY_i desc", } resp = requests.get(BASE_URL, params=params) resp.raise_for_status() return resp.json().get("response", {}).get("docs", []) # Example: find recent French AI research papers = search_hal("intelligence artificielle", from_year=2024) for p in papers: pdf = " [PDF]" if p["pdf"] else "" print(f"[{p['year']}] {p['title']}{pdf}") # Example: find PhD theses on NLP theses = search_theses("natural language processing") for t in theses: print(f"{t['title']} — {', '.join(t['authors'][:2])}")
HAL Domains
| Code | Domain |
|---|---|
| Computer Science |
| Mathematics |
| Physics |
| Environmental Sciences |
| Life Sciences |
| Social Sciences & Humanities |
| Chemistry |
| Engineering Sciences |