Awesome-Agent-Skills-for-Empirical-Research base-academic-search
Search 400M+ open access documents via the BASE search engine API
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/search/base-academic-search" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-base-academic-sea && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/literature/search/base-academic-search/SKILL.md
BASE (Bielefeld Academic Search Engine) API
Overview
BASE is one of the world's largest search engines for academic open access web resources. Operated by Bielefeld University Library, it indexes 400M+ documents from 11,000+ content providers including institutional repositories, preprint servers, and digital libraries. Unlike Google Scholar, BASE provides structured metadata, license information, and full-text links. The API is free with registration.
API Endpoints
Base URL
https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi
Search
```bash
# Basic keyword search (JSON response)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=climate+change+adaptation&format=json&hits=20"

# Search with field filters
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=dctitle:transformer+AND+dcsubject:NLP&format=json"

# Filter by document type and year (filters go inside the query string)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=deep+learning+AND+dctypenorm:121+AND+dcyear:2024&format=json"

# Open access only
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=CRISPR+AND+dcoa:1&format=json"
```
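The same requests can be assembled programmatically so that spaces and colons in the query are percent-encoded correctly. The following is a minimal sketch using only the standard library; the helper name `build_search_url` is hypothetical, and it assumes that field filters travel inside the `query` value rather than as separate URL parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def build_search_url(query: str, hits: int = 10, fmt: str = "json") -> str:
    """Return a PerformSearch URL with the query percent-encoded."""
    params = {
        "func": "PerformSearch",
        "query": query,  # field filters are assumed to live inside the query
        "format": fmt,
        "hits": hits,
    }
    # urlencode turns spaces into "+" and ":" into "%3A"
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("dctitle:transformer AND dcsubject:NLP", hits=20)
print(url)
```

Building the URL this way avoids hand-escaping, which matters once a query contains quotes, ampersands, or non-ASCII characters.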
Search Fields
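| Field | Description | Example |
|---|---|---|
| `dctitle` | Title | `dctitle:transformer` |
| `dccreator` | Author | |
| `dcsubject` | Subject/keywords | `dcsubject:NLP` |
| `dcdescription` | Abstract | |
| `dcyear` | Publication year | `dcyear:2024` |
| | Document type text | |
| `dctypenorm` | Normalized type code | `dctypenorm:121` (journal article) |
| `dcrights` | Access rights | `dcrights:open` |
| `dclang` | Language | `dclang:eng` |
| | Source URL | |
| `dcoa` | Open access status | `dcoa:1` (OA), `dcoa:0` (restricted) |
| `dcprovider` | Content provider | |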
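Field prefixes such as `dctitle:` and `dcsubject:` combine with free text through `AND`, as in the curl examples. The sketch below composes such a query string from keyword arguments; `fielded_query` is a hypothetical helper, and wrapping multi-word values in double quotes is an assumption about phrase syntax rather than documented behavior.

```python
def fielded_query(free_text: str = "", **filters: str) -> str:
    """Join free text and field:value filters with AND."""
    text = free_text.strip()
    parts = [text] if text else []
    for field, value in filters.items():
        value = str(value)
        # Assumption: multi-word values need double quotes to stay one phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return " AND ".join(parts)


print(fielded_query("deep learning", dctypenorm="121", dcyear="2024"))
# prints: deep learning AND dctypenorm:121 AND dcyear:2024
```

The resulting string is meant to be passed as the `query` parameter, with URL encoding left to the HTTP client.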
Document Type Codes
| Code | Type |
|---|---|
| `121` | Journal article |
| | Book / monograph |
| | Conference paper |
| `15` | Thesis / dissertation |
| | Report |
| | Preprint |
Query Parameters
| Parameter | Description | Default |
|---|---|---|
| `func` | Must be `PerformSearch` | Required |
| `query` | Search query with optional field prefixes | Required |
| `format` | Response format: `json` or `xml` | |
| `hits` | Results per page (max 125) | 10 |
| `offset` | Pagination offset | 0 |
| `sortby` | Sort order, e.g. `dcyear desc` | relevance |
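Because a single request returns at most 125 results, larger result sets have to be fetched page by page. The sketch below generates one parameter dictionary per page; `page_params` is a hypothetical helper, and it assumes the pagination parameter is named `offset` and counts documents from zero.

```python
from typing import Dict, Iterator

MAX_HITS = 125  # per-page ceiling stated in the parameter table


def page_params(query: str, total: int, page_size: int = 100) -> Iterator[Dict[str, object]]:
    """Yield one request-parameter dict per page until `total` results are covered."""
    page_size = max(1, min(page_size, MAX_HITS))  # clamp to the allowed range
    for offset in range(0, total, page_size):
        yield {
            "func": "PerformSearch",
            "query": query,
            "format": "json",
            "hits": min(page_size, total - offset),  # last page may be shorter
            "offset": offset,  # assumed parameter name
        }


pages = list(page_params("CRISPR", total=300, page_size=200))
print([(p["offset"], p["hits"]) for p in pages])
# prints: [(0, 125), (125, 125), (250, 50)]
```

Each dictionary can be passed as `params=` to an HTTP client; in practice `total` would come from `numFound` in the first response.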
Response Structure
```json
{
  "response": {
    "numFound": 45200,
    "start": 0,
    "docs": [
      {
        "dctitle": "Attention Is All You Need",
        "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
        "dcyear": "2017",
        "dcsubject": ["machine learning", "attention mechanism"],
        "dcdescription": "The dominant sequence transduction models...",
        "dcidentifier": "https://arxiv.org/abs/1706.03762",
        "dcsource": "arXiv.org",
        "dcprovider": "arxiv.org",
        "dcdocid": "abc123xyz",
        "dcoa": 1,
        "dctypenorm": ["18"],
        "dclang": ["eng"]
      }
    ]
  }
}
```
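The envelope above can be read defensively, since individual documents may omit fields. The sketch below extracts the total count and a few per-document values from an already-decoded payload; `summarize` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response shown above, not live API output.

```python
from typing import Any, Dict


def summarize(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the result count and basic per-document fields out of a response."""
    response = payload.get("response", {})
    docs = response.get("docs", [])
    return {
        "total": response.get("numFound", 0),
        "start": response.get("start", 0),
        "titles": [d.get("dctitle", "") for d in docs],
        "open_access": [d.get("dcoa") == 1 for d in docs],
    }


# Trimmed copy of the sample response from this section
sample = {
    "response": {
        "numFound": 45200,
        "start": 0,
        "docs": [
            {"dctitle": "Attention Is All You Need", "dcyear": "2017", "dcoa": 1}
        ],
    }
}

summary = summarize(sample)
print(summary["total"], summary["titles"])
```

Using `.get()` with defaults keeps the parser from raising `KeyError` when a record lacks a title or an open access flag.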
Python Usage
```python
from typing import Optional

import requests

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def search_base(query: str, hits: int = 20, doc_type: Optional[int] = None,
                oa_only: bool = False) -> list:
    """Search BASE for academic open access documents."""
    # Filters are appended to the query string as field:value clauses.
    q = query
    if doc_type is not None:
        q += f" AND dctypenorm:{doc_type}"
    if oa_only:
        q += " AND dcoa:1"

    params = {
        "func": "PerformSearch",
        "query": q,
        "format": "json",
        "hits": hits,
        "sortby": "dcyear desc",
    }
    # A timeout keeps the call from hanging on an unresponsive server.
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for doc in data.get("response", {}).get("docs", []):
        results.append({
            "title": doc.get("dctitle"),
            "authors": doc.get("dccreator", []),
            "year": doc.get("dcyear"),
            "source": doc.get("dcsource"),
            "url": doc.get("dcidentifier"),
            "abstract": (doc.get("dcdescription") or "")[:300],
            "open_access": doc.get("dcoa") == 1,
            "type": doc.get("dctypenorm", []),
        })
    return results


def search_dissertations(topic: str, lang: str = "eng") -> list:
    """Find dissertations and theses on a topic."""
    query = f"{topic} AND dctypenorm:15 AND dclang:{lang}"
    return search_base(query, hits=50)


def search_by_provider(query: str, provider: str) -> list:
    """Search within a specific content provider."""
    full_query = f"{query} AND dcprovider:{provider}"
    return search_base(full_query)


# Example: find recent open access ML papers
papers = search_base("transformer self-attention", hits=10, oa_only=True)
for p in papers:
    oa = "OA" if p["open_access"] else "restricted"
    print(f"[{p['year']}] {p['title']} ({oa}) - {p['source']}")

# Example: find dissertations on climate modeling
theses = search_dissertations("climate modeling ocean")
for t in theses:
    print(f"[{t['year']}] {t['title']} - {', '.join(t['authors'][:2])}")
```
BASE vs Other Search Engines
| Feature | BASE | Google Scholar | OpenAlex |
|---|---|---|---|
| Records | 400M+ | Unknown | 250M+ |
| Open access focus | Yes | No | Yes |
| Structured API | Yes | No official API | Yes |
| License metadata | Yes | No | Partial |
| Dissertation coverage | Excellent | Good | Limited |
| Repository-level filtering | Yes | No | No |