Awesome-Agent-Skills-for-Empirical-Research semantic-scholar-recs-guide

Paper discovery via recommendation APIs (OpenAlex, CrossRef citation networks)

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/discovery/semantic-scholar-recs-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-semantic-scholar- && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md
source content

Paper Discovery via OpenAlex & CrossRef

Leverage the OpenAlex and CrossRef APIs to discover related papers, traverse citation networks, and build comprehensive reading lists programmatically.

Overview

OpenAlex indexes over 250 million academic works and provides a free, no-key-required API that supports:

  • Work search by title, keyword, or DOI
  • Citation and reference graph traversal
  • Author profiles and publication histories
  • Concept-based discovery across disciplines
  • Institutional and venue filtering

Base URLs:

OpenAlex: https://api.openalex.org
CrossRef: https://api.crossref.org
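Both APIs also resolve a work directly from its DOI, which is often the quickest entry point. A minimal sketch of the two URL shapes (the helper names are ours; the DOI in the comment is illustrative):

```python
def openalex_doi_url(doi):
    """OpenAlex accepts a full DOI URL appended to the works endpoint."""
    return f"https://api.openalex.org/works/https://doi.org/{doi}"

def crossref_doi_url(doi):
    """CrossRef takes the bare DOI after /works/."""
    return f"https://api.crossref.org/works/{doi}"

# Usage (illustrative DOI):
# requests.get(openalex_doi_url("10.7717/peerj.4375"), headers=HEADERS)
# requests.get(crossref_doi_url("10.7717/peerj.4375"), headers=HEADERS)
```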

Finding Related Papers

Use OpenAlex's concept graph and citation data to discover related work from seed papers.

Concept-Based Discovery

import requests

HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
WORK_ID = "W2741809807"  # OpenAlex work ID

# Get the seed paper's concepts
response = requests.get(
    f"https://api.openalex.org/works/{WORK_ID}",
    headers=HEADERS
)
paper = response.json()
concepts = [c["id"] for c in paper.get("concepts", [])[:3]]

# Find works sharing the same concepts, sorted by citations
for concept_id in concepts:
    related = requests.get(
        "https://api.openalex.org/works",
        params={"filter": f"concepts.id:{concept_id}", "sort": "cited_by_count:desc", "per-page": 10},
        headers=HEADERS
    )
    for w in related.json().get("results", []):
        print(f"[{w.get('publication_year')}] {w.get('title')} (citations: {w.get('cited_by_count')})")
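The related-works query above returns full work records; payloads shrink considerably if you request only the fields you need via OpenAlex's select parameter. A sketch (the helper name is ours, and note that OpenAlex hyphenates the per-page parameter):

```python
def related_params(concept_id, n=10):
    """Query params for concept-filtered works, trimmed to four fields."""
    return {
        "filter": f"concepts.id:{concept_id}",
        "sort": "cited_by_count:desc",
        "per-page": n,
        "select": "id,display_name,publication_year,cited_by_count",
    }

# Usage (C41008148 is an illustrative concept ID):
# requests.get("https://api.openalex.org/works", params=related_params("C41008148"), headers=HEADERS)
```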

CrossRef Subject-Based Discovery

import requests

def search_crossref(query, limit=10, sort="is-referenced-by-count"):
    """Search CrossRef for papers sorted by citation count."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query": query, "rows": limit, "sort": sort, "order": "desc"},
        headers={"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai; mailto:dev@wentor.ai)"}
    )
    return resp.json().get("message", {}).get("items", [])

results = search_crossref("transformer attention mechanism")
for w in results:
    title = (w.get("title") or [""])[0]
    print(f"  {title} — Cited by: {w.get('is-referenced-by-count', 0)}")
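CrossRef queries can also be narrowed by publication date through its filter parameter; the sketch below extends the search above with a from-pub-date filter (the helper name is ours):

```python
def crossref_params(query, from_year=None, rows=10):
    """CrossRef /works query params, optionally restricted to recent papers."""
    params = {"query": query, "rows": rows,
              "sort": "is-referenced-by-count", "order": "desc"}
    if from_year is not None:
        params["filter"] = f"from-pub-date:{from_year}-01-01"
    return params

# Usage: only papers published in 2020 or later
# requests.get("https://api.crossref.org/works",
#              params=crossref_params("transformer attention mechanism", 2020))
```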

Citation Network Traversal

Walk the citation graph to discover foundational and derivative works.

Forward Citations (Who Cited This Paper?)

work_id = "W2741809807"

response = requests.get(
    "https://api.openalex.org/works",
    params={
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": 20
    },
    headers=HEADERS
)

for w in response.json().get("results", []):
    print(f"  [{w.get('publication_year')}] {w.get('title')} ({w.get('cited_by_count')} cites)")
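Citation walks tend to revisit the same works; a small memo around the fetch call avoids re-requesting them (a sketch, not part of the snippet above — `fetch` is any callable that takes a URL):

```python
def cached(fetch):
    """Wrap a fetch callable with an in-memory, URL-keyed memo."""
    memo = {}
    def wrapper(url):
        if url not in memo:
            memo[url] = fetch(url)
        return memo[url]
    return wrapper

# Usage:
# get_json = cached(lambda url: requests.get(url, headers=HEADERS).json())
# get_json("https://api.openalex.org/works/W2741809807")  # network request
# get_json("https://api.openalex.org/works/W2741809807")  # served from memo
```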

Backward References (What Did This Paper Cite?)

response = requests.get(
    f"https://api.openalex.org/works/{work_id}",
    headers=HEADERS
)
paper = response.json()
ref_ids = paper.get("referenced_works", [])

# Fetch details for referenced works
for ref_id in ref_ids[:20]:
    ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
    print(f"  [{ref.get('publication_year')}] {ref.get('title')} ({ref.get('cited_by_count')} cites)")
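The loop above issues one request per reference. OpenAlex filters accept multiple values OR-ed with |, so those lookups can be collapsed into far fewer requests; the sketch below assumes the ids.openalex attribute filter and a cap of roughly 50 values per filter (both per the OpenAlex docs — verify against the current API reference):

```python
def reference_batch_params(ref_urls, chunk=50):
    """Split referenced-work URLs into pipe-joined OpenAlex ID filters."""
    ids = [u.rsplit("/", 1)[-1] for u in ref_urls]
    return [{"filter": "ids.openalex:" + "|".join(ids[i:i + chunk])}
            for i in range(0, len(ids), chunk)]

# Usage:
# for params in reference_batch_params(paper.get("referenced_works", [])):
#     batch = requests.get("https://api.openalex.org/works", params=params, headers=HEADERS).json()
#     for ref in batch.get("results", []):
#         print(ref.get("title"))
```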

Building a Reading List Pipeline

Combine search, concept discovery, and citation traversal into a discovery pipeline:

Step                   | Method                              | Purpose
1. Seed selection      | Manual or keyword search            | Identify 3-5 highly relevant papers
2. Expand via concepts | OpenAlex concept graph              | Find thematically related work
3. Forward citation    | OpenAlex cites filter               | Find recent derivative works
4. Backward citation   | referenced_works field              | Find foundational papers
5. Deduplicate         | OpenAlex work ID matching           | Remove duplicates across steps
6. Rank & filter       | Sort by year, citations, relevance  | Prioritize reading order

def build_reading_list(seed_ids, max_papers=50):
    """Build a ranked reading list from seed papers."""
    seen = set()
    candidates = []

    for seed_id in seed_ids:
        # Get concepts from seed paper
        paper = requests.get(f"https://api.openalex.org/works/{seed_id}", headers=HEADERS).json()
        concept_ids = [c["id"] for c in paper.get("concepts", [])[:2]]

        # Find related works via concepts
        for cid in concept_ids:
            related = requests.get(
                "https://api.openalex.org/works",
                params={"filter": f"concepts.id:{cid}", "sort": "cited_by_count:desc", "per-page": 20},
                headers=HEADERS
            ).json().get("results", [])
            for w in related:
                wid = w.get("id", "").split("/")[-1]
                if wid not in seen:
                    seen.add(wid)
                    candidates.append(w)

        # Get citing works
        citing = requests.get(
            "https://api.openalex.org/works",
            params={"filter": f"cites:{seed_id}", "sort": "cited_by_count:desc", "per-page": 20},
            headers=HEADERS
        ).json().get("results", [])
        for w in citing:
            wid = w.get("id", "").split("/")[-1]
            if wid not in seen:
                seen.add(wid)
                candidates.append(w)

    # Rank by recency, then citation count (guard against null fields)
    candidates.sort(key=lambda p: (p.get("publication_year") or 0, p.get("cited_by_count") or 0), reverse=True)
    return candidates[:max_papers]
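The tuple sort above strictly prefers newer papers; a blended score often orders a reading list more usefully. A sketch with arbitrary weights (the function and its weights are our choices, not part of the pipeline above):

```python
import math

def relevance_score(paper, current_year=2025, w_cites=0.7, w_recency=0.3):
    """Blend log-scaled citation impact with a recency bonus."""
    cites = paper.get("cited_by_count") or 0
    year = paper.get("publication_year") or (current_year - 10)
    recency = max(0, 10 - (current_year - year))  # bonus decays over 10 years
    return w_cites * math.log1p(cites) + w_recency * recency

# Drop-in replacement for the tuple sort:
# candidates.sort(key=relevance_score, reverse=True)
```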

Best Practices

  • OpenAlex is free and needs no API key; send a polite User-Agent header (ideally with a mailto: contact) to land in its faster polite pool
  • CrossRef likewise grants higher rate limits when the User-Agent includes contact info (its polite pool)
  • Request only the fields you need via the select parameter to reduce payload size
  • Paginate large result sets with page and per-page (OpenAlex hyphenates the parameter name)
  • Cache responses locally to avoid redundant requests
  • Use the DOI as the universal identifier for cross-system compatibility
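The page/per-page pattern from the bullets above can be wrapped in a small generator; fetch is injected so any snippet in this guide can reuse it (a sketch with our own helper names; parameter spelling follows OpenAlex):

```python
def paginate(fetch, base_params, per_page=25, max_pages=4):
    """Yield results page by page until a short page signals the end."""
    for page in range(1, max_pages + 1):
        data = fetch({**base_params, "page": page, "per-page": per_page})
        results = data.get("results", [])
        yield from results
        if len(results) < per_page:  # short page: no more results
            break

# Usage:
# fetch = lambda p: requests.get("https://api.openalex.org/works", params=p, headers=HEADERS).json()
# for work in paginate(fetch, {"filter": "cites:W2741809807"}):
#     print(work.get("display_name"))
```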