semanticscholar-skill

Use when searching academic papers, looking up citations, finding authors, or getting paper recommendations using the Semantic Scholar API. Triggers on queries about research papers, academic search, citation analysis, or literature discovery.

Install

Source · Clone the upstream repo:

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agents365-ai/semanticscholar-skill" ~/.claude/skills/openclaw-skills-semanticscholar-skill && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/agents365-ai/semanticscholar-skill" ~/.openclaw/skills/openclaw-skills-semanticscholar-skill && rm -rf "$T"

Manifest: skills/agents365-ai/semanticscholar-skill/SKILL.md

Source content

Semantic Scholar Search Workflow

Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.

Critical rule: NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled inside s2.py automatically.

Phase 1: Understand & Plan

Parse the user's intent and choose a search strategy:

Decision Tree

| User wants... | Strategy | Function |
|---|---|---|
| Broad topic exploration | Relevance search | search_relevance() |
| Precise technical terms, exact phrases | Bulk search with boolean operators | search_bulk() with build_bool_query() |
| Specific passages or methods | Snippet search | search_snippets() |
| Known paper by title | Title match | match_title() |
| Known paper by DOI/PMID/ArXiv | Direct lookup | get_paper() |
| Papers citing a known work | Citation traversal | get_citations() |
| Related to one paper | Single-seed recommendations | find_similar() |
| Related to multiple papers | Multi-seed recommendations | recommend() |
| Find a researcher | Author search | search_authors() |
| Researcher's profile | Author details | get_author() |
| Researcher's publications | Author papers | get_author_papers() |

Query Construction Rules

  • Ambiguous terms (e.g., "stem cells" could mean mesenchymal or stem-like T cells): use build_bool_query() with exact phrases and exclusions
    • Example: build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])
  • Multi-context queries (e.g., "topic X in cancer AND autoimmunity"): plan separate searches, then merge with deduplicate()
  • Broad topics: use search_relevance() with filters (year, venue, fieldsOfStudy, minCitationCount)
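To make the query-construction rules concrete, here is a hypothetical sketch of what a build_bool_query-style helper could emit, following the bulk-search boolean syntax documented in the API Quick Reference below. The real s2.py implementation may differ; this stand-in only illustrates the mapping from arguments to query syntax.

```python
# Hypothetical stand-in for build_bool_query(); the real s2.py helper may differ.
def build_bool_query(phrases=(), required=(), excluded=(), or_terms=()):
    parts = []
    parts += [f'"{p}"' for p in phrases]    # exact phrases
    parts += [f"+{t}" for t in required]    # must-include terms
    parts += [f"-{t}" for t in excluded]    # excluded terms
    if or_terms:
        parts.append("(" + " | ".join(or_terms) + ")")  # OR group
    return " ".join(parts)

q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "TCF7"],
    excluded=["mesenchymal"],
)
print(q)  # "stem-like T cells" +CD4 +TCF7 -mesenchymal
```

The produced string can then be passed straight to search_bulk().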

Plan Filters

| Filter | Use when |
|---|---|
| year="2020-" | Recent work only |
| publication_date="2024-01-01:2024-06-30" | Precise date range (YYYY-MM-DD) |
| fields_of_study="Medicine" | Restrict to domain |
| min_citations=10 | Only established papers |
| pub_types="Review" | Find reviews/meta-analyses |
| pub_types="ClinicalTrial" | Clinical trials only |
| open_access=True | Only open access papers |

Checkpoint: Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results.

Phase 2: Execute Search

Write ONE Python script. Example:

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

# Build precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"]
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)

print(format_results(papers, "Stem-like CD4 T cells in IBD"))

Execute with:

python3 /tmp/s2_search.py

Rules:

  • Import everything from s2: from s2 import *
  • Write the script to /tmp/s2_search.py (or a similar temp path)
  • One Bash call to execute. Never chain multiple API calls via separate Bash invocations.
  • Rate limiting, retries, and backoff are automatic inside s2.py

Checkpoint: Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.
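The multi-context pattern from Phase 1 (separate searches merged with deduplicate()) can be sketched locally. The deduplicate() below is a stand-in that keys on paperId, as the Output Functions table documents; the real s2.py helper may be implemented differently.

```python
# Stand-in for s2.py's deduplicate(), which the skill documents as
# removing duplicates by paperId. Illustrates merging two search result lists.
def deduplicate(papers):
    seen, out = set(), []
    for p in papers:
        pid = p.get("paperId")
        if pid not in seen:
            seen.add(pid)
            out.append(p)
    return out

# Two hypothetical result lists (e.g., "topic X in cancer" and "topic X in autoimmunity")
a = [{"paperId": "1", "title": "A"}, {"paperId": "2", "title": "B"}]
b = [{"paperId": "2", "title": "B"}, {"paperId": "3", "title": "C"}]
merged = deduplicate(a + b)
print([p["paperId"] for p in merged])  # ['1', '2', '3']
```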

Worked Examples

Example 1: Author workflow — "Find papers by Yann LeCun on self-supervised learning"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to get their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)
# Filter locally for topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))

Example 2: Citation chain — "Who cited the Transformer paper and what did they build on?"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Get top-cited papers that cite this one
citing = get_citations(paper["paperId"], max_results=50)
citing_papers = [c["citingPaper"] for c in citing if c.get("citingPaper")]
citing_papers.sort(key=lambda p: p.get("citationCount", 0), reverse=True)
print(format_results(citing_papers, "Most-cited papers citing Attention Is All You Need"))

Example 3: Multi-seed recommendations with BibTeX export — "Find papers like these two but not about NLP"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20
)
print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP"))

# Export BibTeX for top results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))

Phase 3: Summarize & Present

  • Use format_results() for consistent output (summary table + top-10 details)
  • If the user's language is Chinese, present summaries in Chinese
  • Always note the total result count and the search strategy used
  • Highlight the most relevant papers for the user's specific question

Phase 4: User Interaction Loop

After presenting results, always offer these options:

  1. Translate — titles/summaries to Chinese (or other language)
  2. Details — full abstract for specific paper numbers
  3. Refine — narrow or expand search with different terms/filters
  4. Similar — find papers similar to a specific result (find_similar())
  5. Citations — who cited a specific paper (get_citations())
  6. Export — save results via export_bibtex(), export_markdown(), or export_json()
  7. Done — end search session

Loop until user says done. Each follow-up uses the same single-script pattern.


API Quick Reference

Helper Module (s2.py)

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

Paper Search Functions

| Function | Purpose | Max Results |
|---|---|---|
| search_relevance(query, **filters) | Simple broad search | 1,000 |
| search_bulk(query, sort=..., **filters) | Boolean precise search | 10,000,000 |
| search_snippets(query, **filters) | Full-text passage search | 1,000 |
| match_title(title) | Exact title match | 1 |
| get_paper(paper_id) | Single paper details | |
| get_citations(paper_id, max_results) | Who cited this | 10,000 |
| get_references(paper_id, max_results) | What this cites | 10,000 |
| find_similar(paper_id, limit, pool) | Single-seed recommendations | 500 |
| recommend(positive_ids, negative_ids, limit) | Multi-seed recommendations | 500 |
| batch_papers(ids, fields) | Batch lookup (≤500) | |
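Since batch_papers() accepts at most 500 IDs per call, larger ID lists need to be split. The chunked() helper below is hypothetical (not part of s2.py) and simply shows the splitting arithmetic.

```python
# Hypothetical helper for splitting an ID list into batch_papers-sized
# chunks; chunked() is not part of s2.py, it just shows the arithmetic.
def chunked(ids, size=500):
    return [ids[i:i + size] for i in range(0, len(ids), size)]

ids = [f"CorpusId:{n}" for n in range(1200)]
print([len(c) for c in chunked(ids)])  # [500, 500, 200]
```

Each chunk would then be passed to batch_papers() inside the same single script.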

Author Functions

| Function | Purpose | Max Results |
|---|---|---|
| search_authors(query, max_results) | Find researchers by name | 1,000 |
| get_author(author_id) | Author profile (affiliations, h-index) | |
| get_author_papers(author_id, max_results) | Author's publications | 10,000 |
| get_paper_authors(paper_id, max_results) | Paper's author list | 1,000 |
| batch_authors(ids, fields) | Batch author lookup (≤1000) | |

Filter Parameters (kwargs)

year, publication_date, venue, fields_of_study, min_citations, pub_types, open_access

  • year: "2020-", "-2019", "2016-2020"
  • publication_date: "2024-01-01:2024-06-30" (YYYY-MM-DD range, open-ended OK)
  • pub_types: Review, JournalArticle, Conference, ClinicalTrial, MetaAnalysis, Dataset, Book, CaseReport, Editorial, LettersAndComments, News, Study, BookSection

Boolean Query Syntax (bulk search only)

| Syntax | Example | Meaning |
|---|---|---|
| "..." | "deep learning" | Exact phrase |
| + | +transformer | Must include |
| - | -survey | Exclude |
| \| | CNN \| RNN | OR |
| * | neuro* | Prefix wildcard |
| () | (CNN \| RNN) +attention | Grouping |

Use build_bool_query(phrases, required, excluded, or_terms) to construct queries safely.

Output Functions

| Function | Purpose |
|---|---|
| format_table(papers, max_rows=30) | Markdown summary table |
| format_details(papers, max_papers=10) | Detailed entries with TLDR/abstract |
| format_results(papers, query_desc) | Combined: summary + table + details |
| format_authors(authors, max_rows=20) | Author table (name, affiliations, h-index) |
| export_bibtex(papers) | BibTeX entries (requires citationStyles field) |
| export_markdown(papers, query_desc) | Full markdown report saved to file |
| export_json(papers, path) | JSON export saved to file |
| deduplicate(papers) | Remove duplicates by paperId |

Supported ID Formats

DOI:10.1038/..., ARXIV:2106.15928, PMID:19872477, PMCID:PMC2323569, CorpusId:215416146, ACL:2020.acl-main.447, DBLP:conf/acl/..., MAG:3015453090, URL:https://...

Paper Fields

Default: title,year,citationCount,authors,venue,externalIds,tldr

Additional: abstract, references, citations, openAccessPdf, publicationDate, publicationVenue, fieldsOfStudy, s2FieldsOfStudy, journal, isOpenAccess, referenceCount, influentialCitationCount, citationStyles, embedding, textAvailability

Author fields: name, affiliations, paperCount, citationCount, hIndex, homepage, externalIds, papers

Rate Limiting

Handled automatically by s2.py: a 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries.
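The backoff schedule above can be sketched as a one-liner. This is only an illustration of the documented policy (doubling delays capped at 60s over 5 retries), not the actual s2.py code, which may be structured differently.

```python
# Illustration of the documented retry schedule: doubling delays starting
# at 2s, capped at 60s, for up to 5 retries. Not the actual s2.py code.
def backoff_delays(max_retries=5, base=2.0, cap=60.0):
    return [min(base * (2 ** i), cap) for i in range(max_retries)]

print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0]
```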

Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| HTTPError 403 | Missing or invalid API key | Verify S2_API_KEY is set: echo $S2_API_KEY |
| HTTPError 429 after 5 retries | Sustained rate limit exceeded | Wait 60s, reduce max_results, or split into smaller batches |
| ModuleNotFoundError: s2 | Skill directory not on path | Verify the skill is installed at ~/.claude/skills/ or ~/.openclaw/skills/ |
| ModuleNotFoundError: requests | requests not installed | pip install requests or uv pip install requests |
| 0 results returned | Query too specific or filters too narrow | Broaden the query, remove filters, try search_relevance() instead of search_bulk() |
| KeyError: 'data' | Endpoint returned an error object | Check r.get("message") for API error details |
| tldr field is empty | Not all papers have a TLDR | Fall back to the abstract field; bulk search never returns tldr |
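The TLDR fallback can be sketched as below, assuming tldr arrives as an object with a text key (as in the Semantic Scholar paper schema); short_summary() is a hypothetical helper, not part of s2.py.

```python
# Hypothetical helper showing the tldr -> abstract fallback; assumes the
# tldr field is an object with a "text" key, per the Semantic Scholar schema.
def short_summary(paper):
    tldr = (paper.get("tldr") or {}).get("text")
    return tldr or paper.get("abstract") or "(no summary available)"

print(short_summary({"tldr": {"text": "Key finding."}}))  # Key finding.
print(short_summary({"abstract": "Long abstract..."}))    # Long abstract...
```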