semanticscholar-skill

Use when searching academic papers, looking up citations, finding authors, or getting paper recommendations using the Semantic Scholar API. Triggers on queries about research papers, academic search, citation analysis, or literature discovery.

Install

Source · Clone the upstream repo:

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agents365-ai/semanticscholar-skill" ~/.claude/skills/openclaw-skills-semanticscholar-skill && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/agents365-ai/semanticscholar-skill" ~/.openclaw/skills/openclaw-skills-semanticscholar-skill && rm -rf "$T"

Manifest: skills/agents365-ai/semanticscholar-skill/SKILL.md

Source content

Semantic Scholar Search Workflow

Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.

Critical rule: NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled inside s2.py automatically.

Phase 1: Understand & Plan

Parse the user's intent and choose a search strategy:

Decision Tree

| User wants... | Strategy | Function |
|---|---|---|
| Broad topic exploration | Relevance search | search_relevance() |
| Precise technical terms, exact phrases | Bulk search with boolean operators | search_bulk() with build_bool_query() |
| Specific passages or methods | Snippet search | search_snippets() |
| Known paper by title | Title match | match_title() |
| Known paper by DOI/PMID/ArXiv | Direct lookup | get_paper() |
| Papers citing a known work | Citation traversal | get_citations() |
| Related to one paper | Single-seed recommendations | find_similar() |
| Related to multiple papers | Multi-seed recommendations | recommend() |
| Find a researcher | Author search | search_authors() |
| Researcher's profile | Author details | get_author() |
| Researcher's publications | Author papers | get_author_papers() |

Query Construction Rules

  • Ambiguous terms (e.g., "stem cells" could mean mesenchymal or stem-like T cells): use build_bool_query() with exact phrases and exclusions
    • Example: build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])
  • Multi-context queries (e.g., "topic X in cancer AND autoimmunity"): plan separate searches, then merge with deduplicate()
  • Broad topics: use search_relevance() with filters (year, venue, fieldsOfStudy, minCitationCount)
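To make the query-construction rules concrete, here is a hypothetical sketch of what a build_bool_query-style helper could emit, following the bulk-search boolean syntax documented in the API Quick Reference below. The real s2.py implementation may differ; this stand-in only illustrates the mapping from arguments to query syntax.

```python
# Hypothetical stand-in for build_bool_query(); the real s2.py helper may differ.
def build_bool_query(phrases=(), required=(), excluded=(), or_terms=()):
    parts = []
    parts += [f'"{p}"' for p in phrases]    # exact phrases
    parts += [f"+{t}" for t in required]    # must-include terms
    parts += [f"-{t}" for t in excluded]    # excluded terms
    if or_terms:
        parts.append("(" + " | ".join(or_terms) + ")")  # OR group
    return " ".join(parts)

q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "TCF7"],
    excluded=["mesenchymal"],
)
print(q)  # "stem-like T cells" +CD4 +TCF7 -mesenchymal
```

The produced string can then be passed straight to search_bulk().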

Plan Filters

| Filter | Use when |
|---|---|
| year="2020-" | Recent work only |
| publication_date="2024-01-01:2024-06-30" | Precise date range (YYYY-MM-DD) |
| fields_of_study="Medicine" | Restrict to domain |
| min_citations=10 | Only established papers |
| pub_types="Review" | Find reviews/meta-analyses |
| pub_types="ClinicalTrial" | Clinical trials only |
| open_access=True | Only open access papers |

Checkpoint: Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results.

Phase 2: Execute Search

Write ONE Python script. Example:

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

# Build precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"]
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)

print(format_results(papers, "Stem-like CD4 T cells in IBD"))

Execute with:

python3 /tmp/s2_search.py

Rules:

  • Import everything from s2: from s2 import *
  • Write the script to /tmp/s2_search.py (or a similar temp path)
  • One Bash call to execute. Never chain multiple API calls via separate Bash invocations.
  • Rate limiting, retries, and backoff are automatic inside s2.py

Checkpoint: Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.
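The multi-context pattern from Phase 1 (separate searches merged with deduplicate()) can be sketched locally. The deduplicate() below is a stand-in that keys on paperId, as the Output Functions table documents; the real s2.py helper may be implemented differently.

```python
# Stand-in for s2.py's deduplicate(), which the skill documents as
# removing duplicates by paperId. Illustrates merging two search result lists.
def deduplicate(papers):
    seen, out = set(), []
    for p in papers:
        pid = p.get("paperId")
        if pid not in seen:
            seen.add(pid)
            out.append(p)
    return out

# Two hypothetical result lists (e.g., "topic X in cancer" and "topic X in autoimmunity")
a = [{"paperId": "1", "title": "A"}, {"paperId": "2", "title": "B"}]
b = [{"paperId": "2", "title": "B"}, {"paperId": "3", "title": "C"}]
merged = deduplicate(a + b)
print([p["paperId"] for p in merged])  # ['1', '2', '3']
```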

Worked Examples

Example 1: Author workflow — "Find papers by Yann LeCun on self-supervised learning"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to get their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)
# Filter locally for topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))

Example 2: Citation chain — "Who cited the Transformer paper and what did they build on?"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Get top-cited papers that cite this one
citing = get_citations(paper["paperId"], max_results=50)
citing_papers = [c["citingPaper"] for c in citing if c.get("citingPaper")]
citing_papers.sort(key=lambda p: p.get("citationCount", 0), reverse=True)
print(format_results(citing_papers, "Most-cited papers citing Attention Is All You Need"))

Example 3: Multi-seed recommendations with BibTeX export — "Find papers like these two but not about NLP"

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20
)
print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP"))

# Export BibTeX for top results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))

Phase 3: Summarize & Present

  • Use format_results() for consistent output (summary table + top-10 details)
  • If the user's language is Chinese, present summaries in Chinese
  • Always note the total result count and the search strategy used
  • Highlight the most relevant papers for the user's specific question

Phase 4: User Interaction Loop

After presenting results, always offer these options:

  1. Translate — titles/summaries to Chinese (or other language)
  2. Details — full abstract for specific paper numbers
  3. Refine — narrow or expand search with different terms/filters
  4. Similar — find papers similar to a specific result (find_similar())
  5. Citations — who cited a specific paper (get_citations())
  6. Export — save results via export_bibtex(), export_markdown(), or export_json()
  7. Done — end search session

Loop until user says done. Each follow-up uses the same single-script pattern.


API Quick Reference

Helper Module (s2.py)

import sys, os
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

Paper Search Functions

| Function | Purpose | Max Results |
|---|---|---|
| search_relevance(query, **filters) | Simple broad search | 1,000 |
| search_bulk(query, sort=..., **filters) | Boolean precise search | 10,000,000 |
| search_snippets(query, **filters) | Full-text passage search | 1,000 |
| match_title(title) | Exact title match | 1 |
| get_paper(paper_id) | Single paper details | |
| get_citations(paper_id, max_results) | Who cited this | 10,000 |
| get_references(paper_id, max_results) | What this cites | 10,000 |
| find_similar(paper_id, limit, pool) | Single-seed recommendations | 500 |
| recommend(positive_ids, negative_ids, limit) | Multi-seed recommendations | 500 |
| batch_papers(ids, fields) | Batch lookup (≤500) | |
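Since batch_papers() accepts at most 500 IDs per call, larger ID lists need to be split. The chunked() helper below is hypothetical (not part of s2.py) and simply shows the splitting arithmetic.

```python
# Hypothetical helper for splitting an ID list into batch_papers-sized
# chunks; chunked() is not part of s2.py, it just shows the arithmetic.
def chunked(ids, size=500):
    return [ids[i:i + size] for i in range(0, len(ids), size)]

ids = [f"CorpusId:{n}" for n in range(1200)]
print([len(c) for c in chunked(ids)])  # [500, 500, 200]
```

Each chunk would then be passed to batch_papers() inside the same single script.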

Author Functions

| Function | Purpose | Max Results |
|---|---|---|
| search_authors(query, max_results) | Find researchers by name | 1,000 |
| get_author(author_id) | Author profile (affiliations, h-index) | |
| get_author_papers(author_id, max_results) | Author's publications | 10,000 |
| get_paper_authors(paper_id, max_results) | Paper's author list | 1,000 |
| batch_authors(ids, fields) | Batch author lookup (≤1000) | |

Filter Parameters (kwargs)

year, publication_date, venue, fields_of_study, min_citations, pub_types, open_access

  • year: "2020-", "-2019", "2016-2020"
  • publication_date: "2024-01-01:2024-06-30" (YYYY-MM-DD range, open-ended OK)
  • pub_types: Review, JournalArticle, Conference, ClinicalTrial, MetaAnalysis, Dataset, Book, CaseReport, Editorial, LettersAndComments, News, Study, BookSection

Boolean Query Syntax (bulk search only)

| Syntax | Example | Meaning |
|---|---|---|
| "..." | "deep learning" | Exact phrase |
| + | +transformer | Must include |
| - | -survey | Exclude |
| \| | CNN \| RNN | OR |
| * | neuro* | Prefix wildcard |
| () | (CNN \| RNN) +attention | Grouping |

Use build_bool_query(phrases, required, excluded, or_terms) to construct queries safely.

Output Functions

| Function | Purpose |
|---|---|
| format_table(papers, max_rows=30) | Markdown summary table |
| format_details(papers, max_papers=10) | Detailed entries with TLDR/abstract |
| format_results(papers, query_desc) | Combined: summary + table + details |
| format_authors(authors, max_rows=20) | Author table (name, affiliations, h-index) |
| export_bibtex(papers) | BibTeX entries (requires citationStyles field) |
| export_markdown(papers, query_desc) | Full markdown report saved to file |
| export_json(papers, path) | JSON export saved to file |
| deduplicate(papers) | Remove duplicates by paperId |

Supported ID Formats

DOI:10.1038/..., ARXIV:2106.15928, PMID:19872477, PMCID:PMC2323569, CorpusId:215416146, ACL:2020.acl-main.447, DBLP:conf/acl/..., MAG:3015453090, URL:https://...

Paper Fields

Default: title,year,citationCount,authors,venue,externalIds,tldr

Additional: abstract, references, citations, openAccessPdf, publicationDate, publicationVenue, fieldsOfStudy, s2FieldsOfStudy, journal, isOpenAccess, referenceCount, influentialCitationCount, citationStyles, embedding, textAvailability

Author fields: name, affiliations, paperCount, citationCount, hIndex, homepage, externalIds, papers

Rate Limiting

Handled automatically by s2.py: a 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries.
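The backoff schedule above can be sketched as a one-liner. This is only an illustration of the documented policy (doubling delays capped at 60s over 5 retries), not the actual s2.py code, which may be structured differently.

```python
# Illustration of the documented retry schedule: doubling delays starting
# at 2s, capped at 60s, for up to 5 retries. Not the actual s2.py code.
def backoff_delays(max_retries=5, base=2.0, cap=60.0):
    return [min(base * (2 ** i), cap) for i in range(max_retries)]

print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0]
```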

Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| HTTPError 403 | Missing or invalid API key | Verify S2_API_KEY is set: echo $S2_API_KEY |
| HTTPError 429 after 5 retries | Sustained rate limit exceeded | Wait 60s, reduce max_results, or split into smaller batches |
| ModuleNotFoundError: s2 | Skill directory not on path | Verify the skill is installed at ~/.claude/skills/ or ~/.openclaw/skills/ |
| ModuleNotFoundError: requests | requests not installed | pip install requests or uv pip install requests |
| 0 results returned | Query too specific or filters too narrow | Broaden the query, remove filters, try search_relevance() instead of search_bulk() |
| KeyError: 'data' | Endpoint returned an error object | Check r.get("message") for API error details |
| tldr field is empty | Not all papers have a TLDR | Fall back to the abstract field; bulk search never returns tldr |
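The TLDR fallback can be sketched as below, assuming tldr arrives as an object with a text key (as in the Semantic Scholar paper schema); short_summary() is a hypothetical helper, not part of s2.py.

```python
# Hypothetical helper showing the tldr -> abstract fallback; assumes the
# tldr field is an object with a "text" key, per the Semantic Scholar schema.
def short_summary(paper):
    tldr = (paper.get("tldr") or {}).get("text")
    return tldr or paper.get("abstract") or "(no summary available)"

print(short_summary({"tldr": {"text": "Key finding."}}))  # Key finding.
print(short_summary({"abstract": "Long abstract..."}))    # Long abstract...
```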