Hermes-agent arxiv
Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content.
git clone https://github.com/NousResearch/hermes-agent
T=$(mktemp -d) && git clone --depth=1 https://github.com/NousResearch/hermes-agent "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/arxiv" ~/.claude/skills/nousresearch-hermes-agent-arxiv-581824 && rm -rf "$T"
arXiv Research
Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.
Quick Reference
| Action | Command |
|---|---|
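| Search papers | curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5" |
| Get specific paper | curl "https://export.arxiv.org/api/query?id_list=2402.03300" |
| Read abstract (web) | web_extract(urls=["https://arxiv.org/abs/2402.03300"]) |
| Read full paper (PDF) | web_extract(urls=["https://arxiv.org/pdf/2402.03300"]) |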
Searching Papers
The API returns Atom XML. Parse it with grep/sed, or pipe it through python3 for clean output.
Basic search
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"
Clean output (parse XML to readable format)
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"
Search Query Syntax
| Prefix | Searches | Example |
|---|---|---|
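| all: | All fields | all:transformer+attention |
| ti: | Title | ti:"chain+of+thought" |
| au: | Author | au:hinton |
| abs: | Abstract | abs:reinforcement+learning |
| cat: | Category | cat:cs.LG |
| co: | Comment | co:NeurIPS |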
Boolean operators
# AND (default when using +)
search_query=all:transformer+attention
# OR
search_query=all:GPT+OR+all:BERT
# AND NOT
search_query=all:language+model+ANDNOT+all:vision
# Exact phrase
search_query=ti:"chain+of+thought"
# Combined
search_query=au:hinton+AND+cat:cs.LG
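When queries are assembled in a script rather than typed by hand, the `+` and quote escaping above is easy to get wrong. The following is a minimal sketch, assuming the same export.arxiv.org endpoint used throughout this document; the helper name `build_search_url` is hypothetical and not part of the skill. It lets the caller write the expression with ordinary spaces and leaves the encoding to the standard library.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples in this document.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_search_url(query: str, max_results: int = 5, start: int = 0) -> str:
    """Return a query URL for a search expression written with plain spaces.

    `query` is an expression such as 'au:hinton AND cat:cs.LG' or
    'ti:"chain of thought"'. urlencode turns spaces into '+' and double
    quotes into %22; ':' is kept readable via the `safe` argument.
    """
    params = {"search_query": query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(build_search_url("au:hinton AND cat:cs.LG"))
    print(build_search_url('ti:"chain of thought"', max_results=3))
```

The printed URLs can be passed straight to `curl -s`. The boolean operators stay uppercase words inside the expression, exactly as in the examples above.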
Sort and Pagination
| Parameter | Options |
|---|---|
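| sortBy | relevance, lastUpdatedDate, submittedDate |
| sortOrder | ascending, descending |
| start | Result offset (0-based) |
| max_results | Number of results (default 10, max 30000) |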
# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
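Paging through a large result set comes down to stepping the 0-based start offset while keeping max_results fixed. The sketch below only computes the (start, max_results) pairs and makes no network calls; the function name `page_offsets` and the page size of 100 are illustrative assumptions, not values taken from the skill.

```python
from typing import Iterator, Tuple


def page_offsets(total: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, max_results) pairs that together cover `total` results."""
    if total < 0 or page_size <= 0:
        raise ValueError("total must be >= 0 and page_size must be > 0")
    for start in range(0, total, page_size):
        # The final page may be shorter than page_size.
        yield start, min(page_size, total - start)


if __name__ == "__main__":
    for start, size in page_offsets(250, page_size=100):
        print(f"start={start}&max_results={size}")
```

Each pair maps directly onto the start and max_results parameters in the table above. Requests for successive pages should still be spaced out to respect the rate limit.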
Fetching Specific Papers
# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"
# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"
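For scripted use, the id_list request and the Atom response can be handled with the standard library alone. This is a sketch under assumptions: `id_list_url` and `parse_feed` are hypothetical helper names, and only the id, title, and author elements already used elsewhere in this document are read. Fetching is left to the caller (for example the curl command above), so the parser works on any saved response.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = {"a": "http://www.w3.org/2005/Atom"}  # Atom namespace used by the API


def id_list_url(ids: List[str]) -> str:
    """Build the id_list query URL for one or more arXiv IDs."""
    return "https://export.arxiv.org/api/query?id_list=" + ",".join(ids)


def parse_feed(xml_text: str) -> List[Dict[str, object]]:
    """Extract id, title, and author names from an Atom response body."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("a:entry", ATOM):
        raw_id = entry.findtext("a:id", default="", namespaces=ATOM)
        title = entry.findtext("a:title", default="", namespaces=ATOM)
        papers.append({
            # Keep whatever follows /abs/, including any version suffix.
            "id": raw_id.strip().split("/abs/")[-1],
            # Collapse the line breaks the API inserts into long titles.
            "title": " ".join(title.split()),
            "authors": [
                a.findtext("a:name", default="", namespaces=ATOM)
                for a in entry.findall("a:author", ATOM)
            ],
        })
    return papers
```

Using `findtext` with a default avoids an AttributeError when an entry lacks one of the elements, which matters for incomplete records.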
BibTeX Generation
After fetching metadata for a paper, generate a BibTeX entry:
{% raw %}
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title = {{{title}}},')
print(f'  author = {{{authors}}},')
print(f'  year = {{{year}}},')
print(f'  eprint = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass = {{{primary}}},')
print(f'  url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
{% endraw %}
Reading Paper Content
After finding a paper, read it:
# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])
# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
For local PDF processing, see the ocr-and-documents skill.
Common Categories
| Category | Field |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| cs.LG | Machine Learning |
| cs.CR | Cryptography and Security |
| stat.ML | Machine Learning (Statistics) |
| math.OC | Optimization and Control |
| physics.comp-ph | Computational Physics |
Full list: https://arxiv.org/category_taxonomy
Helper Script
The scripts/search_arxiv.py script handles XML parsing and provides clean output:
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
No dependencies — uses only Python stdlib.
Semantic Scholar (Citations, Related Papers, Author Profiles)
arXiv doesn't provide citation data or recommendations. Use the Semantic Scholar API for that — free, no key needed for basic use (1 req/sec), returns JSON.
Get paper details + citations
# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool
# By Semantic Scholar paper ID or DOI
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"
Get citations OF a paper (who cited it)
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
Get references FROM a paper (what it cites)
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
Search papers (alternative to arXiv search, returns JSON)
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool
Get paper recommendations
curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
Author profile
curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool
Useful Semantic Scholar fields
title, authors, year, abstract, citationCount, referenceCount, influentialCitationCount, isOpenAccess, openAccessPdf, fieldsOfStudy, publicationVenue, externalIds (contains arXiv ID, DOI, etc.)
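The fields parameter is a comma-separated list, and externalIds is the link back to arXiv. The sketch below builds a paper-details URL and pulls the arXiv ID out of a decoded response. Both function names are hypothetical, and the "ArXiv" key inside externalIds is an assumption about the response shape that should be checked against a live response.

```python
from typing import Iterable, Optional

# Base path taken from the curl examples in this document.
S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def paper_url(arxiv_id: str, fields: Iterable[str]) -> str:
    """Build a paper-details URL that requests only the given fields."""
    return f"{S2_GRAPH}/paper/arXiv:{arxiv_id}?fields={','.join(fields)}"


def arxiv_id_from(paper: dict) -> Optional[str]:
    """Return the arXiv ID from a decoded paper object, or None if absent.

    Assumes externalIds uses the key "ArXiv"; the field may also be
    missing or null when it was not requested.
    """
    return (paper.get("externalIds") or {}).get("ArXiv")


if __name__ == "__main__":
    print(paper_url("2402.03300", ["title", "citationCount", "externalIds"]))
```

Requesting only the fields that are needed keeps responses small, which helps when walking long citation or reference lists.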
Complete Research Workflow
- Discover: python scripts/search_arxiv.py "your topic" --sort date --max 10
- Assess impact: curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"
- Read abstract: web_extract(urls=["https://arxiv.org/abs/ID"])
- Read full paper: web_extract(urls=["https://arxiv.org/pdf/ID"])
- Find related work: curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"
- Get recommendations: POST to Semantic Scholar recommendations endpoint
- Track authors: curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"
Rate Limits
| API | Rate | Auth |
|---|---|---|
| arXiv | ~1 req / 3 seconds | None needed |
| Semantic Scholar | 1 req / second | None (100/sec with API key) |
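A script that loops over many papers needs to space its requests to stay inside these limits. The following is a minimal sketch of a fixed-interval throttle; the class name `Throttle` is hypothetical, and the 3-second and 1-second intervals are simply the figures from the table above. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Intervals from the rate-limit table above.
arxiv_throttle = Throttle(3.0)
s2_throttle = Throttle(1.0)
```

Calling `arxiv_throttle.wait()` immediately before each arXiv request, and `s2_throttle.wait()` before each Semantic Scholar request, keeps the two services on separate budgets.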
Notes
- arXiv returns Atom XML; use the helper script or parsing snippet for clean output
- Semantic Scholar returns JSON; pipe through python3 -m json.tool for readability
- arXiv IDs: old format (hep-th/0601001) vs new (2402.03300)
- PDF: https://arxiv.org/pdf/{id}
- Abstract: https://arxiv.org/abs/{id}
- HTML (when available): https://arxiv.org/html/{id}
- For local PDF processing, see the ocr-and-documents skill
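The two ID formats and the URL patterns in the notes above can be captured in a few lines. This is a sketch, not a validator endorsed by arXiv: the regular expressions are assumptions modelled on the two example IDs (hep-th/0601001 and 2402.03300), and the function names are hypothetical.

```python
import re
from typing import Dict

# New style: YYMM.NNNN or YYMM.NNNNN, optional version suffix.
NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old style: archive[.SUBJECT]/YYMMNNN, optional version suffix.
OLD_ID = re.compile(r"^[a-z][a-z-]*(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def id_format(arxiv_id: str) -> str:
    """Classify an ID string as 'new', 'old', or 'unknown'."""
    if NEW_ID.match(arxiv_id):
        return "new"
    if OLD_ID.match(arxiv_id):
        return "old"
    return "unknown"


def paper_urls(arxiv_id: str) -> Dict[str, str]:
    """Return the PDF, abstract, and HTML URLs for an ID (HTML may not exist)."""
    return {
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        "abs": f"https://arxiv.org/abs/{arxiv_id}",
        "html": f"https://arxiv.org/html/{arxiv_id}",
    }


if __name__ == "__main__":
    for candidate in ("2402.03300", "hep-th/0601001", "not-an-id"):
        print(candidate, id_format(candidate))
```

Checking the format before building URLs catches typos early and avoids spending a rate-limited request on an ID that cannot exist.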
ID Versioning
- arxiv.org/abs/1706.03762 always resolves to the latest version
- arxiv.org/abs/1706.03762v1 points to a specific immutable version
- When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may substantially change content)
- The API <id> field returns the versioned URL (e.g., http://arxiv.org/abs/1706.03762v7)
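Separating the base ID from the version suffix makes it straightforward to record exactly which version was read. A minimal sketch, assuming the value is either a bare ID or a URL containing /abs/ as in the example above; `split_version` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple


def split_version(id_or_url: str) -> Tuple[str, Optional[int]]:
    """Split '1706.03762v7' (or its /abs/ URL) into ('1706.03762', 7).

    The version is None when the input carries no vN suffix.
    """
    ident = id_or_url.strip().split("/abs/")[-1]
    # Lazy base match so a trailing v<digits> is treated as the version.
    match = re.fullmatch(r"(.+?)(?:v(\d+))?", ident)
    if match is None:
        raise ValueError(f"empty arXiv identifier: {id_or_url!r}")
    base, version = match.group(1), match.group(2)
    return base, int(version) if version is not None else None


if __name__ == "__main__":
    print(split_version("http://arxiv.org/abs/1706.03762v7"))
    print(split_version("1706.03762"))
```

Storing the pair rather than the raw string lets a citation keep the version that was read while still allowing a lookup of the latest version through the base ID.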
Withdrawn Papers
Papers can be withdrawn after submission. When this happens:
- The <summary> field contains a withdrawal notice (look for "withdrawn" or "retracted")
- Metadata fields may be incomplete
- Always check the summary before treating a result as a valid paper
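The summary check described above can be automated as a simple keyword filter. This is a heuristic sketch only: the function name `looks_withdrawn` is hypothetical, and the two markers are the ones named in the list above.

```python
from typing import Optional

# Markers named in this section; extend the tuple if other wording turns up.
WITHDRAWAL_MARKERS = ("withdrawn", "retracted")


def looks_withdrawn(summary: Optional[str]) -> bool:
    """Return True if the summary text contains a withdrawal marker."""
    text = (summary or "").lower()
    return any(marker in text for marker in WITHDRAWAL_MARKERS)


if __name__ == "__main__":
    print(looks_withdrawn("This paper has been withdrawn by the author."))
    print(looks_withdrawn("We study attention mechanisms."))
```

A plain substring match will also flag legitimate papers whose abstracts discuss retractions, so a positive result is best treated as a prompt to read the summary rather than as a verdict.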