Awesome-Agent-Skills-for-Empirical-Research arxiv-cli-tools
Command-line tools for searching and batch-downloading arXiv papers
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/search/arxiv-cli-tools" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-arxiv-cli-tools && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/literature/search/arxiv-cli-tools/SKILL.md
arXiv CLI Tools
Overview
arxiv-cli-tools is a Python command-line interface for searching and downloading papers from arXiv.org. It wraps the arxiv Python client library into convenient CLI commands, enabling researchers to search by keyword, author, or category, view abstracts, and batch-download PDFs directly from the terminal. No API key is required.
Installation
    # Recommended: isolated install with pipx
    pipx install arxiv-cli-tools

    # Alternative: pip
    pip install arxiv-cli-tools

    # Verify installation
    arxiv-cli --help
Searching Papers
Basic Search
    # Search by keyword (default: 10 results)
    arxiv-cli search "transformer attention mechanism"

    # Limit results
    arxiv-cli search "quantum computing" -n 5

    # Show abstracts in results
    arxiv-cli search "prompt engineering" -n 5 --summary
Filtered Search
    # Filter by author
    arxiv-cli search "attention mechanism" --authors "Vaswani"

    # Filter by arXiv category
    arxiv-cli search "neural networks" --categories "cs.LG,cs.AI"

    # Combine filters
    arxiv-cli search "protein folding" --categories "q-bio" -n 20 --summary
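The filters above correspond to field prefixes in the arXiv API query language (`all:` for any field, `au:` for author, `cat:` for category). How arxiv-cli translates its flags internally is not documented here, so the following is only a sketch of an equivalent query string; `build_query` is a hypothetical helper, not part of the tool.

```python
def build_query(keywords="", authors=(), categories=()):
    """Compose an arXiv API search_query string from optional filters.

    Hypothetical helper: keywords and authors are ANDed together,
    while multiple categories are ORed inside one parenthesised group.
    """
    clauses = []
    # all: matches a term in any metadata field
    clauses += [f"all:{word}" for word in keywords.split()]
    # au: restricts matches to the author field
    clauses += [f"au:{author}" for author in authors]
    if categories:
        cats = " OR ".join(f"cat:{cat}" for cat in categories)
        # Parentheses are only needed when there is more than one category
        clauses.append(f"({cats})" if len(categories) > 1 else cats)
    return " AND ".join(clauses)


print(build_query("neural networks", categories=["cs.LG", "cs.AI"]))
```

The resulting string can be passed as `query` to `arxiv.Search` when the same filtering is needed from Python.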
Common arXiv Categories
| Prefix | Field | Popular Subcategories |
|---|---|---|
| cs | Computer Science | cs.AI, cs.CL, cs.CV, cs.LG, cs.SE |
| math | Mathematics | math.ST, math.OC, math.PR |
| physics | Physics | physics.comp-ph, hep-th, cond-mat (the last two are separate top-level archives) |
| stat | Statistics | stat.ML, stat.ME, stat.TH |
| q-bio | Quantitative Biology | q-bio.BM, q-bio.GN |
| q-fin | Quantitative Finance | q-fin.ST, q-fin.PM |
| econ | Economics | econ.EM, econ.GN |
| eess | Electrical Engineering | eess.SP, eess.AS |
Downloading Papers
Single Paper
    # Download by arXiv ID
    arxiv-cli download --id 1706.03762 --dest ~/papers

    # Download PDF format explicitly
    arxiv-cli download --id 2301.13688 --dest ~/papers --pdf
Batch Download
    # Download multiple papers
    arxiv-cli download --id 1706.03762 --id 2301.13688 --id 2303.08774 \
      --dest ~/papers/transformers

    # Skip already downloaded files
    arxiv-cli download --id 1706.03762 --id 2301.13688 \
      --dest ~/papers --skip-existing
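The idea behind `--skip-existing` can be sketched in a few lines of Python: check the destination directory before fetching anything. The `<id>.pdf` naming scheme below is an assumption made for illustration; the filenames arxiv-cli actually writes may differ, and `pending_downloads` is a hypothetical helper.

```python
from pathlib import Path


def pending_downloads(ids, dest):
    """Return the IDs whose PDF is not yet present in dest, in input order."""
    dest = Path(dest).expanduser()
    pending = []
    for arxiv_id in ids:
        # Assumed naming scheme: "<id>.pdf"; "/" in old-style IDs
        # (e.g. hep-th/9711200) is replaced so the name stays a single file.
        target = dest / (arxiv_id.replace("/", "_") + ".pdf")
        if not target.exists():
            pending.append(arxiv_id)
    return pending
```

Calling `pending_downloads(["1706.03762", "2301.13688"], "~/papers")` returns only the IDs that still need to be fetched, which keeps repeated batch runs from re-requesting files.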
Download from Search Results
A common workflow is to search first, then download selected papers:
    # 1. Search and note IDs
    arxiv-cli search "diffusion models survey" -n 10 --summary

    # 2. Download the relevant ones
    arxiv-cli download --id 2209.00796 --id 2206.00364 --dest ~/papers/diffusion
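Search output often shows a full abstract URL rather than a bare ID, while the download step expects the ID alone. The sketch below pulls the ID out of such a URL; `extract_arxiv_id` is a hypothetical helper, and the pattern covers only the two ID styles arXiv has used (`YYMM.NNNNN` and the older `archive/NNNNNNN`).

```python
import re

# New-style IDs (1706.03762) or old-style IDs (hep-th/9711200),
# optionally followed by a version suffix such as v2.
_ID_PATTERN = re.compile(
    r"(?P<id>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?P<version>v\d+)?$"
)


def extract_arxiv_id(text, keep_version=False):
    """Extract an arXiv ID from a bare ID or an abs/pdf URL ending in one."""
    match = _ID_PATTERN.search(text.strip())
    if match is None:
        raise ValueError(f"no arXiv ID found in {text!r}")
    return match.group(0) if keep_version else match.group("id")


print(extract_arxiv_id("http://arxiv.org/abs/2209.00796v2"))
```

Dropping the version suffix by default requests whatever version arXiv currently serves for that ID; pass `keep_version=True` to pin a specific one.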
Python API Alternative
For programmatic use, the underlying arxiv library provides a Python API:
    import os

    import arxiv

    # Search, newest submissions first
    search = arxiv.Search(
        query="large language models",
        max_results=10,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    results = list(arxiv.Client().results(search))

    for result in results:
        print(f"{result.entry_id}: {result.title}")
        print(f"  Authors: {', '.join(a.name for a in result.authors)}")
        print(f"  Published: {result.published.date()}")
        print(f"  PDF: {result.pdf_url}")
        print()

    # Download the first (most recent) result, if any
    if results:
        os.makedirs("./papers", exist_ok=True)  # target directory must exist
        results[0].download_pdf(dirpath="./papers", filename="paper.pdf")
Workflow Integration
Daily Paper Check Script
    #!/bin/bash
    # Check for new papers in your research area
    DATE=$(date +%Y-%m-%d)
    LOG="$HOME/papers/daily_${DATE}.txt"

    # Make sure the log directory exists before writing to it
    mkdir -p "$HOME/papers"

    echo "=== arXiv Papers for $DATE ===" > "$LOG"
    arxiv-cli search "retrieval augmented generation" \
      --categories "cs.CL,cs.AI" -n 20 --summary >> "$LOG"

    echo "Paper digest saved to $LOG"
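A daily check is more useful when restricted to recent submissions. The arXiv API query language accepts a `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` range in GMT; whether arxiv-cli forwards such a clause unchanged is not confirmed here, so this sketch targets the Python library instead. `submitted_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def submitted_window(now=None, days=1):
    """Build a submittedDate range clause covering the last `days` days (UTC)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    fmt = "%Y%m%d%H%M"  # minute precision, as the API expects
    return f"submittedDate:[{start.strftime(fmt)} TO {now.strftime(fmt)}]"


# Combine the date window with a topic and category filter
query = f"all:retrieval AND cat:cs.CL AND {submitted_window()}"
print(query)
```

The printed string can be used as the `query` argument of `arxiv.Search` to limit a digest to the previous 24 hours.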
Export to BibTeX
After finding relevant papers, retrieve their BibTeX entries from arXiv's BibTeX export URL:
    # Get BibTeX for a specific paper
    curl -s "https://arxiv.org/bibtex/1706.03762"
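To build a `.bib` file for several papers at once, the same URL can be called in a loop with a pause between requests. This is a minimal sketch using only the standard library; it assumes the URL returns plain BibTeX text as in the curl example, and the helper names and `User-Agent` string are made up for illustration.

```python
import time
from urllib.request import Request, urlopen

BIBTEX_BASE = "https://arxiv.org/bibtex/"  # same URL pattern as the curl example


def bibtex_url(arxiv_id):
    """Return the BibTeX export URL for one arXiv ID."""
    return BIBTEX_BASE + arxiv_id.strip()


def fetch_bibtex(arxiv_id, timeout=30):
    """Download the BibTeX entry for one paper (performs a network request)."""
    request = Request(bibtex_url(arxiv_id),
                      headers={"User-Agent": "bibtex-fetch-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def collect_bibtex(ids, fetch=fetch_bibtex, delay=3.0, sleep=time.sleep):
    """Fetch entries one by one, pausing `delay` seconds between requests."""
    entries = []
    for index, arxiv_id in enumerate(ids):
        if index:  # no pause before the first request
            sleep(delay)
        entries.append(fetch(arxiv_id).strip())
    return "\n\n".join(entries) + "\n"
```

For example, `open("refs.bib", "w").write(collect_bibtex(["1706.03762", "2301.13688"]))` would write both entries to one file. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.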
Rate Limits and Etiquette
- The arXiv API allows at most 1 request every 3 seconds for programmatic access
- For bulk downloads, add delays between requests
- The CLI tool respects rate limits by default
- See arXiv API Terms of Use
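For custom scripts that call arXiv directly, the 3-second spacing can be enforced with a small limiter. The class below is an illustrative sketch, not part of arxiv-cli or the arxiv library; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Block until at least `interval` seconds have passed since the last call."""

    def __init__(self, interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep off the difference
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `limiter.wait()` immediately before each request; the first call returns at once and later calls pause only as long as needed.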