# Learn-skills.dev paper-fetch
Use when the user wants to download a paper PDF from a DOI, title, or URL via legal open-access sources. Tries Unpaywall, arXiv, bioRxiv/medRxiv, PubMed Central, and Semantic Scholar in order. Never uses Sci-Hub or paywall bypass.
```shell
git clone https://github.com/NeverSight/learn-skills.dev

# One-line install into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/agents365-ai/paper-fetch/paper-fetch" ~/.claude/skills/neversight-learn-skills-dev-paper-fetch && rm -rf "$T"
```
`data/skills-md/agents365-ai/paper-fetch/paper-fetch/SKILL.md`

# paper-fetch
Fetch the legal open-access PDF for a paper given a DOI (or title). Tries multiple OA sources in priority order and stops at the first hit.
## Resolution order
- Unpaywall — `https://api.unpaywall.org/v2/{doi}?email=$UNPAYWALL_EMAIL`, read `best_oa_location.url_for_pdf` (skipped if `UNPAYWALL_EMAIL` not set)
- Semantic Scholar — `https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf,externalIds`
- arXiv — if `externalIds.ArXiv` present, `https://arxiv.org/pdf/{arxiv_id}.pdf`
- PubMed Central OA — if PMCID present, `https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/`
- bioRxiv / medRxiv — if the DOI prefix is `10.1101`, query `https://api.biorxiv.org/details/{server}/{doi}` for the latest version PDF URL
- Otherwise → report failure with title/authors so the user can request via ILL
If only a title is given, resolve it to a DOI first via Semantic Scholar `search_paper_by_title` (asta MCP) or Crossref.
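The priority chain above can be sketched as a pure helper that derives fallback PDF URLs from a Semantic Scholar record. This is a sketch, not the skill's actual code; the `externalIds.PubMedCentral` field name and the exact URL shapes are assumptions:

```python
def fallback_urls(doi: str, s2_record: dict) -> list[str]:
    """Candidate PDF URLs in the skill's priority order (hypothetical helper).

    `s2_record` is assumed to be the Semantic Scholar graph API response
    for `?fields=openAccessPdf,externalIds`.
    """
    urls = []
    oa = (s2_record.get("openAccessPdf") or {}).get("url")
    if oa:
        urls.append(oa)
    ext = s2_record.get("externalIds") or {}
    if ext.get("ArXiv"):
        urls.append(f"https://arxiv.org/pdf/{ext['ArXiv']}.pdf")
    if ext.get("PubMedCentral"):
        # Assumes S2 returns the PMCID digits without the "PMC" prefix
        urls.append(f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{ext['PubMedCentral']}/pdf/")
    if doi.startswith("10.1101/"):
        # bioRxiv/medRxiv preprint: ask the details API for the latest version
        urls.append(f"https://api.biorxiv.org/details/biorxiv/{doi}")
    return urls
```

Each candidate would then be downloaded in turn, stopping at the first response that is actually a PDF.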
## Usage

```shell
python scripts/fetch.py <DOI> [--out DIR] [--dry-run] [--format json|text]
```
## Flags

| Flag | Default | Description |
|---|---|---|
| `<DOI>` | — | DOI to fetch (positional) |
| `--batch <FILE>` | — | File with one DOI per line for bulk download |
| `--out <DIR>` | `./pdfs/` | Output directory |
| `--dry-run` | off | Resolve sources without downloading; preview the PDF URL and filename |
| `--format` | `json` | Output format: `json` (for agents) or `text` (for humans) |
## Output contract

stdout emits a single JSON object (when `--format json`):
Success (all DOIs resolved):

```json
{
  "ok": true,
  "data": {
    "results": [
      {
        "doi": "10.1038/s41586-020-2649-2",
        "success": true,
        "source": "unpaywall",
        "pdf_url": "https://...",
        "file": "pdfs/Author_2020_Title.pdf",
        "meta": {"title": "...", "year": 2020, "author": "Smith"}
      }
    ],
    "summary": {"total": 1, "succeeded": 1, "failed": 0}
  }
}
```
Partial failure (batch mode — some DOIs failed, exit code 1):

```json
{
  "ok": true,
  "data": {
    "results": [
      {
        "doi": "10.1038/s41586-020-2649-2",
        "success": true,
        "source": "semantic_scholar",
        "pdf_url": "https://...",
        "file": "pdfs/Harris_2020_Array_programming_with_NumPy.pdf",
        "meta": {"title": "Array programming with NumPy", "year": 2020, "author": "Charles R. Harris"}
      },
      {
        "doi": "10.1234/nonexistent",
        "success": false,
        "source": null,
        "pdf_url": null,
        "file": null,
        "meta": {},
        "error": {"code": "not_found", "message": "No open-access PDF found", "retryable": false}
      }
    ],
    "summary": {"total": 2, "succeeded": 1, "failed": 1}
  }
}
```
Top-level failure (bad arguments, exit code 3):

```json
{
  "ok": false,
  "error": {"code": "validation_error", "message": "Provide a DOI or --batch file", "retryable": false}
}
```
stderr carries human-readable progress diagnostics (source attempts, download status).
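This stdout/stderr split can be kept honest with two tiny helpers. A sketch, not the script's actual code:

```python
import json
import sys

def emit(payload: dict) -> None:
    # The machine-readable contract: exactly one JSON object on stdout.
    print(json.dumps(payload))

def log(message: str) -> None:
    # Human diagnostics (source attempts, download status) go to stderr,
    # so piping stdout into a JSON parser always works.
    print(message, file=sys.stderr)
```

Keeping every human-facing message behind `log()` is what guarantees the "single JSON object on stdout" contract.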
## Exit codes

| Code | Meaning |
|---|---|
| `0` | All DOIs resolved successfully |
| `1` | Runtime error (some DOIs failed, network/download issues) |
| `3` | Validation error (bad arguments, missing input) |
## Error codes in JSON

| Code | Meaning | Retryable |
|---|---|---|
| `validation_error` | Bad arguments or empty input | No |
| `not_found` | No open-access PDF found | No |
| | Network failure during download | Yes |
| | Response was not a PDF (HTML landing page) | No |
| | PDF URL host not in allowlist | No |
| | Response exceeded 50 MB limit | No |
| | Local filesystem write failed | No |
| | Unexpected error | No |
## Examples

```shell
# Single DOI (JSON output for agents)
python scripts/fetch.py 10.1038/s41586-020-2649-2

# Dry-run preview (resolve without downloading)
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run

# Human-readable output
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text

# Batch download
python scripts/fetch.py --batch dois.txt --out ./papers

# Works without UNPAYWALL_EMAIL (skips Unpaywall, uses remaining 4 sources)
python scripts/fetch.py 10.1038/s41586-020-2649-2
```
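An agent consuming the output contract might wrap the script like this. A sketch under the assumptions that the script lives at `scripts/fetch.py` and follows the exit codes above:

```python
import json
import subprocess

def parse_contract(stdout: str, returncode: int) -> dict:
    """Interpret the fetcher's JSON contract (hypothetical wrapper)."""
    if returncode == 3:
        # Validation error: bad arguments, nothing was attempted
        raise ValueError(stdout.strip() or "validation error")
    payload = json.loads(stdout)
    if not payload.get("ok"):
        raise RuntimeError(payload.get("error", {}).get("message", "unknown error"))
    # Note: exit code 1 (partial failure) still has ok=true, with
    # per-DOI errors inside data.results.
    return payload["data"]

def fetch_paper(doi: str) -> dict:
    proc = subprocess.run(
        ["python", "scripts/fetch.py", doi, "--format", "json"],
        capture_output=True, text=True,
    )
    return parse_contract(proc.stdout, proc.returncode)
```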
## Notes

- `UNPAYWALL_EMAIL` is optional but recommended. Set it once: `export UNPAYWALL_EMAIL=you@example.com` (e.g. in `~/.zshrc`). Without it, Unpaywall is skipped and the remaining 4 sources are still tried.
- Downloads are restricted to a host allowlist of known OA providers, with a 50 MB size limit per PDF.
- Never attempts to bypass paywalls. If no OA copy exists, the skill reports failure — do not suggest Sci-Hub or similar.
- Default output directory: `./pdfs/`. Filenames: `{first_author}_{year}_{short_title}.pdf`.
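That filename pattern might be produced by something like the following. The sanitization rules (first author's surname, first few title words, non-alphanumerics dropped) are assumptions:

```python
import re

def pdf_filename(author: str, year: int, title: str, max_words: int = 6) -> str:
    """Build {first_author}_{year}_{short_title}.pdf (hypothetical rules)."""
    surname = author.split()[-1]                         # first author's surname
    words = re.findall(r"[A-Za-z0-9]+", title)[:max_words]  # short, safe title
    return f"{surname}_{year}_{'_'.join(words)}.pdf"
```

With the partial-failure example's metadata, this yields `Harris_2020_Array_programming_with_NumPy.pdf`, matching the `file` field shown there.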
## Auto-update

When installed via `git clone`, the skill keeps itself in sync with upstream automatically. On each invocation, `fetch.py` spawns a detached background `git pull --ff-only` in the skill directory:
- Non-blocking — the current invocation is not delayed; the pull runs in a new session and is fully detached
- Silent — all output goes to `/dev/null`; the JSON contract on stdout is never polluted
- Throttled — at most once every 24 hours (stamped via `.git/.paper-fetch-last-update`)
- Safe — `--ff-only` refuses to merge if you have local edits; conflicts never happen
- Convergence — updates apply on the next invocation, not the current one (because the pull is backgrounded)
## Environment variables

| Variable | Default | Purpose |
|---|---|---|
| | unset | Set to any value to completely disable auto-update |
| | `86400` (24 h) | Cooldown in seconds between update attempts |
Auto-update is a no-op when the skill is not a git checkout (e.g. a tarball install), when `git` is unavailable, or when the cooldown stamp is fresh. Force an immediate check with `rm <skill_dir>/.git/.paper-fetch-last-update`.