Learn-skills.dev paper-fetch

Use when the user wants to download a paper PDF from a DOI, title, or URL via legal open-access sources. Tries Unpaywall, Semantic Scholar, arXiv, PubMed Central, and bioRxiv/medRxiv in order. Never uses Sci-Hub or paywall bypass.

install

source · Clone the upstream repo

git clone https://github.com/NeverSight/learn-skills.dev

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/agents365-ai/paper-fetch/paper-fetch" ~/.claude/skills/neversight-learn-skills-dev-paper-fetch && rm -rf "$T"

manifest: data/skills-md/agents365-ai/paper-fetch/paper-fetch/SKILL.md

source content

paper-fetch

Fetch the legal open-access PDF for a paper given a DOI (or title). Tries multiple OA sources in priority order and stops at the first hit.

Resolution order

1. Unpaywall: `https://api.unpaywall.org/v2/{doi}?email=$UNPAYWALL_EMAIL`, read `best_oa_location.url_for_pdf` (skipped if `UNPAYWALL_EMAIL` is not set)
2. Semantic Scholar: `https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf,externalIds`
3. arXiv: if `externalIds.ArXiv` is present, `https://arxiv.org/pdf/{arxiv_id}.pdf`
4. PubMed Central OA: if a PMCID is present, `https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/`
5. bioRxiv / medRxiv: if the DOI prefix is `10.1101`, query `https://api.biorxiv.org/details/{server}/{doi}` for the latest version's PDF URL
6. Otherwise: report failure with title/authors so the user can request the paper via ILL
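The priority walk above can be sketched as follows. This is a minimal illustration, not the code in `fetch.py`: the helper names, the `pick_pdf` signature, and the source labels `arxiv`, `pmc`, and `biorxiv` are assumptions, and the function works on already-fetched JSON responses so that it needs no network access.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def unpaywall_url(doi: str, email: Optional[str]) -> Optional[str]:
    """Unpaywall lookup URL, or None when no email is set (source is skipped)."""
    if not email:
        return None
    return f"https://api.unpaywall.org/v2/{quote(doi)}?email={quote(email, safe='@')}"


def semantic_scholar_url(doi: str) -> str:
    return (
        "https://api.semanticscholar.org/graph/v1/paper/"
        f"DOI:{quote(doi)}?fields=openAccessPdf,externalIds"
    )


def biorxiv_api_url(doi: str, server: str) -> Optional[str]:
    """Details endpoint for bioRxiv/medRxiv DOIs (prefix 10.1101), else None."""
    if not doi.startswith("10.1101/"):
        return None
    return f"https://api.biorxiv.org/details/{server}/{doi}"


def pick_pdf(
    unpaywall: Optional[dict],
    s2: Optional[dict],
    pmcid: Optional[str] = None,
    biorxiv_pdf: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (source, pdf_url) for the first source with a hit, else None."""
    s2 = s2 or {}
    arxiv_id = (s2.get("externalIds") or {}).get("ArXiv")
    candidates = [
        ("unpaywall", ((unpaywall or {}).get("best_oa_location") or {}).get("url_for_pdf")),
        ("semantic_scholar", (s2.get("openAccessPdf") or {}).get("url")),
        ("arxiv", f"https://arxiv.org/pdf/{arxiv_id}.pdf" if arxiv_id else None),
        ("pmc", f"https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/" if pmcid else None),
        ("biorxiv", biorxiv_pdf),
    ]
    for source, url in candidates:
        if url:  # stop at the first hit
            return source, url
    return None  # caller reports failure with title/authors
```

Keeping the candidates in one ordered list makes the priority explicit and easy to reorder.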

If only a title is given, resolve it to a DOI first via Semantic Scholar `search_paper_by_title` (asta MCP) or Crossref.
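For the Crossref route, a title lookup might look like the sketch below. It assumes the public Crossref REST endpoint `https://api.crossref.org/works` with its `query.bibliographic` and `rows` parameters. The function names are hypothetical, and the HTTP request itself is left to the caller.

```python
from typing import Optional
from urllib.parse import urlencode


def crossref_search_url(title: str, rows: int = 1) -> str:
    """Build a Crossref works query for a free-text title."""
    query = urlencode({"query.bibliographic": title, "rows": rows})
    return f"https://api.crossref.org/works?{query}"


def first_doi(response: dict) -> Optional[str]:
    """Pull the DOI of the top-ranked item out of a Crossref works response."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None
```

The top hit is a relevance match, not a guarantee, so comparing the returned title against the requested one before fetching is probably worthwhile.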

Usage

python scripts/fetch.py (<DOI> | --batch FILE) [--out DIR] [--dry-run] [--format json|text]

Flags

Flag	Default	Description
`doi`	—	DOI to fetch (positional, e.g. `10.1038/s41586-020-2649-2`)
`--batch FILE`	—	File with one DOI per line for bulk download
`--out DIR`	`pdfs`	Output directory
`--dry-run`	off	Resolve sources without downloading; preview the PDF URL and filename
`--format`	`json`	Output format: `json` (for agents) or `text` (for humans)
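The flag table maps naturally onto `argparse`. The sketch below is an assumption about how such a parser could be declared, not a copy of `fetch.py`. Note that `argparse` exits with status 2 on its own parse errors, so the documented exit code 3 would need custom handling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        prog="fetch.py",
        description="Fetch the legal open-access PDF for a DOI.",
    )
    parser.add_argument("doi", nargs="?", help="DOI to fetch")
    parser.add_argument("--batch", metavar="FILE", help="file with one DOI per line")
    parser.add_argument("--out", metavar="DIR", default="pdfs", help="output directory")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="resolve sources without downloading",
    )
    parser.add_argument(
        "--format",
        choices=["json", "text"],
        default="json",
        help="output format",
    )
    return parser
```

Making `doi` optional (`nargs="?"`) lets `--batch FILE` stand in for the positional argument; checking that at least one of the two is present is left to the caller.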

Output contract

stdout emits a single JSON object (when `--format json`).

Success (all DOIs resolved):

{
  "ok": true,
  "data": {
    "results": [
      {
        "doi": "10.1038/s41586-020-2649-2",
        "success": true,
        "source": "unpaywall",
        "pdf_url": "https://...",
        "file": "pdfs/Author_2020_Title.pdf",
        "meta": {"title": "...", "year": 2020, "author": "Smith"}
      }
    ],
    "summary": {"total": 1, "succeeded": 1, "failed": 0}
  }
}

Partial failure (batch mode — some DOIs failed, exit code 1):

{
  "ok": true,
  "data": {
    "results": [
      {
        "doi": "10.1038/s41586-020-2649-2",
        "success": true,
        "source": "semantic_scholar",
        "pdf_url": "https://...",
        "file": "pdfs/Harris_2020_Array_programming_with_NumPy.pdf",
        "meta": {"title": "Array programming with NumPy", "year": 2020, "author": "Charles R. Harris"}
      },
      {
        "doi": "10.1234/nonexistent",
        "success": false,
        "source": null,
        "pdf_url": null,
        "file": null,
        "meta": {},
        "error": {"code": "not_found", "message": "No open-access PDF found", "retryable": false}
      }
    ],
    "summary": {"total": 2, "succeeded": 1, "failed": 1}
  }
}

Top-level failure (bad arguments, exit code 3):

{
  "ok": false,
  "error": {
    "code": "validation_error",
    "message": "Provide a DOI or --batch file",
    "retryable": false
  }
}

stderr carries human-readable progress diagnostics (source attempts, download status).
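A caller consuming this contract might split the results as in the sketch below. The function name and the three-way split are illustrative assumptions; only the field names come from the JSON shapes shown above.

```python
import json
from typing import List, Tuple


def split_results(stdout_text: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (downloaded files, retryable DOIs, permanently failed DOIs)."""
    payload = json.loads(stdout_text)
    if not payload.get("ok"):
        # Top-level failure: no per-DOI results to inspect.
        error = payload.get("error", {})
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")

    downloaded, retryable, permanent = [], [], []
    for result in payload["data"]["results"]:
        if result["success"]:
            downloaded.append(result["file"])
        elif result.get("error", {}).get("retryable"):
            retryable.append(result["doi"])
        else:
            permanent.append(result["doi"])
    return downloaded, retryable, permanent
```

Because `ok` stays `true` on partial failure, the per-result `success` flag is what distinguishes a failed DOI from a resolved one.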

Exit codes

Code	Meaning
`0`	All DOIs resolved successfully
`1`	Runtime error (some DOIs failed, network/download issues)
`3`	Validation error (bad arguments, missing input)
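An agent wrapping the script might read the exit code and the stdout payload together. The sketch below uses hypothetical helper names (`fetch_command`, `run_fetch`) and assumes the script is reachable at `scripts/fetch.py` relative to the working directory.

```python
import json
import subprocess
import sys
from typing import List, Optional, Tuple

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "All DOIs resolved successfully",
    1: "Runtime error (some DOIs failed, network/download issues)",
    3: "Validation error (bad arguments, missing input)",
}


def fetch_command(doi: str) -> List[str]:
    """Command line for a single-DOI JSON run (path is an assumption)."""
    return [sys.executable, "scripts/fetch.py", doi, "--format", "json"]


def run_fetch(argv: List[str]) -> Tuple[int, Optional[dict]]:
    """Run a command and return (exit code, parsed stdout JSON or None)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, payload


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"Undocumented exit code {code}")
```

stderr is captured but ignored here, since it only carries progress diagnostics.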

Error codes in JSON

Code	Meaning	Retryable
`validation_error`	Bad arguments or empty input	No
`not_found`	No open-access PDF found	No
`download_network_error`	Network failure during download	Yes
`download_not_a_pdf`	Response was not a PDF (HTML landing page)	No
`download_host_not_allowed`	PDF URL host not in allowlist	No
`download_size_exceeded`	Response exceeded 50 MB limit	No
`download_io_error`	Local filesystem write failed	No
`internal_error`	Unexpected error	No
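Three of the download error codes correspond to checks that can be sketched directly. The allowlist contents below are placeholders, not the skill's real list. The `%PDF-` magic-byte test and the reading of 50 MB as 50 × 2^20 bytes are assumptions, and a real downloader would likely enforce the size cap while streaming instead of after buffering the body.

```python
from typing import Optional, Set
from urllib.parse import urlparse

MAX_BYTES = 50 * 1024 * 1024  # assumed interpretation of the 50 MB limit
ALLOWED_HOSTS = {"arxiv.org", "www.ncbi.nlm.nih.gov", "www.biorxiv.org"}  # hypothetical


def check_download(
    url: str,
    body: bytes,
    allowed_hosts: Set[str] = ALLOWED_HOSTS,
    max_bytes: int = MAX_BYTES,
) -> Optional[str]:
    """Return a documented error code for a rejected download, or None if it passes."""
    host = urlparse(url).hostname or ""
    if host not in allowed_hosts:
        return "download_host_not_allowed"
    if len(body) > max_bytes:
        return "download_size_exceeded"
    if not body.startswith(b"%PDF-"):
        # Typically an HTML landing page served in place of the PDF.
        return "download_not_a_pdf"
    return None
```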

Examples

# Single DOI (JSON output for agents)
python scripts/fetch.py 10.1038/s41586-020-2649-2

# Dry-run preview (resolve without downloading)
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run

# Human-readable output
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text

# Batch download
python scripts/fetch.py --batch dois.txt --out ./papers

# Works without UNPAYWALL_EMAIL (skips Unpaywall, uses remaining 4 sources)
python scripts/fetch.py 10.1038/s41586-020-2649-2

Notes

- `UNPAYWALL_EMAIL` is optional but recommended. Set it once with `export UNPAYWALL_EMAIL=you@example.com` (e.g. in `~/.zshrc`). Without it, Unpaywall is skipped and the remaining 4 sources are still tried.
- Downloads are restricted to a host allowlist of known OA providers, with a 50 MB size limit per PDF.
- Never attempts to bypass paywalls. If no OA copy exists, the skill reports failure; do not suggest Sci-Hub or similar.
- Default output directory: `./pdfs/`. Filenames: `{first_author}_{year}_{short_title}.pdf`
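The filename pattern can be illustrated with the `meta` object from the output contract. The rules below are inferred from the example `Harris_2020_Array_programming_with_NumPy.pdf` and are assumptions: last whitespace-separated token of the author, title words joined by underscores, and a hypothetical `max_words` cap standing in for whatever "short" means in the script.

```python
import re


def build_filename(meta: dict, max_words: int = 8) -> str:
    """Sketch of {first_author}_{year}_{short_title}.pdf from a meta dict."""
    author_parts = (meta.get("author") or "").split()
    # Use the last name token and strip punctuation such as trailing periods.
    author = re.sub(r"\W", "", author_parts[-1]) if author_parts else ""
    author = author or "Unknown"
    year = meta.get("year") or "nd"
    words = re.findall(r"\w+", meta.get("title") or "")[:max_words]
    title = "_".join(words) or "untitled"
    return f"{author}_{year}_{title}.pdf"
```

For example, `build_filename({"title": "Array programming with NumPy", "year": 2020, "author": "Charles R. Harris"})` yields `Harris_2020_Array_programming_with_NumPy.pdf`.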

Auto-update

When installed via `git clone`, the skill keeps itself in sync with upstream automatically. On each invocation, `fetch.py` spawns a detached background `git pull --ff-only` in the skill directory:

- Non-blocking: the current invocation is not delayed; the pull runs in a new session and is fully detached
- Silent: all output goes to `/dev/null`, so the JSON contract on stdout is never polluted
- Throttled: at most once every 24 hours by default (stamped via `.git/.paper-fetch-last-update`)
- Safe: `--ff-only` refuses to merge if your local checkout has diverged from upstream; the pull simply aborts, so merge conflicts never happen
- Convergence: updates apply on the next invocation, not the current one (because the pull is backgrounded)

Environment variables

Variable	Default	Purpose
`PAPER_FETCH_NO_AUTO_UPDATE`	unset	Set to any value to completely disable auto-update
`PAPER_FETCH_UPDATE_INTERVAL`	`86400`	Cooldown in seconds between update attempts

Auto-update is a no-op when the skill is not a git checkout (e.g. tarball install), when `git` is unavailable, or when the cooldown stamp is fresh. Force an immediate check with `rm <skill_dir>/.git/.paper-fetch-last-update`.
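The throttle and the detached pull could be sketched as below. This is an illustration under stated assumptions, not the code in `fetch.py`: the function names are hypothetical, the stamp's modification time is assumed to record the last attempt, and `start_new_session=True` is a POSIX-only way to detach the child.

```python
import os
import shutil
import subprocess
import time
from pathlib import Path
from typing import Mapping, Optional

STAMP_NAME = ".paper-fetch-last-update"


def update_due(
    skill_dir: str,
    now: Optional[float] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """True when an update attempt is allowed for this checkout right now."""
    env = os.environ if env is None else env
    if "PAPER_FETCH_NO_AUTO_UPDATE" in env:  # any value disables auto-update
        return False
    git_dir = Path(skill_dir) / ".git"
    if not git_dir.is_dir():  # tarball install, not a git checkout
        return False
    interval = int(env.get("PAPER_FETCH_UPDATE_INTERVAL", "86400"))
    stamp = git_dir / STAMP_NAME
    now = time.time() if now is None else now
    return not stamp.exists() or now - stamp.stat().st_mtime >= interval


def maybe_update(skill_dir: str) -> bool:
    """Stamp the attempt and spawn a silent, detached `git pull --ff-only`."""
    if shutil.which("git") is None or not update_due(skill_dir):
        return False
    (Path(skill_dir) / ".git" / STAMP_NAME).touch()
    subprocess.Popen(
        ["git", "pull", "--ff-only"],
        cwd=skill_dir,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # keep the JSON contract on stdout clean
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the caller's session
    )
    return True
```

Touching the stamp before spawning the pull means a failed pull still counts against the cooldown, which keeps a broken network from triggering a pull on every invocation.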