Awesome-Agent-Skills-for-Empirical-Research finding-open-access-papers

<!--

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/05-kthorn-research-superpower/research/finding-open-access-papers" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-finding-open-acce && rm -rf "$T"

manifest: skills/05-kthorn-research-superpower/research/finding-open-access-papers/SKILL.md

source content

name: Finding Open Access Papers description: Use Unpaywall API to find free full-text versions of paywalled papers when_to_use: When paper behind paywall. When PMC full text not available. When DOI resolution hits paywall. When need free access to paper. Before giving up on full text access. version: 1.0.0

Finding Open Access Papers

Overview

Use Unpaywall to find legally available open access versions of papers that appear to be behind paywalls.

Core principle: Many paywalled papers have free versions (preprints, author manuscripts, institutional repositories). Unpaywall finds them.

When to Use

Use this skill when:

DOI resolution hits a paywall
Paper not available in PubMed Central
Publisher site requires subscription
Need full text for highly relevant paper (score ≥7)

Use BEFORE giving up on full text access

Unpaywall API

Simple REST API - no authentication required for reasonable usage

Basic Request

curl "https://api.unpaywall.org/v2/DOI?email=YOUR_EMAIL"

Parameters:

```
DOI
```
- The paper's DOI (URL-encoded if needed)
```
email
```
- User's email (required, for courtesy/contact)

IMPORTANT: Ask user for their email at the start of research session. Do NOT use placeholder emails like claude@anthropic.com or researcher@example.com.

Example:

curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=user@example.com"

Response Format

{
  "doi": "10.1038/nature12373",
  "title": "Paper Title",
  "is_oa": true,
  "best_oa_location": {
    "url": "https://europepmc.org/articles/pmc3858213",
    "url_for_pdf": "https://europepmc.org/articles/pmc3858213?pdf=render",
    "version": "publishedVersion",
    "license": "cc-by",
    "host_type": "repository"
  },
  "oa_locations": [
    {
      "url": "https://europepmc.org/articles/pmc3858213",
      "version": "publishedVersion"
    },
    {
      "url": "https://arxiv.org/abs/1234.5678",
      "version": "submittedVersion"
    }
  ]
}

Key Response Fields

is_oa

(boolean)

```
true
```
- Open access version available
```
false
```
- No free version found

best_oa_location

(object or null)

Unpaywall's recommended best open access source
Prioritizes published versions over preprints
Includes PDF URL when available

oa_locations

(array)

All known open access locations
Includes repositories, preprint servers, institutional sites
Ordered by quality/version

version

types:

```
publishedVersion
```
- Final published version (best)
```
acceptedVersion
```
- Author's accepted manuscript (good)
```
submittedVersion
```
- Preprint before peer review (useful)

Implementation Pattern

1. Check Unpaywall After Paywall Hit

# Try DOI first
curl -L "https://doi.org/10.1234/example.2023"

# If paywall detected (403, subscription required, etc):
curl "https://api.unpaywall.org/v2/10.1234/example.2023?email=your@email.com"

2. Extract Best URL

# Parse JSON response
response=$(curl -s "https://api.unpaywall.org/v2/DOI?email=EMAIL")

# Check if OA available
is_oa=$(echo $response | jq -r '.is_oa')

if [ "$is_oa" = "true" ]; then
  # Get best PDF URL
  pdf_url=$(echo $response | jq -r '.best_oa_location.url_for_pdf // .best_oa_location.url')

  # Download
  curl -L -o "papers/paper.pdf" "$pdf_url"
fi

3. Report to User

When OA found:

⚠️ Paper behind paywall at publisher
✓ Found open access version via Unpaywall!
   Source: Europe PMC (published version)
   PDF: https://europepmc.org/articles/pmc3858213?pdf=render
   → Downloading...

When no OA found:

⚠️ Paper behind paywall at publisher
✗ No open access version found via Unpaywall
   Options:
   - Request via institutional access
   - Contact authors for preprint
   - Continue with abstract only

4. Prioritize by Version

If multiple locations available:

Priority order:

```
publishedVersion
```
from publisher or PMC
```
acceptedVersion
```
from institutional repository
```
submittedVersion
```
from preprint server (arXiv, bioRxiv)

Integration with evaluating-paper-relevance

Add to full text fetching workflow:

Stage 2: Fetch Full Text

Try in order:
A. PubMed Central (free full text)
B. DOI resolution → If paywall, try Unpaywall
C. Unpaywall direct lookup
D. Preprints (bioRxiv, arXiv)

Updated workflow:

# 1. Try PMC
pmc_result=$(curl "https://eutils.ncbi.nlm.nih.gov/...")
if has_pmc_fulltext; then
  fetch_pmc
  exit 0
fi

# 2. Try DOI
doi_result=$(curl -L "https://doi.org/$doi")
if is_paywall; then
  # 3. Try Unpaywall
  unpaywall_result=$(curl "https://api.unpaywall.org/v2/$doi?email=$EMAIL")
  if has_oa; then
    fetch_unpaywall_pdf
    exit 0
  fi
fi

# 4. No full text available
report_no_fulltext

Rate Limiting

Free tier (with email):

100,000 requests per day
No hard rate limit, but be respectful
Include email in requests (required)

Best practices:

Add 100ms delay between requests
Cache responses (don't re-check same DOI)
Only check for papers you actually need

Python Helper Example

import requests
import time

def find_open_access(doi, email):
    """
    Find open access version via Unpaywall
    Returns: (pdf_url, version, source) or (None, None, None)
    """
    url = f"https://api.unpaywall.org/v2/{doi}"
    params = {"email": email}

    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()

        if not data.get('is_oa'):
            return None, None, None

        best_loc = data.get('best_oa_location')
        if not best_loc:
            return None, None, None

        pdf_url = best_loc.get('url_for_pdf') or best_loc.get('url')
        version = best_loc.get('version', 'unknown')
        source = best_loc.get('host_type', 'unknown')

        return pdf_url, version, source

    except Exception as e:
        print(f"Error checking Unpaywall for {doi}: {e}")
        return None, None, None

# Usage
doi = "10.1038/nature12373"
pdf_url, version, source = find_open_access(doi, "researcher@example.com")

if pdf_url:
    print(f"Found {version} at {source}")
    print(f"PDF: {pdf_url}")
    # Download PDF
    response = requests.get(pdf_url)
    with open(f'papers/{doi.replace("/", "_")}.pdf', 'wb') as f:
        f.write(response.content)
else:
    print("No open access version found")

time.sleep(0.1)  # Rate limiting

Common Sources Found

Repositories:

Europe PMC / PubMed Central
Institutional repositories (university sites)
PubMed Central international mirrors

Preprint servers:

bioRxiv (biology)
medRxiv (medicine)
arXiv (physics, CS, math)
ChemRxiv (chemistry)

Publisher sites:

Open access journals
Hybrid journals (OA articles in subscription journals)
Delayed open access (embargo expired)

Error Handling

DOI not found:

{
  "error": "true",
  "message": "DOI not found"
}

→ Check DOI format, try alternative identifiers

Network errors:

Retry with exponential backoff
Maximum 3 attempts
Report to user if all fail

Malformed response:

Check for
```
is_oa
```
field
Fallback to
```
oa_locations
```
array if
```
best_oa_location
```
missing

Quick Reference

Task	Command
Check if OA available	`curl "https://api.unpaywall.org/v2/DOI?email=EMAIL"`
Get best PDF URL	Parse `.best_oa_location.url_for_pdf`
List all OA sources	Parse `.oa_locations[]`
Check version type	Look at `.version` field
Download PDF	`curl -L -o paper.pdf "$pdf_url"`

Integration Points

Called by:

```
evaluating-paper-relevance
```
- When full text not in PMC
```
answering-research-questions
```
- For highly relevant papers

Updates:

```
papers-reviewed.json
```
- Note if OA found
```
SUMMARY.md
```
- Include OA source info

Common Mistakes

Using placeholder email: Using claude@anthropic.com or researcher@example.com → Ask user for their real email Not including email: Required parameter, requests will fail Checking every paper: Only check when needed (score ≥7, no PMC) Ignoring version type: Published version better than preprint Single source only: Check

oa_locations

array for alternatives No rate limiting: Add delays even though no hard limit

Success Criteria

Successful when:

Paywalled paper's OA version found and downloaded
Version type recorded (published/accepted/submitted)
User informed about source and version
Fallback options provided if no OA available

Next Steps

After finding OA version:

Download PDF to papers/ folder
Note source and version in SUMMARY.md
Continue with deep dive analysis
If no OA: note in summary, continue with abstract only