Claude-skill-registry Finding Open Access Papers
Use Unpaywall API to find free full-text versions of paywalled papers
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/finding-open-access-papers" ~/.claude/skills/majiayu000-claude-skill-registry-finding-open-access-papers && rm -rf "$T"
skills/data/finding-open-access-papers/SKILL.md
Finding Open Access Papers
Overview
Use Unpaywall to find legally available open access versions of papers that appear to be behind paywalls.
Core principle: Many paywalled papers have free versions (preprints, author manuscripts, institutional repositories). Unpaywall finds them.
When to Use
Use this skill when:
- DOI resolution hits a paywall
- Paper not available in PubMed Central
- Publisher site requires subscription
- Need full text for highly relevant paper (score ≥7)
Use BEFORE giving up on full text access
Unpaywall API
Simple REST API - no authentication required for reasonable usage
Basic Request
curl "https://api.unpaywall.org/v2/DOI?email=YOUR_EMAIL"
Parameters:
- DOI: The paper's DOI (URL-encoded if needed)
- email: User's email (required, for courtesy/contact)
IMPORTANT: Ask user for their email at the start of research session. Do NOT use placeholder emails like claude@anthropic.com or researcher@example.com.
Example:
curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=user@example.com"
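The request above can also be assembled programmatically. The following is a minimal sketch, assuming only the Python standard library; `build_unpaywall_url` and `UNPAYWALL_BASE` are hypothetical names introduced here for illustration, and the `@` check is a light sanity check, not real email validation.

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2"  # base URL taken from the curl example


def build_unpaywall_url(doi: str, email: str) -> str:
    """Return the lookup URL for one DOI (hypothetical helper)."""
    if not email or "@" not in email:
        raise ValueError("a real contact email is required")
    # Keep "/" literal (every DOI contains one); percent-encode other unsafe characters.
    encoded_doi = quote(doi.strip(), safe="/")
    return f"{UNPAYWALL_BASE}/{encoded_doi}?{urlencode({'email': email})}"
```

Note that `urlencode` percent-encodes the `@` in the address as `%40`, which is a valid form for a query parameter.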
Response Format
{
  "doi": "10.1038/nature12373",
  "title": "Paper Title",
  "is_oa": true,
  "best_oa_location": {
    "url": "https://europepmc.org/articles/pmc3858213",
    "url_for_pdf": "https://europepmc.org/articles/pmc3858213?pdf=render",
    "version": "publishedVersion",
    "license": "cc-by",
    "host_type": "repository"
  },
  "oa_locations": [
    {
      "url": "https://europepmc.org/articles/pmc3858213",
      "version": "publishedVersion"
    },
    {
      "url": "https://arxiv.org/abs/1234.5678",
      "version": "submittedVersion"
    }
  ]
}
Key Response Fields
is_oa (boolean)
- true: Open access version available
- false: No free version found
best_oa_location (object or null)
- Unpaywall's recommended best open access source
- Prioritizes published versions over preprints
- Includes PDF URL when available
oa_locations (array)
- All known open access locations
- Includes repositories, preprint servers, institutional sites
- Ordered by quality/version
version types:
- publishedVersion: Final published version (best)
- acceptedVersion: Author's accepted manuscript (good)
- submittedVersion: Preprint before peer review (useful)
Implementation Pattern
1. Check Unpaywall After Paywall Hit
# Try DOI first
curl -L "https://doi.org/10.1234/example.2023"

# If paywall detected (403, subscription required, etc):
curl "https://api.unpaywall.org/v2/10.1234/example.2023?email=your@email.com"
2. Extract Best URL
# Parse JSON response
response=$(curl -s "https://api.unpaywall.org/v2/DOI?email=EMAIL")

# Check if OA available
is_oa=$(echo "$response" | jq -r '.is_oa')

if [ "$is_oa" = "true" ]; then
  # Get best PDF URL (falls back to the landing page URL if there is no direct PDF)
  pdf_url=$(echo "$response" | jq -r '.best_oa_location.url_for_pdf // .best_oa_location.url')

  # Download (create the target folder first)
  mkdir -p papers
  curl -L -o "papers/paper.pdf" "$pdf_url"
fi
3. Report to User
When OA found:
⚠️ Paper behind paywall at publisher
✓ Found open access version via Unpaywall!
Source: Europe PMC (published version)
PDF: https://europepmc.org/articles/pmc3858213?pdf=render
→ Downloading...
When no OA found:
⚠️ Paper behind paywall at publisher
✗ No open access version found via Unpaywall
Options:
- Request via institutional access
- Contact authors for preprint
- Continue with abstract only
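The two report variants above could be generated from a single location object. This is a rough sketch only: `format_oa_report` and `VERSION_LABELS` are hypothetical names, the input is assumed to be a `best_oa_location`-style dict (or `None` when nothing was found), and the source line shows the raw `host_type` value, not a friendly name such as "Europe PMC".

```python
# Hypothetical mapping from Unpaywall version values to readable labels.
VERSION_LABELS = {
    "publishedVersion": "published version",
    "acceptedVersion": "accepted manuscript",
    "submittedVersion": "preprint",
}


def format_oa_report(location):
    """Build the user-facing message from a location dict, or None if no OA copy exists."""
    lines = ["⚠️ Paper behind paywall at publisher"]
    if not location:
        lines += [
            "✗ No open access version found via Unpaywall",
            "Options:",
            "- Request via institutional access",
            "- Contact authors for preprint",
            "- Continue with abstract only",
        ]
        return "\n".join(lines)

    version = location.get("version")
    label = VERSION_LABELS.get(version, version or "unknown version")
    link = location.get("url_for_pdf") or location.get("url")
    lines += [
        "✓ Found open access version via Unpaywall!",
        f"Source: {location.get('host_type', 'unknown')} ({label})",
        f"PDF: {link}",
    ]
    return "\n".join(lines)
```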
4. Prioritize by Version
If multiple locations available:
Priority order:
1. publishedVersion from publisher or PMC
2. acceptedVersion from institutional repository
3. submittedVersion from preprint server (arXiv, bioRxiv)
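The priority order above can be applied to an `oa_locations` array with a stable sort. A minimal sketch, assuming each entry is a dict with a `version` key as in the response example; `VERSION_PRIORITY` and `rank_locations` are hypothetical names, and entries with a missing or unrecognized version are placed last.

```python
# Lower number = preferred version.
VERSION_PRIORITY = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def rank_locations(oa_locations):
    """Return locations sorted best-first by version; unknown versions go last."""
    return sorted(
        oa_locations or [],
        key=lambda loc: VERSION_PRIORITY.get(loc.get("version"), len(VERSION_PRIORITY)),
    )


locations = [
    {"url": "https://arxiv.org/abs/1234.5678", "version": "submittedVersion"},
    {"url": "https://europepmc.org/articles/pmc3858213", "version": "publishedVersion"},
]
best = rank_locations(locations)[0]
print(best["url"])
```

Because Python's `sorted` is stable, locations that share a version keep the order Unpaywall returned them in.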
Integration with evaluating-paper-relevance
Add to full text fetching workflow:
Stage 2: Fetch Full Text
Try in order:
A. PubMed Central (free full text)
B. DOI resolution → If paywall, try Unpaywall
C. Unpaywall direct lookup
D. Preprints (bioRxiv, arXiv)
Updated workflow:
# 1. Try PMC
pmc_result=$(curl "https://eutils.ncbi.nlm.nih.gov/...")
if has_pmc_fulltext; then
  fetch_pmc
  exit 0
fi

# 2. Try DOI
doi_result=$(curl -L "https://doi.org/$doi")
if is_paywall; then
  # 3. Try Unpaywall
  unpaywall_result=$(curl "https://api.unpaywall.org/v2/$doi?email=$EMAIL")
  if has_oa; then
    fetch_unpaywall_pdf
    exit 0
  fi
fi

# 4. No full text available
report_no_fulltext
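The same try-in-order idea can be expressed in Python as a chain of source functions. This is a sketch under stated assumptions: `fetch_full_text` is a hypothetical helper, and each fetcher is assumed to take a DOI and return a URL or file path on success and `None` otherwise. The lambdas in the usage lines are stubs, not real PMC or Unpaywall clients.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a DOI and returns a location (URL or path), or None if it has nothing.
Fetcher = Callable[[str], Optional[str]]


def fetch_full_text(
    doi: str, sources: Iterable[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetcher) in order; return (name, location) of the first hit."""
    for name, fetch in sources:
        try:
            location = fetch(doi)
        except Exception as exc:  # one failing source should not stop the chain
            print(f"{name} failed for {doi}: {exc}")
            continue
        if location:
            return name, location
    return None, None  # caller reports "no full text available"


# Usage with stub fetchers standing in for the real lookups.
sources = [
    ("PubMed Central", lambda doi: None),
    ("Unpaywall", lambda doi: f"https://example.org/{doi}.pdf"),
]
source_name, location = fetch_full_text("10.1234/example.2023", sources)
```

Keeping the sources as an ordered list makes it easy to add a preprint lookup at the end without touching the loop.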
Rate Limiting
Free tier (with email):
- 100,000 requests per day
- No hard rate limit, but be respectful
- Include email in requests (required)
Best practices:
- Add 100ms delay between requests
- Cache responses (don't re-check same DOI)
- Only check for papers you actually need
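The delay and caching practices above can be combined in a small wrapper around whatever lookup function is in use. A minimal sketch: `ThrottledCache` is a hypothetical class, the cache lives in memory only, and the `clock` and `sleep` parameters exist so the behavior can be tested without real waiting.

```python
import time


class ThrottledCache:
    """Wrap a lookup function with a per-DOI cache and a minimum gap between real calls."""

    def __init__(self, lookup, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self._lookup = lookup
        self._min_interval = min_interval  # seconds; 0.1 matches the 100ms suggestion
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, doi):
        key = doi.strip().lower()  # DOIs are case-insensitive, so normalize the cache key
        if key in self._cache:
            return self._cache[key]  # cached: no request, no delay
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = self._lookup(doi)
        self._last_call = self._clock()
        self._cache[key] = result
        return result
```

A lookup that raises is not cached, so a later call for the same DOI will try again.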
Python Helper Example
import os
import time

import requests


def find_open_access(doi, email):
    """
    Find open access version via Unpaywall.

    Returns: (pdf_url, version, source) or (None, None, None)
    """
    url = f"https://api.unpaywall.org/v2/{doi}"
    params = {"email": email}

    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        # Network/HTTP failure, or a body that is not valid JSON
        print(f"Error checking Unpaywall for {doi}: {e}")
        return None, None, None

    if not data.get('is_oa'):
        return None, None, None

    best_loc = data.get('best_oa_location')
    if not best_loc:
        return None, None, None

    # Prefer the direct PDF link; fall back to the landing page URL
    pdf_url = best_loc.get('url_for_pdf') or best_loc.get('url')
    version = best_loc.get('version', 'unknown')
    source = best_loc.get('host_type', 'unknown')
    return pdf_url, version, source


# Usage
doi = "10.1038/nature12373"
email = "YOUR_EMAIL"  # ask the user for their real address; do not use a placeholder
pdf_url, version, source = find_open_access(doi, email)

if pdf_url:
    print(f"Found {version} at {source}")
    print(f"PDF: {pdf_url}")

    # Download PDF
    os.makedirs("papers", exist_ok=True)
    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()
    with open(f'papers/{doi.replace("/", "_")}.pdf', 'wb') as f:
        f.write(response.content)
else:
    print("No open access version found")

time.sleep(0.1)  # Rate limiting: pause before the next lookup
Common Sources Found
Repositories:
- Europe PMC / PubMed Central
- Institutional repositories (university sites)
- PubMed Central international mirrors
Preprint servers:
- bioRxiv (biology)
- medRxiv (medicine)
- arXiv (physics, CS, math)
- ChemRxiv (chemistry)
Publisher sites:
- Open access journals
- Hybrid journals (OA articles in subscription journals)
- Delayed open access (embargo expired)
Error Handling
DOI not found:
{ "error": "true", "message": "DOI not found" }
→ Check DOI format, try alternative identifiers
Network errors:
- Retry with exponential backoff
- Maximum 3 attempts
- Report to user if all fail
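The retry policy above (exponential backoff, at most 3 attempts, then report) might look like the following. This is a sketch, not a prescribed implementation: `with_retries` is a hypothetical helper, the 1-second base delay is an assumption, and `retry_on` defaults to `OSError`, which covers Python's built-in connection and timeout errors. When calling through the `requests` library, passing `requests.RequestException` would be the natural choice.

```python
import time


def with_retries(func, max_attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay, then 2x, 4x, ... and try again.

    Re-raises the last error once max_attempts is reached so the caller can report it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # all attempts failed: let the caller inform the user
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Errors outside `retry_on` (for example a `ValueError` from a malformed DOI) propagate immediately, since retrying would not help.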
Malformed response:
- Check for is_oa field
- Fall back to oa_locations array if best_oa_location is missing
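The malformed-response checks can be sketched as a single function that validates the payload and falls back from `best_oa_location` to the `oa_locations` array. `pick_location` is a hypothetical name, and raising `ValueError` for a payload without `is_oa` is one possible design, chosen here so that "malformed" and "no OA copy" stay distinguishable.

```python
def _has_link(loc):
    """True if loc is a dict carrying a PDF URL or a landing-page URL."""
    return isinstance(loc, dict) and bool(loc.get("url_for_pdf") or loc.get("url"))


def pick_location(data):
    """Return a usable location dict from a parsed Unpaywall response, or None.

    Raises ValueError if the payload lacks the is_oa field (treated as malformed).
    """
    if not isinstance(data, dict) or "is_oa" not in data:
        raise ValueError("malformed Unpaywall response: missing is_oa")
    if not data["is_oa"]:
        return None  # well-formed, but no open access copy

    best = data.get("best_oa_location")
    if _has_link(best):
        return best

    # best_oa_location missing or unusable: fall back to the first usable entry.
    for loc in data.get("oa_locations") or []:
        if _has_link(loc):
            return loc
    return None
```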
Quick Reference
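| Task | Command |
|---|---|
| Check if OA available | curl "https://api.unpaywall.org/v2/DOI?email=EMAIL" |
| Get best PDF URL | Parse best_oa_location.url_for_pdf |
| List all OA sources | Parse oa_locations array |
| Check version type | Look at version field |
| Download PDF | curl -L -o "papers/paper.pdf" "$pdf_url" |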
Integration Points
Called by:
- evaluating-paper-relevance: When full text not in PMC
- answering-research-questions: For highly relevant papers
Updates:
- papers-reviewed.json: Note if OA found
- SUMMARY.md: Include OA source info
Common Mistakes
- Using placeholder email: Using claude@anthropic.com or researcher@example.com → Ask user for their real email
- Not including email: Required parameter, requests will fail
- Checking every paper: Only check when needed (score ≥7, no PMC)
- Ignoring version type: Published version better than preprint
- Single source only: Check oa_locations array for alternatives
- No rate limiting: Add delays even though no hard limit
Success Criteria
Successful when:
- Paywalled paper's OA version found and downloaded
- Version type recorded (published/accepted/submitted)
- User informed about source and version
- Fallback options provided if no OA available
Next Steps
After finding OA version:
- Download PDF to papers/ folder
- Note source and version in SUMMARY.md
- Continue with deep dive analysis
- If no OA: note in summary, continue with abstract only