Skills pmc-harvest
Fetch articles from PubMed Central using NCBI APIs. Search journals, retrieve full text via OAI-PMH, batch harvest for RAG pipelines. No API key required.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/angusthefuzz/pmc-harvest" ~/.claude/skills/openclaw-skills-pmc-harvest && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/angusthefuzz/pmc-harvest" ~/.openclaw/skills/openclaw-skills-pmc-harvest && rm -rf "$T"
manifest:
skills/angusthefuzz/pmc-harvest/SKILL.md
source content
PMC Harvest
Fetch full-text articles from PubMed Central using official NCBI APIs.
Features
- E-utilities search — Find articles by journal, year, query
- OAI-PMH full text — Retrieve complete article XML (open access only)
- Batch harvesting — Process multiple journals at once
- Abstract fetch — Lightweight retrieval for review queues
- No API key required — Uses public NCBI APIs (rate-limited)
Usage
    # Search a journal
    node {baseDir}/scripts/pmc-harvest.js --search "J Stroke[journal]" --year 2025

    # Fetch full text for a specific article
    node {baseDir}/scripts/pmc-harvest.js --fetch PMC12345678

    # Batch harvest from multiple journals
    node {baseDir}/scripts/pmc-harvest.js --harvest journals.json --year 2025

    # Test with known journals
    node {baseDir}/scripts/pmc-harvest.js --test
Options
| Flag | Description |
|---|---|
| --search | PMC search query (use the "Name"[journal] format, e.g. "J Stroke"[journal]) |
| --year | Filter by publication year |
|  | Max results (default: 100) |
| --fetch | Fetch full text for a specific PMCID |
| --harvest | Batch harvest from a JSON journal list |
| --test | Run a test with sample journals |
Programmatic API
    const pmc = require('{baseDir}/lib/api.js');

    // The calls below use await, so run them inside an async function.

    // Search
    const { count, pmcids } = await pmc.searchJournal('"J Stroke"[journal]', { year: 2025 });

    // Get summaries
    const summaries = await pmc.getSummaries(pmcids);

    // Fetch full text
    const { available, xml, reason } = await pmc.fetchFullText('PMC12345678');

    // Parse JATS XML
    const { title, abstract, body } = pmc.parseJATS(xml);

    // Fetch abstract only (lightweight); resolves to { title, abstract }
    const abstractOnly = await pmc.fetchAbstract('PMC12345678');
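To give a rough idea of what extracting a title, abstract, and body from JATS XML involves, here is a minimal sketch. It is not the skill's parseJATS: the helper names (stripTags, firstElementText, extractJatsParts) are hypothetical, and it assumes the standard JATS elements article-title, abstract, and body. It uses regular expressions for brevity, which will mishandle nested same-name elements, so a real pipeline would more likely use an XML parser.

```javascript
// Hypothetical illustration only; not the skill's parseJATS implementation.

// Replace tags with spaces and collapse whitespace to get plain text.
function stripTags(fragment) {
  return fragment.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Return the plain text of the first <tag>...</tag> element, or null if absent.
function firstElementText(xml, tag) {
  const pattern = new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`);
  const match = xml.match(pattern);
  return match ? stripTags(match[1]) : null;
}

// Pull the three parts named in the Programmatic API section.
function extractJatsParts(xml) {
  return {
    title: firstElementText(xml, 'article-title'),
    abstract: firstElementText(xml, 'abstract'),
    body: firstElementText(xml, 'body'),
  };
}
```

Returning null for a missing element lets a caller decide whether to skip the article or fall back to an abstract-only fetch.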
Journal Query Examples
    const queries = {
      'Stroke': '"Stroke"[journal]',
      'Journal of Stroke': '"J Stroke"[journal]',
      'Stroke & Vascular Neurology': '"Stroke Vasc Neurol"[journal]',
      'European Stroke Journal': '"Eur Stroke J"[journal]',
      'BMC Neurology': '"BMC Neurol"[journal]'
    };
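For readers curious how a journal query like the ones above could map onto an E-utilities request, the sketch below builds an esearch.fcgi URL. The helper name buildSearchUrl is hypothetical, and whether the skill constructs its requests this way is an assumption. The parameters themselves (db, term, retmax, retmode, datetype, mindate, maxdate) are standard E-utilities ESearch parameters.

```javascript
// Base URL as listed in this skill's API reference.
const EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

// Hypothetical helper: build an esearch.fcgi URL for a PMC journal query.
// `year` restricts results to one publication year; `retmax` caps the result count.
function buildSearchUrl(term, { year, retmax = 100 } = {}) {
  const params = new URLSearchParams({
    db: 'pmc',
    term,
    retmax: String(retmax),
    retmode: 'json',
  });
  if (year !== undefined) {
    // Assumption: filter by publication date using ESearch's date-range parameters.
    params.set('datetype', 'pdat');
    params.set('mindate', String(year));
    params.set('maxdate', String(year));
  }
  return `${EUTILS_BASE}/esearch.fcgi?${params.toString()}`;
}
```

Using URLSearchParams keeps the quotes and brackets in the journal term correctly percent-encoded, which is easy to get wrong with string concatenation.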
Limitations
- OAI-PMH only returns open-access articles — restricted content unavailable
- Rate limits — ~3 requests/second without API key
- Peak hours — NCBI recommends avoiding 5AM-9PM ET for large batches
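One way to stay under the roughly 3 requests/second limit when calling the API in a loop is to space out request start times. The sketch below is an illustration under that assumption, not part of the skill: createRateLimiter, reserve, and wrap are hypothetical names, and the injectable clock exists only to make the logic easy to test.

```javascript
// Hypothetical helper: ensure no more than `perSecond` calls start each second.
// `now` is injectable so the slot logic can be tested with a fixed clock.
function createRateLimiter(perSecond = 3, now = () => Date.now()) {
  const minIntervalMs = Math.ceil(1000 / perSecond);
  let nextSlot = 0; // earliest time (ms) at which the next call may start

  // Reserve the next free slot and return how long the caller must wait (ms).
  function reserve() {
    const current = now();
    const start = Math.max(current, nextSlot);
    nextSlot = start + minIntervalMs;
    return start - current;
  }

  // Wrap a promise-returning function so each call waits for its reserved slot.
  function wrap(fn) {
    return async (...args) => {
      const waitMs = reserve();
      if (waitMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
      return fn(...args);
    };
  }

  return { reserve, wrap };
}
```

A caller could wrap a function such as pmc.fetchFullText once with limiter.wrap and then call the wrapped version freely; bursts are queued into evenly spaced slots instead of being rejected.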
API Reference
This skill wraps NCBI's official APIs:
- E-utilities: https://eutils.ncbi.nlm.nih.gov/entrez/eutils
  - esearch.fcgi: Search PMC
  - esummary.fcgi: Get article metadata
- OAI-PMH: https://pmc.ncbi.nlm.nih.gov/api/oai/v1/mh
  - GetRecord: Fetch full text XML
Full docs: https://www.ncbi.nlm.nih.gov/books/NBK25501/
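As an illustration of what a GetRecord request against the OAI-PMH endpoint listed above might look like, here is a small sketch. The helper name buildGetRecordUrl is hypothetical. The identifier scheme (oai:pubmedcentral.nih.gov:<numeric id>) and the pmc metadata prefix follow PMC's general OAI-PMH conventions; treating them as what this skill sends is an assumption.

```javascript
// Endpoint as listed in this skill's API reference.
const OAI_BASE = 'https://pmc.ncbi.nlm.nih.gov/api/oai/v1/mh';

// Hypothetical helper: build an OAI-PMH GetRecord URL for one PMCID.
// Accepts 'PMC12345678' or '12345678'; throws on anything that is not numeric.
function buildGetRecordUrl(pmcid, metadataPrefix = 'pmc') {
  const numericId = String(pmcid).replace(/^PMC/i, '');
  if (!/^\d+$/.test(numericId)) {
    throw new Error(`Not a PMCID: ${pmcid}`);
  }
  const params = new URLSearchParams({
    verb: 'GetRecord',
    // Assumption: PMC's OAI identifier scheme for a single article.
    identifier: `oai:pubmedcentral.nih.gov:${numericId}`,
    metadataPrefix,
  });
  return `${OAI_BASE}?${params.toString()}`;
}
```

Validating the ID before building the URL avoids spending a rate-limited request on input that cannot resolve to an article.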