git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/05-kthorn-research-superpower/research/searching-literature" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-searching-literat && rm -rf "$T"
skills/05-kthorn-research-superpower/research/searching-literature/SKILL.md
name: Searching Scientific Literature
description: PubMed search with keyword optimization, result parsing, and metadata extraction
when_to_use: When starting literature search. When user asks about papers, publications, studies. When need to find scientific articles. When building initial paper list for research question.
version: 1.0.0
Searching Scientific Literature
Overview
Search PubMed for scientific literature using optimized queries. Extract metadata and prepare papers for relevance evaluation.
Core principle: Cast a wide enough net to find relevant papers, but use targeted keywords to keep results manageable.
When to Use
Use this skill when:
- Starting a new research question
- User asks "find papers about..."
- Need initial paper set for evaluation
- Searching for specific methods, compounds, diseases, techniques
Search Strategy
1. Parse User Query
Extract:
- Keywords: Main concepts (e.g., "BTK inhibitor", "selectivity", "kinase")
- Data types: What user needs (IC50 values, methods, structures, results)
- Constraints: Date ranges, specific journals, author names
- Synonyms: Alternative terms (e.g., "Bruton's tyrosine kinase" = "BTK")
2. Construct PubMed Query
Boolean operators:
- AND - narrow results (must have both terms)
- OR - broaden results (either term)
- NOT - exclude terms
Example queries:
"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]
("kinase inhibitor" OR "protein kinase") AND (selectivity OR "off-target")
"ibrutinib"[Title/Abstract] AND ("IC50" OR "inhibitory concentration")
Field tags:
- [Title/Abstract] - search title and abstract only
- [Title] - title only (more precise)
- [Author] - specific author
- [Journal] - specific journal
- [Date - Publication] (short form [dp]) - publication date range, e.g. 2020:2023[dp]
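The Boolean operators and field tags above can be combined programmatically. The following is a minimal sketch, not part of the skill itself: `tag` and `build_query` are hypothetical helper names, and the rule of quoting any term that is not purely alphanumeric is an assumption chosen to mirror the example queries.

```python
from typing import List, Optional


def tag(term: str, field: Optional[str] = None) -> str:
    """Quote phrases (anything not purely alphanumeric) and append an optional field tag."""
    quoted = term if term.isalnum() else f'"{term}"'
    return f"{quoted}[{field}]" if field else quoted


def build_query(concepts: List[List[str]], field: Optional[str] = None) -> str:
    """AND the concepts together; OR the synonyms listed inside each concept."""
    groups = []
    for synonyms in concepts:
        if not synonyms:
            continue  # skip empty concept groups
        terms = [tag(s, field) for s in synonyms]
        # Parenthesize only when a concept has more than one synonym.
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


print(build_query([["ibrutinib"], ["IC50", "inhibitory concentration"]]))
```

Grouping synonyms per concept keeps the "wide net" (OR) and the "targeted keywords" (AND) decisions separate, so either can be adjusted without rewriting the whole query string.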
3. Execute Search
API endpoint:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=YOUR_QUERY&retmax=100&retmode=json&sort=relevance
Parameters:
- db=pubmed - search PubMed database
- term= - your query (URL encode spaces and special chars)
- retmax=100 - max results (start with 100)
- retmode=json - return JSON
- sort=relevance - most relevant first (or pub_date for newest)
Example bash:
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=BTK+inhibitor+selectivity&retmax=100&retmode=json&sort=relevance"
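When the query contains quotes, brackets, or slashes, hand-encoding it for curl is error-prone. As a sketch, the standard library's `urlencode` can assemble the same URL; `esearch_url` is a hypothetical helper name, and the block only builds the URL without sending a request.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 100, sort: str = "relevance") -> str:
    """Return an esearch URL with the query percent-encoded (spaces become +)."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": retmax,
        "retmode": "json",
        "sort": sort,
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Field-tagged query: quotes, brackets and the slash are all escaped.
print(esearch_url('"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]'))
```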
Response format:
{ "esearchresult": { "count": "156", "retmax": "100", "idlist": ["12345678", "87654321", ...] } }
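A response of this shape can be reduced to the two values the later steps need: the total hit count and the PMIDs on the current page. The sketch below assumes the body has already been downloaded; `parse_esearch` is a hypothetical name.

```python
import json
from typing import List, Tuple


def parse_esearch(body: str) -> Tuple[int, List[str]]:
    """Return (total hit count, PMIDs on this page) from an esearch JSON body."""
    result = json.loads(body)["esearchresult"]
    # esearch reports count as a string; convert it so it can be compared.
    return int(result["count"]), list(result.get("idlist", []))


sample = '{"esearchresult": {"count": "156", "retmax": "100", "idlist": ["12345678", "87654321"]}}'
total, pmids = parse_esearch(sample)
print(f"{total} hits, {len(pmids)} PMIDs on this page")
```

Note that the count can exceed the number of returned IDs, since retmax caps the page size.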
4. Fetch Paper Metadata
API endpoint:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=12345678,87654321&retmode=json
Extract from response:
- Title
- Authors (list)
- Journal name
- Publication date
- Abstract (esummary does not return abstract text; fetch it with a separate efetch call)
- PMID
- DOI (if available in articleids)
Getting DOI from PMID:
"articleids": [ {"idtype": "pubmed", "value": "12345678"}, {"idtype": "doi", "value": "10.1234/example.2023"} ]
If DOI missing:
- Use PMID as fallback identifier
- Try to resolve DOI via PubMed Central or publisher APIs later
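The DOI lookup and PMID fallback described above might be sketched as follows. `find_doi` and `paper_key` are hypothetical names, and the `pmid:` prefix on fallback keys is an assumption added here so the two identifier types cannot be confused; the skill does not prescribe a key format.

```python
from typing import Dict, List, Optional


def find_doi(articleids: List[Dict[str, str]]) -> Optional[str]:
    """Return the DOI from an esummary articleids list, or None if absent."""
    for entry in articleids:
        if entry.get("idtype") == "doi" and entry.get("value"):
            return entry["value"]
    return None


def paper_key(pmid: str, articleids: List[Dict[str, str]]) -> str:
    """Prefer the DOI as identifier; fall back to a PMID-based key."""
    doi = find_doi(articleids)
    return doi if doi else f"pmid:{pmid}"  # "pmid:" prefix is an assumed convention


ids = [
    {"idtype": "pubmed", "value": "12345678"},
    {"idtype": "doi", "value": "10.1234/example.2023"},
]
print(paper_key("12345678", ids))
```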
Output Format
Create list of paper objects:
[
  {
    "pmid": "12345678",
    "doi": "10.1234/example.2023",
    "title": "Selective BTK inhibitors for autoimmune diseases",
    "authors": ["Smith J", "Doe A", "Johnson B"],
    "journal": "Nature Chemical Biology",
    "year": "2023",
    "abstract": "We developed a series of...",
    "source": "pubmed_search"
  }
]
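One way to produce objects of this shape from an esummary JSON payload is sketched below. The field names read from the payload (`result`, `uids`, `uid`, `title`, `authors[].name`, `fulljournalname`, `pubdate`, `articleids`) are assumptions about the esummary response layout and should be checked against a live response; `to_paper` and `papers_from_esummary` are hypothetical names.

```python
from typing import Any, Dict, List


def to_paper(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one esummary record (assumed layout) to the paper object format."""
    doi = next(
        (a["value"] for a in record.get("articleids", []) if a.get("idtype") == "doi"),
        None,
    )
    return {
        "pmid": record["uid"],
        "doi": doi,
        "title": record.get("title", ""),
        "authors": [a["name"] for a in record.get("authors", [])],
        "journal": record.get("fulljournalname", ""),
        "year": record.get("pubdate", "")[:4],  # assumes pubdate starts with the year
        "abstract": None,  # not part of esummary; fill in from an efetch call
        "source": "pubmed_search",
    }


def papers_from_esummary(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert every record listed under result.uids."""
    result = payload.get("result", {})
    return [to_paper(result[uid]) for uid in result.get("uids", []) if uid in result]
```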
Error Handling
Rate limits (CRITICAL - shared across all processes/subagents):
- No API key: 3 requests/second (official limit)
- With API key: 10 requests/second
- Single agent/script: Use 500ms delays (2 req/sec, safe margin)
  - 350ms is theoretically sufficient but causes ~20% HTTP 429 errors in practice
- Multiple parallel subagents: Use longer delays to share capacity
  - 2 parallel: 1 second each (2 total req/sec)
  - 3 parallel: 1.5 seconds each (2 total req/sec)
  - 5 parallel: 2.5 seconds each (2 total req/sec)
- Formula: delay_seconds = (num_parallel / rate_limit) + safety_margin, where rate_limit is the target combined rate (2 req/sec in the figures above) and safety_margin is any extra padding (0 in those figures)
- If you get HTTP 429 errors: Wait 5 seconds, resume with doubled delays
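The delay formula and the 429 recovery rule can be written down directly. This is a sketch under the assumption that `rate_limit` means the target combined rate of 2 req/sec used in the figures above, not the official 3 req/sec ceiling; the function names are hypothetical.

```python
from typing import Tuple


def delay_seconds(num_parallel: int, rate_limit: float = 2.0, safety_margin: float = 0.0) -> float:
    """Per-agent delay so that all agents together stay at rate_limit req/sec."""
    if num_parallel < 1 or rate_limit <= 0:
        raise ValueError("num_parallel must be >= 1 and rate_limit > 0")
    return num_parallel / rate_limit + safety_margin


def after_429(current_delay: float) -> Tuple[float, float]:
    """Return (pause before resuming, new per-request delay) after an HTTP 429."""
    return 5.0, current_delay * 2


for agents in (1, 2, 3, 5):
    print(agents, "agent(s):", delay_seconds(agents), "s between requests")
```

Each agent would sleep for `delay_seconds(...)` between requests; because the limit is shared, every subagent needs to be told how many siblings are running.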
Empty results:
- Try broader terms
- Remove field tags
- Check for typos
- Use OR to add synonyms
Too many results (>500):
- Add more specific terms
- Use field tags to narrow
- Add date constraints
- Consider splitting into sub-queries
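The two checklists above amount to a small decision on the hit count. A hedged sketch follows, in which `refinement_hints` is a hypothetical name and the 500-result threshold is taken from the heading above.

```python
from typing import List


def refinement_hints(count: int, too_many: int = 500) -> List[str]:
    """Suggest next steps for a given esearch hit count."""
    if count == 0:
        return ["try broader terms", "remove field tags", "check for typos", "add synonyms with OR"]
    if count > too_many:
        return ["add more specific terms", "narrow with field tags", "add date constraints", "split into sub-queries"]
    return []  # manageable result set: proceed to fetching metadata


print(refinement_hints(0))
```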
Integration with Other Skills
After search completes:
- Save results to research folder as initial-search-results.json
- For each paper, call evaluating-paper-relevance skill
- Track in papers-reviewed.json (use DOI as key, fallback to PMID)
Quick Reference
| Task | Command |
|---|---|
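| Search PubMed | curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=YOUR_QUERY&retmode=json" |
| Get metadata | curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=PMID&retmode=json" |
| URL encode query | Replace spaces with +, special chars with %XX percent-encoding |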
| Narrow results | Use AND, add field tags, more specific terms |
| Broaden results | Use OR, remove field tags, add synonyms |
Common Mistakes
- Too narrow: Only 5 results → Use OR, remove constraints
- Too broad: 5000 results → Add AND terms, use field tags
- Missing abstracts: Use efetch instead of esummary for full abstract text
- DOI not found: Many older papers lack DOI - use PMID as fallback
- Rate limiting: Add 500ms delays (single agent) or longer (parallel subagents sharing rate limit)
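For the missing-abstract case, a sketch of the efetch step is shown below. It assumes the PubMed XML layout (`PubmedArticle`, `MedlineCitation/PMID`, `AbstractText`) and the `rettype=abstract`, `retmode=xml` parameters; `efetch_url` and `parse_abstracts` are hypothetical names, and the block parses an inline sample instead of calling the API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids: List[str]) -> str:
    """Build an efetch URL requesting abstracts as XML for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "xml"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_abstracts(xml_text: str) -> Dict[str, str]:
    """Map PMID -> abstract text from an efetch XML body (assumed layout)."""
    abstracts = {}
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # Structured abstracts have several AbstractText sections; join them.
        parts = ["".join(node.itertext()).strip() for node in article.iter("AbstractText")]
        if pmid:
            abstracts[pmid] = " ".join(p for p in parts if p)
    return abstracts


sample_xml = (
    "<PubmedArticleSet><PubmedArticle><MedlineCitation>"
    '<PMID Version="1">12345678</PMID><Article><Abstract>'
    '<AbstractText Label="BACKGROUND">First part.</AbstractText>'
    '<AbstractText Label="RESULTS">Second <i>part</i>.</AbstractText>'
    "</Abstract></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"
)
print(parse_abstracts(sample_xml))
```

Using `itertext()` keeps text that sits inside inline markup such as italics, which a plain `.text` lookup would truncate.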
Next Steps
After completing search:
- Announce: "Found N papers matching query"
- Begin evaluation using skills/research/evaluating-paper-relevance
- Update user with progress as papers are screened