Awesome-Agent-Skills-for-Empirical-Research doi-resolution-guide
DOI content negotiation and metadata retrieval techniques
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/metadata/doi-resolution-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-doi-resolution-gu && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/literature/metadata/doi-resolution-guide/SKILL.md
DOI Resolution Guide
Master DOI content negotiation to programmatically retrieve structured metadata, citation data, and formatted references from any Digital Object Identifier.
What Is DOI Content Negotiation?
Every DOI (e.g., 10.1038/s41586-021-03819-2) resolves to a landing page by default. However, the DOI system supports HTTP content negotiation: by sending different Accept headers, you can retrieve structured metadata in various formats instead of an HTML page.
The DOI resolver endpoint is https://doi.org/{doi} or equivalently https://dx.doi.org/{doi}.
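Some DOIs, particularly older ones, contain characters such as `#`, `<`, or `>` that are not safe inside a URL path. The following is a minimal sketch of building a resolver URL, assuming that percent-encoding everything except `/` is acceptable to the resolver. `doi_url` and `RESOLVER` are hypothetical names introduced for illustration, not part of any library.

```python
from urllib.parse import quote

# Resolver endpoint named in this guide.
RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Build a resolver URL, percent-encoding unsafe characters in the DOI."""
    # Keep "/" literal so the prefix/suffix separator survives encoding.
    return RESOLVER + quote(doi.strip(), safe="/")


print(doi_url("10.1038/s41586-021-03819-2"))
```

For typical modern DOIs the result is simply the resolver prefix followed by the DOI; the encoding only matters for unusual suffixes.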
Supported Metadata Formats
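| Accept Header | Format | Use Case |
|---|---|---|
| application/vnd.citationstyles.csl+json | CSL-JSON | Programmatic metadata extraction |
| text/x-bibliography; style=apa | Formatted citation | Ready-to-paste APA reference |
| application/x-bibtex | BibTeX | LaTeX bibliography import |
| text/x-bibliography; style=bibtex | BibTeX (alt) | LaTeX bibliography import |
| application/rdf+xml | RDF/XML | Linked data applications |
| text/turtle | Turtle RDF | Linked data applications |
| application/vnd.crossref.unixref+xml | CrossRef Unixref | Full CrossRef metadata |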
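The formats listed above can be kept in one lookup so that calling code asks for a format by name rather than repeating media-type strings. This is a rough sketch: the short keys (`csl-json`, `citation`, and so on) and the function name `accept_header` are assumptions made for illustration. The media types are the ones used in this guide's curl examples, plus the standard types for RDF/XML and Turtle and Crossref's Unixref type.

```python
from typing import Dict, Optional

# Hypothetical short names mapped to Accept header values.
ACCEPT_TYPES: Dict[str, str] = {
    "csl-json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
    "rdf-xml": "application/rdf+xml",
    "turtle": "text/turtle",
    "unixref": "application/vnd.crossref.unixref+xml",
}


def accept_header(fmt: str, style: Optional[str] = None) -> Dict[str, str]:
    """Return a headers dict for the requested metadata format."""
    if fmt == "citation":
        # Formatted citations take an optional CSL style name, e.g. "apa".
        value = "text/x-bibliography"
        if style:
            value += f"; style={style}"
        return {"Accept": value}
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"Unknown metadata format: {fmt!r}")
    return {"Accept": ACCEPT_TYPES[fmt]}


print(accept_header("citation", style="apa"))
```

The returned dict can be passed directly as the `headers` argument of an HTTP client call.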
Retrieving Metadata via Content Negotiation
Get CSL-JSON (Most Useful for Programmatic Access)
    curl -LH "Accept: application/vnd.citationstyles.csl+json" \
      https://doi.org/10.1038/s41586-021-03819-2

The same request in Python:

    import requests

    doi = "10.1038/s41586-021-03819-2"
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}

    # The resolver redirects to the registration agency's metadata service
    response = requests.get(f"https://doi.org/{doi}", headers=headers, allow_redirects=True)
    response.raise_for_status()  # fail early on an unknown DOI or a server error
    metadata = response.json()

    print(f"Title: {metadata['title']}")
    print(f"Authors: {', '.join(a.get('family', '') for a in metadata.get('author', []))}")
    print(f"Journal: {metadata.get('container-title', 'N/A')}")
    print(f"Year: {metadata.get('published', {}).get('date-parts', [[None]])[0][0]}")
    print(f"Type: {metadata.get('type')}")
Get a Formatted Citation
    # APA format
    curl -LH "Accept: text/x-bibliography; style=apa" \
      https://doi.org/10.1038/s41586-021-03819-2

    # Chicago format
    curl -LH "Accept: text/x-bibliography; style=chicago-author-date" \
      https://doi.org/10.1038/s41586-021-03819-2

    # Harvard format
    curl -LH "Accept: text/x-bibliography; style=harvard-cite-them-right" \
      https://doi.org/10.1038/s41586-021-03819-2
Get BibTeX for LaTeX
    curl -LH "Accept: application/x-bibtex" \
      https://doi.org/10.1038/s41586-021-03819-2

Output:

    @article{Jumper_2021,
      title={Highly accurate protein structure prediction with AlphaFold},
      volume={596},
      DOI={10.1038/s41586-021-03819-2},
      journal={Nature},
      author={Jumper, John and Evans, Richard and ...},
      year={2021},
      pages={583--589}
    }
Using the CrossRef API
The CrossRef API provides richer metadata and supports batch queries without content negotiation.
Single Paper Lookup
    import requests

    doi = "10.1038/s41586-021-03819-2"
    response = requests.get(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "ResearchClaw/1.0 (mailto:you@university.edu)"}
    )
    work = response.json()["message"]

    print(f"Title: {work['title'][0]}")
    print(f"Publisher: {work['publisher']}")
    print(f"Citation count: {work.get('is-referenced-by-count', 0)}")
    print(f"Reference count: {work.get('references-count', 0)}")
    print(f"License: {work.get('license', [{}])[0].get('URL', 'N/A')}")
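Not every Crossref record carries every field: `title` and `license` are lists that may be empty or absent, so direct indexing can raise an error. The sketch below shows one defensive way to pull a few fields out of a work record. `summarize_work` is a hypothetical helper, and the sample record is hand-written for illustration rather than real API output.

```python
from typing import Any, Dict


def summarize_work(work: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a few common fields from a Crossref work record, tolerating gaps."""
    titles = work.get("title") or []
    licenses = work.get("license") or []
    # "issued" holds the publication date as nested date-parts, e.g. [[2021, 7, 15]].
    date_parts = (work.get("issued") or {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    return {
        "title": titles[0] if titles else None,
        "publisher": work.get("publisher"),
        "year": year,
        "cited_by": work.get("is-referenced-by-count", 0),
        "license_url": licenses[0].get("URL") if licenses else None,
    }


# Hand-written sample record (illustrative values only).
sample = {
    "title": ["An Example Title"],
    "publisher": "Example Publisher",
    "issued": {"date-parts": [[2021, 7, 15]]},
    "is-referenced-by-count": 3,
}
print(summarize_work(sample))
```

Returning `None` for missing values keeps the caller's logic explicit about which fields were actually present.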
Batch DOI Resolution
    import requests

    dois = [
        "10.1038/s41586-021-03819-2",
        "10.1126/science.abj8754",
        "10.1016/j.cell.2021.06.025"
    ]

    results = []
    for doi in dois:
        # One request per DOI; the mailto contact identifies the client
        resp = requests.get(
            f"https://api.crossref.org/works/{doi}",
            headers={"User-Agent": "ResearchClaw/1.0 (mailto:you@university.edu)"}
        )
        if resp.status_code == 200:
            results.append(resp.json()["message"])
        else:
            print(f"Failed to resolve: {doi}")
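For longer DOI lists it can help to pace requests and to collect failures rather than only printing them. The following is a sketch under stated assumptions: `resolve_batch` and `crossref_fetch` are hypothetical names, the 0.1 second delay is an arbitrary conservative choice, and the fetcher uses the standard library's `urllib` instead of `requests` so the block has no external dependencies.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote

# Placeholder contact string taken from this guide; replace with your own.
USER_AGENT = "ResearchClaw/1.0 (mailto:you@university.edu)"


def crossref_fetch(doi):
    """Fetch one Crossref work record; return None on an HTTP or URL error."""
    url = "https://api.crossref.org/works/" + quote(doi, safe="/")
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)["message"]
    except urllib.error.URLError:  # HTTPError (e.g. 404) is a subclass
        return None


def resolve_batch(dois, fetch=crossref_fetch, delay=0.1, sleep=time.sleep):
    """Resolve DOIs one at a time, pausing between requests.

    Returns (resolved, failed): a dict of DOI -> record and a list of DOIs
    for which fetch returned None.
    """
    resolved, failed = {}, []
    for index, doi in enumerate(dois):
        if index:
            sleep(delay)  # pause before every request except the first
        record = fetch(doi)
        if record is None:
            failed.append(doi)
        else:
            resolved[doi] = record
    return resolved, failed
```

Calling `resolve_batch(dois)` with the list above would return the resolved records together with any DOIs that failed. Passing a different `fetch` callable makes the loop easy to exercise without network access.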
DOI Validation and Normalization
    import re

    def normalize_doi(raw_input):
        """Extract and normalize a DOI from various input formats."""
        # Match DOI pattern: 10.XXXX/...
        match = re.search(r'(10\.\d{4,9}/[^\s]+)', raw_input)
        if match:
            doi = match.group(1)
            # Remove trailing punctuation
            doi = doi.rstrip('.,;:)')
            return doi.lower()
        return None

    # Examples
    normalize_doi("https://doi.org/10.1038/s41586-021-03819-2")
    # 10.1038/s41586-021-03819-2
    normalize_doi("DOI: 10.1038/s41586-021-03819-2.")
    # 10.1038/s41586-021-03819-2
    normalize_doi("See paper at doi.org/10.1038/s41586-021-03819-2 for details")
    # works too
Practical Tips
- Polite pool: CrossRef provides faster responses to requests with a User-Agent header that includes a mailto: contact. This is their "polite pool" with higher rate limits.
- OpenAlex alternative: OpenAlex (https://api.openalex.org/works/doi:10.xxx/yyy) provides similar metadata for free, with richer entity linking.
- Handle redirects: Always follow redirects, as DOIs redirect through the resolver. Use -L in curl; in requests, allow_redirects=True is already the default for GET, so setting it only makes the intent explicit.
- Caching: DOI metadata rarely changes. Cache resolved metadata locally to avoid redundant API calls.
- Rate limits: CrossRef allows 50 requests/second in the polite pool. Limits can change, so check the X-Rate-Limit-Limit and X-Rate-Limit-Interval headers returned with each API response. For bulk operations, use their data dumps instead.
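The caching tip can be sketched as a small JSON-file cache wrapped around any lookup function. This is an illustration under assumptions, not a reference implementation: `DoiCache` is a hypothetical class, the cache file location is up to the caller, and `fetch` stands for whatever function retrieves a record for a DOI and returns `None` on failure. Keys are lowercased because DOIs are case-insensitive.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class DoiCache:
    """Cache DOI metadata in a JSON file, calling fetch only on a miss."""

    def __init__(self, path: str, fetch: Callable[[str], Optional[Dict[str, Any]]]):
        self.path = Path(path)
        self.fetch = fetch
        # Load any previously saved records; start empty otherwise.
        if self.path.exists():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    def get(self, doi: str) -> Optional[Dict[str, Any]]:
        key = doi.lower()  # DOIs are case-insensitive
        if key in self._data:
            return self._data[key]
        record = self.fetch(doi)
        if record is not None:
            # Persist successful lookups only, so failures are retried later.
            self._data[key] = record
            self.path.write_text(json.dumps(self._data), encoding="utf-8")
        return record
```

Rewriting the whole file on every miss is simple but slow for large caches; a database-backed store would be a reasonable swap at that scale.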
See Also
- doi-content-negotiation -- Detailed API reference for retrieving metadata in multiple formats via HTTP content negotiation.