Awesome-Agent-Skills-for-Empirical-Research doi-resolution-guide

DOI content negotiation and metadata retrieval techniques

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/metadata/doi-resolution-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-doi-resolution-gu && rm -rf "$T"

manifest: skills/43-wentorai-research-plugins/skills/literature/metadata/doi-resolution-guide/SKILL.md


DOI Resolution Guide

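Master DOI content negotiation to programmatically retrieve structured metadata, citation data, and formatted references from a Digital Object Identifier. The formats available depend on the registration agency behind the DOI; the examples in this guide use CrossRef-registered DOIs.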

What Is DOI Content Negotiation?

Every DOI (e.g., `10.1038/s41586-021-03819-2`) resolves to a landing page by default. However, the DOI system supports HTTP content negotiation: by sending different `Accept` headers, you can retrieve structured metadata in various formats instead of an HTML page.

The DOI resolver endpoint is `https://doi.org/{doi}`. The older form `https://dx.doi.org/{doi}` still resolves, but `doi.org` is the preferred host.

Supported Metadata Formats

| Accept Header | Format | Use Case |
| --- | --- | --- |
| `application/vnd.citationstyles.csl+json` | CSL-JSON | Programmatic metadata extraction |
| `text/x-bibliography; style=apa` | Formatted citation | Ready-to-paste APA reference |
| `text/x-bibliography; style=bibtex` | BibTeX | LaTeX bibliography import |
| `application/x-bibtex` | BibTeX (alt) | LaTeX bibliography import |
| `application/rdf+xml` | RDF/XML | Linked data applications |
| `text/turtle` | Turtle RDF | Linked data applications |
| `application/vnd.crossref.unixref+xml` | CrossRef Unixref | Full CrossRef metadata |
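The table can be turned into a small lookup so that callers pick a format by name rather than retyping media types. The following is a minimal sketch using only the standard library; the short names in `ACCEPT_TYPES` and the function `build_doi_request` are hypothetical conveniences, not part of any DOI tooling. It builds the request without sending it, so the URL and header can be inspected first.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Short names are illustrative; the media types are the ones listed in the table.
ACCEPT_TYPES = {
    "csl-json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
    "rdf-xml": "application/rdf+xml",
    "turtle": "text/turtle",
    "unixref": "application/vnd.crossref.unixref+xml",
}


def build_doi_request(doi: str, fmt: str = "csl-json", style: Optional[str] = None) -> Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    if style is not None:
        # Formatted citations use text/x-bibliography plus a CSL style name.
        accept = f"text/x-bibliography; style={style}"
    else:
        accept = ACCEPT_TYPES[fmt]
    # Percent-encode characters that are unsafe in a URL path; keep "/" as-is.
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": accept})


req = build_doi_request("10.1038/s41586-021-03819-2", style="apa")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the returned object to `urllib.request.urlopen` would perform the lookup; `urlopen` follows the resolver's redirects on its own.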

Retrieving Metadata via Content Negotiation

Get CSL-JSON (Most Useful for Programmatic Access)

curl -LH "Accept: application/vnd.citationstyles.csl+json" \
  https://doi.org/10.1038/s41586-021-03819-2

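import requests

doi = "10.1038/s41586-021-03819-2"
headers = {"Accept": "application/vnd.citationstyles.csl+json"}
# requests follows redirects on GET by default; allow_redirects is spelled out for clarity.
response = requests.get(
    f"https://doi.org/{doi}", headers=headers, allow_redirects=True, timeout=30
)
response.raise_for_status()  # fail loudly on 404 (unknown DOI) or 406 (format not available)

metadata = response.json()
print(f"Title: {metadata.get('title', 'N/A')}")
print(f"Authors: {', '.join(a.get('family', '') for a in metadata.get('author', []))}")
print(f"Journal: {metadata.get('container-title', 'N/A')}")
# "issued" is the standard CSL date field; its first date-part is the year.
print(f"Year: {metadata.get('issued', {}).get('date-parts', [[None]])[0][0]}")
print(f"Type: {metadata.get('type')}")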
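Date and author fields are the parts of a CSL-JSON record most likely to be missing or shaped differently between records. The sketch below shows one defensive way to read them. The helper names `csl_year` and `csl_authors` are hypothetical, the fallback order in `DATE_FIELDS` is an assumption rather than a rule from any specification, and the sample record is hand-written from the values shown in this guide, not a captured response.

```python
from typing import List, Optional

# Date fields to try in order. "issued" is the standard CSL field; the others
# are extra fields that CrossRef-sourced records may also carry (assumed order).
DATE_FIELDS = ("issued", "published", "published-print", "published-online")


def csl_year(metadata: dict) -> Optional[int]:
    """Return the first usable year found in a CSL-JSON record, else None."""
    for field in DATE_FIELDS:
        parts = (metadata.get(field) or {}).get("date-parts") or []
        if parts and parts[0] and parts[0][0] is not None:
            return int(parts[0][0])
    return None


def csl_authors(metadata: dict) -> List[str]:
    """Return author names, falling back to 'literal' or 'name' for group authors."""
    names = []
    for author in metadata.get("author", []):
        name = author.get("family") or author.get("literal") or author.get("name")
        if name:
            names.append(name)
    return names


# Hand-written sample record for illustration only.
sample = {
    "title": "Highly accurate protein structure prediction with AlphaFold",
    "author": [
        {"family": "Jumper", "given": "John"},
        {"family": "Evans", "given": "Richard"},
    ],
    "issued": {"date-parts": [[2021]]},
}
print(csl_year(sample), csl_authors(sample))
```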

Get a Formatted Citation

# APA format
curl -LH "Accept: text/x-bibliography; style=apa" \
  https://doi.org/10.1038/s41586-021-03819-2

# Chicago format
curl -LH "Accept: text/x-bibliography; style=chicago-author-date" \
  https://doi.org/10.1038/s41586-021-03819-2

# Harvard format
curl -LH "Accept: text/x-bibliography; style=harvard-cite-them-right" \
  https://doi.org/10.1038/s41586-021-03819-2

Get BibTeX for LaTeX

curl -LH "Accept: application/x-bibtex" \
  https://doi.org/10.1038/s41586-021-03819-2

Output:

@article{Jumper_2021,
  title={Highly accurate protein structure prediction with AlphaFold},
  volume={596},
  DOI={10.1038/s41586-021-03819-2},
  journal={Nature},
  author={Jumper, John and Evans, Richard and ...},
  year={2021},
  pages={583--589}
}
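When several BibTeX entries are collected into one `.bib` file, duplicate citation keys make LaTeX complain. A minimal sketch of de-duplicating by key follows. It assumes each entry begins with `@type{key,` as in the output above; `bibtex_key` and `merge_entries` are hypothetical helper names, and the regular expression is not a full BibTeX parser.

```python
import re
from typing import Iterable, Optional

# Matches the entry type and citation key at the start of a BibTeX entry,
# e.g. "@article{Jumper_2021," captures "article" and "Jumper_2021".
_ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def bibtex_key(entry: str) -> Optional[str]:
    """Return the citation key of a BibTeX entry, or None if none is found."""
    match = _ENTRY_HEAD.search(entry)
    return match.group(2) if match else None


def merge_entries(entries: Iterable[str]) -> str:
    """Join BibTeX entries, skipping any whose key has already been seen."""
    seen = set()
    kept = []
    for entry in entries:
        key = bibtex_key(entry)
        if key is None or key in seen:
            continue  # drop unparseable text and repeated keys
        seen.add(key)
        kept.append(entry.strip())
    return "\n\n".join(kept) + "\n"


entry = "@article{Jumper_2021,\n  journal={Nature},\n  year={2021}\n}"
print(merge_entries([entry, entry]))
```

The result of `merge_entries` can be written straight to a `.bib` file with `pathlib.Path.write_text`.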

Using the CrossRef API

The CrossRef API provides richer metadata and supports batch queries without content negotiation.

Single Paper Lookup

import requests

doi = "10.1038/s41586-021-03819-2"
response = requests.get(
    f"https://api.crossref.org/works/{doi}",
    headers={"User-Agent": "ResearchClaw/1.0 (mailto:you@university.edu)"},
    timeout=30,
)
response.raise_for_status()  # an unknown DOI returns 404

work = response.json()["message"]
# Title and license are lists in the CrossRef schema and may be empty or missing.
licenses = work.get("license") or [{}]
print(f"Title: {(work.get('title') or ['N/A'])[0]}")
print(f"Publisher: {work.get('publisher', 'N/A')}")
print(f"Citation count: {work.get('is-referenced-by-count', 0)}")
print(f"Reference count: {work.get('references-count', 0)}")
print(f"License: {licenses[0].get('URL', 'N/A')}")
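Unlike the CSL-JSON returned by content negotiation, the CrossRef `message` object stores `title` and `container-title` as lists, and any of them can be empty. The sketch below flattens a `message` into a plain summary without raising on missing fields. The names `first` and `summarize_work` are hypothetical, and the sample dictionary is a hand-written fragment, not a real API response.

```python
def first(value, default=None):
    """Return the first item of a list-valued field, or default if empty or missing."""
    return value[0] if value else default


def summarize_work(work: dict) -> dict:
    """Flatten the list-valued fields of a CrossRef work into simple values."""
    return {
        "title": first(work.get("title"), "N/A"),
        "journal": first(work.get("container-title"), "N/A"),
        "publisher": work.get("publisher", "N/A"),
        "cited_by": work.get("is-referenced-by-count", 0),
        "license_url": first(work.get("license"), {}).get("URL", "N/A"),
    }


# Hand-written fragment for illustration; note the empty license list.
sample_work = {
    "title": ["Highly accurate protein structure prediction with AlphaFold"],
    "container-title": ["Nature"],
    "license": [],
}
print(summarize_work(sample_work))
```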

Batch DOI Resolution

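import requests
from urllib.parse import quote

dois = [
    "10.1038/s41586-021-03819-2",
    "10.1126/science.abj8754",
    "10.1016/j.cell.2021.06.025"
]

# Reuse one connection and send the polite-pool User-Agent on every request.
session = requests.Session()
session.headers["User-Agent"] = "ResearchClaw/1.0 (mailto:you@university.edu)"

results = []
for doi in dois:
    # Percent-encode the DOI: some DOIs contain characters such as "#" or "?".
    url = f"https://api.crossref.org/works/{quote(doi, safe='/')}"
    resp = session.get(url, timeout=30)
    if resp.status_code == 200:
        results.append(resp.json()["message"])
    else:
        print(f"Failed to resolve: {doi} (HTTP {resp.status_code})")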
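For longer DOI lists, the loop above benefits from pacing and a retry on transient failures. The following is a sketch under stated assumptions: `resolve_many` is a hypothetical helper, the `fetch` argument is a caller-supplied function (for example a wrapper around the `requests.get` call above) that returns a dict, returns `None` for an unknown DOI, and raises on transient errors, and the default delay is an arbitrary placeholder rather than a CrossRef recommendation.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def resolve_many(
    dois: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    delay: float = 0.1,
    retries: int = 2,
) -> Dict[str, Optional[dict]]:
    """Resolve DOIs one by one, pausing between calls and retrying failures."""
    results: Dict[str, Optional[dict]] = {}
    for doi in dict.fromkeys(dois):  # drop duplicates, keep first-seen order
        for attempt in range(retries + 1):
            try:
                results[doi] = fetch(doi)
                break
            except Exception:  # broad on purpose: fetch decides what is transient
                if attempt == retries:
                    results[doi] = None  # give up after the last retry
                else:
                    time.sleep(delay * 2 ** attempt)  # exponential backoff
        time.sleep(delay)  # pace successive DOIs
    return results


def demo_fetch(doi: str) -> Optional[dict]:
    """Stand-in for a real network call, used only to show the call shape."""
    return {"DOI": doi}


print(resolve_many(["10.1038/s41586-021-03819-2"], demo_fetch, delay=0))
```

Injecting `fetch` keeps the pacing logic independent of the HTTP client, so the same function works with content negotiation or the CrossRef API.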

DOI Validation and Normalization

import re

def normalize_doi(raw_input):
    """Extract and normalize a DOI from various input formats."""
    # Match DOI pattern: 10.XXXX/... (a 4-9 digit registrant prefix, then a suffix)
    match = re.search(r'(10\.\d{4,9}/\S+)', raw_input)
    if match:
        doi = match.group(1)
        # Remove trailing punctuation picked up from surrounding prose
        doi = doi.rstrip('.,;:)')
        # DOIs are case-insensitive, so lowercasing gives a canonical form
        return doi.lower()
    return None

# Examples
normalize_doi("https://doi.org/10.1038/s41586-021-03819-2")  # 10.1038/s41586-021-03819-2
normalize_doi("DOI: 10.1038/s41586-021-03819-2.")            # 10.1038/s41586-021-03819-2
normalize_doi("See paper at doi.org/10.1038/s41586-021-03819-2 for details")  # 10.1038/s41586-021-03819-2
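The same pattern can pull every DOI out of a longer passage, such as a pasted reference list, and collapse duplicates that differ only in case or trailing punctuation. This is a small sketch: `extract_dois` is a hypothetical helper, and the pattern and punctuation set are repeated from `normalize_doi` so the block runs on its own.

```python
import re
from typing import List

# Same pattern as normalize_doi above, compiled once for reuse.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_dois(text: str) -> List[str]:
    """Return every DOI-like string in text, normalized and de-duplicated."""
    found = []
    for raw in DOI_PATTERN.findall(text):
        doi = raw.rstrip(".,;:)").lower()  # same cleanup as normalize_doi
        if doi not in found:
            found.append(doi)  # keep first-seen order
    return found


text = (
    "See https://doi.org/10.1038/s41586-021-03819-2, "
    "also cited as DOI: 10.1038/S41586-021-03819-2."
)
print(extract_dois(text))  # ['10.1038/s41586-021-03819-2']
```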

Practical Tips

Polite pool: CrossRef provides faster responses to requests with a `User-Agent` header that includes a `mailto:` contact. This is their "polite pool" with higher rate limits.
OpenAlex alternative: OpenAlex (https://api.openalex.org/works/doi:10.xxx/yyy) provides similar metadata for free, with richer entity linking.
Handle redirects: DOIs redirect through the resolver, so the client must follow redirects. curl needs `-L`; `requests.get` follows redirects by default, and passing `allow_redirects=True` simply makes that explicit.
Caching: DOI metadata rarely changes. Cache resolved metadata locally to avoid redundant API calls.
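Rate limits: CrossRef has allowed 50 requests/second in the polite pool, but the limits change over time. The values in force are reported in the `X-Rate-Limit-Limit` and `X-Rate-Limit-Interval` response headers, so check those rather than hard-coding a number. For bulk operations, use their data dumps instead.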
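The caching tip above can be sketched with a single JSON file on disk. This is an illustration rather than a production cache: `DoiCache` is a hypothetical class, it rewrites the whole file on every miss, it never expires entries, and `fetch` stands in for whichever lookup function is in use.

```python
import json
from pathlib import Path
from typing import Callable


class DoiCache:
    """Tiny JSON-file cache keyed by lowercase DOI."""

    def __init__(self, path: str):
        self.path = Path(path)
        # Load earlier results if the cache file already exists.
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, doi: str, fetch: Callable[[str], dict]) -> dict:
        """Return cached metadata, calling fetch only when the DOI is new."""
        key = doi.lower()  # DOIs are case-insensitive
        if key not in self.data:
            self.data[key] = fetch(doi)  # only hit the network on a miss
            self.path.write_text(json.dumps(self.data, indent=2))
        return self.data[key]
```

A call such as `DoiCache("doi_cache.json").get(doi, fetch)` would then return the stored record on every lookup after the first.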