Medical-research-skills citation-management
Comprehensive citation management for academic research; use when you need to discover papers (Google Scholar/PubMed), extract/verify metadata (DOI/PMID/arXiv/URL), and produce validated, clean BibTeX for manuscripts.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/citation-management" ~/.claude/skills/aipoch-medical-research-skills-citation-management && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/citation-management/SKILL.md
When to Use
- You need to find relevant or highly cited papers on a topic using Google Scholar or PubMed.
- You have identifiers (e.g., DOI, PMID, arXiv ID, URL) and must convert them into correct BibTeX.
- You want to verify citation accuracy (DOI resolution, required fields, consistency with CrossRef/PubMed).
- You need to clean, deduplicate, sort, and standardize an existing .bib file before submission.
- You are preparing a thesis/manuscript and need a reproducible workflow from search → extraction → formatting → validation.
Key Features
- Paper discovery
- Google Scholar search with year filtering, pagination, and citation-count sorting.
- PubMed search with MeSH terms, field tags, publication-type filters, and date ranges.
- Metadata extraction
- Resolve DOI/PMID/arXiv/URL to structured metadata via CrossRef, PubMed E-utilities, and arXiv APIs.
- Batch processing from files containing mixed identifiers.
- BibTeX generation & cleanup
- Generate BibTeX entries with appropriate entry types and required fields.
- Format, sort (key/year/author), and deduplicate BibTeX libraries.
- Citation validation
- DOI resolution checks and metadata cross-checking.
- Required-field checks by entry type, syntax validation, duplicate detection, and optional auto-fix.
- Workflow integration
- Produces submission-ready .bib files for LaTeX/Overleaf workflows and complements literature review pipelines.
Dependencies
- Python: 3.10+ (recommended)
- Python packages:
  - requests>=2.31.0
  - scholarly>=1.7.11 (optional; required only for Google Scholar automation)
Example Usage
A complete, end-to-end workflow that searches, extracts metadata, formats, deduplicates, and validates a bibliography:
# 1) Search PubMed (biomedical focus)
python scripts/search_pubmed.py \
  --query '"CRISPR-Cas Systems"[MeSH] AND "Gene Editing"[MeSH]' \
  --date-start 2020-01-01 \
  --date-end 2024-12-31 \
  --limit 200 \
  --output crispr_pubmed.json

# 2) Search Google Scholar (broad coverage)
python scripts/search_google_scholar.py "CRISPR gene editing therapeutics" \
  --year-start 2020 \
  --year-end 2024 \
  --limit 100 \
  --output crispr_scholar.json

# 3) Extract metadata from search outputs (or mixed identifiers)
cat crispr_pubmed.json crispr_scholar.json > combined_results.json
python scripts/extract_metadata.py \
  --input combined_results.json \
  --output combined.bib

# 4) Add known papers by DOI (append)
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2 >> combined.bib
python scripts/doi_to_bibtex.py 10.1126/science.aam9317 >> combined.bib

# 5) Format + deduplicate + sort (newest first)
python scripts/format_bibtex.py combined.bib \
  --deduplicate \
  --sort year \
  --descending \
  --output formatted.bib

# 6) Validate + auto-fix common issues + emit report
python scripts/validate_citations.py formatted.bib \
  --auto-fix \
  --report validation.json \
  --output final_references.bib

# 7) Inspect validation results
cat validation.json
Implementation Details
1) Search (Discovery)
- Google Scholar (scripts/search_google_scholar.py)
  - Supports query operators such as exact phrases ("deep learning"), author filters (author:LeCun), title-only (intitle:"neural networks"), exclusions (-survey), and year ranges.
  - Typical parameters:
    - --year-start, --year-end: constrain publication years
    - --limit: cap results
    - --sort-by citations: prioritize highly cited papers (when supported by the script)
- PubMed (scripts/search_pubmed.py)
  - Uses NCBI E-utilities (e.g., ESearch/EFetch/ESummary) to retrieve PMIDs and metadata.
  - Typical parameters:
    - --query: supports MeSH terms, field tags, and Boolean logic
    - --date-start, --date-end: publication date filtering
    - --publication-types: e.g., Clinical Trial, Review
    - --format: JSON or BibTeX output (if supported)

(See: references/google_scholar_search.md, references/pubmed_search.md)
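The PubMed half of the discovery step can be sketched directly against NCBI's ESearch endpoint. The function names below are illustrative, not the skill's actual implementation; the request parameters (db, term, retmax, retmode, datetype, mindate/maxdate) are the documented E-utilities ones:

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_params(query, date_start=None, date_end=None, limit=20):
    """Build ESearch parameters; dates are YYYY-MM-DD as in the CLI above."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": str(limit)}
    if date_start and date_end:
        # E-utilities expect YYYY/MM/DD plus an explicit date type (pdat = publication date)
        params.update({"datetype": "pdat",
                       "mindate": date_start.replace("-", "/"),
                       "maxdate": date_end.replace("-", "/")})
    return params

def esearch_pmids(query, **kwargs):
    """Return a list of PMID strings for the query (performs a network call)."""
    resp = requests.get(f"{EUTILS}/esearch.fcgi",
                        params=build_esearch_params(query, **kwargs), timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]
```

The returned PMIDs would then be passed to EFetch/ESummary for full records, as the bullet above describes.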
2) Metadata Extraction (Normalization)
- Identifier inputs: DOI, PMID, arXiv ID, URL, or mixed lists/files.
- Primary sources:
- CrossRef API for DOI-centric journal metadata
- PubMed E-utilities for biomedical records (PMID/PMCID, MeSH, abstracts)
- arXiv API for preprints and versioned records
- DataCite API for datasets/software DOIs (if implemented/used)
- Field mapping goals:
  - Required: author, title, year
  - Articles: journal, volume, number, pages, doi
  - Conferences: booktitle, pages
  - Preprints: repository + identifier (e.g., eprint, archivePrefix)

(See: references/metadata_extraction.md)
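A minimal sketch of the CrossRef side of this mapping (the function names and the example data are illustrative, not the skill's actual code; the /works/{doi} endpoint and the message fields — author, title, container-title, issued, page, DOI — follow CrossRef's documented response shape):

```python
import requests

def crossref_message_to_fields(msg):
    """Map a CrossRef 'message' dict onto the BibTeX fields listed above."""
    authors = " and ".join(
        f"{a.get('family', '')}, {a.get('given', '')}" for a in msg.get("author", [])
    )
    issued = msg.get("issued", {}).get("date-parts", [[None]])
    return {
        "author": authors,
        "title": (msg.get("title") or [""])[0],
        "year": str(issued[0][0]) if issued[0][0] else "",
        "journal": (msg.get("container-title") or [""])[0],
        "volume": msg.get("volume", ""),
        "number": msg.get("issue", ""),
        "pages": msg.get("page", "").replace("-", "--"),  # normalize to BibTeX en-dash
        "doi": msg.get("DOI", ""),
    }

def fetch_crossref(doi):
    """Resolve a DOI to mapped fields via the CrossRef REST API (network call)."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return crossref_message_to_fields(resp.json()["message"])
```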
3) BibTeX Formatting (Quality & Consistency)
- Entry types commonly produced:
,@article
,@inproceedings
,@book
.@misc - Formatting rules enforced/encouraged:
- Page ranges use
(e.g.,--
)123--145 - Protect capitalization in titles with braces (e.g.,
){CRISPR} - Consistent author formatting (
)Last, First and Last, First - Stable citation keys (project convention; often
)FirstAuthorYearKeyword
- Page ranges use
(See:
references/bibtex_formatting.md)
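One way the FirstAuthorYearKeyword key convention and entry serialization might look (an illustrative sketch, not the skill's actual source; the stop-word list is an assumption):

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "on", "for", "and", "in", "with"}

def make_citation_key(fields):
    """Build a FirstAuthorYearKeyword-style key from author/year/title."""
    # First author's family name, assuming "Last, First and Last, First" format
    first_author = fields["author"].split(" and ")[0].split(",")[0]
    first_author = re.sub(r"[^A-Za-z]", "", first_author)
    words = [w for w in re.findall(r"[A-Za-z]+", fields["title"])
             if w.lower() not in STOP_WORDS]
    keyword = words[0].capitalize() if words else "Untitled"
    return f"{first_author}{fields['year']}{keyword}"

def format_entry(entry_type, fields):
    """Serialize non-empty fields as a brace-delimited BibTeX entry."""
    key = make_citation_key(fields)
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@{entry_type}{{{key},\n{body}\n}}"
```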
4) Validation (Correctness)
Validation typically checks:
- DOI validity: resolves via doi.org and matches CrossRef metadata.
- Required fields: present per entry type; no empty critical fields.
- Consistency: year format, numeric volume/issue, page-range syntax, URL accessibility.
- Duplicates: same DOI, near-identical titles, or same author/year/title combinations.
- BibTeX syntax: braces/quotes, commas, unique keys, special character handling.
Outputs may include a machine-readable report (e.g., JSON) with errors and warnings.
(See: references/citation_validation.md)
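The required-field and duplicate checks above can be sketched as follows (the REQUIRED table and the duplicate signatures are assumptions for illustration, not the skill's actual rules):

```python
import re

# Assumed required-field table per entry type, mirroring the checks described above
REQUIRED = {
    "article": {"author", "title", "year", "journal"},
    "inproceedings": {"author", "title", "year", "booktitle"},
    "book": {"author", "title", "year", "publisher"},
    "misc": {"title"},
}

def missing_fields(entry_type, fields):
    """Return required fields that are absent or empty for this entry type."""
    present = {k for k, v in fields.items() if str(v).strip()}
    return sorted(REQUIRED.get(entry_type, set()) - present)

def find_duplicates(entries):
    """Flag entry pairs sharing a DOI or a normalized (case/punctuation-free) title."""
    seen, dupes = {}, []
    for key, fields in entries.items():
        sigs = [s for s in (fields.get("doi", "").lower(),
                            re.sub(r"\W+", "", fields.get("title", "").lower())) if s]
        match = next((seen[s] for s in sigs if s in seen), None)
        if match:
            dupes.append((match, key))
        for s in sigs:
            seen.setdefault(s, key)
    return dupes
```

A DOI-resolution check would additionally issue an HTTP request per entry against doi.org, which is why validators typically batch and rate-limit that step.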