Agent-almanac validate-references
git clone https://github.com/pjt222/agent-almanac
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/wenyan/skills/validate-references" ~/.claude/skills/pjt222-agent-almanac-validate-references-08eed1 && rm -rf "$T"
i18n/wenyan/skills/validate-references/SKILL.mdValidate References
Check BibTeX bibliography entries for completeness, accuracy, and consistency. This skill covers verifying required fields per entry type, resolving DOIs via the CrossRef API, checking URL accessibility, detecting duplicate entries, and producing a structured validation report that flags issues by severity. It ensures that .bib files are publication-ready before rendering.
When to Use
- Preparing a manuscript bibliography for journal submission
- Auditing a shared .bib file for quality before a project milestone
- After merging bibliographies from multiple sources
- When citations render incorrectly and you need to diagnose .bib issues
- As a CI check on .bib files in version-controlled projects
Inputs
- Required: Path to a .bib file
- Optional: Validation level (
,basic
,standard
; default:strict
)standard - Optional: Whether to check DOI resolution online (default:
)TRUE - Optional: Whether to check URL accessibility (default:
)TRUE - Optional: Output report path (default: prints to console)
- Optional: CrossRef API email for polite pool (recommended for large files)
Procedure
Step 1: Install and Load Required Packages
required_packages <- c("RefManageR", "httr2", "curl") missing <- required_packages[!vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)] if (length(missing) > 0) install.packages(missing) library(RefManageR)
Expected: All packages load without errors.
On failure: If httr2 is unavailable, install it with
install.packages("httr2").
For systems without curl headers: sudo apt install libcurl4-openssl-dev.
Step 2: Parse and Inventory the Bibliography
bib <- RefManageR::ReadBib("references.bib", check = FALSE) message(sprintf("Loaded %d entries from references.bib", length(bib))) # Inventory entry types entry_types <- vapply(bib, function(x) tolower(attr(x, "bibtype")), character(1)) type_counts <- sort(table(entry_types), decreasing = TRUE) message("Entry types:") for (type in names(type_counts)) { message(sprintf(" %s: %d", type, type_counts[[type]])) }
Expected: Summary of entry types (article, book, inproceedings, etc.) and total count matching the number of
@type{ blocks in the file.
On failure: Parsing errors indicate malformed BibTeX. Check for unmatched braces, missing commas between fields, or invalid UTF-8 characters.
Step 3: Validate Required Fields per Entry Type
# BibTeX required fields by entry type required_fields <- list( article = c("author", "title", "journal", "year"), book = c("author", "title", "publisher", "year"), inproceedings = c("author", "title", "booktitle", "year"), incollection = c("author", "title", "booktitle", "publisher", "year"), phdthesis = c("author", "title", "school", "year"), mastersthesis = c("author", "title", "school", "year"), techreport = c("author", "title", "institution", "year"), misc = c("author", "title", "year"), unpublished = c("author", "title", "note") ) validate_fields <- function(bib) { issues <- list() for (i in seq_along(bib)) { key <- names(bib)[i] entry_type <- tolower(attr(bib[[i]], "bibtype")) req <- required_fields[[entry_type]] if (is.null(req)) { issues[[length(issues) + 1]] <- list( key = key, severity = "warning", message = sprintf("Unknown entry type: %s", entry_type) ) next } for (field in req) { value <- bib[[i]][[field]] if (is.null(value) || !nzchar(trimws(as.character(value)))) { issues[[length(issues) + 1]] <- list( key = key, severity = "error", message = sprintf("Missing required field: %s (type: %s)", field, entry_type) ) } } } issues } field_issues <- validate_fields(bib) message(sprintf("Field validation: %d issues found", length(field_issues)))
Expected: A list of issues where required fields are missing. Zero issues for a well-maintained bibliography.
On failure: This step runs locally and should not fail. If it does, check that the .bib file parsed correctly in Step 2.
Step 4: Resolve and Validate DOIs
validate_dois <- function(bib, email = NULL) { issues <- list() # Set polite API headers headers <- list(`User-Agent` = "R-bibliography-validator/1.0") if (!is.null(email)) { headers[["mailto"]] <- email } for (i in seq_along(bib)) { key <- names(bib)[i] doi <- bib[[i]]$doi if (is.null(doi) || !nzchar(doi)) { issues[[length(issues) + 1]] <- list( key = key, severity = "info", message = "No DOI present" ) next } # Normalize DOI doi <- gsub("^https?://doi\\.org/", "", doi) doi <- gsub("^doi:", "", doi, ignore.case = TRUE) doi <- trimws(doi) # Resolve via CrossRef tryCatch({ resp <- httr2::request(sprintf("https://api.crossref.org/works/%s", doi)) |> httr2::req_headers(!!!headers) |> httr2::req_timeout(10) |> httr2::req_perform() if (httr2::resp_status(resp) != 200) { issues[[length(issues) + 1]] <- list( key = key, severity = "error", message = sprintf("DOI does not resolve: %s (HTTP %d)", doi, httr2::resp_status(resp)) ) } }, error = function(e) { issues[[length(issues) + 1]] <<- list( key = key, severity = "warning", message = sprintf("DOI check failed for %s: %s", doi, e$message) ) }) Sys.sleep(0.5) # Rate limiting } issues } # Only run online checks if requested doi_issues <- validate_dois(bib, email = "your.email@example.com") message(sprintf("DOI validation: %d issues found", length(doi_issues)))
Expected: Each DOI resolves successfully (HTTP 200 from CrossRef). Entries without DOIs are flagged as informational.
On failure: Network errors or rate limiting produce warnings rather than hard failures. Set the
email parameter for higher rate limits from CrossRef's polite pool.
Step 5: Check URL Accessibility
validate_urls <- function(bib) { issues <- list() for (i in seq_along(bib)) { key <- names(bib)[i] url <- bib[[i]]$url if (is.null(url) || !nzchar(url)) next tryCatch({ resp <- httr2::request(url) |> httr2::req_method("HEAD") |> httr2::req_timeout(10) |> httr2::req_error(is_error = function(resp) FALSE) |> httr2::req_perform() status <- httr2::resp_status(resp) if (status >= 400) { issues[[length(issues) + 1]] <- list( key = key, severity = "warning", message = sprintf("URL returned HTTP %d: %s", status, url) ) } }, error = function(e) { issues[[length(issues) + 1]] <<- list( key = key, severity = "warning", message = sprintf("URL unreachable: %s (%s)", url, e$message) ) }) Sys.sleep(0.3) } issues } url_issues <- validate_urls(bib) message(sprintf("URL validation: %d issues found", length(url_issues)))
Expected: All URLs return HTTP 200 (or 301/302 redirects). Broken links flagged.
On failure: Some servers block HEAD requests. Retry with GET for failed HEAD checks. Timeout errors are common for slow academic servers.
Step 6: Detect Duplicate Entries
detect_duplicates <- function(bib) { issues <- list() # Check for duplicate DOIs dois <- vapply(bib, function(x) { d <- x$doi if (is.null(d)) NA_character_ else tolower(trimws(d)) }, character(1)) doi_table <- table(dois[!is.na(dois)]) dup_dois <- names(doi_table[doi_table > 1]) for (d in dup_dois) { keys <- names(bib)[which(dois == d)] issues[[length(issues) + 1]] <- list( key = paste(keys, collapse = ", "), severity = "error", message = sprintf("Duplicate DOI %s in entries: %s", d, paste(keys, collapse = ", ")) ) } # Check for duplicate titles (fuzzy) titles <- vapply(bib, function(x) { t <- x$title if (is.null(t)) NA_character_ else tolower(gsub("[^a-z0-9 ]", "", tolower(t))) }, character(1)) seen <- character(0) for (i in seq_along(titles)) { if (is.na(titles[i])) next for (j in seen) { if (identical(titles[i], titles[as.integer(j)])) { issues[[length(issues) + 1]] <- list( key = sprintf("%s, %s", names(bib)[as.integer(j)], names(bib)[i]), severity = "warning", message = sprintf("Possible duplicate titles: '%s'", substr(bib[[i]]$title, 1, 60)) ) } } seen <- c(seen, as.character(i)) } issues } dup_issues <- detect_duplicates(bib) message(sprintf("Duplicate detection: %d issues found", length(dup_issues)))
Expected: Zero duplicates for a clean bibliography. Any detected duplicates are flagged with the specific entry keys involved.
Step 7: Generate Validation Report
generate_report <- function(all_issues, bib, output_file = NULL) { errors <- Filter(function(x) x$severity == "error", all_issues) warnings <- Filter(function(x) x$severity == "warning", all_issues) infos <- Filter(function(x) x$severity == "info", all_issues) lines <- c( "# Bibliography Validation Report", "", sprintf("**File**: references.bib"), sprintf("**Entries**: %d", length(bib)), sprintf("**Date**: %s", Sys.Date()), "", sprintf("## Summary: %d errors, %d warnings, %d info", length(errors), length(warnings), length(infos)), "" ) if (length(errors) > 0) { lines <- c(lines, "## Errors", "") for (issue in errors) { lines <- c(lines, sprintf("- **[%s]** %s", issue$key, issue$message)) } lines <- c(lines, "") } if (length(warnings) > 0) { lines <- c(lines, "## Warnings", "") for (issue in warnings) { lines <- c(lines, sprintf("- **[%s]** %s", issue$key, issue$message)) } lines <- c(lines, "") } report_text <- paste(lines, collapse = "\n") if (!is.null(output_file)) { writeLines(report_text, output_file) message(sprintf("Report written to %s", output_file)) } cat(report_text) invisible(all_issues) } all_issues <- c(field_issues, doi_issues, url_issues, dup_issues) generate_report(all_issues, bib, output_file = "validation-report.md")
Expected: A structured markdown report listing all issues grouped by severity.
Validation
- All entries have required fields for their type (no errors in field check)
- All DOIs resolve to valid CrossRef records
- No duplicate DOIs exist in the bibliography
- All URLs are accessible (HTTP 200 or redirect)
- Validation report generated without R errors
- Zero errors in report for a publication-ready bibliography
Common Pitfalls
- DOI format inconsistency: DOIs may appear as
,10.1234/...
, orhttps://doi.org/10.1234/...
. Normalize before comparingdoi:10.1234/... - CrossRef rate limiting: Unauthenticated requests are limited to ~50/second.
Always use the
parameter to join the polite pool for higher limitsemail - Transient URL failures: Academic servers occasionally timeout. Retry failed URLs once before flagging them as broken
- Entry type variations: BibLaTeX uses
where BibTeX uses@online
. The validator should handle both@misc - False positive duplicates: Entries like "Introduction" or "Methods" as titles trigger fuzzy matching. Review flagged duplicates manually
- Missing DOIs for older works: Pre-2000 publications often lack DOIs. Flag as informational, not as errors
Related Skills
- fix issues found by this validator (dedup, add fields)manage-bibliography
- format validated entries into styled citationsformat-citations
- APA reports require complete, validated references../reporting/format-apa-report
- vignettes with citations need valid .bib entries../r-packages/write-vignette