Medical-research-skills bibliography

Classifies and organizes literature by theme, method, and conclusion; use when you need to batch-read a folder of PDF/MD/DOCX/TXT files and output a structured CSV for literature reviews and annotation management.

install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/bibliography" ~/.claude/skills/aipoch-medical-research-skills-bibliography && rm -rf "$T"
manifest: scientific-skills/Other/bibliography/SKILL.md
source content

Source: https://github.com/aipoch/medical-research-skills

Bibliography

When to Use

  • You are conducting a literature review and need consistent summaries plus structured metadata (theme/method/conclusion) across many papers.
  • You have a mixed-format reading folder (.pdf, .md, .docx, .txt) and want a single CSV for downstream analysis (e.g., Excel, R, Python).
  • You need to organize annotations by keywords (theme), experimental methods (method), and key conclusions (conclusion).
  • You want a two-step pipeline: first generate a human-readable summary Markdown, then generate a machine-friendly CSV from that Markdown.
  • You need robust handling of PDFs by converting them to Markdown first (via pdf-extract) and then using only Markdown content for extraction.

Key Features

  • Batch scans an input directory for .pdf, .md, .docx, and .txt literature files.
  • Converts PDFs to Markdown via pdf-extract, then ignores non-Markdown artifacts (e.g., image folders).
  • Extracts and normalizes, per document:
    • Title
    • Summary (prefer original abstract)
    • Keywords (theme)
    • Experimental Methods (method names only)
    • Key Conclusions (single sentence)
    • Commentary (one-sentence, tactful evaluation)
  • Produces exactly two outputs:
    1. A consolidated Summary Markdown saved under outputs/
    2. A single CSV generated from that Summary Markdown
  • Enforces UTF-8 output to prevent garbled characters; fills missing fields with "Not recognized" instead of leaving blanks.
  • Uses the CSV field order and headers defined in assets/bibliography_template.csv.

Dependencies

  • pdf-extract (version: not specified; required when PDFs are present)
  • Input formats supported (no external version constraints specified):
    • PDF
    • Markdown (.md)
    • DOCX (.docx)
    • Plain text (.txt)

Example Usage

Goal

Read all literature files in a folder, generate a consolidated summary Markdown, then generate a CSV following assets/bibliography_template.csv.

Inputs

  • Input directory (example): ./inputs/literature/
  • Output directory: ./outputs/
  • Output CSV path (example): ./outputs/bibliography.csv

Expected Outputs (exactly two files)

  • ./outputs/bibliography_summary.md
  • ./outputs/bibliography.csv

Example Summary Markdown Structure (generated first)

# Bibliography Summary

## Document 1
- Title: <Title>
- Summary: <Prefer the original Abstract; if missing, use the closest equivalent section>
- Keywords: keyword1 | keyword2 | keyword3
- Experimental Methods: <method1; method2; ... (names only)>
- Key Conclusions: <one sentence covering all main points>
- Commentary: <one tactful sentence>

## Document 2
...
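
Because the structure above is regular, it can be read back into one record per document before any CSV is written. The following is a minimal sketch, not the skill's actual implementation; the function name parse_summary and the two regular expressions are assumptions chosen for illustration.

```python
import re

# Assumed patterns for the layout shown above: "## Document N" headings
# followed by "- Field: value" bullet lines.
HEADING = re.compile(r"^##\s+Document\s+\d+\s*$")
FIELD = re.compile(r"^-\s*([^:]+):\s*(.*)$")


def parse_summary(markdown_text):
    """Return a list of {field label: value} dicts, one per document."""
    records = []
    current = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if HEADING.match(line):
            # A new document section starts here.
            current = {}
            records.append(current)
        elif current is not None:
            match = FIELD.match(line)
            if match:
                # Split on the first colon only, so values may contain colons.
                current[match.group(1).strip()] = match.group(2).strip()
    return records
```

Lines before the first document heading (such as the top-level title) are skipped, and a value that itself contains a colon is kept whole.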

Example CSV (generated from the Summary Markdown)

The CSV must follow the header order defined in assets/bibliography_template.csv.

Rules:

  • One row per document.
  • No empty cells; use "Not recognized" when extraction fails.
  • Save as UTF-8.

Implementation Details

1) Input Reading and Normalization

  • Traverse the input directory and process files with extensions: .pdf, .md, .docx, .txt
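
The traversal step could be sketched as follows. The source does not say whether subfolders are scanned or whether extension matching is case-sensitive, so this sketch assumes recursive traversal and case-insensitive matching; the name find_literature_files is hypothetical.

```python
from pathlib import Path

# Extensions listed in the skill description.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".docx", ".txt"}


def find_literature_files(input_dir):
    """Return supported files under input_dir, sorted for a stable order.

    Assumption: subdirectories are included and ".PDF" counts as ".pdf".
    """
    root = Path(input_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Sorting the result keeps the document numbering in the Summary Markdown reproducible between runs.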

PDF handling

  • If PDFs exist, convert them to Markdown using pdf-extract.
  • Use only the generated .md content; ignore image directories or other byproducts.
  • Locate pdf-extract as follows:
    1. First, look for a sibling skill directory containing SKILL.md at the same level as this skill’s parent directory.
    2. If not found, ask the user to confirm the actual pdf-extract path.
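
The lookup order above might be sketched like this. The wording of step 1 leaves some room for interpretation, so this sketch checks two candidate locations (next to this skill, then next to its parent directory); the directory name pdf-extract and the function name locate_pdf_extract are assumptions. How pdf-extract is then invoked is not described in the source and is not shown.

```python
from pathlib import Path


def locate_pdf_extract(skill_dir, name="pdf-extract"):
    """Return the pdf-extract skill directory, or None if it is not found.

    Assumption: the sibling skill lives in a folder called `name` and is
    identified by a SKILL.md file inside it.
    """
    skill_dir = Path(skill_dir).resolve()
    candidates = [
        skill_dir.parent / name,         # next to this skill
        skill_dir.parent.parent / name,  # next to this skill's parent
    ]
    for candidate in candidates:
        if (candidate / "SKILL.md").is_file():
            return candidate
    # Not found: the caller should ask the user to confirm the actual path.
    return None
```

Returning None instead of raising leaves the second step, asking the user, to the caller.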

DOCX handling

  • Extract body text while preserving title/paragraph order as much as possible.

MD/TXT handling

  • Read text directly.
  • If garbled characters appear or key fields cannot be recognized, attempt to detect and read using the original encoding (commonly GB18030/GBK) before extraction.
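
One way to approach the encoding fallback is to try UTF-8 first and fall back to GB18030, which is designed as an extension of GBK, so a single fallback covers both. This is a sketch under that assumption; the function name read_text_with_fallback and the exact encoding order are illustrative, and real encoding detection may need more candidates.

```python
from pathlib import Path

# "utf-8-sig" also reads plain UTF-8 and drops a leading byte order mark.
FALLBACK_ENCODINGS = ("utf-8-sig", "gb18030")


def read_text_with_fallback(path, encodings=FALLBACK_ENCODINGS):
    """Decode a text file by trying each encoding in order."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")
```

Decoding strictly before falling back avoids silently accepting mis-decoded text that would later surface as garbled fields in the CSV.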

2) Generate the Summary Markdown First (Single Source of Truth)

Before producing the CSV, generate a consolidated Summary Markdown containing, for each document:

  • Title
  • Summary
    • Prefer the original Abstract.
    • If no “Abstract” exists, use the closest equivalent section (e.g., “Summary”, “Highlights”, or an “Objective–Method–Result–Conclusion” style segment).
  • Keywords
  • Experimental Methods
  • Key Conclusions
  • Commentary
    • Exactly one sentence.
    • Avoid harsh criticism; if the work has low value, use tactful phrasing.

This Summary Markdown must be saved with UTF-8 encoding and stored under outputs/. The CSV must be generated only from this Markdown (not directly from raw files).
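
As a rough illustration of this step, the sketch below renders already-extracted fields into the Summary Markdown layout and saves it as UTF-8. The extraction itself is not shown. The function names, the dict-based record shape, and the choice to collapse line breaks inside a value are assumptions; the field labels and the "Not recognized" placeholder come from this document.

```python
from pathlib import Path

FIELDS = [
    "Title",
    "Summary",
    "Keywords",
    "Experimental Methods",
    "Key Conclusions",
    "Commentary",
]
MISSING = "Not recognized"


def render_summary(documents):
    """Render a list of {field: value} dicts as the Summary Markdown text."""
    lines = ["# Bibliography Summary", ""]
    for index, doc in enumerate(documents, start=1):
        lines.append(f"## Document {index}")
        for field in FIELDS:
            # Collapse whitespace so each field stays on a single line.
            value = " ".join(str(doc.get(field) or "").split()) or MISSING
            lines.append(f"- {field}: {value}")
        lines.append("")
    return "\n".join(lines)


def save_summary(documents, path="outputs/bibliography_summary.md"):
    """Write the summary as UTF-8 with LF newlines and return the path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8", newline="\n") as handle:
        handle.write(render_summary(documents))
    return out
```

Keeping each field on one line makes the Markdown straightforward to parse when the CSV is generated from it.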

3) Field Extraction Rules (Theme / Method / Conclusion)

  • Keywords (theme)
    • Prefer the original keywords from the document.
    • Separate multiple keywords with |.
    • If no keywords are found, generate 3–5 keyword phrases based on the abstract and append: (generated based on abstract)
  • Experimental Methods (method)
    • Output method names only (no long descriptions).
  • Key Conclusions (conclusion)
    • One sentence that covers all main points.
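
The keyword rule can be illustrated with a small helper. This is a sketch, not the skill's implementation: generating keyword phrases from an abstract is left to the caller, the note is assumed to go at the end of the keyword string, and the separators accepted by split_keyword_line are a guess at common cases. Both function names are hypothetical.

```python
import re

GENERATED_NOTE = "(generated based on abstract)"
MISSING = "Not recognized"


def split_keyword_line(text):
    """Split a raw keyword string on common separators (; , | and CJK forms)."""
    return [part.strip() for part in re.split(r"[;,\uff0c\uff1b|]", text) if part.strip()]


def format_keywords(original, generated=()):
    """Join keywords with ' | ', preferring the document's own keywords."""
    def clean(items):
        return [item.strip() for item in items if item and item.strip()]

    keywords = clean(original)
    if keywords:
        return " | ".join(keywords)
    fallback = clean(generated)
    if fallback:
        # Mark keywords that did not come from the document itself.
        return " | ".join(fallback) + " " + GENERATED_NOTE
    return MISSING
```

For example, format_keywords([], ["sepsis", "lactate", "ICU"]) yields the three phrases joined by " | " followed by the note, while an empty input on both sides falls back to "Not recognized".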

4) CSV Output Constraints

  • Output exactly one CSV file at the end.
  • CSV field order and headers must match assets/bibliography_template.csv.
  • Encoding must be UTF-8 to avoid garbled characters.
  • If any field cannot be extracted, write "Not recognized" (never leave empty).
  • Only two files may be generated in total:
    1. Summary Markdown
    2. CSV
  • No temporary/intermediate/auxiliary files may be left behind (including extracted text dumps, caches, logs, images, backups). If conversion/extraction requires intermediate artifacts, keep them in memory or ensure all non-target files are deleted before final output.
  • To avoid encoding and newline issues, do not use PowerShell to write or manipulate the CSV or Markdown files directly; always generate and save them as UTF-8.
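
The constraints above might be applied as in the following sketch, which takes records already parsed from the Summary Markdown (one dict per document). The contents of assets/bibliography_template.csv are not shown in this document, so the sketch reads the header row from whatever template path it is given and assumes the column names match the field labels used in the Summary Markdown; a real template may need an explicit mapping. All function names are hypothetical.

```python
import csv
from pathlib import Path

MISSING = "Not recognized"


def read_template_header(template_path):
    """Return the first row of the template CSV as the header list."""
    with open(template_path, newline="", encoding="utf-8-sig") as handle:
        return next(csv.reader(handle))


def write_bibliography_csv(records, template_path, csv_path):
    """Write one row per record, in template column order, as UTF-8."""
    header = read_template_header(template_path)
    out = Path(csv_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for record in records:
            # Never leave a cell empty: fall back to the placeholder.
            writer.writerow(
                [(record.get(column) or "").strip() or MISSING for column in header]
            )
    return header


def extra_files(output_dir, allowed=("bibliography_summary.md", "bibliography.csv")):
    """List names in output_dir other than the two expected outputs."""
    return sorted(p.name for p in Path(output_dir).iterdir() if p.name not in allowed)
```

Running extra_files on the output directory as a final check gives a simple way to confirm that no temporary or auxiliary files were left behind.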

Reference

  • Detailed rules and field descriptions: references/guide.md