Medical-research-skills bibliography
Classifies and organizes literature by theme, method, and conclusion; use when you need to batch-read a folder of PDF/MD/DOCX/TXT files and output a structured CSV for literature reviews and annotation management.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/bibliography" ~/.claude/skills/aipoch-medical-research-skills-bibliography && rm -rf "$T"
manifest:
scientific-skills/Other/bibliography/SKILL.md
Bibliography
When to Use
- You are conducting a literature review and need consistent summaries plus structured metadata (theme/method/conclusion) across many papers.
- You have a mixed-format reading folder (.pdf, .md, .docx, .txt) and want a single CSV for downstream analysis (e.g., Excel, R, Python).
- You need to organize annotations by keywords (theme), experimental methods (method), and key conclusions (conclusion).
- You want a two-step pipeline: first generate a human-readable summary Markdown, then generate a machine-friendly CSV from that Markdown.
- You need robust handling of PDFs by converting them to Markdown first (via pdf-extract) and then using only Markdown content for extraction.
Key Features
- Batch scans an input directory for .pdf, .md, .docx, and .txt literature files.
- Converts PDFs to Markdown via pdf-extract, then ignores non-Markdown artifacts (e.g., image folders).
- Extracts and normalizes, per document:
  - Title
  - Summary (prefer original abstract)
  - Keywords (theme)
  - Experimental Methods (method names only)
  - Key Conclusions (single sentence)
  - Commentary (one-sentence, tactful evaluation)
- Produces exactly two outputs:
  - A consolidated Summary Markdown saved under outputs/
  - A single CSV generated from that Summary Markdown
- Enforces UTF-8 output to prevent garbled characters; fills missing fields with "Not recognized" instead of leaving blanks.
- Uses the CSV field order and headers defined in assets/bibliography_template.csv.
Dependencies
- pdf-extract (version: not specified; required when PDFs are present)
- Input formats supported (no external version constraints specified):
  - Markdown (.md)
  - DOCX (.docx)
  - Plain text (.txt)
Example Usage
Goal
Read all literature files in a folder, generate a consolidated summary Markdown, then generate a CSV following assets/bibliography_template.csv.
Inputs
- Input directory (example): ./inputs/literature/
- Output directory: ./outputs/
- Output CSV path (example): ./outputs/bibliography.csv
Expected Outputs (exactly two files)
- ./outputs/bibliography_summary.md
- ./outputs/bibliography.csv
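The "exactly two files" expectation can be checked mechanically after a run. The following is a minimal sketch, assuming the example file names above; `check_outputs` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

# File names taken from the example above; adjust if your run uses others.
EXPECTED_OUTPUTS = {"bibliography_summary.md", "bibliography.csv"}


def check_outputs(output_dir: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) entry names for the output directory."""
    present = {entry.name for entry in Path(output_dir).iterdir()}
    return EXPECTED_OUTPUTS - present, present - EXPECTED_OUTPUTS
```

A run is clean when both returned sets are empty; anything in the second set is a leftover file or folder that should be deleted.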
Example Summary Markdown Structure (generated first)
    # Bibliography Summary

    ## Document 1
    - Title: <Title>
    - Summary: <Prefer the original Abstract; if missing, use the closest equivalent section>
    - Keywords: keyword1 | keyword2 | keyword3
    - Experimental Methods: <method1; method2; ... (names only)>
    - Key Conclusions: <one sentence covering all main points>
    - Commentary: <one tactful sentence>

    ## Document 2
    ...
Example CSV (generated from the Summary Markdown)
The CSV must follow the header order defined in:
assets/bibliography_template.csv
Rules:
- One row per document.
- No empty cells; use Not recognized when extraction fails.
- Save as UTF-8.
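Because the CSV is derived from the Summary Markdown rather than from the raw files, one way to get one row per document is to parse the "## Document N" sections and their "- Field: value" bullets. This is a sketch under that assumption; `parse_summary_markdown` is a hypothetical name, and the field labels are the ones shown in the example structure.

```python
import re

# Field labels as they appear in the example Summary Markdown.
FIELDS = (
    "Title",
    "Summary",
    "Keywords",
    "Experimental Methods",
    "Key Conclusions",
    "Commentary",
)
MISSING = "Not recognized"


def parse_summary_markdown(markdown: str) -> list[dict[str, str]]:
    """Turn each '## ' section into one row keyed by field label."""
    rows: list[dict[str, str]] = []
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Start a new document with every field pre-filled as missing.
            current = dict.fromkeys(FIELDS, MISSING)
            rows.append(current)
            continue
        match = re.match(r"-\s*([^:]+):\s*(.*)", line.strip())
        if current is not None and match:
            label = match.group(1).strip()
            if label in FIELDS:
                current[label] = match.group(2).strip() or MISSING
    return rows
```

Pre-filling every field means a bullet that is absent or empty still yields a non-empty cell.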
Implementation Details
1) Input Reading and Normalization
- Traverse the input directory and process files with extensions: .pdf, .md, .docx, .txt
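A minimal sketch of the traversal step follows. It assumes a recursive walk and case-insensitive extension matching, neither of which the skill specifies; `find_literature_files` is a hypothetical helper.

```python
from pathlib import Path

# Extensions listed in the step above.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".docx", ".txt"}


def find_literature_files(input_dir: str) -> list[Path]:
    """Return supported files under input_dir, sorted for a stable order."""
    root = Path(input_dir)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: subfolders are included
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Sorting keeps the "Document N" numbering reproducible between runs on the same folder.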
PDF handling
- If PDFs exist, convert them to Markdown using pdf-extract.
- Use only the generated .md content; ignore image directories or other byproducts.
- Locate pdf-extract as follows:
  - First, look for a sibling skill directory containing SKILL.md at the same level as this skill’s parent directory.
  - If not found, ask the user to confirm the actual pdf-extract path.
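The lookup order could be sketched as below. The wording of the rule leaves some room for interpretation, so this assumes the candidates are the folders next to this skill's own directory, and that the sibling's folder name ends with "pdf-extract" (which tolerates install prefixes). `locate_pdf_extract` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def locate_pdf_extract(skill_dir: Path) -> Optional[Path]:
    """Return a sibling pdf-extract skill directory, or None if absent."""
    for candidate in sorted(skill_dir.resolve().parent.iterdir()):
        # Assumption: the folder name ends with "pdf-extract" and the
        # folder is a skill, i.e. it contains a SKILL.md file.
        if (
            candidate.is_dir()
            and candidate.name.endswith("pdf-extract")
            and (candidate / "SKILL.md").is_file()
        ):
            return candidate
    return None  # not found: ask the user to confirm the actual path
```

Returning None instead of guessing keeps the second rule intact: the user confirms the path when no sibling is found.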
DOCX handling
- Extract body text while preserving title/paragraph order as much as possible.
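One dependency-free way to read DOCX body text in order is to open the file as a ZIP archive and walk the paragraphs in word/document.xml. This is a sketch, not the skill's own extractor; `docx_paragraphs` is a hypothetical helper, and documents with text boxes or other nested content may need extra handling.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source) -> list[str]:
    """Return non-empty paragraph texts from a .docx in document order."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join the pieces.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

`source` may be a file path or a binary file-like object. The first returned paragraph is often, though not always, the title.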
MD/TXT handling
- Read text directly.
- If garbled characters appear or key fields cannot be recognized, attempt to detect and read using the original encoding (commonly GB18030/GBK) before extraction.
2) Generate the Summary Markdown First (Single Source of Truth)
Before producing the CSV, generate a consolidated Summary Markdown containing, for each document:
- Title
- Summary
  - Prefer the original Abstract.
  - If no “Abstract” exists, use the closest equivalent section (e.g., “Summary”, “Highlights”, or an “Objective–Method–Result–Conclusion” style segment).
- Keywords
- Experimental Methods
- Key Conclusions
- Commentary
  - Exactly one sentence.
  - Avoid harsh criticism; if the work has low value, use tactful phrasing.
This Summary Markdown must be saved with UTF-8 encoding and stored under outputs/. The CSV must be generated only from this Markdown (not directly from raw files).
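Rendering the Summary Markdown from already-extracted fields might look like the sketch below. The `DocumentSummary` class and `render_summary_markdown` function are hypothetical names; the bullet labels follow the example structure, and empty fields are filled with the placeholder the skill prescribes.

```python
from dataclasses import dataclass

MISSING = "Not recognized"


@dataclass
class DocumentSummary:
    title: str = ""
    summary: str = ""
    keywords: str = ""
    methods: str = ""
    conclusions: str = ""
    commentary: str = ""


def _one_line(value: str) -> str:
    """Collapse whitespace so each field stays on a single bullet line."""
    return " ".join(value.split()) or MISSING


def render_summary_markdown(documents: list[DocumentSummary]) -> str:
    """Build the consolidated Summary Markdown text for all documents."""
    lines = ["# Bibliography Summary", ""]
    for index, doc in enumerate(documents, start=1):
        lines += [
            f"## Document {index}",
            f"- Title: {_one_line(doc.title)}",
            f"- Summary: {_one_line(doc.summary)}",
            f"- Keywords: {_one_line(doc.keywords)}",
            f"- Experimental Methods: {_one_line(doc.methods)}",
            f"- Key Conclusions: {_one_line(doc.conclusions)}",
            f"- Commentary: {_one_line(doc.commentary)}",
            "",
        ]
    return "\n".join(lines)
```

The returned text can then be saved with an explicit encoding, for example `Path("outputs/bibliography_summary.md").write_text(text, encoding="utf-8")`.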
3) Field Extraction Rules (Theme / Method / Conclusion)
- Keywords (theme)
  - Prefer the original keywords from the document.
  - Separate multiple keywords with |.
  - If no keywords are found, generate 3–5 keyword phrases based on the abstract and append: (generated based on abstract)
- Experimental Methods (method)
  - Output method names only (no long descriptions).
- Key Conclusions (conclusion)
  - One sentence that covers all main points.
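The keyword rule can be expressed as a small formatting step. In this sketch, `format_keywords` is a hypothetical helper that only formats: producing the 3–5 fallback phrases from the abstract happens elsewhere, and the " | " spacing follows the example Summary Markdown.

```python
GENERATED_NOTE = "(generated based on abstract)"
MISSING = "Not recognized"


def format_keywords(original: list[str], generated: list[str]) -> str:
    """Join keywords with ' | ', preferring the document's own keywords."""
    cleaned = [keyword.strip() for keyword in original if keyword.strip()]
    if cleaned:
        return " | ".join(cleaned)
    fallback = [keyword.strip() for keyword in generated if keyword.strip()]
    if not fallback:
        return MISSING
    # Mark keywords that did not come from the document itself.
    return f"{' | '.join(fallback)} {GENERATED_NOTE}"
```

For example, `format_keywords([], ["sepsis", "biomarkers", "ICU"])` yields `sepsis | biomarkers | ICU (generated based on abstract)`.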
4) CSV Output Constraints
- Output exactly one CSV file at the end.
- CSV field order and headers must match assets/bibliography_template.csv.
- Encoding must be UTF-8 to avoid garbled characters.
- If any field cannot be extracted, write Not recognized (never leave empty).
- Only two files may be generated in total:
  - Summary Markdown
  - CSV
- No temporary/intermediate/auxiliary files may be left behind (including extracted text dumps, caches, logs, images, backups). If conversion/extraction requires intermediate artifacts, keep them in memory or ensure all non-target files are deleted before final output.
- To avoid encoding/newline issues, do not use PowerShell to write or manipulate CSV/Markdown directly; always generate and save using UTF-8.
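These constraints map naturally onto Python's standard csv module. The sketch below assumes the first row of the template file is the header and that each row dict is keyed by those header names; the template's actual columns are not shown here, so `write_bibliography_csv` reads them at run time rather than hard-coding them.

```python
import csv

MISSING = "Not recognized"


def write_bibliography_csv(template_path, rows, output_path) -> None:
    """Write rows as UTF-8 CSV using the template's header row and order."""
    # Assumption: the template's first row holds the column headers.
    with open(template_path, encoding="utf-8-sig", newline="") as handle:
        header = next(csv.reader(handle))
    with open(output_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=header, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            # Blank or absent values become the placeholder, never empty cells.
            writer.writerow(
                {name: str(row.get(name) or "").strip() or MISSING for name in header}
            )
```

Opening the output with `newline=""` lets the csv module control line endings, and the function writes only the target file, so nothing extra is left behind.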
Reference
- Detailed rules and field descriptions: references/guide.md