Medical-research-skills bibliography

Classifies and organizes literature by theme, method, and conclusion; use when you need to batch-read a folder of PDF/MD/DOCX/TXT files and output a structured CSV for literature reviews and annotation management.

install

source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/bibliography" ~/.claude/skills/aipoch-medical-research-skills-bibliography && rm -rf "$T"

manifest: scientific-skills/Other/bibliography/SKILL.md

source content

Source: https://github.com/aipoch/medical-research-skills

Bibliography

When to Use

You are conducting a literature review and need consistent summaries plus structured metadata (theme/method/conclusion) across many papers.
You have a mixed-format reading folder (`.pdf`, `.md`, `.docx`, `.txt`) and want a single CSV for downstream analysis (e.g., Excel, R, Python).
You need to organize annotations by keywords (theme), experimental methods (method), and key conclusions (conclusion).
You want a two-step pipeline: first generate a human-readable summary Markdown, then generate a machine-friendly CSV from that Markdown.
You need robust handling of PDFs by converting them to Markdown first (via `pdf-extract`) and then using only Markdown content for extraction.

Key Features

Batch scans an input directory for `.pdf`, `.md`, `.docx`, and `.txt` literature files.
Converts PDFs to Markdown via `pdf-extract`, then ignores non-Markdown artifacts (e.g., image folders).
Extracts and normalizes, per document:
- Title
- Summary (prefer original abstract)
- Keywords (theme)
- Experimental Methods (method names only)
- Key Conclusions (single sentence)
- Commentary (one-sentence, tactful evaluation)
Produces exactly two outputs:
1. A consolidated Summary Markdown saved under `outputs/`
2. A single CSV generated from that Summary Markdown
Enforces UTF-8 output to prevent garbled characters; fills missing fields with `Not recognized` instead of leaving blanks.
Uses the CSV field order and headers defined in `assets/bibliography_template.csv`.

Dependencies

`pdf-extract` (version: not specified; required when PDFs are present)
Input formats supported (no external version constraints specified):
- PDF
- Markdown (`.md`)
- DOCX (`.docx`)
- Plain text (`.txt`)

Example Usage

Goal

Read all literature files in a folder, generate a consolidated summary Markdown, then generate a CSV following `assets/bibliography_template.csv`.

Inputs

Input directory (example):
```
./inputs/literature/
```
Output directory:
```
./outputs/
```
Output CSV path (example):
```
./outputs/bibliography.csv
```

Expected Outputs (exactly two files)

```
./outputs/bibliography_summary.md
```
```
./outputs/bibliography.csv
```

Example Summary Markdown Structure (generated first)

# Bibliography Summary

## Document 1
- Title: <Title>
- Summary: <Prefer the original Abstract; if missing, use the closest equivalent section>
- Keywords: keyword1 | keyword2 | keyword3
- Experimental Methods: <method1; method2; ... (names only)>
- Key Conclusions: <one sentence covering all main points>
- Commentary: <one tactful sentence>

## Document 2
...
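The structure above can be produced mechanically once the six fields are known for each document. The following is a minimal sketch, not the skill's actual implementation: the function name `render_summary` and the dictionary-based input are assumptions made for illustration, and filling gaps with `Not recognized` at this stage is one possible reading of the no-blank-fields rule.

```python
# Hypothetical helper: renders extracted fields into the Summary Markdown layout.
FIELDS = [
    "Title",
    "Summary",
    "Keywords",
    "Experimental Methods",
    "Key Conclusions",
    "Commentary",
]
PLACEHOLDER = "Not recognized"


def render_summary(documents):
    """Return the consolidated Summary Markdown for a list of field dicts."""
    lines = ["# Bibliography Summary", ""]
    for index, doc in enumerate(documents, start=1):
        lines.append(f"## Document {index}")
        for field in FIELDS:
            # Collapse internal whitespace so every field stays on one line.
            value = " ".join(str(doc.get(field) or "").split()) or PLACEHOLDER
            lines.append(f"- {field}: {value}")
        lines.append("")
    return "\n".join(lines)
```

Keeping each field on a single line is a design choice that makes the later Markdown-to-CSV step a simple line-by-line parse.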

Example CSV (generated from the Summary Markdown)

The CSV must follow the header order defined in:

```
assets/bibliography_template.csv
```

Rules:

One row per document.
No empty cells; use `Not recognized` when extraction fails.
Save as UTF-8.
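These three rules map closely onto Python's standard `csv` module. The sketch below is illustrative only: `read_header` and `write_csv` are hypothetical names, and the real column names must come from `assets/bibliography_template.csv`, whose contents are not shown here.

```python
import csv

PLACEHOLDER = "Not recognized"


def read_header(template_path):
    """Read the header row of the template (assumed to be its first line)."""
    # utf-8-sig tolerates a byte-order mark if the template has one.
    with open(template_path, encoding="utf-8-sig", newline="") as handle:
        return next(csv.reader(handle))


def write_csv(records, header, stream):
    """Write one row per record, in header order, with no empty cells."""
    writer = csv.DictWriter(stream, fieldnames=header, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = {}
        for column in header:
            value = str(record.get(column) or "").strip()
            row[column] = value or PLACEHOLDER
        writer.writerow(row)
```

To save as UTF-8, open the target with `open(path, "w", encoding="utf-8", newline="")` and pass the file object as `stream`; `newline=""` is what the `csv` documentation recommends so that line endings are not translated twice.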

Implementation Details

1) Input Reading and Normalization

Traverse the input directory and process files with extensions `.pdf`, `.md`, `.docx`, `.txt`.
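As a rough illustration of the traversal step, the sketch below collects matching files with `pathlib`. Two behaviors here are assumptions rather than documented rules: the scan is recursive, and extension matching is case-insensitive. The function name `find_literature` is hypothetical.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".md", ".docx", ".txt"}


def find_literature(input_dir):
    """Return supported literature files under input_dir in sorted order."""
    root = Path(input_dir)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: subfolders are included
        if path.is_file() and path.suffix.lower() in SUPPORTED
    )
```

Sorting the result keeps the document numbering in the Summary Markdown stable between runs.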

PDF handling

If PDFs exist, convert them to Markdown using `pdf-extract`.
Use only the generated `.md` content; ignore image directories or other byproducts.
Locate `pdf-extract` as follows:
1. First, look for a sibling skill directory containing `SKILL.md` at the same level as this skill’s parent directory.
2. If not found, ask the user to confirm the actual `pdf-extract` path.
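The lookup order can be sketched as follows. The wording of step 1 leaves some room for interpretation, so this sketch checks both next to this skill's directory and next to its parent; that breadth, the directory name `pdf-extract`, and the function name `locate_pdf_extract` are all assumptions. A `None` result corresponds to step 2, where the user is asked for the path.

```python
from pathlib import Path


def locate_pdf_extract(skill_dir, name="pdf-extract"):
    """Return the pdf-extract skill directory, or None if it is not found."""
    skill_dir = Path(skill_dir).resolve()
    # Assumption: look beside this skill first, then beside its parent.
    for base in (skill_dir.parent, skill_dir.parent.parent):
        candidate = base / name
        if (candidate / "SKILL.md").is_file():
            return candidate
    return None  # caller should ask the user to confirm the real path
```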

DOCX handling

Extract body text while preserving title/paragraph order as much as possible.
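One dependency-free way to approximate this is to read the paragraph elements straight from the DOCX archive, since a `.docx` file is a ZIP container whose body lives in `word/document.xml`. The sketch below is an assumption about how the step could be done, not the skill's documented method; `docx_paragraphs` is a hypothetical name, and paragraphs inside tables are simply returned in document order.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by w:p (paragraph) and w:t (text) elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return non-empty paragraph texts from a DOCX file, in document order."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; join them back together.
        text = "".join(node.text or "" for node in para.iter(f"{W}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Because the title is normally the first paragraph, preserving document order also preserves the title/paragraph order the rule asks for.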

MD/TXT handling

Read text directly.
If garbled characters appear or key fields cannot be recognized, attempt to detect and read using the original encoding (commonly `GB18030`/`GBK`) before extraction.
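A simple way to approximate this fallback is to try strict UTF-8 first and only then GB18030, which is a superset of GBK and therefore covers both. This is a sketch under that assumption; it does not do statistical encoding detection, and `decode_text` is a hypothetical helper name.

```python
def decode_text(raw, encodings=("utf-8-sig", "gb18030")):
    """Decode bytes with the first encoding that succeeds.

    Returns a (text, encoding_used) tuple.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters rather than fail.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Usage would look like `text, used = decode_text(Path(p).read_bytes())`; whatever the input encoding, the outputs are still written as UTF-8.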

2) Generate the Summary Markdown First (Single Source of Truth)

Before producing the CSV, generate a consolidated Summary Markdown containing, for each document:

Title
Summary
- Prefer the original Abstract.
- If no “Abstract” exists, use the closest equivalent section (e.g., “Summary”, “Highlights”, or an “Objective–Method–Result–Conclusion” style segment).
Keywords
Experimental Methods
Key Conclusions
Commentary
- Exactly one sentence.
- Avoid harsh criticism; if the work has low value, use tactful phrasing.

This Summary Markdown must be saved with UTF-8 encoding and stored under `outputs/`. The CSV must be generated only from this Markdown (not directly from raw files).
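Treating the Summary Markdown as the single source of truth means the CSV step only needs a small parser for the `## Document N` / `- Field: value` layout shown in the example structure. The sketch below assumes exactly that layout, with one field per line; `parse_summary` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^##\s+Document\s+\d+\s*$")
FIELD = re.compile(r"^-\s+([^:]+):\s*(.*)$")


def parse_summary(markdown):
    """Parse the Summary Markdown into one dict of fields per document."""
    records = []
    current = None
    for line in markdown.splitlines():
        if HEADING.match(line):
            current = {}
            records.append(current)
        elif current is not None:
            match = FIELD.match(line)
            if match:
                # Only the first colon separates name and value, so titles
                # containing colons survive intact.
                current[match.group(1).strip()] = match.group(2).strip()
    return records
```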

3) Field Extraction Rules (Theme / Method / Conclusion)

Keywords (theme)
- Prefer the original keywords from the document.
- Separate multiple keywords with `|`.
- If no keywords are found, generate 3–5 keyword phrases based on the abstract and append `(generated based on abstract)`.
Experimental Methods (method)
- Output method names only (no long descriptions).
Key Conclusions (conclusion)
- One sentence that covers all main points.
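The keyword rule is the most mechanical of the three and can be sketched directly. Assumptions in this sketch: keywords are joined with ` | ` as in the example structure, the marker is appended once after the whole generated list, and `format_keywords` is a hypothetical name. Generating the fallback phrases themselves is left to the caller.

```python
def format_keywords(original, generated=()):
    """Format the Keywords field from original or generated keyword lists."""
    cleaned = [k.strip() for k in original if k and k.strip()]
    if cleaned:
        # Prefer the document's own keywords, unmarked.
        return " | ".join(cleaned)
    fallback = [k.strip() for k in generated if k and k.strip()]
    if fallback:
        return " | ".join(fallback) + " (generated based on abstract)"
    return "Not recognized"
```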

4) CSV Output Constraints

Output exactly one CSV file at the end.
CSV field order and headers must match `assets/bibliography_template.csv`.
Encoding must be UTF-8 to avoid garbled characters.
If any field cannot be extracted, write `Not recognized` (never leave empty).
Only two files may be generated in total:
1. Summary Markdown
2. CSV
No temporary/intermediate/auxiliary files may be left behind (including extracted text dumps, caches, logs, images, backups). If conversion/extraction requires intermediate artifacts, keep them in memory or ensure all non-target files are deleted before final output.
To avoid encoding and newline issues, do not use PowerShell to write or manipulate the CSV or Markdown directly; always generate and save the files as UTF-8.
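One way to honor the two-files-only constraint is to keep intermediates in a scratch directory that is deleted automatically (for example `tempfile.TemporaryDirectory()`), and to check the output directory before finishing. The check below is a sketch: `unexpected_files` is a hypothetical name, the expected file names follow the example paths given earlier, and it assumes the output directory is dedicated to this run.

```python
from pathlib import Path

# Example target names; adjust if a different CSV path is requested.
EXPECTED = {"bibliography_summary.md", "bibliography.csv"}


def unexpected_files(output_dir, expected=EXPECTED):
    """List files under output_dir that are not one of the expected outputs."""
    root = Path(output_dir)
    allowed = set(expected)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.relative_to(root).as_posix() not in allowed
    )
```

An empty list means only the target files are present; anything else is a leftover to delete before reporting completion.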

Reference

Detailed rules and field descriptions:
```
references/guide.md
```