Medsci-skills fill-protocol

install

source · Clone the upstream repo

git clone https://github.com/Aperivue/medsci-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Aperivue/medsci-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/fill-protocol" ~/.claude/skills/aperivue-medsci-skills-fill-protocol && rm -rf "$T"

manifest: skills/fill-protocol/SKILL.md

source content

Fill-Protocol Skill

You are helping a researcher populate an institutional Word form (IRB protocol, ethics application, grant proposal, etc.) without breaking the original document formatting. This skill is the formatting counterpart to

write-protocol

: where

write-protocol

drafts content,

fill-protocol

lays that content into the institutional template.

Why This Skill Exists

Recreating institutional forms from scratch with

python-docx

reliably destroys table layouts, page breaks, and font consistency. The only safe approach is to open the existing template and replace cell/paragraph text in place. This skill enforces that pattern.

Core Principles (Do Not Violate)

Open the existing template — never create from scratch. Use
```
Document(template_path)
```
, not
```
Document()
```
.
Convert .doc → .docx via LibreOffice headless before any editing.
```
pandoc -f doc
```
is not supported;
```
textutil
```
corrupts table structure.
Match cells by left-label text, not row/column coordinates. Templates evolve and coordinate matching breaks silently.
Apply
cantSplit
to every filled row so a row never breaks across pages.
For CJK languages, set the
eastAsia
font attribute, not just
```
run.font.name
```
. Hangul/Kanji/Hanzi will render in fallback fonts otherwise.
Validate every fill operation: report unmatched labels, count empty cells, and surface mismatches before saving.

Dependencies

If the template is already

.docx

, LibreOffice is not required — only the three Python packages below. LibreOffice is needed only when the template is a legacy

.doc

and must be converted first.

# Python libraries (always required)
pip install --user docxtpl python-docx pyyaml

# LibreOffice (only for legacy .doc input; ~700 MB on macOS)
brew install --cask libreoffice              # macOS
sudo apt-get install -y libreoffice           # Debian/Ubuntu
sudo dnf install -y libreoffice               # Fedora
sudo pacman -S --needed libreoffice-fresh     # Arch

Bundled setup script

The skill ships a

setup.sh

that detects what is missing and installs only those parts, with a confirmation prompt before each step:

bash setup.sh check     # report what's installed (read-only)
bash setup.sh install   # install missing pieces (asks before each)

Auto-install behavior (for Claude as the caller)

When invoking this skill on behalf of a user:

Before calling
doc_to_docx.py
, run
```
bash setup.sh check
```
. If LibreOffice is missing, ask the user before installing — the cask is ~700 MB and proceeding silently is unfriendly.
Skip LibreOffice entirely if the template is already
```
.docx
```
. Only surface the install prompt when a
```
.doc
```
is encountered.
Never pass
```
--yes
```
to
```
setup.sh install
```
unless the user has explicitly authorized unattended installation in this session.
If the user declines installation, fall back to asking them to convert the
```
.doc
```
manually (open in Word/LibreOffice/Pages → Save As → .docx) and then re-run with the converted file.

Workflow

Step 1 — Convert legacy .doc to .docx (if needed)

python scripts/doc_to_docx.py path/to/template.doc path/to/template.docx

Step 2 — Inspect the template structure

python scripts/inspect_template.py path/to/template.docx

This lists every table, every cell (with row/column coordinates and content preview), and every top-level paragraph. Use this output to identify the labels you will match against in your YAML content file.

Step 3 — Author a content YAML

The YAML supports three fill modes. All keys are optional.

protections:
  korean_font: "맑은 고딕"   # CJK font (set to "Noto Sans CJK KR", "SimSun",
                              # "MS Mincho", etc. for other locales)
  cant_split: true            # Apply <w:cantSplit/> to every filled row

  # Readability options (see "Readability" section below for full semantics)
  blank_between_paragraphs: true            # default true — Enter between \n\n chunks
  blank_around_section_header: true         # default true — Enter above/below filled sections
  blank_around_all_section_headers: false   # default false — opt-in; also touches untouched sections

# Mode 1 — table key/value (left-label cell → right value cell)
table_kv:
  "Study Title": "Multi-center prospective validation of ..."
  "Principal Investigator": "Last, First (Department)"
  "연구 목적": "본 연구는 ..."

# Mode 2 — section replacement (find numbered header, replace until next header)
section_replace:
  "1. Background":
    "Hepatocellular carcinoma is the third leading cause of ..."
  "4. 연구 배경 및 이론적 근거":
    "..."

# Mode 3 — single paragraph in-place text replacement
paragraph_replace:
  "Title:":
    "Title: Multi-center prospective validation of ..."

Readability — three blank-line knobs

All blank paragraphs inserted by these options use a forced single-line height (

<w:spacing w:line="240" w:before="0" w:after="0"/>

) so the gap is exactly one body-text line — never inflates the document's apparent line spacing.

Option	Default	What it does	When to flip
`blank_between_paragraphs`	`true`	Inserts a blank line between every `\n\n` -split chunk inside `section_replace`	Disable only for forms where every line must be packed tight
`blank_around_section_header`	`true`	Wraps each header that you `section_replace` with a blank above and a blank below	Disable when the template style already adds visual gaps via `space_before/after`
`blank_around_all_section_headers`	`false`	After all fills, scans every numbered header ( `\d+\.\s+` ) — including ones you didn't replace — and adds blank lines around them	Enable when uniform readability matters more than form fidelity. Default off because IRB / public-document submissions favor template fidelity over visual consistency (page count stability, boilerplate untouched, reviewer-expected layout)
`normalize_page_breaks`	`true`	On save, converts dangling empty paragraphs whose sole content is `<w:br w:type="page"/>` into a `<w:pageBreakBefore/>` attribute on the next content paragraph. Prevents visible blank pages when the preceding content (e.g. an abstract table) grows or shrinks and pushes the empty paragraph onto a page of its own, causing the break to land one page later.	Disable only if your template intentionally relies on the empty-paragraph-as-separator pattern for spacing

The third option exists because

section_replace

only touches sections you list in the YAML. If a template has 18 numbered sections and you only fill 12, the other 6 stay tight against their content — visually inconsistent. Turn the opt-in on for documents where you'd rather the consistency than the fidelity.

Step 4 — Run the fill

python scripts/fill_form.py \
  --template path/to/template.docx \
  --content  content.yaml \
  --output   path/to/filled.docx

The CLI prints

[OK]

[MISS]

for every fill operation and a summary at the end. Investigate any

[MISS]

before submitting.

Step 5 — Visual verification

soffice --headless --convert-to pdf path/to/filled.docx

Open the PDF and visually confirm: page count is sensible, no table row was split across pages, no font fell back to Times New Roman, all required fields are populated.

Python API

from fill_form import FormFiller

filler = FormFiller("template.docx", korean_font="맑은 고딕")

# Fill table cells
filler.fill_table_kv("Study Title", "...")
filler.fill_table_kv("연구 목적", "...")

# Replace section content (header to next header)
filler.replace_paragraphs_after("4. Background", new_content)

# Replace a single paragraph
filler.replace_paragraph_matching("Title:", "Title: ...")

# Validate and save
warnings = filler.validate()
for w in warnings:
    print(w)
filler.save("filled.docx")

Anti-Patterns (Do Not Do)

Anti-pattern	Consequence
`Document()` then rebuild table	Loss of header logo, custom margins, footer placeholders, and page numbering
`pandoc -f doc -t docx`	"Unknown input format doc" — pandoc does not parse .doc
`textutil -convert docx`	Table cell merging is dropped or corrupted
`cell.text = "value"` (single assignment)	Run-level styles (bold, color, eastAsia font) are erased
Coordinate-based matching `table.cell(2, 1)`	Silent breakage when the template adds or reorders rows
`run.font.name` alone for Hangul	Hangul characters render in the default Western font

Companion Skills

```
write-protocol
```
— drafts the scientific content (Background, Study Design, Sample Size, Statistical Plan) that
```
fill-protocol
```
then renders into the form
```
hwp-pipeline
```
— converts Korean Hangul .hwp / .hwpx files; chain it before
```
fill-protocol
```
when the institutional form is distributed in HWP format
```
check-reporting
```
— validates that the filled protocol satisfies CONSORT / STARD / TRIPOD / CLAIM checklists before submission
```
calc-sample-size
```
— produces the sample size text that
```
fill-protocol
```
slots into the corresponding section

Files

```
scripts/doc_to_docx.py
```
— LibreOffice headless wrapper for .doc → .docx
```
scripts/inspect_template.py
```
— reports tables, cells, and paragraphs
```
scripts/fill_form.py
```
— the
```
FormFiller
```
library and CLI entry point
```
examples/
```
— worked examples for IRB, ethics waiver, and grant templates
```
references/best_practices.md
```
— formatting notes (cantSplit, eastAsia, multi-line cell text)

Known Limitations

HWP / HWPX input is not handled directly — chain with
```
hwp-pipeline
```
to convert HWP → HWPX → DOCX first.
Merged cells: filling a label cell that participates in a vertical merge may overwrite the merged region's content. Test on a copy first.
Embedded form fields (Word's content controls): not yet supported. Plain paragraph and table cell content only.
Right-to-left scripts (Arabic, Hebrew): untested.

Anti-Hallucination

Never fabricate references. All citations must be verified via
```
/search-lit
```
with confirmed DOI or PMID. Mark unverified references as
```
[UNVERIFIED - NEEDS MANUAL CHECK]
```
.
Never invent clinical definitions, diagnostic criteria, or guideline recommendations. If uncertain, flag with
```
[VERIFY]
```
and ask the user.
Never fabricate numerical results — compliance percentages, scores, effect sizes, or sample sizes must come from actual data or analysis output.
If a reporting guideline item, journal policy, or clinical standard is uncertain, state the uncertainty rather than guessing.