Medsci-skills fill-protocol
git clone https://github.com/Aperivue/medsci-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aperivue/medsci-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/fill-protocol" ~/.claude/skills/aperivue-medsci-skills-fill-protocol && rm -rf "$T"
skills/fill-protocol/SKILL.md
Fill-Protocol Skill
You are helping a researcher populate an institutional Word form (IRB protocol, ethics application, grant proposal, etc.) without breaking the original document formatting. This skill is the formatting counterpart to `write-protocol`: where `write-protocol` drafts content, `fill-protocol` lays that content into the institutional template.
Why This Skill Exists
Recreating institutional forms from scratch with `python-docx` reliably destroys table layouts, page breaks, and font consistency. The only safe approach is to open the existing template and replace cell/paragraph text in place. This skill enforces that pattern.
Core Principles (Do Not Violate)
- Open the existing template; never create from scratch. Use `Document(template_path)`, not `Document()`.
- Convert .doc → .docx via LibreOffice headless before any editing. `pandoc -f doc` is not supported; `textutil` corrupts table structure.
- Match cells by left-label text, not row/column coordinates. Templates evolve and coordinate matching breaks silently.
- Apply `cantSplit` to every filled row so a row never breaks across pages.
- For CJK languages, set the `eastAsia` font attribute, not just `run.font.name`. Hangul/Kanji/Hanzi will render in fallback fonts otherwise.
- Validate every fill operation: report unmatched labels, count empty cells, and surface mismatches before saving.
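The `cantSplit` and `eastAsia` principles both come down to small pieces of WordprocessingML. The sketch below shows the XML involved using only the standard library; it is an illustration, not the skill's own implementation. The helper names `mark_row_cant_split` and `set_run_fonts` are hypothetical, and placing `w:trPr` first in the row is a simplification.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W}}}{tag}"


def mark_row_cant_split(tr: ET.Element) -> None:
    """Ensure <w:trPr><w:cantSplit/></w:trPr> exists on a table row (hypothetical helper)."""
    tr_pr = tr.find(q("trPr"))
    if tr_pr is None:
        tr_pr = ET.Element(q("trPr"))
        tr.insert(0, tr_pr)  # simplification: row properties placed first
    if tr_pr.find(q("cantSplit")) is None:
        ET.SubElement(tr_pr, q("cantSplit"))


def set_run_fonts(run: ET.Element, font: str) -> None:
    """Set ascii, hAnsi and eastAsia on <w:rFonts> so CJK text also uses `font`."""
    r_pr = run.find(q("rPr"))
    if r_pr is None:
        r_pr = ET.Element(q("rPr"))
        run.insert(0, r_pr)  # run properties must precede the run's text
    r_fonts = r_pr.find(q("rFonts"))
    if r_fonts is None:
        r_fonts = ET.SubElement(r_pr, q("rFonts"))
    for attr in ("ascii", "hAnsi", "eastAsia"):
        r_fonts.set(q(attr), font)


# Build a one-cell row in memory and apply both protections.
row = ET.Element(q("tr"))
run = ET.SubElement(ET.SubElement(ET.SubElement(row, q("tc")), q("p")), q("r"))
ET.SubElement(run, q("t")).text = "연구 목적"
mark_row_cant_split(row)
set_run_fonts(run, "맑은 고딕")
print(ET.tostring(row, encoding="unicode"))
```

Setting only the Western font attributes leaves `w:eastAsia` unset, which is why Hangul can fall back to another font even when the run appears to have the right font name.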
Dependencies
If the template is already `.docx`, LibreOffice is not required; only the three Python packages below are needed. LibreOffice is needed only when the template is a legacy `.doc` and must be converted first.
```bash
# Python libraries (always required)
pip install --user docxtpl python-docx pyyaml

# LibreOffice (only for legacy .doc input; ~700 MB on macOS)
brew install --cask libreoffice               # macOS
sudo apt-get install -y libreoffice           # Debian/Ubuntu
sudo dnf install -y libreoffice               # Fedora
sudo pacman -S --needed libreoffice-fresh     # Arch
```
Bundled setup script
The skill ships a `setup.sh` that detects what is missing and installs only those parts, with a confirmation prompt before each step:
```bash
bash setup.sh check     # report what's installed (read-only)
bash setup.sh install   # install missing pieces (asks before each)
```
Auto-install behavior (for Claude as the caller)
When invoking this skill on behalf of a user:
- Before calling `doc_to_docx.py`, run `bash setup.sh check`. If LibreOffice is missing, ask the user before installing; the cask is ~700 MB and proceeding silently is unfriendly.
- Skip LibreOffice entirely if the template is already `.docx`. Only surface the install prompt when a `.doc` is encountered.
- Never pass `--yes` to `setup.sh install` unless the user has explicitly authorized unattended installation in this session.
- If the user declines installation, fall back to asking them to convert the `.doc` manually (open in Word/LibreOffice/Pages → Save As → .docx) and then re-run with the converted file.
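The rules above amount to a small decision table. A minimal sketch follows, assuming the caller already knows whether LibreOffice is available and whether the user declined installation; the function name and the returned action strings are hypothetical and are not part of `setup.sh`.

```python
from pathlib import Path
import shutil


def libreoffice_on_path() -> bool:
    """Rough availability check: look for a LibreOffice launcher on PATH."""
    return shutil.which("soffice") is not None or shutil.which("libreoffice") is not None


def plan_conversion(template: str, libreoffice_found: bool, user_declined: bool = False) -> str:
    """Return the next action for a template path (hypothetical action names)."""
    suffix = Path(template).suffix.lower()
    if suffix == ".docx":
        return "fill"                    # no LibreOffice prompt at all
    if suffix != ".doc":
        return "unsupported"
    if libreoffice_found:
        return "convert"                 # safe to run the .doc to .docx step
    if user_declined:
        return "ask_manual_conversion"   # user converts in Word/LibreOffice/Pages
    return "ask_before_install"          # never install silently


print(plan_conversion("irb_form.docx", libreoffice_found=False))  # fill
print(plan_conversion("irb_form.doc", libreoffice_found=libreoffice_on_path()))
```

Keeping the check as a pure function of its inputs makes the "never install without asking" rule easy to verify independently of the machine it runs on.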
Workflow
Step 1 — Convert legacy .doc to .docx (if needed)
python scripts/doc_to_docx.py path/to/template.doc path/to/template.docx
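For orientation, a headless LibreOffice conversion can be sketched as below. This is an assumption about what such a wrapper might look like, not the contents of `scripts/doc_to_docx.py`; the function names are hypothetical. It relies on LibreOffice writing the converted file into `--outdir` under the source file's stem.

```python
from pathlib import Path
from typing import List
import subprocess


def build_convert_command(src: str, out_dir: str) -> List[str]:
    """Command line for a headless .doc to .docx conversion."""
    return ["soffice", "--headless", "--convert-to", "docx", "--outdir", out_dir, src]


def expected_output(src: str, out_dir: str) -> Path:
    """LibreOffice names the result after the source stem inside out_dir."""
    return Path(out_dir) / (Path(src).stem + ".docx")


def convert(src: str, out_dir: str) -> Path:
    """Run the conversion and fail loudly if no output file appears."""
    subprocess.run(build_convert_command(src, out_dir), check=True)
    out = expected_output(src, out_dir)
    if not out.exists():
        raise FileNotFoundError(f"conversion produced no file at {out}")
    return out


print(" ".join(build_convert_command("path/to/template.doc", "path/to")))
```

Checking that the output file exists is worthwhile because a non-zero exit code is not the only way a conversion can fail to produce a usable document.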
Step 2 — Inspect the template structure
python scripts/inspect_template.py path/to/template.docx
This lists every table, every cell (with row/column coordinates and content preview), and every top-level paragraph. Use this output to identify the labels you will match against in your YAML content file.
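Label-based matching can be illustrated on a plain grid of cell texts, such as the inspection output transcribed into lists. The sketch below is not the skill's matching code; the normalization rules (collapse whitespace, ignore a trailing colon, ignore case) and the function names are assumptions made for the example.

```python
import re
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace, drop a trailing colon, and ignore case."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.rstrip(":：").strip().casefold()


def find_value_cell(rows: List[List[str]], label: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the cell to the right of the first cell matching `label`."""
    wanted = normalize(label)
    if not wanted:
        return None  # an empty label would match every blank cell
    for r, row in enumerate(rows):
        for c, cell in enumerate(row[:-1]):  # the last cell has no right neighbour
            if normalize(cell) == wanted:
                return r, c + 1
    return None


grid = [
    ["Study Title", ""],
    ["Principal  Investigator:", ""],
    ["연구 목적", "", "Notes", ""],
]
print(find_value_cell(grid, "principal investigator"))  # (1, 1)
print(find_value_cell(grid, "Sponsor"))                 # None
```

Because the lookup is driven by text, inserting a new row above "Principal Investigator" in a later template revision changes the returned coordinates but not the result of the fill.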
Step 3 — Author a content YAML
The YAML supports three fill modes. All keys are optional.
```yaml
protections:
  korean_font: "맑은 고딕"   # CJK font (set to "Noto Sans CJK KR", "SimSun",
                             # "MS Mincho", etc. for other locales)
  cant_split: true           # Apply <w:cantSplit/> to every filled row

  # Readability options (see "Readability" section below for full semantics)
  blank_between_paragraphs: true           # default true: Enter between \n\n chunks
  blank_around_section_header: true        # default true: Enter above/below filled sections
  blank_around_all_section_headers: false  # default false: opt-in; also touches untouched sections

# Mode 1: table key/value (left-label cell → right value cell)
table_kv:
  "Study Title": "Multi-center prospective validation of ..."
  "Principal Investigator": "Last, First (Department)"
  "연구 목적": "본 연구는 ..."

# Mode 2: section replacement (find numbered header, replace until next header)
section_replace:
  "1. Background": "Hepatocellular carcinoma is the third leading cause of ..."
  "4. 연구 배경 및 이론적 근거": "..."

# Mode 3: single paragraph in-place text replacement
paragraph_replace:
  "Title:": "Title: Multi-center prospective validation of ..."
```
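A quick sanity check of the content file before running the fill can catch misspelled top-level keys and empty values. The sketch below works on the dictionary that `yaml.safe_load` would return for such a file; the function name and the specific checks are assumptions for illustration, not behavior of the skill's CLI.

```python
from typing import Any, Dict, List

# Top-level keys named in the content YAML above
KNOWN_SECTIONS = {"protections", "table_kv", "section_replace", "paragraph_replace"}
FILL_MODES = ("table_kv", "section_replace", "paragraph_replace")


def check_content(content: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed content mapping."""
    problems: List[str] = []
    for key in content:
        if key not in KNOWN_SECTIONS:
            problems.append(f"unknown top-level key: {key}")
    for mode in FILL_MODES:
        if mode not in content:
            continue  # all keys are optional
        mapping = content[mode]
        if not isinstance(mapping, dict):
            problems.append(f"{mode} must be a mapping of label to text")
            continue
        for label, value in mapping.items():
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{mode}[{label!r}] has no text")
    return problems


parsed = {
    "table_kv": {"Study Title": "Multi-center prospective validation of ...", "연구 목적": " "},
    "sections": {},  # typo for section_replace
}
for problem in check_content(parsed):
    print(problem)
```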
Readability — three blank-line knobs
All blank paragraphs inserted by these options use a forced single-line height (`<w:spacing w:line="240" w:before="0" w:after="0"/>`) so the gap is exactly one body-text line and never inflates the document's apparent line spacing.
| Option | Default | What it does | When to flip |
|---|---|---|---|
| `blank_between_paragraphs` | `true` | Inserts a blank line between every `\n\n`-split chunk within the filled text | Disable only for forms where every line must be packed tight |
| `blank_around_section_header` | `true` | Wraps each section header that you fill with a blank above and a blank below | Disable when the template style already adds its own visual gaps |
| `blank_around_all_section_headers` | `false` | After all fills, scans every numbered header, including ones you didn't replace, and adds blank lines around them | Enable when uniform readability matters more than form fidelity. Default off because IRB / public-document submissions favor template fidelity over visual consistency (page count stability, boilerplate untouched, reviewer-expected layout) |
| | | On save, converts dangling empty paragraphs whose sole content is a page break into a page-break-before attribute on the next content paragraph. Prevents visible blank pages when the preceding content (e.g. an abstract table) grows or shrinks and pushes the empty paragraph onto a page of its own, causing the break to land one page later. | Disable only if your template intentionally relies on the empty-paragraph-as-separator pattern for spacing |
The third option exists because `section_replace` only touches sections you list in the YAML. If a template has 18 numbered sections and you only fill 12, the other 6 stay tight against their content, which looks inconsistent. Turn the opt-in on for documents where you prefer visual consistency over template fidelity.
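The paragraph-splitting behavior can be sketched with the standard library. This illustrates the idea only: the function names are hypothetical, and the real `fill_form.py` may build paragraphs differently. The blank paragraph carries the same `w:spacing` values quoted above.

```python
import xml.etree.ElementTree as ET
from typing import List

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W}}}{tag}"


def split_chunks(text: str) -> List[str]:
    """Split on blank lines and drop empty chunks."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


def blank_paragraph() -> ET.Element:
    """Empty paragraph pinned to one body-text line (line=240, before=0, after=0)."""
    p = ET.Element(q("p"))
    spacing = ET.SubElement(ET.SubElement(p, q("pPr")), q("spacing"))
    for attr, value in (("line", "240"), ("before", "0"), ("after", "0")):
        spacing.set(q(attr), value)
    return p


def text_paragraph(text: str) -> ET.Element:
    """Paragraph with a single run holding `text`."""
    p = ET.Element(q("p"))
    ET.SubElement(ET.SubElement(p, q("r")), q("t")).text = text
    return p


def build_paragraphs(text: str, blank_between: bool = True) -> List[ET.Element]:
    """One paragraph per chunk, optionally separated by fixed-height blanks."""
    out: List[ET.Element] = []
    for i, chunk in enumerate(split_chunks(text)):
        if i and blank_between:
            out.append(blank_paragraph())
        out.append(text_paragraph(chunk))
    return out


paras = build_paragraphs("Background.\n\nRationale.\n\nObjectives.")
print(len(paras))  # 5: three text paragraphs and two blanks
```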
Step 4 — Run the fill
```bash
python scripts/fill_form.py \
  --template path/to/template.docx \
  --content content.yaml \
  --output path/to/filled.docx
```
The CLI prints `[OK]` / `[MISS]` for every fill operation and a summary at the end. Investigate any `[MISS]` before submitting.
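When the fill runs inside a larger script, the misses can be collected from captured output. The sketch below assumes each result line starts with the `[OK]` or `[MISS]` tag followed by the label; that line layout is a guess for illustration, and the sample log is invented rather than real CLI output.

```python
from typing import Dict, List


def summarize_fill_log(log: str) -> Dict[str, List[str]]:
    """Group the text after each [OK] / [MISS] tag (assumed line format)."""
    result: Dict[str, List[str]] = {"OK": [], "MISS": []}
    for raw in log.splitlines():
        line = raw.strip()
        for tag in result:
            prefix = f"[{tag}]"
            if line.startswith(prefix):
                result[tag].append(line[len(prefix):].strip())
    return result


# Invented sample; the real CLI output may be laid out differently.
sample = "[OK] Study Title\n[MISS] Sponsor\n[OK] 연구 목적\n"
summary = summarize_fill_log(sample)
if summary["MISS"]:
    print("Unmatched labels:", ", ".join(summary["MISS"]))
```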
Step 5 — Visual verification
soffice --headless --convert-to pdf path/to/filled.docx
Open the PDF and visually confirm: page count is sensible, no table row was split across pages, no font fell back to Times New Roman, all required fields are populated.
Python API
```python
from fill_form import FormFiller

filler = FormFiller("template.docx", korean_font="맑은 고딕")

# Fill table cells
filler.fill_table_kv("Study Title", "...")
filler.fill_table_kv("연구 목적", "...")

# Replace section content (header to next header)
new_content = "..."  # replacement body text for the section
filler.replace_paragraphs_after("4. Background", new_content)

# Replace a single paragraph
filler.replace_paragraph_matching("Title:", "Title: ...")

# Validate and save
warnings = filler.validate()
for w in warnings:
    print(w)
filler.save("filled.docx")
```
Anti-Patterns (Do Not Do)
| Anti-pattern | Consequence |
|---|---|
| `Document()` then rebuild table | Loss of header logo, custom margins, footer placeholders, and page numbering |
| `pandoc -f doc` | "Unknown input format doc": pandoc does not parse .doc |
| `textutil` conversion | Table cell merging is dropped or corrupted |
| Assigning `.text` directly (single assignment) | Run-level styles (bold, color, eastAsia font) are erased |
| Coordinate-based matching | Silent breakage when the template adds or reorders rows |
| `run.font.name` alone for Hangul | Hangul characters render in the default Western font |
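The single-assignment anti-pattern can be avoided by writing into the existing runs instead of replacing them. A minimal sketch on raw paragraph XML follows, using only the standard library; `replace_text_keep_runs` is a hypothetical helper, not a `FormFiller` method, and it puts all new text in the first run so that run's formatting applies to the whole replacement.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Clark-notation name for a w:-prefixed tag."""
    return f"{{{W}}}{tag}"


def replace_text_keep_runs(paragraph: ET.Element, new_text: str) -> bool:
    """Put new_text in the first <w:t>, blank the rest, and leave every <w:rPr> intact."""
    texts = paragraph.findall(f".//{q('t')}")
    if not texts:
        return False  # nothing to write into; caller decides what to do
    texts[0].text = new_text
    for extra in texts[1:]:
        extra.text = ""
    return True


# A paragraph with a bold run followed by a plain run.
para = ET.Element(q("p"))
bold_run = ET.SubElement(para, q("r"))
ET.SubElement(ET.SubElement(bold_run, q("rPr")), q("b"))
ET.SubElement(bold_run, q("t")).text = "Title:"
plain_run = ET.SubElement(para, q("r"))
ET.SubElement(plain_run, q("t")).text = " (to be completed)"

replace_text_keep_runs(para, "Title: Multi-center prospective validation of ...")
print(ET.tostring(para, encoding="unicode"))
```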
Companion Skills
- `write-protocol`: drafts the scientific content (Background, Study Design, Sample Size, Statistical Plan) that `fill-protocol` then renders into the form
- `hwp-pipeline`: converts Korean Hangul .hwp / .hwpx files; chain it before `fill-protocol` when the institutional form is distributed in HWP format
- `check-reporting`: validates that the filled protocol satisfies CONSORT / STARD / TRIPOD / CLAIM checklists before submission
- `calc-sample-size`: produces the sample size text that `fill-protocol` slots into the corresponding section
Files
- `scripts/doc_to_docx.py`: LibreOffice headless wrapper for .doc → .docx
- `scripts/inspect_template.py`: reports tables, cells, and paragraphs
- `scripts/fill_form.py`: the `FormFiller` library and CLI entry point
- `examples/`: worked examples for IRB, ethics waiver, and grant templates
- `references/best_practices.md`: formatting notes (cantSplit, eastAsia, multi-line cell text)
Known Limitations
- HWP / HWPX input is not handled directly; chain with `hwp-pipeline` to convert HWP → HWPX → DOCX first.
- Merged cells: filling a label cell that participates in a vertical merge may overwrite the merged region's content. Test on a copy first.
- Embedded form fields (Word's content controls): not yet supported. Plain paragraph and table cell content only.
- Right-to-left scripts (Arabic, Hebrew): untested.
Anti-Hallucination
- Never fabricate references. All citations must be verified via `/search-lit` with confirmed DOI or PMID. Mark unverified references as `[UNVERIFIED - NEEDS MANUAL CHECK]`.
- Never invent clinical definitions, diagnostic criteria, or guideline recommendations. If uncertain, flag with `[VERIFY]` and ask the user.
- Never fabricate numerical results: compliance percentages, scores, effect sizes, or sample sizes must come from actual data or analysis output.
- If a reporting guideline item, journal policy, or clinical standard is uncertain, state the uncertainty rather than guessing.