Medical-research-skills html-to-pdf
Convert HTML files or URLs to high-fidelity PDFs using Puppeteer; auto-detects or forces RTL for Hebrew/Arabic when RTL content is present.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/html-to-pdf" ~/.claude/skills/aipoch-medical-research-skills-html-to-pdf && rm -rf "$T"
manifest:
scientific-skills/Other/html-to-pdf/SKILL.mdsource content
Validation Shortcut
Run this minimal command first to verify the supported execution path:
python scripts/validate_skill.py --help
When to Use
- Export a web page (URL) to a PDF with Chrome-accurate rendering (CSS Grid/Flexbox, backgrounds, web fonts).
- Convert a local HTML report/invoice into a print-ready PDF (A4/Letter) with predictable margins and scaling.
- Generate single-page PDFs that must fit exactly on one page (posters, certificates, one-page invoices).
- Produce RTL documents (Hebrew/Arabic) where direction must be correct and fonts must render reliably.
- Render dynamic HTML that requires JavaScript execution before printing (charts, client-side templates).
Key Features
- Chrome-quality rendering via Puppeteer (headless Chromium/Chrome/Edge).
- Input flexibility: local HTML file, URL, or stdin (
) piping.- - RTL support: automatic RTL detection with an option to force RTL (
).--rtl - Print controls: format, landscape, margins (uniform or per-side), scale, background printing.
- Header/Footer templates: inject HTML header/footer.
- Stability for fonts/resources: waits for network idle and
, plus configurable extra wait.document.fonts.ready
Dependencies
- Node.js: >= 16 (recommended)
- npm: >= 8
- puppeteer: installed via
(Chromium bundled by default)npm install - Optional browser executable (if not using bundled Chromium):
- Google Chrome or Microsoft Edge (path provided via
or--executable-path
)PUPPETEER_EXECUTABLE_PATH
- Google Chrome or Microsoft Edge (path provided via
Example Usage
1) Install
cd d:/skills/html-to-pdf npm install
2) Run (local HTML → PDF)
node assets/html-to-pdf.js input.html output.pdf
3) Run (URL → PDF)
node assets/html-to-pdf.js https://example.com page.pdf
4) Force RTL (Hebrew/Arabic)
node assets/html-to-pdf.js hebrew.html hebrew.pdf --rtl
5) Pipe HTML via stdin
echo "<h1>Example Title</h1>" | node assets/html-to-pdf.js - output.pdf --rtl
6) Use a specific browser executable (optional)
node assets/html-to-pdf.js https://example.com page.pdf --executable-path="C:/Program Files (x86)/Microsoft/Edge/Application/msedge.exe"
Or via environment variable:
set PUPPETEER_EXECUTABLE_PATH=C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe node assets/html-to-pdf.js https://example.com page.pdf
7) A complete, runnable single-page A4 example (recommended pattern)
Create
single-page.html:
<!doctype html> <html lang="he" dir="rtl"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="https://fonts.googleapis.com/css2?family=Heebo:wght@400;700&display=swap" rel="stylesheet"> <style> @page { size: A4; margin: 0; } /* Keep the page box exact; avoid backgrounds here to prevent extra pages */ html, body { width: 210mm; height: 297mm; margin: 0; padding: 0; overflow: hidden; } /* Put backgrounds on a full-size container instead */ .container { width: 100%; height: 100%; box-sizing: border-box; padding: 20mm; font-family: "Heebo", sans-serif; direction: rtl; text-align: right; background: #f5f5f5; overflow-wrap: break-word; word-wrap: break-word; } </style> </head> <body> <div class="container"> <h1>כותרת לדוגמה</h1> <p>תוכן לדוגמה שמודפס כעמוד יחיד.</p> </div> </body> </html>
Generate the PDF:
node assets/html-to-pdf.js single-page.html single-page.pdf --format=A4 --margin=0 --wait=1000
Implementation Details
Rendering approach
- Uses Puppeteer to load either:
- a local HTML file, or
- a remote URL, or
- HTML from stdin (
)-
- Wait strategy (to reduce missing fonts/assets):
- waits for
(resources finished loading),networkidle0 - waits for
,document.fonts.ready - then waits an additional
(default--wait=<ms>
) for late JS/font rendering.1000
- waits for
RTL handling
- Default behavior: auto-detect RTL content (e.g., Hebrew/Arabic) and apply RTL direction.
- Override:
forces RTL regardless of detection.--rtl
Page-fit rule (single-page PDFs)
To avoid unexpected blank pages, ensure the content fits exactly within the page box:
- Do not set backgrounds on
orhtml
(common cause of extra pages).body - Put backgrounds on a full-height container (
) instead..container - Use
onoverflow: hidden
for strict single-page output.html, body - If overflow persists, adjust:
(e.g.,--scale
,0.9
)0.8
(e.g.,--margin
,10mm
)0
CLI parameters
| Parameter | Description | Default |
|---|---|---|
| Page size: A4, Letter, Legal, A3, A5 | A4 |
| Landscape orientation | false |
| Uniform margin (e.g., , ) | 20mm |
| Top margin | 20mm |
| Right margin | 20mm |
| Bottom margin | 20mm |
| Left margin | 20mm |
| Scale factor (0.1-2.0) | 1 |
| Print background graphics | true |
| Disable background printing | - |
| Header HTML template | - |
| Footer HTML template | - |
| Extra wait time for fonts/JS | 1000 |
| Force RTL direction | Auto-detect |
Quality settings
- Uses Chrome print styles (
) and supports print CSS.@page - Sets
to 2 to improve output clarity.deviceScaleFactor
Verification workflow (recommended)
After each generation, visually verify the PDF:
- No extra blank pages (vertical overflow).
- No clipped text (horizontal overflow).
- If issues occur, iterate up to 5 attempts:
- default
- reduce
--scale - reduce
--margin - reduce both further
- fix HTML/CSS (wrap long words, ensure container sizing)
If it still fails after 5 attempts, the HTML layout must be corrected (e.g., reduce fixed heights, fix oversized elements, ensure wrapping).
When Not to Use
- Do not use this skill when the required source data, identifiers, files, or credentials are missing.
- Do not use this skill when the user asks for fabricated results, unsupported claims, or out-of-scope conclusions.
- Do not use this skill when a simpler direct answer is more appropriate than the documented workflow.
Required Inputs
- A clearly specified task goal aligned with the documented scope.
- All required files, identifiers, parameters, or environment variables before execution.
- Any domain constraints, formatting requirements, and expected output destination if applicable.
Recommended Workflow
- Validate the request against the skill boundary and confirm all required inputs are present.
- Select the documented execution path and prefer the simplest supported command or procedure.
- Produce the expected output using the documented file format, schema, or narrative structure.
- Run a final validation pass for completeness, consistency, and safety before returning the result.
Output Contract
- Return a structured deliverable that is directly usable without reformatting.
- If a file is produced, prefer a deterministic output name such as
unless the skill documentation defines a better convention.html_to_pdf_result.md - Include a short validation summary describing what was checked, what assumptions were made, and any remaining limitations.
Validation and Safety Rules
- Validate required inputs before execution and stop early when mandatory fields or files are missing.
- Do not fabricate measurements, references, findings, or conclusions that are not supported by the provided source material.
- Emit a clear warning when credentials, privacy constraints, safety boundaries, or unsupported requests affect the result.
- Keep the output safe, reproducible, and within the documented scope at all times.
Failure Handling
- If validation fails, explain the exact missing field, file, or parameter and show the minimum fix required.
- If an external dependency or script fails, surface the command path, likely cause, and the next recovery step.
- If partial output is returned, label it clearly and identify which checks could not be completed.
Quick Validation
Run this minimal verification path before full execution when possible:
No local script validation step is required for this skill.
Expected output format:
Result file: html_to_pdf_result.md Validation summary: PASS/FAIL with brief notes Assumptions: explicit list if any
Deterministic Output Rules
- Use the same section order for every supported request of this skill.
- Keep output field names stable and do not rename documented keys across examples.
- If a value is unavailable, emit an explicit placeholder instead of omitting the field.
Completion Checklist
- Confirm all required inputs were present and valid.
- Confirm the supported execution path completed without unresolved errors.
- Confirm the final deliverable matches the documented format exactly.
- Confirm assumptions, limitations, and warnings are surfaced explicitly.