Full-stack-skills ocrmypdf-optimize
OCRmyPDF optimization skill — compress PDFs, configure PDF/A output, JBIG2 encoding, and lossless optimization. Use when the user needs to reduce PDF file size, create archival PDF/A files, or optimize OCR output.
git clone https://github.com/partme-ai/full-stack-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/partme-ai/full-stack-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ocrmypdf-skills/ocrmypdf-optimize" ~/.claude/skills/partme-ai-full-stack-skills-ocrmypdf-optimize && rm -rf "$T"
OCRmyPDF: Optimization Guide
Overview
OCRmyPDF provides extensive optimization options to reduce file size, create PDF/A archival documents, and configure output quality.
For core OCR functionality, see the ocrmypdf skill. For image processing (deskew, rotate, clean), see ocrmypdf-image. For batch/Docker/scripting, see ocrmypdf-batch.
Compression Levels
    # Level 0: no optimization (fastest)
    ocrmypdf --optimize 0 input.pdf output.pdf

    # Level 1: lossless optimization (default)
    ocrmypdf --optimize 1 input.pdf output.pdf

    # Level 2: lossy optimization
    ocrmypdf --optimize 2 input.pdf output.pdf

    # Level 3: lossy, more aggressive recompression than level 2
    ocrmypdf --optimize 3 input.pdf output.pdf
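To see what each level buys on a particular document, the output sizes can be compared directly. The following is a minimal sketch: `percent_saved` and `compare_levels` are hypothetical helper names, the `level-N.pdf` output names are placeholders, and the input is assumed to be a non-empty PDF that OCRmyPDF accepts without extra flags.

```shell
#!/bin/sh
# Print the integer percentage by which file $2 is smaller than file $1.
# Assumes $1 is not empty; a negative result means $2 is larger.
percent_saved() {
    before=$(wc -c < "$1")
    after=$(wc -c < "$2")
    echo $(( (before - after) * 100 / before ))
}

# Hypothetical helper: run every optimization level on one input
# and report how much smaller each result is than the input.
compare_levels() {
    input=$1
    for level in 0 1 2 3; do
        out="level-$level.pdf"
        ocrmypdf --optimize "$level" "$input" "$out" || return 1
        echo "level $level: $(percent_saved "$input" "$out")% smaller than input"
    done
}
```

Calling `compare_levels input.pdf` runs OCR four times, so it is best tried on a short sample document first.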
PDF/A Output
PDF/A is an archival format with embedded fonts and colorspaces:
    # PDF/A (default; current releases produce PDF/A-2b)
    ocrmypdf --output-type pdfa input.pdf output.pdf

    # PDF/A-1b (older standard, no transparency)
    ocrmypdf --output-type pdfa-1 input.pdf output.pdf

    # PDF/A-2b requested explicitly
    ocrmypdf --output-type pdfa-2 input.pdf output.pdf

    # Standard PDF (no archival conformance)
    ocrmypdf --output-type pdf input.pdf output.pdf
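One rough way to see which PDF/A part an output file claims is to look for the `pdfaid:part` entry in its XMP metadata. The sketch below assumes the metadata is stored uncompressed and that `grep` supports the `-a` and `-o` options (GNU and BSD grep do); `pdfa_part` is a hypothetical helper name. It only reads the file's own claim and is not a substitute for a real PDF/A validator.

```shell
#!/bin/sh
# Print the PDF/A part number (1, 2 or 3) that a file declares in its XMP
# metadata, or print nothing and return non-zero if no declaration is found.
pdfa_part() {
    # -a treats the binary PDF as text; -o prints only the matched text.
    # The pattern covers both the element form  <pdfaid:part>2</pdfaid:part>
    # and the attribute form  pdfaid:part="2".
    grep -a -o -E 'pdfaid:part(>|=")[0-9]' "$1" | head -n 1 | grep -o -E '[0-9]$'
}
```

For example, `pdfa_part output.pdf` would be expected to print `1` for a file written with `--output-type pdfa-1`.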
JBIG2 Encoding
JBIG2 provides excellent compression for monochrome (1-bit) images:
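    # Lossy JBIG2 (requires jbig2enc; can substitute similar-looking glyphs)
    ocrmypdf --jbig2-lossy input.pdf output.pdf

    # Lossless JBIG2 (v17+); with jbig2enc installed, --optimize 1 and higher
    # also apply lossless JBIG2 automatically
    ocrmypdf --jbig2-lossless input.pdf output.pdf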
Requirements:
    # Debian/Ubuntu
    apt install jbig2enc

    # macOS
    brew install jbig2enc
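Before relying on these encoders, it can help to confirm that they are actually on the `PATH`. This sketch assumes the jbig2enc package installs an executable named `jbig2` and that pngquant (used for lossy PNG compression) installs `pngquant`; `have_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the named executable can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which optional optimization helpers are available.
for tool in jbig2 pngquant; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```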
PNG Optimization
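Optimize embedded PNG images. Lossy quantization with pngquant is applied at `--optimize 2` and higher when pngquant is installed; `--optimize 1` keeps PNG data lossless:

    # Lossy PNG quantization with pngquant
    ocrmypdf --optimize 2 input.pdf output.pdf

    # Tune the pngquant quality target (0-100)
    ocrmypdf --optimize 2 --png-quality 60 input.pdf output.pdf

    # Lossless PNG optimization
    ocrmypdf --optimize 1 input.pdf output.pdf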
Ghostscript Options
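Fine-tune how the OCR text layer is rendered and how Ghostscript compresses images during PDF/A conversion:

    # Choose the text-layer renderer (auto, hocr, or sandwich)
    ocrmypdf --pdf-renderer sandwich input.pdf output.pdf

    # Keep images lossless when Ghostscript converts to PDF/A
    ocrmypdf --pdfa-image-compression lossless input.pdf output.pdf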
Sidecar Text
Write the recognized text to a separate file, with or without an output PDF:

    # Sidecar only: no output PDF is written, so pass "-" as the output file
    ocrmypdf --output-type none --sidecar text.txt input.pdf -

    # Sidecar plus PDF, forcing OCR on pages that already contain text
    ocrmypdf --sidecar text.txt --force-ocr input.pdf output.pdf
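A quick sanity check on the sidecar file is to see whether it contains any recognizable characters at all, since a blank scan can produce a file that holds only whitespace. A minimal sketch, with `sidecar_has_text` as a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the sidecar file contains at least one letter or digit.
sidecar_has_text() {
    grep -q '[[:alnum:]]' "$1"
}
```

It could be used as `sidecar_has_text text.txt || echo "OCR produced no text"`.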
Combined Recipes
Maximum compression
ocrmypdf --optimize 3 --jbig2-lossy input.pdf small.pdf
Archival PDF/A with compression
ocrmypdf --output-type pdfa --optimize 2 input.pdf archival.pdf
Lossless output
ocrmypdf --output-type pdf --optimize 1 input.pdf lossless.pdf
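When the same recipes are reused across scripts, they can be wrapped behind named profiles. The sketch below is one possible arrangement, not part of OCRmyPDF: the profile names `smallest`, `archival` and `lossless` and the functions `ocr_flags` and `ocr_command` are hypothetical, and only `--optimize` and `--output-type` are used. It prints the command rather than running it, so it works as a dry run.

```shell
#!/bin/sh
# Map a hypothetical profile name to OCRmyPDF flags.
ocr_flags() {
    case "$1" in
        smallest) echo "--optimize 3" ;;
        archival) echo "--output-type pdfa --optimize 2" ;;
        lossless) echo "--output-type pdf --optimize 1" ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Print the full command for a profile, input file and output file.
# The output is meant for display; file names are not shell-quoted.
ocr_command() {
    flags=$(ocr_flags "$1") || return 1
    echo "ocrmypdf $flags $2 $3"
}
```

For instance, `ocr_command archival input.pdf archival.pdf` prints the archival recipe with those file names filled in.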
Quick Reference
| Task | Command |
|---|---|
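| No optimization | `--optimize 0` |
| Lossless default | `--optimize 1` |
| Aggressive lossy | `--optimize 2` |
| Maximum compression (lossy) | `--optimize 3` |
| PDF/A (default) | `--output-type pdfa` |
| PDF/A-1b | `--output-type pdfa-1` |
| JBIG2 lossy | `--jbig2-lossy` |
| PNG lossy | `--optimize 2 --png-quality 60` |
| Sidecar text | `--sidecar text.txt` |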
Troubleshooting
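- Large file size: Try `--optimize 2` or `--optimize 3`.
- PDF/A validation fails: Request a specific version with `--output-type pdfa-1` or `--output-type pdfa-2`, or fall back to `--output-type pdf` if archival conformance is not required.
- Font issues: PDF/A-2u requires that all text map to Unicode, but `--output-type` selects only the PDF/A part (`pdfa-1`, `pdfa-2`, `pdfa-3`), so that conformance level cannot be requested directly.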