Awesome-omni-skill pdf-utilities
Read, extract, edit, and manipulate PDF documents including table extraction, page manipulation, fillable forms, and comments.
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/pdf-utilities" ~/.claude/skills/diegosouzapw-awesome-omni-skill-pdf-utilities && rm -rf "$T"
manifest: skills/data-ai/pdf-utilities/SKILL.md
PDF Utilities Skill
Overview
This skill reads PDFs through multiple library backends (tabula, camelot, PyPDF2), extracts tables to DataFrames, edits or extracts page ranges, handles fillable forms, and manages PDF comments. All operations are driven by YAML configuration.
Key Components
ReadPDF Class (read_pdf.py)
Multi-backend PDF reading with table extraction:
- read_pdf(cfg, file_index): Route to the appropriate backend based on the config
- from_pdf_tabula(cfg, file_index): Extract tables using tabula-py
- from_pdf_camelot(cfg, file_index): Extract tables using camelot
- from_pdf_PyPDF2(cfg, file_index): Read PDF pages using PyPDF2
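The config-based routing described above can be sketched as a small dispatch table. This is a minimal illustration, not the skill's actual implementation: `make_router` and the stand-in backends are hypothetical names, and the `cfg["pdf"]["reader"]` and `cfg["pdf"]["files"]` layout is assumed from the configuration examples in this document.

```python
from typing import Any, Callable, Dict

# A backend takes (cfg, file_index) and returns whatever it extracted.
Backend = Callable[[Dict[str, Any], int], Any]


def make_router(backends: Dict[str, Backend]) -> Backend:
    """Build a read_pdf-style function that dispatches on cfg['pdf']['reader'].

    Hypothetical helper: the real ReadPDF class may route differently.
    """

    def read_pdf(cfg: Dict[str, Any], file_index: int) -> Any:
        reader = cfg["pdf"]["reader"]
        if reader not in backends:
            raise ValueError(
                f"Unknown reader {reader!r}; expected one of {sorted(backends)}"
            )
        return backends[reader](cfg, file_index)

    return read_pdf


# Stand-in backends so the sketch runs without tabula, camelot, or PyPDF2.
def fake_tabula(cfg: Dict[str, Any], file_index: int) -> Any:
    return ("tabula", cfg["pdf"]["files"][file_index]["path"])


def fake_camelot(cfg: Dict[str, Any], file_index: int) -> Any:
    return ("camelot", cfg["pdf"]["files"][file_index]["path"])


read_pdf = make_router({"tabula": fake_tabula, "camelot": fake_camelot})
cfg = {"pdf": {"reader": "camelot", "files": [{"path": "input.pdf"}]}}
result = read_pdf(cfg, 0)
print(result)
```

Keeping the backends in a dictionary means an unsupported `reader` value fails with a clear error instead of silently falling through to a default.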
EditPDF Class (edit_pdf.py)
PDF page manipulation and extraction:
- edit_pdf(cfg, file_index): Process PDF files based on configuration
- from_pdf_PyPDF2(cfg, file_index): Extract page ranges to new PDF files
- process_cfg_files(cfg): Process multiple PDF files from config
Additional Modules
- fillable_pdf.py: Handle fillable PDF forms (fill fields, extract data)
- pdf_comments.py: Add, read, and manipulate PDF annotations
- pdf_reports.py: Generate PDF reports from data
Usage Patterns
Table Extraction Configuration
pdf:
  io: pdf_read
  reader: tabula  # or camelot, PyPDF2
  files:
    - path: "input.pdf"
      pages: [1, 2, 3]
      area: [0, 0, 100, 100]  # Optional: specific region
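As a rough sketch of how one file entry from this configuration might be translated into a tabula-py call: the helpers `tabula_kwargs` and `extract_tables` are hypothetical and not part of the skill, and falling back to `"all"` when `pages` is absent is an assumption. The `pages`, `area`, and `multiple_tables` arguments are real `tabula.read_pdf` parameters; tabula-py interprets `area` as `[top, left, bottom, right]`.

```python
from typing import Any, Dict, List


def tabula_kwargs(cfg: Dict[str, Any], file_index: int) -> Dict[str, Any]:
    """Map one file entry of the config to keyword arguments for tabula.read_pdf."""
    entry = cfg["pdf"]["files"][file_index]
    kwargs: Dict[str, Any] = {
        # Assumption: read every page when the config gives no page list.
        "pages": entry.get("pages", "all"),
        # Ask tabula-py for a list of DataFrames, one per detected table.
        "multiple_tables": True,
    }
    if "area" in entry:
        # tabula-py expects [top, left, bottom, right].
        kwargs["area"] = entry["area"]
    return kwargs


def extract_tables(cfg: Dict[str, Any], file_index: int) -> List[Any]:
    """Run tabula-py on the selected file (needs tabula-py and a Java runtime)."""
    import tabula  # imported lazily so tabula_kwargs works without it installed

    path = cfg["pdf"]["files"][file_index]["path"]
    return tabula.read_pdf(path, **tabula_kwargs(cfg, file_index))


cfg = {
    "pdf": {
        "io": "pdf_read",
        "reader": "tabula",
        "files": [
            {"path": "input.pdf", "pages": [1, 2, 3], "area": [0, 0, 100, 100]}
        ],
    }
}
print(tabula_kwargs(cfg, 0))
```

Each DataFrame returned by `extract_tables` can then be written out with pandas, for example via `to_csv`.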
Page Extraction Configuration
pdf:
  io: pdf_edit
  files:
    - path: "source.pdf"
      output: "extracted_pages.pdf"
      page_start: 1
      page_end: 5
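A page range like this could be extracted with PyPDF2 roughly as follows. This is a sketch under stated assumptions, not the skill's `edit_pdf` code: it treats `page_start` and `page_end` as 1-based and inclusive (the source does not say), clamps the range to the document length, and uses the `PdfReader`/`PdfWriter` names from PyPDF2 2.x and later. `page_indices` and `extract_pages` are hypothetical helper names.

```python
from typing import List


def page_indices(page_start: int, page_end: int, page_count: int) -> List[int]:
    """Convert a 1-based inclusive page range into 0-based page indices.

    Assumption: page_start/page_end are 1-based and inclusive; the range is
    clamped to the number of pages in the document.
    """
    if page_start < 1 or page_end < page_start:
        raise ValueError(f"Invalid page range: {page_start}-{page_end}")
    last = min(page_end, page_count)
    return list(range(page_start - 1, last))


def extract_pages(path: str, output: str, page_start: int, page_end: int) -> None:
    """Copy the selected pages from `path` into a new PDF at `output`."""
    from PyPDF2 import PdfReader, PdfWriter  # lazy import; requires PyPDF2 >= 2.0

    reader = PdfReader(path)
    writer = PdfWriter()
    for index in page_indices(page_start, page_end, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(output, "wb") as handle:
        writer.write(handle)


print(page_indices(1, 5, 10))
```

With the configuration above, the equivalent call would be `extract_pages("source.pdf", "extracted_pages.pdf", 1, 5)`.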
Common Workflows
- Table Extraction: PDF → tabula/camelot → DataFrame → CSV/Excel
- Page Extraction: Multi-page PDF → Extract range → New PDF
- Form Processing: Fillable PDF → Fill fields → Save completed form
- Report Generation: DataFrame → Generate styled PDF report
Module Location
- Read: src/assetutilities/modules/pdf_utilities/read_pdf.py
- Edit: src/assetutilities/modules/pdf_utilities/edit_pdf.py
- Forms: src/assetutilities/modules/pdf_utilities/fillable_pdf.py
- Comments: src/assetutilities/modules/pdf_utilities/pdf_comments.py
- Reports: src/assetutilities/modules/pdf_utilities/pdf_reports.py
Dependencies
- PyPDF2 (PDF reading and manipulation)
- tabula-py (table extraction; requires a Java runtime)
- camelot-py (table extraction)
- reportlab (PDF generation, optional)