Awesome-omni-skill pdf-utilities

Read, extract, edit, and manipulate PDF documents including table extraction, page manipulation, fillable forms, and comments.

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/pdf-utilities" ~/.claude/skills/diegosouzapw-awesome-omni-skill-pdf-utilities && rm -rf "$T"

manifest: skills/data-ai/pdf-utilities/SKILL.md

source content

PDF Utilities Skill

Overview

This skill reads PDFs through several library backends (tabula, camelot, PyPDF2), extracts tables to DataFrames, extracts page ranges into new PDF files, fills and reads fillable forms, and manages PDF comments. All operations are driven by YAML configuration.

Key Components

ReadPDF Class (read_pdf.py)

Multi-backend PDF reading with table extraction:

- `read_pdf(cfg, file_index)` - Route to appropriate backend based on config
- `from_pdf_tabula(cfg, file_index)` - Extract tables using tabula-py
- `from_pdf_camelot(cfg, file_index)` - Extract tables using camelot
- `from_pdf_PyPDF2(cfg, file_index)` - Read PDF pages using PyPDF2
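
The routing described above can be sketched as a small lookup table keyed on the `reader` value from the config. This is a minimal illustration, not the module's actual code: the backend bodies are hypothetical stand-ins that only return a label, and the default of `tabula` when `reader` is missing is an assumption.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


# Hypothetical stand-ins for the real backend methods. The actual methods
# would return extracted tables or pages; these only return a label.
def from_pdf_tabula(cfg: Config, file_index: int) -> str:
    return f"tabula:{cfg['pdf']['files'][file_index]['path']}"


def from_pdf_camelot(cfg: Config, file_index: int) -> str:
    return f"camelot:{cfg['pdf']['files'][file_index]['path']}"


def from_pdf_PyPDF2(cfg: Config, file_index: int) -> str:
    return f"PyPDF2:{cfg['pdf']['files'][file_index]['path']}"


# Map each supported `reader` value to its backend function.
BACKENDS: Dict[str, Callable[[Config, int], str]] = {
    "tabula": from_pdf_tabula,
    "camelot": from_pdf_camelot,
    "PyPDF2": from_pdf_PyPDF2,
}


def read_pdf(cfg: Config, file_index: int) -> str:
    """Pick a backend from cfg['pdf']['reader'] and call it."""
    reader = cfg["pdf"].get("reader", "tabula")  # assumed default
    if reader not in BACKENDS:
        raise ValueError(
            f"Unknown reader {reader!r}; expected one of {sorted(BACKENDS)}"
        )
    return BACKENDS[reader](cfg, file_index)


cfg = {
    "pdf": {
        "io": "pdf_read",
        "reader": "camelot",
        "files": [{"path": "input.pdf"}],
    }
}
result = read_pdf(cfg, 0)
```

A lookup table keeps the set of supported readers in one place, so an unsupported value fails with a clear error instead of silently falling through.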

EditPDF Class (edit_pdf.py)

PDF page manipulation and extraction:

- `edit_pdf(cfg, file_index)` - Process PDF files based on configuration
- `from_pdf_PyPDF2(cfg, file_index)` - Extract page ranges to new PDF files
- `process_cfg_files(cfg)` - Process multiple PDF files from config
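
Processing several files from one config most likely amounts to walking the `files` list and passing each position on as `file_index`. The sketch below illustrates that idea only. The `handler` parameter and the `describe` helper are hypothetical additions for the example; the real `process_cfg_files` takes only `cfg`.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Handler = Callable[[Config, int], Any]


def process_cfg_files(cfg: Config, handler: Handler) -> List[Any]:
    """Call handler(cfg, file_index) once per entry in cfg['pdf']['files']."""
    files = cfg["pdf"].get("files", [])
    return [handler(cfg, file_index) for file_index in range(len(files))]


def describe(cfg: Config, file_index: int) -> str:
    """Hypothetical handler: report which output each input maps to."""
    entry = cfg["pdf"]["files"][file_index]
    return f"{entry['path']} -> {entry['output']}"


cfg = {
    "pdf": {
        "io": "pdf_edit",
        "files": [
            {"path": "a.pdf", "output": "a_out.pdf"},
            {"path": "b.pdf", "output": "b_out.pdf"},
        ],
    }
}
summary = process_cfg_files(cfg, describe)
```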

Additional Modules

- `fillable_pdf.py` - Handle fillable PDF forms (fill fields, extract data)
- `pdf_comments.py` - Add, read, and manipulate PDF annotations
- `pdf_reports.py` - Generate PDF reports from data

Usage Patterns

Table Extraction Configuration

pdf:
  io: pdf_read
  reader: tabula  # or camelot, PyPDF2
  files:
    - path: "input.pdf"
      pages: [1, 2, 3]
      area: [0, 0, 100, 100]  # Optional: specific region
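
As a rough illustration of how one `files` entry might translate into a tabula-py call, the sketch below builds keyword arguments for `tabula.read_pdf`. The helper names `tabula_kwargs` and `extract_tables` are hypothetical, and so is the fallback to all pages when `pages` is omitted. tabula-py reads `area` as (top, left, bottom, right); whether this skill passes the values through unchanged is an assumption.

```python
from typing import Any, Dict


def tabula_kwargs(file_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one `files` entry into keyword arguments for tabula.read_pdf."""
    kwargs: Dict[str, Any] = {"pages": file_cfg.get("pages", "all")}  # assumed default
    if "area" in file_cfg:
        area = list(file_cfg["area"])
        if len(area) != 4:
            raise ValueError("area must be [top, left, bottom, right]")
        kwargs["area"] = area
    return kwargs


def extract_tables(file_cfg: Dict[str, Any]):
    """Run tabula on one file; returns a list of DataFrames."""
    # Imported lazily so the helper above works without tabula-py or Java.
    import tabula

    return tabula.read_pdf(file_cfg["path"], **tabula_kwargs(file_cfg))


file_cfg = {"path": "input.pdf", "pages": [1, 2, 3], "area": [0, 0, 100, 100]}
kwargs = tabula_kwargs(file_cfg)
```

Calling `extract_tables(file_cfg)` requires tabula-py and a Java runtime, which is why the argument mapping is kept in a separate function that can be checked on its own.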

Page Extraction Configuration

pdf:
  io: pdf_edit
  files:
    - path: "source.pdf"
      output: "extracted_pages.pdf"
      page_start: 1
      page_end: 5
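
A page-range extraction like the one configured above could look roughly like this with PyPDF2. The sketch assumes `page_start` and `page_end` are 1-based and inclusive, which the source does not state, and it assumes PyPDF2 2.x or later for `PdfReader` and `PdfWriter`. The function names `page_indices` and `extract_pages` are hypothetical.

```python
from typing import List


def page_indices(page_start: int, page_end: int, page_count: int) -> List[int]:
    """Convert a 1-based inclusive page range into zero-based page indices."""
    if page_start < 1 or page_end < page_start:
        raise ValueError("expected 1 <= page_start <= page_end")
    last = min(page_end, page_count)  # clamp to the pages that exist
    return list(range(page_start - 1, last))


def extract_pages(path: str, output: str, page_start: int, page_end: int) -> None:
    """Copy the requested page range from `path` into a new PDF at `output`."""
    # Imported lazily so page_indices can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    writer = PdfWriter()
    for index in page_indices(page_start, page_end, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(output, "wb") as handle:
        writer.write(handle)


indices = page_indices(1, 5, 10)
```

With the configuration above, the call would be `extract_pages("source.pdf", "extracted_pages.pdf", 1, 5)`.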

Common Workflows

Table Extraction: PDF → tabula/camelot → DataFrame → CSV/Excel
Page Extraction: Multi-page PDF → Extract range → New PDF
Form Processing: Fillable PDF → Fill fields → Save completed form
Report Generation: DataFrame → Generate styled PDF report

Module Location

- Read: src/assetutilities/modules/pdf_utilities/read_pdf.py
- Edit: src/assetutilities/modules/pdf_utilities/edit_pdf.py
- Forms: src/assetutilities/modules/pdf_utilities/fillable_pdf.py
- Comments: src/assetutilities/modules/pdf_utilities/pdf_comments.py
- Reports: src/assetutilities/modules/pdf_utilities/pdf_reports.py

Dependencies

PyPDF2 (PDF reading and manipulation)
tabula-py (table extraction with Java backend)
camelot-py (table extraction)
reportlab (PDF generation, optional)