My-personal-assistant extract-pdf-transactions

Extract PDF Transactions Skill

install

source · Clone the upstream repo

git clone https://github.com/ronnycoding/my-personal-assistant

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ronnycoding/my-personal-assistant "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/finance-process/extract-pdf-transactions" ~/.claude/skills/ronnycoding-my-personal-assistant-extract-pdf-transactions && rm -rf "$T"

manifest: .claude/skills/finance-process/extract-pdf-transactions/SKILL.md

source content

Extract PDF Transactions Skill

Extract transaction data from PDF bank and credit card statements.

Skill Metadata

Name: extract-pdf-transactions
Category: Financial Data Processing
Complexity: Medium
Privacy: Local processing only, no external APIs

Capabilities

This skill extracts structured transaction data from PDF financial statements:

Multi-format PDF parsing - Supports various bank statement layouts
Table detection - Automatically identifies transaction tables
Data extraction - Parses dates, descriptions, amounts, and balances
Multi-page support - Processes statements spanning multiple pages
Error handling - Gracefully handles malformed PDFs
Validation - Verifies extracted data quality

Usage

/finance-process extract --input="~/Documents/Finance/*.pdf" --output="~/Documents/Finance/transactions.csv"

How It Works

PDF Discovery: Finds all PDF files matching the input pattern
Content Extraction: Uses pdfplumber to extract text and tables
Table Parsing: Identifies transaction tables using pattern matching
Data Normalization: Standardizes column names and formats
Validation: Checks for completeness and accuracy
CSV Export: Saves transactions to specified output file

Script

The main script is

scripts/extract_pdf_statements.py

which:

Accepts PDF file paths (glob patterns supported)
Extracts transaction tables
Outputs to CSV format
Provides progress feedback and error reporting

Dependencies

Required Python packages:

pdfplumber (primary extraction engine)
pandas (data manipulation)
python-dateutil (date parsing)
tabula-py (fallback for complex tables)

Output Format

The skill produces CSV files with these columns:

```
date
```
- Transaction date (YYYY-MM-DD)
```
description
```
- Transaction description/merchant
```
amount
```
- Transaction amount (negative for debits)
```
balance
```
- Account balance after transaction
```
category
```
- Auto-categorized transaction type
```
source_file
```
- Original PDF filename

Error Handling

Logs files that couldn't be processed
Reports extraction quality metrics
Provides suggestions for failed extractions
Continues processing remaining files on error

Privacy & Security

All PDF processing happens locally:

No cloud API calls
No data transmission
Files stay on local filesystem
Sensitive data never leaves the machine