install
source · Clone the upstream repo
git clone https://github.com/ronnycoding/my-personal-assistant
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ronnycoding/my-personal-assistant "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/finance-process/extract-pdf-transactions" ~/.claude/skills/ronnycoding-my-personal-assistant-extract-pdf-transactions && rm -rf "$T"
manifest: .claude/skills/finance-process/extract-pdf-transactions/SKILL.md
Extract PDF Transactions Skill
Extract transaction data from PDF bank and credit card statements.
Skill Metadata
- Name: extract-pdf-transactions
- Category: Financial Data Processing
- Complexity: Medium
- Privacy: Local processing only, no external APIs
Capabilities
This skill extracts structured transaction data from PDF financial statements:
- Multi-format PDF parsing - Supports various bank statement layouts
- Table detection - Automatically identifies transaction tables
- Data extraction - Parses dates, descriptions, amounts, and balances
- Multi-page support - Processes statements spanning multiple pages
- Error handling - Gracefully handles malformed PDFs
- Validation - Verifies extracted data quality
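The page does not say which checks the validation step runs. As a minimal sketch, assuming rows are dictionaries of strings keyed by the output column names, a per-row check might look like the following. The helper names `validate_row` and `quality_summary` are hypothetical and are not taken from the skill's script.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def is_iso_date(value):
    """Return True only for strictly formatted YYYY-MM-DD dates."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # Round-trip so that unpadded values such as "2024-1-5" are rejected.
    return parsed.strftime("%Y-%m-%d") == value


def validate_row(row):
    """Return a list of problems found in one transaction row (hypothetical checks)."""
    problems = []
    if not is_iso_date(row.get("date") or ""):
        problems.append("date is not YYYY-MM-DD")
    if not (row.get("description") or "").strip():
        problems.append("description is empty")
    try:
        # NaN and Infinity parse as Decimals, so reject them explicitly.
        if not Decimal(row.get("amount") or "").is_finite():
            raise InvalidOperation
    except InvalidOperation:
        problems.append("amount is not numeric")
    return problems


def quality_summary(rows):
    """Count how many rows pass every check."""
    valid = sum(1 for row in rows if not validate_row(row))
    return {"total": len(rows), "valid": valid, "invalid": len(rows) - valid}
```

A summary of this kind is one way to express extraction quality as a number per file; the real script may measure quality differently.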
Usage
/finance-process extract --input="~/Documents/Finance/*.pdf" --output="~/Documents/Finance/transactions.csv"
How It Works
- PDF Discovery: Finds all PDF files matching the input pattern
- Content Extraction: Uses pdfplumber to extract text and tables
- Table Parsing: Identifies transaction tables using pattern matching
- Data Normalization: Standardizes column names and formats
- Validation: Checks for completeness and accuracy
- CSV Export: Saves transactions to specified output file
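The extraction, table parsing, and normalization steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual code: the `HEADER_ALIASES` mapping and the function names are hypothetical, and treating a table as a transaction table only when its header row contains date and amount columns is an assumed heuristic. The only pdfplumber calls used are `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()`.

```python
# Hypothetical mapping from header variants to normalized column names.
HEADER_ALIASES = {
    "date": "date",
    "transaction date": "date",
    "posting date": "date",
    "description": "description",
    "details": "description",
    "amount": "amount",
    "balance": "balance",
}


def normalize_header(cell):
    """Map a raw header cell to a normalized column name, or None if unknown."""
    if cell is None:
        return None
    key = " ".join(str(cell).split()).lower()
    return HEADER_ALIASES.get(key)


def table_to_rows(table):
    """Convert one extracted table (a list of rows of cells) to row dicts.

    Assumed heuristic: the table counts as a transaction table only if its
    first row contains recognizable date and amount headers.
    """
    if not table:
        return []
    columns = [normalize_header(cell) for cell in table[0]]
    if "date" not in columns or "amount" not in columns:
        return []
    rows = []
    for raw in table[1:]:
        row = {name: (cell or "").strip() for name, cell in zip(columns, raw) if name}
        if row.get("date"):  # skip blank rows and wrapped continuation lines
            rows.append(row)
    return rows


def extract_transactions(pdf_path):
    """Collect transaction rows from every page of one PDF."""
    import pdfplumber  # third-party; imported lazily so the helpers above work without it

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                rows.extend(table_to_rows(table))
    return rows
```

One limitation of this sketch: statements that print the header row only on the first page would need the column layout carried over to later pages, which is not handled here.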
Script
The main script is scripts/extract_pdf_statements.py, which:
- Accepts PDF file paths (glob patterns supported)
- Extracts transaction tables
- Outputs to CSV format
- Provides progress feedback and error reporting
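Because the usage example quotes its `--input` pattern, the shell passes `~` and `*` through unexpanded, so the script has to expand them itself. A minimal sketch of that discovery step, using only the standard library, is shown here; the function name `discover_pdfs` is hypothetical, and the script may resolve patterns differently.

```python
import glob
import os


def discover_pdfs(pattern):
    """Expand "~" and the glob pattern, returning matching PDF paths in sorted order."""
    expanded = os.path.expanduser(pattern)
    matches = glob.glob(expanded, recursive=True)
    # Keep regular files with a .pdf extension (case-insensitive).
    return sorted(
        path for path in matches
        if os.path.isfile(path) and path.lower().endswith(".pdf")
    )
```

Sorting the matches keeps the processing order, and therefore the row order in the output, stable between runs.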
Dependencies
Required Python packages:
- pdfplumber (primary extraction engine)
- pandas (data manipulation)
- python-dateutil (date parsing)
- tabula-py (fallback for complex tables)
Output Format
The skill produces CSV files with these columns:
- date: Transaction date (YYYY-MM-DD)
- description: Transaction description/merchant
- amount: Transaction amount (negative for debits)
- balance: Account balance after transaction
- category: Auto-categorized transaction type
- source_file: Original PDF filename
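As an illustration of this layout, the sketch below writes rows with the six columns in order and normalizes dates and amounts first. It makes several assumptions: input dates are in `MM/DD/YYYY` form, parenthesized amounts denote debits, and the helper names are hypothetical. It also uses `datetime.strptime` with a fixed format in place of the python-dateutil parsing the skill lists as a dependency, so it is stricter about input formats than the real script is likely to be.

```python
import csv
from datetime import datetime
from decimal import Decimal

COLUMNS = ["date", "description", "amount", "balance", "category", "source_file"]


def to_iso_date(text, input_format="%m/%d/%Y"):
    """Convert a date string in the assumed input format to YYYY-MM-DD."""
    return datetime.strptime(text.strip(), input_format).strftime("%Y-%m-%d")


def to_amount(text):
    """Parse strings such as "$1,234.56", "-4.50", or "(4.50)" into a plain decimal string.

    Assumption: an amount wrapped in parentheses is a debit and becomes negative.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = Decimal(cleaned)
    return str(-value if negative else value)


def write_transactions(rows, output_path):
    """Write row dicts to CSV in the documented column order; missing fields stay empty."""
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

`Decimal` is used instead of `float` so that currency values keep their exact cents when written back out.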
Error Handling
- Logs files that couldn't be processed
- Reports extraction quality metrics
- Provides suggestions for failed extractions
- Continues processing remaining files on error
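The continue-on-error behavior can be sketched as a loop that records each failure and moves on to the next file. This is an assumed structure rather than the script's actual code: `process_all` and its `extract` parameter are hypothetical, with `extract` standing in for whatever function pulls rows from a single PDF.

```python
import logging
import os

logger = logging.getLogger("extract_pdf_transactions")


def process_all(paths, extract):
    """Run extract(path) on every path, collecting rows and per-file failures.

    `extract` is any callable that returns a list of row dicts for one PDF.
    """
    rows, failures = [], {}
    for path in paths:
        try:
            extracted = extract(path)
        except Exception as exc:  # one bad file should not abort the whole batch
            logger.warning("Could not process %s: %s", path, exc)
            failures[path] = str(exc)
            continue
        for row in extracted:
            # Tag each row with the file it came from.
            rows.append({**row, "source_file": os.path.basename(path)})
    return rows, failures
```

Returning the failures alongside the rows lets the caller print a final report of which files need attention after the successful ones have been exported.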
Privacy & Security
All PDF processing happens locally:
- No cloud API calls
- No data transmission
- Files stay on local filesystem
- Sensitive data never leaves the machine