Marketplace receipt-scanner-master
Master receipt scanning operations including parsing, debugging, enhancing accuracy, and database integration. Use when working with receipts, images, OCR issues, expense categorization, or troubleshooting receipt uploads.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/egadams/receipt-scanner-master" ~/.claude/skills/aiskillstore-marketplace-receipt-scanner-master && rm -rf "$T"
skills/egadams/receipt-scanner-master/SKILL.mdReceipt Scanner Master
Master the receipt scanning system that uses AI-powered OCR to extract structured data from receipt images and store them in the database.
What This Skill Does
This skill helps you:
- Parse receipt images (JPG, PNG, WebP, PDF) into structured data
- Debug OCR accuracy issues and extraction errors
- Enhance the receipt parsing engine and prompts
- Test receipt uploads through the web interface
- Troubleshoot database integration issues
- Validate extracted data against actual receipts
- Improve categorization and line item extraction
System Architecture
Frontend Components
Receipt Scanner Component:
/home/adamsl/planner/office-assistant/js/components/receipt-scanner.js
- Primary receipt scanning interface at
http://localhost:8080/receipt-scanner.html - Drag-and-drop or file upload for receipt images
- Parses receipts and displays line items in a table
- Each line item has a category-picker dropdown
- Items auto-save to database immediately when categorized
- Only categorized items are saved (uncategorized items ignored)
- No overall receipt-level category picker (removed)
Upload Component:
/home/adamsl/planner/office-assistant/js/upload-component.js
- Alternative upload interface (bank statements)
- Displays recent downloads from the system
- Shows real-time processing feedback via terminal display
- Handles streaming responses from backend (Server-Sent Events)
- Auto-refreshes file list after successful imports
Backend Components
Receipt Parser:
app/services/receipt_parser.py
- Validates file types and sizes
- Processes and compresses images
- Manages temporary and permanent file storage
- Coordinates with AI engine for extraction
Receipt Engine:
app/services/receipt_engine.py
- Uses Google Gemini AI for OCR and extraction
- Implements strict accuracy validation rules
- Returns structured data via Pydantic models
- Tries models in order: gemini-2.5-flash (first), 2.0-flash, 2.5-pro, pro-latest
- Flash model used first to avoid pro quota limits
- Separate quotas for flash vs pro models
API Endpoints:
app/api/receipt_endpoints.py
- Uploads and parses receipt image (returns temp data, doesn't save)/api/parse-receipt
- Auto-saves individual line items when categorized/api/receipt-items
- Final save for categorized items (batch operation)/api/save-receipt
- Retrieves receipt metadata/api/receipts/{expense_id}
- Serves stored receipt files/api/receipts/file/{year}/{month}/{filename}
Data Models:
app/models/receipt_models.py
- Complete receipt data structureReceiptExtractionResult
- Individual line items with categorizationReceiptItem
- Subtotal, tax, tip, discount, totalReceiptTotals
- Merchant detailsReceiptPartyInfo
- Parsing metadata and model infoReceiptMeta
- Enum: CASH, CARD, BANK, OTHERPaymentMethod
Database Integration
Tables:
- Main expense entries (amount, date, category, method)expenses
- Parsing metadata (model, confidence, raw response)receipt_metadata
Storage Structure:
app/data/receipts/ ├── YYYY/ │ ├── MM/ │ │ ├── receipt_TIMESTAMP_filename.jpg │ │ └── receipt_TIMESTAMP_filename.pdf └── temp/ └── temp_receipt_TIMESTAMP_filename.jpg
How to Use This Skill
Step 1: Test Receipt Parsing
Parse a receipt image to extract structured data:
# Start the API server if not running python3 api_server.py # Test with curl (from another terminal) curl -X POST "http://localhost:8000/api/parse-receipt" \ -F "file=@/path/to/receipt.jpg"
Expected Response:
{ "parsed_data": { "transaction_date": "2025-01-15", "payment_method": "CARD", "party": { "merchant_name": "Walmart", "merchant_phone": null, "merchant_address": "123 Main St", "store_location": "Store #1234" }, "items": [ { "description": "MILK WHOLE GAL", "quantity": 1.0, "unit_price": 4.99, "line_total": 4.99 } ], "totals": { "subtotal": 4.99, "tax_amount": 0.35, "tip_amount": 0.0, "discount_amount": 0.0, "total_amount": 5.34 }, "meta": { "currency": "USD", "receipt_number": "12345", "model_name": "gemini-2.5-pro" } }, "temp_file_name": "temp_receipt_20250115T120000Z_receipt.jpg" }
Step 2: Debug OCR Accuracy Issues
When OCR produces incorrect amounts or descriptions:
Common Issues:
- Digit Confusion: 4↔9, 3↔8, 5↔6, 0↔8, 1↔7
- Missing Items: Items not extracted from receipt
- Wrong Totals: Extracted amounts don't match
- Poor Image Quality: Blurry, dark, or low-resolution images
Debug Process:
-
Check the raw image quality:
# View the receipt image open /path/to/receipt.jpg # or xdg-open /path/to/receipt.jpg- Is text clearly readable?
- Is image properly oriented?
- Is there sufficient contrast?
-
Review the Gemini prompt in
:app/services/receipt_engine.py:96-173- Look for the accuracy rules and verification steps
- Check if new issue types need specific instructions
- Verify digit confusion prevention rules are clear
-
Test with higher quality image:
- Increase
in settingsRECEIPT_IMAGE_MAX_WIDTH_PX - Increase JPEG quality in
receipt_parser.py:80,83
- Increase
-
Add validation logic:
- Check
for each itemquantity × unit_price = line_total - Verify
sum(line_totals) ≈ subtotal - Compare
subtotal + tax - discount = total
- Check
-
Examine the raw AI response:
# Add debug logging in receipt_engine.py:78 print(f"Raw Gemini Response: {json_response}")
Step 3: Enhance the Receipt Parser
To improve parsing accuracy and features:
Modify the Gemini Prompt (
app/services/receipt_engine.py):
def _get_prompt(self) -> str: return """ You are an expert at extracting structured data from receipt images with EXTREME ACCURACY. [Add new instructions here, such as:] **NEW RULE**: For grocery store receipts, items often have: - Short codes (e.g., "VEG", "DAIRY", "MEAT") - Weight-based pricing (price per lb/kg) - Multi-buy discounts (e.g., "2 for $5") **VALIDATION ENHANCEMENT**: Before returning JSON: 1. Verify every item's math: quantity × unit_price = line_total 2. Sum all line_totals and compare to subtotal 3. Check: subtotal + tax - discount + tip = total_amount 4. If any validation fails, RE-EXAMINE the receipt more carefully ... [rest of prompt] """
Improve Image Processing (
app/services/receipt_parser.py):
async def _process_image(self, image_data: bytes, mime_type: str): if mime_type.startswith("image/"): img = Image.open(BytesIO(image_data)) # Add preprocessing steps: # 1. Auto-rotate based on EXIF # 2. Increase contrast for faded receipts # 3. Sharpen slightly for better OCR # 4. Convert to grayscale if color isn't needed
Add Custom Validation (
app/api/receipt_endpoints.py):
@router.post("/parse-receipt") async def parse_receipt_endpoint(file: UploadFile = File(...)): parsed_data, temp_file_name = await parser.process_receipt(file) # Add validation here: validation_errors = validate_receipt_data(parsed_data) if validation_errors: return JSONResponse( status_code=422, content={ "errors": validation_errors, "parsed_data": parsed_data, "temp_file_name": temp_file_name } ) return ParseReceiptResponse(...)
Step 4: Test Through Web Interface
Test the complete workflow including UI:
-
Start the API server:
cd /home/adamsl/planner/nonprofit_finance_db python3 api_server.py -
Open the web interface:
cd /home/adamsl/planner/office-assistant # Open index.html in browser or use a local server python3 -m http.server 8080 # Navigate to http://localhost:8080 -
Test the upload flow:
- Download a test receipt (PDF or image) to ~/Downloads
- Verify it appears in the upload component
- Select the receipt and click "Import Selected PDF"
- Watch the terminal output for processing steps
- Verify success message and database insertion
-
Check database entries:
# Connect to database and verify mysql -u root -p nonprofit_finance_db-- Check latest expense entries SELECT * FROM expenses ORDER BY id DESC LIMIT 5; -- Check receipt metadata SELECT * FROM receipt_metadata ORDER BY id DESC LIMIT 5; -- Verify file storage SELECT expense_id, receipt_url FROM expenses WHERE receipt_url IS NOT NULL LIMIT 5;
Step 5: Troubleshoot Database Issues
Common database integration problems:
Issue: Receipt parsed but not saved to database
Debug steps:
# Check API server logs tail -f api_server.log # Look for errors in save_receipt_endpoint grep -A 10 "Error saving expense" api_server.log # Verify database connection python3 -c "from app.repositories.expenses import ExpenseRepository; repo = ExpenseRepository(); print('Connection OK')"
Issue: File saved to temp but not moved to permanent storage
Debug steps:
# Check temp directory ls -lth app/data/receipts/temp/ | head -20 # Check permanent storage structure ls -R app/data/receipts/ | grep -E "^\\./" # Verify permissions ls -ld app/data/receipts/
Issue: Categorization not working
Debug steps:
# Check categories table mysql -u root -p -e "SELECT id, name, category_path FROM categories ORDER BY id;" nonprofit_finance_db # Verify category_id assignments in parsed items # Items without category_id are not saved to database
Step 6: Validate Extraction Accuracy
Manually verify OCR accuracy:
-
Get the parsed data:
curl -X POST "http://localhost:8000/api/parse-receipt" \ -F "file=@receipt.jpg" | jq '.' -
Compare against actual receipt:
- Open receipt image side-by-side
- Check each line item: description, quantity, price, total
- Verify merchant name and address
- Confirm tax amount and final total
- Note any discrepancies
-
Calculate accuracy metrics:
# Create a validation script import json def validate_receipt(parsed_json, actual_receipt_data): errors = [] # Check item count if len(parsed_json['items']) != len(actual_receipt_data['items']): errors.append(f"Item count mismatch: {len(parsed_json['items'])} vs {len(actual_receipt_data['items'])}") # Check each item for i, (parsed, actual) in enumerate(zip(parsed_json['items'], actual_receipt_data['items'])): if parsed['line_total'] != actual['line_total']: errors.append(f"Item {i}: ${parsed['line_total']} vs ${actual['line_total']}") # Check total if parsed_json['totals']['total_amount'] != actual_receipt_data['total']: errors.append(f"Total: ${parsed_json['totals']['total_amount']} vs ${actual_receipt_data['total']}") return errors
Configuration Files
Environment Variables (
.env):
GEMINI_API_KEY=your_gemini_api_key_here # Receipt settings RECEIPT_MAX_SIZE_MB=10 RECEIPT_IMAGE_MAX_WIDTH_PX=2048 RECEIPT_IMAGE_MAX_HEIGHT_PX=2048 RECEIPT_PARSE_TIMEOUT_SECONDS=30 RECEIPT_UPLOAD_DIR=app/data/receipts RECEIPT_TEMP_UPLOAD_DIR=app/data/receipts/temp
Settings (
app/config.py):
class Settings(BaseSettings): GEMINI_API_KEY: str RECEIPT_MAX_SIZE_MB: int = 10 RECEIPT_IMAGE_MAX_WIDTH_PX: int = 1024 RECEIPT_IMAGE_MAX_HEIGHT_PX: int = 1024 RECEIPT_PARSE_TIMEOUT_SECONDS: int = 30 RECEIPT_UPLOAD_DIR: str = "app/data/receipts" RECEIPT_TEMP_UPLOAD_DIR: str = "app/data/receipts/temp"
Receipt Scanner Workflow (Important!)
CRITICAL: Items do NOT automatically save when you scan a receipt. You must categorize items for them to be saved.
Workflow Steps:
- Upload receipt → Parses and shows line items (nothing saved yet)
- Select category for each item → Item saves immediately to database
- "Save Expense" button → Optional final confirmation
What Gets Saved:
- ✓ Items with categories selected → Saved to
tableexpenses - ✗ Items without categories → Ignored, not saved
- Each categorized item becomes a separate expense entry
Database Behavior:
// When you select a category for an item: _persistCategorizedItem(index, categoryId) { // Immediately POSTs to /api/receipt-items // Creates expense entry in database // Returns expense_id for the item }
Common Issues & Solutions
Issue: "GEMINI_API_KEY environment variable not set"
Solution:
# Add to .env file echo 'GEMINI_API_KEY=your_key_here' >> .env # Or export in current session export GEMINI_API_KEY=your_key_here
Issue: Gemini API quota exceeded (429 error)
Root Cause: Hit the free tier daily quota for a specific model
Solutions:
-
Model fallback (already implemented):
- Receipt engine tries flash models first (separate quota from pro)
- Order: gemini-2.5-flash → 2.0-flash → 2.5-pro → pro-latest
-
Wait for quota reset (24 hours)
-
Use different Google account:
- Create API key from different account
- Update GEMINI_API_KEY in
.env
-
Upgrade to paid tier (higher quotas)
Issue: OCR reads $4.99 as $9.99
Root Cause: Digit confusion (4 vs 9)
Solution: Enhance Gemini prompt with specific digit rules:
**DIGIT 4 vs 9 RECOGNITION**: - 4 has sharp angles, often looks like "4" with a horizontal line and vertical line meeting - 9 has a curved top, looks like "g" or "q" without the tail - Context check: grocery items rarely cost $9.99, more often $4.99
Issue: Missing line items in extraction
Root Cause: Items at bottom of receipt or spanning multiple lines
Solution:
- Increase image resolution in
receipt_parser.py - Add instruction to Gemini prompt:
**COMPLETE EXTRACTION**: Extract ALL items from top to bottom of receipt. Do not skip items even if they are: - At the very bottom of the receipt - Spanning multiple lines - In a different format or font
Issue: Tax calculation mismatch
Root Cause: Some items are tax-exempt or have different tax rates
Solution:
- Add per-item tax tracking in
modelReceiptItem - Update Gemini prompt to identify taxable vs non-taxable items
- Validate:
sum(item.tax_amount for item in items) = totals.tax_amount
Issue: "Receipt parsing exceeded 30 seconds"
Root Cause: Large image file or slow API response
Solutions:
# Increase timeout in settings RECEIPT_PARSE_TIMEOUT_SECONDS=60 # Reduce image size before sending to API # In receipt_parser.py, decrease max dimensions max_width = 1024 # Instead of 2048 max_height = 1024
Issue: Uploaded file not appearing in component
Root Cause: Frontend not polling or backend endpoint error
Debug steps:
# Check backend endpoint curl http://localhost:8000/api/recent-downloads # Check frontend console # Open browser DevTools → Console → look for errors # Verify file in Downloads folder ls -lth ~/Downloads/*.pdf | head -5
Key Files Reference
Backend Files
- Main parsing logicapp/services/receipt_parser.py
- AI engine integrationapp/services/receipt_engine.py
- REST API endpointsapp/api/receipt_endpoints.py
- Data modelsapp/models/receipt_models.py
- Metadata storageapp/repositories/receipt_metadata.py
- Expense storageapp/repositories/expenses.py
- Configuration settingsapp/config.py
Frontend Files
- Upload UI component/home/adamsl/planner/office-assistant/js/upload-component.js
- Main application/home/adamsl/planner/office-assistant/js/app.js
- Category selection/home/adamsl/planner/office-assistant/js/category-picker.js
Test Files
- Receipt processing teststests/test_receipt_processing.py
- API endpoint teststests/test_receipt_items_api.py
- Integration teststest_receipt_api.py
Examples
Example 1: Scan and Categorize a Receipt (Web Interface)
User request:
I want to scan my Meijer receipt and categorize the groceries
You would:
-
Direct user to the receipt scanner:
Open http://localhost:8080/receipt-scanner.html in your browser -
Guide the workflow:
- Upload: Drag and drop the receipt image or click to browse
- Wait: Receipt parses automatically (gemini-2.5-flash model)
- Review: Check the parsed line items in the table
- Categorize: Select category for each item you want to track
- Click category dropdown for each item
- Select appropriate category (e.g., "Groceries > Dairy")
- Item saves immediately to database
- Optional: Click "Save Expense" to confirm completion
-
Verify in database:
- Only categorized items are saved
- Each item is a separate expense entry
- Uncategorized items are ignored
-
View in Daily Expense Categorizer:
- Navigate to
http://localhost:8080/daily_expense_categorizer.html - Select the month from dropdown
- Select the date
- See all saved receipt items
- Can re-categorize if needed
- Navigate to
Example 2: Parse a Grocery Receipt (API)
User request:
Parse this grocery receipt via API and extract all items with prices
You would:
-
Verify API server is running:
ps aux | grep api_server.py # If not running: python3 api_server.py -
Parse the receipt:
curl -X POST "http://localhost:8080/api/parse-receipt" \ -F "file=@grocery_receipt.jpg" | jq '.' -
Review the output:
- Check
array for all productsitems[] - Verify
matches receipttotals.total_amount - Note the
for saving latertemp_file_name - Note: Nothing is saved to database yet
- Check
-
If items are missing:
- Open the receipt image and compare
- Check if image quality is sufficient
- Look for items at bottom or in different sections
Example 3: Debug OCR Misreading Prices
User request:
The receipt parser is reading $4.99 items as $9.99
You would:
-
Reproduce the issue:
curl -X POST "http://localhost:8000/api/parse-receipt" \ -F "file=@problem_receipt.jpg" > parsed_output.json # Compare parsed vs actual cat parsed_output.json | jq '.parsed_data.items[] | {description, unit_price}' -
Read the current Gemini prompt:
grep -A 30 "DIGIT CONFUSION PREVENTION" app/services/receipt_engine.py -
Enhance the prompt with specific 4 vs 9 rules:
# In receipt_engine.py, _get_prompt() method **CRITICAL: DIGIT 4 vs DIGIT 9**: - When you see what might be 4 or 9, examine the top of the digit - 4: Angular top, horizontal line going right - 9: Curved/circular top, like the letter "g" - Common grocery prices: $4.99, $14.99, NOT $9.99, $19.99 - If unsure, default to 4 for items under $10 -
Test with the problematic receipt:
# Restart server to load new prompt pkill -f api_server.py python3 api_server.py & # Re-test curl -X POST "http://localhost:8000/api/parse-receipt" \ -F "file=@problem_receipt.jpg" | jq '.parsed_data.items[].unit_price' -
Verify improvement and test with other receipts
Example 4: Add Custom Validation
User request:
Validate that line totals match quantity times price
You would:
-
Read the current endpoint code:
cat app/api/receipt_endpoints.py | grep -A 20 "parse_receipt_endpoint" -
Create a validation function:
# Add to receipt_endpoints.py def validate_receipt_math(parsed_data: ReceiptExtractionResult) -> List[str]: errors = [] for i, item in enumerate(parsed_data.items): expected_total = round(item.quantity * item.unit_price, 2) if abs(expected_total - item.line_total) > 0.01: errors.append( f"Item {i} '{item.description}': " f"{item.quantity} × ${item.unit_price} = ${expected_total}, " f"but line_total is ${item.line_total}" ) # Validate subtotal items_sum = sum(item.line_total for item in parsed_data.items) if abs(items_sum - parsed_data.totals.subtotal) > 0.50: errors.append( f"Items sum to ${items_sum:.2f} but subtotal is ${parsed_data.totals.subtotal:.2f}" ) # Validate final total calculated_total = ( parsed_data.totals.subtotal + (parsed_data.totals.tax_amount or 0) + (parsed_data.totals.tip_amount or 0) - (parsed_data.totals.discount_amount or 0) ) if abs(calculated_total - parsed_data.totals.total_amount) > 0.01: errors.append( f"Calculated total ${calculated_total:.2f} != stated total ${parsed_data.totals.total_amount:.2f}" ) return errors -
Integrate validation into endpoint:
@router.post("/parse-receipt", response_model=ParseReceiptResponse) async def parse_receipt_endpoint(file: UploadFile = File(...)): parser = get_receipt_parser() temp_file_name: Optional[str] = None try: parsed_data, temp_file_name = await parser.process_receipt(file) # Add validation validation_errors = validate_receipt_math(parsed_data) if validation_errors: # Log errors but still return the data print(f"Validation warnings: {validation_errors}") return ParseReceiptResponse(parsed_data=parsed_data, temp_file_name=temp_file_name) -
Test the validation:
# Use a receipt with known correct totals curl -X POST "http://localhost:8000/api/parse-receipt" \ -F "file=@test_receipt_good.jpg" # Use a receipt with deliberate errors (or mock the data) # Check logs for validation warnings tail -f api_server.log
Example 5: Integrate with Letta Agent
User request:
Make Letta able to scan and categorize receipts
You would:
-
Ensure this skill is available to Letta:
# Skill already in .claude/skills/receipt-scanner/ # Letta can invoke Claude Code skills via agent tool calls -
Create a Letta tool function:
# In letta_agent/tools/receipt_tools.py from typing import Optional import httpx @tool def scan_receipt(image_path: str) -> dict: """ Scan a receipt image and extract structured data. Args: image_path: Path to the receipt image file Returns: Dictionary with merchant, items, totals, and metadata """ with open(image_path, 'rb') as f: files = {'file': f} response = httpx.post( 'http://localhost:8000/api/parse-receipt', files=files, timeout=60.0 ) if response.status_code == 200: return response.json() else: return {'error': response.text} -
Register the tool with Letta agent:
# In hybrid_letta_persistent.py from letta_agent.tools.receipt_tools import scan_receipt agent = client.create_agent( name="finance_assistant", tools=[scan_receipt, ...], ... ) -
Test with Letta:
# Chat with Letta response = client.send_message( agent_id=agent.id, message="Scan the receipt at ~/Downloads/walmart_receipt.jpg and tell me the total" ) print(response)
Success Criteria
The skill is successful when:
- Receipts parse with >95% accuracy on item prices
- All line items are extracted (no missing items)
- Totals match within $0.01 tolerance
- Database integration works consistently
- Web interface provides clear feedback
- Common OCR issues have documented solutions
- Letta agents can successfully use receipt scanning
Tips for Users
- Start with high-quality images: Clear, well-lit, straight photos work best
- Test incrementally: Parse → validate → save (don't skip validation)
- Build validation suite: Collect problematic receipts and test regularly
- Monitor accuracy trends: Track OCR errors to identify patterns
- Update prompt iteratively: Add specific rules as you encounter issues
- Use streaming responses: Enable real-time feedback for better UX
- Backup original files: Keep original receipts even after successful parsing