Claude-skill-registry Argentine Invoice Processing System

Complete invoice processing system for Argentine utility bills with OCR, classification, and automated organization

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/argentine-invoice-processing-system" ~/.claude/skills/majiayu000-claude-skill-registry-argentine-invoice-processing-system && rm -rf "$T"
manifest: skills/data/argentine-invoice-processing-system/SKILL.md
Argentine Invoice Processing System

Overview

This skill enables automated processing, classification, and organization of Argentine utility invoices (electricity, water, gas, municipal services, etc.). The system extracts text from PDFs and images using OCR, identifies service providers, extracts dates, and organizes files into a structured year/month hierarchy with standardized naming.

What This Skill Does

  • OCR Processing: Extract text from PDFs and images (JPG, PNG) using Tesseract
  • Service Classification: Identify service providers (AYSA, Edenor, Metrogas, ARBA, etc.)
  • Date Extraction: Parse due dates from Spanish-language invoices
  • File Naming: Apply consistent naming convention: {service}_{YYYY-MM-DD}_{payment}.ext
  • Organization: Create year/month folder structure (e.g., 2025/03_marzo/)
  • Parallel Processing: Handle multiple files concurrently

When to Use This Skill

Use this skill when you need to:

  • Process and organize utility invoices
  • Extract dates from Spanish-language documents
  • Classify Argentine service providers
  • Troubleshoot OCR issues with Spanish text
  • Deploy or maintain the invoice processing system
  • Add new service providers or rules
  • Test invoice processing functionality

Quick Start

Basic Usage

# Run with default configuration
dotnet run --project FileContentRenamer/FileContentRenamer.csproj

# Process specific directory
dotnet run --project FileContentRenamer -- /path/to/invoices

Key Configuration

{
  "AppConfig": {
    "BasePath": "/Users/santibraida/Downloads/__comprobantes/servicios",
    "FileExtensions": [".pdf", ".jpg", ".png"],
    "TesseractLanguage": "spa+eng",
    "MaxDegreeOfParallelism": 4
  }
}

Detailed Documentation

This skill is organized into specialized sub-skills for different aspects of the system:

1. Invoice Processing (invoice-processing.md)

Core workflow and service provider directory. Start here for system overview.

Covers:

  • Complete processing workflow
  • Service provider catalog (AYSA, Edenor, Metrogas, etc.)
  • File naming conventions
  • Folder organization structure
  • Configuration reference

When to use: Understanding the overall system, adding new providers, configuring the app

2. OCR Troubleshooting (ocr-troubleshooting.md)

Diagnose and fix OCR issues with Tesseract.

Covers:

  • Common OCR problems and solutions
  • Spanish character recognition
  • Image quality requirements
  • Performance optimization
  • Tesseract configuration

When to use: OCR producing garbled text, missing content, or slow performance

3. Service Identification (service-identification.md)

Detailed patterns for each service provider.

Covers:

  • Provider-specific keywords and identifiers
  • Invoice characteristics and formats
  • Validation rules and amount ranges
  • Edge cases and conflicts
  • Seasonal patterns

When to use: Adding new providers, debugging misclassification, understanding provider specifics

4. Date Extraction (date-extraction.md)

Comprehensive date parsing patterns and validation.

Covers:

  • All regex patterns with priority order
  • Argentine date format handling
  • OCR error correction for dates
  • Multiple date scenarios
  • Validation and edge cases

When to use: Date extraction issues, adding new date patterns, understanding parsing logic

5. Testing Procedures (testing-procedures.md)

Complete testing guide from unit tests to production validation.

Covers:

  • Unit, integration, and E2E testing
  • Test data management
  • Manual testing procedures
  • Performance and regression testing
  • CI/CD integration

When to use: Writing tests, validating changes, testing new invoice types, quality assurance

6. Deployment Guide (deployment-guide.md)

Production deployment and maintenance handbook.

Covers:

  • Installation and setup
  • Production configuration
  • Monitoring and health checks
  • Backup and recovery
  • Troubleshooting common issues
  • Security considerations

When to use: Deploying to production, setting up scheduled runs, maintenance, troubleshooting

Common Workflows

Adding a New Service Provider

  1. Collect 3-5 sample invoices
  2. Read service-identification.md for pattern analysis
  3. Update appsettings.json with the new rule
  4. Test with samples (see testing-procedures.md)
  5. Validate results and adjust keywords
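
The keyword adjustment in step 5 can be tried out on sample text before touching appsettings.json. The following is a minimal Python sketch of keyword-based classification, not the project's C# code. The rule layout (a provider code plus a keyword list), the keywords themselves, and the `classify` helper are all hypothetical; the real rule schema is described in service-identification.md.

```python
# Hypothetical rules: (provider code, keywords). Earlier entries win ties.
RULES = [
    ("aysa", ["aysa", "agua y saneamientos"]),
    ("edenor", ["edenor"]),
    ("metrogas", ["metrogas"]),
]


def classify(text, rules=RULES):
    """Return the code of the rule with the most keyword hits, or None."""
    lowered = text.lower()
    best_code, best_hits = None, 0
    for code, keywords in rules:
        # Count how many of this rule's keywords occur in the OCR text.
        hits = sum(1 for kw in keywords if kw in lowered)
        if hits > best_hits:
            best_code, best_hits = code, hits
    return best_code
```

Running each of the 3-5 sample invoices through a function like this shows quickly whether a new keyword list is specific enough or collides with an existing provider.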

Troubleshooting OCR Issues

  1. Check ocr-troubleshooting.md for your specific issue
  2. Verify Tesseract configuration and language packs
  3. Test image quality requirements
  4. Apply preprocessing if needed
  5. Check logs for detailed error messages
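
Step 2 can be automated. The sketch below, in Python rather than the project's C#, compares the languages named in TesseractLanguage against the output of `tesseract --list-langs`. It assumes that output is a header line followed by one language code per line, and that some older Tesseract builds print the list to stderr instead of stdout. The function names are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Parse `tesseract --list-langs` output: a header line, then one code per line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    return set(lines[1:])  # drop the "List of available languages ..." header


def missing_languages(tesseract_language, installed):
    """Return the codes from a 'spa+eng' style setting that are not installed."""
    return [lang for lang in tesseract_language.split("+") if lang not in installed]


def installed_languages():
    """Ask the local tesseract binary which language packs it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Assumption: fall back to stderr for builds that print the list there.
    return parse_list_langs(result.stdout or result.stderr)
```

For example, `missing_languages("spa+eng", installed_languages())` returns an empty list when both packs are present.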

Fixing Date Extraction

  1. Review date-extraction.md for pattern priority
  2. Check if date format is supported
  3. Apply OCR error correction patterns
  4. Add new regex pattern if needed
  5. Validate with unit tests

Deploying to Production

  1. Follow deployment-guide.md installation steps
  2. Configure production settings
  3. Set up monitoring and logging
  4. Test with sample invoices
  5. Schedule automated runs (cron/launchd)

Architecture Overview

FileContentRenamer/
├── Program.cs                  # Entry point, configuration loading
├── Configuration/
│   └── ServiceConfiguration.cs # Dependency injection setup
├── Models/
│   ├── AppConfig.cs           # Configuration model
│   └── NamingRule.cs          # Service provider rules
└── Services/
    ├── FileService.cs         # Main orchestrator
    ├── PdfProcessor.cs        # PDF text extraction
    ├── ImageProcessor.cs      # OCR with Tesseract
    ├── TextProcessor.cs       # Plain text handling
    ├── DateExtractor.cs       # Date parsing
    ├── FilenameGenerator.cs   # Naming rules application
    ├── DirectoryOrganizer.cs  # Folder structure creation
    └── FileValidator.cs       # File validation

Supported Service Providers

Provider                  Type                 Code
AYSA                      Water & Sanitation   aysa
Edenor                    Electricity          edenor
Metrogas                  Natural Gas          metrogas
Municipality of Quilmes   Municipal Taxes      municipal_quilmes
ARBA Inmobiliario         Property Tax         arba_inmobiliario
ARBA Automotor            Vehicle Tax          arba_automotor
Personal/Flow             Mobile/Internet      personal
Quilmes High School       School Tuition       high_school_cuota
Quilmes High School       School Lunch         high_school_comedor
Gloria                    Domestic Service     gloria

See service-identification.md for detailed information on each provider.

Key Features

Intelligent Date Extraction

Handles multiple date formats with priority order:

  1. Due date (abbreviated): Vto.:DD/MM/YYYY
  2. Due date (full): vencimiento DD/MM/YYYY
  3. Spanish format: DD de MONTH de YYYY
  4. Generic: DD/MM/YYYY
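
The priority order above can be sketched as an ordered list of patterns in which the first one that yields a valid calendar date wins. This Python sketch is illustrative only: the regexes, the `MONTHS` table, and `extract_date` are assumptions, and the patterns the application actually uses are documented in date-extraction.md.

```python
import re
from datetime import date

# Spanish month names mapped to month numbers ("setiembre" is a common variant).
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "setiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Hypothetical patterns, listed from highest to lowest priority.
PATTERNS = [
    ("vto", re.compile(r"vto\.?\s*:?\s*(\d{1,2})/(\d{1,2})/(\d{4})", re.I)),
    ("vencimiento", re.compile(r"vencimiento\s*:?\s*(\d{1,2})/(\d{1,2})/(\d{4})", re.I)),
    ("spanish", re.compile(r"(\d{1,2})\s+de\s+([a-z]+)\s+de\s+(\d{4})", re.I)),
    ("generic", re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")),
]


def extract_date(text):
    """Return (pattern_name, date) for the first valid match in priority order, else None."""
    for name, pattern in PATTERNS:
        for match in pattern.finditer(text):
            day, middle, year = match.groups()
            month = MONTHS.get(middle.lower()) if name == "spanish" else int(middle)
            if month is None:
                continue  # the word between the two "de" is not a month name
            try:
                return name, date(int(year), month, int(day))
            except ValueError:
                continue  # impossible date such as 31/02/2025: try the next match
    return None
```

Because the due-date patterns come first, an invoice containing both an issue date and a "Vto." date resolves to the due date even when the issue date appears earlier in the text.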

OCR with Error Correction

  • Automatic correction of common OCR mistakes (O→0, I→1, S→5)
  • Support for Spanish characters (ñ, á, é, í, ó, ú)
  • Multi-language support (Spanish + English)
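
A minimal Python sketch of the letter-to-digit correction follows. It assumes the fix is applied only inside tokens shaped like DD/MM/YYYY, so ordinary words containing O, I, or S are left alone. The `DATE_LIKE` pattern and `fix_ocr_digits` are hypothetical names, and the application's own correction rules may differ.

```python
import re

# The three substitutions named above: O→0, I→1, S→5.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "I": "1", "S": "5"})

# A date-shaped token in which some digits may have been misread as letters.
DATE_LIKE = re.compile(r"\b[0-9OIS]{1,2}/[0-9OIS]{1,2}/[0-9OIS]{4}\b")


def fix_ocr_digits(text):
    """Replace misread letters with digits, but only inside date-like tokens."""
    return DATE_LIKE.sub(lambda m: m.group(0).translate(OCR_DIGIT_FIXES), text)
```

For example, `fix_ocr_digits("Vto.:2I/O3/2O25 SALDO")` returns `"Vto.:21/03/2025 SALDO"`: the date is repaired and the word SALDO is untouched.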

Flexible Organization

Files organized into year/month structure:

servicios/
└── 2025/
    ├── 03_marzo/
    │   └── aysa_2025-03-21_santander.pdf
    └── 08_agosto/
        └── gloria_2025-08-08_mercadopago.jpeg
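
The destination of a file follows from its service code, due date, and payment method. The Python sketch below reproduces the layout shown in the tree. `target_path` and the month-name list are illustrative; in particular, the spelling used for the September folder is an assumption.

```python
from datetime import date
from pathlib import PurePosixPath

# Spanish month names used for the month folders (September spelling assumed).
MONTH_NAMES = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def target_path(base, service, due, payment, ext):
    """Build base/YYYY/MM_month/{service}_{YYYY-MM-DD}_{payment}.ext."""
    folder = f"{due.month:02d}_{MONTH_NAMES[due.month - 1]}"
    filename = f"{service}_{due.isoformat()}_{payment}{ext}"
    return PurePosixPath(base) / str(due.year) / folder / filename
```

Calling `target_path("servicios", "aysa", date(2025, 3, 21), "santander", ".pdf")` yields `servicios/2025/03_marzo/aysa_2025-03-21_santander.pdf`, matching the first entry in the tree.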

Parallel Processing

Configurable parallelism for faster processing of large batches while maintaining file safety with proper locking.
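
The idea can be illustrated with a bounded worker pool plus a lock that guards the set of target names already claimed. This is a Python sketch of the general technique, not the application's C# implementation; `process_all`, `claim_name`, and the choice of a thread pool are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

_claimed = set()
_claim_lock = threading.Lock()


def claim_name(name):
    """Reserve a target filename; return False if another worker already took it."""
    with _claim_lock:
        if name in _claimed:
            return False
        _claimed.add(name)
        return True


def process_all(paths, process_one, max_workers=4):
    """Run process_one over paths with at most max_workers in flight at once."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = exc
    return results, errors
```

Here `max_workers` plays the role of MaxDegreeOfParallelism. Lowering it trades throughput for lower memory use.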

Configuration Reference

Key Settings

Setting                      Description              Default
BasePath                     Root directory to scan   .
FileExtensions               File types to process    [".pdf", ".jpg", ".png"]
IncludeSubdirectories        Scan subdirectories      true
TesseractLanguage            OCR languages            "spa+eng"
MaxDegreeOfParallelism       Concurrent files         4
ForceReprocessAlreadyNamed   Reprocess named files    false
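
To show how the defaults combine with a partial configuration, here is a Python sketch that overlays the AppConfig section of a JSON document on the defaults listed above. `load_app_config` is a hypothetical helper; the application itself loads its configuration through .NET, and its binding rules may differ.

```python
import json

# Defaults taken from the settings table above.
DEFAULTS = {
    "BasePath": ".",
    "FileExtensions": [".pdf", ".jpg", ".png"],
    "IncludeSubdirectories": True,
    "TesseractLanguage": "spa+eng",
    "MaxDegreeOfParallelism": 4,
    "ForceReprocessAlreadyNamed": False,
}


def load_app_config(json_text):
    """Overlay the AppConfig section of a JSON document on top of the defaults."""
    user_settings = json.loads(json_text).get("AppConfig", {})
    return {**DEFAULTS, **user_settings}
```

A file that sets only MaxDegreeOfParallelism therefore still ends up with "spa+eng" as its OCR language and "." as its base path.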

See invoice-processing.md for complete configuration documentation.

Logging

Logs are written to:

  • Console: Real-time processing status
  • File: logs/app{YYYYMMDD}.log (daily rotation)

Log Levels

  • Information: Normal processing flow
  • Warning: Skipped files, no content found
  • Error: Processing failures, OCR errors
  • Debug: Detailed extraction and matching info
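
For a quick review of a daily log file, a level tally is often enough. The Python sketch below assumes each log line carries a three-letter level tag in brackets, such as [INF], [WRN], or [ERR], which is how Serilog's default file template renders levels. The project's actual output template is not shown here, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumed level tags: Verbose, Debug, Information, Warning, Error, Fatal.
LEVEL_TAG = re.compile(r"\[(VRB|DBG|INF|WRN|ERR|FTL)\]")


def count_levels(lines):
    """Count log lines per level; lines without a tag (stack traces) are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle, for example `count_levels(open(log_path, encoding="utf-8"))` with `log_path` pointing at one day's file, gives the number of warnings and errors for that day.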

Performance

Typical performance (4-core system):

  • PDF: ~1-2 seconds per file
  • Image (OCR): ~3-5 seconds per file
  • Large batch (100 files): ~5-8 minutes

See deployment-guide.md for optimization tips.

Troubleshooting Quick Reference

Issue                      See                         Quick Fix
Garbled OCR text           ocr-troubleshooting.md      Check image quality, verify spa language pack
Wrong service identified   service-identification.md   Add more keywords, check priority
Date not found             date-extraction.md          Verify date format, check OCR quality
Files in wrong folder      invoice-processing.md       Check BasePath configuration
High memory usage          deployment-guide.md         Reduce MaxDegreeOfParallelism

Testing

Run tests:

dotnet test FileContentRenamer.Tests/

See testing-procedures.md for comprehensive testing guide.

Updates and Maintenance

  • Weekly: Review logs for errors
  • Monthly: Update service provider rules if needed
  • Quarterly: Review and archive old invoices
  • Yearly: Update dependencies and .NET runtime

See deployment-guide.md for maintenance procedures.

Version History

  • v1.0.0 (2025-11-06): Initial skill documentation
    • Complete invoice processing system
    • Support for 10 service providers
    • OCR with Tesseract
    • Automated organization

Support

Related Technologies

  • .NET 9.0: Application framework
  • Tesseract OCR: Text extraction from images
  • Serilog: Structured logging
  • xUnit: Unit testing framework
  • ImageMagick: Image preprocessing (optional)

Best Practices

  1. Always backup before processing
  2. Test with samples before bulk processing
  3. Monitor logs for errors and warnings
  4. Keep configuration in version control (except production secrets)
  5. Update the skill documentation when adding features

Getting Help

  1. Check the relevant detailed skill file for your issue
  2. Review logs for error messages
  3. Search existing GitHub issues
  4. Create a new issue with:
    • Sample invoice (redacted)
    • Log excerpt
    • Configuration used
    • Expected vs actual behavior

Start with invoice-processing.md for the complete system overview, then dive into specific skill files as needed.