AutoSkill Extract Name and Tax ID from PDF Invoices

Extracts the client name and tax ID from PDF invoice files based on specific text markers ('cliente' and 'N.º de contribuinte').

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8/extract-name-and-tax-id-from-pdf-invoices" ~/.claude/skills/ecnu-icalk-autoskill-extract-name-and-tax-id-from-pdf-invoices && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt3.5_8/extract-name-and-tax-id-from-pdf-invoices/SKILL.md
source content

Prompt

Role & Objective

You are a Python developer tasked with writing a script to extract specific data fields from PDF invoice files.

Operational Rules & Constraints

  1. Input: The script must handle PDF files (e.g., using PyPDF2 or its successor pypdf, PyMuPDF, or pdfminer.six).
  2. Extraction Logic:
    • Extract the Name that appears immediately after the string "cliente" (Portuguese for "client").
    • Extract the Tax ID that appears immediately after the string "N.º de contribuinte" (Portuguese for "taxpayer number").
  3. Processing: The script should be capable of processing multiple files in a batch (e.g., iterating over a directory of files).
  4. Output: Print or save the extracted Name and Tax ID for each processed file.
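
The four rules above could be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the function names (extract_fields, read_pdf_text, process_directory, save_csv) are hypothetical, pypdf is just one of the possible libraries, and the regular expressions assume the value follows the marker (optionally after a colon) on the same or the next line and that the tax ID consists of digits, possibly separated by spaces.

```python
import csv
import re
from pathlib import Path

# Marker patterns (assumed spellings; adjust them to match your invoices).
# The value may follow an optional colon on the same line or on the next line.
NAME_RE = re.compile(r"\bcliente\b\s*:?\s*([^\n]+)", re.IGNORECASE)
TAX_ID_RE = re.compile(
    r"N\.?\s*[º°o]\s*de\s+contribuinte\s*:?\s*(\d[\d ]*\d)", re.IGNORECASE
)


def extract_fields(text):
    """Return the name and tax ID found after the two markers (None if absent)."""
    name = NAME_RE.search(text)
    tax_id = TAX_ID_RE.search(text)
    return {
        "name": name.group(1).strip() if name else None,
        # Drop spaces that some invoices insert between digit groups.
        "tax_id": tax_id.group(1).replace(" ", "") if tax_id else None,
    }


def read_pdf_text(pdf_path):
    """Concatenate the text of every page using pypdf (pip install pypdf)."""
    from pypdf import PdfReader  # lazy import: the regex logic works without it

    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def process_directory(folder, read_text=read_pdf_text):
    """Extract the fields from every *.pdf in `folder`, one result row per file."""
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            fields = extract_fields(read_text(pdf_path))
        except Exception as exc:  # keep the batch going if one file is unreadable
            print(f"{pdf_path.name}: skipped ({exc})")
            continue
        rows.append({"file": pdf_path.name, **fields})
    return rows


def save_csv(rows, out_path):
    """Write the result rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "name", "tax_id"])
        writer.writeheader()
        writer.writerows(rows)
```

With a hypothetical folder named invoices, calling rows = process_directory("invoices") followed by save_csv(rows, "clients.csv") would cover the batch and output rules; printing each row instead works just as well. Passing a different read_text function is one way to swap in PyMuPDF or pdfminer.six without touching the extraction logic.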

Communication & Style Preferences

Provide the Python code with comments explaining the extraction logic and library usage.

Triggers

  • extract name and tax id from pdf invoices
  • write program to extract cliente and contribuinte from pdf
  • parse pdf files for name and tax id
  • extract data from invoices using python