AutoSkill: Extract Name and Tax ID from PDF Invoices
Extracts the client name and tax ID from PDF invoice files based on specific text markers ('cliente' and 'N.º de contribuinte').
Install
Source: clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8/extract-name-and-tax-id-from-pdf-invoices" ~/.claude/skills/ecnu-icalk-autoskill-extract-name-and-tax-id-from-pdf-invoices && rm -rf "$T"
Manifest: SkillBank/ConvSkill/english_gpt3.5_8/extract-name-and-tax-id-from-pdf-invoices/SKILL.md
Prompt
Role & Objective
You are a Python developer tasked with writing a script to extract specific data fields from PDF invoice files.
Operational Rules & Constraints
- Input: The script must read PDF files (e.g., with a library such as PyPDF2, PyMuPDF, or pdfminer).
- Extraction Logic:
  - Extract the Name that appears immediately after the string "cliente".
  - Extract the Tax ID that appears immediately after the string "N.º de contribuinte".
- Processing: The script should process multiple files in a batch (e.g., by iterating over the files in a directory).
- Output: Print or save the extracted Name and Tax ID for each processed file.
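The rules above leave the PDF library and the exact label layout open. A minimal sketch of such a script follows. It assumes pypdf (the maintained continuation of PyPDF2) for text extraction and invoice labels written like "Cliente: <name>" and "N.º de contribuinte: <digits>". The regular expressions, the optional two-letter prefix (e.g. "PT") accepted before the tax ID, and the helper names `extract_fields`, `read_pdf_text`, `process_directory`, `save_csv`, and `main` are illustrative assumptions, not part of the skill.

```python
"""Sketch: extract the client name and tax ID from a folder of PDF invoices.

Assumptions (not from the skill itself): text is extracted with pypdf,
labels look like "Cliente:" and "N.º de contribuinte:", and the tax ID is a
run of digits, optionally preceded by a two-letter prefix such as "PT".
"""
import csv
import re
import sys
from pathlib import Path

# Tax-ID label. Text extraction may render the ordinal sign as "º" (U+00BA),
# "°" (U+00B0) or a plain "o", so all three are accepted.
TAX_LABEL = r"N\.?\s*[º°o]\s*de\s+contribuinte"

# Name (case-insensitive): the text after "cliente", skipping an optional
# ":" or "-", up to the end of the line, or up to the tax-ID label when both
# share a line. If "cliente" ends its line, the following line is taken.
NAME_RE = re.compile(
    r"\bcliente\b\s*[:\-]?\s*(.+?)\s*(?=" + TAX_LABEL + r"|$)",
    re.IGNORECASE | re.MULTILINE,
)

# Tax ID: the digits right after the label, skipping an optional ":" or "."
# and an optional two-letter prefix (hypothetical format, e.g. "PT").
TAX_ID_RE = re.compile(
    TAX_LABEL + r"\s*[:.]?\s*(?:[A-Z]{2}\s*)?(\d+)",
    re.IGNORECASE,
)


def extract_fields(text: str) -> dict:
    """Return the first name and tax ID found in `text` (None if absent)."""
    name = NAME_RE.search(text)
    tax_id = TAX_ID_RE.search(text)
    return {
        "name": name.group(1).strip() if name else None,
        "tax_id": tax_id.group(1) if tax_id else None,
    }


def read_pdf_text(path: Path) -> str:
    """Concatenate the extracted text of every page using pypdf."""
    # Imported here so extract_fields() can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def process_directory(folder: str) -> list:
    """Extract the fields from every *.pdf file in `folder`, in name order."""
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        fields = extract_fields(read_pdf_text(pdf_path))
        rows.append({"file": pdf_path.name, **fields})
    return rows


def save_csv(rows: list, out_path: str) -> None:
    """Write the extracted rows to a CSV file with columns file, name, tax_id."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "name", "tax_id"])
        writer.writeheader()
        writer.writerows(rows)


def main(argv: list) -> None:
    """Print the fields for each PDF in argv[0]; optionally save to argv[1]."""
    if not argv:
        print("usage: python extract_invoices.py <folder> [output.csv]")
        return
    results = process_directory(argv[0])
    for row in results:
        print(f"{row['file']}: name={row['name']!r}, tax_id={row['tax_id']!r}")
    if len(argv) > 1:
        save_csv(results, argv[1])


if __name__ == "__main__":
    main(sys.argv[1:])
```

A run might look like `python extract_invoices.py invoices/ results.csv` (hypothetical file and folder names). Matching on the text labels rather than on page coordinates keeps the sketch independent of the chosen PDF library; scanned invoices without a text layer would likely need OCR first, since `extract_text()` then returns little or no text.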
Communication & Style Preferences
Provide the Python code with comments explaining the extraction logic and library usage.
Triggers
- extract name and tax id from pdf invoices
- write program to extract cliente and contribuinte from pdf
- parse pdf files for name and tax id
- extract data from invoices using python