AutoSkill Extract Name and Tax ID from PDF Invoices
Extracts client name and tax ID from PDF invoice files using Python, based on specific anchor text 'cliente' and 'N.º de contribuinte'.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/extract-name-and-tax-id-from-pdf-invoices" ~/.claude/skills/ecnu-icalk-autoskill-extract-name-and-tax-id-from-pdf-invoices-51930a && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/extract-name-and-tax-id-from-pdf-invoices/SKILL.md
source content
Prompt
Role & Objective
You are a Python developer tasked with extracting specific data fields from PDF invoice files.
Operational Rules & Constraints
- Use a PDF parsing library (e.g., PyPDF2 or its successor pypdf, PyMuPDF, or pdfminer.six) to extract text from the PDF files.
- Implement batch processing to handle multiple files (e.g., 100 invoices).
- Extract the Name by searching for the anchor string "cliente" and capturing the text immediately following it.
- Extract the Tax ID by searching for the anchor string "N.º de contribuinte" and capturing the text immediately following it.
- Use regular expressions or string manipulation to isolate the data.
- Output the results clearly, indicating the file name, extracted name, and extracted tax ID.
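The anchor-based rules above can be sketched with two regular expressions. This is only an illustrative sketch: the function name `extract_fields` and both patterns are assumptions, not part of the skill. It assumes each value follows its anchor on the same or the next line, optionally after a colon, and that the tax ID contains no internal spaces.

```python
import re

# Assumed layout: "<anchor>[:] <value>", value on the same or the next line.
# The capture must start with a character that is neither whitespace nor ':'.
NAME_RE = re.compile(r"\bcliente\b\s*:?\s*([^\s:].*)", re.IGNORECASE)

# Some PDFs emit a degree sign instead of the ordinal indicator, so both are accepted.
TAX_ID_RE = re.compile(
    r"N\.?\s*[º°]\s*de\s+contribuinte\s*:?\s*([^\s:]\S*)", re.IGNORECASE
)


def extract_fields(text):
    """Return (name, tax_id) found after the anchors; None for a missing field."""
    name = NAME_RE.search(text)
    tax_id = TAX_ID_RE.search(text)
    return (
        name.group(1).strip() if name else None,
        tax_id.group(1).strip() if tax_id else None,
    )
```

The name is captured up to the end of its line, while the tax ID stops at the first whitespace. Invoices with a different layout would likely need adjusted patterns.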
Anti-Patterns
- Do not hardcode specific file names; allow for iteration over a directory.
- Do not assume the exact position of the text; rely on the anchor strings.
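One way to avoid hardcoded file names is to iterate over every `*.pdf` file in a directory. The sketch below assumes the third-party `pypdf` package for text extraction; the helper names `read_pdf_text` and `process_directory` are hypothetical. The `extract` argument stands for any callable that takes the invoice text and returns a `(name, tax_id)` tuple.

```python
from pathlib import Path


def read_pdf_text(pdf_path):
    """Concatenate the text of every page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def process_directory(directory, extract, read_text=read_pdf_text):
    """Apply `extract` to the text of each *.pdf file in `directory`."""
    results = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        try:
            fields = extract(read_text(pdf_path))
        except Exception as exc:  # keep going if one file is unreadable
            print(f"{pdf_path.name}: skipped ({exc})")
            continue
        results.append((pdf_path.name, *fields))
    return results
```

Each result is a `(file_name, name, tax_id)` tuple, so the output can be printed one line per invoice or written to a CSV file. Passing `read_text` as a parameter keeps the loop independent of the PDF library chosen.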
Triggers
- extract name and tax id from pdf invoices
- write a program to extract cliente and N.º de contribuinte from pdf
- parse pdf invoices for name and tax id
- extract data from 100 pdf invoices