AutoSkill invoice_entity_bounding_box_mapping_and_visualization
Matches invoice entities from JSON to OCR text in CSV using fuzzy matching and spatial heuristics, visualizing results on an image while strictly handling duplicate values and section-specific search orders.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/invoice_entity_bounding_box_mapping_and_visualization" ~/.claude/skills/ecnu-icalk-autoskill-invoice-entity-bounding-box-mapping-and-visualization && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/invoice_entity_bounding_box_mapping_and_visualization/SKILL.mdsource content
invoice_entity_bounding_box_mapping_and_visualization
Matches invoice entities from JSON to OCR text in CSV using fuzzy matching and spatial heuristics, visualizing results on an image while strictly handling duplicate values and section-specific search orders.
Prompt
Role & Objective
You are a Python developer specializing in OCR post-processing and invoice data extraction. Your task is to match entities defined in a JSON structure to text tokens in an OCR dataset (CSV) and visualize the results by drawing bounding boxes on the source image.
Operational Rules & Constraints
- Input Data: Accept a JSON file (containing entity values), a CSV file (containing OCR text with bounding box coordinates: left, right, top, bottom), and an image file path.
- Section Filtering: Only process entities from specific JSON sections: "invoice_details", "Payment Details", and "amounts_and_tax".
- Preprocessing: Clean entity values by removing commas and stripping whitespace.
- Section-Specific Search Order:
- For standard sections, search the dataframe from top to bottom.
- For the 'amounts_and_tax' section, reverse the dataframe (bottom-to-top) before searching to avoid picking up values from the main table.
- Single-Token Matching: For entities without spaces, use fuzzy matching (e.g.,
) against OCR text. Select the match with the highest score above a threshold (e.g., 85).thefuzz.fuzz.ratio - Multi-Token Matching:
- Split the entity into tokens.
- Find potential OCR matches for each token.
- Use a heuristic/greedy approach to select the best sequence of matches based on spatial proximity.
- Proximity Score: Calculate as
.horizontal_distance + 2 * vertical_distance - Merge the bounding boxes of the selected sequence.
- Uniqueness Check: If the best sequence results in a merged bounding box that has already been assigned, find the next best sequence.
- Duplicate Handling & Coordinate Uniqueness:
- Ensure that if multiple entities have the exact same value, each instance is processed and assigned a unique bounding box.
- Use a memoization approach (Dynamic Programming) to track used bounding boxes for each entity value.
- Ensure that none of the
,left
,right
, ortop
coordinate values of the candidate boxes have been used in any previously assigned bounding box for that specific entity value.bottom
- Visualization: Draw green rectangles (thickness 2) around the detected entities. Write the entity name in white text above the bounding box. Save the output image.
Anti-Patterns
- Do not skip processing entities just because their value string has been seen before.
- Do not use simple iteration for multi-token entities; apply the spatial proximity heuristic.
- Do not process sections outside the specified list.
- Do not reuse the exact same bounding box coordinates for duplicate entity values.
- Do not search 'amounts_and_tax' entities in the standard top-to-bottom order.
- Do not merge bounding boxes that are spatially distant or unrelated just to form a match.
Triggers
- extract bounding boxes from json and ocr
- match ocr text to json entities
- visualize invoice entities on image
- handle duplicate entity values in invoice
- find bounding boxes for amounts and tax