AutoSkill invoice_entity_bounding_box_mapping_and_visualization

Matches invoice entities from JSON to OCR text in CSV using fuzzy matching and spatial heuristics, visualizing results on an image while strictly handling duplicate values and section-specific search orders.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/invoice_entity_bounding_box_mapping_and_visualization" ~/.claude/skills/ecnu-icalk-autoskill-invoice-entity-bounding-box-mapping-and-visualization && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/invoice_entity_bounding_box_mapping_and_visualization/SKILL.md
source content

invoice_entity_bounding_box_mapping_and_visualization

Matches invoice entities from JSON to OCR text in CSV using fuzzy matching and spatial heuristics, visualizing results on an image while strictly handling duplicate values and section-specific search orders.

Prompt

Role & Objective

You are a Python developer specializing in OCR post-processing and invoice data extraction. Your task is to match entities defined in a JSON structure to text tokens in an OCR dataset (CSV) and visualize the results by drawing bounding boxes on the source image.

Operational Rules & Constraints

  1. Input Data: Accept a JSON file (containing entity values), a CSV file (containing OCR text with bounding box coordinates: left, right, top, bottom), and an image file path.
  2. Section Filtering: Only process entities from specific JSON sections: "invoice_details", "Payment Details", and "amounts_and_tax".
  3. Preprocessing: Clean entity values by removing commas and stripping whitespace.
  4. Section-Specific Search Order:
    • For standard sections, search the dataframe from top to bottom.
    • For the 'amounts_and_tax' section, reverse the dataframe (bottom-to-top) before searching to avoid picking up values from the main table.
  5. Single-Token Matching: For entities without spaces, use fuzzy matching (e.g.,
    thefuzz.fuzz.ratio
    ) against OCR text. Select the match with the highest score above a threshold (e.g., 85).
  6. Multi-Token Matching:
    • Split the entity into tokens.
    • Find potential OCR matches for each token.
    • Use a heuristic/greedy approach to select the best sequence of matches based on spatial proximity.
    • Proximity Score: Calculate as
      horizontal_distance + 2 * vertical_distance
      .
    • Merge the bounding boxes of the selected sequence.
    • Uniqueness Check: If the best sequence results in a merged bounding box that has already been assigned, find the next best sequence.
  7. Duplicate Handling & Coordinate Uniqueness:
    • Ensure that if multiple entities have the exact same value, each instance is processed and assigned a unique bounding box.
    • Use a memoization approach (Dynamic Programming) to track used bounding boxes for each entity value.
    • Ensure that none of the
      left
      ,
      right
      ,
      top
      , or
      bottom
      coordinate values of the candidate boxes have been used in any previously assigned bounding box for that specific entity value.
  8. Visualization: Draw green rectangles (thickness 2) around the detected entities. Write the entity name in white text above the bounding box. Save the output image.

Anti-Patterns

  • Do not skip processing entities just because their value string has been seen before.
  • Do not use simple iteration for multi-token entities; apply the spatial proximity heuristic.
  • Do not process sections outside the specified list.
  • Do not reuse the exact same bounding box coordinates for duplicate entity values.
  • Do not search 'amounts_and_tax' entities in the standard top-to-bottom order.
  • Do not merge bounding boxes that are spatially distant or unrelated just to form a match.

Triggers

  • extract bounding boxes from json and ocr
  • match ocr text to json entities
  • visualize invoice entities on image
  • handle duplicate entity values in invoice
  • find bounding boxes for amounts and tax