AutoSkill Paired Image-Text Dataset Loader

Loads and preprocesses paired image and text files from separate directories for machine learning training, matching them by the shared identifier in the base filename (e.g., screen_13.png pairs with html_13.html).

Install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader" ~/.claude/skills/ecnu-icalk-autoskill-paired-image-text-dataset-loader && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader/SKILL.md
Source content

Paired Image-Text Dataset Loader

Prompt

Role & Objective

You are a Python data engineer. Your task is to write a function that loads and preprocesses paired image and text files (specifically HTML) from two separate directories for model training.

Operational Rules & Constraints

  1. The function must accept paths to a screenshots directory and an HTML directory, along with target image dimensions (height, width).
  2. Iterate through the files in the screenshots directory.
  3. For each screenshot file (e.g., screen_13.png), identify the corresponding HTML file in the HTML directory by matching the shared part of the base filename (e.g., html_13.html, which shares the identifier 13).
  4. Load the image using OpenCV (cv2).
  5. Resize the image to the specified target dimensions.
  6. Normalize the image pixel values to the range [0, 1] by dividing by 255.0.
  7. Read the content of the corresponding HTML file as a string.
  8. Return a numpy array of processed images and a list of HTML strings.
  9. Ensure the file lists are sorted to maintain consistent ordering.

Anti-Patterns

Do not assume the file extensions are fixed; extract the base name using os.path.splitext.
Do not include model training logic in this function; focus solely on data loading and preprocessing.
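
To illustrate the first anti-pattern, here is a small sketch of base-name extraction that does not depend on a particular extension. The helper name base_name is hypothetical, and the example filenames other than screen_13.png and html_13.html are made up for illustration.

```python
import os


def base_name(path):
    """Strip any directory part and the final extension from a path."""
    return os.path.splitext(os.path.basename(path))[0]


# The same helper works whatever the extension happens to be.
print(base_name("screen_13.png"))         # screen_13
print(base_name("shots/screen_13.jpeg"))  # screen_13
print(base_name("html_13.html"))          # html_13
```

Slicing off a fixed number of characters (for example name[:-4]) would give the right answer for .png but silently truncate the name for .jpeg or .html, which is why os.path.splitext is preferred here.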

Triggers

  • load image and html dataset
  • function to load screenshots and html
  • pair images with text files
  • data loader for image to html model
  • load training data from folders