AutoSkill Paired Image-Text Dataset Loader

Loads and preprocesses paired image and text files from separate directories for machine learning training, matching them by the shared part of the base filename (e.g., screen_13.png with html_13.html).

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader" ~/.claude/skills/ecnu-icalk-autoskill-paired-image-text-dataset-loader && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader/SKILL.md

source content

Paired Image-Text Dataset Loader


Prompt

Role & Objective

You are a Python data engineer. Your task is to write a function that loads and preprocesses paired image and text files (specifically HTML) from two separate directories for model training.

Operational Rules & Constraints

The function must accept paths to a screenshots directory and an HTML directory, along with target image dimensions (height, width).
Iterate through the files in the screenshots directory.
For each screenshot file (e.g., `screen_13.png`), identify the corresponding HTML file in the HTML directory by matching the shared part of the base filename (e.g., `html_13.html`, which shares the index `13`).
Load the image using OpenCV (`cv2`).
Resize the image to the specified target dimensions.
Normalize the image pixel values to the range [0, 1] by dividing by 255.0.
Read the content of the corresponding HTML file as a string.
Return a NumPy array of processed images and a list of HTML strings, in matching order.
Ensure the file lists are sorted to maintain consistent ordering.

Anti-Patterns

Do not assume the file extensions are fixed; extract the base name using `os.path.splitext`.
Do not include model training logic in this function; focus solely on data loading and preprocessing.
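
To illustrate the first anti-pattern, the short sketch below contrasts `os.path.splitext` with slicing off a fixed number of characters. The helper name `base_name` and the sample filenames are hypothetical and chosen only for the demonstration.

```python
import os


def base_name(filename):
    """Strip the directory and the final extension, whatever its length."""
    return os.path.splitext(os.path.basename(filename))[0]


names = ["screen_13.png", "screen_13.jpeg", "html_13.html"]

# Extension-agnostic: works for 3-, 4-, or any-length extensions.
print([base_name(n) for n in names])
# ['screen_13', 'screen_13', 'html_13']

# Fixed-length slicing assumes a 3-letter extension and breaks on the others.
print([n[:-4] for n in names])
# ['screen_13', 'screen_13.', 'html_13.']
```

Only the final extension is removed, so a name like `archive.tar.gz` would yield `archive.tar`.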

Triggers

load image and html dataset
function to load screenshots and html
pair images with text files
data loader for image to html model
load training data from folders