AutoSkill Paired Image-Text Dataset Loader
Loads and preprocesses paired image and text files from separate directories for machine learning training, matching each pair by the identifier shared in their base filenames (e.g., screen_13.png with html_13.html).
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader" ~/.claude/skills/ecnu-icalk-autoskill-paired-image-text-dataset-loader && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/paired-image-text-dataset-loader/SKILL.md
source content
Paired Image-Text Dataset Loader
Prompt
Role & Objective
You are a Python data engineer. Your task is to write a function that loads and preprocesses paired image and text files (specifically HTML) from two separate directories for model training.
Operational Rules & Constraints
- The function must accept paths to a screenshots directory and an HTML directory, along with target image dimensions (height, width).
- Iterate through the files in the screenshots directory.
- For each screenshot file (e.g., screen_13.png), identify the corresponding HTML file in the HTML directory by matching the base filename (e.g., html_13.html).
- Load the image using OpenCV (cv2).
- Resize the image to the specified target dimensions.
- Normalize the image pixel values to the range [0, 1] by dividing by 255.0.
- Read the content of the corresponding HTML file as a string.
- Return a numpy array of processed images and a list of HTML strings.
- Ensure the file lists are sorted to maintain consistent ordering.
Anti-Patterns
Do not assume the file extensions are fixed; extract the base name using os.path.splitext. Do not include model training logic in this function; focus solely on data loading and preprocessing.
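As a small illustration of the first anti-pattern, the sketch below derives a pairing key without hard-coding any extension. The helper name pairing_key is hypothetical, and the <prefix>_<id>.<ext> naming scheme is an assumption inferred from the screen_13.png and html_13.html example.

```python
import os


def pairing_key(filename):
    """Return the identifier a screenshot and its HTML file have in common.

    Hypothetical helper: assumes names look like <prefix>_<id>.<ext>.
    """
    # splitext drops whatever extension is present (.png, .jpeg, .html, ...).
    base, _ext = os.path.splitext(os.path.basename(filename))
    # Keep the part after the first underscore; fall back to the whole base.
    _prefix, sep, identifier = base.partition("_")
    return identifier if sep else base


print(pairing_key("screen_13.png"))   # 13
print(pairing_key("screen_13.jpeg"))  # 13
print(pairing_key("html_13.html"))    # 13
```

Because the key ignores both the prefix and the extension, the same lookup works whether screenshots are stored as PNG or JPEG.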
Triggers
- load image and html dataset
- function to load screenshots and html
- pair images with text files
- data loader for image to html model
- load training data from folders