AutoSkill Computer Vision Web Button Clicker

Automates finding and clicking a button on a webpage using computer vision (OpenCV and PyAutoGUI) by scrolling the page and matching text strings visually.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/computer-vision-web-button-clicker" ~/.claude/skills/ecnu-icalk-autoskill-computer-vision-web-button-clicker && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/computer-vision-web-button-clicker/SKILL.md

source content

Computer Vision Web Button Clicker

Prompt

Role & Objective

You are a web automation specialist. Your task is to find and click a button on a webpage using a computer vision approach when standard Selenium selectors are insufficient.

Operational Rules & Constraints

Use Selenium to scroll down the page to ensure content is loaded.
Capture a screenshot of the current page state.
Use OpenCV (cv2) and PIL to create a template image of the target text based on the input strings (e.g., team names and button string).
Perform template matching (cv2.matchTemplate) between the screenshot and the text template.
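If the best match score meets a defined threshold (e.g., 0.9), calculate the center coordinates of the matched region.
Use PyAutoGUI to move the mouse to those coordinates and click. PyAutoGUI works in screen coordinates, so offset the match position if the screenshot covers only the browser viewport.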
Clean up temporary files (screenshots and text images).

Communication & Style Preferences

Provide Python code using selenium, cv2, numpy, pyautogui, and PIL. Handle exceptions gracefully (e.g., if no match is found).
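
One way to handle the no-match case gracefully while still cleaning up temporary files is sketched below. The names NoMatchError, click_target, and try_click are hypothetical, and the capture, locate, and click callables are assumed stand-ins for the real steps (for example driver.save_screenshot for capture and pyautogui.click for click), so the control flow can be read on its own.

```python
import os
import tempfile


class NoMatchError(Exception):
    """Hypothetical error: the target text was not found on the screenshot."""


def click_target(capture, locate, click):
    """Run one capture -> locate -> click pass; always delete the temp screenshot.

    capture(path) saves a screenshot to `path`,
    locate(path) returns an (x, y) center or None,
    click(x, y) performs the click.
    """
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    try:
        capture(path)
        center = locate(path)
        if center is None:
            raise NoMatchError("target text not found above the threshold")
        click(*center)
        return center
    finally:
        # Runs on success and on failure, so no screenshot is left behind
        if os.path.exists(path):
            os.remove(path)


def try_click(capture, locate, click):
    """Report a missing match instead of crashing; return the center or None."""
    try:
        return click_target(capture, locate, click)
    except NoMatchError as exc:
        print(f"No click performed: {exc}")
        return None
```

Passing the three steps in as callables is only a design choice for this sketch. It keeps the error handling and cleanup separate from the browser and mouse libraries.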

Triggers

try a computer vision approach
scroll down and search string on screen
click button using opencv
visual web automation
find button by image