openclaw-pyautogui
Cross-platform mouse/keyboard automation skill. Supports mouse control (move/click/drag/scroll), keyboard control (key press/hotkeys/type text), screen operations (screenshots/mouse position/screen size), image utilities (metadata/crop), screen overlay markers, drawing markers on images, image locating (template matching + OCR), and file cleanup to free disk space. Activate when the user needs UI automation, screenshots, coordinate verification, image analysis/annotation, on-screen element locating, or cleanup.
git clone https://github.com/Ikaros-521/openclaw-pyautogui-skill
git clone --depth=1 https://github.com/Ikaros-521/openclaw-pyautogui-skill ~/.claude/skills/ikaros-521-openclaw-pyautogui-skill-openclaw-pyautogui
git clone --depth=1 https://github.com/Ikaros-521/openclaw-pyautogui-skill ~/.openclaw/skills/ikaros-521-openclaw-pyautogui-skill-openclaw-pyautogui
PyAutoGUI Automation Skill
Cross-platform mouse/keyboard automation for Windows, Linux, and macOS.
Features
- Mouse control: move, click, drag, scroll
- Keyboard control: key press, hotkeys, type text
- Screen operations: screenshot, mouse position, screen size
- Image utilities: image metadata (size/format/file size), crop images
- Screen overlay: draw temporary markers to validate coordinates
- Draw on images: draw persistent markers into an image and save
- Image locating: template matching and OCR-based text locating
- Cleanup: remove generated screenshots/marked files to free disk space
Activation
Activate when the user asks to do things like:
- "Click a position on the screen"
- "Move the mouse to (x, y)"
- "Type text / press keys"
- "Take a screenshot"
- "Run repetitive UI automation"
- "Get the current mouse position"
- "Get image size / image info"
- "Crop an image"
- "Draw a marker on the screen"
- "Draw a marker on an image"
- "Locate an element by template"
- "Locate text on the screen (OCR)"
- "Clean up screenshots / temporary files"
Usage
Install dependencies
    # Mouse/keyboard automation
    pip3 install pyautogui
    # Image utilities
    pip3 install Pillow
Screen info
    # Screen size
    python3 scripts/keyboard_mouse.py screen_size
    # Mouse position
    python3 scripts/keyboard_mouse.py mouse_position
Mouse actions
    # Move mouse to (x, y)
    python3 scripts/keyboard_mouse.py mouse_move 500 300
    python3 scripts/keyboard_mouse.py mouse_move 500 300 --duration 1.0
    # Mouse click (left/right/middle)
    python3 scripts/keyboard_mouse.py mouse_click left
    python3 scripts/keyboard_mouse.py mouse_click right
    python3 scripts/keyboard_mouse.py mouse_click middle --clicks 2
    # Click at a specific location
    python3 scripts/keyboard_mouse.py mouse_click_at 500 300 left
    python3 scripts/keyboard_mouse.py mouse_click_at 500 300 right --clicks 2
    # Double click
    python3 scripts/keyboard_mouse.py mouse_double_click 500 300
    # Drag
    python3 scripts/keyboard_mouse.py mouse_drag 500 300 800 600
    python3 scripts/keyboard_mouse.py mouse_drag 500 300 800 600 --duration 2.0
    # Scroll (positive = up, negative = down)
    python3 scripts/keyboard_mouse.py mouse_scroll 5
    python3 scripts/keyboard_mouse.py mouse_scroll -3
Keyboard actions
    # Single key
    python3 scripts/keyboard_mouse.py key_press enter
    python3 scripts/keyboard_mouse.py key_press escape
    python3 scripts/keyboard_mouse.py key_press tab
    python3 scripts/keyboard_mouse.py key_press space
    # Hotkeys
    python3 scripts/keyboard_mouse.py key_hotkey ctrl c
    python3 scripts/keyboard_mouse.py key_hotkey ctrl v
    python3 scripts/keyboard_mouse.py key_hotkey win r
    python3 scripts/keyboard_mouse.py key_hotkey alt tab
    python3 scripts/keyboard_mouse.py key_hotkey ctrl alt t
    # Type text
    python3 scripts/keyboard_mouse.py type_text "Hello World"
    python3 scripts/keyboard_mouse.py type_text "你好世界" --interval 0.05
Screenshot
    # Save a screenshot (primary screen)
    python3 scripts/keyboard_mouse.py screenshot /tmp/screenshot.png
    # Windows example
    python scripts/keyboard_mouse.py screenshot "E:\\temp\\screenshot.png"
Screenshot notes:
- Supported formats: PNG (recommended), JPG, BMP, etc.
- Scope: primary monitor (in multi-monitor setups)
Region Screenshot
    # Screenshot a specific region (x1, y1, x2, y2)
    python3 scripts/keyboard_mouse.py screenshot_region region.png 100 100 500 500
    # Windows example: capture the QQ chat window area
    python scripts/keyboard_mouse.py screenshot_region qq_window.png 2800 300 3800 1200
Parameters:
- `x1, y1`: Top-left corner coordinates
- `x2, y2`: Bottom-right corner coordinates
- Order doesn't matter (the region is calculated automatically from the two corners)
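Because the corner order does not matter, the normalization presumably amounts to sorting each axis. A minimal sketch of that idea follows; `normalize_region` is a hypothetical helper written for illustration, not a function taken from `keyboard_mouse.py`.

```python
def normalize_region(x1, y1, x2, y2):
    """Return (left, top, right, bottom) regardless of the corner order given."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return left, top, right, bottom


# Both calls describe the same rectangle.
print(normalize_region(100, 100, 500, 500))
print(normalize_region(500, 500, 100, 100))
```

Normalizing before use also makes it easy to reject empty regions (where `left == right` or `top == bottom`) before taking a screenshot.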
Copy & Paste
    # Copy text to clipboard
    python3 scripts/keyboard_mouse.py copy "Text to copy"
    # Paste from clipboard (Ctrl+V)
    python3 scripts/keyboard_mouse.py paste
    # Copy and paste in one command (fastest way to input text)
    python3 scripts/keyboard_mouse.py copy_paste "Text to input directly"
Use cases:
- `copy_paste` is faster than `type_text` for long text
- Use `copy_paste` when you want to skip the typing animation
- Use `type_text` when you need to simulate realistic typing
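The choice between the two commands can be scripted. The sketch below builds the argument list for `keyboard_mouse.py` using the `type_text`, `copy_paste`, and `--interval` names shown above; the helper name `input_command`, the 40-character cutoff, and the rule of sending non-ASCII text through the clipboard are assumptions made for illustration.

```python
LONG_TEXT_THRESHOLD = 40  # assumed cutoff for "long text"; tune as needed

SCRIPT = ["python3", "scripts/keyboard_mouse.py"]


def input_command(text, realistic=False, interval=0.05):
    """Build the argument list for entering text with keyboard_mouse.py."""
    if realistic:
        # Simulate human typing, one character at a time.
        return SCRIPT + ["type_text", text, "--interval", str(interval)]
    if len(text) > LONG_TEXT_THRESHOLD or not text.isascii():
        # Long or non-ASCII text goes through the clipboard.
        return SCRIPT + ["copy_paste", text]
    return SCRIPT + ["type_text", text]


print(input_command("calc"))
print(input_command("你好世界"))
```

The returned list can be passed to `subprocess.run(..., check=True)` when the command should actually be executed.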
Common key names
- Letters: `a`, `b`, `c`, ...
- Numbers: `0`, `1`, `2`, ...
- Function keys: `f1`, `f2`, ... `f12`
- Modifiers: `ctrl`, `alt`, `shift`, `win`
- Others: `enter`, `esc`, `tab`, `space`, `backspace`, `delete`, `up`, `down`, `left`, `right`
Safety
⚠️ Important:
- Make sure the target window is focused before executing actions
- Be careful with system hotkeys to avoid unintended actions
- Add delays when needed to give yourself time to interrupt
- Moving the mouse to the top-left corner (0, 0) triggers the PyAutoGUI fail-safe, which aborts the running action
Cross-platform notes
- Windows: Full support; admin permission may be needed in some environments
- Linux: Requires X11; Wayland may not work
- macOS: Grant Accessibility permission to Terminal/Python in System Settings
Example scenarios
Open Calculator (Windows)
    python3 scripts/keyboard_mouse.py key_hotkey win r
    python3 scripts/keyboard_mouse.py type_text "calc"
    python3 scripts/keyboard_mouse.py key_press enter
Auto-fill a form
    python3 scripts/keyboard_mouse.py mouse_click_at 500 300 left
    python3 scripts/keyboard_mouse.py type_text "example@email.com"
    python3 scripts/keyboard_mouse.py key_press tab
    python3 scripts/keyboard_mouse.py type_text "password123"
Batch clicking
    python3 scripts/keyboard_mouse.py mouse_click_at 100 100 left
    python3 scripts/keyboard_mouse.py mouse_click_at 200 200 left
    python3 scripts/keyboard_mouse.py mouse_click_at 300 300 left
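For longer click sequences, the same commands can be generated from a list of points. This is a rough sketch that only wraps the `mouse_click_at` command shown above; `click_commands` and `run_all` are hypothetical helper names, and the 0.5 second pause is an assumed default that leaves time to interrupt.

```python
import subprocess
import time

SCRIPT = ["python3", "scripts/keyboard_mouse.py"]


def click_commands(points, button="left"):
    """Build one mouse_click_at command per (x, y) point."""
    return [SCRIPT + ["mouse_click_at", str(x), str(y), button] for x, y in points]


def run_all(commands, delay=0.5, dry_run=True):
    """Run commands in order; with dry_run=True they are only printed."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command
            time.sleep(delay)                # pause so the sequence can be interrupted


run_all(click_commands([(100, 100), (200, 200), (300, 300)]))
```

Keeping `dry_run=True` as the default means nothing is clicked until the printed sequence has been reviewed.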
Included scripts
- `scripts/keyboard_mouse.py`: Mouse/keyboard control
- `scripts/image_utils.py`: Image utilities
- `scripts/draw_overlay.py`: Screen overlay markers
- `scripts/draw_on_image.py`: Draw markers on images
- `scripts/image_finder.py`: Image locating (template + OCR)
- `scripts/cleanup.py`: Cleanup tool
Image utilities
Image info
    python3 scripts/image_utils.py info screenshot.png
    python3 scripts/image_utils.py size photo.jpg
Crop image
    python3 scripts/image_utils.py crop screenshot.png 100 100 500 500
    python3 scripts/image_utils.py crop screenshot.png 100 100 500 500 -o output.png
Output example
    $ python3 scripts/image_utils.py info screenshot.png
    {
      "path": "screenshot.png",
      "filename": "screenshot.png",
      "size": {
        "width": 3840,
        "height": 2160
      },
      "format": "PNG",
      "mode": "RGB",
      "file_size_bytes": 2097152,
      "file_size_kb": 2048.0
    }
Image fields
| Field | Meaning | Example |
|---|---|---|
| `width` | Image width (px) | 1920, 3840 |
| `height` | Image height (px) | 1080, 2160 |
| `format` | Image format | PNG, JPEG, GIF, BMP, WEBP |
| `mode` | Color mode | RGB, RGBA, L |
| `file_size_bytes` | File size (bytes) | 2097152 |
| `file_size_kb` | File size (KB) | 2048.0 |
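If the `info` command prints JSON in the shape of the output example above, it can be consumed directly from Python. The sketch below parses a copy of that sample rather than invoking the script; `summarize` is a hypothetical helper, and the assumption that real output always has exactly these keys is untested here.

```python
import json

# Sample copied from the output example above.
SAMPLE = """
{
  "path": "screenshot.png",
  "filename": "screenshot.png",
  "size": {"width": 3840, "height": 2160},
  "format": "PNG",
  "mode": "RGB",
  "file_size_bytes": 2097152,
  "file_size_kb": 2048.0
}
"""


def summarize(info):
    """One-line summary of the fields listed in the table."""
    size = info["size"]
    return (f"{info['filename']}: {size['width']}x{size['height']} "
            f"{info['format']}, {info['file_size_kb']} KB")


info = json.loads(SAMPLE)
print(summarize(info))
```

Note that in the sample `file_size_kb` equals `file_size_bytes / 1024`, so either field can be used for size checks.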
Coordinate system
Screen coordinates:
- Origin (0, 0) is the top-left corner
- X increases to the right
- Y increases downward
Crop coordinates:
- `x1, y1`: top-left corner of crop
- `x2, y2`: bottom-right corner of crop
- Cropped size = (x2 - x1) × (y2 - y1)
Example:
    python3 scripts/image_utils.py crop screenshot.png 1520 880 1920 1080
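The example coordinates select a 400 × 200 box from the bottom-right corner of a 1920 × 1080 image. The arithmetic can be sketched as follows; `bottom_right_box` and `crop_size` are hypothetical helpers for illustration and are not part of `image_utils.py`.

```python
def bottom_right_box(image_width, image_height, crop_width, crop_height):
    """Crop box (x1, y1, x2, y2) covering the bottom-right corner of an image."""
    if crop_width > image_width or crop_height > image_height:
        raise ValueError("crop is larger than the image")
    return (image_width - crop_width, image_height - crop_height,
            image_width, image_height)


def crop_size(x1, y1, x2, y2):
    """Cropped size as (width, height) = (x2 - x1, y2 - y1)."""
    return x2 - x1, y2 - y1


box = bottom_right_box(1920, 1080, 400, 200)
print(box, crop_size(*box))
```

The same helper applied to a 3840 × 2160 screenshot gives the box `3440 1960 3840 2160`.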
Typical workflows
Analyze positions in a screenshot
    python3 scripts/image_utils.py size screenshot.png
    python3 scripts/image_utils.py crop screenshot.png 3440 1960 3840 2160 -o bottom_right.png
Batch image sizing
    for img in *.png; do
      echo -n "$img: "
      python3 scripts/image_utils.py size "$img"
    done
Capture a region of the screen
    python3 scripts/keyboard_mouse.py screenshot full.png
    python3 scripts/image_utils.py crop full.png 500 300 1000 800 -o region.png
Screen overlay markers
Draw temporary markers on the screen for coordinate verification. Useful for:
- Calibrating coordinates
- Confirming the real position of a button/element
- Debugging automation scripts
Draw a marker
    python3 scripts/draw_overlay.py marker cross 500 300
    python3 scripts/draw_overlay.py marker target 800 600 --duration 10
    python3 scripts/draw_overlay.py marker circle 500 300 --color blue --text "Send button"
    python3 scripts/draw_overlay.py marker arrow 1000 800 --direction down --color yellow
    python3 scripts/draw_overlay.py marker square 600 400 --color green --size 40
Draw a rectangular area
    python3 scripts/draw_overlay.py area 3028 276 3832 2098 --label "Window" --duration 8
    python3 scripts/draw_overlay.py area 3744 2062 3832 2098 --label "Send button" --color red
Marker types
| Type | Description | Use case |
|---|---|---|
| `cross` | Crosshair | Precise single-point targeting |
| `circle` | Circle | Mark buttons/circular elements |
| `square` | Square | Mark rectangular elements |
| `arrow` | Arrow | Indicate direction / draw attention |
| `target` | Target | Strongest visual cue (circle + crosshair) |
Colors
red, green, blue, yellow, cyan, magenta, white, orange
Coordinate calibration example
    python3 scripts/keyboard_mouse.py screenshot screen.png
    python3 scripts/draw_overlay.py marker target 3788 2080 --text "Send button" --duration 10
    python3 scripts/draw_overlay.py marker target 3790 2090 --text "Send button (adjusted)" --duration 10
    python3 scripts/keyboard_mouse.py mouse_click_at 3790 2090 left
Draw markers on images
Draw persistent markers into image files. Useful for:
- Annotating recognized positions on a screenshot
- Producing reference images
- Batch marking candidates for comparison
- Keeping calibration records
Draw a marker
    python3 scripts/draw_on_image.py screenshot.png marker cross 500 300
    python3 scripts/draw_on_image.py screenshot.png marker target 800 600 -o marked.png
    python3 scripts/draw_on_image.py screenshot.png marker circle 500 300 --color red --text "Send button"
    python3 scripts/draw_on_image.py screenshot.png marker arrow 1000 800 --direction down --color yellow
    python3 scripts/draw_on_image.py screenshot.png marker point 600 400 --color green --size 10
Draw a rectangular area
    python3 scripts/draw_on_image.py screenshot.png area 3028 276 3832 2098 --label "Window"
    python3 scripts/draw_on_image.py screenshot.png area 3744 2062 3832 2098 -o button_marked.png --label "Send button"
Batch marking workflow
    python3 scripts/keyboard_mouse.py screenshot screen.png
    python3 scripts/draw_on_image.py screen.png marker target 3788 2080 --text "Send button" -o step1.png
    python3 scripts/draw_on_image.py step1.png marker target 3790 2090 --text "Adjusted" -o step2.png
    python3 scripts/draw_on_image.py step2.png marker circle 3000 1500 --text "Avatar area" -o final.png
Screen overlay vs drawing on image
| Item | Screen overlay (draw_overlay.py) | Draw on image (draw_on_image.py) |
|---|---|---|
| Display | Real-time on screen | Inside the image file |
| Duration | Temporary | Persistent |
| Interaction | Auto-close (time) | No interaction |
| Best for | Real-time coordinate validation | Generating annotated references |
| Output | Not saved | Saved to file |
Recommended coordinate calibration (cost-saving)
    python3 scripts/keyboard_mouse.py screenshot screen.png
    python3 scripts/image_utils.py size screen.png
    python3 scripts/draw_on_image.py screen.png marker target 3788 2080 --text "Candidate 1" -o marked1.png
    python3 scripts/draw_on_image.py screen.png marker target 3790 2090 --text "Candidate 2" -o marked2.png
    python3 scripts/draw_on_image.py screen.png marker target 3785 2085 --text "Candidate 3" -o marked3.png
    python3 scripts/draw_overlay.py marker target 3790 2090 --duration 3
    python3 scripts/keyboard_mouse.py mouse_click_at 3790 2090 left
Image locating
Built on OpenCV template matching and RapidOCR. Supports locating UI elements by image and by text.
Install dependencies
    pip install opencv-python numpy rapidocr_onnxruntime
Note: RapidOCR models are ~15MB and are downloaded automatically on first use.
Template matching (find by image)
    python3 scripts/image_finder.py image button.png
    python3 scripts/image_finder.py image button.png --all
    python3 scripts/image_finder.py image button.png --threshold 0.95
    python3 scripts/image_finder.py image button.png --mark
    python3 scripts/image_finder.py image button.png --click
Output example:
✅ Match found: position (3788, 2080), similarity: 98.50%
OCR text locating (find by text)
    python3 scripts/image_finder.py text "Send"
    python3 scripts/image_finder.py text "OK" --click
    python3 scripts/image_finder.py text "Send" --mark-on-image checked.png
    python3 scripts/image_finder.py text-all
    python3 scripts/image_finder.py text "Login" --confidence 0.9
Output example:
    ✅ Found 2 candidates containing 'Send':
    [1] Text: 'Send', position: (3788, 2080), confidence: 95%
    [2] Text: 'Send to all', position: (2100, 1500), confidence: 88%
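When the OCR results need to be consumed by another script, the candidate lines can be parsed with a regular expression. This sketch assumes the output has exactly the line format of the example above, which may differ between versions of `image_finder.py`; `Candidate` and `parse_candidates` are hypothetical names.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    rank: int          # the number in square brackets
    text: str          # recognized text
    x: int             # screen x coordinate
    y: int             # screen y coordinate
    confidence: float  # 0.0 to 1.0


# Assumed line format, taken from the output example.
PATTERN = re.compile(
    r"\[(\d+)\] Text: '(.*?)', position: \((\d+), (\d+)\), confidence: (\d+(?:\.\d+)?)%"
)


def parse_candidates(output):
    """Extract every candidate line from the finder's text output."""
    return [
        Candidate(int(rank), text, int(x), int(y), float(conf) / 100)
        for rank, text, x, y, conf in PATTERN.findall(output)
    ]


SAMPLE = """Found 2 candidates containing 'Send':
[1] Text: 'Send', position: (3788, 2080), confidence: 95%
[2] Text: 'Send to all', position: (2100, 1500), confidence: 88%"""

for candidate in parse_candidates(SAMPLE):
    print(candidate)
```

Lines that do not match the pattern, such as the summary line, are skipped.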
Recommended automation workflows
Template matching (most accurate):
    python3 scripts/image_finder.py image qq_send_button.png --threshold 0.9
    python3 scripts/draw_on_image.py screen.png marker target 3788 2080 --text "Candidate 1" -o check1.png
    python3 scripts/keyboard_mouse.py mouse_click_at 3788 2080 left
OCR text locating (when no template is available):
    python3 scripts/image_finder.py text "Send"
    python3 scripts/keyboard_mouse.py mouse_click_at 3548 1462 left
Important principle:
- OCR returns accurate screen coordinates; do not modify the returned coordinates
- If there are multiple candidates, mark them on an image to visually choose the correct one
- Once you choose the right candidate, click using the original coordinates
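When several candidates come back, a script can propose one for confirmation before anything is clicked. The sketch below prefers an exact text match and then the highest confidence; that ranking rule is an assumption for illustration, not behavior of `image_finder.py`, and the dictionary layout is hypothetical. The click command reuses the candidate's coordinates unchanged, as the principle above requires.

```python
def choose_candidate(candidates, query):
    """Suggest a candidate: exact text match first, then highest confidence."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c["text"].strip() == query, c["confidence"]),
    )


def click_command(candidate, button="left"):
    """Build mouse_click_at with the coordinates exactly as returned."""
    x, y = candidate["position"]
    return ["python3", "scripts/keyboard_mouse.py",
            "mouse_click_at", str(x), str(y), button]


candidates = [
    {"text": "Send", "position": (3788, 2080), "confidence": 0.95},
    {"text": "Send to all", "position": (2100, 1500), "confidence": 0.88},
]
best = choose_candidate(candidates, "Send")
print(click_command(best))
```

The suggestion is best treated as a default that a human or the AI confirms on a marked image before the click runs.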
Template matching vs OCR
| Item | Template matching | OCR text locating |
|---|---|---|
| Accuracy | ⭐⭐⭐⭐⭐ pixel-level | ⭐⭐⭐⭐ depends on font/background |
| Speed | ⭐⭐⭐⭐⭐ milliseconds | ⭐⭐⭐ requires inference |
| Dependencies | OpenCV | RapidOCR |
| Best for | Icons/buttons/fixed UI | Text buttons/labels/inputs |
Why this is better than guessing coordinates
- High precision and repeatability (pixel-level)
- Local compute with no API cost
- Fast response
- Easy to debug via marked outputs
Workflow Patterns
Standard workflow patterns for reliable UI automation. These patterns combine the tools above into proven strategies.
Core Principles
- Step-by-step confirmation: Pause, check, and adjust at each step
- Human-friendly: Support manual intervention at any time
- Failure recovery: Retry or skip on failure
- Flexible composition: Combine strategies as needed
Pattern 1: Locate-Verify-Click
Workflow:
1. Screenshot / region screenshot
2. OCR/image finder to locate target
3. Mark all candidates on image
4. AI/human selects the correct one
5. Draw overlay on screen to confirm position
6. Click (or manual click)
Use cases: Button clicks, menu selections
Command sequence:
    # Region screenshot to reduce analysis scope
    python3 scripts/keyboard_mouse.py screenshot_region check.png 2800 300 3800 1200
    # OCR to find candidates
    python3 scripts/image_finder.py text "Send" --mark-on-image candidates.png
    # Overlay confirmation on screen
    python3 scripts/draw_overlay.py marker target 3548 1462 --duration 3
    # Click after confirmation
    python3 scripts/keyboard_mouse.py mouse_click_at 3548 1462 left
Pattern 2: Search-Input-Verify
Workflow:
1. Check if target already exists
2. If not, open search
3. Type search keyword
4. Screenshot to verify search results
5. Click the correct result
6. Confirm target appeared
Use cases: Finding contacts, search functions
Command sequence:
    # 1. Check if exists
    python3 scripts/image_finder.py text "John" --mark-on-image check1.png
    # 2. Click search box (estimated or located)
    python3 scripts/keyboard_mouse.py mouse_click_at 2950 250 left
    # 3. Quick input via clipboard
    python3 scripts/keyboard_mouse.py copy_paste "John"
    # 4. Verify results
    python3 scripts/keyboard_mouse.py screenshot_region result.png 2800 400 3200 800
    python3 scripts/image_finder.py text "John" --mark-on-image check2.png
    # 5. Click result
    python3 scripts/keyboard_mouse.py mouse_click_at 3000 600 left
    # 6. Confirm target appeared
    python3 scripts/image_finder.py text "John" --mark-on-image verify.png
Pattern 3: Form Filling
Workflow:
1. Locate first field
2. Click input box
3. Enter value (copy_paste)
4. Locate next field (Tab or click)
5. Repeat until all fields complete
6. Screenshot to confirm
7. Click submit
Use cases: Form filling, configuration settings
Command sequence:
    # Process each field
    python3 scripts/keyboard_mouse.py mouse_click_at 1000 500 left  # Field 1
    python3 scripts/keyboard_mouse.py copy_paste "value1"
    python3 scripts/keyboard_mouse.py mouse_click_at 1000 600 left  # Field 2
    python3 scripts/keyboard_mouse.py copy_paste "value2"
    # Or use Tab to navigate
    python3 scripts/keyboard_mouse.py key_press tab
    python3 scripts/keyboard_mouse.py copy_paste "value3"
    # Final verification and submit
    python3 scripts/keyboard_mouse.py screenshot verify.png
    python3 scripts/keyboard_mouse.py mouse_click_at 1200 800 left  # Submit button
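The Tab-based variant of this pattern can be generated for any number of fields. The following sketch only assembles the commands used above (`mouse_click_at`, `copy_paste`, `key_press tab`, `screenshot`); `form_commands` is a hypothetical helper, and it assumes Tab moves focus to the next field in the order the values are given.

```python
SCRIPT = ["python3", "scripts/keyboard_mouse.py"]


def form_commands(first_field, values, submit, proof="verify.png"):
    """Click the first field, paste each value with Tab in between, then submit."""
    x, y = first_field
    commands = [SCRIPT + ["mouse_click_at", str(x), str(y), "left"]]
    for i, value in enumerate(values):
        if i > 0:
            commands.append(SCRIPT + ["key_press", "tab"])  # move to the next field
        commands.append(SCRIPT + ["copy_paste", value])
    commands.append(SCRIPT + ["screenshot", proof])          # evidence before submitting
    sx, sy = submit
    commands.append(SCRIPT + ["mouse_click_at", str(sx), str(sy), "left"])
    return commands


for cmd in form_commands((1000, 500), ["value1", "value2", "value3"], (1200, 800)):
    print(" ".join(cmd))
```

Generating the list first makes it possible to review the sequence, or to pause after the screenshot step, before the submit click is executed.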
Pattern 4: Regional Precise Location
Workflow:
1. Roughly locate region (e.g., know QQ is on the right)
2. Region screenshot to narrow scope
3. Precise locate within small region
4. Calculate actual screen coord = region top-left + relative coord
5. Click
Use cases: When you know approximate position but need precision
Command sequence:
    # 1. Region screenshot (QQ window area)
    python3 scripts/keyboard_mouse.py screenshot_region qq_area.png 2800 200 3840 1500
    # 2. Find within small region
    python3 scripts/image_finder.py text "Send" --mark-on-image local_find.png
    # Returns relative coords like (500, 1300)
    # 3. Calculate actual coords
    # x = 2800 + 500 = 3300
    # y = 200 + 1300 = 1500
    # 4. Click
    python3 scripts/keyboard_mouse.py mouse_click_at 3300 1500 left
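The coordinate translation in step 3 is a simple offset by the region's top-left corner. A minimal sketch, where `to_screen` is a hypothetical helper name:

```python
def to_screen(region, relative):
    """Convert region-relative (x, y) to absolute screen coordinates.

    region is (x1, y1, x2, y2) as passed to screenshot_region; the top-left
    corner is taken as the minimum of each axis, so corner order is irrelevant.
    """
    left = min(region[0], region[2])
    top = min(region[1], region[3])
    rel_x, rel_y = relative
    return left + rel_x, top + rel_y


print(to_screen((2800, 200, 3840, 1500), (500, 1300)))
```

With the values from the command sequence this gives x = 2800 + 500 and y = 200 + 1300, the point that is clicked in step 4.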
General Strategies
Strategy A: Progressive Confirmation
Instead of executing all at once, confirm at each step:
    User: Click the send button for me
    AI:
      1. Screenshot
      2. OCR find "Send"
      3. Mark candidates for you to see
      4. Ask: Is candidate 2 correct?
      5. After your confirmation, draw overlay for 3 seconds
      6. Click only when you say "go"
Strategy B: Failure Retry
Handle step failures:
- Can't find target → Expand region and retry
- Multiple matches → Mark all and let user choose
- Click not working → Check if window is focused
- OCR failed → Switch to template matching
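The "expand region and retry" rule can be sketched as a small loop. Everything here is illustrative: `locate` stands for any caller-supplied function that takes a region `(x1, y1, x2, y2)` and returns a position or `None`, and the 200-pixel margin and three attempts are assumed defaults.

```python
def expand_region(region, margin, screen_size):
    """Grow a region by margin pixels on every side, clamped to the screen."""
    x1, y1, x2, y2 = region
    width, height = screen_size
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def locate_with_retry(locate, region, screen_size, attempts=3, margin=200):
    """Call locate(region) until it succeeds, widening the region after each miss."""
    for _ in range(attempts):
        result = locate(region)
        if result is not None:
            return result, region
        region = expand_region(region, margin, screen_size)
    return None, region


def fake_locate(region):
    """Stand-in locator: the target is only visible once x1 <= 2600."""
    return (2650, 900) if region[0] <= 2600 else None


print(locate_with_retry(fake_locate, (2800, 200, 3840, 1500), (3840, 2160)))
```

Returning the region together with the result lets the caller log how far the search had to be widened.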
Strategy C: Human-Machine Collaboration
Machine does what it's good at, human makes decisions:
- Machine: Screenshot, locate, mark, execute clicks
- Human: Judge selections, confirm positions, handle exceptions
Common Scenarios
QQ Message Sending:
1. Check if QQ window exists
2. Focus QQ window
3. Check if contact is already open
4. Search for contact (if not open)
5. Type message
6. Find and click Send button
7. Verify message appears in chat
Web Form:
1. Locate form area (scroll if needed)
2. For each field:
   - Find label
   - Click input box
   - copy_paste value
   - Tab or click next
3. Screenshot to confirm
4. Click submit
5. Wait and verify result
Best Practices
- Prefer region screenshots over full screen
  - Reduces analysis time
  - Improves OCR accuracy
  - Reduces false matches
- Mark multiple candidates with `--mark-on-image`
  - Let user/AI choose
  - Avoid clicking wrong positions blindly
- Use `copy_paste` instead of `type_text` for long text
  - Faster input
  - Avoids Chinese input issues
  - Good for fixed content
- Offset clicking when you know base position
  - Base coord + offset = target coord
  - Works for UIs with fixed relative positions
- Screenshot at each important step
  - Easier to debug
  - Can review the process
  - Supports post-analysis
Error Handling
| Problem | Solution |
|---|---|
| Can't find target | Expand region, lower confidence threshold, try different keywords |
| Multiple matches | Mark all candidates, let human choose |
| Click not working | Check window focus, add delay, retry |
| OCR wrong | Switch to template matching, increase confidence |
| Coordinate offset | Use relative coordinates, calibrate base points |
Cleanup
Analyze disk usage
    python3 scripts/cleanup.py analyze .
Clean files
    python3 scripts/cleanup.py clean . --days 7
    python3 scripts/cleanup.py clean . --days 7 --execute
    python3 scripts/cleanup.py clean . --size 1024 --execute
    python3 scripts/cleanup.py clean . --execute
Auto cleanup
    python3 scripts/cleanup.py auto . --max-files 50 --max-size 100
    python3 scripts/cleanup.py auto . --max-files 20 --max-size 50
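How `auto` decides which files to remove is not described here. One plausible policy is to delete the oldest files until both limits are met, and the sketch below shows only that assumed policy; it is not taken from `cleanup.py`. Sizes are plain bytes because the unit of `--max-size` is not stated, and `files_to_delete` is a hypothetical name.

```python
def files_to_delete(files, max_files, max_total_bytes):
    """Pick files to delete, oldest first, until both limits are satisfied.

    files is a list of (name, size_bytes, mtime) tuples.
    """
    remaining = sorted(files, key=lambda f: f[2])  # oldest first
    total = sum(size for _, size, _ in remaining)
    doomed = []
    while remaining and (len(remaining) > max_files or total > max_total_bytes):
        name, size, _ = remaining.pop(0)
        total -= size
        doomed.append(name)
    return doomed


files = [("a.png", 50, 1), ("b.png", 60, 2), ("c.png", 30, 3)]
print(files_to_delete(files, max_files=2, max_total_bytes=100))
```

The function only selects names, so the selection can be printed and reviewed before any file is actually removed.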
End-to-end example
    python3 scripts/keyboard_mouse.py screenshot screen.png
    python3 scripts/draw_on_image.py screen.png marker target 500 300 --text "Button" -o marked.png
    python3 scripts/cleanup.py analyze .
    python3 scripts/cleanup.py clean . --days 1 --execute
    python3 scripts/cleanup.py auto . --max-files 10 --max-size 50
Command quick reference
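Mouse/keyboard (`keyboard_mouse.py`)
| Command | Description | Example |
|---|---|---|
| `screen_size` | Get screen size | `screen_size` |
| `mouse_position` | Get mouse position | `mouse_position` |
| `mouse_move` | Move mouse | `mouse_move 500 300` |
| `mouse_click` | Click mouse | `mouse_click left` |
| `mouse_click_at` | Click at coordinates | `mouse_click_at 500 300 left` |
| `mouse_double_click` | Double click | `mouse_double_click 500 300` |
| `mouse_drag` | Drag | `mouse_drag 500 300 800 600` |
| `mouse_scroll` | Scroll | `mouse_scroll 5` |
| `key_press` | Press key | `key_press enter` |
| `key_hotkey` | Hotkey | `key_hotkey ctrl c` |
| `type_text` | Type text | `type_text "Hello World"` |
| `screenshot` | Screenshot | `screenshot /tmp/screenshot.png` |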
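Image utilities (`image_utils.py`)
| Command | Description | Example |
|---|---|---|
| `info` | Full image info | `info screenshot.png` |
| `size` | Image size only | `size photo.jpg` |
| `crop` | Crop image | `crop screenshot.png 100 100 500 500 -o output.png` |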
Screen overlay (`draw_overlay.py`)
| Command | Description | Example |
|---|---|---|
| `marker` | Draw marker | `marker cross 500 300` |
| `area` | Draw rectangle | `area 3028 276 3832 2098 --label "Window"` |
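Draw on image (`draw_on_image.py`)
| Command | Description | Example |
|---|---|---|
| `marker` | Draw marker on image | `screenshot.png marker cross 500 300` |
| `area` | Draw rectangle on image | `screenshot.png area 3028 276 3832 2098 --label "Window"` |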
Image finder (`image_finder.py`)
| Command | Description | Example |
|---|---|---|
| `image` | Find by template | `image button.png` |
| `text` | Find by text (OCR) | `text "Send"` |
| `text-all` | Recognize all text | `text-all` |
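Cleanup (`cleanup.py`)
| Command | Description | Example |
|---|---|---|
| `analyze` | Analyze disk usage | `analyze .` |
| `clean` | Clean files | `clean . --days 7 --execute` |
| `auto` | Auto cleanup | `auto . --max-files 50 --max-size 100` |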