gui-setup
First-time setup for GUI Agency Pack — install dependencies for local (Mac/Linux) and remote (VM) operation.
Install

Source: clone the upstream repo
```bash
git clone https://github.com/openclaw/skills
```

Claude Code: install into `~/.claude/skills/`
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alfredjamesli/gui-claw/skills/gui-setup" ~/.claude/skills/openclaw-skills-gui-setup && rm -rf "$T"
```

OpenClaw: install into `~/.openclaw/skills/`
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/alfredjamesli/gui-claw/skills/gui-setup" ~/.openclaw/skills/openclaw-skills-gui-setup && rm -rf "$T"
```
Manifest: `skills/alfredjamesli/gui-claw/skills/gui-setup/SKILL.md`
Setup — New Machine
Quick Start
```bash
git clone https://github.com/Fzkuji/GUI-Agent-Skills.git
cd GUI-Agent-Skills
bash scripts/setup.sh
```
Dependencies by Platform
macOS (local operation)
```bash
# Python venv
python3 -m venv ~/gui-agent-env
source ~/gui-agent-env/bin/activate

# Core dependencies
pip install pynput opencv-python pillow requests

# GPA-GUI-Detector (UI element detection, ~40MB)
pip install torch ultralytics
# Hugging Face stores large model files with Git LFS; install it first (e.g. brew install git-lfs)
git lfs install
git clone https://huggingface.co/Salesforce/GPA-GUI-Detector ~/GPA-GUI-Detector

# Accessibility permissions required:
# System Settings → Privacy & Security → Accessibility → Add Terminal / OpenClaw
```
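To confirm the macOS install worked, a minimal Python sketch like the following can be run inside the activated venv. It assumes the import names `pynput`, `cv2` (from opencv-python), `PIL` (from pillow), `requests`, `torch`, and `ultralytics`, plus the model path `~/GPA-GUI-Detector/model.pt`; the helper names `missing_modules` and `model_looks_downloaded` are hypothetical and not part of the skill's scripts.

```python
import importlib.util
from pathlib import Path

# Import names for the packages installed above (the pip name differs from the
# import name for opencv-python -> cv2 and pillow -> PIL).
MAC_MODULES = ["pynput", "cv2", "PIL", "requests", "torch", "ultralytics"]
MODEL_PATH = Path.home() / "GPA-GUI-Detector" / "model.pt"


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def model_looks_downloaded(path, min_bytes=1_000_000):
    """Heuristic: a Git LFS pointer file is ~130 bytes, real weights are megabytes."""
    return path.is_file() and path.stat().st_size >= min_bytes


if __name__ == "__main__":
    missing = missing_modules(MAC_MODULES)
    print("missing modules:", missing or "none")
    print("model present:", model_looks_downloaded(MODEL_PATH))
```

An empty "missing modules" list and `model present: True` suggest the venv and detector are in place; Accessibility permission still has to be granted separately.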
Linux (local operation)
```bash
# System tools (python3-venv provides "python3 -m venv" on Debian/Ubuntu)
sudo apt install xdotool wmctrl xclip scrot python3-venv

# Python venv
python3 -m venv ~/gui-agent-env
source ~/gui-agent-env/bin/activate

# Core dependencies
pip install pyautogui opencv-python pillow requests
```
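On Linux the system tools matter as much as the pip packages. A minimal sketch, assuming the four tools from the apt line and an X11 session (xdotool and wmctrl drive an X server, so `DISPLAY` should be set); the helper names are hypothetical:

```python
import os
import shutil

# Command-line tools installed by the apt line above.
LINUX_TOOLS = ["xdotool", "wmctrl", "xclip", "scrot"]


def missing_tools(tools):
    """Return the tools that shutil.which() cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_x_display():
    """True if DISPLAY is set, which xdotool and wmctrl need to reach the X server."""
    return bool(os.environ.get("DISPLAY"))


if __name__ == "__main__":
    print("missing tools:", missing_tools(LINUX_TOOLS) or "none")
    print("X display available:", has_x_display())
```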
OpenClaw Configuration
Add to `~/.openclaw/openclaw.json`:
```json
{
  "tools": {
    "exec": { "timeoutSec": 300 }
  },
  "messages": {
    "queue": { "mode": "interrupt" }
  },
  "skills": {
    "entries": {
      "gui-agent": { "enabled": true }
    }
  }
}
```
- `timeoutSec: 300`: GUI operations (screenshot → detect → click → verify) can take a while, so the exec tool timeout is set to 300 seconds.
- `queue.mode: "interrupt"`: lets you abort long-running GUI operations.
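If `openclaw.json` already holds other settings, overwriting it with the snippet would drop them. A minimal Python sketch that merges only these keys instead, assuming the file is plain JSON (no comments); `deep_merge` and `apply_gui_settings` are hypothetical helpers, not part of OpenClaw:

```python
import json
from pathlib import Path

# Settings recommended above for ~/.openclaw/openclaw.json.
GUI_SETTINGS = {
    "tools": {"exec": {"timeoutSec": 300}},
    "messages": {"queue": {"mode": "interrupt"}},
    "skills": {"entries": {"gui-agent": {"enabled": True}}},
}


def deep_merge(base, overrides):
    """Return a new dict: nested dicts are merged, other values are replaced."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_gui_settings(config_path):
    """Merge GUI_SETTINGS into the JSON file, keeping unrelated existing keys."""
    path = Path(config_path).expanduser()
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(deep_merge(current, GUI_SETTINGS), indent=2) + "\n")


if __name__ == "__main__":
    apply_gui_settings("~/.openclaw/openclaw.json")
```

`deep_merge` builds new dicts rather than mutating `base`, so a failed write leaves the in-memory original untouched.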
Key Scripts
| Script | Purpose |
|---|---|
| | Detect local platform, print environment info |
| `scripts/gui_action.py` | Unified GUI action interface; all operations go through here |
| | OCR + YOLO UI element detection |
| | Per-app visual memory (learn/detect components) |
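The platform-detection script's internals are not described here. As a hypothetical illustration only (not the actual script), an environment report of that kind could look like this; the `INPUT_LIBRARY` mapping is an assumption based on the dependency sections, which install pynput on macOS and pyautogui on Linux:

```python
import platform
import sys

# Hypothetical mapping from platform.system() to the input library that each
# "Dependencies by Platform" section installs.
INPUT_LIBRARY = {"Darwin": "pynput", "Linux": "pyautogui"}


def environment_info():
    """Collect basic facts about the local machine as a dict."""
    system = platform.system()
    return {
        "system": system,
        "machine": platform.machine(),
        "python": platform.python_version(),
        # A venv changes sys.prefix but not sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "input_library": INPUT_LIBRARY.get(system, "unsupported"),
    }


if __name__ == "__main__":
    for key, value in environment_info().items():
        print(f"{key}: {value}")
```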
gui_action.py Usage
```bash
source ~/gui-agent-env/bin/activate
cd path/to/GUI-Agent-Skills

# Local operations (default)
python3 scripts/gui_action.py click 500 300
python3 scripts/gui_action.py type "hello"
python3 scripts/gui_action.py screenshot /tmp/s.png

# Remote VM operations
python3 scripts/gui_action.py click 500 300 --remote http://VM_IP:5000
python3 scripts/gui_action.py type "hello" --remote http://VM_IP:5000
```
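When driving `gui_action.py` from another Python program, a thin `subprocess` wrapper keeps the command line in one place. This is a sketch only: it uses just the actions and the `--remote` flag shown above, assumes it runs from the repository root, and the names `build_command` and `run_action` are hypothetical. The 300-second timeout mirrors the `timeoutSec` setting but is otherwise an arbitrary choice.

```python
import subprocess
import sys

# Path to the unified action script, relative to the GUI-Agent-Skills checkout.
GUI_ACTION = "scripts/gui_action.py"


def build_command(action, *args, remote=None):
    """Build the argv for one gui_action.py call, e.g. ("click", 500, 300)."""
    cmd = [sys.executable, GUI_ACTION, action, *map(str, args)]
    if remote:  # e.g. "http://VM_IP:5000" to target a remote VM
        cmd += ["--remote", remote]
    return cmd


def run_action(action, *args, remote=None, timeout=300):
    """Run one action; raises CalledProcessError if the script exits non-zero."""
    return subprocess.run(
        build_command(action, *args, remote=remote),
        check=True, capture_output=True, text=True, timeout=timeout,
    )


if __name__ == "__main__":
    print(run_action("screenshot", "/tmp/s.png").stdout)
```

Using `sys.executable` instead of a bare `python3` keeps the call on whichever interpreter runs the wrapper, so start it from the activated `~/gui-agent-env`.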
Models
| Model | Size | Purpose |
|---|---|---|
| GPA-GUI-Detector | 40MB | UI element detection (YOLO-based) |
Location: `~/GPA-GUI-Detector/model.pt`
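Because the detector is YOLO-based and installed alongside `ultralytics`, one plausible way to turn a screenshot into click targets is sketched below. It assumes `model.pt` loads with `ultralytics.YOLO`; `detect_elements` and `box_center` are hypothetical helpers, and the skill's own detection script may work differently (for example, by combining results with OCR).

```python
from pathlib import Path

MODEL_PATH = Path.home() / "GPA-GUI-Detector" / "model.pt"


def box_center(x1, y1, x2, y2):
    """Center of an (x1, y1, x2, y2) box, rounded to integer pixel coordinates."""
    return round((x1 + x2) / 2), round((y1 + y2) / 2)


def detect_elements(image_path, conf=0.25):
    """Run the detector on a screenshot; return [(cx, cy, confidence), ...]."""
    from ultralytics import YOLO  # imported lazily so box_center works without it

    model = YOLO(str(MODEL_PATH))
    result = model.predict(str(image_path), conf=conf, verbose=False)[0]
    boxes = result.boxes
    return [
        (*box_center(*xyxy), score)
        for xyxy, score in zip(boxes.xyxy.tolist(), boxes.conf.tolist())
    ]


if __name__ == "__main__":
    for cx, cy, score in detect_elements("/tmp/s.png"):
        print(f"element at ({cx}, {cy}) conf={score:.2f}")
```

The centers are in screenshot pixels; on HiDPI displays these may need scaling before being passed to a click action.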
Path Conventions
- Venv: `~/gui-agent-env/`
- Model: `~/GPA-GUI-Detector/model.pt`
- Memory: `<skill-dir>/memory/apps/<appname>/`
- Actions: `<skill-dir>/actions/`
- Backends: `<skill-dir>/scripts/backends/`
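The path conventions can be resolved with `pathlib` so scripts do not hard-code them. A minimal sketch, assuming app names are used verbatim as directory names under `memory/apps/`; `skill_paths` and the example app name `safari` are hypothetical:

```python
from pathlib import Path

# Fixed per-user locations from the conventions above.
VENV_DIR = Path.home() / "gui-agent-env"
MODEL_PATH = Path.home() / "GPA-GUI-Detector" / "model.pt"


def skill_paths(skill_dir, appname=None):
    """Resolve per-skill directories; add the app memory dir if appname is given."""
    root = Path(skill_dir).expanduser()
    paths = {
        "actions": root / "actions",
        "backends": root / "scripts" / "backends",
        "memory": root / "memory" / "apps",
    }
    if appname:
        paths["app_memory"] = paths["memory"] / appname
    return paths


if __name__ == "__main__":
    for name, path in skill_paths(".", appname="safari").items():
        print(f"{name}: {path} (exists: {path.exists()})")
```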