Skills gui-agent
GUI automation via visual detection. Clicking, typing, reading content, navigating menus, filling forms — all through screenshot → detect → act workflow. Supports macOS and Linux.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alfredjamesli/gui-claw" ~/.claude/skills/clawdbot-skills-gui-agent && rm -rf "$T"
manifest:
skills/alfredjamesli/gui-claw/SKILL.mdsource content
GUI Agent
STEP 0: Activate Platform (MANDATORY FIRST STEP)
Before any GUI operation, run:
python3 {baseDir}/scripts/activate.py
This detects your OS, sets up the correct action commands, and outputs platform context. After running,
{baseDir}/actions/_actions.yaml contains your platform's commands.
Workflow
OBSERVE → LEARN → ACT → VERIFY → SAVE
-
OBSERVE — Take screenshot → run OCR + detector → understand current state →
read {baseDir}/skills/gui-observe/SKILL.md -
LEARN — First time with an app? Save components to memory →
→read {baseDir}/skills/gui-learn/SKILL.md
auto-outputs app tips if availablelearn_from_screenshot() -
ACT — Pick target → execute using
commands → verify →_actions.yaml
→read {baseDir}/skills/gui-act/SKILL.md
for available commandsread {baseDir}/actions/_actions.yaml -
VERIFY — Screenshot again → confirm action succeeded
-
SAVE — Record state transitions to memory →
for memory structureread {baseDir}/skills/gui-memory/SKILL.md
Core Rules
- Coordinates from detection only — OCR or GPA-GUI-Detector, NEVER from guessing
- Look before you act — every action must be justified by what you observed
- image tool = understanding only — use it to decide WHAT to click, get WHERE from OCR/detector
Sub-Skills Reference
| Sub-Skill | When to read |
|---|---|
| Before screenshots or detection |
| Before learning a new app |
| Before any click/type action |
| For memory structure details |
| For multi-step navigation |
| For first-time machine setup |
| For task performance reporting |