```bash
# Clone the skills repository
git clone https://github.com/openclaw/skills

# Or install only this skill into Claude Code's skills directory
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/4ier/claw-use-android" ~/.claude/skills/openclaw-skills-claw-use-android && rm -rf "$T"

# Or install it into OpenClaw's skills directory
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/4ier/claw-use-android" ~/.openclaw/skills/openclaw-skills-claw-use-android && rm -rf "$T"
```
Claw Use Android: Phone Control for AI Agents
Give your AI agent eyes, hands, and a voice on a real Android phone.
claw-use-android is an Android app + CLI (cua) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.
Setup
```bash
# Install the APK on your Android phone and enable the Accessibility Service,
# then register the device:
cua add redmi 192.168.0.105 <token>
cua ping
```
New in v2.0.0: Unified API
Three new endpoints consolidate the older, scattered endpoints for AI agent workflows:
GET /screen — Semantic UI Tree
Returns UI elements with stable integer `ref` IDs, plus semantic `zone` and `role` annotations.
```bash
cua screen      # full semantic UI tree (JSON)
cua screen -c   # compact: only interactive/text elements
```
Response (from a Chinese-locale Settings screen: 设置 = "Settings", 搜索 = "Search"):
```json
{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}
```
GET /snapshot — JPEG Screenshot
Returns a base64-encoded JPEG screenshot.
```bash
cua snapshot                  # save screenshot, print path
cua snapshot 50 720 out.jpg   # quality, maxWidth, output
```
POST /act — Unified Action Endpoint
All operations go through a single entry point, using `ref` IDs from `/screen`.
```bash
cua act '{"click": 3}'                            # click ref 3
cua act '{"click": "OK"}'                         # click by text (fallback)
cua act '{"click": [1, 2, 3]}'                    # click refs in sequence
cua act '{"tap": {"x": 540, "y": 960}}'           # tap coordinates
cua act '{"type": "hello"}'                       # type into focused field
cua act '{"type": {"ref": 3, "text": "hello"}}'   # focus ref, then type
cua act '{"swipe": "up"}'                         # directional swipe
cua act '{"scroll": "down"}'                      # scroll nearest scrollable
cua act '{"back": true}'
cua act '{"home": true}'
cua act '{"recents": true}'
cua act '{"longpress": 3}'                        # long press ref 3
cua act '{"launch": "com.duolingo"}'              # launch app by package

# Multiple actions in one request:
cua act '{"home": true, "back": true}'
```
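Hand-writing JSON inside shell quotes is error-prone. The sketch below wraps a few of the `/act` forms above in bash functions. `json_escape`, `act_click`, and `act_type_into` are hypothetical helpers, not part of `cua`, and the escaping only covers backslashes and double quotes (not newlines or other control characters).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of cua): build /act payloads from shell
# arguments so JSON quoting is handled in one place.

# Escape backslashes and double quotes for a JSON string literal.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# act_click 3       sends {"click": 3}
# act_click 1 2 3   sends {"click": [1,2,3]}  (refs clicked in sequence)
# Numeric refs only; use cua act '{"click": "text"}' for text clicks.
act_click() {
  local refs
  if [ "$#" -eq 1 ]; then
    cua act "{\"click\": $1}"
  else
    refs=$(IFS=,; printf '%s' "$*")   # join refs with commas in a subshell
    cua act "{\"click\": [$refs]}"
  fi
}

# act_type_into 3 "hello" sends {"type": {"ref": 3, "text": "hello"}}
act_type_into() {
  local ref=$1 text
  text=$(json_escape "$2")
  cua act "{\"type\": {\"ref\": $ref, \"text\": \"$text\"}}"
}
```

For example, `act_type_into 3 'say "hi"'` sends `{"type": {"ref": 3, "text": "say \"hi\""}}`, which would be awkward to quote by hand.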
Agent Workflow Pattern (screen → act loop)
```bash
# 1. Observe
cua screen -c            # get refs
# 2. Act
cua act '{"click": 5}'   # click ref 5
# 3. Observe again
cua screen -c            # see result
```
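One turn of this observe/act/observe loop can be wrapped in a small function. This is a sketch only: `act_then_observe` and `SETTLE_SECS` are hypothetical names, the 1-second default follows the "allow 0.5-1s between actions" advice under Operational Lessons, and it assumes `cua act` exits non-zero when a request fails.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of cua): one turn of the screen → act loop.
# Assumes `cua act` exits non-zero when the request fails.
SETTLE_SECS=${SETTLE_SECS:-1}   # pause for the UI to settle (0.5-1s advised)

act_then_observe() {
  local payload=$1
  cua act "$payload" || return 1   # stop here if the action failed
  sleep "$SETTLE_SECS"             # let animations/transitions finish
  cua screen -c                    # fresh refs for the next decision
}

# Usage: act_then_observe '{"click": 5}'
```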
Flow-First Principle
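Before performing any phone operation, first read `flows.md` (in the same directory as this file).
- If a matching flow exists → run it directly with `/flow` or a batch script, skipping step-by-step reasoning
- If the flow contains a `{"screen":true}` breakpoint → read the screen at that step, let the agent decide, then continue
- If no flow matches → use the screen→act loop, and when finished, save the new flow to `flows.md`
- If a flow fails (timeout, element not found, etc.) → fall back to the screen→act loop to finish the task, then fix `flows.md` afterwards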
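Proactive capture (mandatory): after finishing any multi-step operation, immediately review the sequence of steps you just took. If you spot a reusable pattern (even for only part of the steps), append it to `flows.md` on the spot. Don't wait for the user to remind you. Capturing flows is the agent's responsibility, not the user's.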
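Why this helps:
- Fast: `/flow` runs on the device with 100ms polling, without going through the LLM
- Saves tokens: one flow replaces 5-10 rounds of agent reasoning
- Compounding: every new scenario gets captured, so the agent gets faster the more it is used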
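A minimal bash sketch of the fallback rule: try the saved flow, and if it fails, return the current compact tree so the agent can continue with the screen→act loop. `run_flow_or_fallback` is hypothetical, and it assumes `cua flow` exits non-zero when a step times out or an element is not found; check the actual exit behavior of your `cua` build before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the flow-first fallback (not part of cua).
# Assumes `cua flow` exits non-zero when a step times out or is not found.
# Returns 0 if the flow succeeded, 2 if the agent must take over.
run_flow_or_fallback() {
  local flow_json=$1
  if cua flow "$flow_json"; then
    echo "flow: ok"
    return 0
  fi
  echo "flow failed; falling back to the screen→act loop" >&2
  cua screen -c   # current state for the agent to continue from
  return 2        # caller: finish with screen→act, then fix flows.md
}
```

The distinct return code `2` lets a calling script tell "flow done" apart from "agent must finish the task and then correct `flows.md`".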
Legacy CLI Reference (`cua`)
All legacy endpoints remain supported alongside the new unified API.
Device Management
```bash
cua add <name> <ip> <token>   # register device with alias
cua devices                   # list all (with live status)
cua use <name>                # switch default device
cua rm <name>                 # remove device
cua -d <name> <command>       # target specific device
cua discover                  # scan LAN for devices (192.168.x.x:7333)
```
Perception — read the phone
```bash
cua screen                      # full UI tree (JSON)
cua screen -c                   # compact: only interactive/text elements
cua screenshot                  # save screenshot, print path
cua screenshot 50 720 out.jpg   # quality, maxWidth, output
cua notifications               # list all notifications
cua status                      # health dashboard
cua info                        # device model, screen size, permissions
```
Action — control the phone
```bash
cua tap <x> <y>          # tap coordinates
cua click <text>         # tap element by visible text
cua longpress <x> <y>    # long press
cua swipe up|down|left|right
cua scroll up|down|left|right
cua type "text"          # type text (CJK supported)
cua back                 # system back
cua home                 # go home
cua launch <package>     # launch app
cua launch               # list all apps
cua open <url>           # open URL
cua call <number>        # phone call
cua intent '<json>'      # fire Android Intent
```
Audio
```bash
cua tts "hello"   # speak through phone speaker
cua say "你好"    # alias of tts ("你好" = "hello")
```
Device I/O (v1.7.0+)
```bash
cua clipboard                                    # read clipboard
cua clipboard "text"                             # write to clipboard
cua camera [front|back] [quality] [output.jpg]   # take photo
cua volume                                       # read all volumes
cua volume media 10                              # set media volume
cua volume media up                              # adjust volume
cua battery                                      # battery status
cua wifi                                         # WiFi info
cua location                                     # GPS/network location
cua vibrate [ms]                                 # vibrate (default 200ms)
cua contacts [search]                            # list/search contacts
cua sms list [limit]                             # read SMS
cua sms send <number> <message>                  # send SMS
cua file list [path]                             # list directory
cua file read <path>                             # read file
cua file write <path> <content>                  # write file
cua file delete <path>                           # delete file
```
Device State
```bash
cua wake                    # wake screen
cua lock                    # lock screen
cua unlock                  # unlock (PIN required)
cua config pin 123456       # remember lock screen PIN for auto-unlock
cua config pattern 256398   # EXPERIMENTAL: pattern unlock (not yet verified)
```
Flow Engine — phone-side scripted automation
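```bash
# Tap through MIUI install prompts: 继续安装 ("Continue install"),
# 继续更新 ("Continue update"), then 完成 ("Done", optional)
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 10000},
    {"wait": "继续更新", "then": "tap", "timeout": 10000},
    {"wait": "完成", "then": "tap", "timeout": 60000, "optional": true}
  ]
}'
```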
Flow runs entirely on the phone with zero LLM calls. The device polls its accessibility tree at 100ms intervals and reacts instantly when the target element appears.
Step fields:
— text to find (case-insensitive partial match)wait
— resource ID to findwaitId
— content description to findwaitDesc
— wait for text to DISAPPEARwaitGone
— action:then
,tap
,click
,longpress
,back
,homenone
— per-step timeout in ms (default 10000)timeout
— if true, timeout doesn't fail the flowoptional
— pause after action before next step (default 500)pauseMs
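Flows can also be assembled from the step fields above instead of typed by hand. The helpers below are a hypothetical sketch covering only `wait`, `then`, `timeout`, and `optional`; the `tap` default for `then` is this sketch's choice rather than a documented default, and step text is inserted without JSON escaping, so it must not contain `"` or `\`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of cua) for building /flow JSON.
# Step text is NOT JSON-escaped: avoid " and \ in it.

# flow_step <wait-text> [then=tap] [timeout_ms=10000] [optional=false]
flow_step() {
  local text=$1 action=${2:-tap} timeout=${3:-10000} optional=${4:-false}
  printf '{"wait": "%s", "then": "%s", "timeout": %s, "optional": %s}' \
    "$text" "$action" "$timeout" "$optional"
}

# flow_doc <step>... : wrap steps as {"steps": [step,step,...]}
flow_doc() {
  local IFS=,            # "$*" joins arguments with the first IFS char
  printf '{"steps": [%s]}' "$*"
}

# Usage (MIUI install prompts):
#   cua flow "$(flow_doc \
#     "$(flow_step 继续安装 tap 15000)" \
#     "$(flow_step 继续更新 tap 15000)")"
```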
Click with Retry
```bash
# Atomic find-and-tap: retries until the element appears.
# POST to the device's /click endpoint (host, port, and auth omitted here).
curl -X POST /click -d '{"text":"继续安装","retry":3,"retryMs":2000}'
```
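If you would rather drive retries from the agent host with the legacy `cua click` command, a loop like the following approximates `retry`/`retryMs`. `click_retry` is hypothetical and assumes `cua click` exits non-zero when the text is not found; the server-side `/click` retry remains the atomic option.

```bash
#!/usr/bin/env bash
# Hypothetical host-side retry (not part of cua). Assumes `cua click` exits
# non-zero when the text is not found; verify that on your cua version.
# Usage: click_retry <text> [attempts=3] [delay_secs=2]
click_retry() {
  local text=$1 attempts=${2:-3} delay=${3:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    if cua click "$text"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "click_retry: \"$text\" not found after $attempts attempts" >&2
  return 1
}
```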
Device Onboarding (New Device Setup)
Complete recipe for adding a new Android device from zero to fully operational.
Prerequisites (human must do once)
- Install APK on the device (download from GitHub Releases or LAN HTTP)
- Enable Accessibility Service: Settings → Accessibility → Claw Use → ON
- Note the auth token from the app notification or main screen
Step 1: Discover & Register
```bash
# Scan LAN for devices
cua discover
# Register with a friendly name
cua add <name> <ip> <token>
# Verify connectivity
cua -d <name> ping
cua -d <name> info
```
Step 2: Configure Auto-Unlock
```bash
# PIN unlock (recommended: proven reliable via a11y button tapping)
cua -d <name> config pin <PIN>
# Verify: lock, then unlock
cua -d <name> lock
sleep 3
cua -d <name> unlock   # should show {"unlocked":true}
```
Important: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. If the device uses pattern lock, change it to PIN.
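The lock/unlock check can be scripted so that it fails loudly. `verify_unlock` and `LOCK_WAIT_SECS` are hypothetical names; the sketch matches the compact `{"unlocked":true}` reply shown above and would need adjusting if your `cua` prints the reply with different spacing.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of cua): lock, wait, unlock, and confirm the
# reply contains {"unlocked":true} as documented above.
verify_unlock() {
  local dev=$1 reply
  cua -d "$dev" lock || return 1
  sleep "${LOCK_WAIT_SECS:-3}"
  reply=$(cua -d "$dev" unlock) || return 1
  case $reply in
    *'"unlocked":true'*) echo "$dev: auto-unlock OK" ;;
    *) echo "$dev: unexpected unlock reply: $reply" >&2; return 1 ;;
  esac
}

# Usage: verify_unlock tablet
```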
Step 3: MIUI/HyperOS Permissions (automated)
```bash
cua -d <name> setup-perms
```
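This automates granting all 9 app permissions on MIUI devices: 位置 (Location), 相机 (Camera), 麦克风 (Microphone), 照片和视频 (Photos and videos), 音乐和音频 (Music and audio), 短信 (SMS), 电话 (Phone), 联系人 (Contacts), 日历 (Calendar).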
The command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.
If `setup-perms` fails (common on tablets with a dual-pane layout), grant the permissions manually:
- Open Settings → Apps → Manage Apps → search "Claw Use"
- Tap "App permissions" (应用权限)
- Enable each permission, preferring "始终允许" (Allow all the time) > "仅在使用中允许" (Allow only while in use) > "允许" (Allow)
Step 4: Background Survival (MIUI)
These settings prevent MIUI from killing the service:
```bash
# Navigate to app settings
cua -d <name> intent '{"action":"android.settings.APPLICATION_DETAILS_SETTINGS","uri":"package:com.clawuse.android"}'
```
Then, via a11y actions or by hand, make sure that:
- 自启动 (Autostart): ON
- 省电策略 (Battery saver): 无限制 (No restrictions)
- 通知 (Notifications): 允许 (Allow)
- WLAN联网 (WiFi access): ON (if available)
Step 5: Verify Everything
```bash
cua -d <name> status                              # check a11y health, uptime, request count
cua -d <name> screen -c                           # verify a11y tree works
cua -d <name> screenshot 50 720 /tmp/verify.jpg   # verify screenshot
# Test auto-unlock end-to-end
cua -d <name> lock
sleep 3
cua -d <name> screen -c                           # should auto-unlock, then return the tree
```
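To run the first three checks as a single pass/fail report, a sketch like the following may help. `verify_device` is hypothetical and assumes each `cua` subcommand exits non-zero on failure; the auto-unlock test is left out because it needs the lock/sleep sequence shown above.

```bash
#!/usr/bin/env bash
# Hypothetical Step 5 checklist (not part of cua). Assumes each cua
# subcommand exits non-zero on failure. Prints PASS/FAIL per check and
# returns non-zero if any check failed.
verify_device() {
  local dev=$1 failed=0 check
  local -a checks=(
    "status"
    "screen -c"
    "screenshot 50 720 /tmp/verify.jpg"
  )
  for check in "${checks[@]}"; do
    # $check is deliberately unquoted so it splits into subcommand + args.
    if cua -d "$dev" $check >/dev/null 2>&1; then
      echo "PASS $check"
    else
      echo "FAIL $check"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: verify_device <name>
```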
Known Device-Specific Issues
MIUI Tablets (Xiaomi Pad 5, etc.):
- Settings uses a dual-pane layout; left-panel items are NOT visible in the a11y tree
- Must navigate through the full Settings → Apps path instead of a direct Intent
- The `APPLICATION_DETAILS_SETTINGS` intent opens the app LIST, not the specific app
- `setup-perms` may need the manual fallback on the tablet layout
MIUI Phones (Redmi K60 Ultra, etc.):
- An ICP 备案 (ICP filing) dialog may appear during APK install; click "继续安装" (Continue install)
- Chrome shows a "仍然下载" (Download anyway) confirmation for HTTP downloads
- Chrome downloads don't auto-open APK — go to Downloads → tap the file icon (left side)
General Android:
- Notification Listener requires manual enabling: Settings → 通知 (Notifications) → 设备和应用通知 (Device & app notifications) → Claw Use
- `takeScreenshot()` returns a black image on the lock screen (Android security)
- The lock screen a11y tree requires `flagRetrieveInteractiveWindows` (added in v1.6.2)
Self-Update (OTA via LAN)
Update a device to a new APK version without ADB:
```bash
# Serve the APK on the LAN (from the machine that has the APK)
cd /path/to/apk && python3 -m http.server 9090 &

# On the device, open the browser to download
cua -d <name> intent '{"action":"android.intent.action.VIEW","uri":"http://<lan-ip>:9090/app.apk"}'

# Or navigate via the MIUI browser (浏览器 = "Browser", 搜索或输入网址 = "Search or enter URL"):
cua -d <name> click "浏览器"
cua -d <name> click "搜索或输入网址"
cua -d <name> type "http://<lan-ip>:9090/app.apk"
# ... then handle download + install prompts

# MIUI install flow (after the APK opens in the installer):
# 继续安装 = "Continue install", 已了解此应用未经安全检测 = "I understand this app
# has not passed a security check", 继续更新 = "Continue update"
cua -d <name> flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 15000}
  ]
}'

# Verify the new version after the service restarts (~30s)
sleep 30
cua -d <name> ping
```
UpdateReceiver: the app listens for the `MY_PACKAGE_REPLACED` broadcast and automatically restarts the service after an update. No manual intervention is needed once the install completes.
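Instead of the fixed `sleep 30` in the OTA recipe, you can poll until the restarted service answers. `wait_for_device` is a hypothetical helper that assumes `cua ping` exits non-zero while the device is unreachable; its timeout counts only sleep time, so the real wait can be somewhat longer.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cua): poll until the device answers ping.
# Assumes `cua ping` exits non-zero while the service is down.
# Usage: wait_for_device <name> [timeout_secs=60] [interval_secs=3]
wait_for_device() {
  local dev=$1 timeout=${2:-60} interval=${3:-3} waited=0
  until cua -d "$dev" ping >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "wait_for_device: $dev not reachable after ~${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))   # counts sleep time only
  done
  echo "$dev is back online"
}
```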
Workflow Patterns
Navigate and interact (v2.0+ recommended)
```bash
cua act '{"launch": "org.telegram.messenger"}'
cua screen -c
cua act '{"click": "Search Chats"}'
cua act '{"type": "John"}'
cua act '{"click": "John"}'
```
Navigate and interact (legacy)
```bash
cua launch org.telegram.messenger
cua screen -c
cua click "Search Chats"
cua type "John"
cua click "John"
```
Visual + semantic perception
```bash
cua screen -c                       # what elements exist (structured, with refs)
cua snapshot 50 720 /tmp/look.jpg   # what it looks like (visual)
```
Prefer `screen -c` over `snapshot` for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for `/act`. Use `snapshot` only when visual context matters (images, colors, layout).
Handle locked device
Automatic — any command auto-unlocks if PIN is configured. No special handling needed.
MIUI APK Install (via /flow)
```bash
# Continue install → acknowledge the unverified-app warning (optional) → Continue update
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 10000}
  ]
}'
```
Multi-device
```bash
cua add phone1 192.168.0.101 <token>
cua add tablet 192.168.0.102 <token>
cua -d phone1 say "hello from phone 1"
cua -d tablet screenshot
```
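When the same command should go to several devices, a small loop over aliases keeps the output attributable. `on_devices` is a hypothetical bash helper; it runs devices one after another rather than in parallel, and it reports failure if `cua` failed on any of them.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cua): run one cua command against several
# registered aliases in turn, prefixing each output line with the alias.
# Returns non-zero if the command failed on any device.
# Usage: on_devices "phone1 tablet" say "hello"
on_devices() {
  local devs=$1 dev rc=0
  shift
  for dev in $devs; do                     # unquoted on purpose: split aliases
    cua -d "$dev" "$@" 2>&1 | sed "s/^/[$dev] /"
    [ "${PIPESTATUS[0]}" -eq 0 ] || rc=1   # exit status of cua, not sed
  done
  return "$rc"
}
```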
Operational Lessons
DO
- Use `click` by text instead of `tap` by coordinates whenever the text is visible
- Use `screen -c` as the primary perception tool; compact mode filters out noise
- Use `/flow` for multi-step mechanical sequences; it saves tokens and is 100x faster than an LLM call per step
- Use `intent` deep links for app navigation (e.g., `https://t.me/c/{id}/{topic}/{msg}`)
- Use PIN unlock; it has proven 100% reliable via a11y button tapping
DON'T
- Don't use screenshot coordinates for tapping: `screenshot?maxWidth=720` is scaled, while `screen` bounds are actual pixels
- Don't try pattern unlock: coordinates vary by device/OS, and there is no reliable way to locate the grid
- Don't rely on `tap` when `click` can work; text-based clicking is resolution-independent
- Don't navigate app UIs manually when deep links exist; it is error-prone and slow
- Don't rapid-fire requests; allow 0.5-1s between actions for the UI to settle
Architecture
```text
┌───────────────────────────────────────────────┐
│ Android Device                                │
│                                               │
│ :http process         main process            │
│ ┌──────────────┐      ┌─────────────────────┐ │
│ │ BridgeService│ HTTP │ AccessibilityBridge │ │
│ │ NanoHTTPD    │─────→│ A11yInternalServer  │ │
│ │ 0.0.0.0:7333 │proxy │ 127.0.0.1:7334      │ │
│ └──────────────┘      └─────────────────────┘ │
│  ↑ auth+CORS           ↑ a11y service         │
│  ↑ auto-unlock         ↑ gesture dispatch     │
│  ↑ config/status       ↑ tree traversal       │
└───────────────────────────────────────────────┘
                ↑ HTTP
         ┌────────────┐
         │ Agent/CLI  │  cua commands / curl
         └────────────┘
```
Family
| Platform | Package | CLI | Status |
|---|---|---|---|
| Android | claw-use-android | `cua` | ✅ Available |
| iOS | claw-use-ios | | 🔮 Planned |
| Windows | claw-use-windows | | 🔮 Planned |
| Linux | claw-use-linux | | 🔮 Planned |
| macOS | claw-use-mac | | 🔮 Planned |