Claude-skill-registry cua-cloud
Comprehensive guide for building Computer Use Agents with the CUA framework. This skill should be used when automating desktop applications, building vision-based agents, controlling virtual machines (Linux/Windows/macOS), or integrating computer-use models from Anthropic, OpenAI, or other providers. Covers Computer SDK (click, type, scroll, screenshot), Agent SDK (model configuration, composition), supported models, provider setup, and MCP integration.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cua-cloud" ~/.claude/skills/majiayu000-claude-skill-registry-cua-cloud && rm -rf "$T"
skills/data/cua-cloud/SKILL.md

CUA Framework
Overview
CUA ("koo-ah") is an open-source framework for building Computer Use Agents—AI systems that see, understand, and interact with desktop applications through vision and action. It supports Windows, Linux, and macOS automation.
Key capabilities:
- Vision-based UI automation via screenshot analysis
- Multi-platform desktop control (click, type, scroll, drag)
- 100+ LLM providers via LiteLLM integration
- Composed agents (grounding + planning models)
- Local and cloud execution options
Installation
# Computer SDK - desktop control
pip install cua-computer

# Agent SDK - autonomous agents
pip install cua-agent[all]

# MCP Server (optional)
pip install cua-mcp-server
CLI Installation:
# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
Computer SDK
Computer Class
from computer import Computer
import os

os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

computer = Computer(
    os_type="linux",        # "linux" | "macos" | "windows"
    provider_type="cloud",  # "cloud" | "docker" | "lume" | "windows_sandbox"
    name="sandbox-name"
)

try:
    await computer.run()
    # Use computer.interface methods here
finally:
    await computer.close()
Interface Methods
Screenshot:
screenshot = await computer.interface.screenshot()
Mouse Actions:
await computer.interface.left_click(x, y)      # Left click at coordinates
await computer.interface.right_click(x, y)     # Right click
await computer.interface.double_click(x, y)    # Double click
await computer.interface.move_cursor(x, y)     # Move cursor without clicking
await computer.interface.drag(x1, y1, x2, y2)  # Click and drag
Keyboard Actions:
await computer.interface.type_text("Hello!")  # Type text
await computer.interface.key_press("enter")   # Press single key
await computer.interface.hotkey("ctrl", "c")  # Key combination
Scrolling:
await computer.interface.scroll(direction, amount) # Scroll up/down/left/right
File Operations:
content = await computer.interface.read_file("/path/to/file")
await computer.interface.write_file("/path/to/file", "content")
Clipboard:
text = await computer.interface.get_clipboard()
await computer.interface.set_clipboard("text to copy")
Supported Actions (Message Format)
OpenAI-style:
- ClickAction: button (left/right/wheel/back/forward), x, y coordinates
- DoubleClickAction: same parameters as click
- DragAction: start and end coordinates
- KeyPressAction: key name
- MoveAction: x, y coordinates
- ScreenshotAction: no parameters
- ScrollAction: direction and amount
- TypeAction: text string
- WaitAction: duration
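As a rough illustration of the shape these actions take, the helpers below build action payloads as plain dicts. The exact field names and `type` tags are assumptions inferred from the parameter list above, not a verified wire format:

```python
# Hypothetical sketch of action payloads; field names are assumptions
# based on the parameter list above.
def click_action(x: int, y: int, button: str = "left") -> dict:
    """Build a click action dict with a validated button name."""
    assert button in ("left", "right", "wheel", "back", "forward")
    return {"type": "click", "button": button, "x": x, "y": y}


def scroll_action(direction: str, amount: int) -> dict:
    """Build a scroll action dict."""
    assert direction in ("up", "down", "left", "right")
    return {"type": "scroll", "direction": direction, "amount": amount}
```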
Anthropic-style:
- LeftMouseDownAction: x, y coordinates
- LeftMouseUpAction: x, y coordinates
Agent SDK
ComputerAgent Class
from agent import ComputerAgent

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    max_trajectory_budget=5.0  # Cost limit in USD
)

messages = [{"role": "user", "content": "Open Firefox and go to google.com"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
Response Structure
{
    "output": [AgentMessage, ...],  # List of messages
    "usage": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
        "response_cost": float
    }
}
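A small helper can pull the assistant text and per-call cost out of this structure. This is a sketch against the schema shown above (and the message loop in the ComputerAgent example); `summarize` is an illustrative name, not part of the SDK:

```python
# Sketch: extract assistant text and cost from a result dict shaped
# like the response structure documented above.
def summarize(result: dict) -> tuple:
    """Return ([assistant text blocks], response_cost) for one result."""
    texts = [
        block["text"]
        for item in result["output"]
        if item.get("type") == "message"
        for block in item.get("content", [])
        if "text" in block
    ]
    return texts, result["usage"]["response_cost"]
```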
Message Types:
- UserMessage: input from user/system
- AssistantMessage: text output from agent
- ReasoningMessage: agent thinking/summary
- ComputerCallMessage: intent to perform action
- ComputerCallOutputMessage: screenshot result
- FunctionCallMessage: Python tool invocation
- FunctionCallOutputMessage: function result
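When handling a mixed `output` list, it can help to bucket items by their `type` tag before processing. The ComputerAgent example above only shows the `"message"` tag; the generic helper below makes no assumptions about the other tag strings:

```python
# Group output items by their "type" tag so each message kind can be
# handled separately (e.g. print messages, log computer calls).
def partition_output(output: list) -> dict:
    buckets: dict = {}
    for item in output:
        buckets.setdefault(item.get("type", "unknown"), []).append(item)
    return buckets
```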
Supported Models
CUA VLM Router (Recommended)
model="cua/anthropic/claude-sonnet-4.5"  # Recommended
model="cua/anthropic/claude-haiku-4.5"   # Faster, cheaper
Single API key, cost tracking, managed infrastructure.
Anthropic (BYOK)
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

model="anthropic/claude-sonnet-4-5-20250929"
model="anthropic/claude-haiku-4-5-20251001"
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-3-7-sonnet-20250219"
OpenAI (BYOK)
os.environ["OPENAI_API_KEY"] = "sk-..."

model="openai/computer-use-preview"
Google Gemini
model="gemini-2.5-computer-use-preview-10-2025"
Local Models
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"
Composed Agents
Combine grounding models with planning models:
model="huggingface-local/GTA1-7B+openai/gpt-4o"
model="moondream3+openai/gpt-4o"
model="omniparser+anthropic/claude-sonnet-4-5-20250929"
model="omniparser+ollama_chat/mistral-small3.2"
Grounding Models: UI-TARS, GTA, Holo, Moondream, OmniParser, OpenCUA
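The composed-agent syntax joins the two model names with a `+`, grounding model first. A small parser makes the convention explicit (the helper name is illustrative, not part of the SDK):

```python
# Split a composed model string of the form "grounding+planning".
# Plain model strings have no grounding part.
def split_composed(model: str):
    """Return (grounding_model or None, planning_model)."""
    if "+" in model:
        grounding, planning = model.split("+", 1)
        return grounding, planning
    return None, model
```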
Human-in-the-Loop
model="human/human" # Pause for user approval
Provider Types
Cloud (Recommended)
computer = Computer(
    os_type="linux",  # linux, windows, macos
    provider_type="cloud",
    name="sandbox-name",
    api_key="sk_cua-api01_..."
)
Get API key from cloud.trycua.com.
Docker (Local)
computer = Computer(
    os_type="linux",
    provider_type="docker"
)
Images:
trycua/cua-xfce:latest, trycua/cua-ubuntu:latest
Lume (macOS Local)
computer = Computer(
    os_type="macos",
    provider_type="lume"
)
Requires Lume CLI installation.
Windows Sandbox
computer = Computer(
    os_type="windows",
    provider_type="windows_sandbox"
)
Requires pywinsandbox and the Windows Sandbox feature enabled.
MCP Integration
This project uses the CUA MCP Server for Claude Code integration:
{
  "mcpServers": {
    "cua": {
      "type": "http",
      "url": "https://cua-mcp-server.vercel.app/mcp"
    }
  }
}
MCP Tools Available
Sandbox Management:
- mcp__cua__list_sandboxes: list all sandboxes
- mcp__cua__create_sandbox: create a VM (os, size, region)
- mcp__cua__start/stop/restart/delete_sandbox: sandbox lifecycle control
Task Execution:
- mcp__cua__run_task: autonomous task execution
- mcp__cua__describe_screen: vision analysis without action
- mcp__cua__get_task_history: retrieve task results
Best Practices
Task Design
# Good - specific and sequential
"Open Chrome, navigate to github.com, click the Sign In button"

# Avoid - vague
"Log into GitHub"
Error Recovery
async for result in agent.run(messages):
    if result.get("error"):
        # Take screenshot to understand state
        screenshot = await computer.interface.screenshot()
        # Retry with more specific instructions
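One way to structure that retry is a bounded loop that collects errors from a run, appends a more specific follow-up message, and tries again. This is a sketch of the control flow only; `run_with_retries` and the follow-up wording are illustrative, and a real run needs the Agent SDK and a sandbox in place of the `run` callable:

```python
import asyncio

# Bounded retry around an agent-run-style async generator. `run` is any
# callable that, given messages, yields result dicts (as agent.run does).
async def run_with_retries(run, messages, max_attempts=3):
    """Return the attempt number that succeeded, or raise after the cap."""
    for attempt in range(1, max_attempts + 1):
        last_error = None
        async for result in run(messages):
            if result.get("error"):
                last_error = result["error"]
        if last_error is None:
            return attempt
        # Tighten the instructions before the next attempt.
        messages = messages + [{
            "role": "user",
            "content": f"Previous attempt failed: {last_error}. "
                       "Retry the task step by step.",
        }]
    raise RuntimeError(f"task failed after {max_attempts} attempts")
```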
Resource Management
try:
    await computer.run()
    # ... perform tasks
finally:
    await computer.close()  # Always cleanup
Cost Control
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    max_trajectory_budget=5.0  # Stop at $5 spent
)
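The same cap can be mirrored client-side by accumulating `usage["response_cost"]` from each result and stopping once the limit is reached. This tracker is a sketch of that bookkeeping, not an SDK class:

```python
# Client-side budget bookkeeping mirroring max_trajectory_budget:
# sum usage["response_cost"] per result, report when the cap is hit.
class BudgetTracker:
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def add(self, result: dict) -> bool:
        """Record one result's cost; return True while still under budget."""
        self.spent += result.get("usage", {}).get("response_cost", 0.0)
        return self.spent < self.budget
```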