Claude-skill-registry cua-cloud

Comprehensive guide for building Computer Use Agents with the CUA framework. This skill should be used when automating desktop applications, building vision-based agents, controlling virtual machines (Linux/Windows/macOS), or integrating computer-use models from Anthropic, OpenAI, or other providers. Covers Computer SDK (click, type, scroll, screenshot), Agent SDK (model configuration, composition), supported models, provider setup, and MCP integration.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cua-cloud" ~/.claude/skills/majiayu000-claude-skill-registry-cua-cloud && rm -rf "$T"

manifest: skills/data/cua-cloud/SKILL.md

CUA Framework

Overview

CUA ("koo-ah") is an open-source framework for building Computer Use Agents: AI systems that see, understand, and interact with desktop applications through vision and action. It supports Windows, Linux, and macOS automation.

Key capabilities:

Vision-based UI automation via screenshot analysis
Multi-platform desktop control (click, type, scroll, drag)
100+ LLM providers via LiteLLM integration
Composed agents (grounding + planning models)
Local and cloud execution options

Installation

# Computer SDK - desktop control
pip install cua-computer

# Agent SDK - autonomous agents
pip install "cua-agent[all]"   # quotes keep shells like zsh from expanding the brackets

# MCP Server (optional)
pip install cua-mcp-server

CLI Installation:

# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"

Computer SDK

Computer Class

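import asyncio
import os

from computer import Computer

# Placeholder key: prefer exporting CUA_API_KEY in your shell over hardcoding it
os.environ["CUA_API_KEY"] = "sk_cua-api01_..."


async def main():
    computer = Computer(
        os_type="linux",        # "linux" | "macos" | "windows"
        provider_type="cloud",  # "cloud" | "docker" | "lume" | "windows_sandbox"
        name="sandbox-name",
    )
    try:
        await computer.run()
        # Use computer.interface methods here
    finally:
        await computer.close()  # Always clean up, even if an error occurs


asyncio.run(main())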

Interface Methods

Screenshot:

screenshot = await computer.interface.screenshot()

Mouse Actions:

await computer.interface.left_click(x, y)      # Left click at coordinates
await computer.interface.right_click(x, y)     # Right click
await computer.interface.double_click(x, y)    # Double click
await computer.interface.move_cursor(x, y)     # Move cursor without clicking
await computer.interface.drag(x1, y1, x2, y2)  # Click and drag

Keyboard Actions:

await computer.interface.type_text("Hello!")   # Type text
await computer.interface.key_press("enter")    # Press single key
await computer.interface.hotkey("ctrl", "c")   # Key combination

Scrolling:

await computer.interface.scroll(direction, amount)  # Scroll up/down/left/right

File Operations:

content = await computer.interface.read_file("/path/to/file")
await computer.interface.write_file("/path/to/file", "content")

Clipboard:

text = await computer.interface.get_clipboard()
await computer.interface.set_clipboard("text to copy")
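The interface methods compose into higher-level helpers. The sketch below assumes a hypothetical helper `fill_text_field` (not part of the CUA SDK) that clicks a field, selects its contents with a select-all hotkey, and types a replacement, using only the `left_click`, `hotkey`, `type_text`, and `key_press` methods listed above. A small recording fake stands in for `computer.interface` so the call order can be checked without a sandbox; the default `("ctrl", "a")` combination is an assumption that fits Linux and Windows desktops.

```python
import asyncio


async def fill_text_field(interface, x, y, text, select_all_keys=("ctrl", "a"), submit=False):
    """Hypothetical helper: replace the contents of the text field at (x, y).

    `interface` is anything exposing the async methods used below,
    e.g. `computer.interface` from the Computer SDK.
    """
    await interface.left_click(x, y)          # focus the field
    await interface.hotkey(*select_all_keys)  # select any existing text
    await interface.type_text(text)           # typing replaces the selection
    if submit:
        await interface.key_press("enter")    # optionally submit the form


class RecordingInterface:
    """Test double that records calls instead of driving a real desktop."""

    def __init__(self):
        self.calls = []

    async def left_click(self, x, y):
        self.calls.append(("left_click", x, y))

    async def hotkey(self, *keys):
        self.calls.append(("hotkey", *keys))

    async def type_text(self, text):
        self.calls.append(("type_text", text))

    async def key_press(self, key):
        self.calls.append(("key_press", key))


fake = RecordingInterface()
asyncio.run(fill_text_field(fake, 640, 360, "cua framework", submit=True))
print(fake.calls)
```

Against a real sandbox, pass `computer.interface` in place of the fake.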

Supported Actions (Message Format)

OpenAI-style:

- `ClickAction`: button (left/right/wheel/back/forward) plus x, y coordinates
- `DoubleClickAction`: same parameters as click
- `DragAction`: start and end coordinates
- `KeyPressAction`: key name
- `MoveAction`: x, y coordinates
- `ScreenshotAction`: no parameters
- `ScrollAction`: direction and amount
- `TypeAction`: text string
- `WaitAction`: duration

Anthropic-style:

- `LeftMouseDownAction`: x, y coordinates
- `LeftMouseUpAction`: x, y coordinates
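To see how these actions relate to the Computer SDK, here is a minimal dispatcher sketch. It assumes a simplified, hypothetical dict format (`{"type": "click", "button": "left", "x": ..., "y": ...}` and so on) that mirrors the OpenAI-style list; the key names are illustrative and are not the exact wire format any provider emits. Each action maps onto one of the interface methods shown earlier, and buttons without a matching method are rejected rather than guessed.

```python
import asyncio


async def execute_action(interface, action):
    """Hypothetical dispatcher from an action dict to Computer SDK interface calls.

    The dict keys ("type", "x", "button", ...) are illustrative assumptions.
    """
    kind = action["type"]
    if kind == "click":
        button = action.get("button", "left")
        if button == "left":
            return await interface.left_click(action["x"], action["y"])
        if button == "right":
            return await interface.right_click(action["x"], action["y"])
        # wheel/back/forward have no matching interface method in the list above
        raise NotImplementedError(f"button {button!r} not mapped")
    if kind == "double_click":
        return await interface.double_click(action["x"], action["y"])
    if kind == "drag":
        return await interface.drag(action["start_x"], action["start_y"],
                                    action["end_x"], action["end_y"])
    if kind == "keypress":
        return await interface.key_press(action["key"])
    if kind == "move":
        return await interface.move_cursor(action["x"], action["y"])
    if kind == "screenshot":
        return await interface.screenshot()
    if kind == "scroll":
        return await interface.scroll(action["direction"], action["amount"])
    if kind == "type":
        return await interface.type_text(action["text"])
    if kind == "wait":
        return await asyncio.sleep(action["duration"])
    raise ValueError(f"unknown action type: {kind!r}")


class RecordingInterface:
    """Test double: every method call is recorded as (name, *args)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        async def method(*args):
            self.calls.append((name, *args))
            return b"fake-png" if name == "screenshot" else None
        return method


fake = RecordingInterface()
actions = [
    {"type": "click", "button": "left", "x": 100, "y": 200},
    {"type": "type", "text": "hello"},
    {"type": "keypress", "key": "enter"},
    {"type": "wait", "duration": 0},
    {"type": "screenshot"},
]


async def run_all():
    return [await execute_action(fake, a) for a in actions]


results = asyncio.run(run_all())
print(fake.calls)
```

A table-driven mapping would be shorter, but explicit branches make it obvious which action fields each interface method consumes.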

Agent SDK

ComputerAgent Class

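import asyncio

from agent import ComputerAgent
from computer import Computer


async def main():
    computer = Computer(os_type="linux", provider_type="cloud", name="sandbox-name")
    try:
        await computer.run()
        agent = ComputerAgent(
            model="anthropic/claude-sonnet-4-5-20250929",
            tools=[computer],           # The agent acts through this computer
            max_trajectory_budget=5.0,  # Cost limit in USD
        )
        messages = [{"role": "user", "content": "Open Firefox and go to google.com"}]
        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])
    finally:
        await computer.close()


asyncio.run(main())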

Response Structure

{
    "output": [AgentMessage, ...],  # List of messages
    "usage": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
        "response_cost": float
    }
}
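Because each step yielded by `agent.run()` carries its own `usage` dict, totals for a whole run have to be summed by the caller. A minimal sketch, assuming results shaped like the structure above (the sample numbers are made up for illustration, and steps without a `usage` entry are skipped):

```python
def summarize_usage(results):
    """Sum the per-step "usage" dicts from results yielded by agent.run()."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0,
              "total_tokens": 0, "response_cost": 0.0}
    for result in results:
        usage = result.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Illustrative data shaped like two agent.run() results
steps = [
    {"output": [], "usage": {"prompt_tokens": 1200, "completion_tokens": 80,
                             "total_tokens": 1280, "response_cost": 0.0048}},
    {"output": [], "usage": {"prompt_tokens": 1500, "completion_tokens": 60,
                             "total_tokens": 1560, "response_cost": 0.0054}},
]
print(summarize_usage(steps))
```

In a real run, append each `result` to a list inside the `async for` loop and summarize it afterwards.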

Message Types:

- `UserMessage`: input from user/system
- `AssistantMessage`: text output from agent
- `ReasoningMessage`: agent thinking/summary
- `ComputerCallMessage`: intent to perform an action
- `ComputerCallOutputMessage`: screenshot result
- `FunctionCallMessage`: Python tool invocation
- `FunctionCallOutputMessage`: function result
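When processing `result["output"]`, it is often convenient to group items by their `type` field and pull out only the assistant text. In the sketch below only the `"message"` type string is confirmed by the agent example above; the other type strings in the sample (`"reasoning"`, `"computer_call"`, `"computer_call_output"`) are assumptions modeled on the message type names, so check them against real output before relying on them.

```python
from collections import defaultdict


def split_output(items):
    """Group agent output items by their "type" field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


def assistant_text(items):
    """Collect the text parts of assistant messages, in order."""
    texts = []
    for item in items:
        if item.get("type") == "message" and item.get("role", "assistant") == "assistant":
            for part in item.get("content", []):
                if "text" in part:
                    texts.append(part["text"])
    return texts


# Illustrative output list; non-"message" type strings are assumptions
output = [
    {"type": "reasoning", "summary": [{"text": "Need to open Firefox"}]},
    {"type": "computer_call", "action": {"type": "screenshot"}},
    {"type": "computer_call_output"},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Firefox is open."}]},
]
groups = split_output(output)
print(sorted(groups))          # ['computer_call', 'computer_call_output', 'message', 'reasoning']
print(assistant_text(output))  # ['Firefox is open.']
```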

Supported Models

CUA VLM Router (Recommended)

model="cua/anthropic/claude-sonnet-4.5"  # Recommended
model="cua/anthropic/claude-haiku-4.5"   # Faster, cheaper

Benefits: a single API key, built-in cost tracking, and managed infrastructure.

Anthropic (BYOK)

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

model="anthropic/claude-sonnet-4-5-20250929"
model="anthropic/claude-haiku-4-5-20251001"
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-3-7-sonnet-20250219"

OpenAI (BYOK)

os.environ["OPENAI_API_KEY"] = "sk-..."

model="openai/computer-use-preview"

Google Gemini

model="gemini-2.5-computer-use-preview-10-2025"

Local Models

model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"

Composed Agents

Combine grounding models with planning models:

model="huggingface-local/GTA1-7B+openai/gpt-4o"
model="moondream3+openai/gpt-4o"
model="omniparser+anthropic/claude-sonnet-4-5-20250929"
model="omniparser+ollama_chat/mistral-small3.2"

Grounding Models: UI-TARS, GTA, Holo, Moondream, OmniParser, OpenCUA
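In the composed examples, the part before `+` is the grounding model and the part after it is the planning model. A hypothetical helper (not part of the Agent SDK) that splits such a string is sketched below; it assumes exactly one `+` per composed model and treats a string without `+` as a single end-to-end model.

```python
def split_composed_model(model):
    """Hypothetical helper: split "grounding+planning" into its two parts.

    Returns (None, model) for a plain, non-composed model string.
    """
    grounding, sep, planning = model.partition("+")
    if not sep:
        return None, model
    if not grounding or not planning or "+" in planning:
        raise ValueError(f"expected 'grounding+planning', got {model!r}")
    return grounding, planning


print(split_composed_model("omniparser+anthropic/claude-sonnet-4-5-20250929"))
# ('omniparser', 'anthropic/claude-sonnet-4-5-20250929')
```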

Human-in-the-Loop

model="human/human"  # Pause for user approval

Provider Types

Cloud (Recommended)

computer = Computer(
    os_type="linux",  # linux, windows, macos
    provider_type="cloud",
    name="sandbox-name",
    api_key="sk_cua-api01_..."
)

Get an API key from cloud.trycua.com.

Docker (Local)

computer = Computer(
    os_type="linux",
    provider_type="docker"
)

Images:

- `trycua/cua-xfce:latest`
- `trycua/cua-ubuntu:latest`

Lume (macOS Local)

computer = Computer(
    os_type="macos",
    provider_type="lume"
)

Requires the Lume CLI to be installed.

Windows Sandbox

computer = Computer(
    os_type="windows",
    provider_type="windows_sandbox"
)

Requires `pywinsandbox` and the Windows Sandbox feature to be enabled.
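A common pattern is to use the cloud provider when an API key is available and fall back to a local Docker container otherwise. The sketch below assumes a hypothetical helper `computer_kwargs` (not part of the SDK) that returns keyword arguments matching the Cloud and Docker examples above; the sandbox name is a placeholder.

```python
import os


def computer_kwargs(env=None, sandbox_name="sandbox-name"):
    """Hypothetical helper: pick Computer(...) arguments from the environment.

    Uses the cloud provider when CUA_API_KEY is set, otherwise a local
    Docker container.
    """
    env = os.environ if env is None else env
    api_key = env.get("CUA_API_KEY")
    if api_key:
        return {"os_type": "linux", "provider_type": "cloud",
                "name": sandbox_name, "api_key": api_key}
    return {"os_type": "linux", "provider_type": "docker"}


print(computer_kwargs({}))  # {'os_type': 'linux', 'provider_type': 'docker'}
```

Usage would then be `computer = Computer(**computer_kwargs())`.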

MCP Integration

This project uses the CUA MCP Server for Claude Code integration:

{
  "mcpServers": {
    "cua": {
      "type": "http",
      "url": "https://cua-mcp-server.vercel.app/mcp"
    }
  }
}
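If an MCP configuration file already lists other servers, the `cua` entry should be merged in rather than overwriting the file. A minimal sketch using only the standard `json` module; which config file you read and write is up to you and is not assumed here.

```python
import json

# The server entry from the configuration above
CUA_SERVER = {"type": "http", "url": "https://cua-mcp-server.vercel.app/mcp"}


def add_cua_server(config_text):
    """Return config JSON with the "cua" MCP server added, keeping other servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["cua"] = dict(CUA_SERVER)
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"type": "stdio", "command": "other-server"}}}'
print(add_cua_server(existing))
```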

MCP Tools Available

Sandbox Management:

- `mcp__cua__list_sandboxes`: list all sandboxes
- `mcp__cua__create_sandbox`: create a VM (os, size, region)
- `mcp__cua__start/stop/restart/delete_sandbox`: start, stop, restart, or delete a sandbox

Task Execution:

- `mcp__cua__run_task`: autonomous task execution
- `mcp__cua__describe_screen`: vision analysis without taking action
- `mcp__cua__get_task_history`: retrieve task results

Best Practices

Task Design

# Good - specific and sequential
"Open Chrome, navigate to github.com, click the Sign In button"

# Avoid - vague
"Log into GitHub"

Error Recovery

async for result in agent.run(messages):
    if result.get("error"):
        # Take screenshot to understand state
        screenshot = await computer.interface.screenshot()
        # Retry with more specific instructions
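The "retry with more specific instructions" step can be sketched as a loop over prompts ordered from general to specific. This assumes a hypothetical helper `run_with_fallback` (not part of the Agent SDK) and relies only on what the snippet above uses: `agent.run(messages)` is an async iterator of result dicts, and a failed step carries an `"error"` key. A fake agent stands in for `ComputerAgent` so the control flow can be checked offline; in practice you might also take a screenshot between attempts.

```python
import asyncio


async def run_with_fallback(agent, prompts):
    """Hypothetical helper: try each prompt until a run finishes without an error.

    Returns (prompt, results) for the first clean run.
    """
    last_error = None
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        results = []
        async for result in agent.run(messages):
            results.append(result)
            if result.get("error"):
                last_error = result["error"]
                break  # abandon this attempt and try the next, more specific prompt
        else:
            return prompt, results  # the loop finished without hitting an error
    raise RuntimeError(f"all prompts failed; last error: {last_error}")


class FakeAgent:
    """Test double: fails on any prompt that does not mention "Sign In"."""

    async def run(self, messages):
        if "Sign In" not in messages[-1]["content"]:
            yield {"error": "could not find login form", "output": []}
            return
        yield {"output": [{"type": "message", "role": "assistant",
                           "content": [{"text": "Clicked Sign In"}]}]}


prompt, results = asyncio.run(run_with_fallback(FakeAgent(), [
    "Log into GitHub",
    "Open Chrome, navigate to github.com, click the Sign In button",
]))
print(prompt)
```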

Resource Management

try:
    await computer.run()
    # ... perform tasks
finally:
    await computer.close()  # Always cleanup

Cost Control

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=5.0  # Stop at $5 spent
)
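`max_trajectory_budget` enforces the limit inside the agent. If you also want a caller-side cap (for example a stricter limit per task, or logging), you can wrap the result stream yourself. A minimal sketch, assuming a hypothetical wrapper `with_budget` and that each result carries `usage["response_cost"]` in USD as in the response structure above; the step that crosses the cap is still yielded, then iteration stops.

```python
import asyncio


async def with_budget(results, max_cost):
    """Hypothetical guard: yield results until cumulative response_cost exceeds max_cost."""
    spent = 0.0
    async for result in results:
        spent += (result.get("usage") or {}).get("response_cost", 0.0)
        yield result
        if spent > max_cost:
            break  # stop consuming further steps once the cap is passed


async def fake_run():
    """Stand-in for agent.run(messages): ten steps costing $1.00 each."""
    for _ in range(10):
        yield {"output": [], "usage": {"response_cost": 1.0}}


async def collect():
    return [r async for r in with_budget(fake_run(), max_cost=2.5)]


steps = asyncio.run(collect())
print(len(steps))  # 3
```

With a real agent, iterate `async for result in with_budget(agent.run(messages), max_cost=2.0):` instead.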