Skills claude-computer-use
install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/claude-computer-use" ~/.claude/skills/terminalskills-skills-claude-computer-use && rm -rf "$T"
manifest:
skills/claude-computer-use/SKILL.mdsafety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- pip install
- shell exec via library
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
Claude Computer Use
Overview
Claude's computer use capability lets the model see your screen (via screenshots) and control mouse and keyboard. Unlike Playwright or Selenium, it requires no selectors — Claude navigates visually, the same way a human would. It's ideal for legacy software, complex multi-app workflows, and tasks that are hard to automate programmatically.
⚠️ Always run in a sandboxed environment (Docker, VM, or dedicated machine). Never run on a machine with access to sensitive accounts or production systems.
Setup
Install Dependencies
pip install anthropic pillow pyautogui # Linux: also install scrot or gnome-screenshot for screenshots apt-get install -y scrot
Environment
Use Docker for safety:
FROM ubuntu:22.04 RUN apt-get update && apt-get install -y \ python3 python3-pip \ xvfb x11vnc \ scrot \ chromium-browser \ && pip install anthropic pillow pyautogui ENV DISPLAY=:1 CMD ["Xvfb", ":1", "-screen", "0", "1280x800x24"]
docker build -t computer-use-sandbox . docker run -d --name sandbox \ -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \ -p 5900:5900 \ # VNC to monitor what's happening computer-use-sandbox
Core Implementation
Tool Definitions
The computer use beta provides three tools:
TOOLS = [ { "type": "computer_20241022", # Screen + mouse + keyboard "name": "computer", "display_width_px": 1280, "display_height_px": 800, "display_number": 1 }, { "type": "bash_20241022", # Run shell commands "name": "bash" }, { "type": "text_editor_20241022", # View and edit files "name": "str_replace_editor" } ]
Take a Screenshot
import subprocess import base64 from pathlib import Path def take_screenshot() -> str: """Take a screenshot and return as base64 PNG.""" path = "/tmp/screenshot.png" subprocess.run(["scrot", path], check=True) return base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8") def get_screenshot_block() -> dict: """Return an image content block for the API.""" return { "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": take_screenshot() } }
Execute Computer Actions
import pyautogui import time def execute_computer_action(action: dict) -> str: """Execute a computer action from Claude's tool use response.""" action_type = action["action"] if action_type == "screenshot": return take_screenshot() elif action_type == "mouse_move": pyautogui.moveTo(action["coordinate"][0], action["coordinate"][1]) return "Mouse moved" elif action_type == "left_click": pyautogui.click(action["coordinate"][0], action["coordinate"][1]) time.sleep(0.5) return "Clicked" elif action_type == "double_click": pyautogui.doubleClick(action["coordinate"][0], action["coordinate"][1]) time.sleep(0.5) return "Double-clicked" elif action_type == "right_click": pyautogui.rightClick(action["coordinate"][0], action["coordinate"][1]) return "Right-clicked" elif action_type == "type": pyautogui.write(action["text"], interval=0.02) return "Typed text" elif action_type == "key": pyautogui.hotkey(*action["key"].replace("+", " ").split()) return f"Pressed {action['key']}" elif action_type == "scroll": direction = action.get("direction", "down") clicks = action.get("clicks", 3) pyautogui.scroll(-clicks if direction == "down" else clicks) return f"Scrolled {direction}" else: return f"Unknown action: {action_type}" def execute_bash(command: str) -> str: """Execute a bash command and return output.""" result = subprocess.run( command, shell=True, capture_output=True, text=True, timeout=30 ) return result.stdout + result.stderr
Full Action Loop
import anthropic client = anthropic.Anthropic() def run_computer_use_agent(task: str, max_steps: int = 20) -> str: """ Run Claude computer use to complete a task. Args: task: Natural language description of what to do max_steps: Safety limit on number of actions Returns: Claude's final response describing what was done """ # Start with screenshot so Claude sees current state messages = [ { "role": "user", "content": [ {"type": "text", "text": task}, get_screenshot_block() ] } ] for step in range(max_steps): print(f"\n[Step {step + 1}/{max_steps}]") response = client.beta.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=4096, tools=TOOLS, messages=messages, betas=["computer-use-2024-10-22"] ) # Add Claude's response to history messages.append({"role": "assistant", "content": response.content}) # Check if Claude is done if response.stop_reason == "end_turn": final_text = next( (b.text for b in response.content if hasattr(b, "text")), "" ) print(f"\n✅ Task complete: {final_text}") return final_text # Process tool uses if response.stop_reason == "tool_use": tool_results = [] for block in response.content: if block.type != "tool_use": continue print(f" → Tool: {block.name}, Action: {block.input.get('action', block.name)}") if block.name == "computer": result = execute_computer_action(block.input) # Always include a fresh screenshot after actions tool_results.append({ "type": "tool_result", "tool_use_id": block.id, "content": [ {"type": "text", "text": result}, get_screenshot_block() ] }) elif block.name == "bash": output = execute_bash(block.input["command"]) print(f" ← Output: {output[:200]}") tool_results.append({ "type": "tool_result", "tool_use_id": block.id, "content": output }) elif block.name == "str_replace_editor": # Handle file viewing/editing tool_results.append({ "type": "tool_result", "tool_use_id": block.id, "content": "File operation completed" }) messages.append({"role": "user", "content": tool_results}) return "Max steps reached without completion"
Human-in-the-Loop
For sensitive tasks, add approval checkpoints:
SENSITIVE_ACTIONS = ["left_click", "type", "key"] REQUIRE_APPROVAL_FOR = ["submit", "delete", "purchase", "send"] def execute_with_approval(action: dict, task_context: str) -> str: """Pause and ask for human approval on potentially dangerous actions.""" action_type = action.get("action", "") text = action.get("text", "") # Check if this looks like a high-risk action is_risky = any(word in task_context.lower() for word in REQUIRE_APPROVAL_FOR) if is_risky and action_type in SENSITIVE_ACTIONS: print(f"\n⚠️ APPROVAL REQUIRED") print(f" Action: {action_type}") print(f" Details: {action}") approval = input(" Approve? (y/n): ") if approval.lower() != "y": raise RuntimeError("Action denied by user") return execute_computer_action(action)
Usage Examples
# Fill out a web form result = run_computer_use_agent( "Open Chrome, go to https://forms.example.com/application, " "fill in Name='John Smith', Email='john@smith.com', and submit the form." ) # Extract data from a desktop app result = run_computer_use_agent( "Open the Excel file at /home/user/data.xlsx, " "copy all values from column B rows 2-50, and save them to /tmp/extracted.txt" ) # Automate a repetitive workflow result = run_computer_use_agent( "In the open CRM application, find all contacts with status 'Follow Up', " "change their status to 'Active', and export the list to CSV." )
Safety Checklist
- ✅ Always run in Docker or a dedicated VM
- ✅ Never mount production credentials or sensitive files in the sandbox
- ✅ Add human-in-the-loop for submit/delete/purchase actions
- ✅ Set
to prevent runaway loopsmax_steps - ✅ Monitor via VNC to watch what Claude does in real time
- ✅ Log all actions and screenshots for audit trail
- ❌ Do NOT run on your main machine with browser sessions logged into accounts
- ❌ Do NOT give Claude access to payment methods or admin panels without oversight
Guidelines
- Start tasks with a screenshot so Claude has current context
- Be specific in your task description — vague tasks lead to wrong actions
- Include success criteria: "task is done when you see the confirmation page"
- Set a reasonable
(10–30 depending on task complexity)max_steps - Add delays (
) after clicks to let UI render before next screenshottime.sleep - Use bash tool for file operations; computer tool for GUI interactions
- Monitor token usage — each screenshot adds ~1K tokens to context