Claude-night-market computer-control
Automate desktop GUI workflows via Claude computer use API with screenshot capture and mouse/keyboard control.
install
source · Clone the upstream repo
git clone https://github.com/athola/claude-night-market
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/athola/claude-night-market "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/phantom/skills/computer-control" ~/.claude/skills/athola-claude-night-market-computer-control && rm -rf "$T"
manifest: plugins/phantom/skills/computer-control/SKILL.md
source content
Computer Control Skill
Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions.
When To Use
- Automating GUI-based workflows that lack CLI alternatives
- Testing web applications through visual interaction
- Filling forms, navigating menus, or interacting with desktop apps
- Building automation pipelines that need visual verification
When NOT To Use
- Tasks achievable through CLI or API (no GUI needed)
- Browser automation better served by Playwright or CDP
Architecture
The computer use system has three layers:
- Display Toolkit (phantom.display) - executes OS-level actions via xdotool/scrot on the real or virtual display
- Agent Loop (phantom.loop) - manages the conversation cycle between the Claude API and the display toolkit
- CLI (phantom.cli) - command-line interface for running tasks or checking environment readiness
User Task
    |
    v
Agent Loop <----> Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)
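The cycle in the diagram can be sketched in a few lines. This is a minimal illustration, not phantom.loop's actual code: `ToolCall`, `ModelReply`, `agent_loop`, `fake_model`, and `fake_display` are hypothetical names, and the model and display are replaced by scripted stand-ins so the sketch runs without an API key or an X display.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration; these are NOT phantom's real types.
@dataclass
class ToolCall:
    action: str
    params: dict = field(default_factory=dict)


@dataclass
class ModelReply:
    text: str
    tool_calls: list = field(default_factory=list)


def agent_loop(task, ask_model, execute, max_iterations=10):
    """Alternate between model and display until the model requests no more actions."""
    history = [("user", task)]
    for iteration in range(1, max_iterations + 1):
        reply = ask_model(history)
        history.append(("assistant", reply.text))
        if not reply.tool_calls:
            # No tool_use in the reply: the model considers the task finished.
            return iteration, reply.text
        for call in reply.tool_calls:
            # Run each requested action and feed the result back to the model.
            history.append(("tool_result", execute(call)))
    return max_iterations, "stopped: iteration cap reached"


def fake_model(history):
    """Scripted model: request one screenshot, then finish."""
    if not any(role == "tool_result" for role, _ in history):
        return ModelReply("Taking a screenshot.", [ToolCall("screenshot")])
    return ModelReply("Desktop captured.")


def fake_display(call):
    """Scripted display: pretend every action succeeded."""
    return f"<{call.action} ok>"


iterations, final_text = agent_loop(
    "Take a screenshot of the desktop", fake_model, fake_display
)
print(iterations, final_text)
```

The iteration cap is checked by the loop itself rather than by the model, so a model that keeps requesting actions still terminates.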
Quick Start
Check environment
cd plugins/phantom
uv run python -m phantom.cli --check
Run a task
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
Use in Python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)
print(f"Done in {result.iterations} iterations")
print(result.final_text)
API Versions
| Model | Tool Version | Beta Flag |
|---|---|---|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | computer_20251124 | computer-use-2025-11-24 |
| Sonnet 4.5, Haiku 4.5, older | computer_20250124 | computer-use-2025-01-24 |
The resolve_tool_version() function handles this mapping automatically based on the model name.
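As a rough sketch of what such a mapping could look like: the function below is a hypothetical re-implementation, not phantom's actual resolve_tool_version(). It assumes the tool versions are `computer_20251124` and `computer_20250124` with beta flags `computer-use-2025-11-24` and `computer-use-2025-01-24`, and that model names contain markers such as `sonnet-4-6`; the real function may match names differently.

```python
# Assumed markers for the models that use the newest tool version.
NEWER_MODEL_MARKERS = ("opus-4-6", "sonnet-4-6", "opus-4-5")


def resolve_tool_version_sketch(model: str) -> tuple:
    """Return (tool_version, beta_flag) for a model name (illustrative only)."""
    name = model.lower()
    if any(marker in name for marker in NEWER_MODEL_MARKERS):
        return "computer_20251124", "computer-use-2025-11-24"
    # Everything else falls back to the earlier tool version.
    return "computer_20250124", "computer-use-2025-01-24"


print(resolve_tool_version_sketch("claude-sonnet-4-6"))
print(resolve_tool_version_sketch("claude-haiku-4-5"))
```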
Available Actions
All versions:
- screenshot - capture display
- left_click - click at [x, y]
- type - type text string
- key - press key combo (e.g., ctrl+s)
- mouse_move - move cursor
Enhanced (20250124+):
- scroll - scroll with direction and amount
- left_click_drag - drag between coordinates
- right_click, middle_click, double_click, triple_click
- hold_key - hold key for duration
- wait - pause between actions
Latest (20251124):
- zoom - inspect screen region at full resolution
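To illustrate how action names like these might translate into OS commands, the sketch below builds xdotool and scrot argument lists for a handful of actions. `build_command` and its parameter names (`x`, `y`, `text`, `path`) are hypothetical, not phantom.display's actual interface, and the function only constructs the command; nothing is executed.

```python
# xdotool button numbers: 1 = left, 2 = middle, 3 = right.
CLICK_BUTTONS = {"left_click": "1", "middle_click": "2", "right_click": "3"}


def build_command(action: str, **params) -> list:
    """Translate one action into an argv list (illustrative; runs nothing)."""
    if action == "screenshot":
        return ["scrot", params.get("path", "/tmp/screen.png")]
    if action in CLICK_BUTTONS:
        # Move the pointer first, then click with the matching button.
        return ["xdotool", "mousemove", str(params["x"]), str(params["y"]),
                "click", CLICK_BUTTONS[action]]
    if action == "double_click":
        return ["xdotool", "mousemove", str(params["x"]), str(params["y"]),
                "click", "--repeat", "2", "1"]
    if action == "mouse_move":
        return ["xdotool", "mousemove", str(params["x"]), str(params["y"])]
    if action == "type":
        return ["xdotool", "type", params["text"]]
    if action == "key":
        return ["xdotool", "key", params["text"]]
    raise ValueError(f"unsupported action: {action}")


print(build_command("left_click", x=100, y=200))
print(build_command("key", text="ctrl+s"))
```

Returning an argv list instead of a shell string keeps typed text from being interpreted by a shell when the list is later passed to something like `subprocess.run`.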
Safety
Computer use carries risks. Follow these guidelines:
- Use a sandbox: Run in Docker or a VM, not your main OS
- Limit access: Do not provide login credentials unless necessary, and never for banking or sensitive services
- Set iteration caps: Always use max_iterations to prevent runaway API costs
- Human approval: For actions with real-world consequences, add confirmation callbacks via on_action
- Close sensitive apps: Claude sees the full screen via screenshots; close anything private before starting
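A confirmation callback for the human-approval guideline might look like the sketch below. The signature is an assumption: this page does not show what arguments on_action receives or what it should return, so `make_confirmer`, the `(action, params)` arguments, and the boolean return value are all hypothetical.

```python
def make_confirmer(ask=input, risky_actions=frozenset({"type", "key"})):
    """Build a callback that asks a human before risky actions (illustrative)."""

    def on_action(action: str, params: dict) -> bool:
        # Read-only or low-risk actions pass through without a prompt.
        if action not in risky_actions:
            return True
        answer = ask(f"Allow {action} {params}? [y/N] ")
        # Anything other than an explicit yes is treated as a refusal.
        return answer.strip().lower() in {"y", "yes"}

    return on_action


# Demonstration with a scripted answer instead of a real terminal prompt.
confirm = make_confirmer(ask=lambda prompt: "n")
print(confirm("screenshot", {}))
print(confirm("type", {"text": "rm -rf ~"}))
```

Defaulting to refusal means an empty answer or a stray keypress blocks the action instead of approving it.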
Environment Requirements
Linux (native or WSL2 with WSLg):
sudo apt install xdotool scrot xclip
Headless (Docker/CI):
# Install Xvfb for virtual display
sudo apt install xvfb xdotool scrot xclip
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
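A readiness check for these requirements can be approximated with the standard library. This is only a sketch of the idea, not what `phantom.cli --check` actually does; `check_environment` and `REQUIRED_TOOLS` are hypothetical names, and the lookup function and environment are injectable so the check can be exercised without the tools installed.

```python
import os
import shutil

# Tools named in the install commands above.
REQUIRED_TOOLS = ("xdotool", "scrot", "xclip")


def check_environment(which=shutil.which, environ=os.environ) -> list:
    """Return a list of problems; an empty list means the basics are present."""
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing tool: {tool}")
    # X11 clients such as xdotool need DISPLAY to find the (virtual) display.
    if not environ.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    return problems


for problem in check_environment():
    print(problem)
```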
Prompting Tips
- Be specific about each step of the task
- Add "After each step, take a screenshot and verify" to catch mistakes early
- Use keyboard shortcuts when UI elements are hard to click
- Provide example screenshots for repeatable workflows
- Set a system prompt with domain-specific instructions
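The first two tips can be combined in a small helper that numbers the steps and appends the verification instruction. `build_task_prompt` and `VERIFY_SUFFIX` are hypothetical names for illustration; the resulting string would be passed as the task text.

```python
VERIFY_SUFFIX = "After each step, take a screenshot and verify the result before continuing."


def build_task_prompt(steps: list, verify: bool = True) -> str:
    """Join explicit steps into a numbered task prompt (illustrative)."""
    lines = [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    if verify:
        lines.append(VERIFY_SUFFIX)
    return "\n".join(lines)


print(build_task_prompt(["Open Firefox", "Search for Claude AI"]))
```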