Claude-skill-registry adb-screen-detection

Screen understanding with OCR and template matching for Android device automation

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/adb-screen-detection" ~/.claude/skills/majiayu000-claude-skill-registry-adb-screen-detection && rm -rf "$T"
manifest: skills/data/adb-screen-detection/SKILL.md

Quick Reference (30 seconds)

Screen Understanding for Android Automation

What It Does: Provides OCR-based text detection and template matching to understand Android device screens. Enables reliable UI automation by verifying screen state before and after actions.

Core Capabilities:

  • 📸 Screen Capture: ADB screencap with local storage
  • 🔍 OCR Detection: Tesseract-based text extraction
  • 🎯 Template Matching: OpenCV-based element detection
  • 👆 Coordinate Tapping: ADB input tap with verification

When to Use:

  • Need to verify UI state before taking actions
  • Finding UI elements by text or appearance
  • Building reliable automation workflows
  • Screen-dependent decision making

Scripts

1. adb-screen-capture.py

Capture Android device screen and save locally.

# Basic usage
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py

# Specify device
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --device 127.0.0.1:5555

# Custom output path
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --output /tmp/screen.png

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --json

Output:

{
  "device": "127.0.0.1:5555",
  "timestamp": "2025-12-01T10:30:45Z",
  "local_path": "/tmp/screenshot.png",
  "size": [1080, 2400],
  "success": true
}
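The script's implementation is not shown here, so the following is only a minimal sketch of how a capture step like this could be written. It assumes the standard `adb exec-out screencap -p` invocation; the function names `build_screencap_command` and `capture_screen` are hypothetical. The `size` field is omitted because reading the image dimensions would need Pillow.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def build_screencap_command(device: Optional[str] = None) -> list[str]:
    """Return the adb argv that streams a PNG screenshot to stdout."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]  # -s selects a device by serial or host:port
    cmd += ["exec-out", "screencap", "-p"]
    return cmd


def capture_screen(output: str, device: Optional[str] = None,
                   timeout: float = 10.0) -> dict:
    """Save a screenshot to `output` and return a result dict (hypothetical helper)."""
    proc = subprocess.run(
        build_screencap_command(device),
        capture_output=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the device hangs
        check=True,       # raises CalledProcessError on a non-zero exit code
    )
    with open(output, "wb") as fh:
        fh.write(proc.stdout)
    return {
        "device": device,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "local_path": output,
        "success": True,
    }
```

Calling `capture_screen("/tmp/screen.png", device="127.0.0.1:5555")` would write the PNG and return a dict shaped like the output above, provided a device is connected.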

2. adb-ocr-extract.py

Extract all visible text from device screen using Tesseract OCR.

# Basic usage (uses most recent screenshot)
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py

# Specify screenshot path
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --image /tmp/screen.png

# Search for specific text
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --search "Login"

# JSON output with coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --json

Output:

{
  "text": ["Login", "Username", "Password", "Submit"],
  "detected": true,
  "search_found": true,
  "search_term": "Login",
  "coordinates": {
    "Login": [[100, 200, 150, 230]]
  }
}
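As a rough illustration of how the `coordinates` field might be derived, the sketch below filters word-level OCR data for a search term. It assumes the input has the dictionary shape that `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns (parallel `text`, `left`, `top`, `width`, `height`, `conf` lists). It also assumes each box is `[left, top, right, bottom]`, which the output above does not confirm. The function name `find_text_boxes` is hypothetical.

```python
def find_text_boxes(ocr: dict, term: str, min_conf: float = 0.0) -> list[list[int]]:
    """Return [left, top, right, bottom] boxes for words equal to `term`.

    Matching is case-insensitive and word-level; `ocr` is assumed to look like
    pytesseract's image_to_data(..., output_type=Output.DICT) result.
    """
    boxes = []
    for i, word in enumerate(ocr["text"]):
        if word.strip().lower() != term.lower():
            continue
        # Tesseract reports conf as -1 for non-word rows; skip low-confidence hits.
        if float(ocr["conf"][i]) < min_conf:
            continue
        left, top = ocr["left"][i], ocr["top"][i]
        boxes.append([left, top, left + ocr["width"][i], top + ocr["height"][i]])
    return boxes


# Hand-written sample in the assumed shape (not real OCR output).
sample = {
    "text": ["", "Login", "Username"],
    "left": [0, 100, 100],
    "top": [0, 200, 260],
    "width": [1080, 50, 120],
    "height": [2400, 30, 30],
    "conf": ["-1", "96", "91"],
}
print(find_text_boxes(sample, "login"))
```

Because matching is per word, a multi-word target such as "Login Button" would need adjacent words merged first; that step is left out of this sketch.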

3. adb-find-element.py

Find UI element by template matching or OCR text search.

# Find by OCR text
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login Button" \
    --threshold 0.8

# Find by template image
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method template \
    --template /path/to/template.png \
    --threshold 0.8

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login" \
    --json

Output:

{
  "found": true,
  "method": "ocr",
  "target": "Login",
  "coordinates": {
    "x": 100,
    "y": 200,
    "width": 150,
    "height": 30
  },
  "confidence": 0.95,
  "message": "Element found at (100, 200)"
}
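When feeding this result into a tap, it can help to aim for the middle of the element instead of its corner. The sketch below assumes `x` and `y` are the top-left corner of the bounding box, which the output above does not state explicitly; `tap_point_from_result` is a hypothetical helper name.

```python
import json
from typing import Optional


def tap_point_from_result(raw: str, min_confidence: float = 0.8) -> Optional[tuple[int, int]]:
    """Return the centre of the matched box, or None if nothing usable was found.

    Assumes coordinates.x / coordinates.y are the top-left corner of the box.
    """
    result = json.loads(raw)
    if not result.get("found"):
        return None
    if result.get("confidence", 0.0) < min_confidence:
        return None
    box = result["coordinates"]
    return (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)


example = """{
  "found": true, "method": "ocr", "target": "Login",
  "coordinates": {"x": 100, "y": 200, "width": 150, "height": 30},
  "confidence": 0.95
}"""
print(tap_point_from_result(example))  # (175, 215)
```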

4. adb-tap-coordinate.py

Tap device screen at specific coordinates.

# Tap at coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --device 127.0.0.1:5555

# Tap with verification (check screen after tap)
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --verify-text "Next Screen" \
    --timeout 5

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --json

Output:

{
  "device": "127.0.0.1:5555",
  "tap": {
    "x": 100,
    "y": 200
  },
  "success": true,
  "verified": true,
  "verify_text": "Next Screen",
  "verification_match": true
}
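The tap-then-verify behaviour can be pictured as a tap command followed by a polling loop. This is a sketch under assumptions, not the script's actual code: `adb shell input tap` is the standard adb command, while `build_tap_command`, `wait_for_text`, and the `read_screen_text` callback (which would capture the screen and run OCR) are hypothetical names.

```python
import time
from typing import Callable, Optional


def build_tap_command(x: int, y: int, device: Optional[str] = None) -> list[str]:
    """Return the adb argv for a single tap at (x, y)."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]
    return cmd + ["shell", "input", "tap", str(x), str(y)]


def wait_for_text(read_screen_text: Callable[[], str], expected: str,
                  timeout: float = 5.0, interval: float = 0.5) -> bool:
    """Poll the screen until `expected` appears or `timeout` seconds pass.

    `read_screen_text` is a caller-supplied function returning the current
    on-screen text (for example capture + OCR); it is always called at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected.lower() in read_screen_text().lower():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the screen reader in as a callback keeps the loop testable without a device: a test can supply a function that returns canned strings.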

Usage Patterns

Pattern 1: Verify Screen State Before Action

# 1. Capture current screen
adb-screen-capture.py

# 2. Check for expected element
adb-find-element.py --method ocr --target "Login Button"

# 3. If found, tap it at the coordinates reported by step 2 (100, 200 in this example)
adb-tap-coordinate.py --x 100 --y 200 --verify-text "Welcome"

Pattern 2: OCR-Based Automation

# 1. Capture screen
adb-screen-capture.py

# 2. Extract all text
adb-ocr-extract.py --search "Settings"

# 3. Get coordinates and tap
adb-find-element.py --method ocr --target "Settings"
adb-tap-coordinate.py --x 150 --y 300

Pattern 3: Template-Based Element Detection

# 1. Have known UI template images in ./templates/
# 2. Capture screen
adb-screen-capture.py

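# 3. Match against templates and keep the JSON result
MATCH=$(adb-find-element.py --method template --template ./templates/button.png --json)

# 4. Tap matched location (jq reads the saved JSON from stdin)
adb-tap-coordinate.py \
    --x "$(echo "$MATCH" | jq -r '.coordinates.x')" \
    --y "$(echo "$MATCH" | jq -r '.coordinates.y')"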
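The patterns above can also be chained from Python by parsing each script's JSON output. The sketch below uses only flags shown in this document and assumes that `--json` makes each script print a single JSON object to stdout. `run_script` and `find_and_tap` are hypothetical names, and the runner is injectable so the flow can be exercised without a device.

```python
import json
import subprocess
from typing import Callable, Optional

SCRIPTS = ".claude/skills/adb-screen-detection/scripts"


def run_script(argv: list[str]) -> str:
    """Run one skill script through `uv run` and return its stdout."""
    proc = subprocess.run(["uv", "run", *argv], capture_output=True,
                          text=True, timeout=30, check=True)
    return proc.stdout


def find_and_tap(target: str, verify_text: Optional[str] = None,
                 run: Callable[[list[str]], str] = run_script) -> bool:
    """Find `target` by OCR, tap it, and optionally verify the next screen."""
    found = json.loads(run([f"{SCRIPTS}/adb-find-element.py",
                            "--method", "ocr", "--target", target, "--json"]))
    if not found.get("found"):
        return False  # do not tap when the expected element is missing

    box = found["coordinates"]
    tap_argv = [f"{SCRIPTS}/adb-tap-coordinate.py",
                "--x", str(box["x"]), "--y", str(box["y"]), "--json"]
    if verify_text:
        tap_argv += ["--verify-text", verify_text]
    tapped = json.loads(run(tap_argv))

    ok = bool(tapped.get("success"))
    if verify_text:
        ok = ok and bool(tapped.get("verification_match"))
    return ok
```

A call such as `find_and_tap("Login", verify_text="Welcome")` mirrors Pattern 1: it refuses to tap when the element is not found and reports failure when the verification text does not appear.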

Architecture

Design Principles:

  • Independent: Each script can run standalone
  • Chainable: Scripts output JSON for piping
  • Stateless: No dependencies between executions
  • Verifiable: Always verify screen state before proceeding
  • Timeout Protected: All network operations have timeouts

Dependency Relationship:

adb-screen-capture.py (foundation)
    ↓
adb-ocr-extract.py (uses capture)
adb-find-element.py (uses capture or templates)
    ↓
adb-tap-coordinate.py (uses find-element for verification)

Integration Points

Used By:

  • adb-navigation-base
    - Wait for elements between actions
  • adb-magisk
    - Verify Magisk UI state
  • adb-karrot
    - Verify app state during automation
  • adb-workflow-orchestrator
    - Screen verification in workflows

Dependencies:

  • System: adb command-line tool, Tesseract OCR engine (required by pytesseract)
  • Python: pytesseract, opencv-python, pillow, numpy

Troubleshooting

OCR Not Working

  • Install Tesseract: brew install tesseract (macOS) or apt-get install tesseract-ocr (Linux)
  • Set TESSDATA_PREFIX: export TESSDATA_PREFIX=/usr/local/share/tessdata (adjust the path to wherever your tessdata directory is installed)

Template Matching Too Strict/Loose

  • Adjust the --threshold parameter (0.0-1.0)
  • Higher threshold = stricter matching
  • Recommended: 0.8-0.9 for reliable detection
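To see why a higher threshold is stricter, it helps to look at the kind of score being thresholded. The sketch below is a pure-Python, zero-mean normalized cross-correlation over tiny grayscale grids. It is the same family of score that OpenCV's `cv2.matchTemplate` produces with `cv2.TM_CCOEFF_NORMED`, but it is only an illustration: which OpenCV method the script actually uses is not stated here, and `ncc_score` and `best_match` are hypothetical names.

```python
import math


def ncc_score(patch: list[list[float]], template: list[list[float]]) -> float:
    """Zero-mean normalized cross-correlation of two equal-sized grids (-1..1)."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat patch or template has no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(screen: list[list[float]], template: list[list[float]],
               threshold: float = 0.8) -> dict:
    """Slide the template over the screen and report the best-scoring position."""
    th, tw = len(template), len(template[0])
    best_score, best_x, best_y = -1.0, 0, 0
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            patch = [row[x:x + tw] for row in screen[y:y + th]]
            score = ncc_score(patch, template)
            if score > best_score:
                best_score, best_x, best_y = score, x, y
    return {"found": best_score >= threshold, "x": best_x, "y": best_y,
            "confidence": best_score}


screen = [
    [0, 0, 0, 0],
    [0, 9, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 0],
]
print(best_match(screen, [[9, 1], [1, 9]]))   # exact copy: found at x=1, y=1
print(best_match(screen, [[1, 9], [9, 1]]))   # mirrored pattern: best score stays below 0.8
```

Lowering `threshold` toward 0.5 would start accepting the partial matches that the mirrored template produces, which is the loose behaviour the bullets above warn about.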

Device Offline

  • Check ADB connection: adb devices
  • Reconnect: adb connect <device>
  • Restart ADB: adb kill-server && adb start-server
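The connection check can be automated by parsing the text that `adb devices` prints. The sketch below works on the command's output as a string so it can be tried without a device; `parse_adb_devices` and `needs_reconnect` are hypothetical helper names, and the sample text is hand-written in the usual `adb devices` layout.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each listed serial to its state ("device", "offline", "unauthorized", ...)."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def needs_reconnect(output: str, device: str) -> bool:
    """True when `device` is missing or in any state other than "device"."""
    return parse_adb_devices(output).get(device) != "device"


sample_output = "List of devices attached\n127.0.0.1:5555\tdevice\nemulator-5554\toffline\n\n"
print(parse_adb_devices(sample_output))
```

In a real run the string would come from something like `subprocess.run(["adb", "devices"], capture_output=True, text=True, timeout=10).stdout`, and a `True` from `needs_reconnect` would be the cue to try `adb connect <device>`.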

Workflows

This skill includes TOON-based workflow definitions for automation.

What is TOON?

TOON (Task-Oriented Orchestration Notation) is a structured workflow definition language that pairs with Markdown documentation. Each workflow consists of:

  • [name].toon - Orchestration logic and execution steps
  • [name].md - Complete documentation and usage guide

This TOON+MD pairing approach is inspired by the BMAD METHOD pattern, adapted to use TOON instead of YAML for better orchestration support.

Available Workflows

Workflow files are located in the workflow/ directory:

Example Workflows (adb-screen-detection):

  • workflow/screen-verification.toon
    - Capture and verify screen state
  • workflow/element-detection.toon
    - Find elements via OCR or template matching
  • workflow/screen-monitoring.toon
    - Continuous screen monitoring and analysis

Running a Workflow

Execute any workflow using the ADB workflow orchestrator:

uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
  --workflow .claude/skills/adb-screen-detection/workflow/screen-verification.toon \
  --param device="127.0.0.1:5555"

Workflow Documentation

Each workflow includes comprehensive documentation in the corresponding .md file:

  • Purpose and use case
  • Prerequisites and requirements
  • Available parameters
  • Execution phases and steps
  • Success criteria
  • Error handling and recovery
  • Example commands

See the workflow/ directory for complete TOON file definitions and documentation.

Creating New Workflows

To create custom workflows for this skill:

  1. Create a new .toon file in the workflow/ directory
  2. Define phases, steps, and parameters using TOON v4.0 syntax
  3. Create the corresponding .md file with comprehensive documentation
  4. Test with the workflow orchestrator

For more information, refer to the TOON specification and the workflow orchestrator documentation.


Version: 1.0.0
Status: ✅ Foundation Tier
Scripts: 4 (all MCP-ready)
Last Updated: 2025-12-01
Tier: 2 (Foundation)