# adb-screen-detection

Screen understanding with OCR and template matching for Android device automation.

**Installation** (from [majiayu000/claude-skill-registry](https://github.com/majiayu000/claude-skill-registry)):

```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or copy only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/adb-screen-detection" ~/.claude/skills/majiayu000-claude-skill-registry-adb-screen-detection && rm -rf "$T"
```
## Quick Reference (30 seconds)

**Screen Understanding for Android Automation** (`skills/data/adb-screen-detection/SKILL.md`)
**What It Does:** Provides OCR-based text detection and template matching to understand Android device screens. Enables reliable UI automation by verifying screen state before and after actions.

**Core Capabilities:**
- 📸 Screen Capture: ADB screencap with local storage
- 🔍 OCR Detection: Tesseract-based text extraction
- 🎯 Template Matching: OpenCV-based element detection
- 👆 Coordinate Tapping: ADB input tap with verification
**When to Use:**
- Need to verify UI state before taking actions
- Finding UI elements by text or appearance
- Building reliable automation workflows
- Screen-dependent decision making
## Scripts
### 1. adb-screen-capture.py
Capture Android device screen and save locally.
```bash
# Basic usage
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py

# Specify device
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --device 127.0.0.1:5555

# Custom output path
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --output /tmp/screen.png

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --json
```
**Output:**

```json
{
  "device": "127.0.0.1:5555",
  "timestamp": "2025-12-01T10:30:45Z",
  "local_path": "/tmp/screenshot.png",
  "size": [1080, 2400],
  "success": true
}
```
### 2. adb-ocr-extract.py
Extract all visible text from device screen using Tesseract OCR.
```bash
# Basic usage (uses most recent screenshot)
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py

# Specify screenshot path
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --image /tmp/screen.png

# Search for specific text
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --search "Login"

# JSON output with coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --json
```
**Output:**

```json
{
  "text": ["Login", "Username", "Password", "Submit"],
  "detected": true,
  "search_found": true,
  "search_term": "Login",
  "coordinates": {
    "Login": [[100, 200, 150, 230]]
  }
}
```
### 3. adb-find-element.py
Find UI element by template matching or OCR text search.
```bash
# Find by OCR text
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
  --method ocr \
  --target "Login Button" \
  --threshold 0.8

# Find by template image
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
  --method template \
  --template /path/to/template.png \
  --threshold 0.8

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
  --method ocr \
  --target "Login" \
  --json
```
**Output:**

```json
{
  "found": true,
  "method": "ocr",
  "target": "Login",
  "coordinates": { "x": 100, "y": 200, "width": 150, "height": 30 },
  "confidence": 0.95,
  "message": "Element found at (100, 200)"
}
```
### 4. adb-tap-coordinate.py
Tap device screen at specific coordinates.
```bash
# Tap at coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
  --x 100 \
  --y 200 \
  --device 127.0.0.1:5555

# Tap with verification (check screen after tap)
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
  --x 100 \
  --y 200 \
  --verify-text "Next Screen" \
  --timeout 5

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
  --x 100 \
  --y 200 \
  --json
```
**Output:**

```json
{
  "device": "127.0.0.1:5555",
  "tap": { "x": 100, "y": 200 },
  "success": true,
  "verified": true,
  "verify_text": "Next Screen",
  "verification_match": true
}
```
## Usage Patterns
### Pattern 1: Verify Screen State Before Action
```bash
# 1. Capture current screen
adb-screen-capture.py

# 2. Check for expected element
adb-find-element.py --method ocr --target "Login Button"

# 3. If found, tap it
adb-tap-coordinate.py --x 100 --y 200 --verify-text "Welcome"
```
### Pattern 2: OCR-Based Automation
```bash
# 1. Capture screen
adb-screen-capture.py

# 2. Extract all text
adb-ocr-extract.py --search "Settings"

# 3. Get coordinates and tap
adb-find-element.py --method ocr --target "Settings"
adb-tap-coordinate.py --x 150 --y 300
```
### Pattern 3: Template-Based Element Detection
```bash
# 1. Keep known UI template images in ./templates/

# 2. Capture screen
adb-screen-capture.py

# 3. Match against templates (JSON output for parsing)
adb-find-element.py --method template --template ./templates/button.png --json > /tmp/match.json

# 4. Tap matched location
adb-tap-coordinate.py --x "$(jq -r '.coordinates.x' /tmp/match.json)" --y "$(jq -r '.coordinates.y' /tmp/match.json)"
```
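Because every script accepts `--json`, the three patterns compose into a single verified pass. A sketch (assumes `jq` is installed and the flags documented above):

```bash
SKILL=.claude/skills/adb-screen-detection/scripts

# Capture, then locate "Settings" by OCR
uv run "$SKILL/adb-screen-capture.py" --json > /tmp/capture.json
uv run "$SKILL/adb-find-element.py" --method ocr --target "Settings" --json > /tmp/element.json

# Tap only if the element was actually found, and verify the resulting screen
if [ "$(jq -r '.found' /tmp/element.json)" = "true" ]; then
  uv run "$SKILL/adb-tap-coordinate.py" \
    --x "$(jq -r '.coordinates.x' /tmp/element.json)" \
    --y "$(jq -r '.coordinates.y' /tmp/element.json)" \
    --verify-text "Settings" --json
fi
```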
## Architecture
**Design Principles:**

- **Independent**: Each script can run standalone
- **Chainable**: Scripts output JSON for piping
- **Stateless**: No dependencies between executions
- **Verifiable**: Always verify screen state before proceeding
- **Timeout Protected**: All network operations have timeouts
**Dependency Relationship:**

```
adb-screen-capture.py   (foundation)
        ↓
adb-ocr-extract.py      (uses capture)
adb-find-element.py     (uses capture or templates)
        ↓
adb-tap-coordinate.py   (uses find-element for verification)
```
## Integration Points

**Used By:**

- `adb-navigation-base` - Wait for elements between actions
- `adb-magisk` - Verify Magisk UI state
- `adb-karrot` - Verify app state during automation
- `adb-workflow-orchestrator` - Screen verification in workflows
**Dependencies:**

- System: `adb` command-line tool
- Python: `pytesseract`, `opencv-python`, `pillow`, `numpy` (install sketch below)
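One way to install the Python side, assuming `uv` (which the usage examples above already rely on; the package names are the PyPI distributions listed above):

```bash
# Python dependencies (PyPI names as listed above)
uv pip install pytesseract opencv-python pillow numpy

# adb ships with the Android SDK platform-tools; verify it is on PATH
adb version
```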
## Troubleshooting
### OCR Not Working

- Install Tesseract: `brew install tesseract` (macOS) or `apt-get install tesseract-ocr` (Linux)
- Set `TESSDATA_PREFIX`: `export TESSDATA_PREFIX=/usr/local/share/tessdata`
### Template Matching Too Strict/Loose

- Adjust the `--threshold` parameter (0.0-1.0)
- Higher threshold = stricter matching
- Recommended: 0.8-0.9 for reliable detection
### Device Offline

- Check ADB connection: `adb devices`
- Reconnect: `adb connect <device>` (a retry sketch follows this list)
- Restart ADB: `adb kill-server && adb start-server`
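For flaky network-attached devices, a small retry loop around the commands above can help (a hedged helper, not part of the skill):

```bash
# Retry until the device shows up as "device" in `adb devices`
DEVICE=127.0.0.1:5555
until adb devices | grep -q "^${DEVICE}[[:space:]]*device"; do
  adb connect "$DEVICE" || true
  sleep 2
done
```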
## Workflows
This skill includes TOON-based workflow definitions for automation.
### What is TOON?
TOON (Task-Oriented Orchestration Notation) is a structured workflow definition language that pairs with Markdown documentation. Each workflow consists of:
- `[name].toon` - Orchestration logic and execution steps
- `[name].md` - Complete documentation and usage guide
This TOON+MD pairing approach is inspired by the BMAD METHOD pattern, adapted to use TOON instead of YAML for better orchestration support.
### Available Workflows
Workflow files are located in the `workflow/` directory.

**Example Workflows (adb-screen-detection):**

- `workflow/screen-verification.toon` - Capture and verify screen state
- `workflow/element-detection.toon` - Find elements via OCR or template matching
- `workflow/screen-monitoring.toon` - Continuous screen monitoring and analysis
### Running a Workflow
Execute any workflow using the ADB workflow orchestrator:
```bash
uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
  --workflow .claude/skills/adb-screen-detection/workflow/screen-verification.toon \
  --param device="127.0.0.1:5555"
```
### Workflow Documentation

Each workflow includes comprehensive documentation in its corresponding `.md` file:
- Purpose and use case
- Prerequisites and requirements
- Available parameters
- Execution phases and steps
- Success criteria
- Error handling and recovery
- Example commands
See the `workflow/` directory for complete TOON file definitions and documentation.
### Creating New Workflows

To create custom workflows for this skill:

1. Create a new `.toon` file in the `workflow/` directory
2. Define phases, steps, and parameters using TOON v4.0 syntax
3. Create a corresponding `.md` file with comprehensive documentation
4. Test with the workflow orchestrator (see the example below)
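Step 4 can reuse the orchestrator invocation shown under Running a Workflow; the workflow name below is hypothetical:

```bash
# Run a newly created workflow through the orchestrator to validate it
uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
  --workflow .claude/skills/adb-screen-detection/workflow/my-custom-check.toon \
  --param device="127.0.0.1:5555"
```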
For more information, refer to the TOON specification and the workflow orchestrator documentation.
**Version:** 1.0.0 | **Status:** ✅ Foundation Tier | **Scripts:** 4 (all MCP-ready) | **Last Updated:** 2025-12-01 | **Tier:** 2 (Foundation)