# DeepCamera · yolo-detection-2026

YOLO 2026 — state-of-the-art real-time object detection

## Install

Source · clone the upstream repo:

```shell
git clone https://github.com/SharpAI/DeepCamera
```

Claude Code · install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/SharpAI/DeepCamera "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/detection/yolo-detection-2026" ~/.claude/skills/sharpai-deepcamera-yolo-detection-2026 && rm -rf "$T"
```

Manifest: `skills/detection/yolo-detection-2026/SKILL.md`

## Source content

# YOLO 2026 Object Detection

Real-time object detection using the latest YOLO 2026 models. Detects the 80 COCO object classes, including people, vehicles, animals, and everyday objects, and outputs bounding boxes with labels and confidence scores.

## Model Sizes

| Size | Speed | Accuracy | Best For |
|---|---|---|---|
| nano | Fastest | Good | Real-time on CPU, edge devices |
| small | Fast | Better | Balanced speed/accuracy |
| medium | Moderate | High | Accuracy-focused deployments |
| large | Slower | Highest | Maximum detection quality |

## Hardware Acceleration

The skill uses `env_config.py` to automatically detect your hardware and convert the model to the fastest format for your platform. Conversion happens once during deployment and is cached.

| Platform | Backend | Optimized Format | Compute Units | Expected Speedup |
|---|---|---|---|---|
| NVIDIA GPU | CUDA | TensorRT (`.engine`) | GPU | ~3-5x |
| Apple Silicon (M1+) | MPS | CoreML (`.mlpackage`) | Neural Engine (NPU) | ~2x |
| Intel CPU/GPU/NPU | OpenVINO | OpenVINO IR (`.xml`) | CPU/GPU/NPU | ~2-3x |
| AMD GPU | ROCm | ONNX Runtime | GPU | ~1.5-2x |
| CPU (any) | CPU | ONNX Runtime | CPU | ~1.5x |
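The selection order in the table can be sketched in plain Python. This is an illustrative stand-in for `env_config.HardwareEnv.detect()`, not the skill's actual code; the probe commands and returned backend names are assumptions:

```python
import platform
import shutil

def pick_backend() -> str:
    """Guess the fastest backend, mirroring the priority in the table above."""
    if shutil.which("nvidia-smi"):                   # NVIDIA driver present
        return "cuda"                                # -> TensorRT .engine
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"                                 # -> CoreML .mlpackage
    if shutil.which("rocminfo"):                     # ROCm stack present
        return "rocm"                                # -> ONNX Runtime on GPU
    # A real detector would also probe for Intel/OpenVINO devices here.
    return "cpu"                                     # -> ONNX Runtime on CPU
```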

**Apple Silicon note:** Detection defaults to `cpu_and_ne` (CPU + Neural Engine), keeping the GPU free for LLM/VLM inference. Set `compute_units: all` to include the GPU if you are not running a local LLM.
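If nothing else needs the GPU, the override looks like this in the skill config (YAML-style key/value, matching the `auto_start` example below; the exact config file location is not specified here):

```yaml
# Let CoreML schedule detection on CPU, GPU, and Neural Engine.
compute_units: all
```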

## How It Works

1. `deploy.sh` detects your hardware via `env_config.HardwareEnv.detect()`.
2. It installs the matching `requirements_{backend}.txt` (e.g. CUDA → includes `tensorrt`).
3. It pre-converts the default model to the optimal format.
4. At runtime, `detect.py` loads the cached optimized model automatically.
5. If optimization fails, the skill falls back to PyTorch.

Set `use_optimized: false` to disable auto-conversion and use raw PyTorch.
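Steps 4-5 amount to a lookup with a fallback. A minimal sketch, assuming a model cache directory and the suffix mapping from the table above (file layout and function name are illustrative, not the skill's real paths):

```python
from pathlib import Path

# Assumed mapping from backend to optimized artifact suffix.
FORMAT_SUFFIX = {"cuda": ".engine", "mps": ".mlpackage", "intel": ".xml",
                 "rocm": ".onnx", "cpu": ".onnx"}

def resolve_model(model: str, backend: str, use_optimized: bool = True,
                  cache_dir: str = "models") -> Path:
    """Return the cached optimized model if present, else the .pt weights."""
    if use_optimized:
        optimized = Path(cache_dir) / f"{model}{FORMAT_SUFFIX[backend]}"
        if optimized.exists():
            return optimized                     # step 4: cached artifact
    return Path(cache_dir) / f"{model}.pt"       # step 5: PyTorch fallback
```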

## Auto Start

Set `auto_start: true` in the skill config to start detection automatically when Aegis launches. The skill will begin processing frames from the selected camera immediately.

```yaml
auto_start: true
model_size: nano
fps: 5
```

## Performance Monitoring

The skill emits `perf_stats` events every 50 frames with aggregate timing:

```json
{"event": "perf_stats", "total_frames": 50, "timings_ms": {
  "inference": {"avg": 3.4, "p50": 3.2, "p95": 5.1},
  "postprocess": {"avg": 0.15, "p50": 0.12, "p95": 0.31},
  "total": {"avg": 3.6, "p50": 3.4, "p95": 5.5}
}}
```
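A payload in that shape can be produced from raw per-frame timings with stdlib tools alone. This is a hedged sketch of the aggregation, not the skill's actual implementation; the nearest-rank percentile and rounding choices are assumptions:

```python
import json
import statistics

def perf_stats(timings: dict[str, list[float]], total_frames: int) -> str:
    """Summarize per-stage timing samples (ms) into a perf_stats JSON line."""
    def summarize(samples: list[float]) -> dict:
        s = sorted(samples)
        pct = lambda q: s[min(len(s) - 1, int(q * len(s)))]  # nearest-rank
        return {"avg": round(statistics.fmean(s), 2),
                "p50": round(pct(0.50), 2),
                "p95": round(pct(0.95), 2)}
    return json.dumps({"event": "perf_stats",
                       "total_frames": total_frames,
                       "timings_ms": {k: summarize(v) for k, v in timings.items()}})
```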

## Protocol

The skill communicates via JSON lines over stdin/stdout.

### Aegis → Skill (stdin)

```json
{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080}
```

### Skill → Aegis (stdout)

```json
{"event": "ready", "model": "yolo2026n", "device": "mps", "backend": "mps", "format": "coreml", "gpu": "Apple M3", "classes": 80, "fps": 5}
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "objects": [
  {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}
]}
{"event": "perf_stats", "total_frames": 50, "timings_ms": {"inference": {"avg": 3.4}}}
{"event": "error", "message": "...", "retriable": true}
```

## Bounding Box Format

`[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy).
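Downstream consumers sometimes want width/height boxes instead. A tiny helper (illustrative only, not part of the skill) converts the xyxy boxes emitted above:

```python
def xyxy_to_xywh(bbox: list[int]) -> list[int]:
    """Convert [x_min, y_min, x_max, y_max] pixel coordinates
    to [x_min, y_min, width, height]."""
    x1, y1, x2, y2 = bbox
    return [x1, y1, x2 - x1, y2 - y1]
```

For the example detection above, `[100, 50, 300, 400]` becomes `[100, 50, 200, 350]`.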

## Stop Command

```json
{"command": "stop"}
```
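On the skill side, the whole protocol reduces to a small read-parse-respond loop. A hedged sketch of one line's handling (the `detect` callable is a hypothetical stand-in for the real model call, and field handling is an assumption based on the examples above):

```python
import json

def handle_line(line: str, detect=lambda frame_path: []):
    """Handle one stdin line; return the stdout reply, or None on stop."""
    msg = json.loads(line)
    if msg.get("command") == "stop":
        return None                                  # caller exits its loop
    if msg.get("event") == "frame":
        return json.dumps({
            "event": "detections",
            "frame_id": msg["frame_id"],
            "camera_id": msg["camera_id"],
            "objects": detect(msg["frame_path"]),    # [{"class": ..., ...}]
        })
    return json.dumps({"event": "error",
                       "message": f"unexpected message: {line!r}",
                       "retriable": True})
```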

## Installation

The `deploy.sh` bootstrapper handles everything: Python environment, GPU backend detection, dependency installation, and model optimization. No manual setup required.

```shell
./deploy.sh
```

## Requirements Files

| File | Backend | Key Deps |
|---|---|---|
| `requirements_cuda.txt` | NVIDIA | `torch` (cu124), `tensorrt` |
| `requirements_mps.txt` | Apple | `torch`, `coremltools` |
| `requirements_intel.txt` | Intel | `torch`, `openvino` |
| `requirements_rocm.txt` | AMD | `torch` (rocm6.2), `onnxruntime-rocm` |
| `requirements_cpu.txt` | CPU | `torch` (cpu), `onnxruntime` |