DeepCamera Home Security AI Benchmark
LLM & VLM evaluation suite for home security AI applications
git clone https://github.com/SharpAI/DeepCamera
T=$(mktemp -d) && git clone --depth=1 https://github.com/SharpAI/DeepCamera "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/analysis/home-security-benchmark" ~/.claude/skills/sharpai-deepcamera-home-security-ai-benchmark && rm -rf "$T"
skills/analysis/home-security-benchmark/SKILL.md
Home Security AI Benchmark
Comprehensive benchmark suite evaluating LLM and VLM models on 143 tests across 16 suites — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
Setup
Requires npm install. This skill has a package.json with dependencies (e.g. the openai SDK). The deployment agent should check for package.json and run npm install in the skill directory before launching.
Entry script: scripts/run-benchmark.cjs
Verification
node scripts/run-benchmark.cjs --help
Quick Start
As an Aegis Skill (automatic)
When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your LLM gateway and VLM server automatically, generates an HTML report, and opens it when complete.
Standalone
# LLM-only (VLM tests skipped)
node scripts/run-benchmark.cjs

# With VLM tests (base URL without /v1 suffix)
node scripts/run-benchmark.cjs --vlm http://localhost:5405

# Custom LLM gateway
node scripts/run-benchmark.cjs --gateway http://localhost:5407

# Skip report auto-open
node scripts/run-benchmark.cjs --no-open
Configuration
Environment Variables (set by Aegis)
| Variable | Default | Description |
|---|---|---|
| AEGIS_GATEWAY_URL | | LLM gateway (OpenAI-compatible) |
| | — | Direct llama-server LLM endpoint |
| | | LLM provider type (builtin, openai, etc.) |
| | — | LLM model name |
| | — | API key for cloud LLM providers |
| | — | Cloud provider base URL |
| AEGIS_VLM_URL | (disabled) | VLM server base URL |
| | — | Loaded VLM model ID |
| AEGIS_SKILL_ID | — | Skill identifier (enables skill mode) |
| AEGIS_SKILL_PARAMS | | JSON params from skill config |
Note: URLs should be base URLs (e.g. http://localhost:5405). The benchmark appends /v1/chat/completions automatically. Including a /v1 suffix is also accepted; it will be stripped to avoid double-pathing.
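The URL handling described in the note can be sketched as follows. This is a minimal illustration, not the code in run-benchmark.cjs; the function name toChatCompletionsUrl is hypothetical, and the handling of trailing slashes is an assumption.

```javascript
// Hypothetical helper (not taken from run-benchmark.cjs): builds the
// chat-completions endpoint from a base URL, tolerating an optional /v1 suffix.
function toChatCompletionsUrl(baseUrl) {
  const base = baseUrl
    .replace(/\/+$/, '')   // assumption: trailing slashes are ignored
    .replace(/\/v1$/, ''); // strip a trailing /v1 to avoid /v1/v1/...
  return `${base}/v1/chat/completions`;
}

// Both spellings of the VLM base URL resolve to the same endpoint.
console.log(toChatCompletionsUrl('http://localhost:5405'));
console.log(toChatCompletionsUrl('http://localhost:5405/v1'));
```

Stripping the suffix before appending the path keeps both accepted forms of the base URL equivalent.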
User Configuration (config.yaml)
This skill includes a config.yaml that defines user-configurable parameters. Aegis parses this at install time and renders a config panel in the UI. Values are delivered via AEGIS_SKILL_PARAMS.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | select | | Which suites to run: LLM suites (96 tests), VLM suite (47 tests), or all (143 tests) |
| | boolean | | Skip auto-opening the HTML report in browser |
Platform parameters like AEGIS_GATEWAY_URL and AEGIS_VLM_URL are auto-injected by Aegis; they are not in config.yaml. See Aegis Skill Platform Parameters for the full platform contract.
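Because user configuration arrives as a JSON string in AEGIS_SKILL_PARAMS, a skill needs to decode it defensively. The sketch below shows one way to do that. The function name readSkillParams and the choice to fall back to an empty object on missing or malformed input are assumptions, not behaviour confirmed by the benchmark source.

```javascript
// Hypothetical helper: decodes AEGIS_SKILL_PARAMS into a plain object.
// The keys inside the object are whatever config.yaml defines.
function readSkillParams(env = process.env) {
  const raw = env.AEGIS_SKILL_PARAMS;
  if (!raw) return {}; // variable unset or empty, e.g. a standalone run
  try {
    const parsed = JSON.parse(raw);
    const isPlainObject =
      parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed);
    return isPlainObject ? parsed : {};
  } catch {
    return {}; // assumption: malformed JSON falls back to no params
  }
}

console.log(readSkillParams({ AEGIS_SKILL_PARAMS: '{}' }));
```

Passing the environment as an argument keeps the helper easy to exercise without mutating process.env.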
CLI Arguments (standalone fallback)
| Argument | Default | Description |
|---|---|---|
| --gateway | | LLM gateway |
| --vlm | (disabled) | VLM server base URL |
| | | Results directory |
| | (auto in skill mode) | Force report generation |
| --no-open | — | Don't auto-open report in browser |
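The interplay between the Aegis environment variables and the standalone CLI flags can be sketched roughly as below. This is an illustration under stated assumptions: the function resolveEndpoints is hypothetical, and the precedence (environment first, CLI flag second) is assumed from the "standalone fallback" heading rather than confirmed by the script.

```javascript
// Hypothetical sketch: environment variables first, CLI flags as fallback.
// The precedence order is an assumption, not confirmed behaviour.
function resolveEndpoints(env, argv) {
  // Return the value following a flag, or undefined if the flag is absent.
  const flagValue = (name) => {
    const i = argv.indexOf(name);
    return i !== -1 && i + 1 < argv.length ? argv[i + 1] : undefined;
  };
  return {
    skillMode: Boolean(env.AEGIS_SKILL_ID), // skill ID enables skill mode
    gatewayUrl: env.AEGIS_GATEWAY_URL || flagValue('--gateway'),
    vlmUrl: env.AEGIS_VLM_URL || flagValue('--vlm') || null, // null: VLM tests skipped
    openReport: !argv.includes('--no-open'),
  };
}

console.log(resolveEndpoints({}, ['--vlm', 'http://localhost:5405', '--no-open']));
```

In this sketch gatewayUrl stays undefined when neither source provides it; the real script applies its own default.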
Protocol
Aegis → Skill (env vars)
AEGIS_GATEWAY_URL=http://localhost:5407
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=home-security-benchmark
AEGIS_SKILL_PARAMS={}
Skill → Aegis (stdout, JSON lines)
{"event": "ready", "model": "Qwen3.5-4B-Q4_1", "system": "Apple M3"}
{"event": "suite_start", "suite": "Context Preprocessing"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 123}
{"event": "suite_end", "suite": "...", "passed": 4, "failed": 0}
{"event": "complete", "passed": 126, "total": 131, "timeMs": 322000, "reportPath": "/path/to/report.html"}
Human-readable output goes to stderr (visible in Aegis console tab).
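A consumer of the stdout stream can treat each line as an independent JSON document. The sketch below shows one possible way to fold the events listed above into a summary. The function name summarizeEvents and the decision to skip lines that are not valid JSON are assumptions; only the event names and fields come from the protocol example.

```javascript
// Hypothetical consumer: folds the skill's JSON-lines stdout into a summary.
function summarizeEvents(stdoutText) {
  const summary = { suites: 0, passed: 0, failed: 0, reportPath: null };
  for (const line of stdoutText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // assumption: lines that are not JSON are ignored
    }
    if (event.event === 'suite_end') {
      summary.suites += 1;
      summary.passed += event.passed;
      summary.failed += event.failed;
    } else if (event.event === 'complete') {
      summary.reportPath = event.reportPath ?? null;
    }
  }
  return summary;
}

const sample = [
  '{"event": "suite_start", "suite": "Context Preprocessing"}',
  '{"event": "suite_end", "suite": "Context Preprocessing", "passed": 4, "failed": 0}',
  '{"event": "complete", "passed": 4, "total": 4, "timeMs": 1000, "reportPath": "/path/to/report.html"}',
].join('\n');
console.log(summarizeEvents(sample));
```

Keeping machine-readable events on stdout and human-readable text on stderr is what lets a parser like this stay simple.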
Test Suites (143 Tests)
| Suite | Tests | Domain |
|---|---|---|
| Context Preprocessing | 6 | Conversation dedup accuracy |
| Topic Classification | 4 | Topic extraction & change detection |
| Knowledge Distillation | 5 | Fact extraction, slug matching |
| Event Deduplication | 8 | Security event classification |
| Tool Use | 16 | Tool selection & parameter extraction |
| Chat & JSON Compliance | 11 | Persona, memory, structured output |
| Security Classification | 12 | Threat level assessment |
| Narrative Synthesis | 4 | Multi-camera event summarization |
| Prompt Injection Resistance | 4 | Adversarial prompt defense |
| Multi-Turn Reasoning | 4 | Context resolution over turns |
| Error Recovery & Edge Cases | 4 | Graceful failure handling |
| Privacy & Compliance | 3 | PII handling, consent |
| Alert Routing & Subscription | 5 | Channel targeting, schedule CRUD |
| Knowledge Injection to Dialog | 5 | KI-personalized responses |
| VLM-to-Alert Triage | 5 | Urgency classification from VLM |
| VLM Scene Analysis | 47 | Frame entity detection & description (outdoor + indoor safety) |
Results
Results are saved to ~/.aegis-ai/benchmarks/ as JSON. An HTML report with cross-model comparison is auto-generated and opened in the browser after each run.
Requirements
- Node.js ≥ 18
- npm install (for openai SDK dependency)
- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (47 tests)