# claude-ops ops-monitor
Unified APM and monitoring surface. Polls Datadog, New Relic, and OpenTelemetry backends for active alerts, error traces, and entity health. Use --watch for live polling every 60 seconds. Use --setup to configure monitoring credentials.
## Installation

```sh
git clone https://github.com/Lifecycle-Innovations-Limited/claude-ops
```

Or copy just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/Lifecycle-Innovations-Limited/claude-ops "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/ops-monitor" ~/.claude/skills/lifecycle-innovations-limited-claude-ops-ops-monitor \
  && rm -rf "$T"
```

Note: `git clone <url> "$T"` places the repository contents directly in `$T`, so the skill lives at `$T/skills/ops-monitor`.
Skill file: `claude-ops/skills/ops-monitor/SKILL.md`

## Runtime Context
```sh
PREFS="${CLAUDE_PLUGIN_DATA_DIR:-$HOME/.claude/plugins/data/ops-ops-marketplace}/preferences.json"
DD_API_KEY=$(jq -r '.datadog_api_key // empty' "$PREFS" 2>/dev/null)
NR_API_KEY=$(jq -r '.newrelic_api_key // empty' "$PREFS" 2>/dev/null)
OTEL_ENDPOINT=$(jq -r '.otel_endpoint // empty' "$PREFS" 2>/dev/null)
```
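The variables above are read from a small JSON preferences file. A sketch of a hypothetical layout (placeholder values and a temp path stand in for the real plugin data dir); note how `// empty` makes unset keys resolve to an empty string, so downstream checks can simply test `[ -n "$VAR" ]`:

```shell
# Hypothetical preferences.json (placeholder values, temp path for the demo)
PREFS=$(mktemp)
cat > "$PREFS" <<'EOF'
{
  "datadog_api_key": "dd-placeholder",
  "newrelic_api_key": "NRAK-placeholder",
  "otel_endpoint": "https://otlp.grafana.net"
}
EOF
# A present key yields its value; a missing key yields ""
DD_API_KEY=$(jq -r '.datadog_api_key // empty' "$PREFS")
NR_ACCOUNT=$(jq -r '.newrelic_account_id // empty' "$PREFS")
echo "datadog: ${DD_API_KEY:-unset}"            # → datadog: dd-placeholder
echo "newrelic account: ${NR_ACCOUNT:-unset}"   # → newrelic account: unset
```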
Determine the `$ARGUMENTS` mode:

- Contains `--setup` → run Setup flow
- Contains `--watch` → run Watch mode
- Otherwise → run Default health check
## Setup flow (`--setup`)

Ask which backends to configure:
Which monitoring backends would you like to configure? [Datadog] [New Relic] [OpenTelemetry] [All three]
For each selected backend, collect credentials via `AskUserQuestion` free-text input (one at a time, ≤4 options per call):
**Datadog:**
- `datadog_api_key`: API Key, from app.datadoghq.com/organization-settings/api-keys
- `datadog_app_key`: Application Key, from app.datadoghq.com/organization-settings/application-keys

**New Relic:**
- `newrelic_api_key`: User API Key, from one.newrelic.com/api-keys
- `newrelic_account_id`: numeric Account ID, from the New Relic admin portal

**OpenTelemetry:**
- `otel_endpoint`: base URL of your OTEL-compatible backend (e.g., https://otlp.grafana.net)
Write each credential to `preferences.json` using an atomic tmpfile swap:

```sh
tmp=$(mktemp)
jq --arg k "$KEY" --arg v "$VALUE" '.[$k] = $v' "$PREFS" > "$tmp" && mv "$tmp" "$PREFS"
```
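A minimal runnable sketch of the swap, additionally seeding an empty object when the file does not exist yet (jq fails on a missing input file). Path and value here are placeholders, not the real plugin data dir:

```shell
PREFS=$(mktemp -d)/preferences.json
KEY=datadog_api_key
VALUE=dd-placeholder
[ -f "$PREFS" ] || echo '{}' > "$PREFS"   # seed: jq cannot read a missing file
tmp=$(mktemp)
jq --arg k "$KEY" --arg v "$VALUE" '.[$k] = $v' "$PREFS" > "$tmp" && mv "$tmp" "$PREFS"
jq -r '.datadog_api_key' "$PREFS"         # → dd-placeholder
```

The tmpfile-then-`mv` dance matters because `jq … "$PREFS" > "$PREFS"` would truncate the input before jq reads it.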
Run a smoke test after saving:

- Datadog: `curl -sf -H "DD-API-KEY: $DD_API_KEY" -H "DD-APPLICATION-KEY: $DD_APP_KEY" "https://api.datadoghq.com/api/v1/validate"` → expect `{"valid": true}`
- New Relic: `curl -sf -H "Api-Key: $NR_API_KEY" -H "Content-Type: application/json" "https://api.newrelic.com/graphql" -d '{"query":"{ actor { user { name } } }"}'` → expect `data.actor.user` in the response
- OTEL: `curl -sf "$OTEL_ENDPOINT/healthz"` → expect HTTP 200
Report ✅ or ❌ with status for each backend.
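The ✅/❌ reporting could be wrapped in a small helper like this sketch (`check` is a hypothetical name; the real commands are the curl smoke tests above, stubbed here with `true`/`false` so the example runs offline):

```shell
# check <backend-name> <command...>: run the command, print ✅ on success, ❌ on failure
check() {
  local name=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf '✅ %s: reachable\n' "$name"
  else
    printf '❌ %s: smoke test failed\n' "$name"
  fi
}
# Real usage would be e.g.: check OTEL curl -sf "$OTEL_ENDPOINT/healthz"
check demo-ok true     # → ✅ demo-ok: reachable
check demo-bad false   # → ❌ demo-bad: smoke test failed
```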
## Agent Teams support

If `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` is set, use Agent Teams when querying multiple backends simultaneously. This enables:

- Backend probes running in parallel with shared context (e.g., the Datadog agent detects a latency spike → the OTEL agent can correlate it with traces)
- Steering: "focus on Datadog alerts first, then cross-reference with New Relic"
- Real-time progress: agents report per-backend as results arrive
Team setup (only when the flag is enabled and multiple backends are configured):

```
TeamCreate("monitor-probes")
Agent(team_name="monitor-probes", name="datadog-probe", subagent_type="ops:monitor-agent", ...)
Agent(team_name="monitor-probes", name="newrelic-probe", subagent_type="ops:monitor-agent", ...)
Agent(team_name="monitor-probes", name="otel-probe", subagent_type="ops:monitor-agent", ...)
```
If the flag is NOT set or only one backend is configured, use a single `monitor-agent` subagent.
## Default health check (no flags)

Spawn `monitor-agent` via the Agent tool. Display the result as a formatted dashboard:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
OPS ► MONITOR                            [<timestamp>]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATADOG     ✅ healthy (0 alerts)
NEW RELIC   🔴 2 critical entities
OTEL        ✅ healthy
──────────────────────────────────────────────────────
Total alerts: 2            Severity: CRITICAL
```
Status icons:

- ✅ healthy (0 alerts / configured and reachable)
- ⚠️ warning (warn-level alerts present)
- 🔴 critical (critical alerts or unreachable)
- ⬜ not configured
For each alert or critical entity, display: service name, alert name, and link to the relevant dashboard.
If no backends are configured, show a setup prompt:

> No monitoring backends configured. Run `/ops:monitor --setup` to add Datadog, New Relic, or OTEL.
## Watch mode (`--watch`)

Poll every 60 seconds. On each tick:
```sh
while true; do
  RESULT=$( : )  # spawn monitor-agent and capture JSON output
  # Diff against previous tick
  # Print: timestamp, changed items only
  #   🆕 new alert: <name>
  #   ✅ resolved: <name>
  sleep 60
done
```
Exit on Ctrl-C.
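The per-tick diff can be sketched with jq set subtraction over the alert-name lists from consecutive ticks (sample data inline in place of real agent output):

```shell
prev='["db latency"]'                 # names seen last tick
curr='["db latency","api 5xx"]'       # names seen this tick
DIFF=$(jq -nr --argjson a "$prev" --argjson b "$curr" '
  (($b - $a) | map("🆕 new alert: " + .)) +
  (($a - $b) | map("✅ resolved: "  + .))
  | .[]')
echo "$DIFF"   # → 🆕 new alert: api 5xx
```

jq's `-` on arrays removes every element of the right operand from the left, which is exactly the "new" / "resolved" split needed here.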
## `--backend` filter

If `--backend datadog|newrelic|otel` is specified, query and display only that backend.
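One possible way to pull the filter out of `$ARGUMENTS` (a sketch; how arguments are actually parsed is up to the skill runtime):

```shell
ARGUMENTS="--backend datadog"   # example input
# Accept either "--backend datadog" or "--backend=datadog"
BACKEND=$(printf '%s\n' "$ARGUMENTS" | sed -n 's/.*--backend[= ]\([a-z]*\).*/\1/p')
case "$BACKEND" in
  datadog|newrelic|otel) echo "querying $BACKEND only" ;;
  "")                    echo "querying all configured backends" ;;
  *)                     echo "unknown backend: $BACKEND" >&2 ;;
esac
```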
## CLI/API Reference

| Backend | Auth header | Base URL | Health endpoint |
|---|---|---|---|
| Datadog | `DD-API-KEY` + `DD-APPLICATION-KEY` | https://api.datadoghq.com | `/api/v1/validate` |
| New Relic | `Api-Key` | https://api.newrelic.com/graphql | POST GraphQL query |
| OTEL | varies by backend | `$OTEL_ENDPOINT` | `/healthz` |
```sh
# Datadog — active alerts
curl -sf \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  "https://api.datadoghq.com/api/v1/monitor?monitor_tags=*&with_downtimes=false" \
  | jq '[.[] | select(.overall_state == "Alert" or .overall_state == "Warn")]'

# New Relic — critical entities (GraphQL)
curl -sf \
  -H "Api-Key: ${NR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ actor { entitySearch(queryBuilder: {alertSeverity: CRITICAL}) { results { entities { name alertSeverity entityType } } } } }"}' \
  "https://api.newrelic.com/graphql"

# OTEL — health check
curl -sf "${OTEL_ENDPOINT}/healthz"
```
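As a sketch of how a monitor list collapses into one dashboard line, with an inline sample payload standing in for the live Datadog response:

```shell
# Sample of the shape returned by /api/v1/monitor (two monitors, one alerting)
payload='[{"name":"db latency","overall_state":"Alert"},{"name":"disk space","overall_state":"OK"}]'
LINE=$(echo "$payload" | jq -r '
  [.[] | select(.overall_state == "Alert" or .overall_state == "Warn")]
  | if length == 0 then "DATADOG   ✅ healthy (0 alerts)"
    else "DATADOG   🔴 \(length) active alert(s)" end')
echo "$LINE"   # → DATADOG   🔴 1 active alert(s)
```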