Skillshub anth-observability
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/anth-observability" ~/.claude/skills/comeonoliver-skillshub-anth-observability && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/anth-observability/SKILL.mdsource content
Anthropic Observability
Overview
Instrument Claude API calls with structured logging, Prometheus metrics, and cost tracking. Every API response includes
usage data and rate limit headers — capture these for dashboards and alerting.
Structured Logging
import anthropic import logging import time import json logger = logging.getLogger("claude") def create_with_logging(client: anthropic.Anthropic, **kwargs) -> anthropic.types.Message: start = time.monotonic() request_meta = { "model": kwargs.get("model"), "max_tokens": kwargs.get("max_tokens"), "tool_count": len(kwargs.get("tools", [])), "stream": kwargs.get("stream", False), } try: response = client.messages.create(**kwargs) duration_ms = int((time.monotonic() - start) * 1000) logger.info(json.dumps({ "event": "claude.request", "request_id": response._request_id, "model": response.model, "input_tokens": response.usage.input_tokens, "output_tokens": response.usage.output_tokens, "cache_read_tokens": getattr(response.usage, "cache_read_input_tokens", 0), "stop_reason": response.stop_reason, "duration_ms": duration_ms, "content_blocks": len(response.content), })) return response except anthropic.APIStatusError as e: duration_ms = int((time.monotonic() - start) * 1000) logger.error(json.dumps({ "event": "claude.error", "status": e.status_code, "error_type": getattr(e, "type", "unknown"), "duration_ms": duration_ms, "request_id": e.response.headers.get("request-id", "unknown"), })) raise
Prometheus Metrics
from prometheus_client import Counter, Histogram, Gauge claude_requests = Counter( "claude_requests_total", "Total Claude API requests", ["model", "stop_reason", "status"] ) claude_latency = Histogram( "claude_latency_seconds", "Claude API latency", ["model"], buckets=[0.5, 1, 2, 5, 10, 30, 60] ) claude_tokens = Counter( "claude_tokens_total", "Token usage", ["model", "direction"] # direction: input|output|cache_read ) claude_cost = Counter( "claude_cost_usd", "Estimated cost in USD", ["model"] ) claude_rate_limit_remaining = Gauge( "claude_rate_limit_remaining", "Remaining rate limit", ["dimension"] # dimension: requests|tokens ) def track_metrics(response, duration: float): model = response.model claude_requests.labels(model=model, stop_reason=response.stop_reason, status="ok").inc() claude_latency.labels(model=model).observe(duration) claude_tokens.labels(model=model, direction="input").inc(response.usage.input_tokens) claude_tokens.labels(model=model, direction="output").inc(response.usage.output_tokens) # Cost estimation pricing = {"claude-haiku-4-20250514": (0.80, 4.0), "claude-sonnet-4-20250514": (3.0, 15.0)} rates = pricing.get(model, (3.0, 15.0)) cost = (response.usage.input_tokens * rates[0] + response.usage.output_tokens * rates[1]) / 1e6 claude_cost.labels(model=model).inc(cost)
Key Metrics Dashboard
| Metric | Description | Alert Threshold |
|---|---|---|
| Error count | > 5% of total |
p99 | Tail latency | > 10s |
daily | Daily spend | > 80% budget |
| RPM headroom | < 10% remaining |
rate | Output throughput | Spike detection |
Usage API (Server-Side)
# Anthropic's Usage & Cost API for billing reconciliation # GET https://api.anthropic.com/v1/usage # Returns daily token usage and cost per model
Error Handling
| Observability Gap | Risk | Fix |
|---|---|---|
| No request_id logged | Can't debug with support | Capture |
| Missing cost tracking | Budget surprise | Track per-request cost |
| No latency histogram | Can't spot slow queries | Add Prometheus/Datadog histograms |
Resources
Next Steps
For incident response, see
anth-incident-runbook.