Awesome-claude-code observability-knowledge

Observability knowledge base. Covers the three pillars (logs, metrics, traces), structured logging, distributed tracing, metrics collection (RED/USE), and SLI/SLO/SLA definitions for observability audits and generation.

install
source · Clone the upstream repo
git clone https://github.com/dykyi-roman/awesome-claude-code
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/dykyi-roman/awesome-claude-code "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/observability-knowledge" ~/.claude/skills/dykyi-roman-awesome-claude-code-observability-knowledge && rm -rf "$T"
manifest: skills/observability-knowledge/SKILL.md

Observability Knowledge Base

Quick reference for the three pillars of observability, instrumentation patterns, and SLI/SLO/SLA definitions in PHP applications.

Three Pillars Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                      THREE PILLARS OF OBSERVABILITY                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐          │
│   │      LOGS        │  │     METRICS      │  │     TRACES       │          │
│   │                  │  │                  │  │                  │          │
│   │  What happened   │  │  How much/many   │  │  How requests    │          │
│   │  (discrete       │  │  (aggregated     │  │  flow through    │          │
│   │   events)        │  │   measurements)  │  │  services)       │          │
│   │                  │  │                  │  │                  │          │
│   │  • Errors        │  │  • Counters      │  │  • Spans         │          │
│   │  • Audit trail   │  │  • Gauges        │  │  • Context       │          │
│   │  • Debug info    │  │  • Histograms    │  │  • Latency       │          │
│   │                  │  │                  │  │                  │          │
│   │  JSON structured │  │  Prometheus      │  │  OpenTelemetry   │          │
│   │  Monolog         │  │  StatsD          │  │  Jaeger/Zipkin   │          │
│   └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘          │
│            │                     │                      │                    │
│            └─────────────────────┼──────────────────────┘                    │
│                                  │                                           │
│                        ┌─────────▼─────────┐                                │
│                        │  CORRELATION ID   │                                │
│                        │  (links all three │                                │
│                        │   pillars)        │                                │
│                        └───────────────────┘                                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Structured Logging

JSON Log Format

| Field | Type | Description | Required |
|---|---|---|---|
| timestamp | ISO 8601 | When event occurred | Yes |
| level | string | RFC 5424 log level | Yes |
| message | string | Human-readable description | Yes |
| channel | string | Logger channel name | Yes |
| context | object | Structured event data | No |
| correlation_id | string | Request/trace identifier | Yes |
| service | string | Service/app name | Yes |
| environment | string | prod/staging/dev | Yes |
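
The field list above can be illustrated with a small, dependency-free sketch that builds one JSON log line. The function name `formatJsonLogLine` and its parameters are hypothetical, not part of Monolog; in a real application Monolog's JSON formatter would normally produce this output.

```php
<?php

declare(strict_types=1);

/**
 * Build one JSON log line containing the fields from the table above.
 * Hypothetical helper for illustration only.
 */
function formatJsonLogLine(
    string $level,
    string $message,
    string $channel,
    string $correlationId,
    string $service,
    string $environment,
    array $context = [],
    ?\DateTimeImmutable $now = null,
): string {
    $now ??= new \DateTimeImmutable('now', new \DateTimeZone('UTC'));

    $record = [
        'timestamp' => $now->format(\DateTimeInterface::ATOM), // ISO 8601
        'level' => strtoupper($level),
        'message' => $message,
        'channel' => $channel,
        'correlation_id' => $correlationId,
        'service' => $service,
        'environment' => $environment,
    ];

    // "context" is the only optional field, so emit it only when non-empty.
    if ($context !== []) {
        $record['context'] = $context;
    }

    return json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
}
```

Passing the clock in as `$now` keeps the function deterministic in tests; callers that omit it get the current UTC time.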

Log Levels (RFC 5424)

| Level | Code | When to Use |
|---|---|---|
| EMERGENCY | 0 | System is unusable |
| ALERT | 1 | Immediate action required |
| CRITICAL | 2 | Critical conditions (component failure) |
| ERROR | 3 | Runtime errors (not requiring immediate action) |
| WARNING | 4 | Exceptional but handled conditions |
| NOTICE | 5 | Normal but significant events |
| INFO | 6 | Informational messages (request processed) |
| DEBUG | 7 | Detailed debug information |
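
Because lower codes mean higher severity, a minimum-level filter compares codes with `<=`. The sketch below shows that comparison; the constant `RFC5424_SEVERITY` and the function `shouldLog` are hypothetical names chosen for illustration.

```php
<?php

declare(strict_types=1);

// Severity codes from the table above (lower number = more severe).
const RFC5424_SEVERITY = [
    'EMERGENCY' => 0,
    'ALERT' => 1,
    'CRITICAL' => 2,
    'ERROR' => 3,
    'WARNING' => 4,
    'NOTICE' => 5,
    'INFO' => 6,
    'DEBUG' => 7,
];

/**
 * Return true when $level is at least as severe as $minimumLevel.
 */
function shouldLog(string $level, string $minimumLevel): bool
{
    $levelCode = RFC5424_SEVERITY[strtoupper($level)]
        ?? throw new \InvalidArgumentException("Unknown level: {$level}");
    $minimumCode = RFC5424_SEVERITY[strtoupper($minimumLevel)]
        ?? throw new \InvalidArgumentException("Unknown level: {$minimumLevel}");

    return $levelCode <= $minimumCode;
}
```

With a minimum of WARNING, an ERROR record passes the filter and an INFO record does not.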

Monolog Context Processor

<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

use Monolog\LogRecord;
use Monolog\Processor\ProcessorInterface;

final readonly class CorrelationIdProcessor implements ProcessorInterface
{
    public function __construct(
        private CorrelationIdHolder $holder,
    ) {}

    public function __invoke(LogRecord $record): LogRecord
    {
        return $record->with(
            extra: array_merge($record->extra, [
                'correlation_id' => $this->holder->get(),
                'service' => $_ENV['APP_SERVICE_NAME'] ?? 'unknown',
                'environment' => $_ENV['APP_ENV'] ?? 'unknown',
            ]),
        );
    }
}

Correlation ID Holder

<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

final class CorrelationIdHolder
{
    private ?string $correlationId = null;

    public function set(string $correlationId): void
    {
        $this->correlationId = $correlationId;
    }

    public function get(): string
    {
        if ($this->correlationId === null) {
            // uuid_create() is provided by the PECL uuid extension (ext-uuid), not PHP core.
            $this->correlationId = uuid_create(UUID_TYPE_RANDOM);
        }

        return $this->correlationId;
    }
}
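
The holder only becomes useful once something fills it from the incoming request. The following sketch shows one way to pick a correlation ID from request headers before falling back to a generated value. The function name `resolveCorrelationId`, the accepted character set, and the 128-character limit are assumptions for illustration; the header names are the ones searched for in the detection patterns of this document.

```php
<?php

declare(strict_types=1);

/**
 * Pick a correlation ID from incoming headers, or generate one.
 *
 * @param array<string, string> $headers Header name => value
 */
function resolveCorrelationId(array $headers): string
{
    // HTTP header names are case-insensitive, so normalise before lookup.
    $normalized = array_change_key_case($headers, CASE_LOWER);

    foreach (['x-correlation-id', 'x-request-id'] as $name) {
        $value = $normalized[$name] ?? null;

        // Accept only short, log-safe values so a client cannot inject
        // newlines or huge strings into every log line (assumed policy).
        if (is_string($value) && preg_match('/\A[A-Za-z0-9._-]{1,128}\z/', $value) === 1) {
            return $value;
        }
    }

    // Fallback: 32 hex characters from a CSPRNG, no extension required.
    return bin2hex(random_bytes(16));
}
```

A middleware could call this once per request and pass the result to `CorrelationIdHolder::set()`, so that every log record produced afterwards carries the same identifier.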

Distributed Tracing

OpenTelemetry Concepts

| Concept | Description |
|---|---|
| Trace | End-to-end journey of a request across services |
| Span | Single unit of work within a trace (has start/end time) |
| SpanContext | Trace ID + Span ID + flags, propagated across boundaries |
| Attributes | Key-value metadata on spans |
| Events | Timestamped annotations within a span |
| Links | References from one span to other spans, in the same or a different trace |
| Baggage | Cross-cutting key-value pairs propagated with context |

W3C Trace Context Header

traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

tracestate: vendor1=value1,vendor2=value2

| Part | Length | Description |
|---|---|---|
| version | 2 hex | Always 00 |
| trace-id | 32 hex | Globally unique trace identifier |
| parent-id | 16 hex | ID of parent span |
| trace-flags | 2 hex | 01 = sampled |
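
The layout above is strict enough to validate with a single regular expression. The sketch below parses a version `00` `traceparent` value into its parts; the function name `parseTraceparent` and the returned array keys are hypothetical. In practice the OpenTelemetry SDK's propagator handles this, so the code is only meant to make the format concrete.

```php
<?php

declare(strict_types=1);

/**
 * Parse a W3C traceparent header value (version 00 layout).
 *
 * @return array{version: string, trace_id: string, parent_id: string, sampled: bool}|null
 *         null when the header is malformed
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/\A([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/';

    if (preg_match($pattern, trim($header), $matches) !== 1) {
        return null;
    }

    [, $version, $traceId, $parentId, $flags] = $matches;

    // The spec treats version "ff" and all-zero IDs as invalid.
    if (
        $version === 'ff'
        || $traceId === str_repeat('0', 32)
        || $parentId === str_repeat('0', 16)
    ) {
        return null;
    }

    return [
        'version' => $version,
        'trace_id' => $traceId,
        'parent_id' => $parentId,
        // Bit 0 of trace-flags is the "sampled" flag.
        'sampled' => (hexdec($flags) & 0x01) === 0x01,
    ];
}
```

Returning `null` for anything malformed lets the caller start a fresh trace instead of propagating a broken context.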

OpenTelemetry PHP SDK Setup

<?php

declare(strict_types=1);

namespace Infrastructure\Telemetry;

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;

final readonly class TracingService
{
    private TracerInterface $tracer;

    public function __construct(string $serviceName = 'my-app')
    {
        $this->tracer = Globals::tracerProvider()->getTracer($serviceName);
    }

    public function traceOperation(string $operationName, callable $operation, array $attributes = []): mixed
    {
        $span = $this->tracer
            ->spanBuilder($operationName)
            ->setSpanKind(SpanKind::KIND_INTERNAL)
            ->startSpan();

        $scope = $span->activate();

        try {
            foreach ($attributes as $key => $value) {
                $span->setAttribute($key, $value);
            }

            $result = $operation();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            $span->recordException($e);

            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }

    public function traceHttpClient(string $method, string $url, callable $request): mixed
    {
        $span = $this->tracer
            ->spanBuilder(sprintf('%s %s', $method, $url))
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setAttribute('http.method', $method)
            ->setAttribute('http.url', $url)
            ->startSpan();

        $scope = $span->activate();

        try {
            $result = $request();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            // Record the exception as a span event, matching traceOperation().
            $span->recordException($e);

            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}

Metrics

RED Method (Request-Driven Services)

| Metric | What | Unit | Example |
|---|---|---|---|
| Rate | Requests per second | req/s | HTTP requests per second by endpoint |
| Errors | Failed requests per second | err/s | 5xx responses per second |
| Duration | Latency distribution | ms | Response time p50, p95, p99 |
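
To make the three RED numbers concrete, the sketch below derives them from a list of raw request samples. The function names, the input shape, and the nearest-rank percentile method are assumptions for illustration; a Prometheus setup would estimate percentiles from histogram buckets instead of raw samples.

```php
<?php

declare(strict_types=1);

/**
 * Nearest-rank percentile over an ascending-sorted, non-empty list.
 *
 * @param list<float> $sorted
 */
function nearestRankPercentile(array $sorted, int $percentile): float
{
    $rank = (int) ceil($percentile * count($sorted) / 100);

    return (float) $sorted[max(1, $rank) - 1];
}

/**
 * Summarise Rate, Errors and Duration for one time window.
 *
 * @param list<array{status: int, duration_ms: float}> $requests
 * @return array<string, float>
 */
function summarizeRed(array $requests, float $windowSeconds): array
{
    if ($requests === [] || $windowSeconds <= 0.0) {
        throw new \InvalidArgumentException('Need at least one request and a positive window.');
    }

    $durations = array_column($requests, 'duration_ms');
    sort($durations);

    // Errors are counted as 5xx responses, as in the table above.
    $errors = count(array_filter(
        $requests,
        static fn (array $request): bool => $request['status'] >= 500,
    ));

    return [
        'rate_per_s' => count($requests) / $windowSeconds,
        'errors_per_s' => $errors / $windowSeconds,
        'p50_ms' => nearestRankPercentile($durations, 50),
        'p95_ms' => nearestRankPercentile($durations, 95),
        'p99_ms' => nearestRankPercentile($durations, 99),
    ];
}
```

With few samples the high percentiles collapse onto the slowest request, which is one reason p99 is only meaningful over windows with enough traffic.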

USE Method (Resource-Oriented)

| Metric | What | Example |
|---|---|---|
| Utilization | % time resource is busy | CPU usage, disk I/O |
| Saturation | Queued work | Request queue length |
| Errors | Error count | Disk errors, connection failures |

Golden Signals (Google SRE)

| Signal | Description | RED Equivalent |
|---|---|---|
| Latency | Time to service a request | Duration |
| Traffic | Demand on the system | Rate |
| Errors | Rate of failed requests | Errors |
| Saturation | How full the system is | (USE method) |

Prometheus PHP Client

<?php

declare(strict_types=1);

namespace Infrastructure\Metrics;

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\Redis;

final class PrometheusMetricsCollector
{
    private readonly CollectorRegistry $registry;

    public function __construct(\Redis $redis)
    {
        $adapter = Redis::fromExistingConnection($redis);
        $this->registry = new CollectorRegistry($adapter);
    }

    public function incrementRequestCount(string $method, string $route, int $statusCode): void
    {
        $counter = $this->registry->getOrRegisterCounter(
            'app',
            'http_requests_total',
            'Total HTTP requests',
            ['method', 'route', 'status_code'],
        );

        $counter->inc([$method, $route, (string) $statusCode]);
    }

    public function observeRequestDuration(string $method, string $route, float $durationSeconds): void
    {
        $histogram = $this->registry->getOrRegisterHistogram(
            'app',
            'http_request_duration_seconds',
            'HTTP request duration in seconds',
            ['method', 'route'],
            [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
        );

        $histogram->observe($durationSeconds, [$method, $route]);
    }

    public function setActiveConnections(int $count): void
    {
        $gauge = $this->registry->getOrRegisterGauge(
            'app',
            'active_connections',
            'Current active connections',
            [],
        );

        $gauge->set($count, []);
    }

    public function renderMetrics(): string
    {
        $renderer = new RenderTextFormat();

        return $renderer->render($this->registry->getMetricFamilySamples());
    }
}

SLI / SLO / SLA

| Concept | Definition | Example |
|---|---|---|
| SLI (Service Level Indicator) | Measurable metric of service behavior | Share of requests completed within 200ms |
| SLO (Service Level Objective) | Target value for an SLI | 99.9% of requests within 200ms |
| SLA (Service Level Agreement) | Contract with consequences | 99.5% uptime or credit issued |

Common SLIs

| SLI Type | Formula | Target (SLO) |
|---|---|---|
| Availability | successful_requests / total_requests | 99.9% (three nines) |
| Latency | requests_under_threshold / total_requests | 99% < 200ms, 99.9% < 1s |
| Error Rate | error_requests / total_requests | < 0.1% |
| Throughput | requests / time_window | > 1000 req/s |
| Freshness | time_since_last_update | < 5 minutes |
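
Most of the formulas above are a ratio of good events to total events, compared against a target. A minimal sketch of that check follows; `ratioSli` and `meetsSlo` are hypothetical names, and treating a window with no traffic as compliant is an assumed policy, not something the source prescribes.

```php
<?php

declare(strict_types=1);

/**
 * Ratio-style SLI: good events divided by total events.
 * Returns null when there was no traffic in the window.
 */
function ratioSli(int $goodEvents, int $totalEvents): ?float
{
    if ($goodEvents < 0 || $totalEvents < 0 || $goodEvents > $totalEvents) {
        throw new \InvalidArgumentException('Expected 0 <= good <= total.');
    }

    return $totalEvents === 0 ? null : $goodEvents / $totalEvents;
}

/**
 * Compare an SLI against its SLO target (e.g. 0.999 for 99.9%).
 * Assumption: an empty window (null SLI) does not violate the SLO.
 */
function meetsSlo(?float $sli, float $target): bool
{
    return $sli === null || $sli >= $target;
}
```

For availability, `$goodEvents` would be successful requests; for latency, requests faster than the threshold.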

Error Budget

Error Budget = 1 - SLO

Example: SLO = 99.9%
Error Budget = 0.1% = ~43 minutes/month downtime allowed

Budget remaining = Error Budget - Actual Errors
If budget exhausted → freeze deployments, focus on reliability
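
The arithmetic behind the 43-minute figure can be written out as a short sketch. The function names and the 30-day default window are assumptions for illustration: 0.1% of 30 days (43,200 minutes) is 43.2 minutes.

```php
<?php

declare(strict_types=1);

/**
 * Allowed downtime in minutes for a given SLO over a window.
 * Error budget = 1 - SLO, applied to the window length.
 */
function errorBudgetMinutes(float $sloPercent, int $windowDays = 30): float
{
    if ($sloPercent <= 0.0 || $sloPercent > 100.0 || $windowDays < 1) {
        throw new \InvalidArgumentException('SLO must be in (0, 100] and window >= 1 day.');
    }

    $windowMinutes = $windowDays * 24 * 60;

    return (1.0 - $sloPercent / 100.0) * $windowMinutes;
}

/**
 * Budget remaining = error budget - downtime already spent.
 */
function remainingBudgetMinutes(float $budgetMinutes, float $downtimeMinutes): float
{
    return $budgetMinutes - $downtimeMinutes;
}

/**
 * Once the budget is spent, deployments are frozen in favour of reliability work.
 */
function shouldFreezeDeployments(float $remainingMinutes): bool
{
    return $remainingMinutes <= 0.0;
}
```

Tightening the SLO from 99.9% to 99.99% shrinks the same 30-day budget to about 4.3 minutes, which shows how quickly each extra nine raises the cost.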

Quick Reference Tables

Observability Tool Selection

| Need | Tool/Library | PHP Integration |
|---|---|---|
| Structured logging | Monolog | monolog/monolog |
| Log aggregation | ELK Stack, Loki | Monolog handlers |
| Metrics collection | Prometheus | promphp/prometheus_client_php |
| Metrics visualization | Grafana | Prometheus data source |
| Distributed tracing | Jaeger, Zipkin | OpenTelemetry PHP SDK |
| APM | Datadog, New Relic | PHP extensions/agents |
| Error tracking | Sentry | sentry/sentry |
| Health checks | Custom endpoint | PSR-15 middleware |

Alerting Thresholds

| Alert | Condition | Severity |
|---|---|---|
| High error rate | > 1% of requests 5xx | Critical |
| High latency | p99 > 2s for 5 min | Warning |
| Service down | Health check fails 3x | Critical |
| Disk usage | > 85% used | Warning |
| Queue backlog | > 10k unprocessed | Warning |
| Memory usage | > 90% for 10 min | Critical |
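
Several conditions above have a duration ("for 5 min", "for 10 min", "fails 3x"), which means the alert fires only when every recent sample breaches the threshold. The sketch below shows that check over a series of evenly spaced samples; the function name `isSustainedBreach` and the one-sample-per-interval model are assumptions. Alerting systems such as Prometheus express the same idea declaratively in rule configuration.

```php
<?php

declare(strict_types=1);

/**
 * True when the last $requiredSamples values all exceed $threshold.
 *
 * @param list<float> $samples Oldest first, one value per scrape interval
 */
function isSustainedBreach(array $samples, float $threshold, int $requiredSamples): bool
{
    // Not enough history yet: do not alert on partial data.
    if ($requiredSamples < 1 || count($samples) < $requiredSamples) {
        return false;
    }

    foreach (array_slice($samples, -$requiredSamples) as $value) {
        if ($value <= $threshold) {
            return false; // one healthy sample resets the condition
        }
    }

    return true;
}
```

With one p99 sample per minute, "p99 > 2s for 5 min" becomes `isSustainedBreach($p99Seconds, 2.0, 5)`.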

Common Violations Quick Reference

| Violation | Where to Look | Severity |
|---|---|---|
| No structured logging (plain text) | Logger config, log output | Warning |
| Missing correlation IDs | Middleware, log processors | Critical |
| No metrics endpoint | Routes, health controllers | Warning |
| Untraced external calls | HTTP clients, adapters | Warning |
| Swallowed exceptions without logging | Catch blocks | Critical |
| No health check endpoint | Routes, controllers | Warning |
| Missing request/response logging | Middleware | Warning |
| No alerting rules defined | Monitoring config | Warning |

Detection Patterns

# Logging setup
Grep: "Monolog|LoggerInterface|Psr\\Log\\LoggerInterface" --glob "**/*.php"
Grep: "monolog" --glob "**/composer.json"
Grep: "structured|json_formatter|JsonFormatter" --glob "**/*.php"

# Correlation IDs
Grep: "correlation.id|correlationId|X-Correlation-ID|X-Request-ID" --glob "**/*.php"

# Metrics
Grep: "Prometheus|CollectorRegistry|Counter|Histogram|Gauge" --glob "**/*.php"
Grep: "prometheus|promphp" --glob "**/composer.json"
Grep: "/metrics|metricsEndpoint" --glob "**/*.php"

# Tracing
Grep: "OpenTelemetry|Tracer|Span|SpanBuilder" --glob "**/*.php"
Grep: "open-telemetry|opentelemetry" --glob "**/composer.json"
Grep: "traceparent|tracestate|W3C" --glob "**/*.php"

# Health checks
Grep: "health|healthcheck|readiness|liveness" --glob "**/*.php"
Grep: "/health|/ready|/live" --glob "**/routes*.php"

# Error tracking
Grep: "Sentry|sentry|Bugsnag|Rollbar" --glob "**/*.php"
Grep: "sentry/sentry" --glob "**/composer.json"

# Log levels and context
Grep: "->error\(|->critical\(|->warning\(|->info\(" --glob "**/*.php"
Grep: "LogLevel::" --glob "**/*.php"

References

For detailed information, load these reference files:

  • references/logging-patterns.md: Structured logging, Monolog setup, context processors, log aggregation patterns
  • references/metrics-patterns.md: Counter/Gauge/Histogram types, Prometheus PHP client, RED metrics, alerting rules
  • references/tracing-patterns.md: OpenTelemetry PHP SDK, span creation, context propagation, sampling strategies