Learn-skills.dev opentelemetry

OpenTelemetry for distributed tracing, metrics, and logging in production systems. Use when the user mentions "opentelemetry", "otel", "distributed tracing", "traces", "spans", "metrics collection", "observability", "jaeger", "prometheus", "grafana", "OTLP", "instrumentation", or setting up application monitoring.

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/1mangesh1/dev-skills-collection/opentelemetry" ~/.claude/skills/neversight-learn-skills-dev-opentelemetry && rm -rf "$T"
manifest: data/skills-md/1mangesh1/dev-skills-collection/opentelemetry/SKILL.md

OpenTelemetry

OpenTelemetry (OTel) is a vendor-neutral observability framework for generating, collecting, and exporting telemetry data. It defines three signals:

  • Traces -- Follow a request across services. Made up of spans (units of work with timing, status, and relationships).
  • Metrics -- Numeric measurements aggregated over time: counters, histograms, gauges.
  • Logs -- Structured event records, correlated with traces via trace/span IDs.

All three signals share a common context propagation mechanism so they can be correlated.
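
Correlation works because every signal can carry the same trace and span IDs. The stdlib-only Python sketch below builds a structured log line that embeds them as zero-padded lowercase hex, the same encoding used in the W3C traceparent header. The helper name and JSON field names are illustrative assumptions, not a fixed OTel log schema; with the OTel Python API you would take the IDs from trace.get_current_span().get_span_context().

```python
import json


def correlated_log_line(message: str, trace_id: int, span_id: int, sampled: bool) -> str:
    """Hypothetical helper: build a JSON log line carrying trace context.

    trace_id is a 128-bit integer and span_id a 64-bit integer (the sizes
    OpenTelemetry uses); both are rendered as zero-padded lowercase hex.
    Field names are an assumption, not an OTel schema.
    """
    record = {
        "message": message,
        "trace_id": format(trace_id, "032x"),  # 32 hex digits
        "span_id": format(span_id, "016x"),    # 16 hex digits
        "trace_flags": "01" if sampled else "00",
    }
    return json.dumps(record)


print(correlated_log_line("order accepted", 0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, True))
```

A log backend can then join this line to the matching trace by trace_id.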

Node.js Setup

Auto-Instrumentation

npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/sdk-metrics \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc @opentelemetry/exporter-metrics-otlp-grpc

Create tracing.ts (compile it to tracing.js; it must load before any application code):

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

// Exporters default to a local Collector over gRPC (localhost:4317) and honor the
// OTEL_EXPORTER_OTLP_* variables; the service name comes from OTEL_SERVICE_NAME.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: 15000, // push metrics every 15 s
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

// Flush buffered telemetry before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

Run with:

node --require ./tracing.js app.js

Manual Spans (Node.js)

import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('my-service', '1.0.0');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      span.addEvent('validation_started');
      span.addEvent('order_processed', { 'order.total': 42.50 });
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}

Python Setup

Auto-Instrumentation

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument --service_name my-service \
  --exporter_otlp_endpoint http://localhost:4317 python app.py

Programmatic Setup (Python)

from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource

resource = Resource.create({"service.name": "my-service"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
metric_reader = PeriodicExportingMetricReader(OTLPMetricExporter())
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[metric_reader]))

Manual Spans (Python)

from opentelemetry import trace
tracer = trace.get_tracer("my-service", "1.0.0")

def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.add_event("validation_started")
        span.add_event("order_processed", {"order.total": 42.50})

Traces: Spans, Context, Attributes, Events

A span represents a unit of work. Key fields:

  • name -- the operation
  • kind -- CLIENT, SERVER, PRODUCER, CONSUMER, or INTERNAL
  • start_time / end_time
  • status -- OK, ERROR, or UNSET
  • attributes -- key-value pairs
  • events -- timestamped entries
  • links -- references to related spans

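Context propagation passes trace context across process boundaries via the W3C Trace Context traceparent header, formatted as 00-<trace-id>-<span-id>-<trace-flags>: version 00, a 32-hex-digit trace ID, the 16-hex-digit ID of the calling span, and 2 hex digits of flags (the low bit marks the trace as sampled). Auto-instrumentation handles this for HTTP. For manual propagation: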

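import { propagation, context } from '@opentelemetry/api';

// Inject the active context into outgoing headers (e.g. adds traceparent)
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);

// Extract the remote context from incoming headers (incomingHeaders: the request's header map)
const ctx = propagation.extract(context.active(), incomingHeaders);
// Spans started inside context.with() become children of the remote span
context.with(ctx, () => { /* handle the request */ });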
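
To make the header format concrete, here is a stdlib-only Python sketch that parses a version-00 traceparent and builds the header an outgoing call would carry. The helper names are hypothetical; in practice the propagation API handles this, and the sketch ignores tracestate and future header versions. The child span ID is generated randomly here to stand in for the new outgoing (client) span.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 32-hex trace ID, 16-hex parent span ID, 2-hex flags (lowercase only).
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid per W3C Trace Context
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent) -> str:
    """Header for an outgoing call: same trace ID, a fresh span ID, same sampled flag."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex digits
    return f"00-{parent.trace_id}-{span_id}-{'01' if parent.sampled else '00'}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parent = parse_traceparent(incoming)
print(parent)
print(child_traceparent(parent))
```

Keeping the trace ID and replacing only the span ID is what stitches the services into one trace.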

Metrics

| Instrument | Use case | Example |
|---|---|---|
| Counter | Monotonically increasing count | requests_total |
| UpDownCounter | Value that increases or decreases | active_connections |
| Histogram | Distribution of values | request_duration_ms |
| Gauge | Point-in-time value, read via callback | cpu_usage_percent |

import { metrics } from '@opentelemetry/api';
const meter = metrics.getMeter('my-service');

const requestCounter = meter.createCounter('http.requests', { description: 'Total HTTP requests' });
const requestDuration = meter.createHistogram('http.request.duration', {
  description: 'HTTP request duration', unit: 'ms',
});
const activeConns = meter.createUpDownCounter('http.active_connections');
// Observable gauge: the SDK invokes the callback at each collection.
// getCpuUsage() is a placeholder for your own measurement function.
meter.createObservableGauge('system.cpu.usage').addCallback((result) => {
  result.observe(getCpuUsage(), { 'cpu.core': '0' });
});

requestCounter.add(1, { 'http.request.method': 'GET', 'http.route': '/users' });
requestDuration.record(145, { 'http.request.method': 'GET' });
activeConns.add(1);   // on connect
activeConns.add(-1);  // on disconnect
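
A histogram does not keep raw values; it aggregates them into bucket counts plus count, sum, min, and max. The stdlib-only Python sketch below models one data point using the default explicit bucket boundaries listed in the OTel metrics SDK specification. The class name is hypothetical, and real SDKs also handle attributes, temporality, and concurrency.

```python
import math
from bisect import bisect_left
from typing import Sequence

# Default explicit bucket boundaries from the OTel metrics SDK specification.
DEFAULT_BOUNDARIES = (0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000)


class ExplicitBucketHistogram:
    """Simplified model of one histogram data point (hypothetical class)."""

    def __init__(self, boundaries: Sequence[float] = DEFAULT_BOUNDARIES) -> None:
        self.boundaries = list(boundaries)
        self.bucket_counts = [0] * (len(self.boundaries) + 1)  # last bucket: above the top boundary
        self.count = 0
        self.sum = 0.0
        self.min = math.inf
        self.max = -math.inf

    def record(self, value: float) -> None:
        # Bucket i covers (boundaries[i-1], boundaries[i]], so a value equal to a
        # boundary lands in the bucket that boundary closes.
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.sum += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)


h = ExplicitBucketHistogram()
for duration_ms in (145, 12, 980, 145):
    h.record(duration_ms)
print(h.bucket_counts, h.count, h.sum, h.min, h.max)
```

Because only bucket counts are exported, percentiles computed downstream are estimates whose precision depends on the boundaries you choose.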

OTLP Exporter Configuration

Default ports: 4317 for OTLP/gRPC, 4318 for OTLP/HTTP (protobuf).

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc          # or http/protobuf (then use port 4318)
# Signal-specific overrides, used exactly as given (these HTTP URLs assume http/protobuf):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318/v1/metrics
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318/v1/logs
# Auth headers (comma-separated key=value pairs):
OTEL_EXPORTER_OTLP_HEADERS="x-api-key=abc123,x-team=backend"
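
For OTLP/HTTP, the exporter specification derives per-signal URLs from the base endpoint by appending v1/traces, v1/metrics, or v1/logs, while signal-specific variables are used exactly as given; the headers variable is a comma-separated list of key=value pairs with URL-encoded values. The stdlib-only Python sketch below models both rules with hypothetical helper names and a hypothetical metrics-gw host, to show what an SDK does with these variables. It is not SDK code.

```python
from typing import Dict, Mapping
from urllib.parse import unquote

SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def resolve_http_endpoint(env: Mapping[str, str], signal: str) -> str:
    """Return the URL an OTLP/HTTP exporter for `signal` would send to (hypothetical helper)."""
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific  # signal-specific endpoints are used as-is, no path appended
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    if not base.endswith("/"):
        base += "/"
    return base + SIGNAL_PATHS[signal]


def parse_otlp_headers(value: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' (values URL-encoded), skipping malformed items (hypothetical helper)."""
    headers: Dict[str, str] = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if sep and key.strip():
            headers[key.strip()] = unquote(val.strip())
    return headers


env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://metrics-gw:4318/v1/metrics",  # hypothetical host
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=abc123,x-team=backend",
}
for signal in ("traces", "metrics", "logs"):
    print(signal, resolve_http_endpoint(env, signal))
print(parse_otlp_headers(env["OTEL_EXPORTER_OTLP_HEADERS"]))
```

A common mistake this illustrates: setting a signal-specific HTTP endpoint without the /v1/<signal> path, which the exporter will not add for you.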

Collector Setup

The Collector receives, processes, and exports telemetry in a pipeline:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch: { timeout: 5s, send_batch_size: 1024 }
  memory_limiter: { check_interval: 1s, limit_mib: 512 }
  resource:
    attributes:
      - { key: environment, value: production, action: upsert }
exporters:
  otlp/jaeger: { endpoint: jaeger:4317, tls: { insecure: true } }
  prometheus: { endpoint: 0.0.0.0:8889 }
  # Prints telemetry to the Collector's stdout; add it to a pipeline's exporters when troubleshooting
  debug: { verbosity: detailed }
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first so it can refuse data before other processors buffer it
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
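
The batch processor trades a little latency for fewer, larger exports. The Python class below is a simplified, hypothetical model of its flushing rule (send when send_batch_size items are buffered or when the timeout elapses, whichever comes first), with the clock passed in so it is easy to test. It measures the timeout from the first buffered item, which is a simplification: the real processor runs its own timer and also supports limits such as send_batch_max_size.

```python
from typing import Callable, List, Optional


class BatchBuffer:
    """Simplified model of batch-processor flushing (hypothetical class)."""

    def __init__(self, send_batch_size: int, timeout: float, export: Callable[[List[str]], None]) -> None:
        self.send_batch_size = send_batch_size
        self.timeout = timeout  # seconds
        self.export = export
        self.items: List[str] = []
        self.first_at: Optional[float] = None

    def add(self, item: str, now: float) -> None:
        if not self.items:
            self.first_at = now  # start the timeout window with the first buffered item
        self.items.append(item)
        if len(self.items) >= self.send_batch_size:
            self.flush()

    def tick(self, now: float) -> None:
        # Called periodically; sends a partial batch once the timeout has elapsed.
        if self.items and now - self.first_at >= self.timeout:
            self.flush()

    def flush(self) -> None:
        batch, self.items, self.first_at = self.items, [], None
        self.export(batch)


exported = []
buf = BatchBuffer(send_batch_size=1024, timeout=5.0, export=lambda batch: exported.append(len(batch)))
for i in range(2050):          # a burst of 2050 spans at t=0
    buf.add(f"span-{i}", now=0.0)
buf.tick(now=5.0)              # the leftover partial batch goes out on timeout
print(exported)                # [1024, 1024, 2]
```

With the settings configured above, a busy service exports in chunks of 1024, while a quiet one still sends its partial batches after roughly 5 s.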

Common Backends

| Backend | Signal | Notes |
|---|---|---|
| Jaeger | Traces | Open source, native OTLP support |
| Prometheus + Grafana | Metrics | Prometheus scrapes the Collector; Grafana visualizes |
| Datadog | All | Use the Datadog exporter or an OTLP endpoint |
| Honeycomb | Traces, Logs | Native OTLP; API key via OTEL_EXPORTER_OTLP_HEADERS |
| Grafana Tempo | Traces | Pairs with Grafana for visualization |

Docker Compose: Collector + Jaeger (Local Dev)

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus metrics
    depends_on: [jaeger]
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment: [COLLECTOR_OTLP_ENABLED=true]
    ports:
      - "16686:16686" # Jaeger UI
      - "14268:14268" # Jaeger collector HTTP

Point your app at http://localhost:4317 (gRPC) or http://localhost:4318 (HTTP). Jaeger UI: http://localhost:16686.

Environment Variables

| Variable | Purpose | Example |
|---|---|---|
| OTEL_SERVICE_NAME | Identifies the service | order-service |
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector address | http://localhost:4317 |
| OTEL_EXPORTER_OTLP_PROTOCOL | Transport protocol | grpc or http/protobuf |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers | x-api-key=abc123 |
| OTEL_TRACES_SAMPLER | Sampling strategy | parentbased_traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Sampler argument | 0.1 (10%) |
| OTEL_RESOURCE_ATTRIBUTES | Additional resource attributes | deployment.environment=prod |
| OTEL_LOG_LEVEL | SDK log level | debug |
| OTEL_PROPAGATORS | Context propagation format | tracecontext,baggage |
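
The SDK combines several of these variables when it builds the resource. Per the specification, OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs with percent-encoded values, and OTEL_SERVICE_NAME takes precedence over any service.name given there. The stdlib-only Python sketch below (a hypothetical helper, not SDK code) shows that merge, including the spec's unknown_service fallback.

```python
from typing import Dict, Mapping
from urllib.parse import unquote


def resource_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Build resource attributes from the standard env vars (hypothetical helper)."""
    attrs: Dict[str, str] = {}
    for item in env.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = unquote(value.strip())  # values are percent-encoded
    # OTEL_SERVICE_NAME wins over a service.name entry in OTEL_RESOURCE_ATTRIBUTES.
    if env.get("OTEL_SERVICE_NAME"):
        attrs["service.name"] = env["OTEL_SERVICE_NAME"]
    # Spec fallback when no name is given; some SDKs append ':' plus the executable name.
    attrs.setdefault("service.name", "unknown_service")
    return attrs


env = {
    "OTEL_SERVICE_NAME": "order-service",
    "OTEL_RESOURCE_ATTRIBUTES": "service.name=legacy-name,deployment.environment=prod",
}
print(resource_from_env(env))
```

Seeing unknown_service in a backend is usually a sign that neither variable reached the process.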

Sampling Strategies

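| Sampler | Behavior |
|---|---|
| always_on | Sample every trace. Useful in development; usually too costly at production volume. |
| always_off | Sample nothing. Effectively disables tracing. |
| traceidratio | Sample a fraction of traces, decided from the trace ID. Argument: 0.0 to 1.0. |
| parentbased_always_on | Follow the parent span's decision; always sample root spans. This is the SDK default. |
| parentbased_traceidratio | Follow the parent span's decision; sample root spans at the given ratio. |

For production, parentbased_traceidratio with a ratio between 0.01 and 0.1 is a common starting point.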

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05

Programmatic equivalent:

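import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';
// Root spans are sampled at 5%; child spans follow their parent's decision.
const sampler = new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.05) });
// Pass it to the SDK, e.g. new NodeSDK({ sampler, ... })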
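
Ratio sampling is deterministic: the decision is a function of the trace ID, so every service using the same algorithm and ratio agrees on which traces to keep. The Python sketch below mirrors the approach of the Python SDK's TraceIdRatioBased sampler (compare the low 64 bits of the trace ID with a threshold derived from the ratio) and wraps it in a simplified parent-based rule. Function names are hypothetical; other SDKs may compute the threshold differently, and the real ParentBased sampler treats local and remote parents separately.

```python
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the trace ID are compared


def ratio_bound(ratio: float) -> int:
    """Threshold below which a trace ID is sampled (ratio in 0.0..1.0)."""
    return round(ratio * (TRACE_ID_LIMIT + 1))


def trace_id_ratio_sampled(trace_id: int, ratio: float) -> bool:
    # Same trace ID + same ratio -> same decision, with no coordination between services.
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(ratio)


def parent_based_sampled(trace_id: int, ratio: float, parent_sampled: Optional[bool]) -> bool:
    """parent_sampled is None for root spans, else the parent's sampled flag."""
    if parent_sampled is not None:
        return parent_sampled  # children follow the parent's decision
    return trace_id_ratio_sampled(trace_id, ratio)


# Rough check: with evenly spread trace IDs, about 5% of root spans are sampled.
step = (1 << 64) // 10_000
sampled = sum(trace_id_ratio_sampled(i * step, 0.05) for i in range(10_000))
print(sampled)
```

The parent-based wrapper is what keeps traces complete: a downstream service never drops spans from a trace its caller chose to keep.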

Custom Span Attributes and Events for Debugging

import { SpanStatusCode } from '@opentelemetry/api';

// `span` is the active span; userId, items, cacheKey, and error come from the surrounding code.
span.setAttribute('user.id', userId);
span.setAttribute('order.item_count', items.length);
span.setAttribute('feature_flag.dark_mode', true);
span.addEvent('cache_miss', { 'cache.key': cacheKey });
span.addEvent('retry_attempt', { 'attempt.number': 3, 'error.type': 'timeout' });
span.recordException(error);  // adds an "exception" event with type, message, and stack trace
span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });

Follow semantic conventions for attribute names, e.g. http.request.method, db.system, rpc.service.

Common Instrumentation Patterns

Trace HTTP Requests

Auto-instrumentation covers most HTTP libraries. Add business context via middleware:

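import { trace } from '@opentelemetry/api';

app.use((req, res, next) => {
  const span = trace.getActiveSpan(); // span created by auto-instrumentation for this request
  if (span) {
    const requestId = req.headers['x-request-id'];
    if (typeof requestId === 'string') {
      span.setAttribute('http.request.header.x_request_id', requestId);
    }
    // req.user is assumed to be populated by your auth middleware
    if (req.user?.id) span.setAttribute('user.id', req.user.id);
  }
  next();
});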

Trace Database Queries

Auto-instrumentation handles pg, mysql2, mongoose, etc. Add business context manually:

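async function getUser(userId: string) {
  return tracer.startActiveSpan('db.getUser', async (span) => {
    try {
      span.setAttribute('db.system', 'postgresql');
      span.setAttribute('db.operation', 'SELECT');
      span.setAttribute('user.id', userId);
      // `db` is your pg Pool/Client; its auto-instrumented query span nests under this one
      const result = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
      span.setAttribute('db.result_count', result.rows.length);
      return result.rows[0];
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end(); // always end the span, even when the query throws
    }
  });
}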

Trace External API Calls

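import requests
from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer("my-service", "1.0.0")

# payment_url, payload, and amount come from the surrounding application code
with tracer.start_as_current_span("call_payment_api") as span:
    span.set_attribute("peer.service", "payment-gateway")
    span.set_attribute("payment.amount", amount)
    span.set_attribute("payment.currency", "USD")
    try:
        # Without a timeout, requests can block indefinitely and never raise Timeout
        response = requests.post(payment_url, json=payload, timeout=5)
        span.set_attribute("http.response.status_code", response.status_code)
    except requests.exceptions.Timeout:
        span.set_status(StatusCode.ERROR, "Payment API timeout")
        raise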

Baggage for Cross-Service Context

Baggage propagates key-value pairs across service boundaries without adding them to spans. Useful for tenant IDs, feature flags, or routing hints.

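import { propagation, context } from '@opentelemetry/api';

// Service A: attach baggage to a new context
const bag = propagation.createBaggage({
  'tenant.id': { value: 'acme-corp' },
  'feature.flag': { value: 'new-checkout' },
});
const ctx = propagation.setBaggage(context.active(), bag);
// Baggage is only sent while ctx is active: instrumented outgoing calls made inside
// this callback carry it (requires the baggage propagator, included in the default
// OTEL_PROPAGATORS=tracecontext,baggage)
context.with(ctx, () => {
  // ... make downstream HTTP calls here ...
});

// Service B: read baggage from the context extracted by auto-instrumentation
const currentBaggage = propagation.getBaggage(context.active());
const tenantId = currentBaggage?.getEntry('tenant.id')?.value;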

Baggage travels as HTTP headers. Do not put sensitive data in it. Keep entries small -- every downstream service receives all baggage.
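
On the wire, baggage is a single baggage header of comma-separated key=value members with percent-encoded values, which is why its size is paid on every hop. The stdlib-only Python sketch below (hypothetical helpers, not the OTel propagator) encodes and decodes that header; it assumes keys are already valid tokens and ignores the optional per-member properties that follow a ';'.

```python
from typing import Dict
from urllib.parse import quote, unquote


def encode_baggage(entries: Dict[str, str]) -> str:
    """Serialize entries into a W3C baggage header value (keys assumed to be valid tokens)."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> Dict[str, str]:
    """Parse a baggage header value, dropping per-member properties after ';'."""
    entries: Dict[str, str] = {}
    for member in header.split(","):
        pair = member.split(";", 1)[0]  # ignore any properties attached to the member
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"tenant.id": "acme-corp", "feature.flag": "new-checkout"})
print(header)                   # what service A sends downstream
print(len(header.encode()), "bytes added to every downstream request")
print(decode_baggage(header))   # what service B reads back
```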