install
source · Clone the upstream repo
git clone https://github.com/MacPhobos/research-mind
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/MacPhobos/research-mind "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/toolchains-platforms-observability-datadog" ~/.claude/skills/macphobos-research-mind-toolchains-platforms-observability-datadog && rm -rf "$T"
manifest:
.claude/skills/toolchains-platforms-observability-datadog/skill.md
Datadog Observability
Overview
Datadog is a SaaS observability platform providing unified monitoring across infrastructure, applications, logs, and user experience. It offers AI-powered anomaly detection, 1000+ integrations, and OpenTelemetry compatibility.
Core Capabilities:
- APM: Distributed tracing with automatic instrumentation for 8+ languages
- Infrastructure: Host, container, and cloud service monitoring
- Logs: Centralized collection with processing pipelines and configurable retention (up to 15 months, depending on plan)
- Metrics: Custom metrics via DogStatsD with cardinality management
- Synthetics: Proactive API and browser testing from 29+ global locations
- RUM: Frontend performance with Core Web Vitals and session replay
When to Use This Skill
Activate when:
- Setting up production monitoring and observability
- Implementing distributed tracing across microservices
- Configuring log aggregation and analysis pipelines
- Creating custom metrics and dashboards
- Setting up alerting and anomaly detection
- Optimizing Datadog costs
Do not use when:
- You are building on an open-source stack (use Prometheus/Grafana instead)
- Cost is the primary concern and the budget is limited
- You need maximum customization rather than a managed solution
Quick Start
1. Install Datadog Agent
Docker (simplest):
docker run -d --name dd-agent \
  -e DD_API_KEY=<YOUR_API_KEY> \
  -e DD_SITE="datadoghq.com" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  gcr.io/datadoghq/agent:7
Kubernetes (Helm):
helm repo add datadog https://helm.datadoghq.com
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=<YOUR_API_KEY> \
  --set datadog.apm.enabled=true \
  --set datadog.logs.enabled=true
2. Instrument Your Application
Python:
from ddtrace import tracer, patch_all

# Automatic instrumentation for common libraries
patch_all()

user_id = "12345"  # placeholder value; replace with the real user identifier

# Manual span for custom operations
with tracer.trace("custom.operation", service="my-service") as span:
    span.set_tag("user.id", user_id)
    # your code here
Node.js:
// Must be first import
const tracer = require('dd-trace').init({
  service: 'my-service',
  env: 'production',
  version: '1.0.0',
});
3. Verify in Datadog UI
- Go to Infrastructure > Host Map to verify the Agent is reporting
- Go to APM > Services to see traced services
- Go to Logs > Search to verify log collection
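Before checking the UI, it can help to confirm that the API key itself is accepted. The following is a minimal sketch using only the Python standard library. It assumes Datadog's key validation endpoint (`/api/v1/validate`, authenticated with the `DD-API-KEY` header) on the site configured in `DD_SITE`; the function names are hypothetical and not part of any Datadog library.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a GET request for the API key validation endpoint of the given site."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def api_key_is_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Return True if the endpoint accepts the key, False if it rejects it with 403."""
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 403:  # the key was rejected
            return False
        raise  # anything else (rate limit, outage) is left to the caller
    return bool(body.get("valid"))
```

Calling `api_key_is_valid(os.environ["DD_API_KEY"])` from a deployment script is one way to fail fast on a mistyped key. It only checks the key, so the UI steps above are still needed to confirm that hosts, traces, and logs are arriving.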
Core Concepts
Tagging Strategy
Tags enable filtering, aggregation, and cost attribution. Use consistent tags across all telemetry.
Required Tags:
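| Tag | Purpose | Example |
|---|---|---|
| `env` | Environment | `env:production` |
| `service` | Service name | `service:my-service` |
| `version` | Deployment version | `version:1.0.0` |
| `team` | Owning team | `team:<TEAM_NAME>` |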
Avoid High-Cardinality Tags:
- User IDs, request IDs, timestamps
- Pod IDs in Kubernetes
- Build numbers, commit hashes
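To see why these tags are a problem, it helps to estimate how many distinct tag combinations a single metric can produce. The sketch below is illustrative only: the cardinality figures are made-up inputs, and the deny-list of tag keys is a hypothetical choice mirroring the list above, not a Datadog feature.

```python
from math import prod

# Hypothetical deny-list of tag keys that tend to have unbounded values.
HIGH_CARDINALITY_KEYS = {"user_id", "request_id", "timestamp", "pod_id", "build", "commit"}


def estimate_series(tag_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct tag combinations (time series) for one metric."""
    return prod(tag_cardinalities.values())


def drop_high_cardinality(tags: dict[str, str]) -> list[str]:
    """Return sorted 'key:value' tags, omitting keys on the deny-list."""
    return sorted(f"{key}:{value}" for key, value in tags.items()
                  if key not in HIGH_CARDINALITY_KEYS)


# Three bounded tags stay manageable...
bounded = estimate_series({"env": 3, "service": 20, "version": 5})
# ...but adding a user ID multiplies the bound by the number of users.
unbounded = estimate_series({"env": 3, "service": 20, "version": 5, "user_id": 100_000})
```

With these inputs the bound grows from 300 to 30,000,000 combinations. Filtering tags through something like `drop_high_cardinality` before emitting a metric keeps identifiers on spans and logs, where they are useful, and off metrics.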
Unified Observability
Datadog correlates metrics, traces, and logs automatically:
- Traces carry span tags that link them to related metrics
- Trace IDs are injected into logs for correlation
- Dashboards combine all data sources
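The log side of this correlation can be sketched with the standard `logging` module. Datadog tracers can usually inject these fields automatically (for example through the `DD_LOGS_INJECTION` setting), so the code below is only an illustration of the mechanism. It assumes JSON logs carrying `dd.trace_id` and `dd.span_id` attributes; `TraceContextFilter`, `JsonFormatter`, and the `get_context` callable are hypothetical names that stand in for the tracer.

```python
import json
import logging


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def __init__(self, get_context):
        super().__init__()
        # get_context is a stand-in for the tracer: it returns (trace_id, span_id).
        self._get_context = get_context

    def filter(self, record: logging.LogRecord) -> bool:
        trace_id, span_id = self._get_context()
        record.dd_trace_id = str(trace_id)
        record.dd_span_id = str(span_id)
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Render records as JSON with the correlation attributes included."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "message": record.getMessage(),
            "level": record.levelname,
            "dd.trace_id": getattr(record, "dd_trace_id", "0"),
            "dd.span_id": getattr(record, "dd_span_id", "0"),
        })
```

To wire it up, add the filter and formatter to a handler: `handler.addFilter(TraceContextFilter(get_context))` and `handler.setFormatter(JsonFormatter())`. Records logged outside any trace fall back to `"0"` for both IDs.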
Best Practices
Start Simple
- Install Agent with basic configuration
- Enable automatic instrumentation
- Verify data in Datadog UI
- Add custom spans/metrics as needed
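For the last step, custom metrics reach the Agent as DogStatsD datagrams, a plain-text format sent over UDP. The sketch below builds and sends one by hand to show what is on the wire; in practice an official DogStatsD client library is the usual choice. It assumes an Agent listening on `127.0.0.1:8125`, and `format_metric` and `send_metric` are hypothetical helper names.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None, sample_rate=1.0):
    """Build a datagram of the form name:value|type|@rate|#tag1:v1,tag2:v2."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate < 1.0:
        parts.append(f"@{sample_rate}")  # rate is omitted when nothing is sampled out
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; delivery is fire-and-forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `send_metric(format_metric("checkout.completed", 1, "c", ["env:production", "service:my-service"]))` increments a counter tagged with the standard tags. Because UDP gives no acknowledgement, a missing or unreachable Agent does not raise an error here.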
Progressive Enhancement
Basic → APM tracing → Custom spans → Custom metrics → Profiling → RUM
Key Instrumentation Points
- HTTP entry/exit points
- Database queries
- External service calls
- Message queue operations
- Business-critical flows
Common Mistakes
- High-cardinality tags: Using user IDs or request IDs as metric tags creates millions of unique time series, each counted as a custom metric
- Missing log index quotas: Leads to unexpected bills from log volume spikes
- Over-alerting: Creates alert fatigue; alert on symptoms, not causes
- Missing service tags: Prevents correlation between metrics, traces, and logs
- No sampling for high-volume traces: Ingests everything, causing cost explosion
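On the last point, the idea behind rate-based trace sampling is a deterministic keep-or-drop decision per trace ID, so that every service handling the same trace makes the same choice. The sketch below illustrates that idea with a SHA-256 hash. It is not Datadog's sampling algorithm, and `keep_trace` is a hypothetical name; real deployments would normally use the tracer's own sampling configuration.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all traces."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the trace ID to a 64-bit bucket and compare against the threshold.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)
```

With a rate of `0.1`, about one trace in ten is kept, and the same trace ID always yields the same decision regardless of which service evaluates it.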
Navigation
For detailed implementation:
- Agent Installation: Docker, Kubernetes, Linux, Windows, and cloud-specific setup
- APM Instrumentation: Python, Node.js, Go, Java instrumentation with code examples
- Log Management: Pipelines, Grok parsing, standard attributes, archives
- Custom Metrics: DogStatsD patterns, metric types, tagging best practices
- Alerting: Monitor types, anomaly detection, alert hygiene
- Cost Optimization: Metrics without Limits, sampling, index quotas
- Kubernetes: DaemonSet, Cluster Agent, autodiscovery
Complementary Skills
When using this skill, consider these related skills (if deployed):
- docker: Container instrumentation patterns
- kubernetes: K8s-native monitoring patterns
- python/nodejs/go: Language-specific APM setup
Resources
Official Documentation:
- APM: https://docs.datadoghq.com/tracing/
- Logs: https://docs.datadoghq.com/logs/
- Metrics: https://docs.datadoghq.com/metrics/
- DogStatsD: https://docs.datadoghq.com/developers/dogstatsd/
Cost Management: