Awesome-omni-skill dd-apm

APM - traces, services, dependencies, performance analysis.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/dd-apm" ~/.claude/skills/diegosouzapw-awesome-omni-skill-dd-apm && rm -rf "$T"
manifest: skills/development/dd-apm/SKILL.md
source content

Datadog APM

Distributed tracing, service maps, and performance analysis.

Requirements

Install the Datadog Labs Pup CLI with:

go install github.com/datadog-labs/pup@latest
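
If `pup` is not found after installation, the usual cause is that Go's binary directory is missing from PATH. `go install` writes binaries to GOBIN when it is set, otherwise to the `bin` directory under GOPATH. The helper below is a small sketch for checking that; `path_contains` is a hypothetical name, not part of pup or Go.

```shell
# Succeeds if directory $1 is an entry in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (requires Go on this machine):
#   path_contains "$(go env GOPATH)/bin" "$PATH" || echo "add Go's bin directory to PATH"
```

The surrounding colons make the match exact, so `/opt/go/bin` does not falsely match `/opt/go/bin-old`.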

Quick Start

pup auth login
pup apm services list
pup apm traces list --service api-gateway --duration 1h

Services

List Services

pup apm services list
pup apm services list --env production

Service Details

pup apm services get api-gateway --json

Service Map

# View dependencies
pup apm service-map --service api-gateway --json

Traces

Search Traces

# By service
pup apm traces list --service api-gateway --duration 1h

# Errors only
pup apm traces list --service api-gateway --status error

# Slow traces (>1s)
pup apm traces list --service api-gateway --min-duration 1000ms

# With a specific tag or attribute
pup apm traces list --query "@http.url:/api/users"

Get Trace Detail

pup apm traces get <trace_id> --json

Key Metrics

| Metric | What It Measures |
| --- | --- |
| trace.http.request.hits | Request count |
| trace.http.request.duration | Latency |
| trace.http.request.errors | Error count |
| trace.http.request.apdex | User satisfaction |
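
To make the Apdex row concrete: assuming the standard Apdex definition, the score is (satisfied + tolerating / 2) / total. Satisfied requests finish within a threshold T and tolerating requests finish within 4T. The sketch below only reproduces that arithmetic; the function name and the counts are illustrative, and how Datadog derives the metric internally is not covered here.

```shell
# Standard Apdex score from request counts.
# $1 = satisfied count, $2 = tolerating count, $3 = total requests.
apdex() {
  awk -v s="$1" -v t="$2" -v n="$3" 'BEGIN {
    if (n <= 0) { print "n/a"; exit }   # no traffic, no score
    printf "%.2f\n", (s + t / 2) / n
  }'
}

# Example: apdex 60 30 100   (60 satisfied, 30 tolerating, 10 frustrated)
```

A score of 1.00 means every request was satisfied; frustrated requests contribute nothing to the numerator.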

⚠️ Trace Sampling

Not all traces are kept. Sampling and retention decide which ones survive:

| Mode | What's Kept |
| --- | --- |
| Head-based | Random % at start |
| Error/Slow | All errors, slow traces |
| Retention | What's indexed (billed) |

# Check retention filters
pup apm retention-filters list
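
Head-based sampling means the keep/drop decision is made once, when the trace starts, so every span in the trace shares the same fate. The sketch below illustrates the idea with a simple modulo over the trace ID. It is an assumption-laden toy, not Datadog's actual sampling algorithm, and `keep_trace` is a hypothetical name.

```shell
# Keep a trace when its ID falls in the first $2 of 100 buckets.
# $1 = numeric trace ID (small illustrative value), $2 = sample rate in percent.
keep_trace() {
  [ $(( $1 % 100 )) -lt "$2" ]
}

# Example: keep_trace 1042 50 && echo "kept"
```

Because the decision depends only on the trace ID, any service that sees the same ID reaches the same answer without coordination.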

Trace Retention Costs

| Retention | Cost |
| --- | --- |
| Indexed spans | $$$ per million |
| Ingested spans | $ per million |

Best practice: Only index what you need for search.
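
A rough way to see why indexing less matters is to price both tiers per million spans, as the table above does. The sketch below takes the rates as arguments because actual prices depend on your contract; the function name and the example figures are placeholders, not Datadog list prices.

```shell
# Estimated cost from span counts and per-million rates.
# $1 = ingested spans, $2 = indexed spans,
# $3 = price per million ingested, $4 = price per million indexed.
span_cost() {
  awk -v ing="$1" -v idx="$2" -v ing_rate="$3" -v idx_rate="$4" 'BEGIN {
    printf "%.2f\n", (ing / 1000000) * ing_rate + (idx / 1000000) * idx_rate
  }'
}

# Example with placeholder rates:
#   span_cost 500000000 20000000 0.10 2.00
```

Running it with different indexed counts shows how quickly the indexed tier dominates the total.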

Service Level Objectives

Link APM to SLOs:

pup slos create \
  --name "API Latency p99 < 200ms" \
  --type metric \
  --numerator "sum:trace.http.request.hits{service:api,@duration:<200000000}" \
  --denominator "sum:trace.http.request.hits{service:api}" \
  --target 99.0
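
The threshold in the numerator, 200000000, matches the 200 ms in the SLO name only if `@duration` is expressed in nanoseconds, which is what the figures above imply. A minimal conversion helper, with a hypothetical name, avoids miscounting zeros when the threshold changes:

```shell
# Convert whole milliseconds to nanoseconds.
ms_to_ns() {
  echo $(( $1 * 1000000 ))
}

# Example: ms_to_ns 200
```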

Common Queries

| Goal | Query |
| --- | --- |
| Slowest endpoints | avg:trace.http.request.duration{*} by {resource_name} |
| Error rate | sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*} |
| Throughput | sum:trace.http.request.hits{*}.as_rate() |
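
The error-rate query divides errors by hits and returns a fraction. The sketch below applies the same division to two counts and scales it to a percentage; the function name and the sample counts are illustrative only.

```shell
# Error rate as a percentage.
# $1 = error count, $2 = hit count over the same window.
error_rate_pct() {
  awk -v e="$1" -v h="$2" 'BEGIN {
    if (h <= 0) { print "n/a"; exit }   # avoid dividing by zero traffic
    printf "%.2f\n", (e / h) * 100
  }'
}

# Example: error_rate_pct 25 1000
```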

Troubleshooting

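| Problem | Fix |
| --- | --- |
| No traces | Check that ddtrace is installed and DD_TRACE_ENABLED=true |
| Missing service | Verify that the DD_SERVICE environment variable is set |
| Traces not linked | Check that trace headers are propagated between services |
| High cardinality | Don't tag spans with unbounded values such as user_id or request_id |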
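
The first two rows can be checked from the shell of the affected process. The sketch below is one way to do it. It assumes an unset DD_TRACE_ENABLED means tracing is on, so it only complains when the variable is explicitly `false` or `0`; `check_dd_env` is a hypothetical helper, not part of ddtrace or pup.

```shell
# Report tracing-related environment problems; returns non-zero if any are found.
check_dd_env() {
  status=0
  if [ -z "${DD_SERVICE:-}" ]; then
    echo "DD_SERVICE is not set"
    status=1
  fi
  # Assumption: tracing is enabled unless explicitly switched off.
  case "${DD_TRACE_ENABLED:-true}" in
    false|0)
      echo "DD_TRACE_ENABLED is '${DD_TRACE_ENABLED}', tracing is disabled"
      status=1
      ;;
  esac
  return "$status"
}

# Example: check_dd_env || echo "fix the variables above, then restart the service"
```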
