Skills datadog

install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/datadog" ~/.claude/skills/terminalskills-skills-datadog && rm -rf "$T"
manifest: skills/datadog/SKILL.md
safety · automated scan (high risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • uses sudo
  • pip install
  • makes HTTP requests (curl)
  • references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content

Datadog

Overview

Set up and manage Datadog for full-stack observability including infrastructure metrics, APM traces, log aggregation, dashboards, and alerting. Covers agent installation, integration configuration, monitor creation, and API usage.

Instructions

Task A: Install and Configure the Datadog Agent

  1. Install the agent on the target host
  2. Configure the main
    datadog.yaml
    with API key and site
  3. Enable relevant integrations
# /etc/datadog-agent/datadog.yaml — Main agent configuration
api_key: "<YOUR_DD_API_KEY>"
site: "datadoghq.com"
hostname: "web-server-01"
tags:
  - env:production
  - service:web-api
  - team:platform
logs_enabled: true
apm_config:
  enabled: true
  env: production
process_config:
  process_collection:
    enabled: true
# Install Datadog Agent on Ubuntu/Debian
DD_API_KEY="<YOUR_DD_API_KEY>" DD_SITE="datadoghq.com" \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# Verify agent status
sudo datadog-agent status

Task B: Configure Integrations

# /etc/datadog-agent/conf.d/postgres.d/conf.yaml — PostgreSQL integration
init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: "<DB_PASSWORD>"
    dbname: myapp_production
    tags:
      - env:production
      - service:database
    collect_activity_metrics: true
    collect_database_size_metrics: true
# /etc/datadog-agent/conf.d/nginx.d/conf.yaml — Nginx integration
init_config:

instances:
  - nginx_status_url: http://localhost:8080/nginx_status
    tags:
      - env:production
      - service:web-proxy

Task C: Create Monitors and Alerts

# Create a metric monitor via API — High CPU alert
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "name": "High CPU on {{host.name}}",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{env:production} by {host} > 85",
    "message": "CPU usage above 85% on {{host.name}}.\n\n@slack-ops-alerts @pagerduty-infra",
    "tags": ["env:production", "team:platform"],
    "options": {
      "thresholds": {
        "critical": 85,
        "warning": 70
      },
      "notify_no_data": true,
      "no_data_timeframe": 10,
      "renotify_interval": 30,
      "escalation_message": "CPU still elevated on {{host.name}} — escalating."
    }
  }'
# Create a log-based monitor — Error rate spike
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "name": "Error log spike in payment-service",
    "type": "log alert",
    "query": "logs(\"service:payment-service status:error\").index(\"main\").rollup(\"count\").by(\"service\").last(\"5m\") > 50",
    "message": "More than 50 error logs in 5 minutes for payment-service.\n\n@slack-payments-team",
    "options": {
      "thresholds": { "critical": 50, "warning": 25 },
      "enable_logs_sample": true
    }
  }'

Task D: Build Dashboards

# Create a dashboard via API — Service overview
curl -X POST "https://api.datadoghq.com/api/v1/dashboard" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "title": "Web API Service Overview",
    "layout_type": "ordered",
    "widgets": [
      {
        "definition": {
          "type": "timeseries",
          "title": "Request Rate",
          "requests": [
            {
              "q": "sum:trace.http.request.hits{service:web-api,env:production}.as_count()",
              "display_type": "bars"
            }
          ]
        }
      },
      {
        "definition": {
          "type": "query_value",
          "title": "P99 Latency",
          "requests": [
            {
              "q": "p99:trace.http.request.duration{service:web-api,env:production}"
            }
          ],
          "precision": 2
        }
      },
      {
        "definition": {
          "type": "toplist",
          "title": "Top Endpoints by Error Rate",
          "requests": [
            {
              "q": "sum:trace.http.request.errors{service:web-api,env:production} by {resource_name}.as_count()"
            }
          ]
        }
      }
    ]
  }'

Task E: APM Instrumentation

# app.py — Python APM auto-instrumentation with ddtrace
from ddtrace import tracer, patch_all

# Patch all supported libraries (requests, flask, sqlalchemy, etc.)
patch_all()

tracer.configure(
    hostname="localhost",
    port=8126,
    service="payment-service",
    env="production",
    version="2.1.0",
)

from flask import Flask
app = Flask(__name__)

@app.route("/charge", methods=["POST"])
def charge():
    with tracer.trace("payment.process", service="payment-service") as span:
        span.set_tag("payment.provider", "stripe")
        result = process_payment()
        span.set_metric("payment.amount", result["amount"])
        return {"status": "ok"}
# Run with ddtrace auto-instrumentation
pip install ddtrace
ddtrace-run python app.py

Task F: Log Collection and Pipelines

# /etc/datadog-agent/conf.d/python.d/conf.yaml — Custom log collection
logs:
  - type: file
    path: /var/log/myapp/*.log
    service: web-api
    source: python
    tags:
      - env:production
    log_processing_rules:
      - type: multi_line
        name: python_traceback
        pattern: "Traceback \\(most recent call last\\)"
# Query logs via API — Find errors in last hour
curl -X POST "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "filter": {
      "query": "service:web-api status:error",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": { "limit": 25 }
  }'

Best Practices

  • Use consistent tagging:
    env
    ,
    service
    ,
    team
    on all resources
  • Set
    notify_no_data
    on critical monitors to catch silent failures
  • Use composite monitors to reduce alert noise by correlating signals
  • Configure log exclusion filters to control ingestion costs
  • Use Unified Service Tagging (
    DD_ENV
    ,
    DD_SERVICE
    ,
    DD_VERSION
    ) across APM, logs, and metrics