Claude-skill-registry deploying-monitoring-stacks
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/unknown-vasic-digital-superagent-2" ~/.claude/skills/majiayu000-claude-skill-registry-deploying-monitoring-stacks && rm -rf "$T"
manifest:
skills/development/unknown-vasic-digital-superagent-2/SKILL.mdsource content
Monitoring Stack Deployer
This skill provides automated assistance for monitoring stack deployer tasks.
Overview
Deploys monitoring stacks (Prometheus/Grafana/Datadog) including collectors, scraping config, dashboards, and alerting rules for production systems.
Prerequisites
Before using this skill, ensure:
- Target infrastructure is identified (Kubernetes, Docker, bare metal)
- Metric endpoints are accessible from monitoring platform
- Storage backend is configured for time-series data
- Alert notification channels are defined (email, Slack, PagerDuty)
- Resource requirements are calculated based on scale
Instructions
- Select Platform: Choose Prometheus/Grafana, Datadog, or hybrid approach
- Deploy Collectors: Install exporters and agents on monitored systems
- Configure Scraping: Define metric collection endpoints and intervals
- Set Up Storage: Configure retention policies and data compaction
- Create Dashboards: Build visualization panels for key metrics
- Define Alerts: Create alerting rules with appropriate thresholds
- Test Monitoring: Verify metrics flow and alert triggering
Output
Prometheus + Grafana (Kubernetes):
# {baseDir}/monitoring/prometheus.yaml ## Overview This skill provides automated assistance for the described functionality. ## Examples Example usage patterns will be demonstrated in context. apiVersion: v1 kind: ConfigMap metadata: name: prometheus-config data: prometheus.yml: | global: scrape_interval: 15s evaluation_interval: 15s scrape_configs: - job_name: 'kubernetes-pods' kubernetes_sd_configs: - role: pod relabel_configs: - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] action: keep regex: true --- apiVersion: apps/v1 kind: Deployment metadata: name: prometheus spec: replicas: 1 template: spec: containers: - name: prometheus image: prom/prometheus:latest args: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.retention.time=30d' ports: - containerPort: 9090
Grafana Dashboard Configuration:
{ "dashboard": { "title": "Application Metrics", "panels": [ { "title": "CPU Usage", "type": "graph", "targets": [ { "expr": "rate(container_cpu_usage_seconds_total[5m])" } ] } ] } }
Error Handling
Metrics Not Appearing
- Error: "No data points"
- Solution: Verify scrape targets are accessible and returning metrics
High Cardinality
- Error: "Too many time series"
- Solution: Reduce label combinations or increase Prometheus resources
Alert Not Firing
- Error: "Alert condition met but no notification"
- Solution: Check Alertmanager configuration and notification channels
Dashboard Load Failure
- Error: "Failed to load dashboard"
- Solution: Verify Grafana datasource configuration and permissions
Examples
- "Deploy Prometheus + Grafana on Kubernetes and add alerts for high error rate and latency."
- "Install Datadog agents across hosts and configure a dashboard for CPU/memory saturation."
Resources
- Prometheus documentation: https://prometheus.io/docs/
- Grafana documentation: https://grafana.com/docs/
- Example dashboards in {baseDir}/monitoring-examples/