Awesome-omni-skill prometheus-monitoring
Set up Prometheus monitoring for applications with custom metrics, scraping configurations, and service discovery. Use when implementing time-series metrics collection, monitoring applications, or building observability infrastructure.
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/prometheus-monitoring" ~/.claude/skills/diegosouzapw-awesome-omni-skill-prometheus-monitoring && rm -rf "$T"
manifest: skills/devops/prometheus-monitoring/SKILL.md
Prometheus Monitoring
Overview
Implement comprehensive Prometheus monitoring infrastructure for collecting, storing, and querying time-series metrics from applications and infrastructure.
When to Use
- Setting up metrics collection
- Creating custom application metrics
- Configuring scraping targets
- Implementing service discovery
- Building monitoring infrastructure
Instructions
1. Prometheus Configuration
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: production

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

rule_files:
  - "/etc/prometheus/alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: "api-service"
    scrape_interval: 10s
    # Targets are host:port only; the path is set with metrics_path.
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]

  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the scrape path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
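The relabel rules in the kubernetes-pods job can be hard to reason about from the YAML alone. The following is a minimal sketch of how a keep rule and a replace rule act on one discovered target. It is a simplification, not Prometheus' implementation: the function name `apply_relabel`, the rule dictionaries, and the sample label sets are hypothetical, only a single source label is supported, and the replacement uses Python's `\1` where Prometheus uses `$1`. It assumes a `(.+)` regex on the path rule so that a missing annotation leaves the default path untouched.

```python
import re


def apply_relabel(labels, rules):
    """Apply keep/replace rules to one target's labels.

    Returns the new label dict, or None when a keep rule drops the target.
    """
    labels = dict(labels)
    for rule in rules:
        value = labels.get(rule["source_label"], "")
        # Relabel regexes are anchored at both ends, hence fullmatch.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if rule["action"] == "keep":
            if match is None:
                return None
        elif rule["action"] == "replace":
            if match is not None:
                replacement = rule.get("replacement", r"\1")
                labels[rule["target_label"]] = match.expand(replacement)
    return labels


RULES = [
    {
        "source_label": "__meta_kubernetes_pod_annotation_prometheus_io_scrape",
        "action": "keep",
        "regex": "true",
    },
    {
        "source_label": "__meta_kubernetes_pod_annotation_prometheus_io_path",
        "action": "replace",
        "regex": "(.+)",
        "target_label": "__metrics_path__",
    },
]

# Hypothetical discovered pods: one annotated for scraping, one not.
annotated = {
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_path": "/custom/metrics",
}
unannotated = {"__meta_kubernetes_pod_name": "batch-job"}

kept = apply_relabel(annotated, RULES)
dropped = apply_relabel(unannotated, RULES)
print(kept["__metrics_path__"], dropped)
```

The annotated pod keeps its labels and gains a `__metrics_path__` taken from its annotation, while the unannotated pod is dropped before any scrape is attempted.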
2. Node.js Metrics Implementation
// metrics.js
const express = require("express");
const promClient = require("prom-client");

const register = new promClient.Registry();
// Collect the default process metrics into this registry.
promClient.collectDefaultMetrics({ register });

const httpRequestDuration = new promClient.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request duration in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.1, 0.5, 1, 2, 5],
  registers: [register],
});

const requestsTotal = new promClient.Counter({
  name: "requests_total",
  help: "Total requests",
  labelNames: ["method", "route", "status_code"],
  registers: [register],
});

const app = express();

// Metrics endpoint. In current prom-client releases register.metrics()
// returns a Promise, so it must be awaited.
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

// Timing middleware for every route registered after this point.
app.use((req, res, next) => {
  const start = Date.now();
  res.on("finish", () => {
    const duration = (Date.now() - start) / 1000;
    // Label by the matched route pattern rather than the raw path,
    // so that path parameters do not create unbounded label values.
    const route = req.route ? req.route.path : "unmatched";
    httpRequestDuration
      .labels(req.method, route, res.statusCode)
      .observe(duration);
    requestsTotal.labels(req.method, route, res.statusCode).inc();
  });
  next();
});

module.exports = { app, register, httpRequestDuration, requestsTotal };
3. Python Prometheus Integration
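from prometheus_client import Counter, Histogram, start_http_server
from flask import Flask, request
import time

app = Flask(__name__)

# status_code is included so the 5xx alert rules and queries can filter on it.
request_count = Counter(
    'requests_total', 'Total requests',
    ['method', 'endpoint', 'status_code'])
request_duration = Histogram(
    'request_duration_seconds', 'Request duration in seconds',
    ['method', 'endpoint'])

@app.before_request
def before():
    request.start_time = time.time()

@app.after_request
def after(response):
    duration = time.time() - request.start_time
    # Label by the matched route pattern, not the raw path, to bound cardinality.
    endpoint = request.url_rule.rule if request.url_rule else 'unmatched'
    request_count.labels(request.method, endpoint, response.status_code).inc()
    request_duration.labels(request.method, endpoint).observe(duration)
    return response

if __name__ == '__main__':
    # Metrics are served on :8000/metrics, separately from the Flask app on :5000.
    start_http_server(8000)
    app.run(port=5000)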
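It can help to see what a scrape of the metrics port returns. The sketch below renders a counter in the Prometheus text exposition format using only the standard library, with the `method` and `endpoint` labels from the Flask example. The helper names `escape_label_value` and `render_counter` and the sample values are hypothetical, and the output is a simplification: the real client library produces this for you and may expose additional series.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_counter(name, help_text, samples):
    """Render one counter in the text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for label_pairs, value in samples.items():
        labels = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in label_pairs
        )
        lines.append(f"{name}{{{labels}}} {float(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts for two method/endpoint combinations.
samples = {
    (("method", "GET"), ("endpoint", "/")): 3,
    (("method", "POST"), ("endpoint", "/orders")): 1,
}
output = render_counter("requests_total", "Total requests", samples)
print(output)
```

Each distinct combination of label values becomes its own line, and therefore its own time series once Prometheus scrapes it.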
4. Alert Rules
# /etc/prometheus/alert_rules.yml
groups:
  - name: application
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses among all responses over 5 minutes.
        expr: |
          sum(rate(requests_total{status_code=~"5.."}[5m]))
            / sum(rate(requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $value }}"

      - alert: HighLatency
        # histogram_quantile needs per-bucket rates aggregated by le.
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency: {{ $value }}s"

      - alert: HighMemoryUsage
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low memory: {{ $value }}"
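An error-rate alert is usually meant as the share of failed requests, which is a ratio of two counter rates. The sketch below works through that arithmetic on raw counter samples, including a counter reset. It is a simplification under stated assumptions: `simple_rate` is a hypothetical helper, the samples are invented, and Prometheus' own `rate()` additionally extrapolates to the edges of the window.

```python
def simple_rate(samples):
    """Per-second increase over a list of (timestamp, value) counter samples.

    A drop in value is treated as a counter reset, i.e. a restart from zero.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 15 seconds; the process restarts before t=45.
errors = [(0, 10), (15, 13), (30, 16), (45, 2), (60, 5)]
total = [(0, 1000), (15, 1030), (30, 1060), (45, 20), (60, 50)]

error_ratio = simple_rate(errors) / simple_rate(total)
should_alert = error_ratio > 0.05
print(round(error_ratio, 3), should_alert)
```

Because both rates are computed over the same window, the ratio is independent of traffic volume, which makes a fixed threshold such as 0.05 meaningful across services.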
5. Docker Compose Setup
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"

  # From inside this Compose network, Prometheus reaches the exporter as
  # node-exporter:9100, not localhost:9100.
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"

volumes:
  prometheus_data:
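Once the Compose stack is running, the Prometheus HTTP API on port 9090 can confirm that targets are being scraped, for example by querying the `up` metric. The sketch below builds an instant-query URL and parses a response of the documented shape. The helper names and the sample response are hypothetical; the response body is hand-written for illustration, not captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Return {job: value} from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        item["metric"].get("job", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


url = build_query_url("http://localhost:9090", "up")

# Hand-written sample response (assumption), shaped like an instant vector.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "node"},
             "value": [1700000000, "0"]},
        ],
    },
})
targets_up = parse_instant_vector(sample_body)
print(url, targets_up)
```

Against a live server, the body would come from something like `urllib.request.urlopen(url).read()`. A value of 0 for a job means its last scrape failed.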
Best Practices
✅ DO
- Use consistent metric naming conventions
- Add labels that are useful for filtering, with a bounded set of values
- Set appropriate scrape intervals (10-60s)
- Implement retention policies
- Monitor Prometheus itself
- Test alert rules before deployment
- Document metric meanings
❌ DON'T
- Add labels with unbounded cardinality (user IDs, request IDs, raw URLs)
- Scrape too frequently (< 10s)
- Ignore metric naming conventions
- Create alerts without runbooks
- Store raw event data in Prometheus
- Use counters for gauge-like values
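The cardinality warning above is easier to judge with numbers. The sketch below estimates an upper bound on the time series a single metric can create, as the product of distinct values per label. The function name `estimated_series` and the label counts are hypothetical, and the histogram accounting assumes classic histograms, where each label combination yields one series per configured bucket, a `+Inf` bucket, and the `_sum` and `_count` series.

```python
from math import prod


def estimated_series(label_cardinalities, buckets=0):
    """Upper bound on the number of time series for one metric.

    label_cardinalities maps label name -> number of distinct values.
    Pass the number of configured buckets for a histogram, 0 otherwise.
    """
    combinations = prod(label_cardinalities.values())
    # Histogram: configured buckets + the +Inf bucket + _sum + _count.
    per_combination = buckets + 3 if buckets else 1
    return combinations * per_combination


bounded = estimated_series(
    {"method": 5, "route": 20, "status_code": 8}, buckets=5
)
unbounded = estimated_series(
    {"method": 5, "user_id": 100_000, "status_code": 8}, buckets=5
)
print(bounded, unbounded)
```

Swapping a 20-value route label for a 100,000-value user ID label takes the same histogram from thousands of series to tens of millions, which is why such identifiers belong in logs or traces rather than in metric labels.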
Key Prometheus Queries
rate(requests_total[5m])   # Request rate (per second)
histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket[5m])))   # p95 latency
rate(requests_total{status_code=~"5.."}[5m])   # Error rate (5xx responses per second)
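The p95 query operates on cumulative histogram buckets, so the result is an estimate interpolated within one bucket rather than an exact percentile. The sketch below shows that interpolation on plain bucket counts. It is a simplified model, not PromQL's implementation: the function takes raw cumulative counts instead of per-bucket rates, assumes 0 < q <= 1, assumes the lowest bucket starts at 0, and the bucket values are invented.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with float("inf").
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # The quantile lies beyond the last finite bucket:
                # report the highest finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that holds the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts for bounds of 0.1, 0.5, 1, 2, and 5 seconds.
buckets = [(0.1, 50), (0.5, 80), (1, 90), (2, 96), (5, 99), (float("inf"), 100)]
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 3))
```

The estimate can never be more precise than the bucket it falls in, so bucket boundaries are best placed close to the latency thresholds used in alerts.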