Skillshub gatus

Gatus — Lightweight Health Check Dashboard

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/gatus" ~/.claude/skills/comeonoliver-skillshub-gatus && rm -rf "$T"
manifest: skills/TerminalSkills/skills/gatus/SKILL.md
source content

Gatus — Lightweight Health Check Dashboard

Overview

Gatus, the lightweight, self-hosted health check and status page tool written in Go. Helps developers set up endpoint monitoring with conditions, alerting, and a beautiful status page — all configured via a single YAML file with no database required.

Instructions

Configuration

# config.yaml — Complete Gatus configuration
# Single YAML file defines all endpoints, conditions, and alerts.

# Storage (optional — for persistence across restarts)
storage:
  type: sqlite
  path: /data/gatus.db

# Web UI settings
web:
  port: 8080

# Alert providers
alerting:
  slack:
    webhook-url: "${SLACK_WEBHOOK_URL}"
    default-alert:
      enabled: true
      failure-threshold: 3             # Alert after 3 consecutive failures
      success-threshold: 2             # Recover after 2 consecutive successes
      send-on-resolved: true
  
  pagerduty:
    integration-key: "${PAGERDUTY_KEY}"
    default-alert:
      enabled: true
      failure-threshold: 5
      send-on-resolved: true

  email:
    from: "gatus@example.com"
    host: "smtp.example.com"
    port: 587
    username: "${SMTP_USER}"
    password: "${SMTP_PASS}"
    default-alert:
      enabled: false                    # Only enable on critical endpoints

# Endpoints to monitor
endpoints:
  # --- API Health ---
  - name: API Gateway
    group: backend
    url: "https://api.example.com/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 2000"       # Under 2 seconds
      - "[BODY].status == healthy"     # JSON body check
    alerts:
      - type: slack
      - type: pagerduty

  - name: Auth Service
    group: backend
    url: "https://api.example.com/auth/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"

  # --- Frontend ---
  - name: Website
    group: frontend
    url: "https://example.com"
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
      - "[BODY] contains Welcome"      # Verify page renders
    alerts:
      - type: slack

  # --- Database ---
  - name: PostgreSQL
    group: infrastructure
    url: "tcp://db.example.com:5432"
    interval: 30s
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack
      - type: pagerduty

  # --- Redis ---
  - name: Redis
    group: infrastructure
    url: "tcp://redis.example.com:6379"
    interval: 15s
    conditions:
      - "[CONNECTED] == true"

  # --- DNS ---
  - name: DNS Resolution
    group: infrastructure
    url: "dns://8.8.8.8"
    dns:
      query-name: "example.com"
      query-type: "A"
    conditions:
      - "[DNS_RCODE] == NOERROR"
      - "[RESPONSE_TIME] < 500"

  # --- SSL Certificate ---
  - name: SSL Certificate
    group: security
    url: "https://example.com"
    interval: 1h
    conditions:
      - "[CERTIFICATE_EXPIRATION] > 720h"    # Alert if < 30 days

  # --- External Dependencies ---
  - name: Stripe API
    group: external
    url: "https://api.stripe.com/v1"
    interval: 5m
    conditions:
      - "[STATUS] == 401"               # Unauthenticated is expected (API is up)
      - "[RESPONSE_TIME] < 3000"

  # --- GraphQL ---
  - name: GraphQL API
    group: backend
    url: "https://api.example.com/graphql"
    method: POST
    headers:
      Content-Type: application/json
    body: '{"query": "{ __typename }"}'
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].data.__typename == Query"

Deployment

# docker-compose.yml — Self-hosted Gatus
version: "3.8"
services:
  gatus:
    image: twinproduction/gatus:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/config/config.yaml
      - gatus-data:/data
    environment:
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
      - PAGERDUTY_KEY=${PAGERDUTY_KEY}
    restart: unless-stopped

volumes:
  gatus-data:
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gatus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gatus
  template:
    metadata:
      labels:
        app: gatus
    spec:
      containers:
        - name: gatus
          image: twinproduction/gatus:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: config
              mountPath: /config
          envFrom:
            - secretRef:
                name: gatus-secrets
      volumes:
        - name: config
          configMap:
            name: gatus-config

Installation

# Docker
docker run -p 8080:8080 -v $(pwd)/config.yaml:/config/config.yaml twinproduction/gatus

# Binary
go install github.com/TwiN/gatus/v5@latest

# Helm
helm repo add gatus https://twin.github.io/helm-charts
helm install gatus gatus/gatus

Examples

Example 1: Setting up Gatus for a microservices project

User request:

I have a Node.js API and a React frontend running in Docker. Set up Gatus for monitoring/deployment.

The agent creates the necessary configuration files based on patterns like

# config.yaml — Complete Gatus configuration
, sets up the integration with the existing Docker setup, configures appropriate defaults for a Node.js + React stack, and provides verification commands to confirm everything is working.

Example 2: Troubleshooting deployment issues

User request:

Gatus is showing errors in our deployment. Here are the logs: [error output]

The agent analyzes the error output, identifies the root cause by cross-referencing with common Gatus issues, applies the fix (updating configuration, adjusting resource limits, or correcting syntax), and verifies the resolution with appropriate health checks.

Guidelines

  1. Single YAML — Keep all monitoring in one config file; version control it alongside your infrastructure
  2. Group endpoints — Use groups (backend, frontend, infrastructure) for organized status page display
  3. Multiple conditions — Check status code AND response time AND body content; a 200 with wrong content is still broken
  4. Failure thresholds — Set failure-threshold to 2-3 to avoid false alarms from transient network blips
  5. Check dependencies — Monitor external services (Stripe, AWS) separately; know when the issue is upstream vs yours
  6. SSL monitoring — Check certificate expiration weekly; alert at 30 days to give time for renewal
  7. Lightweight deployment — Gatus uses ~15MB RAM; run it on any machine, even a Raspberry Pi
  8. SQLite for history — Enable SQLite storage for uptime history across restarts; no external database needed