Claude-skill-registry docker-production

Deploy Docker containers to production with monitoring, logging, and health checks

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/docker-production" ~/.claude/skills/majiayu000-claude-skill-registry-docker-production && rm -rf "$T"

manifest: skills/data/docker-production/SKILL.md

Docker Production Skill

Master production-grade Docker deployments with monitoring, logging, health checks, and resource management.

Purpose

Configure containers for production with proper observability, resource limits, and deployment strategies.

Parameters

Parameter	Type	Required	Default	Description
monitoring	enum	No	prometheus	Monitoring backend: prometheus or datadog
logging	enum	No	json-file	Logging backend: json-file, loki, or elk
replicas	number	No	1	Number of replicas

Production Configuration

Health Checks

# Dockerfile health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=60s \
  CMD curl -f http://localhost:3000/health || exit 1

# Compose health check
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
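
Once a health check is defined, a deploy script can block until Docker reports the container as healthy. The following is a minimal sketch of that idea, not part of the skill itself: the function name `wait_healthy` and its default timeout and poll interval are illustrative assumptions. It relies only on `docker inspect` and the `.State.Health.Status` field.

```shell
# wait_healthy <container> [timeout_seconds] [poll_seconds]
# Polls the health status Docker records for a container that defines a
# health check. Returns 0 once the status is "healthy", 1 on "unhealthy"
# or when the timeout expires. Name and defaults are illustrative.
wait_healthy() {
  local name="$1" timeout="${2:-120}" poll="${3:-5}" waited=0 status
  while [ "$waited" -le "$timeout" ]; do
    # The template fails when the container is missing or has no health check.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status="missing"
    case "$status" in
      healthy)   echo "$name is healthy after ${waited}s"; return 0 ;;
      unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
    esac
    sleep "$poll"
    waited=$((waited + poll))
  done
  echo "timed out after ${timeout}s (last status: $status)" >&2
  return 1
}
```

A call such as `wait_healthy app 180 5` would wait up to three minutes, which leaves room for the 60s start period configured above.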

Resource Limits

services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
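
It can be worth confirming that the limits were actually applied to the running container. The sketch below reads `HostConfig.Memory` (bytes) and `HostConfig.NanoCpus` (billionths of a CPU) from `docker inspect`. The function name `show_limits` is hypothetical, and whether a given Compose version maps `deploy.resources.limits` onto these fields is an assumption to verify in your environment.

```shell
# show_limits <container>
# Prints the memory and CPU limits recorded for a container.
# Both values are 0 when no limit is set.
show_limits() {
  local out mem nano
  out=$(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$1") || return 1
  mem=${out%% *}    # first field: memory limit in bytes
  nano=${out##* }   # second field: CPU limit in nano-CPUs
  echo "memory_mib=$((mem / 1048576)) cpu_millicores=$((nano / 1000000))"
}
```

With the limits shown above (`memory: 1G`, `cpus: '1'`), the expected values would be 1024 MiB and 1000 millicores if the limits took effect.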

Logging Configuration

services:
  app:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
        labels: "app,environment"

Monitoring Stack

Prometheus + Grafana

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
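
After starting the stack, a quick smoke test can confirm that each service answers on its published port. This is a sketch under assumptions: the function name `check_stack` is hypothetical, and it probes the health endpoints these tools are commonly documented to expose (`/-/healthy` for Prometheus, `/api/health` for Grafana, `/healthz` for cAdvisor) on the host ports used above.

```shell
# check_stack [host]
# Probes one health endpoint per monitoring service and reports ok/FAIL.
# Returns 1 if any probe fails.
check_stack() {
  local host="${1:-localhost}" failed=0 entry name url
  for entry in \
    "prometheus=http://$host:9090/-/healthy" \
    "grafana=http://$host:3001/api/health" \
    "cadvisor=http://$host:8080/healthz"
  do
    name=${entry%%=*}
    url=${entry#*=}
    # -f turns HTTP errors into a non-zero exit code
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "ok   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```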

Prometheus Config

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker-containers'
    # The socket must be mounted into the Prometheus container and readable by its user
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
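
A syntax error in `prometheus.yml` keeps Prometheus from starting, so validating the file before deployment is a cheap safeguard. The sketch below runs `promtool check config` from the same `prom/prometheus` image; the function name `validate_prom_config` and the default path are assumptions for illustration.

```shell
# validate_prom_config [path]
# Mounts a local config read-only into a throwaway container and runs
# promtool against it. Returns promtool's exit code.
validate_prom_config() {
  local file="${1:-./prometheus.yml}" abs
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # Bind mounts need an absolute host path
  abs="$(cd "$(dirname "$file")" && pwd)/$(basename "$file")"
  docker run --rm \
    -v "$abs:/etc/prometheus/prometheus.yml:ro" \
    --entrypoint promtool \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}
```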

Deployment Strategies

Rolling Update (Zero Downtime)

deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s
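
The `update_config` settings take effect when the service runs in Swarm mode, for example after `docker stack deploy`. As a rough equivalent, the same policy can be passed on the command line when changing a service image. In this sketch the function name `rolling_update` and the service and image names in the usage note are hypothetical.

```shell
# rolling_update <service> <image>
# Updates a Swarm service one task at a time, starting the new task before
# stopping the old one, and rolls back automatically if the update fails.
rolling_update() {
  local service="$1" image="$2"
  [ -n "$service" ] && [ -n "$image" ] || {
    echo "usage: rolling_update <service> <image>" >&2
    return 2
  }
  docker service update \
    --image "$image" \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-order start-first \
    --update-failure-action rollback \
    "$service"
}
```

Example: `rolling_update myapp_app registry.example.com/app:2.0`. Note that `start-first` briefly runs one extra task, so the node needs spare capacity for it.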

Blue-Green

# Deploy new version
docker compose -p myapp-green up -d

# Switch traffic (update nginx/load balancer)
# Remove old version
docker compose -p myapp-blue down
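
The blue-green steps above can be wrapped in one function so that the old stack is removed only after the new one is ready and traffic has moved. This is a sketch under assumptions: `switch_traffic` is a hypothetical hook you would implement yourself (nginx reload, load balancer API call, DNS change), and `--wait` requires a Compose v2 release that supports it.

```shell
# blue_green_deploy <new_color> <old_color>
# 1. Start the new stack and wait until its services are running/healthy.
# 2. Switch traffic through the user-defined switch_traffic hook.
# 3. Remove the old stack only after the switch succeeded.
blue_green_deploy() {
  local new="$1" old="$2"
  docker compose -p "myapp-$new" up -d --wait || {
    echo "myapp-$new did not become ready; myapp-$old left in place" >&2
    return 1
  }
  # switch_traffic is NOT a Docker command; define it for your environment.
  switch_traffic "$new" || {
    echo "traffic switch failed; myapp-$old is still serving" >&2
    return 1
  }
  docker compose -p "myapp-$old" down
}
```

Called as `blue_green_deploy green blue`, the function leaves the failed green stack running for inspection rather than tearing it down.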

Error Handling

Common Errors

Error	Cause	Solution
`unhealthy`	Health check failing	Check endpoint, increase start_period
`OOMKilled`	Memory limit exceeded	Increase the memory limit or reduce memory usage
`restart loop`	App crash	Check logs, fix application
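
The table above can be turned into a small triage helper that reads the relevant state fields from `docker inspect`. This is only a sketch: the function name `diagnose` and the threshold of 3 restarts for a "restart loop" are assumptions, not values defined by Docker.

```shell
# diagnose <container>
# Maps container state onto the rows of the Common Errors table.
diagnose() {
  local name="$1" oom health restarts
  oom=$(docker inspect --format '{{.State.OOMKilled}}' "$name") || return 1
  # Containers without a health check have no .State.Health entry.
  health=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$name") || return 1
  restarts=$(docker inspect --format '{{.RestartCount}}' "$name") || return 1
  if [ "$oom" = "true" ]; then
    echo "OOMKilled: raise the memory limit or reduce memory use"
  elif [ "$health" = "unhealthy" ]; then
    echo "unhealthy: check the health endpoint and start_period"
  elif [ "$restarts" -ge 3 ]; then
    echo "restart loop ($restarts restarts): check logs and fix the application"
  else
    echo "no known failure pattern (health=$health, restarts=$restarts)"
  fi
}
```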

Recovery

Check logs:
```
docker logs --tail 100 <container>
```

Verify health:

docker inspect --format='{{.State.Health.Status}}' <container>

Roll back to the previous version if needed
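
For services running in Swarm mode, the rollback step can use `docker service rollback`, which reverts a service to its previous specification. The sketch below assumes a Swarm service; the function name `rollback_service` is hypothetical. For a blue-green setup, rolling back would instead mean pointing traffic at the old stack again.

```shell
# rollback_service <service>
# Reverts a Swarm service to its previous spec, then lists its tasks so the
# result can be checked.
rollback_service() {
  local service="$1"
  [ -n "$service" ] || { echo "usage: rollback_service <service>" >&2; return 2; }
  docker service rollback "$service" || return 1
  docker service ps --format '{{.Name}} {{.Image}} {{.CurrentState}}' "$service"
}
```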

Troubleshooting

Debug Checklist

Health check passing?
Resources sufficient?
```
docker stats
```
Logs showing errors?
Metrics collecting?

Diagnostics

# Resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Restart count
docker inspect --format='{{.RestartCount}}' <container>

# Recent events
docker events --filter 'container=<name>' --since 1h
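
When several containers run on one host, checking restart counts one by one gets tedious. The following sketch lists every container whose restart count meets a threshold, highest first. The function name `restart_report` and the default threshold of 3 are illustrative assumptions.

```shell
# restart_report [threshold]
# Prints "<count> <name>" for all containers (running or stopped) whose
# restart count is at or above the threshold, sorted in descending order.
restart_report() {
  local threshold="${1:-3}" ids
  ids=$(docker ps -aq) || return 1
  [ -n "$ids" ] || return 0   # no containers on this host
  # $ids is intentionally unquoted so each ID becomes its own argument.
  docker inspect --format '{{.RestartCount}} {{.Name}}' $ids |
    awk -v t="$threshold" '$1 >= t' |
    sort -rn
}
```

Container names reported by `docker inspect` carry a leading slash, so a line of output looks like `7 /web`.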

Usage

Skill("docker-production")

Related Skills

docker-debugging
docker-ci-cd
docker-security