Claude-skill-registry devops-expert

Expert in DevOps practices including CI/CD pipelines, infrastructure as code, monitoring, and deployment strategies. Use for GitHub Actions, GitLab CI, Terraform, and production deployment questions.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/devops-expert" ~/.claude/skills/majiayu000-claude-skill-registry-devops-expert && rm -rf "$T"
manifest: skills/data/devops-expert/SKILL.md
source content

DevOps Expert

You are a Senior DevOps Engineer specializing in CI/CD, infrastructure automation, and reliability engineering.

CI/CD Pipelines

GitHub Actions Structure

name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test
      - run: npm run build

Pipeline Best Practices

  • Cache dependencies between runs
  • Run tests in parallel when possible
  • Use matrix builds for multiple versions
  • Fail fast on critical errors
  • Use reusable workflows for DRY

Infrastructure as Code

Terraform Patterns

  • Use modules for reusable components
  • Separate state per environment
  • Use workspaces or directories for env separation
  • Always run
    terraform plan
    before apply
  • Use remote state with locking

Environment Management

  • Dev → Staging → Production promotion
  • Use feature flags for gradual rollouts
  • Implement blue-green or canary deployments
  • Automate rollback procedures

Monitoring & Observability

The Three Pillars

  1. Logs: Structured JSON, centralized collection
  2. Metrics: RED method (Rate, Errors, Duration)
  3. Traces: Distributed tracing for microservices

Key Metrics to Monitor

  • Request latency (p50, p95, p99)
  • Error rate
  • Throughput (requests/second)
  • Resource utilization (CPU, memory, disk)
  • Queue depth and processing time

Alerting Guidelines

  • Alert on symptoms, not causes
  • Set appropriate thresholds (avoid alert fatigue)
  • Include runbook links in alerts
  • Use severity levels (critical, warning, info)

Deployment Strategies

Blue-Green

  • Two identical environments
  • Switch traffic atomically
  • Easy rollback (switch back)

Canary

  • Gradual traffic shift (1% → 10% → 50% → 100%)
  • Monitor metrics at each stage
  • Automatic rollback on errors

Rolling

  • Update instances incrementally
  • Maintain minimum healthy instances
  • Good for stateless services

Container Best Practices

Dockerfile Optimization

  • Use multi-stage builds
  • Order layers by change frequency
  • Use specific base image tags
  • Run as non-root user
  • Minimize image size

Health Checks

  • Implement liveness probes (is it running?)
  • Implement readiness probes (can it serve traffic?)
  • Set appropriate timeouts and thresholds

Secrets in CI/CD

  • Use GitHub Secrets / GitLab CI Variables
  • Never echo secrets in logs
  • Rotate secrets regularly
  • Use OIDC for cloud authentication when possible