Claude-skill-registry DevOps Practices
Expertise in deployment automation, container orchestration, and infrastructure as code. Activates when working with "deploy", "kubernetes", "docker", "terraform", "helm", "k8s", "container", or cloud infrastructure.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/devops-practices" ~/.claude/skills/majiayu000-claude-skill-registry-devops-practices && rm -rf "$T"
skills/data/devops-practices/SKILL.mdDevOps Practices Skill
Overview
Apply modern DevOps practices for deployment automation, container orchestration, and infrastructure management across multi-cloud environments (Azure, AWS, GCP). This skill encompasses containerization strategies, Kubernetes orchestration, infrastructure as code (IaC), and CI/CD pipeline design using GitHub Actions and Harness.
Core Competencies
Container Strategy
Build Optimized Docker Images:
Create multi-stage Dockerfiles that minimize image size and maximize build cache efficiency:
# Development stage with full toolchain FROM node:20-alpine AS development WORKDIR /app COPY package*.json ./ RUN npm ci COPY . . # Build stage FROM development AS build ENV NODE_ENV=production RUN npm run build && npm prune --production # Production stage with minimal footprint FROM node:20-alpine AS production WORKDIR /app COPY --from=build /app/dist ./dist COPY --from=build /app/node_modules ./node_modules COPY package*.json ./ USER node EXPOSE 3000 CMD ["node", "dist/main.js"]
Implement Security Best Practices:
- Use specific version tags, never
latest - Run containers as non-root user
- Scan images with Trivy or Snyk before deployment
- Minimize attack surface by using distroless or Alpine base images
- Set resource limits (CPU, memory) in all deployment manifests
Layer Optimization Strategy:
- Place frequently changing files (source code) in later layers
- Place dependency installation early to leverage cache
- Combine RUN commands to reduce layer count
- Use
to exclude unnecessary files.dockerignore
Kubernetes Orchestration
Design Deployment Manifests:
Create production-ready Kubernetes resources with proper resource management:
apiVersion: apps/v1 kind: Deployment metadata: name: api-service namespace: production labels: app: api-service version: v1.0.0 spec: replicas: 3 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 0 selector: matchLabels: app: api-service template: metadata: labels: app: api-service version: v1.0.0 spec: containers: - name: api image: ghcr.io/org/api-service:1.0.0 ports: - containerPort: 3000 name: http resources: requests: memory: "256Mi" cpu: "250m" limits: memory: "512Mi" cpu: "500m" livenessProbe: httpGet: path: /health port: 3000 initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 3000 initialDelaySeconds: 5 periodSeconds: 5 env: - name: NODE_ENV value: "production" envFrom: - secretRef: name: api-secrets - configMapRef: name: api-config serviceAccountName: api-service-sa securityContext: runAsNonRoot: true runAsUser: 1000 fsGroup: 1000
Implement Service Mesh Patterns:
Configure Ingress resources with proper routing, TLS, and rate limiting:
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: api-ingress annotations: cert-manager.io/cluster-issuer: "letsencrypt-prod" nginx.ingress.kubernetes.io/rate-limit: "100" nginx.ingress.kubernetes.io/ssl-redirect: "true" spec: ingressClassName: nginx tls: - hosts: - api.example.com secretName: api-tls rules: - host: api.example.com http: paths: - path: / pathType: Prefix backend: service: name: api-service port: number: 80
Configure Horizontal Pod Autoscaling:
Implement HPA based on CPU, memory, or custom metrics:
apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: api-service-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: api-service minReplicas: 3 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 - type: Resource resource: name: memory target: type: Utilization averageUtilization: 80
Helm Chart Development
Structure Helm Charts for Reusability:
Organize Helm charts with proper templating and value management:
deployment/helm/api-service/ ├── Chart.yaml ├── values.yaml ├── values-dev.yaml ├── values-staging.yaml ├── values-prod.yaml └── templates/ ├── deployment.yaml ├── service.yaml ├── ingress.yaml ├── configmap.yaml ├── secrets.yaml ├── hpa.yaml └── _helpers.tpl
Parameterize Configuration:
Use template functions for flexible deployments:
# values.yaml replicaCount: 3 image: repository: ghcr.io/org/api-service tag: "1.0.0" pullPolicy: IfNotPresent resources: requests: memory: "256Mi" cpu: "250m" limits: memory: "512Mi" cpu: "500m" autoscaling: enabled: true minReplicas: 3 maxReplicas: 10 targetCPUUtilizationPercentage: 70 ingress: enabled: true className: "nginx" annotations: cert-manager.io/cluster-issuer: "letsencrypt-prod" hosts: - host: api.example.com paths: - path: / pathType: Prefix tls: - secretName: api-tls hosts: - api.example.com
Implement Helm Hooks for Lifecycle Management:
Use pre-install, post-upgrade hooks for database migrations and testing:
apiVersion: batch/v1 kind: Job metadata: name: db-migration annotations: "helm.sh/hook": pre-upgrade "helm.sh/hook-weight": "1" "helm.sh/hook-delete-policy": before-hook-creation spec: template: spec: containers: - name: migrate image: {{ .Values.image.repository }}:{{ .Values.image.tag }} command: ["npm", "run", "migrate"] restartPolicy: OnFailure
Infrastructure as Code
Terraform Module Design:
Create reusable Terraform modules for cloud resources:
# modules/aks-cluster/main.tf resource "azurerm_kubernetes_cluster" "main" { name = var.cluster_name location = var.location resource_group_name = var.resource_group_name dns_prefix = var.dns_prefix kubernetes_version = var.kubernetes_version default_node_pool { name = "default" node_count = var.node_count vm_size = var.vm_size enable_auto_scaling = true min_count = var.min_count max_count = var.max_count } identity { type = "SystemAssigned" } network_profile { network_plugin = "azure" load_balancer_sku = "standard" } tags = var.tags } # modules/aks-cluster/variables.tf variable "cluster_name" { type = string description = "Name of the AKS cluster" } variable "kubernetes_version" { type = string description = "Kubernetes version" default = "1.28.0" } # modules/aks-cluster/outputs.tf output "cluster_id" { value = azurerm_kubernetes_cluster.main.id } output "kube_config" { value = azurerm_kubernetes_cluster.main.kube_config_raw sensitive = true }
State Management Best Practices:
Configure remote state with state locking:
terraform { backend "azurerm" { resource_group_name = "terraform-state" storage_account_name = "tfstate" container_name = "tfstate" key = "production.terraform.tfstate" } required_providers { azurerm = { source = "hashicorp/azurerm" version = "~> 3.0" } } }
CI/CD Pipeline Design
GitHub Actions Workflow Structure:
Create comprehensive CI/CD pipelines with testing, building, and deployment stages:
name: CI/CD Pipeline on: push: branches: [main, develop] pull_request: branches: [main] env: REGISTRY: ghcr.io IMAGE_NAME: ${{ github.repository }} jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Setup Node.js uses: actions/setup-node@v4 with: node-version: '20' cache: 'npm' - name: Install dependencies run: npm ci - name: Run linters run: npm run lint - name: Run unit tests run: npm run test:unit - name: Run integration tests run: npm run test:integration - name: Upload coverage uses: codecov/codecov-action@v3 build: needs: test runs-on: ubuntu-latest permissions: contents: read packages: write steps: - uses: actions/checkout@v4 - name: Log in to registry uses: docker/login-action@v3 with: registry: ${{ env.REGISTRY }} username: ${{ github.actor }} password: ${{ secrets.GITHUB_TOKEN }} - name: Extract metadata id: meta uses: docker/metadata-action@v5 with: images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} tags: | type=ref,event=branch type=ref,event=pr type=semver,pattern={{version}} type=sha - name: Build and push uses: docker/build-push-action@v5 with: context: . push: true tags: ${{ steps.meta.outputs.tags }} labels: ${{ steps.meta.outputs.labels }} cache-from: type=gha cache-to: type=gha,mode=max deploy: needs: build runs-on: ubuntu-latest if: github.ref == 'refs/heads/main' steps: - uses: actions/checkout@v4 - name: Setup kubectl uses: azure/setup-kubectl@v3 - name: Setup Helm uses: azure/setup-helm@v3 - name: Azure Login uses: azure/login@v1 with: creds: ${{ secrets.AZURE_CREDENTIALS }} - name: Get AKS credentials run: | az aks get-credentials \ --resource-group ${{ secrets.RESOURCE_GROUP }} \ --name ${{ secrets.CLUSTER_NAME }} - name: Deploy with Helm run: | helm upgrade --install api-service \ ./deployment/helm/api-service \ --namespace production \ --create-namespace \ --values ./deployment/helm/api-service/values-prod.yaml \ --set image.tag=${{ github.sha }} \ --wait \ --timeout 5m
Harness Pipeline Configuration:
Structure Harness pipelines for enterprise-grade deployments:
pipeline: name: Production Deployment identifier: prod_deployment projectIdentifier: platform orgIdentifier: engineering tags: {} stages: - stage: name: Build and Test identifier: build_test type: CI spec: cloneCodebase: true execution: steps: - step: type: Run name: Run Tests identifier: run_tests spec: shell: Bash command: | npm ci npm run test npm run lint - step: type: BuildAndPushDockerRegistry name: Build and Push identifier: build_push spec: connectorRef: docker_registry repo: <+input> tags: - <+pipeline.sequenceId> - latest - stage: name: Deploy to Production identifier: deploy_prod type: Deployment spec: deploymentType: Kubernetes service: serviceRef: api_service environment: environmentRef: production infrastructureDefinitions: - identifier: prod_k8s execution: steps: - step: type: K8sRollingDeploy name: Rolling Deployment identifier: rolling_deploy spec: skipDryRun: false pruningEnabled: false - step: type: K8sBlueGreenDeploy name: Blue Green Deployment identifier: bg_deploy spec: skipDryRun: false pruningEnabled: false rollbackSteps: - step: type: K8sRollingRollback name: Rollback identifier: rollback
Multi-Cloud Strategies
Azure-Specific Patterns:
Leverage Azure-native services for container orchestration:
- Use Azure Container Registry (ACR) with geo-replication
- Implement Azure Key Vault integration for secrets
- Configure Azure Monitor for observability
- Use Azure DevOps or GitHub Actions for CI/CD
- Implement Azure Front Door for global load balancing
AWS-Specific Patterns:
Utilize AWS container services:
- Deploy to EKS with Fargate for serverless containers
- Use ECR for container registry
- Implement AWS Secrets Manager integration
- Configure CloudWatch for logging and metrics
- Use AWS Load Balancer Controller for ingress
GCP-Specific Patterns:
Leverage Google Cloud Platform capabilities:
- Deploy to GKE with Autopilot mode
- Use Artifact Registry for containers
- Implement Secret Manager integration
- Configure Cloud Monitoring and Logging
- Use Cloud Load Balancing for ingress
Deployment Best Practices
Zero-Downtime Deployments:
Implement rolling updates with proper health checks and graceful shutdown:
- Configure readiness probes to prevent traffic to unhealthy pods
- Set
to allow in-flight requests to completeterminationGracePeriodSeconds - Use
hooks for cleanup operationspreStop - Implement connection draining in load balancers
- Use PodDisruptionBudgets to maintain availability during updates
Blue-Green Deployment Strategy:
Maintain two identical production environments for instant rollback:
- Deploy new version to inactive environment (green)
- Run smoke tests against green environment
- Switch traffic from blue to green
- Monitor metrics and error rates
- Keep blue environment ready for instant rollback if needed
Canary Deployment Pattern:
Gradually roll out changes to a subset of users:
- Deploy new version to canary pods (10% traffic)
- Monitor key metrics (latency, errors, saturation)
- Gradually increase traffic to canary (25%, 50%, 75%)
- Promote to full deployment or rollback based on metrics
- Automate decision-making with service mesh (Istio, Linkerd)
Related Resources
- Workflow Automation Skill - For pipeline creation and process automation
- Performance Optimization Skill - For monitoring and metrics in deployed environments
- Integration Patterns Skill - For connecting deployed services
- GitHub Actions Documentation - https://docs.github.com/actions
- Helm Best Practices - https://helm.sh/docs/chart_best_practices/
- Kubernetes Production Patterns - https://kubernetes.io/docs/concepts/