Gsd-skill-creator kubernetes-patterns

Provides Kubernetes resource management, Helm chart patterns, service mesh configuration, and autoscaling strategies. Covers HPA, VPA, KEDA, operators, security contexts, and namespace isolation. Use when user mentions 'kubernetes', 'k8s', 'helm', 'istio', 'linkerd', 'service mesh', 'HPA', 'VPA', 'KEDA', 'pod security', 'resource quotas', 'operators'.

install
source · Clone the upstream repo
git clone https://github.com/Tibsfox/gsd-skill-creator
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/patterns/kubernetes-patterns" ~/.claude/skills/tibsfox-gsd-skill-creator-kubernetes-patterns && rm -rf "$T"
manifest: examples/skills/patterns/kubernetes-patterns/SKILL.md

Kubernetes Patterns

Best practices for deploying, scaling, securing, and managing workloads on Kubernetes. This skill covers resource management, Helm chart structure, service mesh configuration, autoscaling strategies, and security hardening.

Resource Management

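Every container must declare resource requests and limits. Requests tell the scheduler how much CPU and memory to reserve when placing the pod; limits cap what the container can consume at runtime. Without them, the scheduler cannot make informed placement decisions and nodes can become overcommitted.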

| Resource Type | Request | Behavior at Limit | Result |
| --- | --- | --- | --- |
| CPU | Reserved on node | Throttled (not killed) | Container slows down |
| Memory | Reserved on node | OOM-killed | Container restarts (per restartPolicy) |
| Ephemeral Storage | Reserved on node | Pod evicted | Pod removed from node |
| GPU | Reserved on node | Hard limit; cannot be overcommitted (request must equal limit) | Cannot exceed |

QoS Classes

Kubernetes assigns each pod a QoS class based on its containers' resource declarations. The class determines the order in which pods are evicted when a node runs short of resources.

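| QoS Class | Condition | Eviction Order |
| --- | --- | --- |
| Guaranteed | Every container sets CPU and memory requests and limits, with requests == limits | Evicted last |
| Burstable | At least one container sets a CPU or memory request or limit, but the pod does not qualify as Guaranteed | Evicted after BestEffort |
| BestEffort | No container sets any CPU or memory request or limit | Evicted first |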
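For example, a pod whose only container sets CPU and memory requests equal to their limits is classed as Guaranteed. A minimal sketch; the pod name and image are placeholders, not values from this skill:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 500m                # equal to the request
          memory: 512Mi            # equal to the request
```

`kubectl get pod qos-guaranteed-demo -o jsonpath='{.status.qosClass}'` should report `Guaranteed`. The api-server Deployment below sets requests lower than limits, so its pods are Burstable: they can burst to 1 core but are evicted before Guaranteed pods under pressure.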

Resource Declaration Best Practices

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v2.1.0
    spec:
      # Topology spread for high availability
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api
          image: ghcr.io/our-org/api@sha256:a1b2c3d4e5f6
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 250m        # 0.25 cores -- baseline
              memory: 256Mi    # baseline memory
            limits:
              cpu: "1"         # burst to 1 core
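              memory: 512Mi    # hard cap: runaway usage OOM-kills this container instead of starving neighbors
          # Readiness gates traffic during rolling updates; liveness restarts hung containers;
          # the startup probe holds off liveness until the app is up (30 x 2s = 60s max)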
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            failureThreshold: 30
            periodSeconds: 2

Namespace Isolation Strategies

Namespaces provide logical boundaries, not security boundaries on their own. Combine them with NetworkPolicies, RBAC, and resource quotas for real isolation.

| Strategy | Isolation Level | Use Case |
| --- | --- | --- |
| Per-team | Medium | Small org, shared cluster |
| Per-environment | Medium | Dev/staging/prod in one cluster |
| Per-application | High | Microservices with strict boundaries |
| Per-tenant | Highest | Multi-tenant SaaS |
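The RBAC half of per-team isolation can be sketched as a namespace-scoped Role plus RoleBinding. The role name, the verbs granted, and the `team-alpha-devs` group (which would come from your identity provider) are illustrative assumptions:

```yaml
# Namespace-scoped Role: manage Deployments and read Pods/logs in team-alpha only
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-alpha-developer       # hypothetical role name
  namespace: team-alpha
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-developer
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha-devs          # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-alpha-developer
  apiGroup: rbac.authorization.k8s.io
```

Because both objects live in `team-alpha`, the grant cannot reach other namespaces, unlike a ClusterRoleBinding.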

Resource Quotas and Limit Ranges

# ResourceQuota: caps total resource consumption per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "20"
    persistentvolumeclaims: "10"
    secrets: "30"
    configmaps: "30"
---
# LimitRange: sets defaults and bounds per container
apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      min:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: "4"
        memory: 4Gi
    - type: PersistentVolumeClaim
      min:
        storage: 1Gi
      max:
        storage: 50Gi
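Because the quota above sets `requests.*` and `limits.*`, pods in `team-alpha` that omit those values can be rejected at admission; the LimitRange prevents this by filling in defaults. A sketch of a container that declares nothing, with comments showing what the LimitRange above would inject (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-resources-demo          # hypothetical name
  namespace: team-alpha
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      # No resources block. At admission, the team-alpha LimitRange injects:
      #   requests: cpu 100m, memory 128Mi   (from defaultRequest)
      #   limits:   cpu 500m, memory 256Mi   (from default)
      # These injected values then count against team-alpha-quota.
```

`kubectl get pod no-resources-demo -n team-alpha -o yaml` should show the injected `resources` block. A container that explicitly asks for more than the `max` (for example 8 CPUs) is rejected instead.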

Network Policy for Namespace Isolation

# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow only within namespace + DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - podSelector: {}
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
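NetworkPolicies are additive, so under default-deny every additional flow needs its own allow rule, and they only take effect if the cluster's CNI plugin enforces them. A sketch admitting an ingress controller to the API pods; the `ingress-nginx` namespace and the `app: api-server` label are assumptions:

```yaml
# Allow the ingress controller to reach api-server pods in team-alpha on 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: team-alpha
spec:
  podSelector:
    matchLabels:
      app: api-server                # assumed label on the target pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # label set automatically on every namespace
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 8080
```

If the controller's own namespace is also default-deny, it needs a matching egress rule toward `team-alpha`.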

Helm Chart Structure

Helm charts package Kubernetes manifests with templating and dependency management.

Standard Chart Layout

my-app/
  Chart.yaml              # Chart metadata, version, dependencies
  Chart.lock              # Locked dependency versions
  values.yaml             # Default configuration values
  values-staging.yaml     # Environment-specific overrides
  values-production.yaml  # Environment-specific overrides
  templates/
    _helpers.tpl          # Template helper functions
    deployment.yaml       # Deployment manifest
    service.yaml          # Service manifest
    ingress.yaml          # Ingress manifest
    hpa.yaml              # HorizontalPodAutoscaler
    configmap.yaml        # ConfigMap
    secret.yaml           # Secret (sealed or external)
    serviceaccount.yaml   # ServiceAccount
    networkpolicy.yaml    # NetworkPolicy
    pdb.yaml              # PodDisruptionBudget
    tests/
      test-connection.yaml  # Helm test hooks
  charts/                 # Dependency charts (vendored)

Chart.yaml Best Practices

apiVersion: v2
name: my-app
description: A Helm chart for the My App API service
type: application
version: 1.4.0        # Chart version (bump on chart changes)
appVersion: "2.1.0"   # Application version (bump on app changes)

dependencies:
  - name: postgresql
    version: "13.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
  - name: redis
    version: "18.x"
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled

maintainers:
  - name: Platform Team
    email: platform@company.com

Helm Template with Guards

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        # Force rollout on config changes
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "my-app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          {{- with .Values.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
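The template calls `my-app.fullname`, `my-app.labels`, `my-app.selectorLabels`, and `my-app.serviceAccountName`, which must be defined in `templates/_helpers.tpl`. A sketch modeled on the scaffold that `helm create` generates; the `nameOverride`, `fullnameOverride`, and `serviceAccount.*` value keys are assumptions carried over from that scaffold:

```yaml
{{/* templates/_helpers.tpl (sketch modeled on the helm create scaffold) */}}

{{/* Chart name, overridable via .Values.nameOverride */}}
{{- define "my-app.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/* Release-qualified name, truncated to the 63-char DNS label limit */}}
{{- define "my-app.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/* Selector labels: keep version out, Deployment selectors are immutable */}}
{{- define "my-app.selectorLabels" -}}
app.kubernetes.io/name: {{ include "my-app.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/* Full label set for metadata.labels */}}
{{- define "my-app.labels" -}}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{ include "my-app.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/* ServiceAccount name, driven by .Values.serviceAccount.create and .name */}}
{{- define "my-app.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "my-app.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
```

Files prefixed with `_` render no manifest of their own; they only provide named templates for `include`. Keeping the version label out of `selectorLabels` is deliberate, since changing a Deployment's selector requires recreating it.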

Service Mesh: Istio Configuration

Service meshes handle traffic management, security, and observability at the infrastructure layer.

Istio vs Linkerd Comparison

| Aspect | Istio | Linkerd |
| --- | --- | --- |
| Complexity | High (many CRDs, control plane components) | Low (minimal, opinionated) |
| Resource Overhead | Higher (Envoy sidecar, ~100MB; varies with config and load) | Lower (Rust micro-proxy, ~25MB; varies with load) |
| mTLS | Configurable (permissive/strict) | On by default |
| Traffic Management | Very flexible (VirtualService, DestinationRule) | Basic (TrafficSplit, ServiceProfile) |
| Multi-cluster | Built-in | Supported with multicluster extension |
| Learning Curve | Steep | Gentle |
| Best For | Complex routing, advanced policies | Simple mTLS + observability |

Istio VirtualService: Canary with Header Routing

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-server
  namespace: production
spec:
  hosts:
    - api-server
    - api.company.com
  gateways:
    - mesh                    # In-mesh traffic
    - api-gateway             # External traffic
  http:
    # Route internal testers to canary via header
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: api-server
            subset: canary
          weight: 100

    # Weighted canary for production traffic
    - route:
        - destination:
            host: api-server
            subset: stable
          weight: 90
        - destination:
            host: api-server
            subset: canary
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s

---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api-server
  namespace: production
spec:
  host: api-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        maxRequestsPerConnection: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: stable
      labels:
        version: v2.0.0
    - name: canary
      labels:
        version: v2.1.0
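Istio's mesh-wide default mTLS mode is permissive, which accepts both mTLS and plaintext. A minimal sketch that requires mTLS for every workload in the `production` namespace; it assumes Istio 1.22 or later for the `v1` API, matching the `networking.istio.io/v1` resources above (older releases use `security.istio.io/v1beta1`):

```yaml
# Namespace-wide policy: no selector, so it applies to all workloads in production
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT          # reject plaintext connections to sidecar-injected pods
```

Apply it only after every client that talks to `production` runs a sidecar; plaintext callers from outside the mesh will start failing.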

Autoscaling Strategies

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100            # Double capacity per minute
          periodSeconds: 60
        - type: Pods
          value: 5              # Or add 5 pods, whichever is higher
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25             # Remove 25% per 2 minutes
          periodSeconds: 120
      selectPolicy: Min
  metrics:
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

    # Memory-based scaling
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

    # Custom metric: requests per second, served through the custom metrics API
    # (e.g. by prometheus-adapter translating a Prometheus query)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
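The HPA computes a desired replica count for each metric independently and applies the largest, then clamps it to `minReplicas`/`maxReplicas` and the `behavior` policies. Per the Kubernetes documentation, the per-metric rule is the ratio below, skipped when the ratio is within a default 10% tolerance of 1.0. Utilization targets are percentages of the container's request, not its limit. Worked example, assuming 4 current replicas averaging 105% CPU and 60% memory utilization:

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

\text{CPU: } \left\lceil 4 \times \tfrac{105}{70} \right\rceil = 6
\qquad
\text{memory: } \left\lceil 4 \times \tfrac{60}{80} \right\rceil = 3
\qquad
\max(6, 3) = 6
```

The result, 6, lies within [3, 50], and the scale-up policy allows adding up to max(100% of 4, 5) = 5 pods per minute, so the HPA can go straight from 4 to 6. The custom requests-per-second metric is evaluated the same way and joins the `max`.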

KEDA ScaledObject: Event-Driven Autoscaling

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15          # Check triggers every 15s
  cooldownPeriod: 60           # Only applies to scaling 1 -> 0; no effect while minReplicaCount >= 1
  minReplicaCount: 1           # Minimum replicas (0 for scale-to-zero)
  maxReplicaCount: 100
  fallback:
    failureThreshold: 3
    replicas: 5                # Fallback if scaler fails
  triggers:
    # Scale based on Kafka consumer lag
    - type: kafka
      metadata:
        bootstrapServers: kafka.production:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: "50"     # Target average lag per replica (replicas capped at partition count by default)

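    # Scale based on RabbitMQ queue depth
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq.production:5672   # prefer a TriggerAuthentication for credentials
        mode: QueueLength      # replaces the deprecated queueLength field
        queueName: order-queue
        value: "100"           # target messages per replica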

    # Scale based on Prometheus metric
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{service="order-processor"}[2m]))
        threshold: "500"
---
# Scale-to-zero for batch jobs
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-generator
  namespace: batch
spec:
  scaleTargetRef:
    name: report-generator
  minReplicaCount: 0           # Scale to zero when idle
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 2 * * *       # Scale up at 2 AM
        end: 0 6 * * *         # Scale down at 6 AM
        desiredReplicas: "5"
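Connection strings with credentials should not sit inline in a ScaledObject. A sketch using a KEDA TriggerAuthentication that reads the RabbitMQ `host` parameter from a Secret; the Secret name, key, and URL are placeholders:

```yaml
# Secret holding the full AMQP connection string (placeholder value)
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn               # hypothetical name
  namespace: production
type: Opaque
stringData:
  host: amqp://user:password@rabbitmq.production:5672/vhost   # placeholder
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth               # hypothetical name
  namespace: production
spec:
  secretTargetRef:
    - parameter: host               # fills the rabbitmq trigger's host parameter
      name: rabbitmq-conn
      key: host
# The ScaledObject trigger then drops the inline host and references the auth object:
#   - type: rabbitmq
#     metadata:
#       mode: QueueLength
#       queueName: order-queue
#       value: "100"
#     authenticationRef:
#       name: rabbitmq-auth
```

The Secret and TriggerAuthentication must live in the same namespace as the ScaledObject.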

Autoscaling Strategy Comparison

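| Strategy | Scales On | Scale-to-Zero | Reaction Time | Best For |
| --- | --- | --- | --- | --- |
| HPA (CPU/Memory) | Resource utilization | No | Seconds | Steady traffic patterns |
| HPA (Custom) | Application metrics | No | Seconds | API servers, web apps |
| VPA | Historical usage (resizes requests, not replica count) | No | Usually a pod restart | Right-sizing resources |
| KEDA | External events | Yes | Seconds | Event-driven workloads |
| Cluster Autoscaler | Unschedulable pods (up), underutilized nodes (down) | N/A (scales nodes) | Minutes | Node pool management |
| Karpenter | Pending pods' scheduling needs | N/A (scales nodes) | Typically faster than Cluster Autoscaler; node boot time still applies | Fast, flexible node scaling |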
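VPA is not part of core Kubernetes; it ships as a separate add-on from the kubernetes/autoscaler project. A sketch that runs it in recommendation-only mode for the api-server Deployment. Letting VPA actively resize CPU and memory while the HPA above scales on the same signals makes the two controllers fight, so `"Off"` is a conservative starting point. The min/max bounds are assumptions:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"              # quoted: compute recommendations only, never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["cpu", "memory"]
        minAllowed:                # assumed lower bound
          cpu: 100m
          memory: 128Mi
        maxAllowed:                # assumed upper bound
          cpu: "2"
          memory: 1Gi
```

`kubectl describe vpa api-server-vpa -n production` shows the recommended requests, which can then be copied into the Deployment or Helm values.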

Pod Security Best Practices

Security Context Configuration

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: production
spec:
  # Pod-level security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  serviceAccountName: app-service-account
  automountServiceAccountToken: false    # Disable unless needed
  containers:
    - name: app
      image: ghcr.io/our-org/app@sha256:abc123
      # Container-level security context
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          # Only add specific capabilities if absolutely needed
          # add:
          #   - NET_BIND_SERVICE
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
  volumes:
    # Writable dirs for read-only root filesystem
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: cache
      emptyDir:
        sizeLimit: 500Mi

Pod Security Standards (PSS)

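| Level | Description | Key Restrictions |
| --- | --- | --- |
| Privileged | Unrestricted | None (cluster admin workloads) |
| Baseline | Minimally restrictive | No hostNetwork, hostPID, hostIPC, privileged containers, hostPath volumes, or hostPorts |
| Restricted | Heavily restricted | Everything in Baseline, plus runAsNonRoot, allowPrivilegeEscalation: false, drop ALL capabilities, seccomp (RuntimeDefault or Localhost) |

readOnlyRootFilesystem is not part of the Restricted profile; enforce it separately (for example with an admission policy) if you require it.
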
# Enforce restricted standard on namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Operator Pattern

Operators extend Kubernetes with domain-specific controllers that encode operational knowledge.

When to Build an Operator

| Use Operator | Don't Use Operator |
| --- | --- |
| Stateful applications (databases, caches) | Stateless apps (use Deployment) |
| Complex lifecycle management | Simple CRUD workloads |
| Custom scaling logic | Standard HPA is sufficient |
| Automated backup/restore | Manual operations are fine |
| Multi-step provisioning | Single manifest applies cleanly |

Operator Maturity Model

| Level | Capability | Example |
| --- | --- | --- |
| 1 - Basic Install | Automated install, lifecycle hooks | Helm chart with operator |
| 2 - Seamless Upgrades | Patch and minor version upgrades | Rolling update strategy |
| 3 - Full Lifecycle | Backup, restore, failure recovery | Automated database failover |
| 4 - Deep Insights | Metrics, alerts, log processing | Custom Prometheus exporters |
| 5 - Auto Pilot | Auto-scaling, tuning, anomaly detection | Self-healing database cluster |
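An operator pairs a CustomResourceDefinition (the new API) with a controller that reconciles it. A sketch of the CRD half for a hypothetical `Database` resource; the `db.example.com` group and every field are illustrative, and the controller itself is not shown:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.db.example.com   # must be <plural>.<group>
spec:
  group: db.example.com            # hypothetical API group
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}                 # controller writes status without touching spec
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["version", "replicas"]
              properties:
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                backupSchedule:
                  type: string     # cron expression the controller would act on
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
---
# An instance the controller would translate into StatefulSets, Services, backup Jobs, etc.
apiVersion: db.example.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: production
spec:
  version: "16"
  replicas: 3
  backupSchedule: "0 3 * * *"
```

The operational knowledge (failover order, backup handling) lives in the controller; the CRD only gives users a declarative API for it.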

Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| No resource requests/limits | Node overcommit, OOM kills, unpredictable scheduling | Set requests and limits on every container |
| `latest` image tag | Non-reproducible deployments, silent breakage | Use immutable tags or `@sha256:` digests |
| Running as root | A container escape is more likely to yield root on the host | `runAsNonRoot: true`, `runAsUser: 10001` |
| No readiness probe | Traffic sent to unready pods, user-facing errors | Always define `readinessProbe` with appropriate thresholds |
| No PodDisruptionBudget | Node drains during cluster upgrades can evict all replicas at once | Set a PDB with `minAvailable` or `maxUnavailable` |
| Single replica in production | Any disruption causes downtime | Minimum 3 replicas with topology spread |
| Hardcoded config in images | Rebuilds needed for config changes | Use ConfigMaps, Secrets, environment variables |
| ClusterRole + ClusterRoleBinding for app workloads | Excessive permissions across all namespaces | Namespace-scoped Roles and RoleBindings with least privilege |
| No NetworkPolicy | All pods can talk to all pods (flat network) | Default-deny with explicit allow rules |
| Helm values set in the CI pipeline (e.g. `--set`) | Config scattered, hard to audit | `values-{env}.yaml` files in Git |
| `kubectl apply` in production | No rollback tracking, no drift detection | GitOps with Argo CD or Flux |
| Ignoring pod topology spread | All replicas on the same node/zone | `topologySpreadConstraints` for HA |
| No seccomp profile | Containers can use any syscall | `seccompProfile.type: RuntimeDefault` |
| Mounting service account tokens | A compromised pod can call the API server | `automountServiceAccountToken: false` unless needed |
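The PodDisruptionBudget fix can be sketched for the api-server Deployment shown earlier (3 replicas, label `app: api-server`); the `minAvailable` value is an assumption sized to that replica count:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2            # with 3 healthy replicas, drains may evict only one pod at a time
  selector:
    matchLabels:
      app: api-server
```

A PDB only limits voluntary disruptions that go through the eviction API (node drains, upgrades); it does not protect against node crashes. For HPA-managed workloads whose replica count varies, `maxUnavailable: 1` often tracks intent better than a fixed `minAvailable`.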

Kubernetes Security Checklist

  • All containers define resource requests and limits
  • Images pinned by digest (`@sha256:`), not mutable tags
  • `runAsNonRoot: true` on all pods
  • `allowPrivilegeEscalation: false` on all containers
  • `readOnlyRootFilesystem: true` with explicit writable mounts
  • All capabilities dropped (`drop: [ALL]`), added back only as needed
  • `seccompProfile.type: RuntimeDefault` on all pods
  • `automountServiceAccountToken: false` unless API access is needed
  • NetworkPolicies enforce default-deny with explicit allow rules
  • Pod Security Standards enforced at namespace level (`restricted`)
  • RBAC uses namespace-scoped Roles (not ClusterRoles) for workloads
  • Secrets encrypted at rest (EncryptionConfiguration or KMS provider)
  • PodDisruptionBudgets defined for all production workloads
  • Topology spread constraints distribute pods across zones
  • Readiness, liveness, and startup probes configured on all containers
  • Helm charts use `values-{env}.yaml` per environment, reviewed in PRs
  • Image pull policy `IfNotPresent` for digest-pinned images; avoid `latest` (if a mutable tag is unavoidable, use `Always`)
  • Service mesh mTLS enabled for inter-service communication
  • Audit logging enabled on API server with appropriate retention
  • Cluster upgrades tested in staging before production rollout
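For the encryption-at-rest item on a self-managed control plane, a sketch of the file passed to kube-apiserver with `--encryption-provider-config`. The key is a placeholder; managed services (EKS, GKE, AKS) typically expose envelope encryption with a cloud KMS instead, and the Kubernetes docs recommend a KMS v2 provider where available:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - secretbox:                 # first provider listed encrypts all new writes
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder; never commit real keys
      - identity: {}               # still reads Secrets written before encryption was enabled
```

Existing Secrets are only encrypted when rewritten; the Kubernetes docs do this with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.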