# Harness Containerization

```bash
# Clone the full repository:
git clone https://github.com/Intense-Visions/harness-engineering

# Or install just this skill into ~/.claude/skills:
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-containerization" ~/.claude/skills/intense-visions-harness-engineering-harness-containerization && rm -rf "$T"
```
Dockerfile review, Kubernetes manifest validation, and container optimization. Smaller images, safer containers, correct orchestration.
## When to Use
- When reviewing Dockerfiles for image size, security, and layer efficiency
- When auditing Kubernetes manifests, Helm charts, or docker-compose files
- On PRs that modify container configuration files
- NOT for CI/CD pipeline design (use `harness-deployment`)
- NOT for infrastructure provisioning (use `harness-infrastructure-as-code`)
- NOT for application-level security review (use `harness-security-review`)
## Process
### Phase 1: SCAN -- Discover Container Configuration
- Locate container files. Search the project for container-related configuration:
  - `Dockerfile`, `Dockerfile.*` (multi-target builds)
  - `docker-compose.yml`, `docker-compose.*.yml` (override files)
  - `.dockerignore`
  - `k8s/`, `kubernetes/`, `manifests/` directories
  - `helm/`, `charts/` directories
  - `skaffold.yaml`, `tilt.json` (dev tooling)
- Identify base images. Parse each Dockerfile for `FROM` directives:
  - Record base image name, tag, and digest (if pinned)
  - Flag images using the `latest` tag
  - Flag images from untrusted registries
  - Note multi-stage build structure (builder vs. runtime stages)
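
  A minimal sketch of this scan, assuming GNU grep is available (the skill may use its own parser):

  ```bash
  # List every FROM directive with file and line number,
  # across Dockerfile and Dockerfile.* variants.
  grep -rHn --include='Dockerfile*' '^FROM ' .
  ```
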
- Inventory Kubernetes resources. Parse manifest files and record:
  - Resource types (Deployment, Service, ConfigMap, Secret, Ingress, HPA)
  - Namespaces used
  - Image references in pod specs
  - Resource requests and limits
  - Volume mounts and persistent volume claims
- Detect Helm usage. If Helm charts exist:
  - Parse `Chart.yaml` for version and dependencies
  - Parse `values.yaml` for configurable parameters
  - Identify template files and their output resource types
- Present scan summary:

  ```
  Container Scan:
    Dockerfiles: 2 (app, worker)
    Compose files: 1 (docker-compose.yml + docker-compose.dev.yml)
    K8s manifests: 8 resources across 2 namespaces
    Helm charts: 1 (app chart with 3 subcharts)
    Base images: node:20-alpine, python:3.12-slim
  ```
### Phase 2: ANALYZE -- Evaluate Best Practices
- Analyze Dockerfile layer efficiency. Check each Dockerfile for:
  - COPY/ADD placement relative to dependency installation (cache busting)
  - Multi-stage builds separating build dependencies from runtime
  - Layer count optimization (combining related RUN commands)
  - Unnecessary files copied into the image (node_modules, .git, tests)
  - `.dockerignore` completeness
- Check container security posture. Evaluate:
  - Running as non-root user (USER directive present)
  - No secrets in build args or environment variables
  - Base image currency (is the tag reasonably current)
  - HEALTHCHECK directive present
  - Read-only filesystem where possible
  - No privileged mode in compose or K8s specs
  - Security contexts in Kubernetes pod specs (runAsNonRoot, readOnlyRootFilesystem)
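
  As an illustration, a hedged Dockerfile fragment covering the USER and HEALTHCHECK items (image, UID, port, and endpoint are placeholders):

  ```dockerfile
  FROM node:20-alpine
  # Create and switch to a dedicated non-root user with a fixed UID.
  RUN addgroup -g 10001 app && adduser -u 10001 -G app -D app
  USER 10001
  # Detect a wedged process, not just a dead one.
  HEALTHCHECK --interval=30s --timeout=3s \
    CMD wget -qO- http://localhost:3000/healthz || exit 1
  ```
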
- Evaluate Kubernetes resource definitions. For each Deployment/StatefulSet:
  - Resource requests and limits are set (CPU and memory)
  - Liveness and readiness probes are configured
  - Pod disruption budgets exist for production workloads
  - Horizontal pod autoscaler is configured where appropriate
  - Image pull policy is set (Always for mutable tags, IfNotPresent for digests)
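
  A Deployment fragment meeting these checks might look like the following sketch (names, ports, and numbers are illustrative):

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: app
  spec:
    replicas: 2
    selector:
      matchLabels: {app: app}
    template:
      metadata:
        labels: {app: app}
      spec:
        containers:
          - name: app
            image: registry.example.com/app@sha256:...  # digest-pinned
            imagePullPolicy: IfNotPresent
            resources:
              requests: {cpu: 100m, memory: 128Mi}
              limits: {cpu: 500m, memory: 256Mi}
            readinessProbe:
              httpGet: {path: /ready, port: 3000}
              periodSeconds: 10
            livenessProbe:
              httpGet: {path: /healthz, port: 3000}
              initialDelaySeconds: 15
              periodSeconds: 20
  ```
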
- Analyze docker-compose configuration. Check for:
  - Service dependency ordering (depends_on with health checks)
  - Volume mount correctness (host paths vs. named volumes)
  - Network isolation between services
  - Environment variable management (env_file vs. inline)
  - Port mapping conflicts
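
  A compose sketch of health-gated startup (service names and the Postgres check are illustrative):

  ```yaml
  services:
    api:
      build: .
      depends_on:
        db:
          condition: service_healthy  # wait for health, not just start
    db:
      image: postgres:16-alpine
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -U postgres"]
        interval: 5s
        timeout: 3s
        retries: 10
  ```
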
- Check image tag strategy. Verify:
  - Production images use immutable tags (semver or digest)
  - Development images use descriptive tags (branch name, commit SHA)
  - No `latest` tag in production manifests
  - Registry URL is consistent across all references
### Phase 3: OPTIMIZE -- Recommend Improvements
- Recommend image size reduction. For each Dockerfile:
  - Switch to minimal base images (alpine, distroless, scratch)
  - Remove build-only dependencies in multi-stage builds
  - Use `.dockerignore` to exclude test files, docs, and dev configs
  - Estimate size savings for each recommendation
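
  A multi-stage sketch for a Node.js service (the build script and output paths are assumptions):

  ```dockerfile
  FROM node:20-alpine AS builder
  WORKDIR /app
  COPY package.json package-lock.json ./
  RUN npm ci
  COPY . .
  # Build, then strip dev dependencies from node_modules.
  RUN npm run build && npm prune --omit=dev

  FROM node:20-alpine
  WORKDIR /app
  COPY --from=builder /app/dist ./dist
  COPY --from=builder /app/node_modules ./node_modules
  USER node
  CMD ["node", "dist/server.js"]
  ```
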
- Recommend build performance improvements.
  - Reorder COPY directives to maximize layer cache hits
  - Use BuildKit features (cache mounts for package managers)
  - Split slow-changing layers (OS packages) from fast-changing layers (app code)
  - Example for Node.js:

    ```dockerfile
    # Good: dependency layer cached separately
    COPY package.json package-lock.json ./
    RUN npm ci --production
    COPY src/ ./src/
    ```
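
  A cache-mount variant of the same layer (requires BuildKit; assumes npm's default cache path):

  ```dockerfile
  # Reuse npm's download cache across builds without baking it into a layer.
  COPY package.json package-lock.json ./
  RUN --mount=type=cache,target=/root/.npm npm ci --production
  COPY src/ ./src/
  ```
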
- Recommend Kubernetes improvements.
  - Add missing resource limits with reasonable defaults
  - Configure probes with appropriate initial delays and periods
  - Add pod anti-affinity for high-availability workloads
  - Recommend namespace isolation for multi-tenant clusters
  - Add network policies to restrict pod-to-pod communication
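
  A NetworkPolicy sketch restricting pod-to-pod traffic (labels are illustrative):

  ```yaml
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: worker-allow-frontend-only
  spec:
    podSelector:
      matchLabels: {app: worker}      # applies to worker pods
    policyTypes: [Ingress]
    ingress:
      - from:
          - podSelector:
              matchLabels: {app: frontend}  # only frontend may connect
  ```
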
- Recommend security hardening.
  - Add non-root USER directive with specific UID
  - Add security context to Kubernetes pods
  - Pin base images to digest for supply chain security
  - Remove unnecessary capabilities (drop ALL, add only what is needed)
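
  A container-level securityContext sketch combining these items (values illustrative):

  ```yaml
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]  # re-add specific capabilities only if required
  ```
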
- Generate optimization summary with estimated impact:

  ```
  Optimization Summary:
    Image size: 850MB -> ~180MB (switch to alpine + multi-stage)
    Build time: ~4m -> ~2m (layer reordering + cache mounts)
    Security: 3 findings (non-root, capabilities, image pinning)
    K8s: 5 resources missing resource limits
  ```
### Phase 4: VALIDATE -- Verify Configuration Correctness
- Validate Dockerfile syntax. Run `docker build --check` or parse for common errors:
  - Invalid instruction ordering (e.g., CMD before COPY)
  - Missing required arguments
  - Deprecated instructions (MAINTAINER)
  - Shell form vs. exec form for CMD/ENTRYPOINT
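
  For example (the `--check` flag requires a recent BuildKit):

  ```bash
  # Run build checks without producing an image.
  docker build --check .
  ```
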
- Validate Kubernetes manifests. Check for:
  - Valid YAML structure
  - Required fields present (apiVersion, kind, metadata, spec)
  - Label selectors match between Deployment and Service
  - Port numbers are consistent across Service and container specs
  - ConfigMap and Secret references resolve to existing resources
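
  One possible validation pass, assuming cluster access for the server-side dry run and kubeconform (a third-party tool) as an offline fallback:

  ```bash
  # Validate against the live API server without persisting anything.
  kubectl apply --dry-run=server -f k8s/
  # Offline schema validation.
  kubeconform -summary k8s/
  ```
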
- Validate Helm charts. If Helm is used:
  - `helm lint` passes
  - Template rendering with default values produces valid manifests
  - Values schema matches actual usage in templates
  - Dependencies are declared and version-locked
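
  A sketch of the Helm checks (the chart path is illustrative):

  ```bash
  helm lint charts/api
  # Render with default values, then validate the output server-side.
  helm template charts/api | kubectl apply --dry-run=server -f -
  ```
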
- Validate docker-compose. Check for:
  - Valid YAML and compose file version
  - All referenced images exist or have build contexts
  - Port mappings do not conflict
  - Named volumes are declared in the top-level volumes section
  - Networks are declared before use
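
  For example:

  ```bash
  # Parse, interpolate, and validate; exits non-zero on errors.
  docker compose config --quiet
  ```
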
- Generate validation report:

  ```
  Container Validation: [PASS/WARN/FAIL]
    Dockerfiles: PASS (2/2 valid)
    K8s manifests: WARN (label mismatch in worker-service.yaml)
    Helm chart: PASS (lint clean)
    Compose: PASS (valid structure)

  Issues:
    1. k8s/worker-service.yaml: selector "app: worker" does not match
       deployment label "app: worker-v2" -- requests will not route
  ```
## Harness Integration
- `harness skill run harness-containerization` -- Primary invocation for container review.
- `harness validate` -- Run after configuration changes to verify project health.
- `harness check-deps` -- Verify container tooling dependencies are available.
- `emit_interaction` -- Present optimization recommendations and gather decisions.
## Success Criteria
- All container configuration files in the project are discovered and cataloged
- Dockerfiles are analyzed for layer efficiency, security, and size
- Kubernetes manifests are validated for correctness and best practices
- Resource requests and limits are verified for all production workloads
- Image tag strategy is evaluated (no `latest` in production)
- Optimization recommendations include estimated impact
## Examples
### Example: Node.js Monorepo with Docker and Kubernetes
```
Phase 1: SCAN
  Found: Dockerfile (app), Dockerfile.worker, docker-compose.dev.yml
  K8s: 12 manifests in k8s/ (2 Deployments, 2 Services, 2 ConfigMaps, 2 HPA, 2 Ingress, 2 PDB)
  Base images: node:20 (not alpine), node:20 (worker)

Phase 2: ANALYZE
  Dockerfile issues:
  - node:20 full image (940MB) -- use node:20-alpine (180MB)
  - No .dockerignore -- node_modules and .git copied into image
  - No USER directive -- running as root
  - No HEALTHCHECK
  K8s issues:
  - worker deployment missing memory limits
  - No network policies defined
  - Liveness probe on /healthz but no readiness probe

Phase 3: OPTIMIZE
  1. Switch to node:20-alpine -- saves ~760MB per image
  2. Add .dockerignore with node_modules, .git, tests, docs
  3. Add multi-stage build: builder stage for npm ci, runtime for app
  4. Add USER node (UID 1000) after COPY
  5. Add readiness probe on /ready endpoint
  6. Add memory limit of 512Mi to worker deployment

Phase 4: VALIDATE
  Dockerfiles: WARN (2 security findings, 1 size finding)
  K8s manifests: WARN (missing limits, missing readiness probe)
  Compose: PASS

Result: WARN -- 6 actionable improvements identified
```
### Example: Python FastAPI with Helm and Distroless
```
Phase 1: SCAN
  Found: Dockerfile (multi-stage with distroless runtime)
  Helm chart: charts/api/ with values.yaml
  Base images: python:3.12-slim (builder), gcr.io/distroless/python3 (runtime)

Phase 2: ANALYZE
  Dockerfile: Well-structured multi-stage build
  - Builder installs dependencies, runtime copies only venv
  - Distroless base (no shell, minimal attack surface)
  - Non-root user configured
  Helm:
  - Resource limits set in values.yaml
  - Probes configured with appropriate timeouts
  - HPA configured for 2-10 replicas

Phase 3: OPTIMIZE
  Minor recommendations only:
  - Pin distroless image to digest for reproducibility
  - Add --mount=type=cache for pip downloads in builder stage
  - Add pod anti-affinity to spread replicas across nodes

Phase 4: VALIDATE
  Dockerfile: PASS
  Helm lint: PASS
  Template render: PASS (all values resolve)

Result: PASS -- well-configured container setup
```
## Rationalizations to Reject
| Rationalization | Reality |
|---|---|
"We use in production because we always want the most recent image" | is a mutable pointer. A deployment at 9am and a rollback at 9pm may pull different image contents with the same tag. Immutable tags (semver or digest) are required for reproducible deployments and reliable rollbacks. |
| "The container runs as root because it needs to bind to port 80" | Binding to a privileged port requires elevated capability, not full root access. The correct solution is to run the container as a non-root user and use , or to expose a high port and use a load balancer or ingress for port translation. |
| "We don't set resource limits because we want the container to use whatever it needs" | Containers without memory limits can exhaust node memory, triggering OOM kills of other containers on the same node. Kubernetes uses resource requests for scheduling and limits for safety; omitting limits transfers the risk of one container onto the entire node. |
| "Our image is 2GB but it only takes a few seconds to pull in our CI" | Image size multiplies across every developer pull, every CI run, and every Kubernetes pod startup. A 2GB image that takes 3 seconds to pull in CI with a warm cache takes 90 seconds on a cold node during an autoscaling event at peak traffic. |
| "We don't need liveness and readiness probes — the container exits if it crashes" | Process exit is a coarse health signal. A process that is running but deadlocked, stuck in an infinite retry loop, or unable to connect to its database will never exit. Kubernetes will continue routing traffic to it. Readiness probes prevent traffic routing to unhealthy containers that are still running. |
## Gates
- No `latest` tag in production manifests. Production Kubernetes manifests or compose files using `latest` image tags are blocking findings. Immutable tags or digests are required.
- No containers running as root in production. Missing USER directive in Dockerfiles or missing security context in K8s pods targeting production are blocking findings.
- No missing resource limits in production. Kubernetes Deployments without CPU and memory limits are blocking warnings for production namespaces.
- No invalid manifest references. Label selector mismatches between Services and Deployments, or ConfigMap/Secret references to nonexistent resources, are blocking errors.
## Escalation
- When base images have known CVEs: Flag the specific CVEs and recommend upgrading to a patched version. If no patched version exists, recommend an alternative base image and document the migration path.
- When Kubernetes manifest complexity exceeds review scope: For clusters with 50+ resources, recommend focusing on changed resources only (`--changed-only` flag) and scheduling a full audit separately.
- When Helm chart dependencies are outdated: Report the version gap and recommend updating. If the update includes breaking changes, flag it as a decision point and present the changelog.
- When docker-compose is used for production: Flag this as an architectural concern. Docker Compose is appropriate for development but production workloads should use an orchestrator (Kubernetes, ECS, Cloud Run). Present migration options.