Agent Almanac: setup-service-mesh

Clone the repository:

```shell
git clone https://github.com/pjt222/agent-almanac
```

Or copy only this skill into your local skills directory:

```shell
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/i18n/caveman-ultra/skills/setup-service-mesh" ~/.claude/skills/pjt222-agent-almanac-setup-service-mesh-dfb0e0 && \
  rm -rf "$T"
```

i18n/caveman-ultra/skills/setup-service-mesh/SKILL.md

Setup Service Mesh
Deploy and configure a service mesh for secure service-to-service communication and advanced traffic management.
When to Use
- Microservices architecture requires encrypted service-to-service communication
- Need fine-grained traffic control (canary deployments, A/B testing, traffic splitting)
- Require observability across all service interactions without application changes
- Enforce security policies (mTLS, authorization) at the infrastructure level
- Implement circuit breaking, retries, and timeouts consistently across services
- Need distributed tracing and service dependency mapping
Inputs
- Required: Kubernetes cluster with admin access
- Required: Choice of service mesh (Istio or Linkerd)
- Required: Namespace(s) to enable service mesh
- Optional: Monitoring stack (Prometheus, Grafana, Jaeger)
- Optional: Custom traffic management requirements
- Optional: Certificate authority configuration for mTLS
Procedure
See Extended Examples for complete configuration files and templates.
Step 1: Install Service Mesh Control Plane
Choose and install the service mesh control plane.
For Istio:
```shell
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.2 sh -
export PATH="$PWD/istio-1.20.2/bin:$PATH"   # istioctl is not on PATH by default
istioctl install --set profile=default -y   # "default" is the production-ready profile
kubectl get pods -n istio-system
```
For Linkerd:
```shell
curl -sL https://run.linkerd.io/install | sh
export PATH="$HOME/.linkerd2/bin:$PATH"     # linkerd CLI installs here
linkerd check --pre
linkerd install --crds | kubectl apply -f -  # CRDs must be installed first (Linkerd 2.12+)
linkerd install --ha | kubectl apply -f -
linkerd check
```
Create a service mesh configuration with resource limits and tracing:
```yaml
# service-mesh-config.yaml (abbreviated)
spec:
  profile: default
  meshConfig:
    enableTracing: true
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
# See EXAMPLES.md Step 1 for complete configuration
```
Expected: Control plane pods running in istio-system (Istio) or linkerd (Linkerd) namespace.
`istioctl version` or `linkerd version` shows matching client and server versions.
On failure:
- Check cluster has sufficient resources (at least 4 CPU cores, 8GB RAM for production)
- Verify Kubernetes version compatibility (check mesh documentation)
- Review logs: `kubectl logs -n istio-system -l app=istiod` (Istio) or `kubectl logs -n linkerd -l linkerd.io/control-plane-component=controller` (Linkerd)
- Check for conflicting CRDs: `kubectl get crd | grep istio` or `kubectl get crd | grep linkerd`
Step 2: Enable Automatic Sidecar Injection
Configure namespaces for automatic sidecar proxy injection.
For Istio:
```shell
# Label namespace for automatic injection
kubectl label namespace default istio-injection=enabled
kubectl get namespace -L istio-injection
```
For Linkerd:
```shell
# Annotate namespace for injection
kubectl annotate namespace default linkerd.io/inject=enabled
```
Test sidecar injection with a sample deployment:
```yaml
# test-deployment.yaml (abbreviated)
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: app
          image: nginx:alpine
# See EXAMPLES.md Step 2 for complete test deployment
```
Apply and verify:
```shell
kubectl apply -f test-deployment.yaml
kubectl get pods -n default   # Expect 2/2 containers (app + proxy)
```
Expected: New pods show 2/2 containers (application + sidecar proxy). Describe output shows istio-proxy or linkerd-proxy container. Logs show successful proxy startup.
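To spot pods that missed injection across a whole namespace, a small helper can scan `kubectl get pods` output. This is a sketch; it assumes the default two-container layout (application plus one sidecar proxy), so the expected READY column is 2/2:

```shell
#!/bin/sh
# check_sidecars: read `kubectl get pods` output on stdin and print the
# names of pods whose READY column is not 2/2, i.e. pods that are likely
# missing the sidecar proxy. NR > 1 skips the header row.
check_sidecars() {
  awk 'NR > 1 && $2 != "2/2" { print $1 }'
}

# Typical usage against a live cluster:
#   kubectl get pods -n default | check_sidecars
```

Pods with more than one application container would need a different expected ratio; adjust the `"2/2"` literal accordingly.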
On failure:
- Check namespace labels/annotations: `kubectl get ns default -o yaml`
- Verify the mutating webhook is active: `kubectl get mutatingwebhookconfiguration`
- Review injection logs (Istio, where istiod handles injection): `kubectl logs -n istio-system -l app=istiod`
- Manually inject to test: `kubectl get deploy test-app -o yaml | istioctl kube-inject -f - | kubectl apply -f -`
Step 3: Configure mTLS Policy
Enable mutual TLS for secure service-to-service communication.
For Istio:
```yaml
# mtls-policy.yaml (abbreviated)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
# See EXAMPLES.md Step 3 for per-namespace and permissive mode examples
```
For Linkerd:
```shell
# Linkerd enforces mTLS by default for meshed pods
linkerd viz tap deploy/test-app -n default   # check for tls=true on each connection
```
Apply and verify:
```shell
kubectl apply -f mtls-policy.yaml
# Istio: verify the effective mTLS mode for a pod
istioctl x describe pod $(kubectl get pod -n default -l app=test-app -o jsonpath='{.items[0].metadata.name}') -n default
```
Expected: All connections between meshed services use mTLS. `istioctl x describe` reports STRICT as the effective PeerAuthentication mode. Linkerd tap output shows `tls=true` for all connections. Service logs show no TLS errors.
On failure:
- Check certificate issuance (cert-manager): `kubectl get certificates -A`
- Verify the CA is healthy: `kubectl logs -n istio-system -l app=istiod | grep -i cert`
- Test with PERMISSIVE mode first, then transition to STRICT
- Check for services without sidecars: `kubectl get pods --all-namespaces -o json | jq '.items[] | select(.spec.containers | length == 1) | .metadata.name'`
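For the PERMISSIVE-first rollout suggested above, a namespace-scoped policy can be layered under the mesh-wide STRICT default during migration. A minimal sketch (the `production` namespace and policy name are assumed examples):

```yaml
# permissive-mtls.yaml (sketch): accept both plaintext and mTLS
# for one namespace while its workloads are being meshed
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: migration-permissive
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE   # switch to STRICT once every workload has a sidecar
```

The more specific (namespace-level) policy overrides the mesh-wide one, so STRICT can stay on globally while individual namespaces migrate.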
Step 4: Implement Traffic Management Rules
Configure intelligent traffic routing, retries, and circuit breaking.
Create traffic management policies:
```yaml
# traffic-management.yaml (abbreviated)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  http:
    - match:
        - uri:
            prefix: /api/v2
      route:
        - destination:
            host: api-service
            subset: v2
          weight: 10
        - destination:
            host: api-service
            subset: v1
          weight: 90
      retries:
        attempts: 3
        perTryTimeout: 2s
# See EXAMPLES.md Step 4 for complete routing, circuit breaker, and gateway configs
```
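The v1/v2 subsets referenced by the VirtualService must be declared in a DestinationRule, which is also where circuit breaking (outlier detection) lives. A sketch under the same assumed service name, with `version` labels assumed to match the pod labels:

```yaml
# destination-rule.yaml (sketch): define subsets and a basic circuit breaker
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-service
spec:
  host: api-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject an endpoint after 5 consecutive 5xx responses
      interval: 30s              # scan interval
      baseEjectionTime: 60s      # minimum ejection duration
```

Without the DestinationRule, routes to `subset: v1`/`subset: v2` have nothing to resolve against and requests fail.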
For Linkerd traffic splitting:
```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
spec:
  service: api-service
  backends:
    - service: api-service-v1
      weight: 900
    - service: api-service-v2
      weight: 100
```
Apply and test:
```shell
kubectl apply -f traffic-management.yaml
# Test traffic distribution
for i in {1..100}; do curl -s http://api.example.com/api/v2 | grep version; done | sort | uniq -c
# Monitor: istioctl dashboard kiali   (or: linkerd viz dashboard)
```
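Rather than eyeballing the loop's output, the observed split can be tallied and compared against the configured 90/10 weights. A small helper, assuming each response body contains a version label such as `v1` or `v2`:

```shell
#!/bin/sh
# tally_versions: count occurrences of each vN version label on stdin,
# printing "count version" pairs so the observed traffic split can be
# compared against the configured weights.
tally_versions() {
  grep -o 'v[0-9][0-9]*' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Typical usage against a live endpoint:
#   for i in $(seq 1 100); do curl -s http://api.example.com/api/v2; done | tally_versions
```

With 100 requests and a 90/10 split, expect counts near 90 and 10; exact figures vary per run.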
Expected: Traffic splits according to defined weights. Circuit breaker trips after consecutive errors. Retries occur for transient failures. Kiali/Linkerd dashboard shows traffic flow visualization.
On failure:
- Verify destination hosts resolve: `kubectl get svc -n production`
- Check subset labels match pod labels: `kubectl get pods -n production --show-labels`
- Review pilot logs: `kubectl logs -n istio-system -l app=istiod`
- Test without circuit breaker first, then add incrementally
- Use `istioctl analyze` to check configuration: `istioctl analyze -n production`
Step 5: Integrate Observability Stack
Connect service mesh telemetry to monitoring and tracing systems.
Install observability addons:
```shell
# Istio: Prometheus, Grafana, Kiali, Jaeger
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/grafana.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/jaeger.yaml

# Linkerd
linkerd viz install | kubectl apply -f -
linkerd jaeger install | kubectl apply -f -
```
Configure custom metrics and dashboards:
```yaml
# service-monitor.yaml (abbreviated)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-mesh-metrics
spec:
  selector:
    matchLabels:
      app: istiod
  endpoints:
    - port: http-monitoring
      interval: 30s
# See EXAMPLES.md Step 5 for Grafana dashboards and telemetry config
```
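Trace sampling can also be tuned here via Istio's Telemetry API rather than left at 100%. A sketch (1% sampling, applied mesh-wide because the resource sits in the root namespace; the resource name is an assumed example):

```yaml
# telemetry-sampling.yaml (sketch): 1% trace sampling mesh-wide
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system   # root namespace makes this the mesh-wide default
spec:
  tracing:
    - randomSamplingPercentage: 1.0
```

Per-namespace or per-workload Telemetry resources can override this default where deeper tracing is needed.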
Access dashboards:
```shell
istioctl dashboard grafana   # or: linkerd viz dashboard
istioctl dashboard kiali
istioctl dashboard jaeger
```
Expected: Dashboards show service topology, request rates, latency percentiles, error rates. Distributed traces available in Jaeger. Prometheus scraping mesh metrics successfully. Custom metrics appear in queries.
On failure:
- Verify Prometheus scraping: `kubectl get servicemonitor -A`
- Check addon pods are running: `kubectl get pods -n istio-system`
- Review telemetry configuration: `istioctl proxy-config log <pod-name> -n <namespace>`
- Verify mesh config has telemetry enabled: `kubectl get configmap istio -n istio-system -o yaml | grep -A 5 enableTracing`
- Check for port conflicts if port-forward fails
Step 6: Validate and Monitor Mesh Health
Perform comprehensive health checks and set up ongoing monitoring.
```shell
# Istio validation
istioctl analyze --all-namespaces
istioctl verify-install
istioctl proxy-status

# Linkerd validation
linkerd check
linkerd viz check
linkerd diagnostics policy

# Check proxy sync status
kubectl get pods -n production -o json | \
  jq '.items[] | {name: .metadata.name, proxy: .status.containerStatuses[] | select(.name=="istio-proxy").ready}'

# Monitor control plane health
kubectl get pods -n istio-system -w
kubectl top pods -n istio-system
```
Create health check script and alerts:
```shell
#!/bin/bash
# mesh-health-check.sh (abbreviated)
echo "=== Service Mesh Health Check ==="
kubectl get pods -n istio-system
istioctl analyze --all-namespaces
# See EXAMPLES.md Step 6 for complete health check script and alert configs
```
Expected: All analysis checks pass with no warnings. Proxy-status shows all proxies synced. mTLS check confirms encryption. Metrics show traffic flowing. Control plane pods stable with low resource usage.
On failure:
- Address specific issues from `istioctl analyze` output
- Check proxy logs for individual pods: `kubectl logs <pod> -c istio-proxy -n <namespace>`
- Verify network policies aren't blocking mesh traffic
- Review control plane logs for errors: `kubectl logs -n istio-system deploy/istiod --tail=100`
- Restart problematic proxies: `kubectl rollout restart deploy/<deployment> -n <namespace>`
Validation
- Control plane pods running and healthy (istiod/linkerd-controller)
- Sidecar proxies injected into all application pods (2/2 containers)
- mTLS enabled and functioning (verified with tls-check/tap)
- Traffic management rules routing requests correctly (verified with curl tests)
- Circuit breaker trips on repeated failures (tested with fault injection)
- Observability dashboards showing metrics (Grafana/Kiali/Linkerd Viz)
- Distributed traces captured in Jaeger for sample requests
- No configuration warnings from istioctl analyze/linkerd check
- Proxy sync status shows all proxies in sync
- Service-to-service communication encrypted (verified in logs/dashboards)
Common Pitfalls
- Resource Exhaustion: A service mesh adds 100-200MB of memory per pod for sidecars. Ensure the cluster has sufficient capacity and set appropriate resource limits in the injection config.
- Configuration Conflicts: Multiple VirtualServices for the same host cause undefined behavior. Use a single VirtualService per host with multiple match conditions instead.
- Certificate Expiration: mTLS certificates auto-rotate, but the CA root must be managed. Monitor certificate expiry with `kubectl get certificate -A` (cert-manager) and set up alerts.
- Sidecar Not Injected: Pods created before the namespace was labeled won't have sidecars. Recreate them with `kubectl rollout restart deploy/<name> -n <namespace>`.
- DNS Resolution Issues: The service mesh intercepts DNS. Use fully qualified names (service.namespace.svc.cluster.local) for cross-namespace calls.
- Port Naming Requirement: Istio requires named ports following the protocol-name pattern (e.g., http-web, tcp-db). Unnamed ports default to TCP passthrough.
- Gradual Rollout Required: Don't enable STRICT mTLS immediately in production. Use PERMISSIVE mode during migration, verify all services are meshed, then switch to STRICT.
- Observability Overhead: 100% tracing sampling causes performance issues. Use 1-10% for production, e.g. `sampling: 1.0` (1%) in the mesh config.
- Gateway vs VirtualService Confusion: Gateway configures ingress (the load balancer); VirtualService configures routing. Both are required for external traffic.
- Version Compatibility: Ensure the mesh version is compatible with the Kubernetes version. Istio supports n-1 minor versions; Linkerd typically supports the last 3 Kubernetes versions.
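The port-naming pitfall can be illustrated with a Service sketch (service name, selector, and port numbers are assumed examples):

```yaml
# Named ports tell Istio which protocol to apply on each port
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-service
  ports:
    - name: http-web   # "http-" prefix enables HTTP routing, retries, telemetry
      port: 80
      targetPort: 8080
    - name: tcp-db     # "tcp-" prefix marks the port as opaque TCP
      port: 5432
      targetPort: 5432
```

An unnamed port on the same Service would fall back to TCP passthrough, silently losing HTTP-level features.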
Related Skills
- configure-ingress-networking: Gateway configuration complements mesh ingress
- deploy-to-kubernetes: Application deployment patterns that work with a service mesh
- setup-prometheus-monitoring: Prometheus integration for mesh metrics
- manage-kubernetes-secrets: Certificate management for mTLS
- enforce-policy-as-code: OPA policies that work alongside mesh authorization