Agent-almanac setup-service-mesh

install
source · Clone the upstream repo
git clone https://github.com/pjt222/agent-almanac
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/caveman-ultra/skills/setup-service-mesh" ~/.claude/skills/pjt222-agent-almanac-setup-service-mesh-dfb0e0 && rm -rf "$T"
manifest: i18n/caveman-ultra/skills/setup-service-mesh/SKILL.md
source content

Setup Service Mesh

Deploy and configure a service mesh for secure service-to-service communication and advanced traffic management.

When to Use

  • Microservices architecture requires encrypted service-to-service communication
  • Need fine-grained traffic control (canary deployments, A/B testing, traffic splitting)
  • Require observability across all service interactions without application changes
  • Enforce security policies (mTLS, authorization) at the infrastructure level
  • Implement circuit breaking, retries, and timeouts consistently across services
  • Need distributed tracing and service dependency mapping

Inputs

  • Required: Kubernetes cluster with admin access
  • Required: Choice of service mesh (Istio or Linkerd)
  • Required: Namespace(s) in which to enable the mesh
  • Optional: Monitoring stack (Prometheus, Grafana, Jaeger)
  • Optional: Custom traffic management requirements
  • Optional: Certificate authority configuration for mTLS

Procedure

See Extended Examples for complete configuration files and templates.

Step 1: Install Service Mesh Control Plane

Choose and install the service mesh control plane.

For Istio:

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.2 sh -
cd istio-1.20.2 && export PATH=$PWD/bin:$PATH
istioctl install --set profile=default -y   # 'default' is the recommended profile for production
kubectl get pods -n istio-system

For Linkerd:

curl -sL https://run.linkerd.io/install | sh
export PATH=$HOME/.linkerd2/bin:$PATH
linkerd check --pre
linkerd install --crds | kubectl apply -f -   # CRDs must be installed first on Linkerd 2.12+
linkerd install --ha | kubectl apply -f -
linkerd check

Create a service mesh configuration with resource limits and tracing:

# service-mesh-config.yaml (abbreviated)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  meshConfig:
    enableTracing: true
  components:
    pilot:
      k8s:
        resources: { requests: { cpu: 500m, memory: 2Gi } }
# See EXAMPLES.md Step 1 for complete configuration
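
Apply the custom configuration in place of the profile flag used above (istioctl install -f is the standard way to install from an IstioOperator file):

istioctl install -f service-mesh-config.yaml -y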

Expected: Control plane pods running in the istio-system (Istio) or linkerd (Linkerd) namespace. istioctl version (or linkerd version) shows matching client and server versions.

On failure:

  • Check cluster has sufficient resources (at least 4 CPU cores, 8GB RAM for production)
  • Verify Kubernetes version compatibility (check mesh documentation)
  • Review logs:
    kubectl logs -n istio-system -l app=istiod
    or
    kubectl logs -n linkerd -l linkerd.io/control-plane-component=controller
  • Check for conflicting CRDs:
    kubectl get crd | grep istio
    or
    kubectl get crd | grep linkerd

Step 2: Enable Automatic Sidecar Injection

Configure namespaces for automatic sidecar proxy injection.

For Istio:

# Label namespace for automatic injection
kubectl label namespace default istio-injection=enabled
kubectl get namespace -L istio-injection

For Linkerd:

# Annotate namespace for injection
kubectl annotate namespace default linkerd.io/inject=enabled
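
Note that the annotation only affects pods created after it is set; roll existing workloads, or inject a single manifest directly with linkerd inject:

# Existing pods keep running without a proxy until restarted
kubectl rollout restart deploy -n default
# Or inject one manifest explicitly
kubectl get deploy <name> -n default -o yaml | linkerd inject - | kubectl apply -f -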

Test sidecar injection with a sample deployment:

# test-deployment.yaml (abbreviated)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 2
  selector:
    matchLabels: { app: test-app }
  template:
    metadata:
      labels: { app: test-app }
    spec:
      containers:
      - name: app
        image: nginx:alpine
# See EXAMPLES.md Step 2 for complete test deployment

Apply and verify:

kubectl apply -f test-deployment.yaml
kubectl get pods -n default
# Expect 2/2 containers (app + proxy)

Expected: New pods show 2/2 containers (application + sidecar proxy). Describe output shows istio-proxy or linkerd-proxy container. Logs show successful proxy startup.

On failure:

  • Check namespace labels/annotations:
    kubectl get ns default -o yaml
  • Verify mutating webhook is active:
    kubectl get mutatingwebhookconfiguration
  • Review injection logs (istiod handles injection in current Istio releases):
    kubectl logs -n istio-system -l app=istiod | grep -i inject
  • Manually inject to test:
    kubectl get deploy test-app -o yaml | istioctl kube-inject -f - | kubectl apply -f -

Step 3: Configure mTLS Policy

Enable mutual TLS for secure service-to-service communication.

For Istio:

# mtls-policy.yaml (abbreviated)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
# See EXAMPLES.md Step 3 for per-namespace and permissive mode examples
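
For a staged rollout, scope the policy to a namespace and start in PERMISSIVE mode (a minimal sketch; the production namespace name is illustrative):

# mtls-permissive.yaml - accepts both plaintext and mTLS during migration
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE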

For Linkerd:

# Linkerd enforces mTLS by default for meshed pods
linkerd viz tap deploy/test-app -n default
# Check for tls=true in the tap output

Apply and verify:

kubectl apply -f mtls-policy.yaml
# Istio: verify mTLS status (istioctl authn tls-check was removed in Istio 1.5; use describe instead)
istioctl x describe pod $(kubectl get pod -n default -l app=test-app -o jsonpath='{.items[0].metadata.name}') -n default

Expected: All connections between meshed services show mTLS enabled. Istio: the describe output reports mTLS in effect for the pod. Linkerd: tap output shows tls=true for all connections. Service logs show no TLS errors.

On failure:

  • Check certificate issuance:
    kubectl get certificates -A
    (cert-manager)
  • Verify CA is healthy:
    kubectl logs -n istio-system -l app=istiod | grep -i cert
  • Test with PERMISSIVE mode first (see the sketch above), then transition to STRICT
  • Check for services without sidecars:
    kubectl get pods --all-namespaces -o json | jq '.items[] | select(.spec.containers | length == 1) | .metadata.name'

Step 4: Implement Traffic Management Rules

Configure intelligent traffic routing, retries, and circuit breaking.

Create traffic management policies:

# traffic-management.yaml (abbreviated)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  http:
  - match:
    - uri: { prefix: /api/v2 }
    route:
    - destination: { host: api-service, subset: v2 }
      weight: 10
    - destination: { host: api-service, subset: v1 }
      weight: 90
    retries: { attempts: 3, perTryTimeout: 2s }
# See EXAMPLES.md Step 4 for complete routing, circuit breaker, and gateway configs
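
The v1 and v2 subsets referenced above only resolve if a DestinationRule defines them (a minimal sketch, assuming pods carry a version label):

# destination-rule.yaml (sketch)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-service
spec:
  host: api-service
  subsets:
  - name: v1
    labels: { version: v1 }
  - name: v2
    labels: { version: v2 }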

For Linkerd traffic splitting:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
spec:
  service: api-service
  backends:
  - service: api-service-v1
    weight: 900
  - service: api-service-v2
    weight: 100
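
On Linkerd 2.12 and later, TrafficSplit support lives in the separate linkerd-smi extension, which must be installed first (verify against your Linkerd version):

curl -sL https://linkerd.github.io/linkerd-smi/install | sh
linkerd smi install | kubectl apply -f -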

Apply and test:

kubectl apply -f traffic-management.yaml
# Test traffic distribution
for i in {1..100}; do curl -s http://api.example.com/api/v2 | grep version; done | sort | uniq -c
# Monitor: istioctl dashboard kiali or linkerd viz dashboard
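
To exercise retries and circuit breaking deliberately, a temporary fault-injection rule can force errors (a sketch; apply it briefly and delete it afterwards, since overlapping VirtualServices for one host are a pitfall noted below):

# fault-test.yaml - abort roughly half of requests with HTTP 503
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-service-fault
spec:
  hosts:
  - api-service
  http:
  - fault:
      abort:
        percentage: { value: 50 }
        httpStatus: 503
    route:
    - destination: { host: api-service }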

Expected: Traffic splits according to defined weights. Circuit breaker trips after consecutive errors. Retries occur for transient failures. Kiali/Linkerd dashboard shows traffic flow visualization.

On failure:

  • Verify destination hosts resolve:
    kubectl get svc -n production
  • Check subset labels match pod labels:
    kubectl get pods -n production --show-labels
  • Review pilot logs:
    kubectl logs -n istio-system -l app=istiod
  • Test without circuit breaker first, then add incrementally
  • Use istioctl analyze to check configuration:
    istioctl analyze -n production

Step 5: Integrate Observability Stack

Connect service mesh telemetry to monitoring and tracing systems.

Install observability addons:

# Istio: Prometheus, Grafana, Kiali, Jaeger
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/grafana.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/jaeger.yaml

# Linkerd
linkerd viz install | kubectl apply -f -
linkerd jaeger install | kubectl apply -f -

Configure custom metrics and dashboards:

# service-monitor.yaml (abbreviated)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-mesh-metrics
spec:
  selector: { matchLabels: { app: istiod } }
  endpoints:
  - port: http-monitoring
    interval: 30s
# See EXAMPLES.md Step 5 for Grafana dashboards and telemetry config
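
To confirm metrics are flowing, query Prometheus directly; istio_requests_total is Istio's standard request metric (Linkerd's equivalent is request_total):

# PromQL: per-service request rate over the last 5 minutes
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)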

Access dashboards:

istioctl dashboard grafana  # or: linkerd viz dashboard
istioctl dashboard kiali
istioctl dashboard jaeger

Expected: Dashboards show service topology, request rates, latency percentiles, error rates. Distributed traces available in Jaeger. Prometheus scraping mesh metrics successfully. Custom metrics appear in queries.

On failure:

  • Verify Prometheus scraping:
    kubectl get servicemonitor -A
  • Check addon pods are running:
    kubectl get pods -n istio-system
  • Review telemetry configuration:
    istioctl proxy-config log <pod-name> -n <namespace>
  • Verify mesh config has telemetry enabled:
    kubectl get configmap istio -n istio-system -o yaml | grep -A 5 enableTracing
  • Check for port conflicts if port-forward fails

Step 6: Validate and Monitor Mesh Health

Perform comprehensive health checks and set up ongoing monitoring.

# Istio validation
istioctl analyze --all-namespaces
istioctl verify-install
istioctl proxy-status

# Linkerd validation
linkerd check
linkerd viz check
linkerd diagnostics policy -n default deploy/test-app 80

# Check proxy sync status
kubectl get pods -n production -o json | \
  jq '.items[] | {name: .metadata.name, proxy: .status.containerStatuses[] | select(.name=="istio-proxy").ready}'

# Monitor control plane health
kubectl get pods -n istio-system -w
kubectl top pods -n istio-system

Create health check script and alerts:

#!/bin/bash
# mesh-health-check.sh (abbreviated)
echo "=== Service Mesh Health Check ==="
kubectl get pods -n istio-system
istioctl analyze --all-namespaces
# See EXAMPLES.md Step 6 for complete health check script and alert configs
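
To run the check on a schedule, a cron entry is one option (paths are illustrative):

# Run every 15 minutes and append output to a log
*/15 * * * * /usr/local/bin/mesh-health-check.sh >> /var/log/mesh-health.log 2>&1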

Expected: All analysis checks pass with no warnings. Proxy-status shows all proxies synced. mTLS check confirms encryption. Metrics show traffic flowing. Control plane pods stable with low resource usage.

On failure:

  • Address specific issues from istioctl analyze output
  • Check proxy logs for individual pods:
    kubectl logs <pod> -c istio-proxy -n <namespace>
  • Verify network policies aren't blocking mesh traffic
  • Review control plane logs for errors:
    kubectl logs -n istio-system deploy/istiod --tail=100
  • Restart problematic proxies:
    kubectl rollout restart deploy/<deployment> -n <namespace>

Validation

  • Control plane pods running and healthy (istiod/linkerd-controller)
  • Sidecar proxies injected into all application pods (2/2 containers)
  • mTLS enabled and functioning (verified with tls-check/tap)
  • Traffic management rules routing requests correctly (verified with curl tests)
  • Circuit breaker trips on repeated failures (tested with fault injection)
  • Observability dashboards showing metrics (Grafana/Kiali/Linkerd Viz)
  • Distributed traces captured in Jaeger for sample requests
  • No configuration warnings from istioctl analyze/linkerd check
  • Proxy sync status shows all proxies in sync
  • Service-to-service communication encrypted (verified in logs/dashboards)

Common Pitfalls

  • Resource Exhaustion: Service mesh adds 100-200MB memory per pod for sidecars. Ensure cluster has sufficient capacity. Set appropriate resource limits in injection config.

  • Configuration Conflicts: Multiple VirtualServices for same host cause undefined behavior. Use single VirtualService per host with multiple match conditions instead.

  • Certificate Expiration: mTLS certificates auto-rotate, but the CA root must be managed. Monitor certificate expiry with kubectl get certificate -A and set up alerts.

  • Sidecar Not Injected: Pods created before namespace labeling won't have sidecars. Recreate them with kubectl rollout restart deploy/<name> -n <namespace>.

  • DNS Resolution Issues: Service mesh intercepts DNS. Use fully qualified names (service.namespace.svc.cluster.local) for cross-namespace calls.

  • Port Naming Requirement: Istio requires named ports following protocol-name pattern (e.g., http-web, tcp-db). Unnamed ports default to TCP passthrough.
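
    For example, a Service port named for its protocol (a minimal sketch):

    apiVersion: v1
    kind: Service
    metadata:
      name: api-service
    spec:
      ports:
      - name: http-web   # protocol prefix tells Istio to treat this port as HTTP
        port: 80
        targetPort: 8080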

  • Gradual Rollout Required: Don't enable STRICT mTLS immediately in production. Use PERMISSIVE mode during migration, verify all services meshed, then switch to STRICT.

  • Observability Overhead: 100% trace sampling causes performance issues. Use 1-10% in production by setting sampling: 1.0 in the mesh config (the value is a percentage, so 1.0 means 1% of requests).
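
    In an IstioOperator spec this lives under meshConfig (a sketch):

    meshConfig:
      defaultConfig:
        tracing:
          sampling: 1.0  # percentage of requests traced: 1.0 = 1%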

  • Gateway vs VirtualService Confusion: Gateway configures ingress (load balancer), VirtualService configures routing. Both required for external traffic.

  • Version Compatibility: Ensure mesh version compatible with Kubernetes version. Istio supports n-1 minor versions, Linkerd typically supports last 3 Kubernetes versions.

Related Skills

  • configure-ingress-networking - Gateway configuration complements mesh ingress
  • deploy-to-kubernetes - Application deployment patterns that work with service mesh
  • setup-prometheus-monitoring - Prometheus integration for mesh metrics
  • manage-kubernetes-secrets - Certificate management for mTLS
  • enforce-policy-as-code - OPA policies that work alongside mesh authorization