Claude-skill-registry linkerd-expert

Expert-level Linkerd service mesh management, traffic control, reliability, and production operations

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/linkerd-expert" ~/.claude/skills/majiayu000-claude-skill-registry-linkerd-expert && rm -rf "$T"

manifest: skills/data/linkerd-expert/SKILL.md

Linkerd Expert

You are an expert in Linkerd service mesh with deep knowledge of traffic management, reliability features, security, observability, and production operations. You design and manage lightweight, secure microservices architectures using Linkerd's ultra-fast data plane.

Core Expertise

Linkerd Architecture

Components:

Linkerd:
├── Control Plane
│   ├── Destination (service discovery)
│   ├── Identity (mTLS certificates)
│   ├── Proxy Injector (sidecar injection)
│   └── Public API (metrics/control; removed in Linkerd 2.10, superseded by the viz extension)
└── Data Plane
    ├── Linkerd Proxy (Rust-based)
    ├── Init Container (iptables setup)
    └── Proxy Metrics

Key Features:
- Automatic mTLS
- Golden metrics out-of-the-box
- Ultra-lightweight (written in Rust)
- Zero-config service discovery

Installation

Install Linkerd CLI:

# Download and install CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin

# Verify CLI
linkerd version

# Check cluster compatibility
linkerd check --pre

# Install CRDs
linkerd install --crds | kubectl apply -f -

# Install control plane
linkerd install | kubectl apply -f -

# Verify installation
linkerd check

# Install viz extension (dashboard + metrics)
linkerd viz install | kubectl apply -f -

# Open dashboard
linkerd viz dashboard

Production Installation:

# Generate certificates (manual trust anchor)
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

# Install with custom certificates
linkerd install \
  --identity-trust-anchors-file ca.crt \
  --identity-issuer-certificate-file issuer.crt \
  --identity-issuer-key-file issuer.key \
  --set proxyInit.runAsRoot=false \
  --ha | kubectl apply -f -

# Install with custom values
# (keys follow the linkerd-control-plane Helm chart values)
linkerd install \
  --set controllerReplicas=3 \
  --set destinationResources.cpu.request=200m \
  --set destinationResources.memory.request=512Mi \
  --set proxy.resources.cpu.request=100m \
  --set proxy.resources.memory.request=128Mi \
  | kubectl apply -f -
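
The issuer certificate above is created with --not-after 8760h (one year), so it has to be rotated before it expires. The following is a minimal sketch of an expiry check, assuming openssl is installed and the certificate is PEM-encoded; the function name cert_expires_within is hypothetical and not part of Linkerd.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the PEM certificate in $1
# expires within the next $2 days. An unreadable file also counts as expiring.
cert_expires_within() {
  # `openssl x509 -checkend N` exits 0 when the certificate is still valid
  # N seconds from now, so the result is inverted here.
  ! openssl x509 -checkend "$(($2 * 86400))" -noout -in "$1" >/dev/null 2>&1
}

# Example usage:
#   if cert_expires_within issuer.crt 30; then
#     echo "issuer.crt expires within 30 days; rotate it" >&2
#   fi
```

Running a check like this from a scheduled job gives early warning before mTLS between meshed pods starts failing.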

Mesh Injection

Automatic Namespace Injection:

# Enable injection for namespace
kubectl annotate namespace production linkerd.io/inject=enabled

# Verify annotation
kubectl get namespace production -o yaml

Namespace with Injection:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled

Pod-Level Injection:

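apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  # selector and matching template labels are required for a valid Deployment
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: myapp
        image: myapp:latest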

Selective Injection (Skip Ports):

metadata:
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/skip-inbound-ports: "8080,8443"
    config.linkerd.io/skip-outbound-ports: "3306,5432"

Proxy Configuration:

metadata:
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/proxy-cpu-request: "100m"
    config.linkerd.io/proxy-memory-request: "128Mi"
    config.linkerd.io/proxy-cpu-limit: "1000m"
    config.linkerd.io/proxy-memory-limit: "256Mi"
    config.linkerd.io/proxy-log-level: "info,linkerd=debug"

Traffic Management

Traffic Split (Canary Deployment):

# On Linkerd 2.12 and later, TrafficSplit requires the linkerd-smi extension
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: myapp-canary
  namespace: production
spec:
  service: myapp
  backends:
  - service: myapp-v1
    weight: 90
  - service: myapp-v2
    weight: 10
---
# Services
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v1
  namespace: production
spec:
  selector:
    app: myapp
    version: v1
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v2
  namespace: production
spec:
  selector:
    app: myapp
    version: v2
  ports:
  - port: 80
    targetPort: 8080

HTTPRoute (Fine-Grained Routing):

apiVersion: policy.linkerd.io/v1beta1
kind: HTTPRoute
metadata:
  name: myapp-routes
  namespace: production
spec:
  parentRefs:
  - name: myapp
    kind: Service
    group: core
    port: 80

  rules:
  # Route based on header
  - matches:
    - headers:
      - name: x-canary
        value: "true"
    backendRefs:
    - name: myapp-v2
      port: 80

  # Route based on path
  - matches:
    - path:
        type: PathPrefix
        value: /api/v2
    backendRefs:
    - name: myapp-v2
      port: 80

  # Default route
  - backendRefs:
    - name: myapp-v1
      port: 80
      weight: 90
    - name: myapp-v2
      port: 80
      weight: 10

Reliability Features

Retries:

apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: myapp-retries
  namespace: production
  annotations:
    # Retries are configured with annotations on the route (Linkerd 2.16+)
    retry.linkerd.io/http: 5xx
    retry.linkerd.io/limit: "3"
spec:
  parentRefs:
  - name: myapp
    kind: Service
    group: core
    port: 80

  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: myapp
      port: 80

Timeouts:

# The timeouts field requires policy.linkerd.io/v1beta3 or later
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: myapp-timeouts
  namespace: production
spec:
  parentRefs:
  - name: myapp
    kind: Service
    group: core
    port: 80

  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    timeouts:
      request: 10s
      backendRequest: 8s
    backendRefs:
    - name: myapp
      port: 80

Retry Budget (via ServiceProfile):

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: myapp.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /api/users
    condition:
      method: GET
      pathRegex: /api/users
    # Only routes marked retryable are retried
    isRetryable: true
    responseClasses:
    - condition:
        status:
          min: 500
          max: 599
      isFailure: true
  # The retry budget applies to the whole service, not to a single route
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
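
The retryBudget fields above bound how much extra load retries can add: retryRatio caps retries as a fraction of original requests, and minRetriesPerSecond is an additional allowance on top of that. The sketch below works through the arithmetic. The helper name retry_allowance is hypothetical, and the result is a rough upper bound averaged over the ttl window, not an exact proxy behavior.

```shell
#!/bin/sh
# Hypothetical helper: approximate upper bound on retries per second, given
# the original request rate ($1, requests/second), retryRatio ($2), and
# minRetriesPerSecond ($3).
retry_allowance() {
  awk -v rps="$1" -v ratio="$2" -v min="$3" \
    'BEGIN { printf "%.1f\n", min + ratio * rps }'
}

# Example usage with the values from the ServiceProfile above:
#   retry_allowance 100 0.2 10    # prints 30.0
```

At 100 requests per second the budget permits roughly 30 retries per second: 20 from the 0.2 ratio plus the fixed allowance of 10.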

Authorization Policies

Server (Define Ports):

apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: myapp-server
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  port: 8080
  proxyProtocol: HTTP/2

ServerAuthorization (Allow Traffic):

apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: myapp-auth
  namespace: production
spec:
  server:
    name: myapp-server

  # Set one client option per resource; the alternatives are commented out
  # because a YAML mapping cannot repeat the meshTLS key.
  client:
    # Option 1: allow a specific service account
    meshTLS:
      serviceAccounts:
      - name: frontend
        namespace: production

    # Option 2: allow every identity in a namespace
    # meshTLS:
    #   identities:
    #   - "*.production.serviceaccount.identity.linkerd.cluster.local"

    # Option 3: allow unauthenticated clients (for ingress)
    # unauthenticated: true

Default Deny (Namespace Policy + ServerAuthorization):

# Deny all inbound traffic to meshed pods in the namespace by default
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/default-inbound-policy: deny
---
# A Server covers a single port; label it so authorizations can select it
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
spec:
  podSelector:
    matchLabels:
      app: api
  port: 8080
---
# Allow specific traffic
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  server:
    # Matches labels on Server resources, not on pods
    selector:
      matchLabels:
        app: api
  client:
    meshTLS:
      serviceAccounts:
      - name: frontend
        namespace: production

Multi-Cluster

Install Multi-Cluster:

# Install multi-cluster components
linkerd multicluster install | kubectl apply -f -

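# Link clusters: generate the link against the target cluster, apply it in the source cluster
# (the kubectl context names "target" and "source" are placeholders)
linkerd --context=target multicluster link --cluster-name target | kubectl --context=source apply -f -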

# Export service
kubectl label service myapp -n production mirror.linkerd.io/exported=true

# Check gateways and multi-cluster health
linkerd multicluster gateways
linkerd multicluster check

Service Export:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    mirror.linkerd.io/exported: "true"
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
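
Once a service is exported, the source cluster sees it as a mirror service named after the original service plus the linked cluster's name. The sketch below derives that DNS name. The helper name mirror_dns is hypothetical, and the cluster domain cluster.local is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: DNS name of the mirror of service $1 in namespace $2,
# where $3 is the --cluster-name passed to `linkerd multicluster link`.
# Assumes the default cluster domain, cluster.local.
mirror_dns() {
  printf '%s-%s.%s.svc.cluster.local\n' "$1" "$3" "$2"
}

# Example usage, from a meshed pod in the source cluster:
#   curl "http://$(mirror_dns myapp production target)/"
```

For the service above and a link named target, clients in the source cluster would call myapp-target.production.svc.cluster.local.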

Observability

Golden Metrics (via CLI):

# Per-route metrics (requires a ServiceProfile)
linkerd viz routes deployment/myapp -n production

# Live request metrics
linkerd viz stat deployments -n production

# Top resources by request volume
linkerd viz top deployments -n production

# Tap live traffic
linkerd viz tap deployment/myapp -n production

# Generate a ServiceProfile from an OpenAPI spec
linkerd profile --open-api swagger.json myapp -n production

Prometheus Metrics:

# Request rate
sum(rate(request_total{namespace="production"}[1m])) by (deployment)

# Success rate (the classification label is on response_total, not request_total)
sum(rate(response_total{namespace="production",classification="success"}[1m])) /
sum(rate(response_total{namespace="production"}[1m])) * 100

# Latency (P95)
histogram_quantile(0.95,
  sum(rate(response_latency_ms_bucket{namespace="production"}[1m])) by (le, deployment)
)

# TCP connection count
sum(tcp_open_connections{namespace="production"}) by (deployment)
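
Queries like these can also be run from a script through the Prometheus HTTP API. The sketch below builds a per-deployment success-rate query from response_total, the proxy metric that carries the classification label, and sends it with curl. The function names are hypothetical, and the Prometheus service name and port in the usage note are assumptions about a default viz install.

```shell
#!/bin/sh
# Hypothetical helper: PromQL for the inbound success rate (percent) per
# deployment in namespace $1.
success_rate_query() {
  printf 'sum(rate(response_total{namespace="%s",direction="inbound",classification="success"}[1m])) by (deployment) / sum(rate(response_total{namespace="%s",direction="inbound"}[1m])) by (deployment) * 100\n' "$1" "$1"
}

# Hypothetical helper: runs an instant query ($2) against the Prometheus
# HTTP API at base URL $1 and prints the JSON response.
run_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Example usage, assuming the viz extension's Prometheus is reachable as
# svc/prometheus on port 9090 in the linkerd-viz namespace:
#   kubectl -n linkerd-viz port-forward svc/prometheus 9090:9090 &
#   run_query http://localhost:9090 "$(success_rate_query production)"
```

Filtering on direction="inbound" counts each request once, at the receiving proxy, instead of on both sides of the connection.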

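Jaeger Integration:

# Tracing is set up through the linkerd-jaeger extension, on releases that ship it,
# rather than through a ConfigMap in the linkerd namespace

# Install the extension (collector, Jaeger, and trace injector)
linkerd jaeger install | kubectl apply -f -

# Verify the extension
linkerd jaeger check

# Open the Jaeger UI
linkerd jaeger dashboard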

linkerd CLI Commands

Installation and Status:

# Pre-installation check
linkerd check --pre

# Install
linkerd install | kubectl apply -f -

# Check installation
linkerd check

# Upgrade
linkerd upgrade | kubectl apply -f -

# Uninstall
linkerd uninstall | kubectl delete -f -

Mesh Operations:

# Inject deployment
kubectl get deployment myapp -o yaml | linkerd inject - | kubectl apply -f -

# Inject a manifest file
linkerd inject deployment.yaml | kubectl apply -f -

# Uninject
linkerd uninject deployment.yaml | kubectl apply -f -

Observability:

# Stats
linkerd viz stat deployments -n production
linkerd viz stat pods -n production

# Routes
linkerd viz routes deployment/myapp -n production

# Top
linkerd viz top deployment/myapp -n production

# Tap (live traffic)
linkerd viz tap deployment/myapp -n production
linkerd viz tap deployment/myapp -n production --to deployment/api

# Edges (traffic graph)
linkerd viz edges deployment -n production

Diagnostics:

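# Get proxy logs (read the linkerd-proxy container with kubectl)
kubectl logs deployment/myapp -c linkerd-proxy -n production

# Proxy metrics for a workload
linkerd diagnostics proxy-metrics deployment/myapp -n production

# Proxy metrics for a single pod
linkerd diagnostics proxy-metrics pod/myapp-xxx -n production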

Best Practices

1. Use Automatic Injection

# Enable at namespace level
annotations:
  linkerd.io/inject: enabled

2. Set Resource Limits

annotations:
  config.linkerd.io/proxy-cpu-limit: "1000m"
  config.linkerd.io/proxy-memory-limit: "256Mi"

3. Configure Retries and Timeouts

# Annotate the HTTPRoute or Service (Linkerd 2.16+)
annotations:
  retry.linkerd.io/http: 5xx
  retry.linkerd.io/limit: "3"

4. Monitor Golden Metrics

- Success Rate (percentage of successful responses)
- Request Volume (RPS)
- Latency (P50, P95, P99)

5. Use ServiceProfiles

# Generate from OpenAPI
linkerd profile --open-api swagger.json myapp -n production

6. Implement Zero Trust

# Default deny, explicit allow
kind: ServerAuthorization

7. Multi-Cluster for HA

# Export critical services
mirror.linkerd.io/exported: "true"

Anti-Patterns

1. No Resource Limits:

# BAD: No proxy limits
# GOOD: Set explicit limits
config.linkerd.io/proxy-cpu-limit: "1000m"

2. Skip Ports Unnecessarily:

# BAD: Skip all ports
config.linkerd.io/skip-inbound-ports: "1-65535"

# GOOD: Only skip specific ports (metrics, health)
config.linkerd.io/skip-inbound-ports: "9090"

3. No Authorization Policies:

# GOOD: Always implement Server + ServerAuthorization

4. Ignoring Metrics:

# GOOD: Monitor success rate, latency, RPS
linkerd viz stat deployments -n production

Approach

When implementing Linkerd:

1. Start Simple: Inject one service first
2. Enable Namespace Injection: Scale gradually
3. Monitor: Use viz dashboard and CLI
4. Reliability: Add retries and timeouts
5. Security: Implement authorization policies
6. Profile Services: Generate ServiceProfiles
7. Multi-Cluster: For high availability
8. Tune: Adjust proxy resources based on load

Always design service mesh configurations that are lightweight, secure, and observable following cloud-native principles.

Resources

Linkerd Documentation: https://linkerd.io/docs/
Linkerd Best Practices: https://linkerd.io/2/tasks/
BuoyantCloud: https://buoyant.io/cloud
Service Mesh Interface (SMI): https://smi-spec.io/