Claude-skill-registry building-with-envoy-gateway
Build production traffic engineering for Kubernetes with Envoy Gateway, Gateway API, KEDA autoscaling, and Envoy AI Gateway. Use when implementing ingress, rate limiting, traffic routing, TLS, autoscaling, or LLM traffic management.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/building-with-envoy-gateway" ~/.claude/skills/majiayu000-claude-skill-registry-building-with-envoy-gateway && rm -rf "$T"
skills/data/building-with-envoy-gateway/SKILL.md
Traffic Engineering with Envoy Gateway
Persona
You are a Platform Engineer specializing in Kubernetes traffic management and API gateway patterns. You've deployed Envoy Gateway in production for high-traffic AI agent platforms. You understand Gateway API as the new Kubernetes standard, Envoy Gateway's extension CRDs, KEDA event-driven autoscaling, and Envoy AI Gateway for LLM traffic. You follow CNCF best practices and can implement the full traffic stack: ingress routing, rate limiting, circuit breaking, TLS/mTLS, autoscaling, and AI-specific traffic management.
When to Use This Skill
Activate when the user mentions:
- Envoy Gateway, Gateway API, GatewayClass
- HTTPRoute, GRPCRoute, TCPRoute, TLSRoute
- BackendTrafficPolicy, ClientTrafficPolicy, SecurityPolicy
- Rate limiting, circuit breaking, retries, load balancing
- TLS termination, mTLS, CertManager
- KEDA, ScaledObject, event-driven autoscaling
- Envoy AI Gateway, token-based rate limiting, provider fallback
- Ingress replacement, Traefik, Kong migration
- Canary deployments, blue-green, traffic splitting
- HPA, VPA, autoscaling for AI agents
Core Concepts
Gateway API: The New Kubernetes Standard
| Resource | Purpose | Scope |
|---|---|---|
| GatewayClass | Defines gateway implementation (like StorageClass for networking) | Cluster |
| Gateway | Traffic entry point with listeners (ports, protocols, hostnames) | Namespace |
| HTTPRoute | L7 routing rules (path, headers, query params, methods) | Namespace |
| GRPCRoute | gRPC-specific routing with Protocol Buffers | Namespace |
| ReferenceGrant | Cross-namespace resource access control | Namespace |
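As a rough illustration of the ReferenceGrant row, the sketch below would let HTTPRoutes in `default` point at Services in another namespace. The `backends` namespace and the grant name are hypothetical and do not appear elsewhere in this skill.

```yaml
# Hypothetical: allow HTTPRoutes in "default" to reference Services in "backends"
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-default-routes   # hypothetical name
  namespace: backends          # the grant lives in the namespace being referenced
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: default
  to:
    - group: ""                # core API group, i.e. Service
      kind: Service
```

Without a grant like this, a cross-namespace `backendRefs` entry is expected to be rejected by the gateway implementation.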
Envoy Gateway Architecture
    ┌─────────────────────────────────────────────────────────────┐
    │                        Control Plane                        │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
    │  │   Gateway    │  │     xDS      │  │    Infra     │       │
    │  │  Translator  │──│    Server    │──│   Manager    │       │
    │  └──────────────┘  └──────────────┘  └──────────────┘       │
    │         │                 │                                 │
    │         ▼                 │                                 │
    │    Gateway API            │                                 │
    │    + Extensions           │                                 │
    └───────────────────────────│─────────────────────────────────┘
                                │ xDS Protocol
                                ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                         Data Plane                          │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
    │  │ Envoy Proxy  │  │ Envoy Proxy  │  │ Envoy Proxy  │       │
    │  │  (replica)   │  │  (replica)   │  │  (replica)   │       │
    │  └──────────────┘  └──────────────┘  └──────────────┘       │
    └─────────────────────────────────────────────────────────────┘
Envoy Gateway Extension CRDs
| CRD | Purpose | Target | Key Features |
|---|---|---|---|
| BackendTrafficPolicy | Gateway-to-backend traffic | HTTPRoute, Gateway | Rate limiting, retries, circuit breaker, load balancing |
| ClientTrafficPolicy | Client-to-gateway connections | Gateway | TLS, timeouts, keepalive, connection limits |
| SecurityPolicy | Authentication & authorization | HTTPRoute, Gateway | JWT, OIDC, Basic Auth, IP allowlist, CORS |
| EnvoyProxy | Proxy deployment config | GatewayClass | Replicas, resources, telemetry |
| Backend | Advanced endpoint config | - | FQDN, mTLS client certs |
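The workflow below has no ClientTrafficPolicy manifest, so here is a minimal sketch of one attached to the `task-api-gateway` Gateway used later in this skill. The policy name and every numeric value are assumptions chosen for illustration; confirm the field names against the CRD version installed in your cluster.

```yaml
# Sketch only: client-facing TLS floor, timeouts, keepalive, and connection cap
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: task-api-client-policy   # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: task-api-gateway
  tls:
    minVersion: "1.2"            # assumed minimum TLS version
  timeout:
    http:
      requestReceivedTimeout: 10s   # assumed value
      idleTimeout: 60s              # assumed value
  tcpKeepalive:
    idleTime: 10m
    interval: 60s
    probes: 3
  connection:
    connectionLimit:
      value: 1000                # assumed cap on concurrent client connections
```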
Decision Logic
Which Policy for Which Scenario?
| Scenario | Policy | Configuration |
|---|---|---|
| Rate limit all traffic globally | BackendTrafficPolicy | with Redis backend |
| Rate limit per-instance (cost-effective) | BackendTrafficPolicy | |
| Retry transient failures | BackendTrafficPolicy | , |
| Circuit breaker for unreliable backends | BackendTrafficPolicy | + outlier detection |
| TLS termination at gateway | ClientTrafficPolicy | |
| Client connection timeouts | ClientTrafficPolicy | |
| JWT token validation | SecurityPolicy | with JWKS |
| SSO with identity provider | SecurityPolicy | |
| IP-based access control | SecurityPolicy | with |
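For the IP-based access control row, a SecurityPolicy could look roughly like the sketch below. The policy name, rule name, and CIDR range are hypothetical, and the client IP that Envoy sees depends on how the gateway is fronted (for example, whether a load balancer preserves the source address).

```yaml
# Sketch only: deny by default, allow one internal range
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: internal-only            # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: task-api-gateway
  authorization:
    defaultAction: Deny
    rules:
      - name: allow-internal     # hypothetical rule name
        action: Allow
        principal:
          clientCIDRs:
            - 10.0.0.0/8         # assumed internal range
```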
Authentication Method Selection
    Is enterprise SSO needed?
    ├── Yes → Use OIDC (delegate to identity provider)
    └── No → Is stateless API auth acceptable?
        ├── Yes → Use JWT (validate JWKS locally)
        └── No → Is it simple internal API?
            ├── Yes → Use Basic Auth or API Key
            └── No → Use External Authorization service
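The JWT branch is covered in the workflow; for the OIDC branch, a SecurityPolicy might be sketched as follows. The route name, issuer, client ID, Secret name, and URLs are all placeholders, and the referenced Secret is assumed to hold the client secret under the key `client-secret`.

```yaml
# Sketch only: delegate login to an external identity provider
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: dashboard-oidc           # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: dashboard-route      # hypothetical route, not defined in this skill
  oidc:
    provider:
      issuer: https://idp.example.com        # placeholder issuer
    clientID: my-client-id                   # placeholder
    clientSecret:
      name: dashboard-oidc-secret            # hypothetical Secret
    redirectURL: https://api.example.com/oauth2/callback   # placeholder
    logoutPath: /logout
```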
Rate Limiting Strategy
    Need cross-instance coordination?
    ├── Yes → Global Rate Limit (requires Redis)
    │         Use for: org-wide limits, preventing resource exhaustion
    └── No → Local Rate Limit (per-proxy bucket)
              Use for: per-region limits, cost-effective protection
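The workflow only shows the global variant, so here is a sketch of the local branch. The policy name and the 50 requests/second figure are assumptions. Because each Envoy replica keeps its own bucket, the effective cluster-wide ceiling is roughly the limit multiplied by the number of replicas.

```yaml
# Sketch only: per-proxy (local) rate limit, no Redis required
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-local-limit     # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    type: Local
    local:
      rules:
        - limit:
            requests: 50         # assumed value, enforced per Envoy replica
            unit: Second
```

This would be used instead of a global policy on the same route, not alongside it.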
Workflow: Full Traffic Stack Setup
1. Install Envoy Gateway via Helm
    # Install Envoy Gateway from its OCI Helm chart (no "helm repo add" step is needed)
    helm install eg oci://docker.io/envoyproxy/gateway-helm \
      --version v1.6.1 \
      -n envoy-gateway-system \
      --create-namespace

    # Verify installation
    kubectl wait --for=condition=Available deployment/envoy-gateway \
      -n envoy-gateway-system --timeout=120s

    # Install Gateway API CRDs if not present
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml
2. Create GatewayClass and Gateway
    # gateway-class.yaml
    apiVersion: gateway.networking.k8s.io/v1
    kind: GatewayClass
    metadata:
      name: envoy-gateway
    spec:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    ---
    # gateway.yaml
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: task-api-gateway
      namespace: default
    spec:
      gatewayClassName: envoy-gateway
      listeners:
        - name: http
          protocol: HTTP
          port: 80
          allowedRoutes:
            namespaces:
              from: Same
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - kind: Secret
                name: tls-cert
          allowedRoutes:
            namespaces:
              from: Same
3. Create HTTPRoute for Application
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: task-api-route
      namespace: default
    spec:
      parentRefs:
        - name: task-api-gateway
      hostnames:
        - "api.example.com"
      rules:
        # API endpoints with versioning
        - matches:
            - path:
                type: PathPrefix
                value: /api/v1/tasks
          backendRefs:
            - name: task-api
              port: 8000
        # Health check endpoint
        - matches:
            - path:
                type: Exact
                value: /health
          backendRefs:
            - name: task-api
              port: 8000
        # Traffic splitting for canary
        - matches:
            - path:
                type: PathPrefix
                value: /api/v2/tasks
          backendRefs:
            - name: task-api-v2
              port: 8000
              weight: 10
            - name: task-api-v1
              port: 8000
              weight: 90
4. Apply Rate Limiting (BackendTrafficPolicy)
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: BackendTrafficPolicy
    metadata:
      name: task-api-rate-limit
      namespace: default
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute
          name: task-api-route
      rateLimit:
        type: Global   # global limits need the Redis-backed rate limit service
        global:
          rules:
            # Per-user rate limit (distinct header)
            - clientSelectors:
                - headers:
                    - type: Distinct
                      name: x-user-id
              limit:
                requests: 100
                unit: Minute
            # Anonymous users (no x-user-id header)
            - clientSelectors:
                - headers:
                    - name: x-user-id
                      invert: true
              limit:
                requests: 10
                unit: Minute
      # Retry policy: per-attempt settings sit under perRetry,
      # and retry conditions under retryOn.triggers
      retry:
        numRetries: 3
        perRetry:
          timeout: 5s
          backOff:
            baseInterval: 100ms
            maxInterval: 10s
        retryOn:
          triggers:
            - "5xx"
            - "reset"
            - "connect-failure"
5. Configure Circuit Breaking
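    # Note: Envoy Gateway typically honors only one BackendTrafficPolicy per target,
    # so in practice merge these settings into the rate-limit policy from step 4
    # rather than applying a second policy to task-api-route.
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: BackendTrafficPolicy
    metadata:
      name: task-api-resilience
      namespace: default
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute
          name: task-api-route
      healthCheck:
        active:
          type: HTTP
          http:
            path: /health
            expectedStatuses:
              - 200
          interval: 10s
          timeout: 1s
          unhealthyThreshold: 3
          healthyThreshold: 1
      circuitBreaker:
        maxConnections: 100
        maxPendingRequests: 50
        maxParallelRequests: 1000   # cap on concurrent in-flight requests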
6. Configure TLS with cert-manager
    # Install cert-manager first (its Gateway API support must be enabled
    # for the HTTP-01 solver below to work)
    # kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.0/cert-manager.yaml

    # cluster-issuer.yaml
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.com
        privateKeySecretRef:
          name: letsencrypt-prod
        solvers:
          # Solve HTTP-01 challenges through the Gateway rather than an Ingress
          # class, because Envoy Gateway does not act as an Ingress controller
          - http01:
              gatewayHTTPRoute:
                parentRefs:
                  - kind: Gateway
                    name: task-api-gateway
                    namespace: default
    ---
    # certificate.yaml
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: api-tls
      namespace: default
    spec:
      secretName: tls-cert
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      dnsNames:
        - api.example.com
7. JWT Authentication (SecurityPolicy)
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: SecurityPolicy
    metadata:
      name: jwt-auth
      namespace: default
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute
          name: task-api-route
      jwt:
        providers:
          - name: auth0
            issuer: https://your-tenant.auth0.com/
            audiences:
              - https://api.example.com
            remoteJWKS:
              uri: https://your-tenant.auth0.com/.well-known/jwks.json
            claimToHeaders:
              - claim: sub
                header: x-user-id
              - claim: permissions
                header: x-user-permissions
8. Install KEDA for Autoscaling
    # Install KEDA
    helm repo add kedacore https://kedacore.github.io/charts
    helm repo update
    helm install keda kedacore/keda \
      --namespace keda \
      --create-namespace
9. Configure KEDA ScaledObject
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: task-api-scaler
      namespace: default
    spec:
      scaleTargetRef:
        name: task-api
        kind: Deployment
      minReplicaCount: 1
      maxReplicaCount: 20
      triggers:
        # Scale based on Prometheus metrics (request rate)
        - type: prometheus
          metadata:
            serverAddress: http://prometheus.monitoring:9090
            metricName: http_requests_per_second
            query: sum(rate(envoy_http_downstream_rq_total{envoy_cluster_name="task-api"}[1m]))
            threshold: "100"
        # Scale based on Kafka consumer lag
        - type: kafka
          metadata:
            bootstrapServers: kafka.default:9092
            consumerGroup: task-processors
            topic: task-events
            lagThreshold: "50"
Key Patterns
Traffic Splitting for Canary Deployments
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: canary-route
    spec:
      parentRefs:
        - name: api-gateway
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /api
          backendRefs:
            # Stable version: 90%
            - name: api-stable
              port: 8000
              weight: 90
            # Canary version: 10%
            - name: api-canary
              port: 8000
              weight: 10
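The same weight mechanism can express a blue-green cut-over: keep both backends in the route and flip the weights in a single apply. This is a sketch under assumed names (`api-blue`, `api-green`, and the route name are hypothetical). One backend carrying weight 0 is fine as long as the other does not.

```yaml
# Sketch only: blue-green switch, green is live and blue is kept for rollback
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: blue-green-route         # hypothetical name
spec:
  parentRefs:
    - name: api-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-blue         # hypothetical Service, previous version
          port: 8000
          weight: 0              # receives no traffic; swap weights to roll back
        - name: api-green        # hypothetical Service, new version
          port: 8000
          weight: 100
```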
Header-Based A/B Testing
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: ab-test-route
    spec:
      parentRefs:
        - name: api-gateway
      rules:
        # Beta users (header match)
        - matches:
            - headers:
                - name: x-beta-user
                  value: "true"
          backendRefs:
            - name: api-v2
              port: 8000
        # All other users
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: api-v1
              port: 8000
Envoy AI Gateway for LLM Traffic
    # For AI agent traffic management
    # NOTE: schematic only. Envoy AI Gateway serves its CRDs from the
    # aigateway.envoyproxy.io API group, and the field names below may not
    # match the released AIGatewayRoute schema; check the CRD reference for
    # your installed version before applying.
    apiVersion: aigateway.envoyproxy.io/v1alpha1
    kind: AIGatewayRoute
    metadata:
      name: llm-router
    spec:
      backends:
        # Primary: OpenAI
        - name: openai
          priority: 0
          provider: openai
          model: gpt-4
          auth:
            type: APIKey
            apiKeyRef:
              name: openai-key
        # Fallback: Anthropic
        - name: anthropic
          priority: 1
          provider: anthropic
          model: claude-3-opus
          modelNameOverride: gpt-4
          auth:
            type: APIKey
            apiKeyRef:
              name: anthropic-key
      # Token-based rate limiting
      rateLimit:
        tokenBudget:
          perUser: 100000
          perMinute: 10000
Safety & Guardrails
NEVER
- Expose management endpoints (health checks, metrics) without authentication
- Use LocalRateLimit when cross-instance coordination is required
- Skip TLS for production traffic
- Set rate limits too high initially (start conservative, increase based on monitoring)
- Use weight 0 for all backends in traffic splitting (per the Gateway API spec, requests matching that rule then fail with HTTP 500)
- Deploy without health checks on backends
ALWAYS
- Start with strict rate limits and loosen based on actual usage
- Use ReferenceGrant for cross-namespace access
- Configure health checks before enabling circuit breakers
- Test canary deployments with small traffic percentages first
- Monitor 429 (rate limit) and 503 (circuit breaker) responses
- Use mTLS for backend traffic in production
- Set appropriate timeouts (start with 30s, tune based on P99)
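The 30s starting point for timeouts can be set directly on an HTTPRoute rule through the Gateway API `timeouts` field. The sketch below assumes a Gateway API version where that field is available; the route name and the 10s per-attempt value are illustrative assumptions.

```yaml
# Sketch only: request timeout on a route rule
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: task-api-route-timeouts  # hypothetical name
  namespace: default
spec:
  parentRefs:
    - name: task-api-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/v1/tasks
      timeouts:
        request: 30s             # total budget for the request, per the guideline above
        backendRequest: 10s      # assumed per-attempt budget; must not exceed request
      backendRefs:
        - name: task-api
          port: 8000
```

Tightening `request` toward the observed P99 latency is then a one-line change.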
Cost Engineering
- KEDA scale-to-zero (minReplicaCount: 0) can save 40-70% on idle workloads
- Token-based rate limiting prevents LLM cost overruns
- Local rate limiting avoids Redis costs when global isn't needed
- Schedule non-production gateways to scale down outside business hours
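One way to sketch the off-hours scale-down is KEDA's cron scaler combined with `minReplicaCount: 0`. Here the `task-api` Deployment stands in for a non-production workload; the object name, timezone, schedule, and replica count are assumptions.

```yaml
# Sketch only: run 2 replicas on weekdays 08:00-18:00, scale to zero otherwise
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-api-business-hours  # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: task-api
  minReplicaCount: 0             # allows scale-to-zero outside the window
  maxReplicaCount: 10
  cooldownPeriod: 300            # seconds to wait before scaling back to zero
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London  # assumed timezone
        start: 0 8 * * 1-5       # window opens 08:00 Monday to Friday
        end: 0 18 * * 1-5        # window closes 18:00 Monday to Friday
        desiredReplicas: "2"
```

A workload should be managed by a single ScaledObject, so this would replace the request-rate scaler for that environment or be added as an extra trigger on the same object.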
TaskManager Example
Complete traffic engineering setup for Task API:
Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: task-api
      namespace: default
      labels:
        app: task-api
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: task-api
      template:
        metadata:
          labels:
            app: task-api
        spec:
          containers:
            - name: task-api
              image: task-api:latest
              ports:
                - containerPort: 8000
              readinessProbe:
                httpGet:
                  path: /health
                  port: 8000
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  path: /health
                  port: 8000
                initialDelaySeconds: 15
                periodSeconds: 20
              resources:
                requests:
                  cpu: "100m"
                  memory: "128Mi"
                limits:
                  cpu: "500m"
                  memory: "512Mi"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: task-api
      namespace: default
    spec:
      selector:
        app: task-api
      ports:
        - port: 8000
          targetPort: 8000
Full Gateway Configuration
    # Gateway with TLS
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: task-gateway
    spec:
      gatewayClassName: envoy-gateway
      listeners:
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - kind: Secret
                name: task-api-tls
    ---
    # HTTPRoute with versioned paths
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: task-route
    spec:
      parentRefs:
        - name: task-gateway
      hostnames:
        - tasks.example.com
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /api/v1
          backendRefs:
            - name: task-api
              port: 8000
    ---
    # Rate limiting + retries
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: BackendTrafficPolicy
    metadata:
      name: task-traffic
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute
          name: task-route
      rateLimit:
        type: Global
        global:
          rules:
            - limit:
                requests: 100
                unit: Second
      retry:
        numRetries: 3
        retryOn:
          # Retry conditions are listed under "triggers"
          triggers: ["5xx", "reset"]
    ---
    # JWT authentication
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: SecurityPolicy
    metadata:
      name: task-auth
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute
          name: task-route
      jwt:
        providers:
          - name: task-auth
            issuer: https://auth.example.com
            remoteJWKS:
              uri: https://auth.example.com/.well-known/jwks.json
    ---
    # KEDA autoscaling
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: task-scaler
    spec:
      scaleTargetRef:
        name: task-api
      minReplicaCount: 1
      maxReplicaCount: 10
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus:9090
            query: sum(rate(http_requests_total{app="task-api"}[1m]))
            threshold: "50"
References
For detailed patterns, see:
- HTTPRoute matching examples: references/gateway-api-patterns.md
- Full CRD reference: references/envoy-gateway-crds.md
- KEDA scaler configurations: references/keda-scalers.md
- Envoy AI Gateway patterns: references/ai-gateway.md