Claude-skill-registry knative-serving
Deploy serverless workloads with Knative Serving for scale-to-zero and autoscaling. Use for creating Knative Services, configuring autoscaling, traffic splitting, and revisions. Triggers on "knative service", "scale-to-zero", "serverless deployment", "ksvc", "knative autoscaling", "traffic splitting", or when deploying agents as serverless workloads.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/knative-serving" ~/.claude/skills/majiayu000-claude-skill-registry-knative-serving && rm -rf "$T"
manifest: skills/data/knative-serving/SKILL.md
source content
Knative Serving
Overview
Deploy AI agents as serverless workloads using Knative Serving, enabling automatic scale-to-zero and request-based autoscaling.
Knative Architecture
                            Request
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       Knative Service                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                         Route                         │  │
│  │  • Traffic splitting between revisions                │  │
│  │  • A/B testing, canary deployments                    │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Configuration                     │  │
│  │  • Desired state specification                        │  │
│  │  • Creates new Revision on each update                │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                 Revision (Immutable)                  │  │
│  │  • Snapshot of code + config                          │  │
│  │  • Autoscaled via KPA/HPA                             │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Knative Service Definition
Basic Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: customer-support-agent
  namespace: agents
  labels:
    agentstack.io/agent-id: agt_abc123
    agentstack.io/project-id: prj_xyz789
    agentstack.io/framework: google-adk
spec:
  template:
    metadata:
      annotations:
        # Autoscaling configuration
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "100"
        autoscaling.knative.dev/scale-down-delay: "30s"
        # Initial pods on deploy
        autoscaling.knative.dev/initial-scale: "1"
    spec:
      # Container configuration
      containerConcurrency: 10
      timeoutSeconds: 300  # 5 minutes for LLM calls
      containers:
        - image: ghcr.io/raphaelmansuy/customer-support-agent:v1.0.0
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: AGENT_ID
              value: "agt_abc123"
            - name: PROJECT_ID
              value: "prj_xyz789"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: OPENAI_API_KEY
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
Production Service with Queue-Proxy Config
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: enterprise-agent
  namespace: agents
  annotations:
    # Gradual rollout: shift traffic to a new revision over this duration
    serving.knative.dev/rollout-duration: "120s"
spec:
  template:
    metadata:
      annotations:
        # Use HPA for production
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu
        autoscaling.knative.dev/target: "70"
        autoscaling.knative.dev/min-scale: "2"
        autoscaling.knative.dev/max-scale: "50"
        # Queue-proxy settings
        queue.sidecar.serving.knative.dev/resourcePercentage: "20"
    spec:
      containerConcurrency: 0  # Unlimited (use with HPA)
      timeoutSeconds: 600
      serviceAccountName: agent-runner
      containers:
        - image: ghcr.io/raphaelmansuy/enterprise-agent:v2.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 4000m
              memory: 4Gi
          volumeMounts:
            - name: model-cache
              mountPath: /cache
      volumes:
        - name: model-cache
          emptyDir:
            sizeLimit: 5Gi
Autoscaling Configuration
Concurrency-Based (KPA) - Default
Best for request-heavy workloads:
annotations:
  autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
  autoscaling.knative.dev/metric: concurrency
  autoscaling.knative.dev/target: "10"  # Target concurrent requests per pod
  autoscaling.knative.dev/target-utilization-percentage: "70"
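To make the interaction between the target and the utilization percentage concrete, here is a minimal sketch of the arithmetic. It is a simplification of what the autoscaler does (it ignores scale bounds, averaging windows, and panic mode), and the function name `desired_pods` is hypothetical, not part of any Knative API.

```python
import math


def desired_pods(observed_concurrency: float, target: float, utilization_pct: float) -> int:
    """Estimate the pod count for a given total number of in-flight requests.

    Simplified model: each pod is expected to handle
    target * utilization_pct / 100 concurrent requests.
    """
    effective_target = target * utilization_pct / 100.0
    return math.ceil(observed_concurrency / effective_target)


# With target "10" and target-utilization-percentage "70",
# each pod is sized for 7 concurrent requests.
print(desired_pods(35, target=10, utilization_pct=70))  # 5
print(desired_pods(36, target=10, utilization_pct=70))  # 6
```

Under this model, lowering the utilization percentage leaves more headroom per pod at the cost of running more pods for the same load.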
CPU-Based (HPA)
Best for compute-intensive agents:
annotations:
  autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
  autoscaling.knative.dev/metric: cpu
  autoscaling.knative.dev/target: "70"  # Target CPU percentage
RPS-Based
For rate-limited scenarios:
annotations:
  autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
  autoscaling.knative.dev/metric: rps
  autoscaling.knative.dev/target: "100"  # Target requests per second
Scale Bounds
annotations:
  autoscaling.knative.dev/min-scale: "0"      # Scale to zero (default)
  autoscaling.knative.dev/max-scale: "100"    # Maximum replicas
  autoscaling.knative.dev/initial-scale: "1"  # Initial pods on deploy
Scale-Down Delay
Prevent thrashing:
annotations:
  autoscaling.knative.dev/scale-down-delay: "30s"  # Wait before scaling down
  autoscaling.knative.dev/window: "60s"            # Stable window (per-revision key; the global key is stable-window)
  autoscaling.knative.dev/panic-window-percentage: "10"
  autoscaling.knative.dev/panic-threshold-percentage: "200"
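The panic settings are percentages of other values, which makes them easy to misread. The sketch below works through the numbers above, assuming a 60s stable window, a per-pod target of 10, and three ready pods (the pod count and the helper name `panic_settings` are illustrative assumptions). It ignores the target utilization percentage for simplicity.

```python
def panic_settings(stable_window_s: float, panic_window_pct: float,
                   panic_threshold_pct: float, target_per_pod: float,
                   ready_pods: int) -> tuple:
    """Return (panic window in seconds, concurrency at which panic mode starts)."""
    # The panic window is a fraction of the stable window.
    panic_window_s = stable_window_s * panic_window_pct / 100.0
    # Panic mode starts when load over the panic window reaches the
    # threshold percentage of what the ready pods are sized to handle.
    capacity = target_per_pod * ready_pods
    panic_concurrency = capacity * panic_threshold_pct / 100.0
    return panic_window_s, panic_concurrency


window_s, threshold = panic_settings(60, 10, 200, target_per_pod=10, ready_pods=3)
print(window_s)   # 6.0 seconds
print(threshold)  # 60.0 concurrent requests
```

In other words, with these values a burst that doubles the load the current pods are sized for, sustained over roughly six seconds, is enough to trigger fast scale-up.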
Traffic Splitting
Canary Deployment
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: customer-support-agent
spec:
  template:
    metadata:
      name: customer-support-agent-v2
    spec:
      containers:
        - image: ghcr.io/raphaelmansuy/customer-support-agent:v2.0.0
  traffic:
    - revisionName: customer-support-agent-v1
      percent: 90
    - revisionName: customer-support-agent-v2
      percent: 10
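Traffic percentages across all targets must add up to 100, or the Service is rejected. A small pre-flight check, sketched here with a hypothetical helper `validate_traffic` operating on the parsed `traffic` list, can catch that before applying the manifest.

```python
from typing import Dict, List


def validate_traffic(traffic: List[Dict]) -> None:
    """Raise ValueError if the traffic block is not a valid percentage split."""
    for target in traffic:
        percent = target.get("percent", 0)
        if not 0 <= percent <= 100:
            raise ValueError(f"percent out of range: {percent}")
    total = sum(target.get("percent", 0) for target in traffic)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


# The canary split above: 90% to v1, 10% to v2.
canary = [
    {"revisionName": "customer-support-agent-v1", "percent": 90},
    {"revisionName": "customer-support-agent-v2", "percent": 10},
]
validate_traffic(canary)  # passes silently
```

The same check applies to a blue-green split, where one target carries 100 and the other 0.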
Blue-Green Deployment
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: customer-support-agent
spec:
  template:
    metadata:
      name: customer-support-agent-green
    spec:
      containers:
        - image: ghcr.io/raphaelmansuy/customer-support-agent:v2.0.0
  traffic:
    # Route all traffic to previous version
    - revisionName: customer-support-agent-blue
      percent: 100
    # Tag new version for testing
    - revisionName: customer-support-agent-green
      percent: 0
      tag: green  # Accessible at green-<service>.<namespace>.<domain> with the default URL templates
Rollout Complete Traffic
traffic:
  - revisionName: customer-support-agent-green
    percent: 100
  - revisionName: customer-support-agent-blue
    percent: 0
Private Services (Internal Only)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: internal-agent
  labels:
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/raphaelmansuy/internal-agent:v1.0.0
Domain Mapping
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: support.agentstack.io
  namespace: agents
spec:
  ref:
    name: customer-support-agent
    kind: Service
    apiVersion: serving.knative.dev/v1
ConfigMaps for Global Settings
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Global defaults
  container-concurrency-target-default: "100"
  container-concurrency-target-percentage: "0.7"
  enable-scale-to-zero: "true"
  scale-to-zero-grace-period: "30s"
  scale-to-zero-pod-retention-period: "0s"
  stable-window: "60s"
  panic-window-percentage: "10.0"
  panic-threshold-percentage: "200.0"
  max-scale: "100"
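These global values act as defaults: a per-revision annotation, where present, takes precedence over the corresponding ConfigMap key. The sketch below illustrates that lookup order for a few concurrency-related settings. The helper `effective_setting` and the mapping table are illustrative only, and the table is not exhaustive.

```python
# Per-revision annotation -> corresponding key in the config-autoscaler ConfigMap.
GLOBAL_KEY_FOR_ANNOTATION = {
    "autoscaling.knative.dev/target": "container-concurrency-target-default",
    "autoscaling.knative.dev/target-utilization-percentage": "container-concurrency-target-percentage",
    "autoscaling.knative.dev/window": "stable-window",
    "autoscaling.knative.dev/panic-window-percentage": "panic-window-percentage",
    "autoscaling.knative.dev/panic-threshold-percentage": "panic-threshold-percentage",
    "autoscaling.knative.dev/max-scale": "max-scale",
}


def effective_setting(annotation: str, revision_annotations: dict, config_autoscaler: dict):
    """Return the annotation value if set on the revision, else the global default."""
    if annotation in revision_annotations:
        return revision_annotations[annotation]
    return config_autoscaler.get(GLOBAL_KEY_FOR_ANNOTATION[annotation])


config = {"container-concurrency-target-default": "100", "stable-window": "60s"}
revision = {"autoscaling.knative.dev/target": "10"}

print(effective_setting("autoscaling.knative.dev/target", revision, config))  # 10
print(effective_setting("autoscaling.knative.dev/window", revision, config))  # 60s
```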
Observability Integration
Enable Metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  metrics.backend-destination: prometheus
  metrics.request-metrics-backend-destination: prometheus
  metrics.opencensus-address: ""
Enable Tracing
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-tracing
  namespace: knative-serving
data:
  backend: zipkin
  zipkin-endpoint: "http://zipkin.observability:9411/api/v2/spans"
  sample-rate: "0.1"
Resources
- Reducing cold start latency: references/cold-start-optimization.md
- Integrating with kagent orchestrator: references/kagent-integration.md
- Base service template: assets/service-template.yaml