Everything-claude-code-zh deployment-patterns
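Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production-readiness checklists for web applications.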
Install
Source: clone the upstream repo
git clone https://github.com/xu-xiang/everything-claude-code-zh
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/xu-xiang/everything-claude-code-zh "$T" && mkdir -p ~/.claude/skills && cp -r "$T/docs/zh-CN/skills/deployment-patterns" ~/.claude/skills/xu-xiang-everything-claude-code-zh-deployment-patterns && rm -rf "$T"
Manifest: docs/zh-CN/skills/deployment-patterns/SKILL.md
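Deployment Patterns
Production deployment workflows and CI/CD best practices.
When to Activate
- Setting up a CI/CD pipeline
- Containerizing an application (Docker)
- Planning a deployment strategy (blue-green, canary, rolling)
- Implementing health checks and readiness probes
- Preparing a production release
- Configuring environment-specific settings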
Deployment Strategies
Rolling Deployment (Default)
Replace instances gradually; old and new versions run side by side during the rollout.
```
Instance 1: v1 → v2  (update first)
Instance 2: v1       (still running v1)
Instance 3: v1       (still running v1)

Instance 1: v2
Instance 2: v1 → v2  (update second)
Instance 3: v1

Instance 1: v2
Instance 2: v2
Instance 3: v1 → v2  (update last)
```
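- Pros: Zero downtime; gradual rollout
- Cons: Two versions run at the same time, so changes must be backward compatible
- Best for: Standard deployments and backward-compatible changes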
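The batching in the diagram can be modeled in a few lines. This is a hypothetical TypeScript sketch; the `rollingUpdate` helper and its `batchSize` parameter are illustrative, not an orchestrator API. It replaces `batchSize` instances per step and records a snapshot of the fleet after each step, which makes the mixed-version window explicit.

```typescript
// Hypothetical helper: replace `batchSize` instances per step and return a
// snapshot of every instance's version after each step.
function rollingUpdate(instances: number, batchSize: number, from = "v1", to = "v2"): string[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const fleet: string[] = new Array<string>(instances).fill(from);
  const steps: string[][] = [];
  for (let start = 0; start < instances; start += batchSize) {
    // Only this batch is being replaced; every other instance keeps serving traffic
    for (let i = start; i < Math.min(start + batchSize, instances); i++) {
      fleet[i] = to;
    }
    steps.push(fleet.slice()); // old and new versions coexist until the final batch
  }
  return steps;
}

console.log(rollingUpdate(3, 1));
```

`rollingUpdate(3, 1)` produces the three snapshots from the diagram. In Kubernetes, the corresponding knobs are the Deployment's `maxSurge` and `maxUnavailable` rolling-update settings.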
Blue-Green Deployment
Run two identical environments and switch traffic between them atomically.
```
Blue  (v1) ← traffic
Green (v2)   idle, running new version

# After verification:
Blue  (v1)   idle (becomes standby)
Green (v2) ← traffic
```
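- Pros: Instant rollback (switch back to blue); clean cutover
- Cons: Requires double the infrastructure during the deployment
- Best for: Critical services with zero tolerance for issues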
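Switching is atomic because both environments stay deployed and only a single pointer changes. Below is a minimal in-process sketch, assuming a router that forwards to one of two upstream URLs. The `BlueGreenRouter` class and the hostnames are hypothetical. In practice, the pointer is usually a load balancer target, a DNS record, or a Kubernetes Service selector.

```typescript
type Color = "blue" | "green";

// Hypothetical router: one field decides which environment is live,
// so both cutover and rollback are a single assignment.
class BlueGreenRouter {
  private live: Color = "blue";
  private readonly upstreams: Record<Color, string>;

  constructor(upstreams: Record<Color, string>) {
    this.upstreams = upstreams;
  }

  // Upstream URL that should receive production traffic right now
  target(): string {
    return this.upstreams[this.live];
  }

  idle(): Color {
    return this.live === "blue" ? "green" : "blue";
  }

  // Cut over to the idle environment; returns the color that became standby
  switchover(): Color {
    const previous = this.live;
    this.live = this.idle();
    return previous;
  }
}

const router = new BlueGreenRouter({
  blue: "http://app-blue:3000",   // hypothetical v1 upstream
  green: "http://app-green:3000", // hypothetical v2 upstream
});
router.switchover(); // green (v2) is live; blue (v1) is standby
router.switchover(); // instant rollback: blue is live again
```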
Canary Deployment
Route a small percentage of traffic to the new version first.
```
v1: 95% of traffic
v2:  5% of traffic (canary)

# If metrics look good:
v1: 50% of traffic
v2: 50% of traffic

# Final:
v2: 100% of traffic
```
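- Pros: Catches problems with real traffic before the full rollout
- Cons: Requires traffic-splitting infrastructure and monitoring
- Best for: High-traffic services, risky changes, feature flags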
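Percentage-based splitting is often done by hashing a stable key, such as a user or session ID, into 100 buckets, so each user consistently sees one version. Here is a sketch under that assumption. `routeToCanary` is a hypothetical name, and the hash is 32-bit FNV-1a followed by the MurmurHash3 finalizer, which spreads sequential IDs evenly.

```typescript
// 32-bit FNV-1a followed by the MurmurHash3 finalizer (fmix32) for better bit mixing
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit
}

// Stable bucketing: the same user always lands in the same bucket (0..99),
// so they don't flip between versions on every request.
function routeToCanary(userId: string, percent: number): boolean {
  return hash32(userId) % 100 < percent;
}

// Hypothetical promotion schedule from the diagram: 5% → 50% → 100%
const canarySteps = [5, 50, 100];
const users = ["alice", "bob", "carol"];
for (const percent of canarySteps) {
  console.log(percent, users.map((u) => routeToCanary(u, percent)));
}
```

Consider a user in bucket 3. That bucket is below 5, 50, and 100, so raising the percentage only adds users to the canary. Nobody who has already seen v2 is moved back to v1.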
Docker
Multi-Stage Dockerfile (Node.js)
```dockerfile
# Stage 1: Install dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --production=false

# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
RUN npm prune --production

# Stage 3: Production image
FROM node:22-alpine AS runner
WORKDIR /app
# -G appgroup makes appgroup the user's primary group, matching the --chown flags below
RUN addgroup -g 1001 -S appgroup && adduser -S -u 1001 -G appgroup appuser
USER appuser
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
ENV NODE_ENV=production
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```
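The `HEALTHCHECK` flags interact in a specific way:
- The first probe runs one `--interval` after the container starts.
- Failures during `--start-period` are not counted until the first success.
- The container becomes `unhealthy` after `--retries` consecutive counted failures.

The TypeScript model below is a simplified sketch of those rules. It treats a probe that exceeds `--timeout` as a plain failure, and `healthTimeline` is a hypothetical helper, not part of Docker.

```typescript
type ContainerHealth = "starting" | "healthy" | "unhealthy";

interface HealthcheckOptions {
  intervalS: number;    // --interval
  startPeriodS: number; // --start-period
  retries: number;      // --retries
}

// Simplified model: probe i (0-based) runs at t = (i + 1) * interval,
// and probes[i] is true when that probe passed.
function healthTimeline(opts: HealthcheckOptions, probes: boolean[]): ContainerHealth[] {
  let status: ContainerHealth = "starting";
  let everSucceeded = false;
  let consecutiveFailures = 0;
  const timeline: ContainerHealth[] = [];

  for (let i = 0; i < probes.length; i++) {
    const t = (i + 1) * opts.intervalS;
    if (probes[i]) {
      // Any success makes the container healthy and resets the failure streak
      everSucceeded = true;
      consecutiveFailures = 0;
      status = "healthy";
    } else if (everSucceeded || t >= opts.startPeriodS) {
      // Failures inside the start period are ignored until the first success
      consecutiveFailures++;
      if (consecutiveFailures >= opts.retries) status = "unhealthy";
    }
    timeline.push(status);
  }
  return timeline;
}

// Values from the Node.js image above
console.log(healthTimeline({ intervalS: 30, startPeriodS: 5, retries: 3 }, [true, false, false, false]));
```

Consider the Node.js image's settings (`--interval=30s --start-period=5s --retries=3`). A service that passes once and then fails three probes in a row is marked `unhealthy` about 90 seconds after its last successful check.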
Multi-Stage Dockerfile (Go)
```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

FROM alpine:3.19 AS runner
RUN apk --no-cache add ca-certificates
RUN adduser -D -u 1001 appuser
USER appuser
COPY --from=builder /server /server
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:8080/health || exit 1
CMD ["/server"]
```
Multi-Stage Dockerfile (Python/Django)
```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt

FROM python:3.12-slim AS runner
WORKDIR /app
RUN useradd -r -u 1001 appuser
USER appuser
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/')" || exit 1
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "4"]
```
Docker Best Practices
Good practices:
- Use specific version tags (node:22-alpine, not node:latest)
- Use multi-stage builds to minimize image size
- Run as a non-root user
- Copy dependency files first (layer caching)
- Use .dockerignore to exclude node_modules, .git, and tests
- Add a HEALTHCHECK instruction
- Set resource limits in docker-compose or k8s

Bad practices:
- Running as root
- Using :latest tags
- Copying the entire repo in one COPY layer
- Installing dev dependencies in the production image
- Storing secrets in the image (use env vars or a secrets manager)
CI/CD Pipeline
GitHub Actions (Standard Pipeline)
```yaml
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: coverage
          path: coverage/

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write # required to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Platform-specific deployment command
          # Railway: railway up
          # Vercel: vercel --prod
          # K8s: kubectl set image deployment/app app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          echo "Deploying ${{ github.sha }}"
```
Pipeline Stages
```
PR opened:
  lint → typecheck → unit tests → integration tests → preview deploy

Merged to main:
  lint → typecheck → unit tests → integration tests → build image → deploy staging → smoke tests → deploy production
```
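Each arrow is a gate: a stage runs only if everything before it passed. In GitHub Actions, chaining jobs with `needs:` provides this behavior. Below is a hypothetical fail-fast runner that makes the ordering explicit. `runPipeline` and the `Stage` shape are illustrative only.

```typescript
interface Stage {
  name: string;
  run: () => boolean; // true = stage passed
}

interface PipelineResult {
  passed: string[];
  failedAt?: string;
}

// Run stages strictly in order and stop at the first failure, so later
// stages (e.g. "deploy production") never run on a broken build.
function runPipeline(stages: Stage[]): PipelineResult {
  const passed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) return { passed, failedAt: stage.name };
    passed.push(stage.name);
  }
  return { passed };
}

const mergeToMain = [
  "lint", "typecheck", "unit tests", "integration tests",
  "build image", "deploy staging", "smoke tests", "deploy production",
];
// Hypothetical run in which the smoke tests fail against staging
const result = runPipeline(mergeToMain.map((name) => ({ name, run: () => name !== "smoke tests" })));
console.log(result);
```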
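Health Checks
Health Check Endpoints
```typescript
import express from "express";

// Assumes `db` (with a query() method), checkRedis(), and checkExternalApi()
// are defined elsewhere in the application.
type HealthCheck =
  | { status: "ok"; latency_ms: number }
  | { status: "error"; message: string };

const app = express();

// Simple health check
app.get("/health", (req, res) => {
  res.status(200).json({ status: "ok" });
});

// Detailed health check (for internal monitoring)
app.get("/health/detailed", async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    externalApi: await checkExternalApi(),
  };
  const allHealthy = Object.values(checks).every((c) => c.status === "ok");

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    uptime: process.uptime(),
    checks,
  });
});

async function checkDatabase(): Promise<HealthCheck> {
  const start = Date.now();
  try {
    await db.query("SELECT 1");
    // Report the measured round trip instead of a hard-coded latency
    return { status: "ok", latency_ms: Date.now() - start };
  } catch (err) {
    return { status: "error", message: "Database unreachable" };
  }
}
```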
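The detailed endpoint awaits every dependency check, so a dependency that hangs can delay the response past the monitor's or probe's timeout. One option is to race each check against a timer. This sketch assumes the `{ status, latency_ms }` / `{ status, message }` result shapes from the endpoint above. `withTimeout` is a hypothetical helper.

```typescript
type HealthCheck =
  | { status: "ok"; latency_ms: number }
  | { status: "error"; message: string };

// Race a dependency check against a timer so a hung dependency yields an
// "error" result instead of stalling the whole endpoint.
async function withTimeout(
  name: string,
  check: () => Promise<HealthCheck>,
  timeoutMs: number,
): Promise<HealthCheck> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<HealthCheck>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: "error", message: `${name} timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([
      // A rejected check is reported as an error result instead of throwing
      check().catch((): HealthCheck => ({ status: "error", message: `${name} check failed` })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive once the race settles
  }
}
```

Inside the detailed endpoint, a call such as `checkRedis()` would become `withTimeout("redis", checkRedis, 1000)`. The 1000 ms budget is an assumption and should stay below the caller's timeout.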
Kubernetes Probes
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 2
startupProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30 # 30 * 5s = 150s max startup time
```
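The comment on `startupProbe` reflects a general rule of thumb: a probe that keeps failing gives up after roughly `failureThreshold * periodSeconds`. The hypothetical helpers below make the budgets for all three probes explicit. The figures are approximate because they ignore `timeoutSeconds` and scheduling jitter.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Startup probe: approximate time the container gets before the kubelet restarts it
function maxStartupSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.failureThreshold * p.periodSeconds;
}

// Liveness/readiness: approximate time from "starts failing" until the kubelet acts
// (restart for liveness, removal from Service endpoints for readiness)
function detectionSeconds(p: ProbeTiming): number {
  return p.failureThreshold * p.periodSeconds;
}

const startup: ProbeTiming = { initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 30, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 10, failureThreshold: 2 };

console.log(maxStartupSeconds(startup)); // 150
console.log(detectionSeconds(liveness)); // 90
console.log(detectionSeconds(readiness)); // 20
```

With these settings, a hung process is restarted roughly 90 seconds after its liveness checks start failing. A failing readiness check removes the pod from Service endpoints after about 20 seconds.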
Environment Configuration
Twelve-Factor App Pattern
```bash
# All config via environment variables, never in code
DATABASE_URL=postgres://user:pass@host:5432/db
REDIS_URL=redis://host:6379/0
API_KEY=${API_KEY}  # injected by secrets manager
LOG_LEVEL=info
PORT=3000

# Environment-specific behavior
NODE_ENV=production  # or staging, development
APP_ENV=production   # explicit app environment
```
Configuration Validation
```typescript
import { z } from "zod";

const envSchema = z.object({
  NODE_ENV: z.enum(["development", "staging", "production"]),
  PORT: z.coerce.number().default(3000),
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  JWT_SECRET: z.string().min(32),
  LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
});

// Validate at startup: fail fast if config is wrong
export const env = envSchema.parse(process.env);
```
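Rollback Strategies
Instant Rollback
```bash
# Docker/Kubernetes: point to previous image
kubectl rollout undo deployment/app

# Vercel: promote previous deployment
vercel rollback

# Railway: `railway up` deploys the current directory, so check out the previous commit first
git checkout <previous-sha> && railway up

# Database (Prisma): there are no automatic down migrations, so reverse the schema change
# with a new migration. `migrate resolve --rolled-back` only marks a failed migration
# as rolled back in the migration history; it does not undo schema changes.
npx prisma migrate resolve --rolled-back <migration-name>
```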
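Rollback Checklist
- [ ] Previous image/artifact is available and tagged
- [ ] Database migrations are backward compatible (no breaking changes)
- [ ] Feature flags can disable new features without a deploy
- [ ] Monitoring alerts are configured for error-rate spikes
- [ ] Rollback has been tested in staging before the production release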
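An error-rate alert is most useful for rollback when it compares the new release against a pre-deploy baseline. Below is a sketch of such a decision function. The `RollbackPolicy` fields and the example thresholds are hypothetical and should be tuned per service.

```typescript
interface ErrorStats {
  requests: number;
  errors: number;
}

interface RollbackPolicy {
  maxErrorRate: number;     // absolute ceiling, e.g. 0.05 = 5%
  maxIncreaseRatio: number; // e.g. 2 = the new release may not double the baseline rate
  minRequests: number;      // ignore windows too small to be meaningful
}

// Hypothetical guard: decide whether the new release should be rolled back
function shouldRollback(baseline: ErrorStats, current: ErrorStats, policy: RollbackPolicy): boolean {
  if (current.requests < policy.minRequests) return false; // not enough signal yet
  const currentRate = current.errors / current.requests;
  const baselineRate = baseline.requests > 0 ? baseline.errors / baseline.requests : 0;
  if (currentRate > policy.maxErrorRate) return true;
  // Relative spike check; skipped when the baseline had no errors at all
  return baselineRate > 0 && currentRate > baselineRate * policy.maxIncreaseRatio;
}

const policy: RollbackPolicy = { maxErrorRate: 0.05, maxIncreaseRatio: 2, minRequests: 100 };
console.log(shouldRollback({ requests: 10000, errors: 100 }, { requests: 1000, errors: 30 }, policy));
```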
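Production Readiness Checklist
Before any production deployment:
Application
- [ ] All tests pass (unit, integration, end-to-end)
- [ ] No hardcoded secrets in code or config files
- [ ] Error handling covers expected failure modes and edge cases
- [ ] Logs are structured (JSON) and contain no PII
- [ ] Health check endpoint returns a meaningful status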
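Below is a minimal sketch of a structured logger that emits one JSON object per line and redacts fields by name before writing them. The `PII_KEYS` deny-list and the `formatLog` helper are hypothetical. Key-based redaction cannot catch PII embedded in free-text messages, so it complements care about what gets logged rather than replacing it.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical deny-list of field names treated as PII; adjust to your data model
const PII_KEYS = new Set(["email", "phone", "password", "ssn", "address", "ip"]);

// Recursively replace values of PII-named keys (case-insensitive)
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value instanceof Date) return value; // let JSON.stringify serialize dates normally
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      out[key] = PII_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}

// One JSON object per line; the standard fields are written last so callers can't override them
function formatLog(level: LogLevel, message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({
    ...(redact(fields) as Record<string, unknown>),
    timestamp: new Date().toISOString(),
    level,
    message,
  });
}

console.log(formatLog("info", "user signed in", { userId: "u_123", email: "a@example.com" }));
```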
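Infrastructure
- [ ] Docker image builds reproducibly (pinned versions)
- [ ] Environment variables are documented and validated at startup
- [ ] Resource limits are set (CPU, memory)
- [ ] Horizontal scaling is configured (min/max instance count)
- [ ] SSL/TLS is enabled on all endpoints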
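Min/max instance counts bound the autoscaler's decision. The Kubernetes HorizontalPodAutoscaler documents its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no scaling when the ratio is within a tolerance of 1.0. The TypeScript sketch below implements that rule with clamping. It omits the real controller's handling of missing metrics, unready pods, and stabilization windows, and the example policy values are hypothetical.

```typescript
interface ScalingPolicy {
  minReplicas: number;
  maxReplicas: number;
  tolerance: number; // skip scaling when the metric ratio is this close to 1.0
}

// HPA-style replica calculation, clamped to [minReplicas, maxReplicas]
function desiredReplicas(
  current: number,
  currentMetric: number,
  targetMetric: number,
  policy: ScalingPolicy,
): number {
  const ratio = currentMetric / targetMetric;
  const raw = Math.abs(ratio - 1) <= policy.tolerance ? current : Math.ceil(current * ratio);
  return Math.min(policy.maxReplicas, Math.max(policy.minReplicas, raw));
}

const scaling: ScalingPolicy = { minReplicas: 2, maxReplicas: 10, tolerance: 0.1 };
// 4 replicas at 90% CPU against a 60% target
console.log(desiredReplicas(4, 90, 60, scaling));
```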
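Monitoring
- [ ] Application metrics are exported (request rate, latency, errors)
- [ ] Alerts are configured for error rates exceeding thresholds
- [ ] Log aggregation is set up (structured, searchable logs)
- [ ] Uptime monitoring is configured for the health endpoint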
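Request rate, error rate, and tail latency are the signals behind the monitoring items above. Metrics systems normally compute them from exported counters and histograms. This in-process sketch, built around a hypothetical `summarize` helper that reports a nearest-rank p95, only shows what each number means for a single time window.

```typescript
interface RequestSample {
  latencyMs: number;
  ok: boolean;
}

interface WindowSummary {
  requestsPerSecond: number;
  errorRate: number;
  p95LatencyMs: number;
}

// Nearest-rank percentile on an ascending array (integer arithmetic for the rank)
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Summarize one fixed time window into rate, error ratio, and p95 latency
function summarize(samples: RequestSample[], windowSeconds: number): WindowSummary {
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const errors = samples.filter((s) => !s.ok).length;
  return {
    requestsPerSecond: samples.length / windowSeconds,
    errorRate: samples.length ? errors / samples.length : 0,
    p95LatencyMs: percentile(latencies, 95),
  };
}

// Hypothetical 10-second window: 100 requests with latencies 1..100 ms, 5 of them failed
const windowSamples = Array.from({ length: 100 }, (_, i) => ({ latencyMs: i + 1, ok: i >= 5 }));
console.log(summarize(windowSamples, 10));
```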
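Security
- [ ] Dependencies are scanned for CVEs
- [ ] CORS allows only approved origins
- [ ] Rate limiting is enabled on public endpoints
- [ ] Authentication and authorization are verified
- [ ] Security headers are set (CSP, HSTS, X-Frame-Options)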
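Rate limiting on public endpoints is commonly implemented as a token bucket. Each client gets `capacity` tokens that refill at a fixed rate, and a request is rejected when the bucket is empty. This is an in-memory, single-process sketch built on a hypothetical `TokenBucketLimiter` class. With several instances behind a load balancer, the buckets need a shared store, or the limit can be enforced at the gateway.

```typescript
// Minimal in-memory token bucket limiter, one bucket per client key
class TokenBucketLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedMs: number }>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock, useful for tests
  }

  // Returns true if the request may proceed, false if it should be rejected
  tryConsume(key: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedMs: t };
    // Refill in proportion to elapsed time, never above capacity
    const elapsedS = (t - bucket.updatedMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedS * this.refillPerSecond);
    bucket.updatedMs = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

A request handler would call `tryConsume(clientIp)` and respond with HTTP 429 Too Many Requests when it returns `false`.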
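Operations
- [ ] Rollback plan is documented and tested
- [ ] Database migrations have been tested against production-sized data
- [ ] Runbooks exist for common failure scenarios
- [ ] On-call rotation and escalation path are defined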