git clone https://github.com/Intense-Visions/harness-engineering
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-deployment" ~/.claude/skills/intense-visions-harness-engineering-harness-deployment && rm -rf "$T"
Harness Deployment
CI/CD pipeline analysis, deployment strategy design, and environment management. From commit to production with confidence.
When to Use
- When setting up or reviewing CI/CD pipelines for a new or existing project
- When evaluating deployment strategies (blue-green, canary, rolling) for a service
- When auditing environment separation and promotion workflows
- NOT for container image building or registry management (use harness-containerization)
- NOT for infrastructure provisioning (use harness-infrastructure-as-code)
- NOT for application performance under load (use harness-perf)
Process
Phase 1: DETECT -- Identify Pipeline and Environment Configuration
-
Scan for CI/CD configuration files. Search the project root for pipeline definitions:
- .github/workflows/*.yml -- GitHub Actions
- .gitlab-ci.yml -- GitLab CI
- Jenkinsfile -- Jenkins
- .circleci/config.yml -- CircleCI
- bitbucket-pipelines.yml -- Bitbucket Pipelines
- azure-pipelines.yml -- Azure DevOps
- deploy/, scripts/deploy* -- custom deployment scripts
-
Identify deployment targets. Parse pipeline files for deployment steps and extract:
- Target environments (dev, staging, production)
- Deployment mechanisms (kubectl apply, aws ecs update-service, serverless deploy, rsync)
- Cloud provider and region information
- Container registry references
-
Detect environment configuration. Look for environment-specific config:
- .env.production, .env.staging files
- Environment variable injection in pipeline definitions
- Secret references (GitHub Secrets, GitLab CI variables, Vault paths)
- Feature flag provider configuration per environment
-
Map the deployment topology. Build a summary of what gets deployed where:
- Service name, pipeline file, target environment, deployment mechanism
- Dependencies between services (deploy order constraints)
- Manual approval gates vs. automatic promotion
-
Present detection summary. Output the discovered topology before proceeding:
Deployment Topology:
  Platform: GitHub Actions
  Pipelines: 3 workflow files
  Environments: dev, staging, production
  Strategy: Rolling (detected from kubectl rolling-update)
  Approval gates: production (manual)
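The scan in step 1 of this phase could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `PIPELINE_PATTERNS` mapping and the `detect_platforms` function are hypothetical names, and treating `deploy/` as the glob `deploy/*` is an assumption.

```python
from fnmatch import fnmatchcase

# Patterns taken from the scan list in Phase 1. The mapping itself is
# illustrative; "deploy/*" is an assumed reading of the "deploy/" entry.
PIPELINE_PATTERNS = {
    "GitHub Actions": [".github/workflows/*.yml"],
    "GitLab CI": [".gitlab-ci.yml"],
    "Jenkins": ["Jenkinsfile"],
    "CircleCI": [".circleci/config.yml"],
    "Bitbucket Pipelines": ["bitbucket-pipelines.yml"],
    "Azure DevOps": ["azure-pipelines.yml"],
    "Custom scripts": ["deploy/*", "scripts/deploy*"],
}


def detect_platforms(paths):
    """Group repo-relative paths by the platform whose pattern they match."""
    found = {}
    for path in paths:
        for platform, patterns in PIPELINE_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                found.setdefault(platform, []).append(path)
    return found
```

Paths are expected to be relative to the project root and to use forward slashes. Files that match no pattern are simply ignored.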
Phase 2: ANALYZE -- Evaluate Pipeline Quality and Gaps
-
Check pipeline stage completeness. A mature pipeline includes these stages. Flag any that are missing:
- Build and compile
- Unit tests
- Integration tests
- Security scan (SAST/DAST)
- Artifact packaging
- Deploy to staging
- Smoke tests post-deploy
- Deploy to production
- Post-deploy verification
-
Evaluate environment isolation. Verify that environments are properly separated:
- Staging and production use different credentials
- Environment-specific variables are not shared across environments
- Database connections point to the correct environment
- No hardcoded production URLs in non-production configs
-
Check deployment safety mechanisms. Verify the pipeline includes:
- Rollback procedures (automatic or documented manual)
- Health checks after deployment
- Timeout configuration on deployment steps
- Concurrency controls (prevent parallel deploys to the same environment)
- Branch protection rules that gate production deploys
-
Analyze pipeline performance. Identify bottlenecks:
- Steps that could run in parallel but are sequential
- Missing caching (dependencies, build artifacts, Docker layers)
- Redundant steps across workflows
- Total pipeline duration from commit to production
-
Check secret hygiene in pipelines. Verify:
- No secrets hardcoded in pipeline files
- Secrets are scoped to the minimum required environment
- Secret rotation is possible without pipeline changes
- OIDC or workload identity is used where available instead of long-lived credentials
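The stage completeness check at the start of this phase can be expressed as a small comparison against the nine-stage checklist. The sketch below assumes the pipeline's job names have already been mapped to the checklist's stage names, which is the hard part in practice; `STANDARD_STAGES` and `stage_summary` are hypothetical names.

```python
# The nine stages from the Phase 2 checklist, in pipeline order.
STANDARD_STAGES = [
    "build",
    "unit tests",
    "integration tests",
    "security scan",
    "artifact packaging",
    "deploy to staging",
    "smoke tests",
    "deploy to production",
    "post-deploy verification",
]


def stage_summary(present):
    """Return a one-line summary of which standard stages are present."""
    normalized = {name.strip().lower() for name in present}
    missing = [stage for stage in STANDARD_STAGES if stage not in normalized]
    found = len(STANDARD_STAGES) - len(missing)
    summary = f"Pipeline stages: {found}/{len(STANDARD_STAGES)} present"
    if missing:
        summary += f" (missing: {', '.join(missing)})"
    return summary
```

Missing stages are reported in checklist order, so the summary reads in the same order as the pipeline itself.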
Phase 3: DESIGN -- Recommend Strategy Improvements
-
Recommend deployment strategy. Based on the service characteristics:
- Rolling -- suitable for stateless services with backward-compatible changes
- Blue-green -- suitable when zero-downtime cutover is required and rollback must be instant
- Canary -- suitable for high-traffic services where gradual validation reduces blast radius
- Recreate -- suitable only for development environments or when downtime is acceptable
-
Design missing pipeline stages. For each gap identified in Phase 2, provide:
- The stage definition in the project's CI/CD platform syntax
- Where it fits in the pipeline order
- What tools or services it requires
- Example configuration snippet
-
Recommend environment promotion workflow. Design the path from commit to production:
- Automatic promotion from dev to staging after tests pass
- Manual approval gate before production (with notification to the team channel)
- Smoke test suite that runs post-deploy in each environment
- Rollback trigger conditions (error rate spike, health check failure)
-
Design rollback procedure. Every deployment must have a documented rollback:
- For container deployments: revert to previous image tag
- For serverless: revert to previous function version
- For database migrations: backward-compatible migration strategy
- Maximum rollback time target (e.g., under 5 minutes)
-
Recommend monitoring integration. Connect deployment events to observability:
- Deploy markers in APM tools (Datadog, New Relic, Grafana)
- Automated alerts on error rate increase after deploy
- Deployment frequency and lead time tracking
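The rollback trigger conditions named in the promotion workflow (error rate spike, health check failure) reduce to a simple decision function. This is a sketch under stated assumptions: `should_rollback` is a hypothetical helper, and the 1% default threshold is only an illustrative value that each service would need to tune.

```python
def should_rollback(error_rate, health_check_ok, max_error_rate=0.01):
    """Decide whether a deployment should be rolled back.

    error_rate      -- fraction of failed requests since the deploy (0.0 to 1.0)
    health_check_ok -- True if the post-deploy health check passed
    max_error_rate  -- assumed threshold; 0.01 means roll back above 1% errors
    """
    if not health_check_ok:
        return True
    return error_rate > max_error_rate
```

A failed health check triggers rollback regardless of the error rate, since an unhealthy instance may serve too little traffic for the error rate to be meaningful.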
Phase 4: VALIDATE -- Verify Pipeline Correctness
-
Lint pipeline configuration. Run syntax validation:
- GitHub Actions: actionlint or YAML schema validation
- GitLab CI: gitlab-ci-lint API endpoint
- Jenkinsfile: Groovy syntax check
- General: YAML structure validation for all config files
-
Verify environment variable completeness. For each environment:
- All required variables are defined
- No placeholder values remain (TODO, CHANGEME, xxx)
- Variables referenced in code exist in the pipeline configuration
-
Verify branch protection alignment. Confirm that:
- Production deploy pipelines only trigger from protected branches
- Required status checks match the pipeline stages
- Force-push is disabled on deployment branches
-
Generate deployment readiness report. Summarize findings:
Deployment Readiness: [PASS/WARN/FAIL]
  Pipeline stages: 7/9 present (missing: security scan, smoke tests)
  Environment isolation: PASS
  Rollback procedure: WARN (documented but not automated)
  Secret hygiene: PASS
  Pipeline performance: 12m avg (recommend parallelizing test stages)
  Recommendations:
    1. Add SAST scan stage between build and deploy
    2. Add post-deploy smoke test stage
    3. Automate rollback on health check failure
-
Present results. Use emit_interaction to deliver the report and ask whether to proceed with implementing recommendations.
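The environment variable completeness check in this phase could look roughly like the sketch below. It assumes the required names and the defined values have already been extracted from the code and the pipeline configuration; `check_env` is a hypothetical helper, and matching placeholders by exact value (case-insensitive) is a simplifying assumption.

```python
# Placeholder values named in the Phase 4 checklist.
PLACEHOLDERS = {"TODO", "CHANGEME", "XXX"}


def check_env(required, defined):
    """Return (missing, placeholders) for one environment.

    required -- iterable of variable names the code references
    defined  -- dict of variable name to configured value
    """
    missing = sorted(name for name in required if name not in defined)
    placeholders = sorted(
        name
        for name, value in defined.items()
        if value.strip().upper() in PLACEHOLDERS
    )
    return missing, placeholders
```

Running this once per environment gives a per-environment list of gaps that can feed the readiness report.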
Harness Integration
- harness skill run harness-deployment -- Primary invocation for deployment analysis.
- harness validate -- Run after any pipeline configuration changes to verify project health.
- harness check-deps -- Verify deployment script dependencies are available.
- emit_interaction -- Present deployment readiness report and gather decisions on strategy.
Success Criteria
- All CI/CD configuration files in the project are identified and cataloged
- Pipeline stage completeness is assessed against the standard checklist
- Environment isolation is verified with no cross-environment credential leakage
- A deployment strategy recommendation is provided with rationale
- Rollback procedures are documented or flagged as missing
- Pipeline lint passes without errors
Examples
Example: Node.js API with GitHub Actions
Phase 1: DETECT
  Found: .github/workflows/ci.yml, .github/workflows/deploy.yml
  Environments: staging (auto), production (manual dispatch)
  Strategy: Rolling (kubectl set image)
  Registry: ghcr.io/org/api-server
Phase 2: ANALYZE
  Missing stages: security scan, post-deploy smoke tests
  Environment isolation: PASS
  Secret hygiene: WARN -- AWS_ACCESS_KEY_ID used instead of OIDC
  Pipeline duration: 18m (test and lint run sequentially)
Phase 3: DESIGN
  Recommendation: Add trivy scan after Docker build
  Recommendation: Switch to AWS OIDC for keyless authentication
  Recommendation: Parallelize lint and test jobs (saves ~4m)
  Recommendation: Add smoke test job after deploy-staging
Phase 4: VALIDATE
  actionlint: PASS
  Environment variables: PASS
  Branch protection: WARN -- main branch allows force-push
Result: WARN -- 3 recommendations, 1 security improvement needed
Example: Python Service with GitLab CI and Canary Deploy
Phase 1: DETECT
  Found: .gitlab-ci.yml with 5 stages
  Environments: dev, staging, production
  Strategy: Canary (Istio VirtualService weight shifting)
  Registry: registry.gitlab.com/org/service
Phase 2: ANALYZE
  All 9 standard stages present
  Environment isolation: PASS
  Canary configuration: 5% -> 25% -> 75% -> 100% over 30 minutes
  Rollback: Automatic on 5xx rate > 1%
Phase 3: DESIGN
  Current strategy is well-configured. Minor recommendations:
  - Add canary duration metrics to Grafana dashboard
  - Add deployment event annotation to Prometheus
  - Consider adding a manual gate between 75% and 100%
Phase 4: VALIDATE
  GitLab CI lint: PASS
  Environment variables: PASS
  Branch protection: PASS
Result: PASS -- pipeline is production-ready
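Both examples end with a single overall result. One plausible way to derive it, consistent with the two examples but not confirmed by the skill definition, is to take the worst status among the individual checks. `SEVERITY` and `overall_status` are hypothetical names.

```python
# Assumed ordering: FAIL outranks WARN, which outranks PASS.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def overall_status(checks):
    """Return the most severe status among the individual check results.

    checks -- dict of check name to "PASS", "WARN", or "FAIL"
    """
    return max(checks.values(), key=SEVERITY.__getitem__, default="PASS")
```

An empty set of checks yields PASS here; a stricter policy might treat "no checks ran" as a failure instead.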
Gates
- No production deploy without staging validation. If the pipeline allows direct-to-production deployment without a prior staging step, flag as a blocking issue.
- No long-lived credentials in pipelines. Hardcoded secrets or long-lived access keys in pipeline files are blocking findings. OIDC or short-lived tokens must be used.
- No deploy without rollback. Every deployment target must have a documented or automated rollback mechanism. A missing rollback mechanism is a blocking finding.
- No skipping pipeline lint. Pipeline configuration must pass syntax validation before recommendations are made.
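The credentials gate can be partially automated with a pattern scan over pipeline files. The sketch below is deliberately narrow and is an assumption rather than the skill's own method: it looks only for AWS access key IDs with the `AKIA` prefix and for PEM private key headers, so a clean result does not prove that a file is free of secrets. `SECRET_PATTERNS` and `find_hardcoded_secrets` are hypothetical names.

```python
import re

# Two illustrative patterns only; a real scanner needs a much broader set.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_hardcoded_secrets(text):
    """Return (line_number, label) pairs for lines that match a pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings
```

References to a secret store, such as `${{ secrets.API_TOKEN }}` in GitHub Actions, do not match either pattern and are not reported.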
Evidence Requirements
When this skill makes claims about existing code, architecture, or behavior, it MUST cite evidence using one of:
- File reference: file:line format (e.g., src/auth.ts:42)
- Code pattern reference: file with description (e.g., src/utils/hash.ts -- "existing bcrypt wrapper")
- Test/command output: Inline or referenced output from a test run or CLI command
- Session evidence: Write to the evidence session section via manage_state
Uncited claims: Technical assertions without citations MUST be prefixed with [UNVERIFIED]. Example: [UNVERIFIED] The auth middleware supports refresh tokens.
Red Flags
Universal
These apply to ALL skills. If you catch yourself doing any of these, STOP.
- "I believe the codebase does X" — Stop. Read the code and cite a file:line reference. Belief is not evidence.
- "Let me recommend [pattern] for this" without checking existing patterns — Stop. Search the codebase first. The project may already have a convention.
- "While we're here, we should also [unrelated improvement]" — Stop. Flag the idea but do not expand scope beyond the stated task.
Domain-Specific
- "Deploying without a health check endpoint" — Stop. Without health checks, the orchestrator cannot detect failed deployments. Add health checks before deploying.
- "Skipping canary deployment, it's a small change" — Stop. Small changes cause outages too. Follow the deployment policy regardless of change size.
- "Rolling back manually if something goes wrong" — Stop. Manual rollback under incident pressure fails. Automate rollback before deploying.
- "We can update the runbook after the deploy" — Stop. If the deployment changes operational behavior, update the runbook first. Stale runbooks during incidents cause escalations.
Rationalizations to Reject
Universal
These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
- "It's probably fine" — "Probably" is not evidence. Verify before asserting.
- "This is best practice" — Best practice in what context? Cite the source and confirm it applies to this codebase.
- "We can fix it later" — If it is worth flagging, it is worth documenting now with a concrete follow-up plan.
Domain-Specific
| Rationalization | Reality |
|---|---|
| "It's just a config change, not a code change" | Config changes can cause outages just as code changes do. Deploy them with the same rigor and rollback strategy. |
| "We tested this in staging" | Staging is not production. Traffic patterns, data volume, and edge cases differ. Staging success does not guarantee production safety. |
| "Downtime will be brief" | Brief is not zero. Quantify the expected impact and communicate it to stakeholders before deploying. |
Escalation
- When the CI/CD platform is unsupported: Report which platform was detected and that analysis is limited to general best practices. Recommend the user provide platform-specific documentation for deeper analysis.
- When secrets are found hardcoded in pipeline files: Immediately flag as a critical finding. Do not proceed with strategy recommendations until secrets are remediated. Recommend rotating the exposed credentials.
- When multiple deployment strategies are mixed across environments: This is valid (e.g., rolling for staging, canary for production). Analyze each independently and verify the promotion workflow handles the strategy transition.
- When pipeline configuration is generated by a tool (Terraform, Pulumi): Analyze the generated output but note that fixes must be applied to the generator configuration, not the output files.