Claude-Skills devops-workflow-engineer
git clone https://github.com/borghei/Claude-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/engineering/devops-workflow-engineer" ~/.claude/skills/borghei-claude-skills-devops-workflow-engineer && rm -rf "$T"
engineering/devops-workflow-engineer/SKILL.mdDevOps Workflow Engineer
The agent generates GitHub Actions workflow YAML, analyzes existing pipelines for optimization opportunities, and creates deployment plans with strategy selection, health checks, and rollback procedures.
Quick Start
# Generate a CI workflow python scripts/workflow_generator.py --type ci --language python --test-framework pytest # Analyze existing pipelines for optimization python scripts/pipeline_analyzer.py .github/workflows/ --format json # Plan a deployment strategy python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --strategy canary
Tools Overview
| Tool | Input | Output |
|---|---|---|
| Workflow type + language | GitHub Actions YAML (ci, cd, release, security-scan, docs-check) |
| Workflow file or directory | Optimization findings, cost estimates, severity ratings |
| Project type + environments | Deployment plan with strategy, health checks, rollback |
All tools support
--format json and --output for file writing.
Workflow 1: CI Pipeline Design
The agent generates pipelines following fail-fast ordering:
- Lint and format (~30s) -- cheapest gate first
- Unit tests (~2-5m) -- matrix across versions
- Build verification (~3-8m)
- Integration tests (~5-15m, parallel with build)
- Security scanning (~2-5m)
jobs: lint: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - run: make lint test: needs: lint strategy: matrix: python-version: ['3.10', '3.11', '3.12'] steps: - uses: actions/setup-python@v5 with: { python-version: "${{ matrix.python-version }}", cache: pip } - run: pip install -r requirements.txt - run: pytest --junitxml=results.xml security: needs: lint steps: - run: pip-audit -r requirements.txt
CI targets:
| Metric | Target | Fix |
|---|---|---|
| Total CI time | < 10 min | Parallelize, add caching |
| Lint step | < 1 min | Use pre-commit locally |
| Unit tests | < 5 min | Split suites, use matrix |
| Flaky rate | < 1% | Quarantine flaky tests |
| Cache hit rate | > 80% | Review cache keys |
Workflow 2: CD Pipeline and Multi-Environment Deployment
python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --format json
Environment promotion flow:
Build -> Dev (auto) -> Staging (auto) -> Production (manual approval) | Canary (10%) -> Full rollout
| Aspect | Dev | Staging | Production |
|---|---|---|---|
| Trigger | Every push | Merge to main | Manual approval |
| Replicas | 1 | 2 | 3+ (auto-scaled) |
| Secrets | Repository | Environment | Vault/OIDC |
| Monitoring | Basic logs | Full observability | Full + alerting |
Key CD rules:
- Build once, deploy the same artifact everywhere
- Tag artifacts with commit SHA for traceability
- Use environment protection rules for production gates
- Maintain rollback capability at every stage
Workflow 3: Pipeline Optimization
python scripts/pipeline_analyzer.py .github/workflows/ --format json -o report.json
The agent checks for:
- Missing caching -- dependencies reinstalled every run
- No timeouts -- stuck jobs burn budget
- Sequential chains that could parallelize
- Deprecated actions with newer versions available
- Security issues -- secrets in logs, missing permissions scoping
- Cost inefficiency -- oversized runners, no path filtering
Optimization techniques:
Path-based filtering -- skip CI for docs-only changes:
on: push: paths: ['src/**', 'tests/**', 'requirements*.txt'] paths-ignore: ['docs/**', '*.md']
Concurrency cancellation -- cancel superseded runs:
concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true
Dependency caching:
- uses: actions/cache@v4 with: path: ~/.cache/pip key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt') }}
Deployment Strategies
Decision tree:
Zero-downtime required? No -> Rolling deployment Yes -> Need instant rollback? No -> Rolling with health checks Yes -> Budget for 2x infrastructure? Yes -> Blue-green No -> Canary
Canary traffic split schedule:
| Phase | % | Duration | Gate |
|---|---|---|---|
| 1 | 5% | 15 min | Error rate < 0.1% |
| 2 | 25% | 30 min | P99 latency < 200ms |
| 3 | 50% | 60 min | Business metrics stable |
| 4 | 100% | -- | Full promotion |
GitHub Actions Patterns
Reusable workflows -- define once, call everywhere:
# .github/workflows/reusable-deploy.yml on: workflow_call: inputs: environment: { required: true, type: string } image_tag: { required: true, type: string } secrets: DEPLOY_KEY: { required: true }
OIDC authentication -- no long-lived credentials:
permissions: id-token: write contents: read steps: - uses: aws-actions/configure-aws-credentials@v4 with: role-to-assume: arn:aws:iam::123456789:role/github-actions aws-region: us-east-1
Secrets hierarchy: Organization > Repository > Environment. Never echo secrets; use
add-mask for dynamic values. Prefer OIDC for cloud auth.
Runner Cost Optimization
| Runner | vCPU | RAM | Cost/min | Best For |
|---|---|---|---|---|
| 2-core | 2 | 7 GB | $0.008 | Standard tasks |
| 4-core | 4 | 16 GB | $0.016 | Build-heavy |
| 8-core | 8 | 32 GB | $0.032 | Large compilations |
| 16-core | 16 | 64 GB | $0.064 | Parallel test suites |
Monthly estimate:
(runs/day) x (avg min/run) x 30 x (cost/min)
Example: 50 pushes/day x 8 min x 30 = 12,000 min x $0.008 = $96/month.
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Monolithic workflow | 45-min single workflow | Split into parallel jobs |
| No caching | Reinstall deps every run | Cache dependencies and builds |
| Secrets in logs | Leaked credentials | , avoid |
| No timeout | Stuck jobs burn budget | on every job |
| Full matrix every push | 30-min matrix on every commit | Full nightly; reduced on push |
| No rollback plan | Stuck with broken deploy | Automate rollback in CD pipeline |
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Workflow never triggers | Wrong config or branch name mismatch | Verify triggers match branching strategy |
| Cache miss every run | Volatile cache key (timestamp) | Use on lock files |
| Matrix fails on one OS only | Platform-specific paths or deps | Use ; install OS deps per matrix entry |
| Secret not available | Wrong environment scope | Ensure job declares correct |
| Health check fails after deploy | App not started before check | Add retry loop with backoff |
| Concurrency cancels needed runs | Overly broad group key | Scope to ; separate groups for deploy |
References
| Guide | Path |
|---|---|
| GitHub Actions Patterns | |
| Deployment Strategies | |
| Agentic Workflows Guide | |
Integration Points
| Skill | Integration |
|---|---|
| Release workflows align with versioning and changelog |
| Deployment strategies complement infra automation |
| Security scanning steps feed SecOps dashboards |
| CI quality gates map to QA acceptance criteria |
| Rollback procedures connect to incident playbooks |
Last Updated: April 2026 Version: 1.1.0