Awesome-omni-skill devops
Expert DevOps engineer specializing in secure CI/CD pipelines, infrastructure automation, container orchestration, and developer experience optimization. Covers GitHub Actions, Docker, Kubernetes, cloud platforms (AWS/Azure/GCP), monitoring, secrets management, and infrastructure as code. Security-first approach following DevSecOps principles. Use for pipeline design, deployment automation, infrastructure setup, monitoring configuration, or improving developer workflows.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/devops-tomas-u" ~/.claude/skills/diegosouzapw-awesome-omni-skill-devops-bff390 && rm -rf "$T"
skills/devops/devops-tomas-u/SKILL.mdDevOps Skill
Expert DevOps engineering focused on secure, automated CI/CD pipelines, infrastructure management, and exceptional developer experience. Security-first approach following DevSecOps principles.
Capabilities
CI/CD Pipeline Design
- GitHub Actions workflows
- GitLab CI/CD
- Jenkins pipeline configuration
- Azure DevOps pipelines
- Security scanning integration
- Automated testing strategies
- Deployment automation
- Rollback mechanisms
Infrastructure as Code
- Terraform for cloud infrastructure
- AWS CloudFormation
- Azure Resource Manager
- Pulumi for multi-cloud
- Ansible for configuration management
- GitOps workflows
- Infrastructure testing
Container Orchestration
- Docker best practices
- Kubernetes cluster management
- Helm chart development
- Container security scanning
- Registry management
- Service mesh (Istio, Linkerd)
- Multi-stage builds
Cloud Platform Management
- AWS (EC2, ECS, EKS, Lambda, RDS, S3)
- Azure (VMs, AKS, Functions, SQL)
- GCP (Compute Engine, GKE, Cloud Run)
- Multi-cloud strategies
- Cost optimization
- Resource tagging and organization
Security Integration (DevSecOps)
- SAST/DAST integration in pipelines
- Container vulnerability scanning
- Secrets management (Vault, AWS Secrets Manager)
- Policy as code (OPA, Sentinel)
- Compliance automation
- Security posture management
- Supply chain security
Monitoring and Observability
- Prometheus and Grafana setup
- ELK/EFK stack configuration
- Application Performance Monitoring (APM)
- Distributed tracing (Jaeger, Zipkin)
- Log aggregation and analysis
- Alerting and on-call management
- SLO/SLI definition
Developer Experience
- Local development environments
- Development containers (devcontainers)
- Fast feedback loops
- Self-service infrastructure
- Documentation automation
- Developer onboarding automation
- Tooling standardization
When to Use This Skill
Pipeline Setup
- Designing CI/CD workflows
- Automating build and test processes
- Implementing deployment strategies
- Setting up security scanning
- Optimizing pipeline performance
Infrastructure Management
- Provisioning cloud resources
- Managing Kubernetes clusters
- Setting up monitoring and logging
- Implementing disaster recovery
- Scaling infrastructure
Security Hardening
- Securing CI/CD pipelines
- Implementing secrets management
- Container security scanning
- Compliance automation
- Vulnerability management
Developer Productivity
- Improving build times
- Streamlining deployment process
- Automating repetitive tasks
- Setting up local dev environments
- Creating developer tools
Integration with Other Skills
With Security Architect Skill
Security Architect defines requirements ↓ DevOps implements security controls in pipeline ↓ Automated security scanning ↓ Security Architect validates implementation
With Technical Architecture Skill
Technical Architecture designs system ↓ DevOps provisions infrastructure ↓ Automates deployment ↓ Technical Architecture validates performance
With Product Owner Skill
Product Owner prioritizes features ↓ DevOps enables fast, safe deployments ↓ Metrics and monitoring configured ↓ Product Owner tracks feature performance
DevSecOps Principles
Shift Left Security
- Security testing early in development
- Developer security training
- Automated security checks in IDE
- Pre-commit hooks for secrets detection
- Security requirements in user stories
Automation First
- Automate everything that can be automated
- Infrastructure as code
- Automated testing
- Automated security scanning
- Automated compliance checks
Continuous Monitoring
- Real-time security monitoring
- Performance monitoring
- Cost monitoring
- Compliance monitoring
- Automated alerting
Fail Fast, Recover Faster
- Quick feedback loops
- Automated rollback mechanisms
- Canary deployments
- Blue-green deployments
- Feature flags
CI/CD Pipeline Architecture
Standard Pipeline Stages
1. Code Commit ↓ 2. Trigger Pipeline ↓ 3. Security Checks - Secret scanning - License compliance - Dependency check ↓ 4. Build - Compile code - Run linters - Build containers ↓ 5. Test - Unit tests - Integration tests - Security tests (SAST) ↓ 6. Scan - Container scanning - Dependency scanning - Infrastructure scanning ↓ 7. Package - Create artifacts - Sign artifacts - Push to registry ↓ 8. Deploy (Staging) - Infrastructure provisioning - Application deployment - Smoke tests ↓ 9. Test (Staging) - Integration tests - Security tests (DAST) - Performance tests ↓ 10. Deploy (Production) - Canary deployment - Health checks - Monitoring validation ↓ 11. Verify - Synthetic monitoring - Alerts validation - Performance metrics
Security Gates
Every stage includes security checks:
- No secrets in code
- No critical vulnerabilities
- No license violations
- No policy violations
- Signed artifacts only
Infrastructure Patterns
Immutable Infrastructure
- Never modify running infrastructure
- Always deploy new instances
- Automated provisioning
- Version controlled infrastructure
- Quick rollback capability
GitOps
- Git as single source of truth
- Declarative infrastructure
- Automated synchronization
- Audit trail in Git history
- Pull request workflows
Microservices Deployment
- Independent deployments
- Service mesh for communication
- Centralized logging
- Distributed tracing
- Circuit breakers
Container Security Best Practices
Build Stage
- Minimal base images
- Multi-stage builds
- No secrets in images
- Vulnerability scanning
- Image signing
Runtime Stage
- Run as non-root
- Read-only root filesystem
- Resource limits
- Network policies
- Security contexts
Registry Management
- Private registries
- Image scanning
- Access control
- Retention policies
- Vulnerability notifications
Secrets Management Strategy
Principles
- Never commit secrets to Git
- Rotate secrets regularly
- Audit secret access
- Encrypt secrets at rest
- Minimal secret scope
Tools and Patterns
- HashiCorp Vault
- AWS Secrets Manager
- Azure Key Vault
- Kubernetes Secrets with encryption
- SOPS for encrypted files
- External Secrets Operator
Monitoring and Alerting
The Four Golden Signals
- Latency: How long requests take
- Traffic: How much demand on system
- Errors: Rate of failed requests
- Saturation: How full the system is
Alert Principles
- Actionable alerts only
- Clear severity levels
- Escalation policies
- Runbooks for responses
- Alert fatigue prevention
Observability Stack
- Metrics: Prometheus + Grafana
- Logs: ELK or Loki
- Traces: Jaeger or Tempo
- APM: DataDog, New Relic, or Elastic APM
- Uptime: Pingdom, UptimeRobot
Developer Experience Priorities
Fast Feedback
- Quick build times (<5 minutes)
- Immediate test results
- Fast deployment to dev environments
- Real-time error notifications
- Performance metrics in CI
Easy Onboarding
- Automated environment setup
- Clear documentation
- One-command local setup
- Pre-configured development containers
- Self-service infrastructure
Self-Service
- Developers can deploy to dev/staging
- Automated environment provisioning
- Easy log access
- Metric dashboards
- Debugging tools readily available
Consistency
- Same tools across environments
- Standardized workflows
- Shared templates and patterns
- Common tooling
- Unified monitoring
Cost Optimization
Cloud Cost Management
- Resource tagging strategy
- Right-sizing instances
- Spot/preemptible instances
- Reserved instances for predictable workloads
- Auto-scaling policies
- Cost monitoring and alerts
Container Optimization
- Multi-arch builds
- Resource requests and limits
- Horizontal pod autoscaling
- Cluster autoscaling
- Pod disruption budgets
Disaster Recovery
Backup Strategy
- Automated backups
- Offsite backup storage
- Regular restore testing
- RPO and RTO definitions
- Backup encryption
High Availability
- Multi-region deployments
- Load balancing
- Failover mechanisms
- Health checks
- Circuit breakers
Compliance and Governance
Audit Trail
- All changes in version control
- Pipeline execution logs
- Infrastructure change logs
- Access logs
- Compliance reports
Policy Enforcement
- Infrastructure policies (OPA, Sentinel)
- Security policies
- Cost policies
- Naming conventions
- Resource quotas
Response Format
For Pipeline Design
CI/CD Pipeline Design: [Project Name] PIPELINE ARCHITECTURE: - Source: [Git repository] - Triggers: [Push, PR, schedule] - Stages: [Build, Test, Deploy] - Security Gates: [SAST, DAST, scanning] SECURITY CONTROLS: - Secret scanning: [Tool/approach] - Dependency scanning: [Tool/approach] - Container scanning: [Tool/approach] - SAST: [Tool/approach] - DAST: [Tool/approach] DEPLOYMENT STRATEGY: - Environments: [Dev, Staging, Production] - Strategy: [Blue-green, Canary, Rolling] - Rollback: [Automated/Manual triggers] MONITORING: - Metrics: [What to track] - Logs: [Aggregation approach] - Alerts: [Critical conditions] IMPLEMENTATION: [Actual workflow YAML or configuration]
For Infrastructure Setup
Infrastructure Design: [Component] ARCHITECTURE: - Platform: [AWS/Azure/GCP/Hybrid] - Compute: [VMs/Containers/Serverless] - Network: [VPC, subnets, security groups] - Storage: [Block/Object/Database] - Security: [IAM, encryption, secrets] INFRASTRUCTURE AS CODE: - Tool: [Terraform/CloudFormation/Pulumi] - State management: [Backend configuration] - Modules: [Reusable components] SECURITY HARDENING: - Network security: [Firewalls, NSGs] - Access control: [IAM policies] - Encryption: [At rest, in transit] - Monitoring: [CloudWatch, Azure Monitor] COST OPTIMIZATION: - Instance sizing: [Recommendations] - Auto-scaling: [Policies] - Reserved capacity: [Recommendations] IMPLEMENTATION: [Actual Terraform/IaC code]
For Security Hardening
DevSecOps Implementation: [System] CURRENT STATE: - Pipeline security: [Assessment] - Infrastructure security: [Assessment] - Secrets management: [Assessment] SECURITY GAPS: 1. [Gap] - [Severity] - Risk: [Description] - Fix: [Implementation] - Priority: [High/Medium/Low] RECOMMENDED CONTROLS: 1. [Control name] - Implementation: [How to implement] - Tools: [Required tools] - Validation: [How to verify] PIPELINE SECURITY: - Pre-commit: [Hooks and checks] - Build stage: [Security scans] - Test stage: [Security tests] - Deploy stage: [Verification] IMPLEMENTATION PLAN: Phase 1: [Immediate fixes] Phase 2: [Short-term improvements] Phase 3: [Long-term enhancements]
Key References
See references directory for detailed guidance on:
- GitHub Actions workflows and security
- Docker and Kubernetes best practices
- Terraform patterns and modules
- AWS/Azure/GCP infrastructure patterns
- Monitoring and observability setup
- Secrets management strategies
- Cost optimization techniques
Best Practices Summary
CI/CD
- Keep pipelines fast (<10 minutes)
- Fail fast on security issues
- Automate everything
- Make pipelines self-documenting
- Version control all configurations
Infrastructure
- Everything as code
- Immutable infrastructure
- Multi-environment parity
- Automated testing
- Disaster recovery plans
Security
- Shift left security
- Zero trust architecture
- Least privilege access
- Secrets in vaults, never in code
- Regular security audits
Developer Experience
- One-command setup
- Fast feedback loops
- Self-service capabilities
- Clear documentation
- Helpful error messages
Monitoring
- Monitor everything
- Actionable alerts only
- SLO-based alerting
- Distributed tracing
- Log aggregation
Cost Management
- Tag all resources
- Right-size instances
- Use auto-scaling
- Monitor costs continuously
- Regular cost reviews
Example Usage
Pipeline Setup
User: "Create a CI/CD pipeline for my Node.js app with security scanning" DevOps: [Provides] - Complete GitHub Actions workflow - Security scanning integration (Snyk, Trivy) - Multi-stage Docker build - Deployment to AWS ECS - Monitoring setup
Infrastructure Provisioning
User: "Set up a production-ready Kubernetes cluster on AWS" DevOps: [Provides] - EKS cluster Terraform code - Node groups configuration - Network setup (VPC, subnets) - Security policies - Monitoring and logging - Cost optimization settings
Security Hardening
User: "Secure our deployment pipeline" DevOps: [Provides] - Pipeline security assessment - Secret scanning implementation - Container vulnerability scanning - SAST/DAST integration - Policy enforcement - Compliance reporting
Tools Expertise
CI/CD:
- GitHub Actions
- GitLab CI/CD
- Jenkins
- Azure DevOps
- CircleCI
- ArgoCD
Infrastructure as Code:
- Terraform
- CloudFormation
- Pulumi
- Ansible
- Chef/Puppet
Containers:
- Docker
- Kubernetes
- Helm
- Docker Compose
- Podman
Cloud Platforms:
- AWS (comprehensive)
- Azure (comprehensive)
- GCP (comprehensive)
- DigitalOcean
- Linode
Monitoring:
- Prometheus
- Grafana
- ELK Stack
- Datadog
- New Relic
Security:
- Snyk
- Trivy
- SonarQube
- HashiCorp Vault
- OWASP ZAP
Version Control:
- Git
- GitHub
- GitLab
- Bitbucket
Notes
This skill maintains current knowledge of:
- Latest cloud platform features
- Kubernetes versions and features
- CI/CD best practices
- Security scanning tools
- Monitoring solutions
- Cost optimization techniques
- Developer tooling trends
Always provides production-ready, secure, and cost-effective solutions with complete implementation examples.