Awesome-omni-skill Disaster Recovery

'Disaster Recovery encompasses strategies and procedures for recovering

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/disaster-recovery" ~/.claude/skills/diegosouzapw-awesome-omni-skill-disaster-recovery && rm -rf "$T"

manifest: skills/backend/disaster-recovery/SKILL.md

safety · automated scan (low risk)

This is a pattern-based risk scan, not a security review. Our crawler flagged:

references .env files
references API keys

Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.

source content

Disaster Recovery

Skill Profile

(Select at least one profile to enable specific modules)

Overview

Disaster Recovery encompasses strategies and procedures for recovering from catastrophic failures and ensuring business continuity. This includes backup strategies, failover mechanisms, data recovery procedures, and business continuity planning.

Core Principle: "Plan for the worst, hope for the best. Test your recovery plan before you need it."

Why This Matters

Data Protection: Prevents permanent data loss from catastrophic failures
Business Continuity: Ensures operations can resume quickly after major incidents
Customer Trust: Demonstrates commitment to data protection and reliability
Compliance: Meets regulatory requirements for data backup and retention
Reduced Downtime: Minimizes RTO (Recovery Time Objective) and RPO (Recovery Point Objective)

Core Concepts & Rules

1. Core Principles

Follow established patterns and conventions
Maintain consistency across codebase
Document decisions and trade-offs

2. Implementation Guidelines

Start with the simplest viable solution
Iterate based on feedback and requirements
Test thoroughly before deployment

Inputs / Outputs / Contracts

Inputs:
- System architecture and component inventory
- Data classification and criticality levels
- RPO/RTO requirements per system
- Regulatory compliance requirements
Entry Conditions:
- Backup infrastructure is in place
- Recovery procedures are documented
- Team has been trained on recovery procedures
Outputs:
- Backup and recovery documentation
- Runbooks for disaster recovery
- Monitoring and alerting configuration
Artifacts Required (Deliverables):
- Backup schedules and retention policies
- Recovery runbooks with step-by-step procedures
- Contact information for recovery teams and vendors
Acceptance Evidence:
- Successful backup restoration test (screenshot/log)
- Recovery drill results (RTO/RPO met)
- Compliance audit report
Success Criteria:
- RPO and RTO targets met for all critical systems
- Backups tested and verified regularly
- Recovery procedures validated through drills

Skill Composition

Depends on: Failure Modes Analysis, Monitoring & Observability
Compatible with: Chaos Engineering, System Resilience patterns
Conflicts with: Systems without backup infrastructure
Related Skills:
- 40-system-resilience/failure-modes - Understanding what to recover from
- 40-system-resilience/chaos-engineering - Testing recovery procedures
- 14-monitoring-observability/metrics-collection - Detecting failures

Quick Start / Implementation Example

Review requirements and constraints
Set up development environment
Implement core functionality following patterns
Write tests for critical paths
Run tests and fix issues
Document any deviations or decisions

# Example implementation following best practices
def example_function():
    # Your implementation here
    pass

Assumptions / Constraints / Non-goals

Assumptions:
- Development environment is properly configured
- Required dependencies are available
- Team has basic understanding of domain
Constraints:
- Must follow existing codebase conventions
- Time and resource limitations
- Compatibility requirements
Non-goals:
- This skill does not cover edge cases outside scope
- Not a replacement for formal training

Compatibility & Prerequisites

Supported Versions:
- Python 3.8+
- Node.js 16+
- Modern browsers (Chrome, Firefox, Safari, Edge)
Required AI Tools:
- Code editor (VS Code recommended)
- Testing framework appropriate for language
- Version control (Git)
Dependencies:
- Language-specific package manager
- Build tools
- Testing libraries
Environment Setup:
- ```
.env.example
```
  keys:
```
API_KEY
```
  ,
```
DATABASE_URL
```
  (no values)

Test Scenario Matrix (QA Strategy)

Type	Focus Area	Required Scenarios / Mocks
Unit	Core Logic	Must cover primary logic and at least 3 edge/error cases. Target minimum 80% coverage
Integration	DB / API	All external API calls or database connections must be mocked during unit tests
E2E	User Journey	Critical user flows to test
Performance	Latency / Load	Benchmark requirements
Security	Vuln / Auth	SAST/DAST or dependency audit
Frontend	UX / A11y	Accessibility checklist (WCAG), Performance Budget (Lighthouse score)

Technical Guardrails & Security Threat Model

1. Security & Privacy (Threat Model)

Top Threats: Injection attacks, authentication bypass, data exposure

Data Handling: Sanitize all user inputs to prevent Injection attacks. Never log raw PII
Secrets Management: No hardcoded API keys. Use Env Vars/Secrets Manager
Authorization: Validate user permissions before state changes

2. Performance & Resources

Execution Efficiency: Consider time complexity for algorithms
Memory Management: Use streams/pagination for large data
Resource Cleanup: Close DB connections/file handlers in finally blocks

3. Architecture & Scalability

Design Pattern: Follow SOLID principles, use Dependency Injection
Modularity: Decouple logic from UI/Frameworks

4. Observability & Reliability

Logging Standards: Structured JSON, include trace IDs
```
request_id
```
Metrics: Track
```
error_rate
```
,
```
latency
```
,
```
queue_depth
```
Error Handling: Standardized error codes, no bare except
Observability Artifacts:
- Log Fields: timestamp, level, message, request_id
- Metrics: request_count, error_count, response_time
- Dashboards/Alerts: High Error Rate > 5%

Agent Directives & Error Recovery

(ข้อกำหนดสำหรับ AI Agent ในการคิดและแก้ปัญหาเมื่อเกิดข้อผิดพลาด)

Thinking Process: Analyze root cause before fixing. Do not brute-force.
Fallback Strategy: Stop after 3 failed test attempts. Output root cause and ask for human intervention/clarification.
Self-Review: Check against Guardrails & Anti-patterns before finalizing.
Output Constraints: Output ONLY the modified code block. Do not explain unless asked.

Definition of Done (DoD) Checklist

Tests passed + coverage met
Lint/Typecheck passed
Logging/Metrics/Trace implemented
Security checks passed
Documentation/Changelog updated
Accessibility/Performance requirements met (if frontend)

Anti-patterns / Pitfalls

⛔ Don't: Log PII, catch-all exception, N+1 queries
⚠️ Watch out for: Common symptoms and quick fixes
💡 Instead: Use proper error handling, pagination, and logging

Reference Links & Examples

Internal documentation and examples
Official documentation and best practices
Community resources and discussions

Versioning & Changelog

Version: 1.0.0
Changelog:
- 2026-02-22: Initial version with complete template structure