Gsd-skill-creator openstack-backup
OpenStack backup operations skill for protecting cloud infrastructure through systematic backup strategies and disaster recovery procedures. Covers database backups (MariaDB full and incremental with mariabackup), configuration backups (globals.yml, inventory, Fernet keys), volume snapshots (Cinder LVM snapshots), image exports (Glance), instance snapshots (Nova), backup encryption (GPG/OpenSSL), retention policies (daily/weekly/monthly rotation), restore procedures (database point-in-time recovery, service rebuild), RPO/RTO planning, and disaster recovery drills. Use when planning backup strategy, scheduling automated backups, testing restore procedures, or executing disaster recovery.
git clone https://github.com/Tibsfox/gsd-skill-creator
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/openstack/backup" ~/.claude/skills/tibsfox-gsd-skill-creator-openstack-backup && rm -rf "$T"
skills/openstack/backup/SKILL.mdOpenStack Backup Operations
A cloud without tested backups is a cloud waiting to lose data. Backup operations protect the entire stack -- from the databases that store state, to the configurations that define behavior, to the volumes that hold user data. The backup hierarchy ensures that any component can be recovered independently, from a single Keystone Fernet key to a complete cloud rebuild.
Backup planning starts with two numbers: RPO (Recovery Point Objective) -- how much data loss is acceptable (e.g., "no more than 1 hour of changes"), and RTO (Recovery Time Objective) -- how long recovery can take (e.g., "service restored within 4 hours"). These numbers drive every decision: backup frequency, storage location, retention depth, and automation investment.
In NASA SE terms, backup maps to Phase E (Operations & Sustainment) as a core sustainment activity, and Phase F (Closeout) for data archive procedures. The backup hierarchy follows the same defense-in-depth philosophy NASA applies to mission-critical data: multiple copies, multiple formats, verified recoverability.
Deploy
Backup Infrastructure Setup
Backup scripts location:
# Create backup directory structure mkdir -p /opt/openstack-backups/{database,config,volumes,images,snapshots} mkdir -p /opt/openstack-backups/scripts mkdir -p /opt/openstack-backups/logs # Set ownership and permissions chown -R root:root /opt/openstack-backups chmod 700 /opt/openstack-backups
Kolla-Ansible built-in database backup:
# MariaDB backup (built into Kolla-Ansible) kolla-ansible -i inventory mariadb_backup # Backup location: /var/lib/docker/volumes/mariadb/_data/backups/ # File: mysqlbackup-<timestamp>.qp.xb.gz
Cron scheduling for automated backups:
# /etc/cron.d/openstack-backup # Daily database backup at 02:00 0 2 * * * root /opt/openstack-backups/scripts/backup-databases.sh >> /opt/openstack-backups/logs/db-backup.log 2>&1 # Daily config backup at 02:30 30 2 * * * root /opt/openstack-backups/scripts/backup-configs.sh >> /opt/openstack-backups/logs/config-backup.log 2>&1 # Weekly volume snapshots (Sunday 03:00) 0 3 * * 0 root /opt/openstack-backups/scripts/backup-volumes.sh >> /opt/openstack-backups/logs/volume-backup.log 2>&1
Backup storage targets:
| Target | Use Case | Configuration |
|---|---|---|
| Local disk | Fast restore, primary copy | |
| NFS mount | Off-host copy, disaster recovery | Mount at |
| Swift (S3-compatible) | Object storage archive, long-term retention | Use or s3cmd |
Configure
MariaDB Backup Strategy
Full backup (weekly):
#!/bin/bash # backup-databases.sh -- full MariaDB backup with mariabackup BACKUP_DIR="/opt/openstack-backups/database" TIMESTAMP=$(date +%Y%m%d_%H%M%S) BACKUP_FILE="${BACKUP_DIR}/full_${TIMESTAMP}" # Use Kolla-Ansible's built-in backup kolla-ansible -i /etc/kolla/inventory mariadb_backup # Or manual mariabackup via container docker exec mariadb mariabackup --backup \ --target-dir=/backup/full_${TIMESTAMP} \ --user=root --password=${DB_ROOT_PASSWORD} echo "Full backup completed: ${BACKUP_FILE}" >> /opt/openstack-backups/logs/db-backup.log
Incremental backup (daily, between full backups):
# Incremental based on last full backup LAST_FULL=$(ls -d ${BACKUP_DIR}/full_* | sort | tail -1) docker exec mariadb mariabackup --backup \ --target-dir=/backup/inc_${TIMESTAMP} \ --incremental-basedir=${LAST_FULL} \ --user=root --password=${DB_ROOT_PASSWORD}
Service-Specific Backups
RabbitMQ definitions export:
# Export queue definitions, exchanges, bindings docker exec rabbitmq rabbitmqctl export_definitions /tmp/rabbitmq_definitions.json docker cp rabbitmq:/tmp/rabbitmq_definitions.json \ /opt/openstack-backups/config/rabbitmq_definitions_$(date +%Y%m%d).json
Keystone Fernet key backup:
# Critical: Fernet keys are required to validate existing tokens docker cp keystone_api:/etc/kolla/keystone/fernet-keys/ \ /opt/openstack-backups/config/fernet-keys_$(date +%Y%m%d)/
Cinder volume snapshot policies:
# Create snapshot of all in-use volumes for vol_id in $(openstack volume list --status in-use -f value -c ID); do openstack volume snapshot create --volume ${vol_id} \ --description "Scheduled backup $(date +%Y%m%d)" \ backup_${vol_id}_$(date +%Y%m%d) done
Glance image export:
# Export critical images (base OS, golden images) for image_id in $(openstack image list --status active -f value -c ID); do openstack image save --file /opt/openstack-backups/images/${image_id}.raw ${image_id} done
Nova instance snapshot:
# Snapshot critical instances openstack server image create --name "backup_$(date +%Y%m%d)" <instance-id>
Configuration File Backup
#!/bin/bash # backup-configs.sh -- backup all Kolla-Ansible configurations BACKUP_DIR="/opt/openstack-backups/config/kolla_$(date +%Y%m%d)" mkdir -p ${BACKUP_DIR} # Core configuration files cp /etc/kolla/globals.yml ${BACKUP_DIR}/ cp /etc/kolla/passwords.yml ${BACKUP_DIR}/ cp -r /etc/kolla/inventory ${BACKUP_DIR}/ cp /etc/kolla/admin-openrc.sh ${BACKUP_DIR}/ # Per-service configuration overrides for svc in keystone nova neutron cinder glance swift heat horizon; do [ -d /etc/kolla/${svc} ] && cp -r /etc/kolla/${svc} ${BACKUP_DIR}/ done # Encrypt sensitive files gpg --symmetric --cipher-algo AES256 \ --output ${BACKUP_DIR}/passwords.yml.gpg \ ${BACKUP_DIR}/passwords.yml rm ${BACKUP_DIR}/passwords.yml echo "Config backup completed: ${BACKUP_DIR}"
Backup Encryption
GPG encryption for backup files:
# Encrypt a backup file gpg --symmetric --cipher-algo AES256 \ --batch --passphrase-file /root/.backup-passphrase \ --output backup.tar.gz.gpg backup.tar.gz # Decrypt gpg --decrypt --batch --passphrase-file /root/.backup-passphrase \ --output backup.tar.gz backup.tar.gz.gpg
OpenSSL alternative:
# Encrypt openssl enc -aes-256-cbc -salt -pbkdf2 \ -in backup.tar.gz -out backup.tar.gz.enc \ -pass file:/root/.backup-passphrase # Decrypt openssl enc -aes-256-cbc -d -pbkdf2 \ -in backup.tar.gz.enc -out backup.tar.gz \ -pass file:/root/.backup-passphrase
Retention Policies
| Tier | Frequency | Retention | Storage Target |
|---|---|---|---|
| Daily | Every day at 02:00 | 7 days | Local disk |
| Weekly | Sunday 03:00 (full DB) | 4 weeks | NFS mount |
| Monthly | First Sunday of month | 12 months | Swift/S3 archive |
Rotation script:
# Purge backups older than retention period find /opt/openstack-backups/database -name "*.gz" -mtime +7 -delete find /mnt/backup-nfs/database -name "*.gz" -mtime +28 -delete # Monthly backups on Swift are managed by object expiry headers
Backup Storage Sizing
| Component | Typical Size | Growth Rate | 30-day Estimate |
|---|---|---|---|
| MariaDB full | 500MB-2GB | ~50MB/day | ~3.5GB |
| MariaDB incremental | 50-200MB | Per day | ~4.2GB |
| Kolla configs | 10-50MB | Stable | ~50MB |
| Fernet keys | <1MB | Stable | <1MB |
| Volume snapshots | Varies by usage | Per volume | Plan per tenant |
| Glance images | 1-10GB per image | Per upload | Plan per image |
Operate
Scheduled Backup Verification
Restore test procedure (monthly):
# 1. Create isolated test database container docker run -d --name mariadb_restore_test \ -e MYSQL_ROOT_PASSWORD=testpass \ mariadb:latest # 2. Restore latest backup into test container docker exec mariadb_restore_test mariabackup --prepare \ --target-dir=/backup/latest_full docker exec mariadb_restore_test mariabackup --copy-back \ --target-dir=/backup/latest_full # 3. Verify data integrity docker exec mariadb_restore_test mysql -u root -ptestpass \ -e "SELECT COUNT(*) FROM keystone.project; SELECT COUNT(*) FROM nova.instances;" # 4. Clean up docker rm -f mariadb_restore_test
Disaster Recovery Drill Procedures
Full restore from backup (complete cloud rebuild):
- Redeploy infrastructure:
kolla-ansible -i inventory bootstrap-servers - Restore MariaDB from latest full + incremental backups
- Restore Kolla configuration files from config backup
- Deploy services:
kolla-ansible -i inventory deploy - Restore Fernet keys (critical for existing tokens)
- Verify all services:
,openstack token issueopenstack service list - Restore volume snapshots as needed
- Verify user data access
Partial service restore (single service rebuild):
# Example: rebuild Nova after database corruption # 1. Restore Nova database docker exec -i mariadb mysql -u root -p nova < nova_backup.sql # 2. Redeploy Nova only kolla-ansible -i inventory deploy --tags nova # 3. Verify openstack server list # Should show existing instances openstack compute service list # All services should be up
Database point-in-time recovery:
# 1. Prepare full backup mariabackup --prepare --target-dir=/backup/full_base # 2. Apply incrementals in order mariabackup --prepare --target-dir=/backup/full_base \ --incremental-dir=/backup/inc_001 mariabackup --prepare --target-dir=/backup/full_base \ --incremental-dir=/backup/inc_002 # 3. Copy back to data directory mariabackup --copy-back --target-dir=/backup/full_base
Backup Catalog Maintenance
Maintain a catalog of all backups with metadata:
# Append to backup catalog after each successful backup echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),database,full,${BACKUP_FILE},$(du -sh ${BACKUP_FILE} | cut -f1),$(sha256sum ${BACKUP_FILE} | cut -d' ' -f1)" \ >> /opt/openstack-backups/catalog.csv
Catalog format:
timestamp,type,level,path,size,checksum
Backup Rotation Execution
#!/bin/bash # rotate-backups.sh -- enforce retention policies DAILY_KEEP=7 WEEKLY_KEEP=28 MONTHLY_KEEP=365 # Remove old daily backups find /opt/openstack-backups/database/daily -mtime +${DAILY_KEEP} -delete -print >> /opt/openstack-backups/logs/rotation.log # Remove old weekly backups find /opt/openstack-backups/database/weekly -mtime +${WEEKLY_KEEP} -delete -print >> /opt/openstack-backups/logs/rotation.log # Update catalog grep -v "DELETED" /opt/openstack-backups/catalog.csv > /opt/openstack-backups/catalog.csv.tmp mv /opt/openstack-backups/catalog.csv.tmp /opt/openstack-backups/catalog.csv echo "Rotation completed: $(date)" >> /opt/openstack-backups/logs/rotation.log
Troubleshoot
1. MariaDB Backup Failure
Symptoms:
kolla-ansible mariadb_backup exits non-zero. No backup file created.
| Cause | Diagnosis | Fix |
|---|---|---|
| Disk full | shows >95% used | Free space: purge old backups, extend volume |
| Lock timeout | Backup log: | Retry during low-activity period; increase |
| Connection refused | Backup log: | Verify MariaDB running: ; restart if needed |
| Insufficient privileges | Backup log: | Verify backup user has RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT grants |
2. Volume Snapshot Stuck in "creating" State
Symptoms:
openstack volume snapshot list shows status "creating" indefinitely.
| Cause | Diagnosis | Fix |
|---|---|---|
| Cinder backend issue | | Check LVM: for snapshot overflow; extend VG if needed |
| LVM snapshot overflow | shows >100% | Delete overflow snapshot: ; increase snapshot reserve |
| iSCSI target locked | shows stale sessions | Clear stale sessions; restart target service |
| Cinder volume service down | shows disabled | Enable service: |
3. Restore Fails
Symptoms: Backup restoration exits with errors. Database or service does not come up cleanly.
| Cause | Diagnosis | Fix |
|---|---|---|
| Incompatible backup version | mariabackup: | Restore using matching mariabackup version; or use mysqldump (logical backup) |
| Corrupted backup | mariabackup --prepare fails with checksum error | Use previous backup; verify checksums before relying on backups |
| Missing encryption key | gpg: | Locate key file; restore from secure key backup location |
| Filesystem permission mismatch | Service fails to start after restore: | ; verify container user mapping |
4. Backup Storage Full
Symptoms: Backup scripts fail with "No space left on device". Retention not enforced.
| Cause | Diagnosis | Fix |
|---|---|---|
| Retention not enforced | Old backups still present: | Run rotation script manually; fix cron schedule |
| Unexpected growth | Single large backup: | Investigate cause (large DB tables, uncompressed backups); add compression |
| NFS mount disconnected | shows local mount | Remount NFS; check network connectivity to NFS server |
5. Incremental Backup Chain Broken
Symptoms: Cannot restore because intermediate incremental backup is missing or corrupted.
| Cause | Diagnosis | Fix |
|---|---|---|
| Missing parent backup | fails: | Start new full backup; rebuild incremental chain from new base |
| Backup file corrupted | Checksum mismatch in catalog | Use last known-good full backup; accept data loss to RPO of that backup |
| Catalog out of sync | Backup files exist but catalog references wrong paths | Rebuild catalog from filesystem: scan backup directories and regenerate |
Integration Points
Backup connects to every data-bearing component of the cloud stack:
Core OpenStack services: Backs up databases and configuration files for all 8 services. Each service's database (keystone, nova, neutron, cinder, glance, swift, heat, horizon) is included in the MariaDB backup set. Service-specific configs are backed up from
/etc/kolla/<service>/.
Monitoring skill: Backup job success/failure metrics feed Prometheus alerting. Monitoring tracks backup duration, storage consumption, and schedule adherence. Alert on backup failures or missed backup windows.
Security skill: Backup encryption protects sensitive data at rest (passwords.yml, Fernet keys, database dumps). Credential backup requires secure storage with access controls. Security audit includes backup access review.
Capacity skill: Backup storage consumption is included in capacity planning models. Growth rate of backup storage drives storage provisioning decisions.
Kolla-ansible-ops skill: Backup is a prerequisite before upgrade and reconfigure operations. The upgrade procedure mandates a verified backup before any destructive changes.
kolla-ansible mariadb_backup is the first step in upgrade runbooks.
NASA SE Cross-References
| SE Phase | Backup Activity | Reference |
|---|---|---|
| Phase B (Preliminary Design) | Define backup requirements: RPO/RTO targets, storage requirements, retention policies | SP-6105 SS 4.3-4.4 |
| Phase E (Operations & Sustainment) | Execute backup schedules, verify restore procedures, maintain backup catalog, conduct DR drills | SP-6105 SS 5.4-5.5 |
| Phase E (Technical Data Management) | Backup as data management activity: ensure all operational data is recoverable per retention policy | SP-6105 SS 6.5 |
| Phase F (Closeout) | Data archive procedures: export and archive all cloud data before decommission, verify archive integrity | SP-6105 SS 6.1 |