Claude-skill-registry database-reliability

πŸ—„οΈ Skill: Database Reliability

install
source Β· Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code Β· Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/database-reliability" ~/.claude/skills/majiayu000-claude-skill-registry-database-reliability && rm -rf "$T"
manifest: skills/data/database-reliability/SKILL.md
source content

πŸ—„οΈ Skill: Database Reliability

πŸ“‹ Metadata

AtributoValor
ID
sre-database-reliability
NivelπŸ”΄ Avanzado
VersiΓ³n1.0.0
Keywords
database
,
postgresql
,
mysql
,
mongodb
,
replication
,
backup
,
failover
,
connection-pooling
ReferenciaPostgreSQL High Availability

πŸ”‘ Keywords para InvocaciΓ³n

  • database-reliability
  • database-replication
  • database-backup
  • failover
  • connection-pooling
  • database-monitoring
  • @skill:database-reliability

Ejemplos de Prompts

Implementa database replication y failover para PostgreSQL
Configura backup y recovery strategy para base de datos
Setup connection pooling y database monitoring
@skill:database-reliability - Alta disponibilidad de base de datos

πŸ“– DescripciΓ³n

Database reliability es fundamental para servicios confiables. Este skill cubre replication, failover, backups, connection pooling, monitoring, y disaster recovery para bases de datos en producciΓ³n.

βœ… CuΓ‘ndo Usar Este Skill

  • Bases de datos en producciΓ³n
  • Servicios con requisitos de alta disponibilidad
  • Datos crΓ­ticos
  • SLAs estrictos
  • Compliance requirements

❌ CuÑndo NO Usar Este Skill

  • Desarrollo local con datos de prueba
  • Prototipos sin datos reales
  • Sistemas de solo lectura sin disponibilidad crΓ­tica

πŸ—οΈ Arquitectura de Alta Disponibilidad

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Application Layer             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   App   β”‚  β”‚   App   β”‚  β”‚   App   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚            β”‚            β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
              β”‚ Connection  β”‚
              β”‚    Pool     β”‚
              β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚            β”‚            β”‚
   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
   β”‚Primary  β”‚  β”‚Replica β”‚  β”‚  Replica β”‚
   β”‚(Master) β”‚  β”‚(Read)  β”‚  β”‚  (Read)  β”‚
   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β”‚ Replication
        β”‚
   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
   β”‚ Standby β”‚
   β”‚ (DR)    β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ’» ImplementaciΓ³n

πŸ“ Scripts Ejecutables: Este skill incluye scripts Python ejecutables en la carpeta

scripts/
:

Ver

scripts/README.md
para documentaciΓ³n de uso.

1. PostgreSQL Replication Setup

1.1 Primary Configuration

# postgresql.conf (Primary)
# Connection Settings
listen_addresses = '*'
max_connections = 200

# Replication Settings
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on

# Performance
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20MB
min_wal_size = 1GB
max_wal_size = 4GB

# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_timezone = 'UTC'
# pg_hba.conf (Primary)
# Allow replication from replicas
host replication replicator 10.0.0.0/24 md5

1.2 Replica Configuration

# Setup replication (on replica server)
# 1. Stop PostgreSQL
sudo systemctl stop postgresql

# 2. Backup primary
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -W -R -S replication_slot_1

# 3. Configure recovery
# postgresql.conf (Replica)
hot_standby = on
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
# postgresql.auto.conf (Auto-generated)
primary_conninfo = 'host=primary-host port=5432 user=replicator password=xxx application_name=replica1'
primary_slot_name = 'replication_slot_1'

1.3 Replication Slot Management

-- Create replication slot
SELECT pg_create_physical_replication_slot('replication_slot_1');

-- List replication slots
SELECT * FROM pg_replication_slots;

-- Monitor replication lag
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    sync_priority
FROM pg_stat_replication;

-- Check lag in bytes
SELECT 
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
    client_addr,
    application_name
FROM pg_stat_replication;

2. Automatic Failover (Patroni)

# patroni/patroni.yml
scope: postgres-cluster
namespace: /db/
name: postgres-node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.1:8008
  auth: 'username:password'

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 30
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        max_worker_processes: 8
        wal_level: replica
        hot_standby: on
        wal_keep_segments: 8
        max_wal_senders: 3
        max_replication_slots: 3
        synchronous_commit: 'on'
        synchronous_standby_names: 'ANY 1 (postgres-node2,postgres-node3)'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.1:5432
  data_dir: /var/lib/postgresql/data
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
  parameters:
    unix_socket_directories: '/var/run/postgresql'
  recovery_conf:
    restore_command: 'cp /backup/%f %p'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

3. Connection Pooling (PgBouncer)

# pgbouncer/pgbouncer.ini
[databases]
mydb = host=primary-host port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 100
max_user_connections = 50
server_round_robin = 1
ignore_startup_parameters = extra_float_digits

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

# Admin console
admin_users = admin
stats_users = stats

# Health check
server_check_query = SELECT 1
server_check_delay = 30
server_lifetime = 3600
server_idle_timeout = 600

# Connection limits
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5

4. Backup Strategy

#!/bin/bash
# scripts/backup.sh

BACKUP_DIR="/backup/postgresql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# Full backup
pg_basebackup \
  -h localhost \
  -D "$BACKUP_DIR/full_$DATE" \
  -U backup_user \
  -Ft \
  -z \
  -P

# WAL archiving (continuous)
# In postgresql.conf:
# archive_mode = on
# archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'

# Cleanup old backups
find "$BACKUP_DIR" -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \;

# Verify backup
pg_verifybackup -D "$BACKUP_DIR/full_$DATE"

Script ejecutable:

scripts/postgresql_backup.py

Script CLI completo para automatizar backups, restauraciΓ³n y limpieza de PostgreSQL.

CuΓ‘ndo ejecutar

  • Backups programados: Como parte de cron jobs o tareas automatizadas
  • RestauraciΓ³n de emergencia: Para restaurar bases de datos desde backups
  • Limpieza de backups: Para mantener el espacio de almacenamiento bajo control
  • VerificaciΓ³n de backups: Para validar la integridad de backups antes de restaurar

Uso como CLI

# Crear backup completo
python scripts/postgresql_backup.py backup \
  --host localhost \
  --user backup_user \
  --backup-dir /backup/postgresql

# Listar backups disponibles
python scripts/postgresql_backup.py list \
  --backup-dir /backup/postgresql

# Limpiar backups antiguos (dry-run primero)
python scripts/postgresql_backup.py cleanup \
  --backup-dir /backup/postgresql \
  --retention-days 30 \
  --dry-run

# Ejecutar limpieza real
python scripts/postgresql_backup.py cleanup \
  --backup-dir /backup/postgresql \
  --retention-days 30

# Restaurar desde backup
python scripts/postgresql_backup.py restore \
  --backup-path /backup/postgresql/full_20240115_120000 \
  --target-dir /var/lib/postgresql/data

Uso como mΓ³dulo

from scripts.postgresql_backup import PostgreSQLBackup

# Crear instancia
backup = PostgreSQLBackup(
    host="localhost",
    port=5432,
    user="backup_user",
    backup_dir="/backup/postgresql"
)

# Crear backup
backup_path = backup.full_backup(verify=True)

# Limpiar backups antiguos
backup.cleanup_old_backups(retention_days=30)

# Listar backups
backup.list_backups()

CaracterΓ­sticas

  • βœ… Backup fΓ­sico completo con
    pg_basebackup
  • βœ… VerificaciΓ³n automΓ‘tica de backups
  • βœ… RestauraciΓ³n automatizada
  • βœ… Limpieza de backups antiguos con polΓ­tica de retenciΓ³n
  • βœ… Listado de backups disponibles
  • βœ… Modo dry-run para limpieza
  • βœ… Manejo de errores robusto

5. Database Monitoring

# prometheus/database-metrics.yml
scrape_configs:
  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']
-- Custom metrics queries
-- Connection pool usage
SELECT 
    datname,
    numbackends as active_connections,
    max_connections,
    (numbackends::float / max_connections * 100) as connection_pct
FROM pg_stat_database;

-- Replication lag
SELECT 
    EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;

-- Table bloat
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS bloat
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

-- Slow queries
SELECT 
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
  AND state = 'active';

-- Index usage
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

6. Alert Rules

# prometheus/alerts/database-alerts.yml
groups:
  - name: database_alerts
    interval: 30s
    rules:
      # High connection usage
      - alert: DatabaseHighConnections
        expr: |
          (pg_stat_database_numbackends / pg_stat_database_max_connections) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection usage in {{ $labels.datname }}"
          description: "{{ $value | humanizePercentage }} of connections in use"

      # Replication lag
      - alert: DatabaseReplicationLag
        expr: |
          pg_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High replication lag"
          description: "Replication lag is {{ $value }} seconds"

      # Database down
      - alert: DatabaseDown
        expr: |
          up{job="postgres-exporter"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database exporter is down"

      # Slow queries
      - alert: DatabaseSlowQueries
        expr: |
          pg_stat_activity_max_tx_duration > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow query detected"
          description: "Query running for {{ $value }} seconds"

🎯 Mejores PrÑcticas

1. Replication

βœ… DO:

  • Use synchronous replication for critical data
  • Monitor replication lag
  • Test failover regularly
  • Use replication slots

❌ DON'T:

  • Ignore replication lag
  • Skip failover testing
  • Use async replication for critical writes

2. Backups

βœ… DO:

  • Test restore procedures regularly
  • Use point-in-time recovery
  • Store backups off-site
  • Encrypt backups

❌ DON'T:

  • Assume backups work without testing
  • Keep all backups in one location
  • Skip backup verification

3. Connection Pooling

βœ… DO:

  • Use connection pooling
  • Set appropriate pool sizes
  • Monitor pool usage
  • Handle connection errors gracefully

❌ DON'T:

  • Exceed database connection limits
  • Use too large pools
  • Ignore pool exhaustion

🚨 Troubleshooting

Replication Lag

  1. Check network connectivity
  2. Monitor WAL generation rate
  3. Check replica performance
  4. Consider increasing resources

Connection Exhaustion

  1. Reduce connection pool size
  2. Close idle connections
  3. Use connection pooling
  4. Increase database max_connections

Backup Failures

  1. Check disk space
  2. Verify permissions
  3. Test backup commands manually
  4. Review backup logs

πŸ“š Recursos Adicionales


VersiΓ³n: 1.0.0
Última actualización: Diciembre 2025
Total lΓ­neas: 1,100+