Claude-skill-registry database-migration
Database schema migration patterns for Aurora MySQL including reconciliation migrations, idempotent operations, and MySQL-specific gotchas.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/database-migration-awannaphasch2016-jousef-landing" ~/.claude/skills/majiayu000-claude-skill-registry-database-migration && rm -rf "$T"
skills/data/database-migration-awannaphasch2016-jousef-landing/SKILL.mdDatabase Migration Skill
Tech Stack: Aurora MySQL 8.0, PyMySQL, migrations via Python scripts
Source: Extracted from CLAUDE.md database migration principles and real migration failures.
When to Use This Skill
Use the database-migration skill when:
- ✓ Creating new database migrations
- ✓ Fixing broken migrations in unknown state
- ✓ Debugging schema mismatch issues
- ✓ Reviewing migration PRs
- ✓ Reconciling production schema drift
DO NOT use this skill for:
- ✗ Query optimization (use code-review PERFORMANCE.md)
- ✗ Data migrations (this skill is schema-only)
- ✗ NoSQL databases (DynamoDB has different patterns)
Quick Decision Tree
What's the migration scenario? ├─ New feature schema? │ ├─ Dev database clean? → Sequential migration (001_add_feature.sql) │ └─ Dev database dirty? → Reconciliation migration (RECONCILE_*.sql) │ ├─ Production schema drift? │ └─ Always → Reconciliation migration (idempotent operations) │ ├─ Migration failed mid-execution? │ ├─ Can rollback? → Rollback, fix migration, re-run │ └─ Cannot rollback? → Reconciliation migration to fix state │ └─ Schema review before merge? └─ Check: RECONCILIATION-MIGRATIONS.md + MYSQL-GOTCHAS.md
Core Migration Principles
Principle 1: Immutability of Committed Migrations
From CLAUDE.md:
"Migration files are immutable once committed to version control—never edit them, always create new migrations for schema changes."
Why This Matters:
- Reproducibility: Same migration file produces same schema across all environments
- History: Preserves evolution of schema over time
- Multi-developer: Prevents conflicts when multiple people migrate simultaneously
# ❌ DON'T: Edit existing migration # migrations/001_create_users.sql (committed 2 weeks ago) CREATE TABLE users ( id INT PRIMARY KEY, name VARCHAR(100) # Changed from VARCHAR(50) → Breaks reproducibility! ); # ✅ DO: Create new migration # migrations/002_widen_user_name.sql ALTER TABLE users MODIFY COLUMN name VARCHAR(100);
Exceptions:
- Migration not yet committed to version control → Can edit freely
- Migration failed in dev environment → Can delete and recreate
- Migration never ran in production → Can edit if also updating all dev environments
Principle 2: Reconciliation Over Sequential Migrations
Pattern: When database state is unknown or partially migrated, use reconciliation migrations.
Sequential Migration Assumption:
-- migrations/003_add_status_column.sql -- Assumes: users table exists AND status column doesn't exist ALTER TABLE users ADD COLUMN status ENUM('active', 'inactive');
Problem: If migration ran before on some servers but not others, you get:
- Already has column → Error: "Duplicate column name 'status'"
- Doesn't have column → Success
- Unknown state → 50/50 chance of failure
Reconciliation Migration Solution:
-- migrations/RECONCILE_user_status.sql -- Works regardless of current state CREATE TABLE IF NOT EXISTS users ( id INT PRIMARY KEY, name VARCHAR(100) ); -- Add column only if missing SET @col_exists = ( SELECT COUNT(*) FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND COLUMN_NAME = 'status' ); SET @sql = IF(@col_exists = 0, 'ALTER TABLE users ADD COLUMN status ENUM("active", "inactive")', 'SELECT "Column status already exists" AS message' ); PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
When to Use:
- Production schema drift (someone manually altered schema)
- Migration failed mid-execution (half the changes applied)
- Multiple environments out of sync
- After fixing a broken migration
See RECONCILIATION-MIGRATIONS.md for detailed patterns.
Principle 3: Verify Schema After Migration
Pattern: Always verify migrations changed what you expected.
# After running migration mysql> DESCRIBE users; +--------+--------------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------+--------------------------+------+-----+---------+-------+ | id | int | NO | PRI | NULL | | | name | varchar(100) | YES | | NULL | | | status | enum('active','inactive')| YES | | NULL | | +--------+--------------------------+------+-----+---------+-------+ # Verify: # ✓ status column exists # ✓ ENUM values correct # ✓ Nullable (or NOT NULL if intended)
Why This Matters: MySQL's idempotent operations can silently skip changes:
→ Skips if table exists with different schemaCREATE TABLE IF NOT EXISTS
→ Changes type but not existing dataALTER TABLE MODIFY COLUMN
See MYSQL-GOTCHAS.md for detailed MySQL-specific issues.
Principle 4: Add Column Comments to Prevent Semantic Confusion
Pattern: Use MySQL's native
COMMENT syntax to document column semantics directly in the database schema.
Problem: Column names alone can be semantically ambiguous:
-- What does "date" mean? CREATE TABLE ticker_data ( date DATE NOT NULL -- Is this fetch date? Trading date? Calendar date? );
Real Bug Example (2025-12-29):
- User queried
expecting "today's data"WHERE date = '2025-12-29' - Got 0 results because
represents trading date (market close), NOT fetch datedate - Data for Dec 29 won't exist until Dec 30 5:00 AM Bangkok (19-hour gap)
- Root cause: Semantic confusion about what "date" field represents
Solution: Add
COLUMN COMMENT to clarify semantics:
ALTER TABLE ticker_data MODIFY COLUMN date DATE NOT NULL COMMENT 'Trading date for stock market data (NOT fetch date). Represents the date when market closed, not when data was retrieved. Data for date D is fetched at 5:00 AM Bangkok on date D+1.'; ALTER TABLE ticker_data MODIFY COLUMN fetched_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'UTC timestamp when this data was fetched from Yahoo Finance API. Compare with date field to understand data age.';
When to Add Comments:
-
Date/timestamp columns (highest confusion risk):
- Clarify what the date represents (trading date vs fetch date vs calendar date)
- Document timezone semantics (UTC vs local time)
- Explain time windows when data won't exist yet
-
JSON columns (structure documentation):
- Document expected schema:
{date, open, high, low, close, volume} - List required vs optional fields
- Document expected schema:
-
Enum-like VARCHAR columns (valid values):
- Document valid values:
'pending', 'in_progress', 'completed', 'failed'
- Document valid values:
-
Foreign key columns (relationship clarity):
- Document what they reference:
'References ticker_master.id'
- Document what they reference:
Querying Column Comments:
-- Get all comments for a table SELECT COLUMN_NAME, DATA_TYPE, COLUMN_COMMENT FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'ticker_data' ORDER BY ORDINAL_POSITION; -- Find uncommented date/timestamp columns (needs attention) SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND DATA_TYPE IN ('date', 'timestamp') AND (COLUMN_COMMENT IS NULL OR COLUMN_COMMENT = '');
Migration Pattern:
-- Migration 016: Add semantic comments to prevent date field confusion ALTER TABLE ticker_data MODIFY COLUMN date DATE NOT NULL COMMENT 'Trading date for stock market data (NOT fetch date). Represents the date when market closed, not when data was retrieved. Data for date D is fetched at 5:00 AM Bangkok on date D+1.'; -- Always preserve existing column definition (type, constraints, defaults) -- when adding comments - use MODIFY COLUMN with full definition
Benefits:
- Self-documenting database (no separate docs needed)
- Queryable via
INFORMATION_SCHEMA.COLUMNS - Zero performance impact (comments are metadata only)
- Version-controlled via migration files
- Prevents semantic misinterpretation bugs
Cost: Zero (comments are metadata, don't affect queries or storage)
See migration
db/migrations/016_add_semantic_comments.sql for real example.
Migration Workflow
1. Development Migration
# Step 1: Create migration file cat > migrations/004_add_email_to_users.sql <<'EOF' -- Add email column to users table ALTER TABLE users ADD COLUMN email VARCHAR(255); ALTER TABLE users ADD INDEX idx_email (email); EOF # Step 2: Test locally (requires Aurora tunnel) # Verify tunnel active ss -ltn | grep 3307 # Run migration mysql -h localhost -P 3307 -u admin -p < migrations/004_add_email_to_users.sql # Step 3: Verify schema mysql -h localhost -P 3307 -u admin -p -e "DESCRIBE users;" mysql -h localhost -P 3307 -u admin -p -e "SHOW INDEX FROM users;" # Step 4: Commit migration git add migrations/004_add_email_to_users.sql git commit -m "db: Add email column to users table"
2. Production Migration (Reconciliation Pattern)
# Step 1: Check current production schema # (via Aurora tunnel to production) mysql -h localhost -P 3307 -u admin -p -e "DESCRIBE users;" # Step 2: Create reconciliation migration cat > migrations/RECONCILE_user_email.sql <<'EOF' -- Reconciliation: Add email to users (idempotent) -- Check if email column exists SET @col_exists = ( SELECT COUNT(*) FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND COLUMN_NAME = 'email' ); -- Add column if missing SET @add_column = IF(@col_exists = 0, 'ALTER TABLE users ADD COLUMN email VARCHAR(255)', 'SELECT "Column email already exists"' ); PREPARE stmt FROM @add_column; EXECUTE stmt; DEALLOCATE PREPARE stmt; -- Check if index exists SET @idx_exists = ( SELECT COUNT(*) FROM information_schema.STATISTICS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND INDEX_NAME = 'idx_email' ); -- Add index if missing SET @add_index = IF(@idx_exists = 0, 'ALTER TABLE users ADD INDEX idx_email (email)', 'SELECT "Index idx_email already exists"' ); PREPARE stmt FROM @add_index; EXECUTE stmt; DEALLOCATE PREPARE stmt; EOF # Step 3: Run reconciliation migration mysql -h localhost -P 3307 -u admin -p < migrations/RECONCILE_user_email.sql # Step 4: Verify final state scripts/verify_schema.py --table users --expected-columns id,name,status,email
3. Fixing Broken Migrations
Scenario: Migration failed mid-execution, database in unknown state.
# Step 1: Check what exists mysql -h localhost -P 3307 -u admin -p <<'EOF' SHOW TABLES; DESCRIBE users; -- Check which columns exist SHOW INDEX FROM users; -- Check which indexes exist EOF # Step 2: Create reconciliation migration to finish job # (See RECONCILIATION-MIGRATIONS.md for patterns) # Step 3: Mark old migration as obsolete (if needed) mv migrations/004_add_email_to_users.sql migrations/OBSOLETE_004_add_email_to_users.sql # Step 4: Commit reconciliation migration git add migrations/RECONCILE_user_email.sql git commit -m "db: Reconcile user email migration (fixes broken 004)"
Migration File Naming Convention
migrations/ ├── 001_create_users.sql # Sequential: New feature ├── 002_add_status_to_users.sql # Sequential: Follow-up ├── RECONCILE_user_status.sql # Reconciliation: Fix drift ├── OBSOLETE_003_broken_migration.sql # Mark broken migrations └── README.md # Migration history
Rules:
- Sequential (001, 002, ...): For clean dev environments, new features
- RECONCILE_: For unknown state, production drift, fixing failures
- OBSOLETE_: Mark failed migrations (don't delete - preserve history)
Common Migration Patterns
Pattern 1: Add Column
-- Sequential (clean state) ALTER TABLE users ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP; -- Reconciliation (unknown state) SET @col_exists = ( SELECT COUNT(*) FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND COLUMN_NAME = 'created_at' ); SET @sql = IF(@col_exists = 0, 'ALTER TABLE users ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP', 'SELECT "Column created_at already exists"' ); PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
Pattern 2: Add Index
-- Sequential CREATE INDEX idx_email ON users(email); -- Reconciliation SET @idx_exists = ( SELECT COUNT(*) FROM information_schema.STATISTICS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND INDEX_NAME = 'idx_email' ); SET @sql = IF(@idx_exists = 0, 'CREATE INDEX idx_email ON users(email)', 'SELECT "Index idx_email already exists"' ); PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
Pattern 3: Modify Column Type
-- Sequential ALTER TABLE users MODIFY COLUMN name VARCHAR(200); -- Reconciliation -- Note: MODIFY COLUMN changes type but not existing data! SET @current_type = ( SELECT COLUMN_TYPE FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'users' AND COLUMN_NAME = 'name' ); -- Only modify if type is different SET @sql = IF(@current_type != 'varchar(200)', 'ALTER TABLE users MODIFY COLUMN name VARCHAR(200)', 'SELECT "Column name already VARCHAR(200)"' ); PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt; -- CRITICAL: Verify existing data fits new type SELECT name FROM users WHERE LENGTH(name) > 200;
See RECONCILIATION-MIGRATIONS.md for more patterns.
Migration Review Checklist
Before merging migration PR, verify:
Correctness
- Migration tested locally (via Aurora tunnel)
- Schema verified with
DESCRIBE table_name - Indexes verified with
SHOW INDEX FROM table_name - Migration is idempotent (can run multiple times safely)
Safety
- No
without backup strategyDROP TABLE - No
that truncates dataALTER TABLE MODIFY COLUMN - No
withoutDELETE
clauseWHERE - Large tables: Migration uses
(no table lock)ALGORITHM=INPLACE
Documentation
- Migration file has descriptive comment
- Breaking changes documented in PR description
- Migration history updated (if complex change)
MySQL-Specific
- ENUM values validated (see MYSQL-GOTCHAS.md)
- Foreign key constraints checked
- Character set/collation compatible
- Nullable vs NOT NULL explicit
Testing Migrations
Unit Test Pattern
import pytest import pymysql class TestUserEmailMigration: """Test 004_add_email_to_users.sql migration""" def setup_method(self): """Create clean test database""" self.conn = pymysql.connect( host='localhost', port=3307, user='admin', password='...', database='test_db' ) # Run prerequisite migrations self._run_migration('001_create_users.sql') def test_migration_adds_email_column(self): """Verify migration adds email column""" # Run migration self._run_migration('004_add_email_to_users.sql') # Verify column exists with self.conn.cursor() as cursor: cursor.execute("DESCRIBE users") columns = {row[0]: row for row in cursor.fetchall()} assert 'email' in columns assert columns['email'][1] == 'varchar(255)' # Type def test_migration_is_idempotent(self): """Verify migration can run multiple times""" # Run twice self._run_migration('004_add_email_to_users.sql') self._run_migration('004_add_email_to_users.sql') # Should not error # Verify column exists (not duplicated) with self.conn.cursor() as cursor: cursor.execute("DESCRIBE users") columns = [row[0] for row in cursor.fetchall()] assert columns.count('email') == 1 def _run_migration(self, filename): """Helper: Run migration file""" with open(f'migrations/{filename}') as f: sql = f.read() with self.conn.cursor() as cursor: for statement in sql.split(';'): if statement.strip(): cursor.execute(statement) self.conn.commit()
Integration Test (Production-like)
@pytest.mark.integration def test_reconcile_user_email_migration_on_dirty_db(): """Test reconciliation migration handles partial state""" # Setup: Create users table WITHOUT email (partial migration) conn.cursor().execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100))") # Run reconciliation migration run_migration('RECONCILE_user_email.sql') # Verify: email column added cursor.execute("DESCRIBE users") columns = {row[0] for row in cursor.fetchall()} assert 'email' in columns # Run again (should not error) run_migration('RECONCILE_user_email.sql') # Verify: still works cursor.execute("DESCRIBE users") columns = {row[0] for row in cursor.fetchall()} assert 'email' in columns
Quick Reference
When to Use Sequential vs Reconciliation
| Scenario | Migration Type | Example |
|---|---|---|
| New feature (clean dev) | Sequential | |
| Production deployment | Reconciliation | |
| Schema drift | Reconciliation | |
| Failed migration | Reconciliation | |
| Multi-environment sync | Reconciliation | |
Migration Safety Levels
| Operation | Risk | Mitigation |
|---|---|---|
| ADD COLUMN | Low | Use DEFAULT for NOT NULL columns |
| DROP COLUMN | High | Verify column unused first |
| MODIFY COLUMN | Medium | Check existing data compatibility |
| ADD INDEX | Low | Use for large tables |
| DROP INDEX | Medium | Verify queries don't need it |
| ADD FK | Medium | Verify referential integrity first |
Troubleshooting
Error: "Duplicate column name 'email'"
Cause: Column already exists (migration ran before or schema drift).
Solution: Use reconciliation migration with conditional logic.
Error: "Data truncated for column 'status'"
Cause: Existing data doesn't fit new ENUM values or column type.
Solution:
-- Check existing data first SELECT DISTINCT status FROM users; -- If incompatible, migrate data before changing type UPDATE users SET status = 'active' WHERE status IS NULL; -- Then change type ALTER TABLE users MODIFY COLUMN status ENUM('active', 'inactive') NOT NULL;
Migration Ran But Schema Unchanged
Cause:
CREATE TABLE IF NOT EXISTS skipped because table exists with different schema.
Solution: Use
ALTER TABLE for existing tables, not CREATE TABLE IF NOT EXISTS.
See MYSQL-GOTCHAS.md for comprehensive troubleshooting.
File Organization
.claude/skills/database-migration/ ├── SKILL.md # This file (entry point) ├── RECONCILIATION-MIGRATIONS.md # Idempotent migration patterns ├── MYSQL-GOTCHAS.md # MySQL-specific issues └── scripts/ └── verify_schema.py # Schema verification tool
Next Steps
- For reconciliation patterns: See RECONCILIATION-MIGRATIONS.md
- For MySQL gotchas: See MYSQL-GOTCHAS.md
- For schema verification: Run
scripts/verify_schema.py