git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/data-migration" ~/.claude/skills/comeonoliver-skillshub-data-migration && rm -rf "$T"
skills/TerminalSkills/skills/data-migration/SKILL.mdData Migration
Overview
Builds automated data migration pipelines between databases. Handles schema analysis and mapping, type conversions, data transformations, dependency-ordered table loading, batch processing for large datasets, checkpoint/resume for reliability, post-migration validation, and cutover planning. Produces repeatable scripts that can be dry-run against staging before production.
Instructions
1. Schema Analysis
Start every migration by analyzing source and target:
For each table in source: - Column names, types, nullability, defaults - Primary keys and auto-increment sequences - Foreign key relationships (build dependency graph) - Indexes and unique constraints - Row count estimate (for batch sizing) - Encoding/collation (especially for MySQL → PostgreSQL)
Generate a schema map document listing every column with its source type, target type, and any transformation needed.
2. Type Mapping
Common cross-database type conversions:
| MySQL | PostgreSQL | Notes |
|---|---|---|
| TINYINT(1) | BOOLEAN | Map 0/1 to false/true |
| ENUM('a','b') | VARCHAR + CHECK | Or create custom TYPE |
| DATETIME | TIMESTAMPTZ | Add timezone info |
| INT AUTO_INCREMENT | SERIAL | Reset sequence after migration |
| DOUBLE | DOUBLE PRECISION | Direct mapping |
| BLOB | BYTEA | Binary data |
| TEXT (latin1) | TEXT (UTF-8) | Re-encode characters |
| JSON | JSONB | Use binary JSON in PG |
3. Dependency Resolution
Build a directed acyclic graph from foreign keys:
1. Parse all FK constraints → build adjacency list 2. Topological sort → migration order 3. Circular dependencies: temporarily drop FK, migrate, re-add FK 4. Self-referencing tables: migrate in two passes (data, then self-FK updates)
4. Batch Processing
For tables with more than 10,000 rows:
function migrateLargeTable(table, batchSize = 5000): lastId = loadCheckpoint(table) or 0 while true: rows = SELECT * FROM source.table WHERE id > lastId ORDER BY id LIMIT batchSize if rows.empty: break transformed = rows.map(row => transform(row, table.mapping)) INSERT INTO target.table VALUES transformed lastId = rows.last.id saveCheckpoint(table, lastId, totalMigrated)
Performance targets:
- 5,000 rows/batch for most tables
- 1,000 rows/batch for tables with BLOB/TEXT columns
- Disable target indexes during bulk load, rebuild after
5. Validation
Post-migration validation checklist:
1. Row counts: source vs target for every table 2. Random sampling: 100 random rows per table, field-by-field comparison 3. Aggregate checks: SUM, COUNT, MIN, MAX on numeric columns 4. Referential integrity: all FKs resolve (no orphans) 5. Encoding: sample text fields for valid UTF-8 6. Sequences: verify auto-increment/serial values set above max ID 7. Nullability: no unexpected NULLs in NOT NULL target columns
6. Cutover Planning
Three strategies by downtime tolerance:
Full downtime (simplest): Stop app → migrate → validate → start app. For small datasets (< 1M rows, < 1 hour).
Minimal downtime (recommended): Pre-migrate bulk data → set up change capture → maintenance mode → apply delta → switch → validate. Downtime: 2-10 minutes.
Zero downtime (complex): Dual-write to both databases → background migration → gradual read traffic shift → drop old writes. Requires application changes.
Examples
Example 1: MySQL to PostgreSQL
Prompt: "Migrate our MySQL 5.7 database to PostgreSQL 16. 30 tables, biggest is 5M rows."
Output: Schema mapping JSON, type conversion DDL, migration script with dependency ordering, batch processing for large tables, checkpoint file, validation suite, and cutover runbook.
Example 2: Database Consolidation
Prompt: "Merge two SQLite databases into one PostgreSQL. Some tables overlap with different schemas."
Output: Schema diff report, merge strategy document (which columns win conflicts), deduplication logic using configurable match keys, migration script, and conflict resolution log.
Guidelines
- Always dry-run on staging first — never run migration directly against production
- Keep source untouched — migration should be read-only on source until cutover
- Checkpoint everything — large migrations will fail; resumability is required
- Validate before cutover — automated validation catches what manual spot-checks miss
- Plan rollback — if target validation fails, have a documented path back to source
- Log extensively — rows processed, rows skipped, transformation errors, timing
- Reset sequences — after migration, set serial/auto-increment above max migrated ID
- Test with production volume — a script that works on 1000 rows may OOM on 5M