Skills data-migration

install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-migration" ~/.claude/skills/terminalskills-skills-data-migration && rm -rf "$T"
manifest: skills/data-migration/SKILL.md
source content

Data Migration

Overview

Builds automated data migration pipelines between databases. Handles schema analysis and mapping, type conversions, data transformations, dependency-ordered table loading, batch processing for large datasets, checkpoint/resume for reliability, post-migration validation, and cutover planning. Produces repeatable scripts that can be dry-run against staging before production.

Instructions

1. Schema Analysis

Start every migration by analyzing source and target:

For each table in source:
  - Column names, types, nullability, defaults
  - Primary keys and auto-increment sequences
  - Foreign key relationships (build dependency graph)
  - Indexes and unique constraints
  - Row count estimate (for batch sizing)
  - Encoding/collation (especially for MySQL → PostgreSQL)

Generate a schema map document listing every column with its source type, target type, and any transformation needed.
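
A minimal sketch of this step in Python, assuming a PEP 249 (DB-API) connection to a MySQL source; the queries use MySQL's information_schema, and the helper names are illustrative:

COLUMNS_SQL = """
SELECT table_name, column_name, column_type, is_nullable, column_default, extra
FROM information_schema.columns
WHERE table_schema = %s
ORDER BY table_name, ordinal_position
"""

FOREIGN_KEYS_SQL = """
SELECT table_name, column_name, referenced_table_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE table_schema = %s AND referenced_table_name IS NOT NULL
"""

def analyze_schema(conn, schema):
    cur = conn.cursor()
    cur.execute(COLUMNS_SQL, (schema,))
    columns = cur.fetchall()
    cur.execute(FOREIGN_KEYS_SQL, (schema,))
    fks = cur.fetchall()
    # deps: table -> set of tables it references (drives migration ordering)
    deps = {}
    for row in columns:
        deps.setdefault(row[0], set())  # tables with no FKs still need an entry
    for table, _col, ref_table, _ref_col in fks:
        deps.setdefault(table, set()).add(ref_table)
    return columns, deps

The deps map feeds directly into step 3 (dependency resolution).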

2. Type Mapping

Common cross-database type conversions:

MySQL                  PostgreSQL           Notes
TINYINT(1)             BOOLEAN              Map 0/1 to false/true
ENUM('a','b')          VARCHAR + CHECK      Or create custom TYPE
DATETIME               TIMESTAMPTZ          Add timezone info
INT AUTO_INCREMENT     SERIAL               Reset sequence after migration
DOUBLE                 DOUBLE PRECISION     Direct mapping
BLOB                   BYTEA                Binary data
TEXT (latin1)          TEXT (UTF-8)         Re-encode characters
JSON                   JSONB                Use binary JSON in PG
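
The table above can be encoded as data so DDL generation and row transformation read from one place. A sketch in Python; the helper name and the VARCHAR fallback length are illustrative, and column_type/extra come from the schema analysis above:

def map_type(column_type, extra=""):
    t = column_type.lower()
    if t == "tinyint(1)":
        return "boolean"
    if t.startswith("enum("):
        return "varchar(255)"            # plus a CHECK constraint, or a custom TYPE
    if "auto_increment" in extra.lower():
        return "bigserial" if t.startswith("bigint") else "serial"
    if t == "datetime":
        return "timestamptz"
    if t == "double":
        return "double precision"
    if t == "blob":
        return "bytea"
    if t == "json":
        return "jsonb"
    return t                             # unmapped: carry over and flag for review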

3. Dependency Resolution

Build a directed acyclic graph from foreign keys:

1. Parse all FK constraints → build adjacency list
2. Topological sort → migration order
3. Circular dependencies: temporarily drop FK, migrate, re-add FK
4. Self-referencing tables: migrate in two passes (data, then self-FK updates)
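
Steps 1-3 map directly onto Python's standard-library graphlib (3.9+). A sketch, reusing the deps map built during schema analysis:

from graphlib import TopologicalSorter, CycleError

def migration_order(deps):
    # Drop self-references; they are handled by the two-pass load, not the ordering
    graph = {t: {r for r in refs if r != t} for t, refs in deps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # Circular FK chain: drop one constraint, migrate, re-add it afterwards
        raise RuntimeError(f"FK cycle detected: {err.args[1]}") from err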

4. Batch Processing

For tables with more than 10,000 rows:

def migrate_large_table(table, batch_size=5000):
    # Resume from the last committed batch, or start from the beginning
    last_id = load_checkpoint(table) or 0
    total_migrated = 0
    while True:
        rows = fetch("SELECT * FROM " + table.source +
                     " WHERE id > %s ORDER BY id LIMIT %s", (last_id, batch_size))
        if not rows:
            break
        # Apply per-column type conversions and value transformations
        transformed = [transform(row, table.mapping) for row in rows]
        bulk_insert(table.target, transformed)
        last_id = rows[-1]["id"]
        total_migrated += len(rows)
        # Persist progress so an interrupted run can resume from this batch
        save_checkpoint(table, last_id, total_migrated)
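
The load_checkpoint/save_checkpoint calls above can be backed by a small JSON file. A sketch; the file name and layout are illustrative, and table is assumed to carry its schema-map entry (name, source, target, mapping):

import json
import os
import tempfile

CHECKPOINT_FILE = "migration_checkpoint.json"

def load_checkpoint(table):
    if not os.path.exists(CHECKPOINT_FILE):
        return None
    with open(CHECKPOINT_FILE) as f:
        return json.load(f).get(table.name, {}).get("last_id")

def save_checkpoint(table, last_id, total_migrated):
    state = {}
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            state = json.load(f)
    state[table.name] = {"last_id": last_id, "total_migrated": total_migrated}
    # Write to a temp file and rename so a crash mid-write never corrupts the checkpoint
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)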

Performance targets:

  • 5,000 rows/batch for most tables
  • 1,000 rows/batch for tables with BLOB/TEXT columns
  • Disable target indexes during bulk load, rebuild after
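
One way to implement the last point for a PostgreSQL target, sketched with a psycopg2-style cursor: save the CREATE INDEX statements for secondary indexes, drop them before the bulk load, and replay them afterwards. Primary-key and unique-constraint indexes are left alone; the helper names are illustrative:

def capture_and_drop_indexes(conn, table):
    cur = conn.cursor()
    cur.execute(
        """
        SELECT indexname, indexdef
        FROM pg_indexes
        WHERE tablename = %s
          AND indexname NOT IN (
              SELECT conname FROM pg_constraint WHERE conrelid = %s::regclass
          )
        """,
        (table, table),
    )
    saved = cur.fetchall()
    for name, _ in saved:
        cur.execute(f'DROP INDEX "{name}"')
    return saved  # hand back to rebuild_indexes after the bulk load

def rebuild_indexes(conn, saved):
    cur = conn.cursor()
    for _, indexdef in saved:
        cur.execute(indexdef)  # indexdef is a complete CREATE INDEX statement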

5. Validation

Post-migration validation checklist:

1. Row counts: source vs target for every table
2. Random sampling: 100 random rows per table, field-by-field comparison
3. Aggregate checks: SUM, COUNT, MIN, MAX on numeric columns
4. Referential integrity: all FKs resolve (no orphans)
5. Encoding: sample text fields for valid UTF-8
6. Sequences: verify auto-increment/serial values set above max ID
7. Nullability: no unexpected NULLs in NOT NULL target columns
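
A sketch of checks 1 and 3, assuming DB-API connections to both sides; table and column lists come from the schema map. Exact equality works for integer columns, floating-point aggregates may need a tolerance:

def validate_row_counts(src, dst, tables):
    # Check 1: source vs target row counts for every table
    mismatches = []
    for t in tables:
        a, b = src.cursor(), dst.cursor()
        a.execute(f"SELECT COUNT(*) FROM {t}")
        b.execute(f"SELECT COUNT(*) FROM {t}")
        src_n, dst_n = a.fetchone()[0], b.fetchone()[0]
        if src_n != dst_n:
            mismatches.append((t, src_n, dst_n))
    return mismatches

def aggregates_match(src, dst, table, numeric_columns):
    # Check 3: SUM/MIN/MAX on numeric columns should agree on both sides
    exprs = ", ".join(f"SUM({c}), MIN({c}), MAX({c})" for c in numeric_columns)
    a, b = src.cursor(), dst.cursor()
    a.execute(f"SELECT {exprs} FROM {table}")
    b.execute(f"SELECT {exprs} FROM {table}")
    return a.fetchone() == b.fetchone()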

6. Cutover Planning

Three strategies by downtime tolerance:

Full downtime (simplest): Stop app → migrate → validate → start app. For small datasets (< 1M rows, < 1 hour).

Minimal downtime (recommended): Pre-migrate bulk data → set up change capture → maintenance mode → apply delta → switch → validate. Downtime: 2-10 minutes.

Zero downtime (complex): Dual-write to both databases → background migration → gradual read traffic shift → drop old writes. Requires application changes.
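
One way to implement the "apply delta" step of the minimal-downtime strategy, assuming change capture is timestamp-based (every table carries an updated_at column); binlog- or trigger-based capture are alternatives. transform is the bulk-phase helper, while upsert is illustrative:

def apply_delta(src, dst, table, watermark):
    # Re-copy rows changed since the bulk phase, then upsert by primary key
    cur = src.cursor()
    cur.execute(f"SELECT * FROM {table.source} WHERE updated_at > %s", (watermark,))
    changed = cur.fetchall()
    for row in changed:
        upsert(dst, table.target, transform(row, table.mapping))
    return len(changed)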

Examples

Example 1: MySQL to PostgreSQL

Prompt: "Migrate our MySQL 5.7 database to PostgreSQL 16. 30 tables, biggest is 5M rows."

Output: Schema mapping JSON, type conversion DDL, migration script with dependency ordering, batch processing for large tables, checkpoint file, validation suite, and cutover runbook.

Example 2: Database Consolidation

Prompt: "Merge two SQLite databases into one PostgreSQL. Some tables overlap with different schemas."

Output: Schema diff report, merge strategy document (which columns win conflicts), deduplication logic using configurable match keys, migration script, and conflict resolution log.

Guidelines

  • Always dry-run on staging first — never run migration directly against production
  • Keep source untouched — migration should be read-only on source until cutover
  • Checkpoint everything — large migrations will fail; resumability is required
  • Validate before cutover — automated validation catches what manual spot-checks miss
  • Plan rollback — if target validation fails, have a documented path back to source
  • Log extensively — rows processed, rows skipped, transformation errors, timing
  • Reset sequences — after migration, set serial/auto-increment above max migrated ID
  • Test with production volume — a script that works on 1000 rows may OOM on 5M
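
For a PostgreSQL target, the sequence-reset guideline comes down to one statement per serial column. A sketch; the serial_columns list of (table, column) pairs comes from the schema map:

def reset_sequences(conn, serial_columns):
    cur = conn.cursor()
    for table, column in serial_columns:
        cur.execute(
            f"SELECT setval(pg_get_serial_sequence(%s, %s), "
            f"COALESCE((SELECT MAX({column}) FROM {table}), 1))",
            (table, column),
        )
    conn.commit()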