Post-Migration Validation

The migration script finished. No errors in the log. The DBA says "looks clean." But "no errors" is not the same as "correct." Silent data loss is the biggest migration risk. Your job now is to prove that every record made it across, every value is accurate, and every relationship is intact.

Record Count Matching

The first check. The simplest check. And the one that catches the most bugs. Compare row counts between source and target for every single table.

record-count-match.sqlsql

-- Source counts (run on source database)
SELECT 'customers' AS tbl, COUNT(*) AS src_count FROM source_db.customers
UNION ALL SELECT 'accounts', COUNT(*) FROM source_db.accounts
UNION ALL SELECT 'transactions', COUNT(*) FROM source_db.transactions;

-- Target counts (run on target database)
SELECT 'customers' AS tbl, COUNT(*) AS tgt_count FROM target_db.customers
UNION ALL SELECT 'accounts', COUNT(*) FROM target_db.accounts
UNION ALL SELECT 'transactions', COUNT(*) FROM target_db.transactions;

-- Combined comparison (if both databases are accessible)
SELECT
  'customers' AS table_name,
  (SELECT COUNT(*) FROM source_db.customers) AS source_count,
  (SELECT COUNT(*) FROM target_db.customers) AS target_count,
  (SELECT COUNT(*) FROM source_db.customers) -
  (SELECT COUNT(*) FROM target_db.customers) AS difference;

Equal record counts do NOT mean the data is correct. You could have 1000 records in source and 1000 in target, but 50 might be different records. Count matching is necessary but not sufficient. You must also compare actual data.

Finding Missing Records

find-missing-records.sqlsql

-- Records in source but NOT in target (lost during migration)
SELECT s.customer_id, s.name, s.email
FROM source_db.customers s
LEFT JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE t.customer_id IS NULL;

-- Records in target but NOT in source (unexpected extras)
SELECT t.customer_id, t.first_name, t.last_name, t.email
FROM target_db.customers t
LEFT JOIN source_db.customers s ON t.customer_id = s.customer_id
WHERE s.customer_id IS NULL;

-- Count the gaps
SELECT
  (SELECT COUNT(*) FROM source_db.customers s
   LEFT JOIN target_db.customers t ON s.customer_id = t.customer_id
   WHERE t.customer_id IS NULL) AS missing_in_target,
  (SELECT COUNT(*) FROM target_db.customers t
   LEFT JOIN source_db.customers s ON t.customer_id = s.customer_id
   WHERE s.customer_id IS NULL) AS extra_in_target;

Data Accuracy — Field-by-Field Comparison

Records exist in both source and target. But do they contain the same data? Maybe names were truncated. Maybe dates shifted. Maybe decimal precision changed. Field-by-field comparison catches these problems.

field-comparison.sqlsql

-- Find records where data differs
SELECT
  s.customer_id,
  CASE WHEN s.email != t.email
    THEN CONCAT('email: ', s.email, ' -> ', t.email) END AS email_diff,
  CASE WHEN s.phone != t.phone
    THEN CONCAT('phone: ', s.phone, ' -> ', t.phone) END AS phone_diff,
  CASE WHEN s.name != CONCAT(t.first_name, ' ', t.last_name)
    THEN CONCAT('name: ', s.name, ' -> ', t.first_name, ' ', t.last_name)
    END AS name_diff
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE s.email != t.email
   OR s.phone != t.phone
   OR s.name != CONCAT(t.first_name, ' ', t.last_name);

-- Checksum comparison to quickly detect any differences
SELECT
  'source' AS db,
  SUM(CRC32(CONCAT_WS('|', customer_id, email, phone))) AS checksum
FROM source_db.customers
UNION ALL
SELECT
  'target',
  SUM(CRC32(CONCAT_WS('|', customer_id, email, phone))) AS checksum
FROM target_db.customers;

Mapping Validation — Old Schema to New Schema

When the schema changes during migration, you need to verify that data landed in the correct columns. The mapping document says "source.name goes to target.first_name and target.last_name." Your SQL proves whether that actually happened.

mapping-validation.sqlsql

-- Old schema: users(id, name, email, account_number, balance)
-- New schema: users(id, first_name, last_name, email)
--             accounts(id, user_id, account_number, balance, type)

-- Verify name split worked correctly
SELECT
  s.id,
  s.name AS source_name,
  CONCAT(t.first_name, ' ', t.last_name) AS target_name,
  CASE WHEN s.name = CONCAT(t.first_name, ' ', t.last_name)
    THEN 'MATCH' ELSE 'MISMATCH' END AS status
FROM source_db.users s
JOIN target_db.users t ON s.id = t.id
WHERE s.name != CONCAT(t.first_name, ' ', t.last_name)
LIMIT 20;

-- Verify account data moved to new accounts table
SELECT
  s.id AS user_id,
  s.account_number AS source_acct,
  a.account_number AS target_acct,
  s.balance AS source_balance,
  a.balance AS target_balance,
  CASE WHEN s.balance = a.balance THEN 'MATCH' ELSE 'MISMATCH' END AS balance_status
FROM source_db.users s
JOIN target_db.accounts a ON s.id = a.user_id
WHERE s.account_number != a.account_number
   OR s.balance != a.balance;

-- Check: every source user has exactly one account in target
SELECT user_id, COUNT(*) AS account_count
FROM target_db.accounts
GROUP BY user_id
HAVING COUNT(*) != 1;

When the schema splits one table into multiple, always verify both directions: every source record has a corresponding target record, AND every target record traces back to a valid source record. One-way checks miss orphaned data.

Q: How do you validate data after a migration?

A: I follow a systematic approach: (1) Record count matching — compare exact row counts between source and target for every table. (2) Missing record detection — LEFT JOIN source to target and look for NULLs to find lost records. (3) Field-by-field comparison — JOIN matching records and compare each column value, especially for columns that underwent transformation. (4) Checksum comparison — compare aggregate checksums to quickly detect any data changes. (5) Mapping validation — verify data landed in the correct new columns per the mapping document. (6) Both-direction checks — confirm every source record has a target AND every target traces back to a valid source. Equal counts alone are not sufficient because you could have the same number of records but different actual data.

Key Point: Post-migration validation requires record count matching, missing record detection, field-by-field comparison, and mapping validation. Equal counts do not mean correct data — always verify actual values.

Previous Up NextData Transformation Testing

Chapter 7: Data Migration Testing