Java Selenium TestNG CI/CD API Testing Database Playwright Performance

Chapter 7: Data Migration Testing

Pre-Migration Testing

Think of pre-migration testing as taking an inventory before the movers arrive. You count every plate, photograph every painting, and note every scratch on the furniture. If something goes wrong during the move, this inventory is your proof.

Source Data Profiling

Data profiling means understanding what your data actually looks like — not what it should look like. Production data is messy. It has NULLs where you do not expect them. It has duplicates. It has values that violate business rules but somehow got in anyway. You need to know all of this before migration.

data-profiling.sqlsql

-- Record counts for all tables
SELECT 'customers' AS table_name, COUNT(*) AS row_count FROM customers
UNION ALL
SELECT 'accounts', COUNT(*) FROM accounts
UNION ALL
SELECT 'transactions', COUNT(*) FROM transactions
UNION ALL
SELECT 'addresses', COUNT(*) FROM addresses;

-- Column-level profiling for customers table
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT email) AS unique_emails,
  COUNT(*) - COUNT(email) AS null_emails,
  COUNT(*) - COUNT(phone) AS null_phones,
  MIN(created_at) AS earliest_record,
  MAX(created_at) AS latest_record,
  MIN(LENGTH(name)) AS shortest_name,
  MAX(LENGTH(name)) AS longest_name
FROM customers;

-- Value distribution — catch unexpected data
SELECT status, COUNT(*) AS cnt
FROM accounts
GROUP BY status
ORDER BY cnt DESC;

-- Find duplicates that might cause issues
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

Record Count Baseline

The simplest and most powerful test. Count every row in every table. Write it down. After migration, count again. If the numbers do not match, you have a problem.

record-count-baseline.sqlsql

-- Generate counts for ALL tables in the database
SELECT
  TABLE_NAME,
  TABLE_ROWS AS estimated_count
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'banking_db'
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME;

-- IMPORTANT: TABLE_ROWS is an estimate in InnoDB
-- For exact counts, use this pattern:
SELECT 'customers' AS tbl, COUNT(*) AS exact_count FROM customers
UNION ALL SELECT 'accounts', COUNT(*) FROM accounts
UNION ALL SELECT 'transactions', COUNT(*) FROM transactions
UNION ALL SELECT 'loan_applications', COUNT(*) FROM loan_applications
UNION ALL SELECT 'addresses', COUNT(*) FROM addresses
UNION ALL SELECT 'audit_log', COUNT(*) FROM audit_log;

INFORMATION_SCHEMA.TABLES gives ESTIMATED row counts for InnoDB tables. It can be off by 10-40%. Always use SELECT COUNT(*) for exact pre-migration baselines. The estimate is fine for quick checks but never for official migration sign-off.

Checksum Baseline

Record counts catch missing rows. Checksums catch changed data. If a name was silently truncated, or a decimal was rounded, the record count stays the same but the checksum will differ.

checksum-baseline.sqlsql

-- MySQL: Checksum using CRC32
SELECT
  SUM(CRC32(CONCAT_WS('|',
    customer_id, name, email, phone,
    COALESCE(address, ''),
    CAST(created_at AS CHAR)
  ))) AS table_checksum
FROM customers;

-- PostgreSQL: Checksum using MD5
SELECT
  MD5(STRING_AGG(
    customer_id || '|' || name || '|' || email,
    ',' ORDER BY customer_id
  )) AS table_checksum
FROM customers;

-- Per-record checksum (for finding which rows differ)
SELECT
  customer_id,
  CRC32(CONCAT_WS('|', customer_id, name, email, phone)) AS row_checksum
FROM customers
ORDER BY customer_id
LIMIT 100;

Always include COALESCE for nullable columns in your checksum. NULL concatenated with anything gives NULL in most databases. One NULL column will make the entire row checksum NULL, and you will miss real data differences.

Backup Verification

Everyone says "take a backup before migration." But how many teams actually verify that the backup works? I have seen migrations fail where the backup was corrupted, incomplete, or pointing to the wrong database. Test your backup before you need it.

Backup Verification Steps

Confirm backup was taken after the last data freeze

Restore the backup to a separate test database

Run record count queries on the restored database and compare with source

Run checksum queries on restored database and compare with source

Verify the application can connect to and function with the restored database

Document the restore procedure and time taken

Sample Data Snapshot

sample-snapshot.sqlsql

-- Take a snapshot of critical records (boundary cases)
-- Record with longest name
SELECT * FROM customers ORDER BY LENGTH(name) DESC LIMIT 5;

-- Record with special characters
SELECT * FROM customers WHERE name REGEXP '[^a-zA-Z0-9 ]' LIMIT 10;

-- Oldest and newest records
(SELECT * FROM customers ORDER BY created_at ASC LIMIT 5)
UNION ALL
(SELECT * FROM customers ORDER BY created_at DESC LIMIT 5);

-- Records with NULL values in critical fields
SELECT * FROM customers WHERE email IS NULL OR phone IS NULL LIMIT 10;

-- High-value accounts (most critical to verify)
SELECT * FROM accounts ORDER BY balance DESC LIMIT 10;

Q: What do you do before a data migration as a QA engineer?

A: Before migration, I establish a complete baseline: (1) Record counts for every table using exact SELECT COUNT(*), not estimated counts. (2) Data profiling — distinct values, NULL counts, min/max lengths, value distributions for critical columns. (3) Checksums on key tables using CRC32 or MD5 to detect any silent data changes later. (4) Sample snapshots of boundary records — longest names, special characters, oldest/newest records, NULL values, highest-value accounts. (5) Duplicate detection — identify existing duplicates so they are not confused with migration bugs. (6) Backup verification — restore the backup to a test DB and validate it. (7) Dry run — execute the full migration on a test environment and validate before touching production.

Key Point: Pre-migration testing establishes your baseline: exact record counts, checksums, data profiles, sample snapshots, and verified backups. Without this baseline, you cannot prove data loss after migration.

Previous Up NextPost-Migration Validation