Migration is not always a simple copy. Data gets transformed along the way. Columns split. Columns merge. Data types change. Default values fill in gaps. Encoding converts. Each transformation is a potential point of failure. Each one needs its own test.
| Transformation | Source | Target | Risk |
|---|---|---|---|
| Column split | name = "Rajesh Verma" | first_name = "Rajesh", last_name = "Verma" | Multi-word names, single names, titles (Dr., Mr.) |
| Column merge | street + city + state + pin | full_address = "MG Road, Pune, MH 411001" | Missing components, delimiter handling |
| Type conversion | VARCHAR "2024-01-15" | DATE 2024-01-15 | Invalid dates, format ambiguity (01/02/2024 — Jan 2 or Feb 1?) |
| Default fill | NULL status | status = "ACTIVE" | NULL might mean something different than ACTIVE |
| Encoding change | Latin1 "caf\u00e9" | UTF-8 "caf\u00e9" | Hindi/Marathi text corruption, emoji loss |
| Precision change | DECIMAL(10,4) = 99.9999 | DECIMAL(10,2) = 100.00 | Rounding errors in financial data |
-- Source: customers.name (full name in one column)
-- Target: customers.first_name, customers.last_name
-- Test 1: Normal two-word names
SELECT s.customer_id, s.name,
t.first_name, t.last_name,
CONCAT(t.first_name, ' ', t.last_name) AS reconstructed
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE s.name != CONCAT(t.first_name, ' ', t.last_name)
LIMIT 20;
-- Test 2: Names with more than two words
-- "Aamir Khan Shah" — where does last name start?
SELECT s.customer_id, s.name,
t.first_name, t.last_name
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE LENGTH(s.name) - LENGTH(REPLACE(s.name, ' ', '')) > 1;
-- Test 3: Single-word names (no space)
SELECT s.customer_id, s.name,
t.first_name, t.last_name
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE s.name NOT LIKE '% %';-- Source: date_of_birth stored as VARCHAR "15-01-1990"
-- Target: date_of_birth as DATE type
-- Test 1: Check for invalid dates that could not convert
SELECT customer_id, date_of_birth
FROM target_db.customers
WHERE date_of_birth IS NULL;
-- Cross-reference with source — were these NULL originally?
SELECT s.customer_id,
s.date_of_birth AS source_dob_varchar,
t.date_of_birth AS target_dob_date
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE s.date_of_birth IS NOT NULL
AND t.date_of_birth IS NULL;
-- Any rows here = conversion failures (data loss)
-- Test 2: Verify dates match
SELECT s.customer_id,
s.date_of_birth AS source_dob,
DATE_FORMAT(t.date_of_birth, '%d-%m-%Y') AS target_dob_formatted
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE s.date_of_birth != DATE_FORMAT(t.date_of_birth, '%d-%m-%Y')
LIMIT 20;-- Migration rule: NULL status → "ACTIVE"
-- But wait — were all NULLs supposed to become ACTIVE?
-- Count NULLs in source
SELECT COUNT(*) AS null_status_in_source
FROM source_db.accounts
WHERE status IS NULL;
-- Count new ACTIVE in target that were NULL in source
SELECT COUNT(*) AS null_to_active
FROM source_db.accounts s
JOIN target_db.accounts t ON s.account_id = t.account_id
WHERE s.status IS NULL AND t.status = 'ACTIVE';
-- These two numbers should match
-- If null_to_active < null_status_in_source, some NULLs got a different value-- Source: balance DECIMAL(12,4) — four decimal places
-- Target: balance DECIMAL(12,2) — two decimal places
-- Find records where rounding changed the value
SELECT s.account_id,
s.balance AS source_balance,
t.balance AS target_balance,
s.balance - t.balance AS rounding_diff
FROM source_db.accounts s
JOIN target_db.accounts t ON s.account_id = t.account_id
WHERE ROUND(s.balance, 2) != t.balance;
-- Total financial impact of rounding
SELECT
SUM(s.balance) AS source_total,
SUM(t.balance) AS target_total,
SUM(s.balance) - SUM(t.balance) AS total_rounding_loss
FROM source_db.accounts s
JOIN target_db.accounts t ON s.account_id = t.account_id;In financial systems, even a Rs 0.01 rounding difference per record can add up to lakhs across millions of records. Always calculate the total financial impact of precision changes. Auditors will ask for this number.
-- Check for encoding corruption (garbled text)
SELECT customer_id, name
FROM target_db.customers
WHERE name LIKE '%?%'
OR name LIKE '%\\ufffd%'
OR LENGTH(name) != CHAR_LENGTH(name) * 1; -- Multi-byte mismatch
-- Compare character lengths
SELECT s.customer_id,
s.name AS source_name,
t.first_name AS target_name,
LENGTH(s.name) AS source_bytes,
LENGTH(t.first_name) AS target_bytes
FROM source_db.customers s
JOIN target_db.customers t ON s.customer_id = t.customer_id
WHERE LENGTH(s.name) != LENGTH(t.first_name) + LENGTH(t.last_name) + 1;Q: What data transformations do you test during migration and how?
A: I test six common transformations: (1) Column splits — verify multi-word names, single names, and names with titles are handled correctly by reconstructing the original and comparing. (2) Type conversions — check for conversion failures (source NOT NULL but target IS NULL), and verify converted values match the original format. (3) Default value fills — count source NULLs and verify they all received the expected default, not a different value. (4) Precision changes — find records where rounding altered values and calculate total financial impact. (5) Encoding changes — check for garbled text, question marks, and byte-length mismatches that indicate encoding corruption. (6) Column merges — verify all components are present and delimiters are correct. Each transformation gets its own SQL validation query.
Key Point: Test every transformation independently: column splits, type conversions, default fills, precision changes, and encoding. Each one can silently corrupt data in ways that record counts will never catch.