Most real-world migrations do not use simple SQL scripts. They use ETL pipelines — Extract, Transform, Load. ETL is the industrial machinery of data movement. Think of it as a factory assembly line: raw materials come in, get processed, and finished products come out. Your job is to inspect every stage of that assembly line.
| Phase | What Happens | QA Focus |
|---|---|---|
| Extract | Read data from source — database tables, CSV files, APIs, Excel sheets | Was all data extracted? Were any records filtered out? Did connections fail silently? |
| Transform | Clean nulls, convert types, split columns, apply business rules, deduplicate | Did transformations apply correctly? Were edge cases handled? Did any data get corrupted? |
| Load | Write to target — insert, update, or upsert into destination tables | Did all records load? Were duplicates created? Did constraint violations get handled? |
-- Test: Was ALL data extracted from source?
-- Source record count
SELECT COUNT(*) AS source_total FROM source_db.customers;
-- Extracted record count (staging table)
SELECT COUNT(*) AS extracted_total FROM staging.customers_raw;
-- Compare — difference should be zero
SELECT
(SELECT COUNT(*) FROM source_db.customers) AS source,
(SELECT COUNT(*) FROM staging.customers_raw) AS extracted,
(SELECT COUNT(*) FROM source_db.customers) -
(SELECT COUNT(*) FROM staging.customers_raw) AS missing;
-- Check for filter side effects
-- Sometimes ETL jobs have WHERE clauses that exclude valid data
SELECT s.customer_id, s.status
FROM source_db.customers s
LEFT JOIN staging.customers_raw r ON s.customer_id = r.customer_id
WHERE r.customer_id IS NULL;-- Test: Were transformations applied correctly?
-- Check 1: NULL cleaning
SELECT COUNT(*) AS nulls_remaining
FROM staging.customers_transformed
WHERE email IS NULL; -- Should be 0 if rule says "replace NULL with empty string"
-- Check 2: Name splitting
SELECT customer_id, original_name, first_name, last_name
FROM staging.customers_transformed
WHERE CONCAT(first_name, ' ', last_name) != original_name
LIMIT 20;
-- Check 3: Date format conversion
SELECT customer_id, raw_dob, converted_dob
FROM staging.customers_transformed
WHERE converted_dob IS NULL AND raw_dob IS NOT NULL;
-- Rows here = conversion failures
-- Check 4: Deduplication
SELECT email, COUNT(*) AS cnt
FROM staging.customers_transformed
GROUP BY email
HAVING COUNT(*) > 1;
-- Should be 0 if dedup rule was applied-- Test: Did all transformed records load into target?
SELECT
(SELECT COUNT(*) FROM staging.customers_transformed) AS transformed,
(SELECT COUNT(*) FROM target_db.customers) AS loaded,
(SELECT COUNT(*) FROM staging.customers_transformed) -
(SELECT COUNT(*) FROM target_db.customers) AS load_failures;
-- Check for load errors (rejected records)
SELECT * FROM etl_error_log
WHERE job_name = 'customer_migration'
AND run_date = CURDATE()
ORDER BY error_time;
-- Check for duplicates created during load
SELECT customer_id, COUNT(*) AS cnt
FROM target_db.customers
GROUP BY customer_id
HAVING COUNT(*) > 1;| Bug | Phase | How to Detect |
|---|---|---|
| Silent filter — WHERE clause excludes valid records | Extract | Compare source count vs extracted count |
| NULL propagation — one NULL poisons the entire transformation | Transform | Check for unexpected NULLs in output columns |
| Date format ambiguity — 01/02/2024 parsed as Jan 2 vs Feb 1 | Transform | Compare sample dates manually |
| Truncation — VARCHAR(100) source loaded into VARCHAR(50) target | Load | Check MAX(LENGTH(column)) in source vs target column size |
| Duplicate load — ETL job ran twice without cleanup | Load | GROUP BY primary key HAVING COUNT > 1 |
| Encoding mismatch — Latin1 extracted into UTF-8 without conversion | Extract/Load | Check for garbled characters in text columns |
| Partial failure — half the records loaded before the job crashed | Load | Compare total loaded vs expected total |
Always check the ETL error log and rejected records table. Many ETL tools silently skip bad records and log them in a separate table. If you do not check that table, you will miss data loss. Ask the DBA: "Where do rejected records go?"
Q: What is ETL testing and what do you check in each phase?
A: ETL stands for Extract-Transform-Load. In the Extract phase, I verify that all source data was pulled without silent filtering — I compare source row counts with staging table counts and look for missing records. In the Transform phase, I check that business rules were applied correctly — NULL cleaning, name splitting, date conversions, deduplication — and look for transformation failures where output is NULL but input was not. In the Load phase, I verify all transformed records made it to the target without duplicates or errors — I check the error log and rejected records table. Common bugs include silent WHERE filters in extraction, date format ambiguity in transformation, and truncation or duplicate loads in the load phase.
Key Point: ETL has three phases: Extract (pull data), Transform (clean and convert), Load (insert into target). Test each phase independently. Always check the ETL error log — rejected records are silent data loss.