Data migration testing is a favorite interview topic for QA positions in banking, insurance, and enterprise companies. Interviewers want to know if you can plan a migration test from scratch, write validation queries, and catch the subtle bugs that break production. Here are the most common questions with detailed answers.
Q: What is data migration testing and when is it needed?
A: Data migration testing validates that data moved between systems is complete, accurate, and maintains integrity. It is needed during: (1) database upgrades — schema changes like adding columns or changing data types, (2) platform migration — switching from Oracle to PostgreSQL or SQL Server to MySQL, (3) schema restructuring — normalizing a single table into multiple related tables, (4) cloud migration — moving from on-premise to AWS/Azure/GCP. As a QA, I verify record counts match, data values are preserved, relationships are intact, transformations are correct, and the application performs well on the new database.
Q: What is your approach to planning a migration test?
A: I follow three phases: Pre-migration — I baseline the source with exact record counts, checksums, data profiles (NULLs, distinct values, min/max), and sample snapshots of boundary records. I also verify the backup and run a dry run on a test environment. During migration — I monitor logs for errors, track progress, and watch for table locks and disk space issues. Post-migration — I run record count comparison (source vs target), check for missing records using LEFT JOINs, compare field values for accuracy, verify referential integrity (no orphan records), validate data transformations (column splits, type conversions), benchmark query performance, and test rollback. I sign off only when every check passes.
Q: How do you detect data loss during migration?
A: Multiple layers: (1) Record count comparison — count rows in source and target for every table. If counts differ, data was lost or duplicated. (2) Missing record detection — LEFT JOIN source to target on the primary key, WHERE target.pk IS NULL gives records that exist in source but not in target. (3) Checksum comparison — compute CRC32 or MD5 checksums on critical columns. Different checksums mean data changed even if counts match. (4) Financial totals — SUM(balance) on source vs target must match exactly for financial tables. (5) Sample comparison — select the same records in both systems and compare field by field. I always establish baselines before migration so I have concrete numbers to compare against.
Q: What is ETL and how do you test it?
A: ETL stands for Extract-Transform-Load. Extract pulls data from source systems into staging. Transform cleans, converts, and reshapes the data according to business rules. Load inserts the transformed data into the target. I test each phase: For Extract — compare source counts with staging counts to catch silent filters. For Transform — verify NULL handling, date conversions, name splitting, deduplication by checking staging output. For Load — compare staging counts with target counts, check for duplicates, and inspect the ETL error log for rejected records. Common bugs include WHERE clauses that silently exclude records, date format ambiguity, VARCHAR truncation, and duplicate loads from re-running jobs.
Q: What are the risks of a platform migration like Oracle to PostgreSQL?
A: Several critical risks: (1) Data type differences — Oracle DATE includes time, PostgreSQL DATE does not. Oracle VARCHAR2 vs standard VARCHAR. Oracle NUMBER has different precision than PostgreSQL NUMERIC. (2) SQL dialect differences — NVL() becomes COALESCE(), ROWNUM becomes LIMIT, SYSDATE becomes CURRENT_TIMESTAMP, sequences become SERIAL/IDENTITY. (3) Stored procedure rewrite — PL/SQL must be rewritten in PL/pgSQL, with different syntax and behavior. (4) Index and performance — query optimizer works differently, same indexes may not be optimal. (5) Character encoding — Oracle might use WE8MSWIN1252 while PostgreSQL uses UTF-8. (6) NULL handling — Oracle treats empty string as NULL, PostgreSQL does not. This alone can cause hundreds of bugs.
Q: How do you test rollback after a failed migration?
A: Rollback testing verifies that the system can return to its pre-migration state without data loss. My approach: (1) Document the pre-migration baseline — exact record counts, checksums, and sample data. (2) Execute the migration in a test environment. (3) Trigger a rollback — either manually or by intentionally causing a failure mid-migration. (4) After rollback, re-run all pre-migration baseline queries and compare — counts, checksums, and samples must match exactly. (5) Verify the application works normally on the rolled-back database. (6) Time the rollback — if rollback takes 4 hours but the maintenance window is 2 hours, the rollback plan is not viable. (7) Test rollback at different failure points — what if migration fails after 50% of tables are migrated?
Q: What do you check for in terms of data integrity after migration?
A: Three areas: (1) Referential integrity — LEFT JOIN every child table to its parent on the FK column. Any NULL result means an orphan record — a foreign key pointing to a non-existent parent. I check this for every relationship: orders-customers, transactions-accounts, order_items-products. (2) Constraint preservation — I query INFORMATION_SCHEMA to verify all foreign key constraints, NOT NULL constraints, unique constraints, and check constraints were recreated in the target. (3) Constraint enforcement — I verify FOREIGN_KEY_CHECKS is enabled (migration tools often disable it for performance). I also try inserting invalid data to confirm the constraints actually reject it. Missing constraints will not show problems immediately, but they will allow bad data in the future.
| Concept | Key Point to Remember |
|---|---|
| Pre-migration baseline | Record counts, checksums, data profiles, sample snapshots — your "before photo" |
| Record count matching | Necessary but NOT sufficient — equal counts do not mean correct data |
| Checksum comparison | Detects silent data changes that count matching misses |
| Missing record detection | LEFT JOIN source to target, WHERE target.pk IS NULL |
| Orphan records | Child records with FK pointing to non-existent parent |
| Data transformation testing | Test column splits, type conversions, default fills, precision changes independently |
| ETL testing | Test Extract, Transform, and Load phases separately. Always check error logs. |
| Index verification | Compare index counts and check EXPLAIN plans after migration |
| ANALYZE TABLE | Updates optimizer statistics — run this after every migration |
| Rollback testing | Verify you CAN actually roll back. Time it. Test at different failure points. |
Key Point: In interviews, always structure your answer in phases: pre-migration (baseline), migration (monitoring), post-migration (validation). Show that you think systematically, not just about individual queries.
Key Point: In interviews, structure your answers in phases and demonstrate systematic thinking. Show that you test completeness, accuracy, integrity, transformations, performance, and rollback — not just record counts.
Answer all 5 questions, then submit to see your score.
1. What is the FIRST thing you should do before a data migration begins?
2. Record counts match between source and target after migration. Does this guarantee the data is correct?
3. What is an orphan record in the context of data migration?
4. In an ETL pipeline, where would you check for silent data filtering that excludes valid records?
5. After migration, queries that took 10ms now take 2 seconds. What is the most likely cause?