Data validation questions are favorites in banking, fintech, and e-commerce interviews. Interviewers want to know if you can go beyond UI testing and actually verify the data. Here are the most commonly asked questions. Practice answering them out loud — not just reading them.
Q: What is data validation in database testing?
A: Data validation is verifying that data in the database is accurate, complete, consistent, and follows all business rules. It operates at four levels: field-level (data type, length, format, NULL checks), record-level (cross-field rules like end_date > start_date), table-level (duplicates, orphans, referential integrity), and cross-system (source vs target comparison after migrations). I use SQL queries to detect violations at each level.
Q: How do you validate data after a migration?
A: I follow a structured four-step approach. First, count comparison — source and target should have identical row counts. Second, I find missing records using LEFT JOIN from source to target. Third, I find extra records using LEFT JOIN from target to source. Fourth, I compare individual field values for matching records to catch truncation, encoding, or transformation errors. For decimal fields, I use tolerance-based comparison with ABS() to handle floating-point precision. I document every discrepancy with the SQL query, expected value, and actual value.
Q: How do you detect duplicate records?
A: For exact duplicates, I use GROUP BY on the column that should be unique with HAVING COUNT(*) > 1. For fuzzy duplicates — same data in different formats — I normalize using LOWER(), TRIM(), and REGEXP_REPLACE before grouping. I also test the application for double-submit behavior by rapidly clicking Submit and checking if duplicate records appear. To prevent duplicates, I verify that UNIQUE constraints exist on the appropriate columns.
Q: What is referential integrity and how do you test it?
A: Referential integrity means every child record has a valid parent. I test it three ways: First, I query for orphan records using LEFT JOIN between child and parent tables, filtering WHERE parent.pk IS NULL. Second, I verify FK constraints exist using information_schema. Third, I test cascade behavior by deleting a parent and checking if children are handled correctly — CASCADE deletes them, SET NULL clears the FK, RESTRICT blocks the delete. I also test negative cases by trying to insert a child with a non-existent parent ID.
Q: Give an example of a data validation bug you caught that was not visible in the UI.
A: In a banking application, the UI showed "Transfer Successful" after every transaction. But when I ran a reconciliation query — comparing each account balance with the sum of all its transactions — I found 23 accounts where the stored balance did not match the calculated balance. The difference ranged from Rs 0.50 to Rs 2,300. The root cause was a race condition: when two transfers happened simultaneously on the same account, one update overwrote the other. The UI showed both as successful, but only one balance change persisted. I caught this purely through a SQL validation query.
Q: How do you automate database validation?
A: I use JDBC with Java in my TestNG framework. I create a utility class with methods for database connection, query execution, and result extraction. Each validation rule becomes a test method that runs a SQL query and asserts the result — for example, assertEquals(orphanCount, 0). I use PreparedStatement for parameterized queries to avoid SQL injection. The validation suite runs in our CI/CD pipeline after every deployment. If any test fails, the team gets an alert before the build is promoted.
Q: What is the difference between data validation and data verification?
A: Data validation checks if the data follows defined rules — correct format, within range, no NULLs where not allowed, business rules satisfied. Data verification checks if the data matches the source — after migration, the target data should be identical to the source. Validation asks "Is this data valid?" Verification asks "Is this data the same as what we started with?" Both are needed: a migration could produce valid data that is different from the source, or correct data that violates rules. I test both.
In interviews, always give concrete SQL examples. Do not just say "I check for duplicates." Say "I use SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1." Specificity shows you actually do this work, not just talk about it.
Key Point: For interviews: know the four validation levels (field, record, table, cross-system), have SQL examples ready for each, and be able to describe your JDBC automation approach step by step.
Key Point: Data validation interview answers should include specific SQL patterns, mention all four validation levels, and describe your automation approach with JDBC.
Answer all 5 questions, then submit to see your score.
1. A user submits an order via the UI and sees "Order Placed Successfully." But in the database, the order total is Rs 1,485 while the sum of line items is Rs 1,500. Which validation technique would catch this?
2. You find 15 orders in the orders table that reference a user_id which does not exist in the users table. What is the most likely root cause?
3. Which SQL pattern correctly detects fuzzy duplicate emails like "Ravi@Email.COM" and "ravi@email.com"?
4. After a data migration from old_users to users, you find that old_users has 10,000 rows but users has 9,847 rows. What is the FIRST query you should run?
5. When comparing monetary values stored as DECIMAL in the database, why should you use ABS(value1 - value2) > 0.01 instead of value1 != value2?