Q: What is data validation in database testing?

A: Data validation is verifying that data in the database is accurate, complete, consistent, and follows all business rules. It operates at four levels: field-level (data type, length, format, NULL checks), record-level (cross-field rules like end_date > start_date), table-level (duplicates, orphans, referential integrity), and cross-system (source vs target comparison after migrations). I use SQL queries to detect violations at each level.

Q: How do you validate data after a migration?

A: I follow a structured four-step approach. First, count comparison — source and target should have identical row counts. Second, I find missing records using LEFT JOIN from source to target. Third, I find extra records using LEFT JOIN from target to source. Fourth, I compare individual field values for matching records to catch truncation, encoding, or transformation errors. For decimal fields, I use tolerance-based comparison with ABS() to handle floating-point precision. I document every discrepancy with the SQL query, expected value, and actual value.

Q: How do you detect duplicate records?

A: For exact duplicates, I use GROUP BY on the column that should be unique with HAVING COUNT(*) > 1. For fuzzy duplicates — same data in different formats — I normalize using LOWER(), TRIM(), and REGEXP_REPLACE before grouping. I also test the application for double-submit behavior by rapidly clicking Submit and checking if duplicate records appear. To prevent duplicates, I verify that UNIQUE constraints exist on the appropriate columns.

Q: What is referential integrity and how do you test it?

A: Referential integrity means every child record has a valid parent. I test it three ways: First, I query for orphan records using LEFT JOIN between child and parent tables, filtering WHERE parent.pk IS NULL. Second, I verify FK constraints exist using information_schema. Third, I test cascade behavior by deleting a parent and checking if children are handled correctly — CASCADE deletes them, SET NULL clears the FK, RESTRICT blocks the delete. I also test negative cases by trying to insert a child with a non-existent parent ID.

Q: Give an example of a data validation bug you caught that was not visible in the UI.

A: In a banking application, the UI showed "Transfer Successful" after every transaction. But when I ran a reconciliation query — comparing each account balance with the sum of all its transactions — I found 23 accounts where the stored balance did not match the calculated balance. The difference ranged from Rs 0.50 to Rs 2,300. The root cause was a race condition: when two transfers happened simultaneously on the same account, one update overwrote the other. The UI showed both as successful, but only one balance change persisted. I caught this purely through a SQL validation query.

Q: How do you automate database validation?

A: I use JDBC with Java in my TestNG framework. I create a utility class with methods for database connection, query execution, and result extraction. Each validation rule becomes a test method that runs a SQL query and asserts the result — for example, assertEquals(orphanCount, 0). I use PreparedStatement for parameterized queries to avoid SQL injection. The validation suite runs in our CI/CD pipeline after every deployment. If any test fails, the team gets an alert before the build is promoted.

Q: What is the difference between data validation and data verification?

A: Data validation checks if the data follows defined rules — correct format, within range, no NULLs where not allowed, business rules satisfied. Data verification checks if the data matches the source — after migration, the target data should be identical to the source. Validation asks "Is this data valid?" Verification asks "Is this data the same as what we started with?" Both are needed: a migration could produce valid data that is different from the source, or correct data that violates rules. I test both.

Interview Corner

Interview Corner