Theory questions test knowledge. Scenario questions test thinking. Senior interviewers skip "What is a JOIN?" and ask "Given this business problem, write a query." This lesson prepares you for those tough, real-world scenario questions.
Q: The finance team reports that the total revenue in the dashboard does not match the total payments collected. How would you investigate this using SQL?
A: I would run 4 queries: (1) Compare total order amounts vs total payment amounts: SELECT SUM(total_amount) FROM orders WHERE status = "completed" and compare with SELECT SUM(amount) FROM payments WHERE payment_status = "success". (2) Find orders with no payment: LEFT JOIN orders with payments WHERE payment_id IS NULL. (3) Find payment amount mismatches: JOIN orders with payments WHERE o.total_amount != p.amount. (4) Check for duplicate payments: GROUP BY order_id HAVING COUNT(*) > 1. The discrepancy is usually one of these: missing payment records, amount mismatches, duplicate payments, or cancelled orders still counted as revenue.
Q: A report query that used to run in 2 seconds now takes 45 seconds after the table grew to 10 million rows. How would you diagnose and fix it?
A: Step 1: Run EXPLAIN on the query to see the execution plan. Look for "full table scan" — that means no index is being used. Step 2: Check if indexes exist on columns used in WHERE, JOIN, and ORDER BY clauses. Step 3: Add indexes on frequently filtered columns. Step 4: Check if the query uses functions on indexed columns (like WHERE YEAR(date) = 2024) — this prevents index usage. Rewrite as WHERE date BETWEEN "2024-01-01" AND "2024-12-31". Step 5: Consider if the query can use LIMIT or pagination. Step 6: Check if subqueries can be replaced with JOINs — JOINs are usually faster.
Q: You just migrated 500,000 customer records from an old system to a new one. How would you validate the migration was successful?
A: I would run 5 categories of validation queries: (1) Count check: SELECT COUNT(*) from both old and new tables — they must match. (2) Column-level check: Compare SUM, MIN, MAX, AVG of numeric columns. SELECT SUM(balance) from both systems. (3) Sample check: Pick 50 random records and compare all columns between old and new. SELECT * FROM old WHERE id IN (random_ids) and compare row by row. (4) NULL check: SELECT COUNT(*) WHERE required_column IS NULL — should be 0 for required fields. (5) Referential integrity: JOIN new tables to verify foreign keys resolve correctly. I would also check for truncation — compare MAX(LENGTH(name)) between old and new to ensure no data was cut off by shorter column definitions.
Q: Users are receiving duplicate emails for the same order. How would you use SQL to find and fix the root cause?
A: First, identify the duplicates: SELECT order_id, COUNT(*) FROM email_log GROUP BY order_id, email_type HAVING COUNT(*) > 1. Then investigate the pattern: Are duplicates happening for all orders or specific ones? Check timestamps: SELECT * FROM email_log WHERE order_id = [duplicate_id] ORDER BY sent_at — are they sent milliseconds apart (race condition) or minutes apart (retry bug)? Check the trigger: Is there a database trigger that fires on INSERT? Is the application calling the email service twice? Fix: Add a UNIQUE constraint on (order_id, email_type) to prevent duplicate records. For existing duplicates: DELETE using ROW_NUMBER to keep only the first occurrence.
Q: Build a customer leaderboard showing rank, name, total spent, total orders, average order value, and the customer segment (Bronze/Silver/Gold/Platinum based on spending). Handle ties in ranking.
A: SELECT DENSE_RANK() OVER (ORDER BY total_spent DESC) AS rank, full_name, total_orders, total_spent, avg_order_value, CASE WHEN total_spent >= 100000 THEN "Platinum" WHEN total_spent >= 50000 THEN "Gold" WHEN total_spent >= 10000 THEN "Silver" ELSE "Bronze" END AS segment FROM (SELECT CONCAT(c.first_name, " ", c.last_name) AS full_name, COUNT(o.order_id) AS total_orders, SUM(o.total_amount) AS total_spent, ROUND(AVG(o.total_amount), 2) AS avg_order_value FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.status = "completed" GROUP BY c.customer_id, c.first_name, c.last_name) AS customer_summary ORDER BY rank. I use DENSE_RANK so ties get the same rank and no numbers are skipped.
-- Full leaderboard query
SELECT
DENSE_RANK() OVER (ORDER BY total_spent DESC) AS rnk,
full_name,
total_orders,
total_spent,
avg_order_value,
CASE
WHEN total_spent >= 100000 THEN 'Platinum'
WHEN total_spent >= 50000 THEN 'Gold'
WHEN total_spent >= 10000 THEN 'Silver'
ELSE 'Bronze'
END AS segment
FROM (
SELECT
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
COUNT(o.order_id) AS total_orders,
SUM(o.total_amount) AS total_spent,
ROUND(AVG(o.total_amount), 2) AS avg_order_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name
) AS customer_summary
ORDER BY rnk;Q: The product team ran an A/B test on checkout flow. Group A (customer_id 1-8) saw the old flow. Group B (customer_id 9-15) saw the new flow. Compare conversion rates and average order values between the two groups.
A: I would write: SELECT CASE WHEN c.customer_id <= 8 THEN "Group A (Old)" ELSE "Group B (New)" END AS test_group, COUNT(DISTINCT c.customer_id) AS total_customers, COUNT(DISTINCT CASE WHEN o.status = "completed" THEN o.order_id END) AS completed_orders, ROUND(AVG(CASE WHEN o.status = "completed" THEN o.total_amount END), 2) AS avg_order_value, ROUND(COUNT(DISTINCT CASE WHEN o.status = "completed" THEN c.customer_id END) * 100.0 / COUNT(DISTINCT c.customer_id), 1) AS conversion_pct FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY CASE WHEN c.customer_id <= 8 THEN "Group A (Old)" ELSE "Group B (New)" END. Key insight: I use LEFT JOIN so customers with zero orders are still counted in the denominator for conversion rate.
-- A/B Test comparison query
SELECT
CASE WHEN c.customer_id <= 8
THEN 'Group A (Old Flow)'
ELSE 'Group B (New Flow)'
END AS test_group,
COUNT(DISTINCT c.customer_id) AS total_customers,
COUNT(DISTINCT CASE WHEN o.status = 'completed'
THEN o.order_id END) AS completed_orders,
ROUND(AVG(CASE WHEN o.status = 'completed'
THEN o.total_amount END), 2) AS avg_order_value,
ROUND(
COUNT(DISTINCT CASE WHEN o.status = 'completed'
THEN c.customer_id END) * 100.0 /
COUNT(DISTINCT c.customer_id), 1
) AS conversion_pct
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY
CASE WHEN c.customer_id <= 8
THEN 'Group A (Old Flow)'
ELSE 'Group B (New Flow)'
END;Q: At 3 AM, the on-call engineer gets paged: "Orders are being created but payments are not being recorded." How would you use SQL to assess the blast radius and help debug?
A: Blast radius assessment — 3 queries: (1) How many orders are affected? SELECT COUNT(*) FROM orders o LEFT JOIN payments p ON o.order_id = p.order_id WHERE p.payment_id IS NULL AND o.order_date >= "2024-06-28" (the incident window). (2) What is the total money at risk? Same query but with SUM(o.total_amount). (3) Which customers are affected? Add customer details with JOIN. Debugging — 2 queries: (4) What was the last successful payment? SELECT MAX(payment_date) FROM payments WHERE payment_status = "success". This tells you exactly when the system broke. (5) Are ALL payment methods failing or just one? SELECT payment_method, MAX(payment_date) FROM payments WHERE payment_status = "success" GROUP BY payment_method. If only UPI has old timestamps, the UPI integration is down. This structured approach gives the incident commander the data they need within minutes.
In scenario questions, always start with "First I would check..." and walk through your approach step by step. Interviewers care more about your systematic thinking than the perfect query syntax.
Key Point: Scenario-based interviews test how you THINK, not what you memorize. Structure your answers: (1) Understand the problem, (2) Identify which tables you need, (3) Write the query, (4) Explain what the results tell you.
Key Point: Scenario questions are the toughest part of SQL interviews. Always structure your answer: assess the problem, identify the tables, write the query, interpret the results.
Answer all 5 questions, then submit to see your score.
1. Which SQL pattern is MOST useful for finding orphan records in QA testing?
2. What is the key difference between RANK() and DENSE_RANK()?
3. In the SQL execution order, which runs FIRST?
4. You find that SUM(order_items.quantity * order_items.unit_price) does not match orders.total_amount. What type of data validation issue is this?
5. When testing a stored procedure with a discount rule "orders OVER Rs 50,000 get 10% off", which test case is MOST critical?