Q: The finance team reports that the total revenue in the dashboard does not match the total payments collected. How would you investigate this using SQL?

A: I would run 4 queries: (1) Compare total order amounts vs total payment amounts: SELECT SUM(total_amount) FROM orders WHERE status = "completed" and compare with SELECT SUM(amount) FROM payments WHERE payment_status = "success". (2) Find orders with no payment: LEFT JOIN orders with payments WHERE payment_id IS NULL. (3) Find payment amount mismatches: JOIN orders with payments WHERE o.total_amount != p.amount. (4) Check for duplicate payments: GROUP BY order_id HAVING COUNT(*) > 1. The discrepancy is usually one of these: missing payment records, amount mismatches, duplicate payments, or cancelled orders still counted as revenue.

Q: A report query that used to run in 2 seconds now takes 45 seconds after the table grew to 10 million rows. How would you diagnose and fix it?

A: Step 1: Run EXPLAIN on the query to see the execution plan. Look for "full table scan" — that means no index is being used. Step 2: Check if indexes exist on columns used in WHERE, JOIN, and ORDER BY clauses. Step 3: Add indexes on frequently filtered columns. Step 4: Check if the query uses functions on indexed columns (like WHERE YEAR(date) = 2024) — this prevents index usage. Rewrite as WHERE date BETWEEN "2024-01-01" AND "2024-12-31". Step 5: Consider if the query can use LIMIT or pagination. Step 6: Check if subqueries can be replaced with JOINs — JOINs are usually faster.

Q: You just migrated 500,000 customer records from an old system to a new one. How would you validate the migration was successful?

A: I would run 5 categories of validation queries: (1) Count check: SELECT COUNT(*) from both old and new tables — they must match. (2) Column-level check: Compare SUM, MIN, MAX, AVG of numeric columns. SELECT SUM(balance) from both systems. (3) Sample check: Pick 50 random records and compare all columns between old and new. SELECT * FROM old WHERE id IN (random_ids) and compare row by row. (4) NULL check: SELECT COUNT(*) WHERE required_column IS NULL — should be 0 for required fields. (5) Referential integrity: JOIN new tables to verify foreign keys resolve correctly. I would also check for truncation — compare MAX(LENGTH(name)) between old and new to ensure no data was cut off by shorter column definitions.

Q: Users are receiving duplicate emails for the same order. How would you use SQL to find and fix the root cause?

A: First, identify the duplicates: SELECT order_id, COUNT(*) FROM email_log GROUP BY order_id, email_type HAVING COUNT(*) > 1. Then investigate the pattern: Are duplicates happening for all orders or specific ones? Check timestamps: SELECT * FROM email_log WHERE order_id = [duplicate_id] ORDER BY sent_at — are they sent milliseconds apart (race condition) or minutes apart (retry bug)? Check the trigger: Is there a database trigger that fires on INSERT? Is the application calling the email service twice? Fix: Add a UNIQUE constraint on (order_id, email_type) to prevent duplicate records. For existing duplicates: DELETE using ROW_NUMBER to keep only the first occurrence.

Q: Build a customer leaderboard showing rank, name, total spent, total orders, average order value, and the customer segment (Bronze/Silver/Gold/Platinum based on spending). Handle ties in ranking.

A: SELECT DENSE_RANK() OVER (ORDER BY total_spent DESC) AS rank, full_name, total_orders, total_spent, avg_order_value, CASE WHEN total_spent >= 100000 THEN "Platinum" WHEN total_spent >= 50000 THEN "Gold" WHEN total_spent >= 10000 THEN "Silver" ELSE "Bronze" END AS segment FROM (SELECT CONCAT(c.first_name, " ", c.last_name) AS full_name, COUNT(o.order_id) AS total_orders, SUM(o.total_amount) AS total_spent, ROUND(AVG(o.total_amount), 2) AS avg_order_value FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.status = "completed" GROUP BY c.customer_id, c.first_name, c.last_name) AS customer_summary ORDER BY rank. I use DENSE_RANK so ties get the same rank and no numbers are skipped.

Q: The product team ran an A/B test on checkout flow. Group A (customer_id 1-8) saw the old flow. Group B (customer_id 9-15) saw the new flow. Compare conversion rates and average order values between the two groups.

A: I would write: SELECT CASE WHEN c.customer_id <= 8 THEN "Group A (Old)" ELSE "Group B (New)" END AS test_group, COUNT(DISTINCT c.customer_id) AS total_customers, COUNT(DISTINCT CASE WHEN o.status = "completed" THEN o.order_id END) AS completed_orders, ROUND(AVG(CASE WHEN o.status = "completed" THEN o.total_amount END), 2) AS avg_order_value, ROUND(COUNT(DISTINCT CASE WHEN o.status = "completed" THEN c.customer_id END) * 100.0 / COUNT(DISTINCT c.customer_id), 1) AS conversion_pct FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY CASE WHEN c.customer_id <= 8 THEN "Group A (Old)" ELSE "Group B (New)" END. Key insight: I use LEFT JOIN so customers with zero orders are still counted in the denominator for conversion rate.

Q: At 3 AM, the on-call engineer gets paged: "Orders are being created but payments are not being recorded." How would you use SQL to assess the blast radius and help debug?

A: Blast radius assessment — 3 queries: (1) How many orders are affected? SELECT COUNT(*) FROM orders o LEFT JOIN payments p ON o.order_id = p.order_id WHERE p.payment_id IS NULL AND o.order_date >= "2024-06-28" (the incident window). (2) What is the total money at risk? Same query but with SUM(o.total_amount). (3) Which customers are affected? Add customer details with JOIN. Debugging — 2 queries: (4) What was the last successful payment? SELECT MAX(payment_date) FROM payments WHERE payment_status = "success". This tells you exactly when the system broke. (5) Are ALL payment methods failing or just one? SELECT payment_method, MAX(payment_date) FROM payments WHERE payment_status = "success" GROUP BY payment_method. If only UPI has old timestamps, the UPI integration is down. This structured approach gives the incident commander the data they need within minutes.

Interview Corner — Scenario-Based SQL Questions

Scenario 1: The Missing Money

Scenario 2: The Slow Report

Scenario 3: Data Migration Validation

Scenario 4: The Duplicate Bug

Scenario 5: The Leaderboard

Scenario 6: The A/B Test

Scenario 7: The Production Incident