Q: What is the difference between WHERE and HAVING?

A: WHERE filters individual rows before grouping happens. HAVING filters groups after aggregation. You cannot use aggregate functions in WHERE because the groups do not exist yet at that stage. Example: WHERE status = "completed" filters rows. HAVING COUNT(*) > 5 filters groups. Always prefer WHERE when possible because it reduces data before GROUP BY processes it, making the query faster.

Q: What is a window function? How is it different from GROUP BY?

A: A window function performs a calculation across a set of rows related to the current row, without collapsing them. GROUP BY collapses 10 rows into 1 summary row. Window functions keep all 10 rows and add a calculated column. For example, SUM with GROUP BY gives you total revenue per customer (one row per customer). SUM with OVER(PARTITION BY customer_id) gives you every order row with a running total column added.

Q: Explain ROW_NUMBER vs RANK vs DENSE_RANK with an example.

A: Assume scores 95, 90, 90, 85. ROW_NUMBER assigns 1, 2, 3, 4 — always unique, no ties. RANK assigns 1, 2, 2, 4 — ties share a rank, then skips the next number. DENSE_RANK assigns 1, 2, 2, 3 — ties share a rank, no gap. Use ROW_NUMBER when you need exactly one row per group. Use RANK for competitive ranking where gaps matter. Use DENSE_RANK for "top N distinct values" queries.

Q: How would you find duplicate records using SQL?

A: Two approaches. First, GROUP BY + HAVING: SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1. This shows which values are duplicated and how many times. Second, ROW_NUMBER: PARTITION BY the duplicate columns, ORDER BY created_at, then WHERE rn > 1. This approach is better because it shows you the actual duplicate rows, not just the count, and you can use it to delete duplicates by keeping rn = 1 and removing the rest.

Q: What is PARTITION BY? How is it different from GROUP BY?

A: Both divide data into groups. GROUP BY collapses each group into one row and is used with aggregate functions. PARTITION BY divides data into groups for window functions but keeps all rows in the output. If you have 100 orders from 10 customers: GROUP BY customer_id gives 10 rows. PARTITION BY customer_id keeps 100 rows but calculates window functions separately for each customer.

Q: How would you calculate a running total in SQL?

A: Use SUM as a window function with ORDER BY. Example: SUM(amount) OVER(ORDER BY date) gives a cumulative sum ordered by date. Each row shows the total of all amounts up to and including that row. For per-customer running totals, add PARTITION BY customer_id inside OVER(). Without ORDER BY, SUM OVER() returns the grand total in every row.

Q: What are LAG and LEAD? Give a practical use case.

A: LAG accesses a value from the previous row. LEAD accesses a value from the next row. Both require ORDER BY inside OVER(). Practical use: detecting revenue drops. Use LAG to get yesterday's revenue alongside today's, then calculate the percentage change. If the change exceeds a threshold like 50% drop, it flags a potential issue — maybe a payment gateway went down, or a bug broke the checkout flow.

Situation	Use This
Count, sum, average of all rows	Aggregate functions (COUNT, SUM, AVG)
Aggregate per category/group	GROUP BY
Filter groups by aggregate value	HAVING
Rank rows without collapsing	ROW_NUMBER / RANK / DENSE_RANK
Running total or cumulative count	SUM/COUNT OVER(ORDER BY ...)
Compare with previous/next row	LAG / LEAD
Calculate per-group windows	PARTITION BY inside OVER()
Find duplicates	ROW_NUMBER + PARTITION BY or GROUP BY + HAVING
Moving average	AVG OVER(ORDER BY ... ROWS BETWEEN N PRECEDING AND CURRENT ROW)

Situation	Use This
Count, sum, average of all rows	Aggregate functions (COUNT, SUM, AVG)
Aggregate per category/group	GROUP BY
Filter groups by aggregate value	HAVING
Rank rows without collapsing	ROW_NUMBER / RANK / DENSE_RANK
Running total or cumulative count	SUM/COUNT OVER(ORDER BY ...)
Compare with previous/next row	LAG / LEAD
Calculate per-group windows	PARTITION BY inside OVER()
Find duplicates	ROW_NUMBER + PARTITION BY or GROUP BY + HAVING
Moving average	AVG OVER(ORDER BY ... ROWS BETWEEN N PRECEDING AND CURRENT ROW)

Interview Corner — Aggregates and Window Functions

Quick Reference

Interview Corner — Aggregates and Window Functions

Quick Reference