Chapter 3: Aggregate Functions and Window Functions
Aggregate and window function questions are common in QA interviews, especially for senior roles. Window functions separate candidates who only know basic SQL from those who can do real data analysis. Here are the most asked questions.
Q: What is the difference between WHERE and HAVING?
A: WHERE filters individual rows before grouping happens. HAVING filters groups after aggregation. You cannot use aggregate functions in WHERE because the groups do not exist yet at that stage. Example: WHERE status = "completed" filters rows. HAVING COUNT(*) > 5 filters groups. Always prefer WHERE when possible because it reduces data before GROUP BY processes it, making the query faster.
Q: What is a window function? How is it different from GROUP BY?
A: A window function performs a calculation across a set of rows related to the current row, without collapsing them. GROUP BY collapses 10 rows into 1 summary row. Window functions keep all 10 rows and add a calculated column. For example, SUM with GROUP BY gives you total revenue per customer (one row per customer). SUM with OVER(PARTITION BY customer_id) gives you every order row with a running total column added.
Q: Explain ROW_NUMBER vs RANK vs DENSE_RANK with an example.
A: Assume scores 95, 90, 90, 85. ROW_NUMBER assigns 1, 2, 3, 4 — always unique, no ties. RANK assigns 1, 2, 2, 4 — ties share a rank, then skips the next number. DENSE_RANK assigns 1, 2, 2, 3 — ties share a rank, no gap. Use ROW_NUMBER when you need exactly one row per group. Use RANK for competitive ranking where gaps matter. Use DENSE_RANK for "top N distinct values" queries.
Q: How would you find duplicate records using SQL?
A: Two approaches. First, GROUP BY + HAVING: SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1. This shows which values are duplicated and how many times. Second, ROW_NUMBER: PARTITION BY the duplicate columns, ORDER BY created_at, then WHERE rn > 1. This approach is better because it shows you the actual duplicate rows, not just the count, and you can use it to delete duplicates by keeping rn = 1 and removing the rest.
Q: What is PARTITION BY? How is it different from GROUP BY?
A: Both divide data into groups. GROUP BY collapses each group into one row and is used with aggregate functions. PARTITION BY divides data into groups for window functions but keeps all rows in the output. If you have 100 orders from 10 customers: GROUP BY customer_id gives 10 rows. PARTITION BY customer_id keeps 100 rows but calculates window functions separately for each customer.
Q: How would you calculate a running total in SQL?
A: Use SUM as a window function with ORDER BY. Example: SUM(amount) OVER(ORDER BY date) gives a cumulative sum ordered by date. Each row shows the total of all amounts up to and including that row. For per-customer running totals, add PARTITION BY customer_id inside OVER(). Without ORDER BY, SUM OVER() returns the grand total in every row.
Q: What are LAG and LEAD? Give a practical use case.
A: LAG accesses a value from the previous row. LEAD accesses a value from the next row. Both require ORDER BY inside OVER(). Practical use: detecting revenue drops. Use LAG to get yesterday's revenue alongside today's, then calculate the percentage change. If the change exceeds a threshold like 50% drop, it flags a potential issue — maybe a payment gateway went down, or a bug broke the checkout flow.
| Situation | Use This |
|---|---|
| Count, sum, average of all rows | Aggregate functions (COUNT, SUM, AVG) |
| Aggregate per category/group | GROUP BY |
| Filter groups by aggregate value | HAVING |
| Rank rows without collapsing | ROW_NUMBER / RANK / DENSE_RANK |
| Running total or cumulative count | SUM/COUNT OVER(ORDER BY ...) |
| Compare with previous/next row | LAG / LEAD |
| Calculate per-group windows | PARTITION BY inside OVER() |
| Find duplicates | ROW_NUMBER + PARTITION BY or GROUP BY + HAVING |
| Moving average | AVG OVER(ORDER BY ... ROWS BETWEEN N PRECEDING AND CURRENT ROW) |
Key Point: Window functions are a senior-level SQL skill. In interviews, always explain what problem the function solves, not just how it works. Connect every answer to a real QA scenario — duplicate detection, report verification, trend analysis.
Key Point: Aggregate and window function questions test your depth. Know WHERE vs HAVING, GROUP BY vs PARTITION BY, and all three ranking functions. Always connect answers to QA use cases.
Answer all 5 questions, then submit to see your score.
1. What is the difference between COUNT(*) and COUNT(column_name)?
2. Which clause is used to filter groups after aggregation?
3. If three students have the same score, what does DENSE_RANK assign to the next student?
4. Which window function would you use to compare today's revenue with yesterday's revenue?
5. What does PARTITION BY do inside a window function's OVER() clause?