Chapter 3: Aggregate Functions and Window Functions
Theory is done. Now let me show you the queries you will actually run at work. These are patterns that catch real bugs — duplicates that slip through, reports that show wrong numbers, data trends that indicate something is broken.
The most elegant way to find and remove duplicates. Assign ROW_NUMBER within each group of potential duplicates. Keep rn = 1, delete the rest.
-- Find duplicate customer emails
SELECT * FROM (
SELECT
customer_id,
email,
first_name,
created_at,
ROW_NUMBER() OVER(
PARTITION BY email
ORDER BY created_at ASC
) AS rn
FROM customers
) ranked
WHERE rn > 1;
-- rn > 1 means these are the duplicates (not the original)
-- Find duplicate orders (same customer, same amount, same date)
-- Possible double-submit bug
SELECT * FROM (
SELECT
order_id,
customer_id,
total_amount,
order_date,
ROW_NUMBER() OVER(
PARTITION BY customer_id, total_amount, order_date
ORDER BY order_id ASC
) AS rn
FROM orders
) ranked
WHERE rn > 1;-- Verify monthly revenue report
-- The dashboard shows revenue, order count, avg order value per month
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS total_orders,
SUM(total_amount) AS monthly_revenue,
ROUND(AVG(total_amount), 2) AS avg_order_value,
MIN(total_amount) AS min_order,
MAX(total_amount) AS max_order
FROM orders
WHERE status IN ('completed', 'delivered')
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
-- Verify "Top 5 Customers" widget on admin dashboard
SELECT
c.first_name,
c.last_name,
COUNT(o.order_id) AS total_orders,
SUM(o.total_amount) AS lifetime_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY lifetime_value DESC
LIMIT 5;-- Week-over-week comparison to spot sudden drops
SELECT
week_start,
weekly_orders,
prev_week_orders,
weekly_orders - prev_week_orders AS change,
CASE
WHEN prev_week_orders > 0 THEN
ROUND((weekly_orders - prev_week_orders) * 100.0 / prev_week_orders, 1)
ELSE NULL
END AS pct_change
FROM (
SELECT
DATE_TRUNC('week', order_date) AS week_start,
COUNT(*) AS weekly_orders,
LAG(COUNT(*)) OVER(ORDER BY DATE_TRUNC('week', order_date)) AS prev_week_orders
FROM orders
GROUP BY DATE_TRUNC('week', order_date)
) weekly_stats
ORDER BY week_start DESC;
-- Detect hours with zero orders (possible system outage)
SELECT gs.hour_slot
FROM generate_series(
DATE_TRUNC('day', CURRENT_DATE - INTERVAL '1 day'),
DATE_TRUNC('day', CURRENT_DATE) - INTERVAL '1 hour',
INTERVAL '1 hour'
) AS gs(hour_slot)
LEFT JOIN orders o ON DATE_TRUNC('hour', o.order_date) = gs.hour_slot
WHERE o.order_id IS NULL;-- Orders where stored total does not match calculated sum of line items
SELECT
o.order_id,
o.total_amount AS stored_total,
SUM(oi.quantity * oi.unit_price) AS calculated_total,
o.total_amount - SUM(oi.quantity * oi.unit_price) AS difference
FROM orders o
INNER JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id, o.total_amount
HAVING o.total_amount != SUM(oi.quantity * oi.unit_price);
-- This is a classic bug: rounding errors, missed discount logic,
-- or stale cached totals after item updatesKey Point: Aggregate and window functions are not just for reporting. In QA, they are bug-finding tools. Duplicate detection with ROW_NUMBER, report verification with SUM/COUNT/AVG, trend analysis with LAG — these queries catch bugs that manual testing never will.
Key Point: Use ROW_NUMBER to find duplicates. Use SUM/AVG to verify reports. Use LAG to detect trend breaks. These are your daily QA weapons.