Chapter 3: Aggregate Functions and Window Functions
LAG looks backward. LEAD looks forward. LAG gives you the previous row's value. LEAD gives you the next row's value. This is how you detect trends, calculate changes, and spot sudden jumps or drops in data. Think of it like checking yesterday's price to see how much today's price changed.
-- LAG: Get the previous day's revenue alongside today's
SELECT
order_date,
SUM(total_amount) AS daily_revenue,
LAG(SUM(total_amount)) OVER(ORDER BY order_date) AS prev_day_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;
-- Result:
-- order_date | daily_revenue | prev_day_revenue
-- 2024-01-01 | 15000 | NULL (no previous day)
-- 2024-01-02 | 18000 | 15000
-- 2024-01-03 | 12000 | 18000
-- 2024-01-04 | 22000 | 12000
-- LEAD: See the next day's revenue
SELECT
order_date,
SUM(total_amount) AS daily_revenue,
LEAD(SUM(total_amount)) OVER(ORDER BY order_date) AS next_day_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;-- Daily revenue with change and percentage change
SELECT
order_date,
daily_revenue,
prev_day_revenue,
daily_revenue - prev_day_revenue AS change_amount,
ROUND(
(daily_revenue - prev_day_revenue) * 100.0 / prev_day_revenue, 2
) AS pct_change
FROM (
SELECT
order_date,
SUM(total_amount) AS daily_revenue,
LAG(SUM(total_amount)) OVER(ORDER BY order_date) AS prev_day_revenue
FROM orders
GROUP BY order_date
) daily_stats
ORDER BY order_date;-- Find days where order count jumped by more than 200%
-- Could indicate a bot attack or a test data leak
SELECT * FROM (
SELECT
order_date,
COUNT(*) AS order_count,
LAG(COUNT(*)) OVER(ORDER BY order_date) AS prev_day_count,
CASE
WHEN LAG(COUNT(*)) OVER(ORDER BY order_date) > 0
THEN ROUND(
COUNT(*) * 100.0 / LAG(COUNT(*)) OVER(ORDER BY order_date), 2
)
ELSE NULL
END AS pct_of_prev_day
FROM orders
GROUP BY order_date
) daily_summary
WHERE pct_of_prev_day > 200
ORDER BY order_date;
-- Compare each customer's current order with their previous one
SELECT
customer_id,
order_id,
order_date,
total_amount,
LAG(total_amount) OVER(
PARTITION BY customer_id ORDER BY order_date
) AS prev_order_amount,
total_amount - LAG(total_amount) OVER(
PARTITION BY customer_id ORDER BY order_date
) AS amount_change
FROM orders
ORDER BY customer_id, order_date;LAG and LEAD accept a second argument for offset and a third for default value. LAG(amount, 2) looks 2 rows back. LAG(amount, 1, 0) uses 0 instead of NULL when there is no previous row. This avoids NULL arithmetic issues.
Q: How would you use LAG to detect data quality issues?
A: LAG lets you compare adjacent rows in a time series. If daily active users normally fluctuate by 5-10% but suddenly drops 80%, that is a data pipeline issue or a bug. You write a query with LAG to get yesterday's value, calculate the percentage change, and filter for abnormal changes. This is a standard approach in QA for monitoring dashboards and reports over time.
Key Point: LAG looks at the previous row. LEAD looks at the next row. Use them to calculate day-over-day changes, detect trends, and spot anomalies in time-series data.