Subqueries are queries inside queries. Some people call them nested queries. They are ugly to read but incredibly powerful. These 6 problems cover every subquery type.
| Subquery Type | Returns | Used With | Example Use |
|---|---|---|---|
| Scalar | Single value | =, >, < | Find orders above average amount |
| Row | Single row, multiple columns | =, IN | Find the customer with max spending |
| Table (IN) | List of values | IN, NOT IN | Find customers who ordered a specific product |
| Correlated | Depends on outer query | EXISTS, comparisons | Find customers whose order exceeds their average |
| EXISTS | TRUE/FALSE | WHERE EXISTS | Find products that have at least one review |
Find all orders where total_amount is above the average order amount.
SELECT order_id, customer_id, order_date,
total_amount, status
FROM orders
WHERE total_amount > (
SELECT AVG(total_amount) FROM orders
)
ORDER BY total_amount DESC;Find customers who have ordered at least one product in the Electronics category.
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT o.customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'Electronics'
);Find products that have zero reviews. Use EXISTS instead of LEFT JOIN this time.
SELECT p.product_id, p.product_name, p.category, p.price
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM reviews r
WHERE r.product_id = p.product_id
);EXISTS is often faster than IN for large datasets. EXISTS stops scanning as soon as it finds the first match. IN builds the entire list first. For QA with millions of rows, this matters.
Find customers whose total spending is higher than the average spending in their city. This is a correlated subquery — the inner query depends on the outer query.
SELECT c.first_name, c.last_name, c.city,
SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.city
HAVING SUM(o.total_amount) > (
SELECT AVG(city_total)
FROM (
SELECT SUM(o2.total_amount) AS city_total
FROM customers c2
JOIN orders o2 ON c2.customer_id = o2.customer_id
WHERE c2.city = c.city
AND o2.status = 'completed'
GROUP BY c2.customer_id
) AS city_averages
)
ORDER BY total_spent DESC;Find the second highest order amount for each customer. Customers with only one order should not appear.
SELECT c.first_name, c.last_name, o.total_amount AS second_highest
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount = (
SELECT DISTINCT o2.total_amount
FROM orders o2
WHERE o2.customer_id = c.customer_id
ORDER BY o2.total_amount DESC
LIMIT 1 OFFSET 1
)
ORDER BY c.first_name;Find customers who placed an order in every month between their first and last order. These are your most consistent buyers.
SELECT c.first_name, c.last_name,
MIN(o.order_date) AS first_order,
MAX(o.order_date) AS last_order,
COUNT(DISTINCT DATE_FORMAT(o.order_date, '%Y-%m')) AS active_months,
TIMESTAMPDIFF(MONTH, MIN(o.order_date), MAX(o.order_date)) + 1 AS expected_months
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(DISTINCT DATE_FORMAT(o.order_date, '%Y-%m')) =
TIMESTAMPDIFF(MONTH, MIN(o.order_date), MAX(o.order_date)) + 1
AND TIMESTAMPDIFF(MONTH, MIN(o.order_date), MAX(o.order_date)) >= 1
ORDER BY active_months DESC;Q: What is a correlated subquery? How is it different from a regular subquery?
A: A regular subquery runs ONCE and returns a fixed result. The outer query uses that result. A correlated subquery runs ONCE FOR EACH ROW of the outer query because it references a column from the outer query. Example: "Find employees earning more than their department average" — the inner query depends on the department of the current row being processed. Correlated subqueries are slower because they execute repeatedly, but sometimes they are the only clean solution.
Key Point: Subqueries unlock complex logic. Scalar for single values, IN for lists, EXISTS for existence checks, and correlated when the inner query depends on the outer.