A subquery is a SELECT inside another SELECT. Think of it like asking a question to answer a bigger question. "Which customers spent more than average?" First you need to know the average (inner query). Then you filter customers above that average (outer query). The inner query runs first. Its result feeds into the outer query.
-- Find orders above the average order amount
SELECT order_id, customer_id, total_amount
FROM orders
WHERE total_amount > (
SELECT AVG(total_amount) FROM orders
);
-- The inner query returns one number: the average
-- The outer query uses that number to filter
-- Find the customer who placed the most recent order
SELECT customer_id, first_name, last_name
FROM customers
WHERE customer_id = (
SELECT customer_id
FROM orders
ORDER BY order_date DESC
LIMIT 1
);-- Find customers who have placed at least one order over 500
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE total_amount > 500
);
-- Find products that have NEVER been ordered
SELECT product_id, product_name, price
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
);NOT IN with NULLs is a trap. If the subquery returns even one NULL value, NOT IN returns zero rows. Always add WHERE column IS NOT NULL inside the subquery, or use NOT EXISTS instead. This is a real bug that bites teams in production.
EXISTS checks if the subquery returns any rows at all. It does not care about the actual values — just whether rows exist. It is faster than IN for large datasets because it stops as soon as it finds one match.
-- Find customers who have placed at least one order (EXISTS)
SELECT c.customer_id, c.first_name, c.email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find customers who have NEVER ordered (NOT EXISTS)
SELECT c.customer_id, c.first_name, c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);A correlated subquery is different. It references a column from the outer query. This means it runs once for every row in the outer query. Powerful, but can be slow on large tables.
-- Find customers whose total spending exceeds 1000
SELECT c.customer_id, c.first_name, c.last_name
FROM customers c
WHERE (
SELECT SUM(o.total_amount)
FROM orders o
WHERE o.customer_id = c.customer_id
) > 1000;
-- Find orders where the amount is above that customer's average
SELECT o.order_id, o.customer_id, o.total_amount
FROM orders o
WHERE o.total_amount > (
SELECT AVG(o2.total_amount)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);EXISTS is almost always faster than IN for large datasets. EXISTS stops at the first match. IN computes the entire list first. When in doubt, use EXISTS for better performance.
Q: What is the difference between a correlated and a non-correlated subquery?
A: A non-correlated subquery is independent — it runs once, produces a result, and the outer query uses that result. Example: WHERE amount > (SELECT AVG(amount) FROM orders). A correlated subquery references the outer query's columns — it runs once per row of the outer query. Example: WHERE amount > (SELECT AVG(amount) FROM orders WHERE customer_id = c.customer_id). Correlated subqueries are more flexible but slower because they execute repeatedly.
Key Point: Subqueries in WHERE filter data using results of another query. Use scalar subqueries for single values, IN for lists, EXISTS for existence checks, and correlated subqueries when the inner query depends on the outer.