This is one of the most common interview questions and one of the most practical decisions you will make daily. Sometimes a JOIN and a subquery produce the exact same result. So how do you choose? Short answer: use JOINs when you need columns from both tables. Use subqueries when you just need to filter or calculate.
-- Task: Find customers who have placed an order
-- Approach 1: Using JOIN
SELECT DISTINCT c.customer_id, c.first_name, c.email
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Approach 2: Using IN subquery
SELECT customer_id, first_name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id FROM orders
);
-- Approach 3: Using EXISTS subquery
SELECT customer_id, first_name, email
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- All three return the same result!| Factor | JOIN | Subquery |
|---|---|---|
| Need columns from both tables? | Yes — use JOIN | Not needed — subquery works |
| Performance on large data | Usually faster | Can be slower (especially correlated) |
| Readability | Clear for simple relationships | Better for step-by-step logic |
| Duplicate handling | May need DISTINCT | IN/EXISTS handle naturally |
| NULL safety | No issues | NOT IN fails with NULLs — use NOT EXISTS |
| Aggregate comparison | Needs derived table | Natural fit in WHERE |
Here is a rule of thumb from real projects: Start with a JOIN. If the query gets messy because of DISTINCT or duplicate rows, switch to EXISTS. If you need to compare against an aggregate value, use a subquery. There is no single "right" answer — both tools exist for a reason.
Key Point: JOINs are for combining data from multiple tables. Subqueries are for filtering based on data in another table. Most queries can be written either way — choose based on readability and whether you need columns from both tables.
Q: Which is faster — JOIN or subquery?
A: It depends on the database engine, table sizes, indexes, and the specific query. In most modern databases, the optimizer rewrites simple subqueries as JOINs internally. However, correlated subqueries can be slow because they run once per row. Non-correlated subqueries with IN are usually fine. JOINs tend to be faster for large datasets because the optimizer has more room to choose an efficient execution plan. The real answer is: check the execution plan (EXPLAIN) and compare. Do not guess. Measure.
Key Point: Use JOINs when you need columns from both tables. Use subqueries for filtering or comparing aggregates. When in doubt, check the execution plan.