JOIN SQL cheat sheet

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why JOINs decide your SQL interview

JOIN is the backbone of SQL. Real data lives in multiple tables: orders separate from customers, products from inventory, events from users. The JOIN clause stitches them on a shared key. On a data analyst loop at Stripe, Airbnb, or DoorDash, JOIN appears in every other question, and knowing the words "LEFT JOIN" is not enough — the interviewer wants you to predict which rows survive and which get silently dropped.

The cost of being sloppy is real. A LEFT JOIN that inflates row counts can turn a $2M revenue number into $6M on a dashboard. An INNER JOIN that drops users without a profile can hide a churn cohort from the executive review. Hiring panels design problems where the wrong JOIN produces a wrong-but-plausible answer.

All examples below use the same two tiny tables. The employees table:

id name department_id
1 Anna 10
2 Boris 20
3 Vera 30
4 Gleb NULL

The departments table:

id dept_name
10 Analytics
20 Marketing
40 Finance

Notice the friction: Vera references department 30, which is missing in departments. Gleb has no department. Finance has no employees. Those three mismatches separate one JOIN type from another.

INNER JOIN

INNER JOIN returns only rows where the key matches on both sides. Anything unmatched disappears.

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Result: Anna with Analytics, Boris with Marketing. Vera, Gleb, and Finance are all gone.

INNER JOIN is correct when you want only fully matched rows: "show me all orders alongside their product names." If a product was deleted, dropping that order is fine. The trap is that INNER JOIN silently shrinks the result. Always compare row counts before and after when validating a new query.

LEFT JOIN

LEFT JOIN returns every row from the left table. If the right side has no match, right-side columns come back NULL.

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;

All four employees survive. Vera and Gleb get NULL in dept_name.

LEFT JOIN is the workhorse of analytics: "give me every user and whatever orders they have, including users who never bought anything." The pitfall: if the right table has duplicate keys, LEFT JOIN multiplies your left rows. Run a quick GROUP BY key HAVING COUNT(*) > 1 on the right side before joining anything you will report.

RIGHT JOIN

RIGHT JOIN is the mirror of LEFT JOIN.

SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;

Finance shows up with NULL employee name; Vera and Gleb are dropped. In practice, almost no one writes RIGHT JOIN in production — it reads more clearly when rewritten as LEFT JOIN with the table order swapped. Know what it does, then write LEFT at work.

FULL OUTER JOIN

FULL OUTER JOIN keeps every row from both sides, padding unmatched columns with NULL.

SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;

No row is lost. This is the JOIN for data audits. The portability trap: MySQL does not support FULL OUTER JOIN natively — emulate with LEFT JOIN UNION RIGHT JOIN. PostgreSQL, Snowflake, BigQuery, and Databricks support it directly.

CROSS JOIN

CROSS JOIN produces a Cartesian product — every row from the left paired with every row from the right. No ON condition.

SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d;

The result is 4 x 3 = 12 rows. CROSS JOIN earns its keep when you need a complete grid: cross every user with every date, then LEFT JOIN events to find days with zero activity.

SELECT dates.dt, users.user_id, a.event_count
FROM dates
CROSS JOIN users
LEFT JOIN activity a
  ON a.user_id = users.user_id AND a.event_date = dates.dt;

The danger is row explosion: 100,000 users x 365 days is 36.5 million rows before filtering. Estimate cardinality before running CROSS JOIN against production tables.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

SELF JOIN

A SELF JOIN is just a JOIN where both sides refer to the same table under different aliases.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

SELF JOINs traverse hierarchies, compare neighboring events, or detect duplicates. Aliases are required — without them, the parser cannot tell which instance of employees each column refers to.

WHERE vs ON: where to put the condition

This is the single nastiest interview gotcha in the JOIN family, and it only manifests on LEFT, RIGHT, or FULL JOIN.

Condition in ON filters before the join:

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d
  ON e.department_id = d.id
  AND d.dept_name = 'Analytics';

Result: all four employees. Only Anna has dept_name populated.

Condition in WHERE filters after:

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.dept_name = 'Analytics';

Result: only Anna. WHERE removes every row where dept_name is not 'Analytics', including NULLs. The LEFT JOIN effectively became an INNER JOIN. Rule of thumb: right-side filters go in ON if you want to preserve all left rows; in WHERE for a hard filter.

10 JOIN interview problems

Problem 1. Users without orders (anti-join)

SELECT u.user_id, u.name
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;

LEFT JOIN + WHERE IS NULL is the canonical anti-join pattern. NOT EXISTS is sometimes faster on indexed columns.

Problem 2. Users who bought in both category A and category B

SELECT DISTINCT o1.user_id
FROM orders o1
JOIN order_items oi1 ON o1.order_id = oi1.order_id
JOIN products p1 ON oi1.product_id = p1.product_id AND p1.category = 'A'
JOIN orders o2 ON o1.user_id = o2.user_id
JOIN order_items oi2 ON o2.order_id = oi2.order_id
JOIN products p2 ON oi2.product_id = p2.product_id AND p2.category = 'B';

Multiple JOINs plus a SELF JOIN on user_id. Cleaner alternative: two subqueries with INTERSECT.

Problem 3. The most recent order per user

SELECT o.*
FROM orders o
JOIN (
  SELECT user_id, MAX(created_at) AS max_date
  FROM orders GROUP BY user_id
) latest
  ON o.user_id = latest.user_id
  AND o.created_at = latest.max_date;

Window function alternative: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) filtered to rank 1.

Problem 4. Products priced above the category average

SELECT p.product_id, p.name, p.price, avg_prices.avg_price
FROM products p
JOIN (
  SELECT category, AVG(price) AS avg_price
  FROM products GROUP BY category
) avg_prices ON p.category = avg_prices.category
WHERE p.price > avg_prices.avg_price;

Problem 5. Pairs of users from the same city (SELF JOIN)

SELECT u1.name AS user1, u2.name AS user2, u1.city
FROM users u1
JOIN users u2
  ON u1.city = u2.city
  AND u1.user_id < u2.user_id;

The strict inequality removes self-pairs and de-duplicates mirrored pairs.

Problem 6. All products and all order items (FULL JOIN)

SELECT
  COALESCE(p.product_id, oi.product_id) AS product_id,
  p.name,
  COUNT(oi.item_id) AS times_ordered
FROM products p
FULL JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY COALESCE(p.product_id, oi.product_id), p.name;

Problem 7. CROSS JOIN: every product x region combination

SELECT p.name, r.region, COALESCE(s.total_sales, 0) AS sales
FROM products p
CROSS JOIN (SELECT DISTINCT region FROM stores) r
LEFT JOIN (
  SELECT product_id, region, SUM(amount) AS total_sales
  FROM sales JOIN stores USING (store_id)
  GROUP BY product_id, region
) s ON p.product_id = s.product_id AND r.region = s.region;

CROSS JOIN builds the skeleton; LEFT JOIN backfills sales; COALESCE zeroes empty cells.

Problem 8. The ON-vs-WHERE trap with LEFT JOIN

SELECT u.user_id, COUNT(o.order_id) AS recent_orders
FROM users u
LEFT JOIN orders o
  ON u.user_id = o.user_id
  AND o.created_at >= '2026-01-01'
GROUP BY u.user_id;

Putting the date filter inside ON keeps users with zero recent orders. Moving it to WHERE silently turns the query into an INNER JOIN.

Problem 9. Revenue by category and month (three-table JOIN)

SELECT
  p.category,
  DATE_TRUNC('month', o.created_at) AS month,
  SUM(oi.quantity * oi.price) AS revenue
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.category, DATE_TRUNC('month', o.created_at)
ORDER BY month, revenue DESC;

Three tables is the minimum for a realistic analytics query. JOIN order does not change the result on Snowflake or BigQuery but shapes readability.

Problem 10. LATERAL JOIN: top 3 orders per user

SELECT u.user_id, u.name, t.order_id, t.amount
FROM users u,
LATERAL (
  SELECT order_id, amount
  FROM orders o
  WHERE o.user_id = u.user_id
  ORDER BY amount DESC
  LIMIT 3
) t;

LATERAL lets the inner query reference columns from the outer table. ROW_NUMBER is portable; LATERAL can win on huge tables with a good index.

Common pitfalls

The most common JOIN failure mode is silent row inflation from non-unique join keys. When the right-side table has duplicates, every matching left row is multiplied, and an order count of 10,000 can balloon to 27,000. SUM and AVG then return inflated numbers that look reasonable but are wrong. Always run a duplicate check on the join key before any query that feeds a dashboard.

The second pitfall is mixing up WHERE and ON with outer joins. A LEFT JOIN with a right-table filter in WHERE silently becomes an INNER JOIN. New analysts often write the WHERE version because it parses more naturally, then submit a churn report that excludes the very users it was supposed to measure.

The third trap is unmatched data types in join keys. If users.user_id is integer and events.user_id arrives as a string, some warehouses silently cast while others throw an error or match only some rows because of leading zeros or whitespace. Always confirm data types when one side comes from a JSON column or CSV import.

The fourth pitfall is forgetting that NULL never equals NULL in a JOIN condition. If both sides have NULL in the join key, those rows will not match. When you actually want NULL-to-NULL matching, spell it out: ON a.key = b.key OR (a.key IS NULL AND b.key IS NULL).

The fifth pitfall is letting the optimizer's chosen JOIN order surprise you on large tables. Modern engines usually pick a sensible plan, but stale statistics or unexpected skew can push them into nested loops. Run EXPLAIN on any JOIN that touches more than a few million rows.

If you want to drill JOIN problems like these every day until they stop feeling tricky, NAILDD is launching with 500+ SQL problems organized around exactly this pattern.

FAQ

Which JOIN is most common in an analyst's day-to-day work?

LEFT JOIN, by a wide margin. The default question — "give me every user and whatever orders they have" — is a LEFT JOIN with users on the left. INNER JOIN comes second for cases where data is guaranteed consistent and dropping unmatched rows is acceptable. CROSS JOIN and FULL JOIN are rare but matter for grids and audits.

Does the order of tables in a JOIN affect performance?

On modern warehouses like PostgreSQL, Snowflake, BigQuery, and Databricks the planner reorders joins based on statistics, so textual order rarely changes runtime. Readability is a different story — start with the fact table and chain dimensions off it. On older MySQL and some ClickHouse configurations, literal order can leak into the chosen plan, so check EXPLAIN if you suspect a regression.

What is the difference between JOIN ON and JOIN USING?

JOIN ... USING(column) is shorthand for when the join column has the same name in both tables. JOIN departments USING(department_id) is equivalent to JOIN departments ON employees.department_id = departments.department_id. The visible difference: with USING the column appears only once in the SELECT output, while with ON it shows up twice. A small readability win.

When should I prefer NOT EXISTS over a LEFT JOIN with IS NULL?

Both produce the same result, but NOT EXISTS lets the optimizer stop as soon as it finds a matching row, while LEFT JOIN materializes matches first and then filters NULLs. On large indexed tables NOT EXISTS is often faster. On smaller tables or with unindexed join columns, LEFT JOIN + IS NULL runs in similar time. Benchmark both with EXPLAIN ANALYZE if performance is critical.

How do I avoid duplicate rows after a JOIN?

Confirm the join key is unique on at least one side before joining. A quick SELECT key, COUNT(*) FROM right_table GROUP BY key HAVING COUNT(*) > 1 surfaces duplicates immediately. If duplicates are legitimate (one order to many line items), aggregate the right side in a subquery before the join. SELECT DISTINCT at the end hides the problem rather than fixing it.

Can I JOIN on more than one column?

Yes, whenever the relationship is a composite key. The ON clause accepts AND-combined conditions: ON s.region = t.region AND s.month = t.month. The join only matches when every condition is true. Composite keys are common in time-series analytics, like joining sales targets to actuals by region and month.