NULL in SQL — what it is and how to handle it

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why NULL keeps biting analysts

You ship a dashboard at 5pm. By 9am the PM is asking why the "active users" tile dropped 40% overnight. The culprit is a join on customer_id where 8% of new signups have NULL because the partner API batches that write. Your INNER JOIN silently dropped them. The query parsed fine, the numbers looked plausible, and a meeting got scheduled before anyone noticed.

This is how NULL hurts — not as a syntax error, but as a quiet correctness bug that survives code review and surfaces a week later. Data interview loops at Stripe, Airbnb, Snowflake, and Databricks include at least one NULL question for exactly this reason.

This guide covers what NULL represents, how it behaves under each operator, where it silently changes results, and the habits that prevent the bug above.

What NULL actually is

NULL is a marker meaning "this value is absent or unknown". It is not zero, not an empty string, and not false — each of those has well-defined arithmetic, NULL has none.

NULL is not 0. Zero is a number you can add to other numbers; NULL is the absence of a number. 5 + NULL returns NULL, not 5.

NULL is not ''. An empty string has length zero; NULL is the absence of any string. LENGTH('') returns 0, LENGTH(NULL) returns NULL.

NULL is not FALSE. False is a boolean value; NULL in boolean context means "we do not know whether this is true or false". This produces three-valued logic.

-- All three checks return "not equal"
SELECT
    CASE WHEN NULL = 0    THEN 'equal' ELSE 'NOT equal' END,  -- NOT equal
    CASE WHEN NULL = ''   THEN 'equal' ELSE 'NOT equal' END,  -- NOT equal
    CASE WHEN NULL = NULL THEN 'equal' ELSE 'NOT equal' END;  -- NOT equal!

The last line catches everyone. NULL = NULL does not return TRUE, because two unknown values are not guaranteed to be the same unknown.

Three-valued logic

Ordinary boolean logic has TRUE and FALSE. SQL adds a third value: UNKNOWN. Any comparison with NULL returns UNKNOWN, and it propagates through AND, OR, and NOT.

Expression Result
NULL = NULL UNKNOWN
NULL <> NULL UNKNOWN
NULL > 5 UNKNOWN
NULL AND TRUE UNKNOWN
NULL AND FALSE FALSE
NULL OR TRUE TRUE
NULL OR FALSE UNKNOWN
NOT NULL UNKNOWN

WHERE only keeps rows where the condition is TRUE. UNKNOWN rows are dropped exactly like FALSE — they do not raise an error, they just do not show up. That is the source of most "missing data" surprises.

Useful AND rule: FALSE AND NULL is FALSE because one false leg sinks the expression, but TRUE AND NULL is UNKNOWN. Symmetric logic applies to OR.

IS NULL and IS NOT NULL

To test whether a value is NULL, use IS NULL or IS NOT NULL. The = operator does not work because column = NULL always evaluates to UNKNOWN.

-- Correct: matches rows where email is missing
SELECT * FROM users WHERE email IS NULL;

-- Wrong: returns zero rows, every time
SELECT * FROM users WHERE email = NULL;

The wrong version is one of the most common beginner SQL bugs. It parses, runs without error, and returns an empty set — but the empty set is wrong, because email = NULL is UNKNOWN for every row.

PostgreSQL and most modern engines also support IS DISTINCT FROM and IS NOT DISTINCT FROM, which treat NULL as equal to NULL. Useful when comparing two nullable columns and you want NULL = NULL to behave as TRUE.

NULL in aggregate functions

Aggregates handle NULL differently depending on which function you call. The two patterns are "count all rows" versus "ignore NULLs entirely".

Function Behavior with NULL
COUNT(*) Counts every row, including rows with NULLs
COUNT(column) Skips rows where column is NULL
SUM(column) Ignores NULL
AVG(column) Ignores NULL (NULLs are not in the denominator)
MIN, MAX Ignore NULL
-- Suppose bonuses.amount holds: {1000, NULL, 2000, NULL, 3000}
SELECT
    COUNT(*)        AS total_rows,      -- 5
    COUNT(amount)   AS non_null_count,  -- 3
    SUM(amount)     AS total,           -- 6000
    AVG(amount)     AS average          -- 2000 (6000 / 3, NOT 6000 / 5)
FROM bonuses;

AVG is the case that produces wrong dashboards. If you want NULLs treated as zero so they pull the average down, wrap: AVG(COALESCE(amount, 0)). Average bonus among employees who got one is different from average bonus across all employees — decide which one your metric needs.

Another quirk: SUM over an empty set returns NULL, not 0. If a metric tile expects 0 for "no data yet", wrap with COALESCE before serving.

COALESCE and NULLIF

COALESCE(a, b, c, ...) returns the first non-NULL argument from left to right. It is the standard way to provide a default value.

SELECT
    user_id,
    COALESCE(nickname, first_name, 'Anonymous') AS display_name
FROM users;

NULLIF(a, b) is the inverse: it returns NULL when a = b, otherwise it returns a. The textbook use case is protecting division from zero denominators.

-- Without NULLIF: divide-by-zero error when sessions is 0
SELECT revenue / sessions AS rps FROM metrics;

-- With NULLIF: returns NULL instead of erroring out
SELECT revenue / NULLIF(sessions, 0) AS rps FROM metrics;

NULLIF also normalizes data on the way in: NULLIF(TRIM(column), '') turns whitespace-only strings into NULL, which is often what you want before counting non-empty values.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

NULL in JOIN

Joins use equality, and equality with NULL is UNKNOWN, so rows with NULL in the join key never match — not even to NULLs on the other side. INNER JOIN silently drops them.

-- Employees with department_id = NULL will not appear in the result
SELECT *
FROM employees e
JOIN departments d ON e.department_id = d.id;

To keep those employees, use LEFT JOIN — unmatched rows come through with NULLs on the right-hand columns. If you deliberately want NULL to match NULL (rare, usually a data model smell), use IS NOT DISTINCT FROM in the join condition on engines that support it.

The interview question: "you join orders to customers on customer_id and the counts do not match — what do you check?" Correct answers: rows with NULL on either side, and duplicate keys in the right table. Mention both.

The NOT IN trap

The most dangerous NULL bug, and a fixture in interview loops at Meta, Google, and Notion because it catches experienced analysts. If the subquery inside NOT IN returns even one NULL, the entire result is empty.

-- Suppose the subquery returns: {1, 2, NULL}
SELECT * FROM orders
WHERE customer_id NOT IN (SELECT id FROM blacklist);
-- Result: 0 rows

Why? customer_id NOT IN (1, 2, NULL) expands to customer_id <> 1 AND customer_id <> 2 AND customer_id <> NULL. The last term is UNKNOWN for every row, and TRUE AND UNKNOWN is UNKNOWN — no row passes WHERE.

The fix: use NOT EXISTS with a correlated subquery.

SELECT * FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM blacklist b WHERE b.id = o.customer_id
);

The alternative is filtering NULLs from the subquery (WHERE id IS NOT NULL). NOT EXISTS is clearer because "no matching row exists" matches the operator name. Pick one pattern, document it, and stop using NOT IN against nullable subqueries.

NULL in ORDER BY

Default ordering of NULLs varies by engine. PostgreSQL puts NULL last with ASC and first with DESC. Snowflake puts NULL last by default. BigQuery puts NULL first for ASC. Do not rely on the default.

SELECT * FROM users
ORDER BY last_login NULLS LAST;

SELECT * FROM users
ORDER BY last_login DESC NULLS FIRST;

Always specify NULLS FIRST or NULLS LAST on nullable columns. This keeps queries portable across engines and removes the "it worked on my laptop" failure mode when you switch warehouses.

Common pitfalls

The most common pitfall is = NULL instead of IS NULL. The query parses, runs, returns zero rows, and nobody is alerted. The mechanical fix is to always use IS NULL, but the deeper habit is to scan every WHERE clause filtering on a nullable column and verify the operator. Linters in dbt and SQLFluff can flag this on every PR.

The NOT IN trap is a close second. Analysts inherit queries written when the subquery was guaranteed non-NULL, then a schema change introduces nullable rows later and the result silently empties. Standardize on NOT EXISTS as the team default and the bug becomes impossible to reintroduce.

Expecting zero instead of NULL is another silent killer. SUM over an empty set returns NULL, not 0. A "revenue today" tile that has no rows yet will render nothing instead of $0. Wrap with COALESCE(SUM(revenue), 0) whenever the consumer expects a zero on the empty case.

AVG with NULL inputs produces wrong-but-plausible numbers, the worst kind. If 30% of rows have NULL because the metric was not collected that period, AVG quietly averages only the 70% that do. Decide explicitly: NULL counts as zero (use COALESCE(col, 0) inside AVG), NULL means missing (use AVG as-is and report N), or exclude the row entirely (WHERE col IS NOT NULL and document why).

Finally, wrapping an indexed column in COALESCE inside WHERE kills the index. WHERE COALESCE(col, 0) > 10 forces a sequential scan. Rewrite as WHERE col > 10 OR col IS NULL if both branches make sense, or just WHERE col > 10 if NULL rows should be excluded.

If you want to drill SQL questions that probe NULL behavior every day, NAILDD is launching with 500+ SQL problems pulled from real data analyst interview loops at Stripe, Airbnb, Snowflake, Databricks, and Notion.

FAQ

Is NULL the same as an empty string?

No. An empty string '' is a string value of length zero — you can compare it, concatenate with it, use it in functions. NULL is the absence of any value. '' IS NULL returns FALSE on every major engine. Oracle is the historical exception, treating empty strings as NULL for backward compatibility, which trips up queries moving between Oracle and PostgreSQL. Always test on your target engine.

How do I count NULL values in a column?

Use COUNT(*) - COUNT(column). COUNT(*) counts every row; COUNT(column) counts only non-NULL rows. The difference is the NULL count. You can also write SUM(CASE WHEN column IS NULL THEN 1 ELSE 0 END) — more explicit but slower on large tables because it allocates a CASE per row.

Does COALESCE hurt performance?

The function itself is cheap — it short-circuits at the first non-NULL argument. The performance hit comes from wrapping indexed columns in WHERE clauses, because the optimizer can no longer use the index. WHERE COALESCE(col, 0) > 10 forces a sequential scan; WHERE col > 10 OR col IS NULL keeps the index usable. In SELECT lists and aggregations, overhead is essentially zero.

Can a column have NULL as its default value?

Yes — that is the implicit default if you do not specify NOT NULL or a DEFAULT clause. Declare NOT NULL on columns where missing values would be a data quality bug (primary keys, required foreign keys, status fields). Use DEFAULT clauses where a sensible default exists, like created_at or is_active.

Why does NULL = NULL return UNKNOWN instead of TRUE?

Because NULL means "we do not know the value". Two unknowns are not guaranteed to be the same unknown — one might be Alice's missing birth year, the other Bob's, and we cannot say they are equal just because we are equally ignorant of both. The practical consequence is that you need IS NULL, IS NOT DISTINCT FROM, or COALESCE tricks whenever you want NULL-aware equality.

How does NULL behave in GROUP BY?

GROUP BY treats all NULLs in the grouping column as a single group — group by country and 50 rows with country IS NULL collapse into one bucket. This differs from join behavior, where NULL never matches NULL. The asymmetry exists because GROUP BY is set partitioning (every row belongs somewhere) while JOIN is matching (rows only join if they equal another).