May 18, 2026·11 min read

How to calculate IQR outliers in SQL

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Contents:

Why IQR for outlier detection
The 1.5xIQR rule
The SQL formula
Outliers by segment
IQR vs z-score
Common pitfalls
Optimization tips
Related reading
FAQ

Why IQR for outlier detection

Picture a Monday morning at Stripe: the payments PM pings you because the weekend basket-size dashboard looks suspicious. Two orders north of forty thousand dollars dragged the average up, finance is already drafting a memo about an unusual revenue spike, and you have ninety minutes to confirm whether those rows are real. The mean is useless because the very rows under suspicion are pulling it. Standard deviation is contaminated by the same rows. You need a detector that does not care that the tail is long, and that is what the interquartile range delivers.

IQR is the gap between the 25th percentile (Q1) and the 75th percentile (Q3). It is a robust measure of spread: by construction, the top and bottom 25 percent of the data are thrown away before spread is measured, so a single screaming outlier cannot shift the result. That property is why every boxplot uses IQR rather than mean and standard deviation, and it is what makes the 1.5xIQR rule reliable on the skewed, right-tailed distributions you see in production — revenue, response time, basket size, session length.

The contrast with z-score is worth holding in your head before you start writing SQL. A z-score detector estimates mean and standard deviation, then flags anything more than three standard deviations away. On a symmetric distribution like adult height it works great. On a right-skewed distribution like order value, both estimators are biased by the tail you are trying to detect. IQR sidesteps that bias, which is why interviewers at Snowflake, Databricks, and Airbnb keep asking candidates to implement it in SQL.

The 1.5xIQR rule

Compute Q1 and Q3, take their difference as IQR, then flag any value below Q1 minus 1.5 times IQR or above Q3 plus 1.5 times IQR.

IQR         = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outlier     = value < lower_bound OR value > upper_bound

The multiplier 1.5 is the Tukey default and corresponds to the whiskers on a standard boxplot. To catch only extreme cases, swap it for 3.0 — Tukey himself called those "far outliers". In a well-behaved Gaussian the 1.5 rule catches around 0.7 percent of values; on heavy-tailed business metrics it will catch noticeably more, which is the intended behaviour.

The SQL formula

Below is the full recipe in PostgreSQL-flavoured SQL. The same pattern works in Snowflake, BigQuery (use APPROX_QUANTILES), and Databricks (use PERCENTILE_CONT or the approximate variant). The two-CTE structure makes the boundaries visible if your PM asks what threshold you used.

WITH quartiles AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
    FROM orders
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
),
bounds AS (
    SELECT
        q1,
        q3,
        q3 - q1                       AS iqr,
        q1 - 1.5 * (q3 - q1)          AS lower_bound,
        q3 + 1.5 * (q3 - q1)          AS upper_bound
    FROM quartiles
)
SELECT
    o.order_id,
    o.user_id,
    o.amount,
    CASE
        WHEN o.amount < b.lower_bound THEN 'low_outlier'
        WHEN o.amount > b.upper_bound THEN 'high_outlier'
        ELSE 'normal'
    END AS verdict
FROM orders o
CROSS JOIN bounds b
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days'
  AND (o.amount < b.lower_bound OR o.amount > b.upper_bound)
ORDER BY o.amount DESC
LIMIT 100;

Three details are worth narrating. First, PERCENTILE_CONT interpolates linearly between adjacent sorted values, which is what every textbook IQR definition assumes. PERCENTILE_DISC picks an actual sample value and gives slightly different numbers; for continuous metrics, stick with CONT. Second, the CROSS JOIN against a one-row bounds CTE broadcasts thresholds across every order without window functions. Third, the same WHERE filter appears in both CTEs so bounds and candidates come from the same window — drift between those windows is the most common bug in this pattern.

If your warehouse charges per scan, collapse the recipe into a single query with window functions:

SELECT
    order_id,
    user_id,
    amount,
    CASE
        WHEN amount < q1 - 1.5 * (q3 - q1) THEN 'low_outlier'
        WHEN amount > q3 + 1.5 * (q3 - q1) THEN 'high_outlier'
        ELSE 'normal'
    END AS verdict
FROM (
    SELECT
        o.*,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) OVER () AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) OVER () AS q3
    FROM orders o
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
) t
ORDER BY amount DESC
LIMIT 100;

Window-function percentiles are not supported on every engine — BigQuery requires APPROX_QUANTILES instead — so test before committing to this shape.

Outliers by segment

Global bounds rarely survive contact with reality. A DoorDash basket-size detector with one threshold across all restaurants flags every fine-dining order as suspicious and misses fraud in the dollar-store category. Compute bounds per segment, then join each row back to the bounds for its segment.

WITH segment_quartiles AS (
    SELECT
        u.plan,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY o.amount) AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY o.amount) AS q3
    FROM orders o
    JOIN users u USING (user_id)
    WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY u.plan
),
bounds AS (
    SELECT
        plan,
        q1 - 1.5 * (q3 - q1) AS lower_bound,
        q3 + 1.5 * (q3 - q1) AS upper_bound
    FROM segment_quartiles
)
SELECT
    b.plan,
    COUNT(*) FILTER (
        WHERE o.amount < b.lower_bound OR o.amount > b.upper_bound
    ) AS outliers,
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (
        WHERE o.amount < b.lower_bound OR o.amount > b.upper_bound
    ) * 100.0 / NULLIF(COUNT(*), 0) AS outlier_pct
FROM orders o
JOIN users u USING (user_id)
JOIN bounds b USING (plan)
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY b.plan
ORDER BY outlier_pct DESC;

Read that result carefully. If one segment shows an outlier rate dramatically above the others — say, ten percent on Premium versus one percent everywhere else — the right next step is rarely "clean the data". It is "open a ticket". Either pricing changed, a refund batch was reprocessed, or a fraud ring is testing limits. The detector did its job; the investigation begins now.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

IQR vs z-score

A useful interview answer is to compute both side by side.

WITH stats AS (
    SELECT
        AVG(amount)                                                  AS mu,
        STDDEV_SAMP(amount)                                          AS sigma,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount)         AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount)         AS q3
    FROM orders
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    o.order_id,
    o.amount,
    ABS((o.amount - s.mu) / NULLIF(s.sigma, 0)) > 3                         AS z_outlier,
    (o.amount < s.q1 - 1.5 * (s.q3 - s.q1)
        OR o.amount > s.q3 + 1.5 * (s.q3 - s.q1))                           AS iqr_outlier
FROM orders o
CROSS JOIN stats s
ORDER BY o.amount DESC
LIMIT 50;

On right-skewed data, IQR will typically flag more rows than z-score, because the standard deviation is inflated by the same tail z-score is supposed to detect. On near-symmetric data the two detectors will agree on most rows. When they disagree, IQR is the one to trust on raw business metrics; z-score is the one to trust on signals that have been log-transformed first.

Common pitfalls

The first trap is running IQR on a sample with too few rows. With fewer than fifty observations, Q1 and Q3 are themselves noisy estimates, and the 1.5xIQR bound jitters from refresh to refresh. The fix is either to widen the window until you have enough data, or to switch to a more robust estimator like the median absolute deviation (MAD). If you must use IQR on small samples — for example for a per-merchant detector with only a handful of orders — pair the threshold with a minimum-rows gate so you only emit verdicts when n is large enough.

The second trap is applying IQR to categorical or boolean fields. PERCENTILE_CONT will happily run on an enum encoded as integers and return a number, but the number is meaningless because there is no notion of "1.5 times the gap between low-mid and high-mid" on a nominal scale. Restrict IQR to ordinal and continuous variables: amount, duration, count, score. For categorical fields, use frequency-based detectors instead.

The third trap is treating every flagged low value as a typo and deleting it. On an orders table, low outliers are usually refunds, test transactions, or partial payments — exactly the rows your investigation needs. Silent deletion is the easiest way to ship a dashboard that lies. Quarantine first, investigate next, delete only after the source has been confirmed.

The fourth trap is reusing the 1.5 multiplier across every domain. Financial transactions, especially anything routed through a payments provider like Stripe, have heavy fat tails by design — wires, B2B invoices, treasury movements all sit far above the consumer median. A 1.5xIQR threshold will flag them as outliers even though they are entirely legitimate. Use 3xIQR for finance, fraud investigations, or any domain where the legitimate maximum is genuinely much larger than the typical value.

The fifth trap is running IQR on a multi-modal distribution. If your orders table mixes free-trial conversions sitting at one dollar with enterprise contracts sitting at fifty thousand, the IQR will straddle the gap between modes and the bounds will land on nothing meaningful. Segment first — by plan, by channel, by country — and only then apply 1.5xIQR within each segment.

Optimization tips

PERCENTILE_CONT is a sort-and-interpolate operation, so on a billion-row table it can dominate the query cost. Three optimizations help. First, push the time-window filter into the CTE that computes the quartiles, not into the outer SELECT — most engines will not reorder the predicate across an aggregate by default. Second, on Snowflake and Databricks, prefer APPROX_PERCENTILE for exploratory work and reserve PERCENTILE_CONT for production cuts where the small extra accuracy matters. Third, materialize daily quartiles into a small iqr_thresholds table partitioned by date and segment — recomputing them once per day is cheap, and downstream detectors just look up the bounds.

Indexing helps less than people expect — percentile aggregations need the full sorted list anyway. The bigger win is partitioning the source table by created_at so the WHERE filter can prune partitions. On BigQuery, add a clustering key on the segment column; APPROX_QUANTILES respects clustering and scans dramatically less.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

When should I prefer IQR over z-score?

Prefer IQR whenever the distribution is skewed or heavy-tailed, which covers nearly every business metric you will encounter — revenue, basket size, response time, session length. Z-score is built on the mean and standard deviation, both of which are pulled by the same extreme values the detector is supposed to catch, so it ends up half-blind on the data that needs it most. Z-score is fine on near-symmetric, lightly-tailed distributions or on signals that have been log-transformed first, but IQR is the safer default on raw business data.

Should I use 1.5 or 3.0 as the multiplier?

Use 1.5 for standard outlier detection on operational metrics. Switch to 3.0 when you specifically want only the extreme tail — fraud investigations, finance reconciliations, or any case where the legitimate maximum is known to be far above the typical value. A useful production pattern is to emit two flags: one at 1.5 ("watchlist") and one at 3.0 ("escalate"). That gives the on-call team a triage signal without forcing a single threshold to do two jobs.

Can I apply IQR to a time series?

You can, but only on a rolling window — a global IQR over a long history will get distorted by trend and seasonality. The typical pattern is a 28-day rolling Q1 and Q3 computed with window functions, with the bounds applied to the next day's points. For most time series, though, a rolling MAD or a seasonal naive forecast residual will outperform a rolling IQR, because they handle weekly seasonality explicitly. IQR on a rolling window is a good baseline; it is rarely the final detector.

What outlier rate should I expect?

On a clean normal distribution, the 1.5xIQR rule flags about 0.7 percent of rows. On real business data — right-skewed revenue, log-normal latency, mixture distributions — expect anywhere from 2 to 8 percent. If your rate is materially above that, the data is either truly heavy-tailed (use 3.0 or segment first) or contaminated by a process bug (investigate the source). If the rate is zero, the detector is probably broken — almost no real dataset has zero outliers.

How does IQR handle NULL values?

PERCENTILE_CONT ignores NULLs natively in every major SQL engine — Postgres, Snowflake, BigQuery, Databricks, Redshift. You do not need a separate WHERE filter to exclude them from the quartile calculation. Do still check for unexpected NULL rates upstream, because a sudden jump in NULL share is itself an anomaly signal worth catching with a separate data-quality check, separate from outlier detection on the values themselves.

Does IQR work on integer counts?

Yes, but be careful when the variable has a small range — counts of one to five, for instance. PERCENTILE_CONT will interpolate to non-integer percentiles, which is mathematically fine but produces awkward bounds like 0.5 and 5.5. For low-cardinality integers, either round the bounds or switch to PERCENTILE_DISC so thresholds land on actual observed values. Either is acceptable; just be explicit about which convention you used when the data scientist downstream asks.