May 18, 2026·12 min read

How to detrend a time series in SQL

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Contents:

What detrending is and why it matters
The SQL formula
Differencing vs linear detrend
Detrending and seasonality
Common pitfalls
Optimization tips
Related reading
FAQ

What detrending is and why it matters

Imagine your PM at Stripe pings you on Monday morning: "DAU is up 30 percent year over year, but I want to know what's happening this week after we strip out the growth trend." You stare at a chart that climbs steadily from January and realize that any short-term pattern is hidden under the slope. The DAU line is going up because the product is growing, not because Tuesday is special. Until you remove that long-run drift, every weekly comparison you make will be contaminated.

That drift is called a trend, and the operation of subtracting it from the raw series is detrending. A series with a strong trend behaves badly in almost every downstream analysis. Its autocorrelation function decays slowly because every point is correlated with the next through the shared slope. Regressions between two trending variables produce spurious results — two unrelated metrics that both grow over time will appear correlated. ARIMA models require the input to be stationary, which the raw series rarely is. Seasonality decomposition cannot separate weekly cycles from monthly growth unless the growth is removed first.

In an analytics interview at Airbnb, Uber, or DoorDash, you may be asked to "show weekly seasonality in bookings" or "detect anomalies in checkout conversion." Both questions assume a stationary input. If you skip the detrending step and run a z-score on the raw series, every point at the end of the series will look like a positive anomaly simply because the level is higher. Hiring managers watch for this. The candidate who explicitly removes the trend before running the test signals that they have done time-series work before.

The SQL formula

A linear detrend fits the model y_t = alpha + beta * t to the raw series, then subtracts the fitted line from each observation. The residual is the detrended series. Postgres, Snowflake, BigQuery, and Redshift all expose REGR_SLOPE and REGR_INTERCEPT, which compute the OLS slope and intercept in a single pass. You do not need to materialize a regression library — the database can do it.

The pattern is three CTEs: one to assign a sequential time index, one to compute the slope and intercept, and one to subtract the fitted line.

WITH base AS (
    SELECT
        event_date,
        value,
        ROW_NUMBER() OVER (ORDER BY event_date) AS t
    FROM daily_metrics
    WHERE metric_name = 'dau'
      AND event_date >= DATE '2025-01-01'
),
model AS (
    SELECT
        REGR_SLOPE(value, t)     AS beta,
        REGR_INTERCEPT(value, t) AS alpha
    FROM base
)
SELECT
    b.event_date,
    b.value,
    m.alpha + m.beta * b.t            AS trend,
    b.value - (m.alpha + m.beta * b.t) AS detrended
FROM base b
CROSS JOIN model m
ORDER BY b.event_date;

The detrended column is what you feed into the next stage of analysis — anomaly detection, autocorrelation, or seasonality decomposition. The trend column is the deterministic component you have just identified.

If your warehouse does not expose REGR_SLOPE directly, you can compute it from first principles using AVG, SUM, and COUNT. The OLS slope is Cov(y, t) / Var(t) and the intercept is mean(y) - beta * mean(t). Both quantities are one-pass aggregates, so this trick costs you a single scan.

WITH base AS (
    SELECT
        event_date,
        value,
        ROW_NUMBER() OVER (ORDER BY event_date) AS t
    FROM daily_metrics
    WHERE metric_name = 'dau'
),
stats AS (
    SELECT
        AVG(t)            AS t_mean,
        AVG(value)        AS y_mean,
        AVG(t * value)    AS ty_mean,
        AVG(t * t)        AS t2_mean
    FROM base
),
model AS (
    SELECT
        (ty_mean - t_mean * y_mean) / NULLIF(t2_mean - t_mean * t_mean, 0) AS beta,
        y_mean - ((ty_mean - t_mean * y_mean) / NULLIF(t2_mean - t_mean * t_mean, 0)) * t_mean AS alpha
    FROM stats
)
SELECT
    b.event_date,
    b.value,
    m.alpha + m.beta * b.t            AS trend,
    b.value - (m.alpha + m.beta * b.t) AS detrended
FROM base b
CROSS JOIN model m
ORDER BY b.event_date;

For a non-linear trend, swap the linear model for a polynomial. A quadratic fit uses t and t * t as features and requires solving a 2x2 normal equation. Most analysts who reach this point switch to LOESS in Python, but for an interview question a polynomial detrend in SQL is fair game and shows familiarity with the closed-form solution.

Differencing vs linear detrend

Detrending is not the same as differencing, and confusing the two is the fastest way to lose marks in a senior analytics interview. Differencing replaces each observation with the gap to the previous observation: y_t - y_{t-1}. It works when the trend is stochastic — when the level wanders without a fixed slope, as in a random walk. Linear detrend works when the trend is deterministic — when the underlying growth is a steady slope.

SELECT
    event_date,
    value,
    -- Differencing: handles stochastic trend
    value - LAG(value, 1) OVER (ORDER BY event_date) AS first_diff,
    -- Linear detrend: handles deterministic trend
    value - (alpha + beta * row_number) AS linear_detrended
FROM joined_with_model
ORDER BY event_date;

Approach	When to use
Differencing	random walk, stochastic trend
Linear detrend	deterministic linear trend
Polynomial	non-linear but smooth trend
LOESS	flexible local fit, expensive in SQL

In practice, for product metrics at companies like Notion or Figma you usually start with linear detrend. If the residual still shows curvature, escalate to a quadratic. If the slope itself changes at a known date — a product relaunch, a pricing change — fit two segments and detrend each separately.

Detrending and seasonality

A common follow-up question is how to handle a series that contains both a trend and a weekly cycle. The canonical answer is STL decomposition: split the series into trend, seasonal, and remainder components. In SQL you can approximate this in three steps. First, run the linear detrend above to remove the long-run slope. Second, group the detrended values by day of week and take the mean — that is your weekly seasonal component. Third, subtract the seasonal component from the detrended series to leave the noise.

WITH detrended AS (
    SELECT
        event_date,
        EXTRACT(DOW FROM event_date)::INT AS dow,
        detrended_value
    FROM step1_detrended
),
seasonal AS (
    SELECT
        dow,
        AVG(detrended_value) AS seasonal_component
    FROM detrended
    GROUP BY dow
)
SELECT
    d.event_date,
    d.detrended_value,
    s.seasonal_component,
    d.detrended_value - s.seasonal_component AS residual
FROM detrended d
JOIN seasonal s USING (dow)
ORDER BY d.event_date;

The residual column is what you actually run anomaly detection on. A z-score on the residual flags days that are unusual after both growth and weekday effects are accounted for. A z-score on the raw series flags Mondays.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

Common pitfalls

The most frequent mistake is fitting a linear detrend to an exponential growth curve. If revenue at OpenAI grows by 8 percent month over month, the underlying process is multiplicative, not additive. A linear fit will sit above the data at the start and end of the window and below it in the middle, leaving a U-shaped residual that mimics seasonality. The fix is to take the log of the series first, then linear-detrend on the log scale. The linear slope on logs is the geometric growth rate.

Another trap is using the detrended series directly as a forecast input without remembering to add the trend back at the end. Detrending is a transformation, not the final answer. You build a model on the residual — ARMA, exponential smoothing, whatever — generate a forecast in residual space, then reverse the transformation by adding alpha + beta * t for future values of t. Teams that forget the reverse step ship forecasts that drift back to zero.

A subtler pitfall is failing to store the alpha and beta parameters. Detrending is reversible only if you keep the model. Six months later, when a new analyst asks why the residual chart looks the way it does, you need to reconstruct the trend line. The convention at most warehouse teams is to write the parameters to a small audit table keyed by metric, fit date, and window. Without that, the operation is opaque.

A fourth issue is a changepoint inside the window. If the trend slope changed in March because of a relaunch, fitting one line across January through June will leave a large residual on either side of the break. The honest fix is to detect the changepoint — visually or with a Chow test — and fit two separate slopes. The lazy fix is to shorten the window to one regime.

Finally, analysts sometimes run a stationarity test like ADF or KPSS on the detrended series and treat passing the test as evidence that detrending is the right approach. The detrended series almost always passes the test by construction. The test tells you whether the residual is stationary, not whether linear detrend was the correct model. To validate the model, plot the residual and look for structure — curvature, level shifts, or remaining cycles.

Optimization tips

For large daily-metric tables the bottleneck is usually the regression scan, not the row arithmetic. If you compute REGR_SLOPE and REGR_INTERCEPT over a multi-year window every time you build a dashboard, you are paying for that scan repeatedly. The fix is to materialize the fitted parameters in a small results table — one row per metric per fit window — and join the trend line in at query time. The materialized table is tiny; the daily table is not.

Partitioning the source table by event_date lets the query pruner read only the window you care about. On Snowflake, clustering on event_date plus a metric_name filter typically cuts scan cost by an order of magnitude. On BigQuery, partition by date and cluster by metric. On Redshift, sort key on event_date and distribute by metric_name if you have many metrics.

If you need to detrend many metrics in one query, push the regression into a GROUP BY metric_name so the database does the math in parallel. The structure is the same — three CTEs — but every aggregate is grouped. Avoid the temptation to loop over metrics in your orchestrator; one parallel query is faster than fifty serial ones.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

What is the difference between detrending and differencing?

Detrending subtracts a fitted trend line and leaves the residual. Differencing replaces each observation with the gap to the previous observation. Detrending is the right tool for a deterministic trend — a steady slope that you can describe with alpha + beta * t. Differencing is the right tool for a stochastic trend — a series that wanders without a fixed slope, as in a random walk. In an ARIMA workflow you often try differencing first because it is parameter-free; in product analytics you often prefer detrending because the fitted slope itself is informative.

When do I need a polynomial detrend instead of linear?

If the residual from a linear detrend shows visible curvature — a smile or a frown shape — the underlying trend is not linear and a quadratic fit will reduce the residual. In practice you almost never go beyond a quadratic. If the curve is more complex than that, the right answer is usually a log transform, a piecewise fit across changepoints, or LOESS in Python rather than a higher-order polynomial in SQL.

Should I log-transform before or after detrending?

Log first, then detrend. If the trend is multiplicative — revenue, active users, GMV during a high-growth phase — the log transform converts the multiplicative trend into an additive one, and a linear detrend on the log scale recovers the geometric growth rate. Detrending first and then logging mixes the operations and gives you a residual that is hard to interpret.

Can I detrend a seasonal series?

Yes, but be deliberate about the order. The standard recipe is to detrend first to remove the long-run slope, then estimate the seasonal component on the detrended series, then subtract the seasonal component to leave the noise. If you try to estimate the seasonal component on the raw series, the trend leaks into the seasonal averages and biases them.

How do I invert the detrend to forecast in the original scale?

Store the alpha and beta from the fit, extend the time index t into the future, and add alpha + beta * t_future back to the residual-space forecast. If you log-transformed the series before detrending, exponentiate after adding the trend back. The full chain is forecast in residual space, add trend, exponentiate — in that order.

How long a window should I fit the trend on?

Long enough that the slope is stable but short enough that you have not crossed a regime change. For most product metrics six to eighteen months is a reasonable range. If you suspect a changepoint — a relaunch, a pricing change, a platform shift — fit the trend on the segment after the change rather than across it. The cost of fitting across a changepoint is a residual full of structure that you will then misread as seasonality.