How to detrend a time series in SQL
Contents:
What detrending is and why it matters
Imagine your PM at Stripe pings you on Monday morning: "DAU is up 30 percent year over year, but I want to know what's happening this week after we strip out the growth trend." You stare at a chart that climbs steadily from January and realize that any short-term pattern is hidden under the slope. The DAU line is going up because the product is growing, not because Tuesday is special. Until you remove that long-run drift, every weekly comparison you make will be contaminated.
That drift is called a trend, and the operation of subtracting it from the raw series is detrending. A series with a strong trend behaves badly in almost every downstream analysis. Its autocorrelation function decays slowly because every point is correlated with the next through the shared slope. Regressions between two trending variables produce spurious results — two unrelated metrics that both grow over time will appear correlated. ARIMA models require the input to be stationary, which the raw series rarely is. Seasonality decomposition cannot separate weekly cycles from monthly growth unless the growth is removed first.
In an analytics interview at Airbnb, Uber, or DoorDash, you may be asked to "show weekly seasonality in bookings" or "detect anomalies in checkout conversion." Both questions assume a stationary input. If you skip the detrending step and run a z-score on the raw series, every point at the end of the series will look like a positive anomaly simply because the level is higher. Hiring managers watch for this. The candidate who explicitly removes the trend before running the test signals that they have done time-series work before.
The SQL formula
A linear detrend fits the model y_t = alpha + beta * t to the raw series, then subtracts the fitted line from each observation. The residual is the detrended series. Postgres, Snowflake, BigQuery, and Redshift all expose REGR_SLOPE and REGR_INTERCEPT, which compute the OLS slope and intercept in a single pass. You do not need to materialize a regression library — the database can do it.
The pattern is three CTEs: one to assign a sequential time index, one to compute the slope and intercept, and one to subtract the fitted line.
WITH base AS (
SELECT
event_date,
value,
ROW_NUMBER() OVER (ORDER BY event_date) AS t
FROM daily_metrics
WHERE metric_name = 'dau'
AND event_date >= DATE '2025-01-01'
),
model AS (
SELECT
REGR_SLOPE(value, t) AS beta,
REGR_INTERCEPT(value, t) AS alpha
FROM base
)
SELECT
b.event_date,
b.value,
m.alpha + m.beta * b.t AS trend,
b.value - (m.alpha + m.beta * b.t) AS detrended
FROM base b
CROSS JOIN model m
ORDER BY b.event_date;The detrended column is what you feed into the next stage of analysis — anomaly detection, autocorrelation, or seasonality decomposition. The trend column is the deterministic component you have just identified.
If your warehouse does not expose REGR_SLOPE directly, you can compute it from first principles using AVG, SUM, and COUNT. The OLS slope is Cov(y, t) / Var(t) and the intercept is mean(y) - beta * mean(t). Both quantities are one-pass aggregates, so this trick costs you a single scan.
WITH base AS (
SELECT
event_date,
value,
ROW_NUMBER() OVER (ORDER BY event_date) AS t
FROM daily_metrics
WHERE metric_name = 'dau'
),
stats AS (
SELECT
AVG(t) AS t_mean,
AVG(value) AS y_mean,
AVG(t * value) AS ty_mean,
AVG(t * t) AS t2_mean
FROM base
),
model AS (
SELECT
(ty_mean - t_mean * y_mean) / NULLIF(t2_mean - t_mean * t_mean, 0) AS beta,
y_mean - ((ty_mean - t_mean * y_mean) / NULLIF(t2_mean - t_mean * t_mean, 0)) * t_mean AS alpha
FROM stats
)
SELECT
b.event_date,
b.value,
m.alpha + m.beta * b.t AS trend,
b.value - (m.alpha + m.beta * b.t) AS detrended
FROM base b
CROSS JOIN model m
ORDER BY b.event_date;For a non-linear trend, swap the linear model for a polynomial. A quadratic fit uses t and t * t as features and requires solving a 2x2 normal equation. Most analysts who reach this point switch to LOESS in Python, but for an interview question a polynomial detrend in SQL is fair game and shows familiarity with the closed-form solution.
Differencing vs linear detrend
Detrending is not the same as differencing, and confusing the two is the fastest way to lose marks in a senior analytics interview. Differencing replaces each observation with the gap to the previous observation: y_t - y_{t-1}. It works when the trend is stochastic — when the level wanders without a fixed slope, as in a random walk. Linear detrend works when the trend is deterministic — when the underlying growth is a steady slope.
SELECT
event_date,
value,
-- Differencing: handles stochastic trend
value - LAG(value, 1) OVER (ORDER BY event_date) AS first_diff,
-- Linear detrend: handles deterministic trend
value - (alpha + beta * row_number) AS linear_detrended
FROM joined_with_model
ORDER BY event_date;| Approach | When to use |
|---|---|
| Differencing | random walk, stochastic trend |
| Linear detrend | deterministic linear trend |
| Polynomial | non-linear but smooth trend |
| LOESS | flexible local fit, expensive in SQL |
In practice, for product metrics at companies like Notion or Figma you usually start with linear detrend. If the residual still shows curvature, escalate to a quadratic. If the slope itself changes at a known date — a product relaunch, a pricing change — fit two segments and detrend each separately.
Detrending and seasonality
A common follow-up question is how to handle a series that contains both a trend and a weekly cycle. The canonical answer is STL decomposition: split the series into trend, seasonal, and remainder components. In SQL you can approximate this in three steps. First, run the linear detrend above to remove the long-run slope. Second, group the detrended values by day of week and take the mean — that is your weekly seasonal component. Third, subtract the seasonal component from the detrended series to leave the noise.
WITH detrended AS (
SELECT
event_date,
EXTRACT(DOW FROM event_date)::INT AS dow,
detrended_value
FROM step1_detrended
),
seasonal AS (
SELECT
dow,
AVG(detrended_value) AS seasonal_component
FROM detrended
GROUP BY dow
)
SELECT
d.event_date,
d.detrended_value,
s.seasonal_component,
d.detrended_value - s.seasonal_component AS residual
FROM detrended d
JOIN seasonal s USING (dow)
ORDER BY d.event_date;The residual column is what you actually run anomaly detection on. A z-score on the residual flags days that are unusual after both growth and weekday effects are accounted for. A z-score on the raw series flags Mondays.
Common pitfalls
The most frequent mistake is fitting a linear detrend to an exponential growth curve. If revenue at OpenAI grows by 8 percent month over month, the underlying process is multiplicative, not additive. A linear fit will sit above the data at the start and end of the window and below it in the middle, leaving a U-shaped residual that mimics seasonality. The fix is to take the log of the series first, then linear-detrend on the log scale. The linear slope on logs is the geometric growth rate.
Another trap is using the detrended series directly as a forecast input without remembering to add the trend back at the end. Detrending is a transformation, not the final answer. You build a model on the residual — ARMA, exponential smoothing, whatever — generate a forecast in residual space, then reverse the transformation by adding alpha + beta * t for future values of t. Teams that forget the reverse step ship forecasts that drift back to zero.
A subtler pitfall is failing to store the alpha and beta parameters. Detrending is reversible only if you keep the model. Six months later, when a new analyst asks why the residual chart looks the way it does, you need to reconstruct the trend line. The convention at most warehouse teams is to write the parameters to a small audit table keyed by metric, fit date, and window. Without that, the operation is opaque.
A fourth issue is a changepoint inside the window. If the trend slope changed in March because of a relaunch, fitting one line across January through June will leave a large residual on either side of the break. The honest fix is to detect the changepoint — visually or with a Chow test — and fit two separate slopes. The lazy fix is to shorten the window to one regime.
Finally, analysts sometimes run a stationarity test like ADF or KPSS on the detrended series and treat passing the test as evidence that detrending is the right approach. The detrended series almost always passes the test by construction. The test tells you whether the residual is stationary, not whether linear detrend was the correct model. To validate the model, plot the residual and look for structure — curvature, level shifts, or remaining cycles.
Optimization tips
For large daily-metric tables the bottleneck is usually the regression scan, not the row arithmetic. If you compute REGR_SLOPE and REGR_INTERCEPT over a multi-year window every time you build a dashboard, you are paying for that scan repeatedly. The fix is to materialize the fitted parameters in a small results table — one row per metric per fit window — and join the trend line in at query time. The materialized table is tiny; the daily table is not.
Partitioning the source table by event_date lets the query pruner read only the window you care about. On Snowflake, clustering on event_date plus a metric_name filter typically cuts scan cost by an order of magnitude. On BigQuery, partition by date and cluster by metric. On Redshift, sort key on event_date and distribute by metric_name if you have many metrics.
If you need to detrend many metrics in one query, push the regression into a GROUP BY metric_name so the database does the math in parallel. The structure is the same — three CTEs — but every aggregate is grouped. Avoid the temptation to loop over metrics in your orchestrator; one parallel query is faster than fifty serial ones.
Related reading
- SQL window functions interview questions
- How to calculate autocorrelation in SQL
- How to calculate cross-correlation in SQL
- How to calculate day-of-week analysis in SQL
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
What is the difference between detrending and differencing?
Detrending subtracts a fitted trend line and leaves the residual. Differencing replaces each observation with the gap to the previous observation. Detrending is the right tool for a deterministic trend — a steady slope that you can describe with alpha + beta * t. Differencing is the right tool for a stochastic trend — a series that wanders without a fixed slope, as in a random walk. In an ARIMA workflow you often try differencing first because it is parameter-free; in product analytics you often prefer detrending because the fitted slope itself is informative.
When do I need a polynomial detrend instead of linear?
If the residual from a linear detrend shows visible curvature — a smile or a frown shape — the underlying trend is not linear and a quadratic fit will reduce the residual. In practice you almost never go beyond a quadratic. If the curve is more complex than that, the right answer is usually a log transform, a piecewise fit across changepoints, or LOESS in Python rather than a higher-order polynomial in SQL.
Should I log-transform before or after detrending?
Log first, then detrend. If the trend is multiplicative — revenue, active users, GMV during a high-growth phase — the log transform converts the multiplicative trend into an additive one, and a linear detrend on the log scale recovers the geometric growth rate. Detrending first and then logging mixes the operations and gives you a residual that is hard to interpret.
Can I detrend a seasonal series?
Yes, but be deliberate about the order. The standard recipe is to detrend first to remove the long-run slope, then estimate the seasonal component on the detrended series, then subtract the seasonal component to leave the noise. If you try to estimate the seasonal component on the raw series, the trend leaks into the seasonal averages and biases them.
How do I invert the detrend to forecast in the original scale?
Store the alpha and beta from the fit, extend the time index t into the future, and add alpha + beta * t_future back to the residual-space forecast. If you log-transformed the series before detrending, exponentiate after adding the trend back. The full chain is forecast in residual space, add trend, exponentiate — in that order.
How long a window should I fit the trend on?
Long enough that the slope is stable but short enough that you have not crossed a regime change. For most product metrics six to eighteen months is a reasonable range. If you suspect a changepoint — a relaunch, a pricing change, a platform shift — fit the trend on the segment after the change rather than across it. The cost of fitting across a changepoint is a residual full of structure that you will then misread as seasonality.