How to calculate Cohort Decay in SQL
Contents:
What Cohort Decay actually answers
The growth lead at Notion drops a Slack message on a Tuesday afternoon: "Our retention curves look fine but the finance team thinks LTV is overstated. Can you tell me how fast each cohort is decaying and whether the slope is getting better or worse?" That is the question Cohort Decay answers, and it is the question that separates analysts who ship from analysts who get politely sidelined into dashboards-only work.
Cohort Decay is the rate at which a cohort thins out from one period to the next. A retention curve plots survivors at week one, week two, week three. Decay is the first derivative of that curve — how much you lose between consecutive points. Stated in numbers it sounds like "10% weekly decay" or "half-life of 12 weeks". Stated in business terms it tells the CFO whether the LTV number in the board deck is conservative or fantasy.
This metric shows up in product analytics interviews at Stripe, Airbnb, DoorDash, Linear, and Snowflake whenever the take-home includes a retention dataset. Junior candidates compute retention and stop. Senior candidates compute decay, compare it across acquisition channels, and call out the cohorts whose decay is accelerating. The senior framing is what this post teaches, with SQL you can paste into a Postgres or Snowflake console and adapt for the dialect you happen to be interviewing on.
The base retention table
Decay is derived, so you build the cohort retention table first and then compute differences on top of it. The retention table has one row per acquisition cohort per week offset, with a survival ratio in the last column. A clean cohort is one weekly bucket — daily is too noisy for early-stage products, monthly hides the early drop where most churn lives.
The first CTE assigns every user to their acquisition cohort. The second joins the event stream and computes how many weeks each event sits after signup. The final SELECT divides each week's active count by week zero to produce a ratio bounded between zero and one.
WITH cohort_week AS (
SELECT
DATE_TRUNC('week', u.created_at)::DATE AS cohort_week,
u.user_id
FROM users u
WHERE u.created_at >= CURRENT_DATE - INTERVAL '24 weeks'
),
activity AS (
SELECT
cw.cohort_week,
cw.user_id,
FLOOR((a.event_date - cw.cohort_week) / 7)::INT AS week_offset
FROM cohort_week cw
JOIN product_events a ON a.user_id = cw.user_id
WHERE a.event_date >= cw.cohort_week
)
SELECT
cohort_week,
week_offset,
COUNT(DISTINCT user_id) AS active_users,
COUNT(DISTINCT user_id)::NUMERIC
/ FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (
PARTITION BY cohort_week ORDER BY week_offset
) AS retention
FROM activity
WHERE week_offset BETWEEN 0 AND 12
GROUP BY cohort_week, week_offset
ORDER BY cohort_week, week_offset;Read it carefully. FIRST_VALUE(...) OVER (PARTITION BY cohort_week ORDER BY week_offset) always returns the week-zero count for each cohort, so the division produces a clean retention ratio starting at 1.0 at week zero and falling each week thereafter. The 24-week window keeps the result set bounded and protects you from scanning the entire history when you only need a recent trend. Save the output as a view or temp table — every block below assumes it exists under the name cohort_retention.
Weekly decay rate
Weekly decay is 1 - retention(t) / retention(t-1). If retention at week 2 is 0.45 and at week 3 is 0.40, decay between those two weeks is 1 - 0.40 / 0.45 = 0.111, or 11.1%. The LAG window function pulls the previous row's retention into the current row, and the arithmetic falls out from there.
WITH decay AS (
SELECT
cohort_week,
week_offset,
retention,
LAG(retention) OVER (
PARTITION BY cohort_week ORDER BY week_offset
) AS prev_retention
FROM cohort_retention
)
SELECT
cohort_week,
week_offset,
retention,
1 - retention / NULLIF(prev_retention, 0) AS weekly_decay_rate
FROM decay
WHERE prev_retention IS NOT NULL
ORDER BY cohort_week, week_offset;NULLIF(prev_retention, 0) is the safety net. If a prior week had no survivors the division would explode; NULLIF returns NULL instead, which propagates harmlessly. The WHERE prev_retention IS NOT NULL drops the week-zero row, where decay is undefined.
The interpretation step is where most analysts stop short. If decay sits stable at around 10% across week offsets, your retention curve is well-behaved geometric decay and your LTV formula is trustworthy. If decay is high in week one and falls each week after — say 35%, then 18%, then 8%, then 4% — you have a typical "leaky bucket then loyal core" shape where geometric LTV understates value. If decay rises over time, something breaks at later lifecycle stages — a renewal flow, a payment retry, a notification that stops firing. That last pattern is the one Cohort Decay was invented to catch.
Cohort half-life
Half-life is the week offset where retention first crosses 50%. It compresses a curve into a single number a PM can quote in a standup. A half-life of 12 weeks for a consumer app is excellent. For a paid B2B SaaS with monthly contracts, half-life of 12 weeks is a fire — you would expect 26 weeks or more.
SELECT
cohort_week,
MIN(week_offset) AS half_life_weeks
FROM cohort_retention
WHERE retention <= 0.5
GROUP BY cohort_week
ORDER BY cohort_week;The MIN(week_offset) plus retention <= 0.5 filter returns the first week the cohort fell below half. If a cohort never falls below 50% in the observation window — common for young cohorts that have not aged enough — the row drops out entirely. Report those cohorts as ">12 weeks" rather than NULL, because a missing row in a dashboard column reads as zero to a tired exec at 9pm.
Plot half-life over time. If half-life trends upward cohort over cohort, retention is improving and you have a story to tell the board. If half-life is stable, the product is mature and the lever is acquisition mix, not retention. If half-life trends down, something in the product or pricing is degrading and you need to find it before the next forecast cycle.
Revenue forecast from decay
If decay is approximately constant, future revenue from a cohort is a geometric series. The closed form is ARPU divided by decay. Plug it into SQL and you have a per-cohort LTV estimate good for the next 30 to 90 days.
WITH cohort_stats AS (
SELECT
cohort_week,
AVG(weekly_decay_rate) AS avg_decay,
AVG(weekly_revenue_per_user) AS arpu
FROM cohort_retention_with_decay
GROUP BY cohort_week
)
SELECT
cohort_week,
arpu,
avg_decay,
-- Geometric series: ARPU * sum (1 - decay)^t = ARPU / decay
arpu / NULLIF(avg_decay, 0) AS ltv_estimate
FROM cohort_stats;The formula ARPU / decay is the geometric-series shortcut for an infinite horizon. It overstates short-horizon LTV slightly and understates long-horizon LTV materially when a cohort has a floor — the loyal 10 to 20% of users who never leave. For tighter forecasts, fit an exponential to the retention curve and add a constant for the floor: retention(t) = floor + (1 - floor) * exp(-lambda * t). SQL is the wrong tool for that fit; pull the retention table into a notebook and use scipy.optimize.curve_fit.
In an interview, write the geometric-series version and say one sentence about the floor. The interviewer will either nod and move on or ask you to elaborate, at which point you sketch the exponential fit. Both outcomes are wins.
Common pitfalls
The most damaging pitfall is computing decay on tiny cohorts. A cohort of 50 users that loses 10 people week-over-week looks like 20% weekly decay, but the noise band on a sample that small is enormous. Filter cohorts under a sensible floor — usually 200 to 500 users for consumer products, 50 for enterprise — and clearly label small cohorts as low-confidence on the dashboard. Acquisition leads will push back on the exclusion until you show them the confidence intervals.
The second pitfall is computing decay on cumulative retention instead of period retention. Cumulative retention is monotonically rising — it counts anyone who ever returned through week N. Period retention is the share active in that specific week. Decay only makes sense on period retention; on cumulative the decay is always negative or zero, which is meaningless. Triple-check which column you are differencing before you ship.
The third pitfall is reporting half-life for cohorts that have not yet crossed 50%. Young cohorts will have a NULL half-life and a literal reading of the dashboard will misclassify them as "infinite retention". Either filter them out, render ">12 weeks", or split the dashboard into "mature cohorts" and "immature cohorts" with explicit labels.
The fourth pitfall is extrapolating decay to infinity. Real products have a floor of users who never churn — they pay for years out of inertia or because the product is genuinely indispensable. A pure geometric formula assumes decay continues forever and converges to zero, which underestimates true LTV on time horizons beyond a year. The Stripe and DoorDash finance teams add a manual floor term; you should too.
The fifth pitfall is averaging decay across segments with wildly different shapes. A paid-search cohort and an organic-search cohort often have completely different decay profiles, and the average decay number is the worst lie in analytics — it describes neither group. Always break decay out by acquisition channel, plan tier, geography, or whatever segmentation explains variance in your data, and only report a global decay number with a sentence about the spread.
Optimization tips
For large event tables, the retention CTE is the bottleneck. Partition the product_events table by event_date — monthly partitions on Postgres, native partitioning on Snowflake and BigQuery. A query for the last 24 weeks of cohorts then reads six partitions instead of the full table. On a five-billion-row event stream this turns a five-minute query into thirty seconds.
The second lever is a pre-aggregated weekly_active_users rollup refreshed by an overnight dbt model, shaped (user_id, active_week, key_action_count). The cohort retention CTE joins to that table instead of raw events, cutting compute by another order of magnitude. This is the standard pattern on Databricks and the reason your data engineering team built it — use it.
The third lever is materializing the cohort_retention view itself. If five different decay queries reference it every morning, recomputing the CTE five times is wasteful. Materialize as a daily-refreshed table; downstream decay, half-life, and LTV queries all become trivial.
Related reading
- How to calculate Churn in SQL
- How to calculate ARPU in SQL
- How to calculate Active Cohort in SQL
- How to calculate Churn Risk Score in SQL
- SQL window functions interview questions
If you want to drill cohort SQL questions like this daily, NAILDD is launching with 500+ SQL problems on exactly this pattern.
FAQ
What counts as a "good" weekly decay rate?
It depends entirely on product category and lifecycle stage. SaaS products often see week-one decay of 30 to 40% as casual signups bounce, then a long tail of 5 to 10% weekly decay as the surviving cohort settles into habit. Consumer social and gaming products tend to start with steeper week-one decay — 50% or higher is common — and reach a floor faster. The number that matters is the slope between weeks 4 and 12, where the geometric assumption is most useful for LTV.
Is a 12-week half-life impressive?
For consumer mobile apps where most installs never open the app twice, a 12-week half-life is excellent. For paid B2B SaaS where customers signed a contract, a 12-week half-life is alarming — you would expect 26 weeks or more before half a cohort lapses. Compare half-life only to your own historical cohorts and to public benchmarks for your exact segment; cross-category comparisons are misleading.
My decay is rising over time — should I worry?
Almost always yes. Rising decay means late-stage users are churning faster than they used to, which usually signals a broken renewal flow, a pricing change that bit deeper than expected, or a degraded core feature. Pull the affected cohorts apart by plan tier and segment, and look at the support tickets and product events for the weeks where decay jumps. Rising decay is the canary; the cause is upstream.
Can I compute decay on DAU instead of weekly cohorts?
Yes, with care. Daily decay is noisier — weekend effects, payday cycles, and timezone clustering all show up. Smooth with a seven-day rolling average before differencing, or step up to weekly cohorts unless you have a specific reason for daily granularity. Daily decay is most useful for live products with extremely high engagement where weekly granularity hides the signal.
Is the geometric LTV formula good enough for a board deck?
For a 30 to 90-day horizon, yes, with a footnote on the floor assumption. For an annual LTV projection used in a fundraise or budget, no — fit an exponential plus floor on the actual retention curve, or use a probabilistic model like BG/NBD. Geometric LTV understates value on long horizons because it assumes decay continues forever, and the gap between geometric and floor-adjusted LTV widens with time.
How many cohorts do I need before decay stabilizes?
For weekly cohorts in a consumer product, eight to twelve cohorts with at least 500 users each is usually enough to see a stable average decay and detect a trend. For enterprise SaaS with smaller cohort sizes, you may need to aggregate quarterly cohorts and accept a slower feedback loop. Reporting decay on a single cohort is almost never useful — show the trend across cohorts so the audience sees the slope, not a point estimate.