How to calculate CAC recovery in SQL
Contents:
What CAC recovery measures and why payback alone is not enough
Your finance lead at Stripe pings you on Slack on a Sunday night: "We need to know how fast the Q1 paid-search cohort is paying us back, by Monday standup." You can hand them a single payback number, say 9 months, and that answer is technically correct, but useless for a growth decision. Payback is a point estimate. It tells the team when the line crosses 100 percent, and hides everything before and after that moment.
CAC recovery is the curve, not the point. For each cohort and each month since acquisition, recovery is the share of customer acquisition cost the cohort has already returned through revenue. Two cohorts can both reach 100 percent recovery at month 9, yet one of them might already be at 60 percent by month 3 while the other is still scraping along at 20 percent. The first cohort is the better cohort, because the business gets cash back faster and can recycle it into the next month of acquisition spend.
The other reason recovery matters is that it makes channel mix visible. A blended payback of 9 months can hide an organic channel hitting 200 percent by month 6 and a paid social channel plateauing at 70 percent. Plot recovery curves by channel and the bad channels reveal themselves. This question comes up on growth analyst loops at DoorDash, Airbnb, and Notion, so it is worth being fluent with the SQL.
The SQL formula
The math is short and the SQL is short, but you have to be careful about what goes into the numerator and the denominator. The numerator is cumulative revenue per user for the cohort up to month k since acquisition. The denominator is the CAC for that same cohort, computed as total acquisition spend divided by acquired users in the cohort month. Recovery at month k is the ratio. A value of one means the cohort has fully paid back its acquisition cost. Values above one are profit on top of the original spend.
Assume you already have a cumulative LTV table by cohort and a CAC per cohort table. The cumulative LTV table holds, for each cohort month and each month-since-acquisition, the per-user cumulative revenue earned so far. The CAC table holds, for each cohort month, the blended CAC that was paid to acquire that cohort. Joining them on cohort_month gives you the recovery curve in one shot.
WITH ltv AS (
SELECT cohort_month, months_since, cumulative_ltv
FROM ltv_table
),
cac AS (
SELECT cohort_month, cac
FROM cac_per_cohort
)
SELECT
l.cohort_month,
l.months_since,
l.cumulative_ltv,
c.cac,
l.cumulative_ltv / NULLIF(c.cac, 0) AS recovery_ratio,
l.cumulative_ltv / NULLIF(c.cac, 0) * 100.0 AS recovery_pct
FROM ltv l
JOIN cac c USING (cohort_month)
ORDER BY l.cohort_month, l.months_since;The NULLIF(c.cac, 0) is not paranoia. There is always one row in production where CAC is zero because the cohort came from a referral push or a viral spike, and without the guard the whole report crashes on a divide-by-zero. In a Looker or Metabase dashboard this becomes a line per cohort against months-since-acquisition with a 100 percent reference line. The shape tells the story: a steep early curve means fast payback, a flat plateau means a cohort that will never get there.
If you do not already have a cumulative LTV table, build it on the fly with a window function. The classic interview version looks like this:
WITH revenue_by_cohort AS (
SELECT
u.cohort_month,
DATE_DIFF('month', u.cohort_month, t.txn_month) AS months_since,
SUM(t.gross_revenue) / COUNT(DISTINCT u.user_id) AS arpu_in_month
FROM users u
JOIN transactions_monthly t USING (user_id)
GROUP BY u.cohort_month, DATE_DIFF('month', u.cohort_month, t.txn_month)
)
SELECT
cohort_month,
months_since,
SUM(arpu_in_month) OVER (
PARTITION BY cohort_month
ORDER BY months_since
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_ltv
FROM revenue_by_cohort;The SUM ... OVER (PARTITION BY cohort_month ORDER BY months_since) is the building block. If you are shaky on window frames, that is one of the most common ways the question gets twisted on an interview loop.
Recovery by channel
The same shape extends naturally to channels. Replace the cohort-only join with a join on cohort plus channel, and aggregate recovery across cohorts to get an average curve per acquisition source. The trick is to only include cohorts that are old enough to have observed the months-since-acquisition value you are aggregating, otherwise you are averaging in zeros.
SELECT
utm_source,
months_since,
AVG(cumulative_ltv / NULLIF(cac, 0)) * 100.0 AS avg_recovery_pct
FROM cohort_ltv_with_cac
WHERE cohort_month BETWEEN DATE '2025-01-01' AND DATE '2025-06-30'
GROUP BY utm_source, months_since
ORDER BY utm_source, months_since;Plotting one line per utm_source against months_since is the standard growth-team chart reviewed every quarter at companies like Airbnb and DoorDash. Organic and direct climb fast, paid search lands in the middle, paid social often hangs below the rest. That is the visualization that gets the budget moved.
There is a subtle accounting decision baked into this query. Because you are averaging cumulative_ltv / cac across cohorts, each cohort gets equal weight regardless of size. If one cohort has 50 users and another 5000, the small one swings the average. Switch to a weighted average using cohort size as the weight, or use sum-of-numerator over sum-of-denominator instead.
Recovery vs payback in one query
When the executive review meeting starts, the question is rarely just "what is the recovery curve". It is "what is the recovery curve and the payback month and the long-run multiple, all on one row, for every cohort and every channel". The pattern you want is MIN(months_since) FILTER (WHERE recovery_pct >= 100) to find the payback month and MAX(recovery_pct) to find the final long-run recovery.
WITH dashboard AS (
SELECT
cohort_month,
utm_source,
months_since,
cumulative_ltv / NULLIF(cac, 0) * 100.0 AS recovery_pct
FROM cohort_ltv_with_cac
)
SELECT
cohort_month,
utm_source,
MIN(months_since) FILTER (WHERE recovery_pct >= 100) AS payback_month,
MAX(recovery_pct) AS final_recovery
FROM dashboard
GROUP BY cohort_month, utm_source
ORDER BY cohort_month, utm_source;A final_recovery of 250 means a cohort is producing roughly 2.5 times its acquisition cost in revenue across its observed lifetime. A final_recovery of 80 after 24 months of data is the cohort that is, as the finance team will dryly note, never going to pay you back. That cohort needs a different fix, usually a pricing change or a channel cut, not more spend.
Common pitfalls
The first pitfall is computing recovery on gross revenue when finance reports it on gross margin. Revenue-based recovery looks great in your dashboard and then collapses in the boardroom because the CFO is netting out hosting, payments, and refunds before checking payback. To stay consistent, multiply cumulative_ltv by your gross margin percentage, or build a cumulative_margin table alongside the cumulative LTV one. The query shape is unchanged. The number gets uglier and more honest.
The second pitfall is comparing young and old cohorts on the absolute months-since-acquisition axis. A cohort that is three months old physically cannot have a 12-month recovery value. If you join it onto a chart with cohorts that have 18 months of history, the young cohort line just stops, and the eye reads "this cohort died". Either truncate every cohort to the same maximum age before plotting, or use a stacked small-multiples view, one panel per cohort, so the comparison is honest.
The third pitfall is including churned users in the denominator of the per-user calculation as if they were still active. Survivor-only recovery, where the denominator is "users still subscribed at month k", looks much rosier than total-cohort recovery, where you keep everyone in the denominator and assign zero revenue to the churned. Both are legitimate views and they answer different questions. The mistake is picking one silently and not telling the consumer of the chart which one they are looking at. Label the axis explicitly.
The fourth pitfall is using a single blended CAC for every channel inside a cohort. If a January cohort of 100 users came in across five paid channels with very different costs per acquisition, the blended CAC will look fine, but every channel-level recovery curve you compute on top of it is wrong. You need cohort-by-channel CAC, computed as channel spend in the cohort month divided by users acquired through that channel in that month. Build the granular table once and reuse it.
The fifth pitfall is ignoring foreign exchange when cohorts span years or revenue lives in multiple currencies. A 24-month recovery window picks up real changes in the dollar value of euro revenue. Normalize to a single reporting currency at a fixed reference rate, or you will spend a meeting arguing about whether recovery moved or the exchange rate did.
Optimization tips
Cumulative LTV tables get expensive fast because every cohort-month row carries a running total. On Snowflake or Databricks the first easy win is to materialize the cohort-level cumulative LTV as a daily incremental table rather than recomputing it from raw transactions every time the dashboard loads. If you keep the raw transactions partitioned by transaction month, an incremental refresh only scans the latest month and updates the running sums for the affected cohorts.
The second optimization is pre-aggregating per-user revenue to monthly grain before you ever touch the cohort calculation. Doing the SUM once at user-month level and then re-aggregating to cohort-month-since is dramatically cheaper than streaming through raw event-level rows for every dashboard query. This matters most on platforms billed by compute time.
The third optimization is to lean on ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW carefully. On very wide cohort windows, switching to a precomputed prefix-sum column inside the materialized cohort table beats recomputing the window aggregate on every query. The dashboard then becomes a cheap lookup, and you pay the heavy compute once during the nightly job.
The fourth optimization is to keep CAC per cohort and cumulative LTV tables joinable on the same partition key, typically cohort_month. Co-locating partitions removes a shuffle from the join, which on tables with channel breakouts is often the slowest step.
Related reading
- How to calculate blended CAC in SQL
- How to calculate churn in SQL
- SQL window functions interview questions
If you want to drill SQL problems that mirror real growth-analyst interview loops every day, NAILDD is launching with 500+ curated SQL problems built around exactly this kind of cohort and recovery work.
FAQ
Is hitting 100 percent recovery the same thing as payback?
Yes. Recovery is the continuous curve, and the moment it crosses 100 percent is the discrete payback month. Some teams quote the curve, some teams quote the point, but they are mathematically the same event. If you are presenting to a finance audience that lives in payback periods, hand them the month value. If you are presenting to growth or product, hand them the curve, because it shows whether the cohort is improving over time or stagnating just under the line.
What recovery numbers are considered healthy?
For a typical SaaS business, healthy paid cohorts hit 80 to 150 percent recovery by month 12 and push past 200 percent by month 24. Consumer apps with monthly churn behave differently and often see the curve flatten earlier. The single most useful comparison is not to an industry benchmark but to your own prior cohorts on the same channel. If the December cohort recovers faster than the November cohort at every month-since value, your acquisition is improving regardless of where the absolute number lands.
What does it mean if recovery plateaus at 70 percent and never moves?
It means the cohort is structurally unprofitable on a fully loaded basis. The fix is rarely more retention work. It is usually either a price increase that lifts revenue per user, a CAC reduction for that channel through better targeting, or a decision to stop spending on that channel entirely. Trying to grow your way out of a cohort that plateaus below 100 percent is how growth teams burn through runway.
Why would recovery keep climbing after the obvious plateau month?
Expansion revenue and renewals. In SaaS, customers who reach year two often upgrade tiers or add seats, which pushes cumulative revenue higher without a new acquisition cost. In consumer subscriptions, annual renewals at slightly higher prices have the same effect. A cohort that looks flat at month 18 can suddenly add 30 to 40 points of recovery in month 24 if expansion is the dominant motion. This is one of the most under-appreciated reasons to keep cohorts open in your reporting rather than truncating them at the payback line.
How does CAC recovery relate to the LTV-to-CAC ratio?
LTV-to-CAC collapses the entire cohort lifetime into a single ratio, useful for board-deck summaries and unhelpful for operational decisions. CAC recovery preserves the time dimension, so you can see whether a 3.0 ratio came from a fast cohort that paid back in six months or a slow cohort that paid back in twenty. Two cohorts with identical LTV-to-CAC can have very different cash dynamics, and recovery is the curve that surfaces the difference.
Should I include refunds and chargebacks in cumulative revenue?
Yes, by netting them out at the transaction level before cohort aggregation. If you only deduct refunds at the finance reporting layer, your recovery curves will look prettier than reality and the gap between your dashboard and the P and L surfaces at quarter end. The cleaner pattern is one canonical net-revenue view that every cohort and recovery query reads from.