How to calculate Customer Lifetime in SQL
Contents:
What Customer Lifetime measures and why it matters
Customer Lifetime is the duration a user stays a paying or active customer, measured in days or months between signup and churn. LTV is the dollar version of the same story; Customer Lifetime is the time version. Both metrics ride the same underlying behavior, but a product manager who asks "how long do our users stick around?" wants a number in months, not in revenue, and the two answers are not interchangeable.
Picture this scenario at a Stripe-style SaaS. Your PM messages you at 5pm: "leadership wants average Customer Lifetime by acquisition month, broken down by self-serve vs sales-led, by tomorrow's standup." That request hides three landmines — censored data, skewed distributions, and the definition of churn for users who never explicitly cancel. Get any one of them wrong and your number is off by 30 to 60 percent, which is enough to change a hiring decision or a roadmap bet.
The metric also drives downstream models. Marketing uses average lifetime to set acceptable customer acquisition cost. Finance uses it to model cohort revenue waterfalls. Growth uses lifetime by feature-adoption cohort to figure out which onboarding step is actually load-bearing. And every one of those downstream uses assumes the analyst computed lifetime with survival-aware math, not a naive AVG(end_date - start_date).
The SQL formula
The conceptual formula is short:
Customer Lifetime = (churn_date OR today) - signup_date
for each user, with a censored flag for still-activeFor an exponential-decay assumption — which roughly holds for SaaS with stable monthly churn — there is also a closed form:
Average Lifetime (months) = 1 / monthly_churn_rateIf your monthly churn is 4 percent, the implied average lifetime is 25 months. This closed form is fast to quote in a meeting but it lies whenever churn is not constant over time, which is almost always. For anything beyond a rough sanity check, compute lifetime from the user-level table.
Basic lifetime query
Assume two tables: users(user_id, signup_date) and subscriptions(user_id, ended_at). A null ended_at means the user is still active. The query computes per-user days alive, flags whether the row is censored (still active), and rolls up to the population average:
WITH lifetime AS (
SELECT
u.user_id,
u.signup_date,
COALESCE(s.ended_at, CURRENT_DATE) AS lifetime_end,
s.ended_at IS NOT NULL AS churned,
EXTRACT(EPOCH FROM (COALESCE(s.ended_at, CURRENT_DATE) - u.signup_date)) / 86400 AS days_alive
FROM users u
LEFT JOIN subscriptions s ON s.user_id = u.user_id
)
SELECT
AVG(days_alive) AS avg_lifetime_days,
AVG(days_alive) FILTER (WHERE churned) AS avg_lifetime_churned,
AVG(days_alive) FILTER (WHERE NOT churned) AS avg_lifetime_still_active,
COUNT(*) FILTER (WHERE churned)::NUMERIC
* 100 / NULLIF(COUNT(*), 0) AS churn_pct
FROM lifetime;The split between avg_lifetime_churned and avg_lifetime_still_active is the single most useful diagnostic in this query. If the churned-only average is 9 months and the still-active average is 14 months, the population is healthy — your survivors are older than your departures. If they are equal, your population is still young and the headline number is not yet trustworthy.
For a cohorted view, replace the final SELECT with DATE_TRUNC('month', signup_date) as a GROUP BY key. That gives you the table your PM actually wants — lifetime by acquisition month — which lets you see whether recent cohorts are decaying faster than older ones.
Survival analysis in SQL
The basic query is fine for a quick read, but it systematically underestimates true lifetime because it treats every still-active user as if their lifetime equals today. Kaplan-Meier survival analysis fixes this by asking a different question at each time bucket: of the users still at risk at month M, what fraction made it to month M+1? Multiply those conditional probabilities and you get an unbiased survival curve.
Here is a Kaplan-Meier estimate in plain SQL, returning the fraction of users still alive at each month from 0 to 24:
WITH user_lifetimes AS (
SELECT
u.user_id,
EXTRACT(EPOCH FROM (COALESCE(s.ended_at, CURRENT_DATE) - u.signup_date)) / 86400 AS days_alive,
s.ended_at IS NOT NULL AS churned
FROM users u
LEFT JOIN subscriptions s ON s.user_id = u.user_id
),
months AS (
SELECT generate_series(0, 24, 1) AS month
),
at_risk AS (
SELECT
m.month,
COUNT(*) FILTER (WHERE u.days_alive >= m.month * 30) AS still_in,
COUNT(*) FILTER (WHERE u.days_alive >= m.month * 30 AND u.churned) AS churned_so_far
FROM months m
CROSS JOIN user_lifetimes u
GROUP BY m.month
)
SELECT
month,
still_in,
still_in::NUMERIC * 100 / NULLIF(MAX(still_in) OVER (), 0) AS survival_pct
FROM at_risk
ORDER BY month;To collapse the curve back into a single number — the median survival time — find the smallest month where survival_pct <= 50. Median is the right summary statistic here because lifetime distributions are heavy-tailed: a small share of power users with multi-year tenures will pull the arithmetic mean upward, while the median tells you what a typical customer experiences.
For a more rigorous implementation that matches what libraries like lifelines in Python produce, you would also compute hazards per month and chain the conditional survival probabilities. The version above is the analyst-friendly approximation that fits on one screen and answers 90 percent of business questions.
Customer Lifetime vs LTV
| Metric | What it measures | Unit |
|---|---|---|
| Customer Lifetime | Duration of the customer relationship | Days or months |
| LTV | Cumulative revenue contributed | Dollars |
The relationship is linear in the simplest model: LTV = Customer Lifetime x ARPU. In practice the two metrics drift apart because ARPU is not flat over a customer's tenure — expansion revenue lifts late-tenure ARPU while contraction events drag it down. If your finance team reports LTV growing while Customer Lifetime is flat, expansion is doing the work. If lifetime is growing while LTV is flat, you have a price problem.
Interviewers at companies like Notion, Linear, Vercel, and Airbnb routinely probe this distinction. Knowing that Customer Lifetime is a time-domain metric while LTV is a dollar-domain metric — and that they answer different product questions — is one of the cheapest ways to look senior in a data round.
Common pitfalls
The first trap is including still-active users in the average lifetime number without flagging them. Their observed lifetime is a lower bound — today minus signup — not their actual lifetime, which is unknown. A team that reports AVG(today - signup_date) across the whole base will systematically underreport real lifetime, and the bias is worst for fast-growing companies where most users are recent. The fix is either to split the average by churned versus active, or to switch to a survival-aware estimate like Kaplan-Meier.
A second trap is ignoring right-censoring entirely. Right-censoring means a user is observed up to time T but may continue beyond it. Naive averages assign these users a lifetime of exactly T, which is wrong. Standard survival analysis treats them as "at risk through T, then unknown" and updates the survival curve only at observed events. Skip this step and a healthy 24-month lifetime can look like 11 months on a 12-month-old dataset.
A third trap is confusing Customer Lifetime with retention. Retention is a fraction of users still active at a specific point in time — a snapshot. Customer Lifetime is the integral under the retention curve — a total. They are related but answer different questions: retention tells you how sticky your product is at month 6, while lifetime tells you the expected total time a customer will spend with you. Mixing them up in a report is one of the fastest ways to lose credibility with a numerate stakeholder.
A fourth trap is letting churn definition drift between teams. For a paid subscription, ended_at is a clean signal. For a freemium product, churn is a behavioral definition — typically 30, 60, or 90 days without a qualifying action. Different teams pick different windows and end up with lifetime numbers that look inconsistent. The fix is to write the churn definition into a shared SQL function or dbt macro and reference it everywhere, so every downstream lifetime calculation agrees by construction.
A fifth trap is reporting the mean of a heavy-tailed distribution. Survival times are almost always right-skewed: a long tail of loyal customers pulls the average up, even though most customers experience something closer to the median. If your AVG(days_alive) is 540 and your PERCENTILE_CONT(0.5) is 280, both numbers are correct, but the median is the one that describes a typical customer. Always report both, and use the median as the headline if a non-technical audience will read the chart.
Optimization tips
For tables with hundreds of millions of users, the basic lifetime query is bound by the join between users and subscriptions. Partition subscriptions by user_id hash and add a covering index on (user_id, ended_at) so the lookup is a single index scan per user. On Snowflake or Databricks, clustering the table on user_id plays the same role.
The Kaplan-Meier query is much heavier because of the CROSS JOIN between user lifetimes and the month series. For a one-shot dashboard refresh, materialize user_lifetimes as a temporary table first so the cross join hits a small projected slice instead of replanning the full subscription join 25 times. For nightly pipelines, persist the lifetime fact table and only recompute rows whose ended_at changed since yesterday.
When the same Customer Lifetime number is consumed by ten downstream models, build it once as a dbt model with an explicit churn definition and let everything else reference it. Replacing scattered ad-hoc queries with one shared model is usually a bigger win than tuning any single query — the analyst hours saved dwarf the warehouse compute saved.
Related reading
- How to calculate churn in SQL
- How to calculate cohort retention in SQL
- How to calculate cohort decay in SQL
- SQL window functions interview questions
If you want to drill questions exactly like this every day — Customer Lifetime, survival math, censored data, the pitfalls senior interviewers actually probe — NAILDD is launching with 500+ SQL problems built around this pattern.
FAQ
What is a healthy Customer Lifetime number?
For B2B SaaS with annual contracts, healthy average lifetime is in the 24 to 60 month range; anything below 18 months usually signals a product-market-fit problem rather than a churn-tactics problem. B2C SaaS sits in the 12 to 24 month band because consumer attention is more volatile, and mobile freemium apps typically see 3 to 12 months even for strong products. Compare against your category, not against an industry-wide average, because the variance between categories is larger than the variance within them.
Should I report average or median lifetime?
Median is almost always the better headline because survival times are heavy-tailed and the mean is dragged upward by a small loyal cohort. Report both: the median answers "what does a typical customer experience?" while the mean answers "what is the expected total time per customer?" If finance is consuming the number to project cohort revenue, they need the mean; if growth is consuming it to set expectations for new cohorts, they want the median.
Do I really need Kaplan-Meier, or is a simple average enough?
For a rough internal read on a mature product where most users have already churned, a simple average split by churned versus active is acceptable. For anything customer-facing, for board materials, or for any product less than two years old, you need a survival-aware estimate because the censored fraction is large enough to bias the naive average by double-digit percentage points. The SQL is not much longer than the naive version, so the cost of being correct is small.
Customer Lifetime is increasing — is that always good?
Increasing lifetime is almost always positive, but check the composition. If lifetime is up because new acquisition slowed and the active base is aging, you are not actually retaining better — you are just measuring an older population. The clean read is to compute lifetime by acquisition cohort and confirm that recent cohorts are tracking above or even with older ones at the same tenure. If cohort-on-cohort lifetime is flat or down while the headline number rises, growth has slowed and you are reading a survivorship illusion.
How do I handle users with multiple products or accounts?
If a single human owns five subscriptions, you have to decide whether their Customer Lifetime is the longest active subscription, the union of all active periods, or the time since first signup across any product. Most companies report at the account level — first signup to last cancel — because that is what the business actually cares about, but it requires identity resolution that links subscriptions to a stable customer ID. Build that identity layer once at the warehouse level and every downstream lifetime, LTV, and retention calculation gets correct automatically.
What window should I use to define churn for freemium users?
Common windows are 30, 60, and 90 days of inactivity, and the right choice depends on your natural usage cadence. A daily-active product like a chat app can use 30 days safely; a weekly-active product like a project management tool needs 60 to 90 to avoid flagging vacationing users as churned. Whatever you pick, write it down once as a shared definition and never let two teams use different windows in the same review meeting — the resulting confusion costs more than any precision you gain by tuning the window.