How to calculate LTV in SQL
Contents:
Why analysts compute LTV in SQL
LTV drives the marketing strategy: how much you can spend on acquisition, which channels deserve more budget, when an upsell program pays for itself. If LTV is wrong, the CAC budget gets misallocated — the company either floods unprofitable channels with cash or starves profitable ones. Both failure modes trace to one root cause: a number nobody can defend.
On paper LTV looks simple. In practice the devil is in the choices. Do refunds count? Revenue or gross margin? Historical roll-up or forward projection? Mean or median? The same dataset can produce numbers that differ by two or three times.
This playbook walks through eight SQL recipes that cover the patterns analysts ship at Stripe, Airbnb, Notion, and DoorDash. Each answers a real question a PM or finance partner will ask. Treat them as templates: copy, adjust the schema, run.
Formula and data model
LTV (lifetime value) is the total revenue a single customer brings over their entire lifecycle in the product. Two shapes show up most often:
E-commerce:
LTV = AOV * purchase_frequency * customer_lifespan
Subscription / SaaS:
LTV = ARPU * (1 / churn_rate) = ARPU / churnFor long-horizon decisions, recompute LTV using gross margin instead of revenue. A 1,000 dollar LTV at 20 percent margin is 200 dollars of profit, and that 200 dollar number is what legitimately compares against CAC.
Schema: orders(user_id, total, status, created_at), users(user_id, attribution_channel), plus payments and subscriptions for SaaS recipes.
Historical LTV
The simplest recipe — revenue each customer has produced through today.
SELECT
user_id,
SUM(total) AS ltv_historical,
COUNT(*) AS orders_cnt,
MIN(created_at) AS first_order,
MAX(created_at) AS last_order
FROM orders
WHERE status = 'paid'
GROUP BY user_id
ORDER BY ltv_historical DESC;Strength: fully defensible — the number is what happened, no modeling assumptions hide inside. Weakness: it heavily underestimates young cohorts because their future revenue has not yet landed. Use it for finance roll-ups and sanity checks, not forecasts.
Average LTV by cohort
A single LTV number is too coarse for real decisions. Group by signup month.
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(created_at)) AS cohort_month
FROM orders
GROUP BY user_id
),
user_revenue AS (
SELECT
user_id,
SUM(total) AS total_revenue
FROM orders
WHERE status = 'paid'
GROUP BY user_id
)
SELECT
c.cohort_month,
COUNT(*) AS cohort_size,
SUM(ur.total_revenue) AS total_revenue,
AVG(ur.total_revenue) AS avg_ltv
FROM cohorts c
JOIN user_revenue ur ON ur.user_id = c.user_id
GROUP BY c.cohort_month
ORDER BY c.cohort_month;A cohort table is more honest than a single average. Older cohorts have had more time to compound revenue, which is normal. The signal worth alarming on is a young cohort whose accumulation is flatter than the prior cohort at the same age — that is the early warning that activation or retention is regressing, often months before the topline shows it.
LTV by months since signup
The canonical chart growth and finance partners want: revenue accumulation as a function of customer age. Cohort curves are what you anchor every forecast on.
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(created_at)) AS cohort_month
FROM orders
GROUP BY user_id
),
revenue_by_month AS (
SELECT
o.user_id,
c.cohort_month,
DATE_TRUNC('month', o.created_at) AS order_month,
SUM(o.total) AS revenue
FROM orders o
JOIN cohorts c ON c.user_id = o.user_id
WHERE o.status = 'paid'
GROUP BY 1, 2, 3
),
ltv_cumulative AS (
SELECT
user_id,
cohort_month,
order_month,
(EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
+ EXTRACT(MONTH FROM AGE(order_month, cohort_month)))::INT AS months_since_signup,
SUM(revenue) OVER (
PARTITION BY user_id
ORDER BY order_month
) AS cumulative_ltv
FROM revenue_by_month
)
SELECT
cohort_month,
months_since_signup,
AVG(cumulative_ltv) AS avg_cumulative_ltv,
COUNT(DISTINCT user_id) AS users_alive
FROM ltv_cumulative
GROUP BY cohort_month, months_since_signup
ORDER BY cohort_month, months_since_signup;The arithmetic in months_since_signup is the easy place to trip yourself up. EXTRACT(MONTH FROM AGE(...)) returns only the month component, between 0 and 11, so 18 calendar months becomes 6 unless you also add EXTRACT(YEAR FROM AGE(...)) * 12. Forget this and the chart claims every cohort retains forever.
SaaS LTV via ARPU over churn
The closed-form formula every B2B deck quotes. Average revenue per user divided by monthly churn rate.
WITH arpu_calc AS (
SELECT AVG(monthly_revenue) AS arpu
FROM (
SELECT
user_id,
DATE_TRUNC('month', created_at) AS month,
SUM(amount) AS monthly_revenue
FROM payments
GROUP BY 1, 2
) t
),
churn_calc AS (
SELECT
COUNT(*) FILTER (WHERE status = 'churned')::FLOAT
/ NULLIF(COUNT(*), 0) AS monthly_churn_rate
FROM subscriptions
WHERE created_at > NOW() - INTERVAL '3 months'
)
SELECT
arpu_calc.arpu,
churn_calc.monthly_churn_rate,
arpu_calc.arpu / NULLIF(churn_calc.monthly_churn_rate, 0) AS ltv
FROM arpu_calc, churn_calc;The closed form assumes constant ARPU and constant churn. Both assumptions break in practice — ARPU drifts with pricing and expansion, churn fluctuates with seasonality. Treat the output as one data point and cross-check it against the cohort curve from the previous query. When the two disagree by more than 20 percent, trust the cohort curve and find which assumption snapped.
E-commerce LTV formula
The classical decomposition. Average order value times monthly purchase frequency times expected lifespan in months.
WITH user_stats AS (
SELECT
user_id,
AVG(total) AS aov,
(EXTRACT(YEAR FROM AGE(MAX(created_at), MIN(created_at))) * 12
+ EXTRACT(MONTH FROM AGE(MAX(created_at), MIN(created_at))))::NUMERIC AS lifespan_months,
COUNT(*) / NULLIF(
EXTRACT(YEAR FROM AGE(MAX(created_at), MIN(created_at))) * 12
+ EXTRACT(MONTH FROM AGE(MAX(created_at), MIN(created_at))),
0) AS orders_per_month
FROM orders
WHERE status = 'paid'
GROUP BY user_id
HAVING COUNT(*) > 1
)
SELECT
AVG(aov) AS avg_order_value,
AVG(orders_per_month) AS avg_monthly_frequency,
AVG(lifespan_months) AS avg_lifespan_months,
AVG(aov * orders_per_month * lifespan_months) AS ltv
FROM user_stats;The decomposition is elegant but the average of products is highly sensitive to outliers — a small group of old repeat buyers can pull the mean well above what a typical customer is worth. Report median alongside mean, and winsorize the top 1 percent if your distribution has a heavy tail.
Simple predictive LTV
New customers have not finished their lifecycle, so historical LTV underestimates them. The easiest forward-looking estimate is to anchor on a mature cohort whose lifecycle has effectively closed.
WITH old_cohort AS (
SELECT user_id
FROM users
WHERE created_at < NOW() - INTERVAL '12 months'
),
user_revenue AS (
SELECT user_id, SUM(total) AS total_revenue
FROM orders
WHERE status = 'paid'
GROUP BY user_id
)
SELECT
AVG(ur.total_revenue) AS avg_ltv_old_cohort
FROM old_cohort oc
JOIN user_revenue ur USING (user_id);A rough estimate that suits sanity checks and serves as a baseline for ML models. Serious predictive LTV is built separately — survival analysis, BG/NBD plus Gamma-Gamma probabilistic models, or supervised regression on engagement features. Production teams end up there once the simple anchor stops tracking reality.
Margin-aware LTV
CAC should be compared to profit, not revenue. Most growth teams skip this step and end up overpaying for low-margin segments.
SELECT
AVG(ltv * 0.70) AS ltv_with_margin
FROM (
SELECT user_id, SUM(total) AS ltv
FROM orders
WHERE status = 'paid'
GROUP BY user_id
) t;Substitute the right gross margin for your business. If margin varies across categories — a marketplace selling physical goods and services, say — push margin down to the order line level and aggregate after. A flat 70 percent applied to mixed categories systematically misleads category-level decisions.
LTV to CAC by channel
The headline unit-economics metric. Whether the current marketing mix is creating or destroying enterprise value.
WITH user_ltv AS (
SELECT user_id, SUM(total) * 0.70 AS ltv
FROM orders
WHERE status = 'paid'
GROUP BY user_id
),
user_cac AS (
SELECT user_id, spend AS cac
FROM marketing_attribution
)
SELECT
AVG(ul.ltv) AS avg_ltv,
AVG(uc.cac) AS avg_cac,
AVG(ul.ltv) / NULLIF(AVG(uc.cac), 0) AS ltv_cac_ratio
FROM user_ltv ul
JOIN user_cac uc USING (user_id);Benchmarks. A ratio near 3:1 is generally considered healthy unit economics. 5:1 or higher is often underinvestment — the business could acquire more customers profitably and is choosing not to. Below 2:1, acquisition is too expensive relative to customer value; CAC needs to drop or LTV needs to expand before the channel earns more budget.
Common pitfalls
Including cancelled and refunded orders is the most common first-time mistake. A naive SUM(total) picks up rows the company never collected on, inflating LTV by 5 to 30 percent. Fix with WHERE status = 'paid' or, when refunds are tracked separately, SUM(total - refund_amount). Reconcile against finance before shipping the number.
Confusing revenue with margin is the second trap, usually invisible until someone challenges the LTV/CAC ratio. Revenue is not profit. A 1,000 dollar LTV at 20 percent gross margin is 200 dollars of profit, which is what you compare to a 150 dollar CAC. Teams that compare revenue LTV directly to CAC routinely call 3:1 ratios healthy when after delivery costs the channel is losing money.
Reporting LTV of "all customers" without distinguishing it from "active customers" causes constant cross-team confusion. The all-customers number includes one-and-done buyers and is the honest pessimistic estimate. The active-customers number flatters the metric by excluding people who never stuck. Both have a place, but the label must be explicit — or executives will quote one and budget against the other.
Using historical LTV when the question is a forecast leads to chronic underinvestment. Historical LTV describes what already happened; for next quarter's budget you need the cohort-curve view applied to mature cohorts, or a predictive model on top. Use historical for finance reporting, cohort projections for planning, and be explicit about which one you are showing.
Reporting the mean when the distribution has a heavy right tail is the last trap. LTV distributions are almost always right-skewed — the top 1 percent can generate 30 to 40 percent of revenue. Report both mean and median; when they differ by more than 30 percent the median is usually safer and the gap itself is worth escalating.
Optimization tips
These queries aggregate over the entire orders history, which gets expensive on multi-billion-row warehouses. First move: materialize cohort assignment once. Build a user_cohorts table with user_id and cohort_month, refresh nightly, and join against it everywhere instead of recomputing DATE_TRUNC('month', MIN(created_at)) from scratch. On Snowflake or BigQuery this cuts query time by an order of magnitude.
For the cumulative LTV curve, partition orders by month and cluster by user_id if your warehouse supports it. The window function then reads only relevant slices. On Databricks and Snowflake, clustered versus unclustered on this query is typically a 5x to 20x difference at scale.
Avoid recomputing margin at query time when it is product-specific. Persist margin as a column on order lines at ETL time, then sum line-level margin into customer-level margin in one pass. A CASE WHEN category IN (...) ladder is slow, brittle, and breaks every time finance reclassifies a product family.
When volume is too large for a full scan, sample. A stratified 10 percent sample by cohort and channel produces LTV estimates within a few percent of the full population for reporting. Save the full scan for finance close.
Related reading
- How to calculate CAC in SQL
- How to calculate churn in SQL
- How to calculate cohort retention in SQL
- How to calculate cohort revenue in SQL
- How to calculate ARPU in SQL
- How to calculate contribution margin in SQL
- SQL window functions interview questions
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
Should I use historical or predictive LTV?
For reporting on what already happened, historical LTV is correct because it is defensible and free of modeling assumptions. For budget decisions, you need a forward-looking view — a cohort projection anchored on mature cohorts, or a predictive model on engagement features. Mature teams maintain both and label every chart.
Should LTV include only paid and fulfilled orders?
Yes. Pending, cancelled, or refunded orders inflate LTV in ways that only show up when finance reconciles against the ledger. Filter to status equal to paid, or net refunds explicitly, and validate against the income statement before shipping the metric.
What currency should LTV be calculated in?
For comparisons against CAC the currencies must match — spend and revenue must be in the same unit before the ratio means anything. For multi-region businesses, normalize to USD using monthly average exchange rates from a single source of truth. Switching units mid-series destroys the trend signal.
How should I handle discounts?
Calculate net revenue after discounts and before margin. Many teams keep two parallel series — gross LTV and net LTV — because the gap quantifies how much discounting is propping up the topline. If gross-to-net widens without a matching lift in volume, discounts are subsidizing demand the product cannot generate on its own.
Do I need to discount future LTV cash flows?
On horizons under a year, generally no — the time value of money is small enough relative to measurement noise that a discount factor adds more uncertainty than it resolves. On multi-year subscription horizons it matters. Discount projected future revenue to present value using the company's weighted average cost of capital, then compare against CAC.
What is the difference between LTV and CLV?
LTV (lifetime value) and CLV (customer lifetime value) are the same metric — different teams use different acronyms. Some organizations reserve "predicted LTV" for the modeled forward-looking version and "historical LTV" for the actual-to-date calculation. The acronym matters less than agreeing on the exact formula across teams that consume the number.