How to calculate Cohort Revenue in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why Cohort Revenue matters

It is Tuesday evening and your CFO pings: "Customer retention is 70 percent and the board deck says we are bleeding. But ARR keeps climbing — what is going on?" The answer almost always lives in Cohort Revenue. Customer retention counts heads; Cohort Revenue counts dollars. A SaaS business with 70 percent logo retention and 130 percent cohort revenue retention is the textbook negative-net-churn story — the customers who stay are paying more over time, and they more than offset the ones who leave. That ratio is what makes Stripe, Notion, Snowflake, and Databricks revenue engines look so different from their churn dashboards.

Cohort Revenue is the total revenue attributable to users acquired in month m and recognized in period p. It is the metric you need before you can talk about NRR, expansion, contraction, or LTV with any rigor. The interview version of this question shows up at Snowflake, Databricks, Stripe, Notion, and Linear, usually as a take-home or a sixty-minute pair-programming round. This post walks the SQL end to end: the base aggregation, the retention curve overlay, the cohort-by-offset pivot a PM can put on a slide, the pitfalls that bite first-time authors, and the optimizations that keep the query under ten seconds on a billion-row transactions table.

The SQL formula

In plain English: Cohort Revenue for acquisition month m recognized in period p is the sum of successful transaction amounts from users whose signup month equals m and whose payment timestamp falls inside p. Three columns drive the calculation — user_id, signup_date, and transactions.paid_at — and one filter excludes failed or refunded charges.

The trap most analysts hit is treating Cohort Revenue as "monthly revenue grouped by something". The cohort identity is fixed at acquisition; a user who signed up in January is forever a member of the 2026-01 cohort regardless of when they pay. That is the whole point — it lets you compare the long-term value of January acquisitions against March acquisitions on equal footing.

Base calculation

Assume two tables. users(user_id, signup_date) has one row per acquired user. transactions(user_id, amount, paid_at, status) has one row per charge attempt with a status flag. The base query assigns every user to their cohort month, joins to transactions, and rolls up.

WITH cohort AS (
    SELECT
        user_id,
        DATE_TRUNC('month', signup_date) AS cohort_month
    FROM users
),
period_revenue AS (
    SELECT
        c.cohort_month,
        DATE_TRUNC('month', t.paid_at) AS period,
        SUM(t.amount) AS revenue
    FROM cohort c
    JOIN transactions t ON t.user_id = c.user_id
    WHERE t.status = 'success'
    GROUP BY c.cohort_month, DATE_TRUNC('month', t.paid_at)
)
SELECT
    cohort_month,
    period,
    EXTRACT(YEAR FROM AGE(period, cohort_month)) * 12
        + EXTRACT(MONTH FROM AGE(period, cohort_month)) AS month_offset,
    revenue
FROM period_revenue
ORDER BY cohort_month, period;

Read it top to bottom. The cohort CTE tags every user with one immutable acquisition month. The period_revenue CTE joins charges, filters out the noise (status = 'success' is the minimum bar — most real schemas need a NOT IN ('refunded', 'chargeback') exclusion too), and aggregates to one row per (cohort_month, period) pair. The outer SELECT adds month_offset so cohorts can be aligned on age rather than calendar time.

The EXTRACT(YEAR FROM AGE(...)) * 12 + EXTRACT(MONTH FROM AGE(...)) pattern is the safest cross-database arithmetic for "months between two dates". It works on Postgres and Snowflake. On BigQuery substitute DATE_DIFF(period, cohort_month, MONTH); on Redshift use DATEDIFF(month, cohort_month, period). Memorize all three — interviewers will specify the dialect.

Revenue retention curves

The single most useful overlay on Cohort Revenue is the percentage curve — each period's revenue expressed as a share of the cohort's month-zero revenue. A curve above 100 percent at M6 means expansion is beating churn. A curve falling toward zero means the cohort is bleeding out.

WITH cohort_initial AS (
    SELECT
        c.cohort_month,
        SUM(t.amount) FILTER (WHERE DATE_TRUNC('month', t.paid_at) = c.cohort_month) AS m0_revenue
    FROM (
        SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month FROM users
    ) c
    JOIN transactions t ON t.user_id = c.user_id
    WHERE t.status = 'success'
    GROUP BY c.cohort_month
),
period_revenue AS (
    SELECT
        c.cohort_month,
        DATE_TRUNC('month', t.paid_at) AS period,
        SUM(t.amount) AS revenue
    FROM (
        SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month FROM users
    ) c
    JOIN transactions t ON t.user_id = c.user_id
    WHERE t.status = 'success'
    GROUP BY c.cohort_month, DATE_TRUNC('month', t.paid_at)
)
SELECT
    p.cohort_month,
    p.period,
    EXTRACT(YEAR FROM AGE(p.period, p.cohort_month)) * 12
        + EXTRACT(MONTH FROM AGE(p.period, p.cohort_month)) AS m_offset,
    p.revenue,
    ROUND(p.revenue * 100.0 / NULLIF(ci.m0_revenue, 0), 2) AS pct_of_initial
FROM period_revenue p
LEFT JOIN cohort_initial ci ON ci.cohort_month = p.cohort_month
ORDER BY p.cohort_month, p.period;

The FILTER (WHERE ...) clause on the aggregate is a Postgres and Snowflake feature; on BigQuery rewrite as SUM(CASE WHEN ... THEN amount END). The NULLIF on the denominator is critical — if a cohort had zero month-zero revenue for any reason (free trials, delayed billing), the divide-by-zero will blow up the query and you will spend the morning debugging instead of presenting.

A pct_of_initial greater than 100 at M6 is the negative-net-churn signal a SaaS board wants to see. A flat line near 100 means stable revenue from a shrinking customer base. A curve dropping to 40 by M3 means heavy churn, contraction, or a free-trial-heavy cohort whose month-zero revenue was inflated by a one-time fee. Distinguish those three before taking the chart to a meeting.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Cohort by month-offset pivot

The retention curve is great for engineers; PMs and growth leads want a pivot. One row per cohort, columns for M0, M1, M3, M6, M12. It fits on a slide and the eye can scan twenty cohorts in three seconds.

WITH cohorts AS (
    SELECT
        DATE_TRUNC('month', u.signup_date) AS cohort_month,
        (EXTRACT(YEAR FROM AGE(DATE_TRUNC('month', t.paid_at), DATE_TRUNC('month', u.signup_date))) * 12
         + EXTRACT(MONTH FROM AGE(DATE_TRUNC('month', t.paid_at), DATE_TRUNC('month', u.signup_date))))::INT AS m_offset,
        SUM(t.amount) AS revenue
    FROM users u
    JOIN transactions t ON t.user_id = u.user_id AND t.status = 'success'
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    SUM(CASE WHEN m_offset = 0 THEN revenue END) AS m0,
    SUM(CASE WHEN m_offset = 1 THEN revenue END) AS m1,
    SUM(CASE WHEN m_offset = 3 THEN revenue END) AS m3,
    SUM(CASE WHEN m_offset = 6 THEN revenue END) AS m6,
    SUM(CASE WHEN m_offset = 12 THEN revenue END) AS m12
FROM cohorts
GROUP BY cohort_month
ORDER BY cohort_month;

The SUM(CASE WHEN ...) pattern is the canonical SQL pivot. Each (cohort_month, m_offset) pair is unique after the inner GROUP BY, so SUM returns that single value. A MAX would work identically; SUM is more honest about what is happening, and it survives the edge case where you forget to deduplicate upstream.

Cohorts that have not aged enough will show NULL in the right columns — the 2026-04 cohort cannot have an M12 in May 2026. Render those cells as an em-dash on the slide, not as zero. A zero looks like a dead cohort; a dash signals "not yet measurable" and prevents misreads in front of leadership.

Common pitfalls

The most damaging pitfall is conflating Cohort Revenue with monthly revenue. Cohort Revenue is sliced by acquisition month, not by spend month. If you group only by DATE_TRUNC('month', paid_at), you produce a perfectly good monthly revenue chart with no cohort information at all — and that chart will look fine right up until the moment leadership asks why the January cohort is collapsing. Lock the cohort identity at the cohort CTE and never let it drift downstream.

The second trap is the month-offset arithmetic. EXTRACT(MONTH FROM AGE(...)) alone returns a value between zero and eleven; without the EXTRACT(YEAR FROM AGE(...)) * 12 prefix, a December-to-February gap reports as two months instead of fourteen. The bug is invisible until a cohort ages past M12, then every annual comparison silently rolls over to zero. Always include the year multiplier and unit-test against a known pair of dates.

The third pitfall is partial cohorts on a comparison chart. The 2026-02 cohort cannot have an M12 figure when you run the query in May 2026 — only three full months have passed. If you let it through, the slide will show 2026-02 as a flat zero at M12 while older cohorts show real numbers, and the audience will conclude that the February cohort is dead. The fix is a WHERE period <= CURRENT_DATE - INTERVAL '12 months' filter when reporting M12, or a minimum-age filter on the cohort itself.

The fourth pitfall is ARPU per cohort versus total revenue per cohort. Total revenue per cohort confounds revenue growth with cohort size — a fat January acquisition campaign produces a bigger January line for purely mechanical reasons. ARPU per cohort divides out that effect. Pair the two on the same dashboard so a reviewer can tell whether revenue is up because more humans paid or the same humans paid more.

The fifth pitfall is hidden expansion versus delayed activation. Cohort revenue at M6 can grow for two very different reasons. Existing paying customers may upgrade — true expansion, the story you want. Or non-paying signups may finally convert — delayed activation, which is also good but is not expansion. Decompose by joining to a flag table that marks each user's first paid month: if first-paid-month equals cohort-month, it is expansion; if later, it is activation. Surface both.

Optimization tips

On a billion-row transactions table the join in period_revenue is the bottleneck. The first lever is partitioning on paid_at — Snowflake and BigQuery partition natively, and on Postgres declare monthly partitions explicitly. A query for "revenue paid in 2026-03" then reads one partition instead of the full table — on real workloads the difference between six seconds and six minutes.

The second lever is a pre-aggregated monthly_user_revenue table refreshed by a dbt model overnight, shaped (user_id, payment_month, revenue, transactions_count). Cohort Revenue becomes a small join between the cohort assignment table and this pre-aggregate, typically two to three orders of magnitude smaller than the raw transaction stream and reusable for ARPU, NRR, and LTV queries.

The third lever is materializing the cohort-by-offset pivot itself. If the same dashboard refreshes hundreds of times a day, push the pivot into a materialized view that rebuilds nightly. The trade-off is freshness — confirm with the dashboard owner that "yesterday's close" is acceptable.

If you want to drill cohort SQL daily, NAILDD is launching with 500+ SQL problems on this pattern.

FAQ

What is the difference between Cohort Revenue and NRR?

Net Revenue Retention is a ratio — the cohort's revenue in a later period divided by the cohort's revenue in its initial period, usually expressed as a percentage. Cohort Revenue is the absolute dollar amount in either period. You derive NRR by dividing Cohort Revenue at M12 by Cohort Revenue at M0, which is exactly what pct_of_initial computes in the retention-curve query. Both numbers live on the same dashboard because executives want the absolute dollar story and the trajectory story side by side.

Cohort Revenue is rising month over month — is that good news?

Not necessarily. Cohort Revenue scales mechanically with cohort size, so if you acquired twice as many users in March, the M1 row for that cohort will look bigger even if the product got worse. Pair the absolute number with pct_of_initial or ARPU per cohort. If both the absolute and the per-user numbers rise, the revenue engine is genuinely healthier. If only the absolute number rises, you are spending more on acquisition for the same per-user economics.

How should I pick the cohort window?

For SaaS with annual contracts, twenty-four months is the floor — renewal behavior is the whole story, and you cannot see it before month twelve. For monthly SaaS, twelve months is usually enough. For consumer apps with weekly engagement, six months stabilizes most curves. Reporting before maturity forces you to walk numbers back to leadership later, which is far more expensive than a short delay at the start.

Cohort Revenue is falling — what is the next query?

Decompose into the three drivers. First, count of paying users per cohort — has the paying fraction dropped? Second, ARPU per cohort — has spend per paying user dropped? Third, churn rate — has the survival curve steepened? The drop almost always lives in one of the three, and the fix is different for each. A churn problem calls for retention work; an ARPU problem calls for pricing or packaging; a paying-user-count problem calls for activation.

How does Cohort Revenue differ between e-commerce and SaaS?

In e-commerce the story is repeat purchases — Cohort Revenue at M6 measures whether the customers acquired in M0 came back to buy again. In SaaS it is a blend of subscription longevity and upsell — M6 measures whether the cohort is still paying their subscription and whether they have upgraded plan tiers. The SQL is the same; the interpretation diverges sharply. An e-commerce cohort that hits 30 percent at M6 is healthy. A SaaS cohort at 30 percent is a disaster.

Should I include refunds and chargebacks?

Yes, with a sign flip. A refund issued in May for a January charge is May revenue with a negative amount, charged against the January cohort. Exclude refunds entirely and you overstate Cohort Revenue — a finance audit will catch it. The clean pattern is SUM(CASE WHEN status = 'success' THEN amount WHEN status IN ('refunded', 'chargeback') THEN -amount ELSE 0 END). Document the rule and reuse it across every revenue query.