How to calculate last-touch attribution in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What last-touch attribution actually answers

Last-touch attribution credits every conversion to the very last marketing touchpoint a user had before they bought. When the paid acquisition lead at Stripe asks "which campaigns drove revenue yesterday", that question is a last-touch question, and it has a different answer than the first-touch query the content team runs on Mondays. Whichever channel sits closest to the conversion takes the credit; everything earlier is background noise.

Last-touch dominates ad platforms because each platform can only see the click it sourced. Google Ads, Meta Ads Manager, and Apple Search Ads all report last-touch by default, and analysts inherit a picture where retargeting display always looks like a rockstar while the podcast partnership that drove the brand search gets zero. Last-touch fits short transactional journeys where the closing channel really is the deciding factor; it is wrong on its own for long B2B journeys where most of the work happens months before the buy button.

Every query below runs on Postgres with one-line tweaks for Snowflake, BigQuery, or Redshift. The schema: touchpoints(user_id, channel, touched_at) and conversions(user_id, amount, converted_at).

The base last-touch query

For each conversion, pick the most recent touchpoint that happened before that conversion, then aggregate revenue and conversion counts by channel. The "before that conversion" constraint is the load-bearing line — without it the join silently includes touchpoints recorded after the user converted (backfills, late CRM events, server-side enrichment) and the dashboard quietly overcounts every channel the pipeline touches after the sale. A correlated subquery against the touchpoints table per conversion is readable but does not scale; the window-function form below runs in seconds and is what most teams deploy.

WITH ranked AS (
    SELECT
        c.user_id,
        c.amount,
        c.converted_at,
        t.channel,
        t.touched_at,
        ROW_NUMBER() OVER (
            PARTITION BY c.user_id, c.converted_at
            ORDER BY t.touched_at DESC
        ) AS rn
    FROM conversions c
    JOIN touchpoints t
      ON t.user_id = c.user_id
     AND t.touched_at <= c.converted_at
)
SELECT
    channel        AS last_channel,
    COUNT(*)       AS conversions,
    SUM(amount)    AS attributed_revenue
FROM ranked
WHERE rn = 1
GROUP BY channel
ORDER BY attributed_revenue DESC;

Three details earn their place. The partition is on (user_id, converted_at), not just user_id, so a customer with multiple conversions gets the right last touch for each one. ORDER BY touched_at DESC picks the most recent touchpoint per partition. And the inner join already enforces touched_at <= converted_at, so the outer rn = 1 filter is safe without a second guard.

An inner join silently drops users who converted with nothing in the touchpoint table. Replace it with a left join from conversions and bucket the resulting NULL channel as untracked. If the untracked bucket is half the revenue, the attribution table is not ready to inform budget decisions.

Revenue and conversions by last-touch channel

The query above gives the all-time view. Most reports want a bounded window — thirty to ninety days for paid channels. Filter touchpoints to the window of interest and scope conversions to the same window so an old touchpoint cannot become a last touch for a recent conversion.

WITH ranked AS (
    SELECT
        c.user_id,
        c.amount,
        t.channel,
        ROW_NUMBER() OVER (
            PARTITION BY c.user_id, c.converted_at
            ORDER BY t.touched_at DESC
        ) AS rn
    FROM conversions c
    LEFT JOIN touchpoints t
      ON t.user_id = c.user_id
     AND t.touched_at <= c.converted_at
     AND t.touched_at >= c.converted_at - INTERVAL '30 days'
    WHERE c.converted_at >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
    COALESCE(channel, 'untracked') AS last_channel,
    COUNT(*)                       AS conversions,
    SUM(amount)                    AS revenue
FROM ranked
WHERE rn = 1
GROUP BY COALESCE(channel, 'untracked')
ORDER BY revenue DESC;

The thirty-day join lookback keeps ancient touches from claiming credit for a recent conversion, and the ninety-day filter on converted_at scopes the report to the reporting period. COALESCE surfaces the untracked bucket as a real row rather than a null that gets quietly dropped.

A practical extension is to add a conversion_month dimension and trend the channel mix. A six-month rolling view of last-touch share by channel reveals when retargeting is grabbing credit it does not deserve — a sudden spike usually means upper-funnel campaigns started working, not that retargeting got better.

Last-touch versus first-touch in one query

Growth teams report both models side by side. First-touch identifies the channel that opened the door and last-touch identifies the channel that closed the sale, and the spread between them is itself useful. High on first-touch and low on last-touch is a discovery channel; high on last-touch and low on first-touch is a closing channel. The pair is the smallest honest attribution dashboard and what most senior analysts ship before any multi-touch model.

WITH paths AS (
    SELECT
        c.user_id,
        c.converted_at,
        c.amount,
        FIRST_VALUE(t.channel) OVER (
            PARTITION BY c.user_id, c.converted_at
            ORDER BY t.touched_at ASC
        ) AS first_channel,
        FIRST_VALUE(t.channel) OVER (
            PARTITION BY c.user_id, c.converted_at
            ORDER BY t.touched_at DESC
        ) AS last_channel,
        ROW_NUMBER() OVER (
            PARTITION BY c.user_id, c.converted_at
            ORDER BY t.touched_at DESC
        ) AS rn
    FROM conversions c
    JOIN touchpoints t
      ON t.user_id = c.user_id
     AND t.touched_at <= c.converted_at
)
SELECT
    first_channel,
    last_channel,
    COUNT(*)      AS conversions,
    SUM(amount)   AS revenue
FROM paths
WHERE rn = 1
GROUP BY first_channel, last_channel
ORDER BY revenue DESC
LIMIT 50;

FIRST_VALUE with two orderings is the trick. The ascending window returns the earliest channel; the descending one returns the latest. The outer rn = 1 collapses to one row per conversion. On Snowflake or BigQuery a QUALIFY clause replaces the outer filter, but the CTE form runs on every warehouse.

The rows that matter are the pairs where first and last disagree by a wide margin. Organic search to retargeting is a discovery-to-close path. Podcast to direct is a brand-to-intent path. Paid social to email is a paid-acquisition-to-lifecycle path. Each pair tells a story about how the funnel actually works, and budget decisions get better when that story is on the chart rather than in the analyst's head.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Common pitfalls

The first pitfall is forgetting the chronological filter on the join. A last-touch query that joins on user_id without enforcing touched_at <= converted_at silently includes touchpoints recorded after the conversion — backfills, late events, CRM enrichment — and inflates revenue on every channel touched after the sale. The guard is one line of SQL and prevents a class of dashboard bugs that only show up in audits.

The second pitfall is how to count direct traffic. A user who lands by typing the URL or clicking a bookmark shows up as direct, and the question is whether direct represents brand awareness from earlier channels or a genuinely unattributed visit. The pragmatic answer is to look back one more step — if the touchpoint before direct is within seven days, attribute the conversion to that earlier channel. Notion, Linear, and several growth-led B2B teams roll direct into the prior non-direct touch because direct otherwise eats credit that branded search created.

The third pitfall is an unbounded lookback window. A touchpoint from two years ago that finally converts today is technically a last touch under a naive query but not a useful one for budget decisions. B2C usually caps the lookback at seven to thirty days; B2B with long sales cycles uses thirty to ninety. Without a cap, the dashboard credits channels that haven't run in months for revenue closing this week.

The fourth pitfall is multi-conversion users without a per-conversion partition. A power buyer at DoorDash with five orders has five conversions and five potentially different last touches, and partitioning by user_id alone collapses them into one row. Partition by (user_id, converted_at) so each conversion gets its own last touch.

The fifth pitfall is same-day touch ambiguity. A user who clicks a paid Google ad in the morning, gets a retargeting impression in the afternoon, and converts an hour later has two touchpoints with the same calendar date. ORDER BY touched_at DESC uses the full timestamp — but only if the pipeline captures sub-second precision in a single time zone. Otherwise ties resolve arbitrarily and channel ranking drifts across query runs. Cast every timestamp to UTC at ingest.

Optimization tips

Last-touch queries scan the entire touchpoints table, and cost is dominated by its size. The biggest win is to partition touchpoints by touched_at — daily partitions in Postgres and BigQuery, date clustering in Snowflake, sort key on the timestamp in Redshift. Every reasonable last-touch report is bounded by a lookback window, so partition pruning takes a billion-row scan down to seven or thirty or ninety partitions.

The second optimization is to materialize a per-conversion last-touch table on a schedule. The last touch for a converted user is set at conversion and only moves if a late event arrives, so recomputing the window function on every dashboard refresh is wasteful. A nightly batch produces a conversion_last_touch table keyed on (user_id, converted_at) with last_channel and last_touched_at. Downstream dashboards join this thin table and skip the window function — on a billion-row warehouse this takes a channel dashboard from forty seconds to under two.

The third optimization is a narrow join type. user_id should be an integer or fixed-length hash, not a varchar — string joins cost roughly ten times what integer joins cost on warehouse hardware. Hash string IDs to a BIGINT at ingest.

The fourth optimization is to push the lookback filter into the join clause, not the outer WHERE. Warehouses prune partitions based on join predicates more aggressively than outer filters, and writing AND t.touched_at >= c.converted_at - INTERVAL '30 days' inside the ON clause routinely cuts query time by a third on Snowflake and BigQuery.

If you want to drill attribution and growth SQL questions like this every day, NAILDD is launching with hundreds of analytics SQL problems built around exactly this pattern.

FAQ

When should I use last-touch instead of first-touch?

Use last-touch when the question is "which channel closed the sale" and first-touch when the question is "which channel discovered the customer". Last-touch is the right model for the closing layer — retargeting, branded search, lifecycle email — that earns its keep by converting people who already know the product. It is wrong on its own for evaluating top-of-funnel investments like content, SEO, or podcasts, because the closing channel will always look like the hero. The honest growth dashboard reports both and labels the model on every chart.

What lookback window should I use for last-touch?

Seven days for B2C apps with short transactional journeys is a defensible default and matches what most ad platforms use out of the box. Thirty days fits higher-consideration purchases like travel, electronics, or financial products. Thirty to ninety days fits B2B SaaS where the closing meeting happens weeks after the demo. Measure the time-from-last-touch-to-conversion distribution in your own data, pick the window that covers the ninetieth percentile, and disclose it on the dashboard footer.

How should I handle direct traffic in last-touch?

Decide upfront whether direct is a real channel or a stand-in for prior brand awareness, and document the rule. The standard approach for brand-driven products is to look back one step — if the touch before direct was within seven days and was a paid or earned channel, roll the credit back to that channel. For pure transactional products, leave direct as a real bucket. Pick one rule, write it into a CASE expression, and keep it stable across reporting periods.

Last-touch versus multi-touch attribution?

Multi-touch distributes credit across every touchpoint using a rule — linear gives equal weight, time-decay weights recent touches heavier, U-shaped weights first and last more than the middle, data-driven models learn weights from history. In principle multi-touch is more accurate; in practice it depends on data quality. Every model assumes the touchpoint stream is complete, which it rarely is on mobile after the ATT prompt and is never fully true cross-device. Start with first-touch and last-touch as baselines, add linear multi-touch when the deterministic match rate is above eighty percent, and skip data-driven attribution until the data layer is ready.

Why does retargeting look so strong under last-touch?

Because last-touch credits the channel closest to the conversion, and retargeting is by construction designed to be that channel — its audience is users who already visited the site, so an earlier touch did the real work. Reporting retargeting alongside first-touch usually deflates its share by half or more, and that combined view is what the channel actually contributes. Cutting retargeting spend by twenty percent and watching whether revenue moves is a cheaper test than rebuilding the attribution stack.

Do ad platform numbers match my last-touch SQL?

They will not match exactly, and the gap is usually not a bug. Google, Meta, and Apple each report last-touch based on the click their own platform sourced, so each one credits itself for any conversion it can plausibly claim. Summed together, platform-reported conversions routinely exceed actual conversions by twenty to forty percent because the same conversion gets claimed by two or three platforms. The warehouse last-touch query is the single source of truth — platform dashboards are pacing tools, not accounting.