How to calculate install attribution in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What install attribution actually answers

Install attribution is the SQL that connects an ad click on the open web to an app install on a phone and credits a specific campaign. Every paid acquisition team at Uber, DoorDash, Airbnb, and Meta runs this query daily because the number closes the loop between spend and outcome. When the growth lead at Snowflake asks "what is our cost per install on Google Ads versus TikTok", the answer comes out of an attribution table built with SQL like the recipes below.

Where new mobile analysts get burned is treating attribution as a single fact stored in the events table. It is not. Attribution is a join between two streams the mobile OS deliberately separates — clicks on the web, installs on the device — with a fragile join key, a debatable window, and a consequential model assumption. iOS 14.5 and the ATT prompt removed the deterministic device identifier from most installs, which is why probabilistic fingerprint matching exists. Every query runs on Postgres with one-line tweaks for Snowflake, BigQuery, or Redshift.

Picking an attribution model

The first decision is which model the query implements. Last-touch credits the install to the most recent click before install. First-touch credits the first click in the window. Multi-touch distributes credit linearly or by a decay curve. Position-based gives forty percent each to first and last and twenty percent across the middle. Each model produces a different ranking, and meaningful budget moves whenever a team changes models.

For a mobile install pipeline the default is last-touch click attribution inside a twenty-four to forty-eight hour window. The click is the strongest signal of intent immediately before the install, and most installs happen within sixty minutes of the click. Last-touch overcredits the bottom of the funnel — teams that care about upper-funnel exposure add view-through alongside. The strong interview answer is to name the limitation, not pretend it does not exist.

Multi-touch is more accurate in principle, less robust in practice — it assumes the entire click history is observable, which is exactly what iOS made impossible. Most consumer teams ship last-touch, treat SKAdNetwork as truth for iOS, and build multi-touch only when the warehouse has enough deterministic web-to-app pairings to learn from.

The base last-touch attribution query

Once the model is chosen, the calculation is a join. Take recent installs, find every click for the same device inside the click window, rank clicks by recency, and pick the last one. Installs with no matching click fall through as organic.

WITH installs AS (
    SELECT user_id, install_at, device_id, ip, user_agent
    FROM app_installs
    WHERE install_at >= CURRENT_DATE - INTERVAL '30 days'
),
clicks AS (
    SELECT
        device_id,
        click_id,
        click_at,
        campaign,
        utm_source,
        utm_medium
    FROM ad_clicks
    WHERE click_at >= CURRENT_DATE - INTERVAL '37 days'
),
matched AS (
    SELECT
        i.user_id,
        i.install_at,
        c.click_id,
        c.campaign,
        c.utm_source,
        c.utm_medium,
        c.click_at,
        ROW_NUMBER() OVER (
            PARTITION BY i.user_id
            ORDER BY c.click_at DESC
        ) AS rn
    FROM installs i
    LEFT JOIN clicks c
      ON c.device_id = i.device_id
     AND c.click_at <= i.install_at
     AND c.click_at >= i.install_at - INTERVAL '7 days'
)
SELECT
    user_id,
    install_at,
    COALESCE(campaign, 'organic')      AS attributed_campaign,
    COALESCE(utm_source, 'organic')    AS attributed_source,
    COALESCE(utm_medium, 'organic')    AS attributed_medium
FROM matched
WHERE rn = 1 OR rn IS NULL;

Four details earn their place. The LEFT JOIN keeps installs with no matching click — those become organic in the COALESCE — and an INNER JOIN would drop the organic share, usually thirty to sixty percent of consumer mobile installs. The ROW_NUMBER plus rn = 1 picks the most recent click. The click window pulls thirty-seven days because seven-day click-to-install plus a thirty-day install window means the earliest relevant click is thirty-seven days back. And WHERE rn = 1 OR rn IS NULL handles matched and unmatched installs without a UNION.

When there is no matching click, the install came from somewhere the click stream cannot see — App Store search, word of mouth, an ATT decline, a missed referrer. "Unknown" would be more accurate, but every dashboard says organic.

Click-to-install window analysis

Before fixing a window, look at the click-to-install distribution. Hyper-casual games convert in minutes. Subscription apps with onboarding friction take hours. B2B apps with employer approval take days.

WITH attributed AS (
    SELECT
        attributed_source,
        EXTRACT(EPOCH FROM (install_at - click_at)) / 3600 AS hours_to_install
    FROM attribution_table
    WHERE click_at IS NOT NULL
)
SELECT
    attributed_source,
    COUNT(*)                                                   AS installs,
    ROUND(AVG(hours_to_install)::numeric, 2)                   AS avg_hours,
    PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY hours_to_install) AS median_hours,
    PERCENTILE_CONT(0.9)  WITHIN GROUP (ORDER BY hours_to_install) AS p90_hours,
    COUNT(*) FILTER (WHERE hours_to_install <= 1)              AS within_1h,
    COUNT(*) FILTER (WHERE hours_to_install <= 24)             AS within_24h,
    COUNT(*) FILTER (WHERE hours_to_install <= 168)            AS within_7d
FROM attributed
GROUP BY attributed_source
ORDER BY installs DESC;

Across a typical consumer mobile app, seventy to eighty percent of attributed installs land inside the first hour after the click and ninety to ninety-five percent inside the first twenty-four hours. A twenty-four hour click window captures almost all of the real signal at a fraction of the noise of a thirty-day window. Retargeting pushes to seven days; prospecting holds at twenty-four hours.

The per-source breakdown matters more than the aggregate. Facebook installs convert faster than TikTok, which converts faster than display. If one channel shows a median click-to-install of forty hours when the rest converge at one, the join is grabbing the wrong click, the channel has a creative quality problem, or the tracker is fingerprint-matching too loosely. Investigate before reporting to finance.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Fingerprint matching when device IDs are missing

The hard case is iOS without a deterministic identifier. The ATT prompt removed IDFA from most installs, and the only signals left are IP address, user agent string, and timestamp. Probabilistic fingerprint matching joins on those three inside a tight window, accepts a measurable false-positive rate, and labels the match as low-confidence so downstream consumers can discount it.

WITH installs AS (
    SELECT user_id, install_at, ip, user_agent
    FROM app_installs
    WHERE install_at >= CURRENT_DATE - INTERVAL '7 days'
      AND device_id IS NULL
),
candidates AS (
    SELECT
        i.user_id,
        i.install_at,
        c.click_id,
        c.campaign,
        c.utm_source,
        ABS(EXTRACT(EPOCH FROM (i.install_at - c.click_at))) AS time_diff_sec
    FROM installs i
    LEFT JOIN ad_clicks c
      ON c.ip                = i.ip
     AND c.user_agent_hash   = i.user_agent_hash
     AND c.click_at BETWEEN i.install_at - INTERVAL '24 hours'
                        AND i.install_at
),
ranked AS (
    SELECT
        user_id,
        install_at,
        click_id,
        campaign,
        utm_source,
        time_diff_sec,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY time_diff_sec
        ) AS rn
    FROM candidates
)
SELECT
    user_id,
    install_at,
    COALESCE(campaign, 'organic') AS attributed_campaign,
    time_diff_sec                 AS match_age_seconds,
    'probabilistic'               AS match_type
FROM ranked
WHERE rn = 1 OR rn IS NULL;

A QUALIFY clause does the same thing in Snowflake and BigQuery in one line, but the CTE form runs everywhere. The match key uses a hash of the user agent rather than the raw string — user agents vary by minor version and produce noisier joins directly.

What the query does not tell you is the false-positive rate, and that determines whether the channel is reportable. Run a calibration on a backfill where both a deterministic and a probabilistic match are available — Android installs are the natural test set, because Google Play Install Referrer gives a deterministic answer the fingerprint match can be compared against. Sixty to eighty percent precision is typical for IP plus user-agent matching inside a one-hour window. Below sixty percent the channel is not safe to spend against. Above eighty percent the join is reliable enough to use alongside SKAdNetwork.

Common pitfalls

The first pitfall is the window-too-wide trap. A thirty-day click window credits installs to clicks the user no longer remembers, lets retargeting double-count prospecting, and inflates last-touch CPI on every channel that shows ads to engaged users. Measure the actual click-to-install distribution and pick the smallest window that captures the ninetieth percentile of real installs.

The second pitfall is treating view-through and click-through as the same signal. A click is a real action with a deterministic click ID. A view-through is an impression with no interaction, and the join is probabilistic. View-through makes high-impression channels look better than they are and makes cross-channel comparisons meaningless because each network reports view-throughs by its own rules. Keep view-through in a separate column and never sum the two into a single CPI.

The third pitfall is conflating last-paid with last-click. A user who saw a Facebook prospecting ad on Monday, searched the app's brand name on Tuesday, and installed from a branded search ad will be attributed to branded search by a naive last-click query. Branded search captures intent the prospecting ad created. Most teams credit branded search at a discount or filter it out of paid acquisition entirely.

The fourth pitfall is ignoring SKAdNetwork on iOS. SKAN is the OS-level attribution channel Apple introduced with iOS 14.5 and the only deterministic signal for iOS installs without ATT consent. Teams that rely solely on click-based attribution on iOS see organic share north of seventy percent because the deterministic join key is gone. A modern iOS pipeline blends SKAN with fingerprint matching and reports the combined number with a confidence label.

The fifth pitfall is timezone drift. The click stream arrives from the ad server in UTC. The install timestamp arrives from the mobile SDK and might be in the device's local timezone, the server's, or UTC depending on the integration. When the two streams use different bases, the click-to-install delta is sometimes negative and the join silently drops those installs. Cast every timestamp into UTC at the bronze layer.

Optimization tips

Attribution queries join two large, time-bounded streams, and cost is dominated by the click side. Partition the ad_clicks table by click_at — daily partitions in Postgres and BigQuery, date clustering in Snowflake. Every attribution query is bounded by the install plus click window, so partition pruning collapses a billion-row scan to two or three days.

The second optimization is to materialize the attribution table itself. Recomputing last-touch on every dashboard request is the single most common warehouse waste in mobile growth pipelines. A nightly batch produces a user_install_attribution table keyed on user_id with one row per install — install_at, attributed_campaign, attributed_source, match_type, match_age_seconds, confidence_score — and every dashboard groups against that thin table. A Snowflake task, a BigQuery scheduled query, or a dbt incremental model keyed on install_at is the right home. On a hundred-million-install warehouse this takes a heavy CPI dashboard from twenty seconds to under one.

If you want to drill SQL questions like this daily, NAILDD is launching with hundreds of analytics SQL problems built around this pattern.

FAQ

What is the difference between click attribution and view-through attribution?

Click attribution credits an install to a click the user actually performed, and the join uses a deterministic click ID. View-through credits an install to an impression with no interaction, and the join is probabilistic — usually device ID plus a short time window. Click attribution is the stronger signal and what most analytics interviews expect by default. View-through is reported alongside to capture upper-funnel exposure, but the two should never be summed because the methodologies are not comparable. Keep view-through in a separate column and remove it from cross-channel rankings.

How should I think about iOS attribution after ATT?

The deterministic IDFA is gone for most iOS users since iOS 14.5, so the device-ID join that Android still uses no longer works. SKAdNetwork is the deterministic replacement and returns conversion postbacks with limited campaign granularity and a delay. The standard iOS pipeline blends SKAN postbacks with fingerprint matching for the opted-out population and accepts a higher organic share than Android. Any iOS attribution table should carry a match_type column so dashboards can filter by confidence.

What attribution window should I use?

Twenty-four hours for click attribution is the industry default and a defensible interview answer. Seven days is acceptable for view-through and retargeting where the cycle is longer. Anything beyond seven days credits installs to clicks the user does not remember and inflates channels that show ads to already-engaged users. Measure the actual click-to-install distribution in your own data and pick the window that captures ninety to ninety-five percent of installs.

What counts as organic in the attribution table?

By convention, an install with no matching click in the window is labelled organic. The label is not strictly accurate — the bucket includes App Store search, word-of-mouth installs, users who declined ATT with no matching SKAN postback, and clicks the tracker failed to log — but it is the universal convention. The right move is to track the absolute organic install count over time and dig in when it moves.

How does install attribution connect to CAC?

CAC is total paid spend divided by attributed installs at the channel level, and the denominator comes directly from the table above. A leaky attribution pipeline inflates organic share, deflates paid installs, and produces a CAC number that looks worse than reality. When CAC moves unexpectedly, audit the attribution table first — match rates by channel, organic share over time, and the click-to-install window distribution. Most CAC anomalies are attribution bugs, not real performance changes.