How to calculate churn risk score in SQL
Contents:
What a churn risk score actually does
Picture the setup: it's late Tuesday, your head of retention pings you on Slack asking for "a list of accounts most likely to cancel in the next 30 days, ranked, so CS can call them in the morning." The model team is six weeks out from the gradient-boosted classifier. That request is why every analytics function eventually builds a rule-based churn risk score in SQL — the bridge between knowing churn happened and doing something about it first.
A churn risk score is a number per user that estimates how likely the user is to leave in the next window, usually 30 or 90 days. Stripe, DoorDash, and Notion all run some variant: high score routes to customer success, medium to an automated discount email, low is left alone. Score accuracy determines whether retention dollars get spent on accounts who would have stayed anyway — the most common failure of any save campaign.
The ML version uses logistic regression or gradient boosting on labeled outcomes. The SQL version uses additive or weighted rules on observable signals — days inactive, engagement decline, support tickets, plan downgrades. Less accurate, but interpretable, fast to ship, and good enough for the first quarter of retention work while data science builds the classifier. This post walks the full recipe: scoring, composite signals, calibration, and targeting, plus the traps that come up in production and in interviews.
Rule-based scoring
Start with the simplest version: an additive score on a handful of signals. Each rule contributes a fixed point value, rules add up to a 0-100 score, and a threshold sends users into the high-risk bucket. The query below assumes a users table, an activity table with one row per user-day, and a support_tickets table.
WITH user_signals AS (
SELECT
u.user_id,
CURRENT_DATE - COALESCE(MAX(a.DATE), u.created_at::DATE) AS days_inactive,
COUNT(a.DATE) FILTER (
WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days'
) AS active_last_7d,
COUNT(a.DATE) FILTER (
WHERE a.DATE >= CURRENT_DATE - INTERVAL '30 days'
) AS active_last_30d,
COUNT(a.DATE) FILTER (
WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days'
)::NUMERIC
/ NULLIF(
COUNT(a.DATE) FILTER (
WHERE a.DATE BETWEEN CURRENT_DATE - INTERVAL '14 days'
AND CURRENT_DATE - INTERVAL '8 days'
),
0
) AS week_over_week_ratio,
EXISTS (
SELECT 1 FROM support_tickets t
WHERE t.user_id = u.user_id
AND t.created_at >= CURRENT_DATE - INTERVAL '30 days'
AND t.priority = 'high'
) AS recent_complaint
FROM users u
LEFT JOIN activity a USING (user_id)
WHERE u.active = TRUE
GROUP BY u.user_id, u.created_at
)
SELECT
user_id,
days_inactive,
active_last_7d,
week_over_week_ratio,
(CASE WHEN days_inactive > 14 THEN 30 ELSE 0 END) +
(CASE WHEN days_inactive > 30 THEN 30 ELSE 0 END) +
(CASE WHEN week_over_week_ratio < 0.5 THEN 20 ELSE 0 END) +
(CASE WHEN active_last_7d = 0 THEN 30 ELSE 0 END) +
(CASE WHEN recent_complaint THEN 20 ELSE 0 END) AS churn_risk_score
FROM user_signals
ORDER BY churn_risk_score DESC;The score caps at 100 because the top two days_inactive thresholds overlap on the 30+ day band. Above 70 is high-risk; 30 to 70 is medium; under 30 is low. Recalibrate against your own churn outcomes (next section but one).
The week_over_week_ratio signal compares activity in the last seven days to the seven before. A ratio under 0.5 means usage cut by more than half — one of the strongest leading indicators of churn in any product where engagement has been stable. It catches users still technically active but visibly disengaging — the population a save campaign should target first.
Composite score with deciles
The additive score is easy to explain but hard to weight. If days_inactive is twice as predictive as complaint_score, the additive version cannot express that. A composite with explicit weights and a decile cut fixes both.
WITH user_score AS (
SELECT
user_id,
days_inactive,
engagement_trend,
complaint_score,
usage_decline,
days_inactive * 0.4
+ engagement_trend * 0.3
+ complaint_score * 0.2
+ usage_decline * 0.1 AS raw_score
FROM user_signals
)
SELECT
user_id,
raw_score,
NTILE(10) OVER (ORDER BY raw_score) AS risk_decile,
CASE
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 9 THEN 'CRITICAL'
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 7 THEN 'HIGH'
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 4 THEN 'MEDIUM'
ELSE 'LOW'
END AS risk_segment
FROM user_score;The NTILE(10) cut makes this robust. Instead of trusting an absolute threshold, sort active users by raw score and put them in deciles. The top decile is always 10% of the user base, regardless of whether the average raw score has drifted. Additive scores tend to inflate as your product adds features and users accumulate signals — a fixed cutoff slowly captures more users until the "high risk" bucket is half the customer base and CS stops trusting the list.
The weights — 0.4, 0.3, 0.2, 0.1 — are an opinionated starting point. Days inactive is usually the strongest single signal for consumer subscriptions. For B2B SaaS where login frequency is naturally low because the product runs in the background, shift weight onto usage_decline and complaint_score. The right way to set weights is a logistic regression on labeled outcomes — a quick scikit-learn fit in a notebook tells you which signals are pulling their weight.
Calibration against actual churn
A score nobody validates is a guess in a CASE statement. The most important step is calibration: take last quarter's scores, look at who actually churned, and verify the top decile churned at a meaningfully higher rate than the bottom.
WITH historical_scores AS (
SELECT user_id, churn_risk_score AS score_30d_ago
FROM user_score_snapshots
WHERE snapshot_date = CURRENT_DATE - INTERVAL '30 days'
),
churn_outcomes AS (
SELECT
user_id,
churned_at IS NOT NULL AS churned
FROM users
WHERE created_at < CURRENT_DATE - INTERVAL '30 days'
)
SELECT
NTILE(10) OVER (ORDER BY score_30d_ago) AS risk_decile,
COUNT(*) AS users,
SUM(CASE WHEN churned THEN 1 ELSE 0 END) AS churned,
SUM(CASE WHEN churned THEN 1 ELSE 0 END)::NUMERIC * 100 / COUNT(*) AS churn_rate_pct
FROM historical_scores
JOIN churn_outcomes USING (user_id)
GROUP BY 1
ORDER BY 1;The output is a 10-row table. If the score is doing real work, the top decile should show a churn rate three to ten times higher than the bottom. At healthy SaaS shops the ratio is typically 5-8x: top decile at 20-25% over 30 days, bottom at 2-4%. If the top only churns at 1.5x the bottom, your signals are weak — likely your engagement metric is noisy or your support-ticket signal too sparse.
A second sanity check is the lift curve: cumulative churns captured if you target the top N% of scores. A useful score reaches 50% of all churners by contacting the top 20%. If you need 60% of users to catch 50% of churners, the score is barely better than random.
You need a user_score_snapshots table — a daily or weekly rollup that freezes scores so you can compare to outcomes 30 days later. Without snapshots there is no way to validate against history.
Targeting high-risk users
Once the score works, push the high-risk slice into a retention list. The query is trivial; what matters is filter discipline and the LIMIT.
SELECT
user_id,
email,
churn_risk_score,
days_inactive
FROM user_scores
WHERE churn_risk_score >= 70
AND email_opt_in = TRUE
ORDER BY churn_risk_score DESC
LIMIT 1000;The email_opt_in filter is non-negotiable — sending retention emails to users who opted out is the fastest way to land in the spam folder forever. The LIMIT 1000 is about capacity: CS can realistically work 50-100 calls a week, so the top 1,000 go to email and the top 50 go to CS.
A more advanced version segments by band. Critical accounts get a human CS call. High accounts get a personalized discount email. Medium accounts get a re-engagement email with a feature tour. Low accounts get nothing — the lift from contacting low-risk users is statistically zero and the unsubscribe rate hurts long-term deliverability.
Common pitfalls
The most common failure is a score that's too sensitive — every other user flags as "high risk" and CS stops opening the list because most accounts on it would have stayed anyway. Use decile-based segmentation rather than absolute thresholds, and check the size of the high-risk bucket weekly. If it grows past 15-20% of the active base, recalibrate before the retention team learns to ignore the alerts.
A second trap is static rules. Thresholds that captured churners in Q1 will not perfectly capture them in Q4 — the product changed, the acquisition mix shifted, seasonal patterns moved the baseline. A score that was 80% accurate in January and 55% accurate in October is silently broken. Calibrate quarterly at minimum, monthly if the product is changing fast.
Shipping a score without validation is the third failure. Teams roll out a rule-based score because it "looks reasonable", and discover six months later that the top decile churns at the same rate as the bottom because one rule is firing opposite to intent. The calibration query is not optional — run it before wiring the score into any workflow, and again every time you change a rule or weight.
Relying on a single signal is the fourth trap. Days inactive alone is weak for products where users have multi-week gaps between sessions — tax filings, quarterly procurement, seasonal travel apps. The composite version exists because no single signal is universally predictive; combining engagement, support tickets, plan changes, and feature-usage decline gives a smoother ranking than any one alone.
Survivor bias is the subtlest. If you train score weights on users who signed up at least 24 months ago, you systematically miss the early-tenure churn pattern — users who churned in their first 3 months are underrepresented. The fix is right-censored survival analysis, or restrict the training cohort to users whose first 90 days you can observe in full and accept the score is only valid for similar-tenure users.
Optimization tips
The signals query joins users, activity, and support_tickets and aggregates per user — at any reasonable scale this is the expensive part of the pipeline. Two indexes earn their cost back the first week: a composite on activity (user_id, date) and a composite on support_tickets (user_id, created_at, priority). Without those, the FILTER clauses and the EXISTS subquery degenerate into full scans.
Daily scoring on more than 500K users should not run as an ad-hoc query on source tables. Materialize a user_signals_daily table — one row per user per day, with raw signal columns precomputed — and recompute scores from that. The signal table is incrementally appended each day by a small DAG step that processes only the previous day. On Snowflake or BigQuery, partition by snapshot_date so historical reads prune to a single partition.
The snapshot pattern makes calibration cheap. Instead of re-deriving 30-day-old scores from raw events, read directly from user_score_snapshots with a date filter. The cost of one extra rollup table is worth the leverage for every downstream consumer.
Related reading
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
Rule-based score or ML model — which should I ship first?
Ship rules first, always. A rule-based score in SQL can be in production within a week, with full visibility into why every account is flagged. An ML model takes months to train, validate, and productionize. The two are not really competitors — rules run while the ML team builds the classifier, and once the classifier is live rules become the fallback when the model service is down. Linear and Notion both ran rule-based scores for over a year before swapping in a gradient-boosted version.
What threshold makes a user "high risk"?
It depends on what you do with the label, not on the score itself. If high-risk users get a human CS call, keep the bucket small — top 5-10%, cutoff around 70-80 on a 0-100 additive score. If they get an automated discount email, top 20-25% works, cutoff around 50. Work backward from CS capacity and email budget, not from an arbitrary round number.
How often should the score be recomputed?
Daily for products where users can churn fast — consumer subscriptions, freemium apps, gaming. Weekly is fine for B2B SaaS with annual contracts. Real-time scoring is mostly vanity: by the time a user has been inactive a week, you have time to recompute overnight and email them in the morning. Real-time is justified only when the action loop is real-time too, like a cancellation save flow at the cancel click.
What signals should I include?
Start with the four that show up in every production score: days since last meaningful action, engagement trend (week-over-week ratio), recent high-priority support tickets, plan changes or downgrades. Then add product-specific signals: feature-usage decline, seat-utilization drop, payment failures, NPS or CSAT drops. Add signals only when they measurably improve calibration — four well-calibrated signals beat twelve noisy ones.
How do I know the score is actually working?
Run the calibration query monthly and watch two numbers: top-decile churn rate versus bottom-decile, and the lift curve. A working score has a top-decile churn rate at least three times the bottom and captures 50% of churners in the top 20%. A second proof is save-campaign ROI: if the high-risk bucket has better save rates than a control group getting the same intervention regardless of score, the score is earning its place.