SQL for fraud detection
Contents:
Why fraud SQL matters
Fraud detection is one of the highest-leverage tasks an analyst can own. Every blocked card-testing burst, every caught promo ring, every chargeback prevented translates directly into saved revenue. At Stripe, DoorDash, Uber, and any marketplace large enough to attract bad actors, the first line of defense is rarely a machine learning model — it is a small library of SQL queries that runs every few minutes and flags anything outside a known envelope.
Fraud SQL also shows up constantly in interviews for fintech, payments, and marketplace data roles. Hiring managers sketch a schema with transactions, sessions, users, and devices and ask you to find card testing, account takeover, or a fraud ring. Candidates who reach for a single rule lose. Candidates who layer velocity, cardinality, and behavioral signals into a stacked score win. The patterns below run on Postgres-style dialects with minor adjustments for Snowflake, BigQuery, or Databricks.
Fraud archetypes you will model
Account takeover happens when an attacker compromises a real user's credentials. The pattern is a login from a brand-new device or IP, immediately followed by an unusual transaction, often preceded by failed login attempts. The SQL joins sessions to transactions on user_id and inspects the gap between anomalous login and next debit.
Card testing is a stolen card probed with tiny charges across many merchants. The signature is high-volume small transactions on a single card, high decline rate, and unrelated merchants. The query groups by card_fingerprint over a short window and filters on count and decline ratio.
Promo abuse is the marketplace classic — one human creates many accounts to redeem a sign-up bonus. The fingerprints overlap: same device, same IP, sequential emails, sometimes the same payment instrument behind multiple "users". Detection lives in self-joins on sessions and payment_methods.
Chargeback fraud is when a real customer makes a legitimate purchase and later disputes it to keep both goods and money. The signal is subtler — first-time customers, high-value orders, a spike in chargeback rate at a specific merchant or shipping zone. The query is a merchant-level aggregation with a volume floor.
Velocity rules
The simplest fraud check is the velocity rule: more than N events from one entity in X minutes. Velocity catches card testing, credential stuffing, scraping, and bot activity without any user history.
SELECT
user_id,
COUNT(*) AS tx_count
FROM transactions
WHERE created_at >= NOW() - INTERVAL '5 minutes'
GROUP BY user_id
HAVING COUNT(*) > 10;The threshold is a knob. Ten transactions in five minutes is reasonable for payments; for B2B SaaS where one transaction takes a sales rep an hour, the threshold might be three. Tune against your historical population — pick the 99.5th or 99.9th percentile of per-user velocity, then watch precision over the first week.
The same shape works at the IP, device, and card level. Replace user_id with ip_address, device_fingerprint, or card_fingerprint and you have four velocity rules from one template. Stack them with UNION ALL for a sheet that catches the noisiest attacks before any model runs.
Cardinality patterns: users, cards, devices
Cardinality queries answer "how many distinct X are attached to one Y" and cover a huge fraction of real cases. A user with five cards in 30 days is unusual; a card belonging to three users in 30 days is almost always fraud.
SELECT
user_id,
COUNT(DISTINCT card_fingerprint) AS unique_cards
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(DISTINCT card_fingerprint) > 5;The inverse — one card, many users — is the stronger signal because legitimate users almost never share a card.
SELECT
card_fingerprint,
COUNT(DISTINCT user_id) AS unique_users
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY card_fingerprint
HAVING COUNT(DISTINCT user_id) > 3;Extend the shape to devices. A single device serving five or more accounts in a month on a consumer marketplace usually means promo abuse or a coordinated fraud farm.
SELECT
device_fingerprint,
COUNT(DISTINCT user_id) AS users_on_device
FROM sessions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY device_fingerprint
HAVING COUNT(DISTINCT user_id) > 5;Behavioral anomaly queries
Velocity and cardinality are stateless. Behavioral anomalies compare a single event to the user's own historical envelope, which catches the medium-sophistication attacker who paces fraud to look like one normal user.
Geographic anomaly is the textbook example. A New York transaction followed thirty minutes later by a Singapore charge is impossible for the cardholder.
WITH with_prev AS (
SELECT
user_id,
transaction_id,
created_at,
country,
LAG(country) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_country,
LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_time
FROM transactions
)
SELECT *
FROM with_prev
WHERE country != prev_country
AND EXTRACT(EPOCH FROM (created_at - prev_time)) / 3600 < 1;Amount anomalies catch a compromised account drained in one large charge. Compare the amount to the user's historical mean and standard deviation; anything more than three standard deviations away warrants review.
WITH user_stats AS (
SELECT
user_id,
AVG(amount) AS avg_amount,
STDDEV(amount) AS std_amount
FROM transactions
WHERE created_at < CURRENT_DATE
GROUP BY user_id
)
SELECT t.*, us.avg_amount, us.std_amount
FROM transactions t
JOIN user_stats us ON t.user_id = us.user_id
WHERE ABS(t.amount - us.avg_amount) > 3 * us.std_amount
AND t.created_at >= CURRENT_DATE;Time-of-day anomalies catch attackers operating in the user's overnight hours. Build a per-user array of normal hours from the prior month, then flag events outside it.
WITH user_hours AS (
SELECT
user_id,
ARRAY_AGG(DISTINCT EXTRACT(HOUR FROM created_at)) AS normal_hours
FROM transactions
WHERE created_at < CURRENT_DATE - 7
GROUP BY user_id
)
SELECT t.*
FROM transactions t
JOIN user_hours uh ON t.user_id = uh.user_id
WHERE NOT (EXTRACT(HOUR FROM t.created_at) = ANY(uh.normal_hours))
AND t.created_at >= CURRENT_DATE;Graph analysis for fraud rings
Sophisticated fraud is rarely a lone wolf. Promo rings, refund mills, and money-mule networks share signal across the graph — same card, same device, same shipping address, same IP block. A self-join on a shared attribute exposes the network.
-- Users who share a card fingerprint
SELECT DISTINCT
a.user_id AS user_a,
b.user_id AS user_b
FROM transactions a
JOIN transactions b
ON a.card_fingerprint = b.card_fingerprint
WHERE a.user_id < b.user_id;The user_id < b.user_id filter prevents the symmetric pair (A, B) and (B, A) from both appearing. Once you have a table of suspicious edges, keep self-joining to expand the cluster, or push edges into a graph store for connected-component analysis.
For chargeback fraud, aggregate at the merchant level and look for elevated dispute rates above a meaningful volume floor.
SELECT
merchant_id,
COUNT(*) AS total_tx,
SUM(CASE WHEN is_chargeback THEN 1 ELSE 0 END) * 1.0
/ COUNT(*) AS chargeback_rate
FROM transactions
WHERE created_at >= CURRENT_DATE - 90
GROUP BY merchant_id
HAVING COUNT(*) > 100
ORDER BY chargeback_rate DESC;A merchant whose dispute rate stays above one percent for three consecutive months is a risk merchant — compromised, selling counterfeit goods, or running first-party chargeback fraud at scale.
Risk scoring with stacked flags
A single low-threshold rule floods the review queue; a single high-threshold rule misses most fraud. Compute several weak signals per transaction and require multiple to fire at once.
WITH user_stats AS (
SELECT
user_id,
APPROX_PERCENTILE(amount, 0.99) AS p99_amount,
MODE() WITHIN GROUP (ORDER BY country) AS usual_country
FROM transactions
WHERE created_at < CURRENT_DATE
GROUP BY user_id
),
risk_flags AS (
SELECT
t.transaction_id,
t.user_id,
CASE WHEN t.amount > us.p99_amount THEN 1 ELSE 0 END AS amt_flag,
CASE WHEN t.country != us.usual_country THEN 1 ELSE 0 END AS geo_flag,
CASE WHEN EXTRACT(HOUR FROM t.created_at)
NOT BETWEEN 6 AND 23 THEN 1 ELSE 0 END AS time_flag
FROM transactions t
JOIN user_stats us USING (user_id)
WHERE t.created_at >= NOW() - INTERVAL '1 hour'
)
SELECT
transaction_id,
user_id,
amt_flag + geo_flag + time_flag AS risk_score
FROM risk_flags
WHERE amt_flag + geo_flag + time_flag >= 2;Three independent flags requiring at least two to fire typically lifts precision from forty percent on any single rule to ninety percent stacked, while still catching most coordinated attacks. Rules of this shape are also the ideal training-data generator: once the heuristic stops scaling, hand labeled events to a gradient boosted tree.
Common pitfalls
The most common mistake is treating velocity as the only signal. Velocity catches card testing and credential stuffing beautifully but misses the patient attacker who pages fraud across days to stay under per-minute thresholds. Pair velocity with behavioral baselines so a slow attacker still trips the amount or geo flag.
Another trap is computing user baselines that include today's data. If the current transaction is a ten-thousand-dollar withdrawal and you include it in the mean and standard deviation, you raise the bar for the very event you are trying to flag. Always exclude the event under review: WHERE created_at < CURRENT_DATE in the user_stats CTE, and join live transactions separately.
Static rules age quickly. Fraudsters watch your decline pattern and adapt — card testing moves from five-minute bursts to thirty-minute drips to multi-day campaigns. A rule shipped six months ago and never revisited will hit half the precision it had at launch. Schedule a quarterly review against flagged volume, confirmation rate, and lift over a no-rule baseline.
Ignoring false-positive cost is the most expensive mistake. Every flagged legitimate transaction is a support ticket, a churned user, sometimes a public complaint. Before tightening a rule, estimate the increase in flagged volume and stage the change to a small percent of traffic with manual review. Both precision against confirmed fraud and false-positive rate against confirmed legitimate must move correctly before dialing up.
Performance and operational tips
Fraud queries run on a hot tail of recent data and indexes must match. The baseline for a transactions table is composite indexes on (user_id, created_at DESC), (card_fingerprint, created_at DESC), and (device_fingerprint, created_at DESC). Without these, every velocity check does a full scan and detection lag balloons from seconds to minutes.
Partitioning by date pays for itself immediately. Most fraud queries filter on created_at >= NOW() - INTERVAL 'X', so pruning eliminates over ninety percent of data the query would otherwise touch. On Snowflake or BigQuery, cluster on the same column.
User baselines are candidates for materialized views or daily snapshots — recomputing AVG(amount) and STDDEV(amount) per user every five minutes is wasteful when you can refresh hourly and join the snapshot. For real-time detection at payments scale, batch SQL is too slow; stream processors like Flink and Materialize run the same SQL semantics over a stream and emit flags within hundreds of milliseconds. Keep SQL portable so the same logic runs in batch review and stream production.
Related reading
- Account takeover detection in SQL
- How to calculate fraud score in SQL
- How to calculate fraud rate in SQL
- How to calculate chargeback rate in SQL
- How to calculate IP velocity in SQL
- How to calculate device fingerprinting in SQL
- How to detect anomalies in SQL
- SQL window functions interview questions
If you want to drill fraud SQL questions under realistic interview pressure, NAILDD is launching with 500+ SQL problems across these patterns.
FAQ
Should fraud detection run as real-time SQL or scheduled batch?
Both. Real-time SQL — executed by a stream processor running SQL semantics — blocks or steps up authentication before a transaction settles. Batch SQL, running every five to fifteen minutes against the warehouse, catches slower patterns and feeds the manual review queue. The high-precision rules you ship to real-time are a subset of the wider batch library; promote a rule to streaming only after it survives several weeks at acceptable precision.
How do I balance precision and recall when tuning thresholds?
Translate both errors into dollars. A false positive costs the average ticket times churn probability; a false negative costs the chargeback plus fees plus dispute handling. Compute the cost-weighted F-score, sweep the threshold across a labeled holdout month, and pick the threshold that minimizes expected cost. For most payment products, the optimum lands at high precision and moderate recall — a missed fraud is recoverable through review, a churned user usually is not.
How do I handle the cold start problem when a user has no history?
For brand-new users you have no behavioral baseline, so per-user anomaly checks fall back to neutral. Compensate with stricter population-level rules: lower velocity thresholds, tighter cardinality caps, and a population baseline from the cohort of users in the same country and product. After roughly thirty events, swap to personal baselines via a user_age_days join and a CASE expression.
When should I move from SQL rules to a machine learning model?
When your rule library exceeds about a hundred entries, the rules contradict each other often, and you have at least six months of labeled events. A gradient boosted tree trained on the features your rules already extract — velocity stats, baseline deviations, device and IP signals — typically lifts AUC by ten to twenty points. Rules stay around to handle high-confidence blocks and serve as a model sanity check.
What features matter most for a fraud model?
Velocity features dominate — transaction counts at one, five, fifteen, and sixty minute windows per user, card, IP, and device. Behavioral deviation features come next — amount and time-of-day z-scores against personal baselines. Graph features follow — distinct cards per user, distinct users per device, centrality measures. Categorical features like country mismatch and merchant category round it out. Forty to sixty well-engineered signals typically outperforms hundreds of raw columns.