May 18, 2026·12 min read

SQL for fraud detection

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Contents:

Why fraud SQL matters
Fraud archetypes you will model
Velocity rules
Cardinality patterns: users, cards, devices
Behavioral anomaly queries
Graph analysis for fraud rings
Risk scoring with stacked flags
Common pitfalls
Performance and operational tips
Related reading
FAQ

Why fraud SQL matters

Fraud detection is one of the highest-leverage tasks an analyst can own. Every blocked card-testing burst, every caught promo ring, every chargeback prevented translates directly into saved revenue. At Stripe, DoorDash, Uber, and any marketplace large enough to attract bad actors, the first line of defense is rarely a machine learning model — it is a small library of SQL queries that runs every few minutes and flags anything outside a known envelope.

Fraud SQL also shows up constantly in interviews for fintech, payments, and marketplace data roles. Hiring managers sketch a schema with transactions, sessions, users, and devices and ask you to find card testing, account takeover, or a fraud ring. Candidates who reach for a single rule lose. Candidates who layer velocity, cardinality, and behavioral signals into a stacked score win. The patterns below run on Postgres-style dialects with minor adjustments for Snowflake, BigQuery, or Databricks.

Fraud archetypes you will model

Account takeover happens when an attacker compromises a real user's credentials. The pattern is a login from a brand-new device or IP, immediately followed by an unusual transaction, often preceded by failed login attempts. The SQL joins sessions to transactions on user_id and inspects the gap between anomalous login and next debit.

Card testing is a stolen card probed with tiny charges across many merchants. The signature is high-volume small transactions on a single card, high decline rate, and unrelated merchants. The query groups by card_fingerprint over a short window and filters on count and decline ratio.

Promo abuse is the marketplace classic — one human creates many accounts to redeem a sign-up bonus. The fingerprints overlap: same device, same IP, sequential emails, sometimes the same payment instrument behind multiple "users". Detection lives in self-joins on sessions and payment_methods.

Chargeback fraud is when a real customer makes a legitimate purchase and later disputes it to keep both goods and money. The signal is subtler — first-time customers, high-value orders, a spike in chargeback rate at a specific merchant or shipping zone. The query is a merchant-level aggregation with a volume floor.

Velocity rules

The simplest fraud check is the velocity rule: more than N events from one entity in X minutes. Velocity catches card testing, credential stuffing, scraping, and bot activity without any user history.

SELECT
    user_id,
    COUNT(*) AS tx_count
FROM transactions
WHERE created_at >= NOW() - INTERVAL '5 minutes'
GROUP BY user_id
HAVING COUNT(*) > 10;

The threshold is a knob. Ten transactions in five minutes is reasonable for payments; for B2B SaaS where one transaction takes a sales rep an hour, the threshold might be three. Tune against your historical population — pick the 99.5th or 99.9th percentile of per-user velocity, then watch precision over the first week.

The same shape works at the IP, device, and card level. Replace user_id with ip_address, device_fingerprint, or card_fingerprint and you have four velocity rules from one template. Stack them with UNION ALL for a sheet that catches the noisiest attacks before any model runs.

Cardinality patterns: users, cards, devices

Cardinality queries answer "how many distinct X are attached to one Y" and cover a huge fraction of real cases. A user with five cards in 30 days is unusual; a card belonging to three users in 30 days is almost always fraud.

SELECT
    user_id,
    COUNT(DISTINCT card_fingerprint) AS unique_cards
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(DISTINCT card_fingerprint) > 5;

The inverse — one card, many users — is the stronger signal because legitimate users almost never share a card.

SELECT
    card_fingerprint,
    COUNT(DISTINCT user_id) AS unique_users
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY card_fingerprint
HAVING COUNT(DISTINCT user_id) > 3;

Extend the shape to devices. A single device serving five or more accounts in a month on a consumer marketplace usually means promo abuse or a coordinated fraud farm.

SELECT
    device_fingerprint,
    COUNT(DISTINCT user_id) AS users_on_device
FROM sessions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY device_fingerprint
HAVING COUNT(DISTINCT user_id) > 5;

Behavioral anomaly queries

Velocity and cardinality are stateless. Behavioral anomalies compare a single event to the user's own historical envelope, which catches the medium-sophistication attacker who paces fraud to look like one normal user.

Geographic anomaly is the textbook example. A New York transaction followed thirty minutes later by a Singapore charge is impossible for the cardholder.

WITH with_prev AS (
    SELECT
        user_id,
        transaction_id,
        created_at,
        country,
        LAG(country) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_country,
        LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_time
    FROM transactions
)
SELECT *
FROM with_prev
WHERE country != prev_country
    AND EXTRACT(EPOCH FROM (created_at - prev_time)) / 3600 < 1;

Amount anomalies catch a compromised account drained in one large charge. Compare the amount to the user's historical mean and standard deviation; anything more than three standard deviations away warrants review.

WITH user_stats AS (
    SELECT
        user_id,
        AVG(amount) AS avg_amount,
        STDDEV(amount) AS std_amount
    FROM transactions
    WHERE created_at < CURRENT_DATE
    GROUP BY user_id
)
SELECT t.*, us.avg_amount, us.std_amount
FROM transactions t
JOIN user_stats us ON t.user_id = us.user_id
WHERE ABS(t.amount - us.avg_amount) > 3 * us.std_amount
    AND t.created_at >= CURRENT_DATE;

Time-of-day anomalies catch attackers operating in the user's overnight hours. Build a per-user array of normal hours from the prior month, then flag events outside it.

WITH user_hours AS (
    SELECT
        user_id,
        ARRAY_AGG(DISTINCT EXTRACT(HOUR FROM created_at)) AS normal_hours
    FROM transactions
    WHERE created_at < CURRENT_DATE - 7
    GROUP BY user_id
)
SELECT t.*
FROM transactions t
JOIN user_hours uh ON t.user_id = uh.user_id
WHERE NOT (EXTRACT(HOUR FROM t.created_at) = ANY(uh.normal_hours))
    AND t.created_at >= CURRENT_DATE;

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

Graph analysis for fraud rings

Sophisticated fraud is rarely a lone wolf. Promo rings, refund mills, and money-mule networks share signal across the graph — same card, same device, same shipping address, same IP block. A self-join on a shared attribute exposes the network.

-- Users who share a card fingerprint
SELECT DISTINCT
    a.user_id AS user_a,
    b.user_id AS user_b
FROM transactions a
JOIN transactions b
    ON a.card_fingerprint = b.card_fingerprint
WHERE a.user_id < b.user_id;

The user_id < b.user_id filter prevents the symmetric pair (A, B) and (B, A) from both appearing. Once you have a table of suspicious edges, keep self-joining to expand the cluster, or push edges into a graph store for connected-component analysis.

For chargeback fraud, aggregate at the merchant level and look for elevated dispute rates above a meaningful volume floor.

SELECT
    merchant_id,
    COUNT(*) AS total_tx,
    SUM(CASE WHEN is_chargeback THEN 1 ELSE 0 END) * 1.0
        / COUNT(*) AS chargeback_rate
FROM transactions
WHERE created_at >= CURRENT_DATE - 90
GROUP BY merchant_id
HAVING COUNT(*) > 100
ORDER BY chargeback_rate DESC;

A merchant whose dispute rate stays above one percent for three consecutive months is a risk merchant — compromised, selling counterfeit goods, or running first-party chargeback fraud at scale.

Risk scoring with stacked flags

A single low-threshold rule floods the review queue; a single high-threshold rule misses most fraud. Compute several weak signals per transaction and require multiple to fire at once.

WITH user_stats AS (
    SELECT
        user_id,
        APPROX_PERCENTILE(amount, 0.99) AS p99_amount,
        MODE() WITHIN GROUP (ORDER BY country) AS usual_country
    FROM transactions
    WHERE created_at < CURRENT_DATE
    GROUP BY user_id
),
risk_flags AS (
    SELECT
        t.transaction_id,
        t.user_id,
        CASE WHEN t.amount > us.p99_amount         THEN 1 ELSE 0 END AS amt_flag,
        CASE WHEN t.country != us.usual_country    THEN 1 ELSE 0 END AS geo_flag,
        CASE WHEN EXTRACT(HOUR FROM t.created_at)
                 NOT BETWEEN 6 AND 23              THEN 1 ELSE 0 END AS time_flag
    FROM transactions t
    JOIN user_stats us USING (user_id)
    WHERE t.created_at >= NOW() - INTERVAL '1 hour'
)
SELECT
    transaction_id,
    user_id,
    amt_flag + geo_flag + time_flag AS risk_score
FROM risk_flags
WHERE amt_flag + geo_flag + time_flag >= 2;

Three independent flags requiring at least two to fire typically lifts precision from forty percent on any single rule to ninety percent stacked, while still catching most coordinated attacks. Rules of this shape are also the ideal training-data generator: once the heuristic stops scaling, hand labeled events to a gradient boosted tree.

Common pitfalls

The most common mistake is treating velocity as the only signal. Velocity catches card testing and credential stuffing beautifully but misses the patient attacker who pages fraud across days to stay under per-minute thresholds. Pair velocity with behavioral baselines so a slow attacker still trips the amount or geo flag.

Another trap is computing user baselines that include today's data. If the current transaction is a ten-thousand-dollar withdrawal and you include it in the mean and standard deviation, you raise the bar for the very event you are trying to flag. Always exclude the event under review: WHERE created_at < CURRENT_DATE in the user_stats CTE, and join live transactions separately.

Static rules age quickly. Fraudsters watch your decline pattern and adapt — card testing moves from five-minute bursts to thirty-minute drips to multi-day campaigns. A rule shipped six months ago and never revisited will hit half the precision it had at launch. Schedule a quarterly review against flagged volume, confirmation rate, and lift over a no-rule baseline.

Ignoring false-positive cost is the most expensive mistake. Every flagged legitimate transaction is a support ticket, a churned user, sometimes a public complaint. Before tightening a rule, estimate the increase in flagged volume and stage the change to a small percent of traffic with manual review. Both precision against confirmed fraud and false-positive rate against confirmed legitimate must move correctly before dialing up.

Performance and operational tips

Fraud queries run on a hot tail of recent data and indexes must match. The baseline for a transactions table is composite indexes on (user_id, created_at DESC), (card_fingerprint, created_at DESC), and (device_fingerprint, created_at DESC). Without these, every velocity check does a full scan and detection lag balloons from seconds to minutes.

Partitioning by date pays for itself immediately. Most fraud queries filter on created_at >= NOW() - INTERVAL 'X', so pruning eliminates over ninety percent of data the query would otherwise touch. On Snowflake or BigQuery, cluster on the same column.

User baselines are candidates for materialized views or daily snapshots — recomputing AVG(amount) and STDDEV(amount) per user every five minutes is wasteful when you can refresh hourly and join the snapshot. For real-time detection at payments scale, batch SQL is too slow; stream processors like Flink and Materialize run the same SQL semantics over a stream and emit flags within hundreds of milliseconds. Keep SQL portable so the same logic runs in batch review and stream production.

If you want to drill fraud SQL questions under realistic interview pressure, NAILDD is launching with 500+ SQL problems across these patterns.

FAQ

Should fraud detection run as real-time SQL or scheduled batch?

Both. Real-time SQL — executed by a stream processor running SQL semantics — blocks or steps up authentication before a transaction settles. Batch SQL, running every five to fifteen minutes against the warehouse, catches slower patterns and feeds the manual review queue. The high-precision rules you ship to real-time are a subset of the wider batch library; promote a rule to streaming only after it survives several weeks at acceptable precision.

How do I balance precision and recall when tuning thresholds?

Translate both errors into dollars. A false positive costs the average ticket times churn probability; a false negative costs the chargeback plus fees plus dispute handling. Compute the cost-weighted F-score, sweep the threshold across a labeled holdout month, and pick the threshold that minimizes expected cost. For most payment products, the optimum lands at high precision and moderate recall — a missed fraud is recoverable through review, a churned user usually is not.

How do I handle the cold start problem when a user has no history?

For brand-new users you have no behavioral baseline, so per-user anomaly checks fall back to neutral. Compensate with stricter population-level rules: lower velocity thresholds, tighter cardinality caps, and a population baseline from the cohort of users in the same country and product. After roughly thirty events, swap to personal baselines via a user_age_days join and a CASE expression.

When should I move from SQL rules to a machine learning model?

When your rule library exceeds about a hundred entries, the rules contradict each other often, and you have at least six months of labeled events. A gradient boosted tree trained on the features your rules already extract — velocity stats, baseline deviations, device and IP signals — typically lifts AUC by ten to twenty points. Rules stay around to handle high-confidence blocks and serve as a model sanity check.

What features matter most for a fraud model?

Velocity features dominate — transaction counts at one, five, fifteen, and sixty minute windows per user, card, IP, and device. Behavioral deviation features come next — amount and time-of-day z-scores against personal baselines. Graph features follow — distinct cards per user, distinct users per device, centrality measures. Categorical features like country mismatch and merchant category round it out. Forty to sixty well-engineered signals typically outperforms hundreds of raw columns.