PARTITION BY SQL cheat sheet

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why PARTITION BY matters

PARTITION BY is the verb that turns a flat SELECT into a window query. It tells the engine to slice rows into groups and compute an aggregate or rank inside each slice without collapsing the result set. Every row stays, but each row carries context about its neighbours: the running total for that user, the rank inside that city, the share of that category's revenue. That single capability shows up in almost every data analyst loop at Stripe, Airbnb, DoorDash, Snowflake, and Meta, and it is the most common stumble for candidates who learned SQL through GROUP BY first.

Interviewers love PARTITION BY because it forces you to think about granularity: what a row represents, what the partition key represents, and what the order column adds. Get the triple right and cohort retention, attribution shares, leaderboards, and rolling averages become a few extra lines. Get it wrong and you silently aggregate across users, or forget ORDER BY and watch your "running total" go flat.

This cheat sheet covers the syntax, the ten interview patterns, the edge cases that trip juniors, and the runtime tricks engineers use at billion-row scale.

Syntax in one breath

function() OVER (PARTITION BY column ORDER BY column)
  • PARTITION BY splits rows into windows. Rows sharing a partition key live in the same window.
  • ORDER BY defines order inside each window. Required for ranking (ROW_NUMBER, RANK, DENSE_RANK), offset (LAG, LEAD, FIRST_VALUE), and any running aggregate.
  • Both are optional, but almost always paired in real queries.
  • ROWS BETWEEN or RANGE BETWEEN narrows the window further for sliding frames.

Worked example: number each user's orders chronologically.

SELECT
    user_id,
    order_id,
    amount,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_num
FROM orders;

Each user gets a counter starting at 1 in created_at order. No rows are dropped.

PARTITION BY vs GROUP BY

Shortest mental model: GROUP BY collapses, PARTITION BY annotates.

Question GROUP BY PARTITION BY
Collapses rows? Yes No
Row count One per group One per input row
Used with Aggregates in SELECT Window functions in OVER()
Lives in SELECT and HAVING The OVER() clause only

Same intent, two shapes:

-- GROUP BY: one row per user
SELECT user_id, SUM(amount) AS user_total
FROM orders
GROUP BY user_id;

-- PARTITION BY: every row plus the user's total alongside
SELECT
    user_id,
    order_id,
    amount,
    SUM(amount) OVER (PARTITION BY user_id) AS user_total
FROM orders;

Reach for PARTITION BY when downstream logic needs the per-group total next to every row, for example each order's share of the customer's lifetime spend or filtering orders below the user's own average. Reach for GROUP BY when you only need the aggregated table.

Edge cases that confuse juniors

Multiple columns in PARTITION BY

SELECT
    o.user_id,
    p.category,
    o.order_id,
    ROW_NUMBER() OVER (
        PARTITION BY o.user_id, p.category
        ORDER BY o.created_at
    ) AS rn
FROM orders o
JOIN products p USING (product_id);

Windows form on the combination of partition keys. A user who buys both electronics and apparel gets two parallel sequences, one per category, each starting at 1.

PARTITION BY without ORDER BY

SELECT
    order_id,
    user_id,
    amount,
    SUM(amount) OVER (PARTITION BY user_id) AS user_total
FROM orders;

Without ORDER BY, the aggregate covers the full window. That is exactly what you want for "total per group on every row". Add ORDER BY and the same SUM becomes a running total.

ORDER BY without PARTITION BY

SELECT
    created_at,
    amount,
    SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;

One window equal to the entire result set, ordered chronologically. Useful for global running totals; less useful at scale because there is no parallelism.

Multiple aggregates on one window

SELECT
    city,
    user_id,
    orders_count,
    MAX(orders_count) OVER (PARTITION BY city) AS city_max,
    AVG(orders_count) OVER (PARTITION BY city) AS city_avg,
    COUNT(*) OVER (PARTITION BY city) AS city_size
FROM user_stats;

Three aggregates, one logical window. Keep the OVER() clause identical character for character and the optimizer fuses them into a single pass.

Sliding frames with ROWS BETWEEN

SELECT
    user_id,
    day,
    amount,
    AVG(amount) OVER (
        PARTITION BY user_id
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma_7d
FROM daily_orders;

Seven-day moving average per user. ROWS counts physical rows; RANGE counts logical ordering values. Mix them up on a date column with gaps and your "7-day moving average" silently becomes "7 rows of data".

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Ten interview tasks with solutions

These ten show up almost verbatim in data analyst loops at Stripe, Airbnb, DoorDash, Snowflake, Notion, and Linear. Work through them once and you will recognise the rest.

1. Order number within user

SELECT
    user_id,
    order_id,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS n
FROM orders;

2. First order per user

WITH numbered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM orders
)
SELECT *
FROM numbered
WHERE rn = 1;

3. Top three products per category

WITH ranked AS (
    SELECT
        category,
        product_id,
        revenue,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
    FROM sales
)
SELECT *
FROM ranked
WHERE rn <= 3;

4. Order share of user's lifetime spend

SELECT
    user_id,
    order_id,
    amount,
    amount * 1.0 / SUM(amount) OVER (PARTITION BY user_id) AS share
FROM orders;

5. Maximum order amount per user

SELECT
    user_id,
    order_id,
    amount,
    MAX(amount) OVER (PARTITION BY user_id) AS max_in_group
FROM orders;

6. Rank users by revenue inside their city

SELECT
    city,
    user_id,
    revenue,
    RANK() OVER (PARTITION BY city ORDER BY revenue DESC) AS city_rank
FROM user_stats;

7. Previous order amount for the same user

SELECT
    user_id,
    order_id,
    amount,
    LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_amount
FROM orders;

8. Average per user without aggregating away rows

SELECT
    user_id,
    order_id,
    amount,
    AVG(amount) OVER (PARTITION BY user_id) AS user_avg
FROM orders;

9. Monthly running total that resets each calendar month

SELECT
    order_id,
    user_id,
    amount,
    created_at,
    SUM(amount) OVER (
        PARTITION BY DATE_TRUNC('month', created_at)
        ORDER BY created_at
    ) AS monthly_cumulative
FROM orders;

10. Percent change from previous order

SELECT
    user_id,
    order_id,
    amount,
    (amount - LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at)) * 100.0
    / NULLIF(LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at), 0) AS pct_change
FROM orders;

NULLIF guards against a zero previous value and turns a division error into a clean NULL.

Common pitfalls

When candidates fail a PARTITION BY question on a live interview, it is almost never the syntax. It is one of these five traps.

The first pitfall is forgetting that PARTITION BY does not collapse rows. Someone writes SELECT SUM(amount) OVER (PARTITION BY user_id) FROM orders expecting one row per user and is surprised by the input row count coming back. The fix is to wrap it in a CTE and SELECT DISTINCT on the partition key, or switch to GROUP BY if you never needed per-row context. Knowing which one applies is the signal interviewers are testing.

A second pitfall is mixing ORDER BY semantics. Adding ORDER BY to a SUM window quietly turns it into a running total because the default frame becomes RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Be explicit: if you want a frozen total, omit ORDER BY; if you want a running total, keep it and rename the column so future readers know what it is.

A third pitfall is ranking ties with the wrong function. ROW_NUMBER assigns 1, 2, 3 even when rows tie on the order column, making top-N filters non-deterministic. RANK ties them as 1, 1, 3 and DENSE_RANK as 1, 1, 2. Pick the one that matches the business question and add a tie-breaker (ORDER BY revenue DESC, product_id ASC) so the result is stable.

A fourth pitfall is partition keys that hide nulls. If user_id is NULL for guest checkouts, all null users land in the same partition, inflating "the anonymous customer" in your top-N. Filter the null partition explicitly or coalesce to a synthetic guest id.

A fifth pitfall is mixing ROWS and RANGE framing. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means seven rows of data; on a time series with missing days, that is not a seven-day average. Switch to RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW on engines that support it, or fill the gaps with a calendar table first.

Optimization tips

PARTITION BY is cheap when the partition key matches a physical layout, and expensive when it does not. The optimizer cannot repartition a billion-row table on a random hash for free.

On Snowflake, BigQuery, and Redshift, the engine parallelises window functions by partition key. PARTITION BY user_id over a table clustered on user_id runs in roughly linear time, while PARTITION BY DATE_TRUNC('month', created_at) on the same table has to reshuffle first. Align the physical layout to your most common access pattern.

On Postgres, the equivalent move is a B-tree index on (partition_key, order_key) together. An index on (user_id, created_at) lets SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at) produce sorted input directly, skipping the in-memory sort that dominates runtime on warm caches. EXPLAIN ANALYZE confirms whether the sort node disappeared.

When multiple window functions share a partition, write the OVER() clauses identically so the planner computes them in one pass. Cleaner alternative: the named window. WINDOW w AS (PARTITION BY user_id ORDER BY created_at) then SUM(x) OVER w, AVG(x) OVER w.

Finally, filter before the window, not after. Window functions evaluate after WHERE and before QUALIFY. If you only care about active users from the last 90 days, filter those rows first. Partitions get smaller, the sort gets cheaper, the result is identical.

If you want to drill PARTITION BY questions like these every day until they feel automatic, the NAILDD app is launching with a backlog of SQL problems built around exactly this pattern.

FAQ

Is PARTITION BY slower than GROUP BY?

On paper, no. Both have the same asymptotic complexity because both need to group rows by key, and the heavy step is the underlying sort or hash. The difference is output volume: GROUP BY returns one row per group, while PARTITION BY returns one row per input row plus an aggregate column, so it almost always moves more bytes. If you only need the aggregate values, GROUP BY is cheaper. If you need row-level context next to the aggregate, PARTITION BY is the only sensible option and the extra row volume is the price.

Can I use several PARTITION BY clauses in one SELECT?

Yes. Each window function carries its own OVER() clause, so you can mix partition keys freely. A common funnel pattern is SUM(x) OVER (PARTITION BY user_id) for the user's lifetime spend alongside SUM(x) OVER (PARTITION BY country) for the country total in the same SELECT. The planner runs them in parallel when partitions differ and fuses them when they match.

How does it differ from table partitioning in DDL?

They are unrelated despite sharing the keyword. PARTITION BY in a window function is a logical operation that slices query rows into windows for one calculation. PARTITION BY in CREATE TABLE is a physical operation that splits the data into files or shards on disk. The two can interact: if your table is physically partitioned by created_at and your window is also partitioned by created_at, the engine can usually skip a reshuffle.

Can I use PARTITION BY without ORDER BY?

Yes, and it is the right move when you want a flat per-group aggregate next to every row. SUM(amount) OVER (PARTITION BY user_id) returns the user's lifetime total on every order row. Adding ORDER BY turns the same SUM into a running total because the default frame becomes a growing prefix. Use ORDER BY for sequence-aware aggregates like running totals, moving averages, LAG, LEAD, and ranking functions; skip it for group totals and group averages.

When should I switch from PARTITION BY to a self-join?

Almost never on a modern engine. Self-joins to compute the previous order amount or the rank inside a group were the textbook answer before window functions reached every database. Today, LAG and ROW_NUMBER produce the same answer in a fraction of the runtime and one tenth of the code. A self-join still wins only when you need to pair rows across different tables or filters that no window frame can express, for example matching each order with the user's most recent support ticket regardless of order timing.

Which engines fully support PARTITION BY?

All major analytical engines: Postgres, MySQL 8+, SQL Server, Oracle, Snowflake, BigQuery, Redshift, Databricks, ClickHouse, and DuckDB. Syntax is identical for the basics. Differences appear at the edges: QUALIFY to filter on a window function exists in BigQuery, Snowflake, and Databricks but not Postgres or MySQL; RANGE BETWEEN INTERVAL framing has subtle date-arithmetic differences. Stick to PARTITION BY plus ORDER BY plus ROWS BETWEEN and your SQL ports cleanly across every engine you are likely to interview against.