GROUP BY vs PARTITION BY — what is the difference

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why this question keeps showing up

If you have interviewed for an analyst role at Stripe, DoorDash, or Snowflake, you have probably been asked some version of "what is the difference between GROUP BY and PARTITION BY". Candidates who recite "one aggregates, the other does windows" usually fail the follow-up that asks them to compute revenue share without collapsing the orders. The interviewer is checking whether you understand that the two clauses live at different stages of the query.

Imagine your PM at Airbnb pings you on Slack: "I need a list of every booking from last quarter with each guest's share of total host revenue next to it, before the Monday review." A junior analyst tries GROUP BY, realizes the individual bookings vanish, then writes a self-join that times out. A senior analyst writes one window function and goes home.

The mental model is simple. GROUP BY changes the shape of your result set: it folds many rows into one. PARTITION BY does not change the shape — it tells a window function "compute this number within each group, but leave the rows alone." Hold that distinction and the rest is mechanics.

How GROUP BY works

GROUP BY combines rows that share the same values in one or more columns and lets you apply an aggregate over each group. The output has one row per group. Anything in the SELECT list must appear in the GROUP BY or be wrapped in an aggregate like COUNT, SUM, AVG, MIN, or MAX. The rule exists because the database has no idea which underlying row to return when many rows collapse into one.

SELECT department,
       AVG(salary) AS avg_salary,
       COUNT(*) AS headcount
FROM employees
GROUP BY department;

The result is one row per department. Individual employee names, hire dates, and managers are gone — they cannot exist in this output because each department row represents many people.

department avg_salary headcount
Sales 85,000 12
Engineering 120,000 45
Support 62,000 18

GROUP BY runs after WHERE and before HAVING. To filter on an aggregate use HAVING — WHERE cannot see the aggregated value yet. A common interview gotcha is asking why WHERE COUNT(*) > 10 fails: at WHERE time the groups do not exist yet.

How PARTITION BY works

PARTITION BY appears inside the OVER() of a window function. It tells the window function to restart its calculation for each distinct group, but it does not collapse those groups into single rows. Every input row appears in the output, decorated with the window value computed for its partition.

SELECT name,
       department,
       salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_dept_avg
FROM employees;

The output keeps every employee row and adds two new columns: the department average and how far each salary deviates from it. Nothing was lost.

name department salary dept_avg_salary diff_from_dept_avg
Alice Sales 90,000 85,000 5,000
Brian Sales 80,000 85,000 -5,000
Vera Engineering 130,000 120,000 10,000
Gabriel Engineering 110,000 120,000 -10,000

PARTITION BY pairs naturally with ranking functions (ROW_NUMBER, RANK, DENSE_RANK), offset functions (LAG, LEAD), and running aggregates (SUM, COUNT, AVG over a frame). OVER() can also include ORDER BY for functions that care about row order, but PARTITION BY itself is just "split into buckets, then compute per bucket."

Side-by-side comparison

The clearest way to internalize the difference is to put both clauses next to each other.

Aspect GROUP BY PARTITION BY
Output row count One row per group Same as input
Where it lives in the query Top-level clause Inside OVER() of a window function
Access to individual rows Lost Preserved
Pairs with HAVING, aggregate functions ROW_NUMBER, RANK, LAG, LEAD, SUM, AVG, COUNT over OVER
Typical use case Summary report, KPI dashboard Per-row context: share of group, rank, running total
Evaluation order After WHERE, before SELECT projection After GROUP BY and HAVING, before ORDER BY
Multiple in same query One GROUP BY clause per query A different PARTITION BY per window function

The evaluation-order row catches the most candidates. Window functions run after GROUP BY, so you can layer PARTITION BY on top of an aggregated result without a subquery if everything you reference is either grouped or an aggregate.

When to reach for each one

GROUP BY is the right tool when you want a summary. The PM wants revenue by month. The CFO wants headcount by team. The growth lead wants signups by acquisition channel. You need totals, not individual rows — collapsing is exactly what you want.

-- Monthly revenue summary
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS total_revenue,
       COUNT(DISTINCT customer_id) AS unique_buyers
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

PARTITION BY is the right tool when each row needs context from its group without losing the row: each order with that customer's lifetime spend next to it, each employee ranked within their team, each daily revenue point with the previous day's revenue alongside for a delta column.

-- Each order's share of that customer's total spend
SELECT order_id,
       customer_id,
       amount,
       amount * 100.0 / SUM(amount) OVER (PARTITION BY customer_id) AS pct_of_customer_total
FROM orders;
-- Day-over-day change per product
SELECT DATE,
       product_id,
       revenue,
       revenue - LAG(revenue) OVER (PARTITION BY product_id ORDER BY DATE) AS daily_change
FROM daily_revenue;

If you write a GROUP BY and then join the result back to the original table to recover row-level context, that is a strong signal you should have used PARTITION BY from the start.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Combining GROUP BY with window functions

You can use both in the same query. Aggregation runs first, then the window function operates over the aggregated rows. This pattern is everywhere in dashboard SQL.

-- Monthly revenue with each month's share of the full-year total
SELECT month,
       monthly_revenue,
       monthly_revenue * 100.0 / SUM(monthly_revenue) OVER () AS pct_of_year
FROM (
  SELECT DATE_TRUNC('month', order_date) AS month,
         SUM(amount) AS monthly_revenue
  FROM orders
  WHERE order_date >= '2025-01-01'
  GROUP BY DATE_TRUNC('month', order_date)
) t
ORDER BY month;

A neater variant in databases that support it skips the subquery by referencing the GROUP BY column directly:

SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS monthly_revenue,
       SUM(amount) * 100.0 / SUM(SUM(amount)) OVER () AS pct_of_year
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

The nested SUM(SUM(amount)) OVER () looks strange but is legal: the inner SUM is the group aggregate, the outer SUM is the window aggregate over the grouped rows. Interviewers at Snowflake and Databricks love this pattern because it tests evaluation order.

When an interviewer asks you to take a summary and add a "share of total" or "delta vs previous period" column, that is the classic GROUP BY plus window function setup. Walking through the order — group first, window after — scores points on its own.

Common pitfalls

The first trap is reaching for GROUP BY when you actually need PARTITION BY. The symptom is a query that collapses rows you wanted to keep, followed by a panic self-join to glue the row-level details back on. If your output needs the same number of rows as your input plus a column derived from each row's group, PARTITION BY is almost always the right answer. Reading the question — "for each order" versus "for each customer" — tells you which one the interviewer wants.

The second trap is forgetting that ORDER BY inside OVER() changes the semantics for cumulative aggregates. SUM(amount) OVER (PARTITION BY customer_id) returns the total per customer on every row. SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) returns a running total that grows row by row. Interviewers ask for the second when they say "cumulative revenue" and the first when they say "share of customer total."

The third trap is filtering on a window result with WHERE. The window function has not run yet at WHERE time, exactly like aggregates have not run yet at WHERE time. You need a subquery or CTE: compute the window column in the inner query, filter in the outer one. WHERE ROW_NUMBER() OVER (...) = 1 will produce a syntax error in every major database.

The fourth trap is leaving PARTITION BY off when you meant to include it. RANK() OVER (ORDER BY salary DESC) ranks across the entire table; RANK() OVER (PARTITION BY department ORDER BY salary DESC) ranks within each department. Skipping PARTITION BY silently produces a wrong answer rather than an error — the worst kind of bug on a "top earner per team" prompt.

Performance and optimization tips

When both clauses can solve a problem, GROUP BY usually wins on cost because it emits fewer rows. If a dashboard needs only the summary, write the summary — do not compute window aggregates and then DISTINCT them down. The planner cannot always undo that mistake.

Columnar warehouses like Snowflake, BigQuery, and Databricks SQL are good at PARTITION BY when the partition column is clustered or sorted. If you regularly compute SUM(amount) OVER (PARTITION BY customer_id) on a billion-row table, clustering by customer_id cuts runtime dramatically because rows in the same partition land in the same micro-partitions.

When combining GROUP BY with a window function on the grouped result, prefer the single-query form with SUM(SUM(amount)) OVER () over the subquery form. The planner sees one logical operation instead of an aggregate plus outer scan, and the resulting plan is identical or better.

Avoid PARTITION BY on a high-cardinality column you do not actually need. PARTITION BY user_id on a table with 200 million users forces the engine to hold state for every user. If your real question is "share of last 30 days' active users," partitioning by active_cohort_id with a few thousand values is dramatically cheaper.

If you want to drill questions like this every day, NAILDD is launching with 500+ SQL problems built around exactly this pattern.

FAQ

Can I replace GROUP BY with PARTITION BY?

No. They produce different shapes of output. GROUP BY reduces the number of rows; PARTITION BY keeps every row and adds a column. If you need one row per department, use GROUP BY. If you need every employee with their department average alongside, use PARTITION BY. Trying to substitute one for the other will give you the wrong answer, not just a slower query.

Is PARTITION BY slower than GROUP BY?

Usually yes for the same logical question, because PARTITION BY has to emit every input row while GROUP BY emits one per group. But "slower" is not the same as "wrong" — when you need row-level context, PARTITION BY is the right tool regardless of cost, and the alternative (self-joins or correlated subqueries) is almost always worse. On modern columnar warehouses with clustered tables, the gap is small.

Can PARTITION BY be used without ORDER BY?

Yes. ORDER BY inside OVER() is required only for functions whose answer depends on row order: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, and cumulative aggregates with a frame. For position-independent aggregates like AVG, SUM, COUNT, MIN, and MAX over the full partition, ORDER BY is optional and usually omitted.

Can I use multiple PARTITION BY clauses in one query?

Yes — every window function in your SELECT can have its own PARTITION BY. You can compute AVG(salary) OVER (PARTITION BY department) and AVG(salary) OVER (PARTITION BY city) in the same SELECT, and they will return independent per-group averages on the same input rows. This is one of the reasons window functions replace many self-joins.

What is the difference between GROUP BY and OVER without PARTITION BY?

OVER () with no PARTITION BY computes the window aggregate across the entire result set, returning the same value on every row. It is useful for "share of total" columns. GROUP BY without grouping columns is similar but collapses everything to a single row. The two produce different shapes for the same total.

How do I get "top N per group" — GROUP BY or PARTITION BY?

PARTITION BY, almost always. Use ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY metric DESC) in a CTE, then filter the outer query for rows where the row number is ≤ N. GROUP BY can give you the per-group maximum but not the row that achieved it, which is the question most interviewers actually want answered.