How to calculate GMV in SQL
Contents:
Why GMV matters
A marketplace announces "GMV up 30% quarter over quarter." The audience claps. The analyst digs in: company revenue grew 5%, take rate dropped from 10% to 8%. The real story is growth bought with promos and discounts, not organic demand. If you cannot reproduce the headline number in SQL, you cannot push back on the interpretation.
GMV is the top-line metric every marketplace and two-sided network reports to investors. Analysts at Amazon, DoorDash, Uber, Airbnb, and Etsy build the same queries every week: GMV by category, GMV by seller, weekly trend, new buyer share, GMV minus refunds. Your PM emails on Monday morning: "Need GMV by acquisition cohort for the board on Wednesday, broken out by category, with a new vs repeat split." This guide walks the SQL one block at a time so you can deliver that without a Slack thread of follow-up questions.
What GMV actually means
GMV (gross merchandise value) is the total value of goods and services transacted through a platform during a period, measured at the price the buyer paid, before the platform takes a cut.
GMV = SUM(item_price * quantity) for orders in a "paid" or "delivered" stateGMV is not revenue. On a marketplace, GMV is the money that flowed through the platform; revenue is the commission the platform earned (take rate times GMV) plus ads, subscription fees, and logistics charges. Confusing the two will get you a polite "let us re-run the numbers" email from finance and a less polite Slack DM from your manager.
Two more definitions you should pin down before writing any query. Net GMV subtracts refunds, cancellations, and platform-funded discounts from gross GMV. Take rate is revenue / GMV expressed as a percent. Both numbers tell you what the platform actually keeps, which is usually what investors care about underneath the headline.
The base SQL formula
Assume an orders table with order_id, user_id, total, status, created_at, currency. The simplest possible query for a single calendar month looks like this.
SELECT
SUM(total) AS gmv
FROM orders
WHERE status IN ('paid', 'delivered')
AND created_at >= '2026-04-01'
AND created_at < '2026-05-01';The status filter is the part new analysts forget. pending orders (still in the cart), cancelled orders, and failed payment attempts must not count toward GMV. Including them typically overstates the number by 20 to 35 percent on a consumer marketplace. Which exact statuses to include depends on your warehouse schema; check with the data engineer who owns the orders pipeline before you ship.
If created_at is stored in UTC but the business reports in local time, wrap the column in AT TIME ZONE before truncation. Otherwise weekend days leak across the date boundary and the daily series looks noisy for no reason.
GMV by slice
The naked total is rarely what anyone wants. Below are the three slices that show up in almost every marketplace dashboard.
By category
SELECT
category,
SUM(total) AS gmv,
COUNT(*) AS orders,
SUM(total)::NUMERIC / NULLIF(COUNT(*), 0) AS aov
FROM orders
WHERE status IN ('paid', 'delivered')
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY category
ORDER BY gmv DESC;NULLIF(COUNT(*), 0) protects against divide-by-zero when a category is empty for the chosen window. Sorting by GMV descending makes the table immediately readable in a Notion doc; sorting by AOV gives a different, also useful, view of which categories are high-ticket.
By seller or merchant
SELECT
seller_id,
SUM(total) AS gmv,
COUNT(DISTINCT user_id) AS buyers,
COUNT(*) AS orders
FROM orders
WHERE status IN ('paid', 'delivered')
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY seller_id
ORDER BY gmv DESC
LIMIT 100;The top-100 sellers usually drive 60 to 80 percent of GMV on a mature marketplace; the long tail is where most of the headcount goes. If you see a single seller accounting for more than 5 percent of GMV, escalate that as concentration risk to the business team.
New vs repeat buyers
WITH user_orders AS (
SELECT
user_id,
total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_num
FROM orders
WHERE status IN ('paid', 'delivered')
)
SELECT
CASE WHEN order_num = 1 THEN 'new' ELSE 'repeat' END AS buyer_type,
SUM(total) AS gmv,
COUNT(*) AS orders,
AVG(total) AS aov
FROM user_orders
GROUP BY 1;A healthy split for a mature marketplace is 60 to 80 percent of GMV from repeat buyers. If your business is below 30 percent, you are paying acquisition costs every quarter to refill the top of the funnel, which is a different financial profile and probably means the contribution margin slide deserves another look.
GMV vs revenue vs net GMV
The three numbers measure different things and are easy to confuse.
| Metric | What it captures | Formula |
|---|---|---|
| GMV | Full order value flowing through the platform | SUM(total) for paid or delivered orders |
| Net GMV | GMV minus refunds, cancellations, platform discounts | SUM(total) - SUM(refunds) - SUM(platform_discounts) |
| Revenue | What the platform actually earns | SUM(commission) + SUM(ads_revenue) + SUM(fees) |
A single query that produces all three plus the implied take rate.
WITH gross AS (
SELECT SUM(total) AS gmv
FROM orders
WHERE status IN ('paid', 'delivered')
AND created_at >= '2026-04-01' AND created_at < '2026-05-01'
),
refunds AS (
SELECT SUM(refund_amount) AS refunded
FROM order_refunds
WHERE refund_date >= '2026-04-01' AND refund_date < '2026-05-01'
),
platform_discounts AS (
SELECT SUM(discount_amount) AS discount
FROM order_discounts
WHERE source = 'platform'
AND created_at >= '2026-04-01' AND created_at < '2026-05-01'
),
revenue AS (
SELECT SUM(commission_amount) AS commission
FROM order_commissions
WHERE commission_date >= '2026-04-01' AND commission_date < '2026-05-01'
)
SELECT
g.gmv,
g.gmv - COALESCE(r.refunded, 0) - COALESCE(pd.discount, 0) AS net_gmv,
rev.commission AS revenue,
rev.commission::NUMERIC / NULLIF(g.gmv, 0) * 100 AS take_rate_pct
FROM gross g
CROSS JOIN refunds r
CROSS JOIN platform_discounts pd
CROSS JOIN revenue rev;Reasonable take rate benchmarks: Etsy and eBay sit around 10 to 14 percent, Amazon third-party near 15 percent, DoorDash and Uber Eats 25 to 30 percent on commissions plus delivery fees, Airbnb around 14 percent. If take rate drifts down quarter over quarter without a pricing change, the usual culprit is rising platform-funded promos eating into the revenue line. Plot revenue and GMV on the same chart and the divergence becomes obvious.
GMV trend and seasonality
A daily series with a 7-day rolling average and a week-over-week comparison is the single most useful executive view of GMV.
WITH daily AS (
SELECT
DATE_TRUNC('day', created_at) AS day,
SUM(total) AS gmv
FROM orders
WHERE status IN ('paid', 'delivered')
AND created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY 1
)
SELECT
day,
gmv,
AVG(gmv) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS gmv_7d_avg,
gmv * 100.0 / NULLIF(LAG(gmv, 7) OVER (ORDER BY day), 0) - 100 AS wow_change_pct
FROM daily
ORDER BY day;The 7-day rolling average smooths the day-of-week effect that wrecks consumer marketplace charts. The WoW comparison sets the same weekday against itself, which is the right denominator. For seasonal businesses, layer a year-over-year column on top by joining the prior-year date.
Common pitfalls
The most common mistake on first calculations is using SUM(total) without a status filter. Carts in pending, orders in cancelled, and unrecorded payment failures inflate the number by 20 to 35 percent depending on the funnel. The fix is to pin the allowed states explicitly with the engineering team that owns the orders pipeline and document the list at the top of the dashboard so the next analyst does not relitigate the decision.
The second trap is ignoring refunds. In month one, GMV looks great; thirty days later a wave of refunds lands and the real number is materially lower. Headlines based on gross GMV will be revised quietly, which erodes trust in your dashboards. For internal decisions always compute net GMV in parallel and put both columns on the same view.
A third trap is summing across currencies without conversion. Orders in USD, EUR, and GBP cannot be added directly. The correct pattern is SUM(amount * fx_rate) joined against a daily FX table, snapping the rate to the transaction date so revenue and GMV land on the same exchange basis. Hard-coding a single rate at the top of a script will be wrong the moment the currency moves.
A fourth pitfall is letting GMV and revenue swap names in the same conversation. "GMV up 30 percent" makes a finance director ask why the revenue forecast still says 5. Pick a glossary, share it across product, finance, and analytics, and link to it from every dashboard.
A fifth trap is double-counting refunds when both an order_refunds row exists and the order itself flips to a refunded status. The CTE that computes net GMV will subtract twice and the number drops suspiciously. Inspect the join and pick one source of truth; usually order_refunds is the right table because it carries the exact refund timestamp and amount.
Finally, tax handling silently breaks comparisons. Some platforms report GMV inclusive of VAT or sales tax, others exclusive. Comparing Etsy GMV to Amazon GMV without normalizing means measuring on different rulers. Ask finance which convention your total column follows and write it down next to the metric name.
Optimization tips
GMV queries hit the orders table hard, so warehouse cost is mostly about reducing rows scanned. Partition orders by created_at at day or week granularity; on Snowflake and BigQuery the partition pruner skips everything outside the window automatically. Pair the partition with a clustering key on status so the most common filter does not require a full scan.
For dashboards that re-run the same monthly aggregation hourly, materialize a daily GMV summary table keyed by day, category, currency and refresh it incrementally. The dashboard then reads thousands of rows instead of millions. On BigQuery a scheduled MERGE INTO is the simplest pattern; on Snowflake a dynamic table or task does the same job.
On Postgres or another row store, build a composite index on (status, created_at) rather than two single-column indexes. The planner can range-scan the second column only when the first is equality-filtered, and the GMV query is exactly that shape. Run EXPLAIN ANALYZE on a representative slice to confirm.
When GMV needs to be broken out by seller, category, day, and currency in the same query, push the heavy aggregation into a CTE and let the outer query do only the cosmetic math. That keeps the wide group-by from being recomputed inside window functions — the single biggest accidental cost driver in marketplace SQL.
Related reading
- How to calculate AOV in SQL
- How to calculate cohort revenue in SQL
- How to calculate contribution margin in SQL
- SQL window functions interview questions
If you want to drill marketplace SQL questions like this every day, NAILDD is launching with 500+ analyst problems built around exactly this pattern.
FAQ
Is GMV the same as revenue?
No. GMV is the total order value that flowed through the platform; revenue is what the platform earned from those orders. On a marketplace with a 12 percent take rate, revenue is roughly 12 percent of GMV. The two numbers move together but diverge when the platform funds discounts, runs promos at a loss, or changes the take rate. Confusing them in a board deck is the fastest way to lose the analytics seat at the table.
What take rate counts as healthy?
Benchmarks vary by category. Consumer marketplaces like Etsy and eBay sit at 10 to 14 percent. Food delivery (DoorDash, Uber Eats) runs 20 to 35 percent including delivery fees. Concert and event tickets land around 5 to 10 percent. Travel platforms like Airbnb take 14 to 25 percent. Compare against your direct competitors rather than against the global average, because category economics differ a lot.
Should I report GMV or net GMV?
Internally, both. Net GMV is the honest number for management decisions because it reflects refunds and platform-funded discounts. Gross GMV is the headline number most companies put in press releases because it looks larger. The cleanest practice is to publish both side by side and link to the definitions so nobody re-debates the choice in every meeting.
Should GMV be measured at order date or delivery date?
Most platforms anchor on order date because that is when the buyer commits and the money flows. Logistics and supply-chain teams sometimes want a delivery-date series for capacity planning. The important thing is to fix one convention per dashboard and label it clearly; switching between the two without documenting will create unexplained spikes near month boundaries.
How do I compare GMV across marketplaces?
Comparisons are noisy because of currency conversion, VAT inclusion, treatment of digital goods, and whether services revenue is bundled in. The safest approach is to use the published quarterly numbers from each company's filings, which apply the same methodology consistently inside that company, and avoid stitching together your own apples-to-oranges comparison from public data.
How do I forecast GMV?
For a stable marketplace, a seasonally-adjusted ARIMA or Prophet model on daily GMV gets you most of the way. For early-stage businesses where promo schedules dominate the signal, forecast new buyers, repeat buyers, and AOV separately and multiply — that decomposition gives growth a lever they can actually pull.