How to calculate Apdex score in SQL
Contents:
What Apdex is and why latency dashboards need it
"Our p95 latency is 500 milliseconds — is that good?" is the wrong question. It is good for a search page and catastrophic for a payment authorization. Percentiles tell you the shape of a distribution, but not whether users are happy, and they do not aggregate across endpoints with different latency expectations. Apdex — the Application Performance Index — folds a latency distribution into a single number between zero and one, anchored to a target T per endpoint, so a slow checkout and a fast static asset are not averaged into mush.
Apdex has stuck around for two decades because it survives a Slack DM. When the SRE manager at Stripe or Snowflake pings you with "is the API healthy?", they want a number, not a five-pane Grafana screenshot. 0.94 means "almost everyone is satisfied". 0.62 means "most users feel pain". The same number lands in a status badge, an executive deck, and a release postmortem with the same meaning each time.
This post walks through the SQL: the base score, the per-endpoint breakdown, dynamic thresholds, hour-over-hour trend, and the strict-versus-standard handling of errors interviewers love to ask about. Queries run in Postgres with minor edits for Snowflake, BigQuery, or Redshift.
The formula and the three buckets
Apdex sorts every request into one of three buckets relative to a target T. Requests at or under T are "satisfied", between T and 4T "tolerating", above 4T "frustrated". The score weights satisfied at full value, tolerating at half, frustrated at zero, then divides by total.
Apdex = (satisfied + tolerating / 2) / totalThe output is bounded between zero and one. Conventional reading is 0.94–1.0 excellent, 0.85–0.94 good, 0.70–0.85 fair, below 0.70 poor. The 4T multiplier assumes that once latency crosses four times the target, the user experience has degraded so much that the request is effectively a failure. Pick T correctly and the math follows; pick T wrong and the score either pins at 1.0 (too generous) or collapses to 0.2 (too aggressive).
The data model
Assume a single fact table of request samples — most observability pipelines land here once you flatten OpenTelemetry spans.
api_requests (request_id, endpoint, status_code, latency_ms, request_timestamp)endpoint is the route template (/api/orders/:id, not /api/orders/42) so per-endpoint aggregates do not explode into one row per unique URL.
Five SQL queries you will actually ship
1. Apdex over a single window
Set the threshold once, bucket with FILTER clauses, divide.
WITH config AS (
SELECT 500 AS t_ms
)
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE latency_ms <= (SELECT t_ms FROM config)) AS satisfied,
COUNT(*) FILTER (WHERE latency_ms > (SELECT t_ms FROM config)
AND latency_ms <= 4 * (SELECT t_ms FROM config)) AS tolerating,
COUNT(*) FILTER (WHERE latency_ms > 4 * (SELECT t_ms FROM config)) AS frustrated,
(COUNT(*) FILTER (WHERE latency_ms <= (SELECT t_ms FROM config))
+ COUNT(*) FILTER (WHERE latency_ms > (SELECT t_ms FROM config)
AND latency_ms <= 4 * (SELECT t_ms FROM config)) / 2.0
)::NUMERIC / NULLIF(COUNT(*), 0) AS apdex
FROM api_requests
WHERE request_timestamp >= NOW() - INTERVAL '1 hour';The FILTER clause is the standards-compliant way to do conditional aggregation in Postgres. NULLIF(COUNT(*), 0) is defensive: during a traffic gap it returns null instead of crashing the dashboard with division-by-zero.
2. The compact SUM(CASE) form
Same calculation with weights inline; ports trivially to dialects without FILTER.
SELECT
SUM(CASE
WHEN latency_ms <= 500 THEN 1.0
WHEN latency_ms <= 4 * 500 THEN 0.5
ELSE 0.0
END)::NUMERIC / NULLIF(COUNT(*), 0) AS apdex
FROM api_requests
WHERE request_timestamp >= NOW() - INTERVAL '1 hour';This is the version I reach for on a whiteboard: three branches and a divide.
3. Apdex by endpoint
The version a platform team puts on the wall. Group by route, show endpoints with enough traffic to matter, sort ascending so the worst offenders surface first.
SELECT
endpoint,
COUNT(*) AS requests,
SUM(CASE
WHEN latency_ms <= 500 THEN 1.0
WHEN latency_ms <= 2000 THEN 0.5
ELSE 0.0
END)::NUMERIC / NULLIF(COUNT(*), 0) AS apdex
FROM api_requests
WHERE request_timestamp >= NOW() - INTERVAL '1 hour'
GROUP BY endpoint
HAVING COUNT(*) >= 100
ORDER BY apdex ASC
LIMIT 10;The endpoint with the lowest Apdex is almost always the right next thing to optimize. Unlike a raw p95 ranking, this accounts for traffic volume through the weighting — a flaky 50-request-per-hour route will not outrank a 200,000-request-per-hour login flow.
4. Per-endpoint dynamic threshold
A single T across all endpoints is the most common rookie mistake. Fix it with a small endpoint-class lookup joined in by longest-pattern-wins.
WITH thresholds AS (
SELECT '/api/payments/%' AS pattern, 100 AS t_ms UNION ALL
SELECT '/api/search/%', 300 UNION ALL
SELECT '/api/%', 150 UNION ALL
SELECT '/web/%', 500 UNION ALL
SELECT '%', 1000
),
classified AS (
SELECT
r.endpoint,
r.latency_ms,
(SELECT t_ms FROM thresholds t
WHERE r.endpoint LIKE t.pattern
ORDER BY LENGTH(t.pattern) DESC LIMIT 1) AS t_ms
FROM api_requests r
WHERE request_timestamp >= NOW() - INTERVAL '1 hour'
)
SELECT
endpoint,
t_ms,
SUM(CASE
WHEN latency_ms <= t_ms THEN 1.0
WHEN latency_ms <= 4 * t_ms THEN 0.5
ELSE 0.0
END)::NUMERIC / NULLIF(COUNT(*), 0) AS apdex
FROM classified
GROUP BY endpoint, t_ms
HAVING COUNT(*) >= 100
ORDER BY apdex ASC;Without longest-pattern-wins ordering, /api/payments/charge could match /api/% before the stricter /api/payments/% rule and silently apply the wrong target.
5. Hour-over-hour Apdex change
Window functions earn their keep finding the moment Apdex dropped. LAG pulls the previous hour's value into the same row so percent change is one expression away.
WITH hourly AS (
SELECT
DATE_TRUNC('hour', request_timestamp)::TIMESTAMP AS hour,
SUM(CASE
WHEN latency_ms <= 500 THEN 1.0
WHEN latency_ms <= 2000 THEN 0.5
ELSE 0.0
END)::NUMERIC / NULLIF(COUNT(*), 0) AS apdex
FROM api_requests
WHERE request_timestamp >= NOW() - INTERVAL '7 days'
GROUP BY 1
)
SELECT
hour,
apdex,
LAG(apdex) OVER (ORDER BY hour) AS prev_apdex,
ROUND(100.0 * (apdex - LAG(apdex) OVER (ORDER BY hour))
/ NULLIF(LAG(apdex) OVER (ORDER BY hour), 0), 2) AS hoh_pct
FROM hourly
ORDER BY hour DESC
LIMIT 24;This is the query I run when a PM at Notion or Linear says "did anything regress overnight?" — the worst hour-over-hour drop is the first row. The strict-Apdex variant prepends WHEN status_code >= 500 THEN 0.0 to the CASE so 5xx responses always count as frustrated regardless of latency; this is the version that lands in a status badge.
Picking the threshold T
T is the only knob in Apdex, and most poor scores trace back to a wrong T rather than to actually slow code. A defensible starting grid:
Web pages (server-rendered) T = 500ms frustrated > 2000ms
JSON APIs (user-facing) T = 100ms frustrated > 400ms
JSON APIs (server-to-server) T = 200ms frustrated > 800ms
Search queries T = 300ms frustrated > 1200ms
Payment authorization T = 800ms frustrated > 3200msThree rules keep T honest. First, T reflects the experience the product team has promised, not the latency the platform has shipped. If your homepage renders at 1.8 seconds but the brand promise is "fast", set T to 500 milliseconds and Apdex will tell you the homepage is failing. Second, T should not drift — comparing across releases only works if T is fixed. Third, T should not be uniform across endpoints; the dynamic-threshold query above is the version that holds up on any system with more than a handful of route classes.
Common pitfalls
The first trap is a global T across endpoints with different latency budgets. A single 500-millisecond T treats a static asset and a payment authorization the same way, so either the asset looks artificially perfect or the payment looks artificially broken. The fix is the per-endpoint lookup from section 4. Treat T as a product decision, not a default.
The second trap is silently counting errors as zero-weight requests when the score is reported as pure latency. A 500 returned in 80 milliseconds gets binned as "satisfied" — mathematically what the spec says, absurd from a user perspective. Choose explicitly: strict Apdex (errors frustrated) or standard Apdex (errors dropped and reported separately). Pick one, document it, apply it consistently.
The third trap is reporting one Apdex for the entire product. A team-wide score hides the failures it should expose: a 0.92 aggregate can mask a 0.55 on the most important endpoint, simply because the broken route handles less traffic. Produce Apdex per service and per endpoint class; an aggregate without a breakdown is comfort food, not signal.
The fourth trap is changing the threshold without flagging the historical comparison. If T moves from 500 to 300 milliseconds mid-month, every Apdex point after that date is on a different yardstick. Either freeze T for trend windows, recompute the history under the new T, or split the chart with a vertical line at the change. Silent threshold changes are the most common reason Apdex trends "lie" between quarters.
The fifth trap is treating Apdex as a sole health metric. It compresses latency and, depending on policy, partial reliability information. It does not capture saturation, freshness, or business outcomes. Put it alongside error rate, throughput, and one or two product metrics — Apdex tells you whether the requests ran fast enough; the rest tells you whether they were the right requests.
Optimization tips
Partition the request table by request_timestamp — daily partitions are standard, hourly for very high-volume services. Apdex queries are range-bounded by time, so partition pruning collapses the scan to "the last hour" without changing the SQL.
Materialize hourly endpoint-level aggregates. The per-endpoint Apdex query is a perfect candidate for a matview refreshed every 5–15 minutes. Store satisfied, tolerating, frustrated, and total per (endpoint, hour); every dashboard query reduces to a small SUM.
Store the buckets, not just the score. If you precompute Apdex as a single float, you cannot recompute under a different threshold. Store the histogram per endpoint per minute and you can change T retroactively, rerunning history in seconds.
A Python sanity check
When a SQL Apdex looks off, recompute from pandas and compare.
t = 500
buckets = (latencies <= t).astype(float) + ((latencies > t) & (latencies <= 4 * t)) * 0.5
apdex = buckets.sum() / max(len(latencies), 1)If pandas and SQL disagree, the bug is almost always in the SQL — a missed boundary on the tolerating bucket (< vs <=) or a timezone-shifted timestamp filter. The spec uses <= T and <= 4T, which the queries above follow.
Related reading
- SQL window functions interview questions —
LAGand a dozen more patterns. - How to detect anomalies in SQL — what to do once the Apdex trend produces a step change.
- A/B testing peeking mistake — when Apdex is the success metric of a performance experiment.
If you want to drill SQL like this every day, NAILDD is launching with hundreds of analytics and SRE-flavored problems — bucketed aggregations, threshold tuning, window functions.
FAQ
What counts as a "good" Apdex score?
Conventional bands are 0.94–1.0 excellent, 0.85–0.94 good, 0.70–0.85 fair, below 0.70 poor. The bands come from the original Apdex Alliance spec and correlate reasonably well with self-reported user satisfaction. They only mean anything if T is set correctly: 0.97 against a generous T is worse news than 0.88 against an aggressive one — the first is silently failing, the second is honestly reporting a real product.
How do I pick T for a checkout flow?
Anchor on the brand promise, not the current latency. For a user-facing checkout, target T in 600–1000 milliseconds and frustrated in 2400–4000. For the backend payment authorization inside that flow, target T closer to 200–500 milliseconds. Two targets — end-to-end experience and internal service — give you both views without conflating them.
Is Apdex deprecated in favor of raw percentiles?
Some teams have moved to percentile-based SLO dashboards (p50, p95, p99 against a target) and dropped Apdex. Percentiles preserve more information and slot into burn-rate alerting. The counterargument is that Apdex is one number that fits in a status badge. Most mature platform orgs keep both and compute them from the same histogram.
How should I handle errors in the calculation?
Decide between strict mode (errors are frustrated) and standard mode (errors reported separately), document it, never silently switch. Strict is the right default for a status badge. Standard is the right default for a latency-focused dashboard paired with an error-rate panel next to it.
How do I calibrate T against actual user perception?
With a survey channel, ask users directly with a slider; the median is a defensible T. Without a survey, the next best signal is bounce or abandonment as a function of latency — find the threshold above which abandonment rises sharply and set T at or below that elbow. Google's RAIL guidelines are a reasonable starting point for web pages.
Can I compute Apdex over a histogram instead of raw samples?
Yes, and at high volume you usually should. Store latency in fixed buckets or a t-digest sketch per endpoint per minute, then sum the counts on each side of T and 4T. The math is identical — Apdex is a function of counts in three buckets, not individual samples. This is what makes retroactive threshold tuning practical: change T, sum the buckets differently, history rebuilds.