How to calculate agent productivity in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What agent productivity actually measures

Support leaders love the phrase "agent productivity," and almost no two of them mean the same thing by it. To a workforce manager it's tickets closed per shift. To a head of CX it's CSAT divided by burnout. To a finance partner it's cost per contact. None of those, alone, will answer the question when a VP at Stripe, DoorDash, or Notion asks on Monday morning who the top performers are this quarter and what they're doing differently.

The pattern that works is to stop chasing a single productivity number and treat agent productivity as a small basket of metrics that are honest about the trade-offs. Volume tells you who is moving the queue. Average handle time tells you who is moving it fast. First contact resolution tells you who is solving the problem in one pass. CSAT tells you how the customer felt. Optimized in isolation, any one of them breaks the others. An agent who closes forty tickets a day with a CSAT of three is not productive — they are damaging retention faster than they are clearing the queue.

Agent productivity is a balanced metric, the way pitcher productivity in baseball is balanced. Strikeouts matter, walks matter, ERA matters; nobody hires a pitcher on strikeouts alone. Build that intuition into your SQL and you'll always have a better answer than a one-line ranking.

Tickets per agent per shift

The simplest metric, and the one every dashboard already has, is closed tickets per agent per day. Start there. Just count resolved tickets, not all tickets touched — touching and reassigning is queue shuffling, not productivity.

SELECT
    agent_id,
    DATE(closed_at) AS day,
    COUNT(*)        AS tickets_closed
FROM support_tickets
WHERE status = 'resolved'
  AND closed_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY agent_id, DATE(closed_at)
ORDER BY day DESC, tickets_closed DESC;

Healthy ranges depend on tier and channel. Tier-1 chat agents at consumer products like Airbnb or Uber typically close twenty to forty tickets per shift. Tier-2 agents handling billing, refunds, and account issues run eight to fifteen. Enterprise success engineers debugging integrations at Snowflake or Databricks might close four to eight a day, and most of those span multiple sessions. If your numbers are far outside those bands, the first question is whether you're counting the same thing your peers are.

A subtle issue is shift overlap. An agent on a four-hour shift will look like they closed half as many tickets as a colleague on a full eight-hour shift even if their per-hour rate is identical. Divide by EXTRACT(EPOCH FROM (shift_end - shift_start))/3600 whenever you have shift data — productivity per hour beats per day for any team running mixed shifts.
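
A per-hour version might look like this. The agent_shifts table and its columns are assumptions for illustration, taken to hold one shift row per agent per day:

-- Hypothetical agent_shifts(agent_id, shift_date, shift_start, shift_end),
-- assumed to hold one shift row per agent per day.
SELECT
    t.agent_id,
    DATE(t.closed_at) AS day,
    COUNT(*)::NUMERIC
        / NULLIF(EXTRACT(EPOCH FROM (s.shift_end - s.shift_start)) / 3600, 0)
        AS tickets_per_hour
FROM support_tickets t
JOIN agent_shifts s
  ON s.agent_id   = t.agent_id
 AND s.shift_date = DATE(t.closed_at)
WHERE t.status = 'resolved'
  AND t.closed_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY t.agent_id, DATE(t.closed_at), s.shift_start, s.shift_end
ORDER BY tickets_per_hour DESC;

The NULLIF guard keeps a zero-length shift row from turning into a division error instead of a data-quality conversation.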

Average handle time

Average handle time is the second metric in the basket, and the most misused. AHT is the elapsed time between when an agent picks up a ticket and when it is marked resolved. It's a speed metric, not a quality metric. The moment you optimize it directly without watching CSAT and FCR alongside, you train your team to close tickets without solving them.

SELECT
    agent_id,
    AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60) AS avg_handle_minutes,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60
    ) AS median_minutes
FROM support_tickets
WHERE status = 'resolved'
  AND closed_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY agent_id
ORDER BY avg_handle_minutes;

Always report both the average and the median. AHT distributions are heavily right-skewed — most tickets are short, a few are very long, and the average sits well above the median on every team. If the median is six minutes and the average is twenty, a cluster of long-running cases is pulling the number up. That cluster is usually a specific category (refund disputes, integration bugs, account merges) routed to one agent who becomes a bottleneck.
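
One way to find that cluster, assuming support_tickets carries a category tag set at intake (an assumption about the schema), is to rank categories by how far their average outruns their median:

-- Categories with the biggest average-vs-median gap are the long-tail
-- culprits. Assumes a category column on support_tickets.
SELECT
    category,
    COUNT(*) AS tickets,
    AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60) AS avg_minutes,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60
    ) AS median_minutes,
    AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60)
      - PERCENTILE_CONT(0.5) WITHIN GROUP (
            ORDER BY EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60
        ) AS skew_minutes
FROM support_tickets
WHERE status = 'resolved'
  AND closed_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY category
ORDER BY skew_minutes DESC;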

Healthy bands depend on channel. Chat sits at five to ten minutes. Email runs fifteen to thirty, because of context-loading on each reply. Phone is six to twelve. An agent whose AHT is sixty-plus minutes against a six-minute baseline is either swamped with hard cases or genuinely slow, and you can't tell which until you control for complexity.

First contact resolution

First contact resolution — FCR — distinguishes "fast closer" from "good closer." A ticket has FCR if the customer's problem was solved in a single back-and-forth: one customer message, one agent response, problem fixed, no reopens. Teams that ignore FCR train their agents to close prematurely, which spawns reopens and inflates the queue twice. Teams that watch it catch the closer-not-solver pattern early.

WITH ticket_summary AS (
    SELECT
        ticket_id,
        agent_id,
        COUNT(*) FILTER (
            WHERE message_type = 'reply' AND author = 'customer'
        ) AS customer_replies,
        BOOL_OR(status = 'resolved') AS resolved
    FROM ticket_messages
    GROUP BY ticket_id, agent_id
)
SELECT
    agent_id,
    COUNT(*) AS total_tickets,
    SUM(CASE WHEN customer_replies <= 1 AND resolved THEN 1 ELSE 0 END) AS fcr_count,
    SUM(CASE WHEN customer_replies <= 1 AND resolved THEN 1 ELSE 0 END)::NUMERIC * 100
        / NULLIF(COUNT(*), 0) AS fcr_pct
FROM ticket_summary
GROUP BY agent_id
ORDER BY fcr_pct DESC;

A healthy FCR for a mature support team is north of seventy percent. Below fifty and you have systemic back-and-forth — agents asking for information that should have been collected at intake, or tickets routed back because they were never fully solved. The fix usually lives upstream of the agent: better intake forms, routing rules, diagnostic context auto-attached to the ticket. Punishing an agent for low FCR when intake gives them nothing to work with is metric-driven cruelty.

One subtle issue is what counts as "first contact." If a customer messages Friday, an agent responds, the customer follows up Monday and the agent resolves it on the same thread — is that FCR? Most teams say yes (one continuous thread). Pick a definition, encode it in the SQL, and make sure it matches what CX leadership thinks the number means.
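
One possible encoding, purely as a sketch: treat customer replies within 72 hours of the first customer message as part of the same contact. The 72-hour window is an assumption to tune per team, not a standard:

-- Sketch: customer replies within 72 hours of the first customer message
-- count as the same contact. The window length is an assumption to tune.
WITH firsts AS (
    SELECT
        ticket_id,
        MIN(created_at) FILTER (WHERE author = 'customer') AS first_customer_msg
    FROM ticket_messages
    GROUP BY ticket_id
)
SELECT
    m.ticket_id,
    COUNT(*) FILTER (
        WHERE m.author = 'customer'
          AND m.created_at > f.first_customer_msg + INTERVAL '72 hours'
    ) AS late_customer_replies  -- zero means the thread stayed one contact
FROM ticket_messages m
JOIN firsts f USING (ticket_id)
GROUP BY m.ticket_id;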

A balanced scorecard

Now the payoff. You have four metrics — volume, speed, quality (CSAT), and resolution (FCR) — and want a composite ranking that doesn't reward gaming a single one. The clean trick is to rank each agent into deciles on every metric, then sum the deciles. An agent in the ninth decile across the board (composite 36) outranks one who is tenth on volume but second on everything else (composite 16) — exactly the behavior you want.

WITH metrics AS (
    SELECT
        agent_id,
        COUNT(*)                                                          AS tickets,
        AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60)           AS avg_handle,
        AVG(csat_score)                                                   AS avg_csat,
        SUM(CASE WHEN fcr THEN 1 ELSE 0 END)::NUMERIC / COUNT(*)          AS fcr_rate
    FROM agent_kpi
    WHERE period_month = DATE '2026-04-01'
    GROUP BY agent_id
)
SELECT
    agent_id,
    NTILE(10) OVER (ORDER BY tickets)         AS volume_decile,
    NTILE(10) OVER (ORDER BY -avg_handle)     AS speed_decile,
    NTILE(10) OVER (ORDER BY avg_csat)        AS quality_decile,
    NTILE(10) OVER (ORDER BY fcr_rate)        AS fcr_decile,
    (NTILE(10) OVER (ORDER BY tickets)
       + NTILE(10) OVER (ORDER BY -avg_handle)
       + NTILE(10) OVER (ORDER BY avg_csat)
       + NTILE(10) OVER (ORDER BY fcr_rate))  AS composite_score
FROM metrics
ORDER BY composite_score DESC;

The composite lands between four (worst on every metric) and forty (best on every). Notice the negative sign in front of avg_handle — for speed, lower is better, so we flip the sort before bucketing. If you want a metric weighted more heavily (you care twice as much about CSAT as speed), multiply that decile by two before summing. Make the weights explicit in code, not hidden in someone's head.
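
Reusing the metrics CTE from the scorecard query above, a double-weighted CSAT might look like this:

-- Same decile trick with CSAT weighted 2x; the range becomes 5 to 50.
SELECT
    agent_id,
    (NTILE(10) OVER (ORDER BY tickets)
       + NTILE(10) OVER (ORDER BY -avg_handle)
       + 2 * NTILE(10) OVER (ORDER BY avg_csat)   -- CSAT counts double
       + NTILE(10) OVER (ORDER BY fcr_rate))      AS weighted_score
FROM metrics
ORDER BY weighted_score DESC;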

What the composite cannot do is tell you why an agent ranks where they do. Always report the four underlying deciles alongside it. The composite is a starting point for coaching, not a verdict.

Common pitfalls

When teams stand up agent productivity reporting for the first time, the most damaging mistake is to declare volume the metric and stop there. The dashboard ranks agents by tickets closed, the team optimizes for tickets closed, and within a quarter you have CSAT problems, reopen problems, and your best agents quietly leaving because the metric punishes the careful. Publish volume only as part of the balanced scorecard, and gate any promotion or comp decision on the composite, not a single column.

A second trap is comparing agents across tiers or categories. A tier-one chat agent on password resets and a tier-two specialist on billing disputes are not in the same race. Their volume will differ by a factor of three or four, their AHT by more, and ranking them on a shared leaderboard makes no operational sense. Partition the scorecard by tier and channel: rank within the tier-one chat pool, within tier-two billing, within enterprise success.
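
A PARTITION BY clause keeps the deciles within each pool. The tier and channel columns on agent_kpi are assumptions about the schema:

-- Rank volume within each tier/channel pool instead of globally.
-- tier and channel columns on agent_kpi are assumed here.
SELECT
    agent_id,
    tier,
    channel,
    NTILE(10) OVER (
        PARTITION BY tier, channel
        ORDER BY COUNT(*)
    ) AS volume_decile
FROM agent_kpi
WHERE period_month = DATE '2026-04-01'
GROUP BY agent_id, tier, channel
ORDER BY tier, channel, volume_decile DESC;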

A third pitfall is ignoring ticket complexity. A password reset takes ninety seconds; an integration debugging session can take an hour. Averaged together without controlling for category, the agent who handles more complex tickets looks slow when they're carrying more load. Attach a category tag at intake and report AHT weighted by category. The day you do that is often the day your slowest-looking agent turns out to be your best.
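
One way to control for complexity, again assuming a category tag exists, is to express each agent's AHT relative to the baseline for the same category:

-- Relative AHT: each agent's handle time divided by the baseline for the
-- same category. Values below 1.0 mean faster than category peers.
-- The baseline here is an average of per-agent averages, a deliberate
-- simplification; a ticket-weighted baseline is the obvious refinement.
WITH per_category AS (
    SELECT
        agent_id,
        category,
        AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60) AS agent_minutes,
        AVG(AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60))
            OVER (PARTITION BY category) AS category_minutes
    FROM support_tickets
    WHERE status = 'resolved'
      AND closed_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY agent_id, category
)
SELECT
    agent_id,
    AVG(agent_minutes / NULLIF(category_minutes, 0)) AS relative_aht
FROM per_category
GROUP BY agent_id
ORDER BY relative_aht;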

A fourth trap is making the leaderboard public. Ranking the bottom quartile in public is the fastest way to lose those agents, and they're often the ones doing the long, hard tickets the top quartile won't touch. Publish the team distribution publicly — median, top quartile, bottom quartile — and share individual rankings only with each agent and their manager.

The final pitfall is freezing the metric definition. Your first scorecard will be wrong. Treat it as a living artifact: review quarterly, publish a changelog when it changes, and re-run history under the new definition so you see an apples-to-apples trend.

Optimization tips

For support orgs with millions of tickets per quarter, the aggregations above start to slow down. The first lever is partitioning support_tickets by closed_at and ticket_messages by created_at — Snowflake and BigQuery push date predicates into a partition prune, so a thirty-day rollup reads thirty days instead of the whole table. The second lever is materializing an agent_daily table — one row per agent per day with precomputed tickets, AHT, FCR, CSAT — on a nightly schedule. The scorecard then reads a few thousand rows instead of millions.
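
On Postgres, the nightly rollup could be a materialized view. Names here are illustrative, and the csat_score column on support_tickets is an assumption:

-- Illustrative nightly rollup: one row per agent per day.
CREATE MATERIALIZED VIEW agent_daily AS
SELECT
    agent_id,
    DATE(closed_at) AS day,
    COUNT(*) AS tickets_closed,
    AVG(EXTRACT(EPOCH FROM (closed_at - assigned_at)) / 60) AS avg_handle_minutes,
    AVG(csat_score) AS avg_csat
FROM support_tickets
WHERE status = 'resolved'
GROUP BY agent_id, DATE(closed_at);

-- Run from the nightly scheduler:
REFRESH MATERIALIZED VIEW agent_daily;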

For the FCR query, BOOL_OR and COUNT FILTER on long threads get expensive. If the average ticket has twenty-plus messages, materialize a ticket_flags table with customer_reply_count and is_resolved once per day and join against that. On Postgres-backed reporting databases, an index on (agent_id, closed_at) and one on (ticket_id, message_type) covers most of the queries above.
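
The corresponding index DDL, as a sketch with illustrative names:

-- Index sketch for Postgres-backed reporting.
CREATE INDEX idx_tickets_agent_closed ON support_tickets (agent_id, closed_at);
CREATE INDEX idx_messages_ticket_type ON ticket_messages (ticket_id, message_type);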

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems built around exactly this pattern.

FAQ

What's a reasonable ticket volume per agent per day?

It depends on tier and channel, and the bands are wider than most managers expect. Tier-one chat agents at consumer apps close twenty to forty tickets per shift — short, scripted, auto-routed. Tier-two agents on billing or account issues run eight to fifteen, because every ticket requires context loading. Enterprise success engineers might close four to eight a day, most spanning multiple touches. If you compare a tier-one chat queue against a tier-two billing queue on raw ticket count, you will reach the wrong conclusion every time.

What's a healthy AHT?

Chat sits at five to ten minutes per ticket. Email runs fifteen to thirty, because every reply requires reloading thread context. Phone is six to twelve. Outside those bands, ask why: an AHT of two minutes on chat probably means closing without solving; an AHT of an hour on chat probably means routing the hard category nobody else wants. The number is meaningless without the category distribution behind it.

How do FCR and reopen rate relate?

They're two views of the same behavior. FCR measures the share of tickets resolved on the first pass; reopen rate measures the share that came back after being marked resolved. A team with seventy-five percent FCR and five percent reopens is genuinely solving on first contact. A team with seventy-five percent FCR and fifteen percent reopens is closing prematurely. Always report the two together.
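
Reported side by side, assuming agent_kpi carries an fcr flag and a reopened_at timestamp (the latter is an assumed column):

-- FCR and reopen rate together; reopened_at is an assumed column.
SELECT
    agent_id,
    AVG(CASE WHEN fcr THEN 1.0 ELSE 0 END)                      AS fcr_rate,
    AVG(CASE WHEN reopened_at IS NOT NULL THEN 1.0 ELSE 0 END)  AS reopen_rate
FROM agent_kpi
WHERE period_month = DATE '2026-04-01'
GROUP BY agent_id
ORDER BY fcr_rate DESC;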

Quality or quantity — which matters more?

Quality, every time, but the framing is a trap. The point of a balanced scorecard is that you never make that choice in the abstract — an agent with high volume at low CSAT will rank below one with lower volume at high CSAT because the composite weights both. The scorecard turns "quality versus quantity" from a debate into a number.

Should we coach agents on these metrics?

Yes, but privately and with context. Pull up the four underlying deciles, look for the outlier, and ask the agent what's driving it — as a diagnostic, not an accusation. A low AHT decile might mean rushing, or an unusually simple queue. A low CSAT decile might mean brusqueness, or catching the angriest customers because of routing. Use the metric to start the conversation, not end it.

How often should we refresh the scorecard?

Daily for the underlying metrics, monthly for the composite. Daily refresh lets managers spot anomalies — an AHT that jumped fifty percent yesterday is a coaching opportunity today, not next month. Monthly composite smooths the noise and gives a fair view of sustained performance. Weekly is too noisy; quarterly is too lagging.