How to calculate CSAT by issue type in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why a single CSAT number hides the real story

A headline CSAT of 85 percent feels like good news until you remember it is the average of every possible customer experience. Billing disputes, password resets, shipping complaints, refund requests, feature questions, and outage tickets all collapse into one number. When that number drops a point, no one can tell you why. When it goes up, no one can tell you whether the win is a real fix or a quiet shift in ticket mix. The single number is a press release, not a diagnostic.

A breakdown by issue type turns CSAT from a vanity metric into something a product manager can act on. If billing disputes sit at 3.4 out of 5 while everything else clears 4.5, the gap points at a specific workflow, knowledge base article, or refund policy. If the technical category has been sliding for three weeks while the rest are flat, that is a release-quality signal for the next sprint review, not a quarterly QBR. The breakdown also tells you whether training is paying off, because you can compare agents category by category instead of pretending an agent on billing is comparable to one on cancellations.

The classic scenario: Friday afternoon, support asks why CSAT dropped from 85 to 82 last month. You have a csat_surveys table with survey_id, ticket_id, agent_id, issue_category, csat_score on a 1-5 scale, and survey_date. Scores land in your warehouse on Snowflake or Databricks. Your job is a real answer by end of day.

The baseline breakdown by issue type

The starting point is a flat aggregate that gives you average CSAT, a satisfied share, and a sample count per category over a rolling 30-day window. The satisfied share is the percent scored 4 or 5, the convention teams at Stripe, Airbnb, and DoorDash use for the executive-deck CSAT. The raw average is useful for sensitivity analysis, but the top-box share is what gets compared against targets.

SELECT
    issue_category,
    COUNT(*)                                            AS surveys_received,
    AVG(csat_score)                                     AS avg_csat,
    COUNT(*) FILTER (WHERE csat_score >= 4) * 100.0
        / NULLIF(COUNT(*), 0)                           AS satisfied_pct
FROM csat_surveys
WHERE survey_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY issue_category
HAVING COUNT(*) >= 30
ORDER BY avg_csat ASC;

Two details deserve attention. The HAVING COUNT(*) >= 30 clause drops categories with sample sizes too small to read as signal. A category with eight surveys in 30 days can swing 0.6 points on one angry customer, and that variance will get presented to leadership as a trend. Thirty is a working floor for weekly review; for a quarterly analysis, push to 100. The ORDER BY avg_csat ASC puts the worst categories at the top so you can talk about them first. Executives skim the top three rows of any table and forget the rest, so row order is part of the analysis.

A subtle pitfall lives in the satisfied share. Some teams count only scores of 5 as satisfied, others count 4 and 5, and a few count 3 through 5 because their scale is right-skewed. Whichever convention you pick, write it on the dashboard tile in plain text.

Worst-performing categories versus the overall baseline

A category sitting at 4.0 looks bad in a vacuum and great if the overall mean is 3.8. The next query computes the gap between each category and the rolling overall CSAT, which is what triages root-cause analysis. The trick is to compute the baseline once in a CTE and then cross-join it to the per-category aggregate so you can subtract.

WITH baseline AS (
    SELECT AVG(csat_score) AS overall_csat
    FROM csat_surveys
    WHERE survey_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    s.issue_category,
    AVG(s.csat_score)                          AS category_csat,
    b.overall_csat,
    AVG(s.csat_score) - b.overall_csat         AS gap_vs_overall,
    COUNT(*)                                   AS n
FROM csat_surveys s
CROSS JOIN baseline b
WHERE s.survey_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY s.issue_category, b.overall_csat
HAVING COUNT(*) >= 30
ORDER BY gap_vs_overall ASC
LIMIT 10;

Categories with a gap_vs_overall of minus one point or worse are your investigation list. The rule of thumb across teams that ship this is that a category running half a point below baseline with at least 100 surveys is a real product or workflow problem, not noise. A category running a full point below almost always traces to a single broken flow, a confusing self-service page, or a policy that frustrates customers in a predictable way. Hand that row to a product manager with a sample of verbatim comments and you usually have a fix proposal within a week.

The reverse cut, categories running above baseline, is just as useful. If onboarding sits a point above the overall CSAT and billing sits a point below, you can copy the onboarding playbook into billing rather than reinventing it.

Agent by category, the training matrix

Comparing agents on a single CSAT number penalizes whoever happens to be assigned the harder ticket types. An agent who works billing all day looks worse than one who works password resets, but the difference is the queue, not the performer. The next query splits CSAT by agent and category, which is the matrix a workforce manager actually wants for coaching conversations.

SELECT
    agent_id,
    issue_category,
    AVG(csat_score)                            AS avg_csat,
    COUNT(*)                                   AS n
FROM csat_surveys
WHERE survey_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY agent_id, issue_category
HAVING COUNT(*) >= 20
ORDER BY agent_id, avg_csat ASC;

Three patterns to look for. An agent fine across most categories but sharply low in one is a coaching opportunity, not a performance issue. An agent consistently low across every category is a real conversation with their manager. An agent unexpectedly high in a difficult category like billing or cancellations has figured something out, and a 30-minute interview produces more training material than a quarter of formal QA reviews.

The 20-survey threshold per agent-category cell is lower than the category threshold of 30 because the cells are smaller by construction. Even at 20, expect noise. Pull a quarter of data if you are using this for a performance review rather than a coaching huddle.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Weekly trend by category

Aggregates tell you where you are. Trends tell you where you are heading. The last query buckets the same data into weekly windows and filters to the top three categories so the chart is readable. DATE_TRUNC casts the timestamp to a date so the join key is predictable across Postgres, Snowflake, and Databricks.

SELECT
    DATE_TRUNC('week', survey_date)::DATE      AS week,
    issue_category,
    AVG(csat_score)                            AS avg_csat,
    COUNT(*)                                   AS n
FROM csat_surveys
WHERE survey_date >= CURRENT_DATE - INTERVAL '12 weeks'
  AND issue_category IN ('billing', 'technical', 'feature_request')
GROUP BY DATE_TRUNC('week', survey_date), issue_category
ORDER BY week, issue_category;

A two- to three-week drop in a single category, while others stay flat, is almost always a release problem. The drop usually shows up one to two weeks after the release because customers do not file tickets the day they hit a bug; they try to work around it, then give up, then write in, then wait for resolution, then fill out the survey. Map your release calendar onto the trend chart and the correlation jumps out. A drop that hits every category at once is usually a queue problem, not a product problem. Average handle time has spiked, the wait queue is longer, and customers are punishing you across the board. That is when you look at agent productivity and average handle time before you blame the product.

Common pitfalls

The first pitfall is averaging CSAT without normalizing for ticket mix. Billing may account for 40 percent of your ticket volume, which means its CSAT dominates the overall number even when other categories matter more strategically. When the mix shifts week over week, the overall CSAT moves without anything actually changing in customer experience. The fix is to report by category from the start or to weight the overall CSAT by a stable mix from a reference period.

The second pitfall is ignoring survey selection bias. Only 10 to 30 percent of customers respond to a CSAT survey after a ticket closes, and the responders are not a random sample. Customers who had a strongly negative or strongly positive experience are more likely to fill out the form. The honest CSAT is a little lower than the responders suggest. The fix is to disclose the response rate next to the score on the dashboard and to periodically run a random-sample sweepstake to estimate the gap.

The third pitfall is treating low-response categories as comparable to high-response ones. A category with eight surveys in 30 days cannot be read as signal, and presenting it next to a category with 800 surveys invites bad decisions. The fix is the HAVING COUNT(*) >= 30 floor and a higher floor for any cut reviewed by leadership.

The fourth pitfall is computing CSAT only on closed and surveyed tickets. Abandoned tickets, tickets the customer gave up on, tickets the agent quietly archived to clear the queue are systematically excluded from the survey pool and almost certainly skew low. Track ticket abandonment as its own metric and report it alongside CSAT so that a fall in abandonment paired with a steady CSAT is not mistaken for a quiet win.

The fifth pitfall is conflating CSAT with NPS or CES. CSAT measures satisfaction with a specific interaction, usually on a 1-5 scale, right after that interaction. NPS measures relationship-level loyalty on a 0-10 scale, typically once a quarter. CES, the Customer Effort Score, measures how hard it was to get something done on a 1-7 scale and predicts churn better for support and self-service. Mixing them in a single chart with a shared y-axis will eventually produce a meeting where someone calls CSAT and NPS the same thing.

Optimization tips

For warehouses larger than a few hundred million survey rows, partition csat_surveys by survey_date at a daily grain and cluster on issue_category. The 30-day rolling queries hit a single partition range and stay fast. A materialized view that pre-aggregates per category per day cuts dashboard load times further and keeps the SQL above as the source of truth.

If the survey table joins to a tickets table to pick up issue_category, denormalize the category onto csat_surveys at write time. The join is the slow part of every CSAT query in production, and most teams do not change a category after the survey is sent. If categorization can shift because of a taxonomy update, store the category as it was at survey time plus a separate current_category field, and analyze on the historical value.

Set a window for late-arriving surveys. Customers fill out a survey three weeks after the ticket closes, and late responses silently change last month's number on refresh. Cap inclusion at seven or fourteen days post-ticket-close and document the cutoff so the trend is reproducible.

If you want to drill SQL cuts like this on real interview problems every day, NAILDD is launching with 500+ SQL problems built around exactly these analyst workflows.

FAQ

What counts as a good CSAT score?

For consumer support on a 1-5 scale, an average of 4.2 to 4.5 is excellent and 3.8 to 4.2 is the working norm. Anything below 3.5 is a real problem, not a stretch target. For top-box share, 85 percent is the common benchmark for healthy support organizations, with high performers like Notion and Linear running closer to 92 percent. What matters is not the absolute value but the trend versus your own baseline and the variation across categories.

How do I correct for survey selection bias?

You cannot fully correct for it without a random-sample mechanism, but you can bound it. Once a quarter, run a sweepstake on a random sample of tickets to push response rate above 50 percent, then compare CSAT from that period to your normal rate. The gap is your bias estimate. Most teams find the unbiased CSAT is 0.2 to 0.5 points below the responder-only number. Disclose the estimate on the dashboard.

Is email survey or in-product survey better?

In-product surveys that pop up right after the interaction get response rates of 30 to 50 percent because the experience is fresh. Email surveys land at 5 to 15 percent in a busy inbox. In-product wins on rate and recency, but clips surveys where the customer never returned, biasing against churn-adjacent experiences. Mature orgs run both.

Should I report CSAT per agent or per category?

Both, and together. Per-agent CSAT without a category split unfairly punishes agents who handle harder ticket types. Per-category CSAT without an agent split hides the fact that one strong agent may be carrying a category that would otherwise look bad. The matrix in the agent-by-category section is the version that supports both coaching and product investment decisions.

How long should the rolling window be?

Thirty days is the default because it stabilizes sample sizes for most categories while staying responsive to releases. Use 7 days inside an incident or release post-mortem when you need a tight signal. Use 90 days for quarterly business reviews where you want the trend to dominate noise. The mistake is to pick one window for every conversation; the analyst who pre-computes all three saves a lot of follow-up tickets to their own queue.