May 19, 2026·13 min read

Case interview for data analyst

Q: How long does a case interview usually run?

Most analyst cases run **30 to 60 minutes**. FAANG loops pair a 45-minute case with a separate SQL round; growth-stage startups often combine them in one 60-minute session. Leave 3–5 minutes for synthesis — running out of time before stating a recommendation is a top reason strong candidates get "no hire."

Q: Is it ever okay to say "I don't know"?

Yes, but phrase it as process, not shutdown. "I don't know off the top of my head, but I'd check it by pulling X and looking for Y" is the strong version. Interviewers want to see how you reason under uncertainty, not how many trivia answers you memorized.

Q: Are cases the same at every company?

The structure is consistent — clarify, decompose, hypothesize, prioritize, synthesize — but flavor varies. Product-led shops (Notion, Linear, Figma) lean on metric diagnosis. Experimentation-heavy companies (Netflix, DoorDash) test A/B reasoning hard. Marketplaces (Uber, Airbnb) favor supply-and-demand. Read the target's data blog and adjust your prep mix.

Q: What if I genuinely can't solve it?

Keep talking, but be honest about the obstacle. "I've narrowed it to two hypotheses and I'd need data X to choose between them — given that, I'd recommend Y next." A confident incomplete answer with explicit limitations beats a confident wrong answer.

Q: How is a senior case different from a mid-level case?

Senior prompts get **deliberately more ambiguous** — fewer numbers, vaguer goals, more stakeholder context. The interviewer expects you to push back on the framing and reason about cross-team tradeoffs. Mid-level rewards clean framework execution; senior rewards judgment about which framework applies and what to do when data is bad.

Train for your next tech interview

1,500+ real interview questions across engineering, product, design, and data — with worked solutions.

Join the waitlist

Contents:

What a case interview actually tests
The five case archetypes
A universal framework
Metric diagnosis: decomposition
Product metrics framework
Market sizing
SQL cases
A/B test cases
Worked examples
How to prepare
On the day of the interview
Common pitfalls
Related reading
FAQ

What a case interview actually tests

A case interview is a structured conversation where you solve a business problem out loud. The interviewer gives you a vague scenario — "DAU dropped 12% week-over-week, what do you do?" — and watches how you decompose it, generate hypotheses, and decide what to investigate. It is not a quiz, and that is exactly why strong SQL candidates often blow it.

For a data analyst role, the case is the single highest-signal loop. Stripe, Airbnb, DoorDash, Uber, and most growth-stage startups use one or two cases as a gate at mid-to-senior level. The rubric is the same everywhere — structure, product sense, statistical intuition, and the ability to drive your own analysis without being spoon-fed.

Load-bearing trick: the interviewer is watching your process, not waiting for an answer. Verbalizing tradeoffs beats arriving at a number in silence.

The five case archetypes

Almost every analyst case fits one of five buckets. Knowing which one you're in tells you which framework to reach for in the first thirty seconds.

Archetype	Typical prompt	What it tests
Metric diagnosis	"Signup conversion is down 8%, why?"	Decomposition, segmentation
Product launch	"How would you measure the new onboarding?"	Metric selection, north star
A/B test reasoning	"Test shows +3% lift at p=0.04, ship it?"	Stats + practical significance
Market sizing	"How many rides per day does Uber do in NYC?"	Assumption hygiene, sanity checks
Technical case	"How would you compute D7 retention in SQL?"	Definitions, joins, edge cases

In product-first companies (Meta, Notion, Linear), expect mostly metric-diagnosis and product-launch cases. In experimentation-heavy shops (Netflix, Airbnb, DoorDash), A/B reasoning shows up in every loop. Market sizing is more common at consulting-adjacent firms and on screening rounds at Amazon.

A universal framework

One scaffold solves every case, and almost no one executes it under pressure.

Step 1. Clarify (2–3 minutes). Ask which product, which exact metric, which time window, which segments, and whether any external event coincides with the drop. Repeat the problem back in your own words. This is the easiest place in the entire loop to score points.

Step 2. Lay out the structure (1–2 minutes). Say: "I'll check data quality, decompose by user type and platform, generate hypotheses, then validate the top one." The interviewer can now redirect — "skip data quality, go straight to segmentation."

Step 3. Generate hypotheses (5–10 minutes). Walk a tree of causes. For each branch, name the hypothesis, the data to test it, and the action if confirmed. Breadth before depth — list five candidates before drilling into one.

Step 4. Prioritize and dig. Pick the highest-likelihood, highest-impact branch. The interviewer will nudge if your prioritization is off.

Step 5. Synthesize. Close with a one-sentence recommendation, the assumption it rests on, and what you'd validate next. Cases without synthesis read as half-finished.

Metric diagnosis: decomposition

When a metric drops, decompose along three axes before generating hypotheses.

External vs internal causes is the first cut. External: seasonality, holidays, competitor launch, an iOS update. Internal: product release, pricing change, backend bug, marketing pullback. The first question is always "did anything ship on or near the drop date?"

Multiplicative decomposition breaks the metric into driver components. Revenue = Users × Conversion × AOV. DAU = New + Returning. Once decomposed, point to the offending factor instead of waving at the aggregate.

Segmentation isolates the responsible slice. Platform, country, product version, acquisition channel, new vs returning. If the drop is concentrated in one segment, the cause is almost always specific to it — a region-specific bug, a payment-processor outage, a single bad cohort.

Did the drop happen overnight or grade in over a week? A sharp cliff implies a release, config change, or tracking break. A gradual decline implies erosion — competitor pressure, fatigue, paid channel saturation.

Product metrics framework

For "how would you measure X" cases, the right answer is never one number. Strong candidates name a coherent set of metrics with explicit roles.

Layer	Purpose	Example: new onboarding flow
North Star	Captures user value	Week-1 activation rate
Input metrics	Drivers of the North Star	Step completion %, time-to-complete
Leading indicators	Early signals	First action within 24h, Day-2 return
Guardrails	Must not get worse	Crash rate, support tickets, D7 retention

"I'd track activation, four step-completion rates as inputs, Day-2 return as a leading indicator, and crash rate plus support volume as guardrails" is a different answer from "I'd look at activation." Same insight, ten times the signal.

Market sizing

Market sizing rewards assumption hygiene over arithmetic. Use two approaches and cross-check.

Top-down

US population:           ~335M
Adults 18+ (~77%):       ~258M
Smartphone owners (90%): ~232M
Active food-delivery (35%): ~81M
Orders per month per user: 4
Total monthly orders:    ~324M

Bottom-up

Avg DoorDash driver: 15 deliveries/day
Active drivers in US: ~2M
Daily deliveries:     ~30M
Monthly deliveries:   ~900M
(Off ~3x from top-down — revisit)

The two numbers should land in the same order of magnitude. A 3x gap is a signal that one of your assumptions is wrong, not a cue to average. Walk the interviewer through which assumption you'd refine. The number almost never matters — the logic and sanity check do.

SQL cases

Half of analyst cases embed a SQL prompt inside a business question. "How would you compute D30 retention?" is technically SQL, but the interviewer wants business reasoning first.

Clarify the definition. Retention of what — accounts, paying customers, sessions? Calendar-based or rolling D30? How do you handle "lapsed and returned"? The number changes meaningfully with each choice.

Sketch the structure before writing. Say out loud: "I'll need user-day activity, a self-join offset by 30 days, denominator being the anchor cohort." The interviewer can stop you before you write 40 lines of wrong SQL.

-- D30 retention by signup cohort
WITH user_days AS (
  SELECT user_id, DATE_TRUNC('day', event_ts) AS active_day
  FROM events
  GROUP BY 1, 2
),
cohorts AS (
  SELECT user_id, MIN(active_day) AS signup_day
  FROM user_days
  GROUP BY user_id
)
SELECT
  c.signup_day,
  COUNT(DISTINCT c.user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN u.active_day = c.signup_day + INTERVAL '30 days'
    THEN c.user_id END) AS retained_d30,
  retained_d30 * 1.0 / cohort_size AS d30_retention
FROM cohorts c
LEFT JOIN user_days u USING (user_id)
GROUP BY 1;

Call out the edge cases the interviewer is scoring for: late returners, boundary signups, time-zone handling, and the difference between bounded D30 (active exactly on day 30) and rolling D30 (active any time in days 1–30).

Train for your next tech interview

1,500+ real interview questions across engineering, product, design, and data — with worked solutions.

Join the waitlist

A/B test cases

The classic prompt: "Treatment is up 4% with p=0.03 on 8,000 users per arm. Ship it?" A weak answer says "yes, p < 0.05." A strong answer asks five questions first.

Power and sample size — what MDE was the test designed for, and is 8,000 per arm enough to detect 4% reliably? Barely-significant results on underpowered tests are coin flips dressed up as discoveries.

Segmentation — a flat +4% can hide a +12% gain in new users and a −3% regression in power users. If power users drive 70% of revenue, you just shipped a revenue loss with a green light.

Practical significance — a 4% lift on a metric worth $0.20 per user is a very different decision from the same lift on retention or LTV. Effect size in business units beats p-value.

Guardrail health — what happened to D7 retention, NPS, support volume, and crash rate? A primary win that craters a guardrail is a discussion, not a ship.

Novelty effects — new surfaces show inflated engagement for two weeks. If the test ran ten days, ask for the treatment-effect time series, not the final number.

Worked examples

Example 1 — MAU drop. A dating app's MAU is down 15% QoQ. Clarify platform, markets, and exact quarter. Decompose MAU = new + returning to find whether loss is in acquisition or retention. List candidates: marketing cut, app store ranking drop, competitor launch, Android release regression, seasonality. Validate via cohort retention curves, paid-channel CAC and volume by week, and the release log.

Example 2 — Pricing change. Raise price 20%, conversion drops 30%. Naive math: 100 × $100 = $10,000 before, 70 × $120 = $8,400 after — a 16% short-term revenue loss. Keep going. Who churns at the higher price — lowest-LTV or highest? Decision: short-term lose, long-term ambiguous, run a holdout for one billing cycle before generalizing.

Example 3 — Onboarding launch. North Star: 7-day activation (completing the core action twice). Inputs: step completion rates, time-to-complete, per-step drop-off. Leading indicators: Day-2 return, first message sent. Guardrails: crash-free sessions, support volume, D30 retention. Test design: random assignment, MDE from historical activation noise, ~14-day window.

How to prepare

Practice out loud with another analyst at least twice a week — reading frameworks silently builds zero muscle memory. Mock interviews on interviewing.io or peer swaps in LinkedIn analyst communities beat another framework PDF.

Read product strategy regularly — Stratechery, Lenny's Newsletter, Reforge essays. The case-interview vocabulary is the vocabulary in these sources.

Drill one structured case per day for two weeks before a loop. Write the answer, time yourself to 25 minutes, re-read, and mark where you skipped clarification or synthesis. The point is to make structure automatic under pressure.

For a curated bank of cases plus SQL and product-sense questions from real loops, NAILDD is built for exactly this drill.

On the day of the interview

Think out loud — a verbalized partial answer beats a silent complete one. Use the whiteboard to draw the metric tree, segmentation grid, or funnel. Pauses are fine — ten seconds of silence reads better than rambling. Anchor assumptions — "assume 3% CTR, the banner-ad baseline" beats "a few percent." Say what you don't know — explicit honesty is a feature, not a weakness.

Common pitfalls

The most common mistake is jumping to the answer. The candidate hears "DAU dropped" and starts listing hypotheses before clarifying which DAU, which window, which segment. Three minutes later they've solved the wrong problem. The fix is mechanical — force yourself to spend 2 minutes on clarifying questions before saying anything that starts with "I think it's…"

A second pitfall is tunnel vision on one hypothesis. Candidates latch onto "it must be the iOS release" and spend 20 minutes on a single branch. Cases reward breadth-first thinking — list 4–6 hypotheses, then prioritize. Going deep on the wrong branch reads as inflexibility even when the analysis is competent.

Weak assumptions are the third killer, especially in market sizing. "Let's assume 30%" with no basis tells the interviewer you can't tell when an estimate is reasonable. Anchor every assumption — a benchmark, a personal observation, a back-of-envelope check. A poorly justified 28% is worse than a thoughtfully justified 50%.

A fourth trap is skipping the synthesis. The candidate burns through hypotheses, picks one to investigate, and runs out of time without stating a recommendation. Always close with a one-line answer plus a caveat, even when time is tight.

The last is monologuing — never checking in with the interviewer, never reading body language. Cases are conversations. If the interviewer leans forward or starts to interrupt, that's a redirect signal, not noise to power through.

FAQ

How long does a case interview usually run?

Most analyst cases run 30 to 60 minutes. FAANG loops pair a 45-minute case with a separate SQL round; growth-stage startups often combine them in one 60-minute session. Leave 3–5 minutes for synthesis — running out of time before stating a recommendation is a top reason strong candidates get "no hire."

Is it ever okay to say "I don't know"?