Case interview for data analyst
Contents:
What a case interview actually tests
A case interview is a structured conversation where you solve a business problem out loud. The interviewer gives you a vague scenario — "DAU dropped 12% week-over-week, what do you do?" — and watches how you decompose it, generate hypotheses, and decide what to investigate. It is not a quiz, and that is exactly why strong SQL candidates often blow it.
For a data analyst role, the case is the single highest-signal loop. Stripe, Airbnb, DoorDash, Uber, and most growth-stage startups use one or two cases as a gate at mid-to-senior level. The rubric is the same everywhere — structure, product sense, statistical intuition, and the ability to drive your own analysis without being spoon-fed.
Load-bearing trick: the interviewer is watching your process, not waiting for an answer. Verbalizing tradeoffs beats arriving at a number in silence.
The five case archetypes
Almost every analyst case fits one of five buckets. Knowing which one you're in tells you which framework to reach for in the first thirty seconds.
| Archetype | Typical prompt | What it tests |
|---|---|---|
| Metric diagnosis | "Signup conversion is down 8%, why?" | Decomposition, segmentation |
| Product launch | "How would you measure the new onboarding?" | Metric selection, north star |
| A/B test reasoning | "Test shows +3% lift at p=0.04, ship it?" | Stats + practical significance |
| Market sizing | "How many rides per day does Uber do in NYC?" | Assumption hygiene, sanity checks |
| Technical case | "How would you compute D7 retention in SQL?" | Definitions, joins, edge cases |
In product-first companies (Meta, Notion, Linear), expect mostly metric-diagnosis and product-launch cases. In experimentation-heavy shops (Netflix, Airbnb, DoorDash), A/B reasoning shows up in every loop. Market sizing is more common at consulting-adjacent firms and on screening rounds at Amazon.
A universal framework
One scaffold solves every case, and almost no one executes it under pressure.
Step 1. Clarify (2–3 minutes). Ask which product, which exact metric, which time window, which segments, and whether any external event coincides with the drop. Repeat the problem back in your own words. This is the easiest place in the entire loop to score points.
Step 2. Lay out the structure (1–2 minutes). Say: "I'll check data quality, decompose by user type and platform, generate hypotheses, then validate the top one." The interviewer can now redirect — "skip data quality, go straight to segmentation."
Step 3. Generate hypotheses (5–10 minutes). Walk a tree of causes. For each branch, name the hypothesis, the data to test it, and the action if confirmed. Breadth before depth — list five candidates before drilling into one.
Step 4. Prioritize and dig. Pick the highest-likelihood, highest-impact branch. The interviewer will nudge if your prioritization is off.
Step 5. Synthesize. Close with a one-sentence recommendation, the assumption it rests on, and what you'd validate next. Cases without synthesis read as half-finished.
Metric diagnosis: decomposition
When a metric drops, decompose along three axes before generating hypotheses.
External vs internal causes is the first cut. External: seasonality, holidays, competitor launch, an iOS update. Internal: product release, pricing change, backend bug, marketing pullback. The first question is always "did anything ship on or near the drop date?"
Multiplicative decomposition breaks the metric into driver components. Revenue = Users × Conversion × AOV. DAU = New + Returning. Once decomposed, point to the offending factor instead of waving at the aggregate.
Segmentation isolates the responsible slice. Platform, country, product version, acquisition channel, new vs returning. If the drop is concentrated in one segment, the cause is almost always specific to it — a region-specific bug, a payment-processor outage, a single bad cohort.
Did the drop happen overnight or grade in over a week? A sharp cliff implies a release, config change, or tracking break. A gradual decline implies erosion — competitor pressure, fatigue, paid channel saturation.
Product metrics framework
For "how would you measure X" cases, the right answer is never one number. Strong candidates name a coherent set of metrics with explicit roles.
| Layer | Purpose | Example: new onboarding flow |
|---|---|---|
| North Star | Captures user value | Week-1 activation rate |
| Input metrics | Drivers of the North Star | Step completion %, time-to-complete |
| Leading indicators | Early signals | First action within 24h, Day-2 return |
| Guardrails | Must not get worse | Crash rate, support tickets, D7 retention |
"I'd track activation, four step-completion rates as inputs, Day-2 return as a leading indicator, and crash rate plus support volume as guardrails" is a different answer from "I'd look at activation." Same insight, ten times the signal.
Market sizing
Market sizing rewards assumption hygiene over arithmetic. Use two approaches and cross-check.
Top-down
US population: ~335M
Adults 18+ (~77%): ~258M
Smartphone owners (90%): ~232M
Active food-delivery (35%): ~81M
Orders per month per user: 4
Total monthly orders: ~324MBottom-up
Avg DoorDash driver: 15 deliveries/day
Active drivers in US: ~2M
Daily deliveries: ~30M
Monthly deliveries: ~900M
(Off ~3x from top-down — revisit)The two numbers should land in the same order of magnitude. A 3x gap is a signal that one of your assumptions is wrong, not a cue to average. Walk the interviewer through which assumption you'd refine. The number almost never matters — the logic and sanity check do.
SQL cases
Half of analyst cases embed a SQL prompt inside a business question. "How would you compute D30 retention?" is technically SQL, but the interviewer wants business reasoning first.
Clarify the definition. Retention of what — accounts, paying customers, sessions? Calendar-based or rolling D30? How do you handle "lapsed and returned"? The number changes meaningfully with each choice.
Sketch the structure before writing. Say out loud: "I'll need user-day activity, a self-join offset by 30 days, denominator being the anchor cohort." The interviewer can stop you before you write 40 lines of wrong SQL.
-- D30 retention by signup cohort
WITH user_days AS (
SELECT user_id, DATE_TRUNC('day', event_ts) AS active_day
FROM events
GROUP BY 1, 2
),
cohorts AS (
SELECT user_id, MIN(active_day) AS signup_day
FROM user_days
GROUP BY user_id
)
SELECT
c.signup_day,
COUNT(DISTINCT c.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN u.active_day = c.signup_day + INTERVAL '30 days'
THEN c.user_id END) AS retained_d30,
retained_d30 * 1.0 / cohort_size AS d30_retention
FROM cohorts c
LEFT JOIN user_days u USING (user_id)
GROUP BY 1;Call out the edge cases the interviewer is scoring for: late returners, boundary signups, time-zone handling, and the difference between bounded D30 (active exactly on day 30) and rolling D30 (active any time in days 1–30).
A/B test cases
The classic prompt: "Treatment is up 4% with p=0.03 on 8,000 users per arm. Ship it?" A weak answer says "yes, p < 0.05." A strong answer asks five questions first.
Power and sample size — what MDE was the test designed for, and is 8,000 per arm enough to detect 4% reliably? Barely-significant results on underpowered tests are coin flips dressed up as discoveries.
Segmentation — a flat +4% can hide a +12% gain in new users and a −3% regression in power users. If power users drive 70% of revenue, you just shipped a revenue loss with a green light.
Practical significance — a 4% lift on a metric worth $0.20 per user is a very different decision from the same lift on retention or LTV. Effect size in business units beats p-value.
Guardrail health — what happened to D7 retention, NPS, support volume, and crash rate? A primary win that craters a guardrail is a discussion, not a ship.
Novelty effects — new surfaces show inflated engagement for two weeks. If the test ran ten days, ask for the treatment-effect time series, not the final number.
Worked examples
Example 1 — MAU drop. A dating app's MAU is down 15% QoQ. Clarify platform, markets, and exact quarter. Decompose MAU = new + returning to find whether loss is in acquisition or retention. List candidates: marketing cut, app store ranking drop, competitor launch, Android release regression, seasonality. Validate via cohort retention curves, paid-channel CAC and volume by week, and the release log.
Example 2 — Pricing change. Raise price 20%, conversion drops 30%. Naive math: 100 × $100 = $10,000 before, 70 × $120 = $8,400 after — a 16% short-term revenue loss. Keep going. Who churns at the higher price — lowest-LTV or highest? Decision: short-term lose, long-term ambiguous, run a holdout for one billing cycle before generalizing.
Example 3 — Onboarding launch. North Star: 7-day activation (completing the core action twice). Inputs: step completion rates, time-to-complete, per-step drop-off. Leading indicators: Day-2 return, first message sent. Guardrails: crash-free sessions, support volume, D30 retention. Test design: random assignment, MDE from historical activation noise, ~14-day window.
How to prepare
Practice out loud with another analyst at least twice a week — reading frameworks silently builds zero muscle memory. Mock interviews on interviewing.io or peer swaps in LinkedIn analyst communities beat another framework PDF.
Read product strategy regularly — Stratechery, Lenny's Newsletter, Reforge essays. The case-interview vocabulary is the vocabulary in these sources.
Drill one structured case per day for two weeks before a loop. Write the answer, time yourself to 25 minutes, re-read, and mark where you skipped clarification or synthesis. The point is to make structure automatic under pressure.
For a curated bank of cases plus SQL and product-sense questions from real loops, NAILDD is built for exactly this drill.
On the day of the interview
Think out loud — a verbalized partial answer beats a silent complete one. Use the whiteboard to draw the metric tree, segmentation grid, or funnel. Pauses are fine — ten seconds of silence reads better than rambling. Anchor assumptions — "assume 3% CTR, the banner-ad baseline" beats "a few percent." Say what you don't know — explicit honesty is a feature, not a weakness.
Common pitfalls
The most common mistake is jumping to the answer. The candidate hears "DAU dropped" and starts listing hypotheses before clarifying which DAU, which window, which segment. Three minutes later they've solved the wrong problem. The fix is mechanical — force yourself to spend 2 minutes on clarifying questions before saying anything that starts with "I think it's…"
A second pitfall is tunnel vision on one hypothesis. Candidates latch onto "it must be the iOS release" and spend 20 minutes on a single branch. Cases reward breadth-first thinking — list 4–6 hypotheses, then prioritize. Going deep on the wrong branch reads as inflexibility even when the analysis is competent.
Weak assumptions are the third killer, especially in market sizing. "Let's assume 30%" with no basis tells the interviewer you can't tell when an estimate is reasonable. Anchor every assumption — a benchmark, a personal observation, a back-of-envelope check. A poorly justified 28% is worse than a thoughtfully justified 50%.
A fourth trap is skipping the synthesis. The candidate burns through hypotheses, picks one to investigate, and runs out of time without stating a recommendation. Always close with a one-line answer plus a caveat, even when time is tight.
The last is monologuing — never checking in with the interviewer, never reading body language. Cases are conversations. If the interviewer leans forward or starts to interrupt, that's a redirect signal, not noise to power through.
Related reading
- SQL on data analyst interview
- How to design A/B tests step by step
- How to choose a North Star metric
- Case interview for systems analyst
- Peeking problem in A/B tests
FAQ
How long does a case interview usually run?
Most analyst cases run 30 to 60 minutes. FAANG loops pair a 45-minute case with a separate SQL round; growth-stage startups often combine them in one 60-minute session. Leave 3–5 minutes for synthesis — running out of time before stating a recommendation is a top reason strong candidates get "no hire."
Is it ever okay to say "I don't know"?
Yes, but phrase it as process, not shutdown. "I don't know off the top of my head, but I'd check it by pulling X and looking for Y" is the strong version. Interviewers want to see how you reason under uncertainty, not how many trivia answers you memorized.
Are cases the same at every company?
The structure is consistent — clarify, decompose, hypothesize, prioritize, synthesize — but flavor varies. Product-led shops (Notion, Linear, Figma) lean on metric diagnosis. Experimentation-heavy companies (Netflix, DoorDash) test A/B reasoning hard. Marketplaces (Uber, Airbnb) favor supply-and-demand. Read the target's data blog and adjust your prep mix.
What if I genuinely can't solve it?
Keep talking, but be honest about the obstacle. "I've narrowed it to two hypotheses and I'd need data X to choose between them — given that, I'd recommend Y next." A confident incomplete answer with explicit limitations beats a confident wrong answer.
How is a senior case different from a mid-level case?
Senior prompts get deliberately more ambiguous — fewer numbers, vaguer goals, more stakeholder context. The interviewer expects you to push back on the framing and reason about cross-team tradeoffs. Mid-level rewards clean framework execution; senior rewards judgment about which framework applies and what to do when data is bad.