Case study questions for data analyst interviews
Contents:
Why companies give case studies
A case study is a slice of real product work the interviewer drops on your desk: DAU is down 15% week over week, what do you do? The rubric is not SQL syntax — it is structured thinking under uncertainty. Hiring managers at Meta, Stripe, DoorDash, and Airbnb all weight this round heavily for mid-level and above.
Three things get scored: clarifying questions (pin down product, platform, time window first), decomposition (split the metric into math parts and the user base into segments), and communication (narrate so a PM on speaker can follow). Strong candidates think out loud; weak candidates go silent and surface a guess. There is rarely one right answer — only a method.
The framework for any case
Use the same five-step skeleton every time. It is boring, and that is the point — interviewers want a repeatable process, not improvisation.
- Clarify. Ask two or three sharp questions: which surface, which platform, which time window, what counts as normal.
- Decompose. Break the metric into math components —
DAU = new + returning,Revenue = orders × AOV,Conversion = step1 × step2 × step3. - Hypothesize. List three to five plausible causes, ranked by prior probability.
- Verify. For each hypothesis name the concrete check — a SQL slice, a cohort cut, a dashboard panel.
- Recommend. Land on the next action: what you fix, what you escalate, what you keep watching.
Load-bearing trick: never skip step one. Candidates who jump from prompt to SQL lose points even when their SQL is correct.
Metric-dropped cases
Case 1. DAU dropped 15% in a week
Decompose into new users and returning users. New-user drops point at acquisition — paid channels, ASO, a referral campaign that ended. Returning-user drops point at product — a bug in the latest release, a notification job that stopped firing, an auth migration forcing logouts. Cut by platform, geo, app version, and acquisition source. If the drop concentrates in one slice, suspect a deploy or a third-party outage. If it is uniform, suspect a tracking pipeline before you suspect product. A 15% uniform drop is more often a broken event than a broken product.
Case 2. Checkout conversion dropped 10%
Walk the funnel: view → add to cart → checkout start → payment → success. The drop almost always lives on one step, not spread evenly. If add-to-cart held but checkout-start fell, the cart UI changed. If payment success fell alone, a processor route is degraded. A common gotcha: the funnel looks fine in aggregate but is broken for one payment method that represents 30% of revenue.
Case 3. Average session length grew 30%
Growth is not automatically good. Three reads: positive (users engaging more with new content), negative (users wandering, cannot find what they need), technical (pages got slower, time-on-page is inflated). The decisive check is correlation with downstream conversion. Sessions up and purchases up proportionally — celebrate. Sessions up and purchases flat — UX regression.
Case 4. NPS dropped from 45 to 30
Segment by cohort first — new vs tenured users react differently to releases. Cut by plan tier, platform, region. Then read the free-text comments — NPS without the text is half a signal. Cross-reference release calendar, pricing changes, and incident reports. The fix depends on whether the cause is specific (a bug, a pricing test) or systemic (positioning drifted from product). Specific causes get hotfixed; systemic ones need a research project.
Design-this cases
Case 5. Design a metric system for a food-delivery app
Use AARRR as the scaffold (pirate metrics framework):
| Stage | Primary metric | Secondary |
|---|---|---|
| Acquisition | Installs by channel | CAC by channel |
| Activation | First-order completion | Time to first order |
| Retention | D1 / D7 / D30 | Order frequency |
| Revenue | GMV, AOV, ARPU | LTV by cohort |
| Referral | Invites sent | K-factor |
The North Star for a two-sided delivery marketplace is completed deliveries — it captures value for eaters and couriers in one number. Guardrails: delivery time, cancellation rate, courier earnings per hour.
Case 6. Design an A/B test for a new onboarding flow
Name the primary metric (activation rate) and guardrails (D1 retention, onboarding completion time, support tickets per new user) first. Sample size depends on baseline activation and MDE — for a 20% baseline and 2pp MDE you typically need around 10,000 users per arm. Run at least one full week for day-of-week effects, plus a few days for novelty to wear off. Randomize on user_id, restrict to net-new users. Common failure modes: sample ratio mismatch, novelty effect, overlap with other live experiments — see the peeking mistake guide.
Case 7. Measure the impact of a recommendation system
Do not stop at CTR. The real question is downstream conversion — did the recommended items get purchased, did the videos get watched to completion, did the user come back. A/B the new recommender against the previous algorithm or a popularity baseline. Beyond conversion, instrument diversity, coverage, and serendipity — recommenders often boost short-term clicks but hurt long-term retention by collapsing into a narrow loop.
Pick-the-metric cases
Case 8. How would you measure search quality?
Plain CTR is wrong because it rewards clickbait. Better candidates:
- CTR@N — share of sessions with a click in the top N
- pFound — modeled probability the user found their answer
- NDCG — ranking quality with positional discount
- Zero-result rate — share of queries returning nothing
- Reformulation rate — share of queries the user retyped
A strong answer names two or three, explains the trade-offs, and picks one as primary with the rest as guardrails. Naming five with no trade-off discussion is worse than naming two with one.
Case 9. How would you evaluate content moderation quality?
Precision measures how many blocks were correct — false positives drive churn. Recall measures how much bad content you caught — false negatives poison the platform. A kids' app weights recall, a creator platform weights precision. Add time to action (latency) and appeal rate (inverse-accuracy signal). A jump in appeals usually precedes a jump in false positives by a week.
Case 10. Compare two push-notification ranking models
Offline: AUC-ROC, precision@K, lift over baseline. Online: A/B watching CTR, downstream conversion, retention at D7/D30. The metric most teams forget — opt-out rate. A model that boosts CTR by 8% but pushes opt-out from 2% to 4% has destroyed a channel that cannot be rebuilt. Always run at least two weeks.
What-do-you-do cases
Case 11. Your numbers disagree with a teammate's numbers
Four checks in order: definitions (same event name, status, surface), filters (same window, timezone, internal-user exclusion), source (same upstream table, no stale snapshot), duplicates (does the join fan out rows someone else deduplicated). Nine times out of ten the answer is timezone (UTC vs local) or a status filter someone forgot.
Case 12. The PM asks you to "just show that the feature works"
Push back politely but firmly. An analyst is not a lawyer for the PM's roadmap — your value is objective signal, and cherry-picking burns it. Propose a proper A/B test instead. The right script: "Here is what the data shows. Here is what we could test next to make it work." Senior PMs respect this; junior PMs sometimes do not, and that is a signal about the team.
Case 13. The business wants to launch without an A/B test — no time
Offer a graduated rollout: 5% → 25% → 100% over two weeks with guardrail monitoring at each gate. Define guardrails up front — error rate, conversion, NPS proxy — plus rollback criteria. Capture baseline metrics before launch for pre/post analysis. If the feature is irreversible (pricing, permissions) push hard for the test anyway — the cost of an undetected regression dwarfs a week of delay.
Senior-level cases
Case 14. Design an analytics system for a new product, from scratch
Four layers. Event model: define the dozen core events that capture the funnel and stabilize them before adding more — chaotic taxonomy is the most expensive technical debt in analytics. Storage: Postgres for OLTP, Snowflake or ClickHouse for OLAP. Transform: dbt for modeling, Airflow or Dagster for orchestration, dbt tests for quality. Surfaces: executive dashboard on one screen, operational dashboards per team, ad-hoc SQL via Metabase or Hex.
Sanity check: if you cannot draw the system as one box per layer with one arrow between each, you are designing too much. Strip until it fits.
Case 15. Leadership rejected your recommendation. What now?
Write it down. Document your recommendation, the reasoning, the predicted outcome with metrics. Then propose a measurement plan for the path leadership chose. If their call lands worse, present the numbers without I told you so; the data carries the message. If their call lands better, study why your model was wrong. Being graceful in disagreement is the most reliable signal of senior judgment.
How to prep
Practice out loud — verbal narration is a separate motor skill. Ask a friend to read prompts and time your answers; five to seven minutes per case is the right pace. Pick five products you use (Spotify, Notion, Linear, DoorDash, your bank app) and run a full case on each. Skim the building blocks: DAU explained for PM, funnel conversion, LTV explained simply.
If you want to grind the underlying SQL and metric questions these cases lean on, NAILDD is launching with 1,500+ analyst interview problems calibrated to this loop.
Common pitfalls
The most common failure is diving into SQL before clarifying scope. The interviewer says DAU is down 15% and the candidate starts describing joins. You do not yet know if this is mobile or web, week-over-week or year-over-year, whether a 15% swing is unusual for this team. Interviewers notice the difference between candidates who pause to clarify and candidates who plough forward.
A second trap is flat hypothesis lists with no priors. Listing ten causes with equal weight is a brain dump. Senior candidates rank — most likely a tracking issue because the drop is uniform, next most likely an iOS deploy because that is where it concentrates, least likely seasonality because YoY rules it out. The ranking is what gets you to senior.
The third pitfall is forgetting guardrails on design cases. Candidates name the success metric and stop. Interviewers want D1 retention, support volume, and time-to-complete as guardrails — real tests fail when you optimize the headline and destroy something downstream. No guardrails, no senior offer.
A fourth trap is answering the metric question with a single metric. What measures search quality? — CTR. That is junior. Mid-level candidates name a primary, two secondaries, and the trade-off; senior candidates name failure modes and the setup that arbitrates between them. The fifth pitfall is going silent while thinking — narrate instead: "Let me think about which decomposition matters here." Silence reads as panic.
Related reading
- Case interview for data analyst
- AARRR pirate metrics for data analysts
- A/B testing peeking mistake
- Guardrail metrics in A/B testing
- How to calculate LTV in SQL
- SQL window functions interview questions
FAQ
How many cases will I get in one interview?
A typical onsite is one to two cases in a thirty-to-forty-minute slot. Product analyst loops at Meta, Airbnb, and DoorDash often run three to four across the day — one metric-dropped, one design, one pick-the-metric, one judgement call.
Can I ask the interviewer for clarification?
Yes, and you should. Is this mobile or web? Was the drop sudden or gradual? Were there deploys this week? is explicitly rewarded — it signals the structured thinking the rubric is built around. Candidates lose points by asking nothing and guessing.
Does the right answer matter more than the reasoning?
Reasoning almost always wins. Most cases have no single right answer by design. A confidently wrong answer with sloppy reasoning scores below a careful walk-through landing on a tentative conclusion.
How do I tell a good answer from an average one?
Average: I would look at DAU, retention, and conversion. Good: I would decompose DAU into new and returning users, cut by platform and acquisition source, propose three ranked hypotheses, and run three SQL queries to test the top one first. The difference is specificity and sequencing — concrete steps in a deliberate order, not a list of nouns.
Should I memorize specific case answers?
No. Interviewers spot rehearsed answers in fifteen seconds and pivot to a variant you have not memorized. Memorize the framework — clarify, decompose, hypothesize, verify, recommend — and trust it on whatever lands.
What if I freeze mid-case?
Say so. "Let me restart from the top of the framework" is an acceptable reset, and interviewers respond well to it. Freezing silently is the failure mode; resetting cleanly is a recovery signal.