RICE prioritization for product managers
Contents:
- What is RICE and why interviewers love it
- The RICE formula
- Reach: who actually touches the feature
- Impact: how much it moves the metric
- Confidence: how much you trust your numbers
- Effort: person-months, not vibes
- Worked example: scoring a real backlog
- RICE vs ICE vs WSJF
- Common pitfalls
- Related reading
- FAQ
What is RICE and why interviewers love it
RICE is a prioritization framework invented at Intercom in 2016 to compare unrelated product bets on a single numeric scale. It scores every initiative across four dimensions (Reach, Impact, Confidence, Effort) and outputs one number you can sort the backlog by. The framework became a default at Notion and many growth-stage SaaS shops because it's defensible in roadmap reviews without being theatrical.
In a product manager interview, RICE shows up in almost every loop. For junior and mid-level PM rounds at Stripe, Airbnb, or DoorDash, expect a direct prompt: "Walk me through how you'd prioritize these three features." For senior PM and Group PM rounds, interviewers care less about the formula and more about when RICE breaks: strategic bets, dependencies, and platform plays where the math lies. Treat it as a baseline literacy check, not a flex.
Load-bearing trick: RICE's job is to make trade-offs legible, not to make decisions for you. The PM who treats the score as gospel loses to the PM who treats it as a starting argument.
The RICE formula
RICE Score = (Reach × Impact × Confidence) / Effort
Higher score wins. The trick is that the four inputs live on different scales (people, multiplier, percent, person-months), so the absolute number is meaningless on its own. You only use RICE to rank features within one backlog, never to compare across teams.
A common interview follow-up: "What's the unit of a RICE score?" The honest answer is (users × impact-points × probability) / person-months — a fraction without a clean real-world interpretation. That's fine. It's a relative ordering tool.
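As a minimal sketch, the formula translates directly into a function. The name `rice_score` and the input checks are illustrative assumptions, not part of Intercom's definition. Confidence is passed as a fraction, so 80% becomes 0.8.

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Return (Reach x Impact x Confidence) / Effort.

    reach:      unique users in the chosen window (e.g. one quarter)
    impact:     a value from the 3 / 2 / 1 / 0.5 / 0.25 scale
    confidence: a fraction, so 80% is passed as 0.8
    effort:     person-months, must be positive
    """
    if effort <= 0:
        raise ValueError("effort must be a positive number of person-months")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be a fraction between 0 and 1")
    return reach * impact * confidence / effort


# The result only means something relative to other rows in the same backlog.
print(rice_score(reach=10_000, impact=2, confidence=0.8, effort=4))  # 4000.0
```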
Reach: who actually touches the feature
Reach is the number of unique users the feature will affect over a fixed window — usually one quarter. Pick the window once and apply it consistently across the whole backlog, otherwise scores stop comparing.
The most common mistake juniors make here is confusing addressable market with actual touchpoints. If you ship a checkout improvement, Reach is not your total user base — it's the people who actually reach checkout in the quarter. Pull the number from product analytics (Amplitude, Mixpanel, an internal events table), not from a pitch deck.
A few concrete anchors you can defend in an interview:
- A new onboarding flow at a B2C app with 40k signups/quarter → Reach = 40,000
- An enterprise admin dashboard touching 80 paying accounts → Reach = 80
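- A niche power-user shortcut hit 300 times/week → ≈ 3,900 hits/quarter (300 × 13 weeks), but Reach is the number of distinct users behind those hits, which is often far smaller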
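To show why hit counts and Reach differ, here is a small Python sketch that deduplicates users inside a quarterly window. The event rows and the `reach_in_window` helper are hypothetical stand-ins for whatever your analytics tool exports.

```python
from datetime import date

# Hypothetical event rows: (user_id, event_date). In practice these would come
# from Amplitude, Mixpanel, or an internal events table.
events = [
    ("u1", date(2024, 7, 1)),
    ("u1", date(2024, 7, 8)),
    ("u2", date(2024, 7, 2)),
    ("u1", date(2024, 8, 5)),
    ("u3", date(2024, 10, 1)),  # outside the Q3 window
]


def reach_in_window(rows, start, end):
    """Count distinct users with at least one event between start and end, inclusive."""
    return len({user for user, day in rows if start <= day <= end})


q3_start, q3_end = date(2024, 7, 1), date(2024, 9, 30)
hits = sum(1 for _, day in events if q3_start <= day <= q3_end)

print(hits)                                       # 4 events in the window
print(reach_in_window(events, q3_start, q3_end))  # 2 unique users: the Reach
```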
Don't over-engineer Reach with elaborate funnel projections. Interviewers want to see that you can pull a number from a real source and explain why it's the right scope.
Impact: how much it moves the metric
Impact estimates the per-user effect on the target metric. Intercom's original rubric is a five-point scale that you should memorize cold, because interviewers will ask you to apply it:
| Score | Label | Rough meaning |
|---|---|---|
| 3 | Massive | Transforms behavior, headline win |
| 2 | High | Clear positive shift, defensible |
| 1 | Medium | Modest but real lift |
| 0.5 | Low | Marginal effect for most users |
| 0.25 | Minimal | Nice-to-have, edge improvement |
The interview trap is anchoring. PMs default to 2 because it feels safe. A senior PM defends the choice with a comparable: "This is similar to the saved-payment-method change we shipped last year, which moved conversion by 1.4 percentage points — I'd score it Impact 2." If you can't name a comparable, drop your Impact one step.
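A few lines of Python can encode the five-level scale and the "drop one step without a comparable" rule. This is a sketch. `drop_one_step` and `impact_with_comparable` are hypothetical helper names, and treating Minimal as a floor is an assumption.

```python
# Intercom's five Impact levels, from highest to lowest.
IMPACT_SCALE = [3, 2, 1, 0.5, 0.25]


def drop_one_step(impact):
    """Move an Impact score one level down the scale (Minimal stays Minimal)."""
    if impact not in IMPACT_SCALE:
        raise ValueError(f"{impact} is not on the Impact scale {IMPACT_SCALE}")
    i = IMPACT_SCALE.index(impact)
    return IMPACT_SCALE[min(i + 1, len(IMPACT_SCALE) - 1)]


def impact_with_comparable(proposed, comparable=None):
    """Keep the proposed score only if a named comparable past feature backs it."""
    return proposed if comparable else drop_one_step(proposed)


print(impact_with_comparable(2, "saved-payment-method change"))  # 2
print(impact_with_comparable(2))                                 # 1
```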
Keep the scale consistent across the whole quarter's backlog. If "Massive 3" meant a 15% conversion lift in Q1, don't let "Massive 3" mean a 2% lift in Q2 just because the bar drifted.
Confidence: how much you trust your numbers
Confidence is a percent — how sure you are that your Reach × Impact estimate will hold up in reality. It is not the probability the feature succeeds; it is the probability your input numbers are correct.
A defensible scoring rubric:
- 100% — you have a prior A/B test, a shipped variant elsewhere, or hard funnel data
- 80% — you have directional analytics plus 5+ user interviews pointing the same way
- 50% — informed hypothesis, no quantitative validation
- 20% — you're guessing because the meeting demands a number
The honesty test: when in doubt, score 50%. PMs who report Confidence above 80% without a controlled experiment behind it are signaling overconfidence, and good interviewers will press: "What data drove the 90%?" If your answer is "intuition," your real Confidence is 50%.
Sanity check: if every feature in your backlog scores 80% Confidence or higher, the column is doing no work and you should recalibrate.
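Here is a rough sketch of the rubric and the sanity check above. The evidence labels in `CONFIDENCE_BY_EVIDENCE` are hypothetical names for the four tiers. `confidence_column_is_flat` only checks whether every row sits at or above 80%.

```python
# The four rubric tiers, expressed as fractions (labels are hypothetical).
CONFIDENCE_BY_EVIDENCE = {
    "ab_test_or_hard_data": 1.0,
    "analytics_plus_interviews": 0.8,
    "informed_hypothesis": 0.5,
    "guess": 0.2,
}


def confidence_column_is_flat(confidences, floor=0.8):
    """True when every feature sits at or above `floor`, so the column adds no signal."""
    return bool(confidences) and all(c >= floor for c in confidences)


backlog = {
    "push": "ab_test_or_hard_data",
    "onboarding": "analytics_plus_interviews",
    "dashboard": "informed_hypothesis",
}
scores = [CONFIDENCE_BY_EVIDENCE[evidence] for evidence in backlog.values()]

print(confidence_column_is_flat(scores))           # False: the column discriminates
print(confidence_column_is_flat([0.9, 0.8, 1.0]))  # True: time to recalibrate
```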
Effort: person-months, not vibes
Effort is the only divisor in the formula, which makes it the most leveraged input. One developer working one month = 1 person-month. Include design, QA, and PM time if they're meaningful.
A typical scope estimate looks like: one week of spec and design, two engineers for three weeks, one week of QA → roughly 2 person-months. Round to the nearest 0.5 and stop fiddling.
Effort should come from your tech lead, not from a PM playing engineer. If you're forced to estimate alone, multiply your gut by 1.5x — the planning fallacy is real and survives every framework. A common interview question: "Your dev says it'll take 2 months and the score puts the feature top of the list — do you ship it?" The answer is to probe the estimate's confidence interval before committing the team, not to wave the score around.
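You can check the scope arithmetic above in code. This sketch makes three assumptions. It takes about 52/12 ≈ 4.33 working weeks per month, counts the spec-and-design week as one person-week, and applies the 1.5x padding before rounding to the nearest 0.5. `person_months` is a hypothetical helper.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33 working weeks per month, ignoring holidays


def person_months(person_weeks, pad=1.0):
    """Convert person-weeks to person-months, apply an optional padding factor,
    and round to the nearest 0.5 (Python's round sends exact ties to even)."""
    months = person_weeks / WEEKS_PER_MONTH * pad
    return round(months * 2) / 2


# One week of spec and design, two engineers for three weeks, one week of QA.
scope = {"spec_and_design": 1, "engineering": 2 * 3, "qa": 1}
total_weeks = sum(scope.values())  # 8 person-weeks

print(person_months(total_weeks))           # 2.0
print(person_months(total_weeks, pad=1.5))  # 3.0 when a PM estimates alone
```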
Worked example: scoring a real backlog
A consumer SaaS PM faces three Q3 candidates:
| Feature | Reach | Impact | Confidence | Effort | RICE |
|---|---|---|---|---|---|
| Reactivation push notifications | 50,000 | 1 | 100% | 1 | 50,000 |
| New signup onboarding | 10,000 | 2 | 80% | 4 | 4,000 |
| Enterprise admin dashboard | 50 | 3 | 70% | 6 | 17.5 |
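Recomputing the table is a quick way to catch arithmetic slips before a review. Here is a minimal Python sketch with the three rows above. The `rice` helper is illustrative.

```python
backlog = [
    # (feature, reach, impact, confidence, effort in person-months)
    ("Reactivation push notifications", 50_000, 1, 1.00, 1),
    ("New signup onboarding", 10_000, 2, 0.80, 4),
    ("Enterprise admin dashboard", 50, 3, 0.70, 6),
]


def rice(reach, impact, confidence, effort):
    """(Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort


# Sort by score, highest first, and print a right-aligned score column.
ranked = sorted(backlog, key=lambda row: rice(*row[1:]), reverse=True)
for name, *inputs in ranked:
    print(f"{name:<34} {rice(*inputs):>10,.1f}")
```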
The push notifications dominate: high reach, validated mechanism, tiny scope. The enterprise dashboard scores brutally low because the reach number tanks the numerator — that doesn't mean you skip it, it means you justify it on strategic grounds outside the RICE math (annual contract value, logo expansion, board narrative).
The PM presents this table in the prioritization review, not as a verdict but as the opening position. If a stakeholder argues the dashboard unblocks $400k of pipeline, that's exactly the conversation RICE is supposed to surface. The score did its job by forcing the trade-off into the open.
RICE vs ICE vs WSJF
Interviewers occasionally throw the framework comparison at you. Have a one-line answer for each:
| Framework | Inputs | Best for | Weak at |
|---|---|---|---|
| ICE | Impact × Confidence × Ease | Fast triage, hackathons | Ignores reach, biased toward easy wins |
| RICE | (Reach × Impact × Confidence) / Effort | Standard PM backlog, growth | Strategy, dependencies, platform bets |
| WSJF | Cost-of-Delay / Job Size | SAFe / enterprise roadmaps | Heavy ceremony, harder to defend numbers |
If asked which to use, the senior answer is: ICE for triage, RICE for the quarterly plan, WSJF only when your org already runs SAFe. Don't pretend a framework is universal.
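For contrast with RICE, here is a sketch of WSJF as SAFe describes it. Cost of Delay is the sum of three relative scores: user-business value, time criticality, and risk reduction / opportunity enablement. WSJF divides that sum by Job Size. The function names and the example scores are hypothetical.

```python
def cost_of_delay(business_value, time_criticality, risk_reduction):
    """Sum of three relative scores (SAFe typically uses a modified Fibonacci scale)."""
    return business_value + time_criticality + risk_reduction


def wsjf(business_value, time_criticality, risk_reduction, job_size):
    """Weighted Shortest Job First: Cost of Delay divided by Job Size."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    return cost_of_delay(business_value, time_criticality, risk_reduction) / job_size


# Hypothetical relative scores for two epics.
print(wsjf(8, 5, 3, job_size=5))   # 3.2
print(wsjf(13, 2, 1, job_size=8))  # 2.0
```

Unlike RICE, every WSJF input is a relative score against the other items in the same planning session, which is part of why the numbers are harder to defend outside that room.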
Common pitfalls
Treating the score as an oracle is the most common failure mode. A junior PM reports "feature A has RICE 5,000 and feature B has 3,000, so we do A first" and stops thinking. The fix is to use RICE as the starting point of the trade-off conversation, not the conclusion. Sort by score, then explicitly check whether strategy, dependencies, or platform investment should override the math — and document the override.
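One way to make "sort, then override, then document" concrete is to keep overrides as data next to the scores. This sketch works under that assumption. The feature names, the override record shape, and `apply_overrides` are all hypothetical.

```python
scored = {"feature_a": 5_000, "feature_b": 3_000, "feature_c": 1_200}

# Each override names the feature, its new position (0-based), and the reason.
overrides = [
    {"feature": "feature_c", "position": 0,
     "reason": "dependency: feature_a builds on it"},
]


def apply_overrides(scores, overrides):
    """Sort by RICE score, then move overridden features and keep the reasons."""
    order = sorted(scores, key=scores.get, reverse=True)
    log = []
    for o in overrides:
        order.remove(o["feature"])
        order.insert(o["position"], o["feature"])
        log.append(f'{o["feature"]} -> #{o["position"] + 1}: {o["reason"]}')
    return order, log


plan, override_log = apply_overrides(scored, overrides)
print(plan)          # ['feature_c', 'feature_a', 'feature_b']
print(override_log)  # ['feature_c -> #1: dependency: feature_a builds on it']
```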
Inconsistent Impact calibration silently corrupts the whole exercise. If "Impact 3" meant a 10-percentage-point lift in Q1 and a 1-point lift in Q2, the scores across quarters are noise. The fix is to keep a running list of past features with their actual measured outcomes and force the team to anchor every new Impact estimate against that ledger. Calibration beats precision: a scale that is slightly off but applied consistently still tends to rank features in the right order, while a drifting scale does not.
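Here is a minimal version of the ledger idea. It looks up the past feature whose measured lift is closest to the new estimate and reuses that feature's Impact score. Apart from the saved-payment-method example mentioned earlier, the ledger entries are invented for illustration. Nearest-match anchoring is just one simple convention.

```python
# Hypothetical ledger: (feature, measured lift in percentage points, agreed Impact).
LEDGER = [
    ("saved payment methods", 1.4, 2),
    ("checkout copy tweak", 0.2, 0.5),
    ("one-click reorder", 4.0, 3),
    ("address autocomplete", 0.6, 1),
]


def anchored_impact(expected_lift_pp):
    """Return the Impact score of the past feature whose measured lift is closest,
    plus that feature's name so the comparable can be cited."""
    name, _lift, impact = min(LEDGER, key=lambda row: abs(row[1] - expected_lift_pp))
    return impact, name


print(anchored_impact(1.2))  # (2, 'saved payment methods')
```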
Inflated Confidence is the classic overconfidence trap, especially in stakeholder-heavy environments. PMs feel pressure to write 90% next to their favorite feature because 50% sounds weak. The fix is a hard rule: anything above 80% needs a named data source — a prior experiment, a shipped variant, or analytics with effect size. Without one, drop to 50% and own it. Interviewers respect calibrated humility far more than confident hand-waving.
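If scores live in a script or a spreadsheet export, you can enforce the hard rule mechanically. Here is a sketch. `enforce_confidence` and the example source string are hypothetical.

```python
def enforce_confidence(confidence, data_source=None):
    """Anything above 80% needs a named data source; otherwise fall back to 50%."""
    if confidence > 0.8 and not data_source:
        return 0.5
    return confidence


print(enforce_confidence(0.9))                                   # 0.5
print(enforce_confidence(0.9, "Q2 checkout A/B test, +1.1 pp"))  # 0.9
print(enforce_confidence(0.8))                                   # 0.8
```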
Effort coming from PM gut instead of engineering produces fantasy scores. The PM divides by 2 person-months when the real number is 5, the feature shoots to the top, and the team misses the quarter. The fix is to refuse to commit Effort until tech lead and design have weighed in, and to inflate ambiguous estimates by 1.5x. Slow scoring is better than fast lying.
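Because Effort is the divisor, an optimistic estimate inflates the score in exact inverse proportion. This sketch uses hypothetical numbers to show the 2 vs 5 person-month gap from the text flipping a ranking.

```python
def rice(reach, impact, confidence, effort):
    """(Reach x Impact x Confidence) / Effort, with Effort in person-months."""
    return reach * impact * confidence / effort


# Hypothetical rows: the PM guesses 2 person-months for feature X; engineering says 5.
rival = rice(reach=4_000, impact=1, confidence=1.0, effort=2)  # 2000.0
x_gut = rice(reach=5_000, impact=2, confidence=0.5, effort=2)  # 2500.0, tops the list
x_eng = rice(reach=5_000, impact=2, confidence=0.5, effort=5)  # 1000.0, drops below rival

print(x_gut / x_eng)  # 2.5: the score scales exactly with 1 / Effort
```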
Ignoring dependencies turns RICE into a misranked queue. A foundational platform feature might score 200 while the consumer feature it unlocks scores 8,000. RICE has no native way to express "this enables that." The fix is a pre-pass over the backlog to mark dependencies explicitly, then score the dependency chains as bundles rather than individual rows.
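RICE does not prescribe how to score a bundle, so the following is one possible convention, stated as an assumption. It sums Reach × Impact × Confidence over every row in the chain and divides by the chain's total Effort. The hypothetical rows reproduce the 200 and 8,000 scores from the text.

```python
def numerator(reach, impact, confidence):
    """Reach x Impact x Confidence for one row."""
    return reach * impact * confidence


# Hypothetical rows: (reach, impact, confidence, effort in person-months)
platform = (400, 1, 0.5, 1)    # scores 200 on its own
consumer = (8_000, 2, 0.5, 1)  # scores 8,000 on its own, but needs `platform` first


def bundle_score(*rows):
    """One possible convention: total value of the chain over total effort."""
    total_value = sum(numerator(r, i, c) for r, i, c, _ in rows)
    total_effort = sum(row[3] for row in rows)
    return total_value / total_effort


print(bundle_score(platform, consumer))  # 4100.0
```

Under this convention the chain scores 4,100, and that is the number to compare against other rows. The consumer feature's standalone 8,000 overstates what the team can actually ship without the platform work.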
Related reading
- ICE prioritization framework for PM
- Kano model for product managers
- JTBD framework: how to apply
- How to choose a North Star metric
- Product manager case interview guide
FAQ
What's the difference between RICE and ICE?
ICE uses three factors — Impact, Confidence, Ease — and ignores Reach entirely. RICE adds Reach as a multiplier and replaces Ease (a 1-10 score) with Effort (person-months) in the denominator. ICE is faster for hackathons and early-stage triage where every idea touches roughly the same audience. RICE becomes essential once your backlog mixes features that touch 50,000 users with features that touch 50 — without Reach in the formula, ICE will wildly overrate niche bets.
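Here is a small sketch of the niche-bet problem with hypothetical inputs. The two features are identical in per-user impact, confidence, and size, and differ only in audience (50,000 users vs 50).

```python
def ice(impact, confidence, ease):
    """ICE with each input on a 1-10 scale. There is no Reach term."""
    return impact * confidence * ease


def rice(reach, impact, confidence, effort):
    """RICE: Confidence as a fraction, Effort in person-months."""
    return reach * impact * confidence / effort


# Hypothetical pair: same per-user impact, confidence, and size; only Reach differs.
broad_ice, niche_ice = ice(6, 5, 7), ice(6, 5, 7)
broad_rice = rice(reach=50_000, impact=1, confidence=0.5, effort=2)
niche_rice = rice(reach=50, impact=1, confidence=0.5, effort=2)

print(broad_ice == niche_ice)   # True: ICE cannot tell them apart
print(broad_rice / niche_rice)  # 1000.0: RICE separates them by the Reach ratio
```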
When does RICE break down?
RICE struggles whenever the answer depends on context the formula can't see: strategic bets where you intentionally pursue low-reach enterprise plays, platform investments that unlock other features, or compliance work where the score is irrelevant because the feature is non-negotiable. It also fails when Impact is genuinely unmodelable — early discovery work, brand bets, or moonshots where Confidence is honestly 20% and the score becomes meaningless noise. Senior PMs flag these cases up-front and prioritize them outside the framework with explicit strategic reasoning.
How do I defend my RICE numbers in a roadmap review?
Cite the source for every input. Reach should reference a specific analytics dashboard or query. Impact should reference a comparable past feature with its measured lift. Confidence should reference the evidence underlying the percent. Effort should be attributed to your tech lead, dated, and noted as "still uncertain" if it is. The strongest defense isn't a higher score — it's transparent inputs, so when leadership disagrees they argue about the underlying claim rather than the math.
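If the backlog lives in code or a script-backed spreadsheet, you can enforce "cite the source for every input" by storing each value with its source. Here is a sketch using Python dataclasses. `SourcedInput`, `RiceRow`, and the example sources are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SourcedInput:
    value: float
    source: str  # dashboard, comparable feature, evidence, or estimator and date


@dataclass
class RiceRow:
    feature: str
    reach: SourcedInput
    impact: SourcedInput
    confidence: SourcedInput
    effort: SourcedInput

    def unsourced(self):
        """Names of inputs with an empty source, i.e. ones that won't survive a review."""
        return [f.name for f in fields(self)
                if isinstance(getattr(self, f.name), SourcedInput)
                and not getattr(self, f.name).source.strip()]


row = RiceRow(
    feature="New signup onboarding",
    reach=SourcedInput(10_000, "Signups dashboard, last full quarter"),
    impact=SourcedInput(2, "Comparable: saved-payment-method change, +1.4 pp"),
    confidence=SourcedInput(0.8, "Funnel analytics + 6 user interviews"),
    effort=SourcedInput(4, ""),  # tech lead estimate not yet attached
)
print(row.unsourced())  # ['effort']
```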
Should engineering be involved in RICE scoring?
Yes, for Effort, always. Optionally for Impact and Confidence when the feature is technically risky. PMs who score Effort alone consistently underestimate by 30-50%. The healthier ritual is a 30-minute joint scoring session per quarter where PM brings Reach and Impact, engineering brings Effort, and the room debates Confidence together. That meeting also tends to surface dependencies and platform risks RICE alone misses.
Is RICE used at top tech companies?
In practice, large companies blend frameworks. Meta and Google teams lean on OKR-driven impact estimation with internal versions of RICE for tactical bets. Stripe and Notion use RICE explicitly for quarterly planning. Linear and Vercel run lightweight ICE for weekly triage. Airbnb and DoorDash use a custom variant with explicit strategy weights bolted on. The framework you use matters less than the discipline of writing inputs down and revisiting them after launch.
How often should I re-score the backlog?
Once per quarter at minimum, before planning. Re-score sooner if a major input changes — a Reach number shifts because of a product pivot, a Confidence number changes because new experiment data came in, or an Effort estimate doubles after spec review. The anti-pattern is scoring once at the start of the year and treating the spreadsheet as immutable. RICE is a living artifact, not a contract.