How to choose a North Star metric
Contents:
Why this choice is load-bearing
The North Star metric is the single number a company points at when it wants every team to row in the same direction. Pick it well and growth, engineering, and design argue about how to move it, not whether the metric is the right one. Pick it poorly and you spend the next two years watching teams optimize numbers that nobody outside their squad cares about — and you eventually re-do the exercise anyway, after a board meeting in which someone asks why DAU is up but revenue is flat.
This guide is about the selection process, not the metric itself. There is no universal North Star — Spotify's time listened, Airbnb's nights booked, and Slack's teams with 2,000+ messages all work because they fit their business model, not because they are clever. What you need is a repeatable way to walk into a workshop, evaluate three or four candidates, and walk out with one number plus the language to defend it.
Load-bearing trick: a North Star metric is not the metric you measure most often. It is the metric you would still measure if you could only measure one thing all year.
This is also the question that shows up in product analytics interviews at Meta, Stripe, Airbnb, and DoorDash, usually phrased as "What would you choose as the North Star for [our product] and why?". The interviewer is not testing whether you have memorized Spotify's choice. They are testing whether you can reason about user value, business model, and measurement constraints in the same sentence.
The four selection criteria
A North Star candidate has to clear four bars. Skipping any of them produces a metric that either looks great on the dashboard while the business slowly dies, or moves only when a different team ships something unrelated.
User value. The metric has to capture the value the user actually receives. Daily active users is a participation count, not a value count — a user who opens the app, sees nothing useful, and leaves still counts as DAU. Questions answered correctly per week, minutes of music listened to, nights booked — these all encode the moment the user got what they came for. If you can describe your candidate metric without using the word user, you probably haven't picked one yet.
Leads revenue. The metric has to be a leading indicator of money, with a lag short enough to act on. Spotify's time listened leads paid subscriptions by roughly 30-60 days. Airbnb's nights booked is revenue, with a small delay for refunds. NPS is famously weakly correlated with next-quarter revenue at most companies, which is why it rarely survives as a North Star.
Actionable by product teams. Stock price is not a North Star. Market share in a category dominated by macroeconomic conditions is not a North Star. The metric must move when teams ship features, run experiments, or change pricing — not when the Fed changes rates.
Measurable daily or weekly with the data you already have. A metric you can only compute quarterly from a survey is useless as a steering wheel. The data pipeline for the North Star should be boring: an event table, a join, a GROUP BY week, done.
A scoring rubric you can run in a workshop
The fastest way to choose between three plausible candidates is to score them on the four criteria and force the team to write numbers, not vibes. Below is the rubric I use in workshops. Each criterion gets a 1-5 score, and the candidate with the highest total wins — unless one criterion is a zero, in which case the candidate is disqualified regardless of total.
| Criterion | 1 (fails) | 3 (passes) | 5 (excellent) |
|---|---|---|---|
| User value | Pure participation count (DAU, sessions) | Action that implies value (purchases) | Action that is the value (minutes listened, nights stayed) |
| Leads revenue | No correlation or unknown lag | Correlated, lag 1-2 quarters | Strong correlation, lag under 60 days |
| Actionable | Driven by macro factors | Some product levers exist | Multiple shippable levers per quarter |
| Measurable | Survey or manual rollup | Daily from one warehouse table | Real-time from existing event stream |
The rubric is deliberately blunt — fancier weighting schemes give the illusion of precision and let politics hide inside the weights. If your team disagrees on a score, the disagreement itself is the useful output. It usually means people have different mental models of what the product does for users.
A practical wrinkle: if two candidates tie, prefer the one with the shorter measurement cycle. A metric you can read weekly will accumulate ten times more decisions per year than one you read monthly, and most product organizations are bottlenecked on decisions, not on data.
Worked examples from real products
Pulling apart the metrics chosen by mature products is the fastest way to internalize the rubric. Each of these scores at least a 4 on every criterion, and they all share a structural pattern: a unit of value, a depth threshold, and an aggregation that survives sampling.
| Product | North Star | Unit of value | Depth signal |
|---|---|---|---|
| Spotify | Time spent listening | Minutes | Per-user minutes, not session count |
| Airbnb | Nights booked | A night of stay | Booked, not searched |
| Slack | Teams with 2,000+ messages sent | A team | Threshold of 2,000 messages |
| Messages sent | A message | Per-user volume | |
| Netflix | Hours streamed | An hour of video | Per-account, weekly |
| Stripe | Total payment volume processed | A dollar of GMV | Net of refunds |
Slack's choice is the most instructive because the depth threshold does so much work. Teams with 2,000+ messages sent is engineered to catch teams that have actually adopted the product — under that bar, most teams churn within ninety days. Slack discovered the 2,000-message threshold empirically by plotting six-month retention against cumulative messages sent and finding the inflection point. The North Star encodes that retention insight directly into the metric definition, which means every team optimizing it is implicitly optimizing retention.
Sanity check: if your North Star has no depth threshold, you are probably measuring participation, not value. Add one.
Airbnb's choice is interesting for the opposite reason. Nights booked has no threshold — but the unit itself contains the depth, because a single booking can be one night or fourteen nights, and a fourteen-night booking represents fourteen times the user value and roughly fourteen times the revenue. The depth is baked into the unit.
Components: breadth, depth, frequency, efficiency
When you are building a candidate from scratch rather than borrowing one, decompose the user's main action into four components and ask which combination matters.
Breadth is reach — how many users take the action at all. Depth is how much each user does — minutes per user, messages per team. Frequency is the temporal density — sessions per week, bookings per quarter. Efficiency is the quality — share of sessions that end in a completed action, share of messages that get a reply.
A North Star usually multiplies two of these. Weekly active users × messages sent per active user is a breadth × depth metric, which is roughly what WhatsApp tracks. Bookings × average nights per booking = nights booked is breadth × depth in disguise. Time listening = users × minutes per user × days active per week is a triple product.
The danger of single-component North Stars is that teams game the easy lever. If your North Star is pure breadth, growth marketing wins and the product gets shallower. If it is pure depth, you optimize for power users while abandoning new ones. The multiplicative structure forces teams to keep both honest.
Tiers of metrics around the North Star
The North Star does not stand alone. It sits at the top of a tree:
- L0 — North Star. One metric, company-wide.
- L1 — Input metrics. Three to five drivers that mathematically roll up into the North Star. For Slack: number of integrations adopted per team, daily active members per team, messages per active member.
- L2 — Team metrics. Ten to twenty tactical metrics per team, each owned by a specific squad. A growth team owns activation rate; a platform team owns p95 latency.
- L3 — Feature metrics. Per-feature numbers used inside A/B tests. These are not reported to the executive team.
The art is keeping the L1 inputs truly mathematically connected to the L0. If you cannot write North Star = f(L1_1, L1_2, ...) on a whiteboard, your inputs are aspirational, not driver metrics. Teams will optimize them and the North Star will not move, and within a quarter the executive team will lose faith in the whole framework.
Common pitfalls
The first trap is copying a competitor's metric without their context. Meta uses daily active users because their entire business model is ad impressions, and DAU is the closest leading indicator of ad inventory. A B2B SaaS product copying DAU as its North Star ends up optimizing logins, which is not what their customers pay for. The fix is to start from your own business model — subscription, marketplace, ad-supported, freemium — and let that constrain the candidate list before you look at anyone else's choice.
The second trap is picking too early. In a pre-product-market-fit phase, the user base is too small and too noisy for any aggregate metric to be meaningful. Teams that crown a North Star at this stage end up steering by noise and missing qualitative signals from the small number of users who actually love the product. A working rule: do not pick a permanent North Star until you have at least 10,000 weekly active users and stable week-over-week retention curves.
The third trap is gameable metrics. If your North Star is sessions per user and a clever growth team realizes they can split one session into three by tweaking the inactivity timeout, the North Star will go up by 200% in a week and nobody will buy more. Pressure-test every candidate with the question: what is the cheapest, least value-creating way to move this metric? If the answer is uncomfortably cheap, the metric is gameable and needs a depth threshold or a quality denominator.
The fourth trap is rigidity. A North Star is right for a phase, not forever. Early-stage products usually optimize for user growth; growth-stage products switch to engagement; scale-stage products switch to monetization efficiency. Companies that refuse to update their North Star when the business model evolves end up with engagement-up-revenue-flat dashboards that everyone privately ignores. Plan to revisit the choice annually and expect it to change every two to three years.
The fifth trap, the most subtle, is declaring a North Star without a clear definition document. Six months later, two teams are reporting different numbers because one is filtering out internal users and the other is not, or because one is using event time and the other ingestion time. The metric must have a definition document with the SQL, the filters, and the owner — otherwise the political fights about the number replace the productive fights about how to move it.
Related reading
- North Star metric for product managers
- AARRR framework — pirate metrics
- DAU explained for PMs
- Activation framework for product managers
- Growth loops primer for PM
If you want to drill product-sense interview questions like "What would you choose as the North Star for X?" with structured feedback, NAILDD is launching with hundreds of PM and analytics problems built around exactly this pattern.
FAQ
Can a company have more than one North Star?
Strictly, no — the whole point is alignment around a single number. In practice, a few companies run a pair of North Stars, typically one engagement metric and one revenue metric, to keep the two in tension. Amazon famously runs a constellation rather than a single star. For most companies under a few thousand employees, one North Star with three to five input metrics is the right shape. Adding a second star usually signals that the first one is wrong, not that two are needed.
Does this work for B2B products?
Yes, with adjustments. The unit of value in B2B is typically a team or account, not a user, because the buying unit is the team. Slack's North Star is teams-based for exactly this reason. Depth thresholds matter even more in B2B because the asymmetry between casual and committed accounts is enormous — a single Fortune 500 account can outweigh a thousand free trials in revenue terms. The four selection criteria are the same; only the unit changes.
How long should the cohort window be for evaluating whether a candidate "leads revenue"?
For consumer subscription products, run the correlation between the candidate metric and revenue with lags of 0, 30, 60, and 90 days, and pick the lag with the highest correlation as your evidence. For B2B SaaS with annual contracts, you may need to look at 6-12 month lags because the buying cycle is slower. If the candidate doesn't correlate at any lag under one year, it is not leading revenue — it is unrelated to revenue, and you have a different candidate to find.
What if my product is too new to have any revenue yet?
Then your North Star should be a value proxy, not a revenue proxy, and you should explicitly time-box it. Activation rate or time to first value are common early-stage North Stars. Set a milestone — for example, reach $1M ARR — at which you will re-evaluate and switch to a revenue-linked metric. Do not pretend a pre-revenue metric is your forever North Star; the moment revenue exists, the criteria shift and the metric will need to follow.
How do I get executive buy-in on the choice?
Run the rubric workshop with the executive team in the room, not separately. The scoring exercise itself is the alignment mechanism — by the time you have argued through four criteria for three candidates, the executives own the choice in a way they never would if you presented a fait accompli. Bring the data: correlations between each candidate and revenue, the depth-threshold inflection chart, and at least one back-of-envelope projection of where the metric goes if you ship the obvious next bets. The rubric without data is theater; the data without the rubric is a slide nobody remembers.
Should the North Star be public to employees?
Yes, including the definition, the current value, the target, and the input tree. Companies that hide their North Star end up with teams optimizing local metrics that pull in opposite directions. Companies that publish it — on the wall, in the all-hands deck, in the onboarding doc — get the alignment they were trying to buy by picking the metric in the first place. The whole exercise is wasted if only the executive team knows the answer.