North Star metric for product managers
What a North Star metric actually is
A North Star Metric (NSM) is the single product number that captures the value users get from your product. It is not revenue, not DAU, and not a board-deck vanity number; it is a count of value delivered, measured at a cadence your team can actually act on. The point is to give a 40-person product org one shared number to rally around instead of arguing over a dashboard with sixty tiles.
The NSM does three things. First, it ends quarterly priority debates: every initiative either moves the NSM or it does not. Second, it connects a feature shipped today to next year's revenue without requiring a finance background to follow the logic. Third, it survives reorgs: the org chart changes, the NSM does not.
The North Star is a count, not a ratio. Ratios disguise growth — a flat ratio with a doubling base is enormous progress, and the executive review will not see it. Counts make growth legible.
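A quick arithmetic sketch of the point above, using made-up numbers: the base of active teams doubles between two quarters while the share of teams reaching the value moment stays at 40%. The function name `summarize` and all figures are hypothetical.

```python
def summarize(active_teams: int, teams_with_value: int) -> tuple[float, int]:
    """Return (ratio, count) for one period.

    ratio: share of active teams that reached the value moment.
    count: absolute number of teams that reached it (the NSM-style count).
    """
    return teams_with_value / active_teams, teams_with_value


# Hypothetical quarters: the base doubles, the ratio does not move.
print(summarize(1_000, 400))  # (0.4, 400)
print(summarize(2_000, 800))  # (0.4, 800)
```

The ratio reads as "no progress" in both quarters; the count shows the value delivered has doubled.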
Walk into a senior PM interview at Stripe, Notion, or Linear and expect this exact question: "What North Star would you choose for our product, and why?" The interviewer is testing whether you can separate a proxy for delivered value from a lagging financial output.
Criteria of a good North Star
A defensible NSM passes five tests, in this order:
- It reflects delivered value, not activity. "Logged in this week" is activity; "completed a workout" is value. The verb matters.
- It correlates with long-term retention. Users who hit the NSM more often in week 1 should still be around at D30 and D90. If they are not, you picked an engagement-bait metric.
- It is measurable in your warehouse without a data engineering project. If computing the NSM takes a 200-line dbt model with three CTEs, nobody will look at it daily.
- A non-technical teammate can explain it in one sentence. Slack's "messages sent within paid teams that reached 2,000 messages" is famously specific but still one sentence. That is the bar.
- Product decisions move it, not marketing or PR. If a paid acquisition campaign can spike your NSM, it is a top-of-funnel metric, not a North Star.
Load-bearing rule: If you cannot draw a straight line from "a designer shipped a better empty state" to "the NSM moves," your NSM is too far downstream. Pull it earlier in the value chain.
Industry examples
The strongest NSMs are public-domain at this point. Look at the verb in each one — that is where the team's product theory lives.
| Company | North Star metric | Why this verb, not another |
|---|---|---|
| Spotify | Time spent listening | Listening time correlates with subscription retention better than DAU; passive plays do not count. |
| Airbnb | Nights booked | A booking is the value moment for both guest and host; views and searches are noise. |
| Notion | Weekly active teams creating content | Teams, not seats — collaborative creation is the moat, solo note-taking is not. |
| Stripe | Payment volume processed for live businesses | "Live" excludes test mode; volume aligns with the business model without becoming pure revenue. |
| Airtable | Weekly active bases with collaborators | A base with one editor is a spreadsheet; a base with three is a workflow. |
| Linear | Issues closed per active team per week | The product exists to ship work; closing issues is the verb that proves it. |
| Figma | Multiplayer files edited per week | Single-player edits are Sketch; multiplayer is the wedge. |
| Slack | Messages sent within teams that crossed the 2,000-message threshold | The 2,000-message marker is the empirical retention cliff. |
Notice how most of these metrics carry a qualifier: "with collaborators," "for live businesses," "that crossed the 2,000-message threshold." The qualifier is what separates a North Star from a vanity count. Without it, you ship features that pad the number; with it, you ship features that add value.
This is also why "DAU" alone fell out of fashion at Meta — shallow engagement scaled while satisfaction did not.
How to pick a North Star
The selection algorithm is mechanical once you have a value hypothesis.
- Name the core value moment. Not a feature, an outcome. For a food-delivery marketplace, the outcome is that a hot meal the customer actually wanted arrives, not that an order is placed.
- Find the count of that moment. "Orders with a 4-or-5-star rating per week" counts the moment. "GMV" counts the dollars, which is a consequence.
- Test the retention correlation. Bucket users by their week-1 NSM count. Do high-count users retain at D30? If users with 5 NSM events in week 1 retain at 70% but users with 1 retain at 22%, the metric is predictive. If the curves are flat, the metric is engagement theater.
- Test the revenue correlation. Cohorts with rising NSM should have rising LTV 90 days later. If LTV is flat while NSM climbs, you are gaming yourselves.
- Pressure-test with the team. Read the candidate metric to engineering, design, and support. If they cannot name three things they would build to move it, it is too abstract.
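The retention test from the list above can be sketched in a few lines of Python. This is a minimal illustration rather than a production analysis: it assumes you have already extracted one `(week1_nsm_count, retained_at_d30)` pair per user from your warehouse, and the function names, the bucket cap of 5, and the 10-point `min_lift` threshold are hypothetical choices. A similar bucketing that averages 90-day LTV per bucket instead of a retention flag would cover the revenue test.

```python
from collections import defaultdict


def retention_by_bucket(users, cap=5):
    """Group users by week-1 NSM count and return D30 retention per bucket.

    users: iterable of (week1_nsm_count, retained_at_d30) pairs.
    Counts at or above `cap` are pooled into a single top bucket.
    """
    totals = defaultdict(int)
    retained = defaultdict(int)
    for count, is_retained in users:
        bucket = min(count, cap)
        totals[bucket] += 1
        retained[bucket] += int(is_retained)
    return {b: retained[b] / totals[b] for b in sorted(totals)}


def is_predictive(rates, min_lift=0.10):
    """Crude check: does retention climb by at least `min_lift`
    from the lowest bucket to the highest one?"""
    buckets = sorted(rates)
    return rates[buckets[-1]] - rates[buckets[0]] >= min_lift


# Hypothetical cohort mirroring the example: 1 event -> 22%, 5 events -> 70%.
users = [(1, i < 22) for i in range(100)] + [(5, i < 70) for i in range(100)]
rates = retention_by_bucket(users)
print(rates)                 # {1: 0.22, 5: 0.7}
print(is_predictive(rates))  # True
```

If both buckets came back near the same rate, `is_predictive` would return `False`, which is the "flat curves" case the list warns about.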
A worked example for the food-delivery marketplace, with three candidate metrics and how they score:
| Candidate | Reflects value? | Drives retention? | Movable by product? | Verdict |
|---|---|---|---|---|
| Gross merchandise value (GMV) | No — counts dollars, not joy | Weakly | Partially | Reject — it is a financial output |
| Orders per week per active user | Partial | Yes | Yes | Promising but ignores quality |
| Orders with 4-5 star rating per week | Yes | Yes (strongest) | Yes | Adopt |
The third candidate wins because a bad order — cold food, missing item, rude driver — should not count. Counting all orders rewards volume; counting good orders rewards the product fixing the parts of the experience that customers actually feel.
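A minimal sketch of computing the adopted metric, assuming an orders feed of `(delivered_on, rating)` pairs where unrated orders carry `None`. Whether unrated orders should count is a product decision the text does not settle; this sketch excludes them. Weeks are keyed by ISO year and week number via `date.isocalendar()`, and the function name is hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_good_orders(orders):
    """Count orders rated 4 or 5 per ISO (year, week).

    orders: iterable of (delivered_on: date, rating: int or None).
    Unrated orders and orders rated 1-3 are not counted.
    """
    weekly = Counter()
    for delivered_on, rating in orders:
        if rating is not None and rating >= 4:
            iso = delivered_on.isocalendar()
            weekly[(iso[0], iso[1])] += 1
    return dict(weekly)


# Hypothetical orders: 2024-01-01 is the Monday of ISO week 1 of 2024.
orders = [
    (date(2024, 1, 1), 5),
    (date(2024, 1, 3), 3),     # bad order: not counted
    (date(2024, 1, 3), None),  # unrated: not counted in this sketch
    (date(2024, 1, 8), 4),
    (date(2024, 1, 9), 5),
]
print(weekly_good_orders(orders))  # {(2024, 1): 1, (2024, 2): 2}
```

Counting every order would give 3 for week 1 instead of 1; the rating filter is what turns "orders per week" into the quality-aware North Star.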
Metric hierarchy under the NSM
The NSM is the apex; underneath sits a tree of drivers and operational metrics. The hierarchy is what turns the NSM from a poster into a planning tool.
North Star: 4-5★ orders per week
│
├── Driver: Acquisition
│   └── New users with first order
├── Driver: Activation
│   └── Share of users with 2+ orders in week 1
├── Driver: Retention
│   ├── D30 return rate
│   └── D90 return rate
├── Driver: Frequency
│   └── Orders per month per active user
└── Driver: Quality
    ├── Share of orders rated 4-5★
    ├── On-time delivery rate
    └── Order accuracy rate
Each driver has an owner. Acquisition is usually a growth PM, activation belongs to onboarding, retention sits with lifecycle, frequency with merchandising, quality with operations. The senior PM who owns the NSM does not own every driver; they own the portfolio decision of which driver to push this quarter.
Sanity check: If two drivers move opposite directions and the NSM stays flat, you do not have a problem with the NSM — you have a coordination problem between two teams. The hierarchy is doing its job by surfacing it.
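The sanity check can be turned into an automated flag. The sketch below is hypothetical in every name and threshold: it takes quarter-over-quarter relative changes for the NSM and for each driver, treats anything within ±1% as flat (an arbitrary tolerance), and lists rising/falling driver pairs only when the NSM itself is flat.

```python
def coordination_flags(nsm_change, driver_changes, flat_tolerance=0.01):
    """Flag driver pairs that move in opposite directions under a flat NSM.

    nsm_change: relative QoQ change of the NSM (0.005 means +0.5%).
    driver_changes: {driver_name: relative QoQ change}.
    Returns a list of (rising_driver, falling_driver) pairs, empty when
    the NSM moved beyond the tolerance.
    """
    if abs(nsm_change) > flat_tolerance:
        return []
    rising = sorted(d for d, c in driver_changes.items() if c > flat_tolerance)
    falling = sorted(d for d, c in driver_changes.items() if c < -flat_tolerance)
    return [(r, f) for r in rising for f in falling]


# Hypothetical quarter: acquisition up 8%, quality down 6%, NSM flat.
print(coordination_flags(0.002, {"Acquisition": 0.08, "Quality": -0.06, "Frequency": 0.0}))
# [('Acquisition', 'Quality')]
```

Each flagged pair is a conversation between two owners, not a reason to change the NSM.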
A common second-level breakdown looks like the table below. The numbers are illustrative for a Series B consumer marketplace, not benchmarks to copy.
| Driver | Healthy quarterly trend | Warning threshold | Owner |
|---|---|---|---|
| Activation rate (2+ orders in week 1) | +3 to +5 points | Flat or declining 2 quarters in a row | Onboarding PM |
| D30 retention | +1 to +3 points | Drop of >2 points QoQ | Lifecycle PM |
| Frequency per active user | +0.2 to +0.5 orders/mo | Flat with rising acquisition spend | Merchandising PM |
| 4-5★ share | ≥85% sustained | Drop below 80% | Ops + Quality |
The point of the table is not the numbers — it is that each row has an owner and a threshold. A driver without an owner is a driver nobody is moving.
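One way to make "each row has an owner and a threshold" executable is to encode the table as rules. This sketch covers three of the four rows using the illustrative thresholds above; the frequency row is left out because it also needs acquisition-spend data. The class, function, and field names are hypothetical, and values are quarterly percentages, oldest first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DriverRule:
    driver: str
    owner: str
    # Receives the driver's quarterly values (oldest first), returns True on warning.
    is_warning: Callable[[list[float]], bool]


def flat_or_declining_two_quarters(h):
    # Two consecutive quarter-over-quarter changes that are zero or negative.
    return len(h) >= 3 and h[-1] <= h[-2] <= h[-3]


def drop_over_two_points(h):
    return len(h) >= 2 and h[-2] - h[-1] > 2.0


def below_80(h):
    return bool(h) and h[-1] < 80.0


RULES = [
    DriverRule("Activation rate (2+ orders in week 1)", "Onboarding PM", flat_or_declining_two_quarters),
    DriverRule("D30 retention", "Lifecycle PM", drop_over_two_points),
    DriverRule("4-5★ share", "Ops + Quality", below_80),
]


def warnings(history_by_driver):
    """Return (driver, owner) pairs whose latest data crosses the warning threshold."""
    return [
        (r.driver, r.owner)
        for r in RULES
        if r.is_warning(history_by_driver.get(r.driver, []))
    ]


# Hypothetical quarterly history.
history = {
    "Activation rate (2+ orders in week 1)": [31.0, 34.0, 38.0],
    "D30 retention": [41.0, 38.5],
    "4-5★ share": [86.0, 79.0],
}
print(warnings(history))
# [('D30 retention', 'Lifecycle PM'), ('4-5★ share', 'Ops + Quality')]
```

Because every rule carries an owner, a warning always lands on a named person rather than on "the team."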
Common pitfalls
Equating NSM with revenue. Revenue is the lagging consequence of delivered value, not value itself. If your NSM is ARR, you have built a finance dashboard, not a product compass. The fix is to write down the user outcome that produces revenue and count that outcome — ARR climbs as a side effect, and you know why it climbed.
Picking an NSM nobody can recite. "Weekly active users who completed five events including one purchase and rated four-plus" is a query, not a metric. If the PM org cannot recite the NSM at standup without reading it, the metric is dead. Slack's version is one sentence with one qualifier, and that is about as much complexity as a team will reliably carry in its head.
Never revisiting the NSM. Products evolve and the value hypothesis from year one is often wrong by year three. Notion shifted from "active users" to "active teams creating content" as the company pivoted from solo notes to collaboration. Schedule a yearly NSM review to force the question "is this still the right verb?" and document the answer either way.
Letting marketing influence the NSM. If a paid campaign can spike your North Star by 20% in a week, your NSM is too top-of-funnel. The metric should be downstream of marketing's reach and upstream of finance's revenue — the middle stretch where product actually lives.
Maintaining more than one North Star. A North Star is singular by definition; the moment you have three, you have OKRs. Multi-star orgs end up with teams optimizing numbers that fight each other — engagement minutes versus paid conversion, with no shared metric to reconcile them. Pick one number, and let the drivers handle the nuance.
FAQ
How is the NSM different from an OKR?
The NSM is a permanent strategic compass; an OKR is a time-boxed quarterly or annual commitment. An OKR can include the NSM as its Objective, and the Key Results are usually drivers underneath the NSM. The NSM should not change every quarter — if it does, you have not picked an NSM, you have picked a moving KPI dressed up in starry language. A healthy company keeps the same NSM for 2 to 4 years and only revisits it after a major product pivot.
How is the NSM different from a KPI?
A KPI is any indicator a team tracks; you have many. An NSM is exactly one per product, sitting at the top of the hierarchy. Every KPI in the product org should ladder up to the NSM, otherwise the KPI is measuring something the company has not decided to care about. Practically: KPIs are how individual teams talk about their progress; the NSM is how the whole product org talks about progress to the CEO.
Can a B2B SaaS use the same playbook as a consumer app?
Yes, with one substitution — count active accounts or active teams, not active users. A 10-seat team with 3 active users is often a healthier signal than a 100-seat team with 4 active users, and per-user metrics will mislead you. Notion, Linear, Figma, and Slack all use team-level NSMs for exactly this reason.
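A small sketch of why the unit matters, using a hypothetical definition of an "active account": at least 25% of its seats were active in the week. The threshold and the function name are assumptions for illustration; each product has to define its own activity rule.

```python
def active_accounts(accounts, min_active_share=0.25):
    """Return the names of accounts that count as active this week.

    accounts: {account_name: (seats, active_users)}.
    Hypothetical rule: an account is active when at least
    `min_active_share` of its seats were active.
    """
    return sorted(
        name
        for name, (seats, active) in accounts.items()
        if seats and active / seats >= min_active_share
    )


# The FAQ's example: a 10-seat team with 3 active users vs a 100-seat team with 4.
accounts = {"small": (10, 3), "large": (100, 4)}
print(active_accounts(accounts))  # ['small']
```

A per-user count credits the 100-seat team with more activity (4 vs 3), while the account-level view shows that only the 10-seat team is actually using the product.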
Should the NSM be a leading or a lagging indicator?
Leading, but not so leading that it becomes activity. The NSM sits between activation (very leading, very early) and revenue (very lagging, very late). The sweet spot is roughly the value moment — the action where the user gets what they came for, repeated often enough to be measured weekly. If the NSM only moves quarterly, it is too lagging; if it spikes from a tutorial completion, it is too leading.
What if my product has two genuinely distinct user types?
Marketplaces are the classic case — Airbnb has guests and hosts, DoorDash has eaters and merchants and dashers. You still pick one NSM, but it is usually the transactional verb that requires both sides: nights booked, orders delivered. That metric only moves if both sides of the marketplace are healthy, which is the whole point of using one NSM instead of two.
How long should I wait before declaring a new NSM "the" NSM?
Run it in parallel with your existing primary metric for one to two quarters. Watch whether the new NSM predicts retention and revenue better in cohort backtests. If it does, retire the old metric publicly and migrate dashboards. Switching the NSM without a parallel period loses the trust of engineering and design.
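A crude version of the cohort backtest, assuming you have one value per cohort for the old metric, the candidate NSM, and D90 retention, aligned by index. Pearson correlation is a stand-in chosen here for simplicity, not a method the text prescribes; with only a handful of cohorts the coefficient is noisy, and a real backtest would use more cohorts and a holdout period. The function names and sample data are hypothetical, and each series must vary across cohorts or the correlation is undefined.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def better_predictor(old_metric, new_metric, d90_retention):
    """Compare how strongly each metric tracks retention across cohorts.

    Returns ("new" or "old", r_old, r_new).
    """
    r_old = pearson(old_metric, d90_retention)
    r_new = pearson(new_metric, d90_retention)
    return ("new" if r_new > r_old else "old"), r_old, r_new


# Hypothetical four cohorts.
retention = [0.20, 0.25, 0.31, 0.36]
old = [50, 40, 55, 45]
new = [10, 12, 15, 17]
winner, r_old, r_new = better_predictor(old, new, retention)
print(winner)  # new
```

A winning correlation is the evidence to bring to the announcement; the parallel period is what gives engineering and design time to see it for themselves.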
Is the NSM an official term from a specific framework?
No. The phrase was popularized by Sean Ellis and the early growth-hacking community around 2010 to 2015, and the public examples were assembled from talks and company case studies. What matters is the practice: one count of delivered value, owned by product, ladderable to revenue.