Capacity planning in a systems analyst interview
Why interviewers ask this
You are in the second round at a payments startup. The interviewer draws a box labeled "API" and asks, "How would you size this for 10 million MAU?" The role says systems analyst, not SRE, and nobody warned you that capacity planning would be on the menu. Senior SA loops at companies such as Stripe, DoorDash, Uber, and Snowflake commonly include one question that forces you to translate product numbers into machine numbers: events per second, gigabytes per day, connections per pod.
A systems analyst who cannot estimate load writes requirements that quietly explode in production. A spec that says "the endpoint should be fast" is useless. A spec that says "p99 under 300 ms at 8,000 requests per second with 40 percent headroom for Black Friday" is a contract engineers can build against and SREs can verify. That gap between vague and quantified is what the capacity planning question is testing.
Interviewers are not after the right answer. They want a structured walk through five moves: estimate, separate peak from average, add headroom, identify the first bottleneck, pick a scaling strategy with explicit cost trade-offs. Do those five in order and you pass even if numbers are off by 2×.
Estimation from product metrics
Start bottom-up from a product metric (DAU, orders per day, messages per user) and convert to a per-second technical number. Round aggressively — interviewers check reasoning, not arithmetic.
Given: 1,000,000 DAU
Assume: 50 actions per user per day
Events per day = 1,000,000 × 50 = 50,000,000
Seconds per day ≈ 86,400
Average events / second ≈ 50,000,000 / 86,400 ≈ 580 eps
Now translate to storage. Pick a reasonable per-event size and multiply:
Average event size ≈ 1 KB (JSON payload + metadata)
Daily volume = 50,000,000 × 1 KB = 50 GB / day
Annual volume = 50 GB × 365 ≈ 18 TB / year
With ~10× columnar compression (Parquet, ZSTD) → ~1.8 TB / year
State the assumption out loud: "I am assuming 1 KB events because we are talking JSON click events, not video chunks." That sentence separates a senior answer from a junior one. If the interviewer disagrees, they will say so and you adjust.
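The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names are made up for this example, and it assumes decimal units (1 GB = 1,000,000 KB) and the same 10× compression ratio implied by the figures above.

```python
SECONDS_PER_DAY = 86_400

def average_eps(dau: int, actions_per_user: float) -> float:
    """Average events per second from daily active users and per-user activity."""
    return dau * actions_per_user / SECONDS_PER_DAY

def storage_gb(dau: int, actions_per_user: float, event_kb: float,
               days: int = 1, compression_ratio: float = 1.0) -> float:
    """Stored volume in decimal GB over `days`, after an assumed compression ratio."""
    events = dau * actions_per_user * days
    return events * event_kb / 1_000_000 / compression_ratio

print(round(average_eps(1_000_000, 50)))       # 579, quoted as ~580 eps
print(storage_gb(1_000_000, 50, event_kb=1.0))  # 50.0 GB per day
yearly_gb = storage_gb(1_000_000, 50, 1.0, days=365, compression_ratio=10)
print(round(yearly_gb / 1000, 1), "TB per year after compression")
```

Changing a single argument (for example `event_kb=5.0`) shows how quickly the storage number moves when the interviewer challenges an assumption.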
Peak vs average load
The most common interview failure is sizing for average traffic. Real systems do not see uniform load. Spikes of 5× to 10× the daily average are routine, and the system has to survive the spike, not the mean.
| Pattern | Typical multiplier vs. daily average | Example |
|---|---|---|
| Hour-of-day peak | 2× to 4× | Office hours for B2B SaaS |
| Day-of-week peak | 1.3× to 2× | Sunday for streaming |
| Seasonal peak | 5× to 10× | Black Friday, tax day |
| Marketing event | 10× to 50× | Super Bowl ad, app store feature |
| Viral / unplanned | 20×+ | Reddit front page, celebrity tweet |
So if your back-of-envelope says 580 eps average, your sizing target is closer to 3,000 to 6,000 eps before headroom. Saying it explicitly — "I am sizing for 10× average because the product has Black Friday exposure" — separates a confident answer from a hand-wavy one.
Load-bearing trick: Always quote the peak multiplier you are assuming. "10× average" is the single most important number in the entire answer because every downstream calculation (instance count, DB connections, bandwidth) inherits from it.
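To make the multiplier explicit, the table above can be turned into a small lookup. This is a sketch: the dictionary keys and the `peak_range` helper are names invented for the example, and the ranges are copied from the table rather than measured.

```python
from typing import Optional, Tuple

# (low, high) multipliers vs. daily average, taken from the table above.
# None marks an open-ended upper bound ("20x+").
PEAK_MULTIPLIERS = {
    "hour_of_day": (2, 4),
    "day_of_week": (1.3, 2),
    "seasonal": (5, 10),
    "marketing_event": (10, 50),
    "viral": (20, None),
}

def peak_range(avg_eps: float, pattern: str) -> Tuple[float, Optional[float]]:
    """Return the (low, high) peak load implied by a traffic pattern."""
    low, high = PEAK_MULTIPLIERS[pattern]
    return avg_eps * low, (avg_eps * high if high is not None else None)

print(peak_range(580, "seasonal"))  # (2900, 5800)
print(peak_range(580, "viral"))     # (11600, None)
```

The seasonal range of 2,900 to 5,800 eps is where the rounded "3,000 to 6,000 eps" target comes from.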
Headroom and safety buffer
Headroom is the buffer above peak you provision so the system survives unexpected events. A common mistake is sizing exactly to peak, which leaves zero margin for instance failures, deploy spikes, or autoscaler lag.
Required capacity = peak load × (1 + headroom)
Standard headroom = 30% to 50%
For our example: 5,800 eps peak × 1.4 = 8,120 eps provisioned. The reasons to carry that buffer:
A single instance failing means survivors absorb the load. A cluster of ten pods at 90 percent CPU that loses one sends the remaining nine to 100 percent: they queue, crash, then cascade. This is a classic cascading failure, and it is a staple of outage postmortems.
Autoscaling is not instantaneous. New pods take 30 to 120 seconds to provision, pull image, warm caches, pass health checks. During that window you run current capacity against rising load. Headroom keeps you alive until the autoscaler catches up. Deploys add transient load too — rolling deploys briefly shrink the cluster, and without headroom you cannot safely ship.
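The headroom formula and the ten-pod failure scenario can be checked with a short sketch. The helper names are illustrative, and the model assumes load spreads evenly across surviving instances, which real load balancers only approximate.

```python
def provisioned_capacity(peak: float, headroom: float) -> float:
    """Required capacity = peak load x (1 + headroom)."""
    return peak * (1 + headroom)

def utilization_after_failures(instances: int, utilization: float, failed: int = 1) -> float:
    """Per-instance utilization once `failed` instances drop out (even spread assumed)."""
    survivors = instances - failed
    if survivors <= 0:
        raise ValueError("no surviving instances")
    return instances * utilization / survivors

def max_safe_utilization(instances: int, failed: int = 1, ceiling: float = 1.0) -> float:
    """Highest steady-state utilization that keeps survivors at or below `ceiling`."""
    return ceiling * (instances - failed) / instances

print(round(provisioned_capacity(5_800, 0.4)))          # eps to provision
print(round(utilization_after_failures(10, 0.9), 2))    # survivors' utilization
print(round(max_safe_utilization(10, ceiling=0.8), 2))  # safe level for an 80% ceiling
```

Running at 90 percent on ten pods leaves no room for a single failure; with an 80 percent post-failure ceiling, the same cluster should sit near 72 percent.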
Bottlenecks to call out
Interviewers love when you proactively name what will break first. Sizing the API tier perfectly does nothing if the database falls over at half that load. Walk through the stack and pick the weakest link.
Compute is rarely the bottleneck for I/O-bound services — modern x86 cores handle tens of thousands of small requests per second. Exceptions: image resize, large JSON parsing, cryptographic verification, ML inference.
Database connections and write throughput are the usual culprits. A Postgres primary tops out between 5,000 and 20,000 writes per second depending on row size and indexes. Connection limits hit far sooner: managed Postgres typically defaults to 100 to 500 connections, which is why pooling (PgBouncer, RDS Proxy) is mandatory above a few thousand active clients.
Network bandwidth matters for fan-out and cross-region traffic — a 1 Gbps NIC caps at 125 MB/s, which sounds generous until you push large payloads. External services impose rate limits you do not control — Stripe defaults to 100 read rps; Twilio and every payment gateway have ceilings. Include them or you throttle when the third party throttles you.
Disk I/O is frequently forgotten and often the first thing to die. A managed Postgres on gp3 storage with 3,000 IOPS sounds generous until vacuum runs during a backfill and saturates the disk.
Lock contention is not an infrastructure limit, but it blocks scaling all the same: one hot row in a counter table will serialize an entire fleet of stateless app servers.
Finding the actual bottleneck usually requires load testing (k6, Locust, Gatling) against a staging environment that mirrors production.
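One way to practice "pick the weakest link" is to express every tier as a ceiling on incoming requests per second and take the minimum. The sketch below does that. The tier names, the 20,000 rps compute figure, the 10 KB payload, and the 1 percent Stripe call rate are all hypothetical inputs for illustration; only the Postgres, NIC, and Stripe limits echo numbers quoted above.

```python
from typing import Dict, Tuple

def nic_limit_rps(link_gbps: float, payload_kb: float) -> float:
    """Requests per second a network link can carry at a given payload size."""
    bytes_per_second = link_gbps * 1e9 / 8        # 1 Gbps -> 125 MB/s
    return bytes_per_second / (payload_kb * 1000)

def first_bottleneck(tiers: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """tiers maps name -> (capacity in ops/sec, ops consumed per incoming request).

    Returns the tier with the lowest ceiling, expressed in incoming requests/sec.
    """
    ceilings = {name: capacity / per_request
                for name, (capacity, per_request) in tiers.items()
                if per_request > 0}
    weakest = min(ceilings, key=ceilings.get)
    return weakest, ceilings[weakest]

tiers = {
    "api_compute": (20_000, 1.0),              # hypothetical per-fleet figure
    "postgres_writes": (5_000, 1.0),           # low end of the range quoted above
    "nic_1gbps": (nic_limit_rps(1, 10), 1.0),  # assumes 10 KB payloads
    "stripe_reads": (100, 0.01),               # assumes 1% of requests call Stripe
}
print(first_bottleneck(tiers))  # ('postgres_writes', 5000.0)
```

The per-request factor matters: a 100 rps partner limit is harmless when one request in a hundred touches it and fatal when every request does.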
Autoscaling answer template
Autoscaling means adding or removing instances based on metrics. You do not need to write the YAML for an SA answer — you need to know which metric drives the trigger and what the gotchas are.
| Trigger | When to use | Watch out for |
|---|---|---|
| CPU utilization | Stateless web services, scale up at 70% | Lags on I/O-bound workloads |
| Memory | JVM apps, ML inference servers | Memory pressure is hard to recover from |
| Queue depth | Worker pools draining Kafka, SQS | Spikes can outpace autoscaler reaction time |
| Request rate (RPS) | API gateways, frontends | Most predictive but needs custom metrics |
| Response time p95 | When SLO is latency-driven | Reactive — already degraded |
Standard tooling: Kubernetes HPA, AWS Auto Scaling Groups, Cloud Run autoscale, Lambda concurrency. Each has its own scale-up speed and minimum step.
Three caveats. Scale-up takes time (provisioning, image pull, JVM warmup, health checks), so plan for 30 to 120 seconds before a new instance serves traffic. Cold start latency affects serverless: a fresh Lambda can take 500 ms to 5 seconds on the first request. Stateful tiers do not scale linearly with stateless ones: app pods can go from 10 to 100, but a Postgres primary cannot. Plan replicas or sharding before you hit that ceiling.
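For intuition on how a metric-driven trigger turns into a replica count, here is a sketch of the core ratio rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the ceiling of current replicas times current metric over target metric. It is a simplification; it leaves out the tolerance band, stabilization windows, and readiness handling that the real controller applies, and the function name and defaults are assumptions for this example.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Ratio-based scaling: ceil(current_replicas * current_metric / target_metric),
    clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))

# Ten pods at 90% CPU with a 70% target
print(desired_replicas(10, 90, 70))                   # 13
# The same request capped by a max of 12 replicas
print(desired_replicas(10, 90, 70, max_replicas=12))  # 12
```

The clamp is the part worth saying out loud in an interview: a low `max_replicas` silently turns an autoscaled tier into a fixed one.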
Cost trade-offs
The senior signal here is naming the money side explicitly. Over-provision and you waste budget. Under-provision and you lose customers.
| Capacity type | Discount vs. on-demand | Trade-off |
|---|---|---|
| On-demand | 0% | Pay full price, instant elasticity |
| Reserved (1-3 yr) | 30% to 70% off | Commitment, hard to change |
| Savings Plans | 20% to 50% off | Flexible across instance families |
| Spot / preemptible | 50% to 90% off | Can be terminated with 2 min notice |
| Multi-tier storage | Varies | Hot vs. cold paths priced differently |
Common pattern at companies like Netflix or DoorDash: baseline on Reserved (the floor you always need), diurnal peaks on On-demand (predictable elasticity), and batch workloads on Spot (massive savings on training and analytics). Storage is tiered the same way: hot path on SSD, cold path on S3 Glacier at roughly a tenth of the cost or less.
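A blended-cost estimate for that Reserved / On-demand / Spot mix can be sketched as follows. Every number here is hypothetical: the $0.10 hourly rate, the instance-hours per tier, and the specific discounts are placeholders chosen from inside the ranges in the table, not real price quotes.

```python
from typing import Dict

def blended_cost(instance_hours: Dict[str, float], on_demand_hourly: float,
                 discounts: Dict[str, float]) -> float:
    """Monthly cost when each capacity type gets its own discount vs. on-demand."""
    return sum(hours * on_demand_hourly * (1 - discounts[kind])
               for kind, hours in instance_hours.items())

# Hypothetical month: 10 baseline instances all month, 4 burst instances
# for 8 hours a day, and 2,000 instance-hours of batch work.
hours = {"reserved": 10 * 730, "on_demand": 4 * 8 * 30, "spot": 2_000}
discounts = {"reserved": 0.5, "on_demand": 0.0, "spot": 0.7}

mixed = blended_cost(hours, 0.10, discounts)
all_on_demand = sum(hours.values()) * 0.10
print(round(mixed, 2), round(all_on_demand, 2))
print(f"saving: {1 - mixed / all_on_demand:.0%}")
```

In an interview the exact figures matter less than showing that each slice of load is matched to the cheapest capacity type that tolerates its interruption risk.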
A worked example: ride-hail surge
The interviewer asks: "Size the trip-request service for a ride-hail app launching in a new city."
500,000 MAU × 30% DAU ratio = 150,000 DAU
× 2 trip requests / day = 300,000 daily requests
÷ 86,400 ≈ 3.5 rps average
× 8 weekend peak ≈ 28 rps peak
× 1.4 headroom ≈ 40 rps provisioned
Per-pod throughput: ~500 rps for a small Go service
→ 1 pod needed for compute, but min replicas = 3 for HA
First bottleneck: Redis geo-index for driver matching
Cost: 3 pods × $30/month = $90 baseline + on-demand burst
Sanity check: if your final number is a suspiciously round figure with no caveats, you skipped a step. Real capacity answers always sound like "between 35 and 50 rps depending on assumption X."
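The same walk-through can be written as one function, which makes it easy to vary an assumption and quote a range instead of a single number. This is a sketch with an invented function name; the inputs are the ones from the worked example, and the 7× to 10× sweep is an illustrative choice of "assumption X".

```python
import math

SECONDS_PER_DAY = 86_400

def size_service(mau: int, dau_ratio: float, requests_per_user: float,
                 peak_multiplier: float, headroom: float,
                 pod_rps: float, min_replicas: int) -> dict:
    """Chain MAU -> DAU -> average rps -> peak -> provisioned rps -> pod count."""
    dau = mau * dau_ratio
    avg_rps = dau * requests_per_user / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_multiplier
    provisioned_rps = peak_rps * (1 + headroom)
    # Compute need and the HA floor are separate constraints; take the larger.
    pods = max(min_replicas, math.ceil(provisioned_rps / pod_rps))
    return {"avg_rps": avg_rps, "peak_rps": peak_rps,
            "provisioned_rps": provisioned_rps, "pods": pods}

base = size_service(500_000, 0.30, 2, peak_multiplier=8, headroom=0.4,
                    pod_rps=500, min_replicas=3)
print({k: round(v, 1) for k, v in base.items()})

# Sensitivity: sweep the peak multiplier to get a range, not a point estimate.
for multiplier in (7, 8, 10):
    result = size_service(500_000, 0.30, 2, multiplier, 0.4, 500, 3)
    print(multiplier, round(result["provisioned_rps"], 1))
```

The sweep lands between roughly 34 and 49 rps, and the pod count stays at three throughout because the HA minimum, not compute, is the binding constraint.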
Common pitfalls
Sizing for average traffic instead of peak is the most common failure in interview answers and in real systems. Average load looks comfortable until Friday night, when the spike multiplier kicks in and the cluster melts. The fix is to always state your peak multiplier — "I am assuming 8× average for a consumer app with weekend spikes" — and size to that plus headroom.
Forgetting stateful dependencies is the second classic trap. Candidates triumphantly autoscale the API tier from ten to a hundred pods and forget the database primary has a hard ceiling of, say, 8,000 writes per second. More pods just create more connection contention. Walk the full request path — API, cache, queue, DB, downstream — and flag the first non-linear scaler.
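The connection side of that trap is easy to quantify. The sketch below assumes each pod holds its own fixed-size connection pool directly against the primary, with no external pooler in between; the pool size of 10, the limit of 500, and the 20 reserved connections are hypothetical values.

```python
def total_connections(pods: int, pool_size_per_pod: int) -> int:
    """Connections opened against the primary when every pod fills its pool."""
    return pods * pool_size_per_pod

def max_pods_before_limit(max_connections: int, pool_size_per_pod: int,
                          reserved: int = 0) -> int:
    """Largest pod count that fits under the database connection limit."""
    return (max_connections - reserved) // pool_size_per_pod

print(total_connections(10, 10))       # 100
print(total_connections(100, 10))      # 1000
print(max_pods_before_limit(500, 10))  # 50
print(max_pods_before_limit(500, 10, reserved=20))  # 48
```

Under these assumptions the API tier hits the database ceiling at around 50 pods, long before the autoscaler's own maximum, which is the argument for putting a pooler in front of the primary.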
Ignoring autoscaler latency kills systems during sudden spikes. New pods take 30 to 120 seconds to become ready, and a marketing email at 10:00 AM gives you no warm-up window. Mitigations: pre-warm before known events, carry larger headroom on viral-exposed services, or use predictive autoscaling.
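How much headroom a spike demands depends on how far load climbs before new pods are ready. A rough sketch, assuming a linear ramp and a fixed scale-up lag (both simplifications), with hypothetical traffic numbers:

```python
def load_at(t: float, start_rps: float, end_rps: float, ramp_seconds: float) -> float:
    """Load t seconds into a linear ramp from start_rps to end_rps."""
    if ramp_seconds <= 0 or t >= ramp_seconds:
        return float(end_rps)
    return start_rps + (end_rps - start_rps) * t / ramp_seconds

def headroom_needed(start_rps: float, end_rps: float,
                    ramp_seconds: float, lag_seconds: float) -> float:
    """Headroom fraction over the starting load needed to survive until
    the first newly scaled pods begin serving traffic."""
    return load_at(lag_seconds, start_rps, end_rps, ramp_seconds) / start_rps - 1

# Hypothetical spike: 1,000 -> 5,000 rps over 5 minutes, 120 s scale-up lag
print(round(headroom_needed(1_000, 5_000, 300, 120), 2))
# Same spike arriving in 60 s: the full 5x lands before any new pod is ready
print(round(headroom_needed(1_000, 5_000, 60, 120), 2))
```

In both cases the required buffer is well above the standard 30 to 50 percent, which is why pre-warming is the usual answer for known events.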
Treating third-party rate limits as someone else's problem is a senior-level red flag. If your service depends on Stripe or Twilio, their ceiling is yours. Otherwise you scale right up to the moment the partner returns 429s and your error rate explodes.
Skipping cost trade-offs entirely marks a junior answer. An engineer can size a system. A senior SA explains why this size and not 2× this size. Close with one sentence on on-demand vs reserved vs spot mix.
FAQ
Does a systems analyst really need to do capacity math live?
Yes, at most senior loops. The interviewer is not grading you against an SRE — your numbers can be off by a factor of two and still pass — but they want to see the chain of reasoning from product metric to per-second technical number. Practice out loud with a five-minute timer, because the bottleneck under stress is verbal fluency, not arithmetic.
What is a reasonable peak multiplier if the interviewer gives no context?
Default to 5× average for B2B SaaS, 8× to 10× for consumer apps, and 20×+ if the product has viral or marketing exposure. State the assumption explicitly and offer to revise if the interviewer pushes back. They rarely push back on a clearly stated assumption — they push back on unstated ones.
How do I size databases when the interviewer expects only API-tier numbers?
Always volunteer the DB sizing even if not asked. Walk through writes per second, read amplification from joins, and connection count. A managed Postgres handles roughly 5,000 to 20,000 writes per second before you need sharding or a different engine. Calling this out before being asked is the highest-signal move in a capacity question.
When should I bring up autoscaling vs. fixed provisioning?
Bring it up after you have demonstrated you can size for peak. The order matters: estimate, peak, headroom, bottlenecks, then autoscaling. Leading with "we will just autoscale" is a junior answer because it hides the fact that you do not know what you are scaling to.
How important is cost in a systems analyst interview?
More important than candidates think. A senior SA is a translator between engineering and business. Showing an opinion on Reserved vs. On-demand vs. Spot mix, or hot vs. cold storage tiering, signals you can have a finance conversation without an engineer in the room. One sentence on cost trade-offs at the end of the answer is enough.
Is this guidance official from any framework or vendor?
No. It is a synthesis of standard SRE and capacity engineering practice used at hyperscalers and high-growth startups. Treat the five-step structure here as a scaffold for thinking, not a checklist to memorize.