Backpressure for the systems analyst interview

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.
Join the waitlist

What backpressure actually means

If you walk into a systems analyst loop at Stripe, DoorDash, or Uber, expect at least one question that hinges on what happens when a producer is faster than its consumer. The interviewer is not testing whether you can recite the Reactive Streams spec — they want to see whether you understand that infinite queues do not exist and that someone, somewhere, has to slow down. That mechanism — the consumer signalling "I'm full, stop pushing" — is backpressure.

Backpressure is the feedback channel from a slow downstream component back to a fast upstream one. In a healthy system, the producer adapts to the consumer's drain rate rather than burying it under load. The signal can be explicit (a TCP window shrinking, a Kafka consumer lag breaching a threshold, a request(n) call in Reactive Streams) or implicit (a bounded queue starts blocking on insert). Either way, the producer's effective throughput equals the consumer's, not its own.

This is the whole point of the question: if you say "we'll just scale the consumer," the interviewer will ask what you do at 09:58 on Black Friday when scaling takes 90 seconds and traffic is already 8x baseline. Backpressure is the protection you have before autoscaling catches up.

Load-bearing trick: every queue in your design must have a documented bound, a documented drop policy, and a documented signal back to the producer. If any of the three is "we'll figure it out," you fail the round.
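A minimal sketch of what "bound, drop policy, signal" can look like in code, using only the JDK. The class name `BoundedStage` and the choice of reject-newest as the drop policy are illustrative assumptions, not a prescribed design.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pipeline stage with an explicit bound, drop policy, and signal. */
final class BoundedStage<T> {
    private final BlockingQueue<T> queue;                    // bound: fixed capacity
    private final AtomicLong rejected = new AtomicLong();    // feeds a drop-rate metric

    BoundedStage(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Signal: returns false when the stage is full, so the caller has to slow
     * down, retry later, or shed the work. Drop policy: reject the newest item.
     */
    boolean submit(T item) {
        boolean accepted = queue.offer(item);   // non-blocking insert
        if (!accepted) {
            rejected.incrementAndGet();
        }
        return accepted;
    }

    /** Consumer side: next item in arrival order, or null when the stage is empty. */
    T poll() {
        return queue.poll();
    }

    int depth() {
        return queue.size();
    }

    long rejectedCount() {
        return rejected.get();
    }
}
```

With a capacity of 2, the third `submit` returns false and `rejectedCount()` becomes 1. That return value is the signal the producer has to act on.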

Life without backpressure

Without backpressure, two failure modes appear, and senior interviewers will probe both. The first is unbounded queue growth:

Producer  : 10,000 events/sec
Consumer  :    100 events/sec
Queue     : grows by 9,900 events/sec
Result    : memory exhaustion → OOM kill → cascading restart loop

The second is silent data loss when someone "fixes" the OOM by removing the buffer:

Producer  : 10,000 events/sec
Consumer  :    100 events/sec
Queue     : none — direct hand-off
Result    : 99% of events dropped silently, no signal upstream

Both look fine in a happy-path demo and catastrophic in production. The interviewer wants you to name both before they prompt you. Naming the failure mode is half the answer.
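The arithmetic behind both failure modes fits in a few lines. This is a rough sketch; the 1 KiB event size and 1 GiB memory budget used in the note below are assumptions for illustration, not figures from a real system.

```java
/** Back-of-the-envelope numbers for a producer that outruns its consumer. */
final class QueueGrowth {

    /** Net events per second piling up in an unbounded queue (0 if the consumer keeps up). */
    static long growthPerSecond(long producerRate, long consumerRate) {
        return Math.max(0, producerRate - consumerRate);
    }

    /** Seconds until budgetBytes of queue memory is used up, assuming fixed-size events. */
    static double secondsUntilFull(long producerRate, long consumerRate,
                                   long bytesPerEvent, long budgetBytes) {
        long growth = growthPerSecond(producerRate, consumerRate);
        if (growth == 0) {
            return Double.POSITIVE_INFINITY;   // queue never grows
        }
        return (double) budgetBytes / (growth * bytesPerEvent);
    }

    /** Fraction of events lost with a direct hand-off and no buffer at all. */
    static double dropFraction(long producerRate, long consumerRate) {
        if (producerRate <= 0) {
            return 0.0;
        }
        return (double) growthPerSecond(producerRate, consumerRate) / producerRate;
    }
}
```

With the rates above (10,000 in, 100 out), 1 KiB events, and a 1 GiB budget, `secondsUntilFull` comes out at roughly 106 seconds, and `dropFraction` is 0.99.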

Strategies you can actually defend

There are six strategies worth knowing by name. The trick is not memorising them — it is knowing when each one is acceptable and when it gets you fired.

Strategy | What it does | Acceptable when | Disaster when
Bounded buffer | Fixed-size queue, blocks producer on full | Bursty load, short-lived spikes | Producer cannot block (HTTP request)
Drop oldest | Evict head of queue on overflow | Stale data is worse than missing data (telemetry) | Audit/financial events
Drop newest | Reject new arrivals on full | Existing work must complete (ETL batch) | Real-time pricing
Pause producer | Push back via flow control | Pull-based consumer, e.g. Kafka | Push-only sources (webhooks)
Throttle / rate limit | Cap producer rate at source | Predictable workloads, SLA-driven APIs | Bursty critical traffic
Spillover to disk | Overflow to durable store | Eventual consumption guaranteed | Latency-critical paths

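If you only memorise two, memorise bounded buffer and drop oldest: they cover roughly 70% of real interview scenarios. The Kafka producer is the textbook bounded buffer: when its in-memory accumulator (buffer.memory) is full, send() blocks for up to max.block.ms and then throws a TimeoutException. It does not evict older records, and the legacy block.on.buffer.full setting, deprecated in favour of max.block.ms, never did either. Drop-oldest is the more common pattern in metrics pipelines, where agents such as the Datadog agent or Prometheus remote-write lean towards sacrificing stale samples rather than stalling the application they instrument.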
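For reference, the drop-oldest row of the table can be sketched with a plain `ArrayDeque`. The class name `DropOldestBuffer` is hypothetical, and the sketch is single-threaded on purpose; a production version would need synchronisation or a concurrent structure.

```java
import java.util.ArrayDeque;
import java.util.Optional;

/** Hypothetical single-threaded buffer that keeps only the newest `capacity` items. */
final class DropOldestBuffer<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final int capacity;
    private long dropped = 0;

    DropOldestBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Always accepts the new item; returns the evicted item, if any, so the caller can log it. */
    Optional<T> add(T item) {
        Optional<T> evicted = Optional.empty();
        if (items.size() == capacity) {
            evicted = Optional.of(items.pollFirst());   // evict the head (the oldest item)
            dropped++;
        }
        items.addLast(item);
        return evicted;
    }

    /** Next item in arrival order, or empty when the buffer is drained. */
    Optional<T> poll() {
        return Optional.ofNullable(items.pollFirst());
    }

    int size() {
        return items.size();
    }

    long droppedCount() {
        return dropped;
    }
}
```

Returning the evicted item keeps the "what happens to the dropped work" question visible at the call site, instead of hiding it inside the buffer.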

The choice depends on three questions the interviewer expects you to ask: how critical is each event, how stale is too stale, and can the producer be blocked. A payment authorisation event answers "critical, sub-second freshness, cannot block" — which rules out four of the six rows above and forces a conversation about load shedding at the edge before the queue is even touched.

Sanity check: if your answer does not mention what happens to the rejected work — retried, logged, surfaced to the user, written to a dead-letter queue — you have not finished the answer. Backpressure without a story for the dropped or paused work is half a design.

Reactive streams in one whiteboard

For pull-based async systems, the industry standard is the Reactive Streams spec, which formalises backpressure as a contract between Publisher and Subscriber. The subscriber explicitly requests n items; the publisher must emit no more than n before the next request arrives. This inverts the usual push model: the consumer dictates pace.
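The request(n) contract can be illustrated with the `java.util.concurrent.Flow` interfaces that ship with the JDK and mirror the Reactive Streams ones. This is a deliberately single-threaded sketch: `RangePublisher` and `BatchSubscriber` are hypothetical names, and a spec-compliant publisher has more rules to satisfy (thread safety among them) than this one does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

/** Single-threaded illustration of request(n); not thread-safe. */
final class RequestNDemo {

    /** Publishes 0..count-1 but never emits more than the subscriber has requested. */
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) {
            this.count = count;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 0;
                private long demand = 0;
                private boolean emitting = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done) return;
                    if (n <= 0) {
                        done = true;
                        subscriber.onError(new IllegalArgumentException("request must be positive"));
                        return;
                    }
                    demand += n;
                    if (emitting) return;   // re-entrant call from onNext: the loop below picks it up
                    emitting = true;
                    while (!done && demand > 0 && next < count) {
                        demand--;
                        subscriber.onNext(next++);
                    }
                    emitting = false;
                    if (!done && next == count) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() {
                    done = true;
                }
            });
        }
    }

    /** Pulls `batch` items at a time and stops asking after `maxBatches` requests. */
    static final class BatchSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        boolean completed = false;
        int requestsIssued = 0;
        private final int batch;
        private final int maxBatches;
        private int outstanding = 0;
        private Flow.Subscription subscription;

        BatchSubscriber(int batch, int maxBatches) {
            this.batch = batch;
            this.maxBatches = maxBatches;
        }

        private void requestBatch() {
            if (requestsIssued >= maxBatches) return;   // consumer is "full": signal no more demand
            requestsIssued++;
            outstanding = batch;                        // set before request(), which may emit synchronously
            subscription.request(batch);
        }

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; requestBatch(); }

        @Override public void onNext(Integer item) {
            received.add(item);
            if (--outstanding == 0) requestBatch();
        }

        @Override public void onError(Throwable t) { throw new IllegalStateException(t); }

        @Override public void onComplete() { completed = true; }
    }
}
```

Subscribing a `BatchSubscriber(3, 1)` to a `RangePublisher(10)` delivers exactly three items and then nothing: the publisher still has seven to give, but the consumer never asked for them. That is the inversion the paragraph above describes.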

The four operators that come up most often in interviews (the first three are named as in Kotlin Flow, the last as in Reactor and RxJava):

  • buffer(n) — bounded in-memory queue, backpressure when full.
  • conflate() — drop intermediate values, keep only the latest (perfect for UI state).
  • sample(duration) — emit the most recent value at a fixed interval (perfect for sensor streams).
  • onBackpressureDrop() — explicit drop on overflow with a hook for logging.

A minimal example in Kotlin Flow that you can sketch on a whiteboard in under a minute:

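import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

fun process(value: Int) = println(value)   // placeholder for real work

fun main() = runBlocking {
    flow { repeat(1_000_000) { emit(it) } }
        // Hold at most 100 items and evict the oldest on overflow. Chaining
        // buffer(100) with conflate() would fuse them and discard the capacity.
        .buffer(capacity = 100, onBufferOverflow = BufferOverflow.DROP_OLDEST)
        .collect { value ->
            delay(10)                  // slow consumer
            process(value)
        }
}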

Equivalent in Project Reactor:

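// Assumes an SLF4J `log` and a process(Integer) method on the enclosing class.
Flux.range(0, 1_000_000)
    // Hold up to 100 items; on overflow, log and evict the oldest one.
    // Without the strategy argument, overflow ends the stream with an error.
    .onBackpressureBuffer(100, dropped -> log.warn("dropped {}", dropped),
                          BufferOverflowStrategy.DROP_OLDEST)
    .publishOn(Schedulers.parallel())   // consume on another thread
    .subscribe(this::process);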

The implementations you should be able to name without googling: Project Reactor (Java/Kotlin, used at Netflix, Salesforce), RxJava 3 (Android, ex-Netflix), Akka Streams (JVM, used at LinkedIn, Verizon), Kotlin Flow (cross-platform JetBrains stack), and RxJS (Angular, NgRx). For the systems analyst track you do not need to write code in any of them — but you do need to explain why a request(n) pull model is fundamentally different from a fire-and-forget push.

A tip that lands well in loops: mention that TCP itself is a backpressure system — the receive window shrinks when the application stops reading, the sender slows down. Most candidates miss this. Interviewers light up.

Common pitfalls

When a systems analyst candidate fumbles backpressure questions, it is almost always one of the same five mistakes. The first is confusing throttling with backpressure. A rate limiter caps the producer regardless of consumer state — it is a unilateral policy. Backpressure is bilateral: the consumer's state changes the producer's rate. Both are useful, but they solve different problems. If your design only has a rate limiter, the consumer can still drown when the cap is set above its real drain rate.

The second is assuming Kafka has unlimited buffering. Kafka brokers are durable, but the producer's in-memory accumulator is bounded by buffer.memory (default 32 MB), and the time a send() call may wait for space is bounded by max.block.ms (default 60 s). When the accumulator fills, send() blocks for up to max.block.ms and then throws a TimeoutException. Candidates who say "we'll just write to Kafka and forget about it" get a follow-up about what their service does when the broker is unreachable for two minutes, and most of them have no answer.
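To put a number on the two-minute outage, here is a hedged sketch using only `java.util.Properties`. The two keys are the real producer settings named above, set to their defaults; the class name, the 10,000 events/sec rate, and the 1 KiB event size are assumptions, and the estimate ignores batching overhead and compression.

```java
import java.util.Properties;

/** Rough budget for the Kafka producer's in-memory accumulator. */
final class ProducerBufferBudget {

    /** The two producer settings discussed in the text, set explicitly to their defaults. */
    static Properties boundedProducerProps() {
        Properties props = new Properties();
        props.setProperty("buffer.memory", String.valueOf(32L * 1024 * 1024)); // 33,554,432 bytes
        props.setProperty("max.block.ms", "60000");                            // 60 seconds
        // A real producer would also need bootstrap.servers and serializers
        // before these properties are handed to a KafkaProducer.
        return props;
    }

    /** Seconds until the accumulator is full when nothing is draining it. */
    static double secondsUntilAccumulatorFull(long bufferBytes, long eventsPerSecond, long bytesPerEvent) {
        return (double) bufferBytes / (eventsPerSecond * bytesPerEvent);
    }
}
```

At 10,000 events/sec of 1 KiB each, the default 32 MB fills in about 3.3 seconds. For the rest of the outage every send() is either blocked or failing, so the service needs its own answer for that work.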

The third is ignoring the dropped-work problem. Saying "we drop excess events" without explaining where they go, who alerts on the drop rate, and what the user-visible behaviour looks like is an incomplete answer. A senior interviewer will press: what's the SLO for drop rate? Who pages when it breaches? Does the client retry, and with what backoff? Without a story for the rejected work, you do not have a design — you have a bug behind a politer name.

The fourth is placing backpressure too late in the chain. If the only bounded queue is at the database write step, the entire upstream pipeline can fill memory before the signal propagates back. Backpressure should propagate from the slowest stage all the way to the ingress — the load balancer or API gateway. Otherwise, every stage between the bottleneck and the source becomes its own potential OOM. The signal must travel the full distance for the design to actually protect anything.

The fifth, more subtle, is mixing backpressure with retries without coordination. A consumer rejects a batch because it is overloaded; the producer retries immediately; the consumer is still overloaded; the producer retries again. You now have an amplification loop where backpressure makes the situation worse. The fix is exponential backoff with jitter on the producer side, and ideally a circuit breaker that trips when overflow rate stays above a threshold for some window — say, drop rate > 5% for 30 s.
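The producer-side half of that fix can be sketched as follows. This uses the "full jitter" variant (a uniformly random delay between zero and an exponentially growing ceiling), which is one common choice rather than the only one. The class name and parameters are hypothetical, and the circuit breaker is not shown.

```java
import java.util.Random;

/** Exponential backoff with full jitter for a producer whose batch was rejected. */
final class Backoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    Backoff(long baseMillis, long capMillis, Random random) {
        if (baseMillis <= 0 || capMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= capMillis");
        }
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    /** Upper bound for a 0-based attempt: min(cap, base * 2^attempt). */
    long ceilingMillis(int attempt) {
        long ceiling = baseMillis;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < attempt && ceiling < capMillis; i++) {
            ceiling *= 2;
        }
        return Math.min(ceiling, capMillis);
    }

    /** Full jitter: a uniformly random delay in [0, ceiling]. */
    long nextDelayMillis(int attempt) {
        long ceiling = ceilingMillis(attempt);
        return (long) (random.nextDouble() * (ceiling + 1));
    }
}
```

With a 100 ms base and a 10 s cap, the ceiling runs 100, 200, 400, 800 ms and so on up to the cap. The jitter spreads out producers that were all rejected at the same moment, so they do not all retry together.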

If you want to drill systems-analyst questions like this one against a feedback loop instead of a blank page, NAILDD has hundreds of distributed-systems prompts with worked answers.

FAQ

Is backpressure the same as rate limiting?

No, and conflating them is one of the fastest ways to lose points in a loop. Rate limiting is a unilateral policy: the producer caps itself at some rate regardless of consumer state. Backpressure is a bilateral feedback signal: the consumer tells the producer to slow down based on its own observed load. A well-designed system uses both — rate limiting at the edge to defend against malicious bursts, backpressure inside the pipeline to coordinate between cooperating stages.

When should I prefer drop-oldest over drop-newest?

Drop-oldest is correct when stale data is less useful than fresh data: live metrics dashboards, real-time location updates, current stock prices. Drop-newest is correct when each event represents committed work that must complete in order: ETL batches, message-ordered domain events, audit logs. If the interviewer hands you a scenario without specifying, the safest move is to ask which property — recency or completeness — the business actually values. That question alone signals seniority.

How does TCP implement backpressure?

TCP uses a sliding receive window. The receiver advertises how many bytes it can accept; the sender must not exceed that. When the application stops reading from the socket, the kernel buffer fills, the advertised window shrinks toward zero, and the sender slows down or pauses. This is flow control at the transport layer, and it is the canonical example of backpressure most candidates forget to mention. Bringing it up unprompted reliably scores well.
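A toy model of the receive window makes the mechanism concrete. It is a simplification for illustration only, with a hypothetical class name and no segments, ACKs, or window scaling: the advertised window is just the free space left in the receiver's buffer.

```java
/** Toy model of TCP receive-window flow control; not a protocol implementation. */
final class ReceiveWindowModel {
    private final int bufferCapacity;   // receiver's socket buffer, in bytes
    private int buffered = 0;           // bytes received but not yet read by the application

    ReceiveWindowModel(int bufferCapacity) {
        if (bufferCapacity <= 0) {
            throw new IllegalArgumentException("bufferCapacity must be positive");
        }
        this.bufferCapacity = bufferCapacity;
    }

    /** What the receiver advertises: the free space left in its buffer. */
    int advertisedWindow() {
        return bufferCapacity - buffered;
    }

    /** Sender side: transmits at most the advertised window; returns the bytes actually sent. */
    int send(int wanted) {
        int sent = Math.max(0, Math.min(wanted, advertisedWindow()));
        buffered += sent;
        return sent;
    }

    /** Application reads up to `max` bytes, which reopens the window. */
    int read(int max) {
        int n = Math.max(0, Math.min(max, buffered));
        buffered -= n;
        return n;
    }
}
```

If the application never calls `read`, the window reaches zero and `send` returns 0 no matter how much the sender wants to push. The sender's rate is then set by the reader's rate.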

What about HTTP — does it support backpressure?

HTTP/1.1 has only a coarse mechanism via TCP flow control. HTTP/2 adds flow control at both the connection and the stream level using WINDOW_UPDATE frames, so a slow receiver can throttle an individual stream without stalling the others on the same connection. gRPC, built on HTTP/2, inherits this and exposes it through its streaming APIs. For interview purposes: HTTP/1.1 = TCP-only, HTTP/2 = per-stream, gRPC = per-stream with client/server symmetry.

How do I monitor backpressure in production?

The three signals you should be able to name: queue depth (current size of every bounded buffer), drop rate (events rejected per second, by reason), and producer block time (cumulative time the producer spent waiting on a full buffer). Alert on sustained high values, not single spikes — a buffer briefly hitting capacity is normal under load. The threshold most teams settle on is queue at >80% capacity for >2 minutes. Pair this with a dashboard that shows producer rate vs consumer rate over the same window; the gap is the leading indicator.
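The "sustained, not spiky" rule can be sketched as a small state machine. The class name is hypothetical, and the caller passes in the timestamp so the logic stays deterministic; the 80% and two-minute values would be supplied as constructor arguments.

```java
/** Fires only when queue utilisation stays above a threshold for longer than a hold period. */
final class SustainedThresholdAlert {
    private final double threshold;    // e.g. 0.80 of capacity
    private final long holdMillis;     // e.g. 120_000 for two minutes
    private boolean breaching = false;
    private long breachStartMillis = 0;

    SustainedThresholdAlert(double threshold, long holdMillis) {
        this.threshold = threshold;
        this.holdMillis = holdMillis;
    }

    /** Feed one sample; returns true once utilisation has exceeded the threshold for more than holdMillis. */
    boolean observe(long nowMillis, int depth, int capacity) {
        double utilisation = (double) depth / capacity;
        if (utilisation <= threshold) {
            breaching = false;             // any dip below the threshold resets the timer
            return false;
        }
        if (!breaching) {
            breaching = true;
            breachStartMillis = nowMillis; // first sample above the threshold
        }
        return nowMillis - breachStartMillis > holdMillis;
    }
}
```

A single spike to 95% does not fire, and a dip below the threshold restarts the timer, which matches the "sustained high values, not single spikes" guidance above.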

Can backpressure cause user-visible latency?

Yes — by design. If the producer pauses or blocks, requests upstream queue up, and end-to-end latency climbs. The point of a well-engineered system is to make that trade-off explicit: rather than crashing under load (worst possible UX), the system slows down predictably (degraded but functional UX). Pair backpressure with request shedding at the ingress — return 503 Service Unavailable with a Retry-After header — so the slowdown becomes a clean signal to the client rather than a mysterious timeout.
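The shedding decision itself is small. The sketch below keeps it separate from any HTTP framework: `IngressShedder`, the 202 status for accepted work, and the fixed Retry-After value are all illustrative assumptions, and a real handler would copy the status and headers onto its framework's response object.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Maps "the work queue is full" onto a clean HTTP-level signal. */
final class IngressShedder {

    /** Minimal stand-in for an HTTP response: status code plus headers. */
    static final class Response {
        final int status;
        final Map<String, String> headers;

        Response(int status, Map<String, String> headers) {
            this.status = status;
            this.headers = headers;
        }
    }

    private final BlockingQueue<String> work;
    private final int retryAfterSeconds;

    IngressShedder(int capacity, int retryAfterSeconds) {
        this.work = new ArrayBlockingQueue<>(capacity);
        this.retryAfterSeconds = retryAfterSeconds;
    }

    /** Accept the request if there is room; otherwise shed it with 503 and Retry-After. */
    Response handle(String request) {
        if (work.offer(request)) {
            return new Response(202, Map.of());   // queued for asynchronous processing
        }
        return new Response(503, Map.of("Retry-After", String.valueOf(retryAfterSeconds)));
    }

    /** Worker side: next queued request, or null when there is nothing to do. */
    String nextWork() {
        return work.poll();
    }
}
```

A client that honours Retry-After turns the overload into a scheduled retry instead of a timeout, which is the "clean signal" described above.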