Inbox pattern for systems analyst interviews

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.

Why the inbox pattern exists

Most modern message brokers (Kafka, RabbitMQ, AWS SQS, Google Pub/Sub) guarantee at-least-once delivery by default, not exactly-once. That phrasing sounds harmless in a slide deck, but on a real consumer it means the same event can arrive two, three, sometimes ten times during a retry storm. If your service charges a card, ships a package, or sends a notification on every received event, at-least-once becomes at-least-once-too-many.

The inbox pattern is the receiver-side answer to that problem. The idea is small and unglamorous: before you do anything with an incoming event, you write it into a local inbox table keyed by event_id. A unique constraint guarantees that the second copy of the same event collides and gets dropped, so the business logic runs exactly once even though the broker keeps redelivering. It is the symmetric twin of the outbox pattern, which solves the same class of problem on the sending side.

Load-bearing trick: the inbox is not a queue. It is a deduplication ledger plus a worker checkpoint, sitting inside the same database as your domain tables so that "we saw this event" and "we acted on this event" can be committed in a single transaction.

When an interviewer at Stripe, Uber, or Airbnb asks you to design a payment webhook handler, a fulfillment service, or a notification fan-out, the inbox is the one pattern that lets you answer "how do you handle duplicates?" without hand-waving about idempotency.

Reference implementation

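The schema below has eight columns, but only a few carry the pattern: event_id (the dedup key), aggregate_id, payload, and the processed flag. received_at drives ordering, and processed_at, attempt_count, and last_error support auditing and retries: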

CREATE TABLE inbox (
  event_id      TEXT PRIMARY KEY,
  aggregate_id  TEXT NOT NULL,
  payload       JSONB NOT NULL,
  received_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  processed     BOOLEAN NOT NULL DEFAULT FALSE,
  processed_at  TIMESTAMPTZ,
  attempt_count INT NOT NULL DEFAULT 0,
  last_error    TEXT
);

CREATE INDEX inbox_unprocessed_idx
  ON inbox (aggregate_id, received_at)
  WHERE NOT processed;

The receive path is a single insert. Duplicates from the broker silently collapse:

INSERT INTO inbox (event_id, aggregate_id, payload)
VALUES ($1, $2, $3)
ON CONFLICT (event_id) DO NOTHING;
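The receive path can be exercised end to end with a small sketch. It assumes SQLite from Python's standard library as a stand-in for Postgres (so JSONB and TIMESTAMPTZ become TEXT, and the upsert syntax needs SQLite 3.24 or newer); `open_inbox` and `receive` are hypothetical helper names, not part of any library.

```python
import sqlite3

# Receive-path sketch. Assumption: SQLite (stdlib) stands in for Postgres, so
# JSONB and TIMESTAMPTZ become TEXT; the upsert syntax needs SQLite >= 3.24.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inbox (
  event_id     TEXT PRIMARY KEY,
  aggregate_id TEXT NOT NULL,
  payload      TEXT NOT NULL,
  received_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  processed    INTEGER NOT NULL DEFAULT 0
)
"""


def open_inbox() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def receive(conn: sqlite3.Connection, event_id: str, aggregate_id: str, payload: str) -> bool:
    """Store an incoming event; return True for the first copy, False for a duplicate."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO inbox (event_id, aggregate_id, payload) VALUES (?, ?, ?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event_id, aggregate_id, payload),
        )
    # DO NOTHING inserts zero rows, so total_changes only moves for a new event.
    return conn.total_changes > before


if __name__ == "__main__":
    conn = open_inbox()
    # The broker redelivers evt-1 three times; only the first insert lands.
    print([receive(conn, "evt-1", "order-7", '{"type": "order_paid"}') for _ in range(3)])  # [True, False, False]
    print(conn.execute("SELECT COUNT(*) FROM inbox").fetchone()[0])  # 1
```

Returning a boolean is a convenience for logging and metrics; correctness does not depend on it, because the unique key alone decides whether a row exists.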

A pool of workers polls the unprocessed rows and locks them so two replicas do not pick up the same event:

BEGIN;

SELECT event_id, aggregate_id, payload
FROM inbox
WHERE NOT processed
ORDER BY received_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- run business logic for each row, then:

UPDATE inbox
SET processed = TRUE,
    processed_at = NOW()
WHERE event_id = ANY($1);  -- $1: array of the event_ids selected above

COMMIT;

The two clauses that matter are ON CONFLICT DO NOTHING on insert and FOR UPDATE SKIP LOCKED on read. The first kills duplicates. The second lets you scale workers horizontally without coordination: each replica locks and processes a different slice of the backlog. Combine them with the requirement that side effects and the processed = true update share one transaction, and you have exactly-once processing on the receiver for any side effect that lives in the same database.
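A minimal sketch of one worker pass, assuming SQLite as a stand-in for Postgres and a single worker (SQLite has no FOR UPDATE SKIP LOCKED, so that clause is omitted). The `orders` table and `apply_order_event` handler are hypothetical business logic; the point is that the side effect and the processed flag are written in one transaction.

```python
import sqlite3
from typing import Callable

# Sketch of one worker pass over the inbox. Assumptions: SQLite stands in for
# Postgres, and there is a single worker, because SQLite has no
# FOR UPDATE SKIP LOCKED (in Postgres you would append it to the SELECT).

Handler = Callable[[sqlite3.Connection, str, str], None]


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inbox (
          event_id     TEXT PRIMARY KEY,
          aggregate_id TEXT NOT NULL,
          payload      TEXT NOT NULL,
          received_at  INTEGER NOT NULL,
          processed    INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
    """)
    return conn


def apply_order_event(conn: sqlite3.Connection, aggregate_id: str, payload: str) -> None:
    # Hypothetical business logic: the payload is just the new order status.
    conn.execute(
        "INSERT INTO orders (order_id, status) VALUES (?, ?) "
        "ON CONFLICT (order_id) DO UPDATE SET status = excluded.status",
        (aggregate_id, payload),
    )


def process_batch(conn: sqlite3.Connection, handler: Handler, limit: int = 100) -> int:
    """Handle up to `limit` unprocessed events; side effects and flags commit together."""
    with conn:  # commit on success; on any exception the whole batch rolls back
        rows = conn.execute(
            "SELECT event_id, aggregate_id, payload FROM inbox "
            "WHERE NOT processed ORDER BY received_at LIMIT ?",
            (limit,),
        ).fetchall()
        for event_id, aggregate_id, payload in rows:
            handler(conn, aggregate_id, payload)
            conn.execute("UPDATE inbox SET processed = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


if __name__ == "__main__":
    conn = setup()
    conn.executemany(
        "INSERT INTO inbox (event_id, aggregate_id, payload, received_at) VALUES (?, ?, ?, ?)",
        [("e1", "order-7", "paid", 1), ("e2", "order-7", "shipped", 2)],
    )
    conn.commit()
    print(process_batch(conn, apply_order_event))  # 2
    print(conn.execute("SELECT status FROM orders").fetchone()[0])  # shipped
```

Because the batch is one transaction, a single failing event rolls back every event in it, which is one reason poison events need their own handling.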

The non-obvious win here is that, as long as you retain processed rows, the inbox table doubles as a free audit log: every event ever delivered, with its raw payload, sits in one queryable place.

Idempotency on the receiver

Idempotency is the property that running the same operation twice produces the same result as running it once. Brokers cannot give you idempotency for free, because they don't know what your business logic does. The inbox pattern shifts the responsibility one layer up: by storing the event_id with a uniqueness constraint, the database becomes the dedup oracle.

There are three places where idempotency can break, and you should expect interviewers to drill into each of them:

Layer | Failure mode | Inbox mitigation
Insert | Same event arrives twice in parallel | PK on event_id + ON CONFLICT DO NOTHING
Process | Worker crashes mid-side-effect | FOR UPDATE SKIP LOCKED + transactional processed=true
Side effect | External API call retried after partial success | Send idempotency key = event_id to downstream service

The third row is where most candidates trip. Even with a perfect inbox, if your worker calls Stripe's POST /charges without an idempotency key and crashes after the HTTP request but before the local UPDATE, the next poll will retry — and you'll charge the card twice. The fix is to pass event_id (or a hash of it) as the Idempotency-Key header to every external system that supports it. Inbox plus downstream idempotency key gives you end-to-end exactly-once.
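The crash-and-retry scenario can be simulated in a few lines. `FakePaymentGateway` is a hypothetical in-memory stand-in for a downstream API that honours an idempotency key (Stripe's API behaves this way, but this is not Stripe's client), and `handle_payment` is an illustrative worker step that passes event_id as the key.

```python
from dataclasses import dataclass, field


@dataclass
class FakePaymentGateway:
    """Hypothetical downstream API: a repeated idempotency key replays the stored result."""
    charges: list = field(default_factory=list)
    _responses: dict = field(default_factory=dict)

    def create_charge(self, amount: int, idempotency_key: str) -> str:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # no second charge
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount))
        self._responses[idempotency_key] = charge_id
        return charge_id


class WorkerCrashed(Exception):
    """Simulates the process dying after the HTTP call, before the local UPDATE."""


def handle_payment(gateway: FakePaymentGateway, event_id: str, amount: int,
                   crash_after_call: bool = False) -> str:
    charge_id = gateway.create_charge(amount, idempotency_key=event_id)
    if crash_after_call:
        raise WorkerCrashed(event_id)
    # Here the real worker would UPDATE inbox SET processed = TRUE and COMMIT.
    return charge_id


if __name__ == "__main__":
    gw = FakePaymentGateway()
    try:
        handle_payment(gw, "evt-42", 1999, crash_after_call=True)  # first attempt dies
    except WorkerCrashed:
        pass
    print(handle_payment(gw, "evt-42", 1999))  # retry from the inbox: ch_1
    print(len(gw.charges))  # 1
```

Without the key, the retry would append a second charge; with it, the downstream service recognises the repeat and returns the original result.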

Sanity check: if you can describe a failure where the inbox commits but the side effect runs twice, your design is wrong. The side effect and the processed=true update must live in the same transaction, or the side effect must itself be idempotent.

Ordering and per-aggregate workers

The naive worker pool pulls events in global receive order, but with several workers running in parallel that order is not preserved, and in practice you rarely want global ordering anyway. What you want is per-entity ordering: two events for order_id=7 must apply in the order they happened, while events for order_id=7 and order_id=8 can run in parallel without correctness issues.

Kafka enforces this at the partition level: same partition key, same partition, in order. The inbox preserves that guarantee on the consumer side by grouping work by aggregate_id:

SELECT event_id, payload
FROM inbox
WHERE NOT processed
  AND aggregate_id = $1
ORDER BY received_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

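A dispatcher hashes aggregate_id to a worker slot, so all events for a given aggregate land on the same worker and run sequentially, while different aggregates fan out across workers for parallelism. The pinning matters for correctness, not just throughput: if two workers ran the query above for the same aggregate, SKIP LOCKED would let the second one skip the locked head row and grab the next event out of order. The cost is that each aggregate is handled by at most one worker at a time, which in practice is plenty: a checkout service might have 10,000 active orders per minute but rarely more than 5 events per order.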
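A minimal dispatcher sketch follows. It assumes CRC32 as the stable hash (Python's built-in `hash()` on strings is randomised per process, so two replicas would route the same aggregate differently) and a hypothetical fixed slot count; `slot_for` and `dispatch` are illustrative names.

```python
import zlib
from collections import defaultdict

# Assumption: a fixed number of worker slots; CRC32 gives the same slot for the
# same aggregate_id on every replica and every restart.
NUM_SLOTS = 4


def slot_for(aggregate_id: str, num_slots: int = NUM_SLOTS) -> int:
    return zlib.crc32(aggregate_id.encode("utf-8")) % num_slots


def dispatch(events, num_slots: int = NUM_SLOTS) -> dict:
    """Group (event_id, aggregate_id) pairs by slot, keeping arrival order within each slot."""
    queues = defaultdict(list)
    for event_id, aggregate_id in events:
        queues[slot_for(aggregate_id, num_slots)].append((event_id, aggregate_id))
    return dict(queues)


if __name__ == "__main__":
    events = [("e1", "order-7"), ("e2", "order-8"), ("e3", "order-7"), ("e4", "order-9")]
    for slot, queue in sorted(dispatch(events).items()):
        print(slot, queue)
```

Each slot is then drained by exactly one worker, so e1 and e3 for order-7 always run in sequence, while other orders may share or spread across the remaining slots.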

If you skip per-aggregate routing and let any worker grab any row, you get subtle bugs: order_paid and order_cancelled for the same order can be picked up by two workers at once and commit in the wrong order, leaving a paid-but-cancelled record in your DB.

Inbox vs outbox

The two patterns are siblings — same problem, opposite ends of the wire. A clean interview answer names both and shows when each applies:

Property | Outbox | Inbox
Side | Producer | Consumer
Goal | Reliable publishing | Reliable processing
Solves | Dual-write problem | Duplicate deliveries
Mechanism | Persist outgoing event in one TX with state | Persist incoming event before side effect
Read pattern | Relay polls and publishes | Worker polls and processes
Storage cost | Linear in events sent | Linear in events received

In a real architecture you use both. The producer writes domain changes and an outbox row in one local transaction; a relay ships outbox rows to Kafka; the consumer pulls from Kafka and writes to its inbox in one transaction; a worker drains the inbox idempotently. End-to-end you get at-least-once on the wire, exactly-once on the business logic — which is the best any distributed system can offer.

If the system only sends events, outbox is enough. If it only receives them from a trusted internal bus that already guarantees no duplicates (rare), inbox is overkill. Webhook receivers from third parties almost always need an inbox because you have zero control over the sender's retry policy.

Common pitfalls

The first pitfall is treating the inbox like a queue and deleting rows right after processing. Teams do this to save space, then lose the dedup property: once the row is gone, a late-arriving duplicate (say, a redelivery after a consumer stalled for hours) sails through the ON CONFLICT check and runs again. The fix is to keep processed rows for at least the broker's maximum retention or retry window (Kafka's default retention is 7 days, SQS allows up to 14 days, and some payment webhook providers retry for up to 30 days), then archive to cold storage rather than hard-delete.
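A sketch of such a retention job, assuming SQLite as a stand-in, received_at stored as unix seconds, and a hypothetical `inbox_archive` table playing the role of cold storage. Only processed rows older than the window move; everything inside the window stays available for dedup.

```python
import sqlite3

# Assumptions: received_at is unix seconds; inbox_archive is a hypothetical
# stand-in for cold storage; the 14-day window is an example value.
DAY = 24 * 3600
RETENTION_SECONDS = 14 * DAY  # e.g. long enough to cover SQS's 14-day maximum


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inbox (
          event_id    TEXT PRIMARY KEY,
          payload     TEXT NOT NULL,
          received_at INTEGER NOT NULL,
          processed   INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE inbox_archive (
          event_id    TEXT PRIMARY KEY,
          payload     TEXT NOT NULL,
          received_at INTEGER NOT NULL,
          processed   INTEGER NOT NULL
        );
    """)
    return conn


def archive_expired(conn: sqlite3.Connection, now: int, retention: int = RETENTION_SECONDS) -> int:
    """Move processed rows older than the window to the archive; return how many moved."""
    cutoff = now - retention
    with conn:  # copy and delete in one transaction so no row is lost in between
        conn.execute(
            "INSERT INTO inbox_archive (event_id, payload, received_at, processed) "
            "SELECT event_id, payload, received_at, processed FROM inbox "
            "WHERE processed AND received_at < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM inbox WHERE processed AND received_at < ?", (cutoff,)
        ).rowcount
    # Unprocessed rows are never archived here: they still need a worker.
    return deleted
```

In Postgres with time-based partitions, the same effect is usually achieved by detaching or dropping whole partitions instead of row-by-row deletes.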

The second trap is forgetting that the side effect and the processed=true update must commit together. A candidate will sometimes propose: insert into inbox, call the external API, update processed=true. Three separate steps. If the process dies between step two and step three, the next worker sees processed=false, retries the API call, and now the same email goes out twice. The only safe shape is one transaction: lock the row, do work that's either local-DB-only or guarded by a downstream idempotency key, then update — all inside BEGIN ... COMMIT.

A third subtle one is payload schema drift. The inbox stores raw JSON, which feels future-proof until the producer renames a field and your worker silently null-coalesces. The fix is to version events explicitly: include a schema_version field in the payload and have the worker dispatch to a versioned handler. Replaying old events after a code change becomes a routine operation instead of an outage.
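A minimal sketch of explicit versioned dispatch. The field names and both versions are hypothetical (v2 renames "order" to "order_id" and "amount" to "amount_cents"), and treating unversioned events as v1 is an assumption. Unknown versions fail loudly instead of being read as nulls.

```python
import json


class UnknownSchemaVersion(Exception):
    """Raised instead of silently reading missing fields as None."""


def _order_paid_v1(payload: dict) -> dict:
    return {"order_id": payload["order"], "amount_cents": payload["amount"]}


def _order_paid_v2(payload: dict) -> dict:
    # v2 renamed the fields; a v1 handler would hit KeyError (or None with .get()).
    return {"order_id": payload["order_id"], "amount_cents": payload["amount_cents"]}


HANDLERS = {1: _order_paid_v1, 2: _order_paid_v2}


def handle(raw_payload: str) -> dict:
    payload = json.loads(raw_payload)
    # Assumption: events written before versioning was introduced count as v1.
    version = payload.get("schema_version", 1)
    handler = HANDLERS.get(version)
    if handler is None:
        raise UnknownSchemaVersion(version)
    return handler(payload)
```

Both handlers normalise to the same internal shape, so replaying a month of mixed v1 and v2 rows from the inbox exercises the same downstream code.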

A fourth pitfall is unbounded inbox growth blowing up query latency. Without the partial index on WHERE NOT processed, a SELECT ... FOR UPDATE SKIP LOCKED has to scan past every processed row to find the few unprocessed ones. After a million processed rows, polling latency can creep from a couple of milliseconds to hundreds, workers stall, and the backlog grows. A partial index keeps the hot working set small regardless of total table size.

The fifth, and the one interviewers love, is ignoring poison events. A malformed payload that throws on every retry will pin a worker forever. The fix is attempt_count plus a dead-letter table: after 5 failed attempts, move the row to inbox_dead_letter with the error, mark it processed, and alert. Without this, one bad event from a flaky producer takes down the consumer.
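A sketch of the attempt counter plus dead-letter move, assuming SQLite as a stand-in and a simplified schema. One detail is easy to miss: the failed transaction rolls back, so the attempt_count increment has to happen in a separate, new transaction or it is lost along with everything else. `process_one` and its return labels are hypothetical.

```python
import sqlite3

MAX_ATTEMPTS = 5  # threshold from the text


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inbox (
          event_id      TEXT PRIMARY KEY,
          payload       TEXT NOT NULL,
          processed     INTEGER NOT NULL DEFAULT 0,
          attempt_count INTEGER NOT NULL DEFAULT 0,
          last_error    TEXT
        );
        CREATE TABLE inbox_dead_letter (
          event_id   TEXT PRIMARY KEY,
          payload    TEXT NOT NULL,
          last_error TEXT
        );
    """)
    return conn


def process_one(conn: sqlite3.Connection, event_id: str, handler) -> str:
    row = conn.execute(
        "SELECT payload FROM inbox WHERE event_id = ? AND NOT processed", (event_id,)
    ).fetchone()
    if row is None:
        return "skipped"
    try:
        with conn:  # business logic + processed flag, all or nothing
            handler(conn, row[0])
            conn.execute("UPDATE inbox SET processed = 1 WHERE event_id = ?", (event_id,))
        return "processed"
    except Exception as exc:
        # The failed transaction was rolled back, so record the attempt in a new one.
        with conn:
            conn.execute(
                "UPDATE inbox SET attempt_count = attempt_count + 1, last_error = ? "
                "WHERE event_id = ?",
                (repr(exc), event_id),
            )
            attempts = conn.execute(
                "SELECT attempt_count FROM inbox WHERE event_id = ?", (event_id,)
            ).fetchone()[0]
            if attempts >= MAX_ATTEMPTS:
                conn.execute(
                    "INSERT INTO inbox_dead_letter (event_id, payload, last_error) VALUES (?, ?, ?)",
                    (event_id, row[0], repr(exc)),
                )
                conn.execute("UPDATE inbox SET processed = 1 WHERE event_id = ?", (event_id,))
                # An alerting hook would fire here.
                return "dead_lettered"
        return "retry"
```

Marking the row processed after the dead-letter move keeps the dedup guarantee: a redelivered copy of the poison event still collides on event_id and is dropped.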

If you want to drill systems analyst patterns like this one until they feel like muscle memory, NAILDD ships 1,500+ interview problems across exactly this shape.

FAQ

Is the inbox pattern a Chris Richardson invention?

The pattern in its modern form was popularized by Chris Richardson on microservices.io as the receiver-side complement to the transactional outbox pattern; his catalog lists it under the name Idempotent Consumer. The underlying idea, using a uniqueness constraint to dedupe at-least-once delivery, predates the name and shows up in older literature as "message deduplication table" or "idempotent receiver" (the Idempotent Receiver pattern in Hohpe and Woolf's Enterprise Integration Patterns). The Richardson framing is the one most interviewers expect you to reference.

Can I implement inbox without a relational database?

In principle yes — any store with a strong unique-key guarantee works. DynamoDB with a conditional PutItem on event_id, Cassandra with IF NOT EXISTS, or Redis with SETNX can all act as the dedup oracle. The catch is the transactional coupling: if your business state lives in Postgres and your inbox lives in Redis, you've reintroduced the dual-write problem you were trying to escape. Keep the inbox in the same database as the state it guards.

What's the difference between inbox and a regular idempotency-key cache?

An idempotency-key cache (Redis-style, often with a TTL) dedupes incoming requests at the API edge, typically for synchronous POST endpoints. The inbox pattern dedupes incoming events from an asynchronous bus and ties dedup to the worker checkpoint. They overlap conceptually but solve different shapes: idempotency-key entries are short-lived and expire on their own, so a duplicate that arrives after the TTL passes through; inbox rows are durable, queryable, and feed a worker loop.
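The practical difference shows up with a late duplicate. Both classes below are hypothetical in-memory sketches: `TtlIdempotencyCache` mimics a TTL'd key cache, `DurableInbox` mimics only the dedup side of the inbox, and the clock is injectable so the example does not sleep.

```python
import time
from typing import Callable, Dict, Set


class TtlIdempotencyCache:
    """Edge-style key cache: a key is remembered only for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_time(self, key: str) -> bool:
        now = self.clock()
        expires = self._expires_at.get(key)
        if expires is not None and expires > now:
            return False
        self._expires_at[key] = now + self.ttl
        return True


class DurableInbox:
    """Inbox-style dedup: a key stays until an explicit retention job removes it."""

    def __init__(self):
        self._seen: Set[str] = set()

    def first_time(self, key: str) -> bool:
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    now = [0.0]
    cache = TtlIdempotencyCache(ttl=60, clock=lambda: now[0])
    inbox = DurableInbox()
    print(cache.first_time("evt-1"), inbox.first_time("evt-1"))  # True True
    now[0] = 3600.0  # the same event is redelivered an hour later
    print(cache.first_time("evt-1"), inbox.first_time("evt-1"))  # True False
```

The cache lets the hour-late duplicate through because its entry expired; the inbox still rejects it.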

How do I size the inbox table and indexes?

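Sizing depends on event volume and retention. A service receiving 100 events per second with 7-day retention holds roughly 60 million rows. Postgres handles this fine if you partition by received_at (daily partitions fit a 7-day window better than monthly ones; drop expired partitions rather than deleting rows) and keep the partial index on unprocessed rows. One catch: on a partitioned Postgres table, a primary key or unique constraint must include the partition column, so PRIMARY KEY (event_id) alone is not allowed, and (event_id, received_at) no longer catches a duplicate that arrives later; the event_id dedup check then needs its own unpartitioned table or another guard. The active working set (only unprocessed rows) typically stays under 10,000 even at high throughput, as long as workers drain faster than the broker delivers.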
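The 60-million figure is just arrival rate times retention window:

```latex
100\ \frac{\text{events}}{\text{s}} \times 86{,}400\ \frac{\text{s}}{\text{day}} \times 7\ \text{days} = 60{,}480{,}000 \approx 6 \times 10^{7}\ \text{rows}
```

Row count scales linearly in both factors, so doubling either the rate or the retention window doubles the table.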

Does the inbox pattern work with Kafka consumer offsets?

It complements them. Kafka's consumer offsets tell the broker "I've read up to position X"; they are a checkpoint for the wire, not a guarantee about your business logic. The inbox gives you the second checkpoint: "I've actually processed event Y." Commit the Kafka offset only after the inbox transaction commits. A crash before the inbox insert commits makes Kafka redeliver, and the event is stored on the next attempt; a crash after the inbox commit but before the offset commit also makes Kafka redeliver, and ON CONFLICT DO NOTHING drops the copy. Either way you stay consistent.
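The commit ordering can be checked with an in-memory simulation. `FakePartition`, `Crash`, and `consume` are hypothetical stand-ins (no real Kafka client is involved), and a Python set plays the inbox's unique key.

```python
class FakePartition:
    """Hypothetical stand-in for one Kafka partition with a committed offset."""

    def __init__(self, messages):
        self.messages = messages   # list of (event_id, payload)
        self.committed_offset = 0  # next offset delivered after a restart

    def poll_from_committed(self):
        return list(enumerate(self.messages))[self.committed_offset:]


class Crash(Exception):
    """The consumer dies after the inbox commit, before the offset commit."""


def consume(partition: FakePartition, inbox: set, crash_before_commit_at=None) -> list:
    newly_stored = []
    for offset, (event_id, _payload) in partition.poll_from_committed():
        if event_id not in inbox:  # INSERT ... ON CONFLICT (event_id) DO NOTHING; COMMIT
            inbox.add(event_id)
            newly_stored.append(event_id)
        if offset == crash_before_commit_at:
            raise Crash(offset)
        partition.committed_offset = offset + 1  # offset commit only after the inbox commit
    return newly_stored


if __name__ == "__main__":
    p = FakePartition([("e0", "a"), ("e1", "b"), ("e2", "c")])
    inbox: set = set()
    try:
        consume(p, inbox, crash_before_commit_at=1)
    except Crash:
        pass
    print(consume(p, inbox))  # ['e2']: e1 is redelivered but deduped
```

Reversing the order (committing the offset first) would turn the same crash into a lost event, which the inbox cannot recover.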

When is the inbox pattern overkill?

For internal events on a trusted bus that already guarantees exactly-once semantics (some configurations of Kafka transactions, Pulsar with effectively-once), or for read-only or naturally idempotent work where duplicate processing is harmless. Keep in mind that Kafka's transactional exactly-once covers read-process-write flows that stay inside Kafka; it does not extend to side effects in an external database or API. If the worst-case impact of a duplicate is re-rendering a cache entry, you can skip the inbox. For anything involving money, messaging, or external state, build the inbox from day one, because retrofitting it after a duplicate-charge incident is painful.