May 20, 2026·13 min read

Cassandra at the Data Engineering Interview

Contents:

Why Cassandra still comes up in DE interviews
Ring architecture in one breath
Data model: query-first, not entity-first
Partition keys and the hot-partition trap
Consistency levels and the CAP triangle
Repair, hinted handoff, and what eventual really means
Common pitfalls
Related reading
FAQ

Why Cassandra still comes up in DE interviews

If you're interviewing for a data engineering role at a company shipping events at scale, such as DoorDash, Uber, Netflix, or Apple, Cassandra (or its cousin ScyllaDB) is likely in the stack. Even teams that have migrated off keep the questions on rotation, because Cassandra is a clean way to probe whether you can reason about distributed trade-offs rather than just say "I used Postgres once".

The pattern is consistent across loops: an interviewer sketches a write-heavy workload — user events, IoT telemetry, payment ledger entries — and asks how you'd model it. The wrong answer talks about normalization. The right answer starts with the read path, picks a partition key that bounds the row count, and names a consistency level with the trade-off attached. Get that sequence right and you sound like someone who has on-called a cluster.

This guide assumes you've touched Cassandra once or twice and want to sound credible on the loop. The structure below mirrors the order interviewers walk you through.

Ring architecture in one breath

Cassandra is a masterless, distributed wide-column store. Every node is a peer. There is no primary, no leader election in the Raft sense, no single point of failure for writes. Nodes form a logical ring, and each one owns a contiguous range of token hashes. When a write comes in, the coordinator hashes the partition key, looks up which nodes own that token range, and forwards the write to all replicas in parallel.

The replication factor — RF=3 is the standard production setting — controls how many copies of every row exist across the ring. Multi-datacenter replication is configured per keyspace, which is why Cassandra is still the default choice for teams that need active-active across regions without writing their own conflict resolution.

Ring topology — 6 nodes, RF=3.
Each row hashed to a token, lands on 3 consecutive replicas.
No leader. Any node can serve any read or write.
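
To make the ring concrete, here is a minimal Python sketch of how a coordinator could map a partition key to its replicas. It is a toy under stated assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner, each node owns a single token (real clusters usually use vnodes), and the node names and the helper functions token, build_ring, and replicas_for are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner is Murmur3.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes):
    # One token per node for simplicity (assumption: no vnodes).
    return sorted((token(n), n) for n in nodes)


def replicas_for(ring, partition_key, rf=3):
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    # The next rf - 1 nodes clockwise hold the other copies.
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


ring = build_ring([f"node{i}" for i in range(1, 7)])
owners = replicas_for(ring, "user-42")
print(owners)  # three distinct node names
```

Any node can run this lookup, which is why any node can act as coordinator for any read or write.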

Load-bearing trick: if you remember nothing else, remember that Cassandra trades consistency for availability and partition tolerance. That single sentence answers half the follow-ups.

Data model: query-first, not entity-first

This is where most candidates lose points. In Postgres or MySQL, you model entities, normalize, then write whatever query you need. In Cassandra, you do the opposite — you write down the queries the application will run, then build one table per query pattern. Denormalization is not a sin here, it's the design.

A typical table for an event stream looks like this:

CREATE TABLE events_by_user (
  user_id       UUID,
  event_time    TIMESTAMP,
  event_type    TEXT,
  payload       TEXT,
  PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

The primary key has two parts. user_id is the partition key — it decides which physical node owns the row. event_time is the clustering column — it decides the sort order within that partition. A query like SELECT * FROM events_by_user WHERE user_id = ? LIMIT 100 is a single-partition read, which is what Cassandra is optimized for: hits one replica set, returns sorted rows, done in single-digit milliseconds.

If your application also needs "all events of type X across all users", you do not add a secondary index. You build a second table — events_by_type — with a different partition key. The cost is write amplification and storage; the win is that every read stays a single-partition lookup. Interviewers love when you call this out explicitly without being prompted.

Partition keys and the hot-partition trap

Partition key choice is the single most consequential design decision, and it's where senior interviewers separate the wheat from the chaff. A bad partition key produces hot partitions: a single partition with millions of rows underneath it, pinning load to the few replicas that own it while the rest of the ring sits idle.

SELECT * FROM events_by_user WHERE user_id = ?;                       -- single partition, fast
SELECT * FROM events_by_user WHERE event_type = ? ALLOW FILTERING;    -- scans every partition, anti-pattern

The classic fix is time bucketing. Instead of PRIMARY KEY (user_id, event_time), you use PRIMARY KEY ((user_id, day_bucket), event_time). Each user's events are now sharded across N buckets — typically one per day or one per hour for very active users. The trade-off is that "give me the last 30 days for this user" becomes 30 single-partition reads instead of one, but each read stays bounded.
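
The fan-out cost of bucketing is easy to show in a few lines of Python. This sketch assumes the bucket is stored as an ISO date string and that the application issues one query per (user_id, day_bucket) pair. The functions day_buckets and partition_keys are hypothetical helpers, not part of any driver.

```python
from datetime import date, timedelta


def day_buckets(end_day: date, days: int):
    """Return the day_bucket values covering the last `days` days, newest first."""
    return [(end_day - timedelta(days=i)).isoformat() for i in range(days)]


def partition_keys(user_id: str, end_day: date, days: int):
    # One (user_id, day_bucket) pair per single-partition read.
    return [(user_id, bucket) for bucket in day_buckets(end_day, days)]


keys = partition_keys("user-42", date(2026, 5, 20), 30)
print(len(keys))  # 30
print(keys[0])    # ('user-42', '2026-05-20')
print(keys[-1])   # ('user-42', '2026-04-21')
```

In practice the 30 reads can be issued concurrently, so the user-visible latency is closer to the slowest single read than to the sum.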

| Partition strategy | Read pattern | Hot partition risk | When to use |
| --- | --- | --- | --- |
| `(user_id)` | Single read per user | High if some users have millions of events | Small, bounded per-key event counts |
| `((user_id, day_bucket))` | N reads per range | Low (bounded by bucket size) | Time-series events, IoT, audit logs |
| `((tenant_id, shard))` with random shard 0-15 | Fan-out N reads | Low (explicit load spread) | Multi-tenant SaaS, write-heavy fanout |

A rule of thumb interviewers expect you to cite: keep partitions under 100 MB and under 100,000 rows. Past that, repair gets painful, compaction stalls, and read latency tails blow up. If a candidate says "I'd pick user_id as the partition key" without asking about the event volume per user, the interviewer is already writing a "no hire" note.
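
A quick back-of-envelope check against those limits might look like the sketch below. The event rate and row size are made-up inputs, the estimate ignores compression and per-row storage overhead, and partition_estimate and within_limits are hypothetical helper names.

```python
def partition_estimate(events_per_day, avg_row_bytes, days_per_partition):
    """Return (row_count, size_in_mb) for one partition."""
    rows = events_per_day * days_per_partition
    return rows, rows * avg_row_bytes / 1_000_000  # decimal megabytes


def within_limits(rows, size_mb, max_rows=100_000, max_mb=100):
    # Limits taken from the rule of thumb above.
    return rows <= max_rows and size_mb <= max_mb


# Assumed heavy user: 50,000 events per day at roughly 500 bytes per row.
unbucketed = partition_estimate(50_000, 500, days_per_partition=365)
daily = partition_estimate(50_000, 500, days_per_partition=1)

print(unbucketed, within_limits(*unbucketed))  # (18250000, 9125.0) False
print(daily, within_limits(*daily))            # (50000, 25.0) True
```

Running the numbers out loud like this is the "asking about event volume per user" step the interviewer is waiting for.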

Consistency levels and the CAP triangle

Cassandra exposes tunable consistency per query, which is one of its most-asked interview topics because it forces you to articulate the CAP trade-off in concrete terms.

The headline levels you must be able to name and justify:

ONE: one replica acknowledges. Fastest, but you can read stale data if the replica missed a recent write. Used for telemetry where loss tolerance is high.
QUORUM: floor(RF/2) + 1 replicas acknowledge. With RF=3, that's 2 nodes. The sweet spot for most production workloads: strong-enough consistency with one-node fault tolerance.
ALL: every replica must respond. Strong consistency, but a single slow or down node tanks your latency or fails the request. Almost never the right default.
LOCAL_QUORUM: quorum within the local datacenter only. The right answer for multi-DC deployments where you want low latency without waiting on cross-region acks.
EACH_QUORUM: quorum in every datacenter. Used for writes that must survive a full DC loss with strong consistency guarantees.

Sanity check: read consistency + write consistency must satisfy R + W > RF to guarantee strong consistency. With RF=3, that's the famous QUORUM read + QUORUM write combination. Memorize the inequality — it shows up in at least 60% of senior-level DE loops.
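
The inequality is simple enough to check mechanically. A minimal Python sketch, where quorum and is_strongly_consistent are hypothetical helper names and R and W are the number of replicas that must acknowledge a read and a write:

```python
def quorum(rf: int) -> int:
    # floor(RF / 2) + 1
    return rf // 2 + 1


def is_strongly_consistent(r: int, w: int, rf: int) -> bool:
    # Read and write replica sets must overlap in at least one node.
    return r + w > rf


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True  (QUORUM read + QUORUM write)
print(is_strongly_consistent(1, 1, rf))   # False (ONE read + ONE write)
print(is_strongly_consistent(1, rf, rf))  # True  (ONE read + ALL write)
```

The overlap argument is the part worth saying out loud: if R + W > RF, at least one replica in every read set has seen the latest acknowledged write.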

The follow-up interviewers like is: "You're running LOCAL_QUORUM in two DCs. The cross-DC link goes down. What happens?" The answer: local writes keep succeeding, cross-DC replication queues up via hinted handoff, and once the link recovers the hints replay. Reads from each DC continue returning data, but the two DCs can diverge until repair runs. That's eventual consistency in production, not in a textbook.

Repair, hinted handoff, and what eventual really means

Because every node accepts writes independently and forwards async, replicas drift. Cassandra has three mechanisms to reconcile them, and a senior DE is expected to explain when each one fires.

Hinted handoff is the first line of defense. When a coordinator can't reach a replica (node down, network blip), it stores a hint locally. By default it keeps collecting hints for a replica that has been down for up to 3 hours (the max hint window); after that, the missed writes have to be recovered by repair. When the missing replica comes back, the coordinator replays the hints. This is why short outages don't require manual intervention.

Read repair fires opportunistically. When a read at consistency level QUORUM returns mismatched data from the replicas it queried, Cassandra reconciles the versions and writes the newest one back to the lagging replicas before answering the client. That extra round of work is also why your read latency can spike for the first few reads after a node returns to the ring.

Anti-entropy repair is the heavy weapon. Run via nodetool repair, it builds Merkle trees of the data on each replica, compares them, and streams missing or stale ranges between nodes. A common cadence is once per week per node, usually scheduled with a tool such as Reaper and, on Cassandra 4.x, often run as incremental repair. Skip this and you risk resurrected deletes: once gc_grace_seconds has passed, the tombstone is purged from the replicas that had it, and a replica that never received the delete can stream the old row back to life.

nodetool repair -pr           # primary range only, what you usually want
nodetool repair --full        # full repair, painful at scale

The interviewer's favorite trap question: "What's gc_grace_seconds and why does it matter?" The answer is that tombstones must outlive your repair cycle. Default is 864000 seconds (10 days). If you run repair less often than that, deletes can resurrect. Knowing this off the top of your head signals you've actually operated a cluster.
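
The relationship between the two numbers can be written as a one-line check. This is a simplified sketch: it treats the repair interval as the time between completed repairs and ignores how long a repair run itself takes. The function name repair_is_safe is hypothetical.

```python
GC_GRACE_DEFAULT_SECONDS = 864_000  # 10 days, the default cited above
SECONDS_PER_DAY = 86_400


def repair_is_safe(repair_interval_days, gc_grace_seconds=GC_GRACE_DEFAULT_SECONDS):
    # Every replica must see the tombstone before it becomes eligible for purging.
    return repair_interval_days * SECONDS_PER_DAY < gc_grace_seconds


print(repair_is_safe(7))   # True: weekly repair fits inside 10 days
print(repair_is_safe(14))  # False: tombstones can be purged before repair spreads them
```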

Common pitfalls

The most common failure mode in interviews is treating Cassandra like a relational database with a weird syntax. Candidates reach for secondary indexes to enable arbitrary filters, then walk into the trap when the interviewer asks about read latency — secondary indexes in Cassandra are local to each node, so a query that filters on one fans out across the entire ring and tail latency goes to seconds. The fix is to model a second table with the partition key the query needs.

A related mistake is using ALLOW FILTERING in production code. It exists for one-off exploratory queries, not application paths. If you reach for it, your data model is wrong — add the query-shaped table instead. Mentioning this proactively signals you understand the philosophy, not just the syntax.

The tombstone explosion trap catches even experienced candidates. Cassandra marks deletes with tombstones, and a partition with thousands of tombstones makes reads slow even though "nothing is there". This happens most often with queues-on-Cassandra — people use it as a work queue, write rows, delete them, write more, and within days every read scans a graveyard. The lesson is that Cassandra is not a queue; use Kafka or SQS for that workload. If the schema design implies a queue, push back on the design.

Another classic is misunderstanding lightweight transactions (LWT). Cassandra supports IF NOT EXISTS and IF condition using Paxos, but each LWT requires four round-trips between replicas, so latency is roughly 4x a regular write. Candidates who say "I'd use LWT to handle race conditions" without acknowledging the cost get marked down. The senior move is to design idempotent writes that don't need LWT in the first place, and reserve LWT for genuinely contended state transitions — account creation, idempotency keys, leader election.

Finally, candidates often conflate consistency with isolation. Cassandra has no transactional isolation in the SQL sense. Two writes to the same cell are resolved last-write-wins by write timestamp, and drivers assign that timestamp on the client by default. If client clocks skew, the wrong write can win silently. Mitigate by centralizing timestamp generation or using LWT for the small set of writes where ordering must be deterministic.
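
The clock-skew failure is easy to reproduce in a toy model. The sketch below is not Cassandra code: it only models "highest timestamp wins" for one cell, and the values, the 2-second skew, and the helper names are made up for illustration.

```python
def client_timestamp(real_time_ms, clock_skew_ms):
    # What a client with a skewed clock would stamp on its write.
    return real_time_ms + clock_skew_ms


def last_write_wins(writes):
    # Each write is (timestamp_ms, value); the highest timestamp wins,
    # regardless of the order in which the writes actually happened.
    return max(writes, key=lambda w: w[0])[1]


# Written first, by a client whose clock runs 2 seconds fast.
stale = (client_timestamp(1_000, clock_skew_ms=2_000), "status=pending")
# Written one second later, by a client with an accurate clock.
fresh = (client_timestamp(2_000, clock_skew_ms=0), "status=paid")

print(last_write_wins([stale, fresh]))  # status=pending
```

The later write loses and nothing reports an error, which is exactly why the mitigation has to happen at design time.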

If you want to drill DE interview patterns like this every day across NoSQL, SQL, distributed systems, and pipelines, NAILDD is launching with 1500+ problems shaped by exactly these loops.

FAQ

How is Cassandra different from DynamoDB or ScyllaDB?

All three are wide-column distributed stores in the Dynamo lineage, so the data model and partition-key thinking carry across them cleanly. DynamoDB is a fully managed AWS service with per-request pricing, great for unpredictable workloads but expensive at steady high QPS. ScyllaDB is a C++ rewrite of Cassandra with the same CQL interface and 3-10x better per-node throughput, popular at Discord and similar shops that hit Cassandra's JVM ceiling.

When would you not use Cassandra?

When your workload needs multi-row transactions, complex joins, or arbitrary ad-hoc filtering, Cassandra is the wrong tool. Pick Postgres. Cassandra is also wrong as a work queue — the tombstone problem will eat you alive within days. If your dataset fits on a single beefy Postgres instance with read replicas, you don't need a ring.

What's the read path versus write path?

Writes go to the commitlog for durability, then the memtable in memory, and asynchronously flush to SSTables on disk. Writes are append-only and extremely fast. Reads check the memtable first, then walk the bloom filters and partition indexes of each SSTable, then merge results in memory. Background compaction merges SSTables to bound read amplification. The asymmetry — fast writes, expensive reads — is why Cassandra fits write-heavy workloads and struggles with random-read OLTP.
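
A toy log-structured store makes the asymmetry visible. This Python sketch is a deliberate simplification: flushes happen synchronously, a dictionary lookup stands in for the bloom filter and partition index check, and reads stop at the newest SSTable holding the key instead of merging by timestamp as Cassandra does. The class ToyLSM is hypothetical.

```python
class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.commitlog = []   # append-only durability log
        self.memtable = {}    # in-memory, latest value per key
        self.sstables = []    # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commitlog.append((key, value))  # sequential append, no read needed
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first
            if key in table:  # stand-in for bloom filter + partition index
                return table[key]
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, newer values overwrite
            merged.update(table)
        self.sstables = [merged]


db = ToyLSM()
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.write(k, v)

print(len(db.sstables), db.read("a"))  # 2 3
db.compact()
print(len(db.sstables), db.read("a"))  # 1 3
```

Every write touches only the log and the memtable, while a read may have to consult every SSTable until compaction shrinks the pile.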

How do you size a Cassandra cluster?

Three numbers drive sizing: write throughput, storage per node, and partition size distribution. Rule of thumb is 5-10 TB per node on modern NVMe. Past that, repair and bootstrap times become painful. Write throughput scales roughly linearly with node count up to gossip overhead at ~100 nodes per DC. Walk through the math in an interview — target QPS, RF, per-node disk budget, growth rate. Back-of-envelope beats reciting "use 6 nodes" with no reasoning.
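
The storage side of that math fits in a few lines. This sketch covers disk only, not write throughput, and the ingest rate and retention are made-up inputs. The function nodes_for_storage is a hypothetical helper that uses the 5-10 TB per node rule of thumb as the data density target.

```python
import math


def nodes_for_storage(daily_ingest_gb, retention_days, rf, data_per_node_tb):
    raw_tb = daily_ingest_gb * retention_days / 1000  # one copy of the data
    replicated_tb = raw_tb * rf                       # every row stored RF times
    return math.ceil(replicated_tb / data_per_node_tb)


# Assumed workload: 200 GB/day, one year of retention, RF=3.
print(nodes_for_storage(200, 365, rf=3, data_per_node_tb=5))   # 44
print(nodes_for_storage(200, 365, rf=3, data_per_node_tb=10))  # 22
```

From there you would add headroom for compaction and growth, then sanity-check the node count against the write QPS target.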

Is Cassandra still relevant in 2026?

Yes, for the workloads it was built for — high-write, multi-DC, predictable-query-pattern systems. Apple, Netflix, Uber, and Discord still operate large rings. Greenfield projects increasingly start on ScyllaDB or managed DynamoDB, but the Cassandra mental model — query-first design, partition keys, tunable consistency, eventual reconciliation — is the foundation for the entire wide-column family.