Cache strategies for the SA interview
Why caching shows up in every SA interview
Every systems analyst loop at Amazon, Stripe, DoorDash, or Uber ends with a design round that pulls the same lever: the database is the bottleneck, so put something in front of it. The interviewer is not testing whether you memorized the Redis docs; they are testing whether you can name a cache pattern, defend why you picked it, and reason about what breaks when it fails.
Caching is load-bearing for the SA role because a 90% cache hit rate means only one read in ten reaches the database, which gives you roughly 10x effective read capacity without a schema change, a read replica, or a shard. Every cache pattern trades something (freshness, durability, write latency, operational complexity), and the interviewer wants to hear you make that trade out loud.
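The arithmetic behind that claim is worth being able to do on a whiteboard. A minimal sketch, with hypothetical function names, assuming every miss costs exactly one database read:

```python
def db_read_rps(total_rps: float, hit_rate: float) -> float:
    """Reads per second that fall through the cache to the database."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_rps * (1.0 - hit_rate)


def capacity_multiplier(hit_rate: float) -> float:
    """How many times more read traffic the same database can sit behind."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


# 90% hit rate: one read in ten reaches the database
multiplier_at_90 = capacity_multiplier(0.90)
# 95% hit rate: one read in twenty reaches the database
multiplier_at_95 = capacity_multiplier(0.95)
```

The useful corollary is the reverse direction: dropping from a 95% to a 90% hit rate doubles the load on the database, which is why a small regression in hit rate can look like an outage.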
Load-bearing trick: when the interviewer asks "how would you cache this?", the first sentence out of your mouth should name the pattern (cache-aside, write-through, write-back, write-around) and the second should name what you are giving up.
The same loop also tests whether you can sketch the eviction policy without hesitating — LRU is the default, but knowing when LFU or FIFO wins is the differentiator.
The four patterns side by side
There are four patterns the SA panel will accept as a real answer. Memorize them, and more importantly, memorize the failure mode of each one — that is what the senior interviewer probes when they say "okay, what if the cache node restarts?"
| Pattern | Write path | Read path | Consistency | Best fit | Failure mode |
|---|---|---|---|---|---|
| Cache-aside (lazy load) | App writes to DB, then deletes or updates cache key | App checks cache, on miss reads DB and populates cache | Eventual; brief stale window | General-purpose, read-heavy with tolerable staleness | Cache miss storm on cold start or after flush |
| Write-through | App writes to cache, cache synchronously writes to DB | App reads from cache | Strong (cache and DB in lockstep) | Reference data, profiles, configs | Write latency = cache + DB combined |
| Write-back (write-behind) | App writes to cache, async batch flushes to DB | App reads from cache | Weak; window where cache leads DB | High-write, loss-tolerant counters and metrics | Cache node crash = lost writes |
| Write-around | App writes directly to DB, cache is not touched on write | App checks cache, on miss reads DB and populates cache (as in cache-aside) | Eventual; cache never sees write-only data | Write-once-read-rarely (logs, audit) | First read of newly written data always hits DB |
The pattern most teams default to is cache-aside, and the interviewer knows it. The premium answer is to start with cache-aside and then say where you would deviate. For a marketplace catalog where prices update from a merchant tool, cache-aside with event-driven invalidation is the right call. For a session store at Notion or Linear, write-through is safer because a stale session token is a security problem. For a real-time leaderboard, write-back makes sense because you can tolerate losing the last 200 milliseconds of score updates if the cache crashes.
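The four write paths are easier to compare when they sit next to each other in code. The following is a toy model only: the `Store` class and its method names are made up for illustration, and plain dicts stand in for the cache and the database.

```python
class Store:
    """Toy model: `cache` and `db` are dicts standing in for Redis and the database."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = {}  # write-back buffer that has not reached the db yet

    def read(self, key):
        # Cache-aside read: check the cache, fall back to the db and populate
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_cache_aside(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # invalidate; the next read repopulates

    def write_through(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # synchronous: the caller waits for both writes

    def write_back(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # the db lags until flush() runs

    def write_around(self, key, value):
        self.db[key] = value  # the cache is not touched on write

    def flush(self):
        # Asynchronous batch flush in a real system; explicit call here
        self.db.update(self.pending)
        self.pending.clear()

    def crash_cache(self):
        # Simulate a cache node restart: cached data and unflushed writes vanish
        self.cache.clear()
        self.pending.clear()


store = Store()
store.write_back("score:alice", 10)
store.crash_cache()
lost = store.read("score:alice")       # None: the write never reached the db

store.write_through("profile:alice", "v1")
store.crash_cache()
kept = store.read("profile:alice")     # "v1": the db already had it
```

The `crash_cache` call is the "what if the cache node restarts?" probe in miniature: only the write-back path loses data, and only the writes that had not been flushed.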
Eviction policies that the interviewer expects
Once the panel is satisfied with the pattern, the follow-up is always "and how do you decide what to evict?" The eviction policy is the second knob. Get this wrong and your 95% hit rate degrades to 60% under load — same hardware, same data, different policy.
| Policy | Rule | Wins when | Loses when |
|---|---|---|---|
| LRU (least recently used) | Evict the item touched longest ago | Working set has clear recency bias — user feeds, product detail pages | Scan workloads pollute cache and evict hot keys |
| LFU (least frequently used) | Evict the item with the lowest hit count | Long-tail content with stable popularity — top 1% of products | Trending content struggles to enter the cache |
| FIFO (first in, first out) | Evict the oldest entry regardless of access | Time-bounded data — TTL caches, queues | Hot keys with high recency get dropped early |
| Random | Evict any entry at random | Memory-constrained systems where bookkeeping cost is the bottleneck | Almost never the right answer in interviews |
Redis ships with noeviction as the default maxmemory-policy, so a full cache rejects writes until you choose a policy. allkeys-lru is the usual first choice, and it is right about 80% of the time. The other 20%, where you want allkeys-lfu, is a Pareto-distributed workload: a marketplace where the top 10,000 SKUs out of 50 million account for 95% of pageviews. LRU lets a one-time Googlebot crawl evict your hot set; LFU pins it.
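The crawler effect can be reproduced in a few lines. This is a simplified sketch with exact LRU and exact LFU in pure Python (class and function names are illustrative); Redis itself uses approximated versions of both policies, and its LFU counters decay over time, so treat this as a model of the idea rather than of Redis.

```python
from collections import Counter, OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def __contains__(self, key) -> bool:
        return key in self.items

    def access(self, key) -> bool:
        """Touch a key. Returns True on a hit; on a miss the key is admitted."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key
        self.items[key] = True
        return False


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = Counter()

    def __contains__(self, key) -> bool:
        return key in self.counts

    def access(self, key) -> bool:
        """Touch a key. Returns True on a hit; on a miss the key is admitted."""
        if key in self.counts:
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest hit count
            del self.counts[victim]
        self.counts[key] = 1
        return False


def hot_keys_surviving_scan(cache, hot, scan) -> int:
    """Warm the hot keys, replay a one-pass scan, count hot keys still cached."""
    for _ in range(10):
        for key in hot:
            cache.access(key)
    for key in scan:
        cache.access(key)
    return sum(1 for key in hot if key in cache)


hot = [f"hot:{i}" for i in range(4)]
scan = [f"crawl:{i}" for i in range(100)]  # each key touched exactly once

lru_survivors = hot_keys_surviving_scan(LRUCache(5), hot, scan)  # 0
lfu_survivors = hot_keys_surviving_scan(LFUCache(5), hot, scan)  # 4
```

The same run also hints at the LFU weakness from the table: every scan key evicts the previous scan key, so a genuinely new hot key would need many hits before it could displace an established one.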
Sanity check: if the interviewer asks "what happens when your cache fills up?", your answer must mention an eviction policy by name, not "Redis handles it."
TTL, invalidation, and cache stampede
The Phil Karlton quote — "there are only two hard things in computer science: cache invalidation and naming things" — exists because invalidation is where every team eventually gets paged at 3am. Two strategies an SA should compare in one breath:
Time-to-live (TTL). Every key auto-expires after N seconds. Simple, no coordination, staleness bounded by N. The cost: you accept staleness up to N seconds for every key, even ones that did not change. TTL is the right call when the source of truth has no event stream — a third-party FX API where you cache the rate for 60 seconds because that is the upstream guarantee anyway.
Event-driven invalidation. When the source of truth changes, the writer emits a Kafka event or webhook, and a consumer deletes the affected cache keys. Staleness collapses to message latency, often under 100 milliseconds. The cost is operational: every writer must publish the invalidation event, or the cache silently serves stale data until the key expires or is evicted.
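The consumer side of event-driven invalidation is small. A minimal sketch, assuming a hypothetical event shape with `entity` and `id` fields and using a dict in place of the cache; with a real Redis client the body would be a delete on the same key.

```python
def handle_invalidation(event: dict, cache: dict) -> bool:
    """Drop the cached entry named by an invalidation event.

    Returns True if a key was removed, False if it was already absent.
    The event fields are an assumed shape, not a standard.
    """
    key = f"{event['entity']}:{event['id']}"
    return cache.pop(key, None) is not None


cache = {"sku:42": '{"price": 1999}'}
event = {"entity": "sku", "id": "42"}

first = handle_invalidation(event, cache)   # True: key removed
replay = handle_invalidation(event, cache)  # False: already gone, no harm done
```

Because a delete is idempotent, a replayed or duplicated event is harmless, so at-least-once delivery is sufficient. A lost event is the dangerous case, which is the argument for keeping a TTL as a backstop.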
The third thing the interviewer will push on is cache stampede (thundering herd). When a hot key expires, every concurrent reader misses simultaneously, and the database takes a synchronized flood of identical queries. At Stripe scale a single hot key stampede can take down a read replica. Mitigations:
- Jittered TTL: a base TTL of 300s plus or minus 30s of random noise. Spreads expirations so they do not align.
- Refresh-ahead: a background worker refreshes a key before it expires, so readers never miss on hot keys.
- Single-flight lock: the first miss takes a lock, reloads the value from the database, and repopulates the cache; concurrent readers wait or serve stale. A Redis SET with NX and an expiry (or Redlock across multiple nodes) or memcached's atomic add command works here.
- Probabilistic early expiration: readers refresh a key with a probability that grows as the key approaches its TTL.
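Two of these mitigations fit in a few lines each. The sketch below shows jittered TTL and one common formulation of probabilistic early expiration (often called XFetch); the function names and the `beta` default are illustrative choices, not taken from any particular library.

```python
import math
import random
import time
from typing import Optional


def jittered_ttl(base: int = 300, jitter: int = 30) -> int:
    """Base TTL plus or minus up to `jitter` seconds, so expirations do not align."""
    return base + random.randint(-jitter, jitter)


def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now: Optional[float] = None) -> bool:
    """Decide whether this reader should refresh the key before it expires.

    `recompute_seconds` is how long a reload from the database takes.
    The random term is an exponential sample scaled by that cost, so the
    chance of refreshing rises as `now` approaches `expires_at`.
    """
    if now is None:
        now = time.time()
    u = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

A reader would call `should_refresh_early` on every hit and, when it returns True, reload the value and reset the TTL while other readers keep serving the cached copy. Raising `beta` above 1.0 makes refreshes happen earlier; lowering it makes them happen closer to expiry.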
Worked example: product catalog at a marketplace
The interviewer says: "Design caching for a marketplace catalog. 50 million SKUs, 200,000 RPS peak, merchants update prices every few minutes."
The read pattern is power-law — the top 1% of SKUs is 90% of traffic. Cold reads on the long tail are acceptable at 50-100ms; hot reads must be under 5ms p99. Writes are infrequent per SKU but globally hundreds per second.
Pattern: cache-aside with event-driven invalidation. The merchant tool publishes a price.updated event to Kafka; a consumer deletes the cache key for that SKU. Reads go through cache-aside — on miss, populate with a 1-hour TTL as a backstop in case an event is lost.
Eviction: allkeys-lfu. LRU would let crawler traffic evict hot SKUs; LFU pins them.
Stampede mitigation: single-flight lock on misses for any SKU in the top-10,000 hot set (precomputed daily). For long-tail SKUs the stampede risk is negligible.
Sizing: Redis cluster, 6 shards, replication factor 2, 200GB total memory for the top 10 million SKUs at 20KB per entry. Long-tail traffic is 5% of total RPS and absorbable by the database.
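```python
# Cache-aside read with single-flight protection
import json
import random
import time

import redis

r = redis.Redis(host="cache", decode_responses=True)

LOCK_TTL = 3       # seconds before a refresh lock auto-expires
TTL_BASE = 3600    # 1-hour backstop TTL
TTL_JITTER = 600   # spread expirations by up to +/- 10 minutes


def fetch_from_db(sku: str) -> dict:
    """Placeholder for the real database lookup (not defined in this article)."""
    raise NotImplementedError


def get_product(sku: str) -> dict:
    key = f"sku:{sku}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    # SET NX EX: only one worker acquires the refresh lock
    if r.set(lock_key, "1", nx=True, ex=LOCK_TTL):
        try:
            product = fetch_from_db(sku)
            ttl = TTL_BASE + random.randint(-TTL_JITTER, TTL_JITTER)
            r.set(key, json.dumps(product), ex=ttl)
            return product
        finally:
            r.delete(lock_key)

    # Another worker is refreshing: back off briefly and retry the cache
    time.sleep(0.05)
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Still missing: fall through to the database rather than block the caller
    return fetch_from_db(sku)
```

Sizing the cache to the hot set, not the full dataset, is what the panel wants to hear.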
Common pitfalls
The mistake that costs the most candidates an offer is picking write-through when the workload is write-heavy. Write-through doubles your write latency because every write hits the cache and the database synchronously. If the interviewer says "we write 50,000 events per second" and you say "write-through", you just told them you would tank p99 write latency. The fix is to recognize that write-through is for reference data — configs, profiles, things written once and read a million times — not for transactional throughput.
Another trap is forgetting that cache-aside has a race window. A reader misses the cache and reads the old row from the database; before it populates the cache, a writer updates the row and deletes the cache key. The reader then writes its stale value into the cache, where it persists for the full TTL. Deleting the key after the database write (not before) narrows the window, and a versioned key scheme where each write bumps a version number closes it. Naming this race is a strong signal you have run a cache in production.
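One way to picture the versioned scheme is a cache that refuses to go backwards. The sketch below is an in-memory illustration with made-up names; it assumes each database row carries a monotonically increasing version and that writers publish the new version to the cache instead of deleting the key. In Redis the same compare-and-set would need to run atomically, for example in a Lua script.

```python
class VersionedCache:
    """In-memory stand-in for a cache that only accepts newer versions."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def set_if_newer(self, key, version: int, value) -> bool:
        """Store the value unless an equal or newer version is already cached."""
        current = self._data.get(key)
        if current is not None and current[0] >= version:
            return False
        self._data[key] = (version, value)
        return True

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]


cache = VersionedCache()

# A slow reader loaded version 1 from the database but has not cached it yet.
stale_read = (1, {"price": 1000})

# Meanwhile a writer commits version 2 and publishes it to the cache.
cache.set_if_newer("sku:42", 2, {"price": 1200})

# The slow reader finally tries to populate the cache and is rejected.
accepted = cache.set_if_newer("sku:42", *stale_read)  # False
current = cache.get("sku:42")                         # {"price": 1200}
```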
A third pitfall is assuming Redis is durable. Redis is in-memory; AOF and RDB let it reload data on restart, but RDB snapshots are periodic and AOF fsyncs once per second by default, so the most recent writes can be lost in a crash. With write-back, a cache crash before flush loses writes. Junior candidates use write-back for counters and stop; senior candidates add "and I would accept a 5-second loss window because these are pageview counters, not financial transactions". The reasoning about acceptable loss is what the panel wants.
The fourth pitfall is measuring hit rate but ignoring tail latency. A 95% hit rate sounds great, but if the 5% of misses hit a database at 80% CPU, your p99 read latency is bound by database performance. Look at the latency distribution conditioned on cache miss — at Snowflake or Databricks scale, the miss path is sized to handle the full QPS budget even though it serves 5% of traffic, because the alternative is cascading failure on cache restart.
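A quick way to check whether the miss path sets your p99 is to build the latency mixture directly. The sketch below uses illustrative numbers (2 ms per hit, 80 ms per miss) that are assumptions, not measurements, and treats each path as a single fixed latency.

```python
def percentile(sorted_values, pct: int):
    """Nearest-rank style percentile on a sorted list; `pct` is 0 to 100."""
    index = min(len(sorted_values) - 1, len(sorted_values) * pct // 100)
    return sorted_values[index]


def latency_profile(hit_rate: float, hit_ms: float, miss_ms: float,
                    requests: int = 10_000) -> dict:
    """Mean, p50, and p99 latency for a two-point mixture of hits and misses."""
    hits = round(requests * hit_rate)
    samples = sorted([hit_ms] * hits + [miss_ms] * (requests - hits))
    return {
        "mean": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
    }


at_95 = latency_profile(0.95, hit_ms=2.0, miss_ms=80.0)
at_995 = latency_profile(0.995, hit_ms=2.0, miss_ms=80.0)
```

At a 95% hit rate the p50 is the 2 ms hit latency but the p99 is the full 80 ms miss latency, because 5% of requests is more than the 1% that the p99 tolerates. Only once misses fall below 1% of traffic, as in the 99.5% case, does the p99 land on the hit path.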
The fifth pitfall is forgetting cache warmup. Deploy a fresh Redis cluster on Monday at 9am with zero keys, and the database takes the full unfiltered RPS until it warms. This is a known marketplace incident pattern. Warm the cache before traffic shifts: precompute the top 10,000 hot keys and load them, or canary drain the old cluster so the new one fills under controlled load.
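A warmup job can be as simple as walking the precomputed hot list in batches. This is a minimal sketch with hypothetical names: `cache` is a dict standing in for the new cluster, `load` is whatever function reads one key from the database, and the pause between batches is an assumed knob for limiting database load.

```python
import time


def warm_cache(cache: dict, hot_keys, load, batch_size: int = 500,
               pause_seconds: float = 0.0) -> int:
    """Load hot keys into the cache in batches. Returns how many were loaded."""
    loaded = 0
    for start in range(0, len(hot_keys), batch_size):
        for key in hot_keys[start:start + batch_size]:
            if key not in cache:  # skip keys that are already warm
                cache[key] = load(key)
                loaded += 1
        if pause_seconds:
            time.sleep(pause_seconds)  # throttle so warmup does not swamp the db
    return loaded


db = {"sku:1": "a", "sku:2": "b", "sku:3": "c"}
new_cluster = {}
first_pass = warm_cache(new_cluster, list(db), db.__getitem__, batch_size=2)  # 3
second_pass = warm_cache(new_cluster, list(db), db.__getitem__)               # 0
```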
FAQ
Is cache-aside always the safe default?
Cache-aside is the safe default for read-heavy, staleness-tolerant workloads, which covers maybe 70% of interview scenarios. It is not safe for write-heavy systems, where write-back or a different architecture fits, and it is not safe for security-sensitive data like session tokens, where the staleness window is a vulnerability. Defend it by stating the read/write ratio first ("roughly 100:1 read-to-write") and then saying cache-aside is the right pattern because of that ratio.
How do I pick TTL versus event-driven invalidation?
TTL is appropriate when the source of truth has no reliable event stream, when staleness up to the TTL window is acceptable, or when you want a simple system you can debug at 3am. Event-driven invalidation is appropriate when staleness of more than a few seconds breaks the UX or violates a regulatory constraint, and when you already have Kafka or a webhook system. Many production systems combine both — events for primary invalidation, plus a long TTL as a safety net for missed events.
What is the difference between write-through and write-around?
Write-through writes to cache and database on every write, so the cache always has the latest value and the read path stays fast for newly written data. Write-around writes only to the database and skips the cache, so newly written data must be pulled in on a subsequent read. Write-around fits when data is written once but read rarely — audit logs, infrequent batch outputs — because populating the cache on every write would waste memory on data nobody reads.
When does LFU beat LRU in practice?
LFU beats LRU when the access pattern is heavy-tailed and stable, and when scan or crawler traffic is a significant fraction of reads. Marketplaces, social feeds, and recommendation systems have a stable top-1% set that LFU pins effectively. Recency-driven workloads — user session caches, per-user feature flag lookups — benefit more from LRU. Production systems often use TinyLFU or W-TinyLFU, hybrids that combine LFU pinning with LRU responsiveness.
How big should my cache be?
Size the cache to your hot set, not your full dataset. The hot set is the smallest subset of keys that absorbs your target hit rate — typically the top 1% to 10% of keys absorb 90% to 99% of reads. Calculate one entry's byte size (Redis overhead is roughly 80 bytes per key), multiply by the hot-set count, and add 30% headroom for write spikes and replication. Caches sized to the full dataset are over-provisioned; hot-set sizing hits the same 95% hit rate at a fraction of the cost.
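That sizing recipe translates directly into a helper. A minimal sketch with a hypothetical function name; the 80-byte per-key overhead and 30% headroom defaults are the rough figures quoted above, not measured values.

```python
def hot_set_cache_bytes(hot_keys: int, value_bytes: int,
                        per_key_overhead: int = 80,
                        headroom_pct: int = 30) -> int:
    """Memory needed for the hot set, including per-key overhead and headroom."""
    raw = hot_keys * (value_bytes + per_key_overhead)
    return raw * (100 + headroom_pct) // 100  # integer math avoids float drift


# Worked example numbers: top 10 million SKUs at 20 KB per entry
needed = hot_set_cache_bytes(10_000_000, 20_000)
needed_gb = needed / 1e9  # about 261 GB
```

For the marketplace example this comes out to roughly 261 GB, against a raw figure of 200 GB for the values alone, so the difference is almost entirely the headroom rather than the per-key overhead.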