Idempotency keys for the systems analyst interview
Why interviewers ask about idempotency
You are on the systems-analyst loop at Stripe. The interviewer draws a POST /payments box on the whiteboard and asks the question that ends the loop for a third of candidates: "the client times out mid-request and retries; what stops us from charging the card twice?" If your answer is "we just don't retry POST", you are out. The honest answer is idempotency keys, and the interviewer wants you to draw the table, the lookup, and the race condition without prompting.
Idempotency sounds academic until you watch a $4,200 wire transfer get duplicated because a mobile client lost signal for 800ms and the retry policy was set to "always". It is the single load-bearing concept behind safe POST APIs at every payment provider, order system, and webhook pipeline in the industry. Interviewers reach for it because it is specific, falsifiable, and exposes how candidates think about partial failures — the thing systems analysts are paid to think about.
This post walks through what idempotency means in HTTP, the header on the wire, the dedupe table, race conditions, and the follow-ups interviewers reliably ask next.
The core idea
By the HTTP spec, GET, PUT, and DELETE are naturally idempotent — calling them N times has the same effect as calling them once. POST is the dangerous one: by definition, each POST /payments creates a new resource. Retry it after a timeout and you create a second payment. The retry was meant to recover from the network blip, not to charge the customer twice.
An idempotency key is a client-generated unique identifier sent with the request, usually as the Idempotency-Key HTTP header. The server stores the key alongside the response on first success. When the same key shows up again — because the client retried — the server returns the cached response instead of re-executing the side effect. The retry becomes safe. The customer gets charged once.
Load-bearing rule: the key is generated once per logical operation on the client, then reused across every retry until the server confirms. Generate a new UUID on each retry and you have gained nothing: every attempt looks like a brand-new payment.
Here is what a single request and its retry look like on the wire:
POST /v1/payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json
{"amount_usd": 100, "card_token": "tok_xyz"}
→ 201 Created
{"payment_id": 42, "status": "succeeded"}
# network blip — client never saw the 201, retries with the SAME key
POST /v1/payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
→ 201 Created
{"payment_id": 42, "status": "succeeded"} # same payment, no second charge
Header convention and key generation
The header name Idempotency-Key is a de facto standard: Stripe popularized it, and Adyen, Square, and the IETF draft converged on the same name. PayPal is a notable exception with PayPal-Request-Id, and some legacy APIs use X-Idempotency-Key; default to the unprefixed version. The value is opaque to the server: a unique string under a reasonable length cap (Stripe caps at 255 chars).
The common format is a v4 UUID, but any sufficiently random unique string works: ULIDs, KSUIDs, or a hash of (user_id, client_seq, operation_type). Avoid sequential integers from the client: a buggy client that resets to 1 will collide with its own earlier keys, and with other users' keys wherever the server fails to scope keys per user.
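A minimal sketch of the two key styles mentioned above, in Python. The function names `random_key` and `derived_key` are illustrative, and the field order and `:` separator used for the hash input are assumptions, not a provider convention.

```python
import hashlib
import uuid


def random_key() -> str:
    """Random v4 UUID: the common default for an idempotency key."""
    return str(uuid.uuid4())


def derived_key(user_id: int, client_seq: int, operation_type: str) -> str:
    """Deterministic key: SHA-256 over (user_id, client_seq, operation_type).

    The same logical operation always maps to the same key, so a client
    that loses its stored key can recompute it instead of minting a new one.
    """
    material = f"{user_id}:{client_seq}:{operation_type}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

Both forms stay well under a 255-character cap: the UUID string is 36 characters and the hex digest is 64.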
Generation happens on the client, before the first send attempt. The key is persisted in client storage until the server returns a definitive response — a 2xx or a deterministic 4xx like 422, not a 5xx. Drop it too early and a network retry on the next app launch looks like a new operation.
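The client side of that rule might look like the sketch below. Everything here is hypothetical scaffolding: `send` stands in for whatever HTTP transport the app uses, `store` stands in for persistent client storage, and treating 408, 409, and 429 as retryable is an assumption layered on top of the "2xx or deterministic 4xx" rule.

```python
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes (key, body) and returns (status_code, response).
# It may raise TimeoutError when the response is lost on the way back.
Send = Callable[[str, dict], Tuple[int, dict]]


def is_definitive(status: int) -> bool:
    """2xx and deterministic 4xx are final answers; 5xx is not.

    Assumption: 408, 409, and 429 depend on timing or load, so they are
    treated as retryable even though they are 4xx codes.
    """
    if status in (408, 409, 429):
        return False
    return 200 <= status < 500


def submit_with_retries(send: Send, body: dict, store: Dict[str, str],
                        op_name: str, max_attempts: int = 5) -> Tuple[int, dict]:
    # Generate the key once per logical operation and persist it BEFORE sending.
    key = store.setdefault(op_name, str(uuid.uuid4()))
    for _ in range(max_attempts):
        try:
            status, response = send(key, body)
        except TimeoutError:
            continue  # response lost: retry with the SAME key
        if is_definitive(status):
            del store[op_name]  # safe to forget the key now
            return status, response
    # The key stays in the store so a later attempt reuses it.
    raise RuntimeError("no definitive response yet")
```

A real client would add backoff between attempts; it is left out here to keep the key handling visible.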
| Provider | Header name | Recommended format | TTL window |
|---|---|---|---|
| Stripe | Idempotency-Key | UUID or random 32+ chars | 24 hours |
| PayPal | PayPal-Request-Id | UUID v4 | 6 hours |
| Adyen | Idempotency-Key | UUID or merchant reference | minutes to hours |
| Square | Idempotency-Key | UUID v4 | 24 hours |
Notice that the four don't all agree on the header name: PayPal uses its own. Interviewers love to ask "what header does PayPal use?" as a trivia question to see if you have actually integrated a real API.
Server-side dedupe table
The server side is where most candidates get vague. The textbook implementation is a single dedupe table keyed by (scope, idempotency_key). Scope is critical: a key is unique per merchant or per user, not globally — otherwise two merchants generating the same UUID would collide.
CREATE TABLE idempotency_keys (
    scope_id      BIGINT NOT NULL,                     -- merchant_id OR user_id
    key           TEXT NOT NULL,
    request_hash  TEXT NOT NULL,                       -- SHA-256 of normalized request body
    response_body JSONB,                               -- cached response payload
    status_code   INT,                                 -- cached HTTP status
    state         TEXT NOT NULL,                       -- 'in_progress' | 'completed'
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at  TIMESTAMPTZ,
    PRIMARY KEY (scope_id, key)
);
CREATE INDEX idx_idem_created ON idempotency_keys (created_at);
The request flow is:
1. Read Idempotency-Key from header.
2. Compute request_hash = SHA-256(normalized_body).
3. INSERT (scope, key, hash, state='in_progress') ON CONFLICT DO NOTHING.
4. If conflict (key already exists):
   a. SELECT existing row.
   b. If request_hash differs → 422 "Idempotency key reused with different body".
   c. If state='completed' → return cached (status_code, response_body).
   d. If state='in_progress' → 409 Conflict or wait-and-poll.
5. If no conflict (we own the row):
   a. Execute the side effect (charge the card, write the order, etc.).
   b. UPDATE row with response, status_code, state='completed'.
   c. Return the response.
The request_hash check is the part candidates forget. Without it, a buggy client could send the same key K with body {"amount": 100} then {"amount": 10000} and silently get the old $100 response back, charging nothing for the new $10,000 intent. Always hash the body and reject mismatches with 422.
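The flow above, including the request_hash check, can be sketched end to end in Python. This is an illustration under assumptions, not a production handler: it uses an in-memory SQLite database instead of Postgres, `INSERT OR IGNORE` as SQLite's spelling of ON CONFLICT DO NOTHING, and a hypothetical `charges` list in place of the real card charge.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        scope_id      INTEGER NOT NULL,
        key           TEXT    NOT NULL,
        request_hash  TEXT    NOT NULL,
        response_body TEXT,
        status_code   INTEGER,
        state         TEXT    NOT NULL,
        PRIMARY KEY (scope_id, key)
    )""")

charges = []  # stand-in for the real side effect (hypothetical)


def handle_payment(scope_id: int, key: str, body: dict):
    # Step 2: hash a canonical encoding of the body.
    request_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    ).hexdigest()
    # Step 3: atomic claim of the (scope_id, key) row.
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_keys "
        "(scope_id, key, request_hash, state) VALUES (?, ?, ?, 'in_progress')",
        (scope_id, key, request_hash))
    conn.commit()
    if cur.rowcount == 0:  # Step 4: the key already exists
        stored_hash, state, status, cached = conn.execute(
            "SELECT request_hash, state, status_code, response_body "
            "FROM idempotency_keys WHERE scope_id = ? AND key = ?",
            (scope_id, key)).fetchone()
        if stored_hash != request_hash:
            return 422, {"error": "Idempotency key reused with different body"}
        if state == "completed":
            return status, json.loads(cached)
        return 409, {"error": "request in progress"}
    # Step 5: we own the row, so run the side effect once and cache the result.
    charges.append(body)
    response = {"payment_id": len(charges), "status": "succeeded"}
    conn.execute(
        "UPDATE idempotency_keys SET response_body = ?, status_code = 201, "
        "state = 'completed' WHERE scope_id = ? AND key = ?",
        (json.dumps(response), scope_id, key))
    conn.commit()
    return 201, response
```

Calling `handle_payment(1, "k1", {"amount_usd": 100})` twice returns the same 201 response while `charges` holds a single entry; sending a different body under the same key returns 422.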
Race conditions and locking
The interviewer's follow-up is always: "two retries arrive within the same millisecond — what happens?" This is where INSERT ... ON CONFLICT DO NOTHING earns its keep. Only one of the concurrent inserts wins the primary-key race; the other falls through to the "key exists" branch. The winner charges the card; the loser sees state='in_progress' and returns 409 or polls.
Sanity check: the unique constraint is doing the locking for you. If you implement it with SELECT then INSERT in two statements, you have a TOCTOU bug — both requests see "no row" and both insert.
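The difference can be shown with a deterministic simulation in Python. The interleaving is staged by hand rather than produced by real threads, the table names are hypothetical, and SQLite stands in for the production database. The naive table has no uniqueness constraint, which is the worst case; with a constraint and a plain INSERT, the loser would get an integrity error instead of a clean "key exists" branch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Table WITHOUT a uniqueness constraint, to show what check-then-insert allows.
db.execute("CREATE TABLE naive_keys (scope_id INTEGER, key TEXT)")
# Table WITH the composite primary key from the schema above.
db.execute("CREATE TABLE safe_keys (scope_id INTEGER, key TEXT, "
           "PRIMARY KEY (scope_id, key))")


def key_exists(table: str, scope_id: int, key: str) -> bool:
    row = db.execute(f"SELECT 1 FROM {table} WHERE scope_id = ? AND key = ?",
                     (scope_id, key)).fetchone()
    return row is not None


# Staged interleaving: both requests run their SELECT before either INSERT.
a_sees_row = key_exists("naive_keys", 1, "k1")
b_sees_row = key_exists("naive_keys", 1, "k1")
for sees_row in (a_sees_row, b_sees_row):
    if not sees_row:
        db.execute("INSERT INTO naive_keys VALUES (1, 'k1')")
naive_rows = db.execute("SELECT COUNT(*) FROM naive_keys").fetchone()[0]


def claim(scope_id: int, key: str) -> bool:
    """Atomic claim: True only for the one request whose INSERT lands."""
    cur = db.execute("INSERT OR IGNORE INTO safe_keys VALUES (?, ?)",
                     (scope_id, key))
    return cur.rowcount == 1


winners = [claim(1, "k1"), claim(1, "k1")]
```

Here `naive_rows` ends up as 2 (both requests believed they were first), while `winners` contains exactly one `True`.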
The poll-vs-409 choice is a tradeoff. Returning 409 immediately pushes retry logic back to the client, which is already retry-aware. Polling the in-progress row is smoother for the client but ties up a server worker. A common choice for payment APIs is to return 409 with a Retry-After hint.
A subtler trap: what if the server crashes between charging the card and marking the row completed? The row is stuck in in_progress forever. The fix is a stale-in-progress sweeper that flips rows older than 60 seconds to a failed state (a third state value beyond in_progress and completed), plus reconciliation against the downstream ledger. This is also why idempotency keys do not replace reconciliation: they reduce the surface area, they don't eliminate it.
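A sweeper along those lines could be sketched as follows. The table is a trimmed-down version of the dedupe table, timestamps are plain epoch seconds for simplicity, and returning the swept rows so a separate reconciliation job can check them against the ledger is an assumption about how the two pieces connect.

```python
import sqlite3

STALE_AFTER_SECONDS = 60  # threshold from the text

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys (scope_id INTEGER, key TEXT, "
           "state TEXT NOT NULL, created_at REAL NOT NULL, "
           "PRIMARY KEY (scope_id, key))")


def sweep_stale(now: float) -> list:
    """Flip stale in_progress rows to 'failed'; return them for reconciliation."""
    cutoff = now - STALE_AFTER_SECONDS
    stale = db.execute(
        "SELECT scope_id, key FROM idempotency_keys "
        "WHERE state = 'in_progress' AND created_at < ?", (cutoff,)).fetchall()
    db.execute(
        "UPDATE idempotency_keys SET state = 'failed' "
        "WHERE state = 'in_progress' AND created_at < ?", (cutoff,))
    db.commit()
    return stale
```

The sweeper cannot know whether the charge went through before the crash, which is why the returned rows still need a ledger lookup before anyone retries them.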
TTL, cleanup, and key reuse
Idempotency keys accumulate forever if you let them. The standard TTL is 24 hours to 7 days — long enough that any reasonable client retry policy has given up, short enough that the table doesn't grow to terabytes. Stripe uses 24 hours; some banks use 7 days for regulated wire transfers.
DELETE FROM idempotency_keys
WHERE created_at < NOW() - INTERVAL '7 days';
After the TTL window the same key can be reused for a different logical operation, since there is no row to collide with. Clients must not store keys longer than the server TTL: if the app caches a key for 30 days but the server only remembers 7, you have lost the dedupe guarantee on the long-tail retry.
| Retention | Use case | Storage cost (10M keys/day) |
|---|---|---|
| 1 hour | webhook ingestion, idempotent reads | ~400 MB |
| 24 hours | payments, order creation | ~10 GB |
| 7 days | bank transfers, regulated flows | ~70 GB |
| 30+ days | audit-driven compliance | ~300 GB+ |
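The figures in the table are consistent with roughly 1 KB per stored row. That per-row size is inferred from the numbers, not stated anywhere, so treat it as an assumption. Under it, the steady-state size is simply the number of rows alive at once times the row size:

```python
KEYS_PER_DAY = 10_000_000
BYTES_PER_ROW = 1_000  # assumption inferred from the table, not a measured value


def storage_bytes(retention_hours: float) -> float:
    """Steady-state table size: rows alive at once times bytes per row."""
    live_rows = KEYS_PER_DAY * retention_hours / 24
    return live_rows * BYTES_PER_ROW


for label, hours in [("1 hour", 1), ("24 hours", 24), ("7 days", 168), ("30 days", 720)]:
    print(f"{label:>8}: {storage_bytes(hours) / 1e9:.2f} GB")
```

Real rows vary with the size of the cached response body, so the estimate is best read as an order of magnitude.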
For high volume, the dedupe table is often a Redis cluster with per-key TTL rather than Postgres, because the access pattern is read-by-PK, write-once, expire-after-N-hours — Redis's sweet spot. Postgres works up to a few thousand QPS; beyond that, Redis or a Cassandra-like KV store is the move.
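The access pattern described here is "set if absent, with an expiry", which Redis exposes as SET with the NX and EX options. The sketch below models that behavior with an in-memory Python class so it runs without a server. It is a stand-in, not a Redis client: the class and method names are hypothetical, and it is not thread-safe.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlKeyStore:
    """In-memory stand-in for a KV store with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        """Store value only if key is absent or expired; True means we won the claim."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + ttl_seconds)
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the live value, or None if the key is missing or expired."""
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None
        return entry[0]
```

Composing the stored key from scope and idempotency key (for example `"m1:k1"`) keeps the per-merchant scoping that the SQL primary key provides.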
Where it actually ships
Idempotency keys show up in four predictable places. Payment APIs: Stripe, Adyen, Square, and most fintechs support them on POST. Order creation: a flaky mobile client hitting "Place order" twice should not produce two shipments. Webhook receivers: every webhook provider occasionally redelivers, so consumers dedupe by event ID. Bank transfers and money movement are the strictest: duplicate wires can be regulatory incidents.
The pattern also shows up in message-queue consumers that need exactly-once semantics on at-least-once delivery: the consumer writes (message_id, processed_at) to a dedupe table in the same transaction as the side effect, and replays become no-ops. The queue gives you at-least-once; idempotent consumers give you the rest.
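A minimal sketch of such a consumer, assuming the side effect lives in the same database as the dedupe table (here SQLite, with a hypothetical `shipments` table as the side effect). If the side effect is a call to an external system, the single-transaction guarantee no longer holds.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY, "
           "processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)")
db.execute("CREATE TABLE shipments (order_id TEXT NOT NULL)")


def consume(message_id: str, order_id: str) -> bool:
    """Apply the side effect once per message_id; return False on a replay."""
    with db:  # one transaction: commit on success, roll back on exception
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_messages (message_id) VALUES (?)",
            (message_id,))
        if cur.rowcount == 0:
            return False  # already processed: the redelivery is a no-op
        db.execute("INSERT INTO shipments (order_id) VALUES (?)", (order_id,))
        return True
```

Because the dedupe row and the shipment are written in the same transaction, a crash between the two leaves neither behind, and the redelivered message is processed cleanly.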
Common pitfalls
The most frequent whiteboard mistake is forgetting to hash the request body. A junior candidate writes INSERT (key, response) and feels done; the senior interviewer asks "what if the client sends the same key with a different amount?" and watches the freeze. The fix is the request_hash column plus 422 on mismatch — one SHA-256 per request saves you from silently returning the wrong response.
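The hashing step deserves one detail that is easy to miss on a whiteboard: the body should be normalized first, so that the same JSON with reordered keys does not look like a mismatch. A minimal sketch, assuming JSON bodies and using sorted keys with compact separators as the canonical form (one reasonable choice, not a standard); `check_reuse` is a hypothetical helper name.

```python
import hashlib
import json


def request_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding: sorted keys, no extra whitespace."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_reuse(stored_hash: str, body: dict) -> int:
    """Return 422 on a body mismatch, else 200 to signal 'safe to replay'."""
    return 200 if request_hash(body) == stored_hash else 422
```

With this, `{"amount": 100, "card_token": "tok_xyz"}` and the same fields in the opposite order hash identically, while changing the amount to 10000 yields 422.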
A second pitfall is using the wrong scope. Treating the key as globally unique sounds fine until two unrelated merchants both generate the same UUID (UUID v4 collisions with bad embedded-client RNGs are not unheard of). The dedupe primary key must be (scope_id, key) — never just key alone.
A third trap is caching error responses indiscriminately. If the first attempt failed with a 500 because the database was momentarily unreachable, caching that 500 means every retry for the next 24 hours returns the cached failure even after recovery. The rule: cache 2xx and deterministic 4xx, but let 5xx fall through so the client can retry into a healthy server.
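That rule fits in a small predicate. In the sketch below, the choice of which 4xx codes count as non-deterministic (408, 409, 429) is an assumption, and the `released` state is a hypothetical way to let the next retry re-execute; deleting the row would serve the same purpose.

```python
def should_cache(status_code: int) -> bool:
    """Cache 2xx and deterministic 4xx; never cache 5xx.

    Assumption: 408, 409, and 429 depend on timing or load, so they are
    left uncached even though they are 4xx codes.
    """
    if 200 <= status_code < 300:
        return True
    if 400 <= status_code < 500:
        return status_code not in (408, 409, 429)
    return False


def finish_request(row: dict, status_code: int, body: dict) -> dict:
    """Complete the dedupe row on cacheable outcomes, release it otherwise."""
    if should_cache(status_code):
        row.update(state="completed", status_code=status_code, response_body=body)
    else:
        row.update(state="released")  # hypothetical state: the next retry re-executes
    return row
```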
A fourth pitfall is the unbounded in-progress state. If your worker crashes mid-charge, the row sits in in_progress forever and every retry returns 409. You need a sweeper that ages out stale rows and forces reconciliation against the downstream ledger. Without it, a single crash can lock a customer out of the same operation for 24 hours.
Finally, candidates often conflate idempotent HTTP methods with idempotency keys. GET, PUT, and DELETE are idempotent by spec, so no key is needed. POST needs a key because it is non-idempotent by definition, and the same applies to PATCH, which the spec also leaves non-idempotent. If asked "do we need an idempotency key on GET?", the answer is no, and being able to explain why shows you understand the HTTP model.
FAQ
Is Idempotency-Key part of the HTTP standard?
Not yet, but there is an IETF draft (draft-ietf-httpapi-idempotency-key-header) that codifies the convention Stripe and friends already use. For interviews, treat it as a strong de facto standard: most major payment APIs implement the pattern, though not all under that exact header name. Pointing at the draft is a nice senior-level touch.
Why not just make every endpoint idempotent without a key?
Because POST is non-idempotent by semantics, not by accident. The whole point of POST is to create a new resource — there is no natural way to distinguish "the client wants a new payment" from "the client is retrying because they didn't see the response". The key is the explicit signal that says this retry is the same logical operation, not a fresh request.
What is the difference between an idempotency key and a request ID?
A request ID is per-request, used for tracing: two retries of the same operation get different request IDs. An idempotency key is per-logical-operation: both retries share the same key. Stripe uses both: a Request-Id response header for tracing, and the client-supplied Idempotency-Key for dedupe. Confusing the two is a tell that the candidate has not shipped production payments code.
How do you implement idempotency for a long-running operation that takes 30 seconds?
Return 202 Accepted immediately with an operation ID, mark the dedupe row as in_progress, and let the client poll GET /operations/:id for the eventual result. Retries with the same idempotency key during the window get 409 with Retry-After. After completion, retries return the cached response as usual.
Do I need database-level idempotency on top of the idempotency key?
Yes, defense in depth. The key dedupes at the API layer; you still want unique constraints inside the database so that even if the key check is bypassed (admin tool, replay attack, race you missed), the DB refuses to write two rows. For payments, a unique constraint on (merchant_id, external_reference) is the common second line of defense.
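A minimal sketch of that second line of defense, using SQLite and a hypothetical `payments` table. The column names follow the (merchant_id, external_reference) pair mentioned above; the rest of the schema is assumed for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, "
           "merchant_id INTEGER NOT NULL, external_reference TEXT NOT NULL, "
           "amount_usd INTEGER NOT NULL, "
           "UNIQUE (merchant_id, external_reference))")


def record_payment(merchant_id: int, external_reference: str, amount_usd: int) -> bool:
    """Insert the payment; return False if the database rejects a duplicate."""
    try:
        with db:  # commit on success, roll back if the constraint fires
            db.execute(
                "INSERT INTO payments (merchant_id, external_reference, amount_usd) "
                "VALUES (?, ?, ?)", (merchant_id, external_reference, amount_usd))
        return True
    except sqlite3.IntegrityError:
        return False
```

Even if a caller skips the idempotency-key check entirely, the second insert for the same merchant and reference is refused, while a different merchant reusing the same reference is unaffected.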