Data modeling approaches for the SA interview

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.
Join the waitlist

Why panels grill SA candidates on data modeling

When a systems analyst job description says "strong data modeling skills", the panel wants to test whether you can sit between a product manager and a backend engineer without translating gibberish in either direction. The PM hands you a half-formed idea — "customers should be able to gift subscriptions" — and you need to leave the meeting with entities, cardinalities, and a working draft of the schema that the engineer can implement without rewriting it three times.

In practice, the panel checks whether you understand the three levels of abstraction (conceptual, logical, physical), where the boundary sits between an SA and a DBA, and what happens when you skip a level. Skipping the logical model is the most expensive mistake on a new project: the team writes CREATE TABLE statements before anyone agrees on what an Order even is, and six months later there are three definitions of Customer across services. That last part is what separates a mid-level answer from a senior one.

The three levels: conceptual, logical, physical

The mental model the interviewer checks for is a flow that adds detail at each step:

Conceptual (business) -> Logical (structure) -> Physical (engine)

Each level has a different audience. Conceptual is for the product owner. Logical is for engineers and analysts who need to reason about data without caring about Postgres versus Snowflake. Physical is for the team creating tables and tuning queries. Cross the boundary too early and you confuse the audience; stay too high-level and the engineer can't ship.

Level Audience Includes Excludes Typical tool
Conceptual Product, business Entities, relationships, business rules Attributes, types, keys Whiteboard, Miro, dbdiagram.io
Logical Engineers, analysts Attributes, PKs/FKs, cardinality, normal form Indexes, partitions, engine ERwin, draw.io, dbdiagram.io
Physical Backend, DBA Datatypes, indexes, partitions, constraints Business semantics DDL scripts, schema migration files

Load-bearing trick: if the interviewer asks "what changes between logical and physical for the same Customer entity?", the answer they want is datatypes, indexes, constraints, and partitioning — not the entity itself. The shape stays. The implementation gets opinionated.

Conceptual model

The conceptual model is the cheapest artifact you'll produce, and the most underrated. It is a small picture with boxes for entities and lines for relationships, no attributes, no types, no keys. The point is to lock in agreement on which nouns exist in this business before anyone argues about VARCHAR lengths.

[Customer] -- places --> [Order] -- contains --> [Product]
                              |
                              v
                          [Payment]

A good conceptual model is something a non-technical PM can read and sign off on. If your PM asks "wait, why is Subscription a separate entity from Order?", you have just saved a week of sprint planning arguments.

Sanity check: if a system has more than 15-20 conceptual entities, you've started smuggling implementation details into the model. Try again at a higher altitude.

Logical model

The logical model is where SA work pays for itself. You add attributes, primary keys, foreign keys, and cardinalities, and you commit to a normal form — almost always 3NF for transactional systems. You still do not commit to a database engine, and you do not specify indexes.

Customer
  - id              (PK)
  - email           (UNIQUE, NOT NULL)
  - full_name
  - created_at

Order
  - id              (PK)
  - customer_id     (FK -> Customer.id)
  - total_amount
  - currency
  - placed_at

Two cardinality questions decide most logical models: is the relationship 1:1, 1:N, or M:N, and is the child side optional or mandatory? Crow's foot makes the answer scannable — a single bar means one, a fork means many, a circle means optional. Four symbols cover 90% of ER diagrams in production codebases.

The logical model is also where you fight for naming consistency: id as the surrogate primary key, {entity}_id for foreign keys, _at for timestamps, is_ prefix for booleans. Sounds trivial; it is exactly what the panel grades.

Physical model

The physical model is the first time you commit to an engine — Postgres, MySQL, Snowflake, BigQuery, Oracle. Now datatypes get specific, indexes get planned, partitioning strategies appear, and engine-specific features (TIMESTAMPTZ, JSONB, ARRAY, MATERIALIZED VIEW) enter the picture.

CREATE TABLE customer (
  id          BIGSERIAL PRIMARY KEY,
  email       CITEXT UNIQUE NOT NULL,
  full_name   TEXT NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_customer_email_lower
  ON customer (LOWER(email));

CREATE TABLE customer_order (
  id            BIGSERIAL PRIMARY KEY,
  customer_id   BIGINT NOT NULL REFERENCES customer(id) ON DELETE RESTRICT,
  total_amount  NUMERIC(12,2) NOT NULL,
  currency      CHAR(3) NOT NULL,
  placed_at     TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (placed_at);

Engine-specific decisions stack up: BIGSERIAL over INT because the table grows past two billion rows; CITEXT for case-insensitive email; ON DELETE RESTRICT instead of CASCADE so a vanishing customer doesn't drop all their orders; PARTITION BY RANGE (placed_at) because the table will be huge and you need pruning at query time.

Most SA candidates don't need to write physical DDL on the whiteboard, but they should be able to read it and challenge a backend engineer's choices. If the engineer proposes VARCHAR(20) for currency, push back — ISO 4217 is fixed at three letters, CHAR(3) is correct.

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.
Join the waitlist

ER notations you should recognize

Three notations dominate interview discussions. Crow's foot is the default in modern tools; Chen is academic and rare in production; UML class diagrams show up when the team is using UML for everything else and wants visual consistency.

Notation Strengths Weaknesses Where you see it
Crow's foot Compact, scannable, cardinality at a glance No attribute detail in classical form dbdiagram.io, draw.io, most ERwin diagrams
Chen Explicit attributes via ovals, classical academic style Verbose, hard to scale beyond 10 entities Textbooks, university courses
UML class Reuses UML conventions, supports methods Heavyweight for pure data modeling Teams already standardized on UML

The question that catches candidates off guard is "how do you show a many-to-many relationship?" You don't, directly — you introduce a junction table (associative entity, link table). Student and Course are M:N, so you create Enrollment with student_id, course_id, and probably enrolled_at. The junction table earns its keep the moment you add an attribute to the relationship itself — like a grade.

Normalization, from 1NF to BCNF

Normalization is the part of data modeling that interviewers love because it has crisp rules and clear violations. The panel does not need you to recite Codd verbatim, but you should be able to spot a 2NF violation in a sample table and propose a fix in under a minute.

Normal form Rule What it kills
1NF Atomic values, no repeating groups Comma-separated lists in cells, JSON blobs masquerading as columns
2NF 1NF plus no partial dependencies on composite PK Attributes that only depend on part of a multi-column key
3NF 2NF plus no transitive dependencies customer_id, customer_email, customer_country, country_code
BCNF Stricter 3NF — every determinant is a candidate key Edge cases where 3NF still permits anomalies

A 2NF violation in the wild: an order_item table with composite PK (order_id, product_id) and a column product_name that depends only on product_id — move it to the product table. A 3NF violation: a customer table with both city_id and city_name, where city_name is transitively dependent through city_id and belongs in a city lookup table.

4NF, 5NF, and DKNF exist and you can mention them, but unless the role is specifically about data warehouse design or research, 3NF or BCNF is the realistic target for OLTP systems.

When to denormalize

Senior SA answers always include a section the junior answers skip: when normalization is the wrong default. In an analytical warehouse or a read-heavy reporting workload, 3NF will force expensive joins on every query, and the right answer is some flavor of star schema — one wide fact table, several dimension tables, intentional duplication of dimensional attributes for read speed.

Gotcha: denormalization is not "ignoring normal forms because we're lazy". It's a deliberate trade — write complexity for read speed. If you can't articulate the trade, you haven't earned it.

The classic case: an e-commerce cube with a fact_orders table containing customer_country, customer_segment, product_category denormalized in. The country lives in dim_customer too, but you accept the redundancy because a 200ms dashboard beats a 2-second join every time.

Common pitfalls

The first pitfall is building the physical model before the logical one because the team is in a hurry. CREATE TABLE statements feel productive, but implicit semantics — what does amount mean, gross or net, before or after tax? — end up embedded in the schema and nowhere else. The fix is to slow down for a single afternoon and produce the logical model first, even on a whiteboard.

The second pitfall is treating cardinality as obvious without confirming it with the business. A reviewer asks "can one customer have multiple addresses?" and the team says "of course, 1:N" — but the legal system says one customer has exactly one billing address. The right model has a 1:1 link to billing address plus a 1:N link to shipping addresses. Cardinality questions are where SA work creates real value.

The third pitfall is picking the wrong primary key. Natural keys — email, phone_number, tax_id — feel meaningful but break the moment a customer changes their email. Always default to surrogate keys (id BIGSERIAL or UUID), and put a UNIQUE constraint on the natural key separately. The interviewer wants the phrase "surrogate key plus unique constraint on the business key".

The fourth pitfall is ignoring audit columns until production. Every important table should have created_at, updated_at, and often deleted_at from day one. Retrofitting these across a hundred tables when compliance asks for an audit trail is one of the most expensive migrations in any company's history.

The fifth pitfall is over-engineering for scale you don't have. A startup that partitions every table on day one because "we'll have a billion rows soon" spends six months on infrastructure and never gets to product-market fit. Build for 10x your current size, refactor when you hit 100x.

If you want to drill SA modeling questions every day, NAILDD is launching with hundreds of interview problems — conceptual sketching, ER review, and normal-form spotting under interview pressure.

FAQ

Who actually owns the physical model, the SA or the DBA?

In most modern teams, the SA owns the conceptual and logical models and contributes strongly to the physical one, while the DBA (or backend engineer on smaller teams) owns the final DDL, indexing strategy, and performance tuning. The interview answer the panel wants is a clean handoff: SA produces a logical model the engineer can implement without rework, then both sides review the physical schema together.

How deep into normal forms should I go in an SA interview?

You should be fluent at 1NF, 2NF, and 3NF, and able to mention BCNF with a one-sentence explanation of how it tightens 3NF. 4NF, 5NF, and DKNF are name-drops, not deep dives — knowing they exist and govern multi-valued and join dependencies is enough. If the interviewer pushes past that, they're testing your honesty (say "I'd look it up before designing against it") or the role is genuinely a database research position.

Conceptual versus logical — is there a real difference, or is it just academic?

Real difference, and skipping the conceptual step causes most scope arguments mid-project. The conceptual model is the contract with the business: which nouns exist, which relationships matter, what's in scope. The logical model is the contract with the engineers: what attributes those nouns carry, what keys link them, what shape the data takes. Mix them and you get a diagram too detailed for the PM and too vague for the developer.

When do I introduce a junction table for many-to-many?

As soon as you spot M:N in the conceptual model. The junction table — Enrollment between Student and Course, OrderItem between Order and Product — is not just plumbing. It is the natural home for attributes of the relationship itself: grade, quantity, discount, enrolled_at. Beginners try to stuff those attributes onto one of the two sides and end up with awkward arrays.

How do I handle slowly changing dimensions in a model?

In OLTP systems, you usually don't — you update the row and live with the loss of history. In a warehouse, you use SCD Type 2: keep the old row, mark it inactive with valid_to, insert a new row with valid_from. The interviewer wants to hear that you know SCD Type 1 (overwrite), Type 2 (history), and Type 3 (one prior value) exist, and that you pick based on whether the business needs to ask "what was the customer's country last March?".