Acceptance criteria: Given/When/Then for SA

Why interviewers ask about AC

Acceptance criteria are the single artefact that decides whether a feature is "done". Engineers build to them, QA tests against them, and the product owner accepts work using them. That is why interviewers for SA roles at companies like Stripe, Atlassian, Notion, and Linear open with "write the AC for a password reset flow" or "what is the difference between AC and Definition of Done".

The bad version of this skill looks like a Jira ticket that says "should work correctly on all browsers". The engineer guesses, QA cannot write a test plan, and the demo turns into a debate. The good version reads like a contract: predictable inputs, an explicit action, and a measurable outcome — written in a format that survives both the sprint planning meeting and the BDD test runner.

Load-bearing rule: an acceptance criterion is either pass or fail. If you cannot point at the system and say "this passes" or "this fails", it is not an AC — it is a wish.

What acceptance criteria are

Acceptance criteria are the conditions a feature must satisfy to be accepted by the business. They are objective, testable, and free of ambiguous adjectives. The INVEST mnemonic (Independent, Negotiable, Valuable, Estimable, Small, Testable) applies to the user story; the AC are what make the story satisfy the "Testable" letter.

Good AC share five properties.
- Concreteness: "button is visible" is weak; "Submit button is rendered in the header on viewports >= 1024px" is testable.
- Testability: there is exactly one way to evaluate the criterion.
- Solution-agnostic: AC describe what must happen, not how; "use Redis for caching" is an implementation detail.
- Bounded scope: one user story carries its own AC, and criteria do not leak across stories.
- Explicit edge cases: happy path and error paths are both listed.

It also helps to know what AC are not. A user story is the value framing. A use case is the formal interaction description, often with a UML diagram. AC sit between them: the contract that turns the story into a shippable increment.

| Artefact | Owner | Purpose | Granularity |
| --- | --- | --- | --- |
| User story | PM / SA | Express user value | One story per outcome |
| Use case | SA | Describe interaction flow | One per actor-goal pair |
| Acceptance criteria | SA / QA | Define "done" for the story | 3-10 per story |
| Definition of Done | Team | Quality bar across all stories | Team-wide, static |

The Given/When/Then format (Gherkin)

Given/When/Then is the most structured format and the one BDD frameworks like Cucumber, behave, and pytest-bdd parse directly. The shape is always the same: a precondition, a triggering action, and an expected outcome.

Feature: User login

  Scenario: Successful login
    Given a user is registered with email "user@example.com" and password "secret123"
      And the user is on the page "/login"
    When the user enters email "user@example.com"
      And the user enters password "secret123"
      And the user clicks "Sign in"
    Then the user lands on "/dashboard"
      And the header shows the user's display name
      And localStorage contains the key "auth_token"

The same format scales to negative paths without changing structure, which is exactly why it survives regression suites.

  Scenario: Wrong password
    Given a user is registered with email "user@example.com"
      And the user is on the page "/login"
    When the user enters email "user@example.com"
      And the user enters password "wrong"
      And the user clicks "Sign in"
    Then an error message "Incorrect email or password" is displayed
      And the user remains on "/login"

  Scenario: Account lockout after repeated failures
    Given a user is registered with email "user@example.com"
      And the user has entered a wrong password 4 times in the last 10 minutes
    When the user enters a wrong password again
    Then the account is locked for 15 minutes
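
The lockout rule stated above (5 failed attempts in 10 minutes, then a 15-minute lock) is a good example of an AC that maps directly to a pass/fail check. A minimal sketch of that rule, assuming a sliding window and a hypothetical `LoginThrottle` helper that is not part of any real framework:

```python
from datetime import datetime, timedelta

MAX_FAILURES = 5                        # failed attempts that trigger a lock
FAILURE_WINDOW = timedelta(minutes=10)  # window in which failures are counted
LOCK_DURATION = timedelta(minutes=15)   # how long the account stays locked


class LoginThrottle:
    """Tracks failed logins for one account (hypothetical helper)."""

    def __init__(self):
        self.failures = []        # timestamps of recent failed attempts
        self.locked_until = None  # None while the account is not locked

    def is_locked(self, now):
        return self.locked_until is not None and now < self.locked_until

    def record_failure(self, now):
        # Keep only failures that still fall inside the sliding window.
        self.failures = [t for t in self.failures if now - t < FAILURE_WINDOW]
        self.failures.append(now)
        if len(self.failures) >= MAX_FAILURES:
            self.locked_until = now + LOCK_DURATION
            self.failures.clear()


# Five failures, one per minute, starting at 12:00.
start = datetime(2024, 1, 1, 12, 0)
throttle = LoginThrottle()
for minute in range(5):
    throttle.record_failure(start + timedelta(minutes=minute))
locked_at = start + timedelta(minutes=4)  # time of the fifth failure
```

Whether the window is sliding or fixed, and whether the lock is released at exactly 15 minutes, are assumptions of this sketch; a real AC should state them.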

The advantages are real: it reads cleanly for stakeholders, leaves no ambiguity for engineering or QA, and feeds directly into automated test runners. The trade-off is verbosity. Use Gherkin where regression coverage matters; reach for lighter formats when the writing cost exceeds the automation value.
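
To see why the fixed precondition/action/outcome shape is easy for tools to consume, here is a minimal sketch that groups the steps of one scenario by phase. It is a toy written for illustration, not how Cucumber, behave, or pytest-bdd work internally; `parse_scenario` and `is_complete` are hypothetical names.

```python
def parse_scenario(text):
    """Group the steps of one Given/When/Then scenario by phase."""
    phases = {"given": [], "when": [], "then": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Scenario:"):
            continue
        keyword, _, rest = line.partition(" ")
        key = keyword.lower()
        if key in phases:
            current = key  # Given / When / Then opens a new phase
        elif key not in ("and", "but") or current is None:
            raise ValueError(f"unexpected line: {line!r}")
        # And / But steps belong to whichever phase is currently open.
        phases[current].append(rest)
    return phases


def is_complete(phases):
    """A scenario needs at least one step in each of the three phases."""
    return all(phases[name] for name in ("given", "when", "then"))


scenario = """
Scenario: Wrong password
  Given a user is registered with email "user@example.com"
    And the user is on the page "/login"
  When the user enters password "wrong"
    And the user clicks "Sign in"
  Then an error message "Incorrect email or password" is displayed
    And the user remains on "/login"
"""
steps = parse_scenario(scenario)
```

A check like `is_complete` is also a cheap way to catch a scenario that was written without a Given clause.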

The checklist format

For small, mostly-visual user stories, a checklist beats Gherkin on signal-to-noise. The trade-off is that the precondition is implicit, so checklists work best when the state is obvious from the story title.

US-15: Shopper sees cart subtotal

Acceptance Criteria:
- [ ] Subtotal renders in the top-right corner of the cart drawer
- [ ] Subtotal recalculates when item quantity changes
- [ ] Subtotal includes sales tax with the label "incl. tax"
- [ ] Empty cart shows "$0.00"
- [ ] Amounts use a thousands separator (e.g. "$1,234.00")
- [ ] Subtotals above $999,999.99 render without rounding
- [ ] Subtotal carries an aria-label attribute for screen readers
- [ ] Subtotal updates within 100 ms of any quantity change on broadband

Checklists work for UI polish, copy changes, and incremental tweaks. They fall down whenever the precondition matters — a checkout flow where the user might be guest, logged in, or mid-coupon. In those cases Given/When/Then makes the state explicit and the checklist hides it.
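
Several of the checklist items above are formatting rules that can be verified mechanically. A minimal sketch, assuming the subtotal is held as an integer number of cents (an assumption of this example, chosen so that large amounts are never rounded by floating-point arithmetic); `format_subtotal` is a hypothetical name:

```python
def format_subtotal(cents):
    """Render a cart subtotal such as "$1,234.00" from integer cents."""
    if cents < 0:
        raise ValueError("subtotal cannot be negative")
    dollars, remainder = divmod(cents, 100)
    # "," in the format spec inserts the thousands separator.
    return f"${dollars:,}.{remainder:02d}"


empty_cart = format_subtotal(0)
typical = format_subtotal(123_400)
large = format_subtotal(123_456_789)
```

Each checklist line then becomes one assertion in the test plan, which is what "exactly one way to evaluate the criterion" looks like in practice.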

The scenario format

The scenario format reads like a use case trimmed for sprint work. It enumerates the primary path and the alternates, without the formality of UML.

US-22: Password reset

Primary scenario:
1. User clicks "Forgot password?" on /login
2. User sees a form with one email field
3. User enters their email and clicks "Send reset link"
4. User sees the message "If this email is registered, we sent a link"
5. User receives an email within 60 seconds
6. User opens the link (valid for 24 hours, single-use)
7. User sees the new-password form
8. User sets a new password (rules per US-7)
9. User is logged in automatically and redirected to /dashboard

Alternate paths:
3a. Email is not registered → same generic message (do not leak account existence)
6a. Link expired → "This link has expired, request a new one"
6b. Link already used → "This link has been used"
7a. New password fails strength rules → inline validation, submit disabled

Scenario format fits flows with non-trivial branching but no automated-regression need — onboarding, recovery, admin tooling. It doubles as the QA test plan with little rework.
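
Alternate paths 6a and 6b above translate into a small decision function. The sketch below assumes the link records when it was issued and whether it has been used; `check_reset_link` is a hypothetical name, and the order of the two checks (used before expired) is an assumption the scenario itself does not settle.

```python
from datetime import datetime, timedelta

LINK_LIFETIME = timedelta(hours=24)  # "valid for 24 hours" from step 6


def check_reset_link(issued_at, used, now):
    """Return the error message for an invalid link, or None if it is valid."""
    if used:
        return "This link has been used"                   # path 6b
    if now - issued_at > LINK_LIFETIME:
        return "This link has expired, request a new one"  # path 6a
    return None


issued = datetime(2024, 1, 1, 9, 0)
fresh = check_reset_link(issued, used=False, now=issued + timedelta(hours=1))
expired = check_reset_link(issued, used=False, now=issued + timedelta(hours=25))
reused = check_reset_link(issued, used=True, now=issued + timedelta(hours=1))
```

Writing the branches out like this tends to surface the questions a reviewer would ask, such as which message wins when a link is both used and expired.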

Definition of Ready vs Definition of Done

These two get conflated with AC in almost every interview, so know the difference cold.

Definition of Ready (DoR) is the gate a story must pass before the team commits to it. Typical DoR items: user value articulated, AC written and reviewed, design attached, technical unknowns spiked, story estimated, external dependencies flagged. DoR keeps the sprint from absorbing half-written work.

Definition of Done (DoD) is the gate a story must pass before it ships. Typical DoD items: code merged after review, unit and integration tests per the team's coverage policy, AC verified by QA, docs updated, monitoring added, build deployed at least to staging. DoD is team-wide and static.

AC are story-specific. DoD is team-wide. "Code review is complete" belongs in DoD. "Subtotal renders in the cart" belongs in AC. Interviewers love this distinction because candidates routinely mix them.

| Concept | Scope | Changes per story? | Owned by |
| --- | --- | --- | --- |
| Acceptance Criteria | One story | Yes | SA / PM |
| Definition of Ready | Backlog gate | No (team-wide) | Team |
| Definition of Done | Release gate | No (team-wide) | Team |

AC for non-functional requirements

Performance, security, accessibility, and browser support are not optional — and they need AC too. The trick is to use measurable thresholds, not adjectives.

Performance
- p95 response time for GET /api/orders is under 300 ms at 100 RPS sustained
- First Contentful Paint on /products is under 2.0 s on simulated 4G
- API payloads above 1 MB are served gzipped
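
A p95 criterion is only testable if everyone computes the percentile the same way. The sketch below uses the nearest-rank method, which is one common definition and an assumption here (load-testing tools may interpolate instead); `p95` and `meets_latency_ac` are hypothetical names.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_ac(samples_ms, threshold_ms=300):
    """True when p95 is strictly under the threshold named in the AC."""
    return p95(samples_ms) < threshold_ms


# 95 fast requests and 5 slow ones: the slow tail sits above the 95th percentile.
mostly_fast = [100] * 95 + [1000] * 5
# One more slow request pushes the 95th percentile into the slow tail.
too_slow = [100] * 94 + [1000] * 6
```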

Security
- Passwords are hashed with bcrypt at cost factor >= 12
- API requires Bearer JWT signed with RS256; refresh tokens rotate on use
- Application logs never contain plaintext passwords, full card PANs, or session tokens
- All POST endpoints reject requests missing a valid CSRF token
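
The bcrypt criterion can be checked without knowing any password, because the cost factor is stored in the hash string itself. A minimal sketch, assuming hashes are stored in the usual modular crypt format (`$2b$<cost>$` followed by 53 characters); `bcrypt_cost` and `meets_cost_ac` are hypothetical names, and the sample string is a placeholder with the right shape, not a real hash.

```python
import re

# Prefix, two-digit cost, then 22 salt characters + 31 hash characters.
BCRYPT_PATTERN = re.compile(r"^\$2[abxy]?\$(\d{2})\$[./A-Za-z0-9]{53}$")


def bcrypt_cost(hash_string):
    """Extract the cost factor from a bcrypt hash string."""
    match = BCRYPT_PATTERN.match(hash_string)
    if match is None:
        raise ValueError("not a bcrypt hash in modular crypt format")
    return int(match.group(1))


def meets_cost_ac(hash_string, minimum=12):
    """True when the stored hash satisfies 'cost factor >= 12'."""
    return bcrypt_cost(hash_string) >= minimum


# Placeholder values: correct shape only, not output of a real bcrypt call.
strong_sample = "$2b$12$" + "a" * 53
weak_sample = "$2b$10$" + "a" * 53
```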

Accessibility
- Text contrast is at least 4.5:1 against its background (WCAG 2.2 AA)
- Every interactive control is reachable via Tab and operable via Enter or Space
- Every <img> has either an alt attribute or role="presentation"
- Form errors are announced via aria-live="polite"
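
The 4.5:1 contrast threshold is computable from two colour values, which is what makes it a usable AC. The sketch below follows the relative-luminance and contrast-ratio formulas as published in WCAG 2.x; the function names are hypothetical, and it assumes six-digit hex colours in sRGB.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a colour given as '#rrggbb'."""
    hex_colour = hex_colour.lstrip("#")
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colours, always >= 1."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_contrast_ac(foreground, background, minimum=4.5):
    return contrast_ratio(foreground, background) >= minimum


grey_passes = meets_contrast_ac("#767676", "#ffffff")
grey_fails = meets_contrast_ac("#777777", "#ffffff")
```

The two greys differ by a single step per channel, yet one passes and one fails, which is exactly the kind of boundary an adjective like "readable" cannot express.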

Browser support
- Latest two stable versions of Chrome, Firefox, Safari, and Edge
- Graceful degradation (no JS crash) on iOS Safari 14+ and Chrome on Android 10+

For non-functional AC, thresholds matter more than prose: **p95 < 300 ms at 100 RPS** and **WCAG 2.2 AA 4.5:1** are the numbers an interviewer wants to hear.

Common pitfalls

Vague phrasing is the most common failure mode. "Should be fast", "should work correctly", and "should feel intuitive" are aspirations, not criteria. Replace each with a measurable threshold or a deterministic outcome — response time under 300 ms, a specific error message string, a defined keyboard interaction. Adjectives without numbers signal a spec that was not finished.

Describing the implementation instead of the behaviour is the second trap. "Use Redis for caching" or "store sessions in DynamoDB" are engineering choices, not acceptance criteria. The AC should say "repeat requests to /api/profile return in under 50 ms" and let the team pick the data store. Slipping implementation details in also blocks future refactors that would otherwise leave behaviour intact.

Writing only the happy path is where most candidates lose interview points. A login flow without bad-password handling, an upload flow without a file-size cap, or a payment flow without a duplicate-submission guard is not done — it is a demo. Edge cases should appear in every spec: empty inputs, network failure, rate limits, validation errors, and timeouts. For every happy-path criterion there is usually at least one error-path criterion.

Missing preconditions ruin Given/When/Then specs in particular. "When the user clicks Pay" is ambiguous if the cart could be empty, the address could be missing, or the payment method could be expired. The "Given" clause is doing real work — skipping it forces engineers to guess the starting state.

Packing multiple features into one AC set is another scope failure. If a single story lists fifteen criteria covering "cart, checkout, and order history", split it. INVEST reminds you that stories should be Small and Independent; the AC count is your warning light.

Using Given/When/Then for everything is the opposite extreme. An API contract with twelve error codes is faster to convey as a JSON examples table than as twelve Gherkin scenarios. Pick the format that gives the most signal per line for the audience that will read it.

Confusing AC with DoD is the interview trap that catches candidates who otherwise know their craft. "Code review is complete" is a DoD item; "the system rejects a coupon after its expiry date" is an AC. If a criterion would apply to literally any story your team ships, it belongs in DoD.

Gotcha: if your AC contains the words "user-friendly", "intuitive", "modern", or "robust", rewrite it. None of those words are testable, and an interviewer will pull on that thread.
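
The gotcha above is simple enough to automate as a review aid. A minimal sketch that flags the four words named there; `find_vague_terms` is a hypothetical helper, and the word list is only the one from this section, not an exhaustive vocabulary.

```python
import re

VAGUE_TERMS = ("user-friendly", "intuitive", "modern", "robust")


def find_vague_terms(criterion):
    """Return the untestable words found in one acceptance criterion."""
    lowered = criterion.lower()
    return [
        term
        for term in VAGUE_TERMS
        # Word boundaries keep "modern" from matching inside "modernised".
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


flagged = find_vague_terms("The dashboard should feel modern and intuitive")
clean = find_vague_terms("p95 response time for GET /api/orders is under 300 ms")
```

A check like this catches the wording, not the underlying problem: a flagged criterion still needs a human to replace the adjective with a threshold or a deterministic outcome.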

If you want to drill SA scenarios daily, NAILDD is launching with 500+ interview problems across AC, API design, data modelling, and integration patterns.

FAQ

How many acceptance criteria should one user story have?

Three to ten is the comfortable range. Below three usually means either the story is trivial or you have missed edge cases, so sanity-check the error paths. Above ten usually means the story is too big and should be split along its natural seams. The AC count is a rough proxy for whether the scope is right.

Who writes the acceptance criteria?

In most teams the SA or Product Manager drafts the AC, then engineering and QA review them before the story is accepted into a sprint. The SA owns the wording, the engineer challenges feasibility, the QA engineer challenges testability — that three-way review is the cheapest defect prevention there is.

Can acceptance criteria change after work starts?

They can, but treat each change as a signal that the spec was not ready. Mature teams use Definition of Ready precisely to prevent this. When a change is unavoidable mid-flight, route it through a lightweight change request: update the AC in writing, re-estimate, and either re-scope the sprint or push the change to the next one.

Do bug fixes need acceptance criteria?

Yes, just lighter ones. A bug AC has three pieces: reproduction steps, expected behaviour, and current behaviour. That trio is acceptance criteria in everything but name, and it is what QA will use to verify the fix. "The bug is gone" is not testable; "after entering an invalid coupon the checkout button is re-enabled within 200 ms" is.

How is Gherkin different from plain Given/When/Then prose?

Gherkin is the formalised syntax used by BDD frameworks (Cucumber, behave, pytest-bdd), with reserved keywords such as Feature, Scenario, Given, When, Then, And, and But. Files with the .feature extension are parsed and bound to executable tests through step definitions. Plain Given/When/Then prose has the same shape but is not machine-parsed.