Acceptance criteria: Given/When/Then for SA
Why interviewers ask about AC
Acceptance criteria are the single artefact that decides whether a feature is "done". Engineers build to them, QA tests against them, and the product owner accepts work using them. That is why interviewers for systems analyst (SA) roles at companies like Stripe, Atlassian, Notion, and Linear often open with "write the AC for a password reset flow" or "what is the difference between AC and Definition of Done".
The bad version of this skill looks like a Jira ticket that says "should work correctly on all browsers". The engineer guesses, QA cannot write a test plan, and the demo turns into a debate. The good version reads like a contract: predictable inputs, an explicit action, and a measurable outcome — written in a format that survives both the sprint planning meeting and the BDD test runner.
Load-bearing rule: an acceptance criterion is either pass or fail. If you cannot point at the system and say "this passes" or "this fails", it is not an AC — it is a wish.
What acceptance criteria are
Acceptance criteria are the conditions a feature must satisfy to be accepted by the business. They are objective, testable, and free of ambiguous adjectives. The INVEST mnemonic (Independent, Negotiable, Valuable, Estimable, Small, Testable) applies to the user story; the AC are what make the story Testable.
Good AC share five properties. Concreteness — "button is visible" is weak; "Submit button is rendered in the header on viewports >= 1024px" is testable. Testability — there is exactly one way to evaluate the criterion. Solution-agnostic — AC describe what must happen, not how; "use Redis for caching" is an implementation detail. Bounded scope — one user story carries its own AC, criteria do not leak across stories. Explicit edge cases — happy path and error paths are both listed.
It also helps to know what AC are not. A user story is the value framing. A use case is the formal interaction description, often with a UML diagram. AC sit between them: the contract that turns the story into a shippable increment.
| Artefact | Owner | Purpose | Granularity |
|---|---|---|---|
| User story | PM / SA | Express user value | One story per outcome |
| Use case | SA | Describe interaction flow | One per actor-goal pair |
| Acceptance criteria | SA / PM (reviewed by QA) | Define "done" for the story | 3-10 per story |
| Definition of Done | Team | Quality bar across all stories | Team-wide, static |
The Given/When/Then format (Gherkin)
Given/When/Then is the most structured format and the one BDD frameworks like Cucumber, behave, and pytest-bdd parse directly. The shape is always the same: a precondition, a triggering action, and an expected outcome.
Feature: User login
Scenario: Successful login
Given a user is registered with email "user@example.com" and password "secret123"
And the user is on the page "/login"
When the user enters email "user@example.com"
And the user enters password "secret123"
And the user clicks "Sign in"
Then the user lands on "/dashboard"
And the header shows the user's display name
And localStorage contains the key "auth_token"
The same format scales to negative paths without changing structure, which is exactly why it survives regression suites.
Scenario: Wrong password
Given a user is registered with email "user@example.com"
And the user is on the page "/login"
When the user enters email "user@example.com"
And the user enters password "wrong"
And the user clicks "Sign in"
Then an error message "Incorrect email or password" is displayed
And the user remains on "/login"
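Scenario: Account lockout after repeated failures
Given a user is registered with email "user@example.com"
And the user has entered a wrong password 4 times in the last 10 minutes
When the user submits a wrong password a 5th time
Then the account is locked for 15 minutes
The advantages are real: it reads cleanly for stakeholders, leaves little ambiguity for engineering or QA, and feeds directly into automated test runners. The trade-off is verbosity. Use Gherkin where regression coverage matters; reach for lighter formats when the writing cost exceeds the automation value.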
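The lockout rule in the wrong-password example (5 failed attempts in 10 minutes, then a 15-minute lock) is precise enough to check in code. What follows is a minimal sketch under stated assumptions: `LockoutPolicy` and its method names are hypothetical, state lives in memory, and the caller passes the current time in so the behaviour is deterministic.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

MAX_FAILURES = 5                          # from the AC: 5 failed attempts
FAILURE_WINDOW = timedelta(minutes=10)    # counted within 10 minutes
LOCK_DURATION = timedelta(minutes=15)     # lock lasts 15 minutes


class LockoutPolicy:
    """Hypothetical in-memory tracker; a real system would persist this state."""

    def __init__(self):
        self._failures = defaultdict(deque)  # email -> timestamps of recent failures
        self._locked_until = {}              # email -> moment the lock expires

    def is_locked(self, email, now):
        until = self._locked_until.get(email)
        return until is not None and now < until

    def record_failure(self, email, now):
        recent = self._failures[email]
        recent.append(now)
        # Drop failures that have fallen out of the 10-minute window.
        while recent and now - recent[0] > FAILURE_WINDOW:
            recent.popleft()
        if len(recent) >= MAX_FAILURES:
            self._locked_until[email] = now + LOCK_DURATION
            recent.clear()


policy = LockoutPolicy()
start = datetime(2024, 1, 1, 12, 0)
for minute in range(5):  # five failures, one per minute
    policy.record_failure("user@example.com", start + timedelta(minutes=minute))

locked_right_after = policy.is_locked("user@example.com", start + timedelta(minutes=5))
locked_after_expiry = policy.is_locked("user@example.com", start + timedelta(minutes=20))
```

Each constant maps to one number in the criterion, so a change to the AC is a one-line change in the check.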
The checklist format
For small, mostly-visual user stories, a checklist beats Gherkin on signal-to-noise. The trade-off is that the precondition is implicit, so checklists work best when the state is obvious from the story title.
US-15: Shopper sees cart subtotal
Acceptance Criteria:
- [ ] Subtotal renders in the top-right corner of the cart drawer
- [ ] Subtotal recalculates when item quantity changes
- [ ] Subtotal includes sales tax with the label "incl. tax"
- [ ] Empty cart shows "$0.00"
- [ ] Amounts use a thousands separator (e.g. "$1,234.00")
- [ ] Subtotals above $999,999.99 render without rounding
- [ ] Subtotal carries an aria-label attribute for screen readers
- [ ] Subtotal updates within 100 ms of any quantity change on broadband
Checklists work for UI polish, copy changes, and incremental tweaks. They fall down whenever the precondition matters, such as a checkout flow where the user might be a guest, logged in, or mid-coupon. In those cases Given/When/Then makes the state explicit and the checklist hides it.
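Several of the checklist items above are concrete enough to verify mechanically. As a rough sketch, the formatting criteria (empty cart, thousands separator, large amounts without rounding) could be checked like this. The function name `format_subtotal` is an assumption for illustration, and amounts are assumed to arrive as `Decimal` values already at cent precision.

```python
from decimal import Decimal


def format_subtotal(amount: Decimal) -> str:
    """Render a USD amount with a thousands separator and two decimal places."""
    return f"${amount:,.2f}"


# One check per checklist item that concerns formatting.
empty_cart = format_subtotal(Decimal("0"))
with_separator = format_subtotal(Decimal("1234"))
large_amount = format_subtotal(Decimal("1234567.89"))
```

Using `Decimal` rather than `float` is a design choice here: it keeps large amounts exact, which is what the "render without rounding" item is guarding against.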
The scenario format
The scenario format reads like a use case trimmed for sprint work. It enumerates the primary path and the alternates, without the formality of UML.
US-22: Password reset
Primary scenario:
1. User clicks "Forgot password?" on /login
2. User sees a form with one email field
3. User enters their email and clicks "Send reset link"
4. User sees the message "If this email is registered, we sent a link"
5. User receives an email within 60 seconds
6. User opens the link (valid for 24 hours, single-use)
7. User sees the new-password form
8. User sets a new password (rules per US-7)
9. User is logged in automatically and redirected to /dashboard
Alternate paths:
3a. Email is not registered → same generic message (do not leak account existence)
6a. Link expired → "This link has expired, request a new one"
6b. Link already used → "This link has been used"
7a. New password fails strength rules → inline validation, submit disabled
Scenario format fits flows with non-trivial branching but no automated-regression need: onboarding, recovery, admin tooling. It doubles as the QA test plan with little rework.
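Step 6 and alternate paths 6a and 6b of the password-reset scenario can be read as a small state machine. The sketch below is illustrative only: `ResetTokenStore` is a hypothetical in-memory store, the message for an unknown token is assumed because the AC do not cover that case, and marking the link as used at the moment it is opened is one possible reading of "single-use".

```python
import secrets
from datetime import datetime, timedelta

LINK_TTL = timedelta(hours=24)  # step 6: link is valid for 24 hours


class ResetTokenStore:
    """Hypothetical in-memory store for reset links."""

    def __init__(self):
        self._tokens = {}  # token -> {"email", "issued_at", "used"}

    def issue(self, email, now):
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {"email": email, "issued_at": now, "used": False}
        return token

    def redeem(self, token, now):
        """Return (ok, message) for a link click."""
        record = self._tokens.get(token)
        if record is None:
            return False, "This link is not valid"  # assumed wording, not in the AC
        if record["used"]:
            return False, "This link has been used"  # path 6b
        if now - record["issued_at"] > LINK_TTL:
            return False, "This link has expired, request a new one"  # path 6a
        record["used"] = True  # single-use: consume on first successful open
        return True, "Show the new-password form"


store = ResetTokenStore()
t0 = datetime(2024, 1, 1, 9, 0)

token = store.issue("user@example.com", t0)
first_click = store.redeem(token, t0 + timedelta(hours=1))
second_click = store.redeem(token, t0 + timedelta(hours=2))

stale = store.issue("user@example.com", t0)
late_click = store.redeem(stale, t0 + timedelta(hours=25))
```

Writing the flow out this way tends to surface gaps in the spec, such as what a tampered link should show, or whether a link stays usable if the new password fails validation in step 7a.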
Definition of Ready vs Definition of Done
These two get conflated with AC in almost every interview, so know the difference cold.
Definition of Ready (DoR) is the gate a story must pass before the team commits to it. Typical DoR items: user value articulated, AC written and reviewed, design attached, technical unknowns spiked, story estimated, external dependencies flagged. DoR keeps the sprint from absorbing half-written work.
Definition of Done (DoD) is the gate a story must pass before it ships. Typical DoD items: code merged after review, unit and integration tests passing per the team's coverage policy, AC verified by QA, docs updated, monitoring added, build deployed at least to staging. DoD is team-wide and static.
AC are story-specific. DoD is team-wide. "Code review is complete" belongs in DoD. "Subtotal renders in the cart" belongs in AC. Interviewers love this distinction because candidates routinely mix them.
| Concept | Scope | Changes per story? | Owned by |
|---|---|---|---|
| Acceptance Criteria | One story | Yes | SA / PM |
| Definition of Ready | Backlog gate | No (team-wide) | Team |
| Definition of Done | Release gate | No (team-wide) | Team |
AC for non-functional requirements
Performance, security, accessibility, and browser support are not optional — and they need AC too. The trick is to use measurable thresholds, not adjectives.
Performance
- p95 response time for GET /api/orders is under 300 ms at 100 RPS sustained
- First Contentful Paint on /products is under 2.0 s on simulated 4G
- API payloads above 1 MB are served gzipped
Security
- Passwords are hashed with bcrypt at cost factor >= 12
- API requires Bearer JWT signed with RS256; refresh tokens rotate on use
- Application logs never contain plaintext passwords, full card PANs, or session tokens
- All POST endpoints reject requests missing a valid CSRF token
Accessibility
- Normal-size text contrast is at least 4.5:1 against its background (WCAG 2.2 AA; large text needs 3:1)
- Every interactive control is reachable via Tab and operable via Enter or Space
- Every <img> has either an alt attribute or role="presentation"
- Form errors are announced via aria-live="polite"
Browser support
- Latest two stable versions of Chrome, Firefox, Safari, and Edge
- Graceful degradation (no JS crash) on iOS Safari 14+ and Chrome on Android 10+
For non-functional AC, thresholds matter more than prose: **p95 < 300 ms at 100 RPS** and **WCAG 2.2 AA 4.5:1** are the numbers an interviewer wants to hear.
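Both headline thresholds can be evaluated mechanically, which is what makes them usable as AC. The sketch below assumes a nearest-rank definition of p95 (teams and monitoring tools may use a different interpolation) and uses the WCAG 2.x relative-luminance formula for contrast. The function names and sample latencies are illustrative, not taken from any real system.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile: smallest sample with >= 95% of values at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Made-up latency samples: 95 fast requests and 5 slow ones.
latencies = [120] * 95 + [450] * 5
performance_ok = p95(latencies) < 300

WHITE = (255, 255, 255)
grey_passes = contrast_ratio((118, 118, 118), WHITE) >= 4.5   # #767676 on white
grey_fails = contrast_ratio((119, 119, 119), WHITE) >= 4.5    # #777777 on white
```

The two greys differ by a single step per channel and land on opposite sides of 4.5:1. That kind of boundary is exactly what an adjective like "readable" cannot express.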
Common pitfalls
Vague phrasing is the most common failure mode. "Should be fast", "should work correctly", and "should feel intuitive" are aspirations, not criteria. Replace each with a measurable threshold or a deterministic outcome — response time under 300 ms, a specific error message string, a defined keyboard interaction. Adjectives without numbers signal a spec that was not finished.
Describing the implementation instead of the behaviour is the second trap. "Use Redis for caching" or "store sessions in DynamoDB" are engineering choices, not acceptance criteria. The AC should say "repeat requests to /api/profile return in under 50 ms" and let the team pick the data store. Slipping implementation details in also blocks future refactors that would otherwise leave behaviour intact.
Writing only the happy path is where most candidates lose interview points. A login flow without bad-password handling, an upload flow without a file-size cap, or a payment flow without a duplicate-submission guard is not done — it is a demo. Edge cases should appear in every spec: empty inputs, network failure, rate limits, validation errors, and timeouts. For every happy-path criterion there is usually at least one error-path criterion.
Missing preconditions ruin Given/When/Then specs in particular. "When the user clicks Pay" is ambiguous if the cart could be empty, the address could be missing, or the payment method could be expired. The "Given" clause is doing real work — skipping it forces engineers to guess the starting state.
Packing multiple features into one AC set is another scope failure. If a single story lists fifteen criteria covering "cart, checkout, and order history", split it. INVEST reminds you that stories should be Small and Independent; the AC count is your warning light.
Using Given/When/Then for everything is the opposite extreme. An API contract with twelve error codes is faster to convey as a JSON examples table than as twelve Gherkin scenarios. Pick the format that gives the most signal per line for the audience that will read it.
Confusing AC with DoD is the interview trap that catches candidates who otherwise know their craft. "Code review is complete" is a DoD item; "the system rejects a coupon after its expiry date" is an AC. If a criterion would apply to literally any story your team ships, it belongs in DoD.
Gotcha: if your AC contains the words "user-friendly", "intuitive", "modern", or "robust", rewrite it. None of those words are testable, and an interviewer will pull on that thread.
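A crude lint pass can catch these words before a reviewer does. This is a minimal sketch: the term list combines the words named in the gotcha with "fast" and "correctly" from the vague-phrasing pitfall, and `find_vague_terms` is a hypothetical helper, not part of any tool.

```python
import re

VAGUE_TERMS = ("user-friendly", "intuitive", "modern", "robust", "fast", "correctly")


def find_vague_terms(criterion):
    """Return the vague terms that appear as whole words in one criterion."""
    lowered = criterion.lower()
    return [
        term
        for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


weak = find_vague_terms("The checkout page should be fast and intuitive")
strong = find_vague_terms(
    "p95 response time for GET /api/orders is under 300 ms at 100 RPS sustained"
)
```

A word list only flags candidates. A criterion can avoid every listed word and still be untestable, so the pass/fail question remains the real check.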
FAQ
How many acceptance criteria should one user story have?
Three to ten is the comfortable range. Below three usually means either the story is trivial or you have missed edge cases, so sanity-check the error paths. Above ten usually means the story is too big and should be split along the natural seams. The AC count is a rough proxy for whether the scope is right.
Who writes the acceptance criteria?
In most teams the SA or Product Manager drafts the AC, then engineering and QA review them before the story is accepted into a sprint. The SA owns the wording, the engineer challenges feasibility, the QA engineer challenges testability — that three-way review is the cheapest defect prevention there is.
Can acceptance criteria change after work starts?
They can, but treat each change as a signal that the spec was not ready. Mature teams use Definition of Ready precisely to prevent this. When a change is unavoidable mid-flight, route it through a lightweight change request: update the AC in writing, re-estimate, and either re-scope the sprint or push the change to the next one.
Do bug fixes need acceptance criteria?
Yes, just lighter ones. A bug AC has three pieces: reproduction steps, expected behaviour, and current behaviour. That trio is acceptance criteria in everything but name, and it is what QA will use to verify the fix. "The bug is gone" is not testable; "after entering an invalid coupon the checkout button is re-enabled within 200 ms" is.
How is Gherkin different from plain Given/When/Then prose?
Gherkin is the formalised syntax used by BDD frameworks (Cucumber, behave, pytest-bdd), with reserved keywords such as Feature, Scenario, Given, When, Then, And, and But. Scenarios live in .feature files, and the framework runs them as tests by matching each step to a step definition. Plain Given/When/Then prose has the same shape but no machine parsing.
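To make "step definitions" concrete, here is a toy runner that binds step text to Python functions with regular expressions. It is a framework-free sketch of the idea, not how Cucumber, behave, or pytest-bdd are implemented. The `step` decorator, `run_scenario`, and the simplified sign-in step are all assumptions for illustration.

```python
import re

STEP_DEFINITIONS = []  # list of (compiled pattern, function)
KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")


def step(pattern):
    """Register a function as the definition for steps matching `pattern`."""
    def register(func):
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return register


def run_scenario(text, context):
    """Run each Given/When/Then line against the registered step definitions."""
    for raw in text.strip().splitlines():
        line = raw.strip()
        keyword = next((k for k in KEYWORDS if line.startswith(k)), None)
        if keyword is None:
            continue  # skip Scenario headers and blank lines
        body = line[len(keyword):]
        for pattern, func in STEP_DEFINITIONS:
            match = pattern.fullmatch(body)
            if match:
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {body}")
    return context


@step(r'a user is registered with email "([^"]+)" and password "([^"]+)"')
def register_user(ctx, email, password):
    ctx["users"] = {email: password}


@step(r'the user signs in with email "([^"]+)" and password "([^"]+)"')
def sign_in(ctx, email, password):
    ok = ctx["users"].get(email) == password
    ctx["page"] = "/dashboard" if ok else "/login"


@step(r'the user lands on "([^"]+)"')
def assert_page(ctx, expected):
    assert ctx["page"] == expected, f"expected {expected}, got {ctx['page']}"


SCENARIO = '''
Scenario: Successful login
  Given a user is registered with email "user@example.com" and password "secret123"
  When the user signs in with email "user@example.com" and password "secret123"
  Then the user lands on "/dashboard"
'''
result = run_scenario(SCENARIO, {})
```

The same scenario written as plain prose would read identically to a stakeholder. The difference is that here a failing `Then` raises an error instead of starting a debate.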