dbt exposures for DE interviews
What exposures actually are
Most candidates walking into a Data Engineering loop can recite the dbt trio — sources, models, tests — but stumble the moment the interviewer asks "how do you track what depends on your warehouse downstream?" That is the exposures question, and it shows up at companies running dbt at any real scale: Snowflake-shop, Databricks-shop, Stripe-internal-fork, doesn't matter. The pattern is identical, and the answer signals whether you've actually shipped dbt in a team or only built toy projects.
An exposure is a declarative way to register a downstream consumer of your dbt models — a dashboard in a BI tool, a notebook used by analysts, an ML feature pipeline, an embedded application card. It is not materialized into the warehouse. It contains no SQL. It just sits in your .yml files and tells dbt: these models feed something a human cares about, and here is who owns it. That is the entire point.
The reason interviewers love this topic: it separates engineers who think of dbt as "SQL with version control" from engineers who think of dbt as the contract layer between the warehouse and everything downstream. The first group breaks dashboards on Friday afternoon and finds out from Slack. The second group runs dbt ls --resource-type exposure and pings the four affected owners before merging.
A canonical declaration looks like this:

    # models/marketing/exposures.yml
    exposures:
      - name: weekly_revenue_dashboard
        label: "Weekly Revenue - Exec"
        type: dashboard
        maturity: high
        url: https://app.metabase.example.com/dashboard/42
        description: >
          Friday morning exec readout. Driven by fct_orders
          and dim_customers. Owner is the GTM analytics lead.
        owner:
          name: Maya Chen
          email: maya@example.com
        depends_on:
          - ref('fct_orders')
          - ref('dim_customers')
          - source('stripe', 'charges')
        tags: ['exec', 'weekly', 'revenue']

Load-bearing trick: the depends_on list is what lets dbt compute lineage. If you forget it, the exposure renders in docs but contributes nothing to impact analysis, which is the whole reason exposures exist.
The YAML and the fields that matter
Five exposure types are supported as of dbt-core 1.7+: dashboard, notebook, analysis, ml, and application. The type is a tag for filtering and rendering — dashboard and application are the two that come up most in interviews because they map to the highest-blast-radius consumers.
The field set is small enough to memorize before a loop:
| Field | Required | What interviewers probe |
|---|---|---|
| name | yes | Unique within project; snake_case |
| type | yes | Why this type vs another |
| owner (name or email) | yes | Who gets paged on breakage |
| depends_on | recommended | The whole reason exposures exist |
| url | optional | Lets reviewers click through from docs |
| description | optional | What the consumer actually does |
| maturity | optional | low / medium / high; gates CI strictness by team convention |
| tags | optional | Filter for selectors: dbt build --select +tag:exec |
The maturity field is the one most candidates miss. It's not enforced by dbt — it's a convention your team picks. The common pattern: high maturity exposures must have all upstream models passing not_null and unique tests, plus a freshness check on their sources. low maturity is essentially "someone's experiment, don't page anyone." Knowing this distinction makes you sound like you've run dbt in production, not just read the docs.
One detail that trips people up: exposures live alongside models in .yml, but they are not models themselves and do not produce SQL. They are pure metadata. You can put one exposure file per domain (marketing/exposures.yml, finance/exposures.yml) or one mega-file at the project root — most teams prefer the per-domain split because it keeps ownership boundaries explicit.
Lineage benefits in impact analysis
The interview question almost always lands on impact analysis. The framing varies but the substance is identical: "You're about to drop a column from fct_orders. Walk me through how you'd figure out who breaks."
The wrong answer is "grep the BI tool" or "Slack the team." The right answer leans on lineage:

    # everything upstream of that exposure
    dbt ls --select +exposure:weekly_revenue_dashboard

    # everything downstream of fct_orders that's an exposure
    dbt ls --select fct_orders+ --resource-type exposure

The second command is the one that matters. It returns a flat list of every dashboard, ML pipeline, and application that touches fct_orders, even transitively through other models. You then pull the owner emails out of the JSON output:

    # dbt ls emits one JSON object per line, so jq reads each object directly.
    # Add --quiet if your dbt version also prints log lines to stdout.
    dbt ls --select fct_orders+ \
      --resource-type exposure \
      --output json \
      --output-keys "name owner" \
      | jq -r '"\(.name): \(.owner.email)"'

That's the answer to the impact analysis question. You ship the column drop only after you've notified those owners, given them a deprecation window (typically 1-2 sprints for high maturity, immediate for low), and confirmed they've updated their consumers.
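The same lookup can be done directly against the manifest when you want it inside a script rather than a shell pipeline. The sketch below assumes a parsed target/manifest.json with its top-level child_map and exposures keys. The sample data, the project name shop, and the function name are hypothetical.

```python
from collections import deque

# Hypothetical, trimmed-down stand-in for target/manifest.json.
# Only the keys this sketch reads are included.
manifest = {
    "child_map": {
        "model.shop.fct_orders": [
            "model.shop.rpt_revenue",
            "exposure.shop.churn_model",
        ],
        "model.shop.rpt_revenue": ["exposure.shop.weekly_revenue_dashboard"],
        "model.shop.dim_customers": ["exposure.shop.weekly_revenue_dashboard"],
    },
    "exposures": {
        "exposure.shop.weekly_revenue_dashboard": {
            "name": "weekly_revenue_dashboard",
            "owner": {"name": "GTM Analytics", "email": "marketing-analytics@example.com"},
        },
        "exposure.shop.churn_model": {
            "name": "churn_model",
            "owner": {"name": "ML Platform", "email": "ml-platform@example.com"},
        },
    },
}


def downstream_exposure_owners(manifest, start_id):
    """Breadth-first walk over child_map; return {exposure name: owner email}."""
    seen = {start_id}
    queue = deque([start_id])
    owners = {}
    while queue:
        node_id = queue.popleft()
        for child in manifest["child_map"].get(node_id, []):
            if child in seen:
                continue
            seen.add(child)
            queue.append(child)
            if child in manifest["exposures"]:
                exposure = manifest["exposures"][child]
                owners[exposure["name"]] = exposure["owner"].get("email")
    return owners


owners = downstream_exposure_owners(manifest, "model.shop.fct_orders")
for name, email in sorted(owners.items()):
    print(f"{name}: {email}")
```

The walk is transitive, so the dashboard that only reads rpt_revenue still shows up when you start from fct_orders. That is the property the interviewer is probing for.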
In the dbt docs site, an exposure renders as a terminal node in the DAG. Hovering shows the description, the owner, and the URL. Click-through opens the actual dashboard. This is what teams demo to non-engineers — finance, marketing, exec ops — and it's what gets non-engineers to trust that the warehouse is a real product, not a black box. That trust is the deeper payoff, and worth mentioning if the interviewer pushes on the "why" rather than the "how."
| Downstream signal | Without exposures | With exposures |
|---|---|---|
| Find consumers of fct_orders | Grep BI repo, ask in Slack | dbt ls --select fct_orders+ |
| Notify owners before breaking change | Manual list in Notion | Owner emails in manifest |
| Onboard new analyst | "Click around the BI tool" | Open dbt docs, browse DAG |
| Audit unused models | Painful, often skipped | Models with no exposure path = candidates to deprecate |
The last row is underrated. Once you have exposures covering the dashboards and ML pipelines that matter, you can run dbt ls --resource-type model --exclude "+exposure:*" to list the models that no exposure depends on, directly or transitively. Those are the models with no registered business reason to exist. Killing dead models is one of the highest-leverage refactors in a mature dbt project, and exposures are the lever that makes it safe.
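The dead-model audit can also be sketched against the manifest itself. The example below assumes a parsed target/manifest.json with nodes, exposures, and parent_map keys. The sample data and the function name are hypothetical, and a real project would likely also want to treat models consumed outside dbt as live.

```python
# Hypothetical, trimmed-down stand-in for target/manifest.json.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {"resource_type": "model"},
        "model.shop.fct_orders": {"resource_type": "model"},
        "model.shop.tmp_backfill": {"resource_type": "model"},
        "test.shop.not_null_fct_orders_id": {"resource_type": "test"},
    },
    "exposures": {
        "exposure.shop.weekly_revenue_dashboard": {
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
    },
    "parent_map": {
        "model.shop.fct_orders": ["model.shop.stg_orders"],
        "model.shop.stg_orders": ["source.shop.stripe.charges"],
        "model.shop.tmp_backfill": ["model.shop.stg_orders"],
    },
}


def models_without_exposure_path(manifest):
    """Return model ids that no exposure reaches, directly or transitively."""
    reachable = set()
    # Start from every node an exposure depends on, then walk upstream.
    stack = [
        dep
        for exposure in manifest["exposures"].values()
        for dep in exposure["depends_on"]["nodes"]
    ]
    while stack:
        node_id = stack.pop()
        if node_id in reachable:
            continue
        reachable.add(node_id)
        stack.extend(manifest["parent_map"].get(node_id, []))
    return sorted(
        unique_id
        for unique_id, node in manifest["nodes"].items()
        if node["resource_type"] == "model" and unique_id not in reachable
    )


for unique_id in models_without_exposure_path(manifest):
    print(f"deprecation candidate: {unique_id}")
```

In the sample, tmp_backfill is the only candidate: it reads from stg_orders, but nothing registered as an exposure reads from it.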
Using exposures in CI
The second interview probe is operational: "How do you make sure exposures don't go stale?" Three patterns come up.
The first is the pre-merge selector. On every PR, your CI runs dbt build --select state:modified+ --defer --state ./prod-manifest to build and test only what changed plus everything downstream of it. Running dbt ls with the same selector and --resource-type exposure lists the exposures in the blast radius, and selecting +exposure:weekly_revenue_dashboard for each of them also rebuilds every model that feeds an affected dashboard. This catches the case where a column rename breaks the weekly_revenue_dashboard even though the dashboard's direct upstream wasn't touched.
The second is the freshness gate. For exposures with maturity: high, CI fails the build if any source feeding the exposure is stale by more than the configured threshold. You configure this in the source block, not the exposure, but the exposure is what tells CI which sources to check:

    sources:
      - name: stripe
        tables:
          - name: charges
            freshness:
              warn_after: { count: 6, period: hour }
              error_after: { count: 24, period: hour }

Combine that with dbt source freshness --select source:stripe.charges in CI and you've got an end-to-end gate: a stale source upstream of a high maturity exposure fails the deploy.
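One way to wire the gate together is to cross-reference the freshness results with the exposures that declare maturity: high. The sketch below assumes the results file written by dbt source freshness (target/sources.json) exposes a results list with unique_id and status fields, and that "error" and "runtime error" are the failing statuses. Those field names, the sample data, and the function names are assumptions to verify against your dbt version.

```python
# Statuses treated as blocking in this sketch (assumed values).
FAILING_STATUSES = {"error", "runtime error"}

# Hypothetical, trimmed-down stand-in for target/manifest.json.
manifest = {
    "exposures": {
        "exposure.shop.weekly_revenue_dashboard": {
            "name": "weekly_revenue_dashboard",
            "maturity": "high",
            "depends_on": {
                "nodes": ["model.shop.fct_orders", "source.shop.stripe.charges"]
            },
        },
        "exposure.shop.scratch_notebook": {
            "name": "scratch_notebook",
            "maturity": "low",
            "depends_on": {"nodes": ["source.shop.crm.leads"]},
        },
    },
    "parent_map": {
        "model.shop.fct_orders": [
            "source.shop.stripe.charges",
            "source.shop.stripe.refunds",
        ],
    },
}

# Hypothetical stand-in for target/sources.json.
freshness = {
    "results": [
        {"unique_id": "source.shop.stripe.charges", "status": "pass"},
        {"unique_id": "source.shop.stripe.refunds", "status": "error"},
        {"unique_id": "source.shop.crm.leads", "status": "error"},
    ]
}


def upstream_sources(manifest, exposure):
    """Collect every source id reachable upstream of one exposure."""
    seen = set()
    stack = list(exposure["depends_on"]["nodes"])
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        stack.extend(manifest["parent_map"].get(node_id, []))
    return {node_id for node_id in seen if node_id.startswith("source.")}


def stale_high_maturity(manifest, freshness):
    """Return {exposure name: [stale source ids]} for high-maturity exposures."""
    failing = {
        result["unique_id"]
        for result in freshness["results"]
        if result["status"] in FAILING_STATUSES
    }
    blocked = {}
    for exposure in manifest["exposures"].values():
        if exposure.get("maturity") != "high":
            continue
        stale = sorted(upstream_sources(manifest, exposure) & failing)
        if stale:
            blocked[exposure["name"]] = stale
    return blocked


blocked = stale_high_maturity(manifest, freshness)
for name, sources in blocked.items():
    print(f"BLOCKED {name}: stale {', '.join(sources)}")
```

In the sample, the stale crm.leads source does not block anything because only a low-maturity notebook depends on it, while the stale refunds table blocks the exec dashboard through fct_orders.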
The third is the periodic audit. A nightly job runs dbt ls --resource-type exposure --output json and health-checks each registered consumer (HTTP 200 on the dashboard URL, ML model artifact exists in the registry). Exposures whose downstream consumer has been deleted get flagged for cleanup. This is where teams catch the "we sunset that dashboard six months ago but nobody removed the exposure" rot.

    # Nightly audit, simplified
    # dbt ls emits one JSON object per line; request name and url explicitly
    dbt ls --resource-type exposure --output json --output-keys "name url" \
      | jq -r '"\(.name)\t\(.url)"' \
      | while IFS=$'\t' read -r name url; do
          status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
          [ "$status" != "200" ] && echo "STALE: $name -> $status"
        done

Sanity check: if your CI doesn't fail when an upstream source is stale for a high maturity exposure, you don't really have exposures; you have decorative YAML.
Common pitfalls
The biggest pitfall is declaring exposures without depends_on. The schema allows it, but a depends_on-less exposure contributes nothing to lineage — it just shows up as a floating node in docs. Interviewers ask follow-ups specifically to surface this. The fix is mechanical: every exposure must list at least one ref() or source(), and your CI should fail the build if any exposure has an empty depends_on. A five-line dbt run-operation macro can enforce this.
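If you would rather not write the check as a macro, the same rule fits in a few lines over the compiled manifest. This is a sketch of an alternative, not the macro mentioned above. It assumes each exposure in target/manifest.json carries a depends_on.nodes list, and the sample data and function names are hypothetical.

```python
# Hypothetical, trimmed-down stand-in for target/manifest.json.
manifest = {
    "exposures": {
        "exposure.shop.weekly_revenue_dashboard": {
            "name": "weekly_revenue_dashboard",
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
        "exposure.shop.orphan_dashboard": {
            "name": "orphan_dashboard",
            "depends_on": {"nodes": []},
        },
    },
}


def exposures_missing_depends_on(manifest):
    """Return names of exposures whose depends_on list is absent or empty."""
    return sorted(
        exposure["name"]
        for exposure in manifest["exposures"].values()
        if not exposure.get("depends_on", {}).get("nodes")
    )


def check(manifest):
    """Print offenders and return a process exit code (0 = pass, 1 = fail)."""
    missing = exposures_missing_depends_on(manifest)
    for name in missing:
        print(f"exposure '{name}' has no depends_on entries")
    return 1 if missing else 0


exit_code = check(manifest)
print(f"exit code: {exit_code}")
```

In CI you would load the real manifest with json.load and pass the return value of check to sys.exit so the job fails on any floating exposure.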
A second trap is using exposures as a substitute for actual contracts. An exposure says "this dashboard depends on fct_orders" but it does not pin the schema. If you drop customer_id from fct_orders and the dashboard breaks, the exposure didn't prevent it — it only made the breakage discoverable. The fix is to pair high-maturity exposures with model contracts (contract: enforced: true on the upstream model), which actually pin the column types and presence. Exposures are the address book; contracts are the lock on the door.
A third pitfall is owner email rot. Maya leaves the company, her email bounces, the dashboard breaks six months later, nobody gets paged. The fix is to point owner emails at distribution lists or team aliases, never at individuals. marketing-analytics@example.com survives turnover; maya@example.com does not. This sounds obvious until you've inherited a dbt project with 80 exposures all owned by people who left.
A fourth — subtler — pitfall is over-declaring exposures. Some teams try to register every single dashboard and notebook, including one-off analyst experiments. The signal-to-noise ratio collapses, the docs site becomes unreadable, and reviewers stop trusting impact analysis because half the exposures are dead. The discipline is to only register what you'd actually page someone about. If nobody would care about it breaking, it's not an exposure — it's an unmanaged consumer, and that's fine.
A fifth pitfall is forgetting that exposures don't run. They have no SQL, no materialization. Newer engineers sometimes try dbt run --select +exposure:weekly_revenue_dashboard expecting it to refresh the dashboard. It will run the upstream models (without the leading +, the selector matches only the exposure itself, so no models run at all), but the dashboard itself is refreshed by the BI tool on its own schedule. Make sure your runbook documents this: the dbt build doesn't push to Metabase or Looker; those have their own refresh mechanics.
FAQ
Are exposures required to ship dbt to production?
No, and most early-stage teams skip them. They become valuable around the point where you have 50+ models and at least two downstream consumer types — say, one BI tool and one ML pipeline. Below that scale, the maintenance overhead exceeds the impact-analysis payoff. Above that scale, not having exposures means every schema change becomes a Slack archaeology project.
How do exposures differ from sources?
Sources describe what comes into your dbt project from the warehouse — raw tables loaded by Fivetran, Airbyte, or whatever ingestion tool you run. Exposures describe what goes out of your project to humans and downstream systems. Both contribute to lineage, but they sit at opposite ends of the DAG. A useful mental model: sources are inputs, models are processing, exposures are outputs.
Can one exposure depend on another exposure?
No. Exposures are leaf nodes in the DAG. Their depends_on list can only reference upstream dbt resources: ref() targets such as models, source() tables, and metrics. If you find yourself wanting an exposure-to-exposure dependency, what you actually need is a model in between, or you're modeling something exposures aren't designed for. The single-level constraint is intentional: exposures represent the boundary where dbt's responsibility ends and the consumer system's responsibility begins.
What's the difference between maturity levels in practice?
high maturity means CI gates everything: source freshness, contract enforcement, full test suite on upstream models. Breaking changes go through a deprecation window. medium means tests run but freshness is advisory, not blocking. low means the exposure is registered for visibility only — no CI gating, and changes can ship without owner notification. Most teams have a handful of high, a larger middle of medium, and a long tail of low they're slowly cleaning up.
Do exposures replace a data catalog like Atlan or DataHub?
No. Exposures live inside dbt and describe the dbt-to-consumer boundary. A full data catalog covers the full graph from operational systems through to consumption, plus business glossary, classification, and discovery. The two work together — many catalogs ingest dbt manifests (including exposure definitions) and surface them in the catalog UI. If you have neither, start with exposures; they ship with dbt and cost nothing. Add a catalog when the org outgrows the dbt-only view.
How do you keep exposure docs in sync with the actual dashboards?
The honest answer is you don't, fully. The audit pattern from the CI section catches dead URLs, and you should run it nightly. Beyond that, treat exposure descriptions like any other docstring — they drift, and you rely on review discipline plus the periodic audit to catch the worst of it. Some teams add a quarterly "exposure cleanup" ticket to the rotation, which is the lowest-effort way to keep the docs site honest.