Data analyst roadmap 2026
Contents:
Why you need a roadmap
Data analytics is a wide field — SQL, Python, statistics, visualization, product metrics, experimentation, and a sliver of ML. Without a plan it is trivially easy to burn three months on dashboards in Tableau when the recruiter screen you are about to fail is a one-hour SQL test on window functions and cohort retention. The roadmap below is sequenced so that the things hiring panels at Stripe, Airbnb, DoorDash, Uber, Meta and Notion actually ask about come first, and the things nobody asks about come last (or never).
The single most useful thing a roadmap does is kill optionality. When everything looks equally important, you do nothing for long enough. Commit to one ladder — SQL → metrics → Python → stats → A/B tests — and stop renegotiating the plan. That alone closes 80% of the gap between "studying analytics" and getting an offer.
This is built from real job descriptions on LinkedIn and Glassdoor, levels.fyi compensation bands for Data Analyst I, II, and Senior, and interview loops at FAANG, fintech, and consumer subscription companies. If a topic appears in fewer than one in five loops, it is not in this roadmap.
Level 1: Intern / Junior (0–6 months)
Load-bearing trick: at this level, you are not learning analytics. You are learning SQL well enough to be useful inside a year. Everything else is supporting cast.
SQL — non-negotiable
SQL is asked in roughly 95% of analyst loops — coding screen, take-home, and onsite. There is no negotiating around it. Aim for fluency before you touch a second tool.
What to cover at this level:
SELECT,WHERE,ORDER BY,LIMITGROUP BY,HAVING, aggregate functions (COUNT,SUM,AVG,MAX,MIN)- Joins —
INNER,LEFT,RIGHT,FULL— see join SQL cheat sheet CASE WHEN,COALESCE,NULLIF- Subqueries and CTEs (
WITH) — see CTE with SQL guide - Basic window functions:
ROW_NUMBER,RANK— see window functions in SQL overview - Dates:
DATE_TRUNC,EXTRACT,INTERVAL
A worked example you should be able to write under timed conditions by month three:
-- Daily new vs returning users in the last 30 days
WITH first_seen AS (
SELECT user_id, MIN(event_date) AS first_date
FROM events
GROUP BY user_id
)
SELECT
e.event_date,
COUNT(DISTINCT CASE WHEN e.event_date = f.first_date
THEN e.user_id END) AS new_users,
COUNT(DISTINCT CASE WHEN e.event_date > f.first_date
THEN e.user_id END) AS returning_users
FROM events e
JOIN first_seen f ON e.user_id = f.user_id
WHERE e.event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY e.event_date
ORDER BY e.event_date;How to drill it: 3 problems a day, six days a week, for ten weeks. That is around 180 problems, which is the volume where the patterns (gaps-and-islands, dedup, rolling windows, funnel step conversion) stop looking like ad-hoc puzzles and start looking like the same 20 motifs in different costumes.
Outcome target: you can write a query with a join, a GROUP BY, and a window function in under 15 minutes without looking anything up.
Spreadsheets — one week, no more
Pivot tables, VLOOKUP / INDEX-MATCH, basic charts, IF / SUMIFS / COUNTIFS. That is it. Spreadsheets are great for quick one-off requests and terrible for everything else. Do not spend a month here — the marginal hour is much better invested in SQL.
Metric vocabulary
You need to be able to define, in one sentence each, every metric in this table before you apply to anything:
| Metric | One-line definition | Common gotcha |
|---|---|---|
| DAU / MAU | Distinct users in a day vs in 28 days | Defining "user" — device, account, or person |
| Retention (D1/D7/D30) | Share of a cohort active N days after acquisition | Confusing rolling retention with classic |
| Conversion rate | Step B users ÷ step A users in a funnel | Counting users vs events vs sessions |
| ARPU | Revenue ÷ active users in a period | Active denominator vs paying denominator |
| LTV | Expected total revenue per acquired user | Time horizon and discount rate |
Start with DAU explained for PM, retention vs churn, and funnel conversion for product managers. These three concepts show up in every product-sense screen.
Level 2: Junior+ / Mid (6–18 months)
Advanced SQL
Window functions stop being optional and become the primary tool for analytics queries. Get comfortable with LAG, LEAD, SUM() OVER (...), NTILE, frame clauses (ROWS BETWEEN, RANGE BETWEEN), recursive CTEs, and LATERAL / CROSS JOIN. You should also be able to read an EXPLAIN plan well enough to spot a sequential scan where you wanted an index — see how to read EXPLAIN ANALYZE.
Practice deliberately on the interview-grade question set: SQL window functions interview questions. If those feel hard, the gap is window functions, not "general SQL".
Python for analysis
Pandas is the workhorse: reading data, filtering, groupby, merge, pivot_table. NumPy for vectorized math. Matplotlib or Seaborn for charts. Jupyter for the actual work. The honest truth at mid-level is that most of your Python will be a thin layer over SQL — pulling a result set into a dataframe, doing two transformations, plotting it. Get good at that loop first, then expand.
Useful primers in roughly this order: Pandas cheat sheet for analysts, NumPy for data analysts, Python interview questions for data analysts.
Statistics
The interview-relevant slice of stats is smaller than people fear. You need to be able to talk about, in plain English:
- Null vs alternative hypotheses
- P-value explained simply — what it is and what it is not
- Normal distribution and the central limit theorem
- Confidence intervals
- Type I and Type II errors, and why they trade off
If you can explain each of these to a non-technical PM in under 60 seconds, you are ready for the statistics section of any analyst loop.
A/B testing
Experimentation is the highest-leverage skill at the Junior+ → Mid jump. Cover experiment design (control, treatment, success metric), sample size and power, and the canonical mistakes — peeking, multiple comparisons, sample ratio mismatch (SRM). For a step-by-step framing of test design see how to design an A/B test.
Sanity check: if you cannot articulate why peeking at a running A/B test inflates the false-positive rate, you will fail the experimentation section. This one concept gets asked in a majority of loops.
Product analytics
This is where you stop being "SQL person" and start being an analyst. Cover unit economics, cohort analysis, the "a metric dropped — what do you do?" case (case study questions for data analysts), North Star metric, and the AARRR framework.
Visualization and BI
Learn one BI tool well — Tableau, Looker, or Mode — not all three. Know the chart-type taxonomy (when bar, when line, when scatter, when small multiples), the principles of dashboard layout, and how to design a chart that survives a screenshot in Slack.
Level 3: Senior (18+ months)
A senior analyst is not someone who knows more SQL. A senior analyst is someone whose work changes decisions. Compensation reflects this — US bands for senior data analysts cluster around $140k–$190k base + $20k–$60k bonus + equity at public tech companies, per levels.fyi.
| Level | Years | Base (US median) | What you trade in |
|---|---|---|---|
| Junior | 0–2 | $80k–$110k | Time, motion, willingness to grind |
| Mid | 2–5 | $110k–$140k | Reliability, ownership of a domain |
| Senior | 5+ | $140k–$190k | Judgment, influence, scope |
What changes at senior:
- You design metric systems for whole product areas, not single dashboards.
- You pick the metric for an experiment instead of running the test someone else specified.
- You separate correlation from causation in messy observational data, and know when to ask for a randomized test instead.
- You can apply variance-reduction methods — CUPED and stratification — to ship tests with smaller samples or shorter runtimes.
The non-technical half matters at least as much. Presenting to stakeholders, translating a query result into a one-line recommendation, mentoring juniors, and being in the room when the product strategy gets set — these are what compounds.
Optional extensions, in roughly the order of return-on-effort:
- Basic ML — linear regression, classification, clustering
- Workflow tooling — Airflow, dbt
- Warehouse-side optimization — ClickHouse, BigQuery, Snowflake
- ETL/data-engineering fundamentals
How to use this roadmap
- Place yourself honestly. If you cannot write a
LEFT JOINfrom memory, you are at Level 1. If you can write window functions but freeze on cohort retention, you are mid-Level 2. - One skill at a time. Trying to learn SQL, Python, and statistics in parallel converges to learning none of them. Sequence.
- Daily practice beats weekend marathons. 45 minutes per day for ten weeks is worth more than four hours every Sunday for six months.
- Start interviewing before you feel ready. The first three loops are calibration, not offers. Treat them as free diagnostic tests.
If you want to drill SQL questions every day in this exact pattern, NAILDD is launching with hundreds of analyst-grade SQL problems sequenced to match this roadmap.
Common pitfalls
The first big trap is front-loading Python. Beginners reach for pandas because it feels more like programming, and SQL feels boring. The hiring market does not care: most analyst loops open with a live SQL exercise, and pandas chops do not transfer. Defer Python until you can write a window-function query without flinching.
The second trap is over-investing in BI tools. People burn weekends on advanced Tableau calculated fields, but interviewers rarely test BI tool depth — they ask whether you can build a useful dashboard. Learn one tool to "can-ship-a-dashboard" level, then stop.
A third pitfall is studying statistics from textbooks instead of interview questions. The interview-relevant slice is tiny: hypothesis tests, p-values, confidence intervals, Type I/II errors, and the central limit theorem. Working through an intro stats book end-to-end consumes two months on material that never gets asked. Use statistics for data analysts instead, then drill questions.
The fourth pitfall is delaying interviews until you "feel ready". You will not feel ready, and the calibration you get from real loops is not replicable from reading. Apply to three companies you do not care about by month four — the feedback is more valuable than 40 more hours of solo study.
A fifth pitfall is memorizing metric definitions without practicing the "why is it down 7%" case. Knowing what DAU means is table stakes. Knowing how to decompose a 7% DAU drop into platform, geography, cohort, and event slices in under ten minutes is the actual interview question.
Related reading
- How to become a data analyst from scratch
- How to become a senior data analyst
- Data analyst vs product analyst vs business analyst
- SQL on data analyst interview
- Case interview for data analyst
FAQ
Where should I start learning data analytics?
Start with SQL. It is the foundation, and no other skill compensates for not having it. Begin with SELECT and joins, then progress to GROUP BY, then to basic window functions. In parallel, read about core metrics — DAU, retention, conversion — so that you can talk about the numbers your queries produce. Do not start with Python or statistics; both are easier to learn after SQL is fluent.
Do I need Python as a data analyst?
For Junior roles it is helpful but not strictly required. For Mid and Senior it is expected at most companies. Python — specifically pandas plus a visualization library — typically lifts compensation by 15–25% versus an SQL-only profile because it opens up automation, ad-hoc modeling, and any team that runs experiments outside their BI tool. At top-tier consumer companies, Python in the loop is increasingly standard.
How long does each level realistically take?
Level 1 (Junior) takes three to six months of daily practice if you are starting from scratch. Level 2 (Mid) usually takes another six to eighteen months, most of which is real work experience rather than self-study. Level 3 (Senior) takes one and a half to three years past mid, and it is gated mostly by business judgment and scope rather than additional tooling — you cannot self-study your way to senior in a vacuum.
Which skills carry the most weight in interviews?
In rough order of how often they show up: SQL (always), product metrics fluency (DAU, retention, funnels, LTV), basic statistics (p-values, A/B tests, confidence intervals), and structured case reasoning. These four together determine whether you get the offer. Visualization, BI tools, and ML are tie-breakers, not gates.
Should I learn ML as a data analyst?
Only after the four core areas above are solid. ML rarely shows up in analyst interviews unless the role explicitly says "analytics engineer" or "ML-adjacent". When it does, the bar is conceptual — explain bias-variance, name a classification metric, sketch a baseline — not implementation.
Is a degree required?
No, but a quantitative background helps clear the resume screen at larger companies. The actual hiring decision is made on the SQL screen, the case interview, and the behavioral round — not the degree. The bottleneck is portfolio quality and interview prep, not credentials.