Datetime in Python for analysts

Practice Python for data interviews
200+ pandas, numpy, and data-wrangling problems with explanations.
Join the waitlist

Why analysts wrestle with datetime

Picture this: it is Monday morning, your PM pings you on Slack asking for weekly active users by signup cohort, and the timestamps in the export are a mix of 2025-03-15T14:30:00Z, 15/03/2025 14:30, and Unix epoch integers. You open a notebook, type pd.to_datetime, and pray. Three hours later you discover half your numbers are off by one day because the warehouse stores everything in UTC and your dashboard rendered in local time.

This is the everyday reality of working with dates. Python's standard datetime module is the foundation, and pandas builds on top with the .dt accessor and pd.to_datetime. Analysts who fluently handle strptime, strftime, timedelta, and timezone-aware datetimes ship cohort reports faster, debug fewer dashboard bugs, and pass interview questions that trip up self-taught candidates.

The good news: the surface area is small. Five concepts cover 80% of analytical work — parsing strings, formatting for output, doing arithmetic with timedelta, extracting components like month or weekday, and keeping timezones consistent. Once those click, the rest is muscle memory.

Load-bearing rule: store and compute in UTC, convert to local time only at the rendering boundary. Everything else flows from that.

The core classes

The datetime module ships four working classes, but analysts mostly touch two of them daily.

from datetime import date, time, datetime, timedelta

# date — calendar date, no time
d = date(2025, 3, 15)
print(d)  # 2025-03-15

# time — time-of-day, no date
t = time(14, 30, 0)
print(t)  # 14:30:00

# datetime — both
dt = datetime(2025, 3, 15, 14, 30, 0)
print(dt)  # 2025-03-15 14:30:00

# timedelta — a duration
delta = timedelta(days=7)

In analytics workflows, datetime and timedelta carry the entire load. date shows up when comparing to columns that are pure dates in the warehouse, and time is rare outside of scheduling logic.

Pulling the current moment uses three different calls depending on what you need:

from datetime import datetime, date, timezone

now = datetime.now()             # naive local time
today = date.today()             # date only
utc_now = datetime.now(timezone.utc)  # timezone-aware UTC

The datetime.utcnow() form still exists but is deprecated in modern Python — use datetime.now(timezone.utc) for new code.

Parsing strings with strptime

Parsing strings into datetimes is the single most common operation analysts perform. CSV exports, JSON APIs, log files — everything arrives as text first.

from datetime import datetime

dt = datetime.strptime('15.03.2025', '%d.%m.%Y')
print(dt)  # 2025-03-15 00:00:00

dt = datetime.strptime('2025-03-15 14:30:00', '%Y-%m-%d %H:%M:%S')
print(dt)  # 2025-03-15 14:30:00

dt = datetime.strptime('Mar 15, 2025', '%b %d, %Y')
print(dt)  # 2025-03-15 00:00:00

The format string must exactly match the input — every separator, every padding zero. A common five-minute debugging session is strptime('15/03/2025', '%Y-%m-%d') throwing ValueError because the slashes and dashes do not line up.

The format codes you actually need to memorize:

Code Meaning Example
%Y Year (4 digits) 2025
%m Month (01-12) 03
%d Day (01-31) 15
%H Hour (00-23) 14
%M Minute (00-59) 30
%S Second (00-59) 00
%B Month name March
%b Month abbreviation Mar
%A Weekday name Saturday
%w Weekday number (0=Sun) 6
%j Day of year (001-366) 074
%W ISO week number 11

Formatting with strftime

The inverse operation — datetime to string — uses the same codes with strftime.

from datetime import datetime

dt = datetime(2025, 3, 15, 14, 30)

print(dt.strftime('%d.%m.%Y'))            # 15.03.2025
print(dt.strftime('%Y-%m-%d'))            # 2025-03-15
print(dt.strftime('%d %B %Y, %H:%M'))     # 15 March 2025, 14:30
print(dt.strftime('%Y-%m'))               # 2025-03  (monthly buckets)
print(dt.strftime('%Y-W%W'))              # 2025-W11 (weekly buckets)

The mnemonic that sticks: strftime = format (to string), strptime = parse (from string). Reverse them once on a take-home and you spend an hour debugging.

Gotcha: %W and %U use different week-numbering conventions. For ISO weeks that match what BI tools display, prefer isocalendar() over strftime('%W') — Python's %W treats Monday as week start but does not always match ISO 8601 edge cases around year boundaries.

Arithmetic with timedelta

Once dates are typed correctly, timedelta makes shifting and differencing trivial.

from datetime import datetime, timedelta

now = datetime(2025, 3, 15, 14, 0)

tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
two_hours_ago = now - timedelta(hours=2)

start = datetime(2025, 1, 1)
end = datetime(2025, 3, 15)
diff = end - start

print(diff.days)             # 73
print(diff.total_seconds())  # 6307200.0

A timedelta accepts days, seconds, microseconds, milliseconds, minutes, hours, and weeks. Notice what is missing: no months, no years. Calendar months have variable length, so the standard library refuses to guess. For "add one month" semantics, use dateutil.relativedelta:

from dateutil.relativedelta import relativedelta
from datetime import date

d = date(2025, 1, 31) + relativedelta(months=1)
print(d)  # 2025-02-28 — relativedelta clamps to month-end

Extracting components is direct attribute access:

dt = datetime(2025, 3, 15, 14, 30, 45)

dt.year, dt.month, dt.day        # (2025, 3, 15)
dt.hour, dt.minute, dt.second    # (14, 30, 45)
dt.weekday()                      # 5  (0=Mon, 6=Sun)
dt.isoweekday()                   # 6  (1=Mon, 7=Sun)
dt.date(), dt.time()              # split back into components
Practice Python for data interviews
200+ pandas, numpy, and data-wrangling problems with explanations.
Join the waitlist

Analyst recipes

Three patterns cover most ad-hoc work.

Filtering a list of events by recency is a one-liner once timestamps are typed:

from datetime import datetime, timedelta

events = [
    {'event': 'login',    'ts': '2025-03-10 09:00:00'},
    {'event': 'purchase', 'ts': '2025-03-14 15:30:00'},
    {'event': 'login',    'ts': '2025-03-15 10:00:00'},
    {'event': 'logout',   'ts': '2025-03-16 18:00:00'},
]

cutoff = datetime.now() - timedelta(days=7)
recent = [
    e for e in events
    if datetime.strptime(e['ts'], '%Y-%m-%d %H:%M:%S') >= cutoff
]

Bucketing events by day or week with Counter:

from collections import Counter
from datetime import datetime

timestamps = [
    '2025-03-14 09:00', '2025-03-14 11:30',
    '2025-03-15 08:00', '2025-03-15 14:00', '2025-03-15 19:30',
    '2025-03-16 10:00',
]

daily = Counter(
    datetime.strptime(ts, '%Y-%m-%d %H:%M').strftime('%Y-%m-%d')
    for ts in timestamps
)
# Counter({'2025-03-15': 3, '2025-03-14': 2, '2025-03-16': 1})

weekly = Counter(
    datetime.strptime(ts, '%Y-%m-%d %H:%M').strftime('%Y-W%W')
    for ts in timestamps
)

Retention windows for D1, D7, D30 calculations:

from datetime import datetime, timedelta

signup_date = datetime(2025, 1, 15)

day_1  = signup_date + timedelta(days=1)
day_7  = signup_date + timedelta(days=7)
day_30 = signup_date + timedelta(days=30)

event_date = datetime(2025, 1, 22)
is_within_d7 = signup_date <= event_date <= day_7  # True

Datetime in pandas

For dataframes, pandas wraps everything in pd.to_datetime plus the .dt accessor, which is dramatically faster than calling strptime per row.

import pandas as pd

df = pd.DataFrame({
    'user_id':    [1, 2, 3],
    'created_at': ['2025-01-15', '2025-02-20', '2025-03-10'],
})

df['created_at'] = pd.to_datetime(df['created_at'])

df['month']   = df['created_at'].dt.month
df['weekday'] = df['created_at'].dt.day_name()
df['quarter'] = df['created_at'].dt.quarter

recent = df[df['created_at'] >= '2025-02-01']

monthly = df.groupby(df['created_at'].dt.to_period('M')).size()

pd.to_datetime is smarter than strptime — it infers the format in most cases and accepts mixed inputs with format='mixed'. For tens of millions of rows, pass the explicit format= argument anyway — inference can be 10-50x slower than parsing with a known format string.

Pattern Plain datetime pandas equivalent
Parse a column loop + strptime pd.to_datetime(col)
Extract month dt.month per row col.dt.month (vectorized)
Weekly buckets strftime('%Y-W%W') col.dt.to_period('W')
Filter by date list comprehension df[df.col >= '2025-02-01']
Add 30 days + timedelta(days=30) + pd.Timedelta(days=30)

Timezones done right

A datetime without timezone info is naive, with timezone is aware. Mixing them in a comparison raises TypeError, and silently treating naive UTC data as local time is the most common dashboard bug in the wild.

from datetime import datetime, timezone, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

utc_now = datetime.now(timezone.utc)

# Convert to New York for display
ny_now = utc_now.astimezone(ZoneInfo('America/New_York'))

# A custom offset (rarely needed if you use ZoneInfo)
pst = timezone(timedelta(hours=-8))

Sanity check: if your warehouse stores created_at in UTC and your local notebook compares it to datetime.now() (naive local), every record will be off by your timezone offset. Always anchor to UTC at the boundary.

If you drill these patterns daily, NAILDD has 500+ SQL and Python problems built around exactly this kind of date arithmetic, cohort logic, and pitfall recognition.

Common pitfalls

The most expensive mistake is confusing strftime and strptime. They take identical format codes but flow in opposite directions — strftime writes a string, strptime reads one. A junior analyst recently spent 40 minutes debugging "why is my date column suddenly strings" when the answer was a copy-pasted strftime where strptime belonged. The fix is the mnemonic: f for format-out, p for parse-in.

The second trap is forgetting timezones. datetime.now() returns naive local time, while almost every production warehouse stores timestamps in UTC. Comparing naive local to aware UTC raises TypeError; silently stripping the timezone gives wrong numbers without complaining. The fix is to call datetime.now(timezone.utc) whenever you need "right now" for comparison with warehouse data.

A third common error is trying to put months or years into a timedelta. Code like timedelta(months=1) is a TypeError because calendar months are variable length. New analysts often write a try/except around it instead of switching to dateutil.relativedelta, which handles month-end clamping and year rollover correctly.

The fourth trap is a format-string mismatch in strptime. The error message — does not match format — is clear, but the fix is fiddly because it requires reading the actual string character-by-character. Print three sample values before writing the format and double-check separators, padding, and locale-specific month names.

Finally, beware %W vs ISO week-of-year edge cases around January 1st. Two years can disagree on whether the last week of December belongs to year N or N+1. For analytics that matches what Tableau or Looker shows, use dt.isocalendar() and pull .week and .year from the named tuple — it follows ISO 8601 strictly.

FAQ

What is the difference between datetime and pandas Timestamp?

pandas.Timestamp is a subclass of datetime.datetime with nanosecond precision and extra convenience methods. Anything pd.to_datetime produces is a Timestamp, and it interoperates with stdlib datetime in most contexts. For pure scripting without pandas in the dependency tree, stick with stdlib; inside a dataframe pipeline, lean on Timestamp and the .dt accessor for vectorized speed.

How do I handle a column with mixed date formats?

Use pd.to_datetime(df['date'], format='mixed') or pd.to_datetime(df['date'], errors='coerce') to coerce unparseable values to NaT instead of raising. For raw Python without pandas, wrap several strptime attempts in a try/except chain, falling through formats from most specific to least specific. Log the unparseable values so you can fix the upstream pipeline rather than silently dropping records.

How do I work with Unix timestamps?

datetime.fromtimestamp(1710510000) converts a Unix epoch integer to a local-time datetime, and datetime.fromtimestamp(1710510000, tz=timezone.utc) gives you the timezone-aware UTC version. The inverse is dt.timestamp(). In pandas, use pd.to_datetime(series, unit='s') for seconds, unit='ms' for milliseconds — getting the unit wrong silently produces dates in the year 51,000.

What is the difference between naive and aware datetimes?

A naive datetime carries no timezone information; an aware datetime stores a tzinfo object. You cannot compare or subtract one of each — Python raises TypeError. The fix is to make everything aware: attach UTC to data coming from your warehouse with dt.replace(tzinfo=timezone.utc) (only when you know the source is UTC) and use astimezone() to shift to local time at display time.

How do I round a datetime to the nearest hour or day?

For pandas: series.dt.floor('H'), series.dt.ceil('H'), or series.dt.round('H') — same with 'D', '15min', etc. For stdlib, do the math: dt.replace(minute=0, second=0, microsecond=0) for floor-to-hour, or compute the offset and add a timedelta. Pandas is far less error-prone for this and supports arbitrary frequencies like '5min' or '2H'.

Why does dateutil.relativedelta exist if timedelta already does arithmetic?

Because calendar arithmetic is not linear. Adding "one month" to January 31st is ambiguous — relativedelta resolves it by clamping to February 28th (or 29th in leap years), while timedelta can only express fixed durations like 30 days. For business logic involving billing cycles, subscription renewals, or "next month" comparisons, always reach for relativedelta.