Working with Dates and Times
What you’ll build today: safe patterns for parsing dates, computing time differences, and preparing time variables for modeling.
Learning Objectives
By the end of this lesson, you will be able to:
- Parse dates using ymd(), mdy(), and dmy().
- Parse date-times using ymd_hms().
- Handle mixed date formats safely when they appear in real datasets.
- Compute time differences safely.
- Convert character timestamps into modeling-ready time variables.
- Recognize common time-related data issues.
Setup
library(tidyverse)
library(lubridate)
Example Dataset
df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,   ~SAMPLE_TIME,
  1,   "2023-01-01", "2023-01-01 00:30:00",
  2,   "01/02/2023", "2023-01-02 01:00:00"
)
df_time
# A tibble: 2 × 3
     ID DOSE_DATE  SAMPLE_TIME
  <dbl> <chr>      <chr>
1     1 2023-01-01 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00
Key Ideas
Date and time variables often:
- arrive as character strings
- use inconsistent formats
- require conversion before subtraction
- cause silent errors if left unparsed
In pharmacometrics (PMx) work, date/time issues are common because data may come from:
- EDC exports
- lab systems
- multiple sites (with different date conventions)
- manual data entry
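To make the "silent errors" point concrete, here is a minimal sketch of what can go wrong when dates stay as text. The two visit dates are hypothetical values, not part of the lesson dataset, and are assumed to be month/day/year.

```r
library(lubridate)

# Hypothetical visit dates stored as month/day/year text
raw_dates <- c("9/1/2023", "10/1/2023")

# Sorting the raw strings compares characters, so "10/..." lands before "9/..."
sorted_as_text <- sort(raw_dates)

# Parsing first gives real Date values that sort chronologically
sorted_as_date <- sort(mdy(raw_dates))

sorted_as_text
sorted_as_date
```

No error or warning is raised in the text case, which is why this kind of problem tends to surface only later, for example as an out-of-order dosing record.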
Worked Example 1: Parsing Dates (and detecting mixed formats)
If you try ymd() on the full column:
df_time %>%
  mutate(DOSE_DATE_parsed = ymd(DOSE_DATE))
# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed
  <dbl> <chr>      <chr>               <date>
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01
2     2 01/02/2023 2023-01-02 01:00:00 NA
Notice that the second row becomes NA.
That’s because ymd() expects the components in year-month-day order, but row 2 is written month/day/year (MM/DD/YYYY).
A robust fix: parse mixed formats explicitly
parse_date_time() accepts several candidate orders and parses each value with one that fits. It returns a date-time (<dttm>) rather than a Date:
df_time_fixed <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy"))
  )
df_time_fixed
# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed
  <dbl> <chr>      <chr>               <dttm>
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00
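Since mixed formats are worth reporting as well as fixing, one possible QC step is to record which parser succeeded for each row. The sketch below is an illustration under assumptions, not the lesson's own method: the column names as_ymd, as_mdy, DOSE_FORMAT, and DOSE_DATE_clean are hypothetical, and "01/02/2023" is assumed to be month-first (a day-first site would need dmy() instead).

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,
  1,   "2023-01-01",
  2,   "01/02/2023"
)

format_check <- df_time %>%
  mutate(
    # quiet = TRUE suppresses the "failed to parse" warning; NAs are handled below
    as_ymd = ymd(DOSE_DATE, quiet = TRUE),
    as_mdy = mdy(DOSE_DATE, quiet = TRUE),
    # Label each row by the first parser that produced a value
    DOSE_FORMAT = case_when(
      !is.na(as_ymd) ~ "ymd",
      !is.na(as_mdy) ~ "mdy",
      TRUE           ~ "unparsed"
    ),
    # Keep the ymd result when available, otherwise fall back to mdy
    DOSE_DATE_clean = coalesce(as_ymd, as_mdy)
  )

# Tally of formats found in the column, useful for a QC report
count(format_check, DOSE_FORMAT)
```

Unlike parse_date_time(), this keeps the result as a Date, and the tally makes it easy to see how many rows used each convention.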
Worked Example 2: Parsing Date-Times
Now parse the sample timestamp:
df_time_parsed <- df_time_fixed %>%
  mutate(
    SAMPLE_DT = ymd_hms(SAMPLE_TIME)
  )
df_time_parsed
# A tibble: 2 × 5
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT
  <dbl> <chr>      <chr>               <dttm>              <dttm>
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00
Worked Example 3: Calculating Elapsed Time
Compute elapsed time (in hours) from dose date to sample datetime:
df_time_parsed %>%
  mutate(
    elapsed_hours = as.numeric(
      difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours")
    )
  ) %>%
  select(ID, DOSE_DATE_parsed, SAMPLE_DT, elapsed_hours)
# A tibble: 2 × 4
     ID DOSE_DATE_parsed    SAMPLE_DT           elapsed_hours
  <dbl> <dttm>              <dttm>                      <dbl>
1     1 2023-01-01 00:00:00 2023-01-01 00:30:00           0.5
2     2 2023-01-02 00:00:00 2023-01-02 01:00:00           1
In real PMx datasets, dose time is often a datetime (not just a date).
Here we’re using a date for simplicity.
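The explicit units = "hours" in the example matters more than it may look. As a small sketch, using the same clock times as the example dataset with dose times assumed to be midnight, plain subtraction lets R pick the units itself:

```r
library(lubridate)

# Dose and sample date-times (dose times assumed to be midnight, as in the example)
dose_dt   <- ymd_hms(c("2023-01-01 00:00:00", "2023-01-02 00:00:00"))
sample_dt <- ymd_hms(c("2023-01-01 00:30:00", "2023-01-02 01:00:00"))

# Plain subtraction lets R choose the units from the data
auto_diff <- sample_dt - dose_dt
units(auto_diff)  # inspect the units before converting to numeric

# Requesting hours explicitly removes the guesswork
elapsed_hours <- as.numeric(difftime(sample_dt, dose_dt, units = "hours"))
elapsed_hours
```

Here the automatic choice is minutes, because the smallest difference is under an hour, so calling as.numeric() on auto_diff would give 30 and 60 rather than 0.5 and 1.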
Strategies
- Parse dates immediately after import.
- Use ymd(), mdy(), or dmy() intentionally based on format.
- If formats are mixed, treat that as a QC signal and parse explicitly.
- Compute elapsed time using parsed datetime objects.
- Always verify units after time differences.
If a date parser returns NA, don’t “work around it.”
Treat it as a data quality issue and fix the format or parsing logic.
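One way to act on this is to list the rows where a raw value was present but parsing failed, so they can be queried with the data source. The sketch below is illustrative only: the third row with "UNK" is a hypothetical bad value added for the example, and parse_failures is an assumed name.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,
  1,   "2023-01-01",
  2,   "01/02/2023",
  3,   "UNK"          # hypothetical unparseable value, added for illustration
)

parsed <- df_time %>%
  mutate(
    # quiet = TRUE silences the warning; failures are reported explicitly below
    DOSE_DATE_parsed = parse_date_time(
      DOSE_DATE, orders = c("ymd", "mdy"), quiet = TRUE
    )
  )

# Rows where a raw value exists but did not parse: these need follow-up
parse_failures <- parsed %>%
  filter(!is.na(DOSE_DATE), is.na(DOSE_DATE_parsed))

parse_failures
```

Separating "missing in the source" from "present but unparsed" keeps genuine missing data from being confused with a formatting problem.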
Common Mistakes
- Forgetting to parse before subtracting.
- Using the wrong parser (mdy() vs ymd()).
- Assuming time differences are in hours without checking.
- Allowing NA parses to persist into downstream steps.
- Mixing time zones unintentionally.
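The time zone mistake is easy to miss because the printed clock time looks the same. A minimal sketch, assuming a hypothetical site in the America/New_York zone:

```r
library(lubridate)

clock_text <- "2023-01-01 00:30:00"

# ymd_hms() assumes UTC unless a time zone is supplied
as_utc <- ymd_hms(clock_text)

# The same clock reading interpreted at a hypothetical New York site
as_ny <- ymd_hms(clock_text, tz = "America/New_York")

# Same text, different instants: the gap equals the UTC offset on that date
tz_gap <- difftime(as_ny, as_utc, units = "hours")
tz_gap
```

If dose times were parsed one way and sample times the other, every elapsed time for that site would be shifted by this gap, so it helps to set tz explicitly wherever the collection time zone is known.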
Practice Problems
- Parse DOSE_DATE so both rows produce valid dates.
- Convert SAMPLE_TIME to a datetime.
- Compute elapsed time in hours.
- Identify rows where elapsed time is negative (if any).
- Explain why ymd() returns NA for "01/02/2023".
df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours"))
  )
# A tibble: 2 × 6
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT
  <dbl> <chr>      <chr>               <dttm>              <dttm>
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00
# ℹ 1 more variable: elapsed_hours <dbl>
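The practice item on negative elapsed times can be approached by adding a filter to the same pipeline. This is one possible sketch; negative_rows is an assumed name.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,   ~SAMPLE_TIME,
  1,   "2023-01-01", "2023-01-01 00:30:00",
  2,   "01/02/2023", "2023-01-02 01:00:00"
)

negative_rows <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(
      difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours")
    )
  ) %>%
  # A sample recorded before its dose is a QC flag
  filter(elapsed_hours < 0)

nrow(negative_rows)
```

With this dataset no rows are flagged. In a real dataset a negative value usually points to a pre-dose sample, a date typo, or a parser that read day and month in the wrong order.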
Summary
You now know how to:
- Parse date and datetime variables safely.
- Handle mixed date formats using parse_date_time().
- Compute elapsed time correctly.
- Detect time inconsistencies early.
- Prepare time variables for modeling.
Time handling errors are subtle. Parsing early prevents downstream modeling issues.
- Parse dates immediately after import.
- Choose the parser intentionally (ymd(), mdy(), dmy()).
- If parsing creates NA, treat it as a QC signal.
- Always verify time units when using difftime().
- Never subtract raw character timestamps.