Working with Dates and Times

Parse and clean dates and times, then compute time differences with lubridate for PMx-ready workflows.

Tip

What you’ll build today: safe patterns for parsing dates, computing time differences, and preparing time variables for modeling.

Learning Objectives

By the end of this lesson, you will be able to:

Parse dates using ymd(), mdy(), and dmy().
Parse date-times using ymd_hms().
Handle mixed date formats safely when they appear in real datasets.
Compute time differences safely.
Convert character timestamps into modeling-ready time variables.
Recognize common time-related data issues.

Setup

library(tidyverse)
library(lubridate)

Example Dataset

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,    ~SAMPLE_TIME,
    1, "2023-01-01",  "2023-01-01 00:30:00",
    2, "01/02/2023",  "2023-01-02 01:00:00"
)

df_time

# A tibble: 2 × 3
     ID DOSE_DATE  SAMPLE_TIME        
  <dbl> <chr>      <chr>              
1     1 2023-01-01 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00

Key Ideas

Date and time variables often:

arrive as character strings
use inconsistent formats
require conversion before subtraction
cause silent errors if left unparsed
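
To make the "silent errors" point concrete, here is a minimal sketch using two made-up date strings (they are illustrative values, not part of the lesson dataset). It compares them first as text and then as parsed dates.

```r
library(lubridate)

# Two illustrative dates stored as text in month/day/year order
earlier <- "12/31/2022"
later   <- "01/02/2023"

# Text comparison goes character by character, so "1" vs "0" decides it
earlier < later
#> [1] FALSE

# After parsing, the comparison is chronological
mdy(earlier) < mdy(later)
#> [1] TRUE

# Subtraction is only meaningful on parsed values
mdy(later) - mdy(earlier)
#> Time difference of 2 days
```

The text comparison raises no error or warning, which is why unparsed dates can slip through sorting and filtering steps unnoticed.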

In PMx work, date/time issues are common because data may come from:

EDC exports
lab systems
multiple sites (with different date conventions)
manual data entry

Worked Example 1: Parsing Dates (and detecting mixed formats)

If you try ymd() on the full column:

df_time %>%
  mutate(DOSE_DATE_parsed = ymd(DOSE_DATE))

# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed
  <dbl> <chr>      <chr>               <date>          
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01      
2     2 01/02/2023 2023-01-02 01:00:00 NA

Notice that the second row becomes NA, and lubridate warns that one value failed to parse.
That’s because ymd() expects the components in year, month, day order, but row 2 is written as month/day/year.
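
One way to surface such failures is to list the rows where a non-missing raw value became NA after parsing. The sketch below rebuilds the relevant columns of df_time so it runs on its own; the name parse_failures is only illustrative.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,
    1, "2023-01-01",
    2, "01/02/2023"
)

# quiet = TRUE suppresses the "failed to parse" warning so we can inspect NAs ourselves
parse_failures <- df_time %>%
  mutate(DOSE_DATE_parsed = ymd(DOSE_DATE, quiet = TRUE)) %>%
  # A failure is a non-missing raw value that became NA after parsing
  filter(!is.na(DOSE_DATE), is.na(DOSE_DATE_parsed))

parse_failures
```

Here only the row with ID 2 is returned, which points directly at the value that needs a different parser.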

A robust fix: parse mixed formats explicitly

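parse_date_time() accepts several candidate orders and parses each value with one that fits. It returns a datetime (shown as <dttm>, in UTC by default) rather than a date: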

df_time_fixed <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy"))
  )

df_time_fixed

# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed   
  <dbl> <chr>      <chr>               <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00
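
Two details are worth checking whenever mixed formats are parsed this way, sketched below under the assumption that the slash-separated value really is month/day/year. First, a string such as "01/02/2023" is valid under both mdy() and dmy(), so the orders you list encode a decision about the data. Second, if you need a Date rather than a datetime, as_date() drops the time part.

```r
library(lubridate)

ambiguous <- "01/02/2023"

mdy(ambiguous)  # read as month/day/year
#> [1] "2023-01-02"
dmy(ambiguous)  # read as day/month/year
#> [1] "2023-02-01"

# parse_date_time() returns POSIXct (datetime); as_date() converts it to Date
parsed <- parse_date_time(c("2023-01-01", ambiguous), orders = c("ymd", "mdy"))
dose_date <- as_date(parsed)
class(dose_date)
#> [1] "Date"
```

Because both readings parse without complaint, confirm the convention with the data source rather than inferring it from whichever parser succeeds.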

Worked Example 2: Parsing Date-Times

Now parse the sample timestamp:

df_time_parsed <- df_time_fixed %>%
  mutate(
    SAMPLE_DT = ymd_hms(SAMPLE_TIME)
  )

df_time_parsed

# A tibble: 2 × 5
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT          
  <dbl> <chr>      <chr>               <dttm>              <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00

Worked Example 3: Calculating Elapsed Time

Compute elapsed time (in hours) from dose date to sample datetime. The dose date carries no time of day, so it is treated as midnight (00:00:00):

df_time_parsed %>%
  mutate(
    elapsed_hours = as.numeric(
      difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours")
    )
  ) %>%
  select(ID, DOSE_DATE_parsed, SAMPLE_DT, elapsed_hours)

# A tibble: 2 × 4
     ID DOSE_DATE_parsed    SAMPLE_DT           elapsed_hours
  <dbl> <dttm>              <dttm>                      <dbl>
1     1 2023-01-01 00:00:00 2023-01-01 00:30:00           0.5
2     2 2023-01-02 00:00:00 2023-01-02 01:00:00           1

Note

In real PMx datasets, dose time is often a datetime (not just a date).
Here we’re using a date for simplicity.
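
As a sketch of the more realistic case, the example below uses a hypothetical DOSE_DTTM column with a full dose datetime. The column name and values are assumptions for illustration and do not come from the lesson dataset.

```r
library(dplyr)
library(lubridate)

# Hypothetical records with a full dose datetime (DOSE_DTTM is not in the lesson data)
df_dose <- tibble::tribble(
  ~ID, ~DOSE_DTTM,            ~SAMPLE_TIME,
    1, "2023-01-01 08:00:00", "2023-01-01 09:30:00",
    2, "2023-01-02 08:15:00", "2023-01-02 12:15:00"
)

df_elapsed <- df_dose %>%
  mutate(
    DOSE_DT   = ymd_hms(DOSE_DTTM),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    # Hours from dose to sample, as a plain number
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DT, units = "hours"))
  )

df_elapsed$elapsed_hours
#> [1] 1.5 4.0
```

The pattern is the same as in Worked Example 3; only the dose side changes from a date to a datetime.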

Strategies

Parse dates immediately after import.
Use ymd(), mdy(), or dmy() intentionally based on format.
If formats are mixed, treat that as a QC signal and parse explicitly.
Compute elapsed time using parsed datetime objects.
Always verify units after time differences.
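
The last point matters because subtracting two datetimes with `-` lets R choose the unit from the size of the difference. A minimal sketch with illustrative timestamps:

```r
library(lubridate)

dose <- ymd_hms("2023-01-01 00:00:00")

# Subtracting with `-` lets R pick the unit automatically
short_gap <- ymd_hms("2023-01-01 00:30:00") - dose
long_gap  <- ymd_hms("2023-01-03 00:00:00") - dose

units(short_gap)
#> [1] "mins"
units(long_gap)
#> [1] "days"

# as.numeric() alone would return 30 and 2, which are not hours
as.numeric(short_gap, units = "hours")
#> [1] 0.5
as.numeric(long_gap, units = "hours")
#> [1] 48
```

Stating the unit explicitly, either here or through difftime(..., units = "hours"), keeps the numeric result independent of how far apart the timestamps happen to be.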

Warning

If a date parser returns NA, don’t “work around it.”
Treat it as a data quality issue and fix the format or parsing logic.
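
One possible way to enforce this is a small guard that stops the pipeline when parsing introduces new NA values. The helper assert_all_parsed() below is hypothetical, written for this sketch; it is not a lubridate function.

```r
library(lubridate)

# Hypothetical helper: stop if parsing turned any non-missing value into NA
assert_all_parsed <- function(raw, parsed) {
  failed <- !is.na(raw) & is.na(parsed)
  if (any(failed)) {
    stop(
      sum(failed), " value(s) failed to parse: ",
      paste(unique(raw[failed]), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(parsed)
}

raw_dates <- c("2023-01-01", "01/02/2023")

# Passes: every value matches one of the declared orders
ok <- assert_all_parsed(
  raw_dates,
  parse_date_time(raw_dates, orders = c("ymd", "mdy"), quiet = TRUE)
)

# Fails with an error naming "01/02/2023"; try() keeps this script running
result <- try(assert_all_parsed(raw_dates, ymd(raw_dates, quiet = TRUE)), silent = TRUE)
inherits(result, "try-error")
#> [1] TRUE
```

Values that were already missing in the raw data are deliberately not counted as failures, so the guard reports only problems introduced by parsing.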

Common Mistakes

Forgetting to parse before subtracting.
Using the wrong parser (mdy() vs ymd()).
Assuming time differences are in hours without checking.
Allowing NA parses to persist into downstream steps.
Mixing time zones unintentionally.
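
The time zone mistake is easy to miss because the printed clock times can look identical. The sketch below assumes a hypothetical site that records local time in New York; the zone is only an example.

```r
library(lubridate)

# ymd_hms() assumes UTC unless told otherwise
sample_utc <- ymd_hms("2023-01-02 01:00:00")
tz(sample_utc)
#> [1] "UTC"

# The same clock reading declared as local time at a hypothetical New York site
sample_ny <- ymd_hms("2023-01-02 01:00:00", tz = "America/New_York")

# Identical text, different instants: New York is 5 hours behind UTC in January
as.numeric(difftime(sample_ny, sample_utc, units = "hours"))
#> [1] 5
```

If dose and sample times come from systems with different conventions, set tz explicitly when parsing each column so the elapsed time does not absorb the offset.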

Practice Problems

Parse DOSE_DATE so both rows produce valid dates.
Convert SAMPLE_TIME to a datetime.
Compute elapsed time in hours.
Identify rows where elapsed time is negative (if any).
Explain why ymd() returns NA for "01/02/2023".

Step-by-Step Solutions

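df_time %>%
  mutate(
    # Problem 1: accept both year-month-day and month/day/year inputs
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    # Problem 2: convert the character timestamp to a datetime
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    # Problem 3: request hours explicitly, then drop the difftime class
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours"))
  )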

# A tibble: 2 × 6
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT          
  <dbl> <chr>      <chr>               <dttm>              <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00
# ℹ 1 more variable: elapsed_hours <dbl>
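
For the fourth practice problem, one approach is to filter on the computed column. The sketch below repeats the parsing steps so it runs on its own; negative_rows is an illustrative name.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,    ~SAMPLE_TIME,
    1, "2023-01-01",  "2023-01-01 00:30:00",
    2, "01/02/2023",  "2023-01-02 01:00:00"
)

negative_rows <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours"))
  ) %>%
  # A sample recorded before its dose is a QC flag
  filter(elapsed_hours < 0)

nrow(negative_rows)
#> [1] 0
```

Both elapsed times in this dataset are positive (0.5 and 1 hour), so no rows are flagged.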

Summary

You now know how to:

Parse date and datetime variables safely.
Handle mixed date formats using parse_date_time().
Compute elapsed time correctly.
Detect time inconsistencies early.
Prepare time variables for modeling.

Time handling errors are subtle. Parsing early prevents downstream modeling issues.

Quick Tips

Parse dates immediately after import.
Choose the parser intentionally (ymd(), mdy(), dmy()).
If parsing creates NA, treat it as a QC signal.
Always verify time units when using difftime().
Never subtract raw character timestamps.