Working with Dates and Times

Parse, clean, and compute time differences with lubridate for pharmacometrics (PMx) workflows.
Tip

What you’ll build today: safe patterns for parsing dates, computing time differences, and preparing time variables for modeling.

Learning Objectives

By the end of this lesson, you will be able to:

  • Parse dates using ymd(), mdy(), and dmy().
  • Parse date-times using ymd_hms().
  • Handle mixed date formats safely when they appear in real datasets.
  • Compute time differences safely.
  • Convert character timestamps into modeling-ready time variables.
  • Recognize common time-related data issues.

Setup

library(tidyverse)
library(lubridate)

Example Dataset

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,    ~SAMPLE_TIME,
    1, "2023-01-01",  "2023-01-01 00:30:00",
    2, "01/02/2023",  "2023-01-02 01:00:00"
)

df_time
# A tibble: 2 × 3
     ID DOSE_DATE  SAMPLE_TIME        
  <dbl> <chr>      <chr>              
1     1 2023-01-01 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00

Key Ideas

Date and time variables often:

  • arrive as character strings
  • use inconsistent formats
  • require conversion before subtraction
  • cause silent errors if left unparsed
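Here is a small sketch of the "silent errors" point, using two hypothetical MM/DD/YYYY strings. Subtracting the raw text fails loudly. Calling max() on the same text runs without complaint and returns the wrong date.

```r
library(lubridate)

raw <- c("02/01/2023", "12/31/2022")  # hypothetical MM/DD/YYYY strings

# Loud failure: character strings cannot be subtracted
msg <- tryCatch(raw[1] - raw[2], error = function(e) conditionMessage(e))
print(msg)  # "non-numeric argument to binary operator"

# Silent failure: max() on characters compares text, not dates
latest_text <- max(raw)
print(latest_text)  # "12/31/2022" (wrong: 1 Feb 2023 is later)

# After parsing, comparison and subtraction behave as expected
parsed <- mdy(raw)
latest_date <- max(parsed)
print(latest_date)            # "2023-02-01"
print(parsed[1] - parsed[2])  # Time difference of 32 days
```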

In PMx work, date/time issues are common because data may come from:

  • EDC exports
  • lab systems
  • multiple sites (with different date conventions)
  • manual data entry

Worked Example 1: Parsing Dates (and detecting mixed formats)

If you try ymd() on the full column:

df_time %>%
  mutate(DOSE_DATE_parsed = ymd(DOSE_DATE))
# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed
  <dbl> <chr>      <chr>               <date>          
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01      
2     2 01/02/2023 2023-01-02 01:00:00 NA              

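Notice that the second row becomes NA, and lubridate also emits a "failed to parse" warning that is easy to miss in a long script.
That’s because ymd() expects year-month-day order (e.g. YYYY-MM-DD), while row 2 puts the year last. It is most likely MM/DD/YYYY, but "01/02/2023" is also a valid DD/MM/YYYY date (1 February 2023), so the string alone cannot tell you which was intended.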
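Here is a quick sketch of the ambiguity, using the two DOSE_DATE strings from df_time. ymd() rejects the second string. mdy() and dmy() both accept it, but they disagree on the date.

```r
library(lubridate)

x <- c("2023-01-01", "01/02/2023")  # the DOSE_DATE values from df_time

# ymd() matches only the first string; the second becomes NA (with a warning)
via_ymd <- suppressWarnings(ymd(x))
print(via_ymd)   # "2023-01-01" NA

# The second string is valid under two different conventions
us_style <- mdy(x[2])  # month/day/year -> 2023-01-02
eu_style <- dmy(x[2])  # day/month/year -> 2023-02-01
print(c(us_style, eu_style))
```

Only the data source can settle which reading is correct, for example a site's documented date convention.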

A robust fix: parse mixed formats explicitly

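parse_date_time() lets you supply several candidate formats through its orders argument, which handles heterogeneous date strings in one column. There are two caveats. First, it returns a date-time (<dttm>, UTC by default) rather than a Date, which is why the parsed values below show 00:00:00; use as_date() if you need a Date. Second, listing "mdy" commits you to reading "01/02/2023" as 2 January, so confirm the source's date convention first.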

df_time_fixed <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy"))
  )

df_time_fixed
# A tibble: 2 × 4
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed   
  <dbl> <chr>      <chr>               <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00
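If you would rather keep a Date column and make the format priority explicit, one alternative is to run each parser separately with quiet = TRUE. You then keep the first non-NA result with dplyr::coalesce(). The helper columns date_ymd, date_mdy, and DOSE_DATE_format are illustrative names and are not part of the original lesson.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,
    1, "2023-01-01",
    2, "01/02/2023"
)

df_dates <- df_time %>%
  mutate(
    # Run each parser on its own; quiet = TRUE suppresses "failed to parse" warnings
    date_ymd = ymd(DOSE_DATE, quiet = TRUE),
    date_mdy = mdy(DOSE_DATE, quiet = TRUE),
    # Keep the first non-NA result, so ymd takes priority over mdy
    DOSE_DATE_parsed = coalesce(date_ymd, date_mdy),
    # Record which parser succeeded, keeping the mixed formats visible for QC
    DOSE_DATE_format = case_when(
      !is.na(date_ymd) ~ "ymd",
      !is.na(date_mdy) ~ "mdy",
      TRUE ~ NA_character_
    )
  ) %>%
  select(ID, DOSE_DATE, DOSE_DATE_parsed, DOSE_DATE_format)

df_dates
```

Keeping DOSE_DATE_format in the data makes the mixed formats visible downstream. The format mix is then treated as a QC signal instead of being silently absorbed.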

Worked Example 2: Parsing Date-Times

Now parse the sample timestamp with ymd_hms(). When no tz argument is given, it interprets the clock time as UTC:

df_time_parsed <- df_time_fixed %>%
  mutate(
    SAMPLE_DT = ymd_hms(SAMPLE_TIME)
  )

df_time_parsed
# A tibble: 2 × 5
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT          
  <dbl> <chr>      <chr>               <dttm>              <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00
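The sketch below parses the same string twice. The first parse uses the default UTC, and the second uses an assumed site time zone. America/New_York is a hypothetical choice. The results are two different instants even though the wall-clock text is identical.

```r
library(lubridate)

x <- "2023-01-01 00:30:00"

# With no tz argument, ymd_hms() reads the clock time as UTC
as_utc <- ymd_hms(x)
# The same clock time read in a (hypothetical) site time zone
as_site <- ymd_hms(x, tz = "America/New_York")

print(tz(as_utc))   # "UTC"
print(tz(as_site))  # "America/New_York"

# Same wall-clock string, different instants: New York is UTC-5 in January
gap_hours <- as.numeric(difftime(as_site, as_utc, units = "hours"))
print(gap_hours)    # 5
```

Dose and sample times may come from systems that record local time. In that case, parse both with the same correct tz so their difference is not shifted by the UTC offset.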

Worked Example 3: Calculating Elapsed Time

Compute elapsed time from the dose date to the sample date-time. Request hours explicitly in difftime(), then convert the result to a plain number with as.numeric():

df_time_parsed %>%
  mutate(
    elapsed_hours = as.numeric(
      difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours")
    )
  ) %>%
  select(ID, DOSE_DATE_parsed, SAMPLE_DT, elapsed_hours)
# A tibble: 2 × 4
     ID DOSE_DATE_parsed    SAMPLE_DT           elapsed_hours
  <dbl> <dttm>              <dttm>                      <dbl>
1     1 2023-01-01 00:00:00 2023-01-01 00:30:00           0.5
2     2 2023-01-02 00:00:00 2023-01-02 01:00:00           1  
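The explicit units = "hours" matters. This minimal sketch uses hypothetical times to show what plain subtraction does. R picks the units automatically, so the bare number can silently be in minutes.

```r
library(lubridate)

dose_dt   <- ymd_hms("2023-01-01 00:00:00")
sample_dt <- ymd_hms("2023-01-01 00:30:00")

# Plain subtraction chooses units automatically (here: minutes)
raw_diff <- sample_dt - dose_dt
print(raw_diff)              # Time difference of 30 mins
print(units(raw_diff))       # "mins"
print(as.numeric(raw_diff))  # 30, not 0.5

# Stating the units explicitly gives the number you intended
elapsed_hours <- as.numeric(difftime(sample_dt, dose_dt, units = "hours"))
print(elapsed_hours)         # 0.5
```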
Note

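In real PMx datasets, the dose is usually recorded as a date-time, not just a date.
Here we use a date for simplicity. This implicitly assumes every dose was given at midnight (00:00), which with real data can shift elapsed times by many hours.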
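When the dose clock time is available, one way to build a full dose date-time is to paste the date and time strings together. You can then parse the combined string with ymd_hm(). The DOSE_CLOCK column and its values are hypothetical, and real sources may store dose time differently.

```r
library(dplyr)
library(lubridate)

# Hypothetical dosing record with a separate clock-time column (DOSE_CLOCK)
df_dose <- tibble::tribble(
  ~ID, ~DOSE_DATE,   ~DOSE_CLOCK, ~SAMPLE_TIME,
    1, "2023-01-01", "08:15",     "2023-01-01 10:45:00"
)

df_dose <- df_dose %>%
  mutate(
    # Combine date and clock time into one string, then parse as a UTC date-time
    DOSE_DT   = ymd_hm(paste(DOSE_DATE, DOSE_CLOCK)),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DT, units = "hours"))
  )

df_dose$elapsed_hours  # 2.5
```

If the dose had been parsed from the date alone, the same sample would appear 10.75 hours post-dose instead of 2.5.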


Strategies

  • Parse dates immediately after import.
  • Use ymd(), mdy(), or dmy() intentionally based on format.
  • If formats are mixed, treat that as a QC signal and parse explicitly.
  • Compute elapsed time using parsed datetime objects.
  • Always verify units after time differences.
Warning

If a date parser returns NA, don’t “work around it.”
Treat it as a data quality issue and fix the format or parsing logic.
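One way to act on this warning is to fail fast. The sketch below defines a hypothetical helper, check_parsed(). It stops and reports the offending rows whenever a non-missing raw value parses to NA.

```r
library(lubridate)

# Hypothetical helper: stop if any non-missing raw value failed to parse
check_parsed <- function(raw, parsed, label = "value") {
  failed <- which(!is.na(raw) & is.na(parsed))
  if (length(failed) > 0) {
    stop(sprintf("%s: %d value(s) failed to parse at row(s) %s: %s",
                 label, length(failed),
                 paste(failed, collapse = ", "),
                 paste(raw[failed], collapse = ", ")),
         call. = FALSE)
  }
  invisible(parsed)
}

raw <- c("2023-01-01", "01/02/2023", NA)
parsed <- suppressWarnings(ymd(raw))

result <- tryCatch(check_parsed(raw, parsed, "DOSE_DATE"),
                   error = function(e) conditionMessage(e))
print(result)
# "DOSE_DATE: 1 value(s) failed to parse at row(s) 2: 01/02/2023"
```

Values that were already NA in the raw column are deliberately not flagged. Genuinely missing dates are a separate question from failed parses.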


Common Mistakes

  • Forgetting to parse before subtracting.
  • Using the wrong parser (mdy() vs ymd()).
  • Assuming time differences are in hours without checking.
  • Allowing NA parses to persist into downstream steps.
  • Mixing time zones unintentionally.

Practice Problems

  1. Parse DOSE_DATE so both rows produce valid dates.
  2. Convert SAMPLE_TIME to a datetime.
  3. Compute elapsed time in hours.
  4. Identify rows where elapsed time is negative (if any).
  5. Explain why ymd() returns NA for "01/02/2023".

One possible solution to problems 1 to 3:

df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours"))
  )
# A tibble: 2 × 6
     ID DOSE_DATE  SAMPLE_TIME         DOSE_DATE_parsed    SAMPLE_DT          
  <dbl> <chr>      <chr>               <dttm>              <dttm>             
1     1 2023-01-01 2023-01-01 00:30:00 2023-01-01 00:00:00 2023-01-01 00:30:00
2     2 01/02/2023 2023-01-02 01:00:00 2023-01-02 00:00:00 2023-01-02 01:00:00
# ℹ 1 more variable: elapsed_hours <dbl>
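For practice problem 4, one possible approach is to filter on the computed elapsed time. This sketch repeats the solution pipeline so it runs on its own.

```r
library(dplyr)
library(lubridate)

df_time <- tibble::tribble(
  ~ID, ~DOSE_DATE,    ~SAMPLE_TIME,
    1, "2023-01-01",  "2023-01-01 00:30:00",
    2, "01/02/2023",  "2023-01-02 01:00:00"
)

df_elapsed <- df_time %>%
  mutate(
    DOSE_DATE_parsed = parse_date_time(DOSE_DATE, orders = c("ymd", "mdy")),
    SAMPLE_DT = ymd_hms(SAMPLE_TIME),
    elapsed_hours = as.numeric(difftime(SAMPLE_DT, DOSE_DATE_parsed, units = "hours"))
  )

# Samples recorded before the dose (negative elapsed time)
negative_rows <- df_elapsed %>%
  filter(elapsed_hours < 0)

# filter() silently drops rows where elapsed_hours is NA, so count those separately
n_missing <- sum(is.na(df_elapsed$elapsed_hours))

nrow(negative_rows)  # 0 for this example dataset
```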

Summary

You now know how to:

  • Parse date and datetime variables safely.
  • Handle mixed date formats using parse_date_time().
  • Compute elapsed time correctly.
  • Detect time inconsistencies early.
  • Prepare time variables for modeling.

Time handling errors are subtle. Parsing early prevents downstream modeling issues.


  • Parse dates immediately after import.
  • Choose the parser intentionally (ymd(), mdy(), dmy()).
  • If parsing creates NA, treat it as a QC signal.
  • Always verify time units when using difftime().
  • Never subtract raw character timestamps.