Reshaping Data

Learn how to reshape PMx datasets between long and wide formats safely, and understand when each layout is appropriate.

Tip

What you’ll build today: reliable patterns for reshaping PMx data between long and wide formats without losing meaning or introducing errors.

Learning Objectives

By the end of this lesson, you will be able to:

Explain the difference between long and wide data in PMx contexts.
Use pivot_longer() to reshape wide data into tidy long format.
Use pivot_wider() to create wide summaries when appropriate.
Avoid common reshaping mistakes that break PMx assumptions.

Setup

library(tidyverse)

Key Ideas

In PMx, long format is the default:

one row per event (dose or observation)
explicit ID and TIME
compatible with NONMEM, nlmixr2, and rxode2

Wide format is sometimes useful:

covariate tables
summary reports
exploratory plots

Warning

Reshaping is not just cosmetic. A bad pivot can silently change the meaning of your data.

Example 1: Wide concentration data

Suppose you receive concentration data like this:

pk_wide <- tibble::tribble(
  ~ID, ~TIME, ~DV_TRT, ~DV_CTRL,
    1,  0.5,     2.1,      NA,
    1,  1.0,     3.8,      NA,
    2,  0.5,      NA,     1.6,
    2,  1.0,      NA,     2.9
)

pk_wide

# A tibble: 4 × 4
     ID  TIME DV_TRT DV_CTRL
  <dbl> <dbl>  <dbl>   <dbl>
1     1   0.5    2.1    NA  
2     1   1      3.8    NA  
3     2   0.5   NA       1.6
4     2   1     NA       2.9

Each subject belongs to one treatment arm, but the data arrive wide.

Worked Example 1: pivot_longer()

Convert to long format:

pk_long <- pk_wide %>%
  pivot_longer(
    cols = starts_with("DV_"),
    names_to = "ARM",
    values_to = "DV"
  )

pk_long

# A tibble: 8 × 4
     ID  TIME ARM        DV
  <dbl> <dbl> <chr>   <dbl>
1     1   0.5 DV_TRT    2.1
2     1   0.5 DV_CTRL  NA  
3     1   1   DV_TRT    3.8
4     1   1   DV_CTRL  NA  
5     2   0.5 DV_TRT   NA  
6     2   0.5 DV_CTRL   1.6
7     2   1   DV_TRT   NA  
8     2   1   DV_CTRL   2.9

Clean up the arm labels and remove missing rows:

pk_long <- pk_long %>%
  mutate(ARM = str_remove(ARM, "DV_")) %>%
  filter(!is.na(DV))

pk_long

# A tibble: 4 × 4
     ID  TIME ARM      DV
  <dbl> <dbl> <chr> <dbl>
1     1   0.5 TRT     2.1
2     1   1   TRT     3.8
3     2   0.5 CTRL    1.6
4     2   1   CTRL    2.9

Example 2: Covariates stored wide by visit

cov_wide <- tibble::tribble(
  ~ID, ~WT_BASE, ~WT_WK4, ~WT_WK8,
    1,     72,       71,       70,
    2,     88,       87,       86
)

cov_wide

# A tibble: 2 × 4
     ID WT_BASE WT_WK4 WT_WK8
  <dbl>   <dbl>  <dbl>  <dbl>
1     1      72     71     70
2     2      88     87     86

Worked Example 2: pivot_longer() with names_pattern

cov_long <- cov_wide %>%
  pivot_longer(
    cols = starts_with("WT_"),
    names_to = c("VAR", "VISIT"),
    names_pattern = "(WT)_(.*)",
    values_to = "WT"
  )

cov_long

# A tibble: 6 × 4
     ID VAR   VISIT    WT
  <dbl> <chr> <chr> <dbl>
1     1 WT    BASE     72
2     1 WT    WK4      71
3     1 WT    WK8      70
4     2 WT    BASE     88
5     2 WT    WK4      87
6     2 WT    WK8      86

Add visit times (example mapping):

cov_long <- cov_long %>%
  mutate(
    TIME = case_when(
      VISIT == "BASE" ~ 0,
      VISIT == "WK4"  ~ 4,
      VISIT == "WK8"  ~ 8,
      TRUE ~ NA_real_
    )
  )

cov_long

# A tibble: 6 × 5
     ID VAR   VISIT    WT  TIME
  <dbl> <chr> <chr> <dbl> <dbl>
1     1 WT    BASE     72     0
2     1 WT    WK4      71     4
3     1 WT    WK8      70     8
4     2 WT    BASE     88     0
5     2 WT    WK4      87     4
6     2 WT    WK8      86     8

Worked Example 3: pivot_wider() for summaries

Create a wide summary table:

pk_long %>%
  group_by(ARM) %>%
  summarise(
    mean_DV = mean(DV),
    sd_DV   = sd(DV),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = ARM,
    values_from = c(mean_DV, sd_DV)
  )

# A tibble: 1 × 4
  mean_DV_CTRL mean_DV_TRT sd_DV_CTRL sd_DV_TRT
         <dbl>       <dbl>      <dbl>     <dbl>
1         2.25        2.95      0.919      1.20

This is useful for reporting—but not for modeling.

Strategies

Keep event-level PMx data long.
Pivot covariates and summaries, not event histories.
Always inspect row counts after pivoting.
Verify that (ID, TIME) pairs remain meaningful.
Prefer explicit names over implicit reshaping.

Common Mistakes

Treating reshaping as purely cosmetic.
Mixing dose and observation rows during a pivot.
Forgetting to remove NA values after pivot_longer().
Breaking (ID, TIME) uniqueness.
Creating multiple DV columns unintentionally.
Using pivot_wider() on event-level data.
Not checking row counts before and after reshaping.
Misparsing column names without names_pattern.
Leaving labels like "DV_TRT" uncleaned.
Using wide format for modeling workflows.

Practice Problems

Convert a wide DV table into long format using pivot_longer().
Remove rows with missing DV values.
Extract treatment arm labels cleanly.
Create a wide summary table of mean DV by arm.
Explain why wide format is inappropriate for NLME modeling.

Step-by-Step Solutions

pk_wide %>%
  pivot_longer(
    cols = starts_with("DV_"),
    names_to = "ARM",
    values_to = "DV"
  ) %>%
  mutate(ARM = str_remove(ARM, "DV_")) %>%
  filter(!is.na(DV))

# A tibble: 4 × 4
     ID  TIME ARM      DV
  <dbl> <dbl> <chr> <dbl>
1     1   0.5 TRT     2.1
2     1   1   TRT     3.8
3     2   0.5 CTRL    1.6
4     2   1   CTRL    2.9

Summary

In this lesson you learned how to:

reshape data safely using pivot_longer() and pivot_wider()
recognize when long vs wide format is appropriate in PMx
avoid reshaping mistakes that break model assumptions

Reshaping is powerful—but in PMx, long format is king.

Quick Tips

Default to long format for modeling.
Pivot wide only for covariates or summaries.
Always check row counts and keys after reshaping.
If a pivot feels confusing, stop and re-check intent.