Reshaping Data

Learn how to reshape PMx datasets between long and wide formats safely, and understand when each layout is appropriate.
Tip

What you’ll build today: reliable patterns for reshaping PMx data between long and wide formats without losing meaning or introducing errors.

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain the difference between long and wide data in PMx contexts.
  • Use pivot_longer() to reshape wide data into tidy long format.
  • Use pivot_wider() to create wide summaries when appropriate.
  • Avoid common reshaping mistakes that break PMx assumptions.

Setup

library(tidyverse)

Key Ideas

In PMx, long format is the default:

  • one row per event (dose or observation)
  • explicit ID and TIME
  • compatible with NONMEM, nlmixr2, and rxode2

Wide format is sometimes useful:

  • covariate tables
  • summary reports
  • exploratory plots
Warning

Reshaping is not just cosmetic. A bad pivot can silently change the meaning of your data.


Example 1: Wide concentration data

Suppose you receive concentration data like this:

pk_wide <- tibble::tribble(
  ~ID, ~TIME, ~DV_TRT, ~DV_CTRL,
    1,  0.5,     2.1,      NA,
    1,  1.0,     3.8,      NA,
    2,  0.5,      NA,     1.6,
    2,  1.0,      NA,     2.9
)

pk_wide
# A tibble: 4 × 4
     ID  TIME DV_TRT DV_CTRL
  <dbl> <dbl>  <dbl>   <dbl>
1     1   0.5    2.1    NA  
2     1   1      3.8    NA  
3     2   0.5   NA       1.6
4     2   1     NA       2.9

Each subject belongs to one treatment arm, but the data arrive wide.


Worked Example 1: pivot_longer()

Convert to long format:

pk_long <- pk_wide %>%
  pivot_longer(
    cols = starts_with("DV_"),
    names_to = "ARM",
    values_to = "DV"
  )

pk_long
# A tibble: 8 × 4
     ID  TIME ARM        DV
  <dbl> <dbl> <chr>   <dbl>
1     1   0.5 DV_TRT    2.1
2     1   0.5 DV_CTRL  NA  
3     1   1   DV_TRT    3.8
4     1   1   DV_CTRL  NA  
5     2   0.5 DV_TRT   NA  
6     2   0.5 DV_CTRL   1.6
7     2   1   DV_TRT   NA  
8     2   1   DV_CTRL   2.9

Clean up the arm labels and remove missing rows:

pk_long <- pk_long %>%
  mutate(ARM = str_remove(ARM, "DV_")) %>%
  filter(!is.na(DV))

pk_long
# A tibble: 4 × 4
     ID  TIME ARM      DV
  <dbl> <dbl> <chr> <dbl>
1     1   0.5 TRT     2.1
2     1   1   TRT     3.8
3     2   0.5 CTRL    1.6
4     2   1   CTRL    2.9

Example 2: Covariates stored wide by visit

cov_wide <- tibble::tribble(
  ~ID, ~WT_BASE, ~WT_WK4, ~WT_WK8,
    1,     72,       71,       70,
    2,     88,       87,       86
)

cov_wide
# A tibble: 2 × 4
     ID WT_BASE WT_WK4 WT_WK8
  <dbl>   <dbl>  <dbl>  <dbl>
1     1      72     71     70
2     2      88     87     86

Worked Example 2: pivot_longer() with names_pattern

cov_long <- cov_wide %>%
  pivot_longer(
    cols = starts_with("WT_"),
    names_to = c("VAR", "VISIT"),
    names_pattern = "(WT)_(.*)",
    values_to = "WT"
  )

cov_long
# A tibble: 6 × 4
     ID VAR   VISIT    WT
  <dbl> <chr> <chr> <dbl>
1     1 WT    BASE     72
2     1 WT    WK4      71
3     1 WT    WK8      70
4     2 WT    BASE     88
5     2 WT    WK4      87
6     2 WT    WK8      86

Add visit times (example mapping):

cov_long <- cov_long %>%
  mutate(
    TIME = case_when(
      VISIT == "BASE" ~ 0,
      VISIT == "WK4"  ~ 4,
      VISIT == "WK8"  ~ 8,
      TRUE ~ NA_real_
    )
  )

cov_long
# A tibble: 6 × 5
     ID VAR   VISIT    WT  TIME
  <dbl> <chr> <chr> <dbl> <dbl>
1     1 WT    BASE     72     0
2     1 WT    WK4      71     4
3     1 WT    WK8      70     8
4     2 WT    BASE     88     0
5     2 WT    WK4      87     4
6     2 WT    WK8      86     8

Worked Example 3: pivot_wider() for summaries

Create a wide summary table:

pk_long %>%
  group_by(ARM) %>%
  summarise(
    mean_DV = mean(DV),
    sd_DV   = sd(DV),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = ARM,
    values_from = c(mean_DV, sd_DV)
  )
# A tibble: 1 × 4
  mean_DV_CTRL mean_DV_TRT sd_DV_CTRL sd_DV_TRT
         <dbl>       <dbl>      <dbl>     <dbl>
1         2.25        2.95      0.919      1.20

This is useful for reporting—but not for modeling.


Strategies

  • Keep event-level PMx data long.
  • Pivot covariates and summaries, not event histories.
  • Always inspect row counts after pivoting.
  • Verify that (ID, TIME) pairs remain meaningful.
  • Prefer explicit names over implicit reshaping.

Common Mistakes

  • Treating reshaping as purely cosmetic.
  • Mixing dose and observation rows during a pivot.
  • Forgetting to remove NA values after pivot_longer().
  • Breaking (ID, TIME) uniqueness.
  • Creating multiple DV columns unintentionally.
  • Using pivot_wider() on event-level data.
  • Not checking row counts before and after reshaping.
  • Misparsing column names without names_pattern.
  • Leaving labels like "DV_TRT" uncleaned.
  • Using wide format for modeling workflows.

Practice Problems

  1. Convert a wide DV table into long format using pivot_longer().
  2. Remove rows with missing DV values.
  3. Extract treatment arm labels cleanly.
  4. Create a wide summary table of mean DV by arm.
  5. Explain why wide format is inappropriate for NLME modeling.

pk_wide %>%
  pivot_longer(
    cols = starts_with("DV_"),
    names_to = "ARM",
    values_to = "DV"
  ) %>%
  mutate(ARM = str_remove(ARM, "DV_")) %>%
  filter(!is.na(DV))
# A tibble: 4 × 4
     ID  TIME ARM      DV
  <dbl> <dbl> <chr> <dbl>
1     1   0.5 TRT     2.1
2     1   1   TRT     3.8
3     2   0.5 CTRL    1.6
4     2   1   CTRL    2.9

Summary

In this lesson you learned how to:

  • reshape data safely using pivot_longer() and pivot_wider()
  • recognize when long vs wide format is appropriate in PMx
  • avoid reshaping mistakes that break model assumptions

Reshaping is powerful—but in PMx, long format is king.


  • Default to long format for modeling.
  • Pivot wide only for covariates or summaries.
  • Always check row counts and keys after reshaping.
  • If a pivot feels confusing, stop and re-check intent.