library(tidyverse)Reshaping Data
Learn how to reshape PMx datasets between long and wide formats safely, and understand when each layout is appropriate.
Tip
What you’ll build today: reliable patterns for reshaping PMx data between long and wide formats without losing meaning or introducing errors.
Learning Objectives
By the end of this lesson, you will be able to:
- Explain the difference between long and wide data in PMx contexts.
- Use
pivot_longer()to reshape wide data into tidy long format. - Use
pivot_wider()to create wide summaries when appropriate. - Avoid common reshaping mistakes that break PMx assumptions.
Setup
Key Ideas
In PMx, long format is the default:
- one row per event (dose or observation)
- explicit
IDandTIME - compatible with NONMEM, nlmixr2, and rxode2
Wide format is sometimes useful:
- covariate tables
- summary reports
- exploratory plots
Warning
Reshaping is not just cosmetic. A bad pivot can silently change the meaning of your data.
Example 1: Wide concentration data
Suppose you receive concentration data like this:
pk_wide <- tibble::tribble(
~ID, ~TIME, ~DV_TRT, ~DV_CTRL,
1, 0.5, 2.1, NA,
1, 1.0, 3.8, NA,
2, 0.5, NA, 1.6,
2, 1.0, NA, 2.9
)
pk_wide# A tibble: 4 × 4
ID TIME DV_TRT DV_CTRL
<dbl> <dbl> <dbl> <dbl>
1 1 0.5 2.1 NA
2 1 1 3.8 NA
3 2 0.5 NA 1.6
4 2 1 NA 2.9
Each subject belongs to one treatment arm, but the data arrive wide.
Worked Example 1: pivot_longer()
Convert to long format:
pk_long <- pk_wide %>%
pivot_longer(
cols = starts_with("DV_"),
names_to = "ARM",
values_to = "DV"
)
pk_long# A tibble: 8 × 4
ID TIME ARM DV
<dbl> <dbl> <chr> <dbl>
1 1 0.5 DV_TRT 2.1
2 1 0.5 DV_CTRL NA
3 1 1 DV_TRT 3.8
4 1 1 DV_CTRL NA
5 2 0.5 DV_TRT NA
6 2 0.5 DV_CTRL 1.6
7 2 1 DV_TRT NA
8 2 1 DV_CTRL 2.9
Clean up the arm labels and remove missing rows:
pk_long <- pk_long %>%
mutate(ARM = str_remove(ARM, "DV_")) %>%
filter(!is.na(DV))
pk_long# A tibble: 4 × 4
ID TIME ARM DV
<dbl> <dbl> <chr> <dbl>
1 1 0.5 TRT 2.1
2 1 1 TRT 3.8
3 2 0.5 CTRL 1.6
4 2 1 CTRL 2.9
Example 2: Covariates stored wide by visit
cov_wide <- tibble::tribble(
~ID, ~WT_BASE, ~WT_WK4, ~WT_WK8,
1, 72, 71, 70,
2, 88, 87, 86
)
cov_wide# A tibble: 2 × 4
ID WT_BASE WT_WK4 WT_WK8
<dbl> <dbl> <dbl> <dbl>
1 1 72 71 70
2 2 88 87 86
Worked Example 2: pivot_longer() with names_pattern
cov_long <- cov_wide %>%
pivot_longer(
cols = starts_with("WT_"),
names_to = c("VAR", "VISIT"),
names_pattern = "(WT)_(.*)",
values_to = "WT"
)
cov_long# A tibble: 6 × 4
ID VAR VISIT WT
<dbl> <chr> <chr> <dbl>
1 1 WT BASE 72
2 1 WT WK4 71
3 1 WT WK8 70
4 2 WT BASE 88
5 2 WT WK4 87
6 2 WT WK8 86
Add visit times (example mapping):
cov_long <- cov_long %>%
mutate(
TIME = case_when(
VISIT == "BASE" ~ 0,
VISIT == "WK4" ~ 4,
VISIT == "WK8" ~ 8,
TRUE ~ NA_real_
)
)
cov_long# A tibble: 6 × 5
ID VAR VISIT WT TIME
<dbl> <chr> <chr> <dbl> <dbl>
1 1 WT BASE 72 0
2 1 WT WK4 71 4
3 1 WT WK8 70 8
4 2 WT BASE 88 0
5 2 WT WK4 87 4
6 2 WT WK8 86 8
Worked Example 3: pivot_wider() for summaries
Create a wide summary table:
pk_long %>%
group_by(ARM) %>%
summarise(
mean_DV = mean(DV),
sd_DV = sd(DV),
.groups = "drop"
) %>%
pivot_wider(
names_from = ARM,
values_from = c(mean_DV, sd_DV)
)# A tibble: 1 × 4
mean_DV_CTRL mean_DV_TRT sd_DV_CTRL sd_DV_TRT
<dbl> <dbl> <dbl> <dbl>
1 2.25 2.95 0.919 1.20
This is useful for reporting—but not for modeling.
Strategies
- Keep event-level PMx data long.
- Pivot covariates and summaries, not event histories.
- Always inspect row counts after pivoting.
- Verify that
(ID, TIME)pairs remain meaningful. - Prefer explicit names over implicit reshaping.
Common Mistakes
- Treating reshaping as purely cosmetic.
- Mixing dose and observation rows during a pivot.
- Forgetting to remove
NAvalues afterpivot_longer(). - Breaking
(ID, TIME)uniqueness. - Creating multiple DV columns unintentionally.
- Using
pivot_wider()on event-level data. - Not checking row counts before and after reshaping.
- Misparsing column names without
names_pattern. - Leaving labels like
"DV_TRT"uncleaned. - Using wide format for modeling workflows.
Practice Problems
- Convert a wide DV table into long format using
pivot_longer(). - Remove rows with missing DV values.
- Extract treatment arm labels cleanly.
- Create a wide summary table of mean DV by arm.
- Explain why wide format is inappropriate for NLME modeling.
TipStep-by-Step Solutions
pk_wide %>%
pivot_longer(
cols = starts_with("DV_"),
names_to = "ARM",
values_to = "DV"
) %>%
mutate(ARM = str_remove(ARM, "DV_")) %>%
filter(!is.na(DV))# A tibble: 4 × 4
ID TIME ARM DV
<dbl> <dbl> <chr> <dbl>
1 1 0.5 TRT 2.1
2 1 1 TRT 3.8
3 2 0.5 CTRL 1.6
4 2 1 CTRL 2.9
Summary
In this lesson you learned how to:
- reshape data safely using
pivot_longer()andpivot_wider() - recognize when long vs wide format is appropriate in PMx
- avoid reshaping mistakes that break model assumptions
Reshaping is powerful—but in PMx, long format is king.
TipQuick Tips
- Default to long format for modeling.
- Pivot wide only for covariates or summaries.
- Always check row counts and keys after reshaping.
- If a pivot feels confusing, stop and re-check intent.