Mutating, Transmuting, and Applying Functions Across Columns

Use mutate(), transmute(), if_else(), case_when(), and across() to create PMx-ready variables (flags, unit conversions, log transforms, and QC helpers).

Tip

What you’ll build today: a set of PMx-friendly variables, including dose/obs flags, unit conversions, log concentration, and simple QC indicators.

Learning Objectives

By the end of this lesson, you will be able to:

Create new columns using mutate() and understand when to use transmute().
Use if_else() and case_when() for clean, readable conditional logic.
Apply transformations across multiple columns using across().
Build common PMx-derived fields (flags, unit conversions, log transforms, and subject-level checks).
Avoid common transformation mistakes that silently affect modeling.

Setup

library(tidyverse)

We’ll use a small PMx-style dataset.

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT_mg, ~DV_ng_mL, ~CMT, ~WT_kg, ~SEX,
    1,   0.0,    1,    100,      NA,       1,   72,   "F",
    1,   0.5,    0,     NA,     2100,       1,   72,   "F",
    1,   1.0,    0,     NA,     3800,       1,   72,   "F",
    1,   2.0,    0,     NA,     3000,       1,   72,   "F",
    2,   0.0,    1,     80,      NA,       1,   88,   "M",
    2,   0.5,    0,     NA,     1600,       1,   88,   "M",
    2,   1.0,    0,     NA,     2900,       1,   88,   "M",
    2,   2.0,    0,     NA,     2400,       1,   88,   "M"
)

pk

# A tibble: 8 × 8
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX  
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>
1     1   0       1    100       NA     1    72 F    
2     1   0.5     0     NA     2100     1    72 F    
3     1   1       0     NA     3800     1    72 F    
4     1   2       0     NA     3000     1    72 F    
5     2   0       1     80       NA     1    88 M    
6     2   0.5     0     NA     1600     1    88 M    
7     2   1       0     NA     2900     1    88 M    
8     2   2       0     NA     2400     1    88 M

Key Ideas

mutate() creates new variables while keeping existing ones.

In PMx workflows, derived variables encode:

Meaning (dose vs observation flags)
Units (explicit conversions)
Scale (log transformations)
QC signals (unexpected or extreme values)
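
The QC-signal idea in the list above can be sketched with a few row-level flags plus one subject-level count. This is a minimal illustration on a hypothetical toy table (qc_demo and its values are invented for this sketch and are not the lesson dataset); the flag names are assumptions, not a standard.

```r
library(dplyr)

# Hypothetical toy records with two deliberate problems:
# subject 2 has a dose row without AMT, subject 1 has an observation without DV.
qc_demo <- tibble::tibble(
  ID   = c(1, 1, 1, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5),
  EVID = c(1, 0, 0, 1, 0),
  AMT  = c(100, NA, NA, NA, NA),
  DV   = c(NA, 2.1, NA, NA, 1.6)
)

qc_flags <- qc_demo %>%
  mutate(
    qc_dose_no_amt = EVID == 1 & is.na(AMT),  # dose record without an amount
    qc_obs_no_dv   = EVID == 0 & is.na(DV)    # observation record without a value
  ) %>%
  group_by(ID) %>%
  mutate(n_dose = sum(EVID == 1)) %>%         # subject-level check: doses per subject
  ungroup()                                   # drop grouping so later steps are not per-subject

qc_flags
```

Keeping the flags as separate logical columns means the raw values are never overwritten, and a later filter() can pull out exactly the rows that need review.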

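transmute() behaves like mutate() but keeps only the columns you name or create, which is useful when building compact modeling datasets. In recent dplyr releases (1.1.0 and later) transmute() is marked as superseded in favor of mutate(.keep = "none"); it still works and is used throughout this lesson.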

Warning

Guard transformations carefully.
log(0) produces -Inf, and log() of a negative value produces NaN with a warning; either can silently propagate into models and summaries.
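
To make the warning concrete, the sketch below shows how a single zero contaminates a summary and how a guard avoids it. The conc vector is a hypothetical example, not part of the lesson data.

```r
library(dplyr)

conc <- c(2.1, 0, NA, 3.8)   # hypothetical concentrations: one zero, one NA

raw_log  <- log(conc)                      # second element is -Inf
raw_mean <- mean(raw_log, na.rm = TRUE)    # -Inf: na.rm removes NA but not -Inf

# Guarded version: only positive, non-missing values are logged.
safe_log  <- if_else(!is.na(conc) & conc > 0, log(conc), NA_real_)
safe_mean <- mean(safe_log, na.rm = TRUE)  # finite

# Negative input gives NaN; log() warns here unless the warning is suppressed.
neg_log <- suppressWarnings(log(-1))
```

The guard turns problem values into NA, which most summaries and modeling tools treat as explicitly missing instead of as a real number.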

Worked Example 1: Flags

pk2 <- pk %>%
  mutate(
    is_dose = EVID == 1,
    is_obs  = EVID == 0
  )

pk2

# A tibble: 8 × 10
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   is_dose is_obs
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <lgl>   <lgl> 
1     1   0       1    100       NA     1    72 F     TRUE    FALSE 
2     1   0.5     0     NA     2100     1    72 F     FALSE   TRUE  
3     1   1       0     NA     3800     1    72 F     FALSE   TRUE  
4     1   2       0     NA     3000     1    72 F     FALSE   TRUE  
5     2   0       1     80       NA     1    88 M     TRUE    FALSE 
6     2   0.5     0     NA     1600     1    88 M     FALSE   TRUE  
7     2   1       0     NA     2900     1    88 M     FALSE   TRUE  
8     2   2       0     NA     2400     1    88 M     FALSE   TRUE

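Logical flags are easy to filter on and to count with sum(), and they leave the original EVID values untouched during cleaning. Note that a missing EVID yields NA in both flags rather than FALSE.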
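
As a rough illustration of how such flags get used, the sketch below counts and filters records on a hypothetical toy table (flag_demo is invented here and includes one missing EVID to show how NA behaves).

```r
library(dplyr)

# Hypothetical records; the last row has a missing EVID.
flag_demo <- tibble::tibble(
  ID   = c(1, 1, 1, 2),
  EVID = c(1, 0, 0, NA)
) %>%
  mutate(
    is_dose = EVID == 1,   # NA when EVID is NA
    is_obs  = EVID == 0
  )

# Logical columns sum as 0/1, so sum() counts the TRUE rows.
flag_counts <- flag_demo %>%
  summarise(
    n_dose    = sum(is_dose, na.rm = TRUE),
    n_obs     = sum(is_obs, na.rm = TRUE),
    n_unknown = sum(is.na(EVID))
  )

# filter() keeps rows where the flag is TRUE and drops both FALSE and NA.
obs_only <- flag_demo %>% filter(is_obs)
```

Counting the NA rows separately is one way to make sure records with an unknown event type are noticed instead of quietly filtered away.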

Worked Example 2: Unit Conversion

pk3 <- pk %>%
  mutate(
    DV_mg_L = DV_ng_mL * 0.001,
    AMT     = AMT_mg
  )

pk3

# A tibble: 8 × 10
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   DV_mg_L   AMT
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>   <dbl> <dbl>
1     1   0       1    100       NA     1    72 F        NA     100
2     1   0.5     0     NA     2100     1    72 F         2.1    NA
3     1   1       0     NA     3800     1    72 F         3.8    NA
4     1   2       0     NA     3000     1    72 F         3      NA
5     2   0       1     80       NA     1    88 M        NA      80
6     2   0.5     0     NA     1600     1    88 M         1.6    NA
7     2   1       0     NA     2900     1    88 M         2.9    NA
8     2   2       0     NA     2400     1    88 M         2.4    NA

Note

Keep raw and model-ready columns distinct for auditability.
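
One way to keep a conversion auditable is to name the factor and check the round trip. The sketch below assumes a hypothetical constant name (NG_ML_TO_MG_L) and a small invented table; the factor itself follows from 1 ng/mL = 1 µg/L = 0.001 mg/L.

```r
library(dplyr)

# Named conversion factor (the constant name is an assumption for this sketch).
NG_ML_TO_MG_L <- 1e-3   # 1 ng/mL = 1 ug/L = 0.001 mg/L

conv_demo <- tibble::tibble(DV_ng_mL = c(2100, NA, 1600)) %>%  # hypothetical values
  mutate(DV_mg_L = DV_ng_mL * NG_ML_TO_MG_L)                   # raw column is kept alongside

# Round-trip check: converting back should reproduce the raw values (NA stays NA).
round_trip_ok <- isTRUE(all.equal(conv_demo$DV_mg_L / NG_ML_TO_MG_L, conv_demo$DV_ng_mL))
```

Because both columns carry their unit in the name, a reviewer can confirm the conversion without reading the code that produced it.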

Worked Example 3: Conditional Logic

For simple conditional logic, you can use if_else(), which evaluates a condition and returns one value if the condition is TRUE and another if it is FALSE.

pk %>%
  mutate(
    record_type = if_else(EVID == 1, "dose", "obs")
  )

# A tibble: 8 × 9
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   record_type
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <chr>      
1     1   0       1    100       NA     1    72 F     dose       
2     1   0.5     0     NA     2100     1    72 F     obs        
3     1   1       0     NA     3800     1    72 F     obs        
4     1   2       0     NA     3000     1    72 F     obs        
5     2   0       1     80       NA     1    88 M     dose       
6     2   0.5     0     NA     1600     1    88 M     obs        
7     2   1       0     NA     2900     1    88 M     obs        
8     2   2       0     NA     2400     1    88 M     obs
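
A two-way if_else() sends every row that is not a dose to "obs", including rows with an unexpected EVID. The sketch below, on a hypothetical vector of event codes, uses the missing argument of if_else() for NA and adds a separate flag for anything other than 0 or 1; the column names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical event codes: a dose, an observation, an unexpected code, and a missing value.
ev_demo <- tibble::tibble(EVID = c(1, 0, 2, NA)) %>%
  mutate(
    # `missing` sets the value used when the condition is NA.
    record_type = if_else(EVID == 1, "dose", "obs", missing = "unknown"),
    # Flag anything that is not a plain dose (1) or observation (0), including NA.
    evid_unexpected = !EVID %in% c(0, 1)
  )

ev_demo
```

Here the EVID of 2 is still labelled "obs" by the two-way rule, which is exactly why the extra flag is worth having when the data may contain other event types.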

For more complex conditional logic, use case_when(). Each condition is written with ~ to map it to its result, conditions are evaluated in order so the first match wins, and a final TRUE ~ ... acts as a default when none of the earlier conditions are met.

pk %>%
  mutate(
    time_bin = case_when(
      TIME == 0 & EVID == 1 ~ "dose_time",
      TIME <= 1 & EVID == 0 ~ "early_obs",
      TIME > 1 & EVID == 0  ~ "late_obs",
      TRUE ~ "other"
    )
  )

# A tibble: 8 × 9
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   time_bin 
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <chr>    
1     1   0       1    100       NA     1    72 F     dose_time
2     1   0.5     0     NA     2100     1    72 F     early_obs
3     1   1       0     NA     3800     1    72 F     early_obs
4     1   2       0     NA     3000     1    72 F     late_obs 
5     2   0       1     80       NA     1    88 M     dose_time
6     2   0.5     0     NA     1600     1    88 M     early_obs
7     2   1       0     NA     2900     1    88 M     early_obs
8     2   2       0     NA     2400     1    88 M     late_obs
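
Because case_when() returns the result of the first condition that matches, the order of overlapping conditions matters. A small sketch with hypothetical sampling times:

```r
library(dplyr)

t_hr <- c(0.5, 1, 2, 6)   # hypothetical sampling times in hours

# Narrowest condition first: each time lands in the intended bin.
bins_ok <- case_when(
  t_hr <= 1 ~ "early",
  t_hr <= 4 ~ "mid",
  TRUE      ~ "late"
)

# Broad condition first: it also captures the early samples,
# so the "early" branch is never reached.
bins_wrong <- case_when(
  t_hr <= 4 ~ "mid",
  t_hr <= 1 ~ "early",
  TRUE      ~ "late"
)
```

Writing mutually exclusive conditions, as in the time_bin example above, avoids the problem entirely; when conditions overlap, put the most specific ones first.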

Worked Example 4: Safe Log Transform

pk_log <- pk3 %>%
  mutate(
    # Log only positive, non-missing observation records; everything else is NA.
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    )
  )

pk_log %>% select(ID, TIME, DV_mg_L, log_DV)

# A tibble: 8 × 4
     ID  TIME DV_mg_L log_DV
  <dbl> <dbl>   <dbl>  <dbl>
1     1   0      NA   NA    
2     1   0.5     2.1  0.742
3     1   1       3.8  1.34 
4     1   2       3    1.10 
5     2   0      NA   NA    
6     2   0.5     1.6  0.470
7     2   1       2.9  1.06 
8     2   2       2.4  0.875

Worked Example 5: transmute()

pk_model <- pk3 %>%
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L)

pk_model

# A tibble: 8 × 5
     ID  TIME  EVID   AMT    DV
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA  
2     1   0.5     0    NA   2.1
3     1   1       0    NA   3.8
4     1   2       0    NA   3  
5     2   0       1    80  NA  
6     2   0.5     0    NA   1.6
7     2   1       0    NA   2.9
8     2   2       0    NA   2.4
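
For comparison, the same compact result can usually be obtained with mutate() and its .keep argument. This is a sketch on a hypothetical two-row table, assuming a dplyr version that supports .keep (1.0.0 or later); column order may differ between the two approaches in some versions, so the check compares column names as a set.

```r
library(dplyr)

# Hypothetical two-row table in the same layout as the lesson data.
keep_demo <- tibble::tibble(
  ID       = c(1, 1),
  TIME     = c(0, 0.5),
  EVID     = c(1, 0),
  AMT_mg   = c(100, NA),
  DV_ng_mL = c(NA, 2100)
)

via_transmute <- keep_demo %>%
  transmute(ID, TIME, EVID, AMT = AMT_mg, DV = DV_ng_mL * 0.001)

# .keep = "none" retains only the columns named or created in this call.
via_keep_none <- keep_demo %>%
  mutate(ID, TIME, EVID, AMT = AMT_mg, DV = DV_ng_mL * 0.001, .keep = "none")

same_columns <- setequal(names(via_transmute), names(via_keep_none))
```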

Worked Example 6: across()

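Use across() to apply the same transformation to one or more columns. The example below standardizes a single column, WT_kg; listing more columns in .cols applies the same function to each. Note that mean() and sd() here are computed over all records, so subjects with more rows carry more weight than they would in a one-row-per-subject summary.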

pk_scaled <- pk_log %>%
  mutate(
    across(
      .cols = WT_kg,
      .fns  = ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE),
      .names = "z_{.col}"
    )
  )

pk_scaled

# A tibble: 8 × 12
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   DV_mg_L   AMT log_DV
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>   <dbl> <dbl>  <dbl>
1     1   0       1    100       NA     1    72 F        NA     100 NA    
2     1   0.5     0     NA     2100     1    72 F         2.1    NA  0.742
3     1   1       0     NA     3800     1    72 F         3.8    NA  1.34 
4     1   2       0     NA     3000     1    72 F         3      NA  1.10 
5     2   0       1     80       NA     1    88 M        NA      80 NA    
6     2   0.5     0     NA     1600     1    88 M         1.6    NA  0.470
7     2   1       0     NA     2900     1    88 M         2.9    NA  1.06 
8     2   2       0     NA     2400     1    88 M         2.4    NA  0.875
# ℹ 1 more variable: z_WT_kg <dbl>
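
When the covariates are subject-level, one option is to standardize on one row per subject and join the result back to the records. The sketch below applies across() to two columns; the table, the AGE_yr column, and the z_score() helper are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical records: subject 1 has more rows than subject 2.
recs <- tibble::tibble(
  ID     = c(1, 1, 1, 1, 2, 2),
  WT_kg  = c(72, 72, 72, 72, 88, 88),
  AGE_yr = c(34, 34, 34, 34, 51, 51)
)

# Helper defined for this sketch only.
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Standardize on one row per subject so each subject counts once.
subj_z <- recs %>%
  distinct(ID, WT_kg, AGE_yr) %>%
  mutate(across(c(WT_kg, AGE_yr), z_score, .names = "z_{.col}")) %>%
  select(ID, starts_with("z_"))

# Attach the subject-level z-scores to every record.
recs_z <- recs %>% left_join(subj_z, by = "ID")

# For contrast: the record-level mean weight is pulled toward subject 1.
row_mean  <- mean(recs$WT_kg)                    # 464 / 6
subj_mean <- mean(distinct(recs, ID, WT_kg)$WT_kg)  # 80
```

Whether record-level or subject-level statistics are appropriate depends on the analysis; the point is to make the choice deliberately.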

Strategies

Use if_else() for two-way logic; it checks that both branches have compatible types.
Use case_when() for multi-branch logic; conditions are evaluated in order.
Keep raw-unit columns separate from modeling-ready columns.
Convert units once and clearly.
Guard log transforms explicitly.
Use across() for consistent transformations.

Common Mistakes

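Using ifelse() instead of if_else(): base ifelse() does not check branch types and can strip attributes such as the Date class.
Converting units without renaming the column, so the name no longer says which unit it holds.
Logging zero values, which yields -Inf.
Mutating grouped data unintentionally: after group_by(), summaries inside mutate() are computed per group until you ungroup().
Overusing across() where clarity would benefit from explicit code.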
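
Two of the mistakes above are easy to reproduce. The sketch below uses hypothetical values to show a grouping that is still active in a later mutate(), and base ifelse() dropping the Date class where if_else() keeps it.

```r
library(dplyr)

g <- tibble::tibble(ID = c(1, 1, 2, 2), DV = c(2, 4, 10, 20))  # hypothetical values

by_subject <- g %>%
  group_by(ID) %>%
  mutate(subj_mean = mean(DV))      # 3, 3, 15, 15; the result is still grouped

# Intended as an overall mean, but the grouping is still active:
still_grouped <- by_subject %>% mutate(overall_mean = mean(DV))

# ungroup() first to get the overall mean (9) on every row:
fixed <- by_subject %>% ungroup() %>% mutate(overall_mean = mean(DV))

# ifelse() vs if_else() on a Date value
d <- as.Date("2024-01-15")
class_base  <- class(ifelse(TRUE, d, NA))            # "numeric": the Date class is lost
class_dplyr <- class(if_else(TRUE, d, as.Date(NA)))  # "Date"
```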

Practice Problems

Add dose and observation flags.
Convert DV to mg/L.
Create a record type column.
Compute log_DV safely.
Build a compact modeling dataset with transmute().
Use across() to round numeric columns.

Step-by-Step Solutions

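pk_out <- pk %>%
  mutate(
    # Problem 1: dose and observation flags
    is_dose = EVID == 1,
    is_obs  = EVID == 0,
    # Problem 2: ng/mL -> mg/L, keeping the raw columns
    DV_mg_L = DV_ng_mL * 0.001,
    AMT     = AMT_mg,
    # Problem 3: record type
    record_type = if_else(EVID == 1, "dose", "obs"),
    # Problem 4: log only positive, non-missing observations
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    )
  ) %>%
  # Problem 5: keep only the modeling columns; the flags and record_type are dropped here
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L, log_DV)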

pk_out

# A tibble: 8 × 6
     ID  TIME  EVID   AMT    DV log_DV
  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1     1   0       1   100  NA   NA    
2     1   0.5     0    NA   2.1  0.742
3     1   1       0    NA   3.8  1.34 
4     1   2       0    NA   3    1.10 
5     2   0       1    80  NA   NA    
6     2   0.5     0    NA   1.6  0.470
7     2   1       0    NA   2.9  1.06 
8     2   2       0    NA   2.4  0.875
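
The pipeline above does not cover the last practice problem (rounding numeric columns with across()). One possible approach is sketched below on a hypothetical two-row table; the choice of two decimal places is an assumption, and rounding is usually best kept for display copies, not for the values passed to a model.

```r
library(dplyr)

# Hypothetical values with more digits than needed for display.
round_demo <- tibble::tibble(
  ID     = c(1, 2),
  SEX    = c("F", "M"),
  DV     = c(2.1234, 1.6789),
  log_DV = c(0.74194, 0.47)
)

# where(is.numeric) selects every numeric column; SEX is left untouched.
rounded <- round_demo %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))

rounded
```

ID is numeric too, so it passes through round() unchanged here; to exclude it explicitly, the selection could be written as where(is.numeric) & !ID.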

Summary

You now know how to:

Create new variables with mutate().
Use transmute() for compact modeling datasets.
Write safe conditional logic with if_else() and case_when().
Apply consistent transformations with across().
Protect modeling workflows with guarded transformations.

Derived variables encode modeling meaning.
Clear, intentional transformations prevent silent analytical errors.

Quick Tips

Keep raw units separate from model-ready variables.
Guard all log transforms.
Prefer if_else() for strict typing.
Convert units once and clearly.
Standardize transformation logic early.