Mutating, Transmuting, and Applying Functions Across Columns

Use mutate(), transmute(), if_else(), case_when(), and across() to create PMx-ready variables (flags, unit conversions, log transforms, and QC helpers).
Tip

What you’ll build today: a set of PMx-friendly derived variables, including dose/observation flags, unit conversions, log concentrations, and simple QC indicators.

Learning Objectives

By the end of this lesson, you will be able to:

  • Create new columns using mutate() and understand when to use transmute().
  • Use if_else() and case_when() for clean, readable conditional logic.
  • Apply transformations across multiple columns using across().
  • Build common PMx-derived fields (flags, unit conversions, log transforms, and subject-level checks).
  • Avoid common transformation mistakes that silently affect modeling.

Setup

library(tidyverse)

We’ll use a small PMx-style dataset.

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT_mg, ~DV_ng_mL, ~CMT, ~WT_kg, ~SEX,
    1,   0.0,    1,    100,      NA,       1,   72,   "F",
    1,   0.5,    0,     NA,     2100,       1,   72,   "F",
    1,   1.0,    0,     NA,     3800,       1,   72,   "F",
    1,   2.0,    0,     NA,     3000,       1,   72,   "F",
    2,   0.0,    1,     80,      NA,       1,   88,   "M",
    2,   0.5,    0,     NA,     1600,       1,   88,   "M",
    2,   1.0,    0,     NA,     2900,       1,   88,   "M",
    2,   2.0,    0,     NA,     2400,       1,   88,   "M"
)

pk
# A tibble: 8 × 8
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX  
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>
1     1   0       1    100       NA     1    72 F    
2     1   0.5     0     NA     2100     1    72 F    
3     1   1       0     NA     3800     1    72 F    
4     1   2       0     NA     3000     1    72 F    
5     2   0       1     80       NA     1    88 M    
6     2   0.5     0     NA     1600     1    88 M    
7     2   1       0     NA     2900     1    88 M    
8     2   2       0     NA     2400     1    88 M    

Key Ideas

mutate() adds new columns (or overwrites existing columns of the same name) while keeping all other columns.
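Later examples rely on one detail: inside a single mutate() call, each expression can use columns created by the expressions before it. Here is a minimal sketch on a two-row toy tibble. The values are illustrative and do not come from the lesson dataset.

```r
library(dplyr)

# Toy concentrations in ng/mL (illustrative values)
conc <- tibble(DV_ng_mL = c(2100, 3800))

conc_derived <- conc %>%
  mutate(
    DV_mg_L = DV_ng_mL * 0.001,  # created first ...
    log_DV  = log(DV_mg_L)       # ... and usable immediately in the same call
  )

conc_derived
```

Order matters here. If you swap the two lines, the call fails because DV_mg_L does not exist yet when log_DV is computed.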

In PMx workflows, derived variables encode:

  • Meaning (dose vs observation flags)
  • Units (explicit conversions)
  • Scale (log transformations)
  • QC signals (unexpected or extreme values)

transmute() works like mutate() but returns only the columns you name or create in the call, which is useful when building compact modeling datasets.

Warning

Guard transformations carefully.
log(0) returns -Inf, and log() of a negative number returns NaN (with a warning). Either value can silently propagate into models and summaries.
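The following base R sketch shows how a single zero contaminates a summary. It also shows how is.finite() can flag the problem before the value reaches a model. The values are illustrative.

```r
# One zero among otherwise valid concentrations (illustrative values)
x <- c(2.1, 0, 3.8)

log_x <- log(x)
log_x               # second element is -Inf
mean(log_x)         # -Inf: one bad value poisons the summary
is.finite(log_x)    # TRUE FALSE TRUE; usable as a QC check

# Negative inputs give NaN; log() also emits a "NaNs produced" warning
suppressWarnings(log(-1))
```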


Worked Example 1: Flags

pk2 <- pk %>%
  mutate(
    is_dose = EVID == 1,
    is_obs  = EVID == 0
  )

pk2
# A tibble: 8 × 10
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   is_dose is_obs
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <lgl>   <lgl> 
1     1   0       1    100       NA     1    72 F     TRUE    FALSE 
2     1   0.5     0     NA     2100     1    72 F     FALSE   TRUE  
3     1   1       0     NA     3800     1    72 F     FALSE   TRUE  
4     1   2       0     NA     3000     1    72 F     FALSE   TRUE  
5     2   0       1     80       NA     1    88 M     TRUE    FALSE 
6     2   0.5     0     NA     1600     1    88 M     FALSE   TRUE  
7     2   1       0     NA     2900     1    88 M     FALSE   TRUE  
8     2   2       0     NA     2400     1    88 M     FALSE   TRUE  

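Logical flags are convenient during cleaning. You can filter on them directly, as in filter(pk2, is_obs), and count them with sum(), as in sum(pk2$is_dose). Convert them to 0/1 only if a downstream tool requires numeric columns.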
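Flags built with == inherit missingness. When EVID is missing, EVID == 1 returns NA, whereas EVID %in% 1 returns FALSE. The sketch below uses a hypothetical three-record tibble to contrast the two. Which behavior you want depends on whether a missing EVID should surface as NA for QC or be treated as "not a dose".

```r
library(dplyr)

# Hypothetical records; the third has a missing EVID
recs <- tibble(ID = c(1, 1, 1), EVID = c(1, 0, NA))

flags <- recs %>%
  mutate(
    is_dose_eq = EVID == 1,    # NA where EVID is NA
    is_dose_in = EVID %in% 1   # FALSE where EVID is NA
  )

flags
sum(flags$is_dose_in)          # number of dose records
sum(is.na(flags$is_dose_eq))   # records whose flag could not be determined
```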


Worked Example 2: Unit Conversion

pk3 <- pk %>%
  mutate(
    DV_mg_L = DV_ng_mL * 0.001,
    AMT     = AMT_mg
  )

pk3
# A tibble: 8 × 10
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   DV_mg_L   AMT
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>   <dbl> <dbl>
1     1   0       1    100       NA     1    72 F        NA     100
2     1   0.5     0     NA     2100     1    72 F         2.1    NA
3     1   1       0     NA     3800     1    72 F         3.8    NA
4     1   2       0     NA     3000     1    72 F         3      NA
5     2   0       1     80       NA     1    88 M        NA      80
6     2   0.5     0     NA     1600     1    88 M         1.6    NA
7     2   1       0     NA     2900     1    88 M         2.9    NA
8     2   2       0     NA     2400     1    88 M         2.4    NA

Note

Keep raw and model-ready columns distinct. Here, DV_ng_mL and AMT_mg stay alongside DV_mg_L and AMT, so every derived value can be traced back to its source.
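One way to make a conversion auditable is to store the factor as a named constant and assert that converting back reproduces the raw column. The following is a minimal sketch. The constant name NG_ML_TO_MG_L is hypothetical, and the factor assumes 1 ng/mL = 1 µg/L = 0.001 mg/L.

```r
library(dplyr)

# Hypothetical named constant: 1 ng/mL = 1 ug/L = 0.001 mg/L
NG_ML_TO_MG_L <- 0.001

obs <- tibble(DV_ng_mL = c(2100, 3800, NA)) %>%
  mutate(DV_mg_L = DV_ng_mL * NG_ML_TO_MG_L)

# QC: back-converting should reproduce the raw values (NA positions must match too)
stopifnot(isTRUE(all.equal(obs$DV_mg_L / NG_ML_TO_MG_L, obs$DV_ng_mL)))
```

The check uses all.equal() rather than == because floating-point multiplication can introduce tiny rounding differences.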


Worked Example 3: Conditional Logic

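For simple two-way logic, use if_else(). It evaluates a condition and returns one value where the condition is TRUE and another where it is FALSE. Unlike base ifelse(), it requires both values to have compatible types, and its missing argument sets the result for rows where the condition is NA.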

pk %>%
  mutate(
    record_type = if_else(EVID == 1, "dose", "obs")
  )
# A tibble: 8 × 9
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   record_type
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <chr>      
1     1   0       1    100       NA     1    72 F     dose       
2     1   0.5     0     NA     2100     1    72 F     obs        
3     1   1       0     NA     3800     1    72 F     obs        
4     1   2       0     NA     3000     1    72 F     obs        
5     2   0       1     80       NA     1    88 M     dose       
6     2   0.5     0     NA     1600     1    88 M     obs        
7     2   1       0     NA     2900     1    88 M     obs        
8     2   2       0     NA     2400     1    88 M     obs        
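Why prefer if_else() over base ifelse()? The small sketch below uses dates. ifelse() silently drops the Date class, while if_else() preserves it. if_else() also raises an error on incompatible types instead of coercing them.

```r
library(dplyr)

dates <- as.Date(c("2024-01-01", "2024-01-02"))
keep  <- c(TRUE, FALSE)

# Base ifelse(): the Date class is lost and bare numbers come back
base_res <- ifelse(keep, dates, NA)
class(base_res)    # "numeric"

# dplyr::if_else(): the result stays a Date
dplyr_res <- if_else(keep, dates, as.Date(NA))
class(dplyr_res)   # "Date"

# Mixing a number and a string is an error rather than a silent coercion
res_err <- try(if_else(keep, 1, "a"), silent = TRUE)
inherits(res_err, "try-error")   # TRUE
```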

For multi-branch logic, use case_when(). Each argument maps a condition to a result with ~. Conditions are checked in order and the first TRUE one wins. A final TRUE ~ ... acts as the default for rows that match none of the earlier conditions.

pk %>%
  mutate(
    time_bin = case_when(
      TIME == 0 & EVID == 1 ~ "dose_time",
      TIME <= 1 & EVID == 0 ~ "early_obs",
      TIME > 1 & EVID == 0  ~ "late_obs",
      TRUE ~ "other"
    )
  )
# A tibble: 8 × 9
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   time_bin 
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr> <chr>    
1     1   0       1    100       NA     1    72 F     dose_time
2     1   0.5     0     NA     2100     1    72 F     early_obs
3     1   1       0     NA     3800     1    72 F     early_obs
4     1   2       0     NA     3000     1    72 F     late_obs 
5     2   0       1     80       NA     1    88 M     dose_time
6     2   0.5     0     NA     1600     1    88 M     early_obs
7     2   1       0     NA     2900     1    88 M     early_obs
8     2   2       0     NA     2400     1    88 M     late_obs 
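Because case_when() uses the first matching condition, later conditions can be written more simply. In dplyr 1.1.0 and later, the .default argument can replace the final TRUE ~ ... line. The sketch below uses a bare vector of illustrative times, one of them missing.

```r
library(dplyr)  # .default requires dplyr >= 1.1.0

TIME <- c(0, 0.5, 2, NA)

time_bin <- case_when(
  TIME <= 1 ~ "early",
  TIME <= 4 ~ "mid",      # only reached when TIME > 1, because earlier matches win
  .default = "unknown"    # NA times match no condition and fall through here
)

time_bin   # "early" "early" "mid" "unknown"
```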

Worked Example 4: Safe Log Transform

pk_log <- pk3 %>%
  mutate(
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    )
  )

pk_log %>% select(ID, TIME, DV_mg_L, log_DV)
# A tibble: 8 × 4
     ID  TIME DV_mg_L log_DV
  <dbl> <dbl>   <dbl>  <dbl>
1     1   0      NA   NA    
2     1   0.5     2.1  0.742
3     1   1       3.8  1.34 
4     1   2       3    1.10 
5     2   0      NA   NA    
6     2   0.5     1.6  0.470
7     2   1       2.9  1.06 
8     2   2       2.4  0.875
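case_when() evaluates every right-hand side on the full column before picking values. As a result, log(DV_mg_L) is still computed for zero or negative concentrations. A negative value triggers a "NaNs produced" warning even though the result is discarded. A small helper that takes the log of valid values only avoids this. Note that safe_log() is a hypothetical name used for illustration, not a dplyr function.

```r
library(dplyr)

# Hypothetical helper: log() of strictly positive, non-missing values; NA otherwise
safe_log <- function(x) {
  out <- rep(NA_real_, length(x))
  ok  <- !is.na(x) & x > 0
  out[ok] <- log(x[ok])  # log() never sees zero, negative, or NA inputs
  out
}

# Illustrative records: a dose, a valid observation, a zero, and a negative value
obs <- tibble(EVID = c(1, 0, 0, 0), DV_mg_L = c(NA, 2.1, 0, -0.5))

obs_log <- obs %>%
  mutate(log_DV = if_else(EVID == 0, safe_log(DV_mg_L), NA_real_))

obs_log
```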

Worked Example 5: transmute()

pk_model <- pk3 %>%
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L)

pk_model
# A tibble: 8 × 5
     ID  TIME  EVID   AMT    DV
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA  
2     1   0.5     0    NA   2.1
3     1   1       0    NA   3.8
4     1   2       0    NA   3  
5     2   0       1    80  NA  
6     2   0.5     0    NA   1.6
7     2   1       0    NA   2.9
8     2   2       0    NA   2.4
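Since dplyr 1.1.0, transmute() is marked as superseded in favor of mutate(.keep = "none"). transmute() still works, so existing code is fine. The sketch below shows the equivalent call on a small illustrative tibble and assumes dplyr 1.1.0 or later.

```r
library(dplyr)  # using .keep = "none" as a transmute() replacement assumes dplyr >= 1.1.0

pk3_demo <- tibble(
  ID = c(1, 1), TIME = c(0, 0.5), EVID = c(1, 0),
  AMT = c(100, NA), DV_mg_L = c(NA, 2.1), WT_kg = c(72, 72)
)

via_transmute <- pk3_demo %>% transmute(ID, TIME, EVID, AMT, DV = DV_mg_L)

# Keep only the columns named or created in the call
via_mutate <- pk3_demo %>%
  mutate(ID, TIME, EVID, AMT, DV = DV_mg_L, .keep = "none")
```

When a step only picks and renames columns, as this one does, select(ID, TIME, EVID, AMT, DV = DV_mg_L) gives the same result and states the intent more directly.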

Worked Example 6: across()

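Use across() to apply the same transformation to one or more columns. The example below standardizes a single column (WT_kg) into a z-score, but .cols accepts any tidyselect specification, such as c(WT_kg, DV_mg_L) or where(is.numeric). The mean and SD here are computed over records, not subjects, so subjects with more records carry more weight.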

pk_scaled <- pk_log %>%
  mutate(
    across(
      .cols = WT_kg,
      .fns  = ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE),
      .names = "z_{.col}"
    )
  )

pk_scaled
# A tibble: 8 × 12
     ID  TIME  EVID AMT_mg DV_ng_mL   CMT WT_kg SEX   DV_mg_L   AMT log_DV
  <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <chr>   <dbl> <dbl>  <dbl>
1     1   0       1    100       NA     1    72 F        NA     100 NA    
2     1   0.5     0     NA     2100     1    72 F         2.1    NA  0.742
3     1   1       0     NA     3800     1    72 F         3.8    NA  1.34 
4     1   2       0     NA     3000     1    72 F         3      NA  1.10 
5     2   0       1     80       NA     1    88 M        NA      80 NA    
6     2   0.5     0     NA     1600     1    88 M         1.6    NA  0.470
7     2   1       0     NA     2900     1    88 M         2.9    NA  1.06 
8     2   2       0     NA     2400     1    88 M         2.4    NA  0.875
# ℹ 1 more variable: z_WT_kg <dbl>
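When you pass more than one column, across() applies the same function to each column, and .names controls the output names. The sketch below builds missingness flags as simple QC helpers, using hypothetical records.

```r
library(dplyr)

# Hypothetical records: a dose, a quantified observation, and an observation missing DV
recs <- tibble(
  ID       = c(1, 1, 2),
  EVID     = c(1, 0, 0),
  AMT_mg   = c(100, NA, NA),
  DV_ng_mL = c(NA, 2100, NA)
)

recs_qc <- recs %>%
  mutate(across(c(AMT_mg, DV_ng_mL), is.na, .names = "miss_{.col}"))

# Observation records with no concentration are worth a closer look
recs_qc %>% filter(EVID == 0, miss_DV_ng_mL)
```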

Strategies

  • Use if_else() for strict typing.
  • Use case_when() for multi-branch logic.
  • Keep raw-unit columns separate from modeling-ready columns.
  • Convert units once and clearly.
  • Guard log transforms explicitly.
  • Use across() for consistent transformations.

Common Mistakes

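  • Using ifelse() instead of if_else(): ifelse() can silently drop attributes such as the Date class
  • Converting units without renaming, so a column named DV_ng_mL ends up holding mg/L values
  • Logging zero values, which produces -Inf
  • Mutating grouped data unintentionally: a leftover group_by() makes mean() or max() operate per group
  • Overusing across() where explicit, column-by-column code would be clearer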
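Grouped mutate() is the right tool for subject-level checks. The mistake is letting the grouping linger into later steps. In the sketch below, which uses illustrative concentrations, the same expression gives different answers depending on whether ungroup() was called.

```r
library(dplyr)

conc <- tibble(ID = c(1, 1, 2, 2), DV = c(2.1, 3.8, 1.6, 2.9))

# Intended: a per-subject maximum as a subject-level QC column
by_subject <- conc %>%
  group_by(ID) %>%
  mutate(DV_max_subj = max(DV))

# Unintended: the grouping is still active, so max(DV) is per subject here too
still_grouped <- by_subject %>% mutate(DV_rel = DV / max(DV))

# Intended: ungroup() first so max(DV) covers the whole dataset
overall <- by_subject %>% ungroup() %>% mutate(DV_rel = DV / max(DV))
```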

Practice Problems

  1. Add dose and observation flags.
  2. Convert DV to mg/L.
  3. Create a record type column.
  4. Compute log_DV safely.
  5. Build a compact modeling dataset with transmute().
  6. Use across() to round numeric columns.

One possible solution to problems 1 to 5:
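pk_out <- pk %>%
  mutate(
    # Problem 1: dose and observation flags
    is_dose = EVID == 1,
    is_obs  = EVID == 0,
    # Problem 2: convert ng/mL to mg/L (1 ng/mL = 0.001 mg/L)
    DV_mg_L = DV_ng_mL * 0.001,
    AMT     = AMT_mg,
    # Problem 3: record type
    record_type = if_else(EVID == 1, "dose", "obs"),
    # Problem 4: log only positive, non-missing observations
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    )
  ) %>%
  # Problem 5: keep only modeling columns (flags and record_type are dropped here)
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L, log_DV)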

pk_out
# A tibble: 8 × 6
     ID  TIME  EVID   AMT    DV log_DV
  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1     1   0       1   100  NA   NA    
2     1   0.5     0    NA   2.1  0.742
3     1   1       0    NA   3.8  1.34 
4     1   2       0    NA   3    1.10 
5     2   0       1    80  NA   NA    
6     2   0.5     0    NA   1.6  0.470
7     2   1       0    NA   2.9  1.06 
8     2   2       0    NA   2.4  0.875
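For problem 6, one possible approach is to round only the concentration columns with across(). The sketch below uses a two-row stand-in for pk_out. Rounding is typically reserved for display tables, with unrounded values passed to the model.

```r
library(dplyr)

# Two-row stand-in for pk_out (illustrative)
pk_demo <- tibble(ID = c(1, 1), DV = c(NA, 2.1), log_DV = c(NA, log(2.1)))

pk_rounded <- pk_demo %>%
  mutate(across(c(DV, log_DV), ~ round(.x, 3)))

pk_rounded
```

across(where(is.numeric), ~ round(.x, 3)) would also work. However, it rounds every numeric column, including ID, TIME, and AMT.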

Summary

You now know how to:

  • Create new variables with mutate().
  • Use transmute() for compact modeling datasets.
  • Write safe conditional logic with if_else() and case_when().
  • Apply consistent transformations with across().
  • Protect modeling workflows with guarded transformations.

Derived variables encode modeling meaning.
Clear, intentional transformations prevent silent analytical errors.


  • Keep raw units separate from model-ready variables.
  • Guard all log transforms.
  • Prefer if_else() for strict typing.
  • Convert units once and clearly.
  • Standardize transformation logic early.