library(tidyverse)
Mutating, Transmuting, and Applying Functions Across Columns
What you’ll build today: a set of PMx-friendly derived variables, including dose/observation flags, unit conversions, log concentration, and simple QC indicators.
Learning Objectives
By the end of this lesson, you will be able to:
- Create new columns using mutate() and understand when to use transmute().
- Use if_else() and case_when() for clean, readable conditional logic.
- Apply transformations across multiple columns using across().
across(). - Build common PMx-derived fields (flags, unit conversions, log transforms, and subject-level checks).
- Avoid common transformation mistakes that silently affect modeling.
Setup
We’ll use a small PMx-style dataset.
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT_mg, ~DV_ng_mL, ~CMT, ~WT_kg, ~SEX,
    1,   0.0,     1,     100,        NA,    1,     72,  "F",
    1,   0.5,     0,      NA,      2100,    1,     72,  "F",
    1,   1.0,     0,      NA,      3800,    1,     72,  "F",
    1,   2.0,     0,      NA,      3000,    1,     72,  "F",
    2,   0.0,     1,      80,        NA,    1,     88,  "M",
    2,   0.5,     0,      NA,      1600,    1,     88,  "M",
    2,   1.0,     0,      NA,      2900,    1,     88,  "M",
    2,   2.0,     0,      NA,      2400,    1,     88,  "M"
)
pk
# A tibble: 8 × 8
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2100 1 72 F
3 1 1 0 NA 3800 1 72 F
4 1 2 0 NA 3000 1 72 F
5 2 0 1 80 NA 1 88 M
6 2 0.5 0 NA 1600 1 88 M
7 2 1 0 NA 2900 1 88 M
8 2 2 0 NA 2400 1 88 M
Key Ideas
mutate() creates new variables while keeping existing ones.
In PMx workflows, derived variables encode:
- Meaning (dose vs observation flags)
- Units (explicit conversions)
- Scale (log transformations)
- QC signals (unexpected or extreme values)
transmute() behaves like mutate() but keeps only the columns you name or create and drops everything else, which is useful when building compact modeling datasets.
Guard transformations carefully.
log(0) produces -Inf, which can silently propagate into models and summaries.
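To make the propagation concrete, here is a minimal base R sketch. The vector `conc` is hypothetical (it is not part of `pk`), and the 0 stands in for a concentration that was reported as zero.

```r
# Hypothetical concentrations in mg/L; the 0 is a deliberate edge case
conc <- c(2.1, 3.8, 0, 3.0)

# Unguarded log: the zero becomes -Inf without any warning
log_conc <- log(conc)

# One -Inf is enough to take over a summary statistic
mean_log <- mean(log_conc)

# is.finite() is a simple way to locate the affected records
bad_rows <- which(!is.finite(log_conc))
```

Because R raises no warning for log(0), a check such as is.finite() after any log transform is a cheap way to catch the problem before it reaches a model.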
Worked Example 1: Flags
pk2 <- pk %>%
  mutate(
    is_dose = EVID == 1,  # TRUE on dosing records
    is_obs  = EVID == 0   # TRUE on observation records
  )
pk2
# A tibble: 8 × 10
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX is_dose is_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <lgl> <lgl>
1 1 0 1 100 NA 1 72 F TRUE FALSE
2 1 0.5 0 NA 2100 1 72 F FALSE TRUE
3 1 1 0 NA 3800 1 72 F FALSE TRUE
4 1 2 0 NA 3000 1 72 F FALSE TRUE
5 2 0 1 80 NA 1 88 M TRUE FALSE
6 2 0.5 0 NA 1600 1 88 M FALSE TRUE
7 2 1 0 NA 2900 1 88 M FALSE TRUE
8 2 2 0 NA 2400 1 88 M FALSE TRUE
Logical flags are flexible and safe during cleaning because they filter, count, and combine without any string matching.
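As a quick illustration of how such flags get used, the sketch below counts and filters records with them. The tibble `recs` is a hypothetical stand-in that follows the same EVID convention as `pk`.

```r
library(dplyr)

# Hypothetical mini dataset using the same EVID convention (1 = dose, 0 = observation)
recs <- tibble(
  ID   = c(1, 1, 1, 2, 2),
  EVID = c(1, 0, 0, 1, 0)
) %>%
  mutate(is_dose = EVID == 1, is_obs = EVID == 0)

# Logical columns sum as counts because TRUE is treated as 1
n_dose <- sum(recs$is_dose)
n_obs  <- sum(recs$is_obs)

# A logical column can be passed straight to filter()
obs_only <- recs %>% filter(is_obs)

# QC check: every record should be exactly one of dose or observation
flags_ok <- all(xor(recs$is_dose, recs$is_obs))
```

If `flags_ok` were FALSE, some record would have an EVID value other than 0 or 1 (or a missing one), which is worth investigating before modeling.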
Worked Example 2: Unit Conversion
pk3 <- pk %>%
  mutate(
    DV_mg_L = DV_ng_mL * 0.001,  # 1 ng/mL = 0.001 mg/L
    AMT     = AMT_mg             # dose is already in mg; copy it under the model-ready name
  )
pk3
# A tibble: 8 × 10
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX DV_mg_L AMT
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 1 0 1 100 NA 1 72 F NA 100
2 1 0.5 0 NA 2100 1 72 F 2.1 NA
3 1 1 0 NA 3800 1 72 F 3.8 NA
4 1 2 0 NA 3000 1 72 F 3 NA
5 2 0 1 80 NA 1 88 M NA 80
6 2 0.5 0 NA 1600 1 88 M 1.6 NA
7 2 1 0 NA 2900 1 88 M 2.9 NA
8 2 2 0 NA 2400 1 88 M 2.4 NA
Keep raw and model-ready columns distinct for auditability.
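The factor 0.001 follows from the unit definitions: 1 ng is 1e-6 mg and 1 mL is 1e-3 L, so 1 ng/mL equals 1e-3 mg/L. The sketch below derives the factor rather than hard-coding it and confirms that the conversion round-trips. The variable names are illustrative, not part of the lesson's dataset.

```r
# Unit definitions (exact by definition)
ng_per_mg <- 1e6
mL_per_L  <- 1e3

# ng/mL -> mg/L: divide by ng per mg, multiply by mL per L
ng_mL_to_mg_L <- mL_per_L / ng_per_mg

# Hypothetical raw concentrations in ng/mL
dv_ng_mL <- c(2100, 3800, 3000)
dv_mg_L  <- dv_ng_mL * ng_mL_to_mg_L

# Round-trip back to ng/mL as a sanity check on the factor
round_trip <- dv_mg_L / ng_mL_to_mg_L
```

Comparing converted values with all.equal() rather than == avoids false alarms from floating-point rounding.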
Worked Example 3: Conditional Logic
For simple conditional logic, you can use if_else(), which evaluates a condition and returns one value if the condition is TRUE and another if it is FALSE.
pk %>%
  mutate(
    record_type = if_else(EVID == 1, "dose", "obs")
  )
# A tibble: 8 × 9
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX record_type
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 1 0 1 100 NA 1 72 F dose
2 1 0.5 0 NA 2100 1 72 F obs
3 1 1 0 NA 3800 1 72 F obs
4 1 2 0 NA 3000 1 72 F obs
5 2 0 1 80 NA 1 88 M dose
6 2 0.5 0 NA 1600 1 88 M obs
7 2 1 0 NA 2900 1 88 M obs
8 2 2 0 NA 2400 1 88 M obs
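The practical difference between if_else() and base ifelse() is type checking. The sketch below uses a hypothetical `evid` vector with a missing value to show base ifelse() silently coercing mixed types, if_else() rejecting them, and the `missing` argument of if_else() handling NA conditions.

```r
library(dplyr)

# Hypothetical EVID values, including a missing one
evid <- c(1, 0, NA)

# Base ifelse() accepts mixed types and quietly coerces the result to character
loose <- ifelse(evid == 1, "dose", 0)

# if_else() refuses to combine character and numeric, so the mistake surfaces at once
strict <- tryCatch(
  if_else(evid == 1, "dose", 0),
  error = function(e) "type error"
)

# With consistent types, `missing` controls what an NA condition becomes
labelled <- if_else(evid == 1, "dose", "obs", missing = "unknown")
```

Here `loose` contains the string "0" where a number was intended, which is the kind of silent change that strict typing is meant to prevent.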
For more complex conditional logic, you can use case_when(), where each condition is written using ~ to map a condition to its result, and a final TRUE ~ ... acts as a default case when none of the earlier conditions are met.
pk %>%
  mutate(
    time_bin = case_when(
      TIME == 0 & EVID == 1 ~ "dose_time",
      TIME <= 1 & EVID == 0 ~ "early_obs",
      TIME > 1 & EVID == 0  ~ "late_obs",
      TRUE                  ~ "other"
    )
  )
# A tibble: 8 × 9
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX time_bin
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 1 0 1 100 NA 1 72 F dose_time
2 1 0.5 0 NA 2100 1 72 F early_obs
3 1 1 0 NA 3800 1 72 F early_obs
4 1 2 0 NA 3000 1 72 F late_obs
5 2 0 1 80 NA 1 88 M dose_time
6 2 0.5 0 NA 1600 1 88 M early_obs
7 2 1 0 NA 2900 1 88 M early_obs
8 2 2 0 NA 2400 1 88 M late_obs
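Two behaviors of case_when() are worth seeing in isolation: conditions are evaluated in order and the first match wins, and a condition that evaluates to NA never matches, so those rows fall through to the default. The sketch below uses a hypothetical `time` vector with a missing value.

```r
library(dplyr)

# Hypothetical sampling times, including a missing one
time <- c(0.5, 1, 2, NA)

# Non-overlapping conditions: the NA row matches neither and lands in the default
binned <- case_when(
  time <= 1 ~ "early",
  time > 1  ~ "late",
  TRUE      ~ "other"
)

# Overlapping conditions: the second branch is never reached because
# every value <= 1 has already matched the first branch
overlap <- case_when(
  time <= 2 ~ "within_2h",
  time <= 1 ~ "within_1h",
  TRUE      ~ "other"
)
```

Ordering conditions from most specific to most general avoids the unreachable branch. In dplyr 1.1.0 and later, the `.default` argument can be used in place of the final `TRUE ~ ...` line.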
Worked Example 4: Safe Log Transform
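pk_log <- pk3 %>%
  mutate(
    log_DV = case_when(
      # log only positive, non-missing observation values
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      # everything else (doses, missing, zero, negative) becomes NA
      TRUE ~ NA_real_
    )
  )
pk_log %>% select(ID, TIME, DV_mg_L, log_DV)
# A tibble: 8 × 4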
ID TIME DV_mg_L log_DV
<dbl> <dbl> <dbl> <dbl>
1 1 0 NA NA
2 1 0.5 2.1 0.742
3 1 1 3.8 1.34
4 1 2 3 1.10
5 2 0 NA NA
6 2 0.5 1.6 0.470
7 2 1 2.9 1.06
8 2 2 2.4 0.875
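The `pk` data contains no zeros, so the guard never triggers above. The sketch below applies the same guard to a hypothetical tibble `edge` that does contain a zero and a missing observation, and adds a simple QC flag, here called `log_missing` (an illustrative name), marking observations whose log could not be computed.

```r
library(dplyr)

# Hypothetical records: a valid observation, a zero, a missing value, and a dose
edge <- tibble(
  EVID    = c(0, 0, 0, 1),
  DV_mg_L = c(2.1, 0, NA, NA)
) %>%
  mutate(
    # Same guard as in the worked example
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    ),
    # QC flag: observation rows where no log value is available
    log_missing = EVID == 0 & is.na(log_DV)
  )

# Number of observation records that need review
n_flagged <- sum(edge$log_missing)
```

The flag separates "NA because this is a dose record" from "NA because the observation was unusable", which a bare NA in log_DV cannot do on its own.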
Worked Example 5: transmute()
pk_model <- pk3 %>%
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L)
pk_model
# A tibble: 8 × 5
ID TIME EVID AMT DV
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA
2 1 0.5 0 NA 2.1
3 1 1 0 NA 3.8
4 1 2 0 NA 3
5 2 0 1 80 NA
6 2 0.5 0 NA 1.6
7 2 1 0 NA 2.9
8 2 2 0 NA 2.4
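Recent dplyr documentation (1.1.0 and later, to the best of our knowledge) marks transmute() as superseded in favor of mutate() with `.keep = "none"`. transmute() still works, so the following is only a sketch of the alternative spelling, using a hypothetical tibble `raw`. Note that with `.keep = "none"` a column must be assigned explicitly (for example `ID = ID`) to be retained.

```r
library(dplyr)

# Hypothetical raw records with unit-suffixed column names
raw <- tibble(
  ID       = c(1, 1),
  AMT_mg   = c(100, NA),
  DV_ng_mL = c(NA, 2100)
)

# transmute(): keeps only the columns that are named or created
via_transmute <- raw %>%
  transmute(ID, AMT = AMT_mg, DV = DV_ng_mL * 0.001)

# mutate(.keep = "none"): keeps only the columns created in the call
via_mutate <- raw %>%
  mutate(ID = ID, AMT = AMT_mg, DV = DV_ng_mL * 0.001, .keep = "none")
```

Both results contain the same three columns with the same values; either form is fine as long as the team uses one consistently.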
Worked Example 6: across()
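Use across() to apply the same transformation to several columns at once. The example here standardizes a single column, WT_kg, but .cols accepts any tidy-select expression, such as c(WT_kg, TIME) or where(is.numeric):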
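pk_scaled <- pk_log %>%
  mutate(
    across(
      .cols  = WT_kg,
      # z-score: center on the mean, scale by the standard deviation
      .fns   = ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE),
      # write to a new column (z_WT_kg) instead of overwriting WT_kg
      .names = "z_{.col}"
    )
  )
pk_scaled
# A tibble: 8 × 12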
ID TIME EVID AMT_mg DV_ng_mL CMT WT_kg SEX DV_mg_L AMT log_DV
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 0 1 100 NA 1 72 F NA 100 NA
2 1 0.5 0 NA 2100 1 72 F 2.1 NA 0.742
3 1 1 0 NA 3800 1 72 F 3.8 NA 1.34
4 1 2 0 NA 3000 1 72 F 3 NA 1.10
5 2 0 1 80 NA 1 88 M NA 80 NA
6 2 0.5 0 NA 1600 1 88 M 1.6 NA 0.470
7 2 1 0 NA 2900 1 88 M 2.9 NA 1.06
8 2 2 0 NA 2400 1 88 M 2.4 NA 0.875
# ℹ 1 more variable: z_WT_kg <dbl>
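across() pays off once more than one column is involved. The sketch below, on a hypothetical tibble `vals`, rounds two columns in one call and then computes two summaries per column by passing a named list of functions.

```r
library(dplyr)

# Hypothetical derived values with more digits than a report needs
vals <- tibble(
  ID      = c(1, 2),
  DV_mg_L = c(2.1234, 1.6789),
  log_DV  = c(0.7530, 0.5182)
)

# Round only the named columns; ID is left untouched
rounded <- vals %>%
  mutate(across(c(DV_mg_L, log_DV), ~ round(.x, 2)))

# A named list of functions yields one output column per column/function pair
summ <- vals %>%
  summarise(across(c(DV_mg_L, log_DV), list(mean = mean, sd = sd)))
```

Selecting columns by name, rather than with where(is.numeric), is a deliberate choice here: the broader selector would also round ID and TIME, which is rarely intended.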
Strategies
- Use if_else() for strict typing.
- Use case_when() for multi-branch logic.
- Keep raw-unit columns separate from modeling-ready columns.
- Convert units once and clearly.
- Guard log transforms explicitly.
- Use across() for consistent transformations.
Common Mistakes
- Using ifelse() instead of if_else()
- Converting units without renaming
- Logging zero values
- Mutating grouped data unintentionally
- Overusing across() where clarity would benefit from explicit code
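The grouped-data mistake is easy to miss because the code looks identical either way. The sketch below uses a hypothetical tibble `wt`, in which subject 2 deliberately has inconsistent weights, to show an intentional subject-level check followed by ungroup(), and then how the same mutate() gives different answers on grouped and ungrouped data.

```r
library(dplyr)

# Hypothetical records; subject 2 has two different weights on purpose
wt <- tibble(
  ID    = c(1, 1, 2, 2),
  WT_kg = c(72, 72, 88, 90)
)

# Intentional grouping: a subject-level check, then ungroup() so that
# later steps operate on the whole dataset again
checked <- wt %>%
  group_by(ID) %>%
  mutate(
    n_rec     = n(),                    # records per subject
    wt_varies = n_distinct(WT_kg) > 1   # TRUE if weight changes within a subject
  ) %>%
  ungroup()

# The same expression means different things depending on grouping
grouped_mean <- wt %>% group_by(ID) %>% mutate(wt_mean = mean(WT_kg))
overall_mean <- wt %>% mutate(wt_mean = mean(WT_kg))
```

`grouped_mean` holds per-subject means while `overall_mean` holds a single dataset-wide mean. Ending every grouped step with ungroup() keeps the scope of each calculation explicit.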
Practice Problems
- Add dose and observation flags.
- Convert DV to mg/L.
- Create a record type column.
- Compute log_DV safely.
- Build a compact modeling dataset with transmute().
- Use across() to round numeric columns.
pk_out <- pk %>%
  mutate(
    is_dose = EVID == 1,
    is_obs = EVID == 0,
    DV_mg_L = DV_ng_mL * 0.001,
    AMT = AMT_mg,
    record_type = if_else(EVID == 1, "dose", "obs"),
    log_DV = case_when(
      EVID == 0 & !is.na(DV_mg_L) & DV_mg_L > 0 ~ log(DV_mg_L),
      TRUE ~ NA_real_
    )
  ) %>%
  transmute(ID, TIME, EVID, AMT, DV = DV_mg_L, log_DV)
pk_out
# A tibble: 8 × 6
ID TIME EVID AMT DV log_DV
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA NA
2 1 0.5 0 NA 2.1 0.742
3 1 1 0 NA 3.8 1.34
4 1 2 0 NA 3 1.10
5 2 0 1 80 NA NA
6 2 0.5 0 NA 1.6 0.470
7 2 1 0 NA 2.9 1.06
8 2 2 0 NA 2.4 0.875
Summary
You now know how to:
- Create new variables with mutate().
- Use transmute() for compact modeling datasets.
- Write safe conditional logic with if_else() and case_when().
- Apply consistent transformations with across().
- Protect modeling workflows with guarded transformations.
Derived variables encode modeling meaning.
Clear, intentional transformations prevent silent analytical errors.
- Keep raw units separate from model-ready variables.
- Guard all log transforms.
- Prefer if_else() for strict typing.
- Convert units once and clearly.
- Standardize transformation logic early.