Selecting, Renaming, and Relocating Columns

Use select(), rename(), relocate(), and helper functions to standardize column structure for PMx workflows.
Tip

What you’ll build today: a clean, consistently named, well-ordered PMx dataset ready for modeling and visualization.

Learning Objectives

By the end of this lesson, you will be able to:

  • Select columns efficiently using select() and helper functions.
  • Rename variables clearly and consistently using rename() and rename_with().
  • Reorder columns using relocate() for PMx workflows.
  • Enforce standard PMx naming conventions early, so downstream code does not need rewriting.
  • Recognize structural naming issues that can create modeling errors.

Setup

library(tidyverse)

We’ll use a small sponsor-style dataset.

pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F",
            1,     1.0,      0,      NA,   3.8,        1,        72,   "F",
            2,       0,      1,      80,    NA,        1,        88,   "M",
            2,     0.5,      0,      NA,   1.6,        1,        88,   "M"
)

pk
# A tibble: 5 × 8
  subject_id time_hr event dose_mg  conc compartment weight sex  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M    

Key Ideas

Column structure is part of modeling structure.

In PMx workflows, clean datasets typically:

  • Use consistent naming conventions (ID, TIME, EVID, AMT, DV, etc.).
  • Separate raw sponsor names from modeling-ready names.
  • Keep core modeling columns grouped together.
  • Remove unnecessary columns early to reduce cognitive load.

Selecting and renaming are not cosmetic steps: they reduce structural risk by making each column's name and role explicit before modeling code depends on it.

Warning

Inconsistent naming (e.g., id, Subject, subject_id) is a frequent source of failed or silently incorrect joins, merge errors, and modeling mistakes.
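To make the warning concrete, here is a minimal sketch with two hypothetical tables (dose_tbl and cov_tbl, invented for illustration) that spell the subject identifier differently. dplyr cannot infer a key for them; renaming to a shared name first, and passing by explicitly, makes the join correct and visible.

```r
library(dplyr)

# Hypothetical dosing table using the sponsor name "subject_id"
dose_tbl <- tibble(subject_id = c(1, 2), dose_mg = c(100, 80))

# Hypothetical covariate table from another source that calls the same key "Subject"
cov_tbl <- tibble(Subject = c(1, 2), weight = c(72, 88))

# With no shared column name, left_join() cannot infer a key and raises an error
join_attempt <- tryCatch(
  left_join(dose_tbl, cov_tbl),
  error = function(e) "error: no common key"
)

# Standardizing the key name first makes the join explicit and correct
cov_std <- cov_tbl %>% rename(subject_id = Subject)
joined  <- left_join(dose_tbl, cov_std, by = "subject_id")
joined
```

Passing by explicitly is a good habit: when by is omitted, dplyr joins on every shared column name, so an unrelated column that happens to share a name becomes an extra key, which is easy to miss.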


Worked Example 1: Basic Selection

Explicit selection:

pk %>% select(subject_id, time_hr, conc)
# A tibble: 5 × 3
  subject_id time_hr  conc
       <dbl>   <dbl> <dbl>
1          1     0    NA  
2          1     0.5   2.1
3          1     1     3.8
4          2     0    NA  
5          2     0.5   1.6

Selecting reduces clutter and protects against accidental use of irrelevant columns.
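Selection also works by exclusion. In this minimal sketch (a two-row copy of pk, so the block runs on its own), prefixing a column with - drops it and keeps everything else, which is convenient when only a few columns are unwanted.

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# A leading minus drops the listed columns and keeps all the others
pk_no_cov <- pk %>% select(-weight, -sex)
names(pk_no_cov)
```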


Worked Example 2: Select Helpers

starts_with()

pk %>% select(starts_with("time"))
# A tibble: 5 × 1
  time_hr
    <dbl>
1     0  
2     0.5
3     1  
4     0  
5     0.5
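One behavior worth knowing: starts_with(), ends_with(), contains(), and matches() ignore case by default (ignore.case = TRUE). The sketch below uses a hypothetical, half-renamed table (mixed) to show how that can pull in an unexpected column partway through standardization.

```r
library(dplyr)

# Hypothetical half-standardized table: a renamed TIME column sits next to a raw time_hr
mixed <- tibble(ID = c(1, 1), TIME = c(0, 0.5), time_hr = c(0, 0.5), DV = c(NA, 2.1))

# starts_with() ignores case by default, so both TIME and time_hr match
names(select(mixed, starts_with("time")))
# returns "TIME" "time_hr"

# ignore.case = FALSE restricts the match to the exact case
names(select(mixed, starts_with("time", ignore.case = FALSE)))
# returns "time_hr"
```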

ends_with()

pk %>% select(ends_with("_mg"))
# A tibble: 5 × 1
  dose_mg
    <dbl>
1     100
2      NA
3      NA
4      80
5      NA

contains()

pk %>% select(contains("comp"))
# A tibble: 5 × 1
  compartment
        <dbl>
1           1
2           1
3           1
4           1
5           1
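contains() matches any substring, so in a wider sponsor export the same call can return more than intended. A minimal sketch with a hypothetical compliance_pct column added for illustration:

```r
library(dplyr)

# Hypothetical export with a second column whose name also contains "comp"
pk_extra <- tibble(
  subject_id     = c(1, 2),
  compartment    = c(1, 1),
  compliance_pct = c(95, 88)
)

# contains("comp") matches every name with that substring, not just compartment
names(select(pk_extra, contains("comp")))
# returns "compartment" "compliance_pct"

# A more specific pattern (or the exact column name) avoids the extra match
names(select(pk_extra, contains("compart")))
# returns "compartment"
```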

matches() (regular expressions)

pk %>% select(matches("^(subj|time)"))
# A tibble: 5 × 2
  subject_id time_hr
       <dbl>   <dbl>
1          1     0  
2          1     0.5
3          1     1  
4          2     0  
5          2     0.5
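Because matches() takes a regular expression, it can also audit naming patterns. This sketch (on a two-row copy of pk) lists every column whose name ends in a unit suffix, a quick way to see which raw names carry units that will no longer appear in the names after renaming to PMx conventions.

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# "_(hr|mg)$" matches names ending in _hr or _mg
unit_cols <- names(select(pk, matches("_(hr|mg)$")))
unit_cols
# returns "time_hr" "dose_mg"
```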

everything() for ordering

pk %>% select(subject_id, time_hr, everything())
# A tibble: 5 × 8
  subject_id time_hr event dose_mg  conc compartment weight sex  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M    
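In the output above the column order does not change, because subject_id and time_hr already lead the table. Naming a column from the middle makes the effect of everything() visible; this sketch uses a two-row copy of pk.

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# conc moves to the front; everything() appends the remaining columns
# in their original order without duplicating conc
pk_conc_first <- pk %>% select(conc, everything())
names(pk_conc_first)
```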

Combined example (realistic workflow):

pk %>%
  select(subject_id, time_hr, starts_with("dose"), conc)
# A tibble: 5 × 4
  subject_id time_hr dose_mg  conc
       <dbl>   <dbl>   <dbl> <dbl>
1          1     0       100  NA  
2          1     0.5      NA   2.1
3          1     1        NA   3.8
4          2     0        80  NA  
5          2     0.5      NA   1.6
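
Note

Select helpers are most useful when datasets are large and unfamiliar: they find columns by naming pattern rather than exact spelling. Since a pattern can match more columns than intended, check which columns a helper picked up before relying on the result.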
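When the column list lives in a character vector (for example, a list of expected columns shared across studies), any_of() and all_of() are the helpers to use. This sketch assumes a hypothetical expected-column vector that includes a route column pk does not have: all_of() stops with an error, while any_of() quietly skips it.

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# Hypothetical list of expected columns; "route" does not exist in pk
wanted <- c("subject_id", "time_hr", "conc", "route")

# any_of() keeps the columns that exist and skips the missing one
pk_any <- pk %>% select(any_of(wanted))

# all_of() is strict: a missing column raises an error, surfacing the gap early
pk_all <- tryCatch(
  pk %>% select(all_of(wanted)),
  error = function(e) "error: missing column"
)
```

Prefer all_of() when a missing column should halt the workflow, and any_of() only when the column is genuinely optional.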


Worked Example 3: Renaming to PMx Conventions

pk_std <- pk %>%
  rename(
    ID   = subject_id,
    TIME = time_hr,
    EVID = event,
    AMT  = dose_mg,
    DV   = conc,
    CMT  = compartment,
    WT   = weight,
    SEX  = sex
  )

pk_std
# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M    

Rename once, and all downstream code can rely on the standard names.
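When several datasets from the same sponsor need the same renaming, the mapping can live in one named character vector and be reused. A minimal sketch, assuming a hypothetical lookup called pmx_lookup (PMx names as names, sponsor names as values), passed to rename() through all_of():

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# Hypothetical reusable lookup: new PMx name = old sponsor name
pmx_lookup <- c(
  ID = "subject_id", TIME = "time_hr", EVID = "event", AMT = "dose_mg",
  DV = "conc", CMT = "compartment", WT = "weight", SEX = "sex"
)

# all_of() applies every mapping and errors if a sponsor column is missing
pk_std <- pk %>% rename(all_of(pmx_lookup))
names(pk_std)
```

Swapping all_of() for any_of() lets the same lookup run on exports that lack some of the columns.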

Bulk renaming example:

pk %>% rename_with(toupper)
# A tibble: 5 × 8
  SUBJECT_ID TIME_HR EVENT DOSE_MG  CONC COMPARTMENT WEIGHT SEX  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M    
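rename_with() also accepts a .cols selection, so a function can be applied to only some columns. This sketch (on a two-row copy of pk) strips unit suffixes from the names that carry them; if you do this, record the units somewhere else, because the names no longer show them.

```r
library(dplyr)

# Two-row copy of the lesson's pk dataset
pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F"
)

# Only columns selected by .cols are passed to the function; others are untouched
pk_no_units <- pk %>%
  rename_with(function(x) sub("_(hr|mg)$", "", x), .cols = matches("_(hr|mg)$"))
names(pk_no_units)
```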

Worked Example 4: Relocating Columns

Move core modeling columns to the front:

pk_std %>% relocate(ID, TIME, EVID, AMT, DV)
# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M    

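Relocation improves readability and aligns with modeling expectations. Here pk_std already has these columns in this order, so the output looks unchanged; relocate() guarantees the order whatever arrangement the source file used.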
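relocate() also takes .before and .after for finer placement, for example to keep CMT with the event and dosing columns instead of after DV. Treating CMT as a dosing column is one reasonable choice here, not a fixed rule. A minimal sketch that rebuilds a two-row pk_std so it runs on its own:

```r
library(dplyr)

# Two-row copy of the lesson's standardized pk_std dataset
pk_std <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT, ~WT, ~SEX,
    1,     0,     1,  100,  NA,    1,  72,  "F",
    1,   0.5,     0,   NA, 2.1,    1,  72,  "F"
)

# Move CMT directly after AMT, ahead of the observation column DV
pk_cmt <- pk_std %>% relocate(CMT, .after = AMT)
names(pk_cmt)
```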

A common PMx ordering:

  • ID
  • TIME
  • Event / dosing columns
  • Observations
  • Covariates
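This ordering can be encoded once as a character vector and applied to any dataset with relocate(any_of()). The sketch assumes a hypothetical export (raw) whose columns arrive shuffled and a hypothetical core_order vector; columns not listed (the covariates) keep their relative order after the core block.

```r
library(dplyr)

# Hypothetical sponsor export whose columns arrive in an arbitrary order
raw <- tibble(
  WT = c(72, 72), DV = c(NA, 2.1), ID = c(1, 1), SEX = c("F", "F"),
  TIME = c(0, 0.5), AMT = c(100, NA), EVID = c(1, 0), CMT = c(1, 1)
)

# Hypothetical ordering: ID, TIME, event/dosing columns, then observations
core_order <- c("ID", "TIME", "EVID", "AMT", "CMT", "DV")

# Listed columns move to the front in vector order; any_of() tolerates
# datasets that lack one of them
pk_ordered <- raw %>% relocate(any_of(core_order))
names(pk_ordered)
```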

Strategies

  • Standardize names immediately after import.
  • Use select helpers when exploring unfamiliar sponsor exports.
  • Rename once, early, and consistently.
  • Keep modeling-relevant columns grouped at the front.
  • Preserve raw columns only if needed for auditability.
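Standardizing early pays off most when it is also checked. A minimal sketch of a guard placed right after renaming; check_pmx_cols() is a hypothetical helper written for this illustration (not part of dplyr), and its default required columns mirror the core set used in this lesson.

```r
library(dplyr)

# Hypothetical helper: stop early if any required modeling column is absent
check_pmx_cols <- function(data, required = c("ID", "TIME", "EVID", "AMT", "DV")) {
  absent <- setdiff(required, names(data))
  if (length(absent) > 0) {
    stop("Missing required columns: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(data)  # return the data so the check can sit inside a pipeline
}

pk_std <- tibble(ID = 1, TIME = 0, EVID = 1, AMT = 100, DV = NA_real_)
pk_raw <- tibble(subject_id = 1, time_hr = 0)

check_pmx_cols(pk_std)  # passes and returns pk_std invisibly
```

Because the helper returns its input, it can be dropped between steps, e.g. pk_std %>% check_pmx_cols() %>% relocate(ID, TIME).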

Common Mistakes

  • Renaming after modeling code is written.
  • Leaving dose and DV columns ambiguously named.
  • Relocating columns without checking downstream expectations.
  • Selecting columns before confirming their meaning.
  • Forgetting that select() drops unlisted columns.
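The last mistake is easy to make because select() can also reorder. A minimal sketch contrasting it with relocate() on a two-row copy of pk_std:

```r
library(dplyr)

# Two-row copy of the lesson's standardized pk_std dataset
pk_std <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT, ~WT, ~SEX,
    1,     0,     1,  100,  NA,    1,  72,  "F",
    1,   0.5,     0,   NA, 2.1,    1,  72,  "F"
)

# select() keeps only the listed columns
sel <- pk_std %>% select(DV, TIME)
names(sel)
# returns "DV" "TIME"

# relocate() moves the listed columns to the front and keeps all the others
rel <- pk_std %>% relocate(DV, TIME)
names(rel)
# returns "DV" "TIME" "ID" "EVID" "AMT" "CMT" "WT" "SEX"
```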

Practice Problems

  1. Select only ID, time, and concentration columns.
  2. Use contains() to select weight-related columns.
  3. Rename all columns to uppercase.
  4. Rename sponsor-style names to PMx conventions.
  5. Relocate ID and TIME to the front.
  6. Create a dataset that contains only modeling-relevant columns (ID, TIME, EVID, AMT, DV).

# 1
pk %>% select(subject_id, time_hr, conc)
# A tibble: 5 × 3
  subject_id time_hr  conc
       <dbl>   <dbl> <dbl>
1          1     0    NA  
2          1     0.5   2.1
3          1     1     3.8
4          2     0    NA  
5          2     0.5   1.6
# 2
pk %>% select(contains("weight"))
# A tibble: 5 × 1
  weight
   <dbl>
1     72
2     72
3     72
4     88
5     88
# 3
pk %>% rename_with(toupper)
# A tibble: 5 × 8
  SUBJECT_ID TIME_HR EVENT DOSE_MG  CONC COMPARTMENT WEIGHT SEX  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M    
# 4
pk_std <- pk %>%
  rename(
    ID   = subject_id,
    TIME = time_hr,
    EVID = event,
    AMT  = dose_mg,
    DV   = conc,
    CMT  = compartment,
    WT   = weight,
    SEX  = sex
  )

# 5
pk_std %>% relocate(ID, TIME)
# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M    
# 6
pk_std %>% select(ID, TIME, EVID, AMT, DV)
# A tibble: 5 × 5
     ID  TIME  EVID   AMT    DV
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA  
2     1   0.5     0    NA   2.1
3     1   1       0    NA   3.8
4     2   0       1    80  NA  
5     2   0.5     0    NA   1.6

Summary

You now know how to:

  • Use select() and helpers to safely isolate relevant variables.
  • Rename columns using consistent PMx conventions.
  • Reorder columns for modeling workflows.
  • Reduce structural risk early in the data wrangling pipeline.

Clean structure is not cosmetic: it protects your modeling workflow from silent errors.


  • Rename once, early.
  • Use select helpers when exploring new data.
  • Keep modeling columns grouped together.
  • Standardize naming conventions across datasets.
  • Structural clarity prevents modeling mistakes.