Selecting, Renaming, and Relocating Columns

Use select(), rename(), relocate(), and helper functions to standardize column structure for PMx workflows.

Tip

What you’ll build today: a clean, consistently named, well-ordered PMx dataset ready for modeling and visualization.

Learning Objectives

By the end of this lesson, you will be able to:

Select columns efficiently using select() and helper functions.
Rename variables clearly and consistently using rename() and rename_with().
Reorder columns using relocate() for PMx workflows.
Enforce standard PMx naming conventions early, so downstream code does not need rewriting.
Recognize structural naming issues that can create modeling errors.

Setup

library(tidyverse)

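We’ll use a small sponsor-style dataset with raw (non-PMx) column names: two subjects, each with one dose record followed by concentration records.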

pk <- tibble::tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
            1,       0,      1,     100,    NA,        1,        72,   "F",
            1,     0.5,      0,      NA,   2.1,        1,        72,   "F",
            1,     1.0,      0,      NA,   3.8,        1,        72,   "F",
            2,       0,      1,      80,    NA,        1,        88,   "M",
            2,     0.5,      0,      NA,   1.6,        1,        88,   "M"
)

pk

# A tibble: 5 × 8
  subject_id time_hr event dose_mg  conc compartment weight sex  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M

Key Ideas

Column structure is part of modeling structure.

In PMx workflows, clean datasets typically:

Use consistent naming conventions (ID, TIME, EVID, AMT, DV, etc.).
Separate raw sponsor names from modeling-ready names.
Keep core modeling columns grouped together.
Remove unnecessary columns early to reduce cognitive load.

Selecting and renaming are not cosmetic steps: they reduce structural risk, such as joining on the wrong key or passing the wrong column to a model.
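The idea of keeping raw sponsor names separate from modeling-ready names can be sketched as a single lookup vector (new name = old name) passed to rename() through all_of(), a pattern shown in current dplyr documentation. The name pmx_lookup is hypothetical, and the block recreates the lesson's pk data so it runs on its own.

```r
library(dplyr)

# Raw, sponsor-style data (same values as the lesson's pk tibble)
pk <- tibble(
  subject_id  = c(1, 1, 1, 2, 2),
  time_hr     = c(0, 0.5, 1, 0, 0.5),
  event       = c(1, 0, 0, 1, 0),
  dose_mg     = c(100, NA, NA, 80, NA),
  conc        = c(NA, 2.1, 3.8, NA, 1.6),
  compartment = c(1, 1, 1, 1, 1),
  weight      = c(72, 72, 72, 88, 88),
  sex         = c("F", "F", "F", "M", "M")
)

# Hypothetical lookup: names are PMx names, values are raw sponsor names
pmx_lookup <- c(
  ID = "subject_id", TIME = "time_hr", EVID = "event", AMT = "dose_mg",
  DV = "conc", CMT = "compartment", WT = "weight", SEX = "sex"
)

# all_of() errors if any raw name is missing from pk
pk_std <- pk %>% rename(all_of(pmx_lookup))

names(pk_std)  # ID, TIME, EVID, AMT, DV, CMT, WT, SEX
```

With the mapping held in one object, a new export with different raw names only requires editing pmx_lookup; code written against ID, TIME, and DV stays the same. Because all_of() errors when a listed name is absent, a renamed or missing sponsor column fails loudly; any_of() would skip it silently.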

Warning

Inconsistent naming (e.g., id, Subject, subject_id) is a frequent source of failed or silently incorrect joins and merges, and of modeling mistakes.
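To make the warning concrete, here is a minimal sketch with two hypothetical tables (conc_raw and demo_raw) that identify subjects under different names. Checking the shared column names shows there is no common key until the subject column is renamed to ID.

```r
library(dplyr)

# Hypothetical sources: concentrations keyed by subject_id, demographics by ID
conc_raw <- tibble(
  subject_id = c(1, 1, 2),
  time_hr    = c(0.5, 1, 0.5),
  conc       = c(2.1, 3.8, 1.6)
)
demo_raw <- tibble(ID = c(1, 2), weight = c(72, 88))

# No column names in common, so there is nothing to join on
shared <- intersect(names(conc_raw), names(demo_raw))
shared  # character(0)

# Standardize the key first, then join on it explicitly
conc_std <- conc_raw %>% rename(ID = subject_id)
joined <- left_join(conc_std, demo_raw, by = "ID")
```

Passing by explicitly, instead of letting left_join() use every shared column name, also protects against a join that quietly matches on an unintended column.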

Worked Example 1: Basic Selection

Explicit selection:

pk %>% select(subject_id, time_hr, conc)

# A tibble: 5 × 3
  subject_id time_hr  conc
       <dbl>   <dbl> <dbl>
1          1     0    NA  
2          1     0.5   2.1
3          1     1     3.8
4          2     0    NA  
5          2     0.5   1.6

Selecting reduces clutter and protects against accidental use of irrelevant columns.
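Selection also works by exclusion: a minus sign in front of a column, or in front of c(...), drops those columns and keeps everything else. This sketch uses a reduced, hypothetical copy of pk with only five of its columns.

```r
library(dplyr)

# Reduced copy of the lesson's pk data
pk <- tibble(
  subject_id = c(1, 1, 2),
  time_hr    = c(0, 0.5, 0),
  conc       = c(NA, 2.1, NA),
  weight     = c(72, 72, 88),
  sex        = c("F", "F", "M")
)

# Drop the covariates instead of listing every column to keep
pk_core <- pk %>% select(-c(weight, sex))

names(pk_core)  # subject_id, time_hr, conc
```

Listing the columns to keep is arguably the safer default for modeling datasets: a column added in a later export is then left out rather than carried along unnoticed.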

Worked Example 2: Select Helpers

`starts_with()`

pk %>% select(starts_with("time"))

# A tibble: 5 × 1
  time_hr
    <dbl>
1     0  
2     0.5
3     1  
4     0  
5     0.5

`ends_with()`

pk %>% select(ends_with("_mg"))

# A tibble: 5 × 1
  dose_mg
    <dbl>
1     100
2      NA
3      NA
4      80
5      NA

`contains()`

pk %>% select(contains("comp"))

# A tibble: 5 × 1
  compartment
        <dbl>
1           1
2           1
3           1
4           1
5           1

`matches()` (regular expressions)

pk %>% select(matches("^(subj|time)"))

# A tibble: 5 × 2
  subject_id time_hr
       <dbl>   <dbl>
1          1     0  
2          1     0.5
3          1     1  
4          2     0  
5          2     0.5

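`everything()` for ordering

everything() selects all remaining columns, so listing a few columns first moves them to the front without dropping anything. Here subject_id and time_hr are already the first two columns, so the order does not change: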

pk %>% select(subject_id, time_hr, everything())

# A tibble: 5 × 8
  subject_id time_hr event dose_mg  conc compartment weight sex  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M
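Naming columns from the middle of the table makes the effect of everything() visible. In this sketch, which recreates pk, conc and dose_mg move to the front and every remaining column follows in its original order, so nothing is dropped.

```r
library(dplyr)

pk <- tibble(
  subject_id  = c(1, 1, 1, 2, 2),
  time_hr     = c(0, 0.5, 1, 0, 0.5),
  event       = c(1, 0, 0, 1, 0),
  dose_mg     = c(100, NA, NA, 80, NA),
  conc        = c(NA, 2.1, 3.8, NA, 1.6),
  compartment = c(1, 1, 1, 1, 1),
  weight      = c(72, 72, 72, 88, 88),
  sex         = c("F", "F", "F", "M", "M")
)

# conc and dose_mg move to the front; everything() keeps the rest in order
pk_reordered <- pk %>% select(conc, dose_mg, everything())

names(pk_reordered)
# conc, dose_mg, subject_id, time_hr, event, compartment, weight, sex
```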

Combined example (realistic workflow):

pk %>%
  select(subject_id, time_hr, starts_with("dose"), conc)

# A tibble: 5 × 4
  subject_id time_hr dose_mg  conc
       <dbl>   <dbl>   <dbl> <dbl>
1          1     0       100  NA  
2          1     0.5      NA   2.1
3          1     1        NA   3.8
4          2     0        80  NA  
5          2     0.5      NA   1.6

Note

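Select helpers make it easier to find related columns in large or unfamiliar datasets, but a pattern can match more (or fewer) columns than you expect, so check which columns were actually selected.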
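One detail worth checking with unfamiliar exports: starts_with(), ends_with(), contains(), and matches() ignore case by default (ignore.case = TRUE). The hypothetical table below has both time_hr and an upper-case TIME_NOMINAL column; the default pattern picks up both, while ignore.case = FALSE restricts the match.

```r
library(dplyr)

# Hypothetical export that mixes naming styles
raw <- tibble(
  subject_id   = c(1, 2),
  time_hr      = c(0.5, 0.5),
  TIME_NOMINAL = c(0.5, 0.5),
  conc         = c(2.1, 1.6)
)

# Default: case is ignored, so both time columns match
default_match <- names(select(raw, starts_with("time")))
default_match  # time_hr, TIME_NOMINAL

# Case-sensitive match keeps only the lower-case column
strict_match <- names(select(raw, starts_with("time", ignore.case = FALSE)))
strict_match   # time_hr
```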

Worked Example 3: Renaming to PMx Conventions

pk_std <- pk %>%
  rename(
    ID   = subject_id,
    TIME = time_hr,
    EVID = event,
    AMT  = dose_mg,
    DV   = conc,
    CMT  = compartment,
    WT   = weight,
    SEX  = sex
  )

pk_std

# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M

Rename once, early, and all downstream code can rely on the same standard names.
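select() accepts the same new = old syntax as rename(), so a modeling subset can be selected and renamed in one call. Unlike rename(), any column that is not listed is dropped, so this form suits a lean modeling dataset but not one that still needs covariates. The block recreates pk so it is self-contained.

```r
library(dplyr)

pk <- tibble(
  subject_id  = c(1, 1, 1, 2, 2),
  time_hr     = c(0, 0.5, 1, 0, 0.5),
  event       = c(1, 0, 0, 1, 0),
  dose_mg     = c(100, NA, NA, 80, NA),
  conc        = c(NA, 2.1, 3.8, NA, 1.6),
  compartment = c(1, 1, 1, 1, 1),
  weight      = c(72, 72, 72, 88, 88),
  sex         = c("F", "F", "F", "M", "M")
)

# Select and rename in one step; unlisted columns (CMT, WT, SEX) are dropped
pk_model <- pk %>%
  select(ID = subject_id, TIME = time_hr, EVID = event, AMT = dose_mg, DV = conc)

names(pk_model)  # ID, TIME, EVID, AMT, DV
```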

Bulk renaming example:

pk %>% rename_with(toupper)

# A tibble: 5 × 8
  SUBJECT_ID TIME_HR EVENT DOSE_MG  CONC COMPARTMENT WEIGHT SEX  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M
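rename_with() also takes a .cols argument that limits the renaming function to a subset of columns. In this sketch, which recreates pk, toupper() is applied only to the covariate columns weight and sex.

```r
library(dplyr)

pk <- tibble(
  subject_id  = c(1, 1, 1, 2, 2),
  time_hr     = c(0, 0.5, 1, 0, 0.5),
  event       = c(1, 0, 0, 1, 0),
  dose_mg     = c(100, NA, NA, 80, NA),
  conc        = c(NA, 2.1, 3.8, NA, 1.6),
  compartment = c(1, 1, 1, 1, 1),
  weight      = c(72, 72, 72, 88, 88),
  sex         = c("F", "F", "F", "M", "M")
)

# Upper-case only the covariate names; all other names stay unchanged
pk_cov_upper <- pk %>% rename_with(toupper, .cols = c(weight, sex))

names(pk_cov_upper)
# subject_id, time_hr, event, dose_mg, conc, compartment, WEIGHT, SEX
```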

Worked Example 4: Relocating Columns

Move core modeling columns to the front:

pk_std %>% relocate(ID, TIME, EVID, AMT, DV)

# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M

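Because ID, TIME, EVID, AMT, and DV are already the first five columns of pk_std, the order does not change here; relocate() matters when core columns are scattered across a wider dataset. Grouping them at the front improves readability and matches the layout modelers typically expect.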

A common PMx ordering:

ID
TIME
Event and dosing columns (e.g., EVID, AMT)
Observations (DV)
Covariates (e.g., WT, SEX)
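relocate() also accepts .before and .after for placing columns relative to others. As a minimal sketch, assuming you want CMT grouped with the event and dosing columns rather than after DV, moving it after AMT yields ID, TIME, EVID, AMT, CMT, DV, WT, SEX. The block recreates pk_std directly under its standardized names.

```r
library(dplyr)

# Standardized copy of the lesson's pk_std tibble
pk_std <- tibble(
  ID   = c(1, 1, 1, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5),
  EVID = c(1, 0, 0, 1, 0),
  AMT  = c(100, NA, NA, 80, NA),
  DV   = c(NA, 2.1, 3.8, NA, 1.6),
  CMT  = c(1, 1, 1, 1, 1),
  WT   = c(72, 72, 72, 88, 88),
  SEX  = c("F", "F", "F", "M", "M")
)

# Place CMT right after AMT, ahead of the observation column DV
pk_ordered <- pk_std %>% relocate(CMT, .after = AMT)

names(pk_ordered)  # ID, TIME, EVID, AMT, CMT, DV, WT, SEX
```

relocate(CMT, .before = DV) would give the same order here; choose whichever anchor column is least likely to move in your own datasets.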

Strategies

Standardize names immediately after import.
Use select helpers when exploring unfamiliar sponsor exports.
Rename once, early, and consistently.
Keep modeling-relevant columns grouped at the front.
Preserve raw columns only if needed for auditability.

Common Mistakes

Renaming after modeling code is written.
Leaving dose and DV columns ambiguously named.
Relocating columns without checking downstream expectations.
Selecting columns before confirming their meaning.
Forgetting that select() drops unlisted columns.
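The last mistake is easy to demonstrate: with the same arguments, relocate() only moves columns, while select() keeps only the columns listed. This sketch uses a reduced, hypothetical version of pk_std with four columns.

```r
library(dplyr)

# Reduced version of pk_std
pk_std <- tibble(
  ID   = c(1, 1, 2),
  TIME = c(0, 0.5, 0),
  DV   = c(NA, 2.1, NA),
  WT   = c(72, 72, 88)
)

moved  <- pk_std %>% relocate(DV, TIME)  # reorders, keeps all 4 columns
picked <- pk_std %>% select(DV, TIME)    # keeps only DV and TIME

ncol(moved)   # 4
ncol(picked)  # 2
```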

Practice Problems

Select only ID, time, and concentration columns.
Use contains() to select weight-related columns.
Rename all columns to uppercase.
Rename sponsor-style names to PMx conventions.
Relocate ID and TIME to the front.
Create a dataset that contains only modeling-relevant columns (ID, TIME, EVID, AMT, DV).

Step-by-Step Solutions

# 1
pk %>% select(subject_id, time_hr, conc)

# A tibble: 5 × 3
  subject_id time_hr  conc
       <dbl>   <dbl> <dbl>
1          1     0    NA  
2          1     0.5   2.1
3          1     1     3.8
4          2     0    NA  
5          2     0.5   1.6

# 2
pk %>% select(contains("weight"))

# A tibble: 5 × 1
  weight
   <dbl>
1     72
2     72
3     72
4     88
5     88

# 3
pk %>% rename_with(toupper)

# A tibble: 5 × 8
  SUBJECT_ID TIME_HR EVENT DOSE_MG  CONC COMPARTMENT WEIGHT SEX  
       <dbl>   <dbl> <dbl>   <dbl> <dbl>       <dbl>  <dbl> <chr>
1          1     0       1     100  NA             1     72 F    
2          1     0.5     0      NA   2.1           1     72 F    
3          1     1       0      NA   3.8           1     72 F    
4          2     0       1      80  NA             1     88 M    
5          2     0.5     0      NA   1.6           1     88 M

# 4
pk_std <- pk %>%
  rename(
    ID   = subject_id,
    TIME = time_hr,
    EVID = event,
    AMT  = dose_mg,
    DV   = conc,
    CMT  = compartment,
    WT   = weight,
    SEX  = sex
  )

# 5
pk_std %>% relocate(ID, TIME)

# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M

# 6
pk_std %>% select(ID, TIME, EVID, AMT, DV)

# A tibble: 5 × 5
     ID  TIME  EVID   AMT    DV
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA  
2     1   0.5     0    NA   2.1
3     1   1       0    NA   3.8
4     2   0       1    80  NA  
5     2   0.5     0    NA   1.6

Summary

You now know how to:

Use select() and helpers to safely isolate relevant variables.
Rename columns using consistent PMx conventions.
Reorder columns for modeling workflows.
Reduce structural risk early in the data wrangling pipeline.

Clean structure is not cosmetic — it protects your modeling workflow from silent errors.

Quick Tips

Rename once, early.
Use select helpers when exploring new data.
Keep modeling columns grouped together.
Standardize naming conventions across datasets.
Structural clarity prevents modeling mistakes.
