Selecting, Renaming, and Relocating Columns
library(tidyverse)
What you’ll build today: a clean, consistently named, well-ordered PMx dataset ready for modeling and visualization.
Learning Objectives
By the end of this lesson, you will be able to:
- Select columns efficiently using select() and helper functions.
- Rename variables clearly and consistently using rename() and rename_with().
- Reorder columns using relocate() for PMx workflows.
- Enforce standard PMx naming conventions without rewriting downstream code.
- Recognize structural naming issues that can create modeling errors.
Setup
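We’ll use a small sponsor-style PK dataset: two subjects, each with one dose at time 0 and concentrations recorded under raw (non-PMx) column names.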
pk <- tibble::tribble(
~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
1, 0, 1, 100, NA, 1, 72, "F",
1, 0.5, 0, NA, 2.1, 1, 72, "F",
1, 1.0, 0, NA, 3.8, 1, 72, "F",
2, 0, 1, 80, NA, 1, 88, "M",
2, 0.5, 0, NA, 1.6, 1, 88, "M"
)
pk
# A tibble: 5 × 8
subject_id time_hr event dose_mg conc compartment weight sex
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
Key Ideas
Column structure is part of modeling structure.
In PMx workflows, clean datasets typically:
- Use consistent naming conventions (ID, TIME, EVID, AMT, DV, etc.).
- Separate raw sponsor names from modeling-ready names.
- Keep core modeling columns grouped together.
- Remove unnecessary columns early to reduce cognitive load.
Selecting and renaming are not cosmetic steps; they reduce structural risk.
Inconsistent naming (e.g., id, Subject, subject_id) is a frequent source of failed or silently incorrect joins, merge errors, and modeling mistakes.
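To make the join risk concrete, here is a minimal sketch with two hypothetical extracts (conc and covs, invented for illustration) whose subject keys are named differently. The mismatch forces an explicit key mapping on every join; renaming the key once removes that burden.

```r
library(dplyr)
library(tibble)

# Hypothetical concentration extract using a sponsor-style key name
conc <- tribble(
  ~subject_id, ~time_hr, ~conc,
  1,           0.5,      2.1,
  2,           0.5,      1.6
)

# Hypothetical covariate extract that already uses the PMx key name
covs <- tribble(
  ~ID, ~WT,
  1,   72,
  2,   88
)

# Mismatched keys: every join must spell out the mapping by hand
joined_raw <- left_join(conc, covs, by = c("subject_id" = "ID"))

# Standardize the key once; later joins can use the shared name
joined_std <- conc %>%
  rename(ID = subject_id) %>%
  left_join(covs, by = "ID")

joined_std
```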
Worked Example 1: Basic Selection
Explicit selection:
pk %>% select(subject_id, time_hr, conc)
# A tibble: 5 × 3
subject_id time_hr conc
<dbl> <dbl> <dbl>
1 1 0 NA
2 1 0.5 2.1
3 1 1 3.8
4 2 0 NA
5 2 0.5 1.6
Selecting reduces clutter and protects against accidental use of irrelevant columns.
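The opposite direction is often just as useful. As a sketch, a minus sign inside select() drops the named columns and keeps everything else, which is handy when only a few columns are irrelevant.

```r
library(dplyr)
library(tibble)

pk <- tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
  1, 0,   1, 100, NA,  1, 72, "F",
  1, 0.5, 0, NA,  2.1, 1, 72, "F",
  1, 1.0, 0, NA,  3.8, 1, 72, "F",
  2, 0,   1, 80,  NA,  1, 88, "M",
  2, 0.5, 0, NA,  1.6, 1, 88, "M"
)

# Drop the covariates; all other columns keep their original order
pk_no_covs <- pk %>% select(-weight, -sex)

names(pk_no_covs)
```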
Worked Example 2: Select Helpers
starts_with()
pk %>% select(starts_with("time"))
# A tibble: 5 × 1
time_hr
<dbl>
1 0
2 0.5
3 1
4 0
5 0.5
ends_with()
pk %>% select(ends_with("_mg"))
# A tibble: 5 × 1
dose_mg
<dbl>
1 100
2 NA
3 NA
4 80
5 NA
contains()
pk %>% select(contains("comp"))
# A tibble: 5 × 1
compartment
<dbl>
1 1
2 1
3 1
4 1
5 1
matches() (regular expressions)
pk %>% select(matches("^(subj|time)"))
# A tibble: 5 × 2
subject_id time_hr
<dbl> <dbl>
1 1 0
2 1 0.5
3 1 1
4 2 0
5 2 0.5
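One detail matters for unfamiliar exports: starts_with(), ends_with(), contains(), and matches() ignore case by default (ignore.case = TRUE). The sketch below, on a trimmed copy of pk, shows that "TIME" still matches time_hr, and that a pattern matching nothing returns zero columns rather than an error.

```r
library(dplyr)
library(tibble)

# Trimmed copy of the pk dataset
pk <- tribble(
  ~subject_id, ~time_hr, ~conc,
  1,           0.5,      2.1,
  2,           0.5,      1.6
)

# Case is ignored by default, so "TIME" matches time_hr
time_default <- pk %>% select(starts_with("TIME"))

# Case-sensitive matching finds nothing: zero columns, no error
time_strict <- pk %>% select(starts_with("TIME", ignore.case = FALSE))

names(time_default)
ncol(time_strict)
```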
everything() for ordering
pk %>% select(subject_id, time_hr, everything())
# A tibble: 5 × 8
subject_id time_hr event dose_mg conc compartment weight sex
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
Combined example (realistic workflow):
pk %>%
  select(subject_id, time_hr, starts_with("dose"), conc)
# A tibble: 5 × 4
subject_id time_hr dose_mg conc
<dbl> <dbl> <dbl> <dbl>
1 1 0 100 NA
2 1 0.5 NA 2.1
3 1 1 NA 3.8
4 2 0 80 NA
5 2 0.5 NA 1.6
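Select helpers make large, unfamiliar datasets safer to work with because they choose columns by name pattern rather than by exact spelling. Check the result, though: a pattern can also match columns you did not intend.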
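When you select from a list of expected names, the choice of helper decides what happens if a column is absent. A minimal sketch, where wanted is a hypothetical list and height deliberately does not exist in pk: any_of() skips the missing name quietly, while all_of() stops with an error.

```r
library(dplyr)
library(tibble)

pk <- tribble(
  ~subject_id, ~time_hr, ~conc, ~weight,
  1,           0.5,      2.1,   72,
  2,           0.5,      1.6,   88
)

# Hypothetical list of expected columns; "height" is not in pk
wanted <- c("subject_id", "time_hr", "conc", "height")

# any_of(): keeps the columns that exist and silently skips the rest
pk_any <- pk %>% select(any_of(wanted))

# all_of(): strict; a missing column is an error, captured here as text
pk_all <- tryCatch(
  pk %>% select(all_of(wanted)),
  error = function(e) conditionMessage(e)
)

names(pk_any)
pk_all
```

For modeling-critical columns, all_of() is usually the safer default, because a missing column fails loudly instead of disappearing.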
Worked Example 3: Renaming to PMx Conventions
pk_std <- pk %>%
rename(
ID = subject_id,
TIME = time_hr,
EVID = event,
AMT = dose_mg,
DV = conc,
CMT = compartment,
WT = weight,
SEX = sex
)
pk_std
# A tibble: 5 × 8
ID TIME EVID AMT DV CMT WT SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
Rename once, and downstream code becomes simpler: every later step can refer to ID, TIME, DV, and the other standard names.
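If the same sponsor naming recurs across studies, the mapping can live in a single named vector (new name = old name). A sketch, assuming a recent version of dplyr in which rename() accepts a named vector through all_of(); pmx_names is a hypothetical object name.

```r
library(dplyr)
library(tibble)

pk <- tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
  1, 0,   1, 100, NA,  1, 72, "F",
  1, 0.5, 0, NA,  2.1, 1, 72, "F",
  1, 1.0, 0, NA,  3.8, 1, 72, "F",
  2, 0,   1, 80,  NA,  1, 88, "M",
  2, 0.5, 0, NA,  1.6, 1, 88, "M"
)

# Hypothetical lookup vector: names are PMx names, values are sponsor names
pmx_names <- c(
  ID = "subject_id", TIME = "time_hr", EVID = "event", AMT = "dose_mg",
  DV = "conc", CMT = "compartment", WT = "weight", SEX = "sex"
)

# all_of() applies every mapping and errors if a sponsor column is missing
pk_std <- pk %>% rename(all_of(pmx_names))

names(pk_std)
```

Swapping all_of() for any_of() applies only the mappings whose sponsor column is present, which suits exports that vary slightly between studies.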
Bulk renaming example:
pk %>% rename_with(toupper)
# A tibble: 5 × 8
SUBJECT_ID TIME_HR EVENT DOSE_MG CONC COMPARTMENT WEIGHT SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
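rename_with() also accepts a .cols selection, so a transformation can be limited to some columns. A sketch on the pk dataset: the first call upper-cases only the covariates; the second uses a formula lambda to strip a unit suffix from names ending in _hr or _mg (whether to drop units from names is a project convention, assumed here).

```r
library(dplyr)
library(tibble)

pk <- tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
  1, 0,   1, 100, NA,  1, 72, "F",
  1, 0.5, 0, NA,  2.1, 1, 72, "F",
  1, 1.0, 0, NA,  3.8, 1, 72, "F",
  2, 0,   1, 80,  NA,  1, 88, "M",
  2, 0.5, 0, NA,  1.6, 1, 88, "M"
)

# Upper-case only the covariate columns; other names are unchanged
pk_cov_upper <- pk %>% rename_with(toupper, .cols = c(weight, sex))

# Remove a trailing "_hr" or "_mg" from the names that have one
pk_no_units <- pk %>%
  rename_with(~ sub("_(hr|mg)$", "", .x), .cols = matches("_(hr|mg)$"))

names(pk_cov_upper)
names(pk_no_units)
```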
Worked Example 4: Relocating Columns
Move core modeling columns to the front:
pk_std %>% relocate(ID, TIME, EVID, AMT, DV)
# A tibble: 5 × 8
ID TIME EVID AMT DV CMT WT SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
Relocation improves readability and aligns with modeling expectations. Unlike select(), relocate() only changes column order; it never drops columns.
A common PMx ordering:
- ID
- TIME
- Event / dosing columns
- Observations
- Covariates
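relocate() also takes .before and .after, which helps build this ordering without listing every column. A minimal sketch on a trimmed copy of pk_std; placing CMT between AMT and DV is one common arrangement, assumed here rather than required.

```r
library(dplyr)
library(tibble)

# Trimmed copy of pk_std (already renamed to PMx conventions)
pk_std <- tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT, ~WT, ~SEX,
  1,   0,     1,     100,  NA,  1,    72,  "F",
  1,   0.5,   0,     NA,   2.1, 1,    72,  "F"
)

pk_ordered <- pk_std %>%
  # Group CMT with the event/dosing columns, ahead of the observation DV
  relocate(CMT, .after = AMT) %>%
  # Keep covariates last, even if new columns are added upstream later
  relocate(WT, SEX, .after = last_col())

names(pk_ordered)
```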
Strategies
- Standardize names immediately after import.
- Use select helpers when exploring unfamiliar sponsor exports.
- Rename once, early, and consistently.
- Keep modeling-relevant columns grouped at the front.
- Preserve raw columns only if needed for auditability.
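For the last point, one lightweight option (an assumption, not the only approach) is to drop raw columns but keep a record of how each was renamed. A sketch that turns a hypothetical lookup vector into a small mapping table you could save alongside the modeling dataset:

```r
library(tibble)

# Hypothetical lookup: PMx name = sponsor name
pmx_names <- c(
  ID = "subject_id", TIME = "time_hr", EVID = "event",
  AMT = "dose_mg", DV = "conc"
)

# One row per rename: the raw sponsor name and its PMx replacement
name_map <- tibble(
  raw_name = unname(pmx_names),
  pmx_name = names(pmx_names)
)

name_map
```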
Common Mistakes
- Renaming after modeling code is written.
- Leaving dose and DV columns ambiguously named.
- Relocating columns without checking downstream expectations.
- Selecting columns before confirming their meaning.
- Forgetting that select() drops unlisted columns.
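Forgetting that select() drops unlisted columns can be caught with a quick check. A minimal sketch: compare names before and after select() to see what was dropped, then confirm the columns a downstream model expects. The helper check_required() and the required list (including CMT) are hypothetical.

```r
library(dplyr)
library(tibble)

pk_std <- tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT, ~WT, ~SEX,
  1,   0,     1,     100,  NA,  1,    72,  "F",
  1,   0.5,   0,     NA,   2.1, 1,    72,  "F"
)

# select() keeps only what is listed
pk_model <- pk_std %>% select(ID, TIME, EVID, AMT, DV)

# Columns that select() dropped without any warning
dropped <- setdiff(names(pk_std), names(pk_model))
dropped

# Hypothetical guard: stop early if a required column is missing
check_required <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(data)
}

# Suppose the model also needs CMT; the guard reports that it was dropped
guard_msg <- tryCatch(
  {
    check_required(pk_model, c("ID", "TIME", "EVID", "AMT", "DV", "CMT"))
    "ok"
  },
  error = function(e) conditionMessage(e)
)
guard_msg
```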
Practice Problems
- Select only ID, time, and concentration columns.
- Use contains() to select weight-related columns.
- Rename all columns to uppercase.
- Rename sponsor-style names to PMx conventions.
- Relocate ID and TIME to the front.
- Create a dataset that contains only modeling-relevant columns (ID, TIME, EVID, AMT, DV).
# 1
pk %>% select(subject_id, time_hr, conc)
# A tibble: 5 × 3
subject_id time_hr conc
<dbl> <dbl> <dbl>
1 1 0 NA
2 1 0.5 2.1
3 1 1 3.8
4 2 0 NA
5 2 0.5 1.6
# 2
pk %>% select(contains("weight"))
# A tibble: 5 × 1
weight
<dbl>
1 72
2 72
3 72
4 88
5 88
# 3
pk %>% rename_with(toupper)
# A tibble: 5 × 8
SUBJECT_ID TIME_HR EVENT DOSE_MG CONC COMPARTMENT WEIGHT SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
# 4
pk_std <- pk %>%
rename(
ID = subject_id,
TIME = time_hr,
EVID = event,
AMT = dose_mg,
DV = conc,
CMT = compartment,
WT = weight,
SEX = sex
)
# 5
pk_std %>% relocate(ID, TIME)
# A tibble: 5 × 8
ID TIME EVID AMT DV CMT WT SEX
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 1 100 NA 1 72 F
2 1 0.5 0 NA 2.1 1 72 F
3 1 1 0 NA 3.8 1 72 F
4 2 0 1 80 NA 1 88 M
5 2 0.5 0 NA 1.6 1 88 M
# 6
pk_std %>% select(ID, TIME, EVID, AMT, DV)
# A tibble: 5 × 5
ID TIME EVID AMT DV
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA
2 1 0.5 0 NA 2.1
3 1 1 0 NA 3.8
4 2 0 1 80 NA
5 2 0.5 0 NA 1.6
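An alternative for problem 6, sketched here: select() can rename as it selects (new_name = old_name), producing the modeling subset straight from the raw pk in one step. This skips the intermediate pk_std, which is convenient for quick checks but leaves no fully renamed copy of the data.

```r
library(dplyr)
library(tibble)

pk <- tribble(
  ~subject_id, ~time_hr, ~event, ~dose_mg, ~conc, ~compartment, ~weight, ~sex,
  1, 0,   1, 100, NA,  1, 72, "F",
  1, 0.5, 0, NA,  2.1, 1, 72, "F",
  1, 1.0, 0, NA,  3.8, 1, 72, "F",
  2, 0,   1, 80,  NA,  1, 88, "M",
  2, 0.5, 0, NA,  1.6, 1, 88, "M"
)

# Rename while selecting; unlisted columns are dropped as usual
pk_model <- pk %>%
  select(ID = subject_id, TIME = time_hr, EVID = event, AMT = dose_mg, DV = conc)

pk_model
```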
Summary
You now know how to:
- Use select() and helpers to safely isolate relevant variables.
- Rename columns using consistent PMx conventions.
- Reorder columns for modeling workflows.
- Reduce structural risk early in the data wrangling pipeline.
Clean structure is not cosmetic: it protects your modeling workflow from silent errors.
- Rename once, early.
- Use select helpers when exploring new data.
- Keep modeling columns grouped together.
- Standardize naming conventions across datasets.
- Structural clarity prevents modeling mistakes.