pk <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk
  ID TIME  DV
1  1  0.5 2.1
2  1  1.0 3.8
3  1  2.0 3.0
4  2  0.5 1.6
5  2  1.0 2.9
6  2  2.0 2.4
Big idea: Most PMx work happens in tables. If you understand data frames (and tibbles), you can understand PK datasets.
By the end of this lesson, you will be able to:
Inspect a dataset with glimpse() and summary().
A data frame is a table where each column is a variable and each row is a record.
In PMx, data frames represent:
ID TIME DV
1 1 0.5 2.1
2 1 1.0 3.8
3 1 2.0 3.0
4 2 0.5 1.6
5 2 1.0 2.9
6 2 2.0 2.4
The table has three columns (ID, TIME, DV):
'data.frame': 6 obs. of 3 variables:
$ ID : num 1 1 1 2 2 2
$ TIME: num 0.5 1 2 0.5 1 2
$ DV : num 2.1 3.8 3 1.6 2.9 2.4
ID TIME DV
Min. :1.0 Min. :0.500 Min. :1.600
1st Qu.:1.0 1st Qu.:0.625 1st Qu.:2.175
Median :1.5 Median :1.000 Median :2.650
Mean :1.5 Mean :1.167 Mean :2.633
3rd Qu.:2.0 3rd Qu.:1.750 3rd Qu.:2.975
Max. :2.0 Max. :2.000 Max. :3.800
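The two outputs above match what base R's str() and summary() print for the pk data frame. A minimal sketch, rebuilding pk so the block runs on its own:

```r
# Rebuild the example PK data frame so this block is self-contained
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

str(pk)      # compact view: dimensions plus each column's type and first values
summary(pk)  # per-column minimum, quartiles, mean, and maximum
```

str() answers "what types do I have?", while summary() answers "are the values in a plausible range?".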
If you use tidyverse:
Rows: 6
Columns: 3
$ ID <dbl> 1, 1, 1, 2, 2, 2
$ TIME <dbl> 0.5, 1.0, 2.0, 0.5, 1.0, 2.0
$ DV <dbl> 2.1, 3.8, 3.0, 1.6, 2.9, 2.4
glimpse() is a great PMx habit.
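The glimpse() output above can be reproduced with a sketch like the following, assuming the dplyr package is installed (glimpse() is exported by dplyr):

```r
library(dplyr)  # provides glimpse()

# Same example PK data as earlier in the lesson
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

glimpse(pk)  # one line per column: name, type, and the first few values
```

Because glimpse() prints one column per line, it stays readable even when a dataset has dozens of columns.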
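A tibble is a modern version of a data frame with nicer behavior: it prints each column's type alongside the data, shows only the first rows of a long table, and never converts strings to factors.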
Create a tibble:
# A tibble: 6 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
4 2 0.5 1.6
5 2 1 2.9
6 2 2 2.4
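A tibble like the one printed above could be built as follows. This is a sketch that assumes the tibble package is installed; the object name pk_tbl is a placeholder chosen for illustration.

```r
library(tibble)

# Same values as the pk data frame, stored as a tibble
pk_tbl <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

pk_tbl  # printing shows the dimensions and each column's type
```

An existing data frame can also be converted with as_tibble() instead of being retyped.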
Subject-level covariates often come from another source.
# A tibble: 2 × 3
ID WT SEX
<dbl> <dbl> <chr>
1 1 72 M
2 2 64 F
Later, we’ll join this onto PK data.
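As a preview, the covariate table above and the join mentioned here could look like the sketch below. It assumes the tibble and dplyr packages are installed; the object names pk_tbl, covariates, and pk_cov are placeholders, and left_join() is one reasonable choice of join rather than the only one.

```r
library(tibble)
library(dplyr)

# Observation-level PK data: several rows per subject
pk_tbl <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Subject-level covariates: one row per subject
covariates <- tibble(
  ID  = c(1, 2),
  WT  = c(72, 64),
  SEX = c("M", "F")
)

# Keep every PK row and attach the matching subject's covariates by ID
pk_cov <- left_join(pk_tbl, covariates, by = "ID")
pk_cov
```

The joined table still has one row per observation; each subject's WT and SEX are simply repeated on all of that subject's rows.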
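A data frame requires all columns to have the same number of rows. data.frame() recycles a shorter vector only when its length divides the row count evenly (for example, a single value); otherwise it stops with an error.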
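The length rule can be checked directly in base R. In this sketch the STUDY column and the object names ok and bad are hypothetical; tryCatch() is used only so the failed call does not stop the script.

```r
# A single value is recycled to fill all three rows
ok <- data.frame(ID = c(1, 1, 1), STUDY = "A")
nrow(ok)

# Three IDs but only two TIME values: data.frame() refuses to build the table
bad <- tryCatch(
  data.frame(ID = c(1, 1, 1), TIME = c(0.5, 1)),
  error = function(e) conditionMessage(e)  # capture the error message as text
)
bad
```

In real PK datasets this error usually signals a dropped or duplicated record in one of the input vectors, so it is worth investigating rather than padding the shorter column.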
In older versions of R (< 4.0.0), data.frame() often converted character columns into factors automatically.
In modern R (≥ 4.0.0), data.frame() keeps strings as character by default (stringsAsFactors = FALSE), so you will not see this behavior unless you are working with older code or explicitly turn it on.
Let’s confirm what happens in your R:
'data.frame': 2 obs. of 2 variables:
$ ID : num 1 2
$ ARM: chr "A" "B"
You should see ARM listed as chr (character).
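The check above can be run with a two-column data frame like this one. The object name df is a placeholder, and the chr result assumes R 4.0.0 or later.

```r
# Build a small data frame with one character column
df <- data.frame(
  ID  = c(1, 2),
  ARM = c("A", "B")
)

str(df)  # under R >= 4.0.0, ARM is reported as chr
```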
If you want a factor (common for grouping variables), convert it intentionally:
'data.frame': 2 obs. of 2 variables:
$ ID : num 1 2
$ ARM: Factor w/ 2 levels "A","B": 1 2
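One way to make that conversion explicit is to call factor() on the column, as in this minimal sketch (df is a placeholder name):

```r
df <- data.frame(
  ID  = c(1, 2),
  ARM = c("A", "B")
)

# Convert on purpose, so the factor is a visible decision in the script
df$ARM <- factor(df$ARM)

str(df)          # ARM is now a factor with levels "A" and "B"
levels(df$ARM)   # inspect the level order used for grouping
```

When the level order matters, for example in plots or model contrasts, it can be set with the levels argument of factor() instead of relying on alphabetical order.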
If you ever encounter older code that forces the previous behavior, you might see:
'data.frame': 2 obs. of 2 variables:
$ ID : num 1 2
$ ARM: Factor w/ 2 levels "A","B": 1 2
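Older code typically forces this through the stringsAsFactors argument of data.frame(). A sketch of what such code might look like (df_old is a placeholder name):

```r
# stringsAsFactors = TRUE restores the pre-4.0.0 default for this call
df_old <- data.frame(
  ID  = c(1, 2),
  ARM = c("A", "B"),
  stringsAsFactors = TRUE
)

str(df_old)  # ARM is a factor even though it was typed as text
```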
PMx habit: Always check column types early (using str() or glimpse()) so IDs, covariates, and grouping variables behave the way you expect.
A simplified event-record dataset:
# A tibble: 6 × 5
ID TIME EVID AMT DV
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA
2 1 0.5 0 NA 2.1
3 1 1 0 NA 3.8
4 2 0 1 80 NA
5 2 0.5 0 NA 1.6
6 2 1 0 NA 2.9
This structure is the bridge to modeling later.
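The event-record table above could be built as in the sketch below, assuming the tibble package is installed. The object name events is a placeholder, and the reading of EVID (1 for a dose record, 0 for an observation) is an assumption based on the common NONMEM-style convention, which matches the pattern in the printed table.

```r
library(tibble)

events <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5, 1),
  EVID = c(1, 0, 0, 1, 0, 0),         # assumed: 1 = dose, 0 = observation
  AMT  = c(100, NA, NA, 80, NA, NA),  # dose amount, only on dose rows
  DV   = c(NA, 2.1, 3.8, NA, 1.6, 2.9) # concentration, only on observation rows
)

events

# Quick consistency check: dose rows carry AMT and no DV
all(!is.na(events$AMT[events$EVID == 1]))
all(is.na(events$DV[events$EVID == 1]))
```

The NA values are deliberate: a dose row has no measured concentration, and an observation row has no dose amount.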
Use glimpse() early to confirm types and missingness.
Create a tibble with ID, TIME, DV for two subjects.
Create a covariate tibble with ID, WT, and SEX.
Use glimpse() to inspect both tables.
Create an event-record table with EVID, AMT, and DV.
Rows: 4
Columns: 3
$ ID <dbl> 1, 1, 2, 2
$ TIME <dbl> 0.5, 1.0, 0.5, 1.0
$ DV <dbl> 2.1, 3.8, 1.6, 2.9
Rows: 2
Columns: 3
$ ID <dbl> 1, 2
$ WT <dbl> 72, 64
$ SEX <chr> "M", "F"
You now understand the PMx “home base” structure:
This is the foundation for data wrangling, visualization, and modeling.
Run glimpse() before doing anything else.