Data Frames and Tibbles

Learn the most important structure in pharmacometrics (PMx) work: data frames and tibbles, rows vs columns, and how PK data lives in tables.
Tip

Big idea: Most PMx work happens in tables. If you understand data frames (and tibbles), you can understand PK datasets.

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain what a data frame is (in plain language).
  • Distinguish rows vs columns and why that matters.
  • Create small data frames and tibbles.
  • Inspect data frames with str(), summary(), and glimpse().
  • Understand why tibbles are often preferred in PMx workflows.

What is a data frame?

A data frame is a table where:

  • each column is a vector (all same length)
  • each row is one record (one observation/event)

In PMx, data frames represent:

  • concentration measurements (observations)
  • dosing events (doses)
  • covariates (subject-level attributes)
  • model-ready event records

Create a small PMx-style table

pk <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

pk
  ID TIME  DV
1  1  0.5 2.1
2  1  1.0 3.8
3  1  2.0 3.0
4  2  0.5 1.6
5  2  1.0 2.9
6  2  2.0 2.4

Rows vs columns (a PMx way to think)

  • columns: variables (ID, TIME, DV)
  • rows: records (a measurement at a time)
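To make the row/column distinction concrete, here is a minimal sketch that re-creates the pk table from above so it runs on its own. The names dv and second_record are illustrative only. Pulling out a column gives a plain vector; pulling out a row gives a one-row data frame.

```r
# Re-create the small PK table from above so this block is self-contained
pk <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# A column is a vector: one variable, one value per record
dv <- pk$DV
is.numeric(dv)   # TRUE
length(dv)       # 6, the same as nrow(pk)

# A row is a record: one measurement for one subject at one time
second_record <- pk[2, ]
is.data.frame(second_record)  # TRUE
ncol(second_record)           # 3 (ID, TIME, DV)
```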

Basic inspection tools

str(pk)
'data.frame':   6 obs. of  3 variables:
 $ ID  : num  1 1 1 2 2 2
 $ TIME: num  0.5 1 2 0.5 1 2
 $ DV  : num  2.1 3.8 3 1.6 2.9 2.4
summary(pk)
       ID           TIME             DV       
 Min.   :1.0   Min.   :0.500   Min.   :1.600  
 1st Qu.:1.0   1st Qu.:0.625   1st Qu.:2.175  
 Median :1.5   Median :1.000   Median :2.650  
 Mean   :1.5   Mean   :1.167   Mean   :2.633  
 3rd Qu.:2.0   3rd Qu.:1.750   3rd Qu.:2.975  
 Max.   :2.0   Max.   :2.000   Max.   :3.800  

If you use tidyverse:

library(tidyverse)
glimpse(pk)
Rows: 6
Columns: 3
$ ID   <dbl> 1, 1, 1, 2, 2, 2
$ TIME <dbl> 0.5, 1.0, 2.0, 0.5, 1.0, 2.0
$ DV   <dbl> 2.1, 3.8, 3.0, 1.6, 2.9, 2.4

glimpse() lists every column with its type and first few values, so running it early is a great PMx habit.


Tibbles (modern data frames)

A tibble is a modern version of a data frame with nicer behavior:

  • prints compactly and shows each variable’s type
  • does not convert strings to factors by default
  • generally plays better with tidyverse workflows

Create a tibble:

library(tidyverse)

pk_tbl <- tibble(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

pk_tbl
# A tibble: 6 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   0.5   1.6
5     2   1     2.9
6     2   2     2.4
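One example of the "nicer behavior" is single-column subsetting. The sketch below is illustrative rather than exhaustive; it loads only the tibble package (part of the tidyverse) and uses a shortened three-row table with the hypothetical names pk_df, df_col, and tbl_col.

```r
library(tibble)  # part of the tidyverse

# A shortened version of the PK table, first as a base data frame
pk_df <- data.frame(
  ID = c(1, 1, 2),
  TIME = c(0.5, 1, 0.5),
  DV = c(2.1, 3.8, 1.6)
)

# Convert an existing data frame to a tibble
pk_tbl <- as_tibble(pk_df)

# Selecting one column with [ , ]: a data frame drops to a plain vector...
df_col <- pk_df[, "DV"]
is.data.frame(df_col)   # FALSE

# ...while a tibble always returns a tibble
tbl_col <- pk_tbl[, "DV"]
is_tibble(tbl_col)      # TRUE

# A tibble is still a data frame, so base R tools keep working
is.data.frame(pk_tbl)   # TRUE
```

The consistent return type means code that expects a table does not break when a selection happens to contain only one column.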

A realistic PMx pattern: covariates live in a separate table

Subject-level covariates often come from another source.

cov <- tibble(
  ID = c(1, 2),
  WT = c(72, 64),
  SEX = c("M", "F")
)

cov
# A tibble: 2 × 3
     ID    WT SEX  
  <dbl> <dbl> <chr>
1     1    72 M    
2     2    64 F    

Later, we’ll join this onto PK data.
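As a preview only, the join could look like the sketch below. It assumes the dplyr package (part of the tidyverse) and its left_join() function; pk_cov is a hypothetical name, and later lessons may take a different approach.

```r
library(dplyr)   # provides left_join(); part of the tidyverse
library(tibble)

pk_tbl <- tibble(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

cov <- tibble(
  ID = c(1, 2),
  WT = c(72, 64),
  SEX = c("M", "F")
)

# Keep every PK row and attach the matching subject-level covariates by ID
pk_cov <- left_join(pk_tbl, cov, by = "ID")

nrow(pk_cov)   # 6: one row per PK record, same as before the join
names(pk_cov)  # "ID" "TIME" "DV" "WT" "SEX"
```

Each subject's WT and SEX are repeated on every one of that subject's rows, which is why covariates are usually stored once per subject until a merge is actually needed.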


Common pitfalls (and why they matter)

1) Different lengths in columns

A data frame requires all columns to be the same length.

# This will error (and that's good!)
# data.frame(ID = c(1, 2), TIME = c(0.5, 1, 2))
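One caveat worth knowing: data.frame() only errors when the lengths do not divide evenly. When they do, it silently recycles the shorter vector, whereas tibble() recycles only length-1 values. The sketch below demonstrates this; fails() is a small hypothetical helper written for this example, not a function from any package.

```r
library(tibble)

# Hypothetical helper: TRUE if evaluating the expression throws an error
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Lengths 2 and 3 do not divide evenly: data.frame() refuses
fails(data.frame(ID = c(1, 2), TIME = c(0.5, 1, 2)))     # TRUE

# Lengths 2 and 4 divide evenly: data.frame() recycles ID without warning
recycled <- data.frame(ID = c(1, 2), TIME = c(0.5, 1, 2, 4))
recycled$ID                                                # 1 2 1 2

# tibble() rejects the same input instead of recycling it
fails(tibble(ID = c(1, 2), TIME = c(0.5, 1, 2, 4)))      # TRUE
```

Silent recycling can assign records to the wrong subject, so the stricter tibble() behavior is usually the safer default for PK data.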

2) Column types: characters vs factors (what changed in modern R)

In older versions of R (< 4.0.0), data.frame() converted character columns into factors by default (stringsAsFactors = TRUE).

In modern R (≥ 4.0.0), data.frame() keeps strings as character by default, so you will only see the conversion in code that runs on an older R version or that sets stringsAsFactors = TRUE explicitly.

Let’s confirm what happens in your R:

pk_new <- data.frame(
  ID = c(1, 2),
  ARM = c("A", "B")
)

str(pk_new)
'data.frame':   2 obs. of  2 variables:
 $ ID : num  1 2
 $ ARM: chr  "A" "B"

You should see ARM listed as chr (character).

If you want a factor (common for grouping variables), convert it intentionally:

pk_new$ARM <- factor(pk_new$ARM)
str(pk_new)
'data.frame':   2 obs. of  2 variables:
 $ ID : num  1 2
 $ ARM: Factor w/ 2 levels "A","B": 1 2

If you ever encounter older code that forces the previous behavior, you might see:

pk_old_style <- data.frame(
  ID = c(1, 2),
  ARM = c("A", "B"),
  stringsAsFactors = TRUE
)

str(pk_old_style)
'data.frame':   2 obs. of  2 variables:
 $ ID : num  1 2
 $ ARM: Factor w/ 2 levels "A","B": 1 2

Tip

PMx habit: Always check column types early (using str() or glimpse()) so IDs, covariates, and grouping variables behave the way you expect.
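Beyond reading str() output by eye, the type check can be written into the script itself. This is one possible sketch in base R; col_types is a hypothetical name, and stringsAsFactors = FALSE is set explicitly so the result is the same on any R version.

```r
pk_new <- data.frame(
  ID = c(1, 2),
  ARM = c("A", "B"),
  stringsAsFactors = FALSE  # explicit, so old and new R behave alike
)

# One class per column, returned as a named character vector
col_types <- vapply(pk_new, function(x) class(x)[1], character(1))
col_types   # ID is "numeric", ARM is "character"

# Fail fast if a column is not the type the rest of the script expects
stopifnot(
  is.numeric(pk_new$ID),
  is.character(pk_new$ARM)
)
```

If a later data update changes a column type, stopifnot() halts the script at this point instead of letting the problem surface further downstream.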


PMx Example: Event records in a data frame

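A simplified event-record dataset, where EVID = 1 marks a dosing record (AMT filled in, DV missing) and EVID = 0 marks an observation record (DV filled in, AMT missing):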

ev <- tibble(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5, 1),
  EVID = c(1, 0, 0, 1, 0, 0),
  AMT = c(100, NA, NA, 80, NA, NA),
  DV = c(NA, 2.1, 3.8, NA, 1.6, 2.9)
)

ev
# A tibble: 6 × 5
     ID  TIME  EVID   AMT    DV
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA  
2     1   0.5     0    NA   2.1
3     1   1       0    NA   3.8
4     2   0       1    80  NA  
5     2   0.5     0    NA   1.6
6     2   1       0    NA   2.9

This structure is the bridge to modeling later.
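Because doses and observations share one table, a common first step is to split them by EVID and confirm that AMT and DV are filled in where expected. The sketch below uses base R subsetting on the same ev table; doses and obs are hypothetical names.

```r
library(tibble)

ev <- tibble(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5, 1),
  EVID = c(1, 0, 0, 1, 0, 0),
  AMT = c(100, NA, NA, 80, NA, NA),
  DV = c(NA, 2.1, 3.8, NA, 1.6, 2.9)
)

# Dosing records: EVID == 1
doses <- ev[ev$EVID == 1, ]
# Observation records: EVID == 0
obs <- ev[ev$EVID == 0, ]

nrow(doses)  # 2 (one dose per subject)
nrow(obs)    # 4 (two concentrations per subject)

# Consistency checks on this example dataset
all(!is.na(doses$AMT))  # TRUE: every dose has an amount
all(is.na(obs$AMT))     # TRUE: observations carry no amount
all(!is.na(obs$DV))     # TRUE: every observation has a value
```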


Strategies

  • Think: “columns are variables, rows are records.”
  • Use glimpse() early to confirm column types and spot missing values (NA).
  • Prefer tibbles for tidyverse-based PMx workflows.
  • Keep covariates in separate tables until you intentionally merge them.
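For an exact count of missing values per column, a short base R check can complement glimpse(). The sketch below uses the event-record table from earlier, built with data.frame() so that no packages are needed; na_counts is a hypothetical name.

```r
ev <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5, 1),
  EVID = c(1, 0, 0, 1, 0, 0),
  AMT = c(100, NA, NA, 80, NA, NA),
  DV = c(NA, 2.1, 3.8, NA, 1.6, 2.9)
)

# is.na() marks each missing cell; colSums() counts them per column
na_counts <- colSums(is.na(ev))
na_counts   # ID 0, TIME 0, EVID 0, AMT 4, DV 2
```

Here the missing values are expected: AMT is empty on the four observation rows and DV is empty on the two dosing rows.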

Practice Problems

  1. Create a tibble with columns ID, TIME, DV for two subjects.
  2. Create a covariate tibble with ID, WT, and SEX.
  3. Use glimpse() to inspect both tables.
  4. Create a simple event-record table with EVID, AMT, and DV.
  5. Explain in one sentence the difference between “row” and “column” in PMx context.

Example solutions for problems 1 to 4:

library(tidyverse)

pk_tbl <- tibble(
  ID = c(1, 1, 2, 2),
  TIME = c(0.5, 1, 0.5, 1),
  DV = c(2.1, 3.8, 1.6, 2.9)
)

cov <- tibble(
  ID = c(1, 2),
  WT = c(72, 64),
  SEX = c("M", "F")
)

glimpse(pk_tbl)
Rows: 4
Columns: 3
$ ID   <dbl> 1, 1, 2, 2
$ TIME <dbl> 0.5, 1.0, 0.5, 1.0
$ DV   <dbl> 2.1, 3.8, 1.6, 2.9
glimpse(cov)
Rows: 2
Columns: 3
$ ID  <dbl> 1, 2
$ WT  <dbl> 72, 64
$ SEX <chr> "M", "F"
ev <- tibble(
  ID = c(1, 1, 2, 2),
  TIME = c(0, 0.5, 0, 0.5),
  EVID = c(1, 0, 1, 0),
  AMT = c(100, NA, 80, NA),
  DV = c(NA, 2.1, NA, 1.6)
)

Summary

You now understand the PMx “home base” structure:

  • data frames/tibbles are tables
  • columns are vectors (variables)
  • rows are records (events/observations)
  • covariates often live in separate tables
  • event records are just another data frame (with PMx-specific columns)

This is the foundation for data wrangling, visualization, and modeling.


  • Use glimpse() before doing anything else.
  • Keep IDs consistent (numeric or character, but not mixed).
  • Prefer tibbles when using tidyverse.
  • Don’t merge covariates until you intend to.
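The point about consistent IDs can be checked in a few lines before any merge. The sketch below is a hypothetical scenario in which the PK table stores ID as numeric and the covariate table stores it as character; the fix shown (converting both to character) is one reasonable choice, not the only one.

```r
pk <- data.frame(
  ID = c(1, 1, 2, 2),
  DV = c(2.1, 3.8, 1.6, 2.9)
)

cov <- data.frame(
  ID = c("1", "2"),          # IDs arrived as text from another source
  WT = c(72, 64),
  stringsAsFactors = FALSE
)

# The two ID columns have different types
class(pk$ID)                             # "numeric"
class(cov$ID)                            # "character"
identical(class(pk$ID), class(cov$ID))   # FALSE

# Pick one representation and apply it to both tables
pk$ID <- as.character(pk$ID)
identical(class(pk$ID), class(cov$ID))   # TRUE

# Every PK subject should now have a matching covariate row
all(pk$ID %in% cov$ID)                   # TRUE
```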