First 10 Minutes: Structural QC for PMx Data

A disciplined workflow for inspecting structure and catching common issues immediately after importing data.

Tip

What you build today: a repeatable QC checklist you run immediately after importing any PMx dataset.

Learning Objectives

By the end of this lesson, you will be able to:

Inspect dataset structure efficiently.
Detect missingness patterns.
Identify duplicate records.
Enforce correct sorting.
Perform subject-level sanity checks.
Explain why structural QC precedes any modeling step.

Setup

library(tidyverse)

Key Ideas

The checks you run in the first 10 minutes after import determine whether your modeling workflow rests on sound data.

Structural QC focuses on:

Column names and types
Missingness patterns
Duplicate records
Correct record ordering
Subject-level plausibility

This is not exploratory analysis — it is structural validation.

Warning

Many modeling errors are not modeling errors — they are structural data errors discovered too late.

In this lesson, you will see functions like summarise() and group_by() used before their dedicated deep-dive lessons later in the course.

For now, think of them as tools that help collapse data into subject-level summaries.

Example Dataset (Demo Only)

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

Worked Example 1: Structural Inspection

glimpse(pk)

Rows: 4
Columns: 6
$ ID   <dbl> 1, 1, 2, 2
$ TIME <dbl> 0.0, 0.5, 0.0, 0.5
$ EVID <dbl> 1, 0, 1, 0
$ AMT  <dbl> 100, NA, 80, NA
$ DV   <dbl> NA, 2.1, NA, 1.6
$ CMT  <dbl> 1, 1, 1, 1

Confirm:

Column names
Column types
Whether IDs appear numeric or character
Whether time is stored as numeric

Structural issues at this stage often indicate upstream data problems.
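`glimpse()` depends on you reading the output carefully. Below is a minimal sketch of the same confirmation written as code. It assumes every column in the hypothetical `required_cols` vector should exist and be numeric, as is typical for NONMEM-style event datasets. This is not a standard, so adjust the vector to your own dataset specification.

```r
# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

# Hypothetical checklist of columns this dataset is expected to contain
required_cols <- c("ID", "TIME", "EVID", "AMT", "DV", "CMT")

# Required columns absent from the import (e.g. renamed upstream)
missing_cols <- setdiff(required_cols, names(pk))

# Required columns that were imported but not parsed as numeric
present_cols <- intersect(required_cols, names(pk))
non_numeric <- present_cols[!vapply(pk[present_cols], is.numeric, logical(1))]

missing_cols
non_numeric
```

Both vectors are empty for the demo data. If TIME had been read as text, for example because a stray cell held a unit suffix, `non_numeric` would contain `"TIME"`.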

Worked Example 2: Missingness Check

pk %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# A tibble: 1 × 6
     ID  TIME  EVID   AMT    DV   CMT
  <int> <int> <int> <int> <int> <int>
1     0     0     0     2     2     0

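Here the two missing AMT values fall on observation rows (EVID == 0) and the two missing DV values fall on dose rows (EVID == 1), which is expected for this layout. Look for unexpected patterns such as: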

Missing doses (AMT) when EVID == 1
Missing observations (DV) when EVID == 0
Entire columns with high missingness
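AMT and DV are expected to be empty on some rows, so column totals alone cannot separate expected gaps from unexpected ones. The sketch below splits missingness by `EVID`. It assumes the 0 = observation, 1 = dose coding used in this lesson.

```r
library(dplyr)

# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

# Missing AMT and DV counts, separately for each event type
miss_by_evid <- pk %>%
  group_by(EVID) %>%
  summarise(
    n_rows      = n(),
    amt_missing = sum(is.na(AMT)),
    dv_missing  = sum(is.na(DV)),
    .groups = "drop"
  )
miss_by_evid

# Rows whose missingness is NOT explained by event type:
# doses without an amount, or observations without a value
suspect <- pk %>%
  filter((EVID == 1 & is.na(AMT)) | (EVID == 0 & is.na(DV)))
suspect
```

For the demo data, every missing AMT sits on an observation row and every missing DV sits on a dose row, so `suspect` has zero rows.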

Note

summarise() collapses the dataset into summary statistics.
We will study this function formally later.
For now, use it as a structural diagnostic tool.

Worked Example 3: Duplicate Detection

pk %>%
  count(ID, TIME, EVID) %>%
  filter(n > 1)

# A tibble: 0 × 4
# ℹ 4 variables: ID <dbl>, TIME <dbl>, EVID <dbl>, n <int>

If any rows appear, investigate immediately.
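`count()` shows which keys repeat, but it does not show the rows behind them. To investigate a hit, pull the full records. The sketch below plants one hypothetical duplicate by appending a second copy of row 2 to an illustrative `pk_dup` table. It then checks whether the repeated rows are exact copies or conflicting versions.

```r
library(dplyr)

# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

# Hypothetical: simulate an accidental double append of row 2
pk_dup <- bind_rows(pk, pk[2, ])

# Keep every row whose ID/TIME/EVID key occurs more than once
dups <- pk_dup %>%
  group_by(ID, TIME, EVID) %>%
  filter(n() > 1) %>%
  ungroup()
dups

# Distinct versions per key: 1 = exact copies, >1 = conflicting values
dup_versions <- dups %>%
  distinct() %>%
  count(ID, TIME, EVID, name = "n_versions")
dup_versions
```

Exact copies can often be removed with `distinct()` once you have confirmed their cause. Conflicting versions need a decision traced back to the data source.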

Warning

Duplicate event records can silently break modeling workflows — especially in NONMEM, nlme, or simulation pipelines.

Worked Example 4: Sorting Enforcement

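# Sort by subject, then time; desc(EVID) puts doses (EVID = 1) before
# observations (EVID = 0) at identical times (assumes 0/1 EVID coding)
pk <- pk %>% arrange(ID, TIME, desc(EVID))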

Correct ordering is critical because many modeling tools assume:

Data are sorted by subject
Records are time-ordered
Dose records precede observations at identical times

Never assume imported data are properly ordered.
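Sorting fixes the order. It is also worth knowing whether the import was out of order in the first place, because that can signal an upstream problem. The sketch below uses a hypothetical helper, `is_pmx_sorted()`, which compares the current row order with an ID, TIME, dose-first order. It tests the helper on `pk_raw`, a hypothetical shuffled copy of the demo data.

```r
library(dplyr)

# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

# Hypothetical helper: TRUE if rows are already ordered by ID, then TIME,
# then dose (EVID = 1) before observation (EVID = 0) at tied times
is_pmx_sorted <- function(data) {
  ord <- order(data$ID, data$TIME, -data$EVID)
  identical(ord, seq_len(nrow(data)))
}

# Hypothetical: rows swapped within each subject to mimic an unsorted import
pk_raw <- pk[c(2, 1, 4, 3), ]
is_pmx_sorted(pk_raw)

pk_fixed <- pk_raw %>% arrange(ID, TIME, desc(EVID))
is_pmx_sorted(pk_fixed)
```

The first check returns `FALSE` and the second returns `TRUE`. Recording the first result tells you whether sorting actually changed anything.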

Worked Example 5: Subject-Level Sanity Check

pk %>%
  group_by(ID) %>%
  summarise(
    n_rows = n(),
    n_dose = sum(EVID == 1, na.rm = TRUE),
    n_obs  = sum(EVID == 0, na.rm = TRUE),
    time_min = min(TIME, na.rm = TRUE),
    time_max = max(TIME, na.rm = TRUE),
    .groups = "drop"
  )

# A tibble: 2 × 6
     ID n_rows n_dose n_obs time_min time_max
  <dbl>  <int>  <int> <int>    <dbl>    <dbl>
1     1      2      1     1        0      0.5
2     2      2      1     1        0      0.5

Scan for:

Subjects with no doses
Subjects with no observations
Implausible time ranges
Extremely short or long follow-up

This step frequently catches data merges gone wrong.
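Scanning by eye works for two subjects but not for two hundred. The sketch below turns the first two scan items into a filter. It adds a hypothetical subject 3 that has an observation but no dose, the kind of record an unmatched merge can leave behind. Plausible time ranges are study-specific, so the sketch reports follow-up duration rather than applying a threshold.

```r
library(dplyr)

# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

# Hypothetical: subject 3 has an observation but no dosing record
pk_merged <- bind_rows(
  pk,
  tibble::tibble(ID = 3, TIME = 1.0, EVID = 0, AMT = NA_real_, DV = 0.9, CMT = 1)
)

subj <- pk_merged %>%
  group_by(ID) %>%
  summarise(
    n_dose    = sum(EVID == 1, na.rm = TRUE),
    n_obs     = sum(EVID == 0, na.rm = TRUE),
    follow_up = max(TIME, na.rm = TRUE) - min(TIME, na.rm = TRUE),
    .groups = "drop"
  )

# Subjects missing either doses or observations
flagged <- subj %>% filter(n_dose == 0 | n_obs == 0)
flagged
```

Only subject 3 is flagged. Its follow-up of 0 would also be caught by the "extremely short follow-up" item.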

Strategies

Always run the same QC checklist.
Inspect structure before analysis.
Verify sorting before modeling.
Confirm each subject has sensible time ranges.
Document any structural corrections made.
Treat QC as a ritual, not an optional step.
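One way to make the checklist repeatable is to wrap it in a function, so every import gets identical checks. `qc_structure()` below is a hypothetical helper, not part of any package. It bundles this lesson's checks and returns them as a named list that you can print or save with your notes on structural corrections.

```r
library(dplyr)

# Hypothetical helper: runs this lesson's structural checks in one call
qc_structure <- function(data) {
  list(
    # Missing values per column
    n_missing = data %>%
      summarise(across(everything(), ~ sum(is.na(.x)))),
    # Repeated ID/TIME/EVID keys
    duplicates = data %>%
      count(ID, TIME, EVID) %>%
      filter(n > 1),
    # Already ordered by ID, TIME, dose before observation?
    is_sorted = identical(
      order(data$ID, data$TIME, -data$EVID),
      seq_len(nrow(data))
    ),
    # Subject-level plausibility summary
    by_subject = data %>%
      group_by(ID) %>%
      summarise(
        n_rows   = n(),
        n_dose   = sum(EVID == 1, na.rm = TRUE),
        n_obs    = sum(EVID == 0, na.rm = TRUE),
        time_min = min(TIME, na.rm = TRUE),
        time_max = max(TIME, na.rm = TRUE),
        .groups = "drop"
      )
  )
}

# Demo dataset from above, recreated so this block runs on its own
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.0,    1,  100,  NA,   1,
    1,   0.5,    0,   NA,  2.1,  1,
    2,   0.0,    1,   80,  NA,   1,
    2,   0.5,    0,   NA,  1.6,  1
)

qc <- qc_structure(pk)
qc$is_sorted
qc$by_subject
```

The helper returns results instead of stopping on the first problem. That way you see every issue in a single pass.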

Common Mistakes

Skipping QC because the dataset “looks small”
Checking only missing DV but not missing AMT
Forgetting to sort before modeling
Assuming duplicates are harmless
Trusting exported data without verification

Practice Problems

Inspect a dataset with glimpse().
Count missing values.
Detect duplicates.
Sort correctly.
Produce subject-level summaries.
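The demo dataset `pk` is clean, so every check passes. For extra practice, a dataset with planted problems is more instructive. `pk_bad` below is a hypothetical dataset invented for this exercise. Each planted problem is marked with a comment, so you can confirm that your checks find all four.

```r
# Hypothetical practice dataset with deliberately planted structural problems
pk_bad <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT,
    1,   0.5,    0,   NA,  2.1,  1,   # observation listed before its dose
    1,   0.0,    1,  100,  NA,   1,
    2,   0.0,    1,   NA,  NA,   1,   # dose record with missing AMT
    2,   0.5,    0,   NA,  1.6,  1,
    2,   0.5,    0,   NA,  1.6,  1,   # exact duplicate of the previous row
    3,   1.0,    0,   NA,  0.9,  1    # subject with an observation but no dose
)
```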

Step-by-Step Solutions

glimpse(pk)

Rows: 4
Columns: 6
$ ID   <dbl> 1, 1, 2, 2
$ TIME <dbl> 0.0, 0.5, 0.0, 0.5
$ EVID <dbl> 1, 0, 1, 0
$ AMT  <dbl> 100, NA, 80, NA
$ DV   <dbl> NA, 2.1, NA, 1.6
$ CMT  <dbl> 1, 1, 1, 1

pk %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# A tibble: 1 × 6
     ID  TIME  EVID   AMT    DV   CMT
  <int> <int> <int> <int> <int> <int>
1     0     0     0     2     2     0

pk %>%
  count(ID, TIME, EVID) %>%
  filter(n > 1)

# A tibble: 0 × 4
# ℹ 4 variables: ID <dbl>, TIME <dbl>, EVID <dbl>, n <int>

# desc(EVID) puts doses before observations at identical times
pk <- pk %>% arrange(ID, TIME, desc(EVID))

pk %>%
  group_by(ID) %>%
  summarise(n_rows = n(), .groups = "drop")

# A tibble: 2 × 2
     ID n_rows
  <dbl>  <int>
1     1      2
2     2      2

Summary

After import, you should immediately:

Inspect structure
Check missingness
Detect duplicates
Enforce sorting
Perform subject-level checks

Structural QC is a safety layer that protects all downstream modeling.

Quick Tips

Never analyze before inspecting.
Sorting errors are silent but dangerous.
Build a repeatable QC ritual.
Structural validation always precedes modeling.