Concentration Data Structure in Practice with Theoph and PKNCAconc()

Use a real PK dataset (Theoph) to structure concentration–time data for NCA, define profiles explicitly, and build a PKNCA concentration object you can trust.

Tip

Big idea: In NCA, your data structure is part of the method. If you don’t define “one profile” correctly, your AUC and half-life can be wrong even if the code runs.

Learning Objectives

By the end of this lesson, you will be able to:

  • Identify the minimum columns required for concentration–time NCA.
  • Define what counts as a single “profile” (ID, occasion, analyte, matrix, etc.).
  • Build a PKNCAconc() object using a formula interface.
  • Perform structural QC checks before running calculations.

Key Ideas

  • NCA is computed per profile.
  • A profile is typically one subject (and often one occasion) over one dosing interval.
  • Concentration data must be in long format: one row per observation (one concentration at one time for one profile).
  • Timepoints must be unique within each profile.
  • PKNCAconc() defines structure — it does not perform calculations.
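
To make the long-format requirement concrete, here is a minimal sketch that reshapes a small wide table into one row per subject and timepoint. The table `wide_conc` and its column names (`t_0`, `t_1`, `t_2`) are invented for illustration and are not part of Theoph.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide layout: one row per subject, one column per sampling time.
wide_conc <- tribble(
  ~ID, ~t_0, ~t_1, ~t_2,
  "A",  0.0,  4.2,  3.1,
  "B",  0.0,  5.0,  3.8
)

# Reshape to long format: one row per ID and timepoint.
long_conc <- wide_conc %>%
  pivot_longer(
    cols = starts_with("t_"),
    names_to = "TIME",
    names_prefix = "t_",
    values_to = "CONC"
  ) %>%
  mutate(TIME = as.numeric(TIME))  # column-name suffixes arrive as text

long_conc
```

The result has the three columns ID, TIME, and CONC, which is the shape the rest of this lesson assumes.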

Using a Real Dataset: Theoph

We will use the built-in R dataset Theoph, which contains theophylline concentration–time data after oral dosing.

library(tidyverse)
library(PKNCA)

data(Theoph)

theoph_conc <- as_tibble(Theoph) %>%
  transmute(
    ID   = Subject,
    TIME = Time,
    CONC = conc
  )

theoph_conc %>%
  arrange(ID, TIME) %>%
  head(12)
# A tibble: 12 × 3
   ID     TIME  CONC
   <ord> <dbl> <dbl>
 1 6      0     0   
 2 6      0.27  1.29
 3 6      0.58  3.08
 4 6      1.15  6.44
 5 6      2.03  6.32
 6 6      3.57  5.53
 7 6      5     4.94
 8 6      7     4.02
 9 6      9.22  3.46
10 6     12.1   2.78
11 6     23.8   0.92
12 7      0     0.15

Tip

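Why head()?
When teaching, previewing a small slice keeps attention on structure instead of overwhelming output. Note that ID is an ordered factor (<ord>), so arrange() sorts by factor level (6, 7, 8, 11, ...) rather than numerically.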


What Columns Are Required?

Minimum:

  • ID
  • TIME
  • CONC

Often also needed:

  • PERIOD / OCC
  • ANALYTE
  • MATRIX
  • ARM / TRT

Rule: If a variable changes the PK curve (a different period, analyte, matrix, or treatment), include it in the grouping.
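
A column check is easy to automate. The sketch below uses a hypothetical helper, `check_required_columns()`, which is not part of PKNCA; it only compares the column names against a list you supply.

```r
# Hypothetical helper: stop early if any required column is absent.
check_required_columns <- function(data, required = c("ID", "TIME", "CONC")) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Theoph ships with R; Subject, Time, and conc are its own column names.
theoph_conc <- data.frame(ID = Theoph$Subject, TIME = Theoph$Time, CONC = Theoph$conc)
check_required_columns(theoph_conc)  # passes silently

# A crossover analysis would also need PERIOD, so this check fails as intended.
result <- tryCatch(
  check_required_columns(theoph_conc, required = c("ID", "PERIOD", "TIME", "CONC")),
  error = function(e) conditionMessage(e)
)
result
```

Passing the grouping columns explicitly in `required` keeps the check tied to the study design rather than to a fixed list.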


Defining “One Profile”

Typical groupings:

  • Single-dose → ID
  • Crossover → ID + PERIOD
  • Multiple analytes → ID + ANALYTE
  • Multiple occasions → ID + OCC

Diagnostic question:

If I facet by this variable, would I expect separate PK curves?

For Theoph, ID alone is correct: each of the 12 subjects contributes a single concentration–time curve.
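
The diagnostic question can be asked literally with a faceted plot. This is a minimal sketch using ggplot2 on the raw Theoph columns (`Subject`, `Time`, `conc`); the axis labels use the units given on the Theoph help page.

```r
library(ggplot2)

# One panel per candidate grouping value; each panel should show one PK curve.
p <- ggplot(Theoph, aes(x = Time, y = conc)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Subject) +
  labs(x = "Time (h)", y = "Theophylline concentration (mg/L)")

p
```

If a panel shows a curve that doubles back or zigzags between two shapes, the facet variable probably does not yet define a single profile.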


Structural QC Before PKNCA

theoph_conc %>%
  summarise(
    n_rows = n(),
    n_subjects = n_distinct(ID),
    any_negative_time = any(TIME < 0),
    any_negative_conc = any(CONC < 0),
    any_missing_conc = any(is.na(CONC))
  )
# A tibble: 1 × 5
  n_rows n_subjects any_negative_time any_negative_conc any_missing_conc
   <int>      <int> <lgl>             <lgl>             <lgl>           
1    132         12 FALSE             FALSE             FALSE           

Check observations per subject:

theoph_conc %>%
  count(ID, name = "n_obs") %>%
  arrange(ID) %>%
  head(12)
# A tibble: 12 × 2
   ID    n_obs
   <ord> <int>
 1 6        11
 2 7        11
 3 8        11
 4 11       11
 5 3        11
 6 2        11
 7 4        11
 8 9        11
 9 12       11
10 10       11
11 1        11
12 5        11

Check duplicate timepoints within subject:

theoph_conc %>%
  count(ID, TIME) %>%
  filter(n > 1)
# A tibble: 0 × 3
# ℹ 3 variables: ID <ord>, TIME <dbl>, n <int>

PKNCA requires that each timepoint appear at most once within a profile. An empty result here means the data pass that check.
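
When duplicates do exist, it helps to see the full offending rows rather than only their counts. The following sketch uses a hypothetical base-R helper, `find_duplicate_times()`, and simulates a duplicate by appending the first row of the data a second time.

```r
# Hypothetical helper: return every row that shares its grouping + time key
# with at least one other row.
find_duplicate_times <- function(data, key_cols) {
  keys <- as.data.frame(data)[, key_cols, drop = FALSE]
  is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

theoph_conc <- data.frame(ID = Theoph$Subject, TIME = Theoph$Time, CONC = Theoph$conc)

# Clean data: no rows are returned.
nrow(find_duplicate_times(theoph_conc, c("ID", "TIME")))

# Simulate an accidental duplicate by stacking the first row again.
with_dup <- rbind(theoph_conc, theoph_conc[1, ])
find_duplicate_times(with_dup, c("ID", "TIME"))
```

Both copies of the duplicated row are returned, which makes it easier to decide which one to keep.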


Building the Concentration Object

conc_obj <- PKNCAconc(CONC ~ TIME | ID, data = theoph_conc)
conc_obj
Formula for concentration:
 CONC ~ TIME | ID
Data are dense PK.
With 12 subjects defined in the 'ID' column.
Nominal time column is not specified.

First 6 rows of concentration data:
 ID TIME  CONC exclude volume duration
  1 0.00  0.74    <NA>     NA        0
  1 0.25  2.84    <NA>     NA        0
  1 0.57  6.57    <NA>     NA        0
  1 1.12 10.50    <NA>     NA        0
  1 2.02  9.66    <NA>     NA        0
  1 3.82  8.58    <NA>     NA        0

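Interpretation:

The formula reads as CONC as a function of TIME, grouped by ID. The exclude, volume, and duration columns were added by PKNCA with default values; none of them came from theoph_conc.

No calculations yet, just structure.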


Worked Example: Grouping Mistake and Fix

Simulate two occasions per subject by stacking the dataset twice, once labeled PERIOD = 1 and once PERIOD = 2:

conc_with_two_periods <- theoph_conc %>%
  mutate(PERIOD = 1) %>%
  bind_rows(theoph_conc %>% mutate(PERIOD = 2))

Incorrect grouping (ID only):

bad_attempt <- tryCatch(
  PKNCAconc(CONC ~ TIME | ID, data = conc_with_two_periods),
  error = function(e) e
)

bad_attempt
<simpleError in duplicate_check(object = ret, data_type = "concentration"): Rows that are not unique per group and time (column names: TIME, ID) found within concentration data.  Row numbers: 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264>

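PKNCA stops with an error: once the data are stacked, every ID has two rows at each TIME, so grouping by ID alone no longer identifies a single profile.

Correct grouping: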

good_attempt <- PKNCAconc(CONC ~ TIME | ID + PERIOD, data = conc_with_two_periods)
good_attempt
Formula for concentration:
 CONC ~ TIME | ID + PERIOD
Data are dense PK.
With 2 subjects defined in the 'PERIOD' column.
Nominal time column is not specified.

First 6 rows of concentration data:
 ID TIME  CONC PERIOD exclude volume duration
  1 0.00  0.74      1    <NA>     NA        0
  1 0.25  2.84      1    <NA>     NA        0
  1 0.57  6.57      1    <NA>     NA        0
  1 1.12 10.50      1    <NA>     NA        0
  1 2.02  9.66      1    <NA>     NA        0
  1 3.82  8.58      1    <NA>     NA        0
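
One detail in this printout deserves attention: it reports "2 subjects defined in the 'PERIOD' column". The profiles are grouped correctly, but PKNCA appears to have taken PERIOD as the subject identifier. As far as the PKNCAconc() documentation describes it, when several grouping variables are given without a `/`, the last one is assumed to be the subject. A minimal sketch of one way to address this, under that assumption, is to list ID last:

```r
library(dplyr)
library(PKNCA)

theoph_conc <- as_tibble(Theoph) %>%
  transmute(ID = Subject, TIME = Time, CONC = conc)

conc_with_two_periods <- bind_rows(
  theoph_conc %>% mutate(PERIOD = 1),
  theoph_conc %>% mutate(PERIOD = 2)
)

# Same profiles as ID + PERIOD, but ID is now the last grouping variable,
# which is the position PKNCA is assumed here to read as the subject.
conc_by_subject <- PKNCAconc(CONC ~ TIME | PERIOD + ID, data = conc_with_two_periods)
conc_by_subject
```

PKNCAconc() also documents a `subject` argument for naming the subject column directly; check the help page for your installed version before relying on either approach, and confirm in the printout that the subject column is the one you intend.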

Strategies

  • Define profile grouping explicitly before coding.
  • Always check duplicates per profile + TIME (for example ID + PERIOD + TIME in a crossover), not only per ID + TIME.
  • Validate time ranges against protocol.
  • Treat structural QC as mandatory.
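
Validating time ranges can be sketched as a per-profile summary compared against the protocol. The values `expected_last_time` and `tolerance` below are assumptions chosen for illustration; the real ones must come from the study protocol.

```r
library(dplyr)

theoph_conc <- as_tibble(Theoph) %>%
  transmute(ID = Subject, TIME = Time, CONC = conc)

# Assumed protocol values, for illustration only.
expected_last_time <- 24  # hours
tolerance <- 1            # hours

time_range_qc <- theoph_conc %>%
  group_by(ID) %>%
  summarise(
    n_obs = n(),
    first_time = min(TIME),
    last_time = max(TIME),
    .groups = "drop"
  ) %>%
  mutate(last_time_ok = abs(last_time - expected_last_time) <= tolerance)

# Profiles whose final sample falls outside the assumed window
time_range_qc %>% filter(!last_time_ok)
```

The same table is a quick way to spot unit problems: a last_time near 1440 instead of 24 would suggest minutes were recorded where hours were expected.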

Common Mistakes

Warning
  • Grouping only by ID when multiple occasions exist.
  • Assuming time units are correct without checking them (for example, minutes recorded where hours were expected).
  • Ignoring rows duplicated by joins.
  • Treating “code ran” as “results are valid.”
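
The join problem is easy to reproduce. In this sketch, `conc` and `demog` are small invented tables; `demog` contains subject 1 twice, as might happen with a data-entry error.

```r
library(dplyr)

conc <- tibble(
  ID   = c(1, 1, 2, 2),
  TIME = c(0, 1, 0, 1),
  CONC = c(0, 5.1, 0, 4.3)
)

# Hypothetical lookup table in which subject 1 was entered twice by mistake.
demog <- tibble(ID = c(1, 1, 2), SEX = c("F", "F", "M"))

joined <- left_join(conc, demog, by = "ID")

nrow(conc)    # 4
nrow(joined)  # 6: each row for subject 1 was duplicated by the join

# Guard: a lookup join should never change the number of concentration rows.
rows_preserved <- nrow(joined) == nrow(conc)
rows_preserved  # FALSE
```

Recent dplyr versions may also warn about a many-to-many relationship here, but comparing row counts before and after the join works regardless of version.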

Practice Problems

Conceptual

  1. In a crossover study with ID, PERIOD, TIME, CONC, what grouping should you use and why?
  2. Why must timepoints be unique within a profile?

Executable

  1. Count how many unique profiles are implied by:
    • ID
    • ID + PERIOD (after creating conc_with_two_periods)
  2. Verify that duplicate timepoints exist within ID but not within ID + PERIOD.

Solutions

1. Crossover grouping:
Use CONC ~ TIME | ID + PERIOD because each period represents a separate PK profile.

2. Unique timepoints:
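Trapezoidal integration and terminal slope estimation both assume exactly one concentration at each time, in time order, within a profile. A duplicated time creates a zero-width interval and leaves it ambiguous which concentration belongs to that time.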

3. Count profiles:

n_distinct(theoph_conc$ID)
[1] 12
n_distinct(conc_with_two_periods$ID, conc_with_two_periods$PERIOD)
[1] 24

4. Check duplicates:

# Duplicates within ID only
conc_with_two_periods %>%
  count(ID, TIME) %>%
  filter(n > 1)
# A tibble: 132 × 3
   ID     TIME     n
   <ord> <dbl> <int>
 1 6      0        2
 2 6      0.27     2
 3 6      0.58     2
 4 6      1.15     2
 5 6      2.03     2
 6 6      3.57     2
 7 6      5        2
 8 6      7        2
 9 6      9.22     2
10 6     12.1      2
# ℹ 122 more rows
# No duplicates within ID + PERIOD
conc_with_two_periods %>%
  count(ID, PERIOD, TIME) %>%
  filter(n > 1)
# A tibble: 0 × 4
# ℹ 4 variables: ID <ord>, PERIOD <dbl>, TIME <dbl>, n <int>

Summary

To use PKNCA correctly:

  • Keep data in long format.
  • Define profiles explicitly.
  • Ensure unique timepoints within profile.
  • Perform structural QC before calculations.
  • If a variable changes the curve, include it in grouping.
  • Count observations per profile before running NCA.
  • Check time ranges against study design.
  • Let PKNCA guardrails catch mistakes — but don’t rely on them as your only QC.