Concentration Data Structure in Practice with Theoph and PKNCAconc()

Use a real PK dataset (Theoph) to structure concentration–time data for NCA, define profiles explicitly, and build a PKNCA concentration object you can trust.

Tip

Big idea: In NCA, your data structure is part of the method. If you don’t define “one profile” correctly, your AUC and half-life can be wrong even if the code runs.

Learning Objectives

By the end of this lesson, you will be able to:

Identify the minimum columns required for concentration–time NCA.
Define what counts as a single “profile” (ID, occasion, analyte, matrix, etc.).
Build a PKNCAconc() object using a formula interface.
Perform structural QC checks before running calculations.

Key Ideas

NCA is computed per profile.
A profile is typically one subject (and often one occasion) over one dosing interval.
Concentration data must be in long format.
Timepoints must be unique within each profile.
PKNCAconc() defines structure — it does not perform calculations.
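To illustrate the long-format requirement, here is a minimal base R sketch that reshapes a small wide table (one column per sampling time) into one row per subject and timepoint. The data frame wide, its column names, and its values are made up for illustration.

```r
# Hypothetical wide-format data: one row per subject, one column per timepoint.
wide <- data.frame(
  ID = c("A", "B"),
  T0 = c(0.0, 0.0),
  T1 = c(4.2, 3.9),
  T2 = c(2.1, 1.8)
)

# Reshape to long format: one row per ID x TIME with a single CONC column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("T0", "T1", "T2"),  # columns that hold concentrations
  v.names   = "CONC",               # name of the stacked value column
  timevar   = "TIME",               # name of the new time column
  times     = c(0, 1, 2),           # numeric time for each wide column
  idvar     = "ID"
)

# Sort by subject and time, and drop the row names reshape() generates.
long <- long[order(long$ID, long$TIME), ]
rownames(long) <- NULL
long
```

Each subject now occupies three rows, which is the shape the CONC ~ TIME | ID formula expects.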

Using a Real Dataset: Theoph

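We will use the built-in R dataset Theoph, which contains serum theophylline concentrations (mg/L) from 12 subjects, each sampled at 11 timepoints over about 25 hours after a single oral dose. Time is recorded in hours since dosing.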

library(tidyverse)
library(PKNCA)

data(Theoph)

theoph_conc <- as_tibble(Theoph) %>%
  transmute(
    ID   = Subject,
    TIME = Time,
    CONC = conc
  )

theoph_conc %>%
  arrange(ID, TIME) %>%
  head(12)

# A tibble: 12 × 3
   ID     TIME  CONC
   <ord> <dbl> <dbl>
 1 6      0     0   
 2 6      0.27  1.29
 3 6      0.58  3.08
 4 6      1.15  6.44
 5 6      2.03  6.32
 6 6      3.57  5.53
 7 6      5     4.94
 8 6      7     4.02
 9 6      9.22  3.46
10 6     12.1   2.78
11 6     23.8   0.92
12 7      0     0.15

Tip

Why head()?
When teaching, previewing a small slice keeps attention on structure instead of overwhelming output.

What Columns Are Required?

Minimum:

ID
TIME
CONC

Often also needed:

PERIOD / OCC
ANALYTE
MATRIX
ARM / TRT

Rule: If a variable separates one concentration–time curve from another (period, occasion, analyte, matrix, treatment), include it in the grouping.
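A column check like the one described above can be sketched in a few lines of base R. The names required_cols and theoph_base are illustrative choices, not part of PKNCA.

```r
# Columns this lesson treats as the minimum for concentration-time NCA.
required_cols <- c("ID", "TIME", "CONC")

# Base R version of the renaming step, using the built-in Theoph data.
theoph_base <- data.frame(
  ID   = Theoph$Subject,
  TIME = Theoph$Time,
  CONC = Theoph$conc
)

# Any required column that is absent from the data.
missing_cols <- setdiff(required_cols, names(theoph_base))

if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Time and concentration should both be numeric before any NCA step.
stopifnot(is.numeric(theoph_base$TIME), is.numeric(theoph_base$CONC))
```

For a real study you would extend required_cols with whichever grouping columns (PERIOD, ANALYTE, and so on) define a profile in that design.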

Defining “One Profile”

Typical groupings:

Single-dose → ID
Crossover → ID + PERIOD
Multiple analytes → ID + ANALYTE
Multiple occasions → ID + OCC

Diagnostic question:

If I facet by this variable, would I expect separate PK curves?

For Theoph, ID alone is correct: each subject contributes a single profile after one oral dose.
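One way to make the diagnostic question concrete is to build a profile label from a candidate set of grouping columns and count how many profiles result. The sketch below uses base R only. The helper profile_key() and the simulated OCC column are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: one label per profile from the chosen grouping columns.
profile_key <- function(data, group_cols) {
  interaction(data[group_cols], drop = TRUE)
}

conc <- data.frame(ID = Theoph$Subject, TIME = Theoph$Time, CONC = Theoph$conc)

# Simulated second occasion, only to show how the profile count changes.
two_occ <- rbind(
  transform(conc, OCC = 1),
  transform(conc, OCC = 2)
)

n_id_single  <- nlevels(profile_key(conc, "ID"))               # one profile per subject
n_id_only    <- nlevels(profile_key(two_occ, "ID"))            # occasions merged into one profile
n_id_and_occ <- nlevels(profile_key(two_occ, c("ID", "OCC")))  # one profile per subject and occasion

c(n_id_single = n_id_single, n_id_only = n_id_only, n_id_and_occ = n_id_and_occ)
```

If the count does not change when you add a column that the design says should split the curves, the grouping is still missing something.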

Structural QC Before PKNCA

theoph_conc %>%
  summarise(
    n_rows = n(),
    n_subjects = n_distinct(ID),
    any_negative_time = any(TIME < 0),
    any_negative_conc = any(CONC < 0),
    any_missing_conc = any(is.na(CONC))
  )

# A tibble: 1 × 5
  n_rows n_subjects any_negative_time any_negative_conc any_missing_conc
   <int>      <int> <lgl>             <lgl>             <lgl>           
1    132         12 FALSE             FALSE             FALSE

Check observations per subject:

theoph_conc %>%
  count(ID, name = "n_obs") %>%
  arrange(ID) %>%
  head(12)

# A tibble: 12 × 2
   ID    n_obs
   <ord> <int>
 1 6        11
 2 7        11
 3 8        11
 4 11       11
 5 3        11
 6 2        11
 7 4        11
 8 9        11
 9 12       11
10 10       11
11 1        11
12 5        11

Check duplicate timepoints within subject:

theoph_conc %>%
  count(ID, TIME) %>%
  filter(n > 1)

# A tibble: 0 × 3
# ℹ 3 variables: ID <ord>, TIME <dbl>, n <int>

PKNCAconc() refuses to build the object if any profile contains a duplicated timepoint, so this check should return zero rows.
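Counting tells you which ID and TIME combinations are duplicated, but deciding what to do with them usually means looking at the full rows. The base R sketch below uses made-up toy data with one duplicated timepoint. How to resolve a real duplicate (keep one record, average, or query the lab) is a study-level decision that this sketch does not make.

```r
# Toy data with one duplicated timepoint (ID 1, TIME 1), e.g. from a reassay.
toy <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  TIME = c(0, 1, 1, 2, 0, 1, 2),
  CONC = c(0, 5.1, 4.8, 3.0, 0, 6.2, 3.9)
)

key <- toy[c("ID", "TIME")]

# Flag every row that shares its ID + TIME with another row.
# duplicated() alone would miss the first occurrence, so check both directions.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Full records involved in the conflict.
toy[is_dup, ]
```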

Building the Concentration Object

conc_obj <- PKNCAconc(CONC ~ TIME | ID, data = theoph_conc)
conc_obj

Formula for concentration:
 CONC ~ TIME | ID
Data are dense PK.
With 12 subjects defined in the 'ID' column.
Nominal time column is not specified.

First 6 rows of concentration data:
 ID TIME  CONC exclude volume duration
  1 0.00  0.74    <NA>     NA        0
  1 0.25  2.84    <NA>     NA        0
  1 0.57  6.57    <NA>     NA        0
  1 1.12 10.50    <NA>     NA        0
  1 2.02  9.66    <NA>     NA        0
  1 3.82  8.58    <NA>     NA        0

Interpretation:

CONC over TIME grouped by ID.

No calculations yet — just structure.

Worked Example: Grouping Mistake and Fix

Simulate two occasions per subject by stacking two copies of the data, labelled PERIOD 1 and PERIOD 2:

conc_with_two_periods <- theoph_conc %>%
  mutate(PERIOD = 1) %>%
  bind_rows(theoph_conc %>% mutate(PERIOD = 2))

Incorrect grouping (ID only):

bad_attempt <- tryCatch(
  PKNCAconc(CONC ~ TIME | ID, data = conc_with_two_periods),
  error = function(e) e
)

bad_attempt

<simpleError in duplicate_check(object = ret, data_type = "concentration"): Rows that are not unique per group and time (column names: TIME, ID) found within concentration data.  Row numbers: 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264>

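Correct grouping. Note that the printout below reports PERIOD as the subject column: when several grouping variables are given without a / separator, PKNCA assumes the last one is the subject. The profiles are still split by ID and PERIOD; pass subject = "ID" to PKNCAconc() if you want the subject column labelled correctly.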

good_attempt <- PKNCAconc(CONC ~ TIME | ID + PERIOD, data = conc_with_two_periods)
good_attempt

Formula for concentration:
 CONC ~ TIME | ID + PERIOD
Data are dense PK.
With 2 subjects defined in the 'PERIOD' column.
Nominal time column is not specified.

First 6 rows of concentration data:
 ID TIME  CONC PERIOD exclude volume duration
  1 0.00  0.74      1    <NA>     NA        0
  1 0.25  2.84      1    <NA>     NA        0
  1 0.57  6.57      1    <NA>     NA        0
  1 1.12 10.50      1    <NA>     NA        0
  1 2.02  9.66      1    <NA>     NA        0
  1 3.82  8.58      1    <NA>     NA        0

Strategies

Define profile grouping explicitly before coding.
Always check for duplicate timepoints within each profile (all grouping variables + TIME), not just within ID.
Validate time ranges against protocol.
Treat structural QC as mandatory.
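Validating time ranges against the protocol can be sketched as a per-subject summary of the first and last sampling time. The window values below (predose sample at 0 h, last sample between 23 and 25 h) are assumptions chosen for illustration; substitute the values from your own protocol.

```r
conc <- data.frame(ID = Theoph$Subject, TIME = Theoph$Time, CONC = Theoph$conc)

# Assumed protocol window (hypothetical values, for illustration only).
expected_first <- 0
expected_last  <- c(min = 23, max = 25)

# First and last sampling time per subject.
first_time <- tapply(conc$TIME, conc$ID, min)
last_time  <- tapply(conc$TIME, conc$ID, max)

time_qc <- data.frame(
  ID    = names(first_time),
  first = as.numeric(first_time),
  last  = as.numeric(last_time)
)
time_qc$first_ok <- time_qc$first == expected_first
time_qc$last_ok  <- time_qc$last >= expected_last[["min"]] &
                    time_qc$last <= expected_last[["max"]]

# Subjects whose sampling window does not match the assumed protocol.
time_qc[!(time_qc$first_ok & time_qc$last_ok), ]
```

A last sample near 1440 instead of 24 in a table like this would be a quick hint that times were recorded in minutes rather than hours.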

Common Mistakes

Warning

Grouping only by ID when multiple occasions exist.
Assuming time units are correct.
Ignoring duplicated rows after joins.
Treating “code ran” as “results are valid.”
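The join problem is easy to reproduce. In this base R sketch, a covariate lookup table that should have one row per subject accidentally has two rows for one ID, and the merge silently multiplies that subject's concentration rows. All data and names (conc, covs, WT) are made up for illustration.

```r
conc <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 1, 2, 0, 1, 2),
  CONC = c(0, 5.1, 3.0, 0, 6.2, 3.9)
)

# Hypothetical covariate table that accidentally has two rows for ID 1.
covs <- data.frame(
  ID = c(1, 1, 2),
  WT = c(70, 71, 82)
)

joined <- merge(conc, covs, by = "ID")

# Row-count guard: a one-row-per-subject lookup must not add rows.
rows_added <- nrow(joined) - nrow(conc)
rows_added

# Find the offending keys in the lookup table before joining.
dup_ids <- unique(covs$ID[duplicated(covs$ID)])
dup_ids
```

Comparing row counts before and after every join is a cheap check that catches this before it turns into duplicated timepoints.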

Practice Problems

Conceptual

1. In a crossover study with ID, PERIOD, TIME, CONC, what grouping should you use and why?
2. Why must timepoints be unique within a profile?

Executable

3. Count how many unique profiles are implied by:
- ID
- ID + PERIOD (after creating conc_with_two_periods)
4. Verify that duplicate timepoints exist within ID but not within ID + PERIOD.

Step-by-Step Solutions

1. Crossover grouping:
Use CONC ~ TIME | ID + PERIOD because each period represents a separate PK profile.

2. Unique timepoints:
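Trapezoidal integration works between consecutive, ordered timepoints, and terminal slope estimation regresses log concentration on time. Two concentrations at the same time within one profile leave it undefined which value belongs on the curve.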
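A small numeric sketch shows the problem. It assumes a plain linear trapezoidal rule, written here as a hypothetical helper auc_linear(); PKNCA's own integration methods may differ, so this illustrates the idea rather than reproducing the package. The same four records give a different AUC depending on which of the two values at TIME = 1 happens to come first.

```r
# Linear trapezoidal AUC over the rows in the order given (illustrative helper).
auc_linear <- function(time, conc) {
  sum(diff(time) * (head(conc, -1) + tail(conc, -1)) / 2)
}

time <- c(0, 1, 1, 3)  # TIME = 1 appears twice

auc_a <- auc_linear(time, c(0, 10, 4, 5))  # value 10 listed before value 4
auc_b <- auc_linear(time, c(0, 4, 10, 5))  # same records, order swapped

c(auc_a = auc_a, auc_b = auc_b)
```

Because the zero-width interval contributes nothing, each ordering effectively picks a different concentration for the neighbouring trapezoids, and the result depends on row order rather than on the data.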

3. Count profiles:

n_distinct(theoph_conc$ID)

[1] 12

n_distinct(conc_with_two_periods$ID, conc_with_two_periods$PERIOD)

[1] 24

4. Check duplicates:

# Duplicates within ID only
conc_with_two_periods %>%
  count(ID, TIME) %>%
  filter(n > 1)

# A tibble: 132 × 3
   ID     TIME     n
   <ord> <dbl> <int>
 1 6      0        2
 2 6      0.27     2
 3 6      0.58     2
 4 6      1.15     2
 5 6      2.03     2
 6 6      3.57     2
 7 6      5        2
 8 6      7        2
 9 6      9.22     2
10 6     12.1      2
# ℹ 122 more rows

# No duplicates within ID + PERIOD
conc_with_two_periods %>%
  count(ID, PERIOD, TIME) %>%
  filter(n > 1)

# A tibble: 0 × 4
# ℹ 4 variables: ID <ord>, PERIOD <dbl>, TIME <dbl>, n <int>

Summary

To use PKNCA correctly:

Keep data in long format.
Define profiles explicitly.
Ensure unique timepoints within profile.
Perform structural QC before calculations.

Quick Tips

If a variable separates one curve from another, include it in the grouping.
Count observations per profile before running NCA.
Check time ranges against study design.
Let PKNCA guardrails catch mistakes — but don’t rely on them as your only QC.