Naive Pooled Modeling with nls()

Fit a one-compartment oral PK model using naive pooled data, interpret structural parameters, and understand the limitations of ignoring inter-individual variability.
Tip

Big picture: Before hierarchical modeling, a common first step is a naive pooled fit.
Here, we ignore inter-individual variability and fit one structural model to all subjects simultaneously.

Learning Objectives

By the end of this lesson, you will be able to:

  • Define a one-compartment oral PK model using CL/V parameterization.
  • Perform a naive pooled nonlinear fit using nls().
  • Interpret ka, CL, V, ke, and half-life from a pooled model.
  • Generate pooled fitted curves and diagnostics.
  • Explain the limitations of naive pooled modeling.

Key Ideas

  • Naive pooling treats all subjects as if they share identical parameters.
  • Dose differences still affect predictions through the structural equation.
  • Placebo (DOSE = 0) is excluded from fitting because the model describes post-dose PK after an administered dose.
  • Between-subject variability is absorbed into residual error.
  • Parameter estimates represent an “average structural subject.”
  • This approach is simple but statistically limited.

Setup

library(tidyverse)
library(here)

derivedpath <- here("courses", "foundations-r", "data", "derived")
resultspath <- here("courses", "foundations-r", "results", "modeling", "naive_pooled")

dir.create(resultspath, recursive = TRUE, showWarnings = FALSE)

final_path <- file.path(derivedpath, "analysis_final_locked.csv")
if (!file.exists(final_path)) {
  stop("Final locked dataset not found. Render the final dataset assembly lesson  first:\n", final_path)
}

analysis <- read_csv(final_path)

obs <- analysis %>% filter(EVID == 0, !is.na(DV), DOSE > 0, is.na(BLQ))

Worked Example 1: Structural Model (CL/V Parameterization)

First, we define the concentration-time function that nls() will use during estimation.

\[ C(t) = \frac{D \cdot k_a}{V (k_a - k)} \left(e^{-k t} - e^{-k_a t}\right), \quad k = \frac{CL}{V}. \]

one_comp_cl <- function(time, dose, ka, CL, V) {
  k <- CL / V
  (dose * ka / (V * (ka - k))) *
    (exp(-k * time) - exp(-ka * time))
}

Worked Example 2: Fit Naive Pooled Model

Now we fit one shared set of structural parameters to all non-placebo observations.

naive_fit <- nls(
  DV ~ one_comp_cl(TIME, DOSE, ka, CL, V),
  data = obs,
  start = list(ka = 1, CL = 5, V = 50),
  control = nls.control(maxiter = 200, warnOnly = TRUE)
)

summary(naive_fit)

Formula: DV ~ one_comp_cl(TIME, DOSE, ka, CL, V)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
ka   1.3787     0.1649   8.361 2.75e-15 ***
CL   6.2273     0.4359  14.287  < 2e-16 ***
V   68.3905     3.4250  19.968  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1905 on 286 degrees of freedom

Number of iterations to convergence: 5 
Achieved convergence tolerance: 8.536e-07

Interpretation:

  • One ka, CL, V shared across all subjects.
  • Differences in profiles arise only from DOSE and residual error.

Worked Example 3: Convert to Interpretable Parameters

The fitted coefficients are useful, but derived quantities such as elimination rate and half-life are easier to interpret.

est <- coef(naive_fit)

ka_pop <- est["ka"]
CL_pop <- est["CL"]
V_pop  <- est["V"]
ke_pop <- CL_pop / V_pop
half_life <- log(2) / ke_pop

tibble(
  ka = ka_pop,
  CL = CL_pop,
  V  = V_pop,
  ke = ke_pop,
  half_life = half_life
)
# A tibble: 1 × 5
     ka    CL     V     ke half_life
  <dbl> <dbl> <dbl>  <dbl>     <dbl>
1  1.38  6.23  68.4 0.0911      7.61

Worked Example 4: Visual Diagnostics

Predictions by subject show where the pooled model follows the general trend and where it misses individual behavior.

Generate predictions:

obs <- obs %>%
  mutate(pred = predict(naive_fit))

Plot by subject:

p <- ggplot(obs, aes(TIME, DV)) +
  geom_point(alpha = 0.6) +
  geom_line(aes(y = pred, group = ID), color = "blue") +
  facet_wrap(~ ID) +
  labs(
    title = "Naive Pooled Fit by Subject",
    x = "Time After Dose (h)",
    y = "Concentration"
  )

p

Save:

ggsave(file.path(resultspath, "naive_pooled_fits.png"), p, width = 10, height = 8)

Worked Example 6: Residual Diagnostics

Residual plots help separate random scatter from systematic patterns that suggest model limitations.

obs <- obs %>%
  mutate(residual = DV - pred)

ggplot(obs, aes(pred, residual)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = 2) +
  labs(
    title = "Residuals vs Fitted (Naive Pooled)",
    x = "Fitted",
    y = "Residual"
  )

What to look for:

  • Systematic curvature → structural misspecification.
  • Increasing spread → heteroscedasticity.
  • Clustering by subject → ignored inter-individual variability.

In this fit, the residual spread increases at higher fitted concentrations, suggesting heteroscedasticity where variability grows with concentration magnitude. In practice, this often motivates log-transformed modeling or proportional error structures, which are handled more naturally in mixed-effects frameworks such as nlme and nlmixr2.


Warning
  • Naive pooled models underestimate true variability.
  • Residual error absorbs between-subject differences.
  • Do not interpret this as a population PK model.

Strategies

  • Work from the locked dataset (analysis_final_locked.csv).
  • Filter to observation rows (EVID == 0).
  • Use reasonable starting values.
  • Inspect residual patterns carefully.
  • Treat results as exploratory, not definitive.

Common Mistakes

  • Including placebo rows in a model that assumes a nonzero administered dose.
  • Interpreting naive pooled estimates as population mixed-effects estimates.
  • Trusting convergence without checking fitted curves and residuals.
  • Treating between-subject differences as residual noise only.
  • Using the pooled model for decisions without acknowledging its limitations.

Practice Problems

  1. Stratify residuals by DOSE — do higher doses show larger error?
  2. Create a log-scale concentration plot.
  3. Compute RMSE of the pooled fit.

Problem 1: Stratify residuals by DOSE

This checks whether residual magnitude changes with dose. Larger residuals at higher doses can suggest heteroscedasticity, scaling issues, or structural model misspecification.

obs %>%
  group_by(DOSE) %>%
  summarise(
    n = n(),
    rmse = sqrt(mean(residual^2, na.rm = TRUE)),
    median_abs_resid = median(abs(residual), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(DOSE)
# A tibble: 4 × 4
   DOSE     n   rmse median_abs_resid
  <dbl> <int>  <dbl>            <dbl>
1  12.5    67 0.0428           0.0242
2  25      72 0.0774           0.0478
3  50      74 0.166            0.0854
4 100      76 0.320            0.170 

A quick visual:

ggplot(obs, aes(DOSE, residual)) +
  geom_point(alpha = 0.35, position = position_jitter(width = 0.15)) +
  geom_hline(yintercept = 0, linetype = 2) +
  labs(
    title = "Residuals by DOSE (Naive Pooled Fit)",
    x = "DOSE",
    y = "Residual"
  )

Problem 2: Create a log-scale concentration plot

For a log-scale plot, we filter to positive concentrations because scale_y_log10() cannot display zero or negative values.

p_log <- obs %>%
  filter(!is.na(DV) & DV > 0) %>%
  ggplot(aes(TIME, DV, group = ID)) +
  geom_line(alpha = 0.25) +
  geom_point(size = 0.7) +
  facet_wrap(~ DOSE) +
  scale_y_log10() +
  labs(
    title = "Observed Concentration–Time Profiles by DOSE (Log Scale)",
    x = "Time After Dose (h)",
    y = "Concentration (log scale)"
  )

p_log

Save it:

ggsave(file.path(resultspath, "profiles_by_dose_log_observed.png"), p_log, width = 9, height = 6)

Problem 3: Compute RMSE of the pooled fit

rmse <- sqrt(mean(obs$residual^2, na.rm = TRUE))
rmse
[1] 0.1895263

You can also compute RMSE by dose, which is often more informative than one pooled value:

obs %>%
  group_by(DOSE) %>%
  summarise(
    rmse = sqrt(mean(residual^2, na.rm = TRUE)),
    .groups = "drop"
  ) %>%
  arrange(DOSE)
# A tibble: 4 × 2
   DOSE   rmse
  <dbl>  <dbl>
1  12.5 0.0428
2  25   0.0774
3  50   0.166 
4 100   0.320 

Summary

  • You fit a one-compartment oral model using naive pooling.
  • One parameter set describes all subjects.
  • Dose explains systematic profile differences.
  • Residual error absorbs inter-individual variability.
  • This approach is simple but limited.

  • Naive pooling is a useful baseline, not a final answer.
  • Always facet plots by subject to reveal hidden variability.
  • Large residual spread often signals ignored hierarchy.
  • CL/V parameterization keeps interpretation physiologically meaningful.
  • Move to mixed-effects modeling when variability matters.