Why We Model: Questions, Assumptions, and Decisions

Exploratory modeling as structured thinking in pharmacometrics

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain why modeling is used in pharmacometrics beyond curve fitting
  • Translate pharmacometric questions into explicit quantitative assumptions
  • Recognize what a naive pooled model assumes
  • Distinguish exploratory modeling from confirmatory modeling
  • Articulate what a model is assuming before trusting its output

Key Ideas

Modeling in pharmacometrics is not primarily about computing parameters.
It is about making assumptions explicit.

When you model, you are saying:

  • I believe the data follow a particular structural pattern.
  • I believe variability behaves in a particular way.
  • I believe one mathematical relationship summarizes the system.

Exploratory models are:

  • Simple
  • Transparent
  • Disposable
  • Designed to clarify thinking

They are not regulatory models. They are not population models.
They are structured questions written in mathematical form.


Worked Example 1: Start With Structure (Theoph)

We begin with the classic Theoph dataset.

library(tidyverse)
data(Theoph)

Theoph %>%
  ggplot(aes(Time, conc, group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  labs(title = "Theophylline Concentration–Time Profiles",
       x = "Time (h)", y = "Concentration")

Before modeling, pause.

Ask:

  • Do subjects share a similar structural shape?
  • Does variability widen over time?
  • Do some individuals consistently sit higher or lower?

At this stage, you are already modeling — just without equations.
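
The third question above can also be checked numerically. The following is a minimal base-R sketch, not part of the original lesson code; the names `theoph`, `cmax`, and `tmax` are chosen here for illustration. It lists each subject's peak concentration and the time at which it occurs.

```r
# Base-R sketch: summarise each subject's profile with two numbers.
theoph <- as.data.frame(Theoph)   # plain data frame; columns unchanged

# Peak concentration per subject (column name cmax is an illustrative choice)
cmax <- aggregate(conc ~ Subject, data = theoph, FUN = max)
names(cmax)[2] <- "cmax"

# Time of the peak for each subject
tmax <- sapply(split(theoph, theoph$Subject),
               function(d) d$Time[which.max(d$conc)])
cmax$tmax <- tmax[as.character(cmax$Subject)]

# Sort by peak height to see who sits low and who sits high
cmax[order(cmax$cmax), ]
```

If the peaks differ substantially between subjects, that is an early hint that a single shared curve will not describe everyone.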


Worked Example 2: Turning a Question Into an Assumption

Suppose we ask:

Does concentration decline approximately exponentially after peak?

An exponential decline implies:

\[ C(t) = C_0 e^{-kt} \]

Taking natural logs of both sides:

\[ \log C(t) = \log C_0 - kt \]

This transforms the structural assumption into a linear relationship on the log scale.

terminal_data <- Theoph %>%
  filter(Time >= 4)

terminal_data %>%
  ggplot(aes(Time, log(conc), group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  labs(title = "Terminal Phase (Log Scale)",
       x = "Time (h)", y = "log(Concentration)")

We are not estimating clearance formally.
We are testing whether a structural idea seems plausible.
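
To see why the log transform is useful, it can help to try it on data where the answer is known. The sketch below uses simulated, noise-free concentrations; `c0`, `k`, and the sampling times are made-up values for illustration, not estimates from Theoph.

```r
# Simulated exponential decline with known parameters (illustrative values)
c0     <- 10                     # concentration at time zero
k      <- 0.1                    # elimination rate constant (1/h)
time_h <- c(4, 6, 8, 12, 24)     # sampling times (h)
conc_sim <- c0 * exp(-k * time_h)

# A straight-line fit on the log scale recovers both parameters
fit_sim <- lm(log(conc_sim) ~ time_h)
coef(fit_sim)   # intercept equals log(c0); slope equals -k
```

Real data will scatter around the line, but the interpretation of slope and intercept stays the same.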


Worked Example 3: A Naive Pooled Model

Now we make the assumption explicit by fitting a linear model on the log scale.

\[ \log(C_{ij}) = \beta_0 + \beta_1 t_{ij} + \epsilon_{ij} \]

Warning: What does “naive pooled” mean?

A naive pooled model:

  • Ignores subject identity
  • Treats all observations as independent
  • Assumes one common slope and intercept describe everyone
  • Assumes variability is random noise, not hierarchical structure

This is often the first instinct in analysis — and it can be informative — but it is rarely realistic in pharmacometrics.
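
One way to see what pooling leaves out is to write down a version of the same equation that allows each subject to deviate. The symbols \(b_{0i}\) and \(b_{1i}\) below are introduced here for illustration only; they denote subject-level shifts in intercept and slope and are not estimated anywhere in this lesson.

```latex
\[ \log(C_{ij}) = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \epsilon_{ij} \]
```

The naive pooled model is the special case \(b_{0i} = b_{1i} = 0\) for every subject \(i\), so any real between-subject differences are absorbed into \(\epsilon_{ij}\).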

lm_pooled <- lm(log(conc) ~ Time, data = terminal_data)
summary(lm_pooled)

Call:
lm(formula = log(conc) ~ Time, data = terminal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4230 -0.1975 -0.0535  0.2047  0.9406 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.343802   0.069087   33.92   <2e-16 ***
Time        -0.086031   0.005185  -16.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2719 on 58 degrees of freedom
Multiple R-squared:  0.826, Adjusted R-squared:  0.823 
F-statistic: 275.3 on 1 and 58 DF,  p-value: < 2.2e-16

Interpret structurally:

  • The slope estimates −k, so the pooled elimination rate constant is k ≈ 0.086 per hour.
  • The intercept is the terminal-phase line extrapolated back to time zero, on the log scale. It is not the observed concentration at time zero.

Notice what the model does not account for: between-subject variability.
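
The coefficients can be turned into more familiar quantities. This is a base-R sketch that refits the same pooled model so it runs on its own; the names `k_pooled`, `t_half`, and `c0_extrap` are illustrative. With the coefficients printed above, it gives a rate constant of roughly 0.086 per hour and a half-life of about 8 hours.

```r
# Refit the pooled log-linear model on the terminal phase (Time >= 4)
theoph   <- as.data.frame(Theoph)
terminal <- subset(theoph, Time >= 4)
fit      <- lm(log(conc) ~ Time, data = terminal)

# Slope is -k, so flip the sign to get the rate constant (1/h)
k_pooled <- -unname(coef(fit)["Time"])

# Half-life implied by a single exponential decline
t_half <- log(2) / k_pooled

# Back-extrapolated concentration at time zero (original scale)
c0_extrap <- exp(unname(coef(fit)["(Intercept)"]))

c(k = k_pooled, t_half = t_half, c0 = c0_extrap)
```

These are pooled summaries: they describe an "average" line through all observations, not any individual subject.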


Worked Example 4: Connecting Assumptions to Data with predict()

A model becomes meaningful when we compare it to the data.

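# Add pooled predictions (log scale) to the terminal-phase data
terminal_data <- terminal_data %>%
  mutate(pred_log = predict(lm_pooled))

# Points are observations; the single line is the pooled fit shared by all subjects
ggplot(terminal_data, aes(Time, log(conc))) +
  geom_point() +
  geom_line(aes(y = pred_log)) +
  labs(title = "Observed vs Pooled Log-Linear Fit",
       x = "Time (h)", y = "log(Concentration)")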

Ask:

  • Does one slope represent all subjects reasonably?
  • Are some subjects systematically above or below the model line?
  • What variability structure is being ignored?

This is the beginning of model criticism.
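
The second question can be given a rough numerical answer by averaging the pooled-model residuals within each subject. This is a base-R sketch under the same Time >= 4 cutoff used above; `resid_log` and `mean_resid` are names chosen for this example.

```r
# Refit the pooled model so the sketch is self-contained
theoph   <- as.data.frame(Theoph)
terminal <- subset(theoph, Time >= 4)
fit      <- lm(log(conc) ~ Time, data = terminal)

# Residuals on the log scale, then their mean within each subject
terminal$resid_log <- resid(fit)
mean_resid <- tapply(terminal$resid_log, terminal$Subject, mean)

# Positive: subject sits above the pooled line; negative: below it
sort(round(mean_resid, 2))
```

If subjects were exchangeable, these means would scatter closely around zero. Means that are clearly positive or negative point to subject-level structure the pooled model ignores.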

We will formalize diagnostic reasoning in the next lesson.


Exploratory vs Confirmatory Modeling

Exploratory models:

  • Fast
  • Transparent
  • Used to clarify assumptions
  • Not suitable for regulatory decisions

Confirmatory models:

  • Carefully specified
  • Hierarchical
  • Robust to edge cases
  • Designed for decision support

Warning

A common mistake in pharmacometrics (PMx) is treating an exploratory model as if it were a final population model.


Strategies

  • Start with plots before equations.
  • Say the assumption in words before coding it.
  • Fit the simplest defensible model first.
  • Treat early models as learning tools — not deliverables.

Common Mistakes

  • Fitting a model before looking at the data
  • Treating modeling as curve fitting rather than as a statement of assumptions
  • Interpreting parameters without checking the model assumptions
  • Trusting a naive pooled model as realistic
  • Ignoring consistent differences between subjects
  • Using exploratory models as final conclusions
  • Jumping to complex models too early
  • Assuming a good fit means a correct model

Practice Problems

  1. Restrict Theoph to early time points (Time < 2). What structural behavior dominates?
  2. Fit a pooled linear model on the original concentration scale, using the terminal-phase data (Time >= 4).
  3. Identify one structural assumption the pooled model clearly violates.

Problem 1

Theoph %>%
  filter(Time < 2) %>%
  ggplot(aes(Time, conc, group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point()

Early time points reflect absorption dynamics rather than simple elimination.

Problem 2

lm_linear <- lm(conc ~ Time, data = terminal_data)
summary(lm_linear)

Call:
lm(formula = conc ~ Time, data = terminal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7364 -0.8468 -0.1718  0.7396  2.9051 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.61771    0.30335   25.11   <2e-16 ***
Time        -0.26591    0.02277  -11.68   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.194 on 58 degrees of freedom
Multiple R-squared:  0.7017,    Adjusted R-squared:  0.6965 
F-statistic: 136.4 on 1 and 58 DF,  p-value: < 2.2e-16

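A straight line on the concentration scale implies a constant absolute drop per hour, whereas first-order elimination implies a constant fractional drop. The linear-scale model therefore misrepresents exponential decay.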
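
One way to make the difference concrete is to extrapolate both pooled models beyond the observed times. The sketch below is base R and self-contained; the 48 h time point is an arbitrary choice for illustration and lies outside the sampled range.

```r
# Fit both pooled models on the terminal phase
theoph     <- as.data.frame(Theoph)
terminal   <- subset(theoph, Time >= 4)
fit_linear <- lm(conc ~ Time, data = terminal)        # original scale
fit_log    <- lm(log(conc) ~ Time, data = terminal)   # log scale

# Predict at a late, unobserved time point (illustrative value)
new_time    <- data.frame(Time = 48)
pred_linear <- unname(predict(fit_linear, newdata = new_time))
pred_log    <- unname(exp(predict(fit_log, newdata = new_time)))

c(linear = pred_linear, log_linear = pred_log)
```

The linear-scale model predicts a negative concentration, which is physically impossible. The log-linear model stays positive by construction.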

Problem 3

Pooling assumes identical elimination rates across subjects and ignores hierarchical variability.
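
A quick way to probe this assumption is to fit the same log-linear model separately to each subject and compare the slopes. This is an exploratory base-R sketch, not a population model; each fit uses only that subject's terminal-phase points, so individual estimates are noisy.

```r
# One log-linear fit per subject on the terminal phase
theoph   <- as.data.frame(Theoph)
terminal <- subset(theoph, Time >= 4)

slopes <- sapply(split(terminal, terminal$Subject), function(d) {
  coef(lm(log(conc) ~ Time, data = d))[["Time"]]
})

# Per-subject rate constants (1/h) and their spread
k_subject <- -slopes
round(sort(k_subject), 3)
range(k_subject)
```

If pooling were a good description, these values would cluster tightly around the pooled estimate. A wide range suggests the elimination rate differs between subjects.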


Summary

  • Modeling starts with questions.
  • Every model encodes assumptions.
  • Naive pooling ignores hierarchy.
  • Exploratory models clarify thinking but are not final answers.

  • Say the assumption before fitting the model.
  • If you ignore hierarchy, say so explicitly.
  • A fitted model is only the beginning of the conversation.