Why We Model: Questions, Assumptions, and Decisions

Exploratory modeling as structured thinking in pharmacometrics

Learning Objectives

By the end of this lesson, you will be able to:

Explain why modeling is used in pharmacometrics beyond curve fitting
Translate pharmacometric questions into explicit quantitative assumptions
Recognize what a naive pooled model assumes
Distinguish exploratory modeling from confirmatory modeling
Articulate what a model is assuming before trusting its output

Key Ideas

Modeling in pharmacometrics is not primarily about computing parameters.
It is about making assumptions explicit.

When you model, you are saying:

I believe the data follow a particular structural pattern.
I believe variability behaves in a particular way.
I believe one mathematical relationship summarizes the system.

Exploratory models are:

Simple
Transparent
Disposable
Designed to clarify thinking

They are not regulatory models. They are not population models.
They are structured questions written in mathematical form.

Worked Example 1: Start With Structure (Theoph)

We begin with the classic Theoph dataset.

library(tidyverse)
data(Theoph)

Theoph %>%
  ggplot(aes(Time, conc, group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  labs(title = "Theophylline Concentration–Time Profiles",
       x = "Time (h)", y = "Concentration")

Before modeling, pause.

Ask:

Do subjects share a similar structural shape?
Does variability widen over time?
Do some individuals consistently sit higher or lower?

At this stage, you are already modeling — just without equations.

Worked Example 2: Turning a Question Into an Assumption

Suppose we ask:

Does concentration decline approximately exponentially after peak?

An exponential decline implies:

\[ C(t) = C_0 e^{-kt} \]

Taking logs:

\[ \log C(t) = \log C_0 - kt \]

This transforms the structural assumption into a linear relationship on the log scale.

terminal_data <- Theoph %>%
  filter(Time >= 4)

terminal_data %>%
  ggplot(aes(Time, log(conc), group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  labs(title = "Terminal Phase (Log Scale)",
       x = "Time (h)", y = "log(Concentration)")

We are not estimating clearance formally.
We are testing whether a structural idea seems plausible.

Worked Example 3: A Naive Pooled Model

Now we make the assumption explicit by fitting a linear model on the log scale.

\[ \log(C_{ij}) = \beta_0 + \beta_1 t_{ij} + \epsilon_{ij} \]

What does “naive pooled” mean?

A naive pooled model:

Ignores subject identity
Treats all observations as independent
Assumes one common slope and intercept describe everyone
Assumes variability is random noise, not hierarchical structure

This is often the first instinct in analysis — and it can be informative — but it is rarely realistic in pharmacometrics.

lm_pooled <- lm(log(conc) ~ Time, data = terminal_data)
summary(lm_pooled)


Call:
lm(formula = log(conc) ~ Time, data = terminal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4230 -0.1975 -0.0535  0.2047  0.9406 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.343802   0.069087   33.92   <2e-16 ***
Time        -0.086031   0.005185  -16.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2719 on 58 degrees of freedom
Multiple R-squared:  0.826, Adjusted R-squared:  0.823 
F-statistic: 275.3 on 1 and 58 DF,  p-value: < 2.2e-16

Interpret structurally:

The slope approximates an elimination rate (on the log scale).
The intercept represents extrapolated log concentration at time zero.

Notice what the model does not account for: between-subject variability.

Worked Example 4: Connecting Assumptions to Data with `predict()`

A model becomes meaningful when we compare it to the data.

terminal_data <- terminal_data %>%
  mutate(pred_log = predict(lm_pooled))

ggplot(terminal_data, aes(Time, log(conc), group = Subject)) +
  geom_point() +
  geom_line(aes(y = pred_log)) +
  labs(title = "Observed vs Pooled Log-Linear Fit")

Ask:

Does one slope represent all subjects reasonably?
Are some subjects systematically above or below the model line?
What variability structure is being ignored?

This is the beginning of model criticism.

We will formalize diagnostic reasoning in the next lesson.

Exploratory vs Confirmatory Modeling

Exploratory models:

Fast
Transparent
Used to clarify assumptions
Not suitable for regulatory decisions

Confirmatory models:

Carefully specified
Hierarchical
Robust to edge cases
Designed for decision support

Warning

A common mistake in PMx is treating an exploratory model as if it were a final population model.

Strategies

Start with plots before equations.
Say the assumption in words before coding it.
Fit the simplest defensible model first.
Treat early models as learning tools — not deliverables.

Common Mistakes

Fitting a model before looking at the data
Treating modeling as curve fitting instead of assumptions
Interpreting parameters without checking the model assumptions
Trusting a naive pooled model as realistic
Ignoring consistent differences between subjects
Using exploratory models as final conclusions
Jumping to complex models too early
Assuming a good fit means a correct model

Practice Problems

Restrict Theoph to early time points (Time < 2). What structural behavior dominates?
Fit a pooled linear model on the original concentration scale.
Identify one structural assumption the pooled model clearly violates.

Step-by-Step Solutions

Problem 1

Theoph %>%
  filter(Time < 2) %>%
  ggplot(aes(Time, conc, group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_point()

Early time points reflect absorption dynamics rather than simple elimination.

Problem 2

lm_linear <- lm(conc ~ Time, data = terminal_data)
summary(lm_linear)


Call:
lm(formula = conc ~ Time, data = terminal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7364 -0.8468 -0.1718  0.7396  2.9051 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.61771    0.30335   25.11   <2e-16 ***
Time        -0.26591    0.02277  -11.68   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.194 on 58 degrees of freedom
Multiple R-squared:  0.7017,    Adjusted R-squared:  0.6965 
F-statistic: 136.4 on 1 and 58 DF,  p-value: < 2.2e-16

The linear-scale model typically misrepresents exponential decay.

Problem 3

Pooling assumes identical elimination rates across subjects and ignores hierarchical variability.

Summary

Modeling starts with questions.
Every model encodes assumptions.
Naive pooling ignores hierarchy.
Exploratory models clarify thinking but are not final answers.

Quick Tips

Say the assumption before fitting the model.
If you ignore hierarchy, say so explicitly.
A fitted model is only the beginning of the conversation.