Linear Models as First-Pass PK/PD Insight

Interpreting slopes, scales, and covariates before going nonlinear

Learning Objectives

By the end of this lesson, you will be able to:

  • Interpret linear model slopes mechanistically in PK contexts
  • Understand when log transformations are structurally appropriate
  • Incorporate simple covariates into linear models
  • Critically evaluate statistical vs scientific significance
  • Recognize when hierarchical modeling is required

Key Ideas

  • Linear models are often a first-pass structural approximation in PK/PD.
  • On the log scale, exponential decay becomes linear.
  • Slopes encode mechanistic meaning.
  • Statistical significance does not imply mechanistic validity.
  • Pooled models ignore hierarchy and must be interpreted cautiously.

Worked Example 1: Interpreting the Terminal Slope

library(tidyverse)
data(Theoph)

terminal_data <- Theoph %>%
  filter(Time >= 4)

lm_fit <- lm(log(conc) ~ Time, data = terminal_data)
summary(lm_fit)

Call:
lm(formula = log(conc) ~ Time, data = terminal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4230 -0.1975 -0.0535  0.2047  0.9406 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.343802   0.069087   33.92   <2e-16 ***
Time        -0.086031   0.005185  -16.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2719 on 58 degrees of freedom
Multiple R-squared:  0.826, Adjusted R-squared:  0.823 
F-statistic: 275.3 on 1 and 58 DF,  p-value: < 2.2e-16

Model form:

\[ \log(C) = \beta_0 + \beta_1 t \]

Interpretation:

  • \(\beta_1 \approx -k_e\)
  • \(\beta_0 \approx \log(C_0)\)

Half-life:

ke <- -coef(lm_fit)["Time"]
log(2)/ke
    Time 
8.056956 

This connects regression output directly to PK meaning.


Worked Example 2: Why Scale Matters

lm_linear <- lm(conc ~ Time, data = terminal_data)

Overlay comparison:

terminal_data %>%
  mutate(pred_log = predict(lm_fit),
         pred_linear = predict(lm_linear)) %>%
  ggplot(aes(Time)) +
  geom_point(aes(y = conc)) +
  geom_line(aes(y = exp(pred_log)), linewidth = 1) +
  geom_line(aes(y = pred_linear), linetype = 2) +
  labs(title = "Log-Scale vs Linear-Scale Fits",
       y = "Concentration")

Exponential structure is poorly represented on the original scale.


Worked Example 3: Adding a Covariate (Weight)

lm_cov <- lm(log(conc) ~ Time + Wt, data = terminal_data)
summary(lm_cov)

Call:
lm(formula = log(conc) ~ Time + Wt, data = terminal_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.46733 -0.17124 -0.03331  0.08599  1.03806 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.019678   0.263962  11.440   <2e-16 ***
Time        -0.086045   0.004937 -17.430   <2e-16 ***
Wt          -0.009711   0.003673  -2.644   0.0106 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2589 on 57 degrees of freedom
Multiple R-squared:  0.845, Adjusted R-squared:  0.8395 
F-statistic: 155.4 on 2 and 57 DF,  p-value: < 2.2e-16
anova(lm_fit, lm_cov)
Analysis of Variance Table

Model 1: log(conc) ~ Time
Model 2: log(conc) ~ Time + Wt
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     58 4.2878                              
2     57 3.8194  1   0.46839 6.9902 0.01057 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Suppose the ANOVA table shows:

  • p-value ≈ 0.01
  • Reduced RSS

What This Means (Technically)

Weight explains additional variation in log concentration beyond Time alone.

What This Does Not Mean

  • It does not prove weight mechanistically alters clearance.
  • It does not justify inclusion in a population model.
  • It does not confirm a clinically meaningful effect.
  • It does not fix the hierarchical assumption violation.

The Critical PMx Perspective

This model:

  • Treats all observations as independent.
  • Ignores between-subject variability structure.
  • Violates core assumptions of independent residuals.

So the p-value is a screening signal, not confirmatory evidence.

A mature conclusion would be:

Weight appears associated with concentration in pooled analysis, but the hierarchical structure of the data is ignored. This motivates mixed-effects modeling rather than formal inference.

That is appropriate scientific restraint.


Worked Example 4: Does Residual Structure Improve?

After adding the covariate, inspect residuals:

plot(lm_cov, which = 1)

Ask:

  • Did curvature disappear?
  • Did variance patterns improve?
  • Or did we simply reduce RSS slightly?

A useful visual check is whether the residual spread stays roughly constant across fitted values.

For example:

  • A roughly even vertical spread is usually reassuring
  • A widening or narrowing “funnel shape” suggests changing variance

This pattern is called heteroscedasticity.

At this stage, the key idea is simple:

The model should not make much larger errors in one region of the data than another.

In pharmacometrics:

Residual structure matters more than p-values.


Strategies

  • Interpret coefficients in biological units.
  • Choose scale based on structure, not convenience.
  • Separate statistical detectability from mechanistic meaning.
  • Use pooled models for screening, not confirmation.

Common Mistakes

  • Interpreting statistical significance as mechanistic truth
  • Fitting PK data on the wrong scale
  • Forgetting that slopes have biological meaning
  • Assuming pooled observations are independent
  • Treating covariate screening as confirmatory analysis
  • Ignoring residual patterns after fitting the model
  • Trusting p-values without checking assumptions
  • Assuming a lower RSS automatically means a better model

Practice Problems

  1. Compute half-life from the fitted slope.
  2. Compare AIC between lm_fit and lm_cov.
  3. Interpret the weight coefficient in biological terms.
  4. Explain why pooled inference is fragile in repeated-measures PK data.

Problem 1

ke <- -coef(lm_fit)["Time"]
log(2)/ke
    Time 
8.056956 

Problem 2

AIC(lm_fit, lm_cov)
       df      AIC
lm_fit  3 17.95860
lm_cov  4 13.01791

Problem 3

The weight coefficient represents a change in log concentration per kg, not necessarily a change in clearance.

Problem 4

Because repeated measurements within subjects violate independence assumptions.


Summary

  • Slopes on the log scale approximate elimination rates.
  • Scale choice encodes structural assumptions.
  • Covariate significance does not equal mechanistic truth.
  • Pooled models ignore hierarchy.
  • Linear modeling is a screening tool before nonlinear and mixed-effects frameworks.

  • Interpret slopes biologically.
  • Treat pooled covariate effects as provisional.
  • Check residual structure before trusting p-values.
  • Escalate to hierarchical modeling when appropriate.