Exploring Covariates Visually

Use exploratory visualization to identify candidate covariate relationships before building covariate models.

Tip

Big picture: Covariate modeling starts with understanding the data, not with fitting equations.

Learning Objectives

By the end of this lesson, you will be able to:

  • explain why covariate exploration matters
  • distinguish continuous and categorical covariates
  • visualize covariate relationships
  • recognize useful exploratory patterns
  • identify candidate covariates before model building

Key Ideas

  • visualize before modeling
  • trends matter more than individual points
  • biology guides interpretation
  • exploration does not prove causality

Setup

library(tidyverse)
library(nlmixr2data)

data(
  "theo_sd",
  package = "nlmixr2data"
)

Create example covariates for exploration. WT comes from the dataset; SEX and AGE are simulated here for illustration only.

set.seed(100)

cov_tbl <-
  theo_sd %>%
  distinct(ID, WT) %>%
  mutate(
    SEX = sample(c("F", "M"), n(), replace = TRUE),
    AGE = round(rnorm(n(), mean = 35, sd = 10))
  )
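
Before plotting, it can help to confirm that the covariate table has one row per subject, since distinct(ID, WT) would return extra rows if WT changed within a subject. The following is a minimal sketch of such a check; check_covariate_table() is a hypothetical helper name used only for illustration, not a function from nlmixr2data or the tidyverse.

```r
library(dplyr)

# Hypothetical helper: checks for one row per subject and counts missing values.
check_covariate_table <- function(tbl, id_col = "ID") {
  ids <- tbl[[id_col]]
  list(
    n_rows         = nrow(tbl),
    n_subjects     = n_distinct(ids),
    one_row_per_id = nrow(tbl) == n_distinct(ids),
    n_missing      = colSums(is.na(tbl))  # missing values per column
  )
}

# Apply to the table built above when it exists in the session.
if (exists("cov_tbl")) print(check_covariate_table(cov_tbl))
```

If one_row_per_id is FALSE, at least one subject has more than one recorded weight, and the table needs attention before any plotting.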

Why Explore Covariates?

Suppose we observe variability between individuals.

Question:

Can we explain it?

Before building a model:

Visualize → Hypothesize → Model

Visualization helps generate candidate explanations.


Continuous vs Categorical Covariates

Covariates are often grouped into two broad types.

Type         | Description                                   | Examples
------------ | --------------------------------------------- | -----------------------
Continuous   | Can take many numerical values along a scale  | WT, AGE
Categorical  | Represent groups or categories                | SEX, RACE, FORMULATION

Continuous covariates are usually visualized using scatterplots, histograms, or density plots.

Categorical covariates are often visualized using bar charts or boxplots.

Both types may help explain variability between individuals.
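
The two types can also be viewed together, for example with a boxplot of a continuous covariate split by a categorical one. The sketch below assumes the WT and SEX columns of cov_tbl from Setup; plot_by_group() is a hypothetical helper name. Because SEX was assigned at random in this lesson, no real difference between groups is expected.

```r
library(ggplot2)

# Hypothetical helper: continuous covariate (WT) split by a categorical one (SEX).
plot_by_group <- function(tbl) {
  ggplot(tbl, aes(SEX, WT)) +
    geom_boxplot(outlier.shape = NA) +       # hide boxplot outliers; points are drawn below
    geom_jitter(width = 0.1, height = 0) +   # one point per subject, jittered horizontally only
    labs(
      title = "Weight by Sex",
      x = "Sex",
      y = "Weight (kg)"
    )
}

# Draw the plot when the table from Setup exists in the session.
if (exists("cov_tbl")) print(plot_by_group(cov_tbl))
```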


Worked Example 1: Continuous Covariates

Inspect weight.

# One weight per subject; with few subjects the histogram will look sparse.
ggplot(cov_tbl, aes(WT)) +
  geom_histogram(bins = 20) +
  labs(
    title = "Weight Distribution",
    x = "Weight (kg)",
    y = "Count"
  )

Interpretation:

Ask:

  • realistic values?
  • broad range?
  • possible outliers?

A covariate that varies little across subjects carries little information about its own effect, so any relationship with it is hard to detect.
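
The questions above can also be checked numerically. The following sketch reports the range, the coefficient of variation, and values flagged by the 1.5 × IQR rule. The rule is a common screening convention chosen here as an assumption, not a test for data errors, and summarise_continuous() is a hypothetical helper name.

```r
# Hypothetical helper: numeric companions to the histogram.
summarise_continuous <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  # Flag values more than 1.5 * IQR beyond the quartiles.
  is_outlier <- x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr
  list(
    range      = range(x, na.rm = TRUE),
    cv_percent = 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE),
    outliers   = x[which(is_outlier)]  # which() drops NA comparisons
  )
}

# Apply to weight when the table from Setup exists in the session.
if (exists("cov_tbl")) print(summarise_continuous(cov_tbl$WT))
```

A flagged value is a prompt to look at the record, not a reason to delete it.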


Worked Example 2: Categorical Covariates

Inspect sex.

ggplot(cov_tbl, aes(SEX)) +
  geom_bar() +
  labs(
    title = "Sex Distribution",
    x = "Sex",
    y = "Count"
  )

Interpretation:

Ask:

  • balanced groups?
  • enough observations?

A categorical covariate can only be evaluated if every group contains enough subjects.
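
Group sizes can be tabulated directly. The sketch below counts subjects per level of SEX and flags small groups. The cutoff min_n = 5 is an arbitrary assumption for illustration, and count_groups() is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: subjects per group, proportions, and a small-group flag.
count_groups <- function(tbl, min_n = 5) {
  tbl %>%
    count(SEX, name = "n_subjects") %>%
    mutate(
      proportion = n_subjects / sum(n_subjects),
      sparse     = n_subjects < min_n  # TRUE when the group falls below the assumed cutoff
    )
}

# Apply to the table from Setup when it exists in the session.
if (exists("cov_tbl")) print(count_groups(cov_tbl))
```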


Worked Example 3: Explore Relationships Between Covariates

Plot age against weight.

ggplot(cov_tbl, aes(WT, AGE)) +
  geom_point() +
  geom_smooth(
    method = "lm",
    se = FALSE
  ) +
  labs(
    title = "Age vs Weight",
    x = "Weight (kg)",
    y = "Age (years)"
  )

Interpretation:

Ask:

  • is there a visible trend?
  • are the points widely scattered?
  • do any unusual observations stand out?

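In this example AGE was simulated independently of WT, so any apparent trend reflects random variation. With real data, this kind of plot also reveals correlated covariates, which are hard to tell apart in a model.

Exploratory plots help us understand how variables are distributed and whether relationships might exist.

They generate hypotheses.

They do not prove relationships.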
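
A correlation coefficient can put a number on what the scatterplot shows. The sketch below computes Pearson and Spearman correlations with base R's cor(); covariate_correlation() is a hypothetical helper name. With a small number of subjects the estimate is imprecise, so it should be read alongside the plot rather than instead of it.

```r
# Hypothetical helper: linear (Pearson) and rank-based (Spearman) correlation.
covariate_correlation <- function(x, y) {
  c(
    pearson  = cor(x, y, method = "pearson",  use = "complete.obs"),
    spearman = cor(x, y, method = "spearman", use = "complete.obs")
  )
}

# Apply to weight and age when the table from Setup exists in the session.
if (exists("cov_tbl")) print(covariate_correlation(cov_tbl$WT, cov_tbl$AGE))
```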


Worked Example 4: Simulate a Candidate Relationship

Create a simple simulated relationship.

cov_effect <-
  cov_tbl %>%
  mutate(
    # Simulated clearance: power relationship with weight plus additive noise
    CL = 3 * (WT / 70)^0.75 + rnorm(n(), 0, 0.3)
  )

ggplot(cov_effect, aes(WT, CL)) +
  geom_point() +
  geom_smooth(
    method = "lm",
    se = FALSE
  ) +
  labs(
    title = "Clearance vs Weight",
    x = "Weight (kg)",
    y = "Clearance"
  )

Interpretation:

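Possible observations:

  • upward trend: expected here, because CL was simulated to increase with WT
  • no trend: would suggest weight explains little of the variability
  • large scatter: noise or other factors may mask a real relationship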

Patterns generate hypotheses.

Not conclusions.
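
Because the simulated relationship is a power function, it becomes a straight line on a log-log scale, and the slope of that line approximates the exponent. The sketch below is an exploratory check under that assumption, not a covariate model; estimate_loglog_slope() is a hypothetical helper name, and the reference weight of 70 mirrors the simulation above.

```r
# Hypothetical helper: slope of log(CL) against log(WT / ref_wt).
estimate_loglog_slope <- function(tbl, ref_wt = 70) {
  # Keep positive values only; additive noise could in principle push CL to zero or below.
  ok <- tbl$CL > 0 & tbl$WT > 0
  fit <- lm(log(CL) ~ log(WT / ref_wt), data = tbl[ok, ])
  unname(coef(fit)[2])  # second coefficient is the slope
}

# Apply to the simulated data when it exists in the session.
if (exists("cov_effect")) print(estimate_loglog_slope(cov_effect))
```

With few subjects and added noise, the estimated slope can differ noticeably from the simulated exponent of 0.75, which is one more reason to treat it as a hypothesis.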


Worked Example 5: What Makes a Good Covariate?

Candidate covariates should be:

  • biologically plausible
  • measurable
  • interpretable
  • supported visually

Avoid selecting covariates only because they appear statistically significant.


Covariates Do Not Prove Causality

A relationship may appear because of:

  • confounding
  • study design
  • sampling
  • random variation

Visualization is the beginning.

Not the end.
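
Random variation alone can produce convincing-looking trends in small datasets. The sketch below simulates pairs of unrelated variables and reports how often their absolute correlation exceeds a threshold. The default of 12 subjects matches the number of subjects in theo_sd, while the threshold of 0.5, the number of simulations, and the seed are arbitrary assumptions. chance_correlation_rate() is a hypothetical helper name.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

# Hypothetical helper: fraction of simulations in which two independent
# normal variables show |correlation| above the threshold.
chance_correlation_rate <- function(n_subjects = 12, n_sim = 2000, threshold = 0.5) {
  r <- replicate(n_sim, cor(rnorm(n_subjects), rnorm(n_subjects)))
  mean(abs(r) > threshold)
}

print(chance_correlation_rate())
```

Rerunning with a larger n_subjects shows the rate falling, which is why apparent trends in small studies deserve extra caution.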


Looking Ahead

We now have:

Variability → Visualization → Candidate Covariate

Next we ask:

How do we incorporate covariates into an nlmixr2 model?


Strategies

  • visualize first
  • think mechanistically
  • look for broad patterns

Common Mistakes

  • overinterpreting scatter
  • selecting every relationship
  • ignoring study design

Practice Problems

  1. Why explore covariates before modeling?

  2. Give one continuous and one categorical covariate.

  3. What pattern would suggest a possible relationship?

  4. Why is visualization not enough?

  5. What makes a good covariate?


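Solutions

Problem 1

Exploration generates hypotheses about which covariates might explain variability. It also reveals data issues, such as implausible values, narrow ranges, or unbalanced groups, before they affect a model.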


Problem 2

Examples:

Continuous:

  • weight

Categorical:

  • sex

Problem 3

A consistent directional trend across the covariate range, such as clearance rising with weight, may suggest a relationship.


Problem 4

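Visualization does not establish mechanism or causality. An apparent pattern may reflect confounding, study design, sampling, or random variation.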


Problem 5

Good covariates are:

  • biologically meaningful
  • measurable
  • interpretable
  • supported visually

Summary

  • covariate modeling begins with exploration
  • patterns generate hypotheses
  • biology matters
  • visualization precedes modeling

  • Visualize first
  • Trends > points
  • Biology first
  • Correlation ≠ causation