Data Preparation and Exploratory PK Visualization

Learn how to understand, inspect, and prepare population PK datasets before model fitting.

Tip

Module goal: Before fitting models, learn how to understand the dataset, identify dosing and observation records, check data quality, and visualize PK profiles.

Module Overview

Population modeling starts with data.

Before writing an nlmixr2 model, we need to understand what the dataset contains, how dosing and observation records are encoded, and whether the data are suitable for modeling.

This module introduces the structure of pharmacometric modeling datasets and uses the theophylline dataset as the main teaching example.

We focus on practical questions:

What does each row represent?
Which rows are observations?
Which rows are dosing events?
Are IDs, times, doses, and concentrations reasonable?
What do the concentration-time profiles look like?
Are there obvious issues before modeling begins?

The goal is to build a clean, defensible starting point for structural PK modeling.

Learning Objectives

By the end of this module, you will be able to:

Describe the structure of a population modeling dataset.
Distinguish observation records from dosing records.
Interpret common modeling columns such as ID, TIME, DV, AMT, EVID, and MDV.
Load and inspect the course dataset.
Create subject-level concentration-time plots.
Use log-scale visualization to inspect PK profiles.
Identify common data issues before model fitting.
Prepare an analysis-ready dataset for later modeling lessons.

Lessons in This Module

Lesson 1: Understanding Modeling Datasets

This lesson introduces the basic structure of population modeling datasets, including observation rows, dosing rows, and common pharmacometric data columns.

Lesson 2: Loading and Exploring the Course Dataset

This lesson introduces the theophylline dataset used throughout the early course modules and teaches how to inspect its structure.

Lesson 3: Preparing Analysis-Ready Data

This lesson focuses on basic data cleaning and QC before modeling, including missing values, duplicates, dose records, and observation records.

Lesson 4: Exploratory PK Visualization

This lesson uses concentration-time profiles to understand subject-level behavior, dose patterns, and potential modeling challenges.

Lesson 5: From Data to Modeling Dataset

This lesson finalizes the prepared dataset and creates a reproducible modeling-ready object for the next module.

Dataset Used

The main dataset in this module is the theophylline dataset provided by the nlmixr2data package.

# nlmixr2-ready dataset
head(nlmixr2data::theo_sd)

  ID TIME    DV     AMT EVID CMT   WT
1  1 0.00  0.00 319.992  101   1 79.6
2  1 0.00  0.74   0.000    0   2 79.6
3  1 0.25  2.84   0.000    0   2 79.6
4  1 0.57  6.57   0.000    0   2 79.6
5  1 1.12 10.50   0.000    0   2 79.6
6  1 2.02  9.66   0.000    0   2 79.6

Theophylline is used early for continuity: the same dataset carries through the early course modules. It is small, interpretable, and rich enough to teach key PK modeling concepts without overwhelming the learner.
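As a first look at how the two record types are encoded, the sketch below rebuilds the six rows printed above as a plain data frame and splits them on EVID. It assumes the common convention that EVID equal to 0 marks an observation and any nonzero EVID marks a dosing event. The object names (theo_head, obs_rows, dose_rows) are illustrative and not part of the course code.

```r
# Rebuild the six rows shown above so the example runs without extra packages.
theo_head <- data.frame(
  ID   = rep(1, 6),
  TIME = c(0, 0, 0.25, 0.57, 1.12, 2.02),
  DV   = c(0, 0.74, 2.84, 6.57, 10.50, 9.66),
  AMT  = c(319.992, 0, 0, 0, 0, 0),
  EVID = c(101, 0, 0, 0, 0, 0),
  CMT  = c(1, 2, 2, 2, 2, 2),
  WT   = rep(79.6, 6)
)

# Assumption: EVID == 0 marks an observation; any other value marks a dose.
obs_rows  <- theo_head[theo_head$EVID == 0, ]
dose_rows <- theo_head[theo_head$EVID != 0, ]

# Count each record type and confirm that only dosing rows carry a nonzero AMT.
n_obs  <- nrow(obs_rows)
n_dose <- nrow(dose_rows)
amt_only_on_doses <- all(obs_rows$AMT == 0) && all(dose_rows$AMT > 0)

print(c(observations = n_obs, doses = n_dose))
```

The dosing row here uses EVID 101 rather than the value 1 seen in many datasets. Treating every nonzero EVID as a dose keeps the split independent of that encoding. To run the same check on the full dataset, replace theo_head with nlmixr2data::theo_sd.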

Software Used

This module mostly uses data wrangling and visualization packages.

library(tidyverse)
library(here)
library(nlmixr2data)

Later modules will introduce model fitting with:

library(nlmixr2)
library(rxode2)

Module Workflow

Conceptually, this module follows this path:

Raw Dataset
↓
Inspect Structure
↓
Identify Dosing and Observation Records
↓
Check Data Quality
↓
Visualize Profiles
↓
Save Modeling-Ready Dataset

This workflow prepares the data foundation for the structural PK modeling module that follows.
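The "Visualize Profiles" step can be sketched with ggplot2, which is loaded as part of the tidyverse. This minimal example assumes the observation rows have already been separated from the dosing rows, and it uses only the subject 1 observations from the head() output shown earlier. The object names (obs, p_linear, p_log) are illustrative.

```r
library(ggplot2)

# Observation rows for subject 1, copied from the head() output shown earlier.
obs <- data.frame(
  ID   = factor(rep(1, 5)),
  TIME = c(0, 0.25, 0.57, 1.12, 2.02),
  DV   = c(0.74, 2.84, 6.57, 10.50, 9.66)
)

# One line per subject on a linear concentration axis.
p_linear <- ggplot(obs, aes(x = TIME, y = DV, group = ID, colour = ID)) +
  geom_line() +
  geom_point() +
  labs(x = "Time", y = "Concentration", colour = "Subject")

# Same plot on a log10 concentration axis.
# Zero or negative DV values have no finite logarithm, so they would need
# to be filtered out before plotting on this scale.
p_log <- p_linear + scale_y_log10()

print(p_log)
```

With the full dataset, the same code draws one line per subject, because group and colour are both mapped to ID.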

Why This Module Matters

Many modeling problems start before estimation.

Examples:

incorrect dosing records
inconsistent units
missing concentration values
duplicated observations
impossible times
misunderstood event codes

A model can only be as reliable as the data and assumptions used to build it.

This module teaches the habit of looking carefully before fitting.
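A few of the problems listed above can be counted with base R before any model is fitted. The sketch below runs on a small, deliberately flawed toy data frame. The records are hypothetical and are not course data, and the check names are illustrative. It assumes EVID equal to 0 marks observations.

```r
# Hypothetical toy records with planted problems; not course data.
toy <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  TIME = c(0, 1, 1, 2, -0.5, 1),
  DV   = c(0, 5.2, 5.2, NA, 0, 4.1),
  AMT  = c(320, 0, 0, 0, 320, 0),
  EVID = c(1, 0, 0, 0, 1, 0)
)

# Assumption: EVID == 0 marks an observation row.
obs <- toy[toy$EVID == 0, ]

# Count each kind of problem; a clean dataset would give zero everywhere.
qc <- c(
  missing_dv       = sum(is.na(obs$DV)),
  duplicated_obs   = sum(duplicated(obs[, c("ID", "TIME", "DV")])),
  negative_time    = sum(toy$TIME < 0),
  dose_without_amt = sum(toy$EVID != 0 & (is.na(toy$AMT) | toy$AMT <= 0))
)

print(qc)
```

Counting problems rather than silently dropping rows keeps each decision visible, so any exclusion can be justified later.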

How This Module Fits the Course

Module 1 introduced the modeling framework.

Module 2 now introduces the data.

After this module, you will be ready to move into:

structural PK models
one-compartment models
simulation of concentration-time profiles
first population model specifications

Expected Outputs

By the end of the module, you should have:

inspected the course dataset
generated subject-level PK plots
identified observation and dosing records
created a modeling-ready dataset
saved the prepared data for later lessons
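One way to save the prepared data is as an RDS file, which preserves column types when the object is read back. The sketch below is an assumption about how that step might look, not the course's own code. The file name theo_prepared.rds is hypothetical, and a temporary directory stands in for the project folder.

```r
# Stand-in for the prepared dataset: the first rows printed earlier in this module.
prepared <- data.frame(
  ID   = rep(1, 3),
  TIME = c(0, 0, 0.25),
  DV   = c(0, 0.74, 2.84),
  AMT  = c(319.992, 0, 0),
  EVID = c(101, 0, 0),
  CMT  = c(1, 2, 2),
  WT   = rep(79.6, 3)
)

# Assumption: a temporary directory stands in for the real project path.
out_path <- file.path(tempdir(), "theo_prepared.rds")
saveRDS(prepared, out_path)

# Read the file back and confirm that nothing changed in the round trip.
reloaded <- readRDS(out_path)
round_trip_ok <- identical(prepared, reloaded)

print(round_trip_ok)
```

In a project, the path could instead be built with here::here(), since this module loads the here package.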

Next Step

Start with Lesson 1 to understand the basic structure of population modeling datasets.