Understanding Modeling Datasets

Learn how pharmacometric datasets are organized and how dosing and observation records are represented before model fitting.

Tip

Big picture: Before fitting a model, you need to understand what each row represents. Population models are built on event-based datasets where doses and observations live together.

Learning Objectives

By the end of this lesson, you will be able to:

Explain how pharmacometric datasets differ from ordinary analysis datasets.
Interpret dosing and observation records.
Describe why modeling datasets use long format.
Interpret common columns including ID, TIME, DV, AMT, EVID, and MDV.
Recognize how event records support simulation and estimation.

Key Ideas

Modeling datasets are event-based.
One row usually represents one event.
Dosing and observations are often stored together.
Long format supports modeling and simulation.
Event columns communicate meaning to the software.

Why Modeling Datasets Look Different

Many analysis datasets are organized around subjects.

Example:

One row = One subject

Population modeling datasets are usually organized differently.

Instead:

One row = One event

Events may include:

dose administration
concentration measurement
response measurement
additional records used during simulation

This structure gives flexibility for modeling and simulation.

Worked Example 1: Subject Dataset vs Event Dataset

Traditional analysis dataset:

ID	AGE	WT
1	42	78
2	35	64

Population modeling dataset:

ID	TIME	AMT	DV
1	0	320	NA
1	1	0	5.4
1	2	0	3.9
2	0	320	NA
2	1	0	4.8

The same subject appears across multiple rows because events occur at multiple times.
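To make the "one row = one event" idea concrete, here is a minimal sketch that reads the event table above and groups its rows by subject. It uses only the Python standard library. The names EVENTS_TSV and rows_per_subject are illustrative and are not part of any modeling package.

```python
import csv
import io
from collections import defaultdict

# Event table from Worked Example 1, stored as tab-separated text.
EVENTS_TSV = (
    "ID\tTIME\tAMT\tDV\n"
    "1\t0\t320\tNA\n"
    "1\t1\t0\t5.4\n"
    "1\t2\t0\t3.9\n"
    "2\t0\t320\tNA\n"
    "2\t1\t0\t4.8\n"
)


def rows_per_subject(tsv_text):
    """Group event rows by subject ID, preserving file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped[row["ID"]].append(row)
    return dict(grouped)


by_subject = rows_per_subject(EVENTS_TSV)
event_counts = {sid: len(rows) for sid, rows in by_subject.items()}
print(event_counts)  # {'1': 3, '2': 2}
```

Subject 1 contributes three rows and subject 2 contributes two, even though each is a single person.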

Long Format

Population datasets are typically stored in long format.

Conceptually:

Subject

↓

Multiple Events

↓

Multiple Rows

This allows software to reconstruct time-dependent behavior.

Long format becomes especially important when:

repeated dosing exists
PK and PD endpoints coexist
simulation is required
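The sketch below shows one way a wide, one-row-per-subject table could be reshaped into long format. The wide column names (DOSE, CONC_1H, CONC_2H) are hypothetical, and the code assumes a single dose given at time 0. It is an illustration of the idea, not a standard conversion routine.

```python
# Hypothetical wide layout: one row per subject, one column per sampling time.
wide = [
    {"ID": 1, "DOSE": 320, "CONC_1H": 5.4, "CONC_2H": 3.9},
    {"ID": 2, "DOSE": 320, "CONC_1H": 4.8, "CONC_2H": None},
]

# Assumed mapping from wide column name to sampling time.
SAMPLE_TIMES = {"CONC_1H": 1, "CONC_2H": 2}


def wide_to_long(wide_rows):
    """Turn each subject row into one dose row plus one row per sample."""
    long_rows = []
    for subject in wide_rows:
        # The dose becomes its own event row at time 0 (assumption).
        long_rows.append(
            {"ID": subject["ID"], "TIME": 0, "AMT": subject["DOSE"], "DV": None}
        )
        for column, time in SAMPLE_TIMES.items():
            value = subject[column]
            if value is None:
                continue  # no sample taken, so no event row
            long_rows.append(
                {"ID": subject["ID"], "TIME": time, "AMT": 0, "DV": value}
            )
    return long_rows


long_rows = wide_to_long(wide)
for row in long_rows:
    print(row)
```

The result has five rows and matches the event table in Worked Example 1. A subject with fewer samples simply has fewer rows, which is awkward to express in a wide layout.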

Common Modeling Columns

Although datasets differ, several columns appear repeatedly.

Subject Identifier

ID

Identifies which rows belong to the same subject.

Example:

ID = 1
↓
All rows belong together

Time Variable

TIME

Represents elapsed time.

Example:

TIME
0
1
2
4

The time origin and units (for example, hours since the first dose) depend on the study design.

Dependent Variable

DV

Stores the observed outcome.

Examples:

concentration
biomarker
response

Example:

TIME	DV
1	5.6
2	4.1

Worked Example 2: Dosing vs Observation Records

Consider:

ID	TIME	AMT	DV
1	0	300	NA
1	1	0	6.1
1	2	0	4.3

Interpretation:

Row 1:

Dose event

Rows 2–3:

Observation events

Dose rows usually carry no observation, so DV is missing (NA).

Observation rows usually carry no administered amount, so AMT is 0 or missing.

Amount Variable

AMT

Represents administered amount.

Example:

TIME	AMT
0	300
12	300

Event Identifier

EVID

Indicates event type.

Common examples:

EVID	Meaning
0	Observation
1	Dose

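Other codes exist, and their handling varies by software. In NONMEM-style datasets, for example, EVID 2 marks other events and EVID 3 and 4 involve resets.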

Missing Dependent Variable

MDV

Flags rows whose DV should not be used as an observation during estimation.

Typical values:

MDV	Meaning
0	Observation present
1	Observation ignored

Dose rows commonly use MDV=1.
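When a dataset has AMT and DV but no event flags, the flags can often be derived. The sketch below applies a simplified rule: a positive AMT marks a dose (EVID=1, MDV=1), and any other row is an observation whose MDV depends on whether DV is present. The function name add_event_flags is hypothetical, and real datasets may need additional EVID values that this rule does not cover.

```python
def add_event_flags(row):
    """Return a copy of the row with EVID and MDV derived from AMT and DV."""
    is_dose = row["AMT"] is not None and row["AMT"] > 0
    dv_missing = row["DV"] is None
    flagged = dict(row)
    flagged["EVID"] = 1 if is_dose else 0
    # Dose rows and rows without a DV do not contribute an observation.
    flagged["MDV"] = 1 if (is_dose or dv_missing) else 0
    return flagged


# Rows from Worked Example 2, with NA represented as None.
records = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 6.1},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3},
]

flagged = [add_event_flags(r) for r in records]
print([(r["EVID"], r["MDV"]) for r in flagged])  # [(1, 1), (0, 0), (0, 0)]
```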

Worked Example 3: Reading Event Records

Example:

ID	TIME	AMT	DV	EVID	MDV
1	0	300	NA	1	1
1	1	0	5.6	0	0
1	2	0	4.3	0	0

Interpretation:

Dose

↓

Observation

↓

Observation

This event structure is one of the core ideas of population modeling.
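The same reading can be sketched in code. The example below deliberately stores the three records out of order, sorts them by ID and TIME, and translates each EVID into a label. The EVENT_LABELS mapping covers only the two codes introduced in this lesson, and describe is an illustrative helper name.

```python
# Only the two event codes introduced in this lesson.
EVENT_LABELS = {0: "Observation", 1: "Dose"}

# Records from Worked Example 3, shuffled to show why sorting matters.
records = [
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1, "MDV": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.6, "EVID": 0, "MDV": 0},
]


def describe(record):
    """Summarize one event record as a short sentence."""
    label = EVENT_LABELS.get(record["EVID"], "Other")
    if record["EVID"] == 1:
        detail = f"amount {record['AMT']}"
    elif record["MDV"] == 0:
        detail = f"DV {record['DV']}"
    else:
        detail = "no usable DV"
    return f"ID {record['ID']} t={record['TIME']}: {label} ({detail})"


ordered = sorted(records, key=lambda r: (r["ID"], r["TIME"]))
timeline = [describe(r) for r in ordered]
for line in timeline:
    print(line)
# ID 1 t=0: Dose (amount 300)
# ID 1 t=1: Observation (DV 5.6)
# ID 1 t=2: Observation (DV 4.3)
```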

Why Event Structure Matters

Modeling software uses event information to determine:

when drug enters the system
when observations occur
which rows affect estimation
how simulations are generated

Without event information, concentration-time reconstruction becomes difficult.
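As a small illustration of what event information makes possible, the sketch below computes the time since the most recent dose for each usable observation. The records are made up for this example (doses at 0 and 12, with invented DV values), and time_after_dose is a hypothetical helper rather than a function from any modeling package. Records that share a time point keep their file order.

```python
def time_after_dose(records):
    """Return (ID, TIME, time since latest dose, DV) for usable observations."""
    last_dose_time = {}
    result = []
    # sorted() is stable, so records with equal (ID, TIME) keep file order.
    for rec in sorted(records, key=lambda r: (r["ID"], r["TIME"])):
        if rec["EVID"] == 1:
            last_dose_time[rec["ID"]] = rec["TIME"]
        elif rec["EVID"] == 0 and rec["MDV"] == 0:
            dose_time = last_dose_time.get(rec["ID"])
            elapsed = None if dose_time is None else rec["TIME"] - dose_time
            result.append((rec["ID"], rec["TIME"], elapsed, rec["DV"]))
    return result


# Illustrative records only: two doses and three observations for one subject.
records = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1, "MDV": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.6, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 12, "AMT": 300, "DV": None, "EVID": 1, "MDV": 1},
    {"ID": 1, "TIME": 13, "AMT": 0, "DV": 7.0, "EVID": 0, "MDV": 0},
]

observations = time_after_dose(records)
print([obs[2] for obs in observations])  # [1, 2, 1]
```

The observation at TIME 13 is one time unit after the second dose, which cannot be worked out from the DV column alone.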

Preview of the Course Dataset

In the next lesson, we will load:

nlmixr2data::theo_sd

and begin exploring real PK datasets.

At that point, the columns introduced here will become concrete.

Strategies

Read rows sequentially.
Interpret events chronologically.
Verify observation vs dose records.
Check identifiers early.
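These strategies can be turned into simple automated checks. The sketch below looks for missing identifiers, time values that go backwards within a subject, dose rows without MDV=1, and observation rows that claim a DV but have none. The function name check_events and the specific rules are assumptions chosen for illustration, and every record is assumed to carry a numeric TIME. Real projects usually need more checks.

```python
def check_events(records):
    """Return a list of human-readable problems found in an event dataset."""
    problems = []
    previous_time = {}  # last TIME seen for each subject
    for line, rec in enumerate(records, start=1):
        sid = rec.get("ID")
        if sid is None:
            problems.append(f"row {line}: missing ID")
            continue
        if sid in previous_time and rec["TIME"] < previous_time[sid]:
            problems.append(f"row {line}: TIME decreases within ID {sid}")
        previous_time[sid] = rec["TIME"]
        if rec["EVID"] == 1 and rec["MDV"] != 1:
            problems.append(f"row {line}: dose row without MDV=1")
        if rec["EVID"] == 0 and rec["MDV"] == 0 and rec["DV"] is None:
            problems.append(f"row {line}: MDV=0 but DV is missing")
    return problems


clean = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1, "MDV": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.6, "EVID": 0, "MDV": 0},
]
messy = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1, "MDV": 0},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.6, "EVID": 0, "MDV": 0},
]

print(check_events(clean))  # []
for problem in check_events(messy):
    print(problem)
# row 1: dose row without MDV=1
# row 3: TIME decreases within ID 1
```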

Common Mistakes

Assuming all rows contain observations.
Ignoring event columns.
Forgetting that time ordering matters.
Treating modeling datasets as one-row-per-subject spreadsheets.

Practice Problems

Why are modeling datasets usually long format?
Explain the role of DV.
Explain the difference between AMT and DV.
Interpret EVID=1.

Step-by-Step Solutions

Problem 1

Long format allows multiple events per subject, so each dose and each observation gets its own row at its own time.

Problem 2

DV stores the observed outcome for a row, such as a concentration, biomarker, or response.

Problem 3

AMT records the administered amount on dose rows, while DV stores the observed outcome on observation rows.

Problem 4

EVID=1 typically represents a dose event.

Summary

Modeling datasets are event-based.
Long format supports repeated events.
Dosing and observations coexist.
Event variables drive modeling behavior.

Quick Tips

Read datasets row-by-row.
Think in events, not subjects.
Time ordering matters.
Event columns carry meaning.