Understanding Modeling Datasets

Learn how pharmacometric datasets are organized and how dosing and observation records are represented before model fitting.
Tip

Big picture: Before fitting a model, you need to understand what each row represents. Population models are built on event-based datasets where doses and observations live together.

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain how pharmacometric datasets differ from ordinary analysis datasets.
  • Interpret dosing and observation records.
  • Describe why modeling datasets use long format.
  • Interpret common columns including ID, TIME, DV, AMT, EVID, and MDV.
  • Recognize how event records support simulation and estimation.

Key Ideas

  • Modeling datasets are event-based.
  • One row usually represents one event.
  • Dosing and observations are often stored together.
  • Long format supports modeling and simulation.
  • Event columns communicate meaning to the software.

Why Modeling Datasets Look Different

Many analysis datasets are organized around subjects.

Example:

One row = One subject

Population modeling datasets are usually organized differently.

Instead:

One row = One event

Events may include:

  • dose administration
  • concentration measurement
  • response measurement
  • additional records used during simulation

This structure gives flexibility for modeling and simulation.


Worked Example 1: Subject Dataset vs Event Dataset

Traditional analysis dataset:

ID  AGE  WT
1   42   78
2   35   64

Population modeling dataset:

ID  TIME  AMT  DV
1   0     320  NA
1   1     0    5.4
1   2     0    3.9
2   0     320  NA
2   1     0    4.8

The same subject appears across multiple rows because events occur at multiple times.
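
The contrast between the two layouts can be sketched in a few lines of Python. This is only an illustration using the values from the tables above; the variable names are hypothetical, and `None` stands in for NA.

```python
from collections import Counter

# Subject-level layout: one row per subject.
subjects = [
    {"ID": 1, "AGE": 42, "WT": 78},
    {"ID": 2, "AGE": 35, "WT": 64},
]

# Event-level layout: one row per event (None stands in for NA).
events = [
    {"ID": 1, "TIME": 0, "AMT": 320, "DV": None},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.4},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 3.9},
    {"ID": 2, "TIME": 0, "AMT": 320, "DV": None},
    {"ID": 2, "TIME": 1, "AMT": 0, "DV": 4.8},
]

# Count how many event rows each subject contributes.
rows_per_subject = Counter(row["ID"] for row in events)
print(dict(rows_per_subject))  # {1: 3, 2: 2}
```

The subject table has exactly one row per ID, while the event table has as many rows per ID as there are events.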


Long Format

Population datasets are typically stored in long format.

Conceptually:

Subject

↓

Multiple Events

↓

Multiple Rows

This allows software to reconstruct time-dependent behavior.

Long format becomes especially important when:

  • repeated dosing exists
  • PK and PD endpoints coexist
  • simulation is required
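
To see what long format buys, here is a minimal sketch that expands a wide layout into event rows. The wide column names (`DOSE`, `CONC_1H`, `CONC_2H`) and the helper `wide_to_long` are assumptions made for illustration, not part of any course dataset.

```python
# Hypothetical wide layout: one row per subject, one column per time point.
wide = [
    {"ID": 1, "DOSE": 320, "CONC_1H": 5.4, "CONC_2H": 3.9},
    {"ID": 2, "DOSE": 320, "CONC_1H": 4.8, "CONC_2H": None},
]

def wide_to_long(row):
    """Expand one subject row into a dose record plus one record per measured value."""
    records = [{"ID": row["ID"], "TIME": 0, "AMT": row["DOSE"], "DV": None}]
    for time, column in [(1, "CONC_1H"), (2, "CONC_2H")]:
        if row[column] is not None:
            records.append(
                {"ID": row["ID"], "TIME": time, "AMT": 0, "DV": row[column]}
            )
    return records

long_rows = [record for row in wide for record in wide_to_long(row)]
for record in long_rows:
    print(record)
```

The wide layout needs a new column for every sampling time and a placeholder for each missing sample. The long layout simply has more or fewer rows.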

Common Modeling Columns

Although datasets differ, several columns appear repeatedly.

Subject Identifier

ID

Identifies which rows belong to the same subject.

Example:

ID = 1
↓
All rows belong together

Time Variable

TIME

Represents elapsed time.

Example:

TIME
0
1
2
4

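How TIME is interpreted (for example, time since the first dose versus time since study start, and the units used) depends on the study design.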


Dependent Variable

DV

Observed outcome.

Examples:

  • concentration
  • biomarker
  • response

Example:

TIME DV
1 5.6
2 4.1

Worked Example 2: Dosing vs Observation Records

Consider:

ID  TIME  AMT  DV
1   0     300  NA
1   1     0    6.1
1   2     0    4.3

Interpretation:

Row 1:

Dose event

Rows 2–3:

Observation events

Dose rows usually carry no observation, so DV is NA.

Observation rows usually carry no administered amount, so AMT is 0 (or missing, depending on the software).
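
The same reading can be written as a small rule. This is a simplified sketch that labels rows from AMT and DV alone; the function name `record_type` is hypothetical, and real datasets usually rely on an explicit event-type column instead.

```python
# Rows from the example above (None stands in for NA).
rows = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 6.1},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3},
]

def record_type(row):
    """Label a row using only AMT and DV (a simplification for illustration)."""
    if row["AMT"] is not None and row["AMT"] > 0:
        return "dose"
    if row["DV"] is not None:
        return "observation"
    return "other"

labels = [record_type(row) for row in rows]
print(labels)  # ['dose', 'observation', 'observation']
```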


Amount Variable

AMT

Represents administered amount.

Example:

TIME AMT
0 300
12 300
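
Repeated dosing is just more dose rows. The sketch below builds the two records shown above from a dosing interval; `dosing_records` and its parameters are hypothetical names used only for this illustration.

```python
def dosing_records(subject_id, amount, interval, n_doses):
    """Build one dose row per administration, spaced `interval` time units apart."""
    return [
        {"ID": subject_id, "TIME": i * interval, "AMT": amount}
        for i in range(n_doses)
    ]

doses = dosing_records(subject_id=1, amount=300, interval=12, n_doses=2)
total_amount = sum(dose["AMT"] for dose in doses)

for dose in doses:
    print(dose["TIME"], dose["AMT"])
print("total:", total_amount)  # total: 600
```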

Event Identifier

EVID

Indicates event type.

Common examples:

EVID Meaning
0 Observation
1 Dose

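Other codes exist (in NONMEM-style datasets, for example, EVID=2 marks other non-dose, non-observation events, and EVID=3 and EVID=4 mark resets), and their exact handling varies across software.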
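
A lookup table is a simple way to keep these codes readable when inspecting a dataset. The sketch below covers only the two codes listed above; the names `EVID_MEANING` and `describe_event` are hypothetical, and any other code is deliberately left for the reader to check against their software's documentation.

```python
# Only the two codes introduced in this lesson are mapped.
EVID_MEANING = {0: "observation", 1: "dose"}

def describe_event(evid):
    """Translate an EVID code into a label, without guessing at unlisted codes."""
    return EVID_MEANING.get(evid, "other (check your software's documentation)")

for code in [1, 0, 0]:
    print(code, "->", describe_event(code))
```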


Missing Dependent Variable

MDV

Flags rows whose DV should not be treated as an observation during estimation.

Typical values:

MDV  Meaning
0    DV is an observation
1    DV is missing or ignored

Dose rows commonly use MDV=1.
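
One way to picture MDV is as a flag derived from the other columns. The sketch below assumes the EVID convention described earlier (0 for observation, 1 for dose) and uses a hypothetical helper, `derive_mdv`; actual software may apply additional rules.

```python
# Example rows with assumed EVID values (None stands in for NA).
rows = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 6.1, "EVID": 0},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0},
]

def derive_mdv(row):
    """Return 0 when the row carries a usable observation, otherwise 1."""
    is_observation = row["EVID"] == 0 and row["DV"] is not None
    return 0 if is_observation else 1

for row in rows:
    row["MDV"] = derive_mdv(row)

# Only rows with MDV == 0 would contribute observations.
observed = [row["DV"] for row in rows if row["MDV"] == 0]
print([row["MDV"] for row in rows])  # [1, 0, 0]
print(observed)                      # [6.1, 4.3]
```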


Worked Example 3: Reading Event Records

Example:

ID  TIME  AMT  DV   EVID  MDV
1   0     300  NA   1     1
1   1     0    5.6  0     0
1   2     0    4.3  0     0

Interpretation:

Dose

↓

Observation

↓

Observation

This event structure is one of the core ideas of population modeling.
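
Reading the example programmatically follows the same steps: split the header, convert NA to a missing value, then label each row from EVID. This is a minimal sketch with hypothetical helper names (`parse_value`, `read_events`), not a general-purpose reader.

```python
TABLE = """\
ID TIME AMT DV EVID MDV
1 0 300 NA 1 1
1 1 0 5.6 0 0
1 2 0 4.3 0 0
"""

def parse_value(text):
    """Convert a field to a number, treating NA as missing."""
    return None if text == "NA" else float(text)

def read_events(table):
    """Turn a whitespace-separated table into a list of row dictionaries."""
    lines = table.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, map(parse_value, line.split()))) for line in lines[1:]]

events = read_events(TABLE)
kinds = ["Dose" if event["EVID"] == 1 else "Observation" for event in events]

for event, kind in zip(events, kinds):
    print(f"TIME={event['TIME']:g}: {kind}")
```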


Why Event Structure Matters

Modeling software uses event information to determine:

  • when drug enters the system
  • when observations occur
  • which rows affect estimation
  • how simulations are generated

Without event information, concentration-time reconstruction becomes difficult.
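
To make this concrete, the sketch below predicts concentrations at observation times from the dose records alone. It assumes a one-compartment intravenous bolus model with made-up parameter values (`V` and `K`); neither the model nor the values come from this lesson, and they are chosen only to show how dose events drive the calculation.

```python
import math

V = 50.0  # hypothetical volume of distribution
K = 0.2   # hypothetical elimination rate constant (per time unit)

events = [
    {"TIME": 0, "AMT": 300, "EVID": 1},
    {"TIME": 1, "AMT": 0, "EVID": 0},
    {"TIME": 2, "AMT": 0, "EVID": 0},
]

def predicted_concentration(t, events):
    """Sum the contribution of every dose given at or before time t."""
    return sum(
        event["AMT"] / V * math.exp(-K * (t - event["TIME"]))
        for event in events
        if event["EVID"] == 1 and event["TIME"] <= t
    )

# Evaluate the model only at the observation records.
predictions = {
    event["TIME"]: predicted_concentration(event["TIME"], events)
    for event in events
    if event["EVID"] == 0
}
for time, value in predictions.items():
    print(f"TIME={time}: predicted {value:.2f}")
```

Remove the dose row and every prediction becomes zero, which is why the dosing records must live in the same dataset as the observations.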


Preview of the Course Dataset

In the next lesson, we will load:

nlmixr2data::theo_sd

and begin exploring real PK datasets.

At that point, the columns introduced here will become concrete.


Strategies

  • Read rows sequentially.
  • Interpret events chronologically.
  • Verify observation vs dose records.
  • Check identifiers early.
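
These strategies can be turned into quick automated checks. The sketch below is one possible set of checks under the conventions used in this lesson (EVID 0 or 1, MDV 0 or 1, `None` for NA); the function name `check_events` and the specific rules are assumptions, not a standard validation routine.

```python
def check_events(rows):
    """Return a list of human-readable problems found while reading rows in order."""
    problems = []
    last_time = {}
    for number, row in enumerate(rows, start=1):
        if row.get("ID") is None:
            problems.append(f"row {number}: missing ID")
            continue
        previous = last_time.get(row["ID"])
        if previous is not None and row["TIME"] < previous:
            problems.append(f"row {number}: TIME goes backwards for ID {row['ID']}")
        last_time[row["ID"]] = row["TIME"]
        if row["EVID"] == 1 and not row["AMT"]:
            problems.append(f"row {number}: dose record without AMT")
        if row["EVID"] == 0 and row["MDV"] == 0 and row["DV"] is None:
            problems.append(f"row {number}: observation record without DV")
    return problems

good = [
    {"ID": 1, "TIME": 0, "AMT": 300, "DV": None, "EVID": 1, "MDV": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": 5.6, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0, "MDV": 0},
]
bad = [
    {"ID": 1, "TIME": 2, "AMT": 0, "DV": 4.3, "EVID": 0, "MDV": 0},
    {"ID": 1, "TIME": 1, "AMT": 0, "DV": None, "EVID": 0, "MDV": 0},
]

print(check_events(good))  # []
for problem in check_events(bad):
    print(problem)
```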

Common Mistakes

  • Assuming all rows contain observations.
  • Ignoring event columns.
  • Forgetting that time ordering matters.
  • Treating modeling datasets like one-row-per-subject spreadsheets.

Practice Problems

  1. Why are modeling datasets usually long format?
  2. Explain the role of DV.
  3. Explain the difference between AMT and DV.
  4. Interpret EVID=1.

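Answer 1

Long format allows any number of events per subject, each with its own time, which repeated dosing, mixed PK and PD endpoints, and simulation all require.

Answer 2

DV stores the observed outcome for a row, such as a concentration, biomarker, or response.

Answer 3

AMT is the amount administered on a dose record, while DV is the outcome measured on an observation record.

Answer 4

EVID=1 typically marks a dose event.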


Summary

  • Modeling datasets are event-based.
  • Long format supports repeated events.
  • Dosing and observations coexist.
  • Event variables drive modeling behavior.

  • Read datasets row-by-row.
  • Think in events—not subjects.
  • Time ordering matters.
  • Event columns carry meaning.