Missing Data and BLQ Handling in PMx

Understand, diagnose, and handle missing values and BLQ observations in pharmacometric datasets without breaking modeling assumptions.
Tip

What you’ll build today: a clear, defensible strategy for identifying missing data and handling BLQ (Below the Limit of Quantification) values in PMx datasets — consistent with simple R-based modeling workflows.

Learning Objectives

By the end of this lesson, you will be able to:

  • Distinguish different types of missing data in PMx datasets.
  • Define and identify LLOQ and BLQ observations clearly.
  • Create explicit flags for missingness and BLQ values.
  • Use tidyr::fill() appropriately for structural completion within grouped data.
  • Prepare a modeling-ready DV column for use with lm(), nls(), lme(), and nlme().
  • Export datasets in common PMx formats (including "." for missing values).

Setup

library(tidyverse)

Key Ideas

Not all missing values mean the same thing.

In PMx data, missingness often arises because:

  • a value was not measured
  • a value was below the limit of quantification
  • a row represents a dose/event record
  • a value was lost or corrupted
  • a value is structurally missing but recoverable within subject

What is LLOQ?

LLOQ (Lower Limit of Quantification) is the smallest concentration that can be reliably measured by the assay.

A measured concentration below this limit is typically flagged as BLQ (Below the Limit of Quantification).

In many real datasets:

  • BLQ is stored as an integer flag (0 = not BLQ, 1 = BLQ)
  • DV on a BLQ row may still contain a numeric value
  • or DV on a BLQ row may be set to NA

Treating all missing values the same is a serious modeling mistake.
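
As a minimal sketch of how a stored BLQ flag relates to the LLOQ, the check below recomputes the flag from DV and compares it with the stored one. The LLOQ of 0.25 and the names lloq, obs, and blq_check are assumptions made for illustration; the real LLOQ comes from the bioanalytical report for your assay.

```r
library(dplyr)

lloq <- 0.25  # hypothetical LLOQ, not taken from a real assay

# Hypothetical observation rows with a stored 0/1 BLQ flag
obs <- tibble::tribble(
  ~ID, ~TIME, ~DV, ~BLQ,
    1,   0.5, 2.1,    0,
    1,   1.0, 0.2,    1,
    1,   2.0,  NA,    0,
    2,   0.5, 1.6,    0,
    2,   1.0, 0.1,    1
)

blq_check <- obs %>%
  mutate(
    # Recompute the flag from the numeric value; a missing DV stays NA
    blq_from_dv = as.integer(DV < lloq),
    # TRUE when stored and recomputed flags agree, NA when DV is missing
    flag_agrees = BLQ == blq_from_dv
  )

blq_check
```

Rows where flag_agrees is FALSE would deserve a query to the data provider; rows where it is NA cannot be classified from DV alone.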

Warning

Deleting rows with missing values without understanding why they are missing can bias your analysis.


Example PMx Dataset

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,  ~BLQ,
    1,   0.0,    1,  100,  NA,    NA,
    1,   0.5,    0,   NA,  2.1,   0,
    1,   1.0,    0,   NA,  0.2,   1,
    1,   2.0,    0,   NA,  NA,    0,
    2,   0.0,    1,   80,  NA,    NA,
    2,   0.5,    0,   NA,  1.6,   0,
    2,   1.0,    0,   NA,  0.1,   1
)

pk
# A tibble: 7 × 6
     ID  TIME  EVID   AMT    DV   BLQ
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA      NA
2     1   0.5     0    NA   2.1     0
3     1   1       0    NA   0.2     1
4     1   2       0    NA  NA       0
5     2   0       1    80  NA      NA
6     2   0.5     0    NA   1.6     0
7     2   1       0    NA   0.1     1

Assume:

  • DV is concentration
  • BLQ is stored as integer (0/1)
  • a missing DV (tested with is.na(DV), not DV == NA) may indicate a dose row or a missing observation
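
In R, a comparison with NA returns NA rather than TRUE or FALSE, so missing DV values are detected with is.na(). A short illustration on a hypothetical vector dv:

```r
dv <- c(NA, 2.1, 0.2)

dv == NA      # NA NA NA: a comparison with NA is always NA
is.na(dv)     # TRUE FALSE FALSE

# Counting missing values therefore relies on is.na()
n_missing <- sum(is.na(dv))
```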

Structural Missingness vs Analytical Missingness

Some values are missing by design, not because they are unknown.

For example, dose (AMT) appears only on dose rows (EVID == 1).
If downstream code expects dose to be available on every row within subject, we can propagate it structurally.

pk_filled <- pk %>%
  group_by(ID) %>%
  fill(AMT, .direction = "down")

pk_filled
# A tibble: 7 × 6
# Groups:   ID [2]
     ID  TIME  EVID   AMT    DV   BLQ
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0       1   100  NA      NA
2     1   0.5     0   100   2.1     0
3     1   1       0   100   0.2     1
4     1   2       0   100  NA       0
5     2   0       1    80  NA      NA
6     2   0.5     0    80   1.6     0
7     2   1       0    80   0.1     1

This does not invent data.
It simply carries forward known values within each subject.

Warning

fill() propagates existing values.
It is not statistical imputation and must never be used to guess unknown measurements.

Use fill() only when the structure of the study guarantees the value is constant within group (e.g., dose per subject, treatment arm, study cohort).
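
One way to make the structural guarantee checkable is to confirm it before filling. The sketch below assumes a single dose level per subject; pk_small, amt_check, and pk_small_filled are hypothetical names used only here.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one dose record per subject
pk_small <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,
    1,   0.0,    1,  100,
    1,   0.5,    0,   NA,
    1,   1.0,    0,   NA,
    2,   0.0,    1,   80,
    2,   0.5,    0,   NA
)

# Count distinct non-missing dose amounts per subject
amt_check <- pk_small %>%
  group_by(ID) %>%
  summarise(n_amt = n_distinct(AMT, na.rm = TRUE), .groups = "drop")

# Stop early if any subject has zero or several dose levels
stopifnot(all(amt_check$n_amt == 1))

# Sort first so "down" means forward in time, then drop the grouping
pk_small_filled <- pk_small %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  fill(AMT, .direction = "down") %>%
  ungroup()
```

If the check fails, the value is not constant within subject and a plain fill() would no longer be purely structural.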


Worked Example 1: Inspect Missingness and BLQ

pk %>%
  summarise(
    n_missing_DV = sum(is.na(DV)),
    n_blq = sum(BLQ == 1, na.rm = TRUE)
  )
# A tibble: 1 × 2
  n_missing_DV n_blq
         <int> <int>
1            3     2

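Two of the three missing DV values belong to dose rows (EVID == 1), where DV is missing by design. Separate dose and observation rows before interpreting the count: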

pk <- pk %>%
  mutate(
    is_dose = EVID == 1,
    is_obs  = EVID == 0
  )
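
Restricting the summary to observation rows gives counts that are easier to interpret. The following is a sketch on a copy of the example data; pk_demo and obs_summary are hypothetical names.

```r
library(dplyr)

pk_demo <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~BLQ,
    1,   0.0,    1,  100,  NA,   NA,
    1,   0.5,    0,   NA, 2.1,    0,
    1,   1.0,    0,   NA, 0.2,    1,
    1,   2.0,    0,   NA,  NA,    0,
    2,   0.0,    1,   80,  NA,   NA,
    2,   0.5,    0,   NA, 1.6,    0,
    2,   1.0,    0,   NA, 0.1,    1
)

# Per-subject counts on observation rows only (dose rows excluded)
obs_summary <- pk_demo %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  summarise(
    n_obs        = n(),
    n_blq        = sum(BLQ == 1, na.rm = TRUE),
    n_missing_DV = sum(is.na(DV)),
    .groups = "drop"
  )

obs_summary
```

Here only one observation (ID 1, TIME 2) has a missing DV; the other two missing values in the overall count were dose rows.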

Worked Example 2: Create Explicit Flags

pk2 <- pk %>%
  mutate(
    is_blq = is_obs & BLQ == 1,
    is_missing_obs = is_obs & is.na(DV) & (BLQ == 0 | is.na(BLQ))
  )

pk2
# A tibble: 7 × 10
     ID  TIME  EVID   AMT    DV   BLQ is_dose is_obs is_blq is_missing_obs
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>  <lgl>  <lgl>         
1     1   0       1   100  NA      NA TRUE    FALSE  FALSE  FALSE         
2     1   0.5     0    NA   2.1     0 FALSE   TRUE   FALSE  FALSE         
3     1   1       0    NA   0.2     1 FALSE   TRUE   TRUE   FALSE         
4     1   2       0    NA  NA       0 FALSE   TRUE   FALSE  TRUE          
5     2   0       1    80  NA      NA TRUE    FALSE  FALSE  FALSE         
6     2   0.5     0    NA   1.6     0 FALSE   TRUE   FALSE  FALSE         
7     2   1       0    NA   0.1     1 FALSE   TRUE   TRUE   FALSE         
Note

Flags preserve information. You can always filter later—but you cannot recover deleted rows.


Worked Example 3: Prepare a Modeling DV

pk_model <- pk2 %>%
  mutate(
    DV_model = case_when(
      is_obs & BLQ == 0 & !is.na(DV) ~ DV,
      TRUE ~ NA_real_
    )
  )

pk_model %>% select(ID, TIME, DV, BLQ, DV_model)
# A tibble: 7 × 5
     ID  TIME    DV   BLQ DV_model
  <dbl> <dbl> <dbl> <dbl>    <dbl>
1     1   0    NA      NA     NA  
2     1   0.5   2.1     0      2.1
3     1   1     0.2     1     NA  
4     1   2    NA       0     NA  
5     2   0    NA      NA     NA  
6     2   0.5   1.6     0      1.6
7     2   1     0.1     1     NA  

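The raw DV column stays untouched, and DV_model contains only quantifiable observations. The rule for what enters the model is therefore visible in one place.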
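
How NA values in a modeling column are treated depends on the fitting function. With default session options, lm() and nls() drop incomplete rows silently (na.action = na.omit), whereas lme() and nlme() from the nlme package default to na.action = na.fail and stop with an error. The sketch below uses hypothetical single-subject data (pk_toy) and a log-linear fit chosen only for illustration.

```r
# Hypothetical data; DV_model is NA for BLQ or missing samples
pk_toy <- data.frame(
  TIME     = c(0.5, 1, 2, 4, 8),
  DV_model = c(2.1, NA, 1.2, 0.8, NA)
)

# lm() follows the session's na.action option (na.omit unless changed),
# so the two NA rows are dropped without a message
fit <- lm(log(DV_model) ~ TIME, data = pk_toy)
nobs(fit)   # number of rows actually used in the fit

# Making the subset explicit keeps the decision visible in the script
pk_fit <- subset(pk_toy, !is.na(DV_model))
fit_explicit <- lm(log(DV_model) ~ TIME, data = pk_fit)
```

Filtering explicitly before fitting gives the same estimates and also works unchanged with functions that refuse NA input.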


Worked Example 4: Explicit Removal (If Required)

pk_m1 <- pk_model %>%
  filter(!(is_obs & BLQ == 1))

pk_m1
# A tibble: 5 × 11
     ID  TIME  EVID   AMT    DV   BLQ is_dose is_obs is_blq is_missing_obs
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>  <lgl>  <lgl>         
1     1   0       1   100  NA      NA TRUE    FALSE  FALSE  FALSE         
2     1   0.5     0    NA   2.1     0 FALSE   TRUE   FALSE  FALSE         
3     1   2       0    NA  NA       0 FALSE   TRUE   FALSE  TRUE          
4     2   0       1    80  NA      NA TRUE    FALSE  FALSE  FALSE         
5     2   0.5     0    NA   1.6     0 FALSE   TRUE   FALSE  FALSE         
# ℹ 1 more variable: DV_model <dbl>

This makes the removal explicit and reviewable. The missing observation for ID 1 at TIME 2 is kept, because it is not flagged as BLQ.


Exporting Conventions: NA vs “.”

In many PMx workflows, final analysis datasets are saved with:

  • numeric values as numbers
  • missing values written as "."

R internally uses NA, but you can convert at export:

pk_export <- pk_model %>%
  mutate(
    across(everything(), ~ if_else(is.na(.x), ".", as.character(.x)))
  )

pk_export
# A tibble: 7 × 11
  ID    TIME  EVID  AMT   DV    BLQ   is_dose is_obs is_blq is_missing_obs
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>   <chr>  <chr>  <chr>         
1 1     0     1     100   .     .     TRUE    FALSE  FALSE  FALSE         
2 1     0.5   0     .     2.1   0     FALSE   TRUE   FALSE  FALSE         
3 1     1     0     .     0.2   1     FALSE   TRUE   TRUE   FALSE         
4 1     2     0     .     .     0     FALSE   TRUE   FALSE  TRUE          
5 2     0     1     80    .     .     TRUE    FALSE  FALSE  FALSE         
6 2     0.5   0     .     1.6   0     FALSE   TRUE   FALSE  FALSE         
7 2     1     0     .     0.1   1     FALSE   TRUE   TRUE   FALSE         
# ℹ 1 more variable: DV_model <chr>
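# pk_export already stores "." as text, so na = "." has nothing left to convert here.
# Passing the numeric data directly, e.g. write_csv(pk_model, ..., na = "."), writes "." for NA without converting columns to character.
# write_csv(pk_export, "data/pk_analysis_dataset.csv", na = ".")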
Warning

Keep NA inside R. Convert to "." only when writing the final file.
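
A minimal round-trip sketch of this rule, using the na argument of readr's write_csv() and read_csv(). The temporary file and the names pk_small and pk_back are assumptions for illustration.

```r
library(readr)

pk_small <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~BLQ,
    1,   0.0,    1,  100,  NA,   NA,
    1,   0.5,    0,   NA, 2.1,    0,
    1,   1.0,    0,   NA, 0.2,    1
)

tmp <- tempfile(fileext = ".csv")   # temporary path used only for this sketch

# NA stays NA in R; "." appears only in the written file
write_csv(pk_small, tmp, na = ".")
file_lines <- readLines(tmp)

# Reading back with the same na string restores NA and numeric columns
pk_back <- read_csv(tmp, na = ".", show_col_types = FALSE)
```

Because the conversion happens inside the writer, the columns in R remain numeric and no "." strings enter the working data.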


Strategies

  • Keep raw DV unchanged
  • Treat BLQ as an integer flag (0/1), not logical
  • Use fill() only for structural propagation within groups
  • Create explicit flags for modeling decisions
  • Separate data cleaning from modeling decisions
  • Convert NA to "." only at export time (if required)

Common Mistakes

  • Treating BLQ as missing without checking the flag
  • Converting BLQ to zero
  • Logging DV before handling BLQ
  • Dropping all NA rows blindly
  • Using fill() where structural guarantees do not exist
  • Mixing "." and NA inside R objects
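
Two of these mistakes interact: setting BLQ values to zero and then log-transforming produces non-finite values. A small sketch with hypothetical vectors dv and blq:

```r
dv  <- c(2.1, 0.2, 1.6, 0.1)
blq <- c(0, 1, 0, 1)

# Mistake: replace BLQ values with zero, then log-transform
dv_zero <- ifelse(blq == 1, 0, dv)
log(dv_zero)                 # contains -Inf for the BLQ samples

# Keeping BLQ as NA in the modeling column avoids non-finite values
dv_model     <- ifelse(blq == 1, NA_real_, dv)
log_dv_model <- log(dv_model)
```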

Practice Problems

  1. Count BLQ observations using BLQ == 1.
  2. Create flags for is_obs, is_blq, and is_missing_obs.
  3. Use fill() to propagate AMT within subject.
  4. Create a DV_model column excluding BLQ rows.
  5. Filter out BLQ rows explicitly.
  6. Convert NA values to "." for export.

# 1. Count BLQ observations
pk %>% summarise(n_blq = sum(BLQ == 1, na.rm = TRUE))
# A tibble: 1 × 1
  n_blq
  <int>
1     2

# 2. Create flags for is_obs, is_blq, and is_missing_obs
pk2 <- pk %>%
  mutate(
    is_obs = EVID == 0,
    is_blq = is_obs & BLQ == 1,
    is_missing_obs = is_obs & is.na(DV) & (BLQ == 0 | is.na(BLQ))
  )

# 3. Propagate AMT within subject
pk %>% group_by(ID) %>% fill(AMT, .direction = "down")
# A tibble: 7 × 8
# Groups:   ID [2]
     ID  TIME  EVID   AMT    DV   BLQ is_dose is_obs
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl> 
1     1   0       1   100  NA      NA TRUE    FALSE 
2     1   0.5     0   100   2.1     0 FALSE   TRUE  
3     1   1       0   100   0.2     1 FALSE   TRUE  
4     1   2       0   100  NA       0 FALSE   TRUE  
5     2   0       1    80  NA      NA TRUE    FALSE 
6     2   0.5     0    80   1.6     0 FALSE   TRUE  
7     2   1       0    80   0.1     1 FALSE   TRUE  

# 4. Create a DV_model column excluding BLQ rows
pk2 %>%
  mutate(
    DV_model = case_when(
      is_obs & BLQ == 0 & !is.na(DV) ~ DV,
      TRUE ~ NA_real_
    )
  )
# A tibble: 7 × 11
     ID  TIME  EVID   AMT    DV   BLQ is_dose is_obs is_blq is_missing_obs
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>  <lgl>  <lgl>         
1     1   0       1   100  NA      NA TRUE    FALSE  FALSE  FALSE         
2     1   0.5     0    NA   2.1     0 FALSE   TRUE   FALSE  FALSE         
3     1   1       0    NA   0.2     1 FALSE   TRUE   TRUE   FALSE         
4     1   2       0    NA  NA       0 FALSE   TRUE   FALSE  TRUE          
5     2   0       1    80  NA      NA TRUE    FALSE  FALSE  FALSE         
6     2   0.5     0    NA   1.6     0 FALSE   TRUE   FALSE  FALSE         
7     2   1       0    NA   0.1     1 FALSE   TRUE   TRUE   FALSE         
# ℹ 1 more variable: DV_model <dbl>

# 5. Filter out BLQ rows explicitly
pk2 %>% filter(!(EVID == 0 & BLQ == 1))
# A tibble: 5 × 10
     ID  TIME  EVID   AMT    DV   BLQ is_dose is_obs is_blq is_missing_obs
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>  <lgl>  <lgl>         
1     1   0       1   100  NA      NA TRUE    FALSE  FALSE  FALSE         
2     1   0.5     0    NA   2.1     0 FALSE   TRUE   FALSE  FALSE         
3     1   2       0    NA  NA       0 FALSE   TRUE   FALSE  TRUE          
4     2   0       1    80  NA      NA TRUE    FALSE  FALSE  FALSE         
5     2   0.5     0    NA   1.6     0 FALSE   TRUE   FALSE  FALSE         

# 6. Convert NA values to "." for export
pk2 %>%
  mutate(across(everything(), ~ if_else(is.na(.x), ".", as.character(.x))))
# A tibble: 7 × 10
  ID    TIME  EVID  AMT   DV    BLQ   is_dose is_obs is_blq is_missing_obs
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>   <chr>  <chr>  <chr>         
1 1     0     1     100   .     .     TRUE    FALSE  FALSE  FALSE         
2 1     0.5   0     .     2.1   0     FALSE   TRUE   FALSE  FALSE         
3 1     1     0     .     0.2   1     FALSE   TRUE   TRUE   FALSE         
4 1     2     0     .     .     0     FALSE   TRUE   FALSE  TRUE          
5 2     0     1     80    .     .     TRUE    FALSE  FALSE  FALSE         
6 2     0.5   0     .     1.6   0     FALSE   TRUE   FALSE  FALSE         
7 2     1     0     .     0.1   1     FALSE   TRUE   TRUE   FALSE         

Summary

In this lesson you learned how to:

  • define LLOQ and BLQ clearly
  • distinguish BLQ from true missingness
  • use fill() for structural completion (not imputation)
  • create modeling-ready DV columns explicitly
  • export datasets using "." for missing values

Missing data handling is a modeling decision — not a convenience step.


  • BLQ is usually stored as 0/1.
  • Keep raw DV untouched.
  • Use flags instead of silent deletion.
  • Use fill() only when structure guarantees validity.
  • Keep NA in R; convert to "." only at export.