Missing Data and BLQ Handling in PMx
What you’ll build today: a clear, defensible strategy for identifying missing data and handling BLQ (Below the Limit of Quantification) values in PMx datasets, consistent with simple R-based modeling workflows.
Learning Objectives
By the end of this lesson, you will be able to:
- Distinguish different types of missing data in PMx datasets.
- Define and identify LLOQ and BLQ observations clearly.
- Create explicit flags for missingness and BLQ values.
- Use tidyr::fill() appropriately for structural completion within grouped data.
- Prepare a modeling-ready DV column for use with lm(), nls(), lme(), and nlme().
- Export datasets in common PMx formats (including "." for missing values).
Setup
library(tidyverse)
Key Ideas
Not all missing values mean the same thing.
In PMx data, missingness often arises because:
- a value was not measured
- a value was below the limit of quantification
- a row represents a dose/event record
- a value was lost or corrupted
- a value is structurally missing but recoverable within subject
What is LLOQ?
LLOQ (Lower Limit of Quantification) is the smallest concentration that can be reliably measured by the assay.
If a measured concentration is below this limit, it is typically flagged as:
- BLQ (Below the Limit of Quantification)
In many real datasets:
- BLQ is stored as an integer flag (0 = not BLQ, 1 = BLQ)
- DV may still contain a numeric value
- or DV may be set to NA
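To make the LLOQ definition concrete, here is a minimal sketch of how a 0/1 BLQ flag could be derived when only concentrations and the assay limit are available. The LLOQ value of 0.25 and the object names conc and conc_flagged are assumptions for illustration; the lesson does not state an LLOQ for its example data.

```r
library(dplyr)

# Hypothetical assay limit (assumption, not taken from the lesson data)
lloq <- 0.25

conc <- tibble(
  ID   = c(1, 1, 1),
  TIME = c(0.5, 1, 2),
  DV   = c(2.1, 0.2, NA)
)

conc_flagged <- conc %>%
  mutate(
    # 1 = below LLOQ, 0 = quantifiable; an unmeasured DV stays NA
    # because a value that was never measured is not known to be BLQ
    BLQ = as.integer(DV < lloq)
  )

conc_flagged
```

Leaving the flag as NA for the unmeasured sample keeps "below the limit" and "not measured" as separate cases, which is the distinction this section is about.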
Treating all missing values the same is a serious modeling mistake.
Deleting rows with missing values without understanding why they are missing can bias your analysis.
Example PMx Dataset
pk <- tibble::tribble(
~ID, ~TIME, ~EVID, ~AMT, ~DV, ~BLQ,
1, 0.0, 1, 100, NA, NA,
1, 0.5, 0, NA, 2.1, 0,
1, 1.0, 0, NA, 0.2, 1,
1, 2.0, 0, NA, NA, 0,
2, 0.0, 1, 80, NA, NA,
2, 0.5, 0, NA, 1.6, 0,
2, 1.0, 0, NA, 0.1, 1
)
pk
# A tibble: 7 × 6
ID TIME EVID AMT DV BLQ
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA NA
2 1 0.5 0 NA 2.1 0
3 1 1 0 NA 0.2 1
4 1 2 0 NA NA 0
5 2 0 1 80 NA NA
6 2 0.5 0 NA 1.6 0
7 2 1 0 NA 0.1 1
Assume:
- DV is concentration
- BLQ is stored as a 0/1 flag (numeric in this example)
- is.na(DV) may indicate a dose row or a missing observation
Structural Missingness vs Analytical Missingness
Some values are missing by design, not because they are unknown.
For example, dose (AMT) appears only on dose rows (EVID == 1).
If downstream code expects dose to be available on every row within subject, we can propagate it structurally.
pk_filled <- pk %>%
  group_by(ID) %>%
  fill(AMT, .direction = "down")
pk_filled
# A tibble: 7 × 6
# Groups: ID [2]
ID TIME EVID AMT DV BLQ
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 100 NA NA
2 1 0.5 0 100 2.1 0
3 1 1 0 100 0.2 1
4 1 2 0 100 NA 0
5 2 0 1 80 NA NA
6 2 0.5 0 80 1.6 0
7 2 1 0 80 0.1 1
This does not invent data.
It simply carries forward known values within each subject.
fill() propagates existing values.
It is not statistical imputation and must never be used to guess unknown measurements.
Use fill() only when the structure of the study guarantees the value is constant within group (e.g., dose per subject, treatment arm, study cohort).
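One way to check the structural guarantee before filling is to count the distinct non-missing values per subject. The sketch below uses a hypothetical dataset (pk_chk) in which subject 2 receives two different doses, so carrying AMT forward for that subject would need a deliberate decision; the names amt_check, ok_ids, and pk_safe are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: subject 2 has a second, different dose at TIME 12
pk_chk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 0, 0.5, 12),
  EVID = c(1, 0, 0, 1, 0, 1),
  AMT  = c(100, NA, NA, 80, NA, 40)
)

# Count distinct non-missing AMT values within each subject
amt_check <- pk_chk %>%
  group_by(ID) %>%
  summarise(n_amt = n_distinct(AMT, na.rm = TRUE), .groups = "drop")

# Subjects where AMT is constant, so fill() cannot mislabel a row
ok_ids <- amt_check %>%
  filter(n_amt == 1) %>%
  pull(ID)

# Fill only where the check passed; other subjects need review first
pk_safe <- pk_chk %>%
  filter(ID %in% ok_ids) %>%
  group_by(ID) %>%
  fill(AMT, .direction = "down") %>%
  ungroup()

amt_check
pk_safe
```

Subjects that fail the check are not wrong data; they simply fall outside the "constant within group" case that fill() is meant for here.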
Worked Example 1: Inspect Missingness and BLQ
pk %>%
  summarise(
    n_missing_DV = sum(is.na(DV)),
    n_blq = sum(BLQ == 1, na.rm = TRUE)
  )
# A tibble: 1 × 2
n_missing_DV n_blq
<int> <int>
1 3 2
Separate dose vs observation rows:
pk <- pk %>%
  mutate(
    is_dose = EVID == 1,
    is_obs = EVID == 0
  )
Worked Example 2: Create Explicit Flags
pk2 <- pk %>%
  mutate(
    is_blq = is_obs & BLQ == 1,
    is_missing_obs = is_obs & is.na(DV) & (BLQ == 0 | is.na(BLQ))
  )
pk2
# A tibble: 7 × 10
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl>
1 1 0 1 100 NA NA TRUE FALSE FALSE FALSE
2 1 0.5 0 NA 2.1 0 FALSE TRUE FALSE FALSE
3 1 1 0 NA 0.2 1 FALSE TRUE TRUE FALSE
4 1 2 0 NA NA 0 FALSE TRUE FALSE TRUE
5 2 0 1 80 NA NA TRUE FALSE FALSE FALSE
6 2 0.5 0 NA 1.6 0 FALSE TRUE FALSE FALSE
7 2 1 0 NA 0.1 1 FALSE TRUE TRUE FALSE
Flags preserve information. You can always filter later—but you cannot recover deleted rows.
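Once the flags exist, they can be collapsed into a single label for a quick audit of how many rows fall into each category. This is a sketch that rebuilds the lesson's toy data so it runs on its own; the row_type column and its labels are hypothetical names introduced for illustration.

```r
library(dplyr)

# Same values as the lesson's example dataset, rebuilt to be self-contained
pk_flags <- tibble(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 2, 0, 0.5, 1),
  EVID = c(1, 0, 0, 0, 1, 0, 0),
  DV   = c(NA, 2.1, 0.2, NA, NA, 1.6, 0.1),
  BLQ  = c(NA, 0, 1, 0, NA, 0, 1)
) %>%
  mutate(
    is_obs = EVID == 0,
    is_blq = is_obs & BLQ == 1,
    is_missing_obs = is_obs & is.na(DV) & (BLQ == 0 | is.na(BLQ)),
    # One label per row; the first matching condition wins
    row_type = case_when(
      !is_obs        ~ "dose",
      is_blq         ~ "blq",
      is_missing_obs ~ "missing_obs",
      TRUE           ~ "quantifiable"
    )
  )

# Tabulate the categories
type_counts <- pk_flags %>% count(row_type)
type_counts
```

A table like this is easy to include in a data review, and no row has been dropped to produce it.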
Worked Example 3: Prepare a Modeling DV
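pk_model <- pk2 %>%
  mutate(
    # Keep DV only for quantifiable observation rows;
    # dose rows, BLQ rows, and missing observations become NA
    DV_model = case_when(
      is_obs & BLQ == 0 & !is.na(DV) ~ DV,
      TRUE ~ NA_real_
    )
  )
pk_model %>% select(ID, TIME, DV, BLQ, DV_model)
# A tibble: 7 × 5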
ID TIME DV BLQ DV_model
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 NA NA NA
2 1 0.5 2.1 0 2.1
3 1 1 0.2 1 NA
4 1 2 NA 0 NA
5 2 0 NA NA NA
6 2 0.5 1.6 0 1.6
7 2 1 0.1 1 NA
This keeps modeling intent transparent.
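It can help to see what a fitting function does with the NA values in a modeling column. The sketch below assumes a hypothetical single-subject dataset (fit_dat) and a simple log-linear lm() fit chosen only for illustration; the lesson's own toy data has too few quantifiable points to fit.

```r
# Hypothetical data: six samples, the last two excluded via NA in DV_model
fit_dat <- data.frame(
  TIME     = c(0.5, 1, 2, 4, 6, 8),
  DV_model = c(8.0, 6.1, 3.9, 1.5, NA, NA)
)

# lm() follows options("na.action"), which defaults to na.omit,
# so rows with NA in DV_model are dropped from the fit
fit <- lm(log(DV_model) ~ TIME, data = fit_dat)

n_used  <- nobs(fit)      # rows actually used by the model
n_total <- nrow(fit_dat)  # rows in the dataset
c(used = n_used, total = n_total)
```

Comparing the two counts after every fit is a cheap way to confirm that the rows excluded by the model are exactly the rows you intended to exclude.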
Worked Example 4: Explicit Removal (If Required)
pk_m1 <- pk_model %>%
  filter(!(is_obs & BLQ == 1))
pk_m1
# A tibble: 5 × 11
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl>
1 1 0 1 100 NA NA TRUE FALSE FALSE FALSE
2 1 0.5 0 NA 2.1 0 FALSE TRUE FALSE FALSE
3 1 2 0 NA NA 0 FALSE TRUE FALSE TRUE
4 2 0 1 80 NA NA TRUE FALSE FALSE FALSE
5 2 0.5 0 NA 1.6 0 FALSE TRUE FALSE FALSE
# ℹ 1 more variable: DV_model <dbl>
This makes the removal explicit and reviewable.
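To keep a removal step reviewable, one option is to store the removed rows alongside the kept rows and confirm that the two sets add up to the original. This sketch rebuilds the toy data; pk_audit, pk_kept, and pk_removed are hypothetical names.

```r
library(dplyr)

# Same values as the lesson's example dataset, with the BLQ flag derived
pk_audit <- tibble(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  TIME = c(0, 0.5, 1, 2, 0, 0.5, 1),
  EVID = c(1, 0, 0, 0, 1, 0, 0),
  DV   = c(NA, 2.1, 0.2, NA, NA, 1.6, 0.1),
  BLQ  = c(NA, 0, 1, 0, NA, 0, 1)
) %>%
  mutate(
    is_obs = EVID == 0,
    is_blq = is_obs & BLQ == 1
  )

pk_kept    <- pk_audit %>% filter(!is_blq)  # rows that go forward
pk_removed <- pk_audit %>% filter(is_blq)   # rows set aside, kept for review

# No row may disappear without being accounted for
stopifnot(nrow(pk_kept) + nrow(pk_removed) == nrow(pk_audit))

pk_removed %>% select(ID, TIME, DV, BLQ)
```

The removed rows can then be listed in an appendix or log, so a reviewer sees which subjects and time points were affected.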
Exporting Conventions: NA vs “.”
In many PMx workflows, final analysis datasets are saved with:
- numeric values as numbers
- missing values written as "."
R internally uses NA, but you can convert at export:
pk_export <- pk_model %>%
  mutate(
    across(everything(), ~ if_else(is.na(.x), ".", as.character(.x)))
  )
pk_export
# A tibble: 7 × 11
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 0 1 100 . . TRUE FALSE FALSE FALSE
2 1 0.5 0 . 2.1 0 FALSE TRUE FALSE FALSE
3 1 1 0 . 0.2 1 FALSE TRUE TRUE FALSE
4 1 2 0 . . 0 FALSE TRUE FALSE TRUE
5 2 0 1 80 . . TRUE FALSE FALSE FALSE
6 2 0.5 0 . 1.6 0 FALSE TRUE FALSE FALSE
7 2 1 0 . 0.1 1 FALSE TRUE TRUE FALSE
# ℹ 1 more variable: DV_model <chr>
# write_csv(pk_export, "data/pk_analysis_dataset.csv", na = ".")
Keep NA inside R. Convert to "." only when writing the final file.
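As an alternative sketch, the na argument of readr's write_csv() can do the conversion during the write itself, so no character copy of the data is needed inside R. The values in pk_out and the temporary file path are illustrative assumptions, not part of the lesson's dataset.

```r
library(readr)
library(tibble)

# Small numeric dataset with NA for missing values (illustrative values)
pk_out <- tibble(
  ID   = c(1, 1, 1),
  TIME = c(0, 0.5, 2),
  AMT  = c(100, NA, NA),
  DV   = c(NA, 2.1, NA)
)

path <- tempfile(fileext = ".csv")

# NA is written as "." in the file; pk_out itself stays numeric
write_csv(pk_out, path, na = ".")

# Inspect the raw file, then read it back treating "." as missing
raw_lines <- read_lines(path)
pk_back <- suppressMessages(read_csv(path, na = "."))

raw_lines
pk_back
```

Reading the file back with the same na value is a simple round-trip check that the "." convention did not turn any numeric column into text.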
Strategies
- Keep raw DV unchanged
- Treat BLQ as an integer flag (0/1), not logical
- Use fill() only for structural propagation within groups
- Create explicit flags for modeling decisions
- Separate data cleaning from modeling decisions
- Convert NA to "." only at export time (if required)
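The reason for keeping NA inside R can be shown with a short base R sketch. The two vectors below are hypothetical and hold the same three samples, once with NA and once with "." as the missing marker.

```r
# Missing value stored as NA: the vector stays numeric
dv_na <- c(2.1, NA, 0.2)
mean_na <- mean(dv_na, na.rm = TRUE)  # summary functions can skip the NA

# Missing value stored as ".": the whole vector becomes character
dv_dot <- c("2.1", ".", "0.2")
is_num <- is.numeric(dv_dot)          # FALSE, so numeric summaries no longer apply

# Converting back works, but only by coercing "." to NA (with a warning)
dv_back <- suppressWarnings(as.numeric(dv_dot))

mean_na
is_num
dv_back
```

Once a "." enters a column, every later step has to convert it back, which is why the conversion belongs at export time only.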
Common Mistakes
- Treating BLQ as missing without checking the flag
- Converting BLQ to zero
- Log-transforming DV before handling BLQ
- Dropping all NA rows blindly
- Using fill() where structural guarantees do not exist
- Mixing "." and NA inside R objects
Practice Problems
- Count BLQ observations using BLQ == 1.
- Create flags for is_obs, is_blq, and is_missing_obs.
- Use fill() to propagate AMT within subject.
- Create a DV_model column excluding BLQ rows.
- Filter out BLQ rows explicitly.
- Convert NA values to "." for export.
pk %>% summarise(n_blq = sum(BLQ == 1, na.rm = TRUE))
# A tibble: 1 × 1
n_blq
<int>
1 2
pk2 <- pk %>%
  mutate(
    is_obs = EVID == 0,
    is_blq = is_obs & BLQ == 1,
    is_missing_obs = is_obs & is.na(DV) & (BLQ == 0 | is.na(BLQ))
  )
pk %>% group_by(ID) %>% fill(AMT, .direction = "down")
# A tibble: 7 × 8
# Groups: ID [2]
ID TIME EVID AMT DV BLQ is_dose is_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
1 1 0 1 100 NA NA TRUE FALSE
2 1 0.5 0 100 2.1 0 FALSE TRUE
3 1 1 0 100 0.2 1 FALSE TRUE
4 1 2 0 100 NA 0 FALSE TRUE
5 2 0 1 80 NA NA TRUE FALSE
6 2 0.5 0 80 1.6 0 FALSE TRUE
7 2 1 0 80 0.1 1 FALSE TRUE
pk2 %>%
  mutate(
    DV_model = case_when(
      is_obs & BLQ == 0 & !is.na(DV) ~ DV,
      TRUE ~ NA_real_
    )
  )
# A tibble: 7 × 11
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl>
1 1 0 1 100 NA NA TRUE FALSE FALSE FALSE
2 1 0.5 0 NA 2.1 0 FALSE TRUE FALSE FALSE
3 1 1 0 NA 0.2 1 FALSE TRUE TRUE FALSE
4 1 2 0 NA NA 0 FALSE TRUE FALSE TRUE
5 2 0 1 80 NA NA TRUE FALSE FALSE FALSE
6 2 0.5 0 NA 1.6 0 FALSE TRUE FALSE FALSE
7 2 1 0 NA 0.1 1 FALSE TRUE TRUE FALSE
# ℹ 1 more variable: DV_model <dbl>
pk2 %>% filter(!(EVID == 0 & BLQ == 1))
# A tibble: 5 × 10
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl>
1 1 0 1 100 NA NA TRUE FALSE FALSE FALSE
2 1 0.5 0 NA 2.1 0 FALSE TRUE FALSE FALSE
3 1 2 0 NA NA 0 FALSE TRUE FALSE TRUE
4 2 0 1 80 NA NA TRUE FALSE FALSE FALSE
5 2 0.5 0 NA 1.6 0 FALSE TRUE FALSE FALSE
pk2 %>%
  mutate(across(everything(), ~ if_else(is.na(.x), ".", as.character(.x))))
# A tibble: 7 × 10
ID TIME EVID AMT DV BLQ is_dose is_obs is_blq is_missing_obs
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 0 1 100 . . TRUE FALSE FALSE FALSE
2 1 0.5 0 . 2.1 0 FALSE TRUE FALSE FALSE
3 1 1 0 . 0.2 1 FALSE TRUE TRUE FALSE
4 1 2 0 . . 0 FALSE TRUE FALSE TRUE
5 2 0 1 80 . . TRUE FALSE FALSE FALSE
6 2 0.5 0 . 1.6 0 FALSE TRUE FALSE FALSE
7 2 1 0 . 0.1 1 FALSE TRUE TRUE FALSE
Summary
In this lesson you learned how to:
- define LLOQ and BLQ clearly
- distinguish BLQ from true missingness
- use fill() for structural completion (not imputation)
- create modeling-ready DV columns explicitly
- export datasets using "." for missing values
Missing data handling is a modeling decision — not a convenience step.
- BLQ is usually stored as 0/1.
- Keep raw DV untouched.
- Use flags instead of silent deletion.
- Use fill() only when structure guarantees validity.
- Keep NA in R; convert to "." only at export.