Filtering, Arranging, and Slicing Rows
library(tidyverse)
What you’ll build today: a small set of reliable patterns for selecting the right rows (and the right order) in PMx datasets.
Learning Objectives
By the end of this lesson, you will be able to:
- Use filter() to subset rows with clear, readable logic.
- Use arrange() to enforce PMx-safe ordering (especially by ID and TIME).
- Use slice(), slice_head(), and slice_tail() for targeted row selection.
- Apply common PMx filters: dose vs observation rows, time windows, and quick subject checks.
- Recognize row-selection mistakes that can silently affect modeling.
Setup
We’ll use a small PMx-style dataset.
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV, ~CMT, ~WT, ~SEX,
    1,   0.0,     1,  100,  NA,    1,  72,  "F",
    1,   0.5,     0,   NA, 2.1,    1,  72,  "F",
    1,   1.0,     0,   NA, 3.8,    1,  72,  "F",
    1,   2.0,     0,   NA, 3.0,    1,  72,  "F",
    2,   0.0,     1,   80,  NA,    1,  88,  "M",
    2,   0.5,     0,   NA, 1.6,    1,  88,  "M",
    2,   1.0,     0,   NA, 2.9,    1,  88,  "M",
    2,   2.0,     0,   NA, 2.4,    1,  88,  "M"
)
pk
# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F
2     1   0.5     0    NA   2.1     1    72 F
3     1   1       0    NA   3.8     1    72 F
4     1   2       0    NA   3       1    72 F
5     2   0       1    80  NA       1    88 M
6     2   0.5     0    NA   1.6     1    88 M
7     2   1       0    NA   2.9     1    88 M
8     2   2       0    NA   2.4     1    88 M
Key Ideas
Row operations answer questions like:
- “Show me only observation records.”
- “What happens in the first 2 hours post-dose?”
- “Are subjects ordered correctly for modeling?”
- “Give me the first few records per subject to sanity-check.”
In PMx workflows, row selection and row order directly affect model behavior.
If you remove or misorder records, you can change the event history.
Row-order mistakes are dangerous because they rarely produce obvious errors —
they just produce wrong results.
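To make that risk concrete, here is a minimal sketch of an ordering check. It is an illustration only: the helper name check_order() and the deliberately shuffled tibble pk_bad are hypothetical and are not part of this lesson's dataset.

```r
library(dplyr)

# Hypothetical, deliberately misordered records (not the lesson's pk)
pk_bad <- tibble::tribble(
  ~ID, ~TIME, ~EVID,
    1,   1.0,     0,
    1,   0.0,     1,
    1,   0.5,     0,
    2,   0.0,     1,
    2,   0.5,     0
)

# Hypothetical helper: one row per subject, TRUE when TIME never decreases
check_order <- function(data) {
  data %>%
    group_by(ID) %>%
    summarise(time_sorted = !is.unsorted(TIME), .groups = "drop")
}

check_order(pk_bad)                            # subject 1 is flagged FALSE
check_order(arrange(pk_bad, ID, TIME, EVID))   # all TRUE after sorting
```

Note that nothing errors on the misordered input. The check only reports what it finds, and it only looks at TIME within each ID; it does not verify that each subject's rows are contiguous.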
Worked Example 1: Filtering observations vs dosing rows
Observation rows often use EVID == 0:
obs <- pk %>% filter(EVID == 0)
obs
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F
2     1   1       0    NA   3.8     1    72 F
3     1   2       0    NA   3       1    72 F
4     2   0.5     0    NA   1.6     1    88 M
5     2   1       0    NA   2.9     1    88 M
6     2   2       0    NA   2.4     1    88 M
Dose rows often use EVID == 1:
dose <- pk %>% filter(EVID == 1)
dose
# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     0     1   100    NA     1    72 F
2     2     0     1    80    NA     1    88 M
This separation is foundational for exposure-response workflows.
Different systems may use different conventions (EVID, MDV, etc.),
but the concept is the same: separate events from observations intentionally.
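A quick way to confirm the split is intentional is to tabulate the event codes first and then check that the two subsets account for every row. The sketch below assumes only EVID values 0 and 1 are present; pk_min is a hypothetical trimmed copy of the lesson's records, kept small so the block runs on its own.

```r
library(dplyr)

# Hypothetical trimmed copy of the lesson's records (ID, TIME, EVID only)
pk_min <- tibble::tribble(
  ~ID, ~TIME, ~EVID,
    1,   0.0,     1,
    1,   0.5,     0,
    1,   1.0,     0,
    1,   2.0,     0,
    2,   0.0,     1,
    2,   0.5,     0,
    2,   1.0,     0,
    2,   2.0,     0
)

# See which event codes exist before filtering on them
pk_min %>% count(EVID)

obs  <- pk_min %>% filter(EVID == 0)
dose <- pk_min %>% filter(EVID == 1)

# Every row should land in exactly one subset; this stops if any row is lost
stopifnot(nrow(obs) + nrow(dose) == nrow(pk_min))
```

If a dataset carried other codes or a missing EVID, the final check would fail, which is the prompt to decide explicitly how those rows should be handled.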
Worked Example 2: Filtering time windows
Show observations within the first 1 hour:
pk %>%
  filter(EVID == 0, TIME <= 1)
# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F
2     1   1       0    NA   3.8     1    72 F
3     2   0.5     0    NA   1.6     1    88 M
4     2   1       0    NA   2.9     1    88 M
Using between() for readability:
pk %>%
  filter(EVID == 0, between(TIME, 0, 1))
# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F
2     1   1       0    NA   3.8     1    72 F
3     2   0.5     0    NA   1.6     1    88 M
4     2   1       0    NA   2.9     1    88 M
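between(TIME, 0, 1) includes both endpoints, so it is equivalent to TIME >= 0 & TIME <= 1. Stating the window in one call improves clarity and reduces boundary mistakes.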
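The boundary behavior is easy to confirm on a plain vector. This small sketch uses a made-up vector named times (an assumption for illustration) that includes the two endpoints and a missing value.

```r
library(dplyr)

# Made-up time values: both endpoints, one value outside the window, one NA
times <- c(0, 0.5, 1, 2, NA)

# between() is inclusive at both ends
in_window <- between(times, 0, 1)

# The same test written with explicit comparisons
in_window_explicit <- times >= 0 & times <= 1

in_window
# TRUE TRUE TRUE FALSE NA
identical(in_window, in_window_explicit)
# TRUE
```

The NA matters in practice: filter() keeps only rows where the condition is TRUE, so a record with a missing TIME is dropped by a time-window filter without any warning.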
Worked Example 3: Filtering missing or non-missing DV
Keep only observation rows with an actual DV:
pk %>%
  filter(EVID == 0, !is.na(DV))
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F
2     1   1       0    NA   3.8     1    72 F
3     1   2       0    NA   3       1    72 F
4     2   0.5     0    NA   1.6     1    88 M
5     2   1       0    NA   2.9     1    88 M
6     2   2       0    NA   2.4     1    88 M
Filtering !is.na(DV) on the full dataset also removes dose rows, because dose records carry DV = NA.
If event history matters, restrict the missingness check to observation rows (EVID == 0), as in the filter above.
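The difference is easiest to see on a dataset that has a genuinely missing observation. In the sketch below, pk_na is a hypothetical single-subject tibble in which the 1-hour sample is missing; the second filter is one possible way to keep the dosing history, not the only one.

```r
library(dplyr)

# Hypothetical records: one dose, three samples, the 1-hour DV is missing
pk_na <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.0,     1,  100,  NA,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA,  NA,
    1,   2.0,     0,   NA, 3.0
)

# Drops the missing sample AND the dose row
dv_only <- pk_na %>% filter(!is.na(DV))

# Keeps every dose row, and only observations with a recorded DV
keep_doses <- pk_na %>% filter(EVID == 1 | !is.na(DV))

nrow(dv_only)      # 2
nrow(keep_doses)   # 3
```

Which version is appropriate depends on what the downstream step needs: a plotting dataset may only need dv_only, while a modeling dataset usually needs the dose record preserved.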
Worked Example 4: Sorting with arrange()
A common PMx-safe ordering is:
- by ID
- then by TIME
- with observation rows first when times tie (EVID 0 sorts before EVID 1)
pk_sorted <- pk %>% arrange(ID, TIME, EVID)
pk_sorted
# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F
2     1   0.5     0    NA   2.1     1    72 F
3     1   1       0    NA   3.8     1    72 F
4     1   2       0    NA   3       1    72 F
5     2   0       1    80  NA       1    88 M
6     2   0.5     0    NA   1.6     1    88 M
7     2   1       0    NA   2.9     1    88 M
8     2   2       0    NA   2.4     1    88 M
Consistent sorting ensures reproducible modeling inputs.
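The lesson's dataset has no tied times, so the effect of the third sort key is not visible in the output above. The sketch below uses a hypothetical tibble named tied, in which a predose sample and a dose share TIME == 0, to show both tie-breaking directions. Which direction is correct for a given analysis is an assumption that depends on the study design and the modeling software.

```r
library(dplyr)

# Hypothetical records: a predose sample and a dose both at TIME == 0
tied <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,     0,     1,  100,  NA,
    1,     0,     0,   NA, 0.0,
    1,     1,     0,   NA, 3.8
)

# Ascending EVID: the observation (0) comes before the dose (1) at the tie
obs_first <- tied %>% arrange(ID, TIME, EVID)

# desc(EVID): the dose (1) comes before the observation (0) at the tie
dose_first <- tied %>% arrange(ID, TIME, desc(EVID))

obs_first$EVID    # 0 1 0
dose_first$EVID   # 1 0 0
```

Whichever rule is chosen, writing it into the arrange() call makes the tie-break explicit instead of leaving it to the input file's row order.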
Worked Example 5: Slicing for quick QC
First few rows (global)
pk %>% slice_head(n = 5)
# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F
2     1   0.5     0    NA   2.1     1    72 F
3     1   1       0    NA   3.8     1    72 F
4     1   2       0    NA   3       1    72 F
5     2   0       1    80  NA       1    88 M
First few rows per subject
pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 3) %>%
  ungroup()
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F
2     1   0.5     0    NA   2.1     1    72 F
3     1   1       0    NA   3.8     1    72 F
4     2   0       1    80  NA       1    88 M
5     2   0.5     0    NA   1.6     1    88 M
6     2   1       0    NA   2.9     1    88 M
Last observation per subject
pk %>%
  filter(EVID == 0) %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  slice_tail(n = 1) %>%
  ungroup()
# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     2     0    NA   3       1    72 F
2     2     2     0    NA   2.4     1    88 M
These “sanity snapshots” are fast ways to catch structural issues.
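The examples above use slice_head() and slice_tail(); plain slice() selects rows by position and can do both in one step. The sketch below is one possible snapshot, using a hypothetical trimmed copy of the lesson's records named pk_min, that pulls the first and last record per subject.

```r
library(dplyr)

# Hypothetical trimmed copy of the lesson's records (ID, TIME, EVID only)
pk_min <- tibble::tribble(
  ~ID, ~TIME, ~EVID,
    1,   0.0,     1,
    1,   0.5,     0,
    1,   1.0,     0,
    1,   2.0,     0,
    2,   0.0,     1,
    2,   0.5,     0,
    2,   1.0,     0,
    2,   2.0,     0
)

# Position 1 and position n() within each subject, after sorting
first_last <- pk_min %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice(c(1, n())) %>%
  ungroup()

first_last$TIME   # 0 2 0 2
```

Because slice() works on positions, the result is only meaningful after an explicit arrange(); on unsorted data it would return whatever rows happen to be first and last in the file.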
Strategies
- Write filters as readable statements (one condition per argument).
- Use between() for time windows.
- Use is.na() explicitly for missingness logic.
- Enforce a standard ordering early: arrange(ID, TIME, EVID).
- Build quick “spot checks” with slice_head() per subject.
Common Mistakes
- Writing filter(A | B & C) without parentheses
- Filtering on DV missingness and accidentally deleting event rows
- Forgetting to ungroup() after grouped slicing
- Assuming data are sorted when they are not
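Two of these mistakes can be demonstrated in a few lines. In R, & is evaluated before |, so an unparenthesized condition may not mean what it appears to say, and a grouped slice stays grouped until ungroup() is called. The sketch below uses a hypothetical trimmed copy of the lesson's records named pk_min; the specific conditions are chosen only for illustration.

```r
library(dplyr)

# Hypothetical trimmed copy of the lesson's records (ID, TIME, EVID only)
pk_min <- tibble::tribble(
  ~ID, ~TIME, ~EVID,
    1,   0.0,     1,
    1,   0.5,     0,
    1,   1.0,     0,
    1,   2.0,     0,
    2,   0.0,     1,
    2,   0.5,     0,
    2,   1.0,     0,
    2,   2.0,     0
)

# Parsed as: ID == 1 | (EVID == 0 & TIME <= 1)
no_parens <- pk_min %>% filter(ID == 1 | EVID == 0 & TIME <= 1)

# Parentheses state the other reading explicitly
with_parens <- pk_min %>% filter((ID == 1 | EVID == 0) & TIME <= 1)

nrow(no_parens)     # 6
nrow(with_parens)   # 5

# Grouping persists after a grouped slice until ungroup() is called
still_grouped <- pk_min %>% group_by(ID) %>% slice_head(n = 1)
is_grouped_df(still_grouped)            # TRUE
is_grouped_df(ungroup(still_grouped))   # FALSE
```

Neither version of the filter raises an error, which is why adding parentheses whenever | and & are mixed is a cheap safeguard.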
Practice Problems
- Create a dataset containing only observation rows.
- Keep only observation rows with TIME <= 1.
- Create a QC view showing the first 2 rows per subject.
- Compute the last observation time per subject.
- Enforce PMx-safe ordering.
# Observations only
obs <- pk %>% filter(EVID == 0)
# Observations within first hour
obs_1h <- pk %>% filter(EVID == 0, TIME <= 1)
# QC view
qc_view <- pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 2) %>%
  ungroup()
# Last observation time
last_obs <- pk %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  summarise(last_time = max(TIME), .groups = "drop")
# Enforce ordering
pk <- pk %>% arrange(ID, TIME, EVID)
Summary
You now know how to:
- Use filter() to target the correct rows intentionally.
- Enforce consistent PMx ordering with arrange().
- Use slice_*() functions for fast QC snapshots.
- Protect modeling workflows from silent row-selection errors.
Row logic determines modeling logic.
Disciplined row control prevents subtle, downstream mistakes.
- Write filters clearly: filter(EVID == 0, TIME <= 1).
- Use between() for windows.
- Sort early and consistently.
- Use grouped slicing for fast subject-level checks.