Filtering, Arranging, and Slicing Rows

Use filter(), arrange(), and slice() to focus on the right records, enforce ordering, and run row-level QC on pharmacometrics (PMx) datasets.

Tip

What you’ll build today: a small set of reliable patterns for selecting the right rows (and the right order) in PMx datasets.

Learning Objectives

By the end of this lesson, you will be able to:

Use filter() to subset rows with clear, readable logic.
Use arrange() to enforce PMx-safe ordering (especially by ID and TIME).
Use slice(), slice_head(), and slice_tail() for targeted row selection.
Apply common PMx filters: dose vs observation rows, time windows, and quick subject checks.
Recognize row-selection mistakes that can silently affect modeling.

Setup

library(tidyverse)

We’ll use a small PMx-style dataset.

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,  ~CMT, ~WT, ~SEX,
    1,   0.0,    1,  100,  NA,     1,  72, "F",
    1,   0.5,    0,   NA,  2.1,    1,  72, "F",
    1,   1.0,    0,   NA,  3.8,    1,  72, "F",
    1,   2.0,    0,   NA,  3.0,    1,  72, "F",
    2,   0.0,    1,   80,  NA,     1,  88, "M",
    2,   0.5,    0,   NA,  1.6,    1,  88, "M",
    2,   1.0,    0,   NA,  2.9,    1,  88, "M",
    2,   2.0,    0,   NA,  2.4,    1,  88, "M"
)

pk

# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M    
6     2   0.5     0    NA   1.6     1    88 M    
7     2   1       0    NA   2.9     1    88 M    
8     2   2       0    NA   2.4     1    88 M

Key Ideas

Row operations answer questions like:

“Show me only observation records.”
“What happens in the first 2 hours post-dose?”
“Are subjects ordered correctly for modeling?”
“Give me the first few records per subject to sanity-check.”

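In PMx workflows, row selection and row order directly affect model behavior.
If you remove or misorder records, you change the event history the model sees. A dose row that ends up below the observations it produced, for example, no longer precedes them in the record sequence.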

Warning

Row-order mistakes are dangerous because they rarely raise an error.
They silently produce wrong results.
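
Because nothing fails loudly, it can help to test ordering explicitly. The sketch below is one possible check, not a standard routine: pk_unsorted is a made-up dataset in which subject 1's dose row has drifted below its observations, and the names time_reversals and is_sorted are illustrative.

```r
library(dplyr)

# Hypothetical data: subject 1's dose row sits after its observations
pk_unsorted <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA, 3.8,
    1,   0.0,     1,  100,  NA,
    2,   0.0,     1,   80,  NA,
    2,   0.5,     0,   NA, 1.6
)

# Flag rows whose TIME is earlier than the previous row of the same subject
time_reversals <- pk_unsorted %>%
  group_by(ID) %>%
  mutate(time_goes_back = TIME < lag(TIME, default = first(TIME))) %>%
  ungroup() %>%
  filter(time_goes_back)

time_reversals

# One TRUE/FALSE answer: are the rows already in ID, TIME, EVID order?
is_sorted <- with(
  pk_unsorted,
  identical(order(ID, TIME, EVID), seq_len(nrow(pk_unsorted)))
)

is_sorted
```

The first check points at the offending rows. The second is a compact guard that could sit at the top of a script before any modeling dataset is written.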

Worked Example 1: Filtering observations vs dosing rows

Observation rows often use EVID == 0:

obs <- pk %>% filter(EVID == 0)
obs

# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     1   2       0    NA   3       1    72 F    
4     2   0.5     0    NA   1.6     1    88 M    
5     2   1       0    NA   2.9     1    88 M    
6     2   2       0    NA   2.4     1    88 M

Dose rows often use EVID == 1:

dose <- pk %>% filter(EVID == 1)
dose

# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     0     1   100    NA     1    72 F    
2     2     0     1    80    NA     1    88 M

This separation is foundational for exposure-response workflows.

Note

Different systems may use different conventions (EVID, MDV, etc.),
but the concept is the same: separate events from observations intentionally.

Worked Example 2: Filtering time windows

Show observations within the first hour:

pk %>%
  filter(EVID == 0, TIME <= 1)

# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     2   0.5     0    NA   1.6     1    88 M    
4     2   1       0    NA   2.9     1    88 M

Using between() for readability:

pk %>%
  filter(EVID == 0, between(TIME, 0, 1))

# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     2   0.5     0    NA   1.6     1    88 M    
4     2   1       0    NA   2.9     1    88 M

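between(TIME, 0, 1) is inclusive at both ends, so it is equivalent to TIME >= 0 & TIME <= 1. It returns the same rows as the TIME <= 1 filter here only because this dataset has no negative times. Naming both bounds makes the window explicit and reduces boundary mistakes.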
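
To see the boundary behavior in isolation, here is a small sketch on a plain numeric vector. The vector times is invented for the illustration.

```r
library(dplyr)

# Illustrative time points, including both window edges
times <- c(0, 0.5, 1, 1.5)

# between() keeps both endpoints
inclusive <- between(times, 0, 1)
inclusive

# The same test written out by hand
by_hand <- times >= 0 & times <= 1

# For a half-open window such as (0, 1], spell out the comparisons instead
half_open <- times > 0 & times <= 1
half_open
```

When a window must exclude one edge, for instance to drop a time-zero record, the explicit comparison form is the safer choice.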

Worked Example 3: Filtering missing or non-missing DV

Keep only observation rows with an actual DV:

pk %>%
  filter(EVID == 0, !is.na(DV))

# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     1   2       0    NA   3       1    72 F    
4     2   0.5     0    NA   1.6     1    88 M    
5     2   1       0    NA   2.9     1    88 M    
6     2   2       0    NA   2.4     1    88 M

Warning

Filtering !is.na(DV) on the full dataset will remove dose rows, because dose rows carry DV = NA.
If event history matters, apply the missingness check to observation rows only.
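
A minimal sketch of the difference, assuming a made-up dataset pk_na in which one observation has a missing DV. The object names too_broad and keep_events are illustrative.

```r
library(dplyr)

pk_na <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.0,     1,  100,  NA,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA,  NA,  # hypothetical observation with no result
    2,   0.0,     1,   80,  NA,
    2,   0.5,     0,   NA, 1.6
)

# Too broad: drops the missing observation and both dose rows
too_broad <- pk_na %>% filter(!is.na(DV))

# Keeps every non-observation row; drops only observations with missing DV
keep_events <- pk_na %>% filter(EVID != 0 | !is.na(DV))

count(too_broad, EVID)
count(keep_events, EVID)
```

Counting rows by EVID before and after a filter is a cheap way to confirm that dose records survived.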

Worked Example 4: Sorting with arrange()

A common PMx-safe ordering is:

by ID
then by TIME
then by EVID, so observation rows (EVID == 0) come before dose rows (EVID == 1) when times tie

pk_sorted <- pk %>% arrange(ID, TIME, EVID)
pk_sorted

# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M    
6     2   0.5     0    NA   1.6     1    88 M    
7     2   1       0    NA   2.9     1    88 M    
8     2   2       0    NA   2.4     1    88 M

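The rows were already in this order, so the output matches the input. Sorting anyway ensures that modeling inputs do not depend on how the source file happened to be ordered.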
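
The lesson dataset has no tied times, so the EVID tie-break never comes into play. The sketch below uses a hypothetical multiple-dose subject (pk_ties) whose trough sample and second dose share TIME = 24, to show what the third sort key does.

```r
library(dplyr)

# Hypothetical subject: observation and second dose both recorded at TIME = 24
pk_ties <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,    24,     1,  100,  NA,
    1,     0,     1,  100,  NA,
    1,    24,     0,   NA, 0.9,
    1,     1,     0,   NA, 3.8
)

# Ascending EVID: the observation (0) sorts before the dose (1) at TIME = 24
obs_first <- pk_ties %>% arrange(ID, TIME, EVID)
obs_first

# desc(EVID) flips the tie-break so the dose row comes first
dose_first <- pk_ties %>% arrange(ID, TIME, desc(EVID))
dose_first
```

Which tie-break is correct depends on the study design and on what the modeling software expects, so treat the choice as something to confirm rather than a default.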

Worked Example 5: Slicing for quick QC

First few rows (global)

pk %>% slice_head(n = 5)

# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M

First few rows per subject

pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 3) %>%
  ungroup()

# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M    
6     2   1       0    NA   2.9     1    88 M

Last observation per subject

pk %>%
  filter(EVID == 0) %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  slice_tail(n = 1) %>%
  ungroup()

# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     2     0    NA   3       1    72 F    
2     2     2     0    NA   2.4     1    88 M

These “sanity snapshots” are fast ways to catch structural issues.
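
The examples above use slice_head() and slice_tail(). Plain slice() selects rows by position, and slice_max() selects by value. The following sketch runs on a shortened copy of the lesson data (pk_small, defined here so the block runs on its own). The result names are illustrative.

```r
library(dplyr)

pk_small <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.0,     1,  100,  NA,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA, 3.8,
    2,   0.0,     1,   80,  NA,
    2,   0.5,     0,   NA, 1.6,
    2,   1.0,     0,   NA, 2.9
)

# Rows 2 and 3 of the whole table
rows_2_3 <- pk_small %>% slice(2:3)

# Second record of each subject
second_per_id <- pk_small %>%
  group_by(ID) %>%
  slice(2) %>%
  ungroup()

# Negative positions drop rows: everything except each subject's first record
without_first <- pk_small %>%
  group_by(ID) %>%
  slice(-1) %>%
  ungroup()

# Row with the highest DV per subject, one row even if values tie
highest_dv <- pk_small %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  slice_max(DV, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Positional slicing depends on the current row order, so it is usually worth calling arrange() first. slice_max() does not depend on row order because it selects by the value of the ordering column.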

Strategies

Write filters as readable statements (one condition per argument).
Use between() for time windows.
Use is.na() explicitly for missingness logic.
Enforce a standard ordering early: arrange(ID, TIME, EVID).
Build quick “spot checks” with slice_head() per subject.
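
Alongside slice_head(), a one-row-per-subject summary is another way to run quick subject checks. This is a sketch under assumptions: pk_small is a shortened copy of the lesson data, and the column names n_dose, n_obs, first_time, and first_is_dose are invented for the example.

```r
library(dplyr)

pk_small <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.0,     1,  100,  NA,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA, 3.8,
    2,   0.0,     1,   80,  NA,
    2,   0.5,     0,   NA, 1.6,
    2,   1.0,     0,   NA, 2.9
)

subject_qc <- pk_small %>%
  arrange(ID, TIME, EVID) %>%          # first() below depends on row order
  group_by(ID) %>%
  summarise(
    n_dose        = sum(EVID == 1),    # number of dose records
    n_obs         = sum(EVID == 0),    # number of observation records
    first_time    = min(TIME),         # earliest record time
    first_is_dose = first(EVID) == 1,  # does the subject start with a dose?
    .groups = "drop"
  )

subject_qc
```

Subjects with zero dose rows, zero observations, or a first record that is not a dose stand out immediately in a table like this.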

Common Mistakes

Writing filter(A | B & C) without parentheses (R evaluates & before |, so this means A | (B & C))
Filtering on DV missingness and accidentally deleting event rows
Forgetting to ungroup() after grouped slicing, which leaves later mutate() and summarise() calls operating per group
Assuming data are sorted when they are not
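
The first mistake in the list is easy to reproduce. The sketch below uses a shortened copy of the lesson data (pk_small) and two illustrative conditions to show how the same three tests give different row sets depending on grouping.

```r
library(dplyr)

pk_small <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,
    1,   0.0,     1,  100,  NA,
    1,   0.5,     0,   NA, 2.1,
    1,   1.0,     0,   NA, 3.8,
    2,   0.0,     1,   80,  NA,
    2,   0.5,     0,   NA, 1.6,
    2,   1.0,     0,   NA, 2.9
)

# No parentheses: R reads this as ID == 1 | (EVID == 0 & TIME <= 0.5)
implicit <- pk_small %>%
  filter(ID == 1 | EVID == 0 & TIME <= 0.5)

# Parentheses force the other reading: (ID == 1 | EVID == 0) & TIME <= 0.5
grouped <- pk_small %>%
  filter((ID == 1 | EVID == 0) & TIME <= 0.5)

nrow(implicit)
nrow(grouped)
```

The two filters return different numbers of rows from the same data, and neither raises a warning. Adding parentheses whenever | and & appear together removes the ambiguity for the reader as well.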

Practice Problems

Create a dataset containing only observation rows.
Keep only observation rows with TIME <= 1.
Create a QC view showing the first 2 rows per subject.
Compute the last observation time per subject.
Enforce PMx-safe ordering.

Step-by-Step Solutions

# Observations only
obs <- pk %>% filter(EVID == 0)

# Observations within first hour
obs_1h <- pk %>% filter(EVID == 0, TIME <= 1)

# QC view
qc_view <- pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 2) %>%
  ungroup()

# Last observation time
last_obs <- pk %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  summarise(last_time = max(TIME), .groups = "drop")

# Enforce ordering
pk <- pk %>% arrange(ID, TIME, EVID)

Summary

You now know how to:

Use filter() to target the correct rows intentionally.
Enforce consistent PMx ordering with arrange().
Use slice_*() functions for fast QC snapshots.
Protect modeling workflows from silent row-selection errors.

Row logic determines modeling logic.
Disciplined row control prevents subtle, downstream mistakes.

Quick Tips

Write filters clearly: filter(EVID == 0, TIME <= 1).
Use between() for windows.
Sort early and consistently.
Use grouped slicing for fast subject-level checks.