Filtering, Arranging, and Slicing Rows

Use filter(), arrange(), and slice() to focus on the right records, enforce ordering, and perform PMx-friendly row QC.
Tip

What you’ll build today: a small set of reliable patterns for selecting the right rows (and the right order) in PMx datasets.

Learning Objectives

By the end of this lesson, you will be able to:

  • Use filter() to subset rows with clear, readable logic.
  • Use arrange() to enforce PMx-safe ordering (especially by ID and TIME).
  • Use slice(), slice_head(), and slice_tail() for targeted row selection.
  • Apply common PMx filters: dose vs observation rows, time windows, and quick subject checks.
  • Recognize row-selection mistakes that can silently affect modeling.

Setup

library(tidyverse)

We’ll use a small PMx-style dataset.

pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT, ~DV,  ~CMT, ~WT, ~SEX,
    1,   0.0,    1,  100,  NA,     1,  72, "F",
    1,   0.5,    0,   NA,  2.1,    1,  72, "F",
    1,   1.0,    0,   NA,  3.8,    1,  72, "F",
    1,   2.0,    0,   NA,  3.0,    1,  72, "F",
    2,   0.0,    1,   80,  NA,     1,  88, "M",
    2,   0.5,    0,   NA,  1.6,    1,  88, "M",
    2,   1.0,    0,   NA,  2.9,    1,  88, "M",
    2,   2.0,    0,   NA,  2.4,    1,  88, "M"
)

pk
# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M    
6     2   0.5     0    NA   1.6     1    88 M    
7     2   1       0    NA   2.9     1    88 M    
8     2   2       0    NA   2.4     1    88 M    

Key Ideas

Row operations answer questions like:

  • “Show me only observation records.”
  • “What happens in the first 2 hours post-dose?”
  • “Are subjects ordered correctly for modeling?”
  • “Give me the first few records per subject to sanity-check.”

In PMx workflows, row selection and row order directly affect model behavior.
Modeling tools such as NONMEM read each subject's records in file order, so removing or misordering records changes the dosing and observation history the model sees.

Warning

Row-order mistakes are dangerous because they rarely produce obvious errors;
they just produce wrong results.


Worked Example 1: Filtering observations vs dosing rows

Observation rows often use EVID == 0:

obs <- pk %>% filter(EVID == 0)
obs
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     1   2       0    NA   3       1    72 F    
4     2   0.5     0    NA   1.6     1    88 M    
5     2   1       0    NA   2.9     1    88 M    
6     2   2       0    NA   2.4     1    88 M    

Dose rows often use EVID == 1:

dose <- pk %>% filter(EVID == 1)
dose
# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     0     1   100    NA     1    72 F    
2     2     0     1    80    NA     1    88 M    

Most downstream steps, such as summarizing concentrations or attaching dose information to observations, start from this split between dose and observation rows.
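A quick way to confirm the split is complete is to check that the observation and dose subsets together account for every row, and to tabulate EVID so that any unexpected codes (for example EVID 2 or 4) stand out. A minimal sketch on a copy of pk trimmed to the columns it needs; the names split_complete and evid_counts are illustrative:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

obs  <- pk %>% filter(EVID == 0)
dose <- pk %>% filter(EVID == 1)

# Every row should land in exactly one subset; a mismatch means
# the data contain EVID codes other than 0 and 1
split_complete <- nrow(obs) + nrow(dose) == nrow(pk)

# Tabulate the EVID codes that are actually present
evid_counts <- pk %>% count(EVID)
evid_counts
```

If split_complete is FALSE, look at evid_counts before deciding how the other codes should be handled.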

Note

Different systems may use different conventions (EVID, MDV, etc.),
but the concept is the same: separate events from observations intentionally.
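If a dataset flags observations with MDV instead of (or alongside) EVID, the same intent applies. Purely for illustration, the sketch below derives a hypothetical MDV column (1 when DV is missing, 0 otherwise) and checks that filtering on it selects the same rows as EVID == 0. With real data, use the MDV values supplied in the dataset rather than deriving them.

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Hypothetical MDV flag for illustration: 1 = no DV on this record, 0 = DV present
pk_mdv <- pk %>% mutate(MDV = as.integer(is.na(DV)))

obs_by_evid <- pk_mdv %>% filter(EVID == 0)
obs_by_mdv  <- pk_mdv %>% filter(MDV == 0)

# The two conventions should select the same rows; if not, investigate first
conventions_agree <- identical(obs_by_evid, obs_by_mdv)
conventions_agree
```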


Worked Example 2: Filtering time windows

Keep observations within the first hour (TIME <= 1):

pk %>%
  filter(EVID == 0, TIME <= 1)
# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     2   0.5     0    NA   1.6     1    88 M    
4     2   1       0    NA   2.9     1    88 M    

Using between() for readability:

pk %>%
  filter(EVID == 0, between(TIME, 0, 1))
# A tibble: 4 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     2   0.5     0    NA   1.6     1    88 M    
4     2   1       0    NA   2.9     1    88 M    

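between(TIME, 0, 1) is shorthand for TIME >= 0 & TIME <= 1: both endpoints are included, and the whole window is stated in one call, which improves clarity and reduces boundary mistakes.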
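Because between() includes both endpoints, splitting time into consecutive windows with it counts any sample that sits exactly on a shared boundary twice. A minimal sketch, with window edges chosen only for illustration, comparing inclusive windows with half-open ones:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

obs <- pk %>% filter(EVID == 0)

# Inclusive windows [0, 1] and [1, 2] both contain TIME == 1
w_0_1 <- obs %>% filter(between(TIME, 0, 1))
w_1_2 <- obs %>% filter(between(TIME, 1, 2))
n_inclusive <- nrow(w_0_1) + nrow(w_1_2)   # TIME == 1 rows counted twice

# Half-open windows [0, 1) and [1, 2] assign each row to exactly one window
h_0_1 <- obs %>% filter(TIME >= 0, TIME < 1)
h_1_2 <- obs %>% filter(TIME >= 1, TIME <= 2)
n_half_open <- nrow(h_0_1) + nrow(h_1_2)   # equals nrow(obs)
```

For a single window, between() is the clearer choice; for binning, spell out which edge each window owns.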


Worked Example 3: Filtering missing or non-missing DV

Keep only observation rows with an actual DV:

pk %>%
  filter(EVID == 0, !is.na(DV))
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5     0    NA   2.1     1    72 F    
2     1   1       0    NA   3.8     1    72 F    
3     1   2       0    NA   3       1    72 F    
4     2   0.5     0    NA   1.6     1    88 M    
5     2   1       0    NA   2.9     1    88 M    
6     2   2       0    NA   2.4     1    88 M    
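
Warning

Because dose rows have DV = NA, filtering !is.na(DV) on the full dataset also removes every dose row.
If you need to keep the event history, apply the missing-DV condition to observation rows only, for example filter(EVID != 0 | !is.na(DV)).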
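To see the difference, the sketch below adds one hypothetical observation record with no measured DV, then compares a naive missing-DV filter with one scoped to observation rows:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Hypothetical extra record: a scheduled observation with no measured DV
pk_missing <- pk %>%
  tibble::add_row(ID = 1, TIME = 4, EVID = 0, AMT = NA, DV = NA)

# Naive: drops the missing observation AND both dose rows
naive <- pk_missing %>% filter(!is.na(DV))

# Scoped: keep every non-observation row, plus observations that have a DV
scoped <- pk_missing %>% filter(EVID != 0 | !is.na(DV))
```

naive keeps 6 rows with no doses; scoped keeps 8 rows, including both dose records.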


Worked Example 4: Sorting with arrange()

A common PMx-safe ordering is:

  • by ID
  • then by TIME
  • with observation rows (EVID 0) before dose rows (EVID 1) when TIME ties

pk_sorted <- pk %>% arrange(ID, TIME, EVID)
pk_sorted
# A tibble: 8 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M    
6     2   0.5     0    NA   1.6     1    88 M    
7     2   1       0    NA   2.9     1    88 M    
8     2   2       0    NA   2.4     1    88 M    

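Because pk was already in this order, the output matches the input here. The explicit arrange() still matters: it guarantees the same order whatever order the data arrived in, which keeps modeling inputs reproducible.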
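The EVID tie-breaker only shows its effect when a dose and an observation share a TIME. The sketch below uses a hypothetical two-dose record in which a trough sample is taken at the same nominal time as the second dose; the values are made up for illustration. Sorting EVID ascending places the trough before the dose, so it reads as a pre-dose concentration. Putting observations first is one common convention; confirm what your modeling software and analysis plan expect.

```r
library(dplyr)

# Hypothetical data: trough drawn at TIME 12, the same time as the second dose
tie <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,    12,    1,  100,   NA,   # second dose, entered out of order
    1,     0,    1,  100,   NA,   # first dose
    1,     4,    0,   NA,  3.2,
    1,    12,    0,   NA,  1.1    # pre-dose trough
)

# ID, then TIME, then EVID: at TIME 12 the EVID 0 trough sorts before the dose
tie_sorted <- tie %>% arrange(ID, TIME, EVID)
tie_sorted
```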


Worked Example 5: Slicing for quick QC

First few rows (global)

pk %>% slice_head(n = 5)
# A tibble: 5 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     1   2       0    NA   3       1    72 F    
5     2   0       1    80  NA       1    88 M    

First few rows per subject

pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 3) %>%
  ungroup()
# A tibble: 6 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0       1   100  NA       1    72 F    
2     1   0.5     0    NA   2.1     1    72 F    
3     1   1       0    NA   3.8     1    72 F    
4     2   0       1    80  NA       1    88 M    
5     2   0.5     0    NA   1.6     1    88 M    
6     2   1       0    NA   2.9     1    88 M    
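In dplyr 1.1.0 and later, the slice_*() helpers also accept a `by` argument for per-operation grouping. The result is never grouped, so there is nothing to ungroup() afterwards. A minimal sketch, assuming dplyr >= 1.1.0 is installed:

```r
library(dplyr)  # `by` in slice_head() requires dplyr >= 1.1.0

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Same rows as group_by(ID) %>% slice_head(n = 3) %>% ungroup()
first3 <- pk %>%
  arrange(ID, TIME, EVID) %>%
  slice_head(n = 3, by = ID)
first3
```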

Last observation per subject

pk %>%
  filter(EVID == 0) %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  slice_tail(n = 1) %>%
  ungroup()
# A tibble: 2 × 8
     ID  TIME  EVID   AMT    DV   CMT    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1     1     2     0    NA   3       1    72 F    
2     2     2     0    NA   2.4     1    88 M    
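slice_tail() trusts the current row order, which is why the pipeline above sorts first. slice_max() selects by value instead, so it returns the latest observation even when the input is unsorted. A minimal sketch that deliberately scrambles the rows to mimic an unsorted input file:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Scramble the row order
pk_shuffled <- pk[c(8, 3, 1, 6, 2, 7, 4, 5), ]

# Without arrange(), slice_tail() returns whatever row happens to be last
last_by_position <- pk_shuffled %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  arrange(ID)

# slice_max() picks the largest TIME per subject regardless of row order
last_by_time <- pk_shuffled %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  slice_max(TIME, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(ID)
```

Here last_by_position reports TIME 1 for subject 2, while last_by_time correctly reports TIME 2 for both subjects.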

These “sanity snapshots” are fast ways to catch structural issues, such as a subject with no dose record or an unexpected first or last time.
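The same idea extends to a one-row-per-subject summary, which makes structural problems (a missing dose, or far fewer observations than other subjects) easy to spot in a single table. A minimal sketch; the column names n_dose, n_obs, and last_obs are illustrative:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# One row per subject: record counts by type and the last observation time
subject_qc <- pk %>%
  group_by(ID) %>%
  summarise(
    n_dose   = sum(EVID == 1),
    n_obs    = sum(EVID == 0),
    last_obs = max(TIME[EVID == 0]),  # -Inf plus a warning if no observations
    .groups  = "drop"
  )
subject_qc
```

A subject with no observations yields last_obs = -Inf and a warning from max(), which is itself a useful flag.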


Strategies

  • Write filters as readable statements (one condition per argument; filter() combines them with AND).
  • Use between() for time windows.
  • Use is.na() explicitly for missingness logic.
  • Enforce a standard ordering early: arrange(ID, TIME, EVID).
  • Build quick “spot checks” with slice_head() per subject.
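Two of these strategies are easy to verify directly. Comma-separated conditions in filter() behave like &. filter() also drops any row where the condition evaluates to NA, which is why `DV == NA` silently matches nothing and is.na() is needed. A minimal sketch:

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Commas in filter() combine conditions with AND
via_commas <- pk %>% filter(EVID == 0, TIME <= 1)
via_and    <- pk %>% filter(EVID == 0 & TIME <= 1)

# `DV == NA` is NA for every row, and filter() drops NA conditions: zero rows
wrong_missing <- pk %>% filter(DV == NA)

# is.na() returns TRUE/FALSE, so this finds the two dose rows with no DV
right_missing <- pk %>% filter(is.na(DV))
```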

Common Mistakes

  • Writing filter(A | B & C) without parentheses (R evaluates & before |, so this means A | (B & C))
  • Filtering on DV missingness and accidentally deleting event rows
  • Forgetting to ungroup() after grouped slicing, so later steps still run per subject
  • Assuming data are sorted when they are not
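The first and last mistakes can both be caught with a few lines of code. In the sketch below, the intent is "observation rows at TIME 0 or TIME 2". Without parentheses, the dose rows at TIME 0 slip in. The helper is_pmx_sorted() is hypothetical, written for this example: it relies on base R's order(), which returns 1, 2, ..., n only when the rows are already sorted by ID, TIME, and EVID (order() keeps full ties in their original positions).

```r
library(dplyr)

# Lesson data, trimmed to the columns used here (CMT, WT, SEX dropped)
pk <- tibble::tribble(
  ~ID, ~TIME, ~EVID, ~AMT,  ~DV,
    1,   0.0,    1,  100,   NA,
    1,   0.5,    0,   NA,  2.1,
    1,   1.0,    0,   NA,  3.8,
    1,   2.0,    0,   NA,  3.0,
    2,   0.0,    1,   80,   NA,
    2,   0.5,    0,   NA,  1.6,
    2,   1.0,    0,   NA,  2.9,
    2,   2.0,    0,   NA,  2.4
)

# Read as TIME == 0 | (TIME == 2 & EVID == 0): keeps the two dose rows too
unparenthesized <- pk %>% filter(TIME == 0 | TIME == 2 & EVID == 0)

# Parentheses state the intended grouping
parenthesized <- pk %>% filter((TIME == 0 | TIME == 2) & EVID == 0)

# Hypothetical helper: TRUE only if rows are sorted by ID, then TIME, then EVID
is_pmx_sorted <- function(df) {
  identical(order(df$ID, df$TIME, df$EVID), seq_len(nrow(df)))
}

is_pmx_sorted(pk)                     # original lesson order
is_pmx_sorted(pk[c(2, 1, 3:8), ])     # first two rows swapped
```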

Practice Problems

  1. Create a dataset containing only observation rows.
  2. Keep only observation rows with TIME <= 1.
  3. Create a QC view showing the first 2 rows per subject.
  4. Compute the last observation time per subject.
  5. Enforce PMx-safe ordering.

Suggested solutions:

# Observations only
obs <- pk %>% filter(EVID == 0)

# Observations within first hour
obs_1h <- pk %>% filter(EVID == 0, TIME <= 1)

# QC view
qc_view <- pk %>%
  arrange(ID, TIME, EVID) %>%
  group_by(ID) %>%
  slice_head(n = 2) %>%
  ungroup()

# Last observation time
last_obs <- pk %>%
  filter(EVID == 0) %>%
  group_by(ID) %>%
  summarise(last_time = max(TIME), .groups = "drop")

# Enforce ordering
pk <- pk %>% arrange(ID, TIME, EVID)

Summary

You now know how to:

  • Use filter() to target the correct rows intentionally.
  • Enforce consistent PMx ordering with arrange().
  • Use slice_*() functions for fast QC snapshots.
  • Protect modeling workflows from silent row-selection errors.

Row logic determines modeling logic.
Disciplined row control prevents subtle downstream mistakes.


  • Write filters clearly: filter(EVID == 0, TIME <= 1).
  • Use between() for windows.
  • Sort early and consistently.
  • Use grouped slicing for fast subject-level checks.