Indexing and Subsetting
Learn how to extract rows and columns safely in R using $, [], [[ ]], and logical indexing, with PMx examples.
Tip
Big idea: Many bugs in early R work come from extracting the wrong rows or columns. Learning indexing carefully prevents silent PMx errors.
Learning Objectives
By the end of this lesson, you will be able to:
- Use $, [[ ]], and [ ] correctly.
- Subset rows and columns intentionally.
- Apply logical indexing to filter data.
- Recognize common subsetting mistakes.
- Confidently extract PMx-relevant subsets (subjects, times, conditions).
Setup
library(tidyverse)
A Simple PMx Dataset
We’ll reuse a small PK-style table.
pk <- tibble(
ID = c(1, 1, 1, 2, 2, 2),
TIME = c(0.5, 1, 2, 0.5, 1, 2),
DV = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk
# A tibble: 6 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
4 2 0.5 1.6
5 2 1 2.9
6 2 2 2.4
Extracting Columns
Using $ (most common)
pk$DV
[1] 2.1 3.8 3.0 1.6 2.9 2.4
This returns a vector, not a data frame.
Using [[ ]] (programmatic and safe)
pk[["DV"]]
[1] 2.1 3.8 3.0 1.6 2.9 2.4
Use this inside functions or when column names are stored as strings.
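A minimal sketch of that use case, assuming a hypothetical helper `col_summary()` that receives the column name as a string. Inside the function, `data$col` would look for a column literally called `col`; `data[[col]]` instead looks up the name stored in the `col` argument.

```r
library(tibble)  # tibble() comes from the tibble package, part of the tidyverse

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Hypothetical helper: summarise any numeric column named by a string
col_summary <- function(data, col) {
  x <- data[[col]]  # uses the column whose name is stored in `col`
  c(min = min(x), max = max(x), mean = mean(x))
}

col_summary(pk, "DV")
col_summary(pk, "TIME")
```

The same function works for any column name you pass in, which is why [[ ]] is the usual choice inside reusable code.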
Using [ ] (returns a data frame)
pk["DV"]
# A tibble: 6 × 1
DV
<dbl>
1 2.1
2 3.8
3 3
4 1.6
5 2.9
6 2.4
This returns a tibble with one column.
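[ ] also accepts several names at once. A short sketch on the lesson's `pk` table (rebuilt here so the block runs on its own); the result keeps the columns in the order you list them. `keep_cols` is an illustrative name.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# A character vector of names selects several columns, in that order
pk[c("DV", "ID")]

# The same names also work in the column slot of [row, column]
pk[, c("DV", "ID")]

# Store the names once and reuse them (handy inside functions)
keep_cols <- c("ID", "TIME")
pk[keep_cols]
```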
Note
Remember:
- $ and [[ ]] → vectors
- [ ] → data frame
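If you are unsure which kind of object you got back, `class()` and `is.data.frame()` will tell you. A quick check on the lesson's `pk` table:

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

class(pk$DV)       # "numeric"
class(pk[["DV"]])  # "numeric"
class(pk["DV"])    # "tbl_df" "tbl" "data.frame"

# is.data.frame() gives a quick TRUE/FALSE answer
is.data.frame(pk$DV)     # FALSE
is.data.frame(pk["DV"])  # TRUE
```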
Extracting Rows
By position
pk[1, ] # first row
# A tibble: 1 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
pk[1:3, ] # first three rows
# A tibble: 3 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
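Positions can also be negative, which drops rows instead of keeping them. The sketch below also shows why position-based subsets are fragile: after the table is re-sorted, `[1, ]` refers to a different observation. `pk_sorted` is a name introduced here for illustration.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Negative positions drop rows instead of keeping them
pk[-1, ]      # everything except the first row (5 rows)
pk[-(1:3), ]  # drop the first three rows, leaving only ID 2

# Position-based subsets depend on the current row order:
# after sorting by DV, "row 1" is a different observation
pk_sorted <- pk[order(pk$DV), ]
pk_sorted[1, ]  # lowest DV (ID 2, TIME 0.5), not ID 1, TIME 0.5
```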
By column position
pk[, 1] # first column (a tibble stays a 6 × 1 tibble, not a vector)
# A tibble: 6 × 1
ID
<dbl>
1 1
2 1
3 1
4 2
5 2
6 2
pk[, 1:2] # first two columns (data frame)
# A tibble: 6 × 2
ID TIME
<dbl> <dbl>
1 1 0.5
2 1 1
3 1 2
4 2 0.5
5 2 1
6 2 2
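Whether `[, 1]` gives you a vector depends on the object. A base R data frame drops a single selected column to a vector by default (`drop = TRUE`), while a tibble always stays a tibble. A sketch comparing both on the same data; `pk_df` and the single-letter names are illustrative.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk_df <- as.data.frame(pk)  # plain base R data frame with the same data

a      <- pk[, 1]                     # tibble: stays a 6 x 1 tibble
b      <- pk_df[, 1]                  # data.frame: one column drops to a vector
b_keep <- pk_df[, 1, drop = FALSE]    # drop = FALSE keeps a data frame
d      <- pk[[1]]                     # [[ ]] returns a vector from either
```

If code must behave the same for tibbles and plain data frames, [[ ]] for one column (or drop = FALSE for a data frame) avoids the difference.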
Logical Indexing (Essential for PMx)
Filter rows based on conditions.
pk$DV > 2
[1] TRUE TRUE TRUE FALSE TRUE TRUE
Use that logical vector to subset rows:
pk[pk$DV > 2, ]
# A tibble: 5 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
4 2 1 2.9
5 2 2 2.4
This pattern is everywhere in PMx workflows.
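Because the logical vector is an ordinary object, you can store it and inspect it before subsetting. A small sketch; `keep` is an illustrative name.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

keep <- pk$DV > 2   # one TRUE/FALSE per row

sum(keep)           # 5: TRUE counts as 1, FALSE as 0
which(keep)         # 1 2 3 5 6: row positions where the condition holds
which(!keep)        # 4: the row that will be dropped

pk[keep, ]          # same result as pk[pk$DV > 2, ]
```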
PMx Examples
1) Subset a single subject
pk[pk$ID == 1, ]
# A tibble: 3 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
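Row conditions and column selection can be combined. A sketch for subject 1: the first form returns a one-column tibble, the second a plain numeric vector.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Row condition and column name in one [row, column] call
pk[pk$ID == 1, "DV"]   # tibble with one column and 3 rows

# Pull the column first, then index it with the same logical vector
pk$DV[pk$ID == 1]      # numeric vector of the 3 DV values for ID 1
```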
2) Subset a time window
pk[pk$TIME <= 1, ]
# A tibble: 4 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 2 0.5 1.6
4 2 1 2.9
3) Combine conditions
pk[pk$ID == 2 & pk$DV > 2, ]
# A tibble: 2 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 2 1 2.9
2 2 2 2.4
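`&` is only one of the logical operators. A sketch of `|` (either condition) and `!` (negation) on the same table:

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# OR: keep rows matching either condition
pk[pk$TIME == 0.5 | pk$DV > 3, ]  # 3 rows: both 0.5 h samples plus DV 3.8

# NOT: flip a condition with !
pk[!(pk$ID == 1), ]               # same rows as pk[pk$ID != 1, ]

# Use the vectorised & and | for row filters; && and || expect
# single TRUE/FALSE values (recent R versions error on longer vectors)
```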
4) Use %in% for multiple values
pk[pk$ID %in% c(1, 2), ]
# A tibble: 6 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
4 2 0.5 1.6
5 2 1 2.9
6 2 2 2.4
This scales better than chaining several == tests with | (for example, pk$ID == 1 | pk$ID == 2 | pk$ID == 5).
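A common slip is to write `==` with several values. The sketch below shows why that silently returns the wrong rows: `==` compares element by element and recycles the shorter vector. `wrong` and `right` are illustrative names.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Tempting but wrong: c(1, 2) is recycled along the 6 rows, so R checks
# ID[1] == 1, ID[2] == 2, ID[3] == 1, ID[4] == 2, ...
wrong <- pk[pk$ID == c(1, 2), ]
nrow(wrong)  # 4, not 6, and no warning because 6 is a multiple of 2

# %in% asks "is this row's ID anywhere in the set?"
right <- pk[pk$ID %in% c(1, 2), ]
nrow(right)  # 6
```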
Missing Values in Indexing
Be careful with NA.
pk_na <- pk
pk_na$DV[2] <- NA
pk_na[pk_na$DV > 2, ]
# A tibble: 5 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 NA NA NA
3 1 2 3
4 2 1 2.9
5 2 2 2.4
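The row does not simply disappear. Because NA > 2 is NA (not TRUE or FALSE), [ ] returns a row filled with NA in every column, including ID and TIME, so the result silently contains a fake record.
Safer: exclude missing values explicitly before comparing.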
pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ]
# A tibble: 4 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 2 3
3 2 1 2.9
4 2 2 2.4
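Two further habits help here: count the missing values before filtering, and use `which()`, which returns only the positions where the condition is `TRUE`. A sketch on the same `pk_na` table. If you filter with the tidyverse instead, `dplyr::filter()` likewise keeps only rows where the condition is `TRUE`, so it also drops the NA row.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk_na <- pk
pk_na$DV[2] <- NA

# Count missing values first so nothing is dropped unnoticed
sum(is.na(pk_na$DV))  # 1

# which() keeps only TRUE positions, so NA comparisons are left out
pk_na[which(pk_na$DV > 2), ]  # 4 rows, no all-NA row
```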
Strategies
- Use logical conditions instead of row numbers whenever possible.
- Check results with nrow() and head().
- Be explicit about handling NA.
- Prefer clarity over clever one-liners.
- When in doubt, inspect intermediate objects.
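The checking habits above can be sketched on a single-subject subset. The `stopifnot()` guard is one optional way to make an unexpected subset fail loudly; it is an illustration, not a required step. `subj1` is an illustrative name.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

subj1 <- pk[pk$ID == 1, ]

nrow(subj1)       # expect 3 rows for ID 1
head(subj1)       # eyeball the first rows
unique(subj1$ID)  # should contain only 1

# Optional guard: stop immediately if the subset is not what you expect
stopifnot(nrow(subj1) > 0, all(subj1$ID == 1))
```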
Practice Problems
- Extract the DV column as a vector.
- Extract the DV column as a tibble.
- Subset rows where ID == 1.
- Subset rows where TIME > 1.
- Subset rows where DV > 2 and ID == 2.
- Repeat one subset safely when DV contains NA.
Tip: Step-by-Step Solutions
pk$DV # 1) DV as a vector
[1] 2.1 3.8 3.0 1.6 2.9 2.4
pk["DV"] # 2) DV as a tibble
# A tibble: 6 × 1
DV
<dbl>
1 2.1
2 3.8
3 3
4 1.6
5 2.9
6 2.4
pk[pk$ID == 1, ] # 3) rows for ID 1
# A tibble: 3 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 1 3.8
3 1 2 3
pk[pk$TIME > 1, ] # 4) rows with TIME > 1
# A tibble: 2 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 2 3
2 2 2 2.4
pk[pk$DV > 2 & pk$ID == 2, ] # 5) DV > 2 for ID 2
# A tibble: 2 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 2 1 2.9
2 2 2 2.4
pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ] # 6) NA-safe version of DV > 2
# A tibble: 4 × 3
ID TIME DV
<dbl> <dbl> <dbl>
1 1 0.5 2.1
2 1 2 3
3 2 1 2.9
4 2 2 2.4
Summary
You now know how to:
- extract columns and rows intentionally
- choose the right indexing tool for the job
- use logical conditions for PMx-style filtering
- avoid silent subsetting errors
Indexing is a core skill you’ll use every day in PMx workflows.
Tip: Quick Tips
- $ is quick; [[ ]] is safer in functions.
- [ ] keeps data frames intact.
- Logical indexing beats row numbers.
- Always think about NA when filtering.