Indexing and Subsetting

Learn how to extract rows and columns safely in R using $, [[ ]], [ ], and logical indexing, with PMx examples.
Tip

Big idea: Most bugs in early R work come from extracting the wrong rows or columns. Learning indexing carefully prevents silent PMx errors.

Learning Objectives

By the end of this lesson, you will be able to:

  • Use $, [[ ]], and [ ] correctly.
  • Subset rows and columns intentionally.
  • Apply logical indexing to filter data.
  • Recognize common subsetting mistakes.
  • Confidently extract PMx-relevant subsets (subjects, times, conditions).

Setup

library(tidyverse)

A Simple PMx Dataset

We’ll reuse a small PK-style table.

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

pk
# A tibble: 6 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   0.5   1.6
5     2   1     2.9
6     2   2     2.4

Extracting Columns

Using $ (most common)

pk$DV
[1] 2.1 3.8 3.0 1.6 2.9 2.4

This returns a vector, not a data frame.


Using [[ ]] (programmatic and safe)

pk[["DV"]]
[1] 2.1 3.8 3.0 1.6 2.9 2.4

Use this inside functions or when column names are stored as strings.
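
A minimal sketch of the programmatic case, assuming a hypothetical helper named col_mean() (it is not part of this lesson or of any package). It uses a base data.frame so it runs without extra packages; the same [[ ]] call works on a tibble.

```r
# Same values as the lesson's pk table, stored as a base data.frame
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Hypothetical helper: mean of a column whose name arrives as a string
col_mean <- function(data, col) {
  stopifnot(col %in% names(data))  # fail loudly on a misspelled column name
  mean(data[[col]])                # data$col would look for a column literally named "col"
}

target <- "DV"
col_mean(pk, target)
```

The stopifnot() check is a design choice for this sketch: a wrong column name stops the function instead of quietly returning nothing useful.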


Using [ ] (returns a data frame)

pk["DV"]
# A tibble: 6 × 1
     DV
  <dbl>
1   2.1
2   3.8
3   3  
4   1.6
5   2.9
6   2.4

This returns a tibble with one column.

Note

Remember:

  • $ and [[ ]] → vectors
  • [ ] → data frame
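
One way to see why the distinction matters, sketched with a base data.frame copy of the lesson table (variable names dv_vec and dv_df are illustrative): functions such as mean() expect a vector and do not accept a whole data frame.

```r
# Same values as the lesson's pk table, stored as a base data.frame
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

dv_vec <- pk[["DV"]]  # numeric vector
dv_df  <- pk["DV"]    # one-column data frame

is.numeric(dv_vec)     # TRUE
is.data.frame(dv_df)   # TRUE

mean(dv_vec)                   # works: mean() receives a numeric vector
suppressWarnings(mean(dv_df))  # NA: mean() warns that a data frame is not numeric
```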

Extracting Rows

By position

pk[1, ]        # first row
# A tibble: 1 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
pk[1:3, ]     # first three rows
# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  

By column position

pk[, 1]        # first column (still a tibble; a base data.frame would give a vector)
# A tibble: 6 × 1
     ID
  <dbl>
1     1
2     1
3     1
4     2
5     2
6     2
pk[, 1:2]      # first two columns (data frame)
# A tibble: 6 × 2
     ID  TIME
  <dbl> <dbl>
1     1   0.5
2     1   1  
3     1   2  
4     2   0.5
5     2   1  
6     2   2  
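
Single-column selection with [ , j] is one place where tibbles and base data frames differ. The sketch below, which assumes the tibble package is installed and uses the illustrative names pk_tbl and pk_df, compares the two side by side.

```r
library(tibble)

pk_tbl <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk_df <- as.data.frame(pk_tbl)  # base data.frame copy of the same data

class(pk_tbl[, 1])               # tibble classes: a single column stays a tibble
class(pk_df[, 1])                # "numeric": a base data.frame drops to a vector
class(pk_df[, 1, drop = FALSE])  # "data.frame": drop = FALSE keeps the frame

pk_tbl[[1]]                      # explicit vector extraction, same for both types
```

If a vector is what you want, [[ ]] says so explicitly and behaves the same on both object types.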

Logical Indexing (Essential for PMx)

Filter rows based on conditions.

pk$DV > 2
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

Use that logical vector to subset rows:

pk[pk$DV > 2, ]
# A tibble: 5 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   1     2.9
5     2   2     2.4

This pattern is everywhere in PMx workflows.
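
Since the setup loads the tidyverse, the same filter can also be written with dplyr::filter(). This is a sketch for comparison, not a replacement for understanding the base pattern; base_way and dplyr_way are illustrative names.

```r
library(dplyr)

pk <- tibble::tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

base_way  <- pk[pk$DV > 2, ]     # logical vector inside [ ]
dplyr_way <- filter(pk, DV > 2)  # same condition, column referenced by bare name

nrow(base_way) == nrow(dplyr_way)  # both keep the same rows for this table
```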


PMx Examples

1) Subset a single subject

pk[pk$ID == 1, ]
# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  

2) Subset a time window

pk[pk$TIME <= 1, ]
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9

3) Combine conditions

pk[pk$ID == 2 & pk$DV > 2, ]
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2     1   2.9
2     2     2   2.4

%in% for multiple values

pk[pk$ID %in% c(1, 2), ]
# A tibble: 6 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   0.5   1.6
5     2   1     2.9
6     2   2     2.4

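Here both subjects match, so all six rows come back. With longer lists of values, %in% scales better than chaining several == comparisons with |.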
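
A small sketch of the difference, using TIME values from the lesson table (keep_times and the by_* names are illustrative). It also shows a common slip: comparing with == against a vector of values recycles that vector instead of testing membership.

```r
# Same values as the lesson's pk table, stored as a base data.frame
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

keep_times <- c(0.5, 2)

by_in <- pk[pk$TIME %in% keep_times, ]         # membership test
by_eq <- pk[pk$TIME == 0.5 | pk$TIME == 2, ]   # chained == with |
identical(by_in, by_eq)                        # same rows either way

# Pitfall: == recycles keep_times along TIME (0.5, 2, 0.5, 2, ...),
# so rows are compared pairwise and matches are silently missed
by_recycled <- pk[pk$TIME == keep_times, ]
nrow(by_in)        # 4
nrow(by_recycled)  # 2
```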


Missing Values in Indexing

Be careful with NA.

pk_na <- pk
pk_na$DV[2] <- NA

pk_na[pk_na$DV > 2, ]
# A tibble: 5 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2    NA  NA    NA  
3     1   2     3  
4     2   1     2.9
5     2   2     2.4

Row 2 does not disappear: it comes back as a row of NA values, because NA > 2 is NA (neither TRUE nor FALSE) and indexing with NA returns an NA row.

Safer:

pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ]
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   2     3  
3     2   1     2.9
4     2   2     2.4
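
Two base R alternatives that also drop the NA case are which() and subset(). The sketch below uses a base data.frame copy of pk_na and an illustrative variable named cond.

```r
# Same values as the lesson's pk_na table, stored as a base data.frame
pk_na <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)
pk_na$DV[2] <- NA

cond <- pk_na$DV > 2         # logical vector containing one NA

nrow(pk_na[cond, ])          # 5: four real rows plus one all-NA row
nrow(pk_na[which(cond), ])   # 4: which() returns only the TRUE positions
nrow(subset(pk_na, DV > 2))  # 4: subset() treats NA as FALSE
```

Whichever form you pick, the point is to decide on purpose what happens to records with a missing DV.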

Strategies

  • Use logical conditions instead of row numbers whenever possible.
  • Check results with nrow() and head().
  • Be explicit about handling NA.
  • Prefer clarity over clever one-liners.
  • When in doubt, inspect intermediate objects.
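
These habits can be sketched as a short checking routine. The names keep and subj1_early are illustrative, and the table is a base data.frame copy of pk.

```r
# Same values as the lesson's pk table, stored as a base data.frame
pk <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Store the condition as a named intermediate object so it can be inspected
keep <- pk$ID == 1 & pk$TIME <= 1

anyNA(keep)  # FALSE here; TRUE would mean all-NA rows could leak into the result
sum(keep)    # number of rows expected to survive the filter

subj1_early <- pk[keep, ]
stopifnot(nrow(subj1_early) == sum(keep))  # row count matches the expectation
head(subj1_early)
```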

Practice Problems

  1. Extract the DV column as a vector.
  2. Extract the DV column as a tibble.
  3. Subset rows where ID == 1.
  4. Subset rows where TIME > 1.
  5. Subset rows where DV > 2 and ID == 2.
  6. Repeat one subset safely when DV contains NA.

Solutions, in the same order (the last one uses pk_na from the previous section):

pk$DV
[1] 2.1 3.8 3.0 1.6 2.9 2.4
pk["DV"]
# A tibble: 6 × 1
     DV
  <dbl>
1   2.1
2   3.8
3   3  
4   1.6
5   2.9
6   2.4
pk[pk$ID == 1, ]
# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
pk[pk$TIME > 1, ]
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1     2   3  
2     2     2   2.4
pk[pk$DV > 2 & pk$ID == 2, ]
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2     1   2.9
2     2     2   2.4
pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ]
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   2     3  
3     2   1     2.9
4     2   2     2.4

Summary

You now know how to:

  • extract columns and rows intentionally
  • choose the right indexing tool for the job
  • use logical conditions for PMx-style filtering
  • avoid silent subsetting errors

Indexing is a core skill you’ll use every day in PMx workflows.


  • $ is quick; [[ ]] is safer in functions.
  • [ ] keeps data frames intact.
  • Logical indexing beats row numbers.
  • Always think about NA when filtering.