Indexing and Subsetting

Learn how to extract rows and columns safely in R using $, [ ], [[ ]], and logical indexing, with pharmacometrics (PMx) examples.

Tip

Big idea: Most bugs in early R work come from extracting the wrong rows or columns. Learning indexing carefully prevents silent PMx errors.

Learning Objectives

By the end of this lesson, you will be able to:

Use $, [[ ]], and [ ] correctly.
Subset rows and columns intentionally.
Apply logical indexing to filter data.
Recognize common subsetting mistakes.
Confidently extract PMx-relevant subsets (subjects, times, conditions).

Setup

library(tidyverse)

A Simple PMx Dataset

We’ll reuse a small PK-style table.

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

pk

# A tibble: 6 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   0.5   1.6
5     2   1     2.9
6     2   2     2.4

Extracting Columns

Using `$` (most common)

pk$DV

[1] 2.1 3.8 3.0 1.6 2.9 2.4

This returns a vector, not a data frame.

Using `[[ ]]` (programmatic and safe)

pk[["DV"]]

[1] 2.1 3.8 3.0 1.6 2.9 2.4

Use this inside functions or when column names are stored as strings.
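As a minimal sketch of that use case, the helper below takes the column name as a string and looks it up with [[ ]]. The name col_max() is hypothetical and only for illustration; it is not part of any package. Writing data$col inside the function would not work, because $ looks for a column literally named "col".

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

# Hypothetical helper: largest value in the column whose name is stored in `col`
col_max <- function(data, col) {
  max(data[[col]], na.rm = TRUE)
}

col_max(pk, "DV")    # 3.8
col_max(pk, "TIME")  # 2
```

The same function works for any column because the name is passed as data rather than typed into the code.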

Using `[ ]` (returns a data frame)

pk["DV"]

# A tibble: 6 × 1
     DV
  <dbl>
1   2.1
2   3.8
3   3  
4   1.6
5   2.9
6   2.4

This returns a tibble with one column.

Note

Remember:

$ and [[ ]] → vectors
[ ] → data frame

Extracting Rows

By position

pk[1, ]        # first row

# A tibble: 1 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1

pk[1:3, ]     # first three rows

# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3

By column position

pk[, 1]        # first column (still a tibble, not a vector)

# A tibble: 6 × 1
     ID
  <dbl>
1     1
2     1
3     1
4     2
5     2
6     2

pk[, 1:2]      # first two columns (data frame)

# A tibble: 6 × 2
     ID  TIME
  <dbl> <dbl>
1     1   0.5
2     1   1  
3     1   2  
4     2   0.5
5     2   1  
6     2   2
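One detail worth checking yourself: selecting a single column by position behaves differently for tibbles and base data frames. The sketch below uses a shortened copy of the table (pk_df is an illustrative name) to compare the two.

```r
library(tibble)

pk <- tibble(ID = c(1, 1, 2), DV = c(2.1, 3.8, 1.6))
pk_df <- as.data.frame(pk)   # same data as a base data frame

is.data.frame(pk[, 1])                   # TRUE: a tibble keeps its shape
is.data.frame(pk_df[, 1])                # FALSE: a base data frame drops to a vector
is.data.frame(pk_df[, 1, drop = FALSE])  # TRUE: drop = FALSE keeps the data frame
```

If code has to work with both kinds of object, pk[[1]] for a vector and pk[1] for a one-column table give the same kind of result either way.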

Logical Indexing (Essential for PMx)

Filter rows based on conditions.

pk$DV > 2

[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

Use that logical vector to subset rows:

pk[pk$DV > 2, ]

# A tibble: 5 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   1     2.9
5     2   2     2.4

This pattern is everywhere in PMx workflows.
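Before subsetting, it can help to inspect the logical vector on its own. A small sketch using the same DV values as pk (the names dv and keep are illustrative): sum() counts the TRUE values and which() gives their positions.

```r
dv <- c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)  # same DV values as pk

keep <- dv > 2   # logical vector, one element per row

sum(keep)    # 5: number of rows that will be kept
which(keep)  # 1 2 3 5 6: positions of those rows
```

If the count is not what you expect, the condition is wrong, and you find out before the subset is used anywhere else.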

PMx Examples

1) Subset a single subject

pk[pk$ID == 1, ]

# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3

2) Subset a time window

pk[pk$TIME <= 1, ]

# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9

3) Combine conditions

pk[pk$ID == 2 & pk$DV > 2, ]

# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2     1   2.9
2     2     2   2.4

`%in%` for multiple values

pk[pk$ID %in% c(1, 2), ]

# A tibble: 6 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3  
4     2   0.5   1.6
5     2   1     2.9
6     2   2     2.4

This scales better than chaining several == comparisons together with |.
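A related trap is writing == with a vector of values on the right-hand side. The sketch below uses the same ID values as pk (id and wanted are illustrative names) to show why %in% is the right tool: == recycles the shorter vector and compares element by element, without any warning here.

```r
id <- c(1, 1, 1, 2, 2, 2)  # same ID values as pk
wanted <- c(1, 2)

sum(id %in% wanted)  # 6: every row matches one of the wanted IDs
sum(id == wanted)    # 4: id is compared against 1, 2, 1, 2, 1, 2 in turn
```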

Missing Values in Indexing

Be careful with NA.

pk_na <- pk
pk_na$DV[2] <- NA

pk_na[pk_na$DV > 2, ]

# A tibble: 5 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2    NA  NA    NA  
3     1   2     3  
4     2   1     2.9
5     2   2     2.4

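Row 2 comes back as a row of NA values instead of being dropped, because NA > 2 is NA, not TRUE or FALSE, and [ ] returns an all-NA row for every NA in the index.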

Safer:

pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ]

# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   2     3  
3     2   1     2.9
4     2   2     2.4
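Two other common ways to get the same NA-safe result are sketched below: wrapping the condition in which(), which returns only the positions that are TRUE, and dplyr::filter(), which keeps only rows where the condition is TRUE. The object names by_which and by_filter are illustrative.

```r
library(tibble)
library(dplyr)

pk_na <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, NA, 3.0, 1.6, 2.9, 2.4)
)

NA > 2  # NA: the comparison is neither TRUE nor FALSE

by_which  <- pk_na[which(pk_na$DV > 2), ]  # which() skips NA positions
by_filter <- dplyr::filter(pk_na, DV > 2)  # filter() drops rows where the condition is NA

nrow(by_which)   # 4
nrow(by_filter)  # 4
```

Both approaches drop the NA row silently, so the explicit !is.na() version can still be the better choice when you want the handling of missing values to be visible in the code.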

Strategies

Use logical conditions instead of row numbers whenever possible.
Check results with nrow() and head().
Be explicit about handling NA.
Prefer clarity over clever one-liners.
When in doubt, inspect intermediate objects.
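These habits can be turned into a few lines of checking code. The sketch below is one possible pattern, not a fixed recipe (keep and sub1 are illustrative names): store the condition, subset, then assert what the result should look like.

```r
library(tibble)

pk <- tibble(
  ID   = c(1, 1, 1, 2, 2, 2),
  TIME = c(0.5, 1, 2, 0.5, 1, 2),
  DV   = c(2.1, 3.8, 3.0, 1.6, 2.9, 2.4)
)

keep <- pk$ID == 1   # intermediate object: inspect it before using it
sub1 <- pk[keep, ]

stopifnot(
  !anyNA(keep),             # the condition contains no NA
  nrow(sub1) == sum(keep),  # one output row per TRUE
  all(sub1$ID == 1)         # only the requested subject is present
)

nrow(sub1)  # 3
head(sub1)
```

stopifnot() stops with an error as soon as one check fails, which turns a silent subsetting mistake into a visible one.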

Practice Problems

Extract the DV column as a vector.
Extract the DV column as a tibble.
Subset rows where ID == 1.
Subset rows where TIME > 1.
Subset rows where DV > 2 and ID == 2.
Repeat one subset safely when DV contains NA.

Step-by-Step Solutions

pk$DV

[1] 2.1 3.8 3.0 1.6 2.9 2.4

pk["DV"]

# A tibble: 6 × 1
     DV
  <dbl>
1   2.1
2   3.8
3   3  
4   1.6
5   2.9
6   2.4

pk[pk$ID == 1, ]

# A tibble: 3 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     1   2     3

pk[pk$TIME > 1, ]

# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1     2   3  
2     2     2   2.4

pk[pk$DV > 2 & pk$ID == 2, ]

# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2     1   2.9
2     2     2   2.4

pk_na[!is.na(pk_na$DV) & pk_na$DV > 2, ]

# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   2     3  
3     2   1     2.9
4     2   2     2.4

Summary

You now know how to:

extract columns and rows intentionally
choose the right indexing tool for the job
use logical conditions for PMx-style filtering
avoid silent subsetting errors

Indexing is a core skill you’ll use every day in PMx workflows.

Quick Tips

$ is quick; [[ ]] is safer in functions.
[ ] keeps data frames intact.
Logical indexing beats row numbers.
Always think about NA when filtering.
