Binding Data

Learn how to stack and combine tables safely using bind_rows() and bind_cols() in pharmacometrics (PMx) workflows.

Tip

What you’ll build today: reliable patterns for stacking datasets and combining aligned tables without using relational joins.

Learning Objectives

By the end of this lesson, you will be able to:

Use bind_rows() to stack compatible datasets vertically.
Use bind_cols() to attach tables side-by-side safely.
Understand how binding differs conceptually from joining.
Detect structural mismatches before binding.
Apply binding safely in common PMx workflows (multi-cohort, batch outputs, simulations).

Setup

library(tidyverse)

Example Datasets

Cohort 1

cohort1 <- tibble::tribble(
  ~ID, ~TIME, ~DV,
    1,  0.5,  2.1,
    1,  1.0,  3.8
)

cohort1

# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8

Cohort 2

cohort2 <- tibble::tribble(
  ~ID, ~TIME, ~DV,
    2,  0.5,  1.6,
    2,  1.0,  2.9
)

cohort2

# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2   0.5   1.6
2     2   1     2.9

Key Ideas

Binding is different from joining.

Joining matches rows using keys.
Binding simply stacks or attaches tables.

Use binding when:

Datasets have the same structure.
You want to append new records.
You are combining batches or cohorts.
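Batch outputs often arrive as a list of tables rather than as separately named objects. The sketch below assumes a hypothetical helper, make_batch(), that returns one small tibble per dose level; the dose values and the formula for DV are illustrative only. It shows that bind_rows() accepts a list of data frames directly.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: builds one small batch for a given dose level.
# The DV formula is a placeholder, not a real PK model.
make_batch <- function(dose) {
  tibble(
    DOSE = dose,
    TIME = c(0.5, 1.0),
    DV   = dose * exp(-0.2 * c(0.5, 1.0))
  )
}

# A list of three tibbles with identical structure
batches <- lapply(c(10, 20, 40), make_batch)

# bind_rows() stacks every element of the list in order
sim_all <- bind_rows(batches)
sim_all
```

Because each batch already carries its own DOSE column, the origin of every row remains traceable after stacking.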

Warning

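Binding checks only structural compatibility (column types for bind_rows(), row counts for bind_cols()), not logical consistency. It will not notice duplicated subjects, mismatched units, or misordered rows, so always verify column names and types first.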
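One way to run that verification is to compare names and classes before binding. This is a minimal sketch using base R set operations; cohort_a and cohort_b are hypothetical tables built with a deliberate name mismatch (TIME vs Time) and a type mismatch (numeric vs character ID).

```r
library(tibble)

# Hypothetical tables with deliberate structural differences
cohort_a <- tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))
cohort_b <- tibble(ID = c("2", "2"), Time = c(0.5, 1.0), DV = c(1.6, 2.9))

# Columns present in one table but not the other
only_in_a <- setdiff(names(cohort_a), names(cohort_b))
only_in_b <- setdiff(names(cohort_b), names(cohort_a))

# Shared columns whose first class differs between the tables
shared  <- intersect(names(cohort_a), names(cohort_b))
class_a <- vapply(cohort_a[shared], function(x) class(x)[1], character(1))
class_b <- vapply(cohort_b[shared], function(x) class(x)[1], character(1))
type_mismatch <- shared[class_a != class_b]

only_in_a      # "TIME"
only_in_b      # "Time"
type_mismatch  # "ID"
```

If all three results are empty, the two tables are structurally ready for bind_rows().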

Worked Example 1: Stacking Cohorts with bind_rows()

pk_all <- bind_rows(cohort1, cohort2)
pk_all

# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9

Add a cohort label:

cohort1_labeled <- cohort1 %>% mutate(COHORT = "A")
cohort2_labeled <- cohort2 %>% mutate(COHORT = "B")

pk_all_labeled <- bind_rows(cohort1_labeled, cohort2_labeled)
pk_all_labeled

# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr> 
1     1   0.5   2.1 A     
2     1   1     3.8 A     
3     2   0.5   1.6 B     
4     2   1     2.9 B
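An alternative to labeling each table by hand is the .id argument of bind_rows(), which takes the label from the names of the inputs. The sketch below rebuilds the two cohorts so that it runs on its own; pk_by_id is an illustrative name.

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(
  ~ID, ~TIME, ~DV,
    1,  0.5,  2.1,
    1,  1.0,  3.8
)

cohort2 <- tribble(
  ~ID, ~TIME, ~DV,
    2,  0.5,  1.6,
    2,  1.0,  2.9
)

# The list names ("A", "B") become the values of the new COHORT column
pk_by_id <- bind_rows(list(A = cohort1, B = cohort2), .id = "COHORT")
pk_by_id
```

With .id, the label column is placed first rather than last, and it is always a character column.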

Worked Example 2: Handling Missing Columns

If one table has an extra column:

cohort3 <- tibble::tribble(
  ~ID, ~TIME, ~DV, ~ARM,
    3,  0.5,  2.5, "Test"
)

bind_rows(cohort1, cohort3)

# A tibble: 3 × 4
     ID  TIME    DV ARM  
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA> 
2     1   1     3.8 <NA> 
3     3   0.5   2.5 Test

bind_rows() matches columns by name. Any column that is absent from one table is filled with NA for that table's rows.
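Because matching is by name, column order does not have to agree between tables. A short sketch, using a hypothetical one-row table (cohort_reordered) whose columns are in a different order:

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(
  ~ID, ~TIME, ~DV,
    1,  0.5,  2.1,
    1,  1.0,  3.8
)

# Same columns as cohort1, different order (hypothetical data)
cohort_reordered <- tibble(DV = 1.9, ID = 4, TIME = 0.5)

# Values land in the correct columns; the result keeps the
# column order of the first table
stacked <- bind_rows(cohort1, cohort_reordered)
stacked
```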

Worked Example 3: Combining Columns with bind_cols()

demographics <- tibble::tribble(
  ~WT, ~SEX,
   72, "F",
   72, "F",
   88, "M",
   88, "M"
)

bind_cols(pk_all, demographics)

# A tibble: 4 × 5
     ID  TIME    DV    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F    
2     1   1     3.8    72 F    
3     2   0.5   1.6    88 M    
4     2   1     2.9    88 M

Warning

bind_cols() attaches by row position, not by ID. The only check it makes is that the tables have the same number of rows, so use it only when row order is guaranteed to be aligned.
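To see how this goes wrong, consider the same tables after the PK rows have been re-sorted upstream. This is an illustrative sketch; pk_resorted, misaligned, and size_check are names introduced here.

```r
library(dplyr)
library(tibble)

pk_all <- tibble(
  ID   = c(1, 1, 2, 2),
  TIME = c(0.5, 1.0, 0.5, 1.0),
  DV   = c(2.1, 3.8, 1.6, 2.9)
)

demographics <- tibble(
  WT  = c(72, 72, 88, 88),
  SEX = c("F", "F", "M", "M")
)

# Same PK data, but sorted by descending ID somewhere upstream
pk_resorted <- arrange(pk_all, desc(ID))

# No error and no warning: ID 2 now carries the WT and SEX of ID 1
misaligned <- bind_cols(pk_resorted, demographics)
misaligned

# A row-count mismatch, by contrast, does stop with an error
size_check <- try(bind_cols(pk_all, demographics[1:3, ]), silent = TRUE)
inherits(size_check, "try-error")
```

The row-count check catches truncated tables, but nothing catches a reordered one.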

Binding vs Joining (Conceptual Contrast)

Operation	Purpose
left_join(), inner_join(), ...	Match rows by keys
bind_rows()	Stack tables vertically
bind_cols()	Attach columns by position

If you need matching by ID, use a join — not bind_cols().
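A minimal sketch of the join alternative, assuming a hypothetical subject-level table (subject_covs) with one row per ID, deliberately listed in a different order from the PK data:

```r
library(dplyr)
library(tibble)

pk_all <- tibble(
  ID   = c(1, 1, 2, 2),
  TIME = c(0.5, 1.0, 0.5, 1.0),
  DV   = c(2.1, 3.8, 1.6, 2.9)
)

# Hypothetical covariates: one row per subject, in arbitrary order
subject_covs <- tibble(
  ID  = c(2, 1),
  WT  = c(88, 72),
  SEX = c("M", "F")
)

# Rows are matched on ID, so row order in either table is irrelevant
pk_with_covs <- left_join(pk_all, subject_covs, by = "ID")
pk_with_covs
```

Each PK record receives the covariates of its own subject regardless of how either table is sorted.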

Strategies

Use colnames() and glimpse() before binding.
Ensure column names and types match. Column order does not need to match, because bind_rows() aligns columns by name.
Prefer bind_rows() for stacking similar datasets.
Use bind_cols() only when rows are already aligned.
Add identifiers (e.g., cohort labels) before stacking.

Common Mistakes

Assuming bind_rows() will match rows by ID (it does not — it simply stacks datasets in order).
Using bind_cols() when datasets are not perfectly aligned, leading to incorrect row-level combinations.
Ignoring differences in column names (e.g., TIME vs Time), which creates unintended new columns filled with NA.
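Overlooking mismatched column types: incompatible types (e.g., numeric vs character) make bind_rows() stop with an error, while compatible ones (e.g., integer vs double) are silently promoted to a common type, which can cause downstream issues.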
Forgetting to add identifiers (like COHORT) before stacking, making it impossible to trace data origin later.
Assuming missing columns indicate an error, rather than understanding that bind_rows() fills them with NA by design.
Binding datasets without checking structure first (glimpse(), colnames()), leading to subtle data integrity problems.
Treating binding as interchangeable with joins — instead of recognizing that joins are required when matching by keys.
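The type-mismatch case can be reproduced in a few lines. The sketch below assumes a recent dplyr version and uses a hypothetical table (cohort_chr) whose ID column is character. It captures the error message instead of halting, then shows one possible fix: converting ID explicitly before stacking.

```r
library(dplyr)
library(tibble)

cohort1 <- tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))

# Hypothetical cohort whose IDs were read in as text
cohort_chr <- tibble(ID = "S-03", TIME = 0.5, DV = 2.5)

# Numeric and character ID columns cannot be combined;
# keep the error message rather than stopping the script
result <- tryCatch(
  bind_rows(cohort1, cohort_chr),
  error = function(e) conditionMessage(e)
)
is.character(result)  # TRUE: an error message, not a tibble

# One possible fix: make the types agree explicitly, then stack
fixed <- bind_rows(
  mutate(cohort1, ID = as.character(ID)),
  cohort_chr
)
fixed
```

Whether ID should be character or numeric is a study-level decision; the point is to make the conversion explicit rather than to rely on the bind.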

Practice Problems

Stack cohort1 and cohort2 using bind_rows().
Add a COHORT label before stacking.
Create a table with an extra column and observe how bind_rows() handles it.
Create a second table with four rows (one per row of the stacked pk_all) and attach it using bind_cols().
Explain why bind_cols() can be dangerous in PMx workflows.

Step-by-Step Solutions

# 1
bind_rows(cohort1, cohort2)

# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9

# 2
bind_rows(
  cohort1 %>% mutate(COHORT = "A"),
  cohort2 %>% mutate(COHORT = "B")
)

# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr> 
1     1   0.5   2.1 A     
2     1   1     3.8 A     
3     2   0.5   1.6 B     
4     2   1     2.9 B

# 3
cohort_extra <- tibble(ID = 3, TIME = 1.5, DV = 2.7, ARM = "Test")
bind_rows(cohort1, cohort_extra)

# A tibble: 3 × 4
     ID  TIME    DV ARM  
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA> 
2     1   1     3.8 <NA> 
3     3   1.5   2.7 Test

# 4
extra_cols <- tibble(WT = c(72, 88, 65, 70), SEX = c("F", "M", "F", "M"))
bind_cols(pk_all, extra_cols)

# A tibble: 4 × 5
     ID  TIME    DV    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F    
2     1   1     3.8    88 M    
3     2   0.5   1.6    65 F    
4     2   1     2.9    70 M

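# 5
# bind_cols() does not match by ID; it attaches by row order only.
# In solution 4, for example, ID 1 ends up with two different WT and SEX
# values, and bind_cols() raises no error or warning about it.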

Summary

You now know how to:

Stack compatible datasets using bind_rows().
Attach aligned tables using bind_cols().
Distinguish binding from relational joins.
Detect structural issues before combining tables.

Binding is simple — but simplicity can hide mistakes. Use it intentionally.

Quick Tips

Use bind_rows() for stacking.
Add identifiers before stacking cohorts.
Use bind_cols() only when row order is guaranteed aligned.
Binding does not match on keys — joins do.