Binding Data

Learn how to stack and combine tables safely using bind_rows() and bind_cols() in PMx workflows.
Tip

What you’ll build today: reliable patterns for stacking datasets and combining aligned tables without using relational joins.

Learning Objectives

By the end of this lesson, you will be able to:

  • Use bind_rows() to stack compatible datasets vertically.
  • Use bind_cols() to attach tables side-by-side safely.
  • Understand how binding differs conceptually from joining.
  • Detect structural mismatches before binding.
  • Apply binding safely in common PMx workflows (multi-cohort, batch outputs, simulations).

Setup

library(tidyverse)

Example Datasets

Cohort 1

cohort1 <- tibble::tribble(
  ~ID, ~TIME, ~DV,
    1,  0.5,  2.1,
    1,  1.0,  3.8
)

cohort1
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8

Cohort 2

cohort2 <- tibble::tribble(
  ~ID, ~TIME, ~DV,
    2,  0.5,  1.6,
    2,  1.0,  2.9
)

cohort2
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2   0.5   1.6
2     2   1     2.9

Key Ideas

Binding is different from joining.

  • Joining matches rows using keys.
  • Binding stacks tables on top of each other or places them side by side by position, without looking at any keys.

Use binding when:

  • Datasets share the same columns (for bind_rows()) or the same rows in the same order (for bind_cols()).
  • You want to append new records.
  • You are combining batches or cohorts.
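Batches such as simulation replicates often arrive as a list of tables. A minimal sketch, assuming each replicate is a tibble with identical columns (the values below are a toy stand-in, not real simulation output): bind_rows() accepts a list of data frames, and its .id argument adds a column recording which list element each row came from.

```r
library(dplyr)
library(tibble)

# Toy stand-in for three simulation replicates (hypothetical values);
# every element has the same columns, which is what bind_rows() expects
sim_list <- lapply(1:3, function(rep) {
  tibble(ID = 1, TIME = c(0.5, 1.0), DV = c(2.1, 3.8) * rep)
})

# Stack all replicates; with an unnamed list, .id stores each
# element's position ("1", "2", "3") as a character column
sims <- bind_rows(sim_list, .id = "REP")
sims
```

Because REP is stored as character, convert it with as.integer() if you need to sort or filter replicates numerically.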
Warning

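Binding does not check logical consistency. bind_rows() only requires that columns sharing a name have compatible types (columns with different names simply become separate columns), and bind_cols() only checks that the row counts are compatible. Always verify column names and types first.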
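One way to act on this warning is to compare two tables before stacking them. The helper check_bind_compat() below is hypothetical (it is not part of dplyr) and is only a sketch: it lists columns present in just one table and shared columns whose classes differ. cohort_bad is an invented example with a misspelled TIME column and DV stored as text.

```r
library(tibble)

# Hypothetical helper (not part of dplyr): report name and type
# differences between two tables before calling bind_rows()
check_bind_compat <- function(x, y) {
  shared <- intersect(names(x), names(y))
  type_x <- vapply(x[shared], function(col) class(col)[1], character(1))
  type_y <- vapply(y[shared], function(col) class(col)[1], character(1))
  list(
    only_in_x     = setdiff(names(x), names(y)),
    only_in_y     = setdiff(names(y), names(x)),
    type_mismatch = shared[type_x != type_y]
  )
}

cohort1 <- tribble(~ID, ~TIME, ~DV,
                   1, 0.5, 2.1,
                   1, 1.0, 3.8)

# Deliberately inconsistent table: "Time" instead of "TIME", DV as text
cohort_bad <- tribble(~ID, ~Time, ~DV,
                      2, 0.5, "1.6")

check_bind_compat(cohort1, cohort_bad)
```

Here the report flags TIME/Time as a name mismatch and DV as a type mismatch, both of which should be fixed before stacking.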


Worked Example 1: Stacking Cohorts with bind_rows()

pk_all <- bind_rows(cohort1, cohort2)
pk_all
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9

Add a cohort label:

cohort1_labeled <- cohort1 %>% mutate(COHORT = "A")
cohort2_labeled <- cohort2 %>% mutate(COHORT = "B")

pk_all_labeled <- bind_rows(cohort1_labeled, cohort2_labeled)
pk_all_labeled
# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr> 
1     1   0.5   2.1 A     
2     1   1     3.8 A     
3     2   0.5   1.6 B     
4     2   1     2.9 B     
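bind_rows() can also create the label while stacking: when tables are passed as named arguments and .id is set, each argument's name is written into a new identifier column. A short sketch reusing the two cohorts (the labels A and B mirror the example above):

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(~ID, ~TIME, ~DV,
                   1, 0.5, 2.1,
                   1, 1.0, 3.8)
cohort2 <- tribble(~ID, ~TIME, ~DV,
                   2, 0.5, 1.6,
                   2, 1.0, 2.9)

# Argument names (A, B) become the values of the new COHORT column
pk_all_id <- bind_rows(A = cohort1, B = cohort2, .id = "COHORT")
pk_all_id
```

Unlike the mutate() approach, this places COHORT as the first column, and the label cannot be forgotten for one of the inputs.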

Worked Example 2: Handling Missing Columns

If one table has an extra column:

cohort3 <- tibble::tribble(
  ~ID, ~TIME, ~DV, ~ARM,
    3,  0.5,  2.5, "Test"
)

bind_rows(cohort1, cohort3)
# A tibble: 3 × 4
     ID  TIME    DV ARM  
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA> 
2     1   1     3.8 <NA> 
3     3   0.5   2.5 Test 

Columns missing from a table are filled with NA for that table's rows; here, both cohort1 rows get ARM = NA.
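Because this NA filling happens silently, counting missing values per column right after binding is a cheap sanity check. A minimal sketch, reusing cohort1 and cohort3:

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(~ID, ~TIME, ~DV,
                   1, 0.5, 2.1,
                   1, 1.0, 3.8)
cohort3 <- tribble(~ID, ~TIME, ~DV, ~ARM,
                   3, 0.5, 2.5, "Test")

combined <- bind_rows(cohort1, cohort3)

# NA count per column: ARM has 2 NAs (the two cohort1 rows)
na_counts <- colSums(is.na(combined))
na_counts
```

Any column with NAs you did not expect usually points to a column that exists in only some of the inputs, often because of a naming difference.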


Worked Example 3: Combining Columns with bind_cols()

demographics <- tibble::tribble(
  ~WT, ~SEX,
   72, "F",
   72, "F",
   88, "M",
   88, "M"
)

bind_cols(pk_all, demographics)
# A tibble: 4 × 5
     ID  TIME    DV    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F    
2     1   1     3.8    72 F    
3     2   0.5   1.6    88 M    
4     2   1     2.9    88 M    
Warning

bind_cols() attaches by row position, not by ID. It only checks that the row counts are compatible, so use it only when rows are guaranteed to be aligned.
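To see how this fails without any error or warning, re-sort pk_all before attaching demographics. A minimal sketch that recreates both tables from this lesson; the sort by DV stands in for any upstream step that changes row order:

```r
library(dplyr)
library(tibble)

pk_all <- tribble(~ID, ~TIME, ~DV,
                  1, 0.5, 2.1,
                  1, 1.0, 3.8,
                  2, 0.5, 1.6,
                  2, 1.0, 2.9)
demographics <- tribble(~WT, ~SEX,
                        72, "F",
                        72, "F",
                        88, "M",
                        88, "M")

# Re-sorting one table breaks the positional alignment
pk_sorted <- arrange(pk_all, desc(DV))
misaligned <- bind_cols(pk_sorted, demographics)

# Each subject now has two different weights
misaligned
```

The result has the right shape and plausible values, which is exactly why this kind of error is hard to spot.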


Binding vs Joining (Conceptual Contrast)

Operation      Purpose
*_join()       Match rows by key columns (e.g., left_join(), inner_join())
bind_rows()    Stack tables vertically
bind_cols()    Attach columns by row position

If you need to match rows by ID, use a join such as left_join(), not bind_cols().
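For the covariate example above, a join is the key-based alternative. A minimal sketch, assuming a hypothetical demog_by_id table with one row per subject (weights and sexes taken from the demographics table):

```r
library(dplyr)
library(tibble)

pk_all <- tribble(~ID, ~TIME, ~DV,
                  1, 0.5, 2.1,
                  1, 1.0, 3.8,
                  2, 0.5, 1.6,
                  2, 1.0, 2.9)

# Hypothetical subject-level table, keyed by ID
demog_by_id <- tribble(~ID, ~WT, ~SEX,
                       1, 72, "F",
                       2, 88, "M")

# Rows are matched on ID, so row order in either table does not matter
pk_demog <- left_join(pk_all, demog_by_id, by = "ID")
pk_demog
```

left_join() keeps every row of pk_all in its original order and repeats each subject's covariates on all of that subject's rows.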


Strategies

  • Use colnames() and glimpse() before binding.
  • Ensure column names and types match; bind_rows() matches columns by name, so column order does not matter.
  • Prefer bind_rows() for stacking similar datasets.
  • Use bind_cols() only when rows are already aligned.
  • Add identifiers (e.g., cohort labels) before stacking.
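Because bind_rows() matches columns by name, reordering the columns of one input does not change the result. A quick sketch that checks this with the two example cohorts:

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(~ID, ~TIME, ~DV,
                   1, 0.5, 2.1,
                   1, 1.0, 3.8)
cohort2 <- tribble(~ID, ~TIME, ~DV,
                   2, 0.5, 1.6,
                   2, 1.0, 2.9)

# Same columns as cohort2, in a different order
cohort2_reordered <- select(cohort2, DV, ID, TIME)

a <- bind_rows(cohort1, cohort2)
b <- bind_rows(cohort1, cohort2_reordered)

identical(a, b)  # TRUE: output columns follow the first table's order
```

Names and types are what need checking; order matters only for bind_cols(), and there it is the row order, not the column order.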

Common Mistakes

  • Assuming bind_rows() will match rows by ID (it does not — it simply stacks datasets in order).
  • Using bind_cols() when datasets are not perfectly aligned, leading to incorrect row-level combinations.
  • Ignoring differences in column names (e.g., TIME vs Time), which creates unintended new columns filled with NA.
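  • Overlooking mismatched column types. In current dplyr, bind_rows() stops with an error for incompatible types (e.g., numeric vs character), while compatible types are coerced silently (e.g., integer to double, factor to character), which can cause downstream issues.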
  • Forgetting to add identifiers (like COHORT) before stacking, making it impossible to trace data origin later.
  • Assuming missing columns indicate an error, rather than understanding that bind_rows() fills them with NA by design.
  • Binding datasets without checking structure first (glimpse(), colnames()), leading to subtle data integrity problems.
  • Treating binding as interchangeable with joins. When rows must be matched on a key such as ID, a join is required.
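Two of these mistakes are easy to reproduce. This sketch assumes dplyr 1.0 or later, where combining a numeric column with a character column of the same name is an error; older versions coerced instead, so behavior may differ on older installations.

```r
library(dplyr)
library(tibble)

cohort1 <- tribble(~ID, ~TIME, ~DV,
                   1, 0.5, 2.1,
                   1, 1.0, 3.8)
cohort2 <- tribble(~ID, ~TIME, ~DV,
                   2, 0.5, 1.6,
                   2, 1.0, 2.9)

# Column-name typo: "Time" is treated as a separate column
cohort2_typo <- rename(cohort2, Time = TIME)
typo <- bind_rows(cohort1, cohort2_typo)
names(typo)  # "ID" "TIME" "DV" "Time"

# Incompatible types: numeric DV vs character DV stops with an error
cohort2_chr <- mutate(cohort2, DV = as.character(DV))
result <- tryCatch(
  bind_rows(cohort1, cohort2_chr),
  error = function(e) conditionMessage(e)
)
result
```

The typo produces no error at all, only extra NAs in TIME and Time, which is why checking names before binding matters more than relying on errors.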

Practice Problems

  1. Stack cohort1 and cohort2 using bind_rows().
  2. Add a COHORT label before stacking.
  3. Create a table with an extra column and observe how bind_rows() handles it.
  4. Create a second table with four rows (one per row of pk_all) and attach it to pk_all using bind_cols().
  5. Explain why bind_cols() can be dangerous in PMx workflows.

# 1
bind_rows(cohort1, cohort2)
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9
# 2
bind_rows(
  cohort1 %>% mutate(COHORT = "A"),
  cohort2 %>% mutate(COHORT = "B")
)
# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr> 
1     1   0.5   2.1 A     
2     1   1     3.8 A     
3     2   0.5   1.6 B     
4     2   1     2.9 B     
# 3
cohort_extra <- tibble(ID = 3, TIME = 1.5, DV = 2.7, ARM = "Test")
bind_rows(cohort1, cohort_extra)
# A tibble: 3 × 4
     ID  TIME    DV ARM  
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA> 
2     1   1     3.8 <NA> 
3     3   1.5   2.7 Test 
# 4
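# Attached by position only: ID 1 receives two different weights (72 and 88),
# which is the kind of silent error Problem 5 asks about.
extra_cols <- tibble(WT = c(72, 88, 65, 70), SEX = c("F", "M", "F", "M"))
bind_cols(pk_all, extra_cols)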
# A tibble: 4 × 5
     ID  TIME    DV    WT SEX  
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F    
2     1   1     3.8    88 M    
3     2   0.5   1.6    65 F    
4     2   1     2.9    70 M    
# 5
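# bind_cols() does not match by ID; it attaches by row order only.
# If either table is sorted or filtered differently, covariates are silently
# assigned to the wrong subject, with no error or warning.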
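If a positional bind is unavoidable, one defensive option is to carry the key in both tables and verify it before binding. The helper bind_cols_checked() below is hypothetical (it is not part of dplyr) and is a sketch: it stops unless both tables contain an identical key column in the same row order, then drops the duplicate key from the second table.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of dplyr): bind columns only if both
# tables carry an identical key column in the same row order
bind_cols_checked <- function(x, y, key = "ID") {
  stopifnot(key %in% names(x), key %in% names(y))
  if (!identical(x[[key]], y[[key]])) {
    stop("Key column '", key, "' is not aligned between the two tables.")
  }
  # Drop the duplicate key from y so the result has a single key column
  bind_cols(x, select(y, -all_of(key)))
}

pk_all <- tribble(~ID, ~TIME, ~DV,
                  1, 0.5, 2.1,
                  1, 1.0, 3.8,
                  2, 0.5, 1.6,
                  2, 1.0, 2.9)
demog_rows <- tribble(~ID, ~WT, ~SEX,
                      1, 72, "F",
                      1, 72, "F",
                      2, 88, "M",
                      2, 88, "M")

ok <- bind_cols_checked(pk_all, demog_rows)

# A re-sorted table fails the check instead of silently misaligning
bad <- tryCatch(
  bind_cols_checked(arrange(pk_all, desc(DV)), demog_rows),
  error = function(e) "misaligned"
)
```

When both tables already carry the key, a join is usually simpler; the check is mainly useful when position-based output (for example, from a tool that returns results in input order) must be attached.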

Summary

You now know how to:

  • Stack compatible datasets using bind_rows().
  • Attach aligned tables using bind_cols().
  • Distinguish binding from relational joins.
  • Detect structural issues before combining tables.

Binding is simple — but simplicity can hide mistakes. Use it intentionally.


  • Use bind_rows() for stacking.
  • Add identifiers before stacking cohorts.
  • Use bind_cols() only when row order is guaranteed aligned.
  • Binding does not match on keys — joins do.