Binding Data
Learn how to stack and combine tables safely using bind_rows() and bind_cols() in PMx workflows.
Tip
What you’ll build today: reliable patterns for stacking datasets and combining aligned tables without using relational joins.
Learning Objectives
By the end of this lesson, you will be able to:
- Use bind_rows() to stack compatible datasets vertically.
- Use bind_cols() to attach tables side-by-side safely.
- Understand how binding differs conceptually from joining.
- Detect structural mismatches before binding.
- Apply binding safely in common PMx workflows (multi-cohort, batch outputs, simulations).
Setup
library(tidyverse)
Example Datasets
Cohort 1
cohort1 <- tibble::tribble(
~ID, ~TIME, ~DV,
1, 0.5, 2.1,
1, 1.0, 3.8
)
cohort1
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
Cohort 2
cohort2 <- tibble::tribble(
~ID, ~TIME, ~DV,
2, 0.5, 1.6,
2, 1.0, 2.9
)
cohort2
# A tibble: 2 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     2   0.5   1.6
2     2   1     2.9
Key Ideas
Binding is different from joining.
- Joining matches rows using keys.
- Binding simply stacks or attaches tables.
Use binding when:
- Datasets have the same structure.
- You want to append new records.
- You are combining batches or cohorts.
Warning
Binding checks only structural compatibility, not logical consistency: it cannot tell whether the rows you are combining actually belong together. Always verify column names and types first.
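One way to carry out that verification is to compare column names and classes before binding. The sketch below is a minimal illustration, not a dplyr feature: check_bindable() is a hypothetical helper written for this lesson, and it only compares the first class of each column.

```r
library(dplyr)

cohort1 <- tibble::tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))
cohort2 <- tibble::tibble(ID = c(2, 2), TIME = c(0.5, 1.0), DV = c(1.6, 2.9))

# Hypothetical helper (not part of dplyr): report columns that exist in only
# one table and shared columns whose first class differs.
check_bindable <- function(a, b) {
  class_a <- vapply(a, function(x) class(x)[1], character(1))
  class_b <- vapply(b, function(x) class(x)[1], character(1))
  shared  <- intersect(names(a), names(b))
  list(
    only_in_a     = setdiff(names(a), names(b)),
    only_in_b     = setdiff(names(b), names(a)),
    type_mismatch = shared[class_a[shared] != class_b[shared]]
  )
}

chk <- check_bindable(cohort1, cohort2)
chk
```

If all three elements come back empty, the two tables have the same column names and classes. That says nothing about whether the rows are logically comparable, which still has to be judged from the study context.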
Worked Example 1: Stacking Cohorts with bind_rows()
pk_all <- bind_rows(cohort1, cohort2)
pk_all
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9
Add a cohort label:
cohort1_labeled <- cohort1 %>% mutate(COHORT = "A")
cohort2_labeled <- cohort2 %>% mutate(COHORT = "B")
pk_all_labeled <- bind_rows(cohort1_labeled, cohort2_labeled)
pk_all_labeled
# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 A
2     1   1     3.8 A
3     2   0.5   1.6 B
4     2   1     2.9 B
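The label can also be created during the bind itself. The sketch below uses the .id argument of bind_rows() with named inputs; the object name pk_all_id is chosen here for illustration.

```r
library(dplyr)

cohort1 <- tibble::tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))
cohort2 <- tibble::tibble(ID = c(2, 2), TIME = c(0.5, 1.0), DV = c(1.6, 2.9))

# The argument names ("A", "B") become the values of the new COHORT column,
# which bind_rows() places first in the result.
pk_all_id <- bind_rows(A = cohort1, B = cohort2, .id = "COHORT")
pk_all_id
```

This keeps the label next to the stacking step, so a cohort cannot be stacked without one. The mutate() approach is still useful when the label has to sit in a specific column position or come from other data.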
Worked Example 2: Handling Missing Columns
If one table has an extra column:
cohort3 <- tibble::tribble(
~ID, ~TIME, ~DV, ~ARM,
3, 0.5, 2.5, "Test"
)
bind_rows(cohort1, cohort3)
# A tibble: 3 × 4
     ID  TIME    DV ARM
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA>
2     1   1     3.8 <NA>
3     3   0.5   2.5 Test
bind_rows() matches columns by name. Any column that is absent from one of the tables is filled with NA for that table's rows.
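Because matching is by exact name, a column spelled differently is treated as a different column. The sketch below assumes a hypothetical table, cohort_lc, whose time column is named Time instead of TIME, and shows one way to repair it with rename() before stacking.

```r
library(dplyr)

cohort1 <- tibble::tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))

# Hypothetical table with a differently cased column name
cohort_lc <- tibble::tibble(ID = 4, Time = 0.5, DV = 1.9)

# TIME and Time are kept as two separate columns, each partly NA
stacked_bad <- bind_rows(cohort1, cohort_lc)
names(stacked_bad)

# Harmonise the name first, then stack
stacked_ok <- bind_rows(cohort1, rename(cohort_lc, TIME = Time))
stacked_ok
```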
Worked Example 3: Combining Columns with bind_cols()
demographics <- tibble::tribble(
~WT, ~SEX,
72, "F",
72, "F",
88, "M",
88, "M"
)
bind_cols(pk_all, demographics)
# A tibble: 4 × 5
     ID  TIME    DV    WT SEX
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F
2     1   1     3.8    72 F
3     2   0.5   1.6    88 M
4     2   1     2.9    88 M
Warning
bind_cols() attaches by row position — not by ID. Only use when rows are guaranteed aligned.
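When the second table still carries its own ID column, alignment can be checked rather than assumed. The sketch below is one possible guard, using a hypothetical covariate table named covs; it stops with an error if the row counts or the ID sequences differ.

```r
library(dplyr)

pk <- tibble::tibble(
  ID   = c(1, 1, 2, 2),
  TIME = c(0.5, 1.0, 0.5, 1.0),
  DV   = c(2.1, 3.8, 1.6, 2.9)
)

# Hypothetical covariate table that repeats ID row for row
covs <- tibble::tibble(
  ID  = c(1, 1, 2, 2),
  WT  = c(72, 72, 88, 88),
  SEX = c("F", "F", "M", "M")
)

# Guard: same number of rows and the same ID in every row position
stopifnot(
  nrow(pk) == nrow(covs),
  identical(pk$ID, covs$ID)
)

# Drop the duplicate ID column before attaching
pk_cov <- bind_cols(pk, select(covs, -ID))
pk_cov
```

If the second table has no key column at all, as with demographics above, no such check is possible, and the alignment rests entirely on how the table was produced.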
Binding vs Joining (Conceptual Contrast)
| Operation | Purpose |
|---|---|
| left_join(), inner_join(), etc. | Match rows by keys |
| bind_rows() | Stack tables vertically |
| bind_cols() | Attach columns by position |
If you need matching by ID, use a join — not bind_cols().
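As a minimal sketch of that alternative, the example below assumes a hypothetical subject-level table, demo_by_id, with one row per ID, listed in a different order from the PK data on purpose. left_join() still assigns each covariate to the right subject.

```r
library(dplyr)

pk <- tibble::tibble(
  ID   = c(1, 1, 2, 2),
  TIME = c(0.5, 1.0, 0.5, 1.0),
  DV   = c(2.1, 3.8, 1.6, 2.9)
)

# Hypothetical subject-level table: one row per ID, order does not matter
demo_by_id <- tibble::tibble(
  ID  = c(2, 1),
  WT  = c(88, 72),
  SEX = c("M", "F")
)

# Rows are matched on the ID key, not on position
pk_demo <- left_join(pk, demo_by_id, by = "ID")
pk_demo
```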
Strategies
- Use colnames() and glimpse() before binding.
- Ensure column names and types match. bind_rows() matches columns by name, so column order does not have to agree.
- Prefer bind_rows() for stacking similar datasets.
- Use bind_cols() only when rows are already aligned.
- Add identifiers (e.g., cohort labels) before stacking.
Common Mistakes
- Assuming bind_rows() will match rows by ID. It does not; it simply stacks datasets in order.
- Using bind_cols() when datasets are not perfectly aligned, leading to incorrect row-level combinations.
- Ignoring differences in column names (e.g., TIME vs Time), which creates unintended new columns filled with NA.
- Overlooking mismatched column types. Incompatible types (e.g., numeric vs character) make bind_rows() stop with an error, while compatible ones (e.g., integer vs double) are promoted to a common type without any message, which can cause downstream issues.
- Forgetting to add identifiers (like COHORT) before stacking, making it impossible to trace data origin later.
- Assuming missing columns indicate an error, rather than understanding that bind_rows() fills them with NA by design.
- Binding datasets without checking structure first (glimpse(), colnames()), leading to subtle data integrity problems.
- Treating binding as interchangeable with joins, instead of recognizing that joins are required when matching by keys.
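The type-mismatch case is easy to reproduce. The sketch below assumes a hypothetical batch, cohort_chr, in which DV was read in as text; tryCatch() captures the error so the script keeps running, and the column is converted explicitly before stacking.

```r
library(dplyr)

cohort1 <- tibble::tibble(ID = c(1, 1), TIME = c(0.5, 1.0), DV = c(2.1, 3.8))

# Hypothetical batch where DV arrived as character
cohort_chr <- tibble::tibble(ID = 5, TIME = 0.5, DV = "1.9")

# Double and character columns cannot be combined, so bind_rows() errors
res <- tryCatch(
  bind_rows(cohort1, cohort_chr),
  error = function(e) e
)
inherits(res, "error")

# Convert the column explicitly, then stack
fixed <- bind_rows(cohort1, mutate(cohort_chr, DV = as.numeric(DV)))
fixed
```

An explicit as.numeric() turns any non-numeric text into NA with a warning, so it is worth inspecting the character values before converting them.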
Practice Problems
- Stack cohort1 and cohort2 using bind_rows().
- Add a COHORT label before stacking.
- Create a table with an extra column and observe how bind_rows() handles it.
- Create a second table with the same number of rows as pk_all and attach it using bind_cols().
- Explain why bind_cols() can be dangerous in PMx workflows.
Step-by-Step Solutions
# 1
bind_rows(cohort1, cohort2)
# A tibble: 4 × 3
     ID  TIME    DV
  <dbl> <dbl> <dbl>
1     1   0.5   2.1
2     1   1     3.8
3     2   0.5   1.6
4     2   1     2.9
# 2
bind_rows(
cohort1 %>% mutate(COHORT = "A"),
cohort2 %>% mutate(COHORT = "B")
)
# A tibble: 4 × 4
     ID  TIME    DV COHORT
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 A
2     1   1     3.8 A
3     2   0.5   1.6 B
4     2   1     2.9 B
# 3
cohort_extra <- tibble(ID = 3, TIME = 1.5, DV = 2.7, ARM = "Test")
bind_rows(cohort1, cohort_extra)
# A tibble: 3 × 4
     ID  TIME    DV ARM
  <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1 <NA>
2     1   1     3.8 <NA>
3     3   1.5   2.7 Test
# 4
extra_cols <- tibble(WT = c(72, 88, 65, 70), SEX = c("F", "M", "F", "M"))
bind_cols(pk_all, extra_cols)
# A tibble: 4 × 5
     ID  TIME    DV    WT SEX
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1   0.5   2.1    72 F
2     1   1     3.8    88 M
3     2   0.5   1.6    65 F
4     2   1     2.9    70 M
# 5
# bind_cols() does not match by ID; it attaches by row order only.
Summary
You now know how to:
- Stack compatible datasets using bind_rows().
- Attach aligned tables using bind_cols().
- Distinguish binding from relational joins.
- Detect structural issues before combining tables.
Binding is simple — but simplicity can hide mistakes. Use it intentionally.
Quick Tips
- Use bind_rows() for stacking.
- Add identifiers before stacking cohorts.
- Use bind_cols() only when row order is guaranteed aligned.
- Binding does not match on keys; joins do.