Reading and Writing Data

Safely read and write tabular data in R using tidyverse tools, with PMx-focused best practices and project-safe file paths.
Tip

Big idea: Many PMx errors start at data import.
Controlling how data enters your project is as important as modeling it.

Learning Objectives

By the end of this lesson, you will be able to:

  • Read common tabular file formats into R.
  • Understand and control column type guessing.
  • Write data back to disk reproducibly.
  • Distinguish raw, cleaned, and modeling-ready datasets.
  • Use project-safe paths so your code runs on any machine.

Setup

library(tidyverse)

Key Ideas

Data import is a structural decision.

In PMx workflows:

  • The first read defines column types.
  • Column types influence joins, summaries, and modeling behavior.
  • File paths determine whether your project is reproducible.
  • Raw data should never be modified in place.

Importing data is not a mechanical step — it sets the foundation for everything downstream.

Warning

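A single incorrect column type can break downstream logic. For example, if ID is read as character instead of numeric, dplyr refuses to join it to a numeric ID column, and sorting places "10" before "2", which silently reorders subjects.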


Common Data Formats in PMx

You will most often encounter:

  • CSV (.csv) — safest default
  • TSV (.tsv) — tab-separated
  • Excel (.xlsx) — common but fragile
  • Text exports from modeling tools

For reproducible workflows, prefer CSV: it is plain text, works well with version control, and carries no hidden formatting.


Worked Example 1: Reading CSV Files

pk <- read_csv("data/pk_data.csv")

read_csv():

  • Reads column names from the first row
  • Guesses column types from up to guess_max rows (1,000 by default)
  • Returns a tibble
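To see these three behaviors without a real project file, the sketch below writes a tiny CSV to a temporary file (the values are made up for illustration) and reads it back. Note that ID comes back as a double: readr guesses whole numbers as double, not integer, which is one reason explicit types matter later.

```r
library(readr)

# Made-up data written to a temporary file, standing in for data/pk_data.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV", "1,0,0", "1,1,5.2", "2,0,0"), tmp)

pk <- read_csv(tmp, show_col_types = FALSE)

class(pk)  # a tibble ("tbl_df"), with readr's "spec_tbl_df" class on top
names(pk)  # column names taken from the first row
spec(pk)   # the column types readr guessed (ID is guessed as double)
```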

Worked Example 2: Inspect Column Type Guessing

pk <- read_csv("data/pk_data.csv")
spec(pk)  # print the column types readr guessed

Common issues:

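  • IDs guessed as double (readr does not guess integer) or as character when any value is non-numeric
  • Dates left as character when they are not in ISO 8601 (YYYY-MM-DD) format
  • Numeric columns read as character because they contain "." (NONMEM-style missing values) or stray text such as "BLQ"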
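The "." case is common in NONMEM-style datasets, where "." marks a missing value. A minimal sketch with made-up values: by default read_csv() treats only "" and "NA" as missing, so a column containing "." is guessed as character; adding "." to the na argument lets it parse as numeric.

```r
library(readr)

# Made-up NONMEM-style rows; "." marks a missing DV or AMT
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,.,100", "1,1,5.2,.", "2,0,.,100"), tmp)

# Default na = c("", "NA"): "." is ordinary text, so DV and AMT are guessed as character
pk_default <- read_csv(tmp, show_col_types = FALSE)

# Declaring "." as missing lets DV and AMT parse as double, with "." becoming NA
pk_na <- read_csv(tmp, na = c("", "NA", "."), show_col_types = FALSE)

spec(pk_default)
spec(pk_na)
```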

Worked Example 3: Explicit Column Types

pk <- read_csv(
  "data/pk_data.csv",
  col_types = cols(
    ID   = col_integer(),
    TIME = col_double(),
    DV   = col_double(),
    AMT  = col_double()
  )
)
Tip

Be explicit when importing sponsor data or uncontrolled exports.
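Explicit types also make bad values visible. In this sketch (hypothetical data written to a temporary file), one ID is malformed; under col_integer() it becomes NA, read_csv() warns about parsing issues, and problems() points to the offending cell, rather than the whole column quietly being guessed as character.

```r
library(readr)

# Made-up data with one malformed ID ("2A")
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV", "1,0,0", "1,1,5.2", "2A,0,0"), tmp)

pk <- read_csv(
  tmp,
  col_types = cols(
    ID   = col_integer(),
    TIME = col_double(),
    DV   = col_double()
  )
)

# Lists the row, column, expected type, and actual value that failed to parse
problems(pk)
```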


Worked Example 4: Reading TSV Files

pk <- read_tsv("data/pk_data.tsv")
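read_tsv() is a shorthand for read_delim() with delim = "\t", so the same pattern covers other single-character delimiters. A quick check on a small temporary file with made-up values:

```r
library(readr)

# Made-up tab-separated data in a temporary file
tmp_tsv <- tempfile(fileext = ".tsv")
writeLines(c("ID\tTIME\tDV", "1\t0\t0", "1\t1\t5.2"), tmp_tsv)

a <- read_tsv(tmp_tsv, show_col_types = FALSE)
b <- read_delim(tmp_tsv, delim = "\t", show_col_types = FALSE)

# Both calls produce the same columns and values
all.equal(a$DV, b$DV)
```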

Worked Example 5: Reading Excel (If Required)

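library(readxl)  # installed with the tidyverse, but not loaded by library(tidyverse)
pk <- read_excel("data/pk_data.xlsx", sheet = 1)  # name or number the sheet explicitly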

Best practice: convert Excel → CSV once, then work from CSV.
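A sketch of the one-time conversion, using the example workbook that ships with readxl (readxl_example("datasets.xlsx")) as a stand-in for a sponsor file. Listing the sheets first and naming the sheet explicitly avoids silently reading the wrong tab. The CSV here goes to a temporary file; in a project it would go to your clean-data folder.

```r
library(readxl)
library(readr)

# Example workbook bundled with readxl, standing in for a sponsor .xlsx
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)  # list the sheets before choosing one

# Read one named sheet rather than relying on the default first sheet
mt <- read_excel(xlsx_path, sheet = "mtcars")

# Convert once to CSV; downstream code reads the CSV, not the workbook
csv_path <- tempfile(fileext = ".csv")
write_csv(mt, csv_path)
mt_csv <- read_csv(csv_path, show_col_types = FALSE)
```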


Worked Example 6: Writing Data

write_csv(pk, "data/pk_clean.csv")
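write_csv() writes missing values as "NA" by default. If a downstream tool expects NONMEM-style "." for missing values (an assumption about your toolchain), the na argument controls this. A sketch with made-up values and a temporary output file:

```r
library(readr)

# Made-up records with one missing DV and one missing AMT
pk <- data.frame(
  ID   = c(1L, 1L),
  TIME = c(0, 1),
  DV   = c(NA, 5.2),
  AMT  = c(100, NA)
)

out <- tempfile(fileext = ".csv")
write_csv(pk, out, na = ".")  # write NA as "."

readLines(out)  # inspect the raw text that was written

# Read it back with the same missing-value convention
back <- read_csv(out, na = c("", "NA", "."), show_col_types = FALSE)
```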
Warning

Never overwrite raw data.
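One way to enforce this in code is a small wrapper that refuses to write over an existing file. The helper write_csv_no_clobber() below is hypothetical (it is not part of readr); it is a sketch of the idea, demonstrated on a temporary file.

```r
library(readr)

# Hypothetical helper: refuse to overwrite an existing file unless explicitly allowed
write_csv_no_clobber <- function(x, path, overwrite = FALSE, ...) {
  if (file.exists(path) && !overwrite) {
    stop("Refusing to overwrite existing file: ", path, call. = FALSE)
  }
  write_csv(x, path, ...)
  invisible(path)
}

df <- data.frame(ID = 1:2, DV = c(0, 5.2))
p <- file.path(tempdir(), "pk_demo.csv")
unlink(p)  # start clean for the demo

write_csv_no_clobber(df, p)                              # first write succeeds
res <- try(write_csv_no_clobber(df, p), silent = TRUE)   # second write is refused
```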


Raw vs Clean vs Modeling-Ready

Healthy PMx projects separate:

  • Raw data (untouched originals)
  • Clean data (corrected types, units, labels)
  • Modeling-ready data (event-record format, QC’d)

Clear separation protects reproducibility and auditability.
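A sketch of one possible layout, assuming folders named data/raw, data/clean, and data/modeling (hypothetical names; follow your team's convention). The demo builds it under a temporary directory and marks the raw file read-only with Sys.chmod() as an extra guard against accidental edits. How strictly read-only is enforced depends on the operating system and user permissions.

```r
# Hypothetical project root under a temporary directory
root <- file.path(tempdir(), "pmx_project")

# One folder per data stage
dirs <- file.path(root, "data", c("raw", "clean", "modeling"))
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Place a made-up raw file, then mark it read-only
raw_file <- file.path(root, "data", "raw", "pk_data.csv")
writeLines(c("ID,TIME,DV", "1,0,0"), raw_file)
Sys.chmod(raw_file, mode = "0444")

list.files(root, recursive = TRUE, include.dirs = TRUE)
```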


Worked Example 7: Project-Safe Paths with here()

library(here)
read_csv(here("data", "pk_data.csv"))
Note

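here() builds absolute paths starting from the project root (found by looking for markers such as an .Rproj file, a .here file, or a .git folder), so the same code runs regardless of the current working directory.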
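A short sketch of what here() returns. The root is detected when the package loads; if no marker is found, here() falls back to the working directory. The data/pk_data.csv path is only built here, not read, so the file does not need to exist. In scripts, here::i_am() can additionally declare where the script lives relative to the root.

```r
library(here)

root <- here()                       # the detected project root
p <- here("data", "pk_data.csv")     # root + "data/pk_data.csv", built portably

# Contrast: a hard-coded absolute path only works on one machine
# read_csv("/Users/.../data/pk_data.csv")

p
```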


Strategies

  • Inspect column types immediately after import.
  • Be explicit when data quality is uncertain.
  • Separate raw, clean, and modeling-ready files.
  • Prefer CSV over Excel.
  • Use here() in multi-folder projects.
  • Document any type overrides you apply.
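A sketch of how the first and last strategies can work together: keep the column specification in one named, commented object, reuse it for every read, and assert the expected types right after import. The object name pk_col_types, the data values, and the unit comment are hypothetical.

```r
library(readr)

# Hypothetical shared spec: the single, commented place where type overrides live
pk_col_types <- cols(
  ID   = col_integer(),  # subject ID as integer so sorting and joins behave numerically
  TIME = col_double(),   # time after first dose (unit assumed to be hours)
  DV   = col_double(),   # observed concentration
  AMT  = col_double()    # dose amount
)

# Made-up data in a temporary file stands in for data/pk_data.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,0,100", "2,1,5.2,0"), tmp)

pk <- read_csv(tmp, col_types = pk_col_types)

# Fail fast if any type is not what the analysis expects
stopifnot(
  is.integer(pk$ID),
  is.double(pk$TIME),
  is.double(pk$DV),
  is.double(pk$AMT)
)
```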

Common Mistakes

  • Trusting automatic type guessing blindly
  • Saving cleaned files over raw files
  • Hard-coding file paths with /Users/...
  • Mixing Excel and CSV versions of the same file

Practice Problems

  1. Read a CSV file.
  2. Display column types.
  3. Re-read with explicit types.
  4. Write a cleaned file.
  5. Read using here().

library(tidyverse)
library(here)

pk <- read_csv(here("data", "pk_data.csv"))
glimpse(pk)

pk <- read_csv(
  here("data", "pk_data.csv"),
  col_types = cols(
    ID = col_integer(),
    TIME = col_double(),
    DV = col_double()
  )
)

write_csv(pk, here("data", "pk_clean.csv"))

Summary

You now know how to:

  • Import tabular data safely
  • Control column types
  • Write reproducible output files
  • Avoid fragile file paths

Good data hygiene starts at import.


  • Inspect types immediately.
  • CSV is safest.
  • Never overwrite raw data.
  • Use project-relative paths.
  • Document any type overrides.