Reading and Writing Data

Safely read and write tabular data in R using tidyverse tools, with pharmacometrics (PMx) best practices and project-safe file paths.

Tip

Big idea: Most PMx errors start at data import.
Controlling how data enters your project is as important as modeling it.

Learning Objectives

By the end of this lesson, you will be able to:

Read common tabular file formats into R.
Understand and control column type guessing.
Write data back to disk reproducibly.
Distinguish raw, cleaned, and modeling-ready datasets.
Use project-safe paths so your code runs on any machine.

Setup

library(tidyverse)

Key Ideas

Data import is a structural decision.

In PMx workflows:

The first read defines column types.
Column types influence joins, summaries, and modeling behavior.
File paths determine whether your project is reproducible.
Raw data should never be modified in place.

Importing data is not a mechanical step — it sets the foundation for everything downstream.

Warning

A single incorrect column type (e.g., ID read as character instead of numeric) can silently break joins and modeling logic.
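
As a minimal sketch of how a mistyped key can fail quietly, the example below joins two small tables whose IDs were both imported as text but formatted differently. The tables, values, and the `WT` column are invented for illustration.

```r
library(dplyr)

# Hypothetical data: the same three subjects, but IDs were imported as text
# with different formatting in each file.
dose <- tibble(ID = c("001", "002", "010"), AMT = c(100, 100, 200))
demo <- tibble(ID = c("1", "2", "10"), WT = c(70, 82, 65))

# The join runs without error, yet no keys match, so every WT is NA.
joined <- left_join(dose, demo, by = "ID")
joined

# Converting both keys to integer restores the intended matches.
fixed <- left_join(
  mutate(dose, ID = as.integer(ID)),
  mutate(demo, ID = as.integer(ID)),
  by = "ID"
)
fixed
```

No warning is raised in the first join, which is why checking key types right after import matters.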

Common Data Formats in PMx

You will most often encounter:

CSV (.csv) — safest default
TSV (.tsv) — tab-separated
Excel (.xlsx) — common but fragile
Text exports from modeling tools

For reproducible workflows, prefer CSV.

Worked Example 1: Reading CSV Files

pk <- read_csv("data/pk_data.csv")

read_csv():

Reads column names from the first row
Guesses column types
Returns a tibble
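
The three behaviors listed above can be checked on a small file. This is only a sketch: the file contents are invented and written to a temporary location so the code does not depend on `data/pk_data.csv` existing, and `pk_demo` is a hypothetical name.

```r
library(readr)

# Hypothetical miniature PK file written to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,0,100", "1,1,12.5,0", "2,0,0,100"), path)

pk_demo <- read_csv(path)

names(pk_demo)               # column names taken from the first row
spec(pk_demo)                # the column types readr guessed
inherits(pk_demo, "tbl_df")  # TRUE when the result is a tibble
```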

Worked Example 2: Inspect Column Type Guessing

pk <- read_csv("data/pk_data.csv")
spec(pk)  # show the column types readr guessed for each column

Common issues:

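IDs read as the wrong type (for example, character instead of numeric)
Dates misinterpreted or left as character
Numeric columns guessed as character because they contain "." or other text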
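
The last issue can be reproduced with a small invented file. The sketch below assumes the file uses "." to mark missing values, as is common in NONMEM-style datasets; the `na` argument of `read_csv()` then tells readr to treat that marker as missing. File contents and object names are hypothetical.

```r
library(readr)

# Hypothetical file in which "." stands for a missing observation.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV", "1,0,.", "1,1,12.5", "1,2,9.8"), path)

# Default guessing: the "." is not a number, so DV is read as text.
guessed <- read_csv(path)
class(guessed$DV)

# Declaring "." as a missing-value marker lets DV be read as numeric.
fixed <- read_csv(path, na = c("", "NA", "."))
class(fixed$DV)
fixed$DV
```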

Worked Example 3: Explicit Column Types

pk <- read_csv(
  "data/pk_data.csv",
  col_types = cols(
    ID   = col_integer(),
    TIME = col_double(),
    DV   = col_double(),
    AMT  = col_double()
  )
)

Tip

Be explicit when importing sponsor data or uncontrolled exports.
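
One benefit of explicit types is that values which do not fit are reported rather than silently changing the column type. The sketch below uses an invented file with a text flag (`BLQ`) in a numeric column and reviews the result with readr's `problems()`. The data and the `pk_demo` name are hypothetical.

```r
library(readr)

# Hypothetical export with a text flag inside a numeric column.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV", "1,0,0.0", "1,1,BLQ", "1,2,9.8"), path)

pk_demo <- read_csv(
  path,
  col_types = cols(
    ID   = col_integer(),
    TIME = col_double(),
    DV   = col_double()
  )
)

# Values that could not be parsed as the declared type become NA,
# and readr records them so they can be reviewed.
problems(pk_demo)
pk_demo$DV
```

readr emits a parsing warning here. That warning is the useful part: the unexpected value surfaces at import rather than later in the analysis.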

Worked Example 4: Reading TSV Files

pk <- read_tsv("data/pk_data.tsv")
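
For other delimiters, readr's more general `read_delim()` takes the separator as an argument. The sketch below, using an invented tab-separated file, shows that it gives the same columns as `read_tsv()` when `delim = "\t"`.

```r
library(readr)

# Hypothetical tab-separated file written to a temporary location.
path <- tempfile(fileext = ".tsv")
writeLines(c("ID\tTIME\tDV", "1\t0\t0", "1\t1\t12.5"), path)

via_tsv   <- read_tsv(path)
via_delim <- read_delim(path, delim = "\t")

# Both calls should return the same column names and values.
identical(names(via_tsv), names(via_delim))
identical(via_tsv$DV, via_delim$DV)
```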

Worked Example 5: Reading Excel (If Required)

library(readxl)
pk <- read_excel("data/pk_data.xlsx")

Best practice: convert Excel → CSV once, then work from CSV.
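
A one-time conversion might look like the sketch below. To stay self-contained it uses the example workbook that ships with readxl (`readxl_example("datasets.xlsx")`) and writes to a temporary folder. In a real project the input would be your own workbook and the output a file in your data folder.

```r
library(readxl)
library(readr)

# Example workbook bundled with readxl; stands in for a delivered Excel file.
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)  # list sheet names before choosing one

# Read one sheet by name so the choice is recorded in code.
sheet <- read_excel(xlsx_path, sheet = "iris")

# Write it out once as CSV, then work from the CSV afterwards.
csv_path <- file.path(tempdir(), "iris_from_excel.csv")
write_csv(sheet, csv_path)

roundtrip <- read_csv(csv_path)
```

Naming the sheet explicitly avoids depending on which sheet happens to come first in the workbook.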

Worked Example 6: Writing Data

write_csv(pk, "data/pk_clean.csv")

Warning

Never overwrite raw data.
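
One way to enforce this in code is a small guard around `write_csv()`. The helper below, `write_csv_new()`, is hypothetical and not part of readr. It is only a sketch of the idea, shown here with invented data and a temporary output file.

```r
library(readr)

# Hypothetical helper: write a CSV only if the target does not already exist.
write_csv_new <- function(x, path) {
  if (file.exists(path)) {
    stop("Refusing to overwrite existing file: ", path)
  }
  write_csv(x, path)
}

out <- file.path(tempdir(), "pk_clean_demo.csv")
unlink(out)  # start from a clean state for this demonstration
pk_demo <- data.frame(ID = 1:2, TIME = c(0, 1), DV = c(0, 12.5))

write_csv_new(pk_demo, out)  # first write succeeds

# A second write to the same path is blocked with an error.
second <- try(write_csv_new(pk_demo, out), silent = TRUE)
inherits(second, "try-error")
```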

Raw vs Clean vs Modeling-Ready

Healthy PMx projects separate:

Raw data (untouched originals)
Clean data (corrected types, units, labels)
Modeling-ready data (event-record format, QC’d)

Clear separation protects reproducibility and auditability.
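
One possible way to reflect this separation on disk is a folder per stage. The layout below (`data/raw`, `data/clean`, `data/model`) is an assumption for illustration, not a required convention, and is built under a temporary directory with invented file contents.

```r
library(readr)

# Hypothetical layout, built under a temporary directory for illustration.
project <- file.path(tempdir(), "pmx_project_demo")
stages  <- c("raw", "clean", "model")
for (s in stages) {
  dir.create(file.path(project, "data", s), recursive = TRUE, showWarnings = FALSE)
}

# Stand-in for a delivered raw file ("." assumed to mark missing values).
raw_path <- file.path(project, "data", "raw", "pk_data.csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,.,100", "1,1,12.5,."), raw_path)

# Raw -> clean: read the raw file, then write the result to a different folder.
pk_clean   <- read_csv(raw_path, na = c("", "NA", "."))
clean_path <- file.path(project, "data", "clean", "pk_clean.csv")
write_csv(pk_clean, clean_path)

# The raw file is only ever read, never written.
list.files(project, recursive = TRUE)
```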

Worked Example 7: Project-Safe Paths with `here()`

library(here)
pk <- read_csv(here("data", "pk_data.csv"))

Note

here() builds paths relative to the project root, making code portable.
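
To see what `here()` resolves to on a given machine, it can help to print the root and a constructed path before reading anything. The sketch below only builds and inspects a path. `pk_path` is a hypothetical name, and the file itself may not exist on your system.

```r
library(here)

# here() with no arguments reports the directory treated as the project root.
here()

# Path components are joined onto that root, so no absolute path is hard-coded.
pk_path <- here("data", "pk_data.csv")
pk_path

# Checking existence first gives a clearer failure than a read error.
file.exists(pk_path)
```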

Strategies

Inspect column types immediately after import.
Be explicit when data quality is uncertain.
Separate raw, clean, and modeling-ready files.
Prefer CSV over Excel.
Use here() in multi-folder projects.
Document any type overrides you apply.
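
One way to document type overrides is to keep them in a single named specification with a short reason beside each column. The sketch below assumes the column names from the worked examples. The `pk_spec` name, the comments, and the file contents are illustrative assumptions.

```r
library(readr)

# Type overrides kept in one named object, with the reason noted beside each.
pk_spec <- cols(
  ID   = col_integer(),  # subject identifier; kept numeric so joins match
  TIME = col_double(),   # time values may be fractional
  DV   = col_double(),   # observed value
  AMT  = col_double()    # dose amount
)

# Hypothetical file used only to demonstrate the specification.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,0,100", "1,1,12.5,0"), path)

pk_demo <- read_csv(path, col_types = pk_spec)
spec(pk_demo)  # the specification actually used for this import
```

Reusing one specification object across scripts keeps every import of the same file consistent.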

Common Mistakes

Trusting automatic type guessing blindly
Saving cleaned files over raw files
Hard-coding file paths with /Users/...
Mixing Excel and CSV versions of the same file

Practice Problems

Read a CSV file.
Display column types.
Re-read with explicit types.
Write a cleaned file.
Read using here().

Step-by-Step Solutions

library(tidyverse)
library(here)

pk <- read_csv(here("data", "pk_data.csv"))
glimpse(pk)

pk <- read_csv(
  here("data", "pk_data.csv"),
  col_types = cols(
    ID = col_integer(),
    TIME = col_double(),
    DV = col_double()
  )
)

write_csv(pk, here("data", "pk_clean.csv"))

Summary

You now know how to:

Import tabular data safely
Control column types
Write reproducible output files
Avoid fragile file paths

Good data hygiene starts at import.

Quick Tips

Inspect types immediately.
CSV is safest.
Never overwrite raw data.
Use project-relative paths.
Document any type overrides.
