Reading and Writing Data
Big idea: Many pharmacometrics (PMx) errors start at data import.
Controlling how data enters your project is as important as modeling it.
Learning Objectives
By the end of this lesson, you will be able to:
- Read common tabular file formats into R.
- Understand and control column type guessing.
- Write data back to disk reproducibly.
- Distinguish raw, cleaned, and modeling-ready datasets.
- Use project-safe paths so your code runs on any machine.
Setup
library(tidyverse)
Key Ideas
Data import is a structural decision.
In PMx workflows:
- The first read defines column types.
- Column types influence joins, summaries, and modeling behavior.
- File paths determine whether your project is reproducible.
- Raw data should never be modified in place.
Importing data is not a mechanical step — it sets the foundation for everything downstream.
A single incorrect column type (e.g., ID read as character instead of numeric) can make joins fail on mismatched key types, or silently distort sorting, summaries, and modeling logic.
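To make the risk concrete, here is a minimal sketch using a made-up three-row file. The file contents and the stray `10A` value are illustrative assumptions, not taken from any real dataset. One malformed ID is enough to turn the whole column into character, after which sorting follows text order rather than numeric order.

```r
library(readr)

# Hypothetical toy file: the third ID carries a stray letter.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,DV", "2,1.5", "10,2.5", "10A,3.5"), tmp)

pk <- read_csv(tmp)

# One non-numeric value makes the type guesser fall back to character.
class(pk$ID)

# Character IDs sort as text, so "10" is placed before "2".
sort(pk$ID)
```

Checking `class()` on key columns right after import is a cheap way to catch this before it reaches a join or a model.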
Common Data Formats in PMx
You will most often encounter:
- CSV (.csv): safest default
- TSV (.tsv): tab-separated
- Excel (.xlsx): common but fragile
- Text exports from modeling tools
For reproducible workflows, prefer CSV.
Worked Example 1: Reading CSV Files
pk <- read_csv("data/pk_data.csv")
read_csv():
- Reads column names from the first row
- Guesses column types
- Returns a tibble
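The three behaviors listed above can be checked directly. The sketch below writes a small made-up file to a temporary location (the rows are illustrative assumptions) so that it runs without `data/pk_data.csv` being present.

```r
library(readr)

# Hypothetical two-row file with a header line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,TIME,DV", "1,0,0", "1,1,12.5"), tmp)

pk <- read_csv(tmp)

names(pk)               # column names come from the first row
spec(pk)                # the column types readr guessed
inherits(pk, "tbl_df")  # checks that the result is a tibble
```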
Worked Example 2: Inspect Column Type Guessing
pk <- read_csv("data/pk_data.csv")
spec(pk)  # show the column types readr guessed
Common issues:
- IDs read incorrectly
- Dates misinterpreted
- Numeric columns containing "." or text
Worked Example 3: Explicit Column Types
pk <- read_csv(
  "data/pk_data.csv",
  col_types = cols(
    ID = col_integer(),
    TIME = col_double(),
    DV = col_double(),
    AMT = col_double()
  )
)
Be explicit when importing sponsor data or uncontrolled exports.
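Explicit types pair well with an explicit missing-value code and a look at what failed to parse. The following is a sketch under assumptions: the toy rows, the use of "." for missing values, and the stray "BLQ" entry are invented for illustration. `na` and `problems()` are standard readr features.

```r
library(readr)

# Hypothetical toy export: "." marks missing values, "BLQ" is stray text in DV.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("ID,TIME,DV,AMT",
    "1,0,.,100",
    "1,1,BLQ,.",
    "1,2,8.1,."),
  tmp
)

pk <- read_csv(
  tmp,
  na = c("", "NA", "."),   # treat "." as missing rather than as text
  col_types = cols(
    ID   = col_integer(),
    TIME = col_double(),
    DV   = col_double(),
    AMT  = col_double()
  )
)

# readr warns about values that do not fit the declared type;
# problems() lists them (row, column, expected type, actual value).
problems(pk)

# The offending DV value is stored as NA; the column stays numeric.
pk$DV
```

With guessed types the "BLQ" entry would have quietly turned DV into a character column. With declared types it surfaces as a parsing problem you can review and document.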
Worked Example 4: Reading TSV Files
pk <- read_tsv("data/pk_data.tsv")
Worked Example 5: Reading Excel (If Required)
library(readxl)
pk <- read_excel("data/pk_data.xlsx")
Best practice: convert Excel → CSV once, then work from CSV.
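A one-time Excel to CSV conversion might look like the sketch below. It uses the example workbook that ships with readxl as a stand-in for a real sponsor file, and the output file name is a hypothetical choice.

```r
library(readxl)
library(readr)

# readxl_example() returns the path to a workbook bundled with readxl;
# here it stands in for a sponsor-supplied .xlsx file.
xlsx_path <- readxl_example("datasets.xlsx")

# Name the sheet explicitly instead of relying on the default first sheet.
dat <- read_excel(xlsx_path, sheet = "mtcars")

# Write the CSV once (to a temporary folder in this sketch).
csv_path <- file.path(tempdir(), "mtcars_from_excel.csv")
write_csv(dat, csv_path)

# All later work reads the CSV, not the workbook.
dat_csv <- read_csv(csv_path)
```

Keeping the conversion in its own short script records which sheet was used and when the CSV was produced.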
Worked Example 6: Writing Data
write_csv(pk, "data/pk_clean.csv")
Never overwrite raw data.
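One way to enforce the no-overwrite rule in code is a small wrapper around `write_csv()`. The helper name `write_csv_safe` and its `overwrite` argument are hypothetical and not part of readr; this is a sketch of the idea, not an established API.

```r
library(readr)

# Hypothetical helper (not part of readr): refuse to replace an existing
# file unless the caller opts in explicitly.
write_csv_safe <- function(x, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("Refusing to overwrite existing file: ", path, call. = FALSE)
  }
  write_csv(x, path)
  invisible(path)
}

# Demo in a temporary folder with a made-up one-row dataset.
out <- file.path(tempdir(), "pk_clean_demo.csv")
unlink(out)  # start from a clean slate for the demo
pk <- data.frame(ID = 1L, TIME = 0, DV = 0)

write_csv_safe(pk, out)  # first write succeeds

# A second write to the same path is blocked; capture the message.
second_try <- tryCatch(
  write_csv_safe(pk, out),
  error = function(e) conditionMessage(e)
)
second_try
```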
Raw vs Clean vs Modeling-Ready
Healthy PMx projects separate:
- Raw data (untouched originals)
- Clean data (corrected types, units, labels)
- Modeling-ready data (event-record format, QC’d)
Clear separation protects reproducibility and auditability.
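The three tiers can be mirrored in the folder layout. The sketch below builds the layout in a temporary directory. The folder names (`raw`, `clean`, `model`), the toy rows, and the `EVID` flag (1 for dose records, 0 for observations) are illustrative assumptions, not a prescribed standard.

```r
library(readr)

# Hypothetical project layout, created under tempdir() for this demo.
root <- file.path(tempdir(), "pmx_demo")
for (tier in c("raw", "clean", "model")) {
  dir.create(file.path(root, "data", tier), recursive = TRUE, showWarnings = FALSE)
}

# Raw tier: the file as received (made-up rows), never edited afterwards.
raw_path <- file.path(root, "data", "raw", "pk_data.csv")
writeLines(c("ID,TIME,DV,AMT", "1,0,.,100", "1,1,12.5,."), raw_path)

# Clean tier: explicit types and an explicit missing-value code.
clean <- read_csv(
  raw_path,
  na = c("", "NA", "."),
  col_types = cols(
    ID = col_integer(), TIME = col_double(),
    DV = col_double(), AMT = col_double()
  )
)
clean_path <- file.path(root, "data", "clean", "pk_clean.csv")
write_csv(clean, clean_path)

# Modeling-ready tier: add an event flag (assumed convention for this sketch).
model <- clean
model$EVID <- ifelse(is.na(model$AMT), 0L, 1L)
model_path <- file.path(root, "data", "model", "pk_model.csv")
write_csv(model, model_path)
```

Each tier is written to its own file, so the raw file on disk is exactly as it arrived and every later file can be regenerated from it.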
Worked Example 7: Project-Safe Paths with here()
library(here)
read_csv(here("data", "pk_data.csv"))
here() builds paths relative to the project root, making code portable.
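To see what `here()` actually produces, the sketch below only builds and prints paths. It does not read a file, so it runs even when `data/pk_data.csv` does not exist. The root it reports depends on where the code is run.

```r
library(here)

# With no arguments, here() reports the project root it detected
# (for example, the folder containing an .Rproj file or a .git directory).
here()

# Path components are joined for you: no hard-coded separators
# and no user-specific prefix in the script.
p <- here("data", "pk_data.csv")
p
```

The script contains only the project-relative parts of the path, so the same line works unchanged on another machine or in another checkout.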
Strategies
- Inspect column types immediately after import.
- Be explicit when data quality is uncertain.
- Separate raw, clean, and modeling-ready files.
- Prefer CSV over Excel.
- Use here() in multi-folder projects.
- Document any type overrides you apply.
Common Mistakes
- Trusting automatic type guessing blindly
- Saving cleaned files over raw files
- Hard-coding file paths with /Users/...
- Mixing Excel and CSV versions of the same file
Practice Problems
- Read a CSV file.
- Display column types.
- Re-read with explicit types.
- Write a cleaned file.
- Read using here().
library(tidyverse)
library(here)
pk <- read_csv(here("data", "pk_data.csv"))
glimpse(pk)
pk <- read_csv(
  here("data", "pk_data.csv"),
  col_types = cols(
    ID = col_integer(),
    TIME = col_double(),
    DV = col_double()
  )
)
write_csv(pk, here("data", "pk_clean.csv"))
Summary
You now know how to:
- Import tabular data safely
- Control column types
- Write reproducible output files
- Avoid fragile file paths
Good data hygiene starts at import.
- Inspect types immediately.
- CSV is safest.
- Never overwrite raw data.
- Use project-relative paths.
- Document any type overrides.