This repository is an intensive expansion and refinement of the curriculum presented in "R for Data Science (2e)" by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund.
While the source text provides the foundational "how-to," this project adds a "production-ready" layer that turns instructional exercises into hardened, defensive, and accessible data engineering pipelines.
Every module in this repository is built to exceed standard instructional benchmarks through four "Refinement Pillars":
- Modern Syntax (Native Pipe `|>`): Global transition from `magrittr` to native R piping to reduce overhead and future-proof the codebase.
- Universal Design (Accessibility): All visualizations are engineered using CVD-safe palettes (modified Okabe-Ito) and high-contrast typography (`#000000`).
- Defensive Programming: Implementation of explicit data-integrity checks (NA filtering) and automated filesystem verification (`dir.exists`) to prevent silent failures.
- Reproducibility Hardening: Strict "No-Persistence" workspace policy to ensure all results are generated programmatically from raw state.
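
As a rough illustration of how the first three pillars might fit together, the sketch below uses base R only. The data frame, the helper names `drop_incomplete` and `ensure_dir`, and the output location are hypothetical and are not taken from the repository. The palette shown is the unmodified Okabe-Ito set that ships with base R, since the repository's modifications are not described here.

```r
# Toy input with a missing value (hypothetical data, for illustration only).
measurements <- data.frame(
  id    = 1:4,
  value = c(2.5, NA, 4.0, 3.5)
)

# Data-integrity check: drop incomplete rows explicitly and report the count,
# so missing values never disappear silently.
drop_incomplete <- function(df) {
  keep <- complete.cases(df)
  message(sum(!keep), " row(s) removed because of NA values")
  df[keep, , drop = FALSE]
}

# Filesystem verification: make sure the target directory exists before writing.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  stopifnot(dir.exists(path))
  invisible(path)
}

# tempdir() stands in for a real project folder (assumption).
out_dir <- file.path(tempdir(), "output")
ensure_dir(out_dir)

# Native pipe (requires R 4.1.0 or later) instead of magrittr's %>%.
clean <- measurements |> drop_incomplete()

out_file <- file.path(out_dir, "measurements_clean.csv")
write.csv(clean, out_file, row.names = FALSE)

# CVD-safe colors: the standard Okabe-Ito palette from grDevices (R >= 4.0.0).
okabe_ito <- palette.colors(palette = "Okabe-Ito")
```

Reporting the number of dropped rows with `message()` is one way to keep NA filtering visible in logs; a stricter pipeline could call `stop()` instead when any row is incomplete.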
- Objective: Expanding the `palmerpenguins` analysis.
- Key Improvement: Implemented disaggregated regression layers to address Simpson’s Paradox and automated CSV serialization logic.
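
To show why disaggregation matters for Simpson's Paradox, here is a minimal sketch on toy data (not `palmerpenguins`). The column names `group`, `x`, and `y` are hypothetical. Each group has a positive within-group slope, yet a single pooled regression reports a negative slope. In a ggplot2 chart the same idea would presumably be expressed by mapping the grouping variable to an aesthetic before a `geom_smooth(method = "lm")` layer, but that is an assumption about the module, not a description of it.

```r
# Toy data (hypothetical): three groups whose within-group trend is positive
# while the pooled trend is negative.
offsets <- 0:4
toy <- data.frame(
  group = rep(c("a", "b", "c"), each = 5),
  x     = c(0 + offsets, 5 + offsets, 10 + offsets),
  y     = c(20 + 0.5 * offsets, 10 + 0.5 * offsets, 0 + 0.5 * offsets)
)

# Pooled fit: one regression line across all rows.
pooled_slope <- coef(lm(y ~ x, data = toy))[["x"]]

# Disaggregated fit: one regression line per group.
group_slopes <- split(toy, toy$group) |>
  sapply(function(d) coef(lm(y ~ x, data = d))[["x"]])

# Collect both views in one table for comparison.
slopes <- data.frame(
  model = c("pooled", names(group_slopes)),
  slope = c(pooled_slope, unname(group_slopes))
)

# Serialize the comparison; tempdir() stands in for a real output folder.
out_file <- file.path(tempdir(), "slope_comparison.csv")
write.csv(slopes, out_file, row.names = FALSE)
```

The pooled slope is negative only because the groups sit at different baselines; fitting per group removes that confounding.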
- Objective: Establishing Operational Standards.
- Key Improvement: Hard-coded IDE configurations for native piping and established a POSIX-compliant naming architecture.
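
One way a POSIX-oriented naming rule could be checked automatically is sketched below. It assumes the rule means "use only the POSIX portable filename character set (letters, digits, period, underscore, hyphen) and do not start with a hyphen". The function name `is_portable_name` and the sample file names are hypothetical, and the repository's actual convention may be stricter.

```r
# Returns TRUE for names drawn only from the POSIX portable filename
# character set (A-Z, a-z, 0-9, period, underscore, hyphen) that do not
# begin with a hyphen.
is_portable_name <- function(x) {
  grepl("^[A-Za-z0-9._][A-Za-z0-9._-]*$", x)
}

# Hypothetical file names: one portable, one with spaces and parentheses,
# one with a leading hyphen.
candidate_names <- c("01_data-import.R", "my analysis (final).R", "-draft.R")
is_portable_name(candidate_names)
```

A check like this could run over `list.files()` output at the start of a script so that non-portable names are caught before they reach version control.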
- Objective: Engineering high-integrity metrics from `nycflights13`.
- Key Improvement: Developed a normalized "Time Recovery Index" and a defensive pipeline for carrier-level performance auditing.
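
The exact definition of the "Time Recovery Index" is not given here, so the sketch below uses an assumed one: minutes of delay made up in the air (`dep_delay - arr_delay`) divided by `air_time`. The column names match the `flights` table in `nycflights13`, but the rows are toy values and the function name `time_recovery` is hypothetical.

```r
# Toy rows using nycflights13 column names (values are illustrative only).
flights_toy <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(10, NA, 30, 5, 0),
  arr_delay = c(0, 12, 45, 5, NA),
  air_time  = c(100, 120, 150, 50, 200)
)

# Assumed definition: (dep_delay - arr_delay) / air_time, averaged per carrier.
time_recovery <- function(flights) {
  required <- c("carrier", "dep_delay", "arr_delay", "air_time")

  # Fail loudly when an expected column is absent.
  missing_cols <- setdiff(required, names(flights))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }

  # Keep only complete rows with a positive air time to avoid NA or Inf values.
  usable <- complete.cases(flights[required]) & flights$air_time > 0
  complete <- flights[usable, , drop = FALSE]
  if (nrow(complete) == 0) {
    stop("No usable rows after NA filtering")
  }

  complete$recovery_index <-
    (complete$dep_delay - complete$arr_delay) / complete$air_time

  # Carrier-level mean of the per-flight index.
  aggregate(recovery_index ~ carrier, data = complete, FUN = mean)
}

carrier_summary <- flights_toy |> time_recovery()
```

Under this assumed definition a positive value means a carrier tends to make up time in the air, and dividing by `air_time` keeps long and short routes comparable.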
- Language: R 4.1+ (the native pipe `|>` requires R 4.1.0 or later)
- Framework: Tidyverse (Extended)
- Standards: MIT License with full attribution to original authors.
Lead Engineer: Nate Vavrock
Source: R for Data Science, 2nd Edition