Analysis pipeline for evaluating the Virginia Mason Brain-Gut Coaching Class program — a quality-improvement study examining patient-reported outcomes (IBS-SSS, PHQ-2, GAD-7), healthcare utilization, and thematic patterns from 182 patients across 34 class sessions.
.
├── code/ # All analysis scripts
│ ├── run_*.R # Entry-point wrappers (start here)
│ ├── VMBGCC_*.Rmd # Core analysis notebooks (literate programming)
│ ├── VMBGCC_*.R # Auto-generated from Rmd (do not edit directly)
│ └── VMBGCC_functions.R # Shared utility functions
├── data/
│ ├── inputData/ # Raw Excel data (not tracked in git)
│ └── outputData/ # Cleaned data and analysis results
├── documents/ # Drafts, literature, notes, study documents
└── figures/ # Publication-ready PDF and PNG figures
The analysis is organized into five sequential phases. Each phase has:
- A `run_*.R` wrapper: the entry point you should execute. It checks for required upstream outputs and auto-runs dependencies if they are missing.
- A `VMBGCC_*.Rmd` notebook: the core analysis code with embedded documentation. These can also be knit interactively in RStudio for an exploratory workflow.
- A `VMBGCC_*.R` script: auto-generated by `knitr::purl()` from the Rmd. Do not edit these directly; they are overwritten each run.
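The wrapper pattern described above (check for upstream outputs, run the dependency if anything is missing, then extract and run the notebook code) might look roughly like the sketch below. The function names `ensureUpstream()` and `runNotebook()`, their arguments, and any paths passed to them are hypothetical and are not taken from the project's `run_*.R` scripts.

```r
# Hypothetical helpers illustrating the wrapper pattern; names and arguments
# are assumptions, not the project's actual code.

# Run the upstream wrapper only if one or more required files are absent.
ensureUpstream <- function(requiredFiles, upstreamWrapper) {
  absent <- requiredFiles[!file.exists(requiredFiles)]
  if (length(absent) > 0) {
    message("Missing upstream outputs: ",
            paste(basename(absent), collapse = ", "))
    message("Running ", upstreamWrapper, " first")
    source(upstreamWrapper, local = new.env())
  }
  # Fail loudly if the upstream phase still did not produce what is needed.
  stillAbsent <- requiredFiles[!file.exists(requiredFiles)]
  if (length(stillAbsent) > 0) {
    stop("Upstream phase did not produce: ",
         paste(stillAbsent, collapse = ", "))
  }
  invisible(TRUE)
}

# Extract the R code from a notebook with knitr::purl() and run it.
# The generated .R file is overwritten on every call.
runNotebook <- function(rmdPath) {
  rPath <- sub("\\.Rmd$", ".R", rmdPath)
  knitr::purl(rmdPath, output = rPath, documentation = 1, quiet = TRUE)
  source(rPath, local = new.env())
  invisible(rPath)
}
```

A wrapper built this way would call `ensureUpstream()` with the files it reads and the wrapper of the phase that produces them, then call `runNotebook()` on its own `.Rmd`.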
┌──────────────────────────────────────────────────────────────────────┐
│ Phase 1: CLEANING │
│ run_cleaning.R → VMBGCC_cleaning.Rmd │
│ Reads raw Excel → Produces cleaned RDS/CSV + diagnosis matrix │
└──────────────────────┬───────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────────┐ ┌────────────┐ ┌────────────┐
│ Phase 2: │ │ Phase 3: │ │ Phase 4: │
│ DESCRIPTIVES│ │ OUTCOMES │ │ THEMATIC │
│ run_ │ │ run_ │ │ run_ │
│ descriptives│ │ outcomes.R │ │ thematic.R │
│ .R │ │ │ │ │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
└──────────────┼──────────────┘
▼
┌────────────────────────┐
│ Phase 5: FIGURES │
│ run_figures.R │
│ Reads all upstream │
│ results → pub-ready │
│ figures and tables │
└────────────────────────┘
Phases 2–4 are independent of each other and can be run in any order (or in parallel). Phase 5 requires all prior phases to be complete.
| Phase | Entry Point | Core Notebook | Reads | Produces |
|---|---|---|---|---|
| 1. Cleaning | `run_cleaning.R` | `VMBGCC_cleaning.Rmd` | Raw Excel (4 sheets) | `bgccClean.rds`/`.csv`, `thematicCoding.rds`, `diagnosisOneHot.rds` |
| 2. Descriptives | `run_descriptives.R` | `VMBGCC_descriptives.Rmd` | `bgccClean.rds` | Table 1, EDA figures, temporal trends |
| 3. Outcomes | `run_outcomes.R` | `VMBGCC_outcomes.Rmd` | `bgccClean.rds` | `outcomeResults.rds` (Wilcoxon tests, effect sizes, mixed models) |
| 4. Thematic | `run_thematic.R` | `VMBGCC_thematic.Rmd` | `bgccClean.rds`, `thematicCoding.rds` | `themeResults.rds` (prevalence, co-occurrence, micro-goals) |
| 5. Figures | `run_figures.R` | `VMBGCC_figures.Rmd` | All of the above | 32 publication-ready figures (PDF + PNG) |
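As a rough illustration of the kind of pre/post comparison listed for the outcomes phase, the sketch below runs a paired Wilcoxon signed-rank test in base R and derives one common effect-size measure. The scores are toy values invented for the example, not study data, and the effect-size convention shown is an assumption; the notebook itself uses `rstatix` and `effectsize` and may compute this differently.

```r
# Toy IBS-SSS scores invented for illustration; NOT study data.
pre  <- c(300, 250, 280, 320, 210, 190, 260, 240)
post <- c(250, 220, 220, 240, 190, 180, 215, 205)

# Paired Wilcoxon signed-rank test on the pre/post differences.
res <- wilcox.test(pre, post, paired = TRUE)

# One common effect-size convention: r = |Z| / sqrt(n), with Z recovered
# from the two-sided p-value. This is an assumption for the sketch, not
# necessarily the method used in VMBGCC_outcomes.Rmd.
n <- length(pre)
z <- qnorm(res$p.value / 2)
r <- abs(z) / sqrt(n)

print(res$p.value)
print(r)
```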
    cd /path/to/VM_brainGutCoaching
    Rscript code/run_figures.R

This single command will auto-detect missing upstream outputs and run the entire pipeline end-to-end (cleaning → descriptives → outcomes → thematic → figures).
    # 1. Clean raw data
    source("code/run_cleaning.R")

    # 2-4. Run analyses (any order)
    source("code/run_descriptives.R")
    source("code/run_outcomes.R")
    source("code/run_thematic.R")

    # 5. Generate figures (requires 1-4)
    source("code/run_figures.R")

Alternatively, open the individual `.Rmd` files in RStudio and knit them or run chunks interactively for exploratory work.
VMBGCC_functions.R provides functions used across all phases:
| Function | Purpose |
|---|---|
| `savePlot()` | Save ggplot objects as PDF and/or PNG at 600 DPI |
| `toNumericSafe()` | Numeric conversion handling "Unknown" / blank → NA |
| `excelDateToR()` | Convert Excel serial date numbers to R Date objects |
| `ibsSSSBand()` | Classify IBS-SSS severity (Remission / Mild / Moderate / Severe) |
| `gad7Band()` | Classify GAD-7 severity (Minimal / Mild / Moderate / Severe) |
| `phq2Screen()` | Binary PHQ-2 depression screen (Positive ≥ 3 / Negative) |
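To make the table concrete, here is a minimal sketch of how helpers of this kind could be written in base R. These are illustrative re-implementations, not the code in `VMBGCC_functions.R`: the `Sketch` suffix marks each name as hypothetical, and the signatures, the boundary handling of the severity cutoffs, and the Excel 1900 date system are all assumptions.

```r
# Illustrative re-implementations only. The "Sketch" suffix marks these as
# hypothetical; the real signatures in VMBGCC_functions.R may differ.

# Convert to numeric, mapping "Unknown" and blank strings to NA.
toNumericSafeSketch <- function(x) {
  x <- trimws(as.character(x))
  isBlank <- is.na(x) | x == "" | tolower(x) == "unknown"
  x[isBlank] <- NA_character_
  suppressWarnings(as.numeric(x))
}

# Excel serial dates in the Windows 1900 date system count days
# from 1899-12-30 (assumed here; Mac 1904 workbooks differ).
excelDateToRSketch <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# IBS-SSS bands, assuming the commonly used cutoffs:
# <75 Remission, 75-174 Mild, 175-299 Moderate, >=300 Severe.
ibsSSSBandSketch <- function(score) {
  cut(score, breaks = c(-Inf, 75, 175, 300, Inf), right = FALSE,
      labels = c("Remission", "Mild", "Moderate", "Severe"))
}

# GAD-7 bands, assuming the standard cutoffs:
# 0-4 Minimal, 5-9 Mild, 10-14 Moderate, 15-21 Severe.
gad7BandSketch <- function(score) {
  cut(score, breaks = c(-Inf, 5, 10, 15, Inf), right = FALSE,
      labels = c("Minimal", "Mild", "Moderate", "Severe"))
}

# PHQ-2 screen: Positive at a total score of 3 or more; NA stays NA.
phq2ScreenSketch <- function(score) {
  ifelse(score >= 3, "Positive", "Negative")
}
```

Using `cut()` with `right = FALSE` keeps each lower cutoff inside its own band, so a score of exactly 75 is classified as Mild rather than Remission under these assumed boundaries.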
| Script | Purpose |
|---|---|
| `read_excel_sheets.R` | List all Excel sheets with structure and preview |
| `debug_cleaning.R` | Troubleshoot data matching and encoding issues |
| `inspect_data_deep.R` | Check row counts, unique values, summary statistics |
R packages (loaded automatically by each script):
- Data wrangling: `dplyr`, `tidyr`, `stringr`, `forcats`
- Excel I/O: `openxlsx`
- Statistics: `rstatix`, `effectsize`, `lme4`, `pwr`
- Visualization: `ggplot2`, `patchwork`, `ggalluvial`, `UpSetR`
- Tables: `gtsummary`, `gt`, `flextable`
Raw data is stored in data/inputData/ and is not tracked in version control. The source file is a multi-sheet Excel workbook containing patient demographics, pre/post outcome scores, healthcare utilization metrics, thematic coding from consensus review, and a one-hot diagnosis matrix.
All output files in `data/outputData/` follow the naming convention `VMBGCC.<date>_<description>.<ext>`.
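A small sketch of how this naming convention could be built and searched is shown below. The helper names `buildOutputName()` and `findLatestOutput()` are hypothetical, and the date format (`YYYY-MM-DD`) is an assumption, since the convention above does not specify one.

```r
# Hypothetical helpers; the YYYY-MM-DD date format is an assumption.

# Build a file name of the form VMBGCC.<date>_<description>.<ext>.
buildOutputName <- function(description, ext, date = Sys.Date()) {
  sprintf("VMBGCC.%s_%s.%s",
          format(as.Date(date), "%Y-%m-%d"), description, ext)
}

# Return the most recent matching file in a directory, or NA if none exists.
findLatestOutput <- function(dir, description, ext) {
  pattern <- sprintf("^VMBGCC\\.[0-9]{4}-[0-9]{2}-[0-9]{2}_%s\\.%s$",
                     description, ext)
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(hits) == 0) return(NA_character_)
  # ISO-formatted dates sort chronologically as plain strings.
  sort(hits, decreasing = TRUE)[1]
}
```

With a sortable date in the name, a downstream phase can pick up the newest upstream result without hard-coding the date it was produced.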