BenaroyaResearch/VM_brainGutCoaching


VM Brain-Gut Coaching Class (VMBGCC) Analysis

Analysis pipeline for evaluating the Virginia Mason Brain-Gut Coaching Class program — a quality-improvement study examining patient-reported outcomes (IBS-SSS, PHQ-2, GAD-7), healthcare utilization, and thematic patterns from 182 patients across 34 class sessions.

Repository Structure

.
├── code/                   # All analysis scripts
│   ├── run_*.R             # Entry-point wrappers (start here)
│   ├── VMBGCC_*.Rmd        # Core analysis notebooks (literate programming)
│   ├── VMBGCC_*.R          # Auto-generated from Rmd (do not edit directly)
│   └── VMBGCC_functions.R  # Shared utility functions
├── data/
│   ├── inputData/          # Raw Excel data (not tracked in git)
│   └── outputData/         # Cleaned data and analysis results
├── documents/              # Drafts, literature, notes, study documents
└── figures/                # Publication-ready PDF and PNG figures

Pipeline Overview

The analysis is organized into five phases: cleaning runs first, the three analysis phases follow, and figures run last. Each phase has:

  • A run_*.R wrapper — the entry point you should execute. It checks for required upstream outputs and auto-runs dependencies if they are missing.
  • A VMBGCC_*.Rmd notebook — the core analysis code with embedded documentation. These can also be knit interactively in RStudio for an exploratory workflow.
  • A VMBGCC_*.R script — auto-generated by knitr::purl() from the Rmd. Do not edit these directly; they are overwritten each run.
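The wrapper behaviour described above (check for upstream outputs, run the dependency if they are missing) can be sketched as follows. This is a minimal illustration, not the repository's code: `ensureUpstream()` and its arguments are hypothetical names, and the real run_*.R wrappers may check different files or use a different mechanism.

```r
# Hypothetical helper: run an upstream wrapper only if its outputs are missing.
# `runner` defaults to source(); it is a parameter so the logic can be
# exercised without executing a real script.
ensureUpstream <- function(requiredFiles, upstreamWrapper, runner = source) {
  absent <- requiredFiles[!file.exists(requiredFiles)]
  if (length(absent) == 0) {
    return(invisible(FALSE))  # all outputs present, nothing to run
  }
  message("Missing upstream outputs: ",
          paste(basename(absent), collapse = ", "))
  runner(upstreamWrapper)

  # Fail loudly if the upstream run did not create what was expected.
  stillAbsent <- requiredFiles[!file.exists(requiredFiles)]
  if (length(stillAbsent) > 0) {
    stop("Upstream run did not produce: ",
         paste(stillAbsent, collapse = ", "))
  }
  invisible(TRUE)
}
```

A wrapper could call a helper like this first, then extract the notebook code with knitr::purl() and source the generated script.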

Execution Order

┌──────────────────────────────────────────────────────────────────────┐
│  Phase 1: CLEANING                                                   │
│  run_cleaning.R  →  VMBGCC_cleaning.Rmd                             │
│  Reads raw Excel  →  Produces cleaned RDS/CSV + diagnosis matrix     │
└──────────────────────┬───────────────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
┌─────────────┐ ┌────────────┐ ┌────────────┐
│  Phase 2:   │ │  Phase 3:  │ │  Phase 4:  │
│ DESCRIPTIVES│ │  OUTCOMES  │ │  THEMATIC  │
│  run_       │ │  run_      │ │  run_      │
│ descriptives│ │ outcomes.R │ │ thematic.R │
│       .R    │ │            │ │            │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
       │              │              │
       └──────────────┼──────────────┘
                      ▼
         ┌────────────────────────┐
         │  Phase 5: FIGURES      │
         │  run_figures.R         │
         │  Reads all upstream    │
         │  results → pub-ready   │
         │  figures and tables    │
         └────────────────────────┘

Phases 2–4 are independent of each other and can be run in any order (or in parallel). Phase 5 requires all prior phases to be complete.
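The dependency structure in the diagram can also be written down as data. The sketch below is illustrative only: `phaseDeps` and `resolveOrder()` are hypothetical names that do not appear in the repository, and the wrappers may resolve dependencies differently.

```r
# Phase dependencies as described in this README (names are illustrative).
phaseDeps <- list(
  cleaning     = character(0),
  descriptives = "cleaning",
  outcomes     = "cleaning",
  thematic     = "cleaning",
  figures      = c("descriptives", "outcomes", "thematic")
)

# Return a valid run order for `target`: every dependency comes before
# the phase that needs it (depth-first traversal of the dependency list).
resolveOrder <- function(target, deps) {
  runOrder <- character(0)
  visit <- function(phase) {
    if (phase %in% runOrder) return(invisible(NULL))
    for (d in deps[[phase]]) visit(d)
    runOrder <<- c(runOrder, phase)
  }
  visit(target)
  runOrder
}

resolveOrder("figures", phaseDeps)
# "cleaning" "descriptives" "outcomes" "thematic" "figures"
```

Because phases 2 to 4 depend only on cleaning, any permutation of the middle three entries is an equally valid order.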

Phase Details

| Phase | Entry Point | Core Notebook | Reads | Produces |
| --- | --- | --- | --- | --- |
| 1. Cleaning | run_cleaning.R | VMBGCC_cleaning.Rmd | Raw Excel (4 sheets) | bgccClean.rds/.csv, thematicCoding.rds, diagnosisOneHot.rds |
| 2. Descriptives | run_descriptives.R | VMBGCC_descriptives.Rmd | bgccClean.rds | Table 1, EDA figures, temporal trends |
| 3. Outcomes | run_outcomes.R | VMBGCC_outcomes.Rmd | bgccClean.rds | outcomeResults.rds (Wilcoxon tests, effect sizes, mixed models) |
| 4. Thematic | run_thematic.R | VMBGCC_thematic.Rmd | bgccClean.rds, thematicCoding.rds | themeResults.rds (prevalence, co-occurrence, micro-goals) |
| 5. Figures | run_figures.R | VMBGCC_figures.Rmd | All of the above | 32 publication-ready figures (PDF + PNG) |
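Phase 3 stores Wilcoxon tests and effect sizes. As a rough illustration of that kind of pre/post comparison, the sketch below runs a paired Wilcoxon signed-rank test with base R on synthetic scores. Several things here are assumptions: the numbers are made up and are not study data, the README does not say which effect size the pipeline reports (the matched-pairs rank-biserial correlation is used here only as an example), and `rankBiserial()` is a hypothetical helper. The pipeline itself lists rstatix and effectsize, which are not used here so that the example stays self-contained.

```r
# Synthetic pre/post scores for six hypothetical patients (NOT study data).
pre  <- c(300, 250, 280, 310, 220, 260)
post <- c(210, 200, 240, 250, 230, 190)

# Paired Wilcoxon signed-rank test on the post - pre differences.
test <- wilcox.test(post, pre, paired = TRUE)

# Matched-pairs rank-biserial correlation, computed by hand:
# (sum of positive ranks - sum of negative ranks) / (sum of all ranks).
rankBiserial <- function(x, y) {
  d <- x - y
  d <- d[d != 0]          # zero differences carry no sign
  r <- rank(abs(d))
  wPos <- sum(r[d > 0])
  wNeg <- sum(r[d < 0])
  (wPos - wNeg) / (wPos + wNeg)
}

effect <- rankBiserial(post, pre)  # negative here: most scores fell
```

A value near -1 means almost every patient's score decreased, and a value near 0 means increases and decreases balance out.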

Quick Start

Full pipeline (from terminal)

cd /path/to/VM_brainGutCoaching
Rscript code/run_figures.R

This single command will auto-detect missing upstream outputs and run the entire pipeline end-to-end (cleaning → descriptives → outcomes → thematic → figures).

Step-by-step (interactive in RStudio)

# 1. Clean raw data
source("code/run_cleaning.R")

# 2-4. Run analyses (any order)
source("code/run_descriptives.R")
source("code/run_outcomes.R")
source("code/run_thematic.R")

# 5. Generate figures (requires 1-4)
source("code/run_figures.R")

Or open the individual .Rmd files in RStudio and knit/run chunks interactively for exploratory work.

Shared Utilities

VMBGCC_functions.R provides functions used across all phases:

| Function | Purpose |
| --- | --- |
| savePlot() | Save ggplot objects as PDF and/or PNG at 600 DPI |
| toNumericSafe() | Numeric conversion handling "Unknown" / blank → NA |
| excelDateToR() | Convert Excel serial date numbers to R Date objects |
| ibsSSSBand() | Classify IBS-SSS severity (Remission / Mild / Moderate / Severe) |
| gad7Band() | Classify GAD-7 severity (Minimal / Mild / Moderate / Severe) |
| phq2Screen() | Binary PHQ-2 depression screen (Positive ≥ 3 / Negative) |
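To make the table concrete, here is a sketch of what helpers of this kind might look like. These are not the functions in VMBGCC_functions.R: the names carry a `Sketch` suffix to mark them as hypothetical, and the GAD-7 and IBS-SSS cut points are the commonly cited ones, assumed here rather than taken from the repository. Only the PHQ-2 threshold (≥ 3) comes from this README.

```r
# Illustrative re-implementations; the repository's versions may differ.

# Convert to numeric, treating "Unknown" and blank strings as NA.
toNumericSketch <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% c("", "Unknown")] <- NA_character_
  suppressWarnings(as.numeric(x))
}

# Excel (1900 date system) serial numbers -> Date, using origin 1899-12-30.
excelDateSketch <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# GAD-7 bands with assumed cut points: 0-4, 5-9, 10-14, 15+.
gad7BandSketch <- function(score) {
  cut(score, breaks = c(-Inf, 4, 9, 14, Inf),
      labels = c("Minimal", "Mild", "Moderate", "Severe"))
}

# IBS-SSS bands with assumed cut points: <75, 75-174, 175-299, 300+.
ibsSSSBandSketch <- function(score) {
  cut(score, breaks = c(-Inf, 74, 174, 299, Inf),
      labels = c("Remission", "Mild", "Moderate", "Severe"))
}

# PHQ-2 screen: positive at a total score of 3 or more; NA stays NA.
phq2ScreenSketch <- function(score) {
  ifelse(is.na(score), NA_character_,
         ifelse(score >= 3, "Positive", "Negative"))
}
```

Returning factors from the banding helpers keeps the severity levels in clinical order when they are tabulated or plotted.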

Debug / Inspection Utilities

| Script | Purpose |
| --- | --- |
| read_excel_sheets.R | List all Excel sheets with structure and preview |
| debug_cleaning.R | Troubleshoot data matching and encoding issues |
| inspect_data_deep.R | Check row counts, unique values, summary statistics |

Prerequisites

R packages (loaded automatically by each script):

  • Data wrangling: dplyr, tidyr, stringr, forcats
  • Excel I/O: openxlsx
  • Statistics: rstatix, effectsize, lme4, pwr
  • Visualization: ggplot2, patchwork, ggalluvial, UpSetR
  • Tables: gtsummary, gt, flextable
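Since the scripts load these packages rather than install them, it can help to check up front which ones are absent. The snippet below is a small sketch for that purpose; `missingPackages()` is a hypothetical helper, not part of the repository, and the package list is copied from the bullets above.

```r
# Packages named in this README.
requiredPkgs <- c("dplyr", "tidyr", "stringr", "forcats", "openxlsx",
                  "rstatix", "effectsize", "lme4", "pwr",
                  "ggplot2", "patchwork", "ggalluvial", "UpSetR",
                  "gtsummary", "gt", "flextable")

# Return the entries of `pkgs` that are not installed.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

notInstalled <- missingPackages(requiredPkgs)
if (length(notInstalled) > 0) {
  message("Install before running: ", paste(notInstalled, collapse = ", "))
}
```

Anything reported can then be installed with install.packages() before the pipeline is run.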

Data

Raw data is stored in data/inputData/ and is not tracked in version control. The source file is a multi-sheet Excel workbook containing patient demographics, pre/post outcome scores, healthcare utilization metrics, thematic coding from consensus review, and a one-hot diagnosis matrix.

All output files in data/outputData/ follow the naming convention VMBGCC.<date>_<description>.<ext>.
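A file name following this convention could be assembled as in the sketch below. `outputFileName()` is a hypothetical helper, and the YYYYMMDD date format is an assumption, since the README does not specify how `<date>` is written.

```r
# Hypothetical helper that builds VMBGCC.<date>_<description>.<ext>.
# The date format (YYYYMMDD) is an assumption, not taken from the repository.
outputFileName <- function(description, ext, date = Sys.Date()) {
  sprintf("VMBGCC.%s_%s.%s",
          format(as.Date(date), "%Y%m%d"), description, ext)
}

outputFileName("bgccClean", "rds", date = "2024-01-15")
# "VMBGCC.20240115_bgccClean.rds"
```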

About

Virginia Mason Gut-Brain Coaching Class. Investigator: Molly Anderson, DO, PI: Justin Brandler, MD.
