OMOPCancerTherapy is an R package for defining and analyzing cancer and therapy cohorts from an OMOP CDM database. It allows users to generate cohorts based on cancer and therapy concept sets, calculate cohort counts, and produce standardized results including therapy-cancer overlaps and age-stratified analysis.
This package was used for the analysis of "Critical medicines utilization; cancer registry-based insights to guide policy and investments in a changing world".
Download the package and install using the following command. Ensure that the file path matches the place where the downloaded package is saved.
install.packages("path/to/OMOPCancerTherapy", repos = NULL, type = "source")
If you want to make changes to the package, install it as a development version (see Development). This way, changes are incorporated immediately; otherwise you have to reinstall the package after each change.
# Install devtools if not already installed
install.packages("devtools")
# Install package from local folder
devtools::install("path/to/OMOPCancerTherapy")
OMOP_cancer_therapy/
├── DESCRIPTION
├── NAMESPACE
├── R/
│ ├── cohort_analysis_cancer.R # analysis of therapy prevalence per cancer type
│ ├── cohort_analysis_therapy.R # analysis of cancer prevalence per therapy
│ ├── cohort_creation.R # creation of cohorts based on CSV input files
│ ├── helpers.R # helper functions
│ ├── launch_shiny_app.R # launching of the shiny app
│ ├── run.R # main function to run the full pipeline
│ ├── sql_generation.R # generation of custom SQL queries, used for cohort creation
│ └── summarize_top_therapies.R # summary overview of the top therapies used
├── README.md
├── inst
│ ├── cohorts
│ │ ├── csv # Input CSVs for cancer and therapy concept sets
│ │ ├── json # json files with cohort definitions
│ │ ├── settings # cohort names and numbers
│ │ └── sql # sql files with cohort definitions
│ ├── shiny_app # shiny app definitions
Run the full package with default settings
library(OMOPCancerTherapy)
OMOPCancerTherapy::run()
The run pipeline will do the following:
- Read the concept CSV files included in the package
- Generate cohort definitions and SQL. By default, cohorts are created with cohort_start_date = '2022-01-01' and cohort_end_date = '2024-12-31' for cancer incidence, and with cohort_start_date = '2022-01-01' and cohort_end_date = '2025-09-30' (the cancer incidence cohort_end_date + 9 months) for medicine use.
- Create and populate cohort tables in the results schema of your database (default: cohortTable = "cancer_therapy_cohorts").
- Run the analysis to get cohort counts and save the output files into an output folder in the current working directory (default: cwd/Results).
- Generate a summary of the top 3 therapies and save the output file into an output folder in the current working directory (default: Results).
- Launch a Shiny app to visualize the results.
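The medicine-use window described above ends nine months after the cancer-incidence window. As a rough illustration of that date arithmetic (not the package's own code), the base R sketch below derives '2025-09-30' from '2024-12-31'. The helper name add_months_to_month_end is hypothetical. It steps to the following day before adding months, because adding months directly to the 31st can roll over into the next month.

```r
# Hypothetical helper, not part of OMOPCancerTherapy.
# Assumes `date` is the last day of a month.
add_months_to_month_end <- function(date, n_months) {
  # Move to the first day of the next month, where month arithmetic is safe
  first_of_next <- as.Date(date) + 1
  shifted <- seq(first_of_next, by = paste(n_months, "months"), length.out = 2)[2]
  # Step back one day to land on a month end again
  shifted - 1
}

incidence_end <- as.Date("2024-12-31")
medicine_use_end <- add_months_to_month_end(incidence_end, 9)
format(medicine_use_end)  # "2025-09-30"
```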
For help on using the package or modifying parameters, call ?run for the main function (or ?function_name for the other functions).
Default database settings:
The default settings assume that the database variables are available in your .Renviron file under the variable names shown below. If yours are stored differently, edit the section below and run it before calling OMOPCancerTherapy::run(connectionDetails = connectionDetails)
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = Sys.getenv("DBMS"),
  server = Sys.getenv("DB_SERVER"),
  port = Sys.getenv("DB_PORT"),
  user = if (Sys.getenv("DB_USER") == "") NULL else Sys.getenv("DB_USER"),
  password = if (Sys.getenv("DB_PASSWORD") == "") NULL else Sys.getenv("DB_PASSWORD"),
  pathToDriver = if (Sys.getenv("PATH_TO_DRIVER") == "") NULL else Sys.getenv("PATH_TO_DRIVER")
)
Similarly, the CDM schema name and results schema name are assumed to be stored in your .Renviron file. If they are stored differently, change the lines below and then call OMOPCancerTherapy::run(cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema)
cdmDatabaseSchema <- Sys.getenv("CDM_SCHEMA")
resultsDatabaseSchema <- Sys.getenv("RESULTS_SCHEMA")
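Because Sys.getenv() returns an empty string for a variable that is not set, a missing entry in .Renviron can go unnoticed until the database connection fails. The sketch below is one possible pre-flight check in base R. The helper missing_env_vars is hypothetical and not part of the package, and which variables count as mandatory is an assumption made here for illustration.

```r
# Hypothetical helper, not part of OMOPCancerTherapy:
# returns the names of environment variables that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[unname(values) == ""]
}

# Assumption: these five are treated as mandatory here. DB_USER, DB_PASSWORD
# and PATH_TO_DRIVER may be empty, since the defaults map them to NULL.
required_vars <- c("DBMS", "DB_SERVER", "DB_PORT", "CDM_SCHEMA", "RESULTS_SCHEMA")

missing_vars <- missing_env_vars(required_vars)
if (length(missing_vars) > 0) {
  message("Not set in .Renviron: ", paste(missing_vars, collapse = ", "))
}
```

If nothing is reported as missing, the objects created above can be passed to run() through the connectionDetails, cdmDatabaseSchema and resultsDatabaseSchema arguments, presumably in a single call.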
Disclaimer: The package has only been tested on a PostgreSQL database.
The main outputs saved in the Results/ folder are:
- results_therapyPrevalencePerCancer.csv: Detailed therapy x cancer cohort table with age stratification, counts, percentages, and censoring of counts < 5.
- results_cancerPrevalencePerTherapy.csv: Detailed therapy x cancer cohort table with age stratification, counts, percentages, and censoring of counts < 5.
- top_therapies_summary.csv: Top 3 therapies for adults and children, with number of patients, percentage of patients, and the list of cancer types treated.
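The output files listed above are plain CSV files, so they can also be inspected outside the Shiny app. The following base R sketch reads one of them from the default output folder. The helper read_result is hypothetical, and since the column layout of the files is not documented here, the example only prints the structure of whatever it finds.

```r
# Hypothetical helper, not part of OMOPCancerTherapy: read one result file
# from the output folder, returning NULL when the file does not exist yet.
read_result <- function(file_name, results_dir = file.path(getwd(), "Results")) {
  path <- file.path(results_dir, file_name)
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

top_therapies <- read_result("top_therapies_summary.csv")
if (!is.null(top_therapies)) {
  str(top_therapies)  # show column names and types without assuming them
}
```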
The results can be visualized using the Shiny app. The data is available in table format and as a bar graph.
Age groups:
- All
- Children
- Adults
Therapies: multiple options can be selected from the dropdown menu.
Cancer type: multiple options can be selected from the dropdown menu. The default is "All Cancers", the combined total of all cancers.
Metrics: percentage of patients or absolute patient counts. Percentages are better suited to relative comparisons.
Exclude zero values: ticked by default, so that therapies and cancer types with a patient count or percentage of 0 are excluded. This makes the overview easier to read. If the app is extended in the future to visualize multiple databases, it would be better to untick this box.
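As an illustration of what the "Exclude zero values" option amounts to, the sketch below drops zero rows from a small toy table in base R. The data, the column names and the helper exclude_zero_rows are invented for this example and do not reflect the app's actual code or the real result columns.

```r
# Toy data; therapy names and column names are invented for illustration.
toy_results <- data.frame(
  therapy = c("Therapy A", "Therapy B", "Therapy C"),
  n_patients = c(12, 0, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows whose value column is above zero.
# The toy data contains no missing values, so NA handling is left out.
exclude_zero_rows <- function(df, value_col) {
  df[df[[value_col]] > 0, , drop = FALSE]
}

filtered <- exclude_zero_rows(toy_results, "n_patients")
filtered$therapy  # "Therapy A" "Therapy C"
```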
Therapy prevalence per cancer
Cancer prevalence per therapy
To modify or extend the package:
- Make changes in the R/ scripts (e.g., add helper functions, new analysis features).
- Re-generate documentation with roxygen2:
install.packages("roxygen2")
library(roxygen2)
roxygen2::roxygenise("path/to/OMOPCancerTherapy")
- Use devtools::load_all("path/to/OMOPCancerTherapy") to load the latest changes.
- Re-install the package locally to test changes.
This project is licensed under the MIT License.


