A Hierarchical Spike-and-Slab Model for Pan-Cancer Survival Using Pan-Omic Data
Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer, expanding what was known from single-cancer or single-omics studies. However, pan-cancer, pan-omics analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict important clinical outcomes, such as overall survival. We address the problem of prediction across multiple high-dimensional sources of data and multiple sample sets by using exploratory results from BIDIFAC+, a method for integrative dimension reduction of bidimensionally linked matrices, as inputs to a predictive model. We apply a Bayesian hierarchical model that performs variable selection using spike-and-slab priors, which we modify to allow information to be borrowed across clustered data. We use this method to predict overall survival for patients in The Cancer Genome Atlas (TCGA) using data from 29 cancer types and 4 omics sources. Our model selected patterns of variation identified by BIDIFAC+ that differentiate clinical tumor subtypes with markedly different survival outcomes. We also use simulations to evaluate the modified spike-and-slab prior in terms of its variable selection accuracy and prediction accuracy under different underlying data-generating frameworks. Software and code used for our analysis can be found at https://github.com/sarahsamorodnitsky/HierarchicalSS_PanCanPanOmics/.
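As a rough illustration of how a spike-and-slab prior might borrow information across clustered data, the following Python sketch draws coefficients from one possible hierarchical prior. It is not the authors' specification, which the abstract does not give: the function name, the shared inclusion indicator per predictor, the normal spike and slab, and all hyperparameter values (pi, slab_sd, spike_sd, group_sd) are assumptions made for illustration only.

```python
import numpy as np


def sample_hierarchical_spike_slab(n_groups, n_features, pi=0.5,
                                   slab_sd=1.0, spike_sd=0.01,
                                   group_sd=0.25, seed=0):
    """Draw coefficients from an illustrative hierarchical spike-and-slab prior.

    Assumed structure (hypothetical, not taken from the paper):
      gamma_j      ~ Bernoulli(pi)                      shared inclusion indicator
      mu_j         ~ Normal(0, slab_sd) if gamma_j = 1  (slab)
                     Normal(0, spike_sd) otherwise      (spike)
      beta_{g, j}  ~ Normal(mu_j, group_sd) if gamma_j = 1
                     Normal(mu_j, spike_sd) otherwise
    """
    rng = np.random.default_rng(seed)

    # One inclusion indicator per predictor, shared by every group
    # (e.g., every cancer type), so selection pools evidence across groups.
    gamma = rng.random(n_features) < pi

    # Shared mean effect: wide slab when included, narrow spike when excluded.
    shared_sd = np.where(gamma, slab_sd, spike_sd)
    beta_shared = rng.normal(0.0, shared_sd)

    # Group-specific effects scatter around the shared mean; excluded
    # predictors stay close to zero in every group.
    group_scale = np.where(gamma, group_sd, spike_sd)
    beta = rng.normal(beta_shared, group_scale, size=(n_groups, n_features))

    return gamma, beta_shared, beta


# Example: 29 groups (matching the number of cancer types in the abstract)
# and a hypothetical 10 candidate predictors.
gamma, beta_shared, beta = sample_hierarchical_spike_slab(29, 10, seed=1)
print(gamma.astype(int))
print(beta.shape)
```

In this sketch the shared indicator is what ties the groups together: a predictor is either in the model for all groups or shrunk toward zero for all of them, while the group-level scatter lets its effect size differ by group.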