Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data

11/29/2022
by Sarah Samorodnitsky, et al.

Understanding of the pathophysiology of obstructive lung disease (OLD) is limited by the methods available to examine the relationship between multi-omic molecular phenomena and clinical outcomes. Integrative factorization methods for multi-omic data can reveal latent patterns of variation describing important biological signal. However, most methods do not provide a framework for inference on the estimated factorization, do not simultaneously predict important disease phenotypes or clinical outcomes, and do not accommodate multiple imputation. To address these gaps, we propose Bayesian Simultaneous Factorization (BSF). We use conjugate normal priors and show that the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. We then extend BSF to simultaneously predict a continuous or binary response, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation and full posterior inference for missing data, including "blockwise" missingness, and BSFP offers prediction of unobserved outcomes. We show via simulation that BSFP is competitive in recovering latent variation structure, and we demonstrate the importance of propagating uncertainty from the estimated factorization to prediction. We also study the imputation performance of BSF via simulation under missing-at-random and missing-not-at-random assumptions. Lastly, we use BSFP to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated OLD. Our analysis reveals a distinct cluster of patients with OLD driven by shared metabolomic and proteomic expression patterns, as well as multi-omic patterns related to lung function decline. Software is freely available at https://github.com/sarahsamorodnitsky/BSFP.
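
The link between normal priors and a nuclear norm penalty can be illustrated in the simplest single-matrix case. The sketch below is not the structured objective of the paper, which involves several data sources with shared and source-specific structure; it only shows the standard identity that such a result builds on. The symbols X, U, V, Z, the unit noise variance, and the prior variance 1/\lambda are assumptions introduced here for illustration.

```latex
% Assumed toy model: X = U V^T + E, with E_{ij} ~ N(0, 1) and
% independent N(0, 1/\lambda) priors on every entry of U and V.
% Up to an additive constant, the negative log posterior is
\begin{align*}
  -\log p(U, V \mid X)
    &= \tfrac{1}{2}\,\lVert X - U V^{\top} \rVert_F^2
     + \tfrac{\lambda}{2}\left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right) + \text{const}.
\end{align*}
% Standard identity (valid when U and V have at least rank(Z) columns):
\begin{align*}
  \lVert Z \rVert_{*}
    &= \min_{U V^{\top} = Z} \tfrac{1}{2}\left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right).
\end{align*}
% Hence the posterior mode satisfies \hat{U}\hat{V}^{\top} = \hat{Z}, where
\begin{align*}
  \hat{Z} &= \arg\min_{Z}\; \tfrac{1}{2}\,\lVert X - Z \rVert_F^2 + \lambda\,\lVert Z \rVert_{*}
           = \sum_{k} \max(d_k - \lambda,\, 0)\, u_k v_k^{\top},
\end{align*}
% with X = \sum_k d_k u_k v_k^{\top} the singular value decomposition of X.
```

In this toy case, rank selection follows directly: every component with singular value d_k at or below \lambda is set to zero, so the prior variance determines the estimated rank. This is one way to see how a nuclear norm formulation can both select ranks and guide the choice of hyperparameters.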


