Dynamic Factor Analysis with Dependent Gaussian Processes for High-Dimensional Gene Expression Trajectories
The increasing availability of high-dimensional, longitudinal measurements of gene expression can facilitate the analysis of the biological mechanisms of disease and the prediction of future trajectories, as required for precision medicine. Biological knowledge suggests that complex diseases may be best described at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that characterises the correlation among different pathways through Dependent Gaussian Processes (DGP) and maps the observed high-dimensional gene expression trajectories onto unobserved low-dimensional pathway expression trajectories via Bayesian Sparse Factor Analysis. Compared to previous approaches that model each pathway expression trajectory independently, our model performs better in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expression (closer point estimates and narrower predictive intervals), as shown in a simulation study and a real data analysis. To fit the model, we propose a Monte Carlo Expectation Maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler with the R package GPFDA (Konzen and others, 2021), which returns the maximum likelihood estimates of the DGP parameters. The modular structure of MCEM makes it generalizable to other complex models that include a DGP component. An R package implementing the proposed approach has been developed.
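To make the model structure concrete, the following is a minimal simulation sketch of the kind of data the abstract describes: correlated latent pathway trajectories mapped to many gene trajectories through sparse loadings. It is not the authors' R package or their DGP construction. The separable covariance (a cross-pathway correlation matrix combined with a squared-exponential kernel in time), the Bernoulli-masked Gaussian loadings, and every name and default value (`simulate_trajectories`, `rho`, `sparsity`, `noise_sd`) are assumptions chosen for illustration.

```python
import numpy as np


def sq_exp_kernel(t, length_scale=1.0):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def simulate_trajectories(n_genes=50, n_factors=3, n_times=10,
                          rho=0.6, sparsity=0.3, noise_sd=0.1, seed=0):
    """Simulate gene trajectories from correlated latent pathway trajectories.

    Assumption: a separable covariance kron(B, K_t) stands in for the
    dependent Gaussian process; the paper's DGP may be built differently.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)

    # Cross-pathway correlation matrix B (hypothetical equicorrelation rho).
    B = np.full((n_factors, n_factors), rho)
    np.fill_diagonal(B, 1.0)

    # Joint covariance over all (pathway, time) pairs, with jitter for stability.
    K = np.kron(B, sq_exp_kernel(t, length_scale=0.3))
    K += 1e-6 * np.eye(K.shape[0])

    # Draw all pathway trajectories jointly so that they are correlated.
    chol = np.linalg.cholesky(K)
    factors = (chol @ rng.standard_normal(K.shape[0])).reshape(n_factors, n_times)

    # Sparse loadings: each gene loads on a pathway with probability `sparsity`.
    include = rng.random((n_genes, n_factors)) < sparsity
    loadings = include * rng.standard_normal((n_genes, n_factors))

    # Observed gene expression = loadings x pathway trajectories + noise.
    noise = noise_sd * rng.standard_normal((n_genes, n_times))
    genes = loadings @ factors + noise
    return t, factors, loadings, genes


if __name__ == "__main__":
    t, factors, loadings, genes = simulate_trajectories()
    print(genes.shape, factors.shape, loadings.shape)
```

Setting `rho=0.0` in this sketch makes the pathway trajectories independent, which corresponds to the baseline setting that the abstract compares against.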