Bayesian Multi-study Factor Analysis for High-throughput Biological Data

06/26/2018
by Roberta De Vito et al.

This paper presents a new modeling strategy for the joint unsupervised analysis of multiple high-throughput biological studies. As in Multi-study Factor Analysis, our goals are to identify both common factors shared across studies and study-specific factors. Our approach is motivated by the growing body of high-throughput studies in biomedical research, as exemplified by the comprehensive set of breast tumor expression data considered in our case study. To handle high-dimensional studies, we extend Multi-study Factor Analysis using a Bayesian approach that imposes sparsity. Specifically, we generalize the sparse Bayesian infinite factor model to multiple studies. We also devise novel solutions for identifying the loading matrices: we recover the loading matrices of interest ex post by adapting the orthogonal Procrustes approach. Computationally, we propose a fast, efficient Gibbs sampler. Through an extensive simulation analysis, we show that the proposed approach performs very well in a range of scenarios and outperforms standard factor analysis in all of them, identifying replicable signal in unsupervised genomic applications. Our analysis of breast cancer gene expression across seven studies identified replicable gene patterns clearly related to well-known breast cancer pathways. An R package implementing the method is available on GitHub.
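Why an ex-post identification step is needed: in a factor model the loadings enter the covariance only through the product of the loading matrix with its transpose, so any orthogonal rotation of a loading matrix fits the data equally well, and MCMC draws of the loadings can drift between rotations. The sketch below illustrates a generic orthogonal Procrustes alignment of such draws toward a common reference. It is a minimal illustration under stated assumptions, not the authors' R package or their exact adaptation: the function names (`procrustes_rotation`, `align_draws`), the simulated "posterior draws" and all sizes are hypothetical, and NumPy is assumed.

```python
import numpy as np


def procrustes_rotation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||a @ R - b||_F (classical orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def align_draws(draws, n_iter=10, tol=1e-8):
    """Hypothetical helper: rotate each p x k loading draw toward a shared reference.

    The reference starts at the first draw and is then replaced by the mean of
    the aligned draws (a generalized Procrustes-style iteration).
    """
    reference = draws[0].copy()
    for _ in range(n_iter):
        aligned = [d @ procrustes_rotation(d, reference) for d in draws]
        new_reference = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_reference - reference) < tol
        reference = new_reference
        if converged:
            break
    return aligned, reference


def random_orthogonal(k, rng):
    """Random k x k orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((k, k)))
    return q * np.sign(np.diag(r))


# Simulated stand-in for MCMC output (assumption): each draw is the true
# loading matrix under an arbitrary rotation, plus a little noise.
rng = np.random.default_rng(0)
p, k, n_draws = 50, 3, 200
phi_true = rng.standard_normal((p, k))
draws = [phi_true @ random_orthogonal(k, rng) + 0.05 * rng.standard_normal((p, k))
         for _ in range(n_draws)]

naive_mean = np.mean(draws, axis=0)      # averaging unaligned draws mixes rotations
aligned, phi_hat = align_draws(draws)    # averaging after Procrustes alignment

# Compare through phi @ phi.T, which is invariant to orthogonal rotations.
target = phi_true @ phi_true.T


def rel_err(m):
    return np.linalg.norm(m @ m.T - target) / np.linalg.norm(target)


print(f"naive mean relative error:   {rel_err(naive_mean):.3f}")
print(f"aligned mean relative error: {rel_err(phi_hat):.3f}")
```

Averaging the raw draws shrinks the loadings toward zero because the rotations cancel out, while the aligned average recovers the loading matrix up to a single rotation. The rotation-invariant product is used for the comparison because the loadings themselves are only defined up to that rotation.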

research
07/24/2020

Bayesian Combinatorial Multi-Study Factor Analysis

Analyzing multiple studies allows leveraging data from a range of source...
research
11/07/2014

Differential gene co-expression networks via Bayesian biclustering models

Identifying latent structure in large data matrices is essential for exp...
research
05/08/2020

The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration

Recent advances in biological research have seen the emergence of high-t...
research
08/15/2017

Sparse Inverse Covariance Estimation for High-throughput microRNA Sequencing Data in the Poisson Log-Normal Graphical Model

We introduce the Poisson Log-Normal Graphical Model for count data, and ...
research
06/29/2015

Integrative analysis of gene expression and phenotype data

The linking genotype to phenotype is the fundamental aim of modern genet...
research
10/07/2019

Perturbed factor analysis: Improving generalizability across studies

Factor analysis is routinely used for dimensionality reduction. However,...
research
12/22/2019

Pooled variable scaling for cluster analysis

We propose a new approach for scaling prior to cluster analysis based on...