D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multiple High-dimensional Datasets

by Hai Shu, et al.

Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A popular model for the joint analysis of multi-type datasets decomposes each data matrix into three parts: a low-rank common-variation matrix generated by latent factors shared across all datasets, a low-rank distinctive-variation matrix specific to that dataset, and an additive noise matrix. We propose decomposition-based generalized canonical correlation analysis (D-GCCA), a novel decomposition method that defines those matrices on the L2 space of random variables, whereas most existing methods are developed on its approximation, the Euclidean dot product space. Moreover, to calibrate the common latent factors well, we impose a desirable orthogonality constraint on the distinctive latent factors. Existing methods do not adequately account for such orthogonality and can therefore leave a substantial amount of common variation undetected. D-GCCA goes one step further than generalized canonical correlation analysis (GCCA) by separating common and distinctive variations among canonical variables, and it has an appealing interpretation from the perspective of principal component analysis. We establish consistent estimators of our common-variation and distinctive-variation matrices; they show good finite-sample numerical performance and have closed-form expressions, which leads to efficient computation, especially for large-scale datasets. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.
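As a rough illustration of the decomposition model described in the abstract, the following sketch simulates several datasets, each formed as a low-rank common-variation matrix plus a low-rank distinctive-variation matrix plus noise. It is not the D-GCCA estimator and does not enforce the orthogonality constraint on distinctive latent factors. The sample size, dimensions, ranks, noise level, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
n = 200                    # number of objects shared by every dataset
dims = [50, 80, 60]        # number of features in each dataset
r_common, r_dist = 2, 3    # ranks of the common and distinctive parts

# Latent factors shared across all datasets generate the common variation.
shared_factors = rng.standard_normal((r_common, n))

datasets = []
for p in dims:
    # Common variation: dataset-specific loadings times the shared factors.
    common = rng.standard_normal((p, r_common)) @ shared_factors
    # Distinctive variation: loadings times factors drawn for this dataset only.
    # (Drawn independently here; no orthogonality constraint is imposed.)
    distinct = rng.standard_normal((p, r_dist)) @ rng.standard_normal((r_dist, n))
    # Additive noise.
    noise = 0.1 * rng.standard_normal((p, n))
    datasets.append({
        "common": common,
        "distinct": distinct,
        "noise": noise,
        "observed": common + distinct + noise,
    })

for d in datasets:
    print(d["observed"].shape,
          np.linalg.matrix_rank(d["common"]),
          np.linalg.matrix_rank(d["distinct"]))
```

Each observed matrix has features in rows and objects in columns, so every dataset shares the same column dimension. A decomposition method of the kind described above would aim to recover the common and distinctive parts from the observed matrices alone.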


CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

A representative model in integrative analysis of two high-dimensional d...

Low-rank Latent Matrix Factor-Analysis Modeling for Generalized Linear Regression with High-dimensional Imaging Biomarkers

Medical imaging has been recognized as a phenotype associated with vario...

Joint and individual variation explained (JIVE) for integrated analysis of multiple data types

Research in several fields now requires the analysis of data sets in whi...

An Information-theoretic Approach to Unsupervised Feature Selection for High-Dimensional Data

In this paper, we propose an information-theoretic approach to design th...

Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is a widely used spectral technique...

Push it to the Limit: Discover Edge-Cases in Image Data with Autoencoders

In this paper, we focus on the problem of identifying semantic factors o...

Generalized Simultaneous Component Analysis of Binary and Quantitative data

In the current era of systems biological research there is a need for th...
