
Principal Structure Identification: Fast Disentanglement of Multi-source Dataset

by SeoWon Choi, et al.

Analysis of multi-source data, where data on the same objects are collected from multiple sources, is of rising importance in many fields, such as multi-omics biology. Major challenges in multi-source data analysis include heterogeneity among the data sources and the entanglement of their association structure across several groups of variables. Our goal is to disentangle the association structure by identifying score subspaces shared among all or some of the data blocks. We propose a sequential algorithm that gathers score subspaces of different data blocks that lie within a certain angle threshold of one another and identifies partially shared score components, using the concept of principal angles between subspaces of different dimensions. Our method identifies the linear association structure more accurately than competing methods in this field. In a real data analysis, we apply our method to an oncological multi-omics dataset associated with drug responses. The proposed method is computationally fast, and the scores in the estimated shared component show strong correlations with well-known biological pathways.
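
The principal angles mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the standard computation of principal angles between two score subspaces of different dimensions (orthonormalize each basis, then take the singular values of the cross-product). The function name `principal_angles`, the toy blocks, and the 10-degree threshold are all assumptions made for illustration.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles in radians (ascending) between the column spans of A and B.

    A and B are assumed to have full column rank and the same number of rows.
    The number of angles returned is min(A.shape[1], B.shape[1]).
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy example (hypothetical data): 50 objects, two blocks sharing one score direction.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 4)))  # four orthonormal score vectors
block1_scores = Q[:, [0, 1]]     # 2-dimensional score subspace
block2_scores = Q[:, [0, 2, 3]]  # 3-dimensional score subspace, shares column 0

angles = principal_angles(block1_scores, block2_scores)

# Hypothetical threshold: directions closer than 10 degrees count as shared.
threshold = np.deg2rad(10.0)
n_shared = int(np.sum(angles < threshold))
print(np.rad2deg(angles), n_shared)
```

Because the two toy blocks share exactly one score vector and are otherwise orthogonal, one principal angle is close to zero and the other is close to 90 degrees, so a single direction falls under the threshold. With noisy real data the angles would fall in between, which is why the choice of threshold matters.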




Data Integration Via Analysis of Subspaces (DIVAS)

Modern data collection in many data paradigms, including bioinformatics,...

Angle-Based Joint and Individual Variation Explained

Integrative analysis of disparate data blocks measured on a common set o...

Van Trees inequality, group equivariance, and estimation of principal subspaces

We establish non-asymptotic lower bounds for the estimation of principal...

Structural Learning and Integrative Decomposition of Multi-View Data

The increased availability of the multi-view data (data on the same samp...

A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies

Joint analysis of multiple phenotypes can increase statistical power in ...

A geometric framework for asymptotic inference of principal subspaces in PCA

In this article, we develop an asymptotic method for testing hypothesis ...

Asymmetric Metrics on the Full Grassmannian of Subspaces of Different Dimensions

Metrics on Grassmannians have a wide array of applications: machine lear...