Principal Structure Identification: Fast Disentanglement of Multi-source Dataset

03/26/2022
by SeoWon Choi, et al.

Analysis of multi-source data, where data on the same objects are collected from multiple sources, is of rising importance in many fields, e.g., multi-omics biology. Major challenges in multi-source data analysis include heterogeneity among the data sources and the entanglement of association structures among several groups of variables. Our goal is to disentangle the association structure by identifying score subspaces shared among all or some of the data blocks. We propose a sequential algorithm that gathers score subspaces of different data blocks lying within a certain angle threshold and identifies partially shared score components, using the concept of principal angles between subspaces of different dimensions. Our method performs better at identifying the linear association structure than competing methods in this field. In a real data analysis, we apply our method to an oncological multi-omics dataset associated with drug responses. The proposed method is computationally fast, and the scores in the estimated shared components show strong correlations with well-known biological pathways.
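To make the notion of principal angles between subspaces of different dimensions concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm. It only computes the principal angles between two column spaces, using the standard QR-then-SVD construction, and counts how many fall under a threshold. The function names `principal_angles` and `count_within_threshold`, and the threshold values, are illustrative assumptions, and the inputs are assumed to have full column rank.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    A and B may have different numbers of columns; the number of angles
    returned is the smaller of the two subspace dimensions.
    Assumes both matrices have full column rank (illustrative simplification).
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly above 1.
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def count_within_threshold(A, B, threshold_deg):
    """Hypothetical helper: number of principal angles at or below the threshold."""
    angles_deg = np.degrees(principal_angles(A, B))
    return int(np.sum(angles_deg <= threshold_deg))


# A two-dimensional subspace (the x-y plane) and a one-dimensional subspace in R^3.
plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
line = np.array([[1.0], [0.0], [1.0]])

angles = principal_angles(plane, line)  # a single angle, since min(2, 1) = 1
```

In this toy case the line makes a 45 degree angle with the plane, so it would count as close under a 50 degree threshold but not under a 30 degree one. How the paper chooses its threshold and gathers subspaces sequentially is described in the full text, not here.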

research
12/01/2022

Data Integration Via Analysis of Subspaces (DIVAS)

Modern data collection in many data paradigms, including bioinformatics,...
research
04/07/2017

Angle-Based Joint and Individual Variation Explained

Integrative analysis of disparate data blocks measured on a common set o...
research
07/19/2021

Van Trees inequality, group equivariance, and estimation of principal subspaces

We establish non-asymptotic lower bounds for the estimation of principal...
research
07/20/2017

Structural Learning and Integrative Decomposition of Multi-View Data

The increased availability of the multi-view data (data on the same samp...
research
10/28/2017

A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies

Joint analysis of multiple phenotypes can increase statistical power in ...
research
09/05/2022

A geometric framework for asymptotic inference of principal subspaces in PCA

In this article, we develop an asymptotic method for testing hypothesis ...
research
08/09/2022

Asymmetric Metrics on the Full Grassmannian of Subspaces of Different Dimensions

Metrics on Grassmannians have a wide array of applications: machine lear...