Principal Structure Identification: Fast Disentanglement of Multi-source Dataset

by SeoWon Choi, et al.

Analysis of multi-source data, where data on the same objects are collected from multiple sources, is of rising importance in many fields, e.g., multi-omics biology. Major challenges in multi-source data analysis include heterogeneity among different data sources and the entanglement of their association structure among several groups of variables. Our goal is to disentangle the association structure by identifying shared score subspaces among all or some of the data blocks. We propose a sequential algorithm that gathers score subspaces of different data blocks lying within a certain angle threshold and identifies partially shared score components, using the concept of principal angles between subspaces of different dimensions. Our method identifies the linear association structure more accurately than competing methods in this field. In real data analysis, we apply our method to an oncological multi-omics dataset associated with drug responses. The proposed method is computationally very fast, and the scores in the estimated shared component show strong correlations with well-known biological pathways.
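The geometric tool named in the abstract, principal angles between subspaces of different dimensions, can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm. It compares only two blocks, and it assumes each block's score subspace is already given as a full-column-rank n x p matrix. The helper `shared_directions` and its `threshold_deg` parameter are hypothetical stand-ins for the angle-threshold step.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles between col(A) and col(B); A is n x p, B is n x q.

    Returns k = min(p, q) angles in radians (ascending) together with the
    matching principal vectors of each subspace as n x k arrays.
    Assumes A and B have full column rank.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of col(A), n x p
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of col(B), n x q
    # Singular values of Qa^T Qb are the cosines of the principal angles,
    # returned in descending order, so the angles come out ascending.
    U, cosines, Vt = np.linalg.svd(Qa.T @ Qb, full_matrices=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return angles, Qa @ U, Qb @ Vt.T


def shared_directions(A, B, threshold_deg):
    """Hypothetical helper: unit vectors for principal pairs closer than threshold_deg."""
    angles, left, right = principal_angles(A, B)
    keep = np.degrees(angles) < threshold_deg
    # Each pair satisfies left_i . right_i = cos(angle_i) >= 0, so their sum
    # bisects the pair; normalize each column to unit length.
    mid = left[:, keep] + right[:, keep]
    return mid / np.linalg.norm(mid, axis=0)


# Toy example: a 2-dim block and a 3-dim block in R^6 that share one
# (slightly perturbed) direction e0 and are otherwise orthogonal.
I = np.eye(6)
A = np.column_stack([I[:, 0], I[:, 1]])
B = np.column_stack([I[:, 0] + 0.01 * I[:, 5], I[:, 2], I[:, 3]])
angles, _, _ = principal_angles(A, B)
shared = shared_directions(A, B, threshold_deg=10.0)
# angles are about 0.57 and 90 degrees; one shared direction close to e0
```

Only min(p, q) angles exist, which is why blocks of different dimensions can still be compared. SciPy's `scipy.linalg.subspace_angles` returns the angles alone; the explicit SVD is kept here because the principal vectors are needed to form shared directions.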





Angle-Based Joint and Individual Variation Explained

Integrative analysis of disparate data blocks measured on a common set o...

Van Trees inequality, group equivariance, and estimation of principal subspaces

We establish non-asymptotic lower bounds for the estimation of principal...

Structural Learning and Integrative Decomposition of Multi-View Data

The increased availability of the multi-view data (data on the same samp...

Subspace Clustering using Ensembles of K-Subspaces

We present a novel approach to the subspace clustering problem that leve...

A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies

Joint analysis of multiple phenotypes can increase statistical power in ...

Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion

Illicit drug trafficking via social media sites such as Instagram has be...

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

With increasing availability of high dimensional, multi-source data, the...