Integrated Principal Components Analysis

10/01/2018
by Tiffany M. Tang, et al.

Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that remain hidden when each data source is analyzed separately. We develop a new statistical data integration method named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple datasets. The key idea driving iPCA is the matrix-variate normal model, whose Kronecker product covariance structure captures both individual patterns within each dataset and joint patterns shared by multiple datasets. Building upon this model, we develop several penalized (sparse and non-sparse) covariance estimators for iPCA and study their theoretical properties. We show that our sparse iPCA estimator consistently estimates the underlying joint subspace, and using geodesic convexity, we prove that our non-sparse iPCA estimator converges to the global solution of a non-convex problem. We also demonstrate the practical advantages of iPCA through simulations and a case study application to integrative genomics for Alzheimer's disease. In particular, we show that the joint patterns extracted via iPCA are highly predictive of a patient's cognition and Alzheimer's diagnosis.
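
To make the Kronecker product covariance structure concrete, the following is a minimal sketch and not the authors' estimator. It assumes, for illustration only, that each dataset is an n x p_k matrix drawn from a matrix-variate normal with a row covariance `sigma` shared across datasets (the joint pattern) and a dataset-specific column covariance `delta` (the individual pattern). The names `sample_matrix_normal`, `random_spd`, `sigma`, and `delta` are hypothetical and do not come from the abstract. The sketch draws such matrices and checks numerically that the covariance of the column-stacked matrix equals the Kronecker product of the two covariances.

```python
import numpy as np


def random_spd(rng, dim):
    """Hypothetical helper: build a random symmetric positive-definite matrix."""
    m = rng.standard_normal((dim, dim))
    return m @ m.T / dim + np.eye(dim)


def sample_matrix_normal(rng, sigma, delta):
    """Draw one n x p matrix with row covariance sigma and column covariance delta."""
    a = np.linalg.cholesky(sigma)  # sigma = a @ a.T
    b = np.linalg.cholesky(delta)  # delta = b @ b.T
    z = rng.standard_normal((sigma.shape[0], delta.shape[0]))
    return a @ z @ b.T


rng = np.random.default_rng(0)
n, dims = 5, [3, 4]  # assumed sizes: 5 shared samples, two datasets with 3 and 4 features

sigma = random_spd(rng, n)                   # shared (joint) row covariance
deltas = [random_spd(rng, p) for p in dims]  # one (individual) column covariance per dataset
datasets = [sample_matrix_normal(rng, sigma, d) for d in deltas]

# Check the identity vec(A Z B^T) = (B kron A) vec(Z), with column-stacking vec.
a = np.linalg.cholesky(sigma)
b = np.linalg.cholesky(deltas[0])
z = rng.standard_normal((n, dims[0]))
lhs = (a @ z @ b.T).flatten(order="F")
rhs = np.kron(b, a) @ z.flatten(order="F")
identity_holds = np.allclose(lhs, rhs)

# Since vec(Z) has identity covariance, the covariance of vec(X) is delta kron sigma.
factor = np.kron(b, a)
kron_holds = np.allclose(factor @ factor.T, np.kron(deltas[0], sigma))

print(identity_holds, kron_holds)
```

Under these assumptions, every dataset shares the same `sigma`, so an estimate of its leading eigenvectors would describe patterns common to all datasets, while each `delta` absorbs structure specific to one dataset. How the covariances are actually estimated and penalized is the subject of the full paper and is not reproduced here.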

