ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization

07/06/2023
by Javier Salazar Cavazos, et al.

Principal component analysis (PCA) is a key tool for dimensionality reduction and is useful across many data science problems. However, many applications involve heterogeneous data whose quality varies because different data sources have different noise characteristics. Methods that handle such mixed-quality data are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions about the basis coefficients that may not hold in practice. Other methods, such as Weighted PCA (WPCA), assume the noise variances are known, and these may be difficult to obtain in practice. This paper develops a PCA method that estimates the sample-wise noise variances and uses this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. It does so without distributional assumptions on the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data and the benefits of using such a method with all of the data versus retaining only the good data. Comparisons are made against other PCA methods established in the literature, namely PCA, Robust PCA (RPCA), and HePPCAT. Code is available at https://github.com/javiersc1/ALPCAH
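To make the heteroscedastic setting concrete, the following is a minimal NumPy sketch, not the ALPCAH algorithm and not code from the linked repository. It generates low-rank data with a few clean samples and many noisy ones, then compares ordinary PCA with a simple inverse-variance weighted PCA. The weighted variant assumes the per-sample noise standard deviations are known, which is exactly the WPCA-style assumption the paper aims to remove. All function names, dimensions, noise levels, and the choice of inverse-variance weights are illustrative assumptions.

```python
import numpy as np


def make_heteroscedastic_data(d=50, k=3, n_good=40, n_bad=200,
                              sigma_good=0.1, sigma_bad=3.0, seed=0):
    """Low-rank data Y = U Z + noise, with a per-sample noise level (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Random d x k orthonormal basis for the true subspace.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))
    n = n_good + n_bad
    Z = rng.standard_normal((k, n))  # basis coefficients
    sigmas = np.concatenate([np.full(n_good, sigma_good),
                             np.full(n_bad, sigma_bad)])
    # Each column (sample) gets its own noise standard deviation.
    Y = U @ Z + rng.standard_normal((d, n)) * sigmas
    return Y, U, sigmas


def pca_basis(Y, k):
    """Top-k left singular vectors of Y (ordinary PCA without centering)."""
    U_hat, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U_hat[:, :k]


def weighted_pca_basis(Y, k, sigmas):
    """PCA after scaling each sample by 1/sigma_i (inverse-variance weights).

    Assumes the noise levels are known; this is a simple illustrative
    weighting, not the variance-estimating method described in the paper.
    """
    return pca_basis(Y / sigmas, k)


def subspace_error(U_hat, U_true):
    """Normalized Frobenius distance between the two projection matrices."""
    P_hat = U_hat @ U_hat.T
    P_true = U_true @ U_true.T
    return np.linalg.norm(P_hat - P_true, "fro") / np.linalg.norm(P_true, "fro")


if __name__ == "__main__":
    Y, U, sigmas = make_heteroscedastic_data()
    print("PCA error:         ", subspace_error(pca_basis(Y, 3), U))
    print("Weighted PCA error:", subspace_error(weighted_pca_basis(Y, 3, sigmas), U))
```

In this synthetic setup the weighted estimate should recover the subspace more accurately than plain PCA, because the noisy majority no longer dominates the decomposition. A method that estimates the noise variances, as the paper proposes, targets the same benefit without being handed `sigmas`.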


research
12/19/2014

Cauchy Principal Component Analysis

Principal Component Analysis (PCA) has wide applications in machine lear...
research
03/13/2018

Principal Component Analysis with Tensor Train Subspace

Tensor train is a hierarchical tensor network structure that helps allev...
research
01/10/2021

HePPCAT: Probabilistic PCA for Data with Heteroscedastic Noise

Principal component analysis (PCA) is a classical and ubiquitous method ...
research
06/22/2023

PCA matrix denoising is uniform

Principal component analysis (PCA) is a simple and popular tool for proc...
research
11/25/2019

Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling

Principal component analysis (PCA) is one of the most widely used dimens...
research
02/25/2019

Logistic principal component analysis via non-convex singular value thresholding

Multivariate binary data is becoming abundant in current biological rese...
research
10/29/2021

Principal Component Pursuit for Pattern Identification in Environmental Mixtures

Environmental health researchers often aim to identify sources/behaviors...
