Optimality and Sub-optimality of PCA I: Spiked Random Matrix Models

07/02/2018
by   Amelia Perry, et al.
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or "spike") is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including non-spectral tests. Our results leverage Le Cam's notion of contiguity, and include: i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike. ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries. iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
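
The spectral phase transition described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code, for the Gaussian Wigner case under an assumed normalization: Y = λ·xxᵀ + W/√n with a unit-norm spike x and GOE noise W. With this scaling the bulk edge sits near 2, and the top eigenvalue is expected to separate from the bulk, approaching λ + 1/λ, only when λ > 1. The function names, the Rademacher prior, and the choice n = 400 are illustrative assumptions.

```python
import numpy as np


def spiked_wigner(n, lam, rng):
    """Sample Y = lam * x x^T + W / sqrt(n) (assumed normalization)."""
    g = rng.standard_normal((n, n))
    # Symmetrize: off-diagonal entries have variance 1, diagonal variance 2 (GOE).
    w = (g + g.T) / np.sqrt(2)
    # Unit-norm spike with i.i.d. +/- 1/sqrt(n) entries (Rademacher prior, an assumption).
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    return lam * np.outer(x, x) + w / np.sqrt(n)


def top_eigenvalue(y):
    """Largest eigenvalue of a symmetric matrix; this is the PCA test statistic."""
    return np.linalg.eigvalsh(y)[-1]


def predicted_top_eigenvalue(lam):
    """Asymptotic limit of the top eigenvalue under the normalization above."""
    return lam + 1.0 / lam if lam > 1.0 else 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for lam in (0.5, 0.9, 1.5, 2.0):
        observed = top_eigenvalue(spiked_wigner(400, lam, rng))
        print(f"lam={lam}: observed={observed:.3f}, "
              f"predicted={predicted_top_eigenvalue(lam):.3f}")
```

At finite n the observed values fluctuate around the predictions, so the comparison is only approximate. The sketch covers the spectral test alone; the non-spectral tests and the entrywise pre-transformation mentioned in the abstract are not shown.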

research
09/19/2016

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

A central problem of random matrix theory is to understand the eigenvalu...
research
06/25/2018

Fundamental limits of detection in the spiked Wigner model

We study the fundamental limits of detecting the presence of an additive...
research
04/28/2021

Detection of Signal in the Spiked Rectangular Models

We consider the problem of detecting signals in the rank-one signal-plus...
research
01/12/2023

Detection problems in the spiked matrix models

We study the statistical decision process of detecting the low-rank sign...
research
06/24/2020

An ℓ_p theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and ...
research
02/20/2018

Detection limits in the high-dimensional spiked rectangular model

We study the problem of detecting the presence of a single unknown spike...
research
02/24/2015

Phase Transitions for High Dimensional Clustering and Related Problems

Consider a two-class clustering problem where we observe X_i = ℓ_i μ + Z...
