An ℓ_p theory of PCA and spectral clustering

06/24/2020
by   Emmanuel Abbe, et al.

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of the individual principal component scores that yield low-dimensional embeddings of samples. This gap hinders the analysis of various spectral methods. In this paper, we first develop an ℓ_p perturbation theory for a hollowed version of PCA in Hilbert spaces, which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel ℓ_p analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in ℓ_p norm, which includes ℓ_2 and ℓ_∞ as special cases. For sub-Gaussian mixture models, the choice of p giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the ℓ_p theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These results also provide optimal recovery guarantees for Gaussian mixture and stochastic block models as special cases.
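
To make the idea of hollowed PCA concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes that "hollowing" means zeroing the diagonal of the sample Gram matrix before the eigendecomposition, that the leading eigenvectors are selected by largest algebraic eigenvalue, and that two symmetric clusters can be separated by the sign of the top eigenvector. The function names (`hollowed_pca`, `two_cluster_labels`, `demo_accuracy`) and the simulated data (Gaussian noise with per-sample scales) are hypothetical and chosen only for illustration.

```python
import numpy as np


def hollowed_pca(X, r):
    """Return the top-r eigenpairs of the Gram matrix X X^T with a zeroed diagonal.

    X has shape (n_samples, n_features). Zeroing the diagonal is the assumed
    meaning of "hollowing"; it removes the terms ||x_i||^2, which carry the
    sample-specific noise levels under heteroscedastic noise.
    """
    G = X @ X.T                      # n x n Gram matrix of the samples
    np.fill_diagonal(G, 0.0)         # hollowing step (in place)
    eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:r]    # indices of the r largest eigenvalues
    return eigvals[top], eigvecs[:, top]


def two_cluster_labels(X):
    """Assign each sample to one of two clusters by the sign of its top score."""
    _, U = hollowed_pca(X, r=1)
    labels = np.sign(U[:, 0])
    labels[labels == 0] = 1.0        # break exact ties arbitrarily
    return labels


def demo_accuracy(n=200, d=50, signal=4.0, seed=0):
    """Simulate a symmetric two-component mixture and return clustering accuracy.

    The model x_i = y_i * mu + s_i * z_i with per-sample noise scales s_i is an
    illustrative assumption, not a setting taken from the paper.
    """
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)            # true labels
    mu = np.full(d, signal / np.sqrt(d))           # cluster mean with norm `signal`
    scales = rng.uniform(0.5, 1.5, size=n)         # heteroscedastic noise levels
    X = y[:, None] * mu[None, :] + scales[:, None] * rng.standard_normal((n, d))
    labels = two_cluster_labels(X)
    agreement = np.mean(labels == y)
    return max(agreement, 1.0 - agreement)         # labels are defined up to a global sign
```

Calling `demo_accuracy()` returns the fraction of correctly clustered samples in one simulated draw. For more than two clusters, a common choice is to run k-means on the rows of the top-r eigenvector matrix instead of taking signs.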

research
02/08/2022

Entrywise Recovery Guarantees for Sparse PCA via Sparsistent Algorithms

Sparse Principal Component Analysis (PCA) is a prevalent tool across a p...
research
02/24/2021

Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

The article introduces an elementary cost and storage reduction method f...
research
05/30/2022

Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering

The singular subspaces perturbation theory is of fundamental importance ...
research
11/05/2019

Gaussian Mixture Models for Stochastic Block Models with Non-Vanishing Noise

Community detection tasks have received a lot of attention across statis...
research
02/09/2019

Optimal Latent Representations: Distilling Mutual Information into Principal Pairs

Principal component analysis (PCA) is generalized from one to two random...
research
09/19/2016

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

A central problem of random matrix theory is to understand the eigenvalu...
research
07/02/2018

Optimality and Sub-optimality of PCA I: Spiked Random Matrix Models

A central problem of random matrix theory is to understand the eigenvalu...
