Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

02/24/2021
by   Romain Couillet, et al.
0

The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix X∈ℂ^p× n (or ℝ^p× n) and its corresponding kernel (Gram) matrix K through Bernoulli masks: S∈{0,1}^p× n for X and B∈{0,1}^n× n for K. The resulting "two-way punctured" kernel is thus given by K=1/p[(X ⊙ S)^ H (X ⊙ S)] ⊙ B. We demonstrate that, for X composed of independent columns drawn from a Gaussian mixture model, as n,p→∞ with p/n→ c_0∈(0,∞), the spectral behavior of K – its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors – is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering of PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.

READ FULL TEXT
research
06/24/2020

An ℓ_p theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and ...
research
10/05/2013

Role of normalization in spectral clustering for stochastic blockmodels

Spectral clustering is a technique that clusters elements using the top ...
research
01/09/2013

Spectral Clustering Based on Local PCA

We propose a spectral clustering method based on local principal compone...
research
12/18/2020

Upper and Lower Bounds on the Performance of Kernel PCA

Principal Component Analysis (PCA) is a popular method for dimension red...
research
02/05/2021

A simpler spectral approach for clustering in directed networks

We study the task of clustering in directed networks. We show that using...
research
05/03/2015

Kernel Spectral Clustering and applications

In this chapter we review the main literature related to kernel spectral...
research
12/28/2015

Outlier Detection In Large-scale Traffic Data By Naïve Bayes Method and Gaussian Mixture Model Method

It is meaningful to detect outliers in traffic data for traffic manageme...

Please sign up or login with your details

Forgot password? Click here to reset