Computationally efficient sparse clustering

05/21/2020
by   Matthias Löffler, et al.
0

We study statistical and computational limits of clustering when the means of the centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the simple model X_i = z_i θ + ε_i, z_i ∈{-1,1}, ε_i 𝒩(0, I), which has two clusters with centres θ and -θ. We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime θ→∞, matching asymptotically the Bayes error. Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds—the low-degree likelihood ratio—we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable. We also discuss extensions of our results to more than two clusters.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/15/2022

Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity

Tensor PCA is a stylized statistical inference problem introduced by Mon...
research
06/19/2015

Representation Learning for Clustering: A Statistical Framework

We address the problem of communicating domain knowledge from a user to ...
research
07/08/2013

The blessing of transitivity in sparse and stochastic networks

The interaction between transitivity and sparsity, two common features i...
research
04/03/2013

Computational Lower Bounds for Sparse PCA

In the context of sparse principal component detection, we bring evidenc...
research
12/07/2021

Lattice-Based Methods Surpass Sum-of-Squares in Clustering

Clustering is a fundamental primitive in unsupervised learning which giv...
research
08/30/2023

Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials

Graphon estimation has been one of the most fundamental problems in netw...
research
02/06/2014

Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices

We consider two closely related problems: planted clustering and submatr...

Please sign up or login with your details

Forgot password? Click here to reset