Do semidefinite relaxations solve sparse PCA up to the information limit?

by Robert Krauthgamer, et al.

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model with an ℓ_0-sparse eigenvector, in the asymptotic regime as dimension p and sample size n both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels k≥Ω(n/log p), no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for k≤ O(√(n/log p)), diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between computational and information limits. We prove that when k≥Ω(√(n)), the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally-efficient algorithm can recover a spike of ℓ_0-sparsity k≥Ω(√(n)). Finally, we present empirical results suggesting that up to sparsity levels k=O(√(n)), recovery is possible by a simple covariance thresholding algorithm.
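
To make the diagonal thresholding baseline mentioned above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' code: the single-spike model is taken to be x = √β · u · z + noise with standard Gaussian u and noise, the spike z is assumed to have equal entries 1/√k on its support, and the function names `sample_spiked` and `diagonal_thresholding` as well as all parameter values are hypothetical. The sketch only estimates the support (the k coordinates with the largest sample variances) and omits the subsequent PCA step on the selected coordinates.

```python
import random


def sample_spiked(n, p, support, beta, rng):
    """Draw n samples from an assumed single-spike model.

    Each sample is x = sqrt(beta) * u * z + noise, where u ~ N(0, 1),
    noise has i.i.d. N(0, 1) entries, and z has entries 1/sqrt(k) on
    `support` and 0 elsewhere (an illustrative choice of spike).
    """
    k = len(support)
    amp = (beta / k) ** 0.5  # sqrt(beta) times the nonzero entry of z
    on = set(support)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        rows.append([
            rng.gauss(0.0, 1.0) + (amp * u if j in on else 0.0)
            for j in range(p)
        ])
    return rows


def diagonal_thresholding(rows, k):
    """Return the indices of the k coordinates with largest sample variance.

    The data are assumed to be zero-mean, so the variance of coordinate j
    is estimated by the average of its squared values.
    """
    n, p = len(rows), len(rows[0])
    variances = [sum(r[j] ** 2 for r in rows) / n for j in range(p)]
    top = sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]
    return sorted(top)


# Hypothetical parameters: a strong spike and many samples, so the
# support coordinates (variance 1 + beta/k) stand out from the rest.
rng = random.Random(0)
true_support = [4, 17, 30]
rows = sample_spiked(n=2000, p=50, support=true_support, beta=6.0, rng=rng)
recovered = diagonal_thresholding(rows, k=len(true_support))
print(recovered)
```

In this toy setting each support coordinate has population variance 1 + β/k = 3 versus 1 elsewhere, so ranking by sample variance separates them easily. The regime the abstract is concerned with is the harder one, where k grows with n and the per-coordinate signal β/k becomes comparable to the sampling fluctuations of the diagonal entries.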



Sparse PCA via Covariance Thresholding

In sparse principal component analysis we are given noisy observations o...

Spike and slab Bayesian sparse principal component analysis

Sparse principal component analysis (PCA) is a popular tool for dimensio...

Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

A simple model to study subspace clustering is the high-dimensional k-Ga...

Sparsistency and agnostic inference in sparse PCA

The presence of a sparse "truth" has been a constant assumption in the t...

Signed Support Recovery for Single Index Models in High-Dimensions

In this paper we study the support recovery problem for single index mod...

Subexponential-Time Algorithms for Sparse PCA

We study the computational cost of recovering a unit-norm sparse princip...

Sparse PCA With Multiple Components

Sparse Principal Component Analysis is a cardinal technique for obtainin...