Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA
This paper is concerned with estimating the column subspace of a low-rank matrix X^⋆∈ℝ^n_1× n_2 from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., n_2≫ n_1). While the state-of-the-art algorithm emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of X^⋆ grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called , that achieves near-optimal and condition-number-free theoretical guarantees in terms of both ℓ_2 and ℓ_2,∞ statistical accuracy. The proposed algorithm divides the spectrum of X^⋆ into well-conditioned and mutually well-separated subblocks, and applies to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples – the factor model and tensor PCA – leads to remarkable improvement for each application.
READ FULL TEXT