Distributional Theory and Statistical Inference for Linear Functions of Eigenvectors with Small Eigengaps
Spectral methods have myriad applications in high-dimensional statistics and data science. Previous works have primarily focused on ℓ_2 or ℓ_{2,∞} eigenvector and singular vector perturbation theory, but in many settings these analyses fall short of providing the fine-grained guarantees required for various inferential tasks. In this paper we study statistical inference for linear functions of eigenvectors and principal components, with a particular emphasis on the setting where gaps between eigenvalues may be extremely small relative to the corresponding spiked eigenvalue, a regime that has often been neglected in the literature. It has been previously established that linear functions of eigenvectors and principal components incur a non-negligible bias, so in this work we provide Berry-Esseen bounds for empirical linear forms and their debiased counterparts in, respectively, the matrix denoising model and the spiked principal component analysis model, both under Gaussian noise. Next, we propose data-driven estimators for the appropriate bias and variance quantities, resulting in approximately valid confidence intervals, and we demonstrate our theoretical results through numerical simulations. We further apply our results to obtain distributional theory and confidence intervals for eigenvector entries, for which debiasing is not necessary. Crucially, our proposed confidence intervals and bias-correction procedures can all be computed directly from data without sample-splitting and are asymptotically valid under minimal assumptions on the eigengap and signal strength. Furthermore, our Berry-Esseen bounds clearly reflect the effects of both signal strength and eigenvalue closeness on the estimation and inference tasks.
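To make the bias phenomenon concrete, the following is a minimal simulation sketch, not the paper's procedure. It assumes a rank-two symmetric signal matrix with two close eigenvalues, observed under symmetric Gaussian noise, and records the empirical linear form a^T û for the leading eigenvector. The function name `simulate_linear_form`, the noise normalization, and all parameter values are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def simulate_linear_form(n=100, lam=(40.0, 38.0), sigma=1.0, trials=50, seed=0):
    """Monte Carlo draws of a^T u_hat in a toy matrix denoising model.

    Hypothetical setup (an assumption, not taken from the paper):
    M = U diag(lam) U^T with orthonormal U, observed as M + E, where E is
    symmetric Gaussian noise. The test vector a is set to the true leading
    eigenvector u, so the target value a^T u equals 1.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)

    # Orthonormal population eigenvectors from a QR factorization.
    U, _ = np.linalg.qr(rng.standard_normal((n, lam.size)))
    M = (U * lam) @ U.T

    u = U[:, 0]          # true leading eigenvector (lam[0] is the largest)
    a = u.copy()         # linear functional of interest
    truth = float(a @ u)

    vals = np.empty(trials)
    for t in range(trials):
        G = rng.standard_normal((n, n))
        # Symmetrize: off-diagonal variance sigma^2, diagonal variance 2 * sigma^2.
        E = sigma * (G + G.T) / np.sqrt(2.0)
        _, V = np.linalg.eigh(M + E)   # eigenvalues returned in ascending order
        u_hat = V[:, -1]               # empirical leading eigenvector
        if u_hat @ u < 0:              # resolve the sign ambiguity
            u_hat = -u_hat
        vals[t] = a @ u_hat
    return truth, vals


if __name__ == "__main__":
    truth, vals = simulate_linear_form()
    print("target a^T u:", truth)
    print("mean of a^T u_hat:", vals.mean())
    print("std of a^T u_hat:", vals.std(ddof=1))
```

In this toy setting every draw of a^T û is at most 1 by the Cauchy-Schwarz inequality, so the Monte Carlo mean falls below the target. Shrinking the gap between the two entries of `lam` lets one explore how eigenvalue closeness affects the spread of the draws.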