High dimensional PCA: a new model selection criterion

11/09/2020
by Abhinav Chakraborty et al.

Given a random sample from a multivariate population, estimating the number of large eigenvalues of the population covariance matrix is an important problem in statistics with wide applications in many areas. In the context of Principal Component Analysis (PCA), this number determines the linear combinations of the original variables that carry the largest amounts of variation. In this paper, we study the high dimensional asymptotic regime where the number of variables grows at the same rate as the number of observations, and use the spiked covariance model proposed in Johnstone (2001), under which the problem reduces to model selection. Our focus is on the Akaike Information Criterion (AIC), which is known to be strongly consistent from the work of Bai et al. (2018). However, Bai et al. (2018) requires a certain "gap condition" ensuring that the dominant eigenvalues are above a threshold strictly larger than the BBP threshold (Baik et al. (2005)), both quantities depending on the limiting ratio of the number of variables to the number of observations. It is well-known that, below the BBP threshold, a spiked covariance structure becomes indistinguishable from one with no spikes. Thus the strong consistency of AIC requires some extra signal strength. In this paper, we investigate whether consistency continues to hold even if the "gap" is made smaller. We show that strong consistency under an arbitrarily small gap is achievable if we alter the penalty term of AIC suitably, depending on the target gap. Furthermore, another intuitive alteration of the penalty can indeed make the gap exactly zero, although we can only achieve weak consistency in this case. We compare the two newly-proposed estimators with other existing estimators in the literature via extensive simulation studies, and show, by suitably calibrating our proposals, that a significant improvement in terms of mean-squared error is achievable.
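To make the model-selection viewpoint concrete, the following is a minimal sketch of an AIC-type estimator of the number of spikes from sample covariance eigenvalues. It is not the authors' exact criterion: the Gaussian profile likelihood and the parameter count `d(k)` follow the classical Wax–Kailath-style setup, and the penalty multiplier `gamma` is exposed as an illustrative knob, with `gamma = 2` giving classical AIC and larger values mimicking the kind of inflated penalty the paper studies.

```python
import numpy as np

def aic_spike_count(eigvals, n, gamma=2.0, k_max=None):
    """Estimate the number of spikes by minimizing an AIC-type criterion.

    eigvals : sample covariance eigenvalues, sorted in descending order
    n       : number of observations
    gamma   : penalty multiplier (2.0 = classical AIC; larger values
              correspond to the inflated penalties discussed above)
    """
    p = len(eigvals)
    if k_max is None:
        k_max = min(p - 1, n - 1) // 2
    crits = []
    for k in range(k_max + 1):
        # Profile log-likelihood of a rank-k spiked model: the top k
        # eigenvalues are free, the remaining p-k are pooled into a
        # common noise variance sigma2.
        sigma2 = eigvals[k:].mean()
        neg2_loglik = n * (np.sum(np.log(eigvals[:k]))
                           + (p - k) * np.log(sigma2))
        # Illustrative free-parameter count for a rank-k spiked model:
        # k eigenvalues, the orthonormal spike directions, and sigma2.
        d = k * (p - (k + 1) / 2) + k + 1
        crits.append(neg2_loglik + gamma * d)
    return int(np.argmin(crits))
```

On simulated data with a few strong spikes well above the BBP threshold, the minimizer recovers the true spike count; near the threshold, the choice of `gamma` matters, which is exactly the phenomenon the paper analyzes.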


research
04/29/2020

A generalized information criterion for high-dimensional PCA rank selection

Principal component analysis (PCA) is the most commonly used statistical...
research
10/14/2020

Robust covariance estimation for distributed principal component analysis

Principal component analysis (PCA) is a well-known tool for dimension re...
research
10/18/2017

Edgeworth correction for the largest eigenvalue in a spiked PCA model

We study improved approximations to the distribution of the largest eige...
research
07/24/2023

Consistent model selection in the spiked Wigner model via AIC-type criteria

Consider the spiked Wigner model X = ∑_{i=1}^{k} λ_i u_i u_i^⊤ + ...
research
11/03/2019

Optimal two-stage testing of multiple mediators

Mediation analysis in high-dimensional settings often involves identifyi...
research
10/30/2018

Strong consistency of the AIC, BIC, C_p and KOO methods in high-dimensional multivariate linear regression

Variable selection is essential for improving inference and interpretati...
research
12/14/2021

On the Eigenstructure of Covariance Matrices with Divergent Spikes

For a generalization of Johnstone's spiked model, a covariance matrix wi...
