Confidence Intervals for the Number of Components in Factor Analysis and Principal Components Analysis via Subsampling

05/10/2022
by Chetkar Jha, et al.

Factor analysis (FA) and principal component analysis (PCA) are popular statistical methods for summarizing and explaining the variability in multivariate datasets. By default, FA and PCA assume that the number of components or factors is known a priori. In practice, however, users first estimate the number of factors or components and then perform FA and PCA using that point estimate, thereby ignoring any uncertainty in it. For datasets where this uncertainty is not ignorable, it is prudent to perform FA and PCA for each positive integer in a confidence interval for the number of factors or components. We address this problem by proposing a subsampling-based, data-intensive approach for estimating confidence intervals for the number of components in FA and PCA. We study the coverage probability of the proposed confidence intervals and provide non-asymptotic theoretical guarantees on their accuracy. As a byproduct, we derive the first-order Edgeworth expansion for spiked eigenvalues of the sample covariance matrix when the data matrix is generated under a factor model. We also demonstrate the usefulness of our approach through numerical simulations and by applying it to estimate confidence intervals for the number of factors in the genotyping dataset of the Human Genome Diversity Project.
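The abstract does not spell out the estimator or the subsampling scheme, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes an eigenvalue-ratio point estimator, subsamples of rows drawn without replacement, and a simple percentile interval over the subsample estimates. The function names (`estimate_num_components`, `subsample_interval`), the subsample size `m`, and the simulated factor model are all hypothetical choices made for this example.

```python
import numpy as np


def estimate_num_components(X, max_k=10):
    """Point estimate of the number of components.

    Assumption: uses the largest ratio of consecutive sample covariance
    eigenvalues; the paper may use a different estimator.
    """
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data, divided by (n - 1),
    # are the nonzero eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    eig = s**2 / (X.shape[0] - 1)
    max_k = min(max_k, len(eig) - 1)
    ratios = eig[:max_k] / eig[1:max_k + 1]
    return int(np.argmax(ratios)) + 1


def subsample_interval(X, m, n_subsamples=200, alpha=0.05, max_k=10, seed=0):
    """Percentile interval from estimates on row subsamples of size m.

    Assumption: a plain percentile rule stands in for whatever
    calibration the paper actually derives.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = np.empty(n_subsamples, dtype=int)
    for b in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)  # without replacement
        estimates[b] = estimate_num_components(X[idx], max_k)
    lo = int(np.floor(np.quantile(estimates, alpha / 2)))
    hi = int(np.ceil(np.quantile(estimates, 1 - alpha / 2)))
    return lo, hi, estimates


# Simulated data from a factor model with 3 strong factors (illustrative only).
rng = np.random.default_rng(42)
n, p, true_k = 500, 50, 3
factors = rng.standard_normal((n, true_k))
loadings = rng.standard_normal((p, true_k))
X = factors @ loadings.T + rng.standard_normal((n, p))

k_hat = estimate_num_components(X)
lo, hi, estimates = subsample_interval(X, m=250)
print(f"point estimate: {k_hat}, interval: [{lo}, {hi}]")
```

With an interval in hand, one would rerun FA or PCA for each integer between `lo` and `hi` and compare the resulting fits, as the abstract recommends for datasets where the uncertainty in the point estimate is not ignorable.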

