Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion

04/19/2022
by Jianhua Zhao, et al.

The Bayesian information criterion (BIC), defined as the observed-data log likelihood minus a penalty term based on the sample size N, is a popular model selection criterion for factor analysis with complete data. The same definition has also been suggested for incomplete data. However, under this definition the penalty term based on the `complete' sample size N stays the same whether the data are complete or incomplete. With incomplete data there are often only N_i < N observations for variable i, so using the `complete' sample size N implausibly ignores the amount of missing information inherent in the data. Motivated by this observation, a novel criterion called hierarchical BIC (HBIC) for factor analysis with incomplete data is proposed. The novelty is that its penalty term uses only the actual amounts of observed information, namely the N_i's. Theoretically, it is shown that HBIC is a large-sample approximation of the variational Bayesian (VB) lower bound, and that BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite-sample performance of HBIC, BIC, and related criteria under various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small.
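The contrast between the two penalty terms described in the abstract can be sketched as follows. This is a minimal illustration consistent with the abstract's wording, not the paper's exact definitions: the per-variable parameter counts d_i and the assumed partition with sum_i d_i = d are assumptions made here for exposition.

```latex
% Hedged sketch of the two criteria, using notation assumed for illustration:
% \hat{\theta} : parameter estimate;  d : total number of free parameters;
% d_i : free parameters attributed to variable i (assumed partition, \sum_i d_i = d);
% N : `complete' sample size;  N_i \le N : observations available for variable i.
\[
\mathrm{BIC}  = \log p\!\left(Y_{\mathrm{obs}} \mid \hat{\theta}\right) - \tfrac{d}{2}\log N,
\qquad
\mathrm{HBIC} = \log p\!\left(Y_{\mathrm{obs}} \mid \hat{\theta}\right) - \tfrac{1}{2}\sum_{i} d_i \log N_i .
\]
% With complete data, N_i = N for every i and the two penalties coincide;
% as missingness grows, \log N_i < \log N, so HBIC charges each variable
% only for the information actually observed.
```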

