Automatic dimensionality selection for principal component analysis models with the ignorance score

02/08/2019
by Stefania Russo, et al.

Principal component analysis (PCA) is by far the most widespread tool for unsupervised learning with high-dimensional data sets. It is widely studied for exploratory data analysis and online process monitoring. Unfortunately, tuning PCA models, and in particular selecting the number of components, remains a challenging task. Today, this selection is often based on a combination of guiding principles, experience, and process understanding. In regression, cross-validation of the prediction error is a widespread and trusted approach to model selection; no tool for PCA model selection has reached this level of acceptance. In this work, we address this challenge and evaluate the utility of the cross-validated ignorance score with both simulated and experimental data sets. The method relies on interpreting PCA as a density model, as in probabilistic principal component analysis, and is shown to be a valuable tool for identifying an optimal number of principal components.
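
The idea of scoring a PCA model as a density can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard probabilistic PCA closed-form fit, takes the ignorance score to be the mean negative log density of held-out samples (in nats), and uses plain K-fold cross-validation. The function names `ppca_ignorance` and `select_n_components` are hypothetical.

```python
import numpy as np


def ppca_ignorance(X_train, X_test, k):
    """Mean negative log density (nats) of X_test under a k-component
    probabilistic PCA model fitted to X_train. Requires 0 < k < n_features."""
    d = X_train.shape[1]
    mu = X_train.mean(axis=0)
    # Sample covariance (n-1 normalisation; an assumption, ML would use n).
    cov = np.cov(X_train - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]          # sort eigenpairs, largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    sigma2 = eigval[k:].mean()                # noise variance: mean discarded eigenvalue
    U = eigvec[:, :k]
    # Model covariance: retained eigen-directions plus isotropic noise elsewhere.
    C = (U * eigval[:k]) @ U.T + sigma2 * (np.eye(d) - U @ U.T)
    _, logdet = np.linalg.slogdet(C)
    R = X_test - mu
    maha = np.einsum("ij,ij->i", R, np.linalg.solve(C, R.T).T)
    logp = -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
    return float(-logp.mean())


def select_n_components(X, candidates, n_folds=5, seed=0):
    """Return (best_k, scores), where scores[k] is the cross-validated
    ignorance score and best_k minimises it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = {}
    for k in candidates:
        fold_scores = []
        for i in range(n_folds):
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != i]
            )
            fold_scores.append(ppca_ignorance(X[train_idx], X[folds[i]], k))
        scores[k] = float(np.mean(fold_scores))
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Calling `select_n_components(X, range(1, X.shape[1]))` on a samples-by-variables array returns the candidate with the lowest held-out score together with the full score profile, which can be plotted against the number of components. Unlike the reconstruction error, which always decreases on the training data as components are added, the held-out density score penalises both too few and too many components.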

research
11/06/2022

Cauchy robust principal component analysis with applications to high-dimensional data sets

Principal component analysis (PCA) is a standard dimensionality reductio...
research
04/03/2012

Validation of nonlinear PCA

Linear principal component analysis (PCA) can be extended to a nonlinear...
research
05/19/2016

Bayesian Variable Selection for Globally Sparse Probabilistic PCA

Sparse versions of principal component analysis (PCA) have imposed thems...
research
03/08/2017

Exact Dimensionality Selection for Bayesian PCA

We present a Bayesian model selection approach to estimate the intrinsic...
research
10/07/2021

AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow

Principal component analysis (PCA) has been widely used as an effective ...
research
10/12/2015

Towards Meaningful Maps of Polish Case Law

In this work, we analyze the utility of two dimensional document maps fo...
research
03/13/2020

A Wide Dataset of Ear Shapes and Pinna-Related Transfer Functions Generated by Random Ear Drawings

Head-related transfer functions (HRTFs) individualization is a key matte...
