On the efficiency-loss free ordering-robustness of product-PCA

02/22/2023
by Hung Hung, et al.

This article studies the robustness of eigenvalue ordering, an important issue when the leading eigen-subspace is estimated by principal component analysis (PCA). Yata and Aoshima (2010) proposed cross-data-matrix PCA (CDM-PCA) and showed that it has smaller bias than PCA in estimating eigenvalues. Although CDM-PCA can potentially estimate the leading eigen-subspace better than the usual PCA, its robustness has not been well studied. In this article, we first develop a more stable variant of CDM-PCA, called product-PCA (PPCA), whose formulation is more convenient for theoretical investigation. Second, we prove that, in the presence of outliers, PPCA is more robust than PCA in maintaining the correct ordering of the leading eigenvalues. The robustness gain of PPCA comes from the random data partition; unlike most robust statistical methods, it does not rely on down-weighting the data. This allows us to establish the surprising result that, when there are no outliers, PPCA and PCA share the same asymptotic distribution. That is, PPCA's robustness gain in estimating the leading eigen-subspace comes with no efficiency loss relative to PCA. Simulation studies and a face data example illustrate the merits of PPCA. In conclusion, PPCA has good potential to replace the usual PCA in real applications, whether or not outliers are present.
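The abstract gives no formulas, so the following is only a minimal sketch of the cross-data-matrix idea it builds on, under our reading of Yata and Aoshima (2010): randomly split the n observations into two halves, center each half with its own mean, and use the singular values of the scaled n1 x n2 cross data matrix as eigenvalue estimates. It is not the authors' PPCA estimator, whose exact definition is not stated here, and the function names (`random_halves`, `cdm_eigenvalues`, `sample_pca_eigenvalues`) and the toy spiked model in the demo are hypothetical.

```python
import numpy as np


def sample_pca_eigenvalues(X, k):
    """Top-k eigenvalues of the usual sample covariance matrix (rows of X are observations)."""
    S = np.cov(X, rowvar=False)      # p x p, divisor n - 1
    vals = np.linalg.eigvalsh(S)     # ascending order for symmetric matrices
    return vals[::-1][:k]


def random_halves(X, seed=None):
    """Randomly partition the rows of X into two halves (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])    # the random data partition
    half = X.shape[0] // 2
    return X[idx[:half]], X[idx[half:]]


def cdm_eigenvalues(X1, X2, k):
    """Cross-data-matrix eigenvalue estimates, sketched after Yata and Aoshima (2010).

    Each half is centered with its own mean; the singular values of the
    scaled n1 x n2 cross data matrix serve as eigenvalue estimates.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    S_D = X1c @ X2c.T / np.sqrt((n1 - 1) * (n2 - 1))   # dual (n1 x n2) form
    s = np.linalg.svd(S_D, compute_uv=False)            # descending order
    return s[:k]


if __name__ == "__main__":
    # Assumed toy spiked model: two large eigenvalues (50 and 20), the rest equal to 1.
    rng = np.random.default_rng(0)
    n, p = 40, 500
    scales = np.ones(p)
    scales[:2] = np.sqrt([50.0, 20.0])
    X = rng.standard_normal((n, p)) * scales
    X1, X2 = random_halves(X, seed=1)
    print("sample PCA:", sample_pca_eigenvalues(X, 2))
    print("CDM:       ", cdm_eigenvalues(X1, X2, 2))
```

One algebraic property of this construction: with S1 and S2 the two half-sample covariance matrices, S_D S_D^T = X1c S2 X1c^T / (n1 - 1), and since AB and BA share their nonzero eigenvalues, the squared singular values of S_D equal the nonzero eigenvalues of the product S1 S2. This hints at the kind of product structure a "product" variant can exploit, though we do not claim it is how PPCA is defined.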
