MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers

by Mia Hubert, et al.

Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables, one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, i.e. rows that deviate from the majority of the rows in the data (for instance, they might belong to a different population). In recent years, cellwise outliers have also received attention. These are suspicious cells (entries) that can occur anywhere in the table. Even a relatively small proportion of outlying cells can contaminate over half the rows, which causes rowwise robust methods to break down. In this paper a new PCA method is constructed which combines the strengths of two existing robust methods in order to be robust against both cellwise and rowwise outliers. At the same time, the algorithm can cope with missing values. To date it is the only PCA method that can deal with all three problems simultaneously. Its name MacroPCA stands for PCA allowing for Missings And Cellwise & Rowwise Outliers. Several simulations and real data sets illustrate its robustness. New residual maps are introduced, which help to determine which variables are responsible for the outlying behavior. The method is well suited for online process control.
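
The claim that a small proportion of outlying cells can contaminate over half the rows can be checked with a short calculation. The sketch below is illustrative and not taken from the paper: it assumes each cell is outlying independently with the same probability `eps`, so a row with `d` cells is clean with probability `(1 - eps) ** d`. The function name and the values of `eps` and `d` are chosen here purely for illustration.

```python
def contaminated_row_fraction(eps: float, d: int) -> float:
    """Probability that a row of d cells contains at least one outlying cell.

    Assumption (not from the paper): cells are contaminated independently,
    each with probability eps.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    if d < 1:
        raise ValueError("d must be a positive integer")
    # A row is clean only if all d of its cells are clean.
    return 1.0 - (1.0 - eps) ** d


if __name__ == "__main__":
    # With 5% outlying cells, the share of affected rows grows quickly
    # with the number of variables.
    for d in (5, 14, 50):
        print(d, round(contaminated_row_fraction(0.05, d), 3))
```

Under this independence assumption, 5% outlying cells already affect more than half of the rows once the table has 14 variables, which is the regime in which purely rowwise robust methods can break down.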




Robust Principal Component Analysis Using Statistical Estimators

Principal Component Analysis (PCA) finds a linear mapping and maximizes ...

Robust Multivariate Estimation Based On Statistical Data Depth Filters

In the classical contamination models, such as the gross-error (Huber an...

Robust self-tuning semiparametric PCA for contaminated elliptical distribution

Principal component analysis (PCA) is one of the most popular dimension ...

Robust PCA for High Dimensional Data based on Characteristic Transformation

In this paper, we propose a novel robust Principal Component Analysis (P...

Robust Principal Component Analysis for Compositional Tables

A data table which is arranged according to two factors can often be con...

A robust method based on LOVO functions for solving least squares problems

The robust adjustment of nonlinear models to data is considered in this ...

robROSE: A robust approach for dealing with imbalanced data in fraud detection

A major challenge when trying to detect fraud is that the fraudulent act...
