High-dimensional outlier detection using random projections

05/18/2020
by   P. Navarro-Esteban, et al.

The literature offers many methods for detecting outliers in multivariate data, but most of them require estimating the covariance matrix. The higher the dimension, the harder this estimation becomes, and in high dimensions it is impossible. To avoid estimating this matrix, we propose a novel procedure based on random projections to detect outliers in Gaussian multivariate data. It consists of projecting the data onto several one-dimensional subspaces and applying, in each of them, an appropriate univariate outlier detection method that is similar to Tukey's method but uses a threshold depending on the initial dimension and the sample size. The required number of projections is determined using sequential analysis. Simulated and real datasets illustrate the performance of the proposed method.
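The general idea in the abstract (project onto random one-dimensional subspaces, then apply a Tukey-style rule to each projection) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not give the threshold formula or the sequential stopping rule, so the sketch assumes a fixed number of projections (`n_projections`) and a placeholder fence multiplier (`k`). All function names and parameters here are hypothetical.

```python
import math
import random
import statistics


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def tukey_flags(values, k):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def detect_outliers(data, n_projections=50, k=3.0, seed=0):
    """Return sorted indices of rows flagged in at least one random projection.

    ASSUMPTIONS (not from the paper): n_projections is fixed rather than
    chosen by sequential analysis, and k is a placeholder constant rather
    than a threshold depending on the dimension and the sample size.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    flagged = set()
    for _ in range(n_projections):
        u = random_unit_vector(dim, rng)
        # One-dimensional projection of every observation onto direction u.
        projected = [sum(a * b for a, b in zip(row, u)) for row in data]
        flags = tukey_flags(projected, k)
        flagged.update(i for i, is_out in enumerate(flags) if is_out)
    return sorted(flagged)
```

Calling `detect_outliers(rows)` on a list of equal-length numeric rows returns the indices of suspected outliers. No covariance matrix is estimated at any point; only univariate quartiles of each projection are needed, which is what makes the approach attractive in high dimensions.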


11/16/2020

Covariance matrix testing in high dimension using random projections

Estimation and hypothesis tests for the covariance matrix in high dimens...
02/16/2015

Random Subspace Learning Approach to High-Dimensional Outliers Detection

We introduce and develop a novel approach to outlier detection based on ...
08/05/2020

Scalable Multiple Changepoint Detection for Functional Data Sequences

We propose the Multiple Changepoint Isolation (MCI) method for detecting...
03/29/2018

The use of fourth order cumulant tensors to detect outlier features modelled by a t-Student copula

In this paper we use multivariate cumulant of order 4 to distinguish bet...
12/23/2020

Matrix optimization based Euclidean embedding with outliers

Euclidean embedding from noisy observations containing outlier errors is...
08/07/2020

Outlier detection in non-elliptical data by kernel MRCD

The minimum regularized covariance determinant method (MRCD) is a robust...
10/12/2019

Real-time outlier detection for large datasets by RT-DetMCD

Modern industrial machines can generate gigabytes of data in seconds, fr...