Outlier Detection in High Dimensional Data

09/09/2019
by   Firuz Kamalov, et al.

High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method addresses the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the F_1-score. Our method also produces better-than-average execution times compared to the benchmark methods.
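
The abstract names only the two building blocks, principal component analysis and kernel density estimation, so the following is a minimal sketch of how such a pipeline could look, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names, the choice of two components, a fixed Gaussian bandwidth, the leave-one-out density, the negative log-density as the anomaly score, and the quantile-based `contamination` threshold.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kde_log_density(Z, bandwidth):
    """Leave-one-out Gaussian KDE log-density at each row of Z."""
    n, d = Z.shape
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq_dist / bandwidth**2
    np.fill_diagonal(log_k, -np.inf)  # exclude each point's own kernel
    # Log-sum-exp over the other points, stabilised by the row maximum.
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    log_norm = np.log(n - 1) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return log_sum - log_norm


def anomaly_scores(X, n_components=2, bandwidth=1.0):
    """Assumed score: negative log-density in the PCA-reduced space."""
    Z = pca_project(np.asarray(X, dtype=float), n_components)
    return -kde_log_density(Z, bandwidth)


def flag_outliers(scores, contamination=0.05):
    """Flag points whose score exceeds the (1 - contamination) quantile."""
    return scores > np.quantile(scores, 1.0 - contamination)


# Small synthetic data set with many features: 100 points, 50 features,
# with the first 5 points shifted away from the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
X[:5] += 8.0
scores = anomaly_scores(X)
flags = flag_outliers(scores)
```

In this sketch a higher score means a lower estimated density in the reduced space. The bandwidth and the number of components are fixed by hand here; in practice both would need to be chosen for the data at hand.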

research
07/01/2022

A geometric framework for outlier detection in high-dimensional data

Outlier or anomaly detection is an important task in data analysis. We d...
research
03/03/2021

Detecting Outliers in High-dimensional Data with Mixed Variable Types using Conditional Gaussian Regression Models

Outlier detection has gained increasing interest in recent years, due to...
research
02/08/2020

SUOD: Toward Scalable Unsupervised Outlier Detection

Outlier detection is a key field of machine learning for identifying abn...
research
05/26/2021

An algorithm-based multiple detection influence measure for high dimensional regression using expectile

The identification of influential observations is an important part of d...
research
05/20/2020

Consistent and Flexible Selectivity Estimation for High-dimensional Data

Selectivity estimation aims at estimating the number of database objects...
research
06/12/2023

Kernel Random Projection Depth for Outlier Detection

This paper proposes an extension of Random Projection Depth (RPD) to cop...
research
08/16/2017

Visualizing and Exploring Dynamic High-Dimensional Datasets with LION-tSNE

T-distributed stochastic neighbor embedding (tSNE) is a popular and priz...