Anomaly Detection in High Dimensional Data

The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it has limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation that deviates markedly from the majority of the data and is separated from it by a large distance gap. The anomaly threshold is calculated using an approach based on extreme value theory. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can help detect anomalies in other data structures through feature engineering. We identify situations in which the stray algorithm outperforms the HDoutliers algorithm in both accuracy and computation time. The framework is implemented in the open source R package stray.
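The idea of scoring observations by a large distance gap can be illustrated with a minimal Python sketch. This is not the stray package's implementation: the function names knn_gap_scores and flag_large_gap, the parameters k and factor, and the toy data are all hypothetical, the features are assumed to be on comparable scales already, and the cutoff rule is a crude stand-in for the extreme value theory threshold described in the abstract.

```python
import numpy as np


def knn_gap_scores(X, k=3):
    """Score each row by the k-NN distance that follows its largest gap.

    Illustrative only: a simplified reading of "large distance gap",
    not the scoring rule of the stray package.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a toy example).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    knn = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    if k == 1:
        return knn[:, 0]
    gaps = np.diff(knn, axis=1)  # gaps between successive neighbours
    after_gap = gaps.argmax(axis=1) + 1  # neighbour just past the largest gap
    return knn[np.arange(n), after_gap]


def flag_large_gap(scores, factor=5.0):
    """Flag scores that lie above the first large gap in the upper half.

    Assumption: a gap counts as "large" when it exceeds `factor` times the
    median score. This heuristic replaces the EVT-based threshold.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    s = scores[order]
    flags = np.zeros(len(s), dtype=bool)
    typical = np.median(s)
    # The lower half is treated as typical; search only the upper half.
    for i in range(len(s) // 2, len(s) - 1):
        if s[i + 1] - s[i] > factor * typical:
            flags[order[i + 1:]] = True
            break
    return flags


# Toy data: a 5 x 5 grid of typical points plus one distant observation.
grid = [(i, j) for i in range(5) for j in range(5)]
X = np.array(grid + [(50, 50)], dtype=float)

scores = knn_gap_scores(X, k=3)
flags = flag_large_gap(scores)
print(np.flatnonzero(flags))  # indices of the flagged rows
```

On this toy data the distant point receives by far the largest score, so it is the only row above the first large gap in the sorted scores. Real data would need feature scaling and a principled threshold in place of the factor heuristic.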

research
09/28/2021

Anomaly Detection for High-Dimensional Data Using Large Deviations Principle

Most current anomaly detection methods suffer from the curse of dimensio...
research
12/05/2019

Transfer Learning from an Auxiliary Discriminative Task for Unsupervised Anomaly Detection

Unsupervised anomaly detection from high dimensional data like mobility ...
research
05/25/2020

Factor Analysis of Mixed Data for Anomaly Detection

Anomaly detection aims to identify observations that deviate from the ty...
research
05/30/2021

CSCAD: Correlation Structure-based Collective Anomaly Detection in Complex System

Detecting anomalies in large complex systems is a critical and challengi...
research
07/17/2019

A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

In a wide variety of situations, anomalies in the behaviour of a complex...
research
12/04/2019

Copula-based anomaly scoring and localization for large-scale, high-dimensional continuous data

The anomaly detection method presented by this paper has a special featu...
research
09/24/2015

High Dimensional Data Modeling Techniques for Detection of Chemical Plumes and Anomalies in Hyperspectral Images and Movies

We briefly review recent progress in techniques for modeling and analyzi...
