The Impact of Discretization Method on the Detection of Six Types of Anomalies in Datasets

08/27/2020
by   Ralph Foorthuis, et al.

Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset. Numerous algorithms use discretization of numerical data in their detection processes. This study investigates the effect of the discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies. To this end, experiments are conducted with various datasets and SECODA, a general-purpose algorithm for unsupervised non-parametric anomaly detection in datasets with numerical and categorical attributes. This algorithm employs discretization of continuous attributes, exponentially increasing weights and discretization cut points, and a pruning heuristic to detect anomalies with an optimal number of iterations. The results demonstrate that standard SECODA can detect all six types, but that different discretization methods favor the discovery of certain anomaly types. The main findings also hold for other detection techniques using discretization.
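To make concrete how the choice of discretization method can change what a bin-based detector sees, the following is a minimal toy sketch. It is not SECODA and does not reproduce the study's experiments: the helper names (`equal_width_cuts`, `equal_frequency_cuts`, `rarity_scores`), the sample values, and the choice of four bins are all assumptions made for illustration. Each case is scored by how many cases share its bin, so a lower score means a rarer, more anomalous case.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_cuts(values, bins):
    """Cut points that split the value range into intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points chosen so each interval holds roughly the same number of cases."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def rarity_scores(values, cuts):
    """Score each case by the number of cases in its bin (lower = more unusual)."""
    labels = [bisect_right(cuts, v) for v in values]
    counts = Counter(labels)
    return [counts[label] for label in labels]


# Hypothetical data: nine ordinary cases and one extreme value (9.0).
values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 9.0]

ew_scores = rarity_scores(values, equal_width_cuts(values, 4))
ef_scores = rarity_scores(values, equal_frequency_cuts(values, 4))

print("equal width:    ", ew_scores)
print("equal frequency:", ef_scores)
```

In this toy data, equal-width binning leaves the extreme value alone in its own bin, so it receives the lowest score. Equal-frequency binning places it in a bin with two ordinary cases, so it no longer stands out. This is only a one-dimensional illustration of why the binning scheme matters; it says nothing about which method is preferable for any particular anomaly type.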

research
08/16/2020

SECODA: Segmentation- and Combination-Based Detection of Anomalies

This study introduces SECODA, a novel general-purpose unsupervised non-p...
research
07/04/2021

A Typology of Data Anomalies

Anomalies are cases that are in some way unusual and do not appear to fi...
research
07/30/2020

On the Nature and Types of Anomalies: A Review

Anomalies are occurrences in a dataset that are in some way unusual and ...
research
01/13/2021

A Non-Parametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles

Identifying anomalies in multi-dimensional datasets is an important task...
research
10/17/2011

Multi-criteria Anomaly Detection using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhib...
research
04/03/2020

Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

In this work, we apply anomaly detection to source code and bytecode to ...
research
10/25/2022

Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings

In this paper, we introduce the Vehicle Claims dataset, consisting of fr...
