Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

04/15/2020
by Georg Steinbuss, et al.

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with diverse and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus, in principle, allows for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of achieving good coverage of different domains with synthetic data. In this work, we propose a generic process for generating data sets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers that exhibit insightful characteristics. This allows both for good coverage of domains and for helpful interpretations of results. We also describe three instantiations of the generic process that generate outliers with specific characteristics, such as local outliers. A benchmark with state-of-the-art detection methods confirms that our generic process is indeed practical.
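The core idea (reconstruct regular instances with a model fitted to real data, then generate outliers with controlled characteristics) can be illustrated with a minimal sketch. It assumes a per-feature (diagonal) Gaussian as the reconstruction model and two illustrative outlier types: local-style outliers drawn from the same model with an inflated variance, and global-style outliers drawn uniformly from an enlarged bounding box. This is not the authors' implementation; the function and parameter names (fit_diag_gaussian, generate_benchmark, alpha, threshold, margin) are hypothetical, and the toy input merely stands in for a real benchmark data set.

```python
import math
import random
from statistics import mean, stdev


def fit_diag_gaussian(data):
    """Per-feature mean and standard deviation: a deliberately simple
    stand-in (an assumption) for the model that reconstructs regular data."""
    cols = list(zip(*data))
    return [mean(c) for c in cols], [stdev(c) for c in cols]


def z_dist(x, mu, sd):
    """Standardized Euclidean distance of x from the fitted mean."""
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sd)))


def generate_benchmark(real_data, n_regular, n_local, n_global,
                       alpha=3.0, threshold=3.0, margin=0.5, seed=0,
                       max_tries=100_000):
    """Hypothetical generator. Returns (points, kinds), where kinds[i] is
    'regular', 'local' or 'global'."""
    rng = random.Random(seed)
    mu, sd = fit_diag_gaussian(real_data)

    # Regular instances are sampled from the fitted model rather than copied,
    # so their distribution is known by construction.
    points = [[rng.gauss(m, s) for m, s in zip(mu, sd)] for _ in range(n_regular)]
    kinds = ["regular"] * n_regular

    def rejection_sample(draw, n, kind):
        # Keep only draws outside the threshold region of the regular model.
        accepted, tries = 0, 0
        while accepted < n:
            tries += 1
            if tries > max_tries:
                raise RuntimeError(f"could not generate enough {kind} outliers")
            x = draw()
            if z_dist(x, mu, sd) > threshold:
                points.append(x)
                kinds.append(kind)
                accepted += 1

    # Local-style outliers: same centre, standard deviation scaled by alpha,
    # so they stay comparatively close to the regular data.
    rejection_sample(lambda: [rng.gauss(m, alpha * s) for m, s in zip(mu, sd)],
                     n_local, "local")

    # Global-style outliers: uniform over the data's bounding box, enlarged
    # by `margin` times each feature's range on both sides.
    cols = list(zip(*real_data))
    lo = [min(c) - margin * (max(c) - min(c)) for c in cols]
    hi = [max(c) + margin * (max(c) - min(c)) for c in cols]
    rejection_sample(lambda: [rng.uniform(a, b) for a, b in zip(lo, hi)],
                     n_global, "global")
    return points, kinds


# Toy stand-in for a real-world benchmark data set (two features).
toy_rng = random.Random(42)
real_data = [[toy_rng.gauss(0, 1), toy_rng.gauss(5, 2)] for _ in range(200)]
X, kinds = generate_benchmark(real_data, n_regular=500, n_local=20, n_global=20)
print(len(X), kinds.count("local"), kinds.count("global"))  # 540 20 20
```

In this sketch every generated point carries a label naming its characteristic, which is what allows detection results to be broken down per outlier type; a richer reconstruction model (for example, a mixture model) could replace the single Gaussian without changing the overall structure.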

research
09/11/2023

Boundary Peeling: Outlier Detection Method Using One-Class Peeling

Unsupervised outlier detection constitutes a crucial phase within data a...
research
06/05/2020

Generating Artificial Outliers in the Absence of Genuine Ones – a Survey

By definition, outliers are rarely observed in reality, making them diff...
research
07/16/2020

Re-weighting and 1-Point RANSAC-Based PnP Solution to Handle Outliers

The ability to handle outliers is essential for performing the perspecti...
research
10/19/2019

Efficient Discovery of Meaningful Outlier Relationships

We propose PODS (Predictable Outliers in Data-trendS), a method that, gi...
research
03/12/2020

A Multi-criteria Approach for Fast and Outlier-aware Representative Selection from Manifolds

The problem of representative selection amounts to sampling few informat...
research
05/19/2022

Identifying outliers in astronomical images with unsupervised machine learning

Astronomical outliers, such as unusual, rare or unknown types of astrono...
research
02/17/2019

A feature-based framework for detecting technical outliers in water-quality data from in situ sensors

Outliers due to technical errors in water-quality data from in situ sens...