A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data

by Markus Ulmer, et al.

Anomaly detection (AD) tasks have been solved using machine learning algorithms in various domains and applications. The great majority of these algorithms use normal data to train a residual-based model and assign anomaly scores to unseen samples based on their dissimilarity to the learned normal regime. The underlying assumption of these approaches is that anomaly-free data is available for training. This is, however, often not the case in real-world operational settings, where the training data may be contaminated with a certain fraction of abnormal samples. Training with contaminated data, in turn, inevitably deteriorates the AD performance of residual-based algorithms. In this paper, we introduce a framework for fully unsupervised refinement of contaminated training data for AD tasks. The framework is generic and can be applied to any residual-based machine learning model. We demonstrate the application of the framework to two public datasets of multivariate time series machine data from different application fields. We show its clear superiority over the naive approach of training with contaminated data without refinement. Moreover, we compare it to the ideal, unrealistic reference in which anomaly-free data would be available for training. Since the approach exploits information from the anomalies, and not only from the normal regime, it is comparable to, and often outperforms, the ideal baseline as well.
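To make the notion of a residual-based model concrete, the following is a minimal sketch of residual-based anomaly scoring, not the framework introduced in the paper (the abstract does not describe the refinement procedure). It assumes a PCA reconstruction model as a stand-in for "any residual-based machine learning model", and it uses synthetic data; the names `fit_pca` and `anomaly_score` and all data parameters are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a linear model of the normal regime: mean plus top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_score(X, mean, components):
    """Residual-based score: squared reconstruction error of each sample."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)

# Assumption: normal samples lie close to a 2-D subspace of a 5-D feature space.
basis = rng.normal(size=(2, 5))
normal_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
normal_test = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 5))

# Assumption: anomalies are isotropic and therefore mostly off the normal subspace.
anomalies_test = 2.0 * rng.normal(size=(50, 5))

# Train on anomaly-free data only, then score unseen samples by their residual.
mean, components = fit_pca(normal_train, n_components=2)
scores_normal = anomaly_score(normal_test, mean, components)
scores_anomalous = anomaly_score(anomalies_test, mean, components)

print("mean residual, normal test samples:   ", scores_normal.mean())
print("mean residual, anomalous test samples:", scores_anomalous.mean())
```

In this sketch the model is fitted on clean data, which is exactly the assumption the paper questions: if anomalous samples were mixed into `normal_train`, the fitted subspace could partly absorb them and their residuals would shrink.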




