AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

06/30/2022
by   Marius Drăgoi, et al.

Analyzing the distribution shift of data is a growing research direction in machine learning, and it has led to new benchmarks that provide suitable scenarios for studying the generalization properties of ML models. The existing benchmarks focus on supervised learning, and to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built on Kyoto-2006+, a traffic dataset for network intrusion detection. This data meets the premise of a shifting input distribution: it covers a large time span (10 years), with naturally occurring changes over time (users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, ranging from masked language models (MLM) to the classical Isolation Forest. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, performance can be improved compared to classical IID training (by up to 3%, on average). Dataset and code are available at https://github.com/bit-ml/AnoShift/.
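As a rough illustration of the chronological protocol described above, the sketch below groups year-stamped records into a training set and IID, NEAR, and FAR test sets. It reads the split names as "same period as training", "shortly after", and "furthest in time"; that reading, the year boundaries, the `make_splits` helper, and the hold-out rule are all assumptions made for illustration, not the benchmark's actual implementation (see the repository for that).

```python
# Hypothetical year boundaries, chosen only for illustration; the abstract
# does not state which years belong to which split.
TRAIN_YEARS = range(2006, 2011)   # training period (also the source of IID test data)
NEAR_YEARS = range(2011, 2014)    # years shortly after the training period
FAR_YEARS = range(2014, 2016)     # years furthest from the training period


def make_splits(records, holdout_every=5):
    """Split (year, features) records into train / iid / near / far lists.

    Every `holdout_every`-th record from the training period is held out as
    IID test data (an assumed, deterministic hold-out rule). Records from
    years outside all three ranges are ignored.
    """
    splits = {"train": [], "iid": [], "near": [], "far": []}
    seen_in_train_period = 0
    for year, features in records:
        if year in TRAIN_YEARS:
            if seen_in_train_period % holdout_every == holdout_every - 1:
                splits["iid"].append((year, features))
            else:
                splits["train"].append((year, features))
            seen_in_train_period += 1
        elif year in NEAR_YEARS:
            splits["near"].append((year, features))
        elif year in FAR_YEARS:
            splits["far"].append((year, features))
    return splits


if __name__ == "__main__":
    # Toy records: the second tuple element stands in for a feature vector.
    toy = [(2006, i) for i in range(10)] + [(2012, 0), (2015, 0)]
    for name, rows in make_splits(toy).items():
        print(name, len(rows))
```

A model would be fit on the `train` records only and then scored separately on each test split, so that any drop from IID to NEAR to FAR can be attributed to the time gap rather than to a change in the model.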


research
10/06/2022

Env-Aware Anomaly Detection: Ignore Style Changes, Stay True to Content!

We introduce a formalization and benchmark for the unsupervised anomaly ...
research
04/05/2023

Industrial Anomaly Detection with Domain Shift: A Real-world Dataset and Masked Multi-scale Reconstruction

Industrial anomaly detection (IAD) is crucial for automating industrial ...
research
08/11/2020

BREEDS: Benchmarks for Subpopulation Shift

We develop a methodology for assessing the robustness of models to subpo...
research
08/04/2022

Interpretable Distribution Shift Detection using Optimal Transport

We propose a method to identify and characterize distribution shifts in ...
research
07/31/2021

Online unsupervised Learning for domain shift in COVID-19 CT scan datasets

Neural networks often require large amounts of expert annotated data to ...
research
02/20/2023

Unsupervised Layer-wise Score Aggregation for Textual OOD Detection

Out-of-distribution (OOD) detection is a rapidly growing field due to ne...
