Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors

10/25/2017
by Robert A. Bridges, et al.

Anomaly detection (AD) has garnered ample attention in security research, as such algorithms complement existing signature-based methods and promise detection of never-before-seen attacks. Cyber operations manage a high volume of heterogeneous log data; hence, AD in such operations involves multiple ensembles of detectors (e.g., per IP, per data type) modeling heterogeneous characteristics (e.g., rate, size, type), often with adaptive online models producing alerts in near real time. Because of the high data volume, setting the threshold for each detector in such a system is an essential yet underdeveloped configuration issue: even a slightly mistuned threshold can leave the system useless, either producing a myriad of alerts that flood downstream systems or producing none. In this work, we build on the foundations of Ferragut et al. to provide a set of rigorous results for understanding the relationship between threshold values and alert quantities, and we propose an algorithm for setting the threshold in practice. Specifically, we give an algorithm for setting the threshold of multiple, heterogeneous, possibly dynamic detectors completely a priori, in principle. Indeed, if the underlying distribution of the incoming data is known (or closely estimated), the algorithm provides provably manageable thresholds. If the distribution is unknown (e.g., has changed over time), our analysis reveals how the model distribution differs from the actual distribution, indicating that a period of model refitting is necessary. We provide empirical experiments showing the efficacy of this capability by regulating the alert rate of a system with ≈2,500 adaptive detectors scoring over 1.5M events in 5 hours. Further, we demonstrate the alternative case on the real network data and detection framework of Harshaw et al., showing how the inability to regulate alerts indicates that the detection model is a bad fit to the data.
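The core idea above, that a threshold on a probabilistic anomaly score can be chosen a priori to cap the number of alerts, can be illustrated with a minimal sketch. It assumes (for illustration only, not as the authors' exact algorithm) that each detector scores an event x by the tail probability P(f(X) <= f(x)) under its model density f, and that the model is a single Gaussian. The helper names `p_value_gaussian`, `choose_alpha`, and `count_alerts`, the Gaussian model, and the alert budget are all hypothetical.

```python
import math
import random


def p_value_gaussian(x: float, mu: float, sigma: float) -> float:
    """P(f(X) <= f(x)) for X ~ N(mu, sigma^2), with f the Gaussian density.

    The Gaussian density shrinks as |x - mu| grows, so {f(X) <= f(x)} is the
    event {|X - mu| >= |x - mu|}, whose probability is erfc(|z| / sqrt(2)).
    """
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2.0))


def choose_alpha(target_alerts: float, expected_events: float) -> float:
    """Hypothetical helper: p-value threshold whose expected alerts meet a budget.

    If the model is correct, the p-value is uniform on [0, 1], so
    P(p-value <= alpha) = alpha and expected alerts = alpha * expected_events.
    """
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    return min(1.0, target_alerts / expected_events)


def count_alerts(events, mu: float, sigma: float, alpha: float) -> int:
    """Alert on every event whose p-value under the model is <= alpha."""
    return sum(1 for x in events if p_value_gaussian(x, mu, sigma) <= alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    n_events = 100_000
    # Budget of 100 alerts over 100,000 events, so alpha = 0.001.
    alpha = choose_alpha(target_alerts=100, expected_events=n_events)

    # Case 1: the model N(0, 1) matches the data; alerts stay near alpha * n.
    matched = [rng.gauss(0.0, 1.0) for _ in range(n_events)]
    print("matched model alerts:", count_alerts(matched, 0.0, 1.0, alpha))

    # Case 2: the data actually follow N(0, 2^2); the N(0, 1) model is a bad
    # fit, and the alert count far exceeds the budget.
    drifted = [rng.gauss(0.0, 2.0) for _ in range(n_events)]
    print("mismatched model alerts:", count_alerts(drifted, 0.0, 1.0, alpha))
```

When the model matches the data, the alert count lands near the budget of 100; when the data are twice as spread out as the model assumes, roughly 10% of events alert, mirroring the abstract's point that failure to regulate alerts signals a poorly fitting model. Under the same assumptions, an ensemble in which each event is scored by one correctly modeled detector has expected total alerts of alpha times the total event count, so one alpha can serve the whole ensemble.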


