Random Partitioning Forest for Point-Wise and Collective Anomaly Detection – Application to Intrusion Detection

06/29/2020
by Pierre-François Marteau, et al.

In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees that detects point-wise and collective (as well as contextual) anomalies. By applying a distance-based paradigm at the leaves of the trees, this semi-supervised approach addresses a drawback that has been identified in the isolation forest (IF) algorithm. Moreover, taking into account the frequency of visits to the leaves of the random trees significantly improves the performance of DiFF-RF when collective anomalies are present. DiFF-RF is fairly easy to train, and excellent performance can be obtained by using a simple semi-supervised procedure to set the extra hyper-parameter that it introduces. We first evaluate DiFF-RF on a synthetic data set to i) verify that the limitation of the IF algorithm is overcome, ii) demonstrate how collective anomalies are actually detected, and iii) analyze the effect of the meta-parameters it involves. We then assess the DiFF-RF algorithm on a large set of datasets from the UCI repository, as well as on two benchmarks related to intrusion detection applications. Our experiments show that DiFF-RF almost systematically outperforms the IF algorithm, and that it also challenges the one-class SVM baseline and a deep learning variational auto-encoder architecture. Furthermore, they show that DiFF-RF can work well when only a small amount of training data is available, a setting that is, by contrast, difficult for deep neural architectures. Finally, DiFF-RF is computationally efficient and can be easily parallelized on multi-core architectures.
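
The two ingredients named in the abstract, a distance computed at the leaves and a leaf visit frequency, can be illustrated with a toy sketch. The code below is not the DiFF-RF algorithm and does not reproduce its scoring formula, which the abstract does not give. It assumes random axis-aligned cuts, leaves that store the centroid and count of the training points they receive, and a plain Euclidean distance. All function names, the subsample size, the tree depth, and the number of trees are hypothetical choices made for illustration.

```python
import math
import random


def make_leaf(points):
    """Store the centroid and the number of training points in a leaf."""
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return {"centroid": centroid, "count": len(points)}


def build_tree(points, max_depth, rng, depth=0):
    """Recursively partition the points with random axis-aligned cuts."""
    if depth >= max_depth or len(points) <= 1:
        return make_leaf(points)
    d = rng.randrange(len(points[0]))
    lo = min(p[d] for p in points)
    hi = max(p[d] for p in points)
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[d] < cut]
    right = [p for p in points if p[d] >= cut]
    if not left or not right:  # degenerate cut: stop splitting here
        return make_leaf(points)
    return {
        "dim": d,
        "cut": cut,
        "left": build_tree(left, max_depth, rng, depth + 1),
        "right": build_tree(right, max_depth, rng, depth + 1),
    }


def find_leaf(node, x):
    """Route a point down the tree to the leaf it falls into."""
    while "cut" in node:
        node = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
    return node


def score(forest, sample_size, x):
    """Return (mean distance to leaf centroid, mean relative leaf frequency).

    The two quantities are reported separately; how to combine them is left
    open here, since that is specific to the paper.
    """
    dists, freqs = [], []
    for tree in forest:
        leaf = find_leaf(tree, x)
        dists.append(math.dist(x, leaf["centroid"]))
        freqs.append(leaf["count"] / sample_size)
    return sum(dists) / len(forest), sum(freqs) / len(forest)


# Train only on "normal" data (semi-supervised setting): a uniform square.
rng = random.Random(0)
train = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
SAMPLE_SIZE = 64  # hypothetical per-tree subsample size
forest = [build_tree(rng.sample(train, SAMPLE_SIZE), 6, rng) for _ in range(50)]

near_dist, near_freq = score(forest, SAMPLE_SIZE, (0.0, 0.0))
far_dist, far_freq = score(forest, SAMPLE_SIZE, (10.0, 10.0))
```

In this sketch a point far from the training data receives a larger mean leaf distance than a point inside it, because every leaf centroid lies within the range of the training data. The frequency term only records how populated the visited leaves were during training.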

research
03/22/2023

Feature Reduction Method Comparison Towards Explainability and Efficiency in Cybersecurity Intrusion Detection Systems

In the realm of cybersecurity, intrusion detection systems (IDS) detect ...
research
06/05/2018

A linear time method for the detection of point and collective anomalies

The challenge of efficiently identifying anomalies in data sequences is ...
research
10/15/2019

Breadth-first, Depth-next Training of Random Forests

In this paper we analyze, evaluate, and improve the performance of train...
research
11/30/2021

TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios

Unsupervised anomaly detection tackles the problem of finding anomalies ...
research
09/04/2019

Subset Multivariate Collective And Point Anomaly Detection

In recent years, there has been a growing interest in identifying anomal...
research
11/08/2021

There is no Double-Descent in Random Forests

Random Forests (RFs) are among the state-of-the-art in machine learning ...
