Decision-forest voting scheme for classification of rare classes in network intrusion detection

07/25/2021
by   Jan Brabec, et al.

In this paper, Bayesian aggregation of decision trees in an ensemble (decision forest) is investigated. The focus is on multi-class classification where the number of samples is significantly skewed toward one of the classes. The algorithm leverages out-of-bag datasets to estimate the prediction errors of individual trees, which are then used in accordance with Bayes' rule to refine the decision of the ensemble. The algorithm takes the prevalence of individual classes into account and does not require setting any additional parameters related to class weights or decision-score thresholds. Evaluation is based on publicly available datasets as well as on a proprietary dataset comprising network traffic telemetry from hundreds of enterprise networks with over a million users overall. The aim is to increase the detection capabilities of an operating malware detection system. While keeping the precision of the system higher than 94%, meaning that fewer than 6 out of 100 detections shown to the network administrator are false alarms, we achieved an increase of approximately 7% in the number of detections. The algorithm handles large amounts of data effectively and can be used in conjunction with most state-of-the-art algorithms for training decision forests.
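
The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of one way such a vote could work, not the authors' implementation. It assumes that each tree's errors are summarized by a confusion matrix estimated on its out-of-bag samples, that trees are conditionally independent given the true class (a naive Bayes simplification), and that Laplace smoothing is applied to the counts. The function names `tree_likelihoods` and `bayes_vote` and the parameter `alpha` are hypothetical.

```python
import math


def tree_likelihoods(oob_true, oob_pred, n_classes, alpha=1.0):
    """Estimate P(tree predicts j | true class is i) from one tree's
    out-of-bag samples. `alpha` is an assumed Laplace smoothing constant
    that keeps every probability strictly positive."""
    counts = [[alpha] * n_classes for _ in range(n_classes)]
    for true_label, predicted in zip(oob_true, oob_pred):
        counts[true_label][predicted] += 1.0
    # Normalize each row so it sums to 1 over the predicted classes.
    return [[c / sum(row) for c in row] for row in counts]


def bayes_vote(votes, likelihoods, priors):
    """Combine one vote per tree into class posteriors with Bayes' rule.

    votes[t]       -- class index predicted by tree t for the sample
    likelihoods[t] -- matrix returned by tree_likelihoods for tree t
    priors[c]      -- prevalence of class c (must be > 0)

    Assumes trees are conditionally independent given the true class.
    """
    # Work in log space to avoid underflow with many trees.
    log_post = [math.log(p) for p in priors]
    for vote, lik in zip(votes, likelihoods):
        for c in range(len(priors)):
            log_post[c] += math.log(lik[c][vote])
    peak = max(log_post)
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

In this sketch the class prevalence enters only through `priors`, so no class weights or score thresholds need tuning: a vote for a rare class raises its posterior in proportion to how reliable the out-of-bag estimates show that tree to be when it predicts that class.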

