Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection

05/08/2022
by Zeinab Zoghi, et al.

Supervised machine learning approaches require highly customized and fine-tuned methodologies to deliver outstanding performance. This paper presents a dataset-driven design and performance evaluation of a machine learning classifier for the network intrusion dataset UNSW-NB15. Analysis of the dataset suggests that it suffers from class representation imbalance and class overlap in the feature space. We employed ensemble methods using Balanced Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest empowered by Hellinger Distance Decision Tree (RF-HDDT). BB and XGBoost are tuned to handle the imbalanced data, and the Random Forest (RF) classifier is supplemented by the Hellinger metric to address the imbalance issue. Two new algorithms are proposed to address the class overlap issue in the dataset. These two algorithms improve performance on the testing dataset by modifying the final classification decision made by the three base classifiers of the ensemble, which combines them with a majority vote. The proposed design is evaluated for both binary and multi-category classification. A comparison with models reported on the same dataset in the literature demonstrates that the proposed model outperforms them by a significant margin in both the binary and multi-category classification cases.
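Two of the building blocks named above can be illustrated in a few lines: the Hellinger distance used as a decision-tree split criterion (the idea behind HDDT) and a hard majority-vote combiner over three base classifiers. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it assumes binary labels with `1` as the positive (attack) class for the split criterion, uses the commonly cited HDDT form of the distance, and breaks voting ties in favor of the earliest classifier. The function names `hellinger_split_distance` and `majority_vote` are hypothetical, and the paper's two class-overlap algorithms are not reproduced here.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def hellinger_split_distance(
    left_labels: Sequence[int], right_labels: Sequence[int], positive: int = 1
) -> float:
    """Hellinger distance of a two-way split, in the style of HDDT (sketch).

    For each branch, compare the fraction of all positives it receives with
    the fraction of all negatives it receives. Because only per-class
    proportions are used, the score ignores the overall class ratio, which
    is why the criterion is attractive for imbalanced data.
    """
    branches = [left_labels, right_labels]
    n_pos = sum(1 for b in branches for y in b if y == positive)
    n_neg = sum(1 for b in branches for y in b if y != positive)
    if n_pos == 0 or n_neg == 0:
        return 0.0  # a single-class node has nothing left to separate
    total = 0.0
    for branch in branches:
        p = sum(1 for y in branch if y == positive) / n_pos  # share of positives
        q = sum(1 for y in branch if y != positive) / n_neg  # share of negatives
        total += (sqrt(p) - sqrt(q)) ** 2
    return sqrt(total)  # 0 = useless split, sqrt(2) = perfect separation


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> list:
    """Hard majority vote over per-classifier prediction lists (sketch).

    predictions[k][i] is classifier k's label for sample i. On a tie the
    label seen first wins, i.e. the earliest classifier in the list
    (an illustrative assumption, not taken from the paper).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")
    return [
        Counter(p[i] for p in predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]
```

For example, `majority_vote([[0, 1, 1, 2], [0, 1, 0, 3], [1, 0, 0, 4]])` returns `[0, 1, 0, 2]`; the last sample is a three-way tie resolved in favor of the first classifier. In the paper's setting the three prediction lists would come from BB, XGBoost, and RF-HDDT; plain label lists stand in for them here.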


