Efficient Detection of Botnet Traffic by features selection and Decision Trees

06/30/2021
by   Javier Velasco-Mata, et al.

Botnets are among the most prevalent online threats, causing billions in losses to global economies. The growing number of devices connected to the Internet makes it necessary to analyze large amounts of network traffic data. In this work, we focus on improving the performance of botnet traffic classification by selecting the features that contribute most to the detection rate. For this purpose we use two feature selection techniques, Information Gain and Gini Importance, which led to three pre-selected subsets of five, six and seven features. We then evaluate the three feature subsets with three models: Decision Tree, Random Forest and k-Nearest Neighbors. To test the three feature vectors and the three models, we generate two datasets based on the CTU-13 dataset, namely QB-CTU13 and EQB-CTU13. We measure performance as the macro-averaged F1 score over the computational time required to classify a sample. The results show that the highest performance is achieved by Decision Trees using the five-feature set, which obtained a mean F1 score of 85% with an average classification time of 0.78 microseconds per sample.
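The quantities named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy feature values and the single-threshold split are assumptions made for illustration, and Gini Importance is usually accumulated over all splits of a trained tree or forest rather than computed for one split as here. Reading "F1 score over the computational time" as a division is also an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values, labels, threshold, impurity):
    """Impurity reduction obtained by splitting on `value <= threshold`.

    With `impurity=entropy` this is the information gain of the split;
    with `impurity=gini` it is the Gini decrease of the split.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part)
                   for part in (left, right) if part)
    return impurity(labels) - weighted


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def performance(f1, seconds_per_sample):
    """Hypothetical ratio: F1 score divided by per-sample classification time."""
    return f1 / seconds_per_sample


# Toy data (hypothetical): one numeric flow feature and its class labels.
feature = [1, 2, 3, 10, 11, 12]
classes = ["normal"] * 3 + ["botnet"] * 3

info_gain = impurity_decrease(feature, classes, threshold=5, impurity=entropy)
gini_drop = impurity_decrease(feature, classes, threshold=5, impurity=gini)

f1 = macro_f1(["botnet", "botnet", "normal", "normal"],
              ["botnet", "normal", "normal", "normal"])
```

A feature whose best split yields a larger impurity decrease separates the classes better, which is the intuition behind ranking features and keeping only the top five, six or seven.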


