An Anomaly-based Multi-class Classifier for Network Intrusion Detection

Network intrusion detection systems (NIDS) are one of several solutions that make up a computer security system. They are responsible for inspecting network traffic and triggering alerts when detecting intrusion attempts. One of the most popular approaches in NIDS research today is the Anomaly-based technique, characterized by the ability to recognize previously unobserved attacks. Some A-NIDS systems go beyond the separation into normal and anomalous classes by trying to identify the type of detected anomalies. This is an important capability of a security system, as it allows a more effective response to an intrusion attempt. The existing systems with this ability are often subject to limitations such as high complexity and incorrect labeling of unknown attacks. In this work, we propose an algorithm to be used in NIDS that overcomes these limitations. Our proposal is an adaptation of the Anomaly-based classifier EFC to perform multi-class classification. It has a single layer, with low temporal complexity, and can correctly classify not only the known attacks, but also unprecedented attacks. Our proposal was evaluated in two up-to-date flow-based intrusion detection datasets: CIDDS-001 and CICIDS2017. We also conducted a specific experiment to assess our classifier's ability to correctly label unknown attacks. Our results show that the multi-class EFC is a promising classifier to be used in NIDS.




I Introduction

As organizations and individuals become increasingly connected, network attacks become more dangerous for victims and more attractive to cybercriminals. Security reports such as ENISA Threat Landscape [ENISA], DCMS’s Cyber Security Breaches Survey [DCMS], and ACSC Annual Cyber Threat Report [ACSC] have shown, year after year, the devastating impact of cyber attacks, ranging from the exposure of personal identifiable information to extortion of millions of dollars in ransom payments. With the sophistication of threat capabilities increasing as we move deeper into the digital age, ensuring cybersecurity remains one of the greatest challenges of modern computing.

In response to this scenario, organizations have invested heavily in security in recent years. According to Gartner [Gartner], worldwide spending on information security and risk management technology and services grew 6.4% in 2020 and is expected to grow another 12.4% in 2021, reaching 150.4 billion dollars. Despite this, currently available security mechanisms still fail to protect institutions, given the growing complexity and rapid changes of modern threats [Accenture2021].

A Network Intrusion Detection System (NIDS) is software used in conjunction with firewalls and antivirus tools to protect networked devices from unauthorized access [Tidjon2019]. These systems are strategically placed on a network to inspect incoming traffic and trigger an alert when an intrusion is detected. Currently, the most promising approach to such systems is called Anomaly-based Network Intrusion Detection System (A-NIDS). These systems develop a model of normal behavior for network traffic and classify unusual activities as intrusions [Liao2013]. By definition, they are capable of detecting any kind of activity that differs from normal traffic, thus providing protection against new and sophisticated attacks in modern networks [Bhuyan2014].

A-NIDS can be implemented using several techniques, such as clustering algorithms, classification algorithms, cascading supervised techniques, or combining supervised and unsupervised techniques [Agrawal2015]. They can also be implemented for different purposes, such as simply detecting malicious traffic or detecting and categorizing unusual activity according to some attack taxonomy. These latter systems are particularly interesting because categorizing intrusions increases the overall system effectiveness, as instead of just blocking the attack, the system is able to formulate effective incident response actions for each type of intrusion [AHMAD2021102122]. Therefore, an A-NIDS capable of detecting and classifying intrusions is a very desirable network security asset.

Although several techniques can be employed to implement A-NIDSs that detect and classify intrusions, they often present serious limitations. First, most techniques are unable to classify unknown attacks, i.e., they classify every intrusion according to a known attack base, leading to incorrect labeling of new and unprecedented attacks [Toupas2019] [Ahmim2019]. Second, when unknown attacks are considered, the A-NIDS are usually implemented using cascading supervised techniques. In other words, each type of attack is detected at a system layer, having their processing sequentially chained and each part implemented with a different approach, e.g., using different machine learning techniques per layer [Al-Yaseen2017] [Yao2019]. Layering and chaining different techniques lead to increased complexity and performance bottlenecks, which is not desirable since fast traffic analysis is a key requirement of NIDS [Buczak2016]. Considering both limitations, an A-NIDS capable of detecting unknown attacks and classifying intrusions using a single phase technique with low complexity is desirable.

In this work, we propose a novel classifier to detect unknown attacks and to classify known intrusions in a single phase. Our model consists of an adaptation of the binary Anomaly-based Energy-based Flow Classifier (EFC) [pontes2019new] to perform multi-class classification. As a multi-class method, the algorithm defines whether traffic is benign or one of the known intrusions in a single step. In addition, our classifier implements a class for unknown attacks, i.e., traffic that is neither benign nor any of the known attack types. Lastly, it consists of a single classifier with low complexity, making it suitable for deployment in real-time systems [Buczak2016].

We evaluated our proposed classifier using two up-to-date flow-based network intrusion datasets: CIDDS-001 [Ring2017] and CICIDS2017 [Sharafaldin2018]. In addition, we compared the performance of multi-class EFC to other classic multi-class classifiers to assess its capability to distinguish classes of attacks. Finally, we conducted a comparative experiment to assess the behavior of EFC and other multi-class classifiers when faced with unknown attacks. The main contributions of our work are:

  • The proposal and development of the multi-class Energy-based Flow Classifier;

  • A performance comparison between multi-class EFC and other ML multi-class classifiers from the literature;

  • An assessment of different multi-class classifiers’ ability to correctly identify unknown attacks.

The remainder of this paper is organized as follows: Section II briefly summarizes some recent work on A-NIDS. Section III presents the fundamentals of the statistical framework employed by EFC and the implementation of the multi-class EFC. Section IV describes the methodology adopted in this work, including experiments, evaluation metrics and the datasets used. Section V presents and discusses the results of the experiments. Finally, Section VI closes the paper with conclusions and suggestions for future research.

II Related work

We present a taxonomy of A-NIDS in Figure 1. In this section, we first discuss works in which the proposed system tries to label attacks with only known attacks classes. Next, we present notable works that, in addition to detecting known attacks, also implement specific mechanisms for detecting unknown attacks.

NIDSs aim at protecting the integrity of a computer network. They are responsible for inspecting incoming traffic and setting off an alarm when an attack vector is identified. Nowadays, the most promising approach in NIDS research is A-NIDS, characterized by the ability to identify any type of intrusion that differs from normal traffic [Buczak2016].

In recent years, a wide variety of A-NIDS were developed using Machine Learning (ML) classifiers. Among the existing A-NIDS, there are those that only flag malicious traffic and there are those that, after flagging an attack, attempt to classify it. As discussed in the Introduction, we believe these latter systems to be the most promising ones, because they allow the formulation of effective incident response actions instead of just blocking the attack [AHMAD2021102122]. Systems with this ability usually use multi-class algorithms to learn and identify attack patterns. Next, we review the literature on A-NIDS considering the taxonomy presented in Figure 1.

Figure 1: A-NIDS Literature Review

Binary A-NIDSs classify traffic into two classes: benign or malicious. They are useful to identify and block intrusions, but they do not provide information about identified attacks. We highlight EFC as it was first presented by Pontes et al. [pontes2019new], as a binary classifier. EFC performance was evaluated on the CICIDS2017, CIDDS-001 and CICDDoS2019 datasets and it was compared with several other ML classifiers: Naive Bayes (NB), K-Nearest Neighbors (KNN), Decision Tree (DT), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Adaboost (Ada B.) and Random Forest (RF). From these experiments, it was concluded that EFC is capable of detecting anomalies in the three datasets, reaching an F1 score around 97% at best and an Area Under the Receiver Operating Characteristics (AUROC) around 99% at best. In our work, we modify the original single-class method to perform multi-class classification by using the Potts model to model not only benign flows, but also several attack classes.

Moving beyond binary classifiers, recent reviews of the literature [Buczak2016, Xin2018, Kunal2019, Mishra2019] identify common multi-class classifiers employed in NIDS using KNN [Saheed2020338], SVM [Saheed2020338], NB [Saheed2020338] [Idhammad2018] [Khammassi], DT [Khammassi], RF [Idhammad2018] [Khammassi] [Oliveira2021], and Artificial Neural Network (ANN) [Oliveira2021] [KUNANG2021102804] [Vinayakumar2019]. Although these classifiers are capable of achieving high accuracy when labeling known intrusions after training with attack databases, they lack the capability of detecting unknown attacks. Therefore, in the following paragraphs, we present some recent work using multi-class classifiers that are also able to detect unknown attacks in A-NIDS.

Al-Yaseen et al. (2017) [Al-Yaseen2017] proposed a multi-level Intrusion Detection System (IDS) composed of SVM and Extreme Learning Machine (ELM), which was evaluated on the NSL-KDD dataset. Their model uses the K-means clustering algorithm to build smaller and improved training sets to reduce the algorithm’s training time. The solution consists of five layers, each filtering an attack class using SVM or ELM classifiers. After filtering all classes, the final layer classifies the remaining samples as Normal or Unknown with an SVM classifier. The model’s overall detection rate was 95.17%. In our work, we also classify intrusions into known classes and a Suspicious class, but the complexity of our algorithm is considerably lower than that of their classifier, especially in the classification phase.

Yao et al. (2019) [Yao2019] developed a novel IDS framework based on Hybrid Multi-Level Data Mining (HMLD), evaluated on the KDDCUP99 dataset. The classification phase of their model consists of filtering each attack with a specific classifier trained to detect this attack. The chosen classifiers were SVM-linear for DoS, ANN-logistic for Probe, ANN-identity for U2R and ANN-relu for R2L. Samples not classified as belonging to any of these classes are called Impurity Data. A small subset of the Impurity Data is provided to a specialist to be labeled into new attack classes. Afterwards, this labeled set is used to train a DT to classify the Impurity Data. The overall model accuracy on the KDDCUP99 dataset was 96.70%. In our work, we also create a separate class for unknown intrusions, but our process is completely automatic and does not require labeling by a specialist. In addition, we also achieve a lower temporal complexity than ANN and SVM.

In summary, several works on multi-class A-NIDS have been proposed lately. We believe that the most promising are those capable of correctly labeling both known and unknown attacks, while maintaining low complexity. In this work, we propose an evolution of the single-class EFC [pontes2019new] to overcome the limitations of the existing approaches, while performing multi-class classification. The new classifier stands out from similar proposals for having a single layer with low complexity and an assertive mechanism to correctly classify unknown attacks. In the following section, we present the conceptual foundations of EFC [pontes2019new] and the modifications made for the multi-class version.

III Energy-based flow classification

In this section, we introduce the energy-based classification technique. First, we present the original single-class method, developed in [pontes2019new]. Then, we propose the multi-class EFC, an evolution of the former method to perform multi-class classification.

III-A Single-class EFC

The main idea of EFC’s training phase is to use inverse statistics to infer a probability distribution for the flow class to be detected. Once this distribution is defined, it is used to classify new flows by calculating and comparing a measure called flow energy, which is a measure of how unlikely a flow is to occur in the calculated distribution. The definition of energy arises naturally when inferring the distribution of a flow class. Thus, to define and explain this measure, we will present the inference process of a generic flow class distribution.

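Let $k = (a_{k1}, \ldots, a_{kM})$ be a network flow, where each position $i$ represents a feature and each feature can assume values $a_{ki} \in \Omega = \{1, \ldots, Q\}$. Let $\mathcal{K}$ be the set of all possible flows from a given class and $\mathcal{S} \subset \mathcal{K}$ the subset from which we want to infer the distribution. The probabilistic model $P$ that best represents $\mathcal{K}$ is the one that, respecting empirical observations of $\mathcal{S}$, assumes as little prior information as possible. Equivalently, it is the one that least restricts uncertainty among all possible models. Thus, using entropy as a measure of uncertainty, we want to find the distribution that maximizes the entropy while respecting the observed characteristics of $\mathcal{S}$. Formally, we want to solve the following problem of maximizing entropy

$$\max_{P} \; -\sum_{k \in \mathcal{K}} P(k) \log P(k) \tag{1}$$

subject to

$$\sum_{k \in \mathcal{K}} P(k)\, \delta(a_{ki}, a_i) = f_i(a_i), \quad \forall i,\ \forall a_i \in \Omega, \tag{2}$$

$$\sum_{k \in \mathcal{K}} P(k)\, \delta(a_{ki}, a_i)\, \delta(a_{kj}, a_j) = f_{ij}(a_i, a_j), \quad \forall (i, j),\ \forall (a_i, a_j) \in \Omega^2, \tag{3}$$

where $\delta$ is the Kronecker delta, $f_i(a_i)$ is the empirical frequency of value $a_i$ on feature $i$ and $f_{ij}(a_i, a_j)$ is the empirical joint frequency of the pair of values $(a_i, a_j)$ of features $i$ and $j$ observed in $\mathcal{S}$. In other words, we seek the distribution of greatest entropy that reflects the configuration of flow features in $\mathcal{S}$.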
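As an illustration, the empirical single and joint frequencies that act as constraints can be counted directly from a set of discretized flows. The sketch below is a minimal, assumption-laden example: the function names are hypothetical, features are encoded as integers in `range(q)`, and the pseudocount regularization (mixing the observed frequencies with a uniform distribution, weighted by `alpha`) is one common choice that is assumed here rather than taken from the reference implementation.

```python
from collections import Counter
from itertools import combinations


def site_frequencies(flows, q, alpha=0.5):
    """Empirical frequency of each value on each feature.

    flows: list of equal-length lists of ints in range(q) (discretized flows).
    alpha: pseudocount weight (assumed regularization, see the text above).
    """
    n, m = len(flows), len(flows[0])
    freqs = []
    for i in range(m):
        counts = Counter(flow[i] for flow in flows)
        freqs.append([alpha / q + (1 - alpha) * counts[a] / n for a in range(q)])
    return freqs


def pair_frequencies(flows, q, alpha=0.5):
    """Empirical joint frequency of value pairs for every feature pair i < j."""
    n, m = len(flows), len(flows[0])
    freqs = {}
    for i, j in combinations(range(m), 2):
        counts = Counter((flow[i], flow[j]) for flow in flows)
        freqs[(i, j)] = [
            [alpha / q**2 + (1 - alpha) * counts[(a, b)] / n for b in range(q)]
            for a in range(q)
        ]
    return freqs


# Four toy flows with two binary features each (hypothetical data).
flows = [[0, 1], [0, 0], [1, 1], [0, 1]]
f_i = site_frequencies(flows, q=2, alpha=0.0)
f_ij = pair_frequencies(flows, q=2, alpha=0.0)
```

With `alpha=0.0` the functions return the raw observed frequencies; a positive `alpha` keeps every frequency strictly positive, which avoids degenerate values for feature configurations never seen in training.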

The proposed maximization can be solved using a Lagrangian function such as presented in [Jaynes1957], yielding the following Boltzmann-like distribution:

$$P(k) = \frac{e^{-\mathcal{H}(k)}}{Z}, \tag{4}$$

where

$$\mathcal{H}(k) = -\sum_{i,j \mid i<j} e_{ij}(a_{ki}, a_{kj}) - \sum_{i} h_i(a_{ki}) \tag{5}$$

is the Hamiltonian of flow $k$ and $Z$ is the partition function that normalizes the distribution. In this work, we will ignore $Z$, as we are not interested in calculating specific flow probabilities. In fact, we are only interested in the Hamiltonian of a flow, which is exactly the measure we call energy. Before discussing its definition and the functions $e_{ij}$ and $h_i$, note that there is an important relationship between the energy of a flow and its probability, given by Equation 4: the higher the flow energy, the lower its probability. This relationship implies that the energy of a flow is a measure of how unlikely it is to belong to that distribution. So, if we infer the distribution for a flow class and calculate the energy of a new flow with respect to this distribution, we get a measure of how likely it is that it belongs to that class. For this reason, the energy allows us to classify a given flow as belonging to a certain class or not.
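A short worked step may help to see why the partition function can be left out. Writing the Boltzmann-like distribution as $P(k) = e^{-\mathcal{H}(k)}/Z$ (our notation for the Hamiltonian and the partition function), the ratio of the probabilities of two flows $k_1$ and $k_2$ under the same model does not depend on $Z$:

```latex
\frac{P(k_1)}{P(k_2)}
  = \frac{e^{-\mathcal{H}(k_1)} / Z}{e^{-\mathcal{H}(k_2)} / Z}
  = e^{-\left(\mathcal{H}(k_1) - \mathcal{H}(k_2)\right)},
\qquad\text{so}\qquad
\mathcal{H}(k_1) < \mathcal{H}(k_2) \iff P(k_1) > P(k_2).
```

Ranking flows by energy within one model is therefore equivalent to ranking them by probability, in the opposite order.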

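Note that, by the solution presented in [Jaynes1957], the energy is completely defined by the Lagrange multipliers $h_i$ and $e_{ij}$, associated to constraints (2) and (3). So, to infer the distribution and to be able to compute energy values, we need to calculate $e_{ij}$ and $h_i$, defined in [Jaynes1957] as

$$e_{ij}(a_i, a_j) = -\left(C^{-1}\right)_{ij}(a_i, a_j), \quad \forall (i, j),\ \forall (a_i, a_j) \in \Omega^2,\ a_i, a_j \neq Q, \tag{6}$$

$$h_i(a_i) = \ln \frac{f_i(a_i)}{f_i(Q)} - \sum_{j \mid j \neq i} \sum_{a_j = 1}^{Q-1} e_{ij}(a_i, a_j)\, f_j(a_j), \quad \forall i,\ \forall a_i \in \Omega,\ a_i \neq Q, \tag{7}$$

where

$$C_{ij}(a_i, a_j) = f_{ij}(a_i, a_j) - f_i(a_i)\, f_j(a_j) \tag{8}$$

is the covariance matrix obtained from single and joint empirical frequencies.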

In an intuitive way, let $e_{ij}$ and $h_i$ be defined for a subset $\mathcal{S}$ according to equations 6 and 7. Let $k$ be a flow, where each position $i$ represents a feature and each feature can assume values $a_{ki} \in \Omega$. Then, the local fields $h_i(a_{ki})$ are a measure of how likely it is that feature $i$ assumes the value $a_{ki}$ in $\mathcal{S}$. Similarly, for the same flow, the coupling values $e_{ij}(a_{ki}, a_{kj})$ are a measure of how likely it is that features $i$ and $j$ assume, at the same time, the values $a_{ki}$ and $a_{kj}$ in the set $\mathcal{S}$. Therefore, the sum of coupling values and local fields of all features of $k$ reflects the similarity of the flow with the original subset $\mathcal{S}$, feature by feature.

In brief, EFC’s training phase consists of determining the coupling values $e_{ij}$ and local fields $h_i$ for a given flow class with training samples $\mathcal{S}$. Then, in the classification phase, it is possible to calculate the energy of a new flow $k$. Since the energy of a flow is the negative sum of local fields and coupling values for all features and feature pairs of $k$, it reflects how likely it is that this exact configuration of feature values occurs in $\mathcal{S}$. If the flow energy is high, it means that it has a low probability; in other words, it does not resemble the flows that generated the distribution. Likewise, if the energy is low, the flow is more likely to exist in the distribution.
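The energy computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference implementation: the function name and the dictionary-based storage of couplings and local fields are hypothetical, symbols are encoded as integers in `range(q)`, and the last symbol `q - 1` is treated as a reference state that carries no parameters (an assumed convention).

```python
def flow_energy(flow, couplings, local_fields, q):
    """Energy of a discretized flow: negative sum of local fields and couplings.

    flow: list of ints in range(q), one symbol per feature.
    couplings: dict mapping (i, a_i, j, a_j) with i < j to a coupling value.
    local_fields: dict mapping (i, a_i) to a local field value.
    Symbol q - 1 is an assumed reference state and contributes nothing.
    """
    energy = 0.0
    m = len(flow)
    for i in range(m):
        a_i = flow[i]
        if a_i == q - 1:
            continue  # reference symbol: no local field, no couplings
        energy -= local_fields.get((i, a_i), 0.0)
        for j in range(i + 1, m):
            a_j = flow[j]
            if a_j != q - 1:
                energy -= couplings.get((i, a_i, j, a_j), 0.0)
    return energy


# Hypothetical toy model with three features and q = 3 symbols.
local_fields = {(0, 0): 0.5, (1, 1): 0.25}
couplings = {(0, 0, 1, 1): 1.0}
energy = flow_energy([0, 1, 2], couplings, local_fields, q=3)
```

In this toy example the flow matches the symbols favored by the model, so both local fields and the coupling lower its energy; a flow made only of unfavored symbols would keep an energy of zero.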

To decide whether the energy is high or low, we use a threshold defined by the 95th percentile of the energies of the samples used to infer the model. In single-class EFC, the training is done with benign samples only. Therefore, the classification is performed with respect to the distribution of normal (benign) traffic: if the flow has lower energy than the 95th percentile of the benign samples, i.e., it is below the threshold, it is considered to be normal. Otherwise, it is labeled as abnormal traffic. For more details on the development of single-class EFC, as well as a complete explanation of the model inference, please refer to [pontes2019new]. In this paper, we focus on the development of the multi-class EFC, which uses the classification technique presented above to identify several classes. Next, we will present our proposed multi-class version of EFC and an algorithm for its implementation.
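The threshold rule for the single-class case can be illustrated as follows. The sketch assumes that the energies of the benign training samples have already been computed; the helper names are hypothetical, and the percentile uses linear interpolation between closest ranks, which is one of several possible conventions.

```python
import math


def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def classify_single_class(energy, threshold):
    """Label a flow as normal when its energy is below the benign threshold."""
    return "normal" if energy < threshold else "abnormal"


# Hypothetical energies of benign training flows: 0.0, 1.0, ..., 100.0.
benign_energies = [float(e) for e in range(101)]
threshold = percentile(benign_energies, 95)
labels = [classify_single_class(e, threshold) for e in (10.0, 99.0)]
```

By construction, roughly 5% of the benign training flows lie at or above this threshold, which bounds the false alarm rate expected on traffic that resembles the training data.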

III-B Multi-class EFC

The multi-class EFC uses the same techniques as the single-class version, i.e., the algorithm decides whether a flow belongs to a class by looking at its energy value. However, in the single-class version the flow energy is calculated only with respect to the distribution of normal traffic, resulting in a binary classification. In the multi-class case, several distributions are inferred, one for each flow class. Afterwards, flow energies are calculated in each distribution and their values are compared to return the classification result.

Figure 2 shows the multi-class EFC training process. This phase consists of a replication of the single-class training process to more than one class. While in the single-class version we infer a model only for the benign traffic class, here we need to infer a model for each flow class. So, initially, training samples are grouped by class. Then, the models are inferred and the thresholds are computed for each class, in the same way as in the single-class version, i.e., by calculating the local fields and coupling values and assuming a statistical threshold, namely the 95th percentile of the training sample energies. Lastly, the models induced for each class are stored to be used in the classification phase.

Figure 2: Multi-class EFC training phase

Figure 3 shows the classification process of the multi-class EFC. To classify an instance from the test set, its energy is computed in each model induced in the training phase, generating an energy vector for each instance. As explained in the previous subsection, the energy of a flow in a distribution is a measure of dissimilarity of that flow to the set used to infer the distribution. So, the energy vector of a flow contains values inversely related to the probabilities of the flow belonging to each class. Therefore, after computing the energies, EFC takes the lowest generated value and compares it with the threshold of that class (since the lowest energy corresponds to the highest similarity). If the energy is below the threshold, the flow is considered to be from the class that generated the energy; otherwise, it is classified as suspicious. This second situation means that even the class that most closely resembles the flow is not similar enough to it. Therefore, the flow is considered to be suspicious, possibly corresponding to an unknown type of attack.

Figure 3: Multi-class EFC testing phase

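In more formal terms, let $\mathcal{S}_c$ be the set of all flows labeled $c$ in the training set $\mathcal{S}$. For all $c \in \mathcal{L}$, where $\mathcal{L}$ is the set of classes, we infer the coupling values $e^{c}_{ij}$ and local fields $h^{c}_{i}$ from $\mathcal{S}_c$, and define the threshold $t_c$ as the 95th percentile of the energies of samples in $\mathcal{S}_c$, calculated using $e^{c}_{ij}$ and $h^{c}_{i}$. To classify a new flow $k$, we compute the energies vector $\left(\mathcal{H}_c(k)\right)_{c \in \mathcal{L}}$, where $\mathcal{H}_c$ is the Hamiltonian for class $c$, and take the minimum value $\mathcal{H}_{c^*}(k)$ of that vector. If $\mathcal{H}_{c^*}(k) < t_{c^*}$, we label the flow with class $c^*$. Otherwise, we label it as suspicious.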
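The decision rule described above can be sketched as follows, assuming the per-class energies of a flow and the per-class thresholds are already available. The function name and the dictionary layout are hypothetical choices made for this illustration.

```python
def classify_multiclass(energies, thresholds, unknown_label="suspicious"):
    """Pick the class with the lowest energy, then check it against its threshold.

    energies: dict mapping class label to the flow's energy under that class model.
    thresholds: dict mapping class label to that class's energy threshold.
    """
    closest = min(energies, key=energies.get)
    if energies[closest] < thresholds[closest]:
        return closest
    return unknown_label


# Hypothetical energies of two flows under three class models.
thresholds = {"normal": 10.0, "DoS": 5.0, "portScan": 8.0}
known = classify_multiclass({"normal": 12.0, "DoS": 3.5, "portScan": 9.0}, thresholds)
unknown = classify_multiclass({"normal": 15.0, "DoS": 7.0, "portScan": 9.5}, thresholds)
```

The second flow is closest to the DoS model, but its energy is still above the DoS threshold, so it is reported as suspicious instead of being forced into a known class.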

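Input: flows (N × M), labels (N × 1), Q, α

1: import all model inference functions
2: for class c in labels do
3:      S_c ← flows labeled with c
4:      f_i ← single-feature frequencies of S_c (pseudocount weight α)
5:      f_ij ← pairwise frequencies of S_c (pseudocount weight α)
6:      C ← covariance matrix from f_i and f_ij
7:      e[c] ← coupling values from the inverse of C (Equation 6)
8:      h[c] ← local fields from f_i and e[c] (Equation 7)
9:      E_c ← energies of the flows in S_c under (e[c], h[c])
10:      t[c] ← 95th percentile of E_c
11:      store (e[c], h[c], t[c])
12: end for
13: while Scanning the Network do
14:      flow ← wait_for_incoming_flow()
15:      energies ← []
16:      for class c in labels do
17:          e ← 0
18:          for i ← 1 to M do
19:               a_i ← flow[i]
20:               for j ← i + 1 to M do
21:                   a_j ← flow[j]
22:                   if a_i ≠ Q and a_j ≠ Q then
23:                        e ← e − e[c]_ij(a_i, a_j)
24:                   end if
25:               end for
26:               if a_i ≠ Q then
27:                   e ← e − h[c]_i(a_i)
28:               end if
29:          end for
30:          energies.append(e)
31:      end for
32:      c* ← class with the lowest value in energies
33:      lowest ← min(energies)
34:      if lowest < t[c*] then
35:          label ← c*
36:      else
37:          label ← suspicious
38:      end if
39: end while
Algorithm 1 Multi-class Energy-based Flow Classifier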

Algorithm 1 shows the pseudo-code of the procedures described above. Lines 2 to 11 represent the EFC’s training phase, in which the sets $\mathcal{S}_c$ are separated, the statistical models $e^{c}_{ij}$ and $h^{c}_{i}$ are induced and the threshold $t_c$ is defined, for each class $c$. When a network flow is captured, lines 16 to 38 perform its classification. First, the energy vector is computed, using each model inferred in the training phase. Then, lines 32 to 38 select the lowest energy and check if it is below the threshold $t_{c^*}$ of the corresponding class $c^*$. If so, the flow is labeled as $c^*$. Otherwise, it is labeled as suspicious.

The classifier training complexity (lines 2-11) is $O\left((M^3 Q^3 + N M^2)\, L\right)$, where $N$ is the number of instances in the training set, $L$ is the number of classes, $M$ is the number of features and $Q$ is the size of the alphabet used for discretization, i.e., the maximum number of bins obtained in the discretization. Meanwhile, the complexity for the classification phase (lines 16-38) is $O(N L M^2)$, with $N$ here denoting the number of flows to be classified. Therefore, both training and testing complexities are linear in the number of samples and are more dependent on the number of features and the size of the discretization alphabet. This is a great advantage of the proposed classifier, since both quantities can be kept small. In the next section, we will discuss the datasets, metrics and the experimental setup used to evaluate the classifier presented in this section.

IV Methodology

This section describes in detail the methodology adopted in this work to evaluate the multi-class EFC. Subsection IV-A discusses the experiments carried out with the classifier and subsection IV-B presents the datasets in which the experiments were conducted.

IV-A Experiments

We assess our solution by dividing the evaluation into two parts. The first is a performance comparison with other traditional ML multi-class classifiers to assess EFC’s ability to distinguish between different flow classes. The second is an in-depth investigation of our classifier’s mechanism to identify unknown attacks, i.e., types of malicious flows that it was not trained with.

The first part of our assessment is a comparison of EFC and six other multi-class classifiers in 5-fold cross-validation. For this experiment, we used the following classifiers with their default scikit-learn implementations and hyperparameters: Naive Bayes, K-Nearest Neighbors, Decision Tree, Support Vector Classifier, Multi-Layer Perceptron and the ensemble Random Forest. We also used EFC with its default hyperparameters: 30 for the number of discretization bins and 0.5 for the pseudocount weight. The results of this assessment are shown in terms of F1 score, defined as the harmonic mean of the Precision and Recall measures. More details about data pre-processing for this experiment will be presented in the next subsection.
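For reference, the F1 score and its macro and weighted averages can be computed per class as sketched below. This is a plain-Python illustration with hypothetical function names and toy labels, not the evaluation code used in the experiments; a library routine would normally be used instead.

```python
def f1_per_class(y_true, y_pred):
    """F1 score (harmonic mean of Precision and Recall) for every label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, y_true):
    """Mean of the per-class scores weighted by class support in y_true."""
    return sum(scores[c] * y_true.count(c) for c in scores) / len(y_true)


# Toy example with hypothetical labels.
y_true = ["normal", "normal", "DoS", "DoS"]
y_pred = ["normal", "DoS", "DoS", "DoS"]
scores = f1_per_class(y_true, y_pred)
macro = macro_average(scores)
weighted = weighted_average(scores, y_true)
```

The macro average treats every class equally, so rare classes weigh as much as the majority class, whereas the weighted average is dominated by the most frequent classes.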

The second experiment further investigates the mechanism implemented by EFC for classification of unknown attacks. To do so, we wanted to compare EFC’s detection of unknown samples with similar techniques from the literature, such as those presented in [Yao2019] and [Al-Yaseen2017]. However, we faced difficulties in implementing both proposals. In the method in [Yao2019], the detection of unknown attacks is heavily dependent on the labeling of the Impurity Data (defined by the authors) by a security expert. Thus, the implementation of their method would be very dependent on our choice of specialist and could result in a biased comparison. The method of [Al-Yaseen2017] uses a new implementation of the K-means algorithm whose description in the article is not sufficient for reproduction and whose code was not made available. Thus, we decided to use classical algorithms from the literature for comparison.

For this experiment, we used the same datasets as for the previous experiment, but we systematically removed an attack class from the training sets, while keeping this class in the test sets. For example, for the DoS class, we performed the 5-fold cross-validation having removed the DoS samples from the training sets. In this way, the DoS samples present in the test set became unknown attacks, as the classifier did not train with this attack, and we were able to analyze the classification of these unknown samples. This procedure was executed for all classes of attacks present in the dataset, one at a time. For comparative purposes, we did the same experiment with Random Forest, chosen among other classifiers for having the best performance in the previous evaluation. Although Random Forest does not have a mechanism to identify unknown attacks, it serves as a baseline for the behavior of a common classifier, i.e., a classifier that does not implement a class for unknown attacks. In the next subsection, we present the datasets used in these experiments and more details about the data preparation.
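The hold-out procedure described above can be sketched as follows. The helper names are hypothetical, and the summary statistic shown (the fraction of held-out attack flows that end up labeled as suspicious) is only one possible way to inspect the outcome, assumed here for illustration.

```python
def hold_out_class(train_flows, train_labels, unknown_class):
    """Remove every training sample of one attack class, making it unknown."""
    kept = [(f, l) for f, l in zip(train_flows, train_labels) if l != unknown_class]
    return [f for f, _ in kept], [l for _, l in kept]


def suspicious_rate(y_true, y_pred, unknown_class, unknown_label="suspicious"):
    """Fraction of held-out class samples in the test set labeled as suspicious."""
    preds = [p for t, p in zip(y_true, y_pred) if t == unknown_class]
    return sum(p == unknown_label for p in preds) / len(preds) if preds else 0.0


# Toy training fold with hypothetical discretized flows.
train_flows = [[0, 1], [1, 1], [2, 0], [0, 0]]
train_labels = ["normal", "DoS", "portScan", "normal"]
flows_wo_dos, labels_wo_dos = hold_out_class(train_flows, train_labels, "DoS")

# Hypothetical predictions on a test fold that still contains DoS flows.
y_true = ["DoS", "DoS", "normal", "DoS", "portScan"]
y_pred = ["suspicious", "normal", "normal", "suspicious", "portScan"]
rate = suspicious_rate(y_true, y_pred, "DoS")
```

A classifier without an unknown class, such as Random Forest, can never return the suspicious label, so its rate under this statistic is zero by construction and the interesting question becomes which known classes absorb the held-out flows.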

IV-B Datasets

CIDDS-001 is a publicly available dataset created by Coburg University in 2017 [Ring2017]. It is a flow-based dataset containing real and simulated traffic collected over a four-week period. For each observation, there are 11 corresponding features extracted by NetFlow. These features consist of flow header information, such as Source IP, Destination IP, Source port, Destination port, and Protocol; and empirical attributes such as Duration, Number of transmitted packets, Number of transmitted bytes, Flags, and Date first seen. In this work, we only used the simulated environment, from weeks 1 to 4. From these weeks, we removed the features Source IP, Destination IP and Date first seen because they act as identifiers and do not characterize traffic content.

CICIDS2017 [Sharafaldin2018] is a dataset created by the University of New Brunswick in 2017. It contains simulated traffic in packet-based and bidirectional flow-based format. For each observation there are 88 features, with 80 of them extracted by CICFlowMeter. As in CIDDS-001, these features can be divided into flow header information and observed data. In our experiments, the features Flow ID, Source IP, Destination IP and Time stamp were removed, because they only make sense in the emulated environment and are not informative regarding the traffic nature. In this dataset, we combined the classes Web Attack-Brute Force, Web Attack-XSS, and Web Attack-Sql Injection into one class called Web Attack, as did [Toupas2019], because the behavior of flows of these classes is practically the same at the network level.

In both datasets, we applied the same pre-processing script. First, we encoded the labels and the symbolic features using ordinal encoding. Next, we normalized the continuous features by their maximum absolute value so that features fit in the range $[-1, 1]$ and discretized the continuous features using quantile discretization. It is important to note that the data used by the other ML algorithms was only encoded and normalized, as discretization could negatively affect their performance. Finally, due to the large number of instances in CIDDS-001 and CICIDS2017, we performed undersampling in the training sets, restricting the classes that had more instances than this to 5000 instances (in CICIDS2017) and 6000 instances (in CIDDS-001). The final distribution of CIDDS-001 and CICIDS2017 test sets can be seen in Table I. The scripts used to pre-process data and execute the experiments were made available in the project repository. In the following section, we will present and discuss our results.
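The pre-processing steps above can be illustrated with a small standard-library sketch. It is not the project’s script: the function names, the way quantile edges are chosen, and the fixed random seed are assumptions made for this example, and a library implementation would typically be preferred in practice.

```python
import bisect
import random


def max_abs_normalize(column):
    """Scale a continuous feature by its maximum absolute value."""
    scale = max(abs(v) for v in column) or 1.0
    return [v / scale for v in column]


def quantile_edges(column, n_bins):
    """Bin edges placed at (approximately) equally populated quantiles."""
    ordered = sorted(column)
    edges = {ordered[min(len(ordered) - 1, b * len(ordered) // n_bins)]
             for b in range(1, n_bins)}
    return sorted(edges)


def discretize(column, edges):
    """Map each value to the index of the quantile bin it falls into."""
    return [bisect.bisect_right(edges, v) for v in column]


def undersample(rows, labels, cap, seed=0):
    """Randomly keep at most `cap` samples of every class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > cap:
            members = rng.sample(members, cap)
        out_rows.extend(members)
        out_labels.extend([label] * len(members))
    return out_rows, out_labels


normalized = max_abs_normalize([0.0, 2.0, -4.0, 1.0])
edges = quantile_edges([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4)
bins = discretize([1, 2, 3, 4, 5, 6, 7, 8], edges)
rows, labels = undersample(list(range(7)), ["a"] * 5 + ["b"] * 2, cap=3)
```

Quantile discretization gives every bin roughly the same number of training samples, which keeps the empirical frequencies of all symbols well populated even for heavily skewed features such as byte counts.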

Label Number
normal 5610384
pingScan 1218
bruteForce 998
portScan 53182
DoS 591805
Total 6257587
(a) CIDDS-001
Label Number
BENIGN 454538
DDoS 25606
PortScan 31786
Bot 393
Infiltration 8
Web Attack 436
FTP-Patator 1587
SSH-Patator 1180
DoS Hulk 46025
DoS GoldenEye 2058
DoS slowloris 1159
DoS Slowhttptest 1100
Heartbleed 3
Total 565877
(b) CICIDS2017
Table I: Test sets composition

V Results

Class             EFC            NB             KNN            DT             SVC            MLP            RF
normal            0.971 ± 0.001  0.153 ± 0.047  0.973 ± 0.002  0.988 ± 0.001  0.398 ± 0.131  0.963 ± 0.008  0.992 ± 0.001
bruteForce        0.271 ± 0.005  0.001 ± 0.000  0.014 ± 0.001  0.033 ± 0.001  0.002 ± 0.000  0.006 ± 0.001  0.036 ± 0.001
dos               0.973 ± 0.000  0.345 ± 0.005  0.956 ± 0.001  0.993 ± 0.001  0.454 ± 0.021  0.950 ± 0.003  0.999 ± 0.000
pingScan          0.539 ± 0.004  0.369 ± 0.057  0.033 ± 0.007  0.057 ± 0.008  0.002 ± 0.000  0.294 ± 0.133  0.108 ± 0.007
portScan          0.880 ± 0.002  0.000 ± 0.000  0.760 ± 0.021  0.790 ± 0.029  0.003 ± 0.002  0.756 ± 0.009  0.855 ± 0.024
Macro average     0.605 ± 0.001  0.174 ± 0.017  0.547 ± 0.004  0.572 ± 0.007  0.172 ± 0.031  0.594 ± 0.029  0.598 ± 0.006
Weighted average  0.970 ± 0.001  0.170 ± 0.042  0.969 ± 0.002  0.987 ± 0.001  0.400 ± 0.119  0.959 ± 0.008  0.991 ± 0.001

Table II: CIDDS-001 - Average classification performance and standard error (95% CI)

Class             EFC            NB             KNN            DT             SVC            MLP            RF
BENIGN            0.949 ± 0.001  0.828 ± 0.002  0.966 ± 0.001  0.993 ± 0.000  0.902 ± 0.003  0.978 ± 0.002  0.998 ± 0.000
Bot               0.585 ± 0.019  0.007 ± 0.000  0.297 ± 0.034  0.525 ± 0.053  0.031 ± 0.003  0.486 ± 0.036  0.617 ± 0.041
DDoS              0.966 ± 0.002  0.968 ± 0.010  0.935 ± 0.003  0.993 ± 0.003  0.749 ± 0.002  0.961 ± 0.015  0.998 ± 0.000
DoS GoldenEye     0.967 ± 0.002  0.435 ± 0.011  0.644 ± 0.010  0.903 ± 0.045  0.437 ± 0.024  0.859 ± 0.027  0.959 ± 0.003
DoS Hulk          0.823 ± 0.005  0.790 ± 0.016  0.923 ± 0.002  0.990 ± 0.001  0.897 ± 0.009  0.958 ± 0.015  0.993 ± 0.001
DoS Slowhttptest  0.917 ± 0.008  0.341 ± 0.037  0.818 ± 0.018  0.847 ± 0.055  0.752 ± 0.012  0.817 ± 0.029  0.916 ± 0.022
DoS slowloris     0.963 ± 0.003  0.183 ± 0.007  0.743 ± 0.047  0.760 ± 0.070  0.505 ± 0.053  0.789 ± 0.044  0.980 ± 0.007
FTP-Patator       0.975 ± 0.004  0.983 ± 0.004  0.706 ± 0.035  0.944 ± 0.028  0.454 ± 0.041  0.855 ± 0.042  0.983 ± 0.022
Heartbleed        0.800 ± 0.160  0.626 ± 0.270  0.103 ± 0.049  0.263 ± 0.367  0.302 ± 0.345  0.429 ± 0.292  0.893 ± 0.135
Infiltration      0.347 ± 0.171  0.005 ± 0.001  0.052 ± 0.021  0.018 ± 0.004  0.069 ± 0.019  0.069 ± 0.020  0.723 ± 0.187
PortScan          0.969 ± 0.002  0.972 ± 0.032  0.888 ± 0.009  0.987 ± 0.002  0.863 ± 0.028  0.900 ± 0.000  0.996 ± 0.000
SSH-Patator       0.701 ± 0.041  0.653 ± 0.009  0.560 ± 0.018  0.962 ± 0.039  0.191 ± 0.004  0.657 ± 0.078  0.990 ± 0.006
Web Attack        0.574 ± 0.097  0.105 ± 0.032  0.197 ± 0.018  0.514 ± 0.086  0.216 ± 0.004  0.302 ± 0.022  0.821 ± 0.040
Macro average     0.752 ± 0.013  0.530 ± 0.017  0.602 ± 0.005  0.746 ± 0.027  0.490 ± 0.026  0.697 ± 0.021  0.913 ± 0.020
Weighted average  0.940 ± 0.001  0.835 ± 0.004  0.952 ± 0.001  0.991 ± 0.001  0.886 ± 0.003  0.968 ± 0.003  0.996 ± 0.000

Table III: CICIDS2017 - Average classification performance and standard error (95% CI)

Our assessments are two-fold: (i) a performance comparison with other ML multi-classifiers and (ii) an investigation of EFC's ability to identify unknown attacks. In the following subsections, we present and discuss the results of both assessments.

V-A Classification performance analysis

Tables II and III show the results of EFC and six other ML multi-classifiers on the CIDDS-001 and CICIDS2017 datasets, respectively. In these tables, we show the F1 scores obtained in each class and the macro and weighted averages of these scores.
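As a minimal sketch of how the per-class F1 scores and their macro and weighted averages reported in Tables II and III can be computed, the following example uses only the Python standard library. The function name `f1_report` and the toy labels are illustrative assumptions and do not come from the evaluated datasets or from the authors' code.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return per-class F1, macro-average F1 and support-weighted F1.

    Only labels that occur in y_true are averaged; predicted labels that
    never occur in y_true still count as false positives and false negatives.
    """
    labels = sorted(set(y_true))
    support = Counter(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was assigned to a sample of another class
            fn[truth] += 1  # truth was missed
    f1 = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1[c] = 2 * tp[c] / denom if denom else 0.0
    macro = sum(f1.values()) / len(labels)
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)
    return f1, macro, weighted


# Toy example (hypothetical labels): 8 normal flows and 2 dos flows,
# with one normal flow mislabeled as dos.
y_true = ["normal"] * 8 + ["dos"] * 2
y_pred = ["normal"] * 7 + ["dos"] + ["dos"] * 2
f1, macro, weighted = f1_report(y_true, y_pred)
print(f1, macro, weighted)
```

The macro average gives every class the same weight, so a poorly classified minority class lowers it noticeably, whereas the weighted average is dominated by the majority class.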

Table II shows that EFC achieved good relative scores in the CIDDS-001 classes. In fact, in the bruteForce, pingScan and portScan classes, EFC had the best F1 score among all classifiers. In the normal and dos classes, although it was not the best classifier, EFC achieved F1 values of 0.971 and 0.973, which are quite satisfactory for a realistic IDS. Overall, EFC achieved the best F1 macro average, 0.605, and the third best F1 weighted average, 0.970.

It is important to note that, despite being significantly better than all other classifiers, EFC's performance in bruteForce and pingScan was far from good: 0.271 and 0.539 on average, respectively. Nevertheless, these results can be explained by noting that it is the Precision that pulls down the F1 scores and not the Recall (which is actually around for both classes). Since these classes correspond to less than of the test set, their Precision is expected to be heavily impacted by small portions of misclassifications of majority classes. In other words, we interpret these results as a consequence of class imbalance in the test set rather than as a clear indication of a limitation in EFC's ability to characterize these attacks.
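The effect of class imbalance on Precision can be illustrated with a small worked example. All counts below are hypothetical and chosen only to show the mechanism; they are not the CIDDS-001 figures.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test set: 1,000,000 majority-class flows and 1,000 flows
# of a rare attack class (about 0.1% of the test set).
tp = 950    # rare-class flows correctly labeled (Recall = 0.95)
fn = 50     # rare-class flows that were missed
fp = 2000   # only 0.2% of the majority flows mislabeled as the rare class

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this sketch, a misclassification rate of only 0.2% in the majority class produces more false positives than there are true positives, so Precision drops to roughly a third and F1 falls below 0.5 even though Recall stays at 0.95.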

Table III shows the results obtained for the CICIDS2017 dataset. Although EFC had a good performance in most classes, it did not stand out in relation to the other classifiers. With the exception of the DoS GoldenEye and DoS Slowhttptest classes, where EFC achieved the best F1 score, it usually ranked second or third. In the classes Bot, DoS slowloris, FTP-Patator, Heartbleed, Infiltration and Web Attack, EFC obtained the second best F1 score, always behind the Random Forest classifier. In the classes DDoS and SSH-Patator, EFC achieved the third best F1 score, staying behind Random Forest and Decision Tree. In overall metrics, EFC achieved an F1 macro average of 0.752 and an F1 weighted average of 0.940, which represent the second and fifth best results, respectively.

As in CIDDS-001, EFC obtained low F1 scores in some classes, namely Bot, Infiltration and Web Attack. We attribute this result to the same phenomenon observed in the previous experiment, since these classes correspond to less than 0.002%, 0.07% and 0.08%, respectively, of the test set.

Although EFC’s overall performance was not the best among the classifiers, it is important to note that its temporal complexity is significantly smaller than that of its competitors. While Decision Tree and Random Forest training complexities are and [Buczak2016], the multi-class EFC is linear on , where is the number of instances, is the number of attributes, is the number of trees and is the number of classes. Therefore, the training complexity of EFC is more dependent on the number of attributes and the number of bins used for discretization, than on the number of samples. Buczak et al. [Buczak2016] considered that for a classifier to have high streaming capacity it needs to have at most linear complexity on . They also considered that Random Forest and Decision Tree have only a medium capacity of streaming. Therefore, EFC is a competitive classifier if we consider not only the performance, but also the applicability of the method.

Figure 4: CIDDS-001 - Classification of unknown attacks by EFC and Random Forest - Misclassification represented in blue, desirable results are bars either gray for RF or orange for EFC.
Figure 5: CICIDS2017 - Classification of unknown attacks by EFC and Random Forest - Misclassification represented in blue, desirable results are bars either gray for RF or orange for EFC.

V-B Unknown attack detection analysis

The second evaluation carried out in our work investigates the ability of classifiers to detect unknown attacks. To perform such an evaluation, we turn a known attack into an unknown one by removing the data samples of this attack from the training set. Afterward, we compare our solution against RF on the unchanged test set, which still contains samples of the removed attack, and evaluate the capability of both techniques to identify these samples as a threat to the network. Further details of this experiment can be found in Subsection IV-A, third paragraph. Next, we present its results.
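The protocol described above can be sketched as follows. This is a minimal illustration assuming each sample is a `(features, label)` pair; the helper names and the toy rows are hypothetical and are not taken from the authors' implementation.

```python
def hold_out_class(train_rows, held_out_label):
    """Remove every training sample of the attack that should become unknown."""
    return [(x, y) for x, y in train_rows if y != held_out_label]


def unknown_attack_samples(test_rows, held_out_label):
    """Select the test samples of the held-out attack (the test set is not modified)."""
    return [(x, y) for x, y in test_rows if y == held_out_label]


# Toy data: (features, label) pairs with made-up feature values.
train = [
    ([0.1, 3], "normal"),
    ([0.9, 80], "dos"),
    ([0.4, 22], "bruteForce"),
    ([0.2, 5], "normal"),
]
test = [
    ([0.5, 22], "bruteForce"),
    ([0.1, 4], "normal"),
    ([0.8, 80], "dos"),
]

# bruteForce becomes an unknown attack: absent from training, present in test.
train_without = hold_out_class(train, "bruteForce")
probe = unknown_attack_samples(test, "bruteForce")
print(len(train_without), len(probe))
```

A classifier would then be trained on `train_without` and its predictions for the samples in `probe` inspected, repeating the procedure once per attack class.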

Figures 4 and 5 show the results for RF and EFC on the CIDDS-001 and CICIDS2017 datasets, respectively. Each bar in these figures shows the classification of the samples from the class that had been omitted from training. For example, the bar labeled bruteForce in Figure 4 shows the classification of bruteForce samples in the experiment where bruteForce samples were omitted from training (making it an unknown attack). The colors in the bars represent the predicted classes of these samples, which can be Benign, if a sample was labeled as Benign, or Other classes, if it was labeled as any other attack class. In addition, for the multi-class EFC, we also have the Suspicious class, which is the label provided by the classifier when a sample does not fit into any known class and is therefore treated as a possible threat to the network. The ideal result for RF would be full gray bars, which would mean that no unknown attack samples were classified as Benign. Meanwhile, for EFC, the ideal result would be full orange bars, which would mean that every unknown attack was correctly recognized as an unknown threat.
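The composition of each bar can be reproduced with a simple tally of the predicted labels. The sketch below is an assumption about how such proportions could be computed; the function name `bar_proportions` and the example predictions are hypothetical.

```python
from collections import Counter


def bar_proportions(predictions, benign_label="Benign", suspicious_label="Suspicious"):
    """Group predicted labels into Benign, Other classes and Suspicious fractions."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter()
    for pred in predictions:
        if pred == benign_label:
            counts["Benign"] += 1
        elif pred == suspicious_label:
            counts["Suspicious"] += 1
        else:
            counts["Other classes"] += 1  # any known attack class
    total = len(predictions)
    return {k: counts[k] / total for k in ("Benign", "Other classes", "Suspicious")}


# Hypothetical predictions for ten samples of a held-out attack class.
preds = ["Suspicious"] * 7 + ["Benign"] * 2 + ["dos"]
props = bar_proportions(preds)
print(props)
```

For a classifier without a Suspicious label, such as RF, the third fraction is always zero and the bar is split between Benign and Other classes only.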

From Figure 4, we can see that RF classified more than 60% of bruteForce, portScan and DoS attacks as Benign when it was not trained with samples of these types. These results illustrate well the problem of unknown attack detection: although RF has excellent classification metrics, far superior to those of most other classifiers, it cannot identify new classes of attacks without training with their samples. In contrast, for the same classes, EFC classified the vast majority of samples as Suspicious and identified almost all DoS samples correctly. So, EFC is more robust for use in real NIDS, where correct labelling of unknown attacks is fundamentally difficult and almost impossible for a network administrator.

For CICIDS2017, Figure 5 shows the same tendency observed in CIDDS-001. When omitted from training, the attacks from classes DoS Hulk, Heartbleed and Infiltration were completely ignored by RF, which classified their samples as Benign. On the other hand, in the same context, these samples were almost entirely identified by EFC as Suspicious. Other classes like Bot, DDoS, FTP-Patator and PortScan also showed significant detection improvements with EFC: the proportion of samples from these classes predicted as Benign decreases in EFC's classification. Finally, Web Attack and SSH-Patator produced similar results with both classifiers, which can perhaps be explained by the nature of these attacks, as they belong to the application layer and require deep packet inspection to be properly identified.

The results obtained in the second experiment show that the mechanism implemented in EFC for recognition of unknown attacks is effective. Furthermore, they highlight the importance of having such a mechanism, since, in its absence, most unknown samples are classified as benign traffic, even by the best classifier. In the context of intrusion detection, this behavior represents a huge security breach and is not acceptable for real systems. For this reason, we believe that EFC is a promising classifier to be used in NIDS, as it is capable of detecting known intrusions and unknown attacks with low complexity compared to other classifiers.

VI Conclusion

In this work, we proposed a new multi-class classifier to be used in NIDS: the multi-class EFC. Our method is an adaptation of the single-class EFC, first introduced in [pontes2019new], that performs classification with respect to a benign class, several attack classes and a suspicious class (intended for unknown attacks). We evaluated the classifier using two up-to-date datasets: CIDDS-001 and CICIDS2017. In both datasets, the average results were good compared to other classic ML classifiers, although it was not always the best classifier. Nevertheless, our proposal proved to be very efficient in recognizing unknown attacks, while the best among the other classifiers showed serious vulnerabilities in this regard. We also highlight that the proposed method has a single layer with low complexity, which makes it more suitable for streaming than other classifiers or similar proposals from the literature.

In the future, we intend to investigate a dynamic threshold to replace the static 95th percentile used in the current versions of EFC. Also, we are already working on a real-time EFC integrated with Software Defined Networks.
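To make the role of the static threshold concrete, the sketch below shows one way a fixed 95th-percentile cutoff could separate known classes from the Suspicious label. It assumes that each class produces a score for a sample where lower means a better fit, and that the cutoff is taken over the scores of that class's training samples. The scoring function itself is not reproduced here, and the function names and numbers are hypothetical.

```python
from statistics import quantiles


def percentile_cutoff(train_scores, percentile=95):
    """Return the given percentile of the training scores (linear interpolation)."""
    return quantiles(train_scores, n=100, method="inclusive")[percentile - 1]


def label_sample(scores_by_class, cutoffs, suspicious_label="Suspicious"):
    """Pick the best-fitting class, or Suspicious if even that fit exceeds its cutoff."""
    best = min(scores_by_class, key=scores_by_class.get)
    if scores_by_class[best] > cutoffs[best]:
        return suspicious_label
    return best


# Hypothetical training scores per class (lower score = better fit, by assumption).
train_scores = {
    "Benign": [float(v) for v in range(1, 101)],
    "dos": [float(v) for v in range(51, 151)],
}
cutoffs = {c: percentile_cutoff(s) for c, s in train_scores.items()}

known = label_sample({"Benign": 40.0, "dos": 120.0}, cutoffs)
unknown = label_sample({"Benign": 99.0, "dos": 160.0}, cutoffs)
print(cutoffs, known, unknown)
```

A dynamic threshold, as proposed for future work, would replace the fixed `percentile=95` argument with a value that adapts to the observed traffic.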



Manuela M. C. de Souza is an undergrad Computer Science student at University of Brasilia (UnB), Brasilia, DF, Brazil. Her research interests are Network Security and Machine Learning.

Camila F. T. Pontes is a researcher at the Barcelona Supercomputing Center (BSC) in Spain. She has two undergrad degrees in Biology and Computer Science from the University of Brasilia (2014, 2020), and also a M.Sc. and a Ph.D. in Computational Biology from the same university (2016, 2021). Her research interests are Computational and Theoretical Biology and Network Security.

João J. C. Gondim was awarded an M.Sc. in Computing Science at Imperial College, University of London, in 1987 and a Ph.D. in Electrical Engineering at UnB (University of Brasilia, 2017). He is an adjunct professor at Department of Computing Science (CIC) at UnB where he is a tenured member of faculty. His research interests are network, information and cyber security.

Luís Paulo Faina Garcia graduated in Computer Engineering (2010) and received a PhD in Computer Science (2016) from the University of São Paulo. In 2017, his thesis was ranked among the best by the Brazilian Computer Society (SBC) and received the CAPES award for the best thesis in Computer Science in the country. He is currently Adjunct Professor A in the Department of Computer Science (CIC) at the University of Brasília (UnB). He has experience in subjects related to noise detection, meta-learning and data streams.

Luiz A. DaSilva [F] is the Executive Director of the Commonwealth Cyber Initiative and the Bradley Professor of Cybersecurity at Virginia Tech. He was previously at Trinity College Dublin, where he was the director of CONNECT, the Science Foundation Ireland Research Centre for Future Networks. He is an IEEE Fellow, and an IEEE Communications Society Distinguished Lecturer.

Marcelo Antonio Marotta is an adjunct professor at the University of Brasilia, Brasilia, DF, Brazil. He received his Ph.D. degree in Computer Science in 2019 from the Institute of Informatics (INF) of the Federal University of Rio Grande do Sul (UFRGS), Brazil. His research involves Heterogeneous Cloud Radio Access Networks, Internet of Things, Software Defined Radio, Cognitive Radio Networks, and Network Security.