End-to-End Adversarial Learning for Intrusion Detection in Computer Networks

04/25/2019, by Bahram Mohammadi, et al.

This paper presents a simple yet efficient method for an anomaly-based Intrusion Detection System (IDS). In practice, an IDS can be framed as a one-class classification system in which normal traffic is the target class. The high diversity of network attacks, together with the need for generalization, motivates us to propose a semi-supervised method. Inspired by the success of Generative Adversarial Networks (GANs) for training deep models in a semi-supervised setting, we propose an end-to-end deep architecture for IDS. The proposed architecture is composed of two deep networks, each trained by competing with the other to learn the underlying concept of the normal traffic class. The key idea of this paper is to compensate for the lack of anomalous traffic by approximately generating it from normal flows. As a result, our method is not biased towards the intrusions available in the training set, leading to more accurate detection. The proposed method has been evaluated on the NSL-KDD dataset, and the results confirm that it outperforms other state-of-the-art approaches.




I Introduction

The significant expansion of the Internet and its rising popularity have caused a massive increase in data exchange between different parties. These data often include valuable information belonging to people, governments, and organizations. Therefore, a reliable defense system is required to prevent misuse of sensitive information and to detect network vulnerabilities and potential threats. An Intrusion Detection System (IDS) is a promising solution for providing network security services, and it is considered an effective alternative to conventional defense systems such as firewalls. Previous research on this topic can be divided into two major categories: (1) signature-based and (2) anomaly-based [1]. Signature-based approaches are very effective at detecting already known attacks whose patterns are available; however, they are inefficient against unfamiliar and new intrusions. Conversely, the latter approach is more effective owing to its superior performance against unknown and zero-day network attacks [2].

Generally, supervised methods need to know the specific characteristics of attacks, while the high diversity of attacks makes this process expensive and often impossible [3]. This difficulty deprives supervised methods of generalization. To overcome this problem, we propose a semi-supervised method. In contrast to supervised methods, where detection is performed based on the available labeled training data, our method is able to decide about unseen incoming network Packet Flows (PFs).

In recent years, Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various research fields, especially computer vision and natural language processing. These outstanding successes motivate us to take advantage of deep learning for IDS. Note that DNNs such as Convolutional Neural Networks (CNNs) achieve remarkable results only if they have access to many annotated training samples from all classes. Under realistic conditions, there are numerous samples from the normal class, but the abnormal class is often absent during training, poorly sampled, or not well defined. In summary, the anomaly detection task is a binary classification task in which there are no samples from the abnormal/outlier class. Training an end-to-end DNN in the absence of one class of data is not straightforward.


To address the above-mentioned challenge, we aim to generate simulated anomalous flows, inspired by Generative Adversarial Networks (GANs) [5]. Using a GAN-style setting, we propose an end-to-end deep network for IDS that can effectively detect network intrusions, even unforeseen and new ones. The proposed method is composed of two main modules, the Reconstructor network (R) and the Anomaly detector network (A). Since R and A are trained adversarially and competitively, they properly learn the distribution of the feature space of normal PFs. R strives to fool A into detecting its output as a normal PF rather than a reconstructed version; however, the training duration of the R network is limited in order to obtain more realistic anomalies. On the other hand, the duty of A is to distinguish between original normal PFs and reconstructed ones (abnormal traffic). In other words, A determines whether the incoming PF follows the distribution of the feature space of normal traffic.

The main contributions of the proposed method are: (1) proposing an end-to-end DNN for IDS that is trained adversarially in a GAN-style setting; to the best of our knowledge, our method is the first end-to-end DNN for semi-supervised intrusion detection; (2) training is done merely using the feature space of normal PFs, yet the experimental results show the superiority of the proposed method, even compared to supervised approaches, owing to its generality; (3) the performance of the proposed method is better than the other state-of-the-art methods in terms of accuracy, while the time complexity is also reduced noticeably; (4) the unseen traffic, i.e., anomalies, is generated in simulation using adversarial training; hence, the need for the availability of data from all classes is satisfied.

(a) Training Phase
(b) Test Phase
Fig. 1: Overall scheme of the proposed method for anomaly detection in computer networks in the training and test stages. R and A are the two modules of the model, trained adversarially and competitively. R includes an encoder-decoder network and A consists of a Fully-Connected Neural Network (FCNN) ending with a softmax classifier. In the training phase, the parameters of R are optimized to reconstruct the incoming normal PFs for generating simulated abnormal traffic while attempting to achieve an optimum value for the reconstruction error. Additionally, A learns to detect whether an incoming network PF belongs to the target class (normal) or is an outlier (anomaly). In the test phase, classification is performed based on a predefined threshold. The simulated abnormal PFs deviate somewhat from real anomalies; this fact is indicated by using different colors for simulated abnormal and real abnormal packets.

II End-to-End IDS

In this paper, we propose a semi-supervised, end-to-end deep model for IDS. Accordingly, abnormal class data is not available. The key idea of the proposed method is to solve this problem by simulating anomalous PFs using the original normal traffic. The method comprises two modules (i.e., networks): (1) R and (2) A. The former generates simulated anomalous PFs by reconstructing normal traffic, obviating the need for the presence of the anomaly class in the training phase. The latter detects whether the incoming traffic is normal or not. R and A are trained adversarially and without supervision in an end-to-end setting. R reconstructs the incoming PFs to mislead A about the input type, i.e., a normal flow or a simulated abnormal one. Since the original data is available to A, it is familiar with its concept; therefore, A does not act blindly and attempts to reject the simulated anomalous traffic. These two modules are trained in a GAN-style setting, forming an end-to-end framework for anomaly detection in IDS. The difference between our work and the original GAN is the time at which the generator's training process stops: R should not perfectly reconstruct the normal traffic. In our case, simulated anomalies should deviate somewhat from the normal traffic; otherwise, anomalous PFs are not properly generated. A can also effectively distinguish between real and fake inputs. After the training process, A knows the distribution of the target class, i.e., normal traffic, and can simply check whether each new incoming flow follows that distribution. Fig. 1 shows an overview of our proposed method. A detailed explanation of the R network, the A network, the joint training of R and A, and the process of anomaly detection is provided in this section.

Reconstructor Network: The R network generates simulated anomalous traffic adversarially during the training stage by reconstructing normal incoming PFs. Gradually, and in competition with A, this network learns to generate flows similar to normal ones. To this end, R includes an encoder-decoder network. Equation 1 shows the function of these components:

Z = σ_e(W_e X),    X̂ = σ_d(W_d Z)    (1)

Here, X is the feature space of the incoming PF, X̂ is the feature space of the simulated abnormal PF, Z is the latent representation, W_e and W_d are weight matrices, and σ_e and σ_d are element-wise activation functions. The encoder maps an incoming PF to the latent space and the decoder attempts to retrieve the simulated anomaly from it. Note that if R reconstructs the normal flows with high precision, they cannot play the role of anomalies. Consequently, the training of R should be stopped before it is able to reconstruct the normal flows perfectly. In [6, 4], it has been shown that by over-training an encoder-decoder network, one can inpaint irregular samples and convert them into the normal concept. In contrast to these works, in our case it is not desirable for the encoder-decoder to map flows containing irregularities to equivalent normal flows through reconstruction. Fig. 2 shows the architecture of the R network.

Fig. 2: The architecture of the encoder-decoder network.
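The encoder-decoder forward pass of Equation 1 can be sketched in plain NumPy. The layer sizes, random weights, and sigmoid activations below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_features, n_latent = 122, 32   # 122 assumes one-hot-encoded NSL-KDD features
W_e = rng.normal(0.0, 0.1, (n_latent, n_features))   # encoder weight matrix
W_d = rng.normal(0.0, 0.1, (n_features, n_latent))   # decoder weight matrix

def reconstruct(x):
    z = sigmoid(W_e @ x)      # encode the PF features into the latent space Z
    x_hat = sigmoid(W_d @ z)  # decode Z into the simulated abnormal PF
    return x_hat

x = rng.random(n_features)                  # a dummy normal-PF feature vector
x_hat = reconstruct(x)
error = float(np.mean((x - x_hat) ** 2))    # reconstruction error
```

In line with the paper, training of these weights would be stopped before `error` becomes negligibly small, so that `x_hat` keeps some deviation from `x`.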

Anomaly Detector Network: The A network acts as a classifier to distinguish between the representation of simulated traffic, i.e., the abnormal/fake flows generated by R, and normal traffic, i.e., real flows. Previously, the computer vision community [4, 7] has shown that a network like A, which differs somewhat from ours in terms of the learning process, is capable of efficiently detecting irregular/outlier images. A includes a sequence of fully-connected layers ending with a softmax layer (classifier). As mentioned previously, the main purpose of A is to detect abnormal incoming PFs. It is worth mentioning that the output indicates the likelihood of the input following the distribution spanned by the target class. Hence, the output of A can be considered a target-likelihood score for any given input. The detailed architecture of the A network is shown in Fig. 3.

Fig. 3: The A network architecture, specifying whether an incoming PF follows the target class distribution or not.
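A minimal sketch of the A network's forward pass: a stack of fully-connected layers ending in a softmax whose "normal" output is read as a target-likelihood score. The layer widths, ReLU activation, and random weights are assumptions for illustration, not the paper's exact FCNN:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
W1 = rng.normal(0.0, 0.1, (64, 122))   # hidden fully-connected layer
W2 = rng.normal(0.0, 0.1, (2, 64))     # output layer: [normal, anomaly]

def target_likelihood(x):
    h = relu(W1 @ x)
    p = softmax(W2 @ h)
    return float(p[0])   # probability the PF follows the target (normal) class

score = target_likelihood(rng.random(122))
```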

Adversarial Training: Goodfellow et al. [5] introduced an efficient way of adversarially training two networks, a Generator (G) and a Discriminator (D), called a GAN. GANs aim to generate samples that follow the real data distribution through the adversarial training of these two networks. G learns to map a latent space Z, sampled from a specific distribution p_z, to the real data distribution (referred to as p_data). D is trained by maximizing the probability of assigning the correct label to both the actual data and the fake data from G, while G is simultaneously trained to minimize log(1 - D(G(z))). In other words, G and D play the following two-player mini-max game:

min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]    (2)
Similarly, we jointly and adversarially train the R and A networks. Contrary to the purpose of conventional GANs, which learn to generate samples from p_data, R and A are trained, respectively, to generate samples for the abnormal class and to distinguish abnormal flows from normal ones. Consequently, the optimum point for stopping the joint training of R and A differs from that of conventional GANs. Besides this, in our method, instead of mapping the latent representation Z to a data sample with distribution p_data, R maps:

R : X ~ p_t → X̂    (3)
As stated earlier, p_t is the distribution of the target class, i.e., normal traffic. Since the PFs from the target class are available to A, it knows p_t. In this case, A can explicitly decide whether X̂ follows p_t or not. Accordingly, R and A can be trained jointly by optimizing the following objective:

min_R max_A E_{X~p_t}[log A(X)] + E_{X~p_t}[log(1 - A(R(X)))]    (4)
Anomaly Detection: In this part, we explain the classification procedure of the proposed method. As previously discussed, A acts as an anomaly detector and also derives a benefit from R. Eventually, the proposed Anomaly Detector (AD) is formulated merely using the A network as follows:

AD(X) = normal if A(X) > τ, anomaly otherwise    (5)

where τ is a threshold value.
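The decision rule above is a direct comparison of A's target-likelihood score against the threshold τ (set to 0.5 in the paper's experiments):

```python
def detect(score: float, tau: float = 0.5) -> str:
    """Label an incoming PF from its A-network target-likelihood score."""
    return "normal" if score > tau else "anomaly"

print(detect(0.91))   # -> normal
print(detect(0.12))   # -> anomaly
```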

III Experimental Results

In this section, we evaluate our proposed method on the widely used NSL-KDD dataset [8] (available at https://github.com/defcom17/NSLKDD). The experimental results, along with a thorough comparison, are provided in the following subsections.

III-A Implementation Details

The proposed method is implemented in Python using the Keras framework and run on Google Colab (https://colab.research.google.com). The learning rate is set to 0.001 for both networks, and the threshold τ (Equation 5) is set to 0.5. The detailed structure of R and A is provided in Section II; the trained models of both networks are available at https://github.com/Bahram-Mohammadi/End-to-End-Adversarial-Learning-for-Intrusion-Detection-in-Computer-Networks.

III-B Evaluation

As mentioned previously, the assessment is carried out on the NSL-KDD dataset, which includes two subsets, KDDTrain and KDDTest. Since we propose a semi-supervised method, only the normal records of KDDTrain are involved in the training phase, and its anomalous records are unused. Furthermore, KDDTest is used in its entirety for the test stage. Table I shows the data distribution of the training and test sets used in the evaluation process.

Class   | KDDTrain | KDDTest
Normal  | 67343    | 9711
Anomaly | —        | 12833
TABLE I: Data distribution (number of PFs) of the training and test sets. The anomalous records of KDDTrain are not used.
Method            | ACC (%) | PR (%) | RE (%) | FS (%)
AE [9]            | 88.28   | 91.23  | 87.86  | 89.51
De-noising AE [9] | 88.65   | 96.48  | 83.08  | 89.28
Ours              | 91.39   | 89.94  | 95.56  | 92.67
TABLE II: Binary classification performance comparison.

Our method is evaluated using four measures: accuracy (ACC), precision (PR), recall (RE), and f-score (FS). We compare it with methods that use the same test set, i.e., KDDTest, and report the performance results of their work with the same metrics. Table II confirms that our method achieves better results compared to another work that includes two different methods [9]. Although those methods are semi-supervised, they use the anomalous records of KDDTrain in the validation set; in fact, in that work, the threshold for distinguishing normal traffic from anomalies is determined based on the validation set. In contrast, abnormal samples are not involved in the training phase of our method at all. Nonetheless, we obtain better results in terms of ACC, RE, and FS.
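The four measures follow directly from confusion-matrix counts, here treating anomaly as the positive class (an assumption; the counts below are illustrative, not the paper's actual confusion matrix):

```python
def metrics(tp, fp, fn, tn):
    acc = (tp + tn) / (tp + fp + fn + tn)   # accuracy: all correct decisions
    pr = tp / (tp + fp)                     # precision
    re = tp / (tp + fn)                     # recall
    fs = 2 * pr * re / (pr + re)            # f-score: harmonic mean of PR and RE
    return acc, pr, re, fs

acc, pr, re, fs = metrics(tp=90, fp=10, fn=5, tn=95)
# acc = 0.925, pr = 0.9, re ≈ 0.947
```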

Method Supervision ACC (%)
RNN-IDS [10] 83.28
DCNN [11] 85.00
AE [9] 88.28
Sparse AE and MLP [12] 88.39
Random Tree [13] 88.46
De-noising AE [9] 88.65
LSTM [11] 89.00
Random Tree and NBTree [13] 89.24
Ours 91.39
TABLE III: Binary classification accuracy comparison. There are two kinds of supervision for the methods in this table: Supervised (S) and Semi-Supervised (SS).

The performance indicator selected for a thorough comparison with the other state-of-the-art methods is the correct classification rate over all classes. As Table III shows, our method represents a significant improvement in terms of accuracy. In our work, simulated abnormal flows are generated with some deviation from the target class, irrespective of real anomalies; in fact, our decision-making process is not biased towards the intrusions available in the training set. Hence, our method has a generalization property that helps us obtain better results even compared to supervised methods. It is worth mentioning that detecting each incoming network PF takes just 45 on average; therefore, the method is capable of working properly in real time.

IV Discussion

Mode collapse: GANs face an issue that arises when the generator learns only a portion of the real-data distribution and then outputs samples from a single mode (i.e., the other modes are ignored). This problem is known as mode collapse [14]. Mode collapse does not exist in our case, as R directly sees all possible flows of the target class and implicitly learns the manifold spanned by the target data distribution.

Unseen class generation: Our proposed method is semi-supervised; thus, we need to somehow obtain anomalous PFs from normal traffic. The simulated anomalous traffic is generated by the R network to play the role of real anomalies. In fact, we approximately generate the unseen class data using normal traffic, solving the problem DNNs face in the absence of one class of data.

Generalization: Since the training process is carried out irrespective of the anomalies in the training set, the proposed method can decide about the type of incoming PFs without being biased towards already known intrusions. In this way, the proposed method provides generalization, leading to a better detection rate.

V Conclusion

In this paper, we have proposed a novel semi-supervised, end-to-end deep learning method for anomaly detection in IDS. Specifically, our model is composed of two modules, R and A. These networks are trained competitively in an adversarial manner. After the training phase, R can simulate anomalies that do not perfectly match the normal traffic, while A can distinguish normal PFs from abnormal traffic. The proposed method has been evaluated on the NSL-KDD dataset, and the results demonstrate that it performs better than the other state-of-the-art methods.


  • [1] H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: A comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16 – 24, 2013.
  • [2] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez, and A. E. Vazquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Computers & Security, vol. 28, no. 1, pp. 18–28, 2009.
  • [3] L. Fernandez Maimo, . L. Perales Gomez, F. J. Garcia Clemente, M. Gil Perez, and G. Martinez Perez, “A self-adaptive deep learning-based system for anomaly detection in 5g networks,” IEEE Access, vol. 6, pp. 7700–7712, 2018.
  • [4] M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli, “Adversarially learned one-class classifier for novelty detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3379–3388.
  • [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds.    Curran Associates, Inc., 2014, pp. 2672–2680.
  • [6] M. Sabokrou, M. Fathy, and M. Hoseini, “Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder,” Electronics Letters, vol. 52, no. 13, pp. 1122–1124, 2016.
  • [7] M. Sabokrou, M. Pourreza, M. Fayyaz, R. Entezari, M. Fathy, J. Gall, and E. Adeli, “Avid: Adversarial visual irregularity detection,” in Asian Conference on Computer Vision, 2018.
  • [8] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the kdd cup 99 data set,” in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, July 2009, pp. 1–6.
  • [9] R. C. Aygun and A. G. Yavuz, “Network anomaly detection with stochastically improved autoencoder based models,” in 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), June 2017, pp. 193–198.
  • [10] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
  • [11] S. Naseer, Y. Saleem, S. Khalid, M. K. Bashir, J. Han, M. M. Iqbal, and K. Han, “Enhanced network anomaly detection based on deep neural networks,” IEEE Access, vol. 6, pp. 48 231–48 246, 2018.
  • [12] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (Formerly BIONETICS), ser. BICT’15.    ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2016, pp. 21–26.
  • [13] J. Kevric, S. Jukic, and A. Subasi, “An effective combining classifier approach using tree algorithms for network intrusion detection,” Neural Computing and Applications, vol. 28, no. 1, pp. 1051–1058, Dec 2017.
  • [14] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 06–11 Aug 2017, pp. 214–223.