Rallying Adversarial Techniques against Deep Learning for Network Security

03/27/2019 ∙ by Joseph Clements, et al. ∙ Clemson University

Recent advances in artificial intelligence and the increasing need for powerful defensive measures in the domain of network security have led to the adoption of deep learning approaches for use in network intrusion detection systems (NIDS). These methods have achieved superior performance against conventional network attacks, which enables the deployment of practical security systems in unique and dynamic sectors. Adversarial machine learning, unfortunately, has recently shown that deep learning models are inherently vulnerable to adversarial modifications of their input data. Because of this susceptibility, the deep learning models deployed to power a network defense could in fact be the weakest entry point for compromising a network system. In this paper, we show that by modifying on average as little as 1.38 of the input features, an adversary can generate malicious inputs which effectively fool a deep learning based NIDS. Therefore, when designing such systems, it is crucial to consider the performance from not only the conventional network security perspective but also the adversarial machine learning domain.




I Introduction

Largely attributable to advances in deep learning, the field of artificial intelligence has grown swiftly in the recent past. Many examples have shown that deep learning systems have the potential to achieve or even surpass human-level performance on certain tasks. Furthermore, these systems are not explicitly given a function to implement, but instead can discover hidden rules or patterns that developers may not be able to comprehend. This ability to learn has made deep learning an indispensable tool for advancing the state of the art in multiple fields.

With these remarkable successes, it is unsurprising that deep learning techniques are quickly being adopted in network security for use in intrusion detection [1], malware analysis [2], spam filtering [3], and phishing detection [4]. However, the growing popularity of novel network paradigms (e.g., Internet of Things (IoT) and mobile networks) also brings more unique and higher security requirements. To this end, modern deep learning algorithms have the potential to rival traditional approaches, especially in these emerging fields. Recently, the field of deep learning based network intrusion detection systems (DL-NIDS) has been growing due to the variability and efficiency of deep learning models. The availability of novel techniques such as recurrent neural networks, semi-supervised learning, and reinforcement learning is allowing DL-NIDS to achieve success in applications that have traditionally been out of the reach of intrusion detection systems [5, 6, 7].

However, the downside of deep learning is that the high non-linearity seen in these systems limits the ability of developers to guarantee or explain their functionality. This allows for the possibility of unseen security risks. Indeed, many recent works have demonstrated the vulnerability of deep learning to adversarial manipulation [8, 9, 10]. For example, adversarial examples can cause a deep learning model to completely misclassify its input by only slightly altering the input data [11, 12, 13, 14]. In response to the threat that this form of attack poses to deep learning, multiple potential defenses have arisen [15, 16, 17]. Despite this, security applications remain vulnerable because it is uncertain which defensive methodologies are most effective in a given scenario.

Therefore, to ensure the defensive capabilities of deep learning based security systems, these applications should be evaluated against both the traditional performance metrics in the target security field and the vulnerabilities from the adversarial deep learning domain. In fact, if deployed without understanding the vulnerability, a deep learning model could easily become the most susceptible component of a security application. In this paper, we analyze the current state of the art in deep learning based network intrusion detection systems (DL-NIDS). Specifically, we investigate the security of a recently proposed DL-NIDS, Kitsune, which offers a level of defensive capability similar to traditional intrusion detection systems while requiring a lower overhead. We evaluate the DL-NIDS from two perspectives: 1) the ability to defend against malicious network attacks, and 2) the robustness against adversarial examples.

In the remainder of the paper, we first introduce the basics of deep learning and its use in intrusion detection systems as well as the state-of-the-art in adversarial deep learning. We decompose the target DL-NIDS, Kitsune, in Section III. Then, we briefly outline our experimental setup in Section IV. In Sections V and VI, we evaluate the DL-NIDS from the perspectives of network security and adversarial deep learning, respectively. Finally, Section VII concludes the paper with a discussion of potential future research directions.

II Background

II-A Deep Learning based Network Intrusion Detection Systems

The increasing frequency and size of cyber-attacks in recent years [18] make network intrusion detection systems (NIDS) a critical component in network security. An example of a network intrusion detection system is shown in Figure 1. The intrusion detection system essentially acts as a gatekeeper at the target node, which activates a firewall or alerts a host device when malicious network traffic is detected. Unfortunately, while these systems can be used to effectively defend the entry point, much of the network remains unprotected. In other words, attacks that remain internal to the network are often difficult for traditional intrusion detection systems to detect [19].

Fig. 1: An intrusion detection system positioned to defend a host device from abnormal network traffic.

Deploying intrusion detection systems at multiple nodes in a distributed manner throughout the network can fill this hole and further secure networks. A major drawback of the traditional rule-based approach, however, is that each intrusion detection system must be explicitly programmed to follow a set of rules. This process also generates potentially long lists of rules that need to be stored locally for the intrusion detection systems to access. Furthermore, any change in a network node might potentially lead to an update for the entire network. DL-NIDS have the potential to overcome this weakness, as they can generalize the defense by capturing the distribution of normal network traffic instead of being explicitly programmed [19, 20, 21, 22]. In addition, these methods do not require large lookup tables, which could also reduce the implementation cost.

II-B Adversarial Example Generation

A major focus of adversarial deep learning is adversarial example generation, which attempts to find input samples that yield different classifications by slightly perturbing the original benign data. Formally, the adversarial example generation process can be expressed by [11]:


$$\min_{\delta} \; D(x, x+\delta) \quad \text{such that} \quad C(x+\delta), \;\; x+\delta \in X \qquad (1)$$

where $x$ is the model's original primary input, $\delta$ is a perturbation on $x$ to achieve the desired adversarial behavior, and $X$ defines a bounded region of valid input values. $D(\cdot)$ is a distance metric which limits $\delta$, while $C(\cdot)$ is a constraint that defines the goal of the attack. Two commonly used constraint functions are $F(x+\delta) = t$ and $F(x+\delta) \neq F(x)$. The first defines a targeted attack in which the adversarial goal is to force the network output, $F(x+\delta)$, to a specific output, $t$. The second defines the untargeted scenario where the adversarial goal is for the network to produce any output except $F(x)$. The choice of $D(\cdot)$ also greatly affects the final outcome of the attack. In existing works, $L_p$ norms (i.e., $L_0$, $L_1$, $L_2$, and $L_\infty$) are often used due to their mathematical significance and correlation with perceptual distance in image or video recognition. Recently, new distance metrics are being explored in works such as spatially transformed adversarial examples [23].

In the literature, many algorithms for generating adversarial examples have been developed, utilizing various distance metrics, constraint functions, and/or optimization approaches. For example, one of the earliest adversarial example algorithms, the Fast Gradient Sign Method (FGSM), perturbs every element of the input in the direction of its gradient by a fixed size [12]. While this method produces quick results, the Basic Iterative Method (BIM) can significantly decrease the perturbation at the cost of a longer run time [16]. Furthermore, adversarial example generation algorithms continue to grow more sophisticated as novel attacks build on the foundation of existing works. An example of this is the elastic net method (ENM), which adds an $L_1$ regularization term and the iterative shrinkage-thresholding algorithm to Carlini and Wagner's attack [14]. Moreover, adversarial examples are expanding out from image processing into other fields, where they continue to inhibit the functionality of deep learning models [24, 25, 26]. The effort to draw researchers' attention to the subject has even led to competitions in which contestants attempt to attack and defend neural networks using these adversarial examples [27, 28].

II-C Robustness against Adversarial Examples

Some researchers believe that the vulnerability of deep learning models to adversarial examples is evidence of a pervasive lack of robustness rather than simply an inability to secure these models [29, 30, 31]. As such, defenses attempt to bolster the deep learning model's robustness through the use of either reactive or proactive methods [32]. Defensive distillation and adversarial training are two proactive defenses, which improve a neural network's robustness by retraining the network weights to smooth the classification space [15, 16]. A recent example of a reactive defense is PixelDefend, which attempts to perturb an adversarial input back to the region of the input space that is correctly handled by the network [17].

When deep learning is powering security applications, the robustness of the model is even more critical. The field of malware classification is a prime example, as deep learning models have been shown to perform superbly in this area across multiple implementations and scenarios [33, 34, 35, 36]. Unfortunately, when adversarial examples are presented to these systems, the lack of robustness in the deep learning model often allows an attack to bypass these security measures [37, 38]. Despite this vulnerability, deep learning is a prime candidate for security implementations when the resource demands or static nature of traditional defenses inhibit their practicality. Thus, as deep learning continues to develop into network intrusion detection, the robustness of such systems should be thoroughly studied. To this end, researchers are continuing to develop guidelines and frameworks to aid in ensuring the robustness of machine learning systems against adversarial manipulations [39, 40].

Fig. 2: A graphical representation of Kitsune [19].

III Evaluated Network

In this section, we present a brief overview of the network intrusion detection system and then analyze Kitsune's deep learning model, KitNET, in more detail.

III-A Kitsune Overview

The DL-NIDS, Kitsune, is composed of a Packet Capturer, Packet Parser, Feature Extractor, Feature Mapper, and Anomaly Detector [19]. The Packet Capturer and Packet Parser are standard components of NIDS, which forward the parsed packet and meta information (e.g., transmission channel, network jitter, capture time) to the Feature Extractor. Then, the Feature Extractor generates a vector of over 100 statistics which defines the packet and the current state of the active channel. The Feature Mapper clusters these features into subsets to be fed into the Anomaly Detector, which houses the deep learning model, KitNET.

The Kitsune DL-NIDS is specifically targeted at being a lightweight intrusion detection system to be deployed on network switches in IoT settings. Thus, each implementation of Kitsune should be tailored to the network node that it is deployed to. This is achieved through the use of an unsupervised online learning approach, which allows the DL-NIDS to dynamically update in response to the traffic at the target network node. The algorithm assumes that all real-time transmissions during the training stage are legitimate and thus learns a benign data distribution. For inference, it analyzes the incoming transmissions to determine whether they resemble the learned distribution.

III-B KitNET

KitNET consists of an ensemble layer and an output layer. The ensemble layer includes multiple autoencoders, with each working on a single cluster of inputs provided from the Feature Mapper. The output scores of these autoencoders are then normalized before being passed to an aggregate autoencoder in the output layer whose score is used to assess the security of the network traffic data.

III-B1 The Autoencoders

The fundamental building block of KitNET is an autoencoder, a neural network which reduces an input down to a base representation before reconstructing an output of the same dimension as the input from that representation. The autoencoders in KitNET are trained to capture the properties of normal network traffic. The number of hidden neurons inside an autoencoder is limited so that the network learns a compact representation.

KitNET employs a root-mean-squared-error (RMSE) function on each autoencoder as the performance criterion. The score generated by each autoencoder block is given by:


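$$s = \mathrm{RMSE}(x, y) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}} \qquad (2)$$

where $x$ is the input to the autoencoder, $y$ is its reconstruction, and $n$ is the number of inputs. Because the model was trained to reproduce instances from the benign training distribution, a low score indicates that the input closely resembles normal traffic.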
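The RMSE score described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `rmse_score` and the toy vectors are assumptions made for the example and do not come from the Kitsune code base.

```python
import numpy as np

def rmse_score(x, y):
    """Root-mean-squared error between an input x and its reconstruction y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

# Toy input and two hypothetical reconstructions (not real traffic features).
x = np.array([0.2, 0.4, 0.6, 0.8])
good_reconstruction = np.array([0.2, 0.4, 0.6, 0.8])  # perfect reconstruction
poor_reconstruction = np.array([0.7, 0.9, 0.1, 0.3])  # every element off by 0.5

low_score = rmse_score(x, good_reconstruction)
high_score = rmse_score(x, poor_reconstruction)
```

A perfectly reconstructed input yields a score of zero, while the poorly reconstructed one yields a score of about 0.5, which is the behavior an anomaly detector of this kind relies on.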

III-B2 The Normalizers

Another component used by Kitsune is the normalizers, appearing both before entering KitNET and before the aggregate autoencoder. These normalizers implement the standard function:


$$\mathrm{norm}(x) = \frac{x - \min}{\max - \min} \qquad (3)$$

which linearly scales the minimum and maximum input values to $0$ and $1$, respectively. In Kitsune's training, the values of $\max$ and $\min$ respectively take on the maximum and minimum input values seen by the element during training.
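A minimal sketch of such a min-max normalizer with online tracking of the extremes is given below. The class name `MinMaxNormalizer` and its methods are hypothetical names chosen for illustration, and the sketch assumes `update` has been called at least once before `transform`.

```python
import numpy as np

class MinMaxNormalizer:
    """Tracks per-feature minima and maxima and rescales inputs linearly."""

    def __init__(self, n_features):
        self.min = np.full(n_features, np.inf)
        self.max = np.full(n_features, -np.inf)

    def update(self, x):
        """Record a training sample so the observed range can grow."""
        x = np.asarray(x, dtype=float)
        self.min = np.minimum(self.min, x)
        self.max = np.maximum(self.max, x)

    def transform(self, x):
        """Map the training minimum to 0 and the training maximum to 1."""
        x = np.asarray(x, dtype=float)
        span = self.max - self.min
        safe_span = np.where(span > 0, span, 1.0)  # avoid division by zero
        return (x - self.min) / safe_span

normalizer = MinMaxNormalizer(n_features=2)
for sample in ([0.0, 10.0], [4.0, 20.0]):   # toy "training" samples
    normalizer.update(sample)

inside = normalizer.transform([2.0, 15.0])   # within the training range
outside = normalizer.transform([8.0, 30.0])  # beyond the training range
```

Because the range is learned only from training data, an input outside that range is mapped outside the interval from 0 to 1, as the second call illustrates.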

III-C Classifying the Output

The primary output of KitNET is the RMSE score, $s$, produced by the aggregate autoencoder. It should be noted that the scores produced by KitNET are numerical values rather than a probability distribution or logits as in common deep learning classifiers. Kitsune utilizes a classification scheme which triggers an alarm under the condition $s > \phi\beta$, where $\phi$ is the highest value of $s$ recorded during training and $\beta$ is a constant used to find a trade-off between the number of false positives and false negatives. The authors limit the value of $\beta$ to be greater than or equal to $1$ in order to assure a $100\%$ training accuracy (i.e., all the training data are considered benign).
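The alarm rule can be illustrated with the following sketch, which assumes the threshold is the largest training score multiplied by a constant of at least one. The names `fit_threshold`, `is_alarm`, and `beta` are illustrative assumptions rather than identifiers from Kitsune.

```python
def fit_threshold(training_scores, beta=1.0):
    """Return the alarm threshold: the largest training score times beta."""
    if beta < 1.0:
        # A constant below 1 would flag some training samples as malicious.
        raise ValueError("beta must be greater than or equal to 1")
    return max(training_scores) * beta

def is_alarm(score, threshold):
    """An alarm is raised only when the score exceeds the threshold."""
    return score > threshold

training_scores = [0.1, 0.3, 0.2]           # toy scores from benign training data
strict = fit_threshold(training_scores)      # beta = 1
relaxed = fit_threshold(training_scores, beta=2.0)

training_alarms = [is_alarm(s, strict) for s in training_scores]
```

With the constant equal to one, no training sample raises an alarm, while a larger constant trades fewer false positives for more false negatives.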

IV Experimental Setup

In this section, we briefly describe our experimental setup and the necessary modifications to KitNET.

IV-A Implementing KitNET

In order to perform adversarial machine learning, the original C++ version of Kitsune was reproduced in TensorFlow [41]. The TensorFlow model was tested and evaluated in the same way as the C++ implementation with an average deviation on the outputs of from the original model. We then utilized Cleverhans [42], an adversarial machine learning library that is produced and maintained by key contributors in the domain, to mount different adversarial example generation algorithms on Kitsune. We also used the same Mirai dataset as in [19].

IV-B Modifications to the Model

Our implementation of KitNET moves the classification mechanism into the model by adding a final layer at the output, as expressed in Equation 4.


This allows the deep learning model to produce the classification result based on a threshold. Effectively, this alteration moves the original classification scheme into KitNET itself when the threshold equals Kitsune's original alarm threshold, transforming the model from a regression model into a classifier.

As adversarial examples target deep learning models, we isolate KitNET from Kitsune when performing our attacks. In a real-world attack on Kitsune, the adversary must circumvent, or surmount, the Feature Extractor in order to induce perturbations on KitNET's input. However, with an understanding of the Feature Extractor, it is feasible for the adversary to craft network traffic that generates the required features. Thus, in our experiments, we focus on evaluating the security of KitNET from the normalized feature space.

V Evaluation from the Network Security Perspective

To understand the defensive capability of a DL-NIDS, it must be evaluated from both the network security and adversarial machine learning aspects. In the domain of intrusion detection, the ability to distinguish malicious network traffic from benign traffic is the main performance metric. In this section, we evaluate the classification accuracy of Kitsune.

Kitsune's developers evaluate the DL-NIDS against a series of attacks in a variety of networks [19]. In our implementation, the accuracy of Kitsune is highly dependent on the threshold. This value defines the decision boundary, which makes it a critical parameter when deploying the model. We evaluate KitNET by assuming that the threshold is not predefined, but trained as in an end-to-end deep learning system. In addition, this analysis also indicates how the threshold correlates with the perturbation required in adversarial machine learning.

To assess the performance of a given threshold value, we consider the following two metrics:

  1. False Positives: The percentage of benign inputs that are incorrectly classified as malicious.

  2. False Negatives: The percentage of malicious data that are incorrectly classified as benign.

On the one hand, the rate of false positives accounts for the reliability of a network. On the other hand, the rate of false negatives is closely associated with the effectiveness of the intrusion detection system. Therefore, both rates should be minimized in an ideal situation. However, in the setting of Kitsune, the value of the threshold indeed acts as a trade-off between false positive rate and false negative rate.
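The two metrics and their dependence on the threshold can be sketched as follows. The function name `error_rates` and the toy scores are assumptions for illustration only, and the sketch assumes both classes are present in the data.

```python
import numpy as np

def error_rates(scores, is_malicious, threshold):
    """Return (false positive %, false negative %) for a given threshold."""
    scores = np.asarray(scores, dtype=float)
    is_malicious = np.asarray(is_malicious, dtype=bool)
    alarms = scores > threshold
    # Benign inputs that raised an alarm.
    false_positive = np.mean(alarms[~is_malicious]) * 100.0
    # Malicious inputs that did not raise an alarm.
    false_negative = np.mean(~alarms[is_malicious]) * 100.0
    return float(false_positive), float(false_negative)

# Toy anomaly scores: four benign samples followed by three malicious ones.
scores = [0.1, 0.2, 0.3, 0.9, 0.25, 1.5, 2.0]
labels = [False, False, False, False, True, True, True]

low = error_rates(scores, labels, threshold=0.05)   # everything is flagged
mid = error_rates(scores, labels, threshold=0.5)    # a compromise
high = error_rates(scores, labels, threshold=3.0)   # nothing is flagged
```

Sweeping the threshold from low to high moves the detector from all false positives to all false negatives, which is the trade-off described above.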

We investigated the full functional range of possible thresholds in this analysis, i.e., from the minimum score of to a score of which leads to false negatives on the given dataset. Figure 3(a) plots the two metrics as well as the accuracy of the DL-NIDS.

Fig. 3: The percentage of misclassified benign and malicious inputs for chosen threshold values (a). A receiver operating characteristic (ROC) curve for Kitsune (b).

It can be seen that the rates of false positives and false negatives remain almost unchanged in the middle range. Furthermore, it can also be observed that if we want to minimize one of the rates, the other rate will increase significantly. Finally, the accuracy is also largely unchanged for threshold values below 7, which can be partially attributed to the imbalance of the dataset (i.e., most of the data belong to the benign class). Therefore, a threshold between 0.05 and 1 would be appropriate for this scheme. The effectiveness of Kitsune at separating the Mirai dataset is further demonstrated by the ROC curve in Figure 3(b).

VI Evaluation against Adversarial Machine Learning

This section continues the evaluation of Kitsune through an empirical analysis of its robustness against adversarial examples.

VI-A Adversarial Example Generation Methods

Intelligent and adaptive adversaries will exploit the vulnerability of the machine learning models in novel DL-NIDS by using techniques such as adversarial examples and poisoning attacks. There are two main attack objectives in adversarial machine learning, namely, integrity and availability violations [43]. In this setting, integrity violations attempt to generate malicious traffic which evades detection (producing a false negative), while availability violations attempt to make benign traffic appear malicious (producing a false positive) [44]. In either case, an adversarial example must achieve the misclassification with a perturbation that is as small as possible.

Another concern in performing these attacks is that network data are fundamentally distinct from images, which are usually used in conventional adversarial machine learning. An adversarial example in the image domain is an image that is perceived to be the same by human observers but differently by the model. The $L_p$ norm of the difference between the two images approximates that observable distance and hence can be used as the distance metric. In network security, however, this definition fails, as observing network traffic at the bit level is not generally practical. Therefore, the semantic understanding of these attacks in this setting is remarkably different.

One potential definition for adversarial examples in this scenario, which is facilitated by the architecture of Kitsune, is to use the extracted features generated by the model as an indication of the observable difference. Thus, we adopt the $L_p$ distance on the feature space between the original input and the perturbed input as the distance metric. In particular, the $L_0$ norm correlates to altering a small number of extracted features, which might be a better metric than other norms.
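To make the distance metrics concrete, the sketch below computes the four norms commonly used in this literature ($L_0$, $L_1$, $L_2$, and $L_\infty$) for a perturbation in feature space. The function name `perturbation_norms` and the tolerance used to decide whether a feature changed are assumptions made for this example.

```python
import numpy as np

def perturbation_norms(x, x_adv, tol=1e-9):
    """Measure how far an adversarial feature vector is from the original."""
    delta = np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)
    return {
        "L0": int(np.sum(np.abs(delta) > tol)),    # number of altered features
        "L1": float(np.sum(np.abs(delta))),        # total absolute change
        "L2": float(np.sqrt(np.sum(delta ** 2))),  # Euclidean distance
        "Linf": float(np.max(np.abs(delta))),      # largest single change
    }

# Toy normalized feature vectors: two of the four features are altered.
x = np.array([0.5, 0.5, 0.5, 0.5])
x_adv = np.array([0.5, 0.8, 0.5, 0.1])
norms = perturbation_norms(x, x_adv)
```

Here the $L_0$ value is 2 because only two features differ, regardless of how large those two changes are.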

As many methods of generating adversarial examples have been developed, each thriving in different settings, we attempt to provide a broad comparison of the effect of adversarial examples with different distance metrics in the network security domain. We evaluate the robustness of KitNET against the following algorithms:

  • Fast Gradient Sign Method (FGSM): This method optimizes over the $L_\infty$ norm (i.e., reduces the maximum perturbation on any input feature) by taking a single step on each element of $x$ in the direction opposite the gradient [12].

  • Jacobian-based Saliency Map Attack (JSMA): This attack minimizes the $L_0$ norm by iteratively calculating a saliency map and then perturbing the feature that will have the highest effect [13].

  • Carlini and Wagner (C&W): Carlini and Wagner's adversarial framework, as discussed earlier, can minimize either the $L_0$, $L_2$, or $L_\infty$ distance metric [11]. In our experiments, we utilize the $L_2$ norm to reduce the Euclidean distance between the vectors through an iterative method.

  • Elastic Net Method (ENM): Elastic net attacks are novel algorithms that limit the total absolute perturbation across the input space, i.e., the $L_1$ norm. ENM produces adversarial examples by expanding an iterative attack with an $L_1$ regularizer [14].
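As an illustration of the simplest of these attacks, the sketch below applies a single FGSM-style step that lowers the RMSE anomaly score of a stand-in model. The stand-in is a small random linear autoencoder, not KitNET, and the names `recon_matrix`, `rmse_gradient`, and `fgsm_lower_score` as well as the step size are assumptions made for this example.

```python
import numpy as np

def rmse_score(x, recon_matrix):
    """RMSE between x and its linear reconstruction recon_matrix @ x."""
    residual = x - recon_matrix @ x
    return float(np.sqrt(np.mean(residual ** 2)))

def rmse_gradient(x, recon_matrix):
    """Analytic gradient of the RMSE score with respect to x."""
    n = x.size
    residual_map = np.eye(n) - recon_matrix  # residual = residual_map @ x
    residual = residual_map @ x
    score = np.sqrt(np.mean(residual ** 2))
    return residual_map.T @ residual / (n * score)

def fgsm_lower_score(x, recon_matrix, epsilon):
    """Move every feature by epsilon against the gradient sign, then clip."""
    grad = rmse_gradient(x, recon_matrix)
    x_adv = x - epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)  # stay inside the normalized range

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 3
encoder = rng.normal(scale=0.3, size=(n_hidden, n_features))
decoder = rng.normal(scale=0.3, size=(n_features, n_hidden))
recon_matrix = decoder @ encoder          # hypothetical stand-in autoencoder

x = rng.uniform(0.2, 0.8, size=n_features)
x_adv = fgsm_lower_score(x, recon_matrix, epsilon=0.01)

score_before = rmse_score(x, recon_matrix)
score_after = rmse_score(x_adv, recon_matrix)
```

Every feature moves by the same small amount, so the $L_\infty$ distance is bounded by the step size while the $L_0$ distance equals the full number of features, which is why this attack is a poor fit when few altered features are desired.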

VI-B Experimental Results

We conduct our experiments on both integrity and availability violations. Integrity violation attacks are performed on the benign inputs with a threshold of . The experimental results are presented in Table I. For comparison between different algorithms, the common distance metrics are all presented. Each attack was conducted on the same random benign samples from the dataset.

Algorithm Success (%)
TABLE I: Integrity Attacks on KitNET

Availability attacks are also performed using the same threshold of . input vectors that yield closest output scores to the threshold were selected. The results are summarized in Table II. Note that as the normalizers were only trained on benign inputs, many malicious inputs would be normalized outside the typical range between $0$ and $1$.

Algorithm Success (%)
TABLE II: Availability Attacks on KitNET

VI-C Analysis and Discussion

By comparing Table I and Table II, it can be seen that the integrity attacks in general perform much better than the availability attacks. For instance, adversarial examples are rarely generated in the FGSM and JSMA availability attacks. Additionally, the perturbations produced by the availability attacks are all larger than their integrity counterparts. A potential cause of this difficulty is the disjoint nature of the benign and malicious input data, as exhibited by the clipping of the normalized inputs, in conjunction with a decision boundary (i.e., the threshold) that is much closer to the benign input data.

Among these four methods, the earlier algorithms, i.e., FGSM and JSMA, perform worse than the C&W and ENM attacks. As mentioned above, the success rates of these attacks are very low, especially in the availability attacks. This result is expected, since the more advanced iterative C&W and ENM algorithms are capable of searching a larger adversarial space than FGSM and JSMA.

A final observation is that ENM is very effective in these attacks. Despite the fact that this attack is optimized with respect to the $L_1$ norm, its generated adversarial examples yield very small values for all other norms as well. Specifically, the $L_0$ perturbations produced were even better than those produced by JSMA. As stated above, the $L_0$ norm seems to be the most appropriate of these four norms in the setting of network security, as it signifies altering a minimal number of extracted features from the network traffic. Thus, ENM can feasibly be mounted against Kitsune to generate adversarial examples that fool the detection system while only requiring very small perturbations.

We note that the above attacks were produced with an adaptive step size random search over each method's parameters. In practice, adversaries may use such a naive approach to determine effective attack algorithms, and then utilize more robust optimization algorithms, such as Bayesian or gradient descent optimization, with the indicated attack algorithms to produce superior results.

VI-D Optimizing ENM

Since ENM has been demonstrated to be very successful in our experiments, we next focus on optimizing the ENM attack on Kitsune in our setting. The Cleverhans implementation uses a simple gradient descent optimizer to minimize the function:


$$\min_{x} \; c \cdot f\big(Z(x), Y\big) + \beta \lVert x - x_0 \rVert_1 + \lVert x - x_0 \rVert_2^2 \qquad (5)$$

where $Z(x)$ is the logit output of the target classifier, $Y$ is the target logit output (i.e., the output which produces the desired violation), $x_0$ is the original network input, and $f(\cdot)$ is the attack loss that penalizes logits which differ from the target. It can be seen that there are two regularization parameters, $c$ and $\beta$. These parameters determine the contribution of the different metrics to the attack algorithm. For example, a very large $c$ effectively increases the attack's ability to converge to a successful attack. However, the large contribution of the constraint term also potentially overshadows the distance metrics, effectively diminishing the attack's ability to minimize the perturbation. The focus of this optimization is to determine optimal regularization terms to produce effective attacks on KitNET.
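The structure of an elastic-net attack objective can be sketched as follows, under the assumption that it takes the form commonly stated for this family of attacks: a weighted attack loss plus an $L_1$ term and a squared $L_2$ term. The sketch also includes the element-wise shrinkage-thresholding step that such attacks use to handle the $L_1$ term. The function names and the scalar `attack_loss` argument are illustrative assumptions and do not reproduce the Cleverhans implementation.

```python
import numpy as np

def elastic_net_objective(x_adv, x_orig, attack_loss, c, beta):
    """Weighted attack loss plus L1 and squared L2 distance penalties."""
    delta = np.asarray(x_adv, dtype=float) - np.asarray(x_orig, dtype=float)
    l1 = np.sum(np.abs(delta))
    l2_squared = np.sum(delta ** 2)
    return float(c * attack_loss + beta * l1 + l2_squared)

def shrink_toward_original(z, x_orig, beta, low=0.0, high=1.0):
    """Shrinkage-thresholding: small changes snap back to the original."""
    z = np.asarray(z, dtype=float)
    x_orig = np.asarray(x_orig, dtype=float)
    diff = z - x_orig
    return np.where(
        diff > beta, np.minimum(z - beta, high),
        np.where(diff < -beta, np.maximum(z + beta, low), x_orig),
    )

x_orig = np.array([0.5, 0.5, 0.5])
candidate = np.array([0.52, 0.9, 0.1])      # output of a gradient step
x_adv = shrink_toward_original(candidate, x_orig, beta=0.05)

# attack_loss=2.0 is a made-up value standing in for the classifier term.
objective = elastic_net_objective(x_adv, x_orig, attack_loss=2.0, c=10.0, beta=0.05)
```

The first feature changed by less than the shrinkage amount and is reset to its original value, which is how this kind of attack ends up altering only a few features.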

The ENM algorithm has several other hyper-parameters, including the learning rate, the maximum number of gradient descent steps, and the targeted confidence level. As these parameters are standard in adversarial example attacks, they are set to the constant values of , , and , respectively. An optimization scheme included in the ENM algorithm aids in producing optimal results by altering $c$. It does this by decreasing the parameter -times, only retaining the successful attack which produces the lowest perturbation. To ensure this functionality isn't contributing to the optimization, this feature is disabled by setting . Therefore, the results of the optimization could be further improved by enabling this functionality.

The parameter, $c$, determines the contribution of the adversarial misclassification objective at the cost of diminishing the two normalization terms. Thus, the optimal value of $c$ is the smallest value which achieves the demanded success rate. We evaluate a wide range of values for $c$, as shown in Figure 4. We find to be optimal, which achieves a success rate with a relatively small perturbation. It can also be observed from Figure 4 that the resulting distance does not directly correlate with the selection of $c$. We also tried to increase the value of $c$ into the thousands; interestingly, the distances still only changed very slightly.
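The selection rule described above, picking the smallest regularization weight that still reaches the demanded success rate, can be sketched as a simple search. The function name `smallest_sufficient_c`, the candidate values, and the toy success-rate model are all hypothetical and only illustrate the rule.

```python
def smallest_sufficient_c(candidates, success_rate_fn, required_rate):
    """Return the smallest candidate whose success rate meets the requirement."""
    for c in sorted(candidates):
        if success_rate_fn(c) >= required_rate:
            return c
    return None  # no candidate reaches the required success rate

def toy_success_rate(c):
    """Made-up stand-in for running the attack and measuring success (%)."""
    return min(100.0, 20.0 * c)

best_c = smallest_sufficient_c([50, 1, 10, 2, 5], toy_success_rate, required_rate=100.0)
```

In a real evaluation the success-rate function would run the attack on a sample set for each candidate value, which is far more expensive than this toy stand-in.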

Fig. 4: The success rate and distance with respect to changes in the regularization parameter, $c$.

On the other hand, the choice of $\beta$ significantly affects the distances. We now optimize the produced perturbation through varying the parameter $\beta$ for . The results are summarized in Table III. It can be seen that the success rate drops as $\beta$ increases, after the second term of Equation 5 begins to overpower the loss function associated with $c$.

Success (%)
- - - -
TABLE III: The perturbations produced with respect to $\beta$.

Summary: It can be concluded that adversarial machine learning can be a realistic threat against DL-NIDS. Therefore, when moving intrusion detection towards the deep learning realm, it is critical to evaluate the security of a DL-NIDS against attacks from both the conventional network security domain and the adversarial machine learning domain.

VII Conclusions and Future Directions

This paper has demonstrated the vulnerability of DL-NIDS to well-crafted attacks from the domain of adversarial machine learning. This vulnerability is present in deep learning based systems even when the model achieves a high degree of accuracy in distinguishing benign from malicious network traffic. Therefore, researchers must take steps to verify the security of deep learning models in security-critical applications to ensure they do not impose additional risks; otherwise, the purpose of using deep learning techniques to protect networks is defeated.

The existence of the Feature Extractor and the Packet Parser signifies that Kitsune is at least partially utilizing domain knowledge of network traffic to generate its classification. To get the most benefit from deep learning models, their applications strive to be as data-driven as possible (i.e., to require little to no human knowledge to generate a function mapping). Thus, despite the current success of Kitsune and other DL-NIDS, as the field continues to develop, DL-NIDS will attempt to convert network traffic directly into a classification using end-to-end deep learning models. Furthermore, the human knowledge currently being used by modern DL-NIDS implies that an adversary needs an understanding of this knowledge to increase the probability of a successful attack. Thus, as DL-NIDS continue to develop, evaluating the model against adversarial machine learning techniques becomes even more critical, as attacks will no longer need this additional knowledge when targeting the system.

In this work, it is assumed that the adversary has direct knowledge of the target DL-NIDS, allowing them to directly generate inputs for the deep learning model. A potential drawback of this assumption is that the perturbation required to generate the adversarial examples does not directly correlate to the alteration on the network. Additionally, it does not account for the effect that the change in the network traffic would have on the host device. Future work will attempt to address this by bridging the gap between the adversarial input to the deep learning model and the network traffic.