Deep Adversarial Learning in Intrusion Detection: A Data Augmentation Enhanced Framework

01/23/2019 ∙ by He Zhang, et al. ∙ University of Exeter, China University of Petroleum

Intrusion detection systems (IDSs) play an important role in identifying malicious attacks and threats in networking systems. As fundamental tools of IDSs, learning based classification methods have been widely employed. When it comes to detecting network intrusions with small sample sizes (e.g., emerging intrusions), the limited number and imbalanced proportion of training samples usually cause significant challenges in training supervised and semi-supervised classifiers. In this paper, we propose a general network intrusion detection framework to address the challenges of both data scarcity and data imbalance. The novelty of the proposed framework lies in incorporating deep adversarial learning with statistical learning and exploiting learning based data augmentation. Given a small set of network intrusion samples, it first derives a Poisson-Gamma joint probabilistic generative model to generate synthesised intrusion data using Monte Carlo methods. These synthesised data are then augmented by deep generative neural networks through adversarial learning. Finally, it adopts the augmented intrusion data to train supervised models for detecting network intrusions. Comprehensive experimental validations on the KDD Cup 99 dataset show that the proposed framework outperforms the existing learning based IDSs in terms of accuracy, precision, recall, and F1-score.


I Introduction

Network intrusion detection (NID) has been one of the key components of network security over the past decades [1], [2], [3]. Intrusion detection systems (IDSs) play a significant role in NID by deterring cyber attacks and threats coming from a vast number of network-connected devices (e.g., computing systems [4] [5], sensor networks [6] [7] and software defined networks [8]). With the increasing usage of networking systems and the frequent emergence of varied intrusion attacks, a new generation of high-performance IDSs needs to be developed to accurately detect not only high-frequency network intrusions, such as neptune and smurf attacks, but also intrusions (e.g., mailbomb and snmpguess attacks) that have few known records.

IDSs can generally be divided into two categories. The first group focuses on patterns/signatures of network packets/traffic; these systems identify network intrusions using rule-based matching [9], [10]. The second group applies machine learning (ML) based approaches, such as supervised and/or semi-supervised learning, and trains NID models on a collection of labeled and/or unlabeled network data. These learning based IDSs are reported to have a high detection rate and fast processing speed [3], and they are gaining significance alongside the surge of network traffic and heterogeneous types of network attacks. The performance of learning based IDSs [11] [12] depends heavily on the network data features and on the choice of relevant classifiers (e.g., support vector machine (SVM) and logistic regression (LR)), which are obtained in a data-driven manner. Consequently, defects of the training set, such as feature redundancy and data imbalance, become the bottleneck in developing accurate IDSs [3] [13].

In the past decade, deep learning (DL) models, e.g., deep neural networks (DNNs), have revolutionized classical ML on supervised classification tasks [14] [15]. In the network security community, DL based IDSs [3] [8] show advanced NID performance because hierarchically structured DNNs extract representative features of network data. Unfortunately, the weights of DNNs need to be optimized on a large dataset. Therefore, DL based IDSs cannot accurately detect network intrusions if the related training data are insufficient and imbalanced [3].

The challenges encountered in learning based IDSs can be formulated as data scarcity and data imbalance. One promising solution to these problems is to increase the number of related data samples in the training set. However, labeling large datasets is expensive, time-consuming, and sometimes impossible due to emerging and fast-evolving intrusion attacks. In addition, adequate records of different types of intrusions might be unavailable, which makes these problems particularly severe. Thus, developing a data augmentation (DA) enhanced NID framework is crucial for network security.

To tackle the aforementioned challenges, this paper presents a novel NID framework that utilizes supervised classification models to identify normal network requests and high-frequency cyber attacks. More importantly, we propose a DA module that incorporates deep adversarial learning and statistical learning techniques, which allows the NID framework to detect network intrusions in small sample scenarios. Experimental results on the KDD Cup 99 dataset [16] show that it can effectively identify intrusions, in particular emerging ones, when limited training data are provided.

The main contributions of this paper are summarized as follows:

  • We propose a general learning based NID framework focusing on detecting network intrusions with small sample sizes. By exploiting DA and advanced classification methods, it is capable of identifying a variety of network intrusions, especially emerging ones.

  • We develop a novel DA module that addresses the scarcity and imbalance of the training set in a data-driven manner. It involves a probabilistic generative model for estimating the feature distributions of network data and generating synthesised data using Monte Carlo methods. Furthermore, we pre-train and fine-tune deep generative neural networks in an adversarial learning scheme to augment the synthesised intrusion data with high quality.

  • Extensive experiments on classifying small sample intrusions and normal network requests are performed. Compared with the existing learning based IDSs, the DA enhanced NID framework achieves better or comparable accuracy, precision, recall and F1-score.

The remainder of this paper is organized as follows: Section II introduces related works and motivations. Section III presents the proposed NID framework and learning based DA module. Section IV reports the evaluation, analysis, and comparison of experimental results. Section V provides further discussions of our work. Section VI concludes this paper.

II Related Work

In this section, we first introduce recent progress in learning based IDSs. We then present the preliminaries of DA methods and the motivations of this paper.

II-A Learning based Network Intrusion Detection

Existing learning based IDSs utilize ML and DL models to distinguish different types of network data. Among recent ML based IDSs, Zhang et al. [11] combined unsupervised clustering and supervised learning for robust network traffic classification. Ashfaq et al. [17] proposed a semi-supervised fuzzy method for intrusion detection. These methods establish NID models by learning knowledge from a large amount of unlabeled network data. However, their performance in detecting intrusions with small sample sizes remains unknown when abundant data are unavailable. Apart from these methods, supervised classification models such as LR, SVM, and random forest (RF) have been extensively applied to improve modern IDSs [13] [18] [12]. To deal with the feature redundancy problem in training supervised classifiers, Ambusaidi et al. [13] introduced a mutual information based algorithm to select network features. Zhou et al. [18] adopted word embedding to extract meaningful features of network data. To further alleviate the over-fitting problem, RF [12] and tree algorithms [19] have been employed to ensemble sub-classification models for robust NID. However, data scarcity and data imbalance remain unsolved, because feature selection/extraction and classifier aggregation do not increase the number of intrusion samples of the different categories in the training set.

Among DL based IDSs, Tang et al. [8] applied a three-layer DNN to extract multilevel features and classify flow-based network intrusions. Shone et al. [3] proposed a more complex DNN called the nonsymmetric deep auto-encoder (NDAE). They stacked two NDAEs for unsupervised feature learning of network data, followed by an RF for classification. These DNN based IDSs achieve promising NID performance in terms of different evaluation metrics (e.g., accuracy, precision and recall). However, because optimizing the layer weights of DNNs requires large quantities of training data, the weakness of these IDSs in detecting low-frequency intrusions (e.g., the guesspasswd and bufferoverflow attacks reported in [3]) is non-negligible. Thus, enriching small sample intrusions in the training set is essential in developing learning based IDSs.

II-B Data Augmentation Methods

II-B1 Probabilistic Generative Models

In Bayesian statistical learning, probabilistic generative models are extensively used to approximate unknown distributions of target data [20]. Therein, Markov chain Monte Carlo (MCMC) methods are used to sample the model parameters given the observed data.

The Metropolis-Hastings (MH) algorithm [21] is the foundation of MCMC: it draws candidate samples from a proposal distribution and accepts or rejects each candidate according to an acceptance probability. A widely used special case of the MH algorithm is Gibbs sampling [22] [23], which uses the full conditional distributions as the proposal distribution, so that every candidate is accepted. Closely related to Gibbs sampling, the expectation maximisation (EM) algorithm [24] [20] alternates between an expectation step and a maximization step to estimate distribution parameters. Nonetheless, these algorithms cannot be directly applied to generate intrusions due to the difficulty of modeling complex features (e.g., real/Boolean values) with large divergence.
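To make the accept/reject mechanism concrete, the following Python sketch implements a generic random-walk MH sampler. It is not the sampler used in this paper: the Gamma target (shape 3, scale 2), the Gaussian proposal width, the burn-in length and all function names are illustrative assumptions.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, burn_in=500, seed=0):
    """Random-walk MH for a 1-D target given by its unnormalized log density."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for t in range(burn_in + n_samples):
        # Candidate from a symmetric Gaussian proposal centred at the current state.
        candidate = x + rng.gauss(0.0, step)
        log_p_cand = log_target(candidate)
        # Symmetric proposal, so accept with probability min(1, p(candidate) / p(x)).
        # 1.0 - rng.random() lies in (0, 1], so its logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_cand - log_p:
            x, log_p = candidate, log_p_cand
        if t >= burn_in:  # keep only the states after the burn-in period
            samples.append(x)
    return samples


def gamma_log_density(lam, shape=3.0, scale=2.0):
    """Unnormalized log density of a Gamma(shape, scale) target; -inf outside its support."""
    if lam <= 0:
        return -math.inf
    return (shape - 1.0) * math.log(lam) - lam / scale


samples = metropolis_hastings(gamma_log_density, x0=1.0, n_samples=20000, step=4.0)
print(sum(samples) / len(samples))  # approaches shape * scale = 6 for a long chain
```

Candidates outside the support have a log density of minus infinity and are always rejected, so every stored state stays positive. The burn-in discards early states that still depend on the starting point x0.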

II-B2 Generative Adversarial Networks

Generative adversarial networks (GANs) [25] have been increasingly employed to generate realistic objects in computer vision [26] [27]. GANs contain two differently structured DNNs: the discriminative net and the generative net, denoted as D and G, respectively. During training, G attempts to produce forged data from its input prior, while D takes in both forged and real data and learns to distinguish the counterfeit from the real. This eventually results in a powerful data generator G.
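For reference, the original GAN formulation [25] casts this competition as a two-player minimax game in which D maximizes and G minimizes the value function below; here $p_{\mathrm{data}}$ denotes the data distribution and $p_{\mathbf{z}}$ the prior over the input of G. This is the standard textbook form, included only to make the game concrete.

```latex
\min_{G}\max_{D}\; V(D, G) =
  \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\!\left[\ln D(\mathbf{x})\right]
  + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\!\left[\ln\!\left(1 - D\big(G(\mathbf{z})\big)\right)\right]
```

For a fixed G, the optimal discriminator is $D^{*}(\mathbf{x}) = p_{\mathrm{data}}(\mathbf{x}) / \big(p_{\mathrm{data}}(\mathbf{x}) + p_{g}(\mathbf{x})\big)$, where $p_{g}$ is the distribution of generated samples; when $p_{g} = p_{\mathrm{data}}$, $D^{*} \equiv 1/2$ and the value of the game equals $-\ln 4$.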

Note that both D and G are constructed from DNNs; therefore, training GANs on a small collection of intrusion samples is challenging. Even if sufficient network intrusions are available, GANs are reported to be remarkably difficult to train [28] [29]. For instance, G gets worse while D gets better because of the significant difference in their convergence speeds, which might lead to poor outputs and a deficient generator.

Fig. 1: The proposed NID framework for detecting network intrusions in small sample sizes.

II-C Motivations

Despite promising progress in recent research on learning based IDSs, data scarcity and data imbalance in training (semi-)supervised classifiers are still two fundamental issues for the community. These challenges become critical as emerging attacks, which often have few known data samples, increase. Data augmentation methods provide a potential approach to tackle these problems, as they could enrich the network data in the training set if designed properly. However, findings in the existing literature show that directly applying DA algorithms might bring undesired effects [20] [29]. Hence, the novel learning based DA module and the general NID framework presented in this paper will contribute to the design of advanced high-performance IDSs.

Fig. 2: The structure of DA module for augmenting network intrusions.

III Network Intrusion Detection with Data Augmentation

In this section, we first introduce the pipeline of the proposed NID framework and then present the formulation of the learning based DA module for addressing the data scarcity and imbalance problems. Finally, we provide the detailed optimization of the DA module through adversarial learning.

III-A The NID Framework

Figure 1 depicts the proposed NID framework, which consists of a training phase, in which supervised classification models are trained with the assistance of the learning based DA module, and a testing phase, in which these classifiers are adopted to detect network intrusions among normal network requests.

In the training phase, continuous features of network data are normalized, and the related statistical variables (i.e., mean and standard deviation) are recorded for processing testing data. Discrete symbolic features remain unchanged. Other pre-processing techniques (e.g., data filtering and feature selection) can optionally be applied. The normalized data are then partitioned: a proportion of normal network data is randomly selected, and the imbalanced network intrusion samples are augmented by the DA module. Finally, these data are aggregated into a balanced dataset for training NID models using supervised learning methods.
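The pre-processing and balancing steps can be sketched in plain Python as follows. This is a minimal illustration, assuming rows stored as lists indexed by column, z-score normalization with population statistics computed on the training data only, and labels 0 (normal) and 1 (intrusion); the function names are hypothetical, and `augmented_intrusions` stands in for the output of the DA module.

```python
import math
import random


def fit_normalizer(rows, continuous_cols):
    """Record the mean and standard deviation of each continuous column (training data only)."""
    stats = {}
    for c in continuous_cols:
        values = [r[c] for r in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[c] = (mean, std if std > 0 else 1.0)  # guard against constant columns
    return stats


def normalize(rows, stats):
    """Z-score the recorded continuous columns; symbolic columns are left unchanged."""
    out = []
    for r in rows:
        r = list(r)
        for c, (mean, std) in stats.items():
            r[c] = (r[c] - mean) / std
        out.append(r)
    return out


def build_balanced_set(normal_rows, augmented_intrusions, n_normal, seed=0):
    """Randomly subsample normal data and merge it with augmented intrusions."""
    rng = random.Random(seed)
    normals = rng.sample(normal_rows, n_normal)
    data = [(r, 0) for r in normals] + [(r, 1) for r in augmented_intrusions]
    rng.shuffle(data)
    return data


train = [[1.0, 0], [3.0, 1], [5.0, 0]]   # column 0 continuous, column 1 symbolic
stats = fit_normalizer(train, [0])
test = normalize([[3.0, 1]], stats)      # test data reuse the training statistics
```

Reusing `stats` on the test rows mirrors the text: the statistics recorded in the training phase are applied unchanged in the testing phase.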

In the testing phase, input network data are pre-processed in the same way as the training data (using the recorded statistics) and fed into the NID models for classification. Because these models are trained on a balanced dataset using advanced learning methods, the proposed framework is able to accurately identify network intrusions, especially emerging ones.

III-B The Data Augmentation Module

Figure 2 shows the schematic diagram of the proposed DA module. Given a small number of network intrusion samples, the Poisson-Gamma joint probabilistic model (PGM) is first derived to efficiently generate synthesised intrusion data. Deep generative neural networks (DGNNs) then take both real and synthesised data to train their layer weights through adversarial learning. Finally, the DGNNs output network intrusion data of enhanced quality.

In the DA module, PGM plays an essential role in initializing the DGNNs to fit the general distribution of related intrusions, which alleviates the convergence problem of training DGNNs on limited network data. In turn, the DGNNs compensate for the limitations of PGM in simulating network features that exhibit large divergence.

III-C The Poisson-Gamma Joint Probabilistic Model

In this subsection, we present the Poisson-Gamma joint probabilistic generative model for modeling the feature distributions of network data. At the same time, we theoretically analyze its feasibility for simulating complex network features. Finally, we exploit an MCMC based Gibbs sampler to efficiently generate synthesised intrusion data.

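Let $\mathbf{x} = [x_1, \dots, x_d]^\top$ be the feature vector of one network intrusion sample, where $d$ denotes the dimension of the feature space. We assume that intrusions of the same category come from the joint probability distribution defined as follows:

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{i=1}^{d} p(x_i \mid \theta_i), \tag{1}$$

where $\boldsymbol{\theta} = \{\theta_1, \dots, \theta_d\}$, and $\theta_i$ denotes the distribution parameters of the $i$-th feature dimension of $\mathbf{x}$. Note that $\boldsymbol{\theta}$ contains $d$ components, which means that each feature of $\mathbf{x}$ is modeled by its own distribution within the joint model.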

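In the existing NID benchmarks [16] [30], continuous digital features of network data usually represent the volume of requests or the time of connections. Consequently, the Poisson distribution [31] is employed to approximate the accumulation of these network events:

$$p(x_i \mid \lambda_i) = \frac{\lambda_i^{x_i} e^{-\lambda_i}}{x_i!}, \tag{2}$$

where $\lambda_i$ indicates the average intensity, i.e., the statistical average, of $x_i$. Since the Poisson distribution has a single parameter, we have $\theta_i = \lambda_i$ and $\boldsymbol{\theta} = \boldsymbol{\lambda} = [\lambda_1, \dots, \lambda_d]^\top$.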

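To render an effective Markov chain for the subsequent data simulation, $\lambda_i$ is approximated by the Gamma distribution:

$$p(\lambda_i \mid \alpha_i, \beta_i) = \frac{\lambda_i^{\alpha_i - 1} e^{-\lambda_i / \beta_i}}{\Gamma(\alpha_i)\, \beta_i^{\alpha_i}}, \tag{3}$$

where $\alpha_i, \beta_i > 0$ are the shape and scale parameters. Given a collection of intrusion samples $\mathbf{X} = \{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(n)}\}$, the parameters $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are obtained from $\mathbb{E}[\mathbf{X}]$ and $\mathrm{Var}[\mathbf{X}]$, which denote the expectation and variance of $\mathbf{X}$. Because the Gamma distribution is conjugate to the Poisson distribution [32], the convergence of PGM in synthesising network data is theoretically guaranteed.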
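The text derives the Gamma parameters from the expectation and variance of the intrusion samples. One natural way to do this, assumed here rather than taken from the paper, is the method of moments: a Gamma distribution with shape α and scale β has mean αβ and variance αβ², so α = mean²/variance and β = variance/mean. The sketch applies this to each feature column independently; the function names are hypothetical.

```python
import statistics


def gamma_moment_match(values):
    """Method-of-moments Gamma fit: mean = shape * scale, variance = shape * scale**2."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("moment matching needs a positive mean and variance")
    return mean ** 2 / var, var / mean  # (shape, scale)


def fit_gamma_per_feature(samples):
    """Apply moment matching independently to every feature column of a list of vectors."""
    return [gamma_moment_match(col) for col in zip(*samples)]


params = fit_gamma_per_feature([[2, 1], [4, 2], [6, 3]])
print(params)  # one (shape, scale) pair per feature
```

Features whose mean or variance is zero (for example, a constant column) cannot be matched this way, so the sketch raises an error for them instead of returning a degenerate Gamma.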

Proposition 1.

The Poisson-Gamma joint probabilistic model can estimate the distributions of both continuous and discrete digital features of network data.

Proof 1.

The Poisson-Gamma joint distribution can approximately estimate the distributions of continuous digital features, since the accumulation of network events can be statistically formulated as a Poisson process.

For discrete digital features (i.e., values equal to 0 or 1), the joint distribution can model these symbolic values by sampling the Poisson parameter to be 0 or 1 from the Gamma distribution, respectively. ∎

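Provided with $n$ intrusion samples of the same category, the goal of synthesising intrusions becomes estimating $\boldsymbol{\lambda}$ given $\mathbf{X}$. According to Bayes' theorem [33], this can be achieved by maximizing the compact posterior:

$$p(\boldsymbol{\lambda} \mid \mathbf{X}) \propto p(\mathbf{X} \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}), \tag{4}$$

where the first and the second terms on the right side are the density functions of the Poisson and Gamma distributions, respectively.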

Proposition 2.

If network intrusion samples are assumed to obey a Poisson distribution and their Poisson parameter is formulated by a Gamma distribution, then the posterior also obeys a Gamma distribution.

Proof 2.

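In Eq. (4), each term stands for a probability and hence has a nonnegative value. Applying the natural logarithm to both sides of Eq. (4), we have:

$$\ln p(\boldsymbol{\lambda} \mid \mathbf{X}) = \sum_{j=1}^{n} \ln p(\mathbf{x}^{(j)} \mid \boldsymbol{\lambda}) + \ln p(\boldsymbol{\lambda} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) + \mathrm{const}, \tag{5}$$

where $j$ denotes the index of the known intrusion samples.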

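Substituting the exponential forms of the Poisson and Gamma density functions in Eqs. (2) and (3) into Eq. (5), we obtain:

$$\ln p(\boldsymbol{\lambda} \mid \mathbf{X}) = \sum_{j=1}^{n} \left( \mathbf{x}^{(j)} \ln \boldsymbol{\lambda} - \boldsymbol{\lambda} - \ln \mathbf{x}^{(j)}! \right) + (\boldsymbol{\alpha} - \mathbf{1}) \ln \boldsymbol{\lambda} - \frac{\boldsymbol{\lambda}}{\boldsymbol{\beta}} - \ln \Gamma(\boldsymbol{\alpha}) - \boldsymbol{\alpha} \ln \boldsymbol{\beta} + \mathrm{const}, \tag{6}$$

where $\mathbf{x}^{(j)}!$ denotes the element-wise factorial of $\mathbf{x}^{(j)}$ and $\mathbf{1}$ the all-ones vector.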

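Extracting the terms with respect to $\boldsymbol{\lambda}$ in Eq. (6), we derive the representation of the posterior as follows:

$$p(\boldsymbol{\lambda} \mid \mathbf{X}) = C\, \boldsymbol{\lambda}^{\boldsymbol{\alpha} + \sum_{j=1}^{n} \mathbf{x}^{(j)} - \mathbf{1}} \exp\!\left( -\boldsymbol{\lambda} \left( n + \frac{\mathbf{1}}{\boldsymbol{\beta}} \right) \right), \tag{7}$$

i.e., a Gamma distribution with shape $\boldsymbol{\alpha} + \sum_{j=1}^{n} \mathbf{x}^{(j)}$ and scale $\boldsymbol{\beta} / (n\boldsymbol{\beta} + \mathbf{1})$, where $C$ is a negligible normalizing constant [32]. All multiplications above are operated in an element-wise manner. Therefore, the posterior obeys a Gamma distribution.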
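The conjugate update behind Proposition 2 reduces to two element-wise formulas: multiplying the Poisson likelihood of n samples by a Gamma(α, β) prior (shape α, scale β) and collecting powers of λ gives a Gamma posterior with shape α + Σ_j x^(j) and scale β / (nβ + 1). This is the standard Poisson-Gamma conjugacy result; the Python sketch below (hypothetical function name) computes these parameters for each feature dimension.

```python
def poisson_gamma_posterior(samples, shape, scale):
    """Per-feature conjugate update of a Gamma(shape, scale) prior given Poisson counts.

    Returns the posterior shape alpha + sum_j x_j and scale beta / (n * beta + 1).
    """
    n = len(samples)
    sums = [sum(col) for col in zip(*samples)]        # sum over samples, per feature
    post_shape = [a + s for a, s in zip(shape, sums)]
    post_scale = [b / (n * b + 1.0) for b in scale]
    return post_shape, post_scale


post_shape, post_scale = poisson_gamma_posterior([[1, 0], [3, 2]], [2.0, 1.0], [1.0, 0.5])
print(post_shape, post_scale)
```

The posterior mean (shape × scale) blends the prior mean with the sample mean: for the first feature both equal 2, so the posterior mean is 6 × 1/3 = 2, while for the second feature it lands at 0.75, between the prior mean 0.5 and the sample mean 1.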

Given a collection of network intrusion samples $\mathbf{X}$, we utilize Eq. (7) to approximate the Poisson parameter $\boldsymbol{\lambda}$ and further synthesise intrusion data from the Poisson distribution in Eq. (2) using a Gibbs sampler. Algorithm 1 shows the pseudo-code of PGM for producing synthesised intrusion data.

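Input: A set of real intrusion samples $\mathbf{X}$, the number of synthesised intrusion data $m$, the cut-off threshold of the Gibbs sampler $T$.
Output: A set of synthesised intrusion data $\tilde{\mathbf{X}}$.
Initialize $\tilde{\mathbf{X}} = \emptyset$;
for $t = 1, \dots, T + m$ do
    Sample $\boldsymbol{\lambda}$ in Eq. (7);
    if $t > T$ then
        Sample $\tilde{\mathbf{x}}$ in Eq. (2);
        $\tilde{\mathbf{X}} \leftarrow \tilde{\mathbf{X}} \cup \{\tilde{\mathbf{x}}\}$;
    end if
end for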
Algorithm 1 The Poisson-Gamma joint probabilistic model
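A minimal Python reading of Algorithm 1 is sketched below under stated assumptions: the posterior parameters come from the conjugate update of Eq. (7), λ is drawn per feature with `random.Random.gammavariate` (which takes a shape and a scale), and Poisson counts are drawn with Knuth's multiplication method, which is only practical for small intensities. The names, cut-off value and loop layout are hypothetical. With the real samples held fixed, successive draws from the Gamma posterior are independent, so discarding the first T iterations here only mirrors the cut-off step of the algorithm.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; its cost grows with lam, so keep lam small."""
    if lam <= 0:
        return 0
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def pgm_synthesise(post_shape, post_scale, m, burn_in=500, seed=0):
    """Draw lambda from the Gamma posterior, then a Poisson vector; keep draws after burn_in."""
    rng = random.Random(seed)
    synthetic = []
    for t in range(1, burn_in + m + 1):
        # gammavariate(alpha, beta) uses shape alpha and scale beta.
        lam = [rng.gammavariate(a, b) for a, b in zip(post_shape, post_scale)]
        if t > burn_in:
            synthetic.append([sample_poisson(l, rng) for l in lam])
    return synthetic


syn = pgm_synthesise([6.0, 3.0], [1.0 / 3.0, 0.25], m=2000)
```

Each synthesised vector has one non-negative integer count per feature; the marginal mean of each feature equals the posterior mean of its λ.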
Fig. 3: The schematic depiction of the two-fold adversarial mechanism for training DGNNs: (a) the pre-training process (forward propagation); (b) the fine-tuning process (forward propagation).

III-D The Deep Generative Neural Networks

In this subsection, we present the formulation of the DGNNs that augment synthesised intrusion data with enhanced quality. Afterwards, we propose a two-fold mechanism for training the DGNNs via adversarial learning strategies.

The proposed DGNNs have two components: the Discriminator (D-net) and the Generator (G-net). In the adversarial training process, G-net generates augmented intrusion data by learning their real feature distribution, while D-net acts as an indicator that tries to distinguish augmented data from real intrusion samples. To fully exploit the capability of hierarchical feature learning, D-net and G-net are implemented as DNNs. The input size of D-net and G-net is $d$, which equals the dimension of the network data features. In the hidden layers, neural nodes are stacked to abstract latent representations of the input data. The output of D-net is a scalar that gives the probability that its input comes from real samples. In contrast, the output of G-net is a $d$-dimensional vector representing the augmented network intrusion.

Provided with target network intrusion samples, the goal of training DGNNs is to obtain a potent G-net that recovers their feature distributions and further generates intrusion data with similar characteristics. As discussed in Section II-B, DGNNs trained with limited samples easily over-fit because both D-net and G-net are structured as DNNs. Therefore, we present a two-fold adversarial learning mechanism that adopts both real and synthesised network data to optimize their weights.

III-D1 Pre-training

As shown in Figure 3(a), synthesised intrusion data are treated as sub-optimal targets, and D-net is trained to distinguish the augmented data (i.e., $G(\mathbf{z})$) from them. Variables and are the prior of G-net for learning the mapping . To adapt to different and unexpected situations, $\mathbf{z}$ is sampled from a Gaussian distribution, which assumes that G-net has no specific prior knowledge of the targets. The objective of pre-training is then formulated as the following adversarial game:

$$\min_{G} \max_{D} V_{\mathrm{pre}}(D, G) = \mathbb{E}_{\tilde{\mathbf{x}} \sim p_{s}}\left[\ln D(\tilde{\mathbf{x}})\right] + \mathbb{E}_{\mathbf{z} \sim p_{z}}\left[\ln\left(1 - D\big(G(\mathbf{z})\big)\right)\right], \tag{8}$$

where $p_{s}$ and $p_{z}$ denote the feature distributions of $\tilde{\mathbf{x}}$ and $\mathbf{z}$, respectively. Given sufficient synthesised data, the DGNNs are pre-trained to fit the general distribution of related network intrusions [29]. In this case, the quality of the augmented samples is bound to be mediocre, as G-net cannot directly access the feature information of real intrusion samples.

III-D2 Fine-tuning

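During the adversarial training illustrated in Figure 3(b), D-net takes in both real intrusion samples and augmented ones from G-net. Meanwhile, G-net is fed with synthesised data added to Gaussian variables (i.e., $\tilde{\mathbf{x}} + \mathbf{z}$). In this scenario, the DGNNs are fine-tuned according to the following objective:

$$\min_{G} \max_{D} V_{\mathrm{ft}}(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{r}}\left[\ln D(\mathbf{x})\right] + \mathbb{E}_{\tilde{\mathbf{x}} \sim p_{s},\, \mathbf{z} \sim p_{z}}\left[\ln\left(1 - D\big(G(\tilde{\mathbf{x}} + \mathbf{z})\big)\right)\right], \tag{9}$$

where $p_{r}$ denotes the real intrusion distribution to recover. Note that $\tilde{\mathbf{x}} + \mathbf{z}$ rather than $\tilde{\mathbf{x}}$ or $\mathbf{z}$ is chosen as the input of G-net in fine-tuning. This prevents G-net from learning a monotonous distribution from the same data in $\tilde{\mathbf{X}}$.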

Since the DGNNs have been pre-trained on synthesised intrusion data, the Generator competes with the Discriminator at a comparable level in the fine-tuning stage. Thus, the two-fold adversarial training mechanism allows the DGNNs to learn the real intrusion distribution in a progressive manner. This avoids a large gap between the convergence speeds of the two networks and prevents the Generator from generating poor outputs.
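The two stages differ only in what D-net sees as targets and what G-net receives as input. The sketch below evaluates the empirical value of a GAN-style objective on one mini-batch and builds the G-net input for each stage. It assumes, following our reading of the text, that G-net receives Gaussian noise alone during pre-training and synthesised data plus noise during fine-tuning; the function names are hypothetical and no networks are trained here.

```python
import math


def adversarial_value(d_on_targets, d_on_generated, eps=1e-12):
    """Empirical value of a GAN-style objective on one mini-batch.

    d_on_targets: D-net outputs on target data (synthesised data when pre-training,
    real intrusions when fine-tuning). d_on_generated: D-net outputs on G-net outputs.
    D-net ascends this value and G-net descends it; eps guards against log(0).
    """
    target_term = sum(math.log(max(p, eps)) for p in d_on_targets) / len(d_on_targets)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_on_generated) / len(d_on_generated)
    return target_term + fake_term


def generator_input(stage, synthesised, noise):
    """G-net input: noise alone when pre-training, synthesised data plus noise when fine-tuning."""
    if stage == "pretrain":
        return list(noise)
    return [s + z for s, z in zip(synthesised, noise)]


print(adversarial_value([0.5, 0.5], [0.5]))             # 2 * ln(0.5) = -ln(4)
print(generator_input("finetune", [1.0, 2.0], [0.5, -0.5]))
```

When D-net cannot tell the two sources apart and outputs 0.5 everywhere, the value is 2 ln 0.5 = -ln 4 ≈ -1.386, the equilibrium value of the standard GAN game.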

Input: Real intrusion samples $\mathbf{X}$, synthesised intrusion data $\tilde{\mathbf{X}}$, Gaussian random variables $\mathbf{Z}$, mini-batch size $b$, maximum iterations $I_{\mathrm{pre}}$ and $I_{\mathrm{ft}}$.
Output: Optimized weights $\theta_d$ and $\theta_g$ of the DGNNs.
for $i = 1, \dots, I_{\mathrm{pre}} + I_{\mathrm{ft}}$ do
    if $i \le I_{\mathrm{pre}}$ then
        for $k_D$ steps do
            Select $b$ samples from $\tilde{\mathbf{X}}$;
            Select $b$ samples from $\mathbf{Z}$;
            Compute gradient as Eq. (10);
            Update $\theta_d$ by Adam.
        end for
        for $k_G$ steps do
            Select $b$ samples from $\mathbf{Z}$;
            Compute gradient as Eq. (11);
            Update $\theta_g$ by Adam.
        end for
    else
        for $k_D$ steps do
            Select $b$ real intrusions from $\mathbf{X}$;
            Select $b$ samples from $\tilde{\mathbf{X}}$;
            Select $b$ samples from $\mathbf{Z}$;
            Compute gradient as Eq. (12);
            Update $\theta_d$ by Adam.
        end for
        for $k_G$ steps do
            Select $b$ samples from $\tilde{\mathbf{X}}$;
            Select $b$ samples from $\mathbf{Z}$;
            Compute gradient as Eq. (13);
            Update $\theta_g$ by Adam.
        end for
    end if
end for
Algorithm 2 Pre-train and fine-tune DGNNs in an adversarial manner with stochastic gradient optimization for augmenting synthesised network intrusion data
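The control flow of Algorithm 2 can be expressed as a small Python generator that emits one entry per optimization step. This sketches the schedule only, not the networks: the parameter names `i_pre`, `i_ft`, `k_d` and `k_g` (iterations per stage and inner steps per network) are hypothetical, and in a real implementation each step would draw b items from the listed sources, compute the indicated gradient and apply an Adam update.

```python
def dgnn_schedule(i_pre, i_ft, k_d=1, k_g=1):
    """Yield (iteration, stage, network, gradient equation, data sources) per update step."""
    for i in range(1, i_pre + i_ft + 1):
        if i <= i_pre:
            for _ in range(k_d):
                yield i, "pretrain", "D", 10, ("synthesised", "noise")
            for _ in range(k_g):
                yield i, "pretrain", "G", 11, ("noise",)
        else:
            for _ in range(k_d):
                yield i, "finetune", "D", 12, ("real", "synthesised", "noise")
            for _ in range(k_g):
                yield i, "finetune", "G", 13, ("synthesised", "noise")


for step in dgnn_schedule(i_pre=2, i_ft=1):
    print(step)
```

Keeping the schedule separate from the update code makes it easy to check that D-net and G-net alternate as intended and that real intrusions are only touched in the fine-tuning stage.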

III-E Optimization

In this subsection, we present the optimization details of pre-training and fine-tuning the DGNNs.

Let $\theta_d$ and $\theta_g$ denote the weights of D-net and G-net. The aim of training the DGNNs is to optimize the min-max objectives in Eq. (8) and Eq. (9) with respect to $\theta_d$ and $\theta_g$. Since the DGNNs are built from DNNs, they are trained by back-propagation with a stochastic gradient algorithm.

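In the pre-training stage, the gradients of $V_{\mathrm{pre}}$ with respect to $\theta_d$ and $\theta_g$ on a mini-batch of $b$ samples are:

$$\nabla_{\theta_d} \frac{1}{b} \sum_{j=1}^{b} \left[ \ln D\big(\tilde{\mathbf{x}}^{(j)}\big) + \ln\left(1 - D\big(G(\mathbf{z}^{(j)})\big)\right) \right], \tag{10}$$

$$\nabla_{\theta_g} \frac{1}{b} \sum_{j=1}^{b} \ln\left(1 - D\big(G(\mathbf{z}^{(j)})\big)\right). \tag{11}$$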

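In the fine-tuning stage, the gradients of $V_{\mathrm{ft}}$ with respect to $\theta_d$ and $\theta_g$ on a mini-batch of $b$ samples are:

$$\nabla_{\theta_d} \frac{1}{b} \sum_{j=1}^{b} \left[ \ln D\big(\mathbf{x}^{(j)}\big) + \ln\left(1 - D\big(G(\tilde{\mathbf{x}}^{(j)} + \mathbf{z}^{(j)})\big)\right) \right], \tag{12}$$

$$\nabla_{\theta_g} \frac{1}{b} \sum_{j=1}^{b} \ln\left(1 - D\big(G(\tilde{\mathbf{x}}^{(j)} + \mathbf{z}^{(j)})\big)\right). \tag{13}$$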

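After obtaining the gradients with respect to $\theta_d$ and $\theta_g$, the layer weights $\theta_d$ and $\theta_g$ are updated by stochastic gradient methods (ascending the objective for $\theta_d$ and descending it for $\theta_g$).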

Algorithm 2 shows the pseudo-code for training the DGNNs. During adversarial learning, we alternate between $k_D$ steps of optimizing D-net and $k_G$ steps of optimizing G-net. In each inner loop, the stochastic optimizer Adam [34] is employed to update $\theta_d$ and $\theta_g$.
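For readers unfamiliar with Adam [34], the scalar update below shows how it combines a momentum-style first-moment estimate with a second-moment estimate, both bias-corrected. It is a textbook sketch minimizing a toy quadratic, not the training code of this paper; the learning rate is illustrative, and the decay rates default to 0.9 and 0.99 to match the values reported in the hyper-parameter settings of Section IV (the commonly cited Adam defaults are 0.9 and 0.999).

```python
import math


def adam_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8, steps=2000):
    """Minimize a 1-D function with Adam, given a function returning its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment (momentum) estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias corrections for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy problem: f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

Because the update is normalized by the square root of the second moment, the first step moves the parameter by almost exactly `lr` regardless of the gradient's magnitude, which is why a small learning rate directly bounds the size of each weight change.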

IV Experiments

In this section, we conduct comprehensive experimental validations of the proposed DA enhanced NID framework on the KDD Cup 99 dataset [16]. Its performance in detecting network intrusions with small sample sizes (e.g., emerging attacks) is compared with learning based IDSs built on classical LR and SVM and on advanced DNNs.

IV-A Network Data

The KDD Cup 99 dataset [16] is a benchmark dataset widely used in NID studies. As outlined in Table I, it is composed of two training sets at different scales and one testing set. Network data are categorized into normal requests (NORMAL) and four major intrusion categories: denial-of-service (DOS), surveillance and other probing (PROBE), unauthorized access from a remote machine (R2L) and unauthorized access to root user (U2R). Given that the 100% training set only contains additional records of normal requests and high-frequency intrusions, the 10% training set is used in all experiments.

Category Training (100%) Training (10%) Testing
NORMAL 972781 97278 60593
DOS 3883370 391458 229853
PROBE 41102 4107 4166
U2R 52 52 228
R2L 1126 1126 16189
TABLE I: KDD Cup 99 Dataset (No. of samples in each category)
Category Intrusion Type Training (10%) Testing
DOS ’apache2’ 0 794
’mailbomb’ 0 5000
’processtable’ 0 759
PROBE ’mscan’ 0 1053
’saint’ 0 736
R2L ’guesspasswd’ 53 4367
’snmpgetattack’ 0 7741
’snmpguess’ 0 2406
TABLE II: No. of training samples in the subcategories that have severe scarcity and imbalance problems in the KDD Cup 99 Dataset

Each record of the KDD Cup 99 dataset consists of 38 digital features and 4 character features. Assuming that the digital features provide adequate information for identification, the character features are not used in the experiments.

Considering the task of NID with small sample sizes, the selected network intrusions are expected to satisfy the prerequisites that they have limited records in the training set but plentiful records in the testing set. Accordingly, 8 types of intrusions are selected (as shown in Table II): apache2, mailbomb and processtable of the DOS category; mscan and saint of the PROBE category; and guesspasswd, snmpgetattack and snmpguess of the R2L category. In the experiments, if no training data are available for an intrusion type, related testing intrusion samples are selected to complement the training set before data augmentation, and these samples are then excluded from testing. Although training DNNs typically requires a large amount of labelled data, the number of real intrusion samples $n$ is set to 50 to meet the above prerequisites.

IV-B Evaluation Metrics

The metrics that are used to measure NID results of IDSs are listed below:

  • True Positive (TP): Network intrusions (or normal network requests) that are correctly detected.

  • True Negative (TN): Normal network requests (or network intrusions) that are correctly detected.

  • False Positive (FP): Normal requests that are mis-classified as intrusions.

  • False Negative (FN): Intrusions that are mis-classified as normal requests.

Then, the accuracy, precision, recall, and F1-score [3] [8] are computed to evaluate the different IDSs, where larger values indicate better detection performance on network data.
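The four metrics follow directly from the confusion-matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The Python sketch below (hypothetical function name) returns them in percent and returns `None` where a ratio is undefined, for example precision when no sample is predicted positive.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score (percent) from confusion-matrix counts."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100.0 * tp / (tp + fp) if tp + fp else None  # undefined without positive predictions
    recall = 100.0 * tp / (tp + fn) if tp + fn else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


print(detection_metrics(tp=90, tn=880, fp=10, fn=20))
print(detection_metrics(tp=0, tn=100, fp=0, fn=10))  # a detector that flags nothing
```

The second call shows why accuracy alone can be misleading: a detector that never raises an alarm still scores about 90.9% accuracy while its recall is zero.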

IV-C Parameters

The detailed parameter setup and tuning strategies are provided as follows:

IV-C1 Threshold of Gibbs sampler

The cut-off threshold $T$ is set to 500 to ensure the convergence of PGM. Owing to the efficiency of Gibbs sampling, $T$ can be large while the computational cost remains affordable.

IV-C2 Structure of DGNNs

In D-net, the hidden nodes are set to 70, 50, 40, and 20. The ReLU activation function is used after each non-terminal layer, while the sigmoid activation function is applied to the last layer to produce the decision probability. In G-net, the hidden nodes are set to 40, 30, and 20. These three hidden layers are sequentially connected by ReLU and sigmoid functions, and the last hidden layer is linearly mapped to the output layer. In this configuration, G-net has fewer hidden nodes than D-net, which improves the inference capacity of D-net and helps ensure that an effective G-net is trained. In addition, dropout [35] is employed in all hidden layers to regularize the training process and decrease the risk of over-fitting.
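The layer widths above can be checked with a small Python sketch that lists the fully connected layer shapes, counts weights and biases, and runs a D-net forward pass with ReLU hidden layers and a sigmoid output. It assumes an input dimension of d = 38 (the digital KDD Cup 99 features used in the experiments), uniform random initialization and no dropout; the names are hypothetical, and G-net's hidden activations are left out because the text does not pin down which layers use ReLU and which use sigmoid.

```python
import math
import random

D_FEATURES = 38  # assumption: the 38 digital KDD Cup 99 features form the input vector
D_NET = [D_FEATURES, 70, 50, 40, 20, 1]       # D-net: hidden widths 70-50-40-20, scalar output
G_NET = [D_FEATURES, 40, 30, 20, D_FEATURES]  # G-net: hidden widths 40-30-20, d-dim output


def layer_shapes(sizes):
    """(fan_in, fan_out) pairs of the fully connected layers."""
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in layer_shapes(sizes))


def init_weights(sizes, seed=0):
    """Uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)) weights and zero biases (illustrative choice)."""
    rng = random.Random(seed)
    weights = []
    for fan_in, fan_out in layer_shapes(sizes):
        bound = 1.0 / math.sqrt(fan_in)
        w = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
        weights.append((w, [0.0] * fan_out))
    return weights


def d_net_forward(x, weights):
    """ReLU after every hidden layer, sigmoid on the final layer to give a probability."""
    for idx, (w, b) in enumerate(weights):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]
        if idx == len(weights) - 1:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            x = [max(0.0, v) for v in z]
    return x[0]


print(count_parameters(D_NET), count_parameters(G_NET))  # 9161 4208
print(d_net_forward([0.5] * D_FEATURES, init_weights(D_NET)))
```

Under these assumptions D-net has 9,161 parameters and G-net 4,208, consistent with the description of G-net as the smaller network.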

IV-C3 Mini-batch size of DGNNs

The mini-batch size $b$ is constrained to be less than the smaller of the numbers of real and synthesised intrusion data (i.e., $b < \min(n, m)$). In our experiments, $b$ is set to 20% to 40% of the number of real intrusion samples (i.e., $0.2n \le b \le 0.4n$). This allows the DGNNs to be trained efficiently and effectively.

IV-C4 Training iterations of DGNNs

The maximum numbers of iterations in the pre-training and fine-tuning stages are empirically set to be and , respectively. Since known network intrusions are limited, fewer iterations are suggested in the fine-tuning stage to avoid over-fitting.

IV-C5 Other hyper-parameters of DGNNs

The learning rates of D-net and G-net are both empirically set to be for the purpose of adjusting the weights of the DNNs with respect to a low ratio of loss gradients. This ensures that the DNNs are trained robustly and avoids missing the optimum. However, it might result in slow convergence. Thus, momentum [36] is adopted to accelerate training and prevent gradient oscillations. The momentum parameters of the Adam optimizer [34] are set to 0.9 and 0.99.

IV-D Binary Classification based NID

Intrusion Type NID Model Name Accuracy Precision Recall F1-Score
’apache2’ NID-LR 98.51 0.42 0.00 0.00 -
NID-DA-LR 99.53 0.05 77.29 1.21 90.70 3.41 83.45 2.14
NID-SVM 98.97 0.02 55.73 0.56 99.32 0.38 71.39 0.49
NID-DA-SVM 99.94 0.01 95.61 0.87 99.70 0.07 97.61 0.44
’mailbomb’ NID-LR 91.93 0.44 0.00 0.00 -
NID-DA-LR 97.84 0.41 78.11 3.37 99.89 0.14 87.63 2.08
NID-SVM 93.04 0.03 80.38 1.58 11.53 0.33 20.16 0.50
NID-DA-SVM 99.34 0.34 92.29 3.65 99.78 0.16 95.86 2.05
’processtable’ NID-LR 99.33 0.05 64.98 1.78 98.74 2.03 78.37 1.72
NID-DA-LR 99.53 0.04 72.46 1.81 100.00 0.00 84.02 1.21
NID-SVM 99.87 0.06 90.79 3.86 99.60 0.56 94.96 2.14
NID-DA-SVM 99.90 0.04 92.60 2.17 99.71 0.15 96.02 1.22
’mscan’ NID-LR 97.78 0.11 42.58 1.35 84.90 1.48 56.71 1.42
NID-DA-LR 99.01 0.06 64.81 1.48 92.29 0.46 76.14 1.14
NID-SVM 99.61 0.12 85.10 4.34 94.11 0.18 89.32 2.73
NID-DA-SVM 99.73 0.06 90.03 4.12 95.12 1.04 92.44 1.65
’saint’ NID-LR 98.22 0.22 40.17 2.89 96.47 1.45 56.65 2.81
NID-DA-LR 98.47 0.04 43.78 0.75 96.93 0.65 60.32 0.83
NID-SVM 98.56 0.03 45.39 0.50 96.41 0.75 61.72 0.46
NID-DA-SVM 98.60 0.01 46.08 0.24 97.47 0.33 62.58 0.27
’guesspasswd’ NID-LR 88.59 0.75 34.21 1.50 75.10 2.87 46.98 1.44
NID-DA-LR 89.07 0.24 35.65 0.59 77.74 0.19 48.89 0.57
NID-SVM 94.59 0.27 89.06 1.93 22.21 4.10 35.42 5.19
NID-DA-SVM 98.95 0.10 90.57 2.17 94.29 0.22 92.38 1.19
’snmpgetattack’ NID-LR 88.67 0.00 0.00 0.00 -
NID-DA-LR 80.42 0.58 36.61 0.69 99.43 0.07 53.51 0.72
NID-SVM 88.65 0.02 0.00 0.00 -
NID-DA-SVM 82.42 0.03 39.13 0.03 99.39 0.21 56.15 0.04
’snmpguess’ NID-LR 98.84 0.15 78.61 2.51 95.93 0.05 86.39 1.55
NID-DA-LR 99.07 0.06 82.66 1.17 95.84 0.00 88.76 0.68
NID-SVM 96.18 0.00 0.00 0.00 -
NID-DA-SVM 81.20 0.04 16.85 0.02 99.72 0.10 28.83 0.03
TABLE III: ML based 2-class NID with LR and SVM (Mean ± Std-Dev, %)

In this subsection, we evaluate binary classification performance of the DA enhanced NID framework. In each group of the experiments, it is required to identify positive samples (i.e., one type of network intrusions) from negative samples (i.e., normal network requests).

As illustrated in Fig. 1, a subset of 6,000 randomly selected normal requests and 50 intrusion samples (augmented to 500 by the DA module) are used to train the ML based NID models. Here, LR and SVM are adopted since they are basic building blocks of many learning based IDSs [13] [3].

The training and testing procedures are repeated 15 times, each time with a different random selection of negative samples, and anomalous results are rejected. Afterwards, the statistical averages of the evaluation metrics are computed for validation. In this way, the adverse impact of data imbalance (i.e., training with all negative samples) is further reduced. It also avoids training classifiers on biased data, i.e., on a single selection of negative data that is not representative.
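Aggregating the repeated runs into the mean and standard deviation reported in the tables can be sketched with Python's `statistics` module. The sketch assumes the sample (n - 1) standard deviation and does not model the rejection of anomalous runs, whose criterion is not specified; the function name is hypothetical.

```python
import statistics


def summarize_runs(runs):
    """Mean and sample standard deviation of each metric across repeated runs.

    runs: list of dicts, one per run, mapping a metric name to its value in percent.
    """
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


runs = [
    {"accuracy": 98.0, "recall": 90.0},
    {"accuracy": 99.0, "recall": 90.0},
    {"accuracy": 100.0, "recall": 90.0},
]
print(summarize_runs(runs))
```

`statistics.stdev` needs at least two runs; with the 15 repetitions used here that condition is always met.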

Table III summarizes the binary classification results. The DA enhanced NID frameworks (named NID-DA-LR/SVM) achieve improved or comparable recall, and significantly outperform the other IDSs (named NID-LR/SVM) in terms of precision and F1-score. These results show that the intrusions augmented by the proposed DA module can be used to train potent classification models for NID tasks.
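The four metrics in Table III follow the usual confusion-matrix definitions, with the intrusion type as the positive class. The minimal Python sketch below (function name hypothetical) returns `None` for the F1-score when precision and recall are both zero, which is one plausible reading of the "-" entries reported for NID-LR/SVM on snmpgetattack and for NID-SVM on snmpguess.

```python
from typing import Optional


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, Optional[float]]:
    """Accuracy, precision, recall and F1 in percent, intrusions as positives."""
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total
    # Convention (assumed): precision is 0 when nothing is predicted positive.
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    # F1 as the harmonic mean of precision and recall; this form is 0/0,
    # hence reported as None, when no intrusion is detected correctly.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else None)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# A model that never flags an intrusion: high accuracy, undefined F1.
print(binary_metrics(tp=0, fp=0, tn=887, fn=113))
```

Because F1 ignores true negatives, it is not inflated by the large number of normal requests, which is why it is the more telling metric here.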

Note that the proposed frameworks improve accuracy on most intrusion types; the exceptions are the snmpgetattack and snmpguess attacks of the R2L category. There are two reasons. First, computing accuracy involves counting the normal network requests, which occupy a large proportion of the testing set (see Table I and II); accuracy therefore changes only by a small margin when additional intrusions are detected. Second, the zero precision and recall of NID-LR/SVM on snmpgetattack and of NID-SVM on snmpguess indicate that these baselines misclassify the intrusions, so their higher accuracy comes almost entirely from the normal requests. Accordingly, they are not applicable to detecting intrusions in the small sample scenario.
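The first reason can be checked with a small worked example. The minimal Python sketch below uses hypothetical test-set counts (not the counts of Table I and II): a degenerate classifier that labels every request as normal scores an accuracy equal to the share of normal requests, and detecting 10% of the intrusions moves accuracy by only one percentage point. This is consistent with NID-LR reaching 88.67% accuracy on snmpgetattack while its precision is 0.00.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Percentage of predictions matching the labels (1 = intrusion, 0 = normal)."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Hypothetical test set: 9,000 normal requests and 1,000 intrusions.
y_true = [0] * 9_000 + [1] * 1_000

# A degenerate model that labels everything normal detects no intrusion,
# yet its accuracy equals the share of normal requests.
all_normal = [0] * len(y_true)
acc_all_normal = accuracy(y_true, all_normal)

# Detecting 100 of the 1,000 intrusions raises accuracy by just 1 point.
better = [0] * 9_000 + [1] * 100 + [0] * 900
acc_better = accuracy(y_true, better)
```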

IV-E Multiple Classification based NID

Category | NID Model Name | Accuracy | Precision | Recall | F1-Score
NORMAL | NID-SVM | 76.30 ± 0.01 | 75.91 ± 0.01 | 98.69 ± 0.02 | 85.81 ± 0.01
NORMAL | NID-PGM-SVM | 76.16 ± 0.08 | 75.90 ± 0.03 | 98.41 ± 0.11 | 85.70 ± 0.06
NORMAL | NID-DA-SVM | 82.87 ± 0.17 | 98.39 ± 0.20 | 77.68 ± 0.32 | 86.82 ± 0.16
DOS | NID-SVM | 94.00 ± 0.01 | 93.35 ± 1.15 | 25.34 ± 0.16 | 39.86 ± 0.10
DOS | NID-PGM-SVM | 93.84 ± 0.10 | 86.98 ± 2.63 | 25.33 ± 0.49 | 39.23 ± 0.80
DOS | NID-DA-SVM | 99.48 ± 0.05 | 94.09 ± 0.58 | 99.70 ± 0.06 | 96.81 ± 0.29
PROBE | NID-SVM | 98.82 ± 0.02 | 68.54 ± 0.43 | 83.27 ± 0.53 | 75.19 ± 0.37
PROBE | NID-PGM-SVM | 98.80 ± 0.02 | 67.39 ± 0.31 | 85.89 ± 0.38 | 75.53 ± 0.30
PROBE | NID-DA-SVM | 99.32 ± 0.04 | 88.70 ± 0.88 | 78.13 ± 1.85 | 83.07 ± 1.00
R2L | NID-SVM | 83.43 ± 0.01 | 97.91 ± 0.64 | 4.83 ± 0.02 | 9.20 ± 0.04
R2L | NID-PGM-SVM | 83.34 ± 0.05 | 94.15 ± 5.15 | 4.51 ± 0.02 | 8.60 ± 0.04
R2L | NID-DA-SVM | 83.74 ± 0.19 | 51.75 ± 0.31 | 96.57 ± 0.66 | 67.38 ± 0.23
TABLE IV: ML based 4-class NID with SVM (mean ± std-dev, in percent)
Category | NID Model Name | Accuracy | Precision | Recall | F1-Score
NORMAL | NID-DNN | 77.15 ± 3.22 | 76.5 ± 2.81 | 99.15 ± 0.51 | 86.33 ± 1.64
NORMAL | NID-PGM-DNN | 83.17 ± 1.40 | 81.58 ± 1.31 | 99.26 ± 0.37 | 89.55 ± 0.76
NORMAL | NID-DA-DNN | 87.96 ± 0.12 | 86.95 ± 0.28 | 98.15 ± 0.61 | 92.21 ± 0.11
DOS | NID-DNN | 93.81 ± 2.94 | 89.70 ± 6.58 | 23.55 ± 9.17 | 27.72 ± 7.37
DOS | NID-PGM-DNN | 99.04 ± 0.22 | 97.84 ± 3.73 | 89.97 ± 0.98 | 93.7 ± 1.34
DOS | NID-DA-DNN | 99.59 ± 0.08 | 97.14 ± 1.36 | 97.65 ± 0.40 | 97.39 ± 0.52
PROBE | NID-DNN | 98.92 ± 0.31 | 74.54 ± 1.93 | 81.49 ± 8.26 | 76.79 ± 3.29
PROBE | NID-PGM-DNN | 98.40 ± 0.07 | 64.93 ± 1.90 | 55.84 ± 7.72 | 59.78 ± 3.86
PROBE | NID-DA-DNN | 99.05 ± 0.07 | 75.27 ± 1.24 | 83.47 ± 4.52 | 79.11 ± 2.17
R2L | NID-DNN | 83.90 ± 2.72 | 58.99 ± 3.87 | 7.55 ± 5.81 | 11.28 ± 3.13
R2L | NID-PGM-DNN | 84.90 ± 1.26 | 88.45 ± 5.67 | 13.92 ± 7.48 | 23.60 ± 2.57
R2L | NID-DA-DNN | 89.09 ± 0.10 | 92.17 ± 4.86 | 40.97 ± 1.82 | 56.64 ± 0.92
TABLE V: DL based 4-class NID with DNN (mean ± std-dev, in percent)

To verify how well the DA module enhances existing learning based IDSs, we have undertaken multi-class NID experiments. In this case, the network data are categorized as NORMAL, DOS, PROBE, and R2L, which together cover the aforementioned 8 intrusion types.

For the learning based IDSs, SVM and a typical DNN architecture presented in [8], [37] are chosen as the supervised NID models. When training the DNN, the batch size is initialized to 32 and the learning rate is set adaptively. Furthermore, dropout [35] and cross-validation are employed to avoid over-fitting.
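Two of the DNN training details above, mini-batches of 32 samples and dropout [35], can be illustrated with a dependency-free Python sketch. It shows only mini-batch shuffling and inverted dropout on one layer's activations; the dropout rate of 0.5 and the helper names are assumptions, and the adaptive learning-rate rule (an optimizer such as Adam [34] is one option) is not reproduced.

```python
import random
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def minibatches(data: Sequence[T], batch_size: int = 32,
                rng: Optional[random.Random] = None) -> Iterator[list[T]]:
    """Yield the data in shuffled mini-batches; the last batch may be smaller."""
    if rng is None:
        rng = random.Random()
    order = list(range(len(data)))
    rng.shuffle(order)  # a fresh ordering for each epoch
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


def inverted_dropout(activations: list[float], rate: float, rng: random.Random,
                     training: bool = True) -> list[float]:
    """Drop each unit with probability `rate` and scale survivors by 1/(1 - rate),
    so expected activations match test time, when dropout is disabled."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
samples = list(range(100))                               # stand-in for 100 records
sizes = [len(b) for b in minibatches(samples, 32, rng)]  # 32, 32, 32, 4
hidden = inverted_dropout([1.0] * 8, rate=0.5, rng=rng)  # each entry 0.0 or 2.0
```

Rescaling during training (rather than at test time) lets the trained network be used for classification without any dropout-specific adjustment.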

The evaluation stage is repeated 15 times. Each time, 6,000 and 12,000 normal requests are randomly selected for training SVM and DNN, respectively. In addition, 50 fixed samples of each intrusion type are separately augmented to 500, either by PGM alone or by the PGM-DGNNs based DA module. Overall, the augmented training set contains 4,000 intrusion samples (8 types × 500). At the completion of training, deviant results (e.g., an extremely low F1-score) are rejected and the average metrics are computed for the follow-up analysis.
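The training set sizes implied by this protocol can be tallied directly. The Python sketch below only restates the arithmetic in the text (8 intrusion types, each augmented from 50 to 500 samples, plus 6,000 or 12,000 normal requests); the helper name `training_set_size` is hypothetical.

```python
N_TYPES = 8     # intrusion types covered by DOS, PROBE and R2L (from the text)
PER_TYPE = 500  # samples per type after augmentation from 50 (from the text)


def training_set_size(n_normal: int) -> tuple[int, int, float]:
    """Hypothetical helper: (intrusion samples, total samples, intrusion share %)."""
    n_intrusion = N_TYPES * PER_TYPE
    total = n_normal + n_intrusion
    return n_intrusion, total, 100.0 * n_intrusion / total


for model, n_normal in [("SVM", 6_000), ("DNN", 12_000)]:
    n_intr, total, share = training_set_size(n_normal)
    print(f"{model}: {n_intr} intrusions + {n_normal} normal = {total} "
          f"({share:.0f}% intrusions)")
```

Intrusions thus make up two fifths of the SVM training set and a quarter of the DNN training set.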

Table IV and Table V present the comparison results of multi-class NID using SVM and DNN, respectively. The PGM-DGNNs enhanced NID frameworks (named NID-DA-SVM/DNN) outperform the other two NID models in both accuracy and F1-score on all four categories.

For DNN based NID, the precision and recall obtained by the proposed framework are better than, or comparable to, those obtained by the other NID models. For SVM based NID, the proposed framework achieves the highest precision and recall on DOS, whereas on the other three categories it leads in only one of the two metrics (precision on NORMAL and PROBE, recall on R2L). The reason is that the network intrusion data produced by the generator network are strongly shaped by the DNN implemented discriminator network. As a result, the augmented intrusion features are better suited to DNN based NID models.

Although the PGM enhanced IDSs improve some evaluation metrics, they perform poorly in identifying certain network intrusions (e.g., with DNN, NID-PGM-DNN reaches an F1-score of only 59.78 on PROBE, below the 76.79 of NID-DNN). In contrast, the PGM-DGNNs aided NID frameworks are able to accurately detect intrusions from small and imbalanced samples.

V Discussion

The DA enhanced NID framework proposed in this paper is trained in a data-driven manner. Its performance depends closely on the quality of the augmented network intrusions, which in turn depends on the quantity and quality of the known samples. Therefore, the selected network intrusion samples should represent the underlying characteristics of the relevant intrusion types.

The proposed NID framework presents the insight of using deep adversarial learning to augment limited network intrusion samples. Both adversarial networks of the DGNNs are formulated as generic fully connected DNNs; more advanced network architectures [28] could be employed to design the DGNNs.
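As a rough picture of what a fully connected DNN means in this context, the Python sketch below runs a forward pass through a small stack of dense layers, with ReLU on the hidden layers and a sigmoid on the output, mapping a noise vector to a synthetic feature vector in [0, 1]. The layer sizes, activations, random initialisation and the 41-dimensional output (chosen to match the 41 raw KDD Cup 99 features) are illustrative assumptions, not the authors' configuration, and no adversarial training is shown.

```python
import math
import random

# (weights[out][in], biases[out]) for one dense layer
Layer = tuple[list[list[float]], list[float]]


def dense_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Randomly initialised fully connected layer (Gaussian weights, zero biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: list[float], layers: list[Layer]) -> list[float]:
    """Apply each layer: ReLU on hidden layers, sigmoid on the output layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        x = ([1.0 / (1.0 + math.exp(-t)) for t in z] if is_output
             else [max(0.0, t) for t in z])
    return x


rng = random.Random(0)
sizes = [16, 64, 64, 41]  # assumed: noise dim -> hidden -> hidden -> feature dim
net = [dense_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]  # e.g. a generator
noise = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]
sample = forward(noise, net)
```

In adversarial learning, the weights of such a network would be updated against a second network of the same kind; swapping in a different architecture only changes `dense_layer` and `forward`.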

The computational cost of the proposed NID framework can be considered in two phases. In the testing phase, classifying over 80,000 network records takes a few seconds even when the DNN is applied. In the training phase, the time consumption breaks down as follows. First, PGM simulates synthesised intrusion data within minutes by using MCMC based Gibbs sampling. Second, the DGNNs are efficiently optimized on high performance computing devices; in our experiments, training the DGNNs on an NVIDIA GTX 1080 GPU takes less than two hours. Third, training the supervised classifiers (i.e., SVM and DNN) requires less than half an hour. Despite this training time, the framework identifies network intrusions efficiently once trained, which is essential in network security.

The proposed DA module can also be applied to assist NID algorithms implemented on distributed platforms. Specifically, the imbalanced training set is first augmented with the proposed DA module in a data center. The augmented dataset can then be partitioned into several data blocks in which normal network request data and network intrusion samples have comparable proportions. Finally, these balanced data blocks are delivered to different computing nodes to train NID models in a distributed manner.
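One possible reading of this partitioning step is sketched below in Python: the normal requests, which still outnumber the augmented intrusions, are split into blocks of roughly the size of the intrusion set, and each block is paired with all intrusion samples, so every computing node receives a roughly balanced subset. The pairing strategy and the function name `balanced_blocks` are assumptions; the text does not specify how the blocks are formed.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def balanced_blocks(normal: Sequence[T], intrusions: Sequence[T],
                    rng: Optional[random.Random] = None) -> list[list[tuple[T, int]]]:
    """Hypothetical partitioner: split normal data into chunks about the size of
    the intrusion set and pair each chunk with every intrusion sample.
    Labels: 0 = normal request, 1 = intrusion."""
    if not intrusions:
        raise ValueError("need at least one intrusion sample")
    if rng is None:
        rng = random.Random()
    shuffled = list(normal)
    rng.shuffle(shuffled)
    n_blocks = max(1, round(len(shuffled) / len(intrusions)))
    blocks = []
    for k in range(n_blocks):
        chunk = shuffled[k::n_blocks]  # round-robin keeps chunk sizes within 1
        block = [(x, 0) for x in chunk] + [(x, 1) for x in intrusions]
        rng.shuffle(block)
        blocks.append(block)
    return blocks


normal = list(range(6_000))              # stand-in for 6,000 normal requests
intrusions = list(range(6_000, 6_500))   # stand-in for 500 augmented intrusions
blocks = balanced_blocks(normal, intrusions, random.Random(0))
```

With 6,000 normal requests and 500 augmented intrusions, this yields 12 blocks of 500 normal and 500 intrusion samples each. The intrusion samples are replicated on every node, which keeps each block balanced at the cost of extra transfer.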

VI Conclusion

In this paper, a general NID framework and two learning based data augmentation components have been jointly proposed to tackle the data scarcity and data imbalance problems in designing learning based IDSs. In this framework, the statistical learning based PGM and the deep learning based DGNNs of the DA module are developed to enlarge the limited intrusion samples in the training set. By employing classical ML models (e.g., LR and SVM) as well as advanced DNNs, the framework can accurately classify normal network requests and heterogeneous network intrusions. Extensive experimental validations have been conducted on the KDD Cup 99 dataset. Both binary and multiple classification results have shown that the DA enhanced IDSs outperform the others in terms of F1-score (a crucial criterion for evaluating imbalanced classification tasks). Additionally, the framework achieves improved or comparable accuracy, precision and recall, especially when DNN is adopted for classifying network data.

References

  • [1] S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Appl. Soft. Comput., vol. 10, no. 1, pp. 1–35, 2010.
  • [2] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. EAI Int. Conf. Bio-inspired Inf. Commun. Technol., 2016, pp. 21–26.
  • [3] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Top. Comput. Intell., vol. 2, no. 1, pp. 41–50, 2018.
  • [4] C. Huang, G. Min, Y. Wu, Y. Ying, K. Pei, and Z. Xiang, “Time series anomaly detection for trustworthy services in cloud computing systems,” IEEE Trans. Big Data, pp. 1–13, 2017.
  • [5] I. Nevat, D. M. Divakaran, S. G. Nagarajan, P. Zhang, L. Su, L. Ling Ko, and V. L. Thing, “Anomaly detection and attribution in networks with temporally correlated traffic,” IEEE/ACM Trans. Netw., vol. 26, no. 1, pp. 131–144, 2018.
  • [6] G. Y. Keung, B. Li, and Q. Zhang, “The intrusion detection in mobile sensor network,” IEEE/ACM Trans. Netw., vol. 20, no. 4, pp. 1152–1161, 2012.
  • [7] H. Moosavi and F. M. Bui, “A game-theoretic framework for robust optimal intrusion detection in wireless sensor networks,” IEEE Trans. Inf. Forensic Secur., vol. 9, no. 9, pp. 1367–1379, 2014.
  • [8] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for network intrusion detection in software defined networking,” in Proc. Int. Conf. Wirel. Netw. Mob. Commun., 2016, pp. 258–263.
  • [9] Y. Xu, Z. Liu, Z. Zhang, and H. J. Chao, “High-throughput and memory-efficient multimatch packet classification based on distributed and pipelined hash tables,” IEEE/ACM Trans. Netw., vol. 22, no. 3, pp. 982–995, 2014.
  • [10] A. X. Liu and E. Torng, “Overlay automata and algorithms for fast and scalable regular expression matching,” IEEE/ACM Trans. Netw., vol. 24, no. 4, 2016.
  • [11] J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust network traffic classification,” IEEE/ACM Trans. Netw., vol. 23, no. 4, pp. 1257–1270, 2015.
  • [12] N. Farnaaz and M. A. Jabbar, “Random forest modeling for network intrusion detection system,” Procedia Comput. Sci., vol. 89, pp. 213–217, 2016.
  • [13] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an intrusion detection system using a filter-based feature selection algorithm,” IEEE Trans. Comput., vol. 65, no. 10, pp. 2986–2998, 2016.
  • [14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Int. Conf. Learn. Represent., 2015.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 770–778.
  • [16] “KDD Cup 99 Dataset,” http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 1999.
  • [17] R. A. R. Ashfaq, X. Wang, J. Z. Huang, H. Abbas, and Y. He, “Fuzziness based semi-supervised learning approach for intrusion detection system,” Inf. Sci., vol. 378, pp. 484–497, 2017.
  • [18] X. Zhuo, J. Zhang, and S. W. Son, “Network intrusion detection using word embeddings,” in Proc. IEEE Int. Conf. Big Data, 2017, pp. 4686–4695.
  • [19] J. Kevric, S. Jukic, and A. Subasi, “An effective combining classifier approach using tree algorithms for network intrusion detection,” Neural Comput. Appl., vol. 28, no. 1, pp. 1051–1058, 2017.
  • [20] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Mach. Learn., vol. 50, no. 1-2, pp. 5–43, 2003.
  • [21] N. Metropolis and S. Ulam, “The Monte Carlo method,” J. Am. Stat. Assoc., vol. 44, no. 247, pp. 335–341, 1949.
  • [22] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 5-6, pp. 721–741, 1984.
  • [23] K. P. Murphy, Machine learning: A probabilistic perspective, The MIT Press., 2012.
  • [24] G. C. Wei and M. A. Tanner, “A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms,” J. Am. Stat. Assoc., vol. 85, no. 411, pp. 699–704, 1990.
  • [25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
  • [26] H. Zhang, C. Luo, X. Yu, and P. Ren, “MCMC based generative adversarial networks for handwritten numeral augmentation,” in Proc. Int. Conf. Commun. Signal Process. Syst., 2017, pp. 2702–2710.
  • [27] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, “Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 82–90.
  • [28] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
  • [29] M. Arjovsky and L. Bottou, “Towards principled methods for training generative adversarial networks,” in Int. Conf. Learn. Represent., 2017.
  • [30] “NSL-KDD Dataset,” http://iscx.ca/NSL-KDD/, 2009.
  • [31] M. Lefebvre, Applied stochastic processes, Springer Science and Business Media, 2007.
  • [32] W. R. Gilks, S. Richardson, and D. Spiegelhalter, Markov chain Monte Carlo in practice, CRC Press., 1995.
  • [33] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, John Wiley and Sons, 2012.
  • [34] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. Learn. Represent., 2015.
  • [35] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
  • [36] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. Int. Conf. Mach. Learn., 2013, pp. 1139–1147.
  • [37] R. Vinayakumar, K. P. Soman, and P. Poornachandran, “Applying convolutional neural network for network intrusion detection,” in Proc. Int. Conf. Adv. in Comput. Commun. Inform., 2017, pp. 1222–1228.