I Introduction
Network intrusion detection (NID) has been one of the key components of network security in the past decades [1], [2], [3]. Intrusion detection systems (IDSs) play a significant role in NID by deterring cyber attacks and threats coming from a vast number of network-connected devices (e.g., computing systems [4], [5], sensor networks [6], [7], and software defined networks [8]). With the increasing usage of networking systems and the frequent emergence of varied intrusion attacks, a new generation of high-performance IDSs needs to be developed to accurately detect not only a series of high-frequency network intrusions, such as neptune and smurf attacks, but also intrusions (e.g., mailbomb and snmpguess attacks) that have limited known records.
IDSs can be generally divided into two categories. The first group focuses on patterns/signatures of network packets/traffic; those systems identify network intrusions using rule-based matching [9], [10]. The second group applies machine learning (ML) based approaches, such as supervised and/or semi-supervised learning, and trains NID models on a collection of labeled and/or unlabeled network data. Those learning based IDSs are reported to have a high detection rate and fast processing speed [3]. They are gaining increased significance alongside the surge of network traffic and heterogeneous types of network attacks. The performance of learning based IDSs [11], [12] depends significantly on network data features and the choice of relevant classifiers (e.g., support vector machine (SVM) and logistic regression (LR)), which are obtained in a data-driven manner. In this case, defects of the training set, such as feature redundancy and data imbalance, become the bottleneck of developing accurate IDSs [3], [13].
In the past decade, deep learning (DL) models, e.g., deep neural networks (DNNs), have revolutionized classical ML on supervised classification tasks [14], [15]. In the community of network security, DL based IDSs [3], [8] show advanced NID performance because hierarchically structured DNNs extract representative features of network data. Unfortunately, the weights of DNNs need to be optimized on a large dataset. Therefore, DL based IDSs cannot accurately detect network intrusions if the related training data are insufficient and imbalanced [3].
The challenges encountered in learning based IDSs can be formulated as data scarcity and data imbalance. One highly promising solution to those problems is to increase the number of related data samples in the training set. However, labeling large datasets is expensive, time-consuming, and sometimes impossible due to emerging and fast evolving intrusion attacks. In addition, adequate records of different types of intrusions might be unavailable, which makes those problems particularly severe. Thus, developing a data augmentation (DA) enhanced NID framework is crucial for network security.
To tackle the aforementioned challenges, this paper presents a novel NID framework that is able to utilize supervised classification models for identifying normal network requests and high-frequency cyber attacks. More importantly, we propose a DA module that incorporates deep adversarial learning and statistical learning techniques, which allows the NID framework to detect network intrusions in small sample scenarios. Experimental results on the KDD Cup 99 dataset [16] show that it can effectively identify intrusions, in particular emerging ones, when limited training data are provided.
The main contributions of this paper are summarized as follows:

We propose a general learning based NID framework focusing on detecting network intrusions from small samples. By exploiting DA and advanced classification methods, it is capable of identifying a variety of network intrusions, especially emerging ones.

We develop a novel DA module that addresses the scarcity and the imbalance of the training set in a data-driven manner. It involves a probabilistic generative model for estimating network data feature distributions and generating synthesised data using Monte Carlo methods. Furthermore, we pre-train and fine-tune deep generative neural networks in an adversarial learning scheme to augment synthesised intrusion data with high quality.

Extensive experiments on classifying small sample intrusions and normal network requests are performed. Compared with existing learning based IDSs, the DA enhanced NID framework achieves better or comparable accuracy, precision, recall, and F1-score.
The remainder of this paper is organized as follows: Section II introduces related work and motivations. Section III presents the proposed NID framework and the learning based DA module. Section IV reports the evaluation, analysis, and comparison of experimental results. Section V provides further discussion of our work. Section VI concludes this paper.
II Related Work
In this section, we first introduce recent progress in learning based IDSs. We then present the preliminaries of DA methods and the motivations of this paper.
II-A Learning based Network Intrusion Detection
Existing learning based IDSs utilize ML and DL models to distinguish different types of network data. In the category of recent ML based IDSs, Zhang et al. [11] combined unsupervised clustering and supervised learning for robust network traffic classification. Ashfaq et al. [17] proposed a semi-supervised fuzzy method for intrusion detection. Those methods establish NID models by learning knowledge from a large amount of unlabeled network data. However, the performance of their methods in detecting intrusions with small sample sizes remains unknown if abundant data are unavailable. Apart from those methods, supervised classification models such as LR, SVM, and random forest (RF) have been extensively applied to improve modern IDSs [13], [18], [12]. To deal with the feature redundancy problem in training supervised classifiers, Ambusaidi et al. [13] introduced a mutual information based algorithm to select network features. Zhou et al. [18] adopted word embedding to extract meaningful features of network data. To further alleviate the overfitting problem, RF [12] and tree algorithms [19] have been employed to ensemble sub-classification models for robust NID. However, data scarcity and data imbalance remain unsolved because feature selection/extraction and classifier aggregation do not increase the number of intrusion samples among different categories in the training set.
In DL based IDSs, Tang et al. [8] applied a three-layer DNN to extract multi-level features and classify flow-based network intrusions. Shone et al. [3] proposed a more complex DNN called the non-symmetric deep autoencoder (NDAE). They stacked two NDAEs for unsupervised feature learning of network data, followed by an RF for classification. Those DNN based IDSs achieve promising NID performance in terms of different evaluation metrics (e.g., accuracy, precision, and recall). Considering the requirement of large quantities of training data for optimizing the layer weights of DNNs, the weakness of those IDSs in detecting low-frequency intrusions (e.g., the guesspasswd and bufferoverflow attacks reported in [3]) is non-negligible. Thus, enriching small sample intrusions in the training set is essential in developing learning based IDSs.
II-B Data Augmentation Methods
II-B1 Probabilistic Generative Models
In Bayesian statistical learning, probabilistic generative models are extensively used to approximate unknown distributions of target data [20]. Therein, Markov chain Monte Carlo (MCMC) methods are used to sample their parameters from observed data. The Metropolis-Hastings (MH) algorithm [21] is the foundation of MCMC; it generates candidate data from a proposal distribution and accepts or rejects them according to an acceptance probability. An improved version of the MH algorithm is Gibbs sampling [22], [23], which uses the full conditional distribution to design the proposal distribution. Closely related to Gibbs sampling, the expectation maximisation (EM) algorithm [24], [20] alternates between an expectation step and a maximization step for estimating distribution parameters. Nonetheless, those algorithms cannot be directly applied to generate intrusions due to the difficulty of modeling complex features (e.g., real/Boolean values) with large divergence.
II-B2 Generative Adversarial Networks
Generative adversarial networks (GANs) [25] have been increasingly employed to generate realistic objects in computer vision [26], [27]. GANs contain two differently structured DNNs: the discriminative net and the generative net, denoted as D and G, respectively. During training, G attempts to produce forged data based on an input prior. D takes in both forged data and real ones, and learns to distinguish the counterfeit from the truth. This finally results in a powerful data generator G.
Note that both D and G are constructed of DNNs; therefore, training GANs on a small collection of intrusion samples is a challenge. Even if sufficient network intrusions are available, GANs are reported to have remarkable training difficulties [28], [29]. For instance, G gets worse while D gets better due to the significant difference in convergence speed, which might lead to poor outputs and a deficient generator.
II-C Motivations
Despite promising progress in recent research on learning based IDSs, data scarcity and data imbalance in training (semi-)supervised classifiers remain two fundamental issues for the community. Those challenges become vital with the increase of emerging attacks that often have limited known data samples. Data augmentation methods provide a potential approach to tackle these problems, as they could enrich network data in the training set if designed properly. Findings from the existing literature show that directly applying DA algorithms might bring undesired effects [20], [29]. Hence, a novel learning based DA module associated with the general NID framework presented in this paper will contribute to the design of advanced high-performance IDSs.
III Network Intrusion Detection with Data Augmentation
In this section, we first introduce the pipeline of the proposed NID framework and then present the formulation of the learning based DA module for addressing the data scarcity and imbalance problems. Finally, we provide the detailed optimization of the DA module through adversarial learning.
III-A The NID Framework
Figure 1 depicts the proposed NID framework, which involves a training phase for training supervised classification models assisted by the learning based DA module. Those classifiers are then adopted in the testing phase for detecting network intrusions among normal network requests.
In the training phase, continuous features of network data are normalized and the related statistical variables (i.e., mean and standard deviation) are recorded for processing testing data. Discrete symbolic features remain unchanged. Other preprocessing techniques (e.g., data filtering and feature selection) can alternatively be applied. Normalized data are then partitioned, in which a proportion of normal network data are randomly selected and the imbalanced network intrusion samples are augmented by the DA module. Finally, those data are aggregated into a balanced dataset for training NID models using supervised learning methods.
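The normalization step above can be sketched as follows. This is a minimal illustration (function names and the synthetic stand-in data are ours, not the paper's): the training set's per-feature mean and standard deviation are recorded and then reused, unchanged, on the testing data.

```python
import numpy as np

def fit_normalizer(train_cont):
    """Record per-feature mean/std of continuous training features."""
    mu = train_cont.mean(axis=0)
    sigma = train_cont.std(axis=0)
    sigma[sigma == 0.0] = 1.0          # guard against constant features
    return mu, sigma

def apply_normalizer(x, mu, sigma):
    """Apply the *training* statistics to either training or testing data."""
    return (x - mu) / sigma

rng = np.random.default_rng(0)
train = rng.gamma(2.0, 3.0, size=(1000, 5))   # stand-in for continuous network features
test = rng.gamma(2.0, 3.0, size=(200, 5))
mu, sigma = fit_normalizer(train)
train_n = apply_normalizer(train, mu, sigma)
test_n = apply_normalizer(test, mu, sigma)
```

Reusing the training statistics on the testing data (instead of re-estimating them) keeps both phases consistent, as required by the pipeline above.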
In the testing phase, input network data are preprocessed as training data and fed into NID models for classification. Because those models are trained on balanced dataset using advanced learning methods, the proposed framework is competent to accurately identify network intrusions, especially emerging ones.
III-B The Data Augmentation Module
Figure 2 shows the schematic diagram of the proposed DA module. Given a small number of network intrusion samples, the Poisson-Gamma joint probabilistic model (PGM) is first derived to efficiently generate synthesised intrusion data. Deep generative neural networks (DGNNs) then take both real and synthesised data to train their layer weights through adversarial learning. Finally, the DGNNs output network intrusion data of augmented quality.
In the DA module, the PGM plays an essential role in initializing the DGNNs to fit the general distribution of related intrusions. This alleviates the convergence problem of training DGNNs on limited network data. In turn, the DGNNs compensate for the limitation of the PGM in simulating network features that exhibit large divergence.
III-C The Poisson-Gamma Joint Probabilistic Model
In this subsection, we present the Poisson-Gamma joint probabilistic generative model for modeling feature distributions of network data. At the same time, we theoretically analyze its feasibility for simulating complex network features. Finally, we exploit an MCMC based Gibbs sampler to efficiently generate synthesised intrusion data.
Let x = (x_1, …, x_d) be the feature vector of one network intrusion sample, where d denotes the dimension of the feature space. Assume that intrusions of the same category come from the joint probabilistic distribution defined as follows:

P(x | θ) = ∏_{j=1}^{d} P(x_j | θ_j),   (1)

where θ = (θ_1, …, θ_d), and θ_j denotes the distribution parameters of the j-th feature dimension of x. Note that θ contains d components, which means each feature of x is jointly modeled by one distribution.
In the existing NID benchmarks [16], [30], continuous digital features of network data usually represent the volume of requests or the time of connections. Consequently, the Poisson distribution [31] is employed to approximate the accumulation of those network events:

P(x_j | λ_j) = λ_j^{x_j} e^{−λ_j} / x_j!,   (2)

where λ_j indicates the average intensity, i.e., the statistical average, of x_j. Since the Poisson distribution contains one parameter, we have θ_j = λ_j and θ = λ = (λ_1, …, λ_d).
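Equation (2) can be checked numerically. The sketch below is our own illustration (function names are ours): it evaluates the Poisson probability mass and verifies that the mass sums to one and that the distribution's mean equals the intensity.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability mass of Eq. (2): lam^x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 4.0                 # average intensity of one count-valued feature
support = range(0, 60)    # large enough to capture essentially all mass for lam = 4
total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
```

The mean recovering the intensity is exactly the property used above: the statistical average of a count feature directly estimates its Poisson parameter.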
To render an effective Markov chain for the subsequent data simulation, λ_j is approximated by the Gamma distribution:

P(λ_j | α_j, β_j) = λ_j^{α_j−1} e^{−λ_j/β_j} / (Γ(α_j) β_j^{α_j}),   (3)

where α_j and β_j are the shape and scale parameters. Given a collection of intrusion samples X = {x^{(1)}, …, x^{(n)}}, we have α_j = E[x_j]^2 / D[x_j] and β_j = D[x_j] / E[x_j], where E[x_j] and D[x_j] denote the expectation and variance of x_j. Because of the conjugate relationship between the Poisson and Gamma distributions [32], the convergence property of the PGM is theoretically guaranteed for synthesising network data.

Proposition 1.
The Poisson-Gamma joint probabilistic model can estimate the distributions of both continuous and discrete digital features of network data.
Proof 1.
The Poisson-Gamma joint distribution can approximately estimate the distributions of continuous digital features, since the accumulation of network events can be statistically formulated as a Poisson process. For discrete digital features (i.e., values equal to 0 or 1), the joint distribution can model those symbolic values by sampling the Poisson parameter to be 0 or 1 from the Gamma distribution, respectively. ∎
Given n intrusion samples of the same category, the goal of synthesising intrusions then becomes to estimate λ given X. According to Bayes' theorem [33], this can be achieved by maximizing the compact posterior:

P(λ | X) ∝ P(X | λ) P(λ),   (4)

where the first and the second terms on the right side are the density functions of the Poisson and Gamma distributions, respectively.
Proposition 2.
If network intrusion samples are assumed to obey the Poisson distribution and their Poisson parameter is formulated by the Gamma distribution, then the posterior obeys the Gamma distribution.
Proof 2.
In Eq. (4), each term stands for a probability and hence has a non-negative value. Applying the natural logarithm to both sides of Eq. (4), we have:

ln P(λ | X) ∝ Σ_{i=1}^{n} ln P(x^{(i)} | λ) + ln P(λ),   (5)

where i denotes the serial number of the known intrusion samples. Substituting the exponential form of the Poisson and Gamma density functions into Eq. (5), we obtain:

ln P(λ | X) ∝ Σ_{i=1}^{n} ( x^{(i)} ln λ − λ − ln(x^{(i)}!) ) + (α − 1) ln λ − λ/β − ln(Γ(α) β^α),   (6)

where x^{(i)}! denotes the element-wise factorial of x^{(i)} and 1 a unit vector. Extracting the terms with respect to λ in Eq. (6), we derive the representation of the posterior as follows:

P(λ | X) ∝ C · Gamma( α + Σ_{i=1}^{n} x^{(i)},  β / (nβ + 1) ),   (7)

where C is a negligible constant [32]. All multiplications above are operated in an element-wise manner. Therefore, the posterior obeys the Gamma distribution.
∎
Given a collection of network intrusion samples X, we utilize Eq. (7) to approximate the Poisson parameter λ and further synthesise intrusion data from the Poisson distribution using a Gibbs sampler. Algorithm 1 shows the pseudocode of the PGM for producing synthesised intrusion data.
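The core of the PGM can be sketched as below. This is our own simplified illustration, not a reproduction of Algorithm 1: the Gamma prior parameters are set by the moment-matching estimates of Eq. (3), and, since the posterior of Eq. (7) is available in closed form, the sketch samples rates directly from it rather than running a full Gibbs sweep.

```python
import numpy as np

def pgm_synthesise(X, n_new, rng):
    """Poisson-Gamma synthesis for one intrusion category.
    X: (n, d) array of non-negative count features of real samples."""
    n, d = X.shape
    mean, var = X.mean(axis=0), X.var(axis=0)
    var = np.maximum(var, 1e-6)                   # avoid degenerate features
    alpha = np.maximum(mean ** 2 / var, 1e-6)     # Gamma shape by moment matching
    beta = var / np.maximum(mean, 1e-6)           # Gamma scale by moment matching
    # posterior of the Poisson rate, Eq. (7): Gamma(alpha + sum_i x_i, beta / (n*beta + 1))
    post_shape = alpha + X.sum(axis=0)
    post_scale = beta / (n * beta + 1.0)
    lam = rng.gamma(post_shape, post_scale, size=(n_new, d))  # one rate vector per sample
    return rng.poisson(lam)                       # synthesised intrusion features

rng = np.random.default_rng(0)
real = rng.poisson(lam=5.0, size=(50, 4))         # 50 known samples, 4 count features
synth = pgm_synthesise(real, n_new=500, rng=rng)
```

Because the posterior concentrates around the empirical rates, the synthesised samples match the feature-wise statistics of the small real set while adding Poisson variability.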
III-D The Deep Generative Neural Networks
In this subsection, we present the formulation of the DGNNs, which augment synthesised intrusion data with enhanced quality. Afterwards, we propose a two-fold mechanism for training the DGNNs via adversarial learning strategies.
The proposed DGNNs have two components: the Discriminator (D net) and the Generator (G net). In the adversarial training process, the G net generates augmented intrusion data by learning their real feature distribution, while the D net acts as an indicator trying to reject augmented data in favor of real intrusion samples. To fully exploit the capability of hierarchical feature learning, the D net and G net are implemented as DNNs. The input size of both nets is d, which equals the dimension of the network data features. In the hidden layers, neural nodes are stacked to abstract latent representations of the input data. The output of the D net is a scalar that returns the probability that its input comes from real samples. In contrast, the output of the G net is a d-dimensional vector representing the augmented network intrusion.
Provided with target network intrusion samples, the goal of training the DGNNs is to obtain a potent G net that recovers their feature distributions and further generates intrusion data with similar characteristics. As discussed in Section II-B, training DGNNs with limited samples easily overfits because both the D net and G net are structured as DNNs. Therefore, we present a two-fold adversarial learning mechanism that adopts both real and synthesised network data for optimizing their weights.
III-D1 Pre-training
As shown in Figure 3(a), synthesised intrusion data x̃ are treated as suboptimal targets, and the D net is trained to distinguish augmented samples (i.e., G(z)) from them. The variable z is the prior of the G net for learning the mapping G(z) → x̃. To adjust to different and unexpected situations, z is sampled from the Gaussian distribution N(0, 1). This assumes the G net has no specific prior knowledge of the targets. The objective of pre-training is then formed as the following adversarial game:

min_G max_D  E_{x̃∼p_x̃}[ln D(x̃)] + E_{z∼p_z}[ln(1 − D(G(z)))],   (8)

where p_x̃ and p_z denote the feature distributions of x̃ and z, respectively. Given sufficient synthesised data, the DGNNs are pre-trained to fit the general distribution of related network intrusions [29]. In this case, the quality of augmented samples is bound to be mediocre, as the G net cannot directly access the feature information of real intrusion samples.
III-D2 Fine-tuning
During the adversarial training demonstrated in Figure 3(b), the D net takes in both real intrusion samples x and augmented ones from the G net. Meanwhile, the G net is fed with synthesised data added with Gaussian variables (i.e., x̃ + z). In this scenario, the DGNNs are fine-tuned according to the following objective:

min_G max_D  E_{x∼p_r}[ln D(x)] + E_{x̃∼p_x̃, z∼p_z}[ln(1 − D(G(x̃ + z)))],   (9)

where p_r denotes the real intrusion distribution to recover. Note that x̃ + z, rather than x̃ or z alone, is chosen in fine-tuning. This prevents the G net from learning a monotonous distribution from the same data in x̃.
Since the DGNNs have been pre-trained on synthesised intrusion data, the Generator competes with the Discriminator on a comparable level in the fine-tuning stage. Thus, the two-fold adversarial training mechanism allows the DGNNs to learn the real intrusion distribution in a progressive manner. This precludes the remarkable convergence difference of the DGNNs and prevents the Generator from producing poor outputs.
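The two objectives share the same minimax form and differ only in which samples play the "target" role. The helper below is our own illustration (names are ours): it evaluates the game value from discriminator outputs, so the only change between the pre-training and fine-tuning stages is what is fed in.

```python
import numpy as np

def adv_objective(d_on_targets, d_on_generated):
    """Minimax game value: E[ln D(target)] + E[ln(1 - D(G(.)))].
    In pre-training (Eq. (8)) the targets are synthesised samples;
    in fine-tuning (Eq. (9)) they are real intrusion samples."""
    eps = 1e-12  # numerical guard against log(0)
    return (np.log(d_on_targets + eps).mean()
            + np.log(1.0 - d_on_generated + eps).mean())

# D outputs are probabilities in (0, 1); a strong D pushes the value toward 0,
# which is the maximum of ln D(target) + ln(1 - D(generated))
v = adv_objective(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
```

The discriminator ascends this value while the generator descends it, which is exactly the alternating scheme optimized in the next subsection.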
III-E Optimization
In this section, we present the optimization details of pre-training and fine-tuning the DGNNs.
Let w_d and w_g denote the weights of the D net and G net, respectively. The aim of training the DGNNs then becomes to optimize the min-max objectives in Eq. (8) and Eq. (9) with respect to w_d and w_g. Considering that the DGNNs are formed as DNNs, they are trained by backpropagation with a stochastic gradient algorithm.
In the pre-training stage, the gradients of Eq. (8) with respect to w_d and w_g on a batch of m intrusion samples are:

∇_{w_d} (1/m) Σ_{i=1}^{m} [ ln D(x̃^{(i)}) + ln(1 − D(G(z^{(i)}))) ],   (10)

∇_{w_g} (1/m) Σ_{i=1}^{m} ln(1 − D(G(z^{(i)}))).   (11)

In the fine-tuning stage, the gradients of Eq. (9) with respect to w_d and w_g on a batch of m intrusion samples are:

∇_{w_d} (1/m) Σ_{i=1}^{m} [ ln D(x^{(i)}) + ln(1 − D(G(x̃^{(i)} + z^{(i)}))) ],   (12)

∇_{w_g} (1/m) Σ_{i=1}^{m} ln(1 − D(G(x̃^{(i)} + z^{(i)}))).   (13)
After obtaining the gradients in Eqs. (10)-(13), the layer weights w_d and w_g are optimized by stochastic gradient descent.
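The discriminator update can be illustrated on a toy one-parameter logistic discriminator (everything here is our own stand-in, not the paper's network): the analytic gradient of the batch objective in Eq. (10) is checked against a finite difference, and one ascent step is shown to increase the objective.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def d_objective(w, targets, fakes):
    """Batch estimate of E[ln D(x)] + E[ln(1 - D(g))] for a logistic D(x) = sigmoid(w*x)."""
    return np.log(sigmoid(w * targets)).mean() + np.log(1.0 - sigmoid(w * fakes)).mean()

def d_gradient(w, targets, fakes):
    """Analytic gradient of the objective w.r.t. w (cf. Eq. (10))."""
    # d/dw ln sigmoid(w x) = (1 - sigmoid(w x)) x ; d/dw ln(1 - sigmoid(w g)) = -sigmoid(w g) g
    return (((1.0 - sigmoid(w * targets)) * targets).mean()
            - (sigmoid(w * fakes) * fakes).mean())

rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 256)   # stand-in target batch (real/synthesised samples)
fakes = rng.normal(-2.0, 1.0, 256)    # stand-in generated batch
w = 0.0
before = d_objective(w, targets, fakes)
w += 0.1 * d_gradient(w, targets, fakes)   # one stochastic-gradient ascent step for D
after = d_objective(w, targets, fakes)
```

In the real DGNNs the same ascent (for w_d) and descent (for w_g) steps are applied to all layer weights via backpropagation; only the gradient computation scales up.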
IV Experiments
In this section, we conduct comprehensive experimental validation of the proposed DA enhanced NID framework on the KDD Cup 99 dataset [16]. Its performance in detecting network intrusions with small samples (e.g., emerging attacks) is compared with learning based IDSs, in which the classical LR and SVM and an advanced DNN are employed for comparison.
IV-A Network Data
The KDD Cup 99 dataset [16] is a benchmark dataset widely used in NID studies. As outlined in Table I, it is composed of two training sets at different scales and one testing set. Network data are categorized into normal requests (NORMAL) and four major intrusions: denial-of-service (DOS), surveillance and other probing (PROBE), unauthorized access from a remote machine (R2L), and unauthorized access to root user (U2R). Given that the 100% training set merely contains additional records of normal requests and high-frequency intrusions, the 10% training set is used in all experiments.
Category   Training (100%)   Training (10%)   Testing
NORMAL     972,781           97,278           60,593
DOS        3,883,370         391,458          229,853
PROBE      41,102            4,107            4,166
U2R        52                52               228
R2L        1,126             1,126            16,189
Category   Intrusion Type    Training (10%)   Testing
DOS        apache2           0                794
DOS        mailbomb          0                5,000
DOS        processtable      0                759
PROBE      mscan             0                1,053
PROBE      saint             0                736
R2L        guesspasswd       53               4,367
R2L        snmpgetattack     0                7,741
R2L        snmpguess         0                2,406
Each record of the KDD Cup 99 dataset consists of 38 digital features and 4 character features. Assuming that the digital features provide adequate information for identification, the character features are not used in the experimental validation.
Considering the task of NID with small sample sizes, network intrusions are expected to satisfy the prerequisites that they have limited records in the training set while plentiful records exist in the testing set. In this case, 8 types of intrusions are selected (as shown in Table II): apache2, mailbomb, and processtable of the DOS attack; mscan and saint of the PROBE attack; and guesspasswd, snmpgetattack, and snmpguess of the R2L attack. In the experiments, if no training data for one intrusion type are available, related testing intrusion samples are selected to complement the training set before data augmentation. Those samples are then excluded from testing. Although training DNNs requires a large amount of labelled data, the number of known samples per intrusion type is set to 50 to meet the above prerequisites.
IV-B Evaluation Metrics
The metrics used to measure the NID results of IDSs are listed below:

True Positive (TP): Network intrusions (or normal network requests) that are correctly detected.

True Negative (TN): Normal network requests (or network intrusions) that are correctly detected.

False Positive (FP): Normal requests that are misclassified as intrusions.

False Negative (FN): Intrusions that are misclassified as normal requests.
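From the four counts above, the accuracy, precision, recall, and F1-score reported in the tables follow by the standard definitions. A minimal sketch (our own helper, with hypothetical counts for the usage example):

```python
def nid_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from TP/TN/FP/FN counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of flagged intrusions, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real intrusions, how many were caught
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of precision and recall
    return accuracy, precision, recall, f1

acc, p, r, f1 = nid_metrics(tp=90, tn=880, fp=20, fn=10)
```

Note that when normal requests dominate the testing set, accuracy alone can stay high even while most intrusions are missed, which is why precision, recall, and F1-score are also reported.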
IV-C Parameters
The detailed parameter setup and tuning strategies are provided as follows:
IV-C1 Threshold of the Gibbs sampler
The cut-off threshold is set to 500 to assure the convergence of the PGM. Due to the efficiency of Gibbs sampling, this threshold can be a large value while the time consumption remains economical.
IV-C2 Structure of the DGNNs
In the D net, the hidden nodes are set to 70, 50, 40, and 20. The ReLU activation function is used after each non-terminal layer, while the sigmoid activation function is applied to the last layer to produce the decision probability. In the G net, the layer nodes are set to 40, 30, and 20. Those three hidden layers are sequentially connected by ReLU and sigmoid functions. The last hidden layer is linearly mapped to the output layer. In this scenario, the G net has fewer hidden nodes than the D net, which improves the inference capacity of the D net and guarantees an effective G net can be trained. In addition, dropout [35] is employed in all hidden layers to regularize the training process and decrease overfitting risks.

IV-C3 Minibatch size of the DGNNs
The batch size m is constrained to be less than the minimum number of real and synthesised intrusion data. In our experiments, m is set to 20%-40% of the number of real intrusion samples. This allows the DGNNs to be trained efficiently and effectively.
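The layer configuration described in IV-C2 can be sketched shape-wise as follows. This is our own illustration with random weights; for brevity it uses ReLU between all hidden layers (the paper's G net mixes ReLU and sigmoid), since only the tensor shapes are being checked here.

```python
import numpy as np

def forward(x, layer_sizes, rng):
    """Random-weight forward pass; only the layer shapes matter for this sketch."""
    h = x
    for out_dim in layer_sizes:
        w = rng.normal(0.0, 0.1, size=(h.shape[1], out_dim))
        h = np.maximum(h @ w, 0.0)          # ReLU between hidden layers
    return h

rng = np.random.default_rng(0)
d = 38                                      # number of digital features (Section IV-A)
batch = rng.normal(size=(16, d))
# D net: hidden sizes 70, 50, 40, 20, then a 1-unit sigmoid output (decision probability)
d_hidden = forward(batch, [70, 50, 40, 20], rng)
d_out = 1.0 / (1.0 + np.exp(-(d_hidden @ rng.normal(0.0, 0.1, size=(20, 1)))))
# G net: hidden sizes 40, 30, 20, then a linear map back to d features
g_hidden = forward(batch, [40, 30, 20], rng)
g_out = g_hidden @ rng.normal(0.0, 0.1, size=(20, d))
```

The shapes confirm the design in III-D: both nets take d-dimensional inputs, the D net emits one probability per sample, and the G net emits a d-dimensional augmented feature vector per sample.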
IV-C4 Training iterations of the DGNNs
The maximum numbers of iterations in the pre-training stage and the fine-tuning stage are set empirically. Since known network intrusions are limited, fewer iterations in the fine-tuning stage are suggested to avoid the overfitting problem.
IV-C5 Other hyperparameters of the DGNNs
The learning rates of the D net and G net are both set empirically to small values for the purpose of adjusting the weights of the DNNs with respect to a low ratio of loss gradients. This guarantees the DNNs are robustly trained and avoids missing the optimum. However, it might result in low convergence speed. Thus, momentum [36] is adopted to accelerate training and prevent gradient oscillations. The momentum parameters are set to the default values of 0.9 and 0.99 when the Adam optimizer is used [34].
IV-D Binary Classification based NID
Intrusion Type   NID Model      Accuracy      Precision     Recall        F1-Score

apache2          NID-LR         98.51±0.42    0.00          0.00          --
                 NID-DA-LR      99.53±0.05    77.29±1.21    90.70±3.41    83.45±2.14
                 NID-SVM        98.97±0.02    55.73±0.56    99.32±0.38    71.39±0.49
                 NID-DA-SVM     99.94±0.01    95.61±0.87    99.70±0.07    97.61±0.44
mailbomb         NID-LR         91.93±0.44    0.00          0.00          --
                 NID-DA-LR      97.84±0.41    78.11±3.37    99.89±0.14    87.63±2.08
                 NID-SVM        93.04±0.03    80.38±1.58    11.53±0.33    20.16±0.50
                 NID-DA-SVM     99.34±0.34    92.29±3.65    99.78±0.16    95.86±2.05
processtable     NID-LR         99.33±0.05    64.98±1.78    98.74±2.03    78.37±1.72
                 NID-DA-LR      99.53±0.04    72.46±1.81    100.00±0.00   84.02±1.21
                 NID-SVM        99.87±0.06    90.79±3.86    99.60±0.56    94.96±2.14
                 NID-DA-SVM     99.90±0.04    92.60±2.17    99.71±0.15    96.02±1.22
mscan            NID-LR         97.78±0.11    42.58±1.35    84.90±1.48    56.71±1.42
                 NID-DA-LR      99.01±0.06    64.81±1.48    92.29±0.46    76.14±1.14
                 NID-SVM        99.61±0.12    85.10±4.34    94.11±0.18    89.32±2.73
                 NID-DA-SVM     99.73±0.06    90.03±4.12    95.12±1.04    92.44±1.65
saint            NID-LR         98.22±0.22    40.17±2.89    96.47±1.45    56.65±2.81
                 NID-DA-LR      98.47±0.04    43.78±0.75    96.93±0.65    60.32±0.83
                 NID-SVM        98.56±0.03    45.39±0.50    96.41±0.75    61.72±0.46
                 NID-DA-SVM     98.60±0.01    46.08±0.24    97.47±0.33    62.58±0.27
guesspasswd      NID-LR         88.59±0.75    34.21±1.50    75.10±2.87    46.98±1.44
                 NID-DA-LR      89.07±0.24    35.65±0.59    77.74±0.19    48.89±0.57
                 NID-SVM        94.59±0.27    89.06±1.93    22.21±4.10    35.42±5.19
                 NID-DA-SVM     98.95±0.10    90.57±2.17    94.29±0.22    92.38±1.19
snmpgetattack    NID-LR         88.67±0.00    0.00          0.00          --
                 NID-DA-LR      80.42±0.58    36.61±0.69    99.43±0.07    53.51±0.72
                 NID-SVM        88.65±0.02    0.00          0.00          --
                 NID-DA-SVM     82.42±0.03    39.13±0.03    99.39±0.21    56.15±0.04
snmpguess        NID-LR         98.84±0.15    78.61±2.51    95.93±0.05    86.39±1.55
                 NID-DA-LR      99.07±0.06    82.66±1.17    95.84±0.00    88.76±0.68
                 NID-SVM        96.18±0.00    0.00          0.00          --
                 NID-DA-SVM     81.20±0.04    16.85±0.02    99.72±0.10    28.83±0.03
In this subsection, we evaluate the binary classification performance of the DA enhanced NID framework. In each group of experiments, the task is to identify positive samples (i.e., one type of network intrusion) among negative samples (i.e., normal network requests).
As illustrated in Fig. 1, 6,000 randomly selected normal request data and 50 intrusion samples (augmented to 500 by the DA module) are used for training the ML based NID models. In this case, LR and SVM are adopted since they are basic building blocks of many learning based IDSs [13], [3].
The training and testing procedures are repeated 15 times, each time with differently selected negative samples, and anomalous results are rejected. Afterwards, the statistical averages of the evaluation metrics are computed for validation. In this scenario, the adverse impact of data imbalance (i.e., training with all negative samples) is further reduced. Besides, it avoids training classifiers with biased data, i.e., cases in which the selected negative data are not representative.
Table III summarizes the binary classification results. It demonstrates that the DA enhanced NID frameworks (denoted NID-DA-LR/SVM) achieve improved or comparable recall, and significantly outperform the other IDSs (denoted NID-LR/SVM) in terms of precision and F1-score. Those results show that the intrusions augmented by the proposed DA module can be used to train potent classification models for NID tasks.
Note that the proposed frameworks obtain improved accuracy on most intrusion types except the snmpgetattack and snmpguess attacks of R2L. The reasons are two-fold. Firstly, computing accuracy requires counting normal network requests, which occupy a large proportion of the testing set (see Tables I and II). Thus, accuracy merely increases by a small margin when additional intrusions are detected. Secondly, the low precision and recall of NID-LR/SVM on the snmpgetattack and snmpguess attacks indicate that considerable intrusions are misclassified. Accordingly, those models are not applicable for detecting intrusions in the small sample scenario.
IV-E Multi-class Classification based NID
Category   NID Model      Accuracy      Precision     Recall        F1-Score

NORMAL     NID-SVM        76.30±0.01    75.91±0.01    98.69±0.02    85.81±0.01
           NID-PGM-SVM    76.16±0.08    75.90±0.03    98.41±0.11    85.70±0.06
           NID-DA-SVM     82.87±0.17    98.39±0.20    77.68±0.32    86.82±0.16
DOS        NID-SVM        94.00±0.01    93.35±1.15    25.34±0.16    39.86±0.10
           NID-PGM-SVM    93.84±0.10    86.98±2.63    25.33±0.49    39.23±0.80
           NID-DA-SVM     99.48±0.05    94.09±0.58    99.70±0.06    96.81±0.29
PROBE      NID-SVM        98.82±0.02    68.54±0.43    83.27±0.53    75.19±0.37
           NID-PGM-SVM    98.80±0.02    67.39±0.31    85.89±0.38    75.53±0.30
           NID-DA-SVM     99.32±0.04    88.70±0.88    78.13±1.85    83.07±1.00
R2L        NID-SVM        83.43±0.01    97.91±0.64    4.83±0.02     9.20±0.04
           NID-PGM-SVM    83.34±0.05    94.15±5.15    4.51±0.02     8.60±0.04
           NID-DA-SVM     83.74±0.19    51.75±0.31    96.57±0.66    67.38±0.23
Category   NID Model      Accuracy      Precision     Recall        F1-Score

NORMAL     NID-DNN        77.15±3.22    76.50±2.81    99.15±0.51    86.33±1.64
           NID-PGM-DNN    83.17±1.40    81.58±1.31    99.26±0.37    89.55±0.76
           NID-DA-DNN     87.96±0.12    86.95±0.28    98.15±0.61    92.21±0.11
DOS        NID-DNN        93.81±2.94    89.70±6.58    23.55±9.17    27.72±7.37
           NID-PGM-DNN    99.04±0.22    97.84±3.73    89.97±0.98    93.70±1.34
           NID-DA-DNN     99.59±0.08    97.14±1.36    97.65±0.40    97.39±0.52
PROBE      NID-DNN        98.92±0.31    74.54±1.93    81.49±8.26    76.79±3.29
           NID-PGM-DNN    98.40±0.07    64.93±1.90    55.84±7.72    59.78±3.86
           NID-DA-DNN     99.05±0.07    75.27±1.24    83.47±4.52    79.11±2.17
R2L        NID-DNN        83.90±2.72    58.99±3.87    7.55±5.81     11.28±3.13
           NID-PGM-DNN    84.90±1.26    88.45±5.67    13.92±7.48    23.60±2.57
           NID-DA-DNN     89.09±0.10    92.17±4.86    40.97±1.82    56.64±0.92
In order to verify the performance of the DA module in enhancing existing learning based IDSs, we have undertaken multi-class NID experiments. In this case, the network data are categorized as NORMAL, DOS, PROBE, and R2L, which covers the aforementioned 8 intrusion types.
For the learning based IDSs, SVM and one typical DNN architecture presented in [8], [37] are chosen as supervised NID models. In the training process of the DNN, the batch size is initialized as 32 and the learning rate is adaptively decided. Furthermore, dropout [35] and cross-validation are employed to avoid overfitting problems.
The evaluation stage is repeated 15 times. Each time, 6,000 and 12,000 normal requests are randomly selected for training the SVM and DNN, respectively. In addition, 50 fixed samples of each type of intrusion are separately augmented to 500 by the PGM and by the PGM-DGNNs associated DA module. Overall, the augmented training set contains 4,000 intrusion samples. At the completion of training, deviant results (e.g., extremely low F1-score) are rejected and the average metrics are computed for follow-up analysis.
Table IV and Table V present the comparison results of multi-class NID using SVM and DNN, respectively. It can be observed that the PGM-DGNNs enhanced NID frameworks (denoted NID-DA-SVM/DNN) outperform the other two NID models with regard to both accuracy and F1-score.
For DNN based NID, the precision and recall obtained by the proposed framework are better than, or comparable to, those obtained by the other NID models. For SVM based NID, it achieves the highest precision and recall on DOS, with notable exceptions on the other three categories. The reason is that the network intrusion data generated by the G net are strongly shaped by the DNN implemented D net. As a result, the augmented intrusion features are better matched to DNN based NID models.
Although PGM enhanced IDSs achieve improvements on some evaluation metrics, they show undesired performance (e.g., low F1-score on PROBE when DNN is used) in identifying certain network intrusions. In contrast, the PGM-DGNNs aided NID frameworks are able to accurately detect intrusions with small and imbalanced sample sets.
V Discussion
The DA enhanced NID framework proposed in this paper is trained in a data-driven manner. Its performance is closely related to the quality of the augmented network intrusions, which in turn depends on the quantity and quality of the known samples. Therefore, the selected network intrusion samples should be able to represent the underlying characteristics of the relevant intrusion types.
In the proposed NID framework, we have presented the insight of using deep adversarial learning for augmenting limited network intrusion samples. The generator and discriminator networks are generally formulated as fully connected DNNs, and advanced network architectures [28] might be employed to design the DGNNs.
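A minimal sketch of such a fully connected generator/discriminator pair is shown below; the noise dimension, layer widths, and the non-saturating generator loss are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(sizes):
    """Random parameters for a fully connected network (illustrative init)."""
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params, out_act):
    """ReLU hidden layers followed by a configurable output activation."""
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)
    w, b = params[-1]
    return out_act(x @ w + b)

# Generator: noise -> synthetic 41-dimensional intrusion record.
# Discriminator: record -> probability of being a real intrusion.
G = mlp([16, 64, 64, 41])
D = mlp([41, 64, 64, 1])

z = rng.normal(size=(8, 16))                 # a batch of noise vectors
fake = forward(z, G, out_act=np.tanh)        # synthetic records in [-1, 1]
p_fake = forward(fake, D, out_act=lambda t: 1 / (1 + np.exp(-t)))  # sigmoid

# Non-saturating generator loss -log D(G(z)) (untrained weights, for shape only).
g_loss = -np.log(p_fake).mean()
print(fake.shape, p_fake.shape)  # (8, 41) (8, 1)
```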
The computational complexity of the proposed NID framework can be analyzed in two phases. In the testing phase, classifying over 80,000 network records takes a few seconds even when the DNN is applied. In the training phase, the time consumption breaks down as follows. First, the PGM synthesizes intrusion data within minutes using MCMC based Gibbs sampling. Second, the DGNNs are efficiently optimized on high performance computing devices; in our experiments, training the DGNNs on an NVIDIA GTX 1080 GPU takes less than two hours. Third, training the supervised classifiers (i.e., SVM and DNN) requires less than half an hour. Despite the training cost, the framework demonstrates high efficiency in identifying network intrusions, which is necessary in network security.
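The Gibbs sampling step mentioned above can be sketched on a toy model; here we sample a standard bivariate Gaussian with correlation rho by alternately drawing each coordinate from its conditional distribution (the actual PGM over network features is more involved):

```python
import math
import random

random.seed(0)

def gibbs_bivariate_gaussian(rho, n_samples, burn_in=500):
    """Gibbs sampling from a standard bivariate Gaussian with correlation
    `rho`: alternately draw each coordinate from its conditional
    x | y ~ N(rho * y, 1 - rho^2), then y | x ~ N(rho * x, 1 - rho^2)."""
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for t in range(burn_in + n_samples):
        x = random.gauss(rho * y, sd)
        y = random.gauss(rho * x, sd)
        if t >= burn_in:                    # discard burn-in draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000)
mean_x = sum(sx for sx, _ in samples) / len(samples)
mean_y = sum(sy for _, sy in samples) / len(samples)
cov = sum(sx * sy for sx, sy in samples) / len(samples) - mean_x * mean_y
print(round(cov, 2))  # empirical covariance, should land near 0.8
```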
The proposed DA module can also be applied to assist NID algorithms implemented on distributed platforms. Specifically, the imbalanced training set is first augmented with the proposed DA module in a data center. The augmented dataset can then be partitioned into several data blocks, in which normal network requests and network intrusion samples have comparable proportions. Finally, those balanced data blocks are delivered to different computing nodes to train NID models distributively.
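The partitioning step described above can be sketched with a round-robin split within each class, so every block keeps comparable class proportions; the block count and toy data are illustrative:

```python
from collections import defaultdict

def balanced_blocks(dataset, n_blocks):
    """Split (sample, label) pairs into n_blocks blocks so that each
    label's samples are spread evenly (round-robin within each class)."""
    by_label = defaultdict(list)
    for sample, label in dataset:
        by_label[label].append((sample, label))
    blocks = [[] for _ in range(n_blocks)]
    for items in by_label.values():
        for i, item in enumerate(items):
            blocks[i % n_blocks].append(item)
    return blocks

# Toy augmented dataset: 12 normal requests and 8 intrusion records.
data = [(i, "NORMAL") for i in range(12)] + [(i, "R2L") for i in range(8)]
blocks = balanced_blocks(data, n_blocks=4)
print([len(b) for b in blocks])  # [5, 5, 5, 5]
```

Each of the four blocks receives 3 normal and 2 intrusion records, i.e., the same 60/40 class ratio as the full toy dataset.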
VI Conclusion
In this paper, a general NID framework and two learning based data augmentation components have been jointly proposed to tackle the data scarcity and data imbalance problems in designing learning based IDSs. In this framework, the statistical learning based PGM and the deep learning based DGNNs of the DA module are developed to enlarge the limited intrusion samples in the training set. By employing classical ML models (e.g., LR and SVM) as well as advanced DNNs, the framework can accurately classify normal network requests and heterogeneous network intrusions. Extensive experimental validations have been conducted on the KDD Cup 99 dataset. Both binary and multi-class classification results have shown that the DA enhanced IDSs outperform the others regarding F1-score (a crucial criterion for evaluating imbalanced classification tasks). Additionally, the framework achieves improved or comparable accuracy, precision, and recall, especially when a DNN is adopted for classifying network data.
References
 [1] S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Appl. Soft. Comput., vol. 10, no. 1, pp. 1–35, 2010.
 [2] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. EAI Int. Conf. Bio-inspired Inf. Commun. Technol., 2016, pp. 21–26.
 [3] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Top. Comput. Intell., vol. 2, no. 1, pp. 41–50, 2018.

 [4] C. Huang, G. Min, Y. Wu, Y. Ying, K. Pei, and Z. Xiang, “Time series anomaly detection for trustworthy services in cloud computing systems,” IEEE Trans. Big Data, pp. 1–13, 2017.
 [5] I. Nevat, D. M. Divakaran, S. G. Nagarajan, P. Zhang, L. Su, L. Ling Ko, and V. L. Thing, “Anomaly detection and attribution in networks with temporally correlated traffic,” IEEE/ACM Trans. Netw., vol. 26, no. 1, pp. 131–144, 2018.
 [6] G. Y. Keung, B. Li, and Q. Zhang, “The intrusion detection in mobile sensor network,” IEEE/ACM Trans. Netw., vol. 20, no. 4, pp. 1152–1161, 2012.
 [7] H. Moosavi and F. M. Bui, “A game-theoretic framework for robust optimal intrusion detection in wireless sensor networks,” IEEE Trans. Inf. Forensic Secur., vol. 9, no. 9, pp. 1367–1379, 2014.
 [8] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for network intrusion detection in software defined networking,” in Proc. Int. Conf. Wirel. Netw. Mob. Commun., 2016, pp. 258–263.
 [9] Y. Xu, Z. Liu, Z. Zhang, and H. J. Chao, “High-throughput and memory-efficient multi-match packet classification based on distributed and pipelined hash tables,” IEEE/ACM Trans. Netw., vol. 22, no. 3, pp. 982–995, 2014.
 [10] A. X. Liu and E. Torng, “Overlay automata and algorithms for fast and scalable regular expression matching,” IEEE/ACM Trans. Netw., vol. 24, no. 4, 2016.
 [11] J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust network traffic classification,” IEEE/ACM Trans. Netw., vol. 23, no. 4, pp. 1257–1270, 2015.
 [12] N. Farnaaz and M. A. Jabbar, “Random forest modeling for network intrusion detection system,” Procedia Comput. Sci., vol. 89, pp. 213–217, 2016.
 [13] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an intrusion detection system using a filter-based feature selection algorithm,” IEEE Trans. Comput., vol. 65, no. 10, pp. 2986–2998, 2016.
 [14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Int. Conf. Learn. Represent., 2015.

 [15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 770–778.
 [16] “KDD Cup 99 Dataset,” http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 1999.
 [17] R. A. R. Ashfaq, X. Wang, J. Z. Huang, H. Abbas, and Y. He, “Fuzziness based semi-supervised learning approach for intrusion detection system,” Inf. Sci., vol. 378, pp. 484–497, 2017.
 [18] X. Zhuo, J. Zhang, and S. W. Son, “Network intrusion detection using word embeddings,” in Proc. IEEE Int. Conf. Big Data, 2017, pp. 4686–4695.
 [19] J. Kevric, S. Jukic, and A. Subasi, “An effective combining classifier approach using tree algorithms for network intrusion detection,” Neural Comput. Appl., vol. 28, no. 1, pp. 1051–1058, 2017.
 [20] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Mach. Learn., vol. 50, no. 1–2, pp. 5–43, 2003.
 [21] N. Metropolis and S. Ulam, “The Monte Carlo method,” J. Am. Stat. Assoc., vol. 44, no. 247, pp. 335–341, 1949.
 [22] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, 1984.
 [23] K. P. Murphy, Machine learning: A probabilistic perspective, The MIT Press., 2012.
 [24] G. C. Wei and M. A. Tanner, “A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms,” J. Am. Stat. Assoc., vol. 85, no. 411, pp. 699–704, 1990.
 [25] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
 [26] H. Zhang, C. Luo, X. Yu, and P. Ren, “MCMC based generative adversarial networks for handwritten numeral augmentation,” in Proc. Int. Conf. Commun. Signal Process. Syst., 2017, pp. 2702–2710.
 [27] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, “Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 82–90.
 [28] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
 [29] M. Arjovsky and L. Bottou, “Towards principled methods for training generative adversarial networks,” in Int. Conf. Learn. Represent., 2017.
 [30] “NSL-KDD Dataset,” http://iscx.ca/NSLKDD/, 2009.
 [31] M. Lefebvre, Applied stochastic processes, Springer Science and Business Media, 2007.
 [32] W. R. Gilks, S. Richardson, and D. Spiegelhalter, Markov chain Monte Carlo in practice, CRC Press., 1995.
 [33] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, John Wiley and Sons, 2012.
 [34] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. Learn. Represent., 2015.
 [35] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
 [36] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. Int. Conf. Mach. Learn., 2013, pp. 1139–1147.

 [37] R. Vinayakumar, K. P. Soman, and P. Poornachandran, “Applying convolutional neural network for network intrusion detection,” in Proc. Int. Conf. Adv. Comput. Commun. Inform., 2017, pp. 1222–1228.