1 Introduction
With the massive growth of Internet usage and the huge volume of data available online, it has become essential to secure networks. Traditional security techniques such as data encryption, user authentication, and firewalls are no longer sufficient to provide trusted network security, as technologies expand day by day. Intruders keep finding new ways to attack networks, so a second line of defense, such as an intrusion detection system, is needed Ambusaidi et al. (2016). An intrusion is a set of actions that attempts to harm the integrity, confidentiality, or availability of a system. An intrusion detection system (IDS) is a primary tool for protecting networks and information systems against such threats. It monitors the host or the packets transmitted throughout the network, and if it detects a security policy violation, it raises an alarm to the system administrator.
The network generates large traffic, and this huge volume of data slows down the process of intrusion detection. The data also contains information that is irrelevant or redundant for detection, so it is very important to retain only the relevant information. Thus, feature selection is an important component of an IDS: it can identify a subset of the most relevant features within a dataset to decrease computation time. Features extracted from an IDS dataset often contain similar types of information and possess high degrees of association or correlation, so deleting some of these features does not decrease the classification power of the system. Using the full set of features increases the complexity of the system and can even decrease accuracy. Hence, selecting a proper subset of features that are highly relevant for the given task as well as uncorrelated with each other is desired Chebrolu et al. (2005). The IDSs developed in Peddabachigari et al. (2007); Ambusaidi et al. (2015); Thaseen and Kumar (2017); Salunkhe and Mali (2017); Wang et al. (2018); Aljawarneh et al. (2018); Wang et al. (2010); Horng et al. (2011) used different feature reduction methods for selecting relevant features, whereas the IDSs presented in Benaicha et al. (2014); Gharaee and Hosseinvand (2016); Aghdam and Kabiri (2016); Dhopte and Chaudhari (2014); Eesa et al. (2015); Shah et al. (2018) used evolutionary techniques to find good feature subsets and achieve good accuracy. Some researchers have also used deep learning based techniques for designing IDSs Najizad and Najizad (2017); Kim et al. (2016, 2017); Javaid et al. (2016); Putchala (2017); Chawla (2017). In all these works, researchers applied a single optimization function for feature selection. However, features can be redundant and correlated simultaneously, so features should be selected in such a manner that both redundancy and correlation are minimized. The domain of a feature also defines its importance: a feature having the same value for all samples is not helpful for classification, and neither is a feature with an extremely wide range of values. Thus, we need to identify a feature set that preserves all these properties, so that maximum accuracy can be achieved in minimum computation time.
Motivated by these facts, in the current work we have devised an effective IDS framework based on an unsupervised feature selection technique. The framework is divided into two phases. The first phase builds on the search capability of a popular multi-objective optimization technique, the non-dominated sorting genetic algorithm-II (NSGA-II) Deb et al. (2000), for optimizing multiple feature quality measures simultaneously in order to find optimal feature subsets. We have used three different feature quality measures: two of them help in retaining only relevant features and removing redundancy from the dataset with the help of some similarity measure (such as mutual information or the Pearson correlation coefficient), while the third helps in retaining those features whose variances over the samples are high. Note that the proposed feature selection technique is fully unsupervised: it does not utilize any labeled data at the time of feature selection. The second phase uses different available machine learning classifiers for generating an effective IDS; the features extracted in the first phase can be fed to different machine learning based classifiers. The key contributions of this paper are as follows:

Not all the features present in an IDS dataset are suitable for classifying the data; thus we have devised an unsupervised framework for evaluating feature subsets. It does not utilize any labeled information while selecting the relevant feature subset from an IDS dataset.

We have used multi-objective optimization (MOO) for selecting suitable feature subsets. Different feature quality measures, ranging over mutual information, standard deviation, information gain, Pearson correlation coefficient, and entropy, are optimized simultaneously using the search capability of MOO. The qualities of the selected feature subsets are further verified using different machine learning based classifiers. Several IDSs are developed by varying the classification technique and the feature subset, and good performance is obtained in all cases.

A maximum accuracy of 99.78% is attained by the decision tree based IDS in multi-class classification, using the feature subset identified by the MOO-based approach with information gain and standard deviation as objective functions. To the best of our knowledge, this is the best reported accuracy compared to all the existing works available in the literature.
The rest of this paper is arranged as follows. In Section 2, we discuss some prior works. In Section 3, important background concepts are discussed. In Section 4, the framework designed for the proposed IDS is described. In Section 5, the proposed approach for MOO-based feature selection is presented, and the functioning of the algorithm with varying base quality measures is reported. In Section 6, different simulation results are discussed. In Section 7, the final results of the proposed IDS are elucidated. Finally, we conclude the paper in Section 8.
2 Related Works
In recent years, several feature selection approaches using different machine learning algorithms have been devised to increase accuracy and reduce overhead complexity. Researchers developed a feature selection method using mutual information and the Pearson correlation coefficient for designing an effective IDS Ambusaidi et al. (2016). In Chebrolu et al. (2005), the authors proposed an ensemble based decision tree classifier for intrusion detection, using a Hidden Markov Model (HMM) for finding the feature subsets. In Peddabachigari et al. (2007), the authors proposed decision tree (DT) and support vector machine (SVM) based intrusion detection models. In Ambusaidi et al. (2015), the authors devised an unsupervised feature selection algorithm using the Laplacian score. In Thaseen and Kumar (2017), the authors used the chi-square method for feature selection together with a multi-class support vector machine (SVM); the gamma and fitting-constant values of the radial basis function were also optimized to obtain good classification results. Researchers have also used an ensemble of different classifiers to increase system accuracy Salunkhe and Mali (2017). In Wang et al. (2018), the authors proposed three different strategies to extract relevant features, and after feature extraction applied a support vector machine for detection. In Aljawarneh et al. (2018), the authors built a hybrid model made of J48, Meta Pagging, RandomTree, REPTree, AdaBoostM1, DecisionStump, and NaiveBayes classifiers, which resulted in very good accuracy. In Wang et al. (2010), researchers combined an artificial neural network with fuzzy clustering to achieve better accuracy. In Horng et al. (2011), hierarchical clustering is used for feature selection and a support vector machine is then used for classification.
In Benaicha et al. (2014), the authors proposed an effective IDS using a genetic algorithm, with a weighted sum of support and confidence values as the evaluation function. In 2016, authors used the sum of the true positive rate, the false positive rate, and the number of selected features to evaluate the fitness of different feature combinations with a genetic algorithm Gharaee and Hosseinvand (2016). An ant colony optimization based method has also been used to find the best feature subset for an intrusion detection system Aghdam and Kabiri (2016). In Dhopte and Chaudhari (2014), researchers used a combination of the number of connections in the DARPA dataset as the fitness function and applied a genetic algorithm to generate rules for classifying the instances. In Eesa et al. (2015), the authors used the cuttlefish algorithm as a search technique to find the optimal subset of features; a decision tree classifier was then used to check the performance of the selected features. In 2018, authors used a genetic algorithm together with a support vector machine: the GA optimizes all the parameters of the SVM, and the SVM is then used for efficient intrusion detection Shah et al. (2018).
In Najizad and Najizad (2017), the authors used an artificial neural network to increase the accuracy of the system. In Kim et al. (2016), the authors proposed an IDS using LSTM, and the efficiency of the technique was improved using different optimizers with LSTM in Kim et al. (2017). In Javaid et al. (2016), the authors used an autoencoder for developing an IDS, exploiting the self-taught learning capacity of the autoencoder to learn features so that good classification results can be achieved. Authors have also used a gated recurrent neural network for designing an effective IDS Putchala (2017). In Chawla (2017), the author designed a real-time intrusion detection module that extracts features from the network and then applies a sequential neural network with three hidden layers to detect the attack.
3 Background
In this section, we briefly discuss the problem formulation of feature selection, NSGA-II Deb et al. (2000), different criteria for selecting features, the datasets used for designing the system, and different performance measures. We use the NSGA-II algorithm for selecting relevant features from the huge dataset, both to increase the classification accuracy and to decrease the time complexity of the IDS.
3.1 Feature selection
In a complex classification system, some features contain false correlations that hinder the classification process, and some features may be redundant. Extra features can increase computation time and may degrade the accuracy of the detection system. Feature selection improves classification by searching for the subset of features which best classifies the training data Chebrolu et al. (2005). Thus, automatic selection of attributes in training and test data can help in developing the best predictive model. The objectives of feature selection are: (i) to improve the prediction performance of the model, (ii) to provide a faster and more cost-effective model, and (iii) to provide a better understanding of the process that generated the data Guyon and Elisseeff (2003). Researchers have devised many new algorithms for selecting only the relevant features from the huge KDD 99 dataset to increase the accuracy of the IDS, as already discussed in Section 2.
3.2 Multi-objective Optimization
Multi-objective optimization is a technique for optimizing more than one objective function simultaneously. A popular multi-objective optimization algorithm is NSGA-II (non-dominated sorting genetic algorithm-II), a fast and elitist technique for optimizing different objective functions, proposed in Deb et al. (2000). The optimization of multiple objectives associated with a problem leads to the generation of a set of optimal solutions known as Pareto-optimal solutions, instead of a single solution. As none of these solutions can be said to be better than the others, a problem typically has many Pareto-optimal solutions. Below we briefly describe the steps of NSGA-II.
NSGA-II is a variant of the genetic algorithm (GA) Oh et al. (2004), so we first outline the steps of a GA. A genetic algorithm is an optimization and search methodology. It uses a chromosome-like data structure for representing solutions, which are evolved using selection, recombination, and mutation operators. In a GA, chromosomes are represented as linear strings of symbols. For the feature selection problem, binary encoding is generally used: if the feature set contains n features, then a string of n binary digits is used. Each binary digit represents a feature; a value of 1 represents the selection of the feature and 0 represents its rejection.
A GA works with a set of candidate solutions called the population. The chromosomes of the population are randomly generated binary strings; each bit of a string is initialized by the "randint(0,1)" function of Python, denoting the presence or absence of a single feature. The algorithm obtains the optimal solution after a series of iterative computations. Chromosomes are selected by evaluating their fitness values, where a fitness function is an evaluation function which assesses the quality of a chromosome at every step. Selection, crossover, and mutation are three evolutionary operators which are repeated in sequence until the termination condition is satisfied: selection picks the best solutions from the population, crossover performs recombination, and mutation introduces small changes into the new population. In this way, the optimal solution is found using the search capability of the GA Oh et al. (2004).
In NSGA-II, because more than one objective function is optimized, there is generally more than one optimal solution. NSGA-II starts from a randomly generated initial population, which is sorted based on non-domination. Using selection, crossover, and mutation operators, offspring are generated. Parents and offspring are combined, and fronts are determined for the combined population; the approaches of non-dominated sorting and front calculation are described in Section 3.2.1. Then, according to the crowded-comparison operator (discussed in Section 3.2.2), the best solutions are kept in the population for the next generation. In this way, NSGA-II optimizes multiple objectives while keeping the best solutions in the Pareto-optimal front.
3.2.1 Fast non-dominated sorting approach
In this approach, the set of solutions which are not dominated by any other solution is determined. For each solution x, two entities are calculated: 1) the domination count n_x, the number of solutions which dominate x, and 2) S_x, the set of solutions which are dominated by x. Let there be two solutions x and y. If x has a better value in at least one of the objective functions and no poorer value in the other objective functions with respect to y, then y is said to be dominated by x; hence n_y increases by 1 and y is added to S_x. All solutions in the first non-dominated front have a domination count of zero. For each solution x with n_x = 0, each member y of its set S_x is visited and its count n_y is reduced by 1. In this process, if n_y becomes zero for any member, it is put in another list, the second non-dominated front. This procedure is continued until all fronts are identified.
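The procedure above can be sketched in Python. This is an illustrative sketch, not the exact code used in our experiments; it assumes that all objectives are to be maximized and that each solution is given as a tuple of objective values.

```python
def fast_nondominated_sort(objectives):
    """Sort solutions into Pareto fronts (all objectives maximized).

    objectives: list of tuples, one per solution.
    Returns a list of fronts, each a list of solution indices.
    """
    n = len(objectives)

    def dominates(a, b):
        # a dominates b: no worse in every objective, better in at least one
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    S = [[] for _ in range(n)]   # S[x]: solutions dominated by x
    n_dom = [0] * n              # n_dom[x]: number of solutions dominating x
    fronts = [[]]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            if dominates(objectives[x], objectives[y]):
                S[x].append(y)
            elif dominates(objectives[y], objectives[x]):
                n_dom[x] += 1
        if n_dom[x] == 0:
            fronts[0].append(x)  # first non-dominated front
    i = 0
    while fronts[i]:
        nxt = []
        for x in fronts[i]:
            for y in S[x]:
                n_dom[y] -= 1
                if n_dom[y] == 0:
                    nxt.append(y)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]           # drop the trailing empty front
```

For instance, among the four solutions (2, 2), (1, 1), (3, 1), and (1, 3), only (1, 1) is dominated, so it alone forms the second front.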
3.2.2 Crowding-distance calculation and crowded-comparison operator
After finding all the fronts, the density of solutions surrounding a particular solution in the population needs to be calculated. This is done by computing, along each objective, the average distance of the two solutions on either side of the particular solution; the result is called the crowding distance. The sum of the individual distance values corresponding to the different objective functions is the overall crowding-distance value, and each objective function is normalized before the crowding distance is calculated. The crowded-comparison operator guides the selection process towards a uniformly spread-out Pareto-optimal front. Every solution in the population has two attributes: its non-domination rank (front number) and its crowding distance. Between two solutions with different non-domination ranks, the solution with the lower rank is preferred; if both solutions belong to the same front, the less crowded solution is preferred.
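A minimal sketch of the crowding-distance computation for a single front follows, again assuming solutions are given as tuples of objective values. Boundary solutions along each objective receive an infinite distance so that they are always preserved.

```python
def crowding_distance(objectives):
    """Crowding distance of each solution in one front.

    objectives: list of tuples, one per solution in the front.
    Returns a list of distances in the same order.
    """
    n = len(objectives)
    m = len(objectives[0])
    dist = [0.0] * n
    for k in range(m):
        # sort the front by the k-th objective
        order = sorted(range(n), key=lambda i: objectives[i][k])
        lo, hi = objectives[order[0]][k], objectives[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float('inf')  # boundary solutions
        if hi == lo:
            continue
        for j in range(1, n - 1):
            # normalized gap between the two neighbours along objective k
            dist[order[j]] += (objectives[order[j + 1]][k]
                               - objectives[order[j - 1]][k]) / (hi - lo)
    return dist
```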
3.3 Different criteria for selecting features
There are different criteria which characterize the nature of a feature in a dataset. These criteria reveal how useful a feature can be for the classification task without using the class labels. Some such criteria are mutual information, Pearson correlation coefficient, information gain, entropy, and standard deviation.
3.3.1 Entropy
Entropy measures the impurity of a feature. The higher the entropy, the more information the feature carries, and the more it will help in predicting the class labels. The entropy of a discrete variable Y can be calculated as:

H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y)    (1)

where p(y) denotes the probability mass function of Y.
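As an illustrative sketch, Equation (1) can be estimated for a discrete feature stored as a list of values by using the empirical frequencies:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Empirical Shannon entropy (base 2) of a discrete feature, Eq. (1)."""
    n = len(values)
    # p(y) is estimated as the relative frequency of each distinct value
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())
```

A feature taking a single value has zero entropy, while a balanced binary feature has entropy 1.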
3.3.2 Information gain (IG)
Information gain (IG) Information gain in decision trees (2019) measures how much information a feature provides about the class: it is the change in entropy after splitting on the attribute, and thus conveys how important a given feature is. It is calculated as follows:

IG(S, A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} H(S_v)    (2)

where S is the set of samples, A is the attribute, and S_v is the subset of S for which A takes the value v.
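Equation (2) can be sketched in Python as follows; this is an illustrative implementation for discrete features, not the exact code used in our experiments.

```python
from collections import Counter
from math import log2

def _entropy(values):
    # empirical Shannon entropy of a discrete variable, as in Eq. (1)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v), Eq. (2)."""
    n = len(labels)
    gain = _entropy(labels)
    for v in set(feature):
        # S_v: labels of the samples where the feature takes value v
        subset = [lab for f, lab in zip(feature, labels) if f == v]
        gain -= (len(subset) / n) * _entropy(subset)
    return gain
```

A feature that perfectly separates the classes yields the full label entropy as gain, while an uninformative feature yields zero gain.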
3.3.3 Mutual information (MI)
Mutual information Goshtasby (2012) is the amount of information one random variable conveys about another. "An important theorem from information theory says that the mutual information between two variables is 0 if and only if the two variables are statistically independent."
Given two continuous random variables X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n}, where n is the total number of samples, the mutual information between X and Y is defined as:

I(X; Y) = \int_X \int_Y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dy \, dx    (3)

3.3.4 Pearson correlation coefficient (PCC)
The Pearson correlation coefficient Goshtasby (2012) measures the linear correlation between two random features and is symmetric in nature. The value of the PCC falls in the closed interval [-1, 1]. A PCC value close to either -1 or 1 indicates a strong relationship between the two variables, a value close to 0 indicates a weak relationship, and a value of exactly 0 indicates no linear relationship. The PCC quantifies the degree to which the relationship between two variables can be described by a single line.

r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}    (4)

where \bar{x} is the mean of variable X and \bar{y} is the mean of variable Y.
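Equation (4) can be sketched directly from its definition; this illustrative version assumes both features are given as equal-length numeric lists with non-zero variance.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two features, Eq. (4)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Two features related by an increasing linear map give a PCC of 1, and by a decreasing one, -1.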
3.3.5 Standard deviation
Standard deviation quantifies how much the values of a feature deviate from their average. It is calculated as follows:

\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}    (5)

where \sigma is the standard deviation of the feature, n is the number of samples of the feature, x_i is the value of the ith sample of the feature, and \bar{x} is the mean value of the feature.
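A short sketch of Equation (5), using the population form of the standard deviation (division by n):

```python
from math import sqrt

def std_dev(values):
    """Population standard deviation of a feature, Eq. (5)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / n)
```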
3.4 Datasets used
In order to show the effectiveness of the proposed approach, we have used the standard KDD Cup 99, NSL-KDD, and Kyoto 2006+ datasets.
3.4.1 KDD Cup 99
The KDD Cup 99 dataset was derived from the DARPA 98 dataset generated by the 1998 DARPA Intrusion Detection Evaluation program Tavallaee et al. (2009). It has more than 4 million training samples and 3 million test samples. It contains TCP connection records with 41 informational features plus one label feature: the informational features describe the recorded details of each TCP connection, and the label specifies the type of connection, i.e., whether a connection is normal or abnormal. The 41 features consist of 32 continuous features and 9 nominal features, classified into 4 categories: basic features, content-based features, time-based traffic features, and host-based traffic features. We have used "kddcup.data_10_percent" as training data and "corrected" as testing data, with 10-fold cross-validation for validation purposes. In Table 1, we have tabulated the different attack classes of the training data and the categories they belong to. Some additional attacks are present in the testing data, but we considered only those attack classes which come under DoS, Probe, U2R, and R2L.
Table 1: Attack categories in the KDD Cup 99 training data

Attack category   List of attacks
DoS               back, neptune, land, pod, smurf, teardrop
Probe             ipsweep, nmap, portsweep, satan
U2R               buffer_overflow, loadmodule, rootkit, perl
R2L               ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster

Table 2: Number of samples per class in the KDD Cup 99 data

Data         DoS      Probe   U2R   R2L    Normal
train data   391458   4107    52    1126   97278
test data    222200   2377    39    5993   60593
3.4.2 NSL-KDD
Although the KDD Cup 99 dataset is the most widely used benchmark in intrusion detection research, it has some drawbacks. It contains many duplicate records, which bias the training of a classifier, and its level of difficulty is not very high. To eliminate these undesirable qualities, the authors of Tavallaee et al. (2009) proposed a more effective dataset, NSL-KDD, based on KDD Cup 99. The redundant records of the KDD dataset were eliminated, and the structure of the dataset was reconstructed to increase its level of difficulty. This elimination and reconstruction made the new dataset more reasonable in terms of both structure and size, so NSL-KDD can be considered a more standard dataset for intrusion detection research. We have used the "NSL-KDDTrain+20%" subset for experimental purposes. It is made up of 25192 instances, of which 13449 are normal data and 11743 are attack data.
3.4.3 Kyoto 2006+
The Kyoto 2006+ dataset was presented by Song et al. (2006). The data used here was collected over the period from August 2009 to November 2009, from honeypots and regular servers deployed at Kyoto University. Each connection in this dataset has 24 features: the first 14 are the same as in KDD 99, and 10 additional features are also present. The additional 10 features, which include IDS_detection, Malware_detection, and the label, enable a more effective investigation of what happened in the network. The labels are 1, -1, and -2, where 1 means a normal session was observed, -1 means a known attack was observed, and -2 means an unknown attack was observed.
3.5 Performance Measures
Traditionally, researchers use accuracy, detection rate, precision, and false alarm rate Elhamahmy et al. (2010) to evaluate the performance of an IDS. The confusion matrix is a tabular structure representing the predicted versus actual classifications. It leads to the calculation of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A true positive is an actual attack classified as an attack; a true negative is a non-attack classified as a non-attack; a false positive is a non-attack classified as an attack; and a false negative is an actual attack classified as a non-attack. Below we list the formulas of the different metrics.
True positive rate (recall, or detection rate): the proportion of actual attacks that are correctly classified by the model.

TPR = \frac{TP}{TP + FN}    (6)

Precision (positive predictive value): the proportion of instances classified as attacks that are actually attacks.

Precision = \frac{TP}{TP + FP}    (7)

Fall-out (false positive rate): the proportion of non-attacks for which the model predicts an attack that does not actually exist.

FPR = \frac{FP}{FP + TN}    (8)

Overall accuracy: the ratio of the number of correctly classified instances to the total number of instances.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (9)
Weighted average accuracy: the average accuracy (9) might not be a good measure of performance for imbalanced data. The weighted average accuracy is calculated by dividing the sum, over all classes, of the accuracy achieved on each class multiplied by its number of samples, by the total number of samples in the data.

WAA = \frac{\sum_{i=1}^{c} n_i \, a_i}{\sum_{i=1}^{c} n_i}    (10)

where n_i is the number of samples in the ith class, a_i is the accuracy on the ith class, and c is the total number of classes.
F-measure: the harmonic mean of precision and recall, used to examine the accuracy of a classification system by considering both precision and recall.

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}    (11)
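Given the four confusion-matrix counts, Equations (6)-(9) and (11) can be computed as sketched below (an illustrative helper, not tied to any particular classifier):

```python
def ids_metrics(tp, tn, fp, fn):
    """Standard IDS metrics from confusion-matrix counts, Eqs. (6)-(9), (11)."""
    recall = tp / (tp + fn)                    # detection rate / TPR, Eq. (6)
    precision = tp / (tp + fp)                 # Eq. (7)
    fpr = fp / (fp + tn)                       # false alarm rate, Eq. (8)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (9)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (11)
    return {'recall': recall, 'precision': precision, 'fpr': fpr,
            'accuracy': accuracy, 'f_measure': f_measure}
```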
4 Multi-Objective Optimization Based Intrusion Detection Framework
The framework of the proposed IDS is depicted in Figure 1. It is divided into four main phases. (1) Data collection: a good dataset is chosen in order to design the proposed model and evaluate its performance. (2) Data preprocessing: the training and test data are preprocessed and normalized. (3) Feature selection: different feature quality measures are computed, and an MOO based feature selection technique is employed to determine the Pareto-optimal feature subsets. (4) Model building: different classifiers are applied to the determined feature subsets to identify the best feature subset and the best classifier. For the test data, preprocessing is done first, and then the built model is used to produce the final result.
4.1 Data Collection
Data collection is the first step of any intrusion detection system. On the basis of the location from which the data is collected, IDSs are of two types: a) network-based IDSs and b) host-based IDSs. A network-based IDS collects data from the network, while a host-based IDS collects data from the host. Our study proposes a network-based IDS to test the proposed approach. The standard NSL-KDD, KDD Cup 99, and Kyoto datasets, already described in Section 3.4, are chosen as the working datasets.
4.2 Data preprocessing
Machine learning classifiers generally require each instance of the input data to be a vector of real numbers. Thus, a preprocessing phase is required to turn non-numerical values into numerical values. This phase is common to both the training and the test dataset, and contains two main stages.
a) Data sample transformation: the non-numerical features are converted to numerical values. The second, third, and fourth features (protocol type, service, and flag) of all three standard datasets (KDD Cup 99, Kyoto, and NSL-KDD) are categorical in nature. Specific values are assigned to convert these features into numerical types, such as, for protocol type, 'TCP' = 1, 'UDP' = 2, and 'ICMP' = 3; for service type, 'aol' = 1, 'auth' = 2, 'bgp' = 3, and so on; and for flag, 'oth' = 1, 'Rej' = 2, and so on. In this way, for each feature, the categories are converted to numerical form.
b) Change of class type from non-numeric to numeric: the classes given in the KDD Cup 99 and NSL-KDD datasets are 'normal', 'DoS', 'Probe', 'U2R', and 'R2L'. These are assigned the values 1, 2, 3, 4, and 5, respectively.
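The two stages can be sketched as follows. The mappings shown are illustrative and cover only the example values mentioned above; the real service and flag features have many more categories, which are numbered in the same fashion.

```python
# Illustrative value-to-number mappings (truncated; the full datasets have
# many more service and flag categories, assigned 1, 2, 3, ... likewise).
PROTOCOL = {'tcp': 1, 'udp': 2, 'icmp': 3}
SERVICE = {'aol': 1, 'auth': 2, 'bgp': 3}
FLAG = {'oth': 1, 'rej': 2}
CLASS = {'normal': 1, 'dos': 2, 'probe': 3, 'u2r': 4, 'r2l': 5}

def preprocess(record, label):
    """Convert the categorical fields (2nd-4th features) and the class label
    of one connection record to numeric form."""
    record = list(record)
    record[1] = PROTOCOL[record[1].lower()]   # 2nd feature: protocol type
    record[2] = SERVICE[record[2].lower()]    # 3rd feature: service
    record[3] = FLAG[record[3].lower()]       # 4th feature: flag
    return record, CLASS[label.lower()]
```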
4.3 Feature Selection
In Section 3.1 we discussed the need for and importance of feature selection. In this phase, we use multi-objective optimization to optimize different objective functions in order to determine a subset of features; the process is discussed in detail in Section 5.3. Different feature quality measures such as mutual information, Pearson correlation coefficient, entropy, and information gain are calculated, and the NSGA-II based feature selection approach is applied to obtain the Pareto-optimal solutions.
4.4 Model Building
In this phase, we apply different available machine learning classifiers to the different Pareto-optimal feature subsets found from the different models discussed in Section 5.2. We use the validation data to find the best subset of features: the subset on which the validation data gives the best result is chosen for building the final model.
4.5 Finding results for test data
After building a model, it is applied to the test data to obtain results for an unknown set of samples. The test data is first preprocessed, and the built model is then applied to it; the final result reports the class to which each sample belongs.
5 The Proposed Multi-objective Feature Selection Approach
In this paper, we have devised a filter based feature selection approach which uses the optimization capability of the fast and elitist non-dominated sorting genetic algorithm (NSGA-II) Deb et al. (2000) to optimize different feature quality measures and determine the optimal feature subsets. This helps in achieving the best classification accuracy for the IDS. Three feature quality measures are optimized simultaneously to obtain the Pareto-optimal feature subsets: the average dissimilarity of the selected features, the average similarity of the non-selected features, and the average standard deviation of the selected features. After selecting the optimal feature subsets, we apply different machine learning classifiers such as decision tree, support vector machine, random forest, k-nearest neighbour, and AdaBoost to check the behaviour of the obtained feature subsets with different classifiers. Our proposed approach is divided into two stages: the first stage deals with the optimization of feature subsets, and the second with the calculation of classification accuracies using the different available classifiers. Below, we discuss the feature selection process and the application of different machine learning classifiers to the selected feature subsets.
5.1 Chromosome Representation
In a genetic algorithm, chromosomes are used to represent solutions, as discussed in Section 3. For the feature selection problem, binary encoding is used: if the feature set contains n features, then a string of n binary digits is used. Each binary digit represents a feature; a value of 1 represents the selection of the feature and 0 represents its rejection.
Consider the problem of selecting features among 10 features. The encoded string would be a string of 10 binary digits. If "0001000110" is the encoded string, then the 4th, 8th, and 9th features are selected and the remaining features are rejected. This is how we represent our chromosomes.
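The decoding of the example above can be sketched as:

```python
def selected_features(chromosome):
    """Return the 1-based indices of the features selected by a binary
    chromosome string (one digit per feature)."""
    return [i + 1 for i, bit in enumerate(chromosome) if bit == '1']
```

For the example chromosome, this yields the 4th, 8th, and 9th features.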
5.2 Different Models for Objective Function Evaluations
For evaluating the significance of each objective function, we have created different models using different combinations of mutual information, Pearson correlation coefficient, information gain, standard deviation, and entropy. The feature set is divided into two mutually exclusive subsets: the selected feature subset (SF), which contains all features whose corresponding chromosome entries are 1, i.e., the features selected after optimization, and the non-selected feature subset (NSF), which contains the features whose corresponding chromosome entries are 0, i.e., those not selected after optimization.
5.2.1 Model-I
In the first model, we use mutual information for measuring the similarity between features and the standard deviation for checking the attribute values of the selected features. The first objective function f_1(.) is defined as the average of the normalized mutual information between the selected features. As the goal of feature selection is to remove irrelevant features from the feature set, the mutual information between the retained features must be low; thus, f_1 should be minimized. To avoid the overhead of minimization, we replace f_1 by its reciprocal f_1', so that it can also be maximized.

f_1 = \frac{1}{|SF|(|SF| - 1)} \sum_{f_i \in SF} \sum_{f_j \in SF, j \neq i} NMI(f_i, f_j)    (12)

f_1' = \frac{1}{f_1}    (13)
The second objective function (.) is defined as the average of normalized mutual information between nonselected features and the nearest selected feature. It indicates that if features which are described by one of the selected features are removed, then mutual information must be high. Thus, the should be maximized.
$f_2 = \frac{1}{|NSF|} \sum_{x_i \in NSF} \widehat{MI}\big(x_i, 1NN(x_i)\big)$  (14)
where $1NN(x_i)$ returns the first nearest neighbor of the non-selected feature $x_i$ from the selected feature subset. We used the Euclidean distance to measure the distance between two features.
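A minimal sketch of the two mutual-information objectives, assuming scikit-learn is available and using a simple equal-width discretization (the paper does not specify how continuous features are discretized; function and variable names are ours):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def discretize(col, bins=10):
    # MI needs discrete values; equal-width binning is our assumption.
    edges = np.histogram_bin_edges(col, bins=bins)
    return np.digitize(col, edges[1:-1])

def f1_objective(X, sf):
    # Reciprocal of the average normalized MI between selected features
    # (equations 12 and 13): lower redundancy gives a larger value.
    pairs = [(i, j) for i in sf for j in sf if i < j]
    avg = np.mean([normalized_mutual_info_score(discretize(X[:, i]),
                                                discretize(X[:, j]))
                   for i, j in pairs])
    return 1.0 / avg if avg > 0 else float("inf")

def f2_objective(X, sf, nsf):
    # Average normalized MI between each non-selected feature and its
    # nearest (Euclidean) selected feature (equation 14).
    vals = []
    for i in nsf:
        nn = min(sf, key=lambda j: np.linalg.norm(X[:, i] - X[:, j]))
        vals.append(normalized_mutual_info_score(discretize(X[:, i]),
                                                 discretize(X[:, nn])))
    return float(np.mean(vals))
```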
The third objective function $f_3(\cdot)$ is defined as the average of the standard deviations of the selected features. The larger the variation of the values of a feature, the more it helps in discriminating the class labels. Thus, $f_3$ needs to be maximized.
$f_3 = \frac{1}{|SF|} \sum_{x_i \in SF} \sigma(x_i)$  (15)
A high value of mutual information between two variables indicates high redundancy of information in the dataset. For fast and accurate classification, it is necessary to have a mutually exclusive and irredundant feature set. The standard deviation states how far the values of a feature deviate from their average: if the values are spread out, the standard deviation is high, otherwise it is low. Together, these two measures describe how irredundant and mutually exclusive a feature subset is, and how large the domains of its features are. Thus, the functions described in Equations 13, 14 and 15 need to be maximized. In Model-I, we have optimized these objective functions to extract the optimal subset of features.
We have developed two variations of Model-I: i) Model-I(a), maximizing all the equations simultaneously, and ii) Model-I(b), ignoring the standard deviations of the 4th and 5th features while maximizing equation 15. Since the standard deviations of the 4th and 5th features are very large (988217.1009 and 33039.9678, respectively), we ignored these two features in order to avoid a biased optimization, as we have to maximize the average standard deviation.
5.2.2 Model-II
Entropy measures the impurity of a feature: the larger the entropy of a feature, the more information it contains. Thus, we have considered entropy in place of standard deviation; we replaced the standard deviation of Model-I by entropy in order to compare the significance of these two measures.
$f_3' = \frac{1}{|SF|} \sum_{x_i \in SF} H(x_i)$  (16)
where $H(x_i)$ denotes the entropy of feature $x_i$.
In Model-II, the first two objective functions are the same as in Model-I, namely equations 13 and 14; the third is replaced by equation 16. Thus, in Model-II we optimize the functions reported in equations 13, 14, and 16.
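The entropy objective of equation 16 can be sketched as follows (histogram-based binning is our assumption; names are illustrative):

```python
import numpy as np

def entropy_objective(X, sf, bins=10):
    """Average Shannon entropy of the selected features: a sketch of
    equation 16. The equal-width binning is our assumption."""
    ents = []
    for i in sf:
        counts, _ = np.histogram(X[:, i], bins=bins)
        p = counts[counts > 0] / counts.sum()   # empirical probabilities
        ents.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(ents))
```

A uniformly spread feature scores close to log2(bins), while a constant feature scores zero, which is why maximizing this objective favours informative features.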
5.2.3 Model-III
Information gain (IG) measures the change in entropy after splitting the data on an attribute, so it can also be used as a similarity measure for feature subsets. To develop Model-III, we replaced the mutual information of Model-I with IG, keeping the standard deviation. Using IG, we obtain the objective functions $g_1(\cdot)$ and $g_2(\cdot)$, which together with $f_3(\cdot)$ form the new objective set:
$g_1 = \left( \frac{2}{|SF|\,(|SF|-1)} \sum_{x_i, x_j \in SF,\; i<j} IG(x_i, x_j) \right)^{-1}$  (17)
$g_2 = \frac{1}{|NSF|} \sum_{x_i \in NSF} IG\big(x_i, 1NN(x_i)\big)$  (18)
The functions described for Model-III in equations 17, 18 and 15 need to be maximized in order to obtain the optimal feature subsets.
Similar to Model-I, we have created two variations of Model-III as well: Model-III(a), which optimizes all the objectives simultaneously, and Model-III(b), which ignores the standard deviations of the 4th and 5th features while maximizing the standard deviation of the selected features.
Model  Objective functions

Model-I  Mutual information and standard deviation
Model-II  Mutual information and entropy
Model-III  Information gain and standard deviation
Model-IV  Information gain and entropy
Model-V  Pearson correlation coefficient and standard deviation
Model-VI  Pearson correlation coefficient and entropy

5.2.4 Model-IV
In Model-IV, information gain is combined with entropy: the functions in equations 17, 18, and 16 are maximized to obtain the optimal feature subsets.
5.2.5 Model-V
The Pearson correlation coefficient (PCC) can be used in place of mutual information. PCC measures the linear correlation between two random features and can also serve as a selection criterion: a low PCC value between two features indicates that they are weakly correlated, while a high value indicates that they are highly correlated. Thus, the PCC can be used as an optimization criterion measuring the similarity within feature subsets. For Model-V, we have the objective functions $h_1(\cdot)$ and $h_2(\cdot)$, which together with $f_3(\cdot)$ form the new objective set; we replaced the mutual information by the Pearson correlation coefficient, keeping the standard deviation. The new objective functions are:
$h_1 = \left( \frac{2}{|SF|\,(|SF|-1)} \sum_{x_i, x_j \in SF,\; i<j} \big|PCC(x_i, x_j)\big| \right)^{-1}$  (19)
$h_2 = \frac{1}{|NSF|} \sum_{x_i \in NSF} \big|PCC\big(x_i, 1NN(x_i)\big)\big|$  (20)
In Model-V, the functions mentioned in equations 19, 20 and 15 need to be maximized in order to obtain the optimal feature subsets using the Pearson correlation coefficient and standard deviation. Similar to Model-I and Model-III, we have created two variations of Model-V as well: Model-V(a), which optimizes all the objectives simultaneously, and Model-V(b), which ignores the standard deviations of the 4th and 5th features while maximizing the standard deviation of the selected features.
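The behaviour of the PCC as a similarity measure can be illustrated with NumPy (the data below is synthetic, chosen only to show one strongly correlated pair and one independent pair):

```python
import numpy as np

# Illustration of the PCC as a similarity measure between feature columns.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 2.0 * a + rng.normal(scale=0.01, size=100)  # nearly a linear function of a
c = rng.normal(size=100)                         # independent of a

pcc_ab = np.corrcoef(a, b)[0, 1]  # close to 1: highly correlated, redundant
pcc_ac = np.corrcoef(a, c)[0, 1]  # close to 0: weakly correlated
```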
5.2.6 Model-VI
In Model-VI, the Pearson correlation coefficient is combined with entropy: the functions in equations 19, 20, and 16 are maximized to obtain the optimal feature subsets.
5.3 Finding optimal feature subsets
We have utilized the steps of the non-dominated sorting genetic algorithm-II (NSGA-II), a popular multi-objective genetic algorithm, to obtain feature subsets that are optimal with respect to the different objective functions. As tabulated in Table 3, each model optimizes a different combination of feature quality measures using NSGA-II, which is a fast and elitist way of optimizing multiple objective functions to obtain a set of Pareto optimal solutions.
In Algorithm 1, we present the procedure for obtaining the optimal feature subsets proposed in the current work. First, the population is initialized by randomly generating some solutions. Then the values of the different objective functions are calculated for each solution in the population; these values are used to identify fronts using the non-dominated sorting approach. After that, crowding distance values are calculated for the solutions of the different fronts. Next, an offspring population is generated from the initial population using selection, crossover, and mutation operations. The initial population and the new offspring are then combined, in order to maintain the elitist nature of NSGA-II. On the combined population, fronts and crowding distance values are calculated again, and the best individuals are selected for the next generation using the crowding distance comparison operator. This whole process is repeated until the maximum number of generations is reached, at which point the final population contains the set of optimal feature subsets.
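The two NSGA-II building blocks named above, non-dominated sorting and crowding distance, can be sketched as follows (a minimal illustration for maximization problems; this is not the authors' implementation):

```python
import numpy as np

def non_dominated_sort(F):
    """Split a (pop, n_obj) matrix of objective values (all maximized)
    into Pareto fronts, as in the NSGA-II sorting step."""
    n = len(F)
    dominated_by = [set() for _ in range(n)]  # solutions each p dominates
    dom_count = [0] * n                       # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if np.all(F[p] >= F[q]) and np.any(F[p] > F[q]):
                dominated_by[p].add(q)
            elif np.all(F[q] >= F[p]) and np.any(F[q] > F[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]

def crowding_distance(F, front):
    """Crowding distance of the solutions within one front; boundary
    solutions get infinite distance so they are always kept."""
    d = {p: 0.0 for p in front}
    for m in range(F.shape[1]):
        order = sorted(front, key=lambda p: F[p, m])
        d[order[0]] = d[order[-1]] = float("inf")
        span = F[order[-1], m] - F[order[0], m] or 1.0
        for k in range(1, len(order) - 1):
            d[order[k]] += (F[order[k + 1], m] - F[order[k - 1], m]) / span
    return d
```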
5.4 Applying machine learning classifiers on optimal subsets
We have identified optimal feature subsets using NSGA-II after optimizing different feature quality measures; the optimization of these objective functions leads to mutually exclusive and highly relevant feature subsets. Since NSGA-II is a multi-objective optimization approach, it provides a set of optimal feature subsets on the final Pareto optimal front. We have applied different available machine learning classifiers, such as decision tree, support vector machine, random forest, k-nearest neighbour, AdaBoost, and multi-layer perceptron, on the obtained optimal feature subsets to develop intrusion detection systems.
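The second phase can be sketched with scikit-learn on synthetic data (the subset indices and data below are illustrative, not the paper's):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Sketch of the second phase: train a classifier using only the columns of
# one optimal feature subset returned by NSGA-II.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 3] + X[:, 7] > 0).astype(int)   # toy labels
subset = [3, 7, 8]                        # a hypothetical optimal subset

clf = DecisionTreeClassifier(random_state=0).fit(X[:, subset], y)
train_acc = clf.score(X[:, subset], y)
```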
6 Simulation results
We have performed the simulation of our proposed algorithm on the KDD Cup 99, NSL-KDD, and Kyoto 2006+ datasets. In Section 6.1, we have tabulated all the results for the KDD dataset; in Section 6.2, the results for the NSL-KDD dataset; and in Section 6.3, the results for the Kyoto dataset.
6.1 Results on KDD99
Our IDS framework works in two phases: the first is the identification of optimal feature subsets, and the second is the development of the best IDS classifiers utilizing the best subsets of features identified in the first phase.
6.1.1 Phase 1: Finding optimal feature subsets using NSGA-II
For finding the Pareto optimal subsets, we have performed simulations for each of the models discussed in Table 3. The maximum, minimum and average lengths of the feature subsets obtained from all the models are tabulated in Table 4. It is observed that features 0, 1, 2, 3, 4, 5, 6, 7, 10, 12, 15, 19, 21, 22, 23, 25, 29, 31, 32, 33, 36, and 39 are present in most of the models; only a few features differ between the best feature subsets of the different models. This demonstrates that all the obtained models select the important features, which also validates our assumptions about the different fitness functions.
Models  Max. length  Min. length  Avg. length 

Model-I(a)  22  12  18 
Model-I(b)  22  14  19 
Model-II  23  15  20 
Model-III(a)  23  11  20 
Model-III(b)  22  14  18 
Model-IV  23  16  20 
Model-V(a)  35  23  30 
Model-V(b)  33  23  28 
Model-VI  32  23  28 
6.1.2 Results using multi-class classification
The standard KDD Cup 99 dataset comprises five different classes, so we have opted for multi-class classification. We applied different classifiers to all the feature subsets obtained from each of the optimization cases discussed in Section 6.1.1. After analyzing the results of the classification system, we found that the feature subsets obtained from Model-III(a) provide the best results in terms of accuracy on the validation data; these results are tabulated in Table 5. We then performed different tests by changing the classifier and concluded that the decision tree gives the best result, so we have considered the decision tree as the classifier for building the IDS. The best-case results are tabulated in Table 6.
For Model-I, Model-III, and Model-V, we executed the MOO twice: i) maximizing all the equations simultaneously, and ii) maximizing all the equations simultaneously but ignoring the standard deviations of the 4th and 5th features. After analyzing the lengths of the feature subsets and the features present in them in both cases, we concluded that ignoring the standard deviations of the 4th and 5th features did not produce a significant change in the obtained feature subsets. This variation was introduced only to check the behaviour of the optimizing algorithm: the lengths of the obtained feature subsets and the selected features are almost the same, and the classification results do not differ much either.
Model  Feature subset  Min Accuracy 

Model-I(a)  0, 1, 2, 3, 4, 5, 11, 12, 15, 16, 19, 21, 22, 23, 25, 27, 29, 31, 32, 33, 36  99.01 
Model-I(b)  0, 1, 2, 3, 11, 12, 13, 15, 19, 21, 22, 23, 25, 29, 31, 32, 33, 36, 39  99.12 
Model-II  0, 1, 4, 5, 6, 7, 10, 12, 13, 15, 17, 18, 19, 21, 22, 23, 25, 26, 29, 31, 32, 33, 35, 36  99.36 
Model-III(a)  0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 14, 15, 16, 21, 22, 23, 28, 29, 36, 37, 39  99.38 
Model-III(b)  0, 1, 2, 3, 4, 7, 8, 9, 10, 12, 13, 15, 21, 22, 23, 28, 29, 31, 32, 36, 37, 39  58.93 
Model-IV  1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 26, 28, 29, 30, 31, 32, 33, 36, 37, 39  58.94 
Model-V(a)  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 22, 23, 25, 27, 29, 30, 31, 32, 33, 36  79.70 
Model-V(b)  0, 2, 3, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 27, 29, 30, 31, 32, 33, 36  99.37 
Model-VI  0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 15, 16, 17, 21, 22, 23, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39  58.45 
Classifiers  Decision tree  Random forest  KNN  MLP  Adaboost 
Max. accuracy  99.38  97.56  97.59  97.61  97.38 
Avg. accuracy  96.85  97.56  96.57  94.56  21.52 
Min. accuracy  98.31  92.46  95.34  85.09  58.97 
In Fig. 3, we have shown the confusion matrix for the multi-class classification of the best subset obtained after the optimization of Model-III(a) on the test data. The classes are not properly balanced, and this effect can be clearly observed in the confusion matrix. For the normal and Dos classes, the numbers of true positives are very good, and both classes are classified correctly: out of 60593 samples of the normal class, 60125 are classified correctly, and for the Dos class, 222045 out of 222200 samples are detected correctly. The number of samples of the U2R attack is very small, only 39; of these, only 16 are classified correctly, while 18 are detected as Dos and 5 as belonging to the R2l class. The samples belonging to the Probe class are classified most accurately: out of 2377 test samples, 2369 are correctly classified, 4 are classified as Dos and 4 as normal. For the R2l class, out of 5993 samples, 4392 are classified as R2l, 4 as U2R, 1 as Dos, and 1566 as normal. In this way, the test data samples are classified efficiently by the classification model.
Class  Accuracy  Detection rate  Precision  False alarm rate  F-measure 
Normal  99.32  99.44  97.37  0.7  98.39 
Dos  99.91  99.93  99.96  0.1  99.94 
Probe  99.87  99.66  86.58  0.1  92.66 
U2R  99.98  41.02  64.00  0.003  50.00 
R2l  99.43  73.28  99.09  0.01  84.25 
Weighted average  99.78  99.27  99.29  0.2  99.23 

In Table 7, different performance measures, namely accuracy, precision, detection rate, false alarm rate, and F-measure, are reported for the test data. All these metrics are discussed in Section 3.5. Since there are five classes, we obtain five values of each measure, one per class; the weighted average value is considered in order to give an unbiased summary. The results are discussed in detail in Section 7.
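The support-weighted averaging used in the table can be illustrated with scikit-learn on a small synthetic example (the labels below are ours, not the KDD data):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Sketch of the weighted averaging: per-class metrics are weighted by class
# support, so the large classes dominate the summary figure.
y_true = np.array([0] * 90 + [1] * 10)                      # imbalanced truth
y_pred = np.array([0] * 85 + [1] * 5 + [1] * 8 + [0] * 2)   # some mistakes
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
```

Here the weighted recall is (90/100)·(85/90) + (10/100)·(8/10) = 0.93, i.e. the majority class contributes most of the summary value.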
6.1.3 Results using binary classification
The KDD Cup 99 dataset has different classes, and researchers have also performed 5-class binary classification (one vs. rest) for each class Ambusaidi et al. (2016); Chebrolu et al. (2005). This is done because the classes are not properly balanced, so calculating the performance measures for each class separately may not give accurate results in multi-class classification. We have used binary classification (one vs. rest) to show the goodness of the algorithm for a specific class. For each class, we created new training and test sets in which the class value is set to 1 for all samples belonging to that class and to 0 for the remaining samples. In this way, we obtained 5 different results, one per class, and calculated the accuracy, precision, detection rate, false alarm rate, and F-measure values for each of the binary classification results on the test data. All these measures are reported in Table 8; the weighted average row is omitted because, for binary classification, there is no need for a weighted average.
Class  Accuracy  Detection rate  Precision  False alarm rate  F-measure 
normal  98.17  92.47  99.31  0.18  95.76 
Dos  99.96  99.96  99.98  0.04  99.97 
Probe  99.38  58.03  88.13  0.09  69.98 
U2R  99.83  3.39  41.02  0.007  6.274 
R2l  98.27  95.89  16.76  1.7  28.54 
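The one-vs-rest relabeling described above can be sketched as (names are ours):

```python
import numpy as np

def one_vs_rest_labels(y, positive_class):
    """Relabel a multi-class vector: 1 for the chosen class, 0 for all
    others, as in the 5-class binary (one-vs-rest) evaluation above."""
    return (np.asarray(y) == positive_class).astype(int)

y = ["normal", "dos", "probe", "dos", "normal"]
y_dos = one_vs_rest_labels(y, "dos")
```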

6.2 Results using the NSL-KDD dataset
We have divided the "NSL-KDD Train+ 20%" data into two parts, namely training and testing: 80% of the data is used for training and the remaining 20% for testing. The dataset is partitioned in such a manner that there is no overlap between the training and the test data.
We have already found a feature subset using the NSGA-II algorithm on the KDD Cup dataset. Since the data distributions of the NSL-KDD and KDD datasets are the same, we do not need to find the feature subset again. We applied different available machine learning classifiers to the NSL-KDD dataset and tabulated the results in Table 9.
Class  Accuracy  Detection rate  Precision  False alarm rate  F-measure 
Normal  99.57  99.58  99.62  0.4  99.60 
Dos  99.90  99.93  99.97  0.1  99.86 
Probe  99.82  98.85  99.13  0.08  98.99 
U2R  99.77  84.37  87.09  0.1  85.71 
R2l  99.92  66.66  0.5  0.04  57.14 
Weighted avg.  99.83  99.16  98.73  0.18  98.92 
Class  Accuracy  Detection rate  Precision  False alarm rate  F-measure 
Known attack  99.65  99.70  99.63  0.3  99.67 
normal  99.65  99.59  99.68  0.2  99.63 
unknown attack  99.99  97.5  97.08  0.004  97.29 
weighted avg.  99.65  99.65  99.65  0.3  99.65 

6.3 Results using Kyoto 2006+
We have also used the Kyoto 2006+ dataset for evaluating the performance of the proposed feature selection algorithm; the dataset and its attributes are discussed in Section 3.4.3. Some features of the Kyoto dataset, such as service, flag and protocol, are categorical in nature, so we applied the data preprocessing step here as well. We have chosen the data of 27th August 2009. For testing the performance of the algorithm, we used the 10-fold cross-validation method: we applied the NSGA-II based feature selection technique with the proposed models (Section 5.2) to obtain a set of feature subsets that are optimal with respect to the objective functions of the different models, and after finding the subsets, we applied 10-fold cross-validation to get the best results. As the decision tree attained the best results on the other datasets, we used the decision tree here as well. The results are shown in Table 10.
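The 10-fold cross-validation step can be sketched with scikit-learn on synthetic data (illustrative only; the real features come from the selected subset of the Kyoto data):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Sketch of 10-fold cross-validation with a decision tree, as used for the
# Kyoto 2006+ results (synthetic data here).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = (X[:, 0] > 0).astype(int)  # toy labels driven by one feature
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
mean_acc = scores.mean()
```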
7 Final results and comparison of our proposed method with other classifiers
We have proposed a new unsupervised feature selection approach using multi-objective optimization. In this section, we discuss the final results of both the multi-class and the binary classification. Since we have considered different models for optimization, we discuss the final results separately; our algorithm performed well in all the cases, but here we report only the best ones. In Table 7, we have reported different performance measures for the different classes. Our proposed approach attained an accuracy of 99.32% for normal, 99.91% for Dos, 99.87% for Probe, 99.98% for U2R and 99.43% for R2l. After calculating the weighted average of the accuracies, our proposed system attained an overall accuracy of 99.78%. Since we have to detect the attack classes, we also report the detection rates for the different classes separately: 99.44% for normal, 99.93% for Dos, 99.66% for Probe, 41.02% for U2R and 73.28% for R2l. The numbers of samples in the U2R and R2l classes are very small, so we attained low detection rates for these classes. We have considered the weighted average for calculating the overall performance of the system; the reason is that the classes are not balanced, as the distribution of instances tabulated in Table 2 shows, with comparatively very few sample instances for classes like R2l, U2R and Probe. With the weighted average, the proposed system attained an accuracy of 99.78%, a detection rate of 99.27%, a precision of 99.29%, a false alarm rate of 0.2% and an F-measure of 99.23%.
In order to show that the performance improvements obtained by our proposed approach did not happen by chance but are statistically significant, we have performed a statistical t-test between the results obtained by the proposed MOO-based approach with its different models and the existing best model (LSTM-based, Kim et al. (2017)). The results of the statistical significance tests are reported in Table 11; from this table, it is clear that all the models proposed in this paper attain improved results over the LSTM-based approach of Kim et al. (2017). The LSTM model requires a huge amount of labelled data for learning its architecture and weight values, whereas our proposed unsupervised feature selection approach requires no labeled data while selecting features; labeled data with the limited feature set is used only in developing the decision tree based classifier. We have also tabulated the results of the binary classification on the KDD dataset in Table 8: we obtained an accuracy of 98.17%, 99.96%, 99.38%, 99.83%, and 98.27% for the normal, Dos, Probe, U2R, and R2l classes respectively, and the other performance measures are reported in the table as well.
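The significance test can be reproduced in outline with SciPy (the accuracy samples below are synthetic, not the paper's measurements):

```python
import numpy as np
from scipy import stats

# Sketch of the t-test in Table 11: an independent two-sample t-test
# between accuracy samples of two models (numbers here are made up).
ours = np.array([99.2, 99.3, 99.1, 99.4, 99.2])
lstm = np.array([97.4, 97.6, 97.5, 97.3, 97.6])
t_stat, p_value = stats.ttest_ind(ours, lstm)
# A large positive t with a tiny p-value indicates the improvement is
# statistically significant rather than due to chance.
```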
For the 20% subset of the NSL-KDD dataset, we obtained an overall accuracy of 99.83%, a detection rate of 99.16%, a precision of 98.73%, a false alarm rate of 0.18%, and an F-measure of 98.92%; these results are tabulated in Table 9. For the Kyoto dataset, we obtained an overall accuracy of 99.65%, a detection rate of 99.65%, a false alarm rate of 0.3% and an F-measure of 99.65%; these results are tabulated in Table 10.
Models  t-value  Is P-value less than 0.00001?  Is significant at P < 0.05? 

Model-I(a)  22.17  Yes  Yes 
Model-I(b)  46.864  Yes  Yes 
Model-II  64.25  Yes  Yes 
Model-III(a)  10.14  Yes  Yes 
Model-III(b)  8.14279  Yes  Yes 
Model-IV  10.93  Yes  Yes 
Model-V(a)  11.28  Yes  Yes 
Model-V(b)  12.735  Yes  Yes 
Model-VI  49.72  Yes  Yes 
In Table 12, we have presented a comparison of the accuracy values reported by different published works with the accuracy achieved by our proposed model. The best accuracy and detection rate previously reported for the current dataset were achieved by Kim et al. (2017), who applied a long short-term memory recurrent neural network with all 41 features and obtained 97.54% accuracy and a 98.95% detection rate. Our proposed model gives a set of feature subsets in which the maximum length of a feature subset is 23, the minimum length is 11, and the average length is 20. Thus it can be said that our proposed classifier is less costly than the LSTM-based classifier, since the latter uses all 41 features for classification whereas our classifier uses only 21 features.
Classifier  Precision  Detection rate  Accuracy  False alarm rate 
Devaraju and Ramakrishnan (2014) GNNN  87.08  59.12  93.05  12.46 
Kim et al. (2016) LSTM  –  98.88  96.93  10.04 
Wang et al. (2010) fuzzy clustering  –  –  96.75  – 
Kim et al. (2017) LSTM  97.69  98.95  97.54  9.98 
Proposed algorithm  99.29  99.27  99.78  0.2 
We have also compared our method with other feature-selection-based intrusion detection systems. In Table 13, we have tabulated some other works in which the authors selected optimal feature subsets in order to achieve good accuracy. In Aghdam and Kabiri (2016), the authors reported an accuracy of 98.9% using an ant colony optimization strategy, whereas our system obtained an overall accuracy of 99.78%. We have not compared our performance metrics directly with Ambusaidi et al. (2016) because they used class labels for selecting the relevant feature subset; they obtained an accuracy of 99.79%, a detection rate of 99.46%, and a false alarm rate of 0.13%, which is not much better than our result.
We have also achieved accuracies of 99.57%, 99.90%, 99.82%, 99.77%, and 99.92% for the Normal, Dos, Probe, U2R, and R2l classes respectively on the test set of the NSL-KDD dataset, which is better than the previous best result described in Aljawarneh et al. (2018), who obtained accuracies of 99.7%, 99.9%, 96.2%, 99.1%, and 97.9% for normal, Dos, Probe, U2R, and R2l respectively. Thus it can be seen that our method also performs well on the NSL-KDD dataset.
Method  Accuracy 

Ant colony optimization method Aghdam and Kabiri (2016)  98.9 
Cuttlefish optimization Eesa et al. (2015)  91.98 
Chi-square based Thaseen and Kumar (2017)  98 
Hierarchical clustering Horng et al. (2011)  95.7 
KDD winner Pfahringer (2000)  91.8 
KDD runner-up Levin (2000)  91.5 
Proposed method  99.78 

8 Conclusion and Future Work
In this research work, we have devised a new filter-based feature selection technique in a multi-objective optimization framework for detecting relevant features in an unlabeled dataset. Our goal was to devise a new algorithm for selecting features without using any class labels and without compromising the accuracy of the IDS. Originally, there were 41 attributes in the standard KDD99 dataset, but our designed model uses a maximum of 23 features, an average of 20 and a minimum of 11 features for classification. Our system gives an accuracy of 99.38% for multi-class classification, calculated according to equation 9. Since the classes are not properly balanced, the weighted average of the accuracy values (equation 10) is a better choice; with it, our system attains a weighted average accuracy of 99.78% with the decision tree classifier. To the best of our knowledge, this is the best reported accuracy compared to state-of-the-art techniques. In Kim et al. (2017), the authors reported an accuracy of 97.54%, and in Ambusaidi et al. (2016) the authors reported an accuracy of 99.80%, but there a 5-class binary classification problem is solved and class labels are utilized for selecting a suitable set of features. Our devised method attains the best accuracy and is, at the same time, an efficient unsupervised method for selecting relevant features from a huge unlabeled dataset. We also obtain a weighted average accuracy of 99.83% for the NSL-KDD dataset and 99.65% for the Kyoto dataset.
In the future, we aim to apply some deep learning approaches to classify the data using the feature sets obtained by our proposed feature selection technique. We also plan to devise novel MOO-based wrapper algorithms that can find optimal feature subsets using convolutional neural networks, autoencoders, recurrent neural networks, and long short-term memory networks for classification.
References
 Ambusaidi et al. (2016) M. A. Ambusaidi, X. He, P. Nanda, Z. Tan, Building an intrusion detection system using a filter-based feature selection algorithm, IEEE Transactions on Computers 65 (2016) 2986–2998.
 Chebrolu et al. (2005) S. Chebrolu, A. Abraham, J. P. Thomas, Feature deduction and ensemble design of intrusion detection systems, Computers & security 24 (2005) 295–307.
 Peddabachigari et al. (2007) S. Peddabachigari, A. Abraham, C. Grosan, J. Thomas, Modeling intrusion detection system using hybrid intelligent systems, Journal of network and computer applications 30 (2007) 114–132.
 Ambusaidi et al. (2015) M. A. Ambusaidi, X. He, P. Nanda, Unsupervised feature selection method for intrusion detection system, in: Trustcom/BigDataSE/ISPA, 2015 IEEE, volume 1, IEEE, pp. 295–301.
 Thaseen and Kumar (2017) I. S. Thaseen, C. A. Kumar, Intrusion detection model using fusion of chi-square feature selection and multi-class SVM, Journal of King Saud University – Computer and Information Sciences 29 (2017) 462–472.
 Salunkhe and Mali (2017) U. R. Salunkhe, S. N. Mali, Security enrichment in intrusion detection system using classifier ensemble, Journal of Electrical and Computer Engineering 2017 (2017).
 Wang et al. (2018) W. Wang, J. Liu, G. Pitsilis, X. Zhang, Abstracting massive data for lightweight intrusion detection in computer networks, Information Sciences 433 (2018) 417–430.
 Aljawarneh et al. (2018) S. Aljawarneh, M. Aldwairi, M. B. Yassein, Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science 25 (2018) 152–160.
 Wang et al. (2010) G. Wang, J. Hao, J. Ma, L. Huang, A new approach to intrusion detection using artificial neural networks and fuzzy clustering, Expert systems with applications 37 (2010) 6225–6232.
 Horng et al. (2011) S.J. Horng, M.Y. Su, Y.H. Chen, T.W. Kao, R.J. Chen, J.L. Lai, C. D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert systems with Applications 38 (2011) 306–313.
 Benaicha et al. (2014) S. E. Benaicha, L. Saoudi, S. E. B. Guermeche, O. Lounis, Intrusion detection system using genetic algorithm, in: Science and Information Conference (SAI), 2014, IEEE, pp. 564–568.
 Gharaee and Hosseinvand (2016) H. Gharaee, H. Hosseinvand, A new feature selection ids based on genetic algorithm and svm, in: Telecommunications (IST), 2016 8th International Symposium on, IEEE, pp. 139–144.
 Aghdam and Kabiri (2016) M. H. Aghdam, P. Kabiri, Feature selection for intrusion detection system using ant colony optimization., IJ Network Security 18 (2016) 420–432.
 Dhopte and Chaudhari (2014) S. Dhopte, M. Chaudhari, Genetic algorithm for intrusion detection system, IJRIT International Journal of Research in Information Technology 2 (2014) 503–509.
 Eesa et al. (2015) A. S. Eesa, Z. Orman, A. M. A. Brifcani, A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems, Expert Systems with Applications 42 (2015) 2670–2679.
 Shah et al. (2018) A. A. Shah, M. K. Ehsan, K. Ishaq, Z. Ali, M. S. Farooq, An efficient hybrid classifier model for anomaly intrusion detection system, International Journal of Computer Science and Network Security 18 (2018) 127–+.
 Najizad and Najizad (2017) M. Najizad, M. Najizad, Optimization of intrusion detection systems to increase the efficiency using artificial neural networks, International Journal of Computer Science and Network Security 17 (2017) 112–118.
 Kim et al. (2016) J. Kim, J. Kim, H. L. T. Thu, H. Kim, Long short term memory recurrent neural network classifier for intrusion detection, in: Platform Technology and Service (PlatCon), 2016 International Conference on, IEEE, pp. 1–5.
 Kim et al. (2017) J. Kim, H. Kim, et al., An effective intrusion detection classifier using long short-term memory with gradient descent optimization, in: Platform Technology and Service (PlatCon), 2017 International Conference on, IEEE, pp. 1–6.
 Javaid et al. (2016) A. Javaid, Q. Niyaz, W. Sun, M. Alam, A deep learning approach for network intrusion detection system, in: Proceedings of the 9th EAI International Conference on Bioinspired Information and Communications Technologies (formerly BIONETICS), ICST (Institute for Computer Sciences, SocialInformatics and Telecommunications Engineering), pp. 21–26.
 Putchala (2017) M. K. Putchala, Deep Learning Approach for Intrusion Detection System (IDS) in the Internet of Things (IoT) Network using Gated Recurrent Neural Networks (GRU), Ph.D. thesis, Wright State University, 2017.
 Chawla (2017) S. Chawla, Deep learning based intrusion detection system for Internet of Things, University of Washington, 2017.
 Deb et al. (2000) K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in: International Conference on Parallel Problem Solving From Nature, Springer, pp. 849–858.
 Guyon and Elisseeff (2003) I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of machine learning research 3 (2003) 1157–1182.
 Oh et al. (2004) I.S. Oh, J.S. Lee, B.R. Moon, Hybrid genetic algorithms for feature selection, IEEE Transactions on pattern analysis and machine intelligence 26 (2004) 1424–1437.
 Information gain in decision trees (2019) Information gain in decision trees — Wikipedia, the free encyclopedia, 2019. [Online; accessed 02 April 2019].
 Goshtasby (2012) A. A. Goshtasby, Similarity and dissimilarity measures, in: Image registration, Springer, 2012, pp. 7–66.
 Tavallaee et al. (2009) M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis of the kdd cup 99 data set, in: Computational Intelligence for Security and Defense Applications, 2009. CISDA 2009. IEEE Symposium on, IEEE, pp. 1–6.
 Song et al. (2006) J. Song, H. Takakura, Y. Okabe, Description of Kyoto University benchmark data, available at: http://www.takakura.com/Kyoto_data/BenchmarkDataDescriptionv5.pdf [Accessed 15 March 2016] (2006).

 Elhamahmy et al. (2010) M. Elhamahmy, H. N. Elmahdy, I. A. Saroit, A new approach for evaluating intrusion detection system, CiiT International Journal of Artificial Intelligent Systems and Machine Learning 2 (2010).
 Devaraju and Ramakrishnan (2014) S. Devaraju, S. Ramakrishnan, Performance comparison for intrusion detection system using neural network with kdd dataset., ICTACT Journal on Soft Computing 4 (2014).
 Pfahringer (2000) B. Pfahringer, Winning the kdd99 classification cup: bagged boosting, ACM SIGKDD Explorations Newsletter 1 (2000) 65–66.
 Levin (2000) I. Levin, Kdd99 classifier learning contest: Llsoft’s results overview, SIGKDD explorations 1 (2000) 67–75.