The Internet has evolved from a platform to disseminate information into a platform to deliver services. Consequently, misuse and policy violations by attackers are now routine affairs. Denning
introduced the concept of detecting cyber threats by constantly monitoring network audit trails with an Intrusion Detection System (IDS) to discover abnormal patterns or signatures of network or system usage. Recent advancements in IDSs involve machine learning and soft computing techniques, which have reduced the high false-positive rates observed in earlier generations of IDSs. The statistical models underlying these data mining techniques provide excellent intrusion detection capability, which has increased the popularity of existing IDSs. However, inherent complications of IDSs concerning competence, accuracy, and usability make them unsuitable for deployment in a live system with high traffic volume. Further, the learning process of an IDS requires a large amount of training data, which may not always be available, as well as considerable computing power and time. Studies have revealed that existing IDSs struggle to handle high-speed network traffic due to their complex decision-making processes. Attackers can take advantage of this shortcoming to hide their exploits and can overload an IDS with extraneous information while executing an attack. Therefore, building an efficient intrusion detection system that can flag an attack in the shortest possible time is vital for the security of a network.
A traditional IDS may discover network threats by matching current network behavior patterns against those of known attacks. The underlying assumption is that the behavior pattern of each attack is inherently different from normal activity. Thus, with knowledge of normal behavior patterns alone, it may be possible to detect a new attack. However, the automatic generation of these patterns (or rules) is a challenging task, and most of the existing techniques require human intervention during pattern generation. Moreover, the lack of exhaustive prior knowledge (or labeled data) regarding the attacks makes this problem more challenging. It is advantageous for any IDS to consider unlabeled examples along with the available (possibly few) labeled examples of the target class. This strategy helps to improve the accuracy of an IDS against new attacks. An IDS which can use both labeled and unlabeled examples is known as a semi-supervised IDS. Another important aspect of any intrusion detection system is the time required to detect abnormal activity; detection in real time or near real time is preferred, as it can prevent substantial damage to the resources. Thus, the primary objective of this work is to develop a semi-supervised intrusion detection system for near real-time detection of cyber threats.
Numerous security breaches of computer networks have encouraged researchers and practitioners to design several intrusion detection systems; for a comprehensive review, we refer the reader to the survey literature. Researchers have adopted various approaches to design IDSs, and a majority of them model the design problem as a classification problem. In one line of work, a feature selection method is combined with a standard classifier such as SVM, since conventional classifiers perform poorly in the presence of redundant or irrelevant features; other authors have adopted a similar approach. Most of these designs share one common disadvantage, i.e., they follow a supervised learning approach. Recently, a new semi-supervised IDS has been proposed that outperforms the existing semi-supervised IDSs, but it suffers from low detection accuracy.
It is essential to understand the behavior patterns of the known attacks, as well as the behavior of normal activity, to discover and prevent attacks. Generation of patterns or signatures to model the normal as well as the abnormal activities is a tedious process, and it can be automated using logical analysis of data (or LAD). Peter L. Hammer introduced the concept of logical analysis of data and subsequently developed it as a technique to find useful rules and patterns from past observations in order to classify new observations [6, 9]. Patterns (or rules) provide a very efficient way to solve various problems in different application areas, e.g., classification, development of rule-based decision support systems, feature selection, medical diagnosis, network traffic analysis, etc. The initial versions of LAD [1, 9, 12] were designed to work with binary data having either of two labels, i.e., positive or negative; thus, the observations formed a two-class system. A specific goal of LAD is to learn the logical patterns which set apart the observations of one class from those of the other.
LAD has been used to analyze problems involving medical data. A typical dataset consists of two disjoint sets, Ω⁺ and Ω⁻, which represent the observations consisting of positive and negative examples, respectively. Here, each observation is a vector of attribute values. In the domain of medical data analysis, each vector represents the medical record of a patient; the patients in Ω⁺ have a specific medical condition, whereas Ω⁻ represents the medical records of the patients who do not have that condition. Subsequently, given a new vector (patient), one has to decide whether it belongs to Ω⁺ or Ω⁻, i.e., whether the patient has the particular medical condition or not. Thus, in this example, the medical diagnosis problem can be interpreted as a two-class classification problem. The central theme of LAD is the selection of patterns (or rules) which can collectively classify all the known observations. LAD stands out in comparison with other classification methods since a pattern can explain the classification outcome to human experts using formal reasoning.
Conventional LAD requires labeled examples for pattern or rule generation. However, there exist several application domains (e.g., intrusion detection, fraud detection, document clustering, etc.) where labeled examples are rare or insufficient. To harness the strength of LAD in these domains, one needs to extend LAD for unsupervised and semi-supervised pattern generation. Here, we introduce a preprocessing methodology with which we can extend LAD so that it can use unlabeled observations along with the labeled observations for pattern generation; consequently, it acts as a semi-supervised learning approach. The central idea is to use classical LAD to generate initial positive and negative patterns from the available labeled observations. Once the patterns are available, we measure the closeness of the unlabeled observations to the initial positive or negative patterns using the balance score. The observations with a high positive balance score are labeled as positive observations, and the observations with a high negative balance score are labeled as negative examples. Once the labels are generated, the standard LAD can be used as it is. We have used this approach successfully in the design of a new semi-supervised and lightweight Intrusion Detection System (IDS) which outperforms the existing methods in terms of accuracy and requirement of computational power.
Creation of signatures or patterns to model the normal as well as the abnormal network activities can be accomplished using the semi-supervised LAD (or S-LAD in short), and in this effort, we have used S-LAD to design a semi-supervised IDS. Here, S-LAD is used to generate the patterns which can differentiate normal activities from malicious activities, and these patterns are later converted into rules for the classification of unknown network behavior. The proposed SSIDS has two phases. The offline phase is used to design a rule-based classifier: it uses historical observations, both labeled and unlabeled, to find the patterns or rules of classification, and requires a significant amount of processing power. Once the classification rules are generated, the online phase uses those rules to classify any new observation. The online phase requires much less processing power than the offline phase, and it can detect threats in near real time. The accuracy of the proposed semi-supervised IDS is much better than that of any state-of-the-art semi-supervised IDS and comparable with that of supervised IDSs.
The main contributions of this paper are: (1) a new implementation of LAD with an extensively modified pattern generation algorithm; (2) a new strategy to extend LAD that is suitable for the design of semi-supervised classifiers; (3) a LAD-based design of a lightweight semi-supervised intrusion detection system that outperforms the existing semi-supervised IDSs.
The rest of the paper is organized as follows. The next section gives a brief description of our modified implementation of LAD, and Section III describes the proposed method to extend LAD to the semi-supervised LAD. Details of the proposed SSIDS are available in Section IV. Performance evaluation and comparative results are available in Section V, and we conclude the paper in Section VI.
II Proposed Implementation of LAD
LAD is a data analysis technique inspired by combinatorial optimization methods. As pointed out earlier, the initial version of LAD was designed to work with binary data only. Let us first briefly describe the basic steps of LAD when it is applied to binary data. An observation having n attributes may be represented as a binary vector of length n+1, where the last bit (a.k.a. the class label) indicates whether it is a member of Ω⁺ or Ω⁻. Thus, the set of binary observations (Ω⁺, Ω⁻) can be represented by a partially defined Boolean function (pdBf in short) φ, indicating a mapping from {0,1}ⁿ to {0,1}. The goal of LAD is to find an extension f of the pdBf which can classify all the unknown vectors in the sample space. However, this goal is clearly unachievable in general, and we instead try to find an approximate extension of φ; f should approximate φ as closely as possible based on several optimality criteria. Normally, the extension is represented in disjunctive normal form (DNF). In brief, LAD involves the following steps.
Binarization of Observations. We have used a slightly modified implementation of binarization here.
Elimination of Redundancy (or Support Sets Generation).
Pattern Generation. Our extensively modified pattern generation algorithm makes the ‘Theory Formation’ step redundant.
Theory Formation. We have omitted this step.
Classifier Design and Validation.
There are many application domains, from finance to medicine, where the naturally occurring data are not binary [1, 5]. Thus, to apply LAD in those domains, a method to convert any data to binary is discussed in Subsection II-A. Moreover, we have modified the original pattern generation algorithm in such a manner that the coverages of every pair of patterns have a very low intersection; thus, the step “theory formation” is no longer required. Recently, a technique to produce internally orthogonal patterns (i.e., patterns whose coverages have pairwise empty intersections) has also been reported.
II-A Binarization of Observations
A threshold (a.k.a. cut-point) based method was proposed to convert numerical data to binary. Any numerical attribute x is associated with two types of Boolean variables, i.e., the level variables and the interval variables. Level variables are related to the cut-points and indicate whether the original attribute value is greater than or less than a given cut-point. For each cut-point t, we create a Boolean variable b_t such that b_t(x) = 1 if x >= t, and b_t(x) = 0 otherwise.
Similarly, interval variables are created for each pair of cut-points t1 < t2 and represented by a Boolean variable b_{t1,t2} such that b_{t1,t2}(x) = 1 if t1 <= x < t2, and b_{t1,t2}(x) = 0 otherwise.
We are yet to discuss how the cut-points are determined. The cut-points should be chosen carefully such that the resultant pdBf has an extension in the class of all Boolean functions. Let us consider a numerical attribute having k distinct values present in the observations, with the attribute values ordered such that v1 < v2 < ... < vk. We introduce a cut-point between vi and v(i+1) if the corresponding observations belong to different classes. The resulting pdBf is referred to as the master pdBf if we create a cut-point for each such pair of values. Note that the resultant master pdBf has an extension if and only if Ω⁺ ∩ Ω⁻ = ∅.
The process for the selection of cut-points is explained below using an example from the literature. The original dataset presented in Table V is converted by adding the class labels (or truth values of the pdBf): those observations that are members of Ω⁺ receive 1 as the class label, and the rest of the observations receive 0. Now, if we want to convert a numeric attribute to binary, we form another dataset containing that attribute along with the class labels, and then sort it over the attribute values. After that, we apply the following steps to get the cut-points.
Preprocessing: This step is a slight modification of the usual technique used in [5, 6] and other related papers. If two or more consecutive observations have the same attribute value but different class labels, remove all those observations except one. Then, change the class label of the retained observation to a new and unique class label which does not appear elsewhere, and include that label in the set of class labels. Refer to Table V.
Now, if two consecutive observations have different class labels, introduce a new cut-point midway between their attribute values, i.e., t = (vi + v(i+1))/2.
If we follow the above-mentioned steps on the example, we obtain three cut-points. Thus, we will have six Boolean variables, consisting of three level variables and three interval variables, corresponding to these cut-points.
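The cut-point selection procedure above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's implementation; the function name and the fresh-label encoding for mixed-value runs are our own.

```python
def find_cut_points(values, labels):
    """Cut-points for one numeric attribute, placed midway between
    consecutive distinct values whose observations differ in class."""
    pairs = sorted(zip(values, labels))  # sort observations by value

    # Preprocessing: collapse a run of equal values with mixed class
    # labels into one observation carrying a fresh, unique label.
    collapsed, i = [], 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        if len({lab for _, lab in pairs[i:j]}) > 1:
            collapsed.append((pairs[i][0], ("mixed", pairs[i][0])))
        else:
            collapsed.append(pairs[i])
        i = j

    # A cut-point is introduced between consecutive observations
    # whose class labels differ.
    return [(v1 + v2) / 2.0
            for (v1, l1), (v2, l2) in zip(collapsed, collapsed[1:])
            if l1 != l2]
```

For the values 1, 2, 3, 4 with labels 0, 0, 1, 1 this yields the single cut-point 2.5, matching the midpoint rule above.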
A “nominal” or descriptive attribute can be converted into binary very easily by relating each possible value v of the attribute with a Boolean variable b_v such that b_v(x) = 1 if x = v, and b_v(x) = 0 otherwise.
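Given the cut-points, the level, interval, and nominal indicator variables can be generated as below. This is a minimal sketch with illustrative function names, assuming the b_t(x) = 1 iff x >= t convention described above.

```python
from itertools import combinations

def binarize_numeric(x, cut_points):
    """Level variables (x >= t) followed by interval variables
    (t1 <= x < t2) for every pair of cut-points t1 < t2."""
    cuts = sorted(cut_points)
    level = [1 if x >= t else 0 for t in cuts]
    interval = [1 if t1 <= x < t2 else 0
                for t1, t2 in combinations(cuts, 2)]
    return level + interval

def binarize_nominal(x, domain):
    """One Boolean indicator per possible value of the attribute."""
    return [1 if x == v else 0 for v in domain]
```

With three cut-points this produces exactly six Boolean variables (three level, three interval), as in the running example.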
II-B Support Sets Generation
A binary dataset obtained through binarization or any other process may contain redundant attributes. A set S of binary attributes is termed a support set if the projections of Ω⁺ and Ω⁻ onto S remain disjoint. A support set is termed minimal if the elimination of any of its constituent attributes destroys this property. Finding a minimal support set of a binary dataset, like Table XI (see Appendix), is equivalent to solving a set covering problem. A detailed discussion on support sets, minimal support sets, and a few algorithms to solve the set covering problem can be found in [2, 9, 12]. Here, we have used the “Mutual-Information-Greedy” algorithm to solve the set covering problem in our implementation. Note that our implementation produces the set in such a manner that the constituent binary attributes are ordered according to their discriminating power, which helps us to achieve the simplicity objective mentioned in the description of LAD. Applying the said algorithm to the running example selects a small set of binary feature variables.
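A greedy selection in the spirit of the Mutual-Information-Greedy algorithm can be sketched as follows: score each binary attribute by its mutual information with the class label and keep the top-ranked ones. This is a simplified illustration and omits the set-covering bookkeeping of the actual algorithm.

```python
import math
from collections import Counter

def mutual_information(column, labels):
    """I(X; Y) in bits between one binary attribute and the labels."""
    n = len(labels)
    joint, fx, fy = Counter(zip(column, labels)), Counter(column), Counter(labels)
    return sum((c / n) * math.log2((c / n) / ((fx[a] / n) * (fy[b] / n)))
               for (a, b), c in joint.items())

def greedy_support_set(columns, labels, k):
    """Indices of the k attributes with the highest mutual information,
    ordered by discriminating power."""
    ranked = sorted(range(len(columns)),
                    key=lambda i: mutual_information(columns[i], labels),
                    reverse=True)
    return ranked[:k]
```

An attribute that perfectly mirrors the class label scores 1 bit and is selected first; an attribute independent of the label scores 0.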
II-C Modified pattern generation method
Let us first recall a few common Boolean terminologies that we require to describe the pattern generation process. A Boolean variable or its negation is known as a literal, and a conjunction of such literals is called a term. The number of literals present in a term is known as its degree. The characteristic term of a point x is the unique term of degree n that covers x and no other point. A term T is said to cover the point x if T(x) = 1. A term T is called a positive pattern of a given dataset if
- T(x) = 0 for every point x ∈ Ω⁻, and
- T(x) = 1 for at least one point x ∈ Ω⁺.
Similarly, one can define the negative patterns. Both the positive and the negative patterns play a significant role in any LAD-based classifier. Geometrically, a positive pattern is a subcube of the unit cube that intersects Ω⁺ but is disjoint from Ω⁻, and a negative pattern is a subcube that intersects Ω⁻ but is disjoint from Ω⁺. Consequently, we have a symmetric pattern generation procedure. In this paper, we have used an extensively modified and optimized version of the pattern generation technique proposed by Boros et al.
Our modification increases the probability that a point or observation is covered by a single pattern instead of multiple patterns. We expect that the majority of the observations will be covered by a unique pattern; thus, we no longer require the ‘theory formation’ step to select the most suitable pattern to cover an observation. In Step 22, we have ensured that a pattern is selected if and only if it covers at least a threshold number of positive observations. This ensures that a selected pattern occurs frequently in the dataset. One major drawback of this approach is that, if the threshold is set too high, it may happen that not all the observations present in the dataset are covered by the selected set of patterns. However, a properly chosen threshold ensures that the vast majority of the observations are covered. Note that the negative prime patterns can be generated in a similar fashion. If we apply Algorithm 1 over the projection of the binary dataset presented in Table XI (see Appendix), three positive patterns and two corresponding negative patterns are generated.
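The support-threshold idea from Step 22 can be illustrated with a brute-force generator of low-degree positive patterns. The paper's Algorithm 1 is far more elaborate; the names and the exhaustive enumeration here are ours.

```python
from itertools import combinations, product

def covers(term, point):
    """term maps attribute index -> required literal value (0 or 1)."""
    return all(point[i] == v for i, v in term.items())

def positive_patterns(pos, neg, max_degree, min_support):
    """Terms of degree <= max_degree that cover no negative observation
    and at least min_support positive observations."""
    n = len(pos[0])
    found = []
    for d in range(1, max_degree + 1):
        for idxs in combinations(range(n), d):
            for vals in product([0, 1], repeat=d):
                term = dict(zip(idxs, vals))
                if any(covers(term, q) for q in neg):
                    continue  # not a pattern: covers a negative point
                if sum(covers(term, p) for p in pos) >= min_support:
                    found.append(term)
    return found
```

Raising min_support discards rarely matching terms, which is exactly the frequency requirement discussed above.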
II-D Design of Classifier
The patterns which are generated using Algorithm 1 are transformed into rules, and later these rules are used to build a classifier. The rule generation process is trivial, and it is explained using an example. Consider the first positive pattern: each binary variable appearing in it corresponds, via Table XI, to a condition on an original attribute being true or false (e.g., whether an attribute value exceeds a cut-point). The rule generated from the pattern is therefore the conjunction of these conditions, predicting the positive class whenever all of them hold. The corresponding pseudo-code is as follows.
We can combine more than one positive rule into an ‘if else-if else’ structure to design a classifier. Similarly, one can build a classifier using the negative patterns. Hybrid classifiers use both the positive and the negative rules. A simple classifier using the positive patterns is presented below.
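Under the relaxed criterion of the ‘Simple Classifier’, the rule cascade reduces to a few lines. This is a sketch with our own naming, where a pattern is again a map from attribute index to required value.

```python
def simple_classify(x, positive_patterns):
    """Positive if any positive pattern covers x, negative otherwise
    (the relaxed criterion of the Simple Classifier)."""
    for term in positive_patterns:
        if all(x[i] == v for i, v in term.items()):
            return "positive"
    return "negative"
```

The first matching rule decides the outcome, mirroring the ‘if else-if else’ structure described above.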
In general, a new observation x is classified as positive if at least one positive pattern covers it and no negative pattern covers it; a similar definition is possible for negative observations. However, in the ‘Simple Classifier’, we have relaxed this criterion, and we consider x as negative if it is not covered by any positive pattern. Another classification strategy that has worked well in our experiments is based on the balance score. The balance score is a linear combination of the coverage by the positive and negative patterns: with p positive patterns and q negative patterns, it can be defined as the fraction of positive patterns covering an observation minus the fraction of negative patterns covering it.
The classification of a new observation is then given by the sign of its balance score: the observation is classified as positive if the score is positive and as negative otherwise (Equation 5).
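A sketch of balance-score classification under the normalized-coverage reading given above; the paper's exact weighting may differ.

```python
def balance_score(x, pos_patterns, neg_patterns):
    """Fraction of positive patterns covering x minus the fraction
    of negative patterns covering x."""
    cov = lambda term: all(x[i] == v for i, v in term.items())
    p = sum(map(cov, pos_patterns)) / max(len(pos_patterns), 1)
    q = sum(map(cov, neg_patterns)) / max(len(neg_patterns), 1)
    return p - q

def classify(x, pos_patterns, neg_patterns):
    # Decide by the sign of the balance score.
    return "positive" if balance_score(x, pos_patterns, neg_patterns) > 0 else "negative"
```

A score near +1 means the observation matches almost all positive patterns and no negative ones, and vice versa for a score near -1.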
III Extension of LAD
The majority of the applications of LAD available in the existing literature work with labeled data during the classifier design phase. There are many applications where a plethora of data are available which are unlabeled or partially labeled. These applications require a semi-supervised or unsupervised pattern generation approach. One such application is intrusion detection,
where lightweight classification methods designed using LAD are desirable. However, the dearth of labeled observations makes the development of a LAD-based solution difficult. In this effort, we propose a pre-processing method which can label the available unlabeled data. The proposed method requires that some labeled data be available during the design of classifiers; thus, the method is akin to a semi-supervised learning approach [22].
The process of class label generation is very simple, and it uses a standard LAD-based classifier with the balance score as a discriminant function to classify an unlabeled observation. First, we design a balance-score-based classifier using the set of available labeled observations DL. Later, we classify each observation in the unlabeled dataset using this classifier. However, we replace the classifier described in Equation 5 by Equation 6: we keep those observations unlabeled which have a very low absolute balance score, and those observations are omitted from further processing. Basically, we ensure that an observation is classified/labeled during the labeling process only if it has a strong affinity towards the positive or the negative patterns.
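The labeling step can then be sketched as follows. The two thresholds are illustrative stand-ins for the parameters of Equation 6, and the balance score is inlined so the sketch is self-contained.

```python
def _balance_score(x, pos_patterns, neg_patterns):
    """Fraction of positive patterns covering x minus the fraction
    of negative patterns covering x (as in Section II-D)."""
    cov = lambda term: all(x[i] == v for i, v in term.items())
    return (sum(map(cov, pos_patterns)) / max(len(pos_patterns), 1)
            - sum(map(cov, neg_patterns)) / max(len(neg_patterns), 1))

def label_observations(unlabeled, pos_patterns, neg_patterns,
                       pos_threshold=0.5, neg_threshold=-0.5):
    """Label only the decisive observations; leave the rest unlabeled
    so they are omitted from further processing."""
    labeled, discarded = [], []
    for x in unlabeled:
        s = _balance_score(x, pos_patterns, neg_patterns)
        if s >= pos_threshold:
            labeled.append((x, "positive"))
        elif s <= neg_threshold:
            labeled.append((x, "negative"))
        else:
            discarded.append(x)  # weak affinity: stays unlabeled
    return labeled, discarded
```

Observations falling between the two thresholds are neither trusted as positive nor as negative, which is exactly the abstention behavior described above.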
We have evaluated the performance of the said strategy using the KDDTrain_20percent dataset, which is part of the NSL-KDD dataset. We have partitioned the dataset into two parts: the first part DL consists of a small set of randomly selected observations, and the second part DUL consists of the rest of the observations, from which we have removed the labels. Afterward, DL is used to design a classifier based on Equation 6. This classifier is later used for the classification of DUL, and the output of the labeling process is a dataset consisting of all the newly labeled examples from DUL. The results are summarized in Table VI. It is obvious that any error in the labeling process will have a cascading effect on the performance of Algorithm 2; on the other hand, the samples left unlabeled have no such consequence on the performance of the proposed SSIDS. Thus, while reporting the accuracy of the labeling process, we have considered the labeled samples only. Table VI also shows the number of observations that are labeled in this process and used for further processing. One important aspect that remains to be discussed is the choice of the threshold values used in Equation 6; we have arrived at the values used in our experiments after analyzing the outcome of the labeling process on the training dataset DL.
Following the introduction of this pre-processing step, the steps of a semi-supervised LAD (or S-LAD) are as follows.
Class label (or truth value) generation.
Binarization of observations.
Elimination of redundancy (or support sets generation).
Pattern generation.
Classifier design and validation.
IV Design of a Semi-Supervised IDS using S-LAD
Organizations and governments are increasingly using the Internet to deliver services, and attackers are trying to gain unfair advantages by misusing the network resources. Denning introduced the concept of detecting cyber threats by constant monitoring of network audit trails using intrusion detection systems. An intrusion can be defined as a set of actions that seek to undermine the availability, integrity, or confidentiality of a network resource [11, 13, 21]. Traditional IDSs that are used to minimize such risks can be categorized into two classes: (i) anomaly based and (ii) misuse based (a.k.a. signature based). Anomaly-based IDSs build a model of normal activity, and any deviation from the model is considered an intrusion. On the contrary, misuse-based models generate signatures from past attacks to analyze existing network activity. It has been observed that misuse-based models are vulnerable to “zero day” attacks. Our proposed technique is unique in the sense that it can be used as either a misuse-based or an anomaly-based model; hybridization is also possible.
IV-A Proposed Intrusion Detection System
The proposed SSIDS is presented in Figure 1. It consists of two major phases, i.e., the offline phase and the online phase. The offline phase uses S-LAD to design a classifier, which the online phase uses for real-time detection of any abnormal activity from the data describing the network traffic. Obviously, the offline phase must run at least once before the online phase can be used to detect abnormal activity. The offline phase may also be set up to run at regular intervals to upgrade the classifier with new patterns or rules. The steps of the offline phase are summarized in Algorithm 2; note that Step 4 of Algorithm 2 implicitly uses Steps 6 to 8 to build the classifier. The online phase is very simple, as it uses the classifier generated in the offline phase for the classification of new observations.
V Performance Evaluations
The most widely used datasets for the validation of IDSs are NSL-KDD and KDDCUP’99. NSL-KDD is a modified version of the KDDCUP’99 dataset, and we have used the NSL-KDD dataset in all our experiments. Both datasets consist of 41 features along with a class label for each observation. These features are categorized into four classes: (i) basic features, (ii) content features, (iii) time-based traffic features, and (iv) host-based traffic features. The basic features are extracted from the TCP/IP connections without scanning the packets, and there are nine such features in the NSL-KDD dataset. On the other hand, features which are extracted after inspecting the payloads of a TCP/IP connection are known as content features. A detailed description of the features is available in Table VII. Several different types of attacks are present in the dataset, but we have grouped them together and consider them as “attack” only. Thus, there are two types of class labels in our experiments: “normal” and “attack”. We have used the KDDTrain_20percent dataset, which is a part of the NSL-KDD dataset, to build the classifier in the offline phase. The KDDTest+ and the KDDTest-21 datasets have been used during the online phase for validation testing. The details of the experimental setup are presented in Subsection V-A.
(Table VII: list of the input features with their categories; “C” denotes a continuous feature and “S” a symbolic one.)
V-A Experimental setup
The next step after the labels are generated is binarization. Detailed attention is needed to track the number of binary variables produced during this process. In the case of numeric or continuous features, the number of binary variables generated depends directly on the number of cut-points. Thus, if a feature produces a large number of cut-points, the number of binary variables grows quadratically: for c cut-points, the total number of interval variables is c(c-1)/2, and after considering the level variables, the total number of binary variables created will be c + c(c-1)/2. Consequently, the memory requirement can increase to an unmanageable level. On the other hand, a large number of cut-points indicates that the feature may not have much influence on the classification of observations. Our strategy is to ignore such features completely. Another set of features with a fairly large number of cut-points is ignored partially: given a feature, if its number of cut-points is greater than or equal to an upper threshold, we completely ignore the feature, and if the number of cut-points lies between a lower threshold and the upper threshold, we ignore the feature partially by generating only the level variables. We have arrived at these thresholds after empirical analysis using the training data. The list of features that have been fully or partially ignored is presented in Table VIII.
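The variable-count arithmetic and the ignoring policy above can be made concrete in a few lines; the thresholds here are illustrative placeholders, not the values used in the paper.

```python
def binary_variable_count(c, include_intervals=True):
    """c level variables plus C(c, 2) = c*(c-1)/2 interval variables."""
    return c + (c * (c - 1) // 2 if include_intervals else 0)

def feature_policy(c, partial_at=100, full_at=1000):
    """How to treat a feature that produces c cut-points."""
    if c >= full_at:
        return "ignore completely"
    if c >= partial_at:
        return "level variables only"
    return "level and interval variables"
```

Dropping the interval variables for a high-cut-point feature reduces its cost from quadratic to linear in c, which is the point of the partial-ignore rule.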
(Table VIII: column number, input feature, number of cut-points, and whether the feature is ignored.)
Another important aspect that we have incorporated into our design is the support of a pattern. The support of a positive (negative) pattern is the number of positive (negative) observations it covers, while it must not cover any negative (positive) observation. Thus, the support threshold used in Step 22 of Algorithm 1 holds immense importance. In a previous implementation, a very low support value was used, but it was observed during our experiments that such a low support generates many patterns/rules having little practical significance; moreover, these patterns cause many false positives during testing. An empirical analysis helped us to fix the threshold at a substantially higher value. At this threshold, the vast majority of the observations present in the training dataset are covered by the generated patterns of low degree.
V-B Experimental Results
We have described all the steps required to design a classifier in the offline phase. Let us now summarize the outcome of the individual steps.
1. Labeling: We have used the labeled observations of DL to label the unlabeled observations as described in Section III. This step produces an enlarged set of labeled observations, which has been used in the following steps to design the classifier.
2. Binarization: During this step, the binary variables are produced, and a binary dataset along with its class labels is generated.
3. Support Set Generation: We have selected a reduced set of binary features according to their discriminating power.
4. Pattern Generation: During pattern generation, we found a set of positive and negative patterns.
5. Classifier Design: We have developed a rule-based IDS using the positive patterns generated in the last step; thus, the SSIDS contains one rule per pattern. The details of the SSIDS are available in Classifier 1. The NSL-KDD dataset contains two test datasets: (i) KDDTest+ and (ii) KDDTest-21. These two datasets are used to measure the accuracy of the proposed SSIDS, and the results related to the accuracy of the IDS are presented in Table IX. These results compare favorably with the state-of-the-art classifiers, and the comparative results are presented in Table X. It is evident that the proposed SSIDS outperforms the existing IDSs by a wide margin.
(Table IX: accuracy, precision, sensitivity, F1-score, and detection time in seconds of the proposed SSIDS on each test dataset.)
|Classifiers$||Accuracy using Dataset (%)|
|Experiment-1 of the compared work*||82.41||67.06|
|Experiment-2 of the compared work*||84.12||68.82|
|* Results as reported in the corresponding publication.|
|@ Classifier designed using dataset DL only, omitting the ‘labeling’ process.|
|$ All the classifiers use the same training dataset, i.e., KDDTrain_20percent.|
An intrusion detection system (IDS) is a critical tool used to detect cyber attacks, and semi-supervised IDSs are gaining popularity as they can enrich their knowledge bases from unlabeled observations as well. Discovering and understanding usage patterns from past observations play a significant role in the detection of network intrusions by IDSs. Normally, usage patterns establish a causal relationship between the observations and their class labels, and LAD is useful for such problems where we need to automatically generate useful patterns that can predict the class labels of future observations. Thus, LAD is ideally suited to solve the design problems of IDSs. However, the dearth of labeled observations makes it difficult to use LAD in the design of IDSs, particularly semi-supervised IDSs, as we need to consider unlabeled examples along with the labeled examples. In this effort, we have proposed a simple methodology to extend classical LAD to consider unlabeled observations along with labeled observations. We have employed the proposed technique successfully to design a new semi-supervised intrusion detection system which outperforms the existing semi-supervised IDSs by a wide margin, both in terms of accuracy and detection time.
Logical analysis of data – the vision of peter l. hammer.
Annals of Mathematics and Artificial Intelligence 49 (1), pp. 265–312. Cited by: §I, §II-D, §II, §II, §III, §III.
-  (1994) Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence 69, pp. 279–305. Cited by: §II-B.
-  (2016) Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Transactions on Computers 65 (10), pp. 2986–2998. Cited by: §I.
-  (2017) Fuzziness based semi-supervised learning approach for intrusion detection system. Information Sciences 378, pp. 484 – 497. External Links: Cited by: §I, §V-B, TABLE X.
-  (1997-10-01) Logical analysis of numerical data. Mathematical Programming 79 (1), pp. 163–190. External Links: Cited by: item 1, §II-A, §II.
-  (2000-03) An implementation of logical analysis of data. IEEE Trans. on Knowl. and Data Eng. 12 (2), pp. 292–306. External Links: Cited by: §I, item 1, §II-C, §II-C, §III, §V-A.
-  (2015-Sep.) Effective classification using a small training set based on discretization and statistical analysis. IEEE Transactions on Knowledge and Data Engineering 27 (9), pp. 2349–2361. External Links: Cited by: §I.
-  (2018) Logical analysis of data as a tool for the analysis of probabilistic discrete choice behavior. Computers & Operations Research. External Links: Cited by: §II.
-  (1988-01) Cause-effect relationships and partially defined boolean functions. Ann. Oper. Res. 16 (1-4), pp. 299–325. External Links: Cited by: §I, §II-B.
-  (2019) Design of a fdia resilient protection scheme for power networks by securing minimal sensor set. In Proceedings of 2019 International Workshop on Artificial Intelligence and Industrial Internet-of-Things Security, LNCS-11605, External Links: Cited by: §II-A.
-  (1987-02) An intrusion-detection model. IEEE Transactions on Software Engineering SE-13 (2), pp. 222–232. External Links: Cited by: §I, §IV.
-  (1986-04) Partially defined boolean functions and cause-effect relationships. In International Conference on Multi-attribute Decision Making Via OR-based Expert Systems, University of Passau, Passau, Germany. Cited by: §I, §II-B.
-  (2009) Conversion Methods for Symbolic Features: A comparison applied to an Intrusion Detection Problem. Expert Systems with Applications 36 (7), pp. 10612–10617. Cited by: §IV.
-  (1999) KDD cup 1999 data. Note: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html [Online; accessed 05-July-2018]. Cited by: §V.
A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Systems with Applications 41 (4, Part 2), pp. 1690 – 1700. External Links: Cited by: §I.
-  (2013) Intrusion detection system: a comprehensive review. Journal of Network and Computer Applications 36 (1), pp. 16 – 24. External Links: Cited by: §I.
-  (2005-04) Intrusion detection using an ensemble of intelligent paradigms. J. Netw. Comput. Appl. 28 (2), pp. 167–182. External Links: Cited by: §IV.
-  (2009) A detailed analysis of the kdd cup 99 data set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA’09, Piscataway, NJ, USA, pp. 53–58. External Links: Cited by: §V.
-  (2009) Intrusion detection by machine learning: a review. Expert Systems with Applications 36 (10), pp. 11994 – 12000. External Links: Cited by: §I.
-  (2017) An effective intrusion detection framework based on svm with feature augmentation. Knowledge-Based Systems 136, pp. 130 – 139. External Links: Cited by: §I.
-  (2015-04) Distributed denial of service attacks in software-defined networking with cloud computing. IEEE Communications Magazine 53 (4), pp. 52–59. External Links: Cited by: §IV.
-  (2009) Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3 (1), pp. 1–130. Cited by: §III.