Anomaly Detection with the Voronoi Diagram Evolutionary Algorithm

10/27/2016 ∙ by Marti Luis, et al. ∙ 0

This paper presents the Voronoi diagram-based evolutionary algorithm (VorEAl). VorEAl partitions input space in abnormal/normal subsets using Voronoi diagrams. Diagrams are evolved using a multi-objective bio-inspired approach in order to conjointly optimize classification metrics while also being able to represent areas of the data space that are not present in the training dataset. As part of the paper VorEAl is experimentally validated and contrasted with similar approaches.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Anomalous Internet traffic detection is a major question of computer network security. Intrusion detection systems (IDSs) [9] have proposed with the intention of tackling this issue. They are meant to protect a network by providing a line of defense that is able to detect and react to network attacks. Two main approaches are used when building an IDS: i) misuse-based and ii) anomaly-based detection. While the former focuses on detecting attacks that follow a known pattern or signature, the latter is interested in building a model representing the system’s normal behavior while assuming all deviated activities to be anomalous or intrusions. Because of that fact anomaly detection has received increasing attention in the recent past.

Anomaly detection has been addressed with different approaches (see [2] for a survey). Among nature-inspired approaches artificial immune systems (AISs) [7] have received an special attention.

This paper proposes the Voronoi diagram-based evolutionary algorithm (VorEAl). VorEAl is inspired on AISs and the representations that had been proposed for evolutionary shape design consolidating previous progresses made in this direction [8]. Its main distinctive feature is that it evolves Voronoi diagram-based representations for normal/abnormal regions of the search space. Such representation offers a flexible and compact alternative to some common representations used in AIS such as hyper-spheres and hyper-rectangles. VorEAl applies a multi-objective approach that takes into account the detection accuracy and other especially devised volume-based methods that promotes the emergence of solutions that also adequately represent areas of the input space where no normal data has been received and, therefore, should represent anomalies. As in any multi-objective approach, the algorithm produces a set of trade-off solutions. VorEAl applies a committee approach that is based on the best (in term of a priori given set of preferences) subset of those solution.

The paper is organized as follows. Section 2 presents the context of AIS and some existing approaches to anomaly detection. Section 3 introduces the Voronoi representation for abnormal and normal input subsets, together with the variation operators and objective functions used to evolve it and VorEAl as a whole. Section 4 introduces our methodology for the experimental validation of VorEAl, also presenting the results of the study and comparing them with other approaches from the literature. Finally, Section 5 discusses the results and sketches some further research directions.

2 Foundations

There has been a consistent interest by the community on proposing nature-inspired approaches to anomaly detection. In this context, AISs have attracted attention as they embody an analogy to the biological immune system. They are particularly appealing for anomaly detection problems as they capture the ability of the biological system of telling apart normal body cells from pathogens. That is, from a computational perspective, they create a model that is able to discriminate between normal (self) and abnormal (non-self) objects. This feature make AISs specially suited to be applied in the context of anomaly-based IDSs.

In order to extend AISs’ performance it is necessary to apply algorithms that combine a powerful representation capacity as well as the possibility of adequately adapting that capacity to meet the problem characteristics.

Voronoi diagrams are geometrical constructs that were known by ancient Greeks. Any set of points, known as Voronoi sites, in a given -dimensional Euclidean space defines a Voronoi diagram, i.e., a partition of that space into Voronoi cells: the cell corresponding to a given site is the set of points whose closest site is . The boundaries between Voronoi cells are the medians of the segments, for neighbor Voronoi sites and . Though originally defined in two or three dimensions, there exist several algorithmic procedures to efficiently compute Voronoi diagrams in any dimension.

Voronoi diagrams offer a compact representation for shapes (surfaces in 2D and volumes in 3D, for instance), by attaching to each Voronoi cell (or, equivalently, to the corresponding Voronoi site), a Boolean label. The resulting Voronoi diagram is a partition of the space into 2 subsets: the “true” cells are the shape/volume, and the “false” cells are the outside of the shape/volume. The genotype is here a (variable length) list of labeled Voronoi sites, and the phenotype is the corresponding partition in the space into two subsets. More generally, any piece-wise constant function on the underlying space can be represented by a similar representation by using real-valued labels. Such representation has been successfully used in the context of Evolutionary Optimum Design [10, 5]. In particular, it has been demonstrated that the local complexity of the representation can also be adjusted by evolution: in regions of the space where the shape has a complex boundary, several Voronoi sites will be used, whereas only a few of them will be necessary elsewhere.

In the context of classification, the target phenotypes are partitions of the parameter space into positive and negative examples (in the case of 2 classes), and can hence also be represented by Voronoi diagrams with Boolean labels —or with labels taken from a finite alphabet in the case of more than 2 classes.

3 The Voronoi diagrams-based evolutionary algorithm

We now discuss the building blocks of VorEAl. In particular, we present variation operators, the possible strategies used for evaluating the individuals and how these elements are assembled together to form the algorithm.

3.1 Variation operators

The genotypes of Voronoi representations is a variable length list of Voronoi sites , with , where each site is defined by its coordinates in . Each site has an associated label

that determines how a point that falls within the corresponding cell is classified.

function mutate_voronoi() , individual to be mutated. , prob. of mutating a site. , prob. of mutating a site feature (coordinate). , prob. of changing the label of a site. , prob. of adding a new site. , prob. of removing a site. , learning rate.      for all  do          if  then               for all  do                   if  then                                                                        if  then                   .                                   if  then          .            if  then          ; .            return , mutated individual.

Figure 1: Mutation of a Voronoi diagram.

Mutation Operator

Several mutation operators can be designed for such a variable-length representation.

  • At the individual level, a Voronoi site can be added, at a randomly chosen position, with a random label; or a randomly chosen Voronoi site can be removed.

  • At the site level, Voronoi sites can be moved around in the space – and the well-known self-adaptive Gaussian mutation has been chosen here, inspired by Evolution Strategies (see (1) below); or the label of a Voronoi site can be changed.

In the self-adaptive Gaussian mutation [11], each coordinate

of each Voronoi site also “carries” its own variance

that is used for its Gaussian mutation. Coordinate undergoes Gaussian mutation with variance while undergoes a log-normal mutation with learning rate as follows:

(1)

The different mutation operators are applied according to different probabilities, following the procedure described in Figure

1.

function crossover_voronoi() and , individuals to mate.      repeat          .          .          .      until       ; ;      ; .      return , offspring.

(a) Crossover of two Voronoi diagrams, applied with probability .
(b) Example of crossover of two Voronoi genotype individuals in 2D.
Figure 4: Crossover operator for Voronoi diagrams.

Crossover Operator

The crossover operator for Voronoi representation should not simply exchange some Voronoi sites between both parents, but should respect the locality of the representation. Voronoi sites that are close to each other should have more chance to stay together than Voronoi sites that are far apart. This is achieved by the geometric crossover that operates on two (randomly selected) parents by creating a random cutting hyperplane, and exchanges the Voronoi sites from both sides of the hyperplane. The Voronoi diagrams are of course reconstructed after the crossover. This procedure is described in detail in Figure

(a)a. A two-dimensional example is given in Figure (b)b.

3.2 Objectives and fitness assignment

Anomaly detection can be posed as a particular case of classification problem where data items must be tagged either as “normal” or “anomalous”. That relying on a dataset where, without loss of generality we can state that and obtain a classifier that correctly detects instances that correspond to each of the two categories. Because of this fact the existing metrics devised to assess the quality of a classification algorithm are also applicable in this context. For this particular problem, the most relevant metrics are accuracy, recall and specificity, although many more could also be of use. Accuracy seems the best choice in the general case, as one wants to correctly identify all examples. But when dealing with anomalies, the dataset is generally highly imbalanced, as normally there are fewer anomalous instances than ‘normal’ ones. If only the classification accuracy is used, the error contribution of the anomalies will be reduced and hence the model will be biased to not regard them.

Furthermore, as already mentioned, the anomaly detection problem requires that the classifier is not only able to correctly classify the “normal” and “anomalous” instances present in the training dataset but is also capable of detecting when a given input falls in an area that was not covered by data of the training set and, therefore, also can be interpreted as an anomaly.

It is possible to prompt the Voronoi diagrams (individuals) to represent the known data in a form as compact as possible by expressing that as the relation between the volumes of the Voronoi cell and the convex hull of the training data that it contains. Let be a Voronoi diagram, and, for each cell , let be its volume and the set of data points it contains, i.e., , being the -dimensional Euclidian distance. We can then define the individual compactness as the sum, for each cell, of the ratio of the volume of the convex hulls of and the volume of the cell,

(2)

It could be hypothesized that the previous formulation can be improved by adding a multiplicative term that counts the number of elements in , resulting in the multiplicative compactness

(3)

In both cases, maximizing the compactness will produce cells that contain the data in a form as tight as possible. Those compactness objectives can be complemented by one that promotes the existence of empty cells that represent areas of the input domain that are now present in the training data. Such objective would take care of sites with small ’s and promote that they become empty as the evolution takes place. A form of representing this is by computing the total volume of cells with an anomaly label of an individual and rate it by the number of elements it contains,

(4)

Consequently, it is obvious that it is necessary to jointly address all of those objectives. Therefore, a multi-objective optimization approach will empower the algorithm with the capacity to address all the requirements of the task at the same time.

3.3 Algorithm description

VorEAl consolidates the previous components as an algorithm that constructs a classification model. The algorithm starts by creating an initial random population of individuals. At a given iteration , individuals in the population are then mutated and mated using operators described above and thus producing an offspring population that consists of individuals. At this point, individuals that have not yet been evaluated are presented with the dataset and the values of the different objective functions are calculated. In this work, we compute the accuracy, recall and specificity, but it should be noted that others are available. From the union of and , the best are selected using the non-dominated sorting selection of NSGA-II [3].

This process repeats until the stopping criterion of the algorithm is met. When that happens, the algorithm has a final population from which the best individual(s) can be selected to represent the ‘self’ of the AIS. This a non-trivial task as it implies taking into account the different conflictive objectives. In this work, we select a committee of individuals that contains the -percent of with the highest accuracy. Hence, the classifier returns the most voted decision among the members of .

(a) Training datasets.
(b) Test datasets.
Figure 7: Training and testing datasets. Test set anomalies present in the test datasets are generated using the procedure described in Section 4.

4 Experimental study

The previous discussion and proposal must be complemented by a set of experiments that establish the validity of VorEAl and studies the impacts of the different components presented. That is the focus of this section.

One of the main questions regarding VorEAl is at what point a multi-objective affinity function would actually generate better results at an admissible cost. It could be argued that there exists the possibility that adding more objectives would just make the search process more complex and resource demanding.

An important matter to be clarified was the impact of each of the objectives presented in previous section. For that reason different combinations were tested. In particular, we tested accuracy and compactness (a/c); accuracy, compactness and total empty volume (a/c/t); accuracy and multiplicative compactness (a/m) and accuracy, multiplicative compactness and total empty volume (a/m/t).

In order to provide grounds for comparison with similar approaches as well as well-known approaches, other methods were included in the experiments. In particular, we included the negative selection algorithm (NSA) [6] using both variable-sized hyper-spheres and hyper-rectangles. For fair comparisons, we applied the NSA and NSA in which non-self training samples are subsequently used to enrich the detector library generated by NSA.

Similarly, we have included in the experiments two well-known classifiers: one-class vector machines (SVMs)

[12] and the naïve Bayes classifier.

The experiments involved six classification benchmarks problems: the ‘two spiral’, ‘crescent and full moon’, ‘half densities’, ‘corners’, ‘outliers’ and ‘cluster in cluster’ problems. They have the advantage that they can be visualized in 2D while still posing a substantial challenge to the algorithm. One key element that must be addressed is the ability of the method to detect anomalies that were present in the original dataset and also those that were not present. Six tests were prepared with that goal in mind by adding random anomaly data in the areas that did not had any data in the training dataset. The resulting training and test datasets can be observed in Figure

7. Besides fixing these parameters we limited the population to 100 individuals and ran the algorithms for 500 generations. The rest of the parameters are tuned using a grid search procedure on a reduced-size problem. The same parameters were used for all problems. The mutation of the parameters were , , , , and , while the mating probability was 0.5, the minimum and maximum number of sites in an individual was set to 20 and 100, respectively and the committee selection percentile () was set to 5% of the population.

The stochastic nature of the algorithms being analyzed calls for the use of an experimental methodology that relies on statistical hypothesis tests. Using those tests, we are able to determine in a statistical sound way if one algorithm instance outperforms another. The topic of assessing stochastic classification algorithms is studied in depth in

[4]. There, it is shown that the Bergmann–Hommel procedure is the most suitable for our class of problem. In all cases, we have used a base level of significance of 0.05 and we run the same experiment instances 50 times. The results of this experiments are shown as box plots in Figure 8. It can be inferred from those plots that the three-objective form of with accuracy, multiplicative compactness, and total empty volume VorEAl yielded the best results.

Two Spiral

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Accuracy

VorEAl (a/c)
VorEAl (a/c/t)
VorEAl (a/m)
VorEAl (a/m/t)
NSA-re
NSA-sp
one-class SVM
naïve Bayes
Crescent/full moon

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Half kernel

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Outliers

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Cluster in cluster

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Corners

VorEAl (a/c)

VorEAl (a/c/t)

VorEAl (a/m)

VorEAl (a/m/t)

NSA-re

NSA-sp

one-class SVM

naïve Bayes

Recall

VorEAl (a/c)
VorEAl (a/c/t)
VorEAl (a/m)
VorEAl (a/m/t)
NSA-re
NSA-sp
one-class SVM
naïve Bayes

Specificity

VorEAl (a/c)
VorEAl (a/c/t)
VorEAl (a/m)
VorEAl (a/m/t)
NSA-re
NSA-sp
one-class SVM
naïve Bayes
Table 1: Summary of the outcome of the statistical hypothesis tests for each problem and performance indicator. When an algorithm in the row has been significantly better than the one in the column the corresponding cell is marked with a “”. If it has been worst then the cell contains a “”. Cases where no significant difference was established are identified with a “”.

When many tests are carried out, a comprehensive analysis of the results is rather difficult as it implies cross-examining and comparing the results presented separately. Consequently, we present them in summarized form in Table 1. It should be noted that experiment parameters and results are available online at http://lmarti.com/VorEAl. To further simplify the understanding of results, why we decided to adopt a more integrative representation like the one proposed in [1]. Figure 12 summarizes the outcome of the hypothesis tests by grouping them by metric and problem, as explained in the previous section. Here it is clearly visible how VorEAl with multiplicative compactness and total empty volume objectives is generally able to yield better results. Finally, as an illustrative example, we show in Figure (c)c an example of an evolved Voronoi diagram.

Figure 8: Box plots of the experimental evaluations on the anomaly detection test sets.
(a) Summary by metric. (b) Summary by problem.
(c) Example of an evolved Voronoi diagram for the two spiral problem. Sites that are labeled as anomalies are marked with red triangles (). Similarly, sites labeled as normal data are indicated with green circles ().
Figure 12: Summaries of the statistical tests and an illustrative example.

5 Discussion and conclusion

In this paper we have presented VorEAl, a multi-objective evolutionary algorithm that relies on Voronoi diagrams for representation. VorEAl has been devised with the problem of anomaly detection in mind. The experimental results obtained as part of this work point out that this is a promising direction of work. However, there are many areas that should be further studied and explored. From an algorithmic point of view, we should explore other classification objectives (metrics).

It is important to try other multi-objective fitness assignments, like those based on multi-objective performance indicators or reference points. This last approach is of particular interest as, as we already mentioned, in our case we have an a priori known ideal solution that can be used to guide the search. In parallel, work should the done in understanding and reducing the computational complexity of the algorithm. In this direction, we are already working on creating approximative versions of the volume meant to decrease the computational cost of the computation of the objective functions.

Acknowledgements

This work has been funded by the project PIA-FSN-P3344-146479. Authors wish to thank the reviewers for their fruitful comments.

References

  • [1] Bader, J.: Hypervolume-Based Search for Multiobjective Optimization: Theory and Methods. Ph.D. thesis, ETH Zurich, Switzerland (2010)
  • [2] Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3),  15 (2009)
  • [3]

    Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA–II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (April 2002)

  • [4]

    García, S., Herrera, F.: An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. Journal of Machine Learning Research 9, 2677–2694 (2008)

  • [5] Hamda, H., Jouve, F., Lutton, E., Schoenauer, M., Sebag, M.: Compact unstructured representations for evolutionary design. Applied Intelligence 16, 139–155 (2002)
  • [6] Ji, Z., Dasgupta, D.: Real-valued negative selection algorithm with variable-sized detectors. Lect Notes Comput Sc 3102, 287–298 (2004)
  • [7] Kim, J., Bentley, P.J., Aickelin, U., Greensmith, J., Tedesco, G., Twycross, J.P.: Immune system approaches to intrusion detection – a review. Natural Computing 6(4), 413–466 (2007)
  • [8] Martí, L., Fansi-Tchango, A., Navarro, L., Schoenauer, M.: VorAIS: A multi-objective Voronoi diagram-based artificial immune system. In: Proceedings of the 11th Annual conference on Genetic and evolutionary computation (GECCO’09). ACM (2016)
  • [9] Northcutt, S., Novak, J.: Network intrusion detection. Sams Publishing (2002)
  • [10] Schoenauer, M.: Shape representation for evolutionary optimization and identification in structural mechanics. In: Winter, G., Périaux, J., Galán, M., Cuesta, P. (eds.) Genetic Algorithms in Engineering and Computer Science (EUROGEN95). pp. 443–464 (1995)
  • [11] Schwefel, H.P.: Numerical Optimization of Computer Models. John Wiley & Sons, New-York (1981), 1995 – 2 edition
  • [12] Tax, D.M.J., Duin, R.P.W.: Support vector data description. Machine learning 54(1), 45–66 (2004)