OCmst: One-class Novelty Detection using Convolutional Neural Network and Minimum Spanning Trees

by   Riccardo La Grassa, et al.

We present a novel model called One Class Minimum Spanning Tree (OCmst) for the novelty detection problem, which uses a Convolutional Neural Network (CNN) as a deep feature extractor and a graph-based model built on Minimum Spanning Trees (MST). In a novelty detection scenario, the training data is not polluted by outliers (abnormal class), and the goal is to recognize whether a test instance belongs to the normal class or to the abnormal class. Our approach feeds the deep features from the CNN to a pair of MSTs built starting from each test instance. To cut down the computational time, we use a parameter γ to specify the size of the MSTs, which are built from the neighbours of the test instance. To prove the effectiveness of the proposed approach, we conducted experiments on two publicly available datasets that are well known in the literature, and we achieved state-of-the-art results on the CIFAR10 dataset.




1 Introduction

One-class novelty detection refers to the recognition of abnormal patterns in data otherwise recognized as normal. Abnormal data, also known as outliers, anomalies or alien data, are patterns that belong to classes other than the normal class. The goal of novelty detection is to distinguish anomalous patterns that differ from normal ones and to classify them. Many machine learning techniques in the field of novelty and outlier detection decide whether a new instance belongs to the same distribution as the training data or behaves differently enough to be considered an outlier. Typically, outlier detection is unsupervised and the goal is to estimate the density of clusters in order to discover possible outliers. We focus on novelty detection with a semi-supervised approach trained without outliers, whose goal is to decide whether a new observation is an outlier. In the literature, many conventional one-class classifiers exist that address the novelty detection problem, such as OCSVM scholkopf2001estimating and MST_CD juszczak2009minimum; GrassaGCO19; DBLP_LAGRASSA. Due to the high complexity of some data types, such as images and audio signals, these novelty detection methods also perform poorly on high-dimensional data, so dimensionality reduction techniques are required. To address this problem, techniques such as principal component analysis (PCA) and singular value decomposition (SVD) are commonly used for dimensionality reduction, as is classical feature selection using statistical metrics. These approaches are task-dependent and need an expert supervisor. In contrast to the traditional machine learning approach, deep learning models such as GAN schlegl2017unsupervised; perera2019ocgan and Deep One-Class (DOC) perera2019learning are able to extract these features independently of the particular task to be solved. In the literature, few deep learning approaches exist to solve novelty detection problems. Our focus in this paper is to investigate a generic method for one-class classification that uses a convolutional neural network as deep feature extractor together with minimum spanning trees, which are capable of pattern recognition. To the best of our knowledge, no previous work has used a convolutional neural network jointly with a graph-based model. In this work, we extend two of our previous graph-based works GrassaGCO19; DBLP_LAGRASSA and use them as a one-class novelty detection approach exploiting deep features. Our work makes the following contributions:

  1. We extend our previous works on MST and use them jointly with a convolutional neural network to solve novelty detection problems.

  2. To prove the effectiveness and robustness of the proposed approach, we evaluate it on two well-known publicly available datasets, where we achieve state-of-the-art results across many tasks.

2 Related work

In this section, we briefly introduce the main approaches used for novelty detection and highlight their advantages and disadvantages. In general, the problem of one-class classification is harder than the problem of conventional two-class classification. In conventional classification, the decision boundary is supported from both sides by examples of each of the classes. In one-class classification only one side of the boundary is covered by available data, and it is hard to find the best separation between the target and the outlier class. Anomaly detection and one-class classification are problems related to one-class novelty detection chalapathy2019deep. Both have similar objectives: to detect out-of-class samples given a set of in-class samples. A hard label is expected to be assigned to a given image in one-class classification; therefore its performance is measured using detection accuracy and F1 score. In contrast, novelty detection is only expected to associate a novelty score with a given image; therefore the performance of novelty detection is measured using a Receiver Operating Characteristic (ROC) curve.

The supervised approach offers better performance than unsupervised novelty detection techniques gornitz2013toward. Models that use this approach learn a separating hyperplane or a generic decision boundary to discriminate data instances and then predict whether a test instance lies inside this boundary or outside it. Deep models based on a supervised approach fail when the feature space is highly non-linear, and these methods require varied training data from both classes (normal and abnormal), which are usually unavailable. In contrast, the unsupervised approach is used to distinguish the normal and abnormal classes without knowing the labels of the data instances. Usually, these methods are used to automate the process of data labelling. Autoencoders are used as an unsupervised deep architecture in novelty detection. When labelled data are unavailable, this approach offers good results, but it is often a challenge to learn common features among instances in high dimensionality and with a highly non-linear data distribution.

Semi-supervised approaches are widely used in novelty detection; they assume that the training instances of only one class are known, and the goal is to recognize whether an object is normal or abnormal, for instance OCSVM, SVDD and others. The main idea of the one-class support vector machine (OCSVM) is to separate all the data in feature space F from the origin and to maximize the distance from the separating hyperplane to the origin. In contrast with the traditional SVM, OCSVM learns a decision boundary that achieves maximum separation between the samples of the known class and the origin. A binary discriminative function assigns a positive label if the test sample belongs to the region and a negative label if it lies outside the boundary. Instead of considering a hyperplane, SVDD takes a spherical boundary around the training data. The goal is to minimize the volume of this boundary such that a possible outlier lies outside. OCSVM and SVDD are closely related. Both can be adopted as novelty detection methods by considering the distance to the decision boundary as the novelty score. SVDD gives a higher correctness ratio (true positives) when a large variation in density exists among the normal-class instances; in such a case, it starts to reject the low-density target points as outliers. Further, when the data distribution is highly non-linear, the probability of making a wrong prediction is high, because it is not possible to trace a more detailed decision boundary around the training data. SVM is affected by the same problem: it does not perform well when the data overlap and, furthermore, it is not suitable for large datasets.

With the wide diffusion of deep learning, a new type of model has emerged, known as hybrid models, able to solve novelty detection problems. Deep learning models are used as deep feature extractors, and the extracted features are used as input to traditional machine learning algorithms such as the one-class support vector machine: autoencoder+ocsvm, autoencoder+knn song2017hybrid, autoencoder+svdd kim2015deep. The main advantage of this hybrid technique is to reduce the curse of dimensionality and increase the discriminative power of the features using neural networks. A recently proposed approach is the one-class neural network (OCNN) chalapathy2018anomaly; ruff2018deep; perera2019learning, which trains a deep neural network while optimizing the data-enclosing boundary in output space. In contrast to hybrid models, these networks do not require data for the classification and they are faster. Intuitively, a disadvantage is the computational time required for the training step and for model updates on high-dimensional input data.

Another technique for approaching novelty detection problems with neural networks is the generative adversarial network (GAN) goodfellow2014generative. A GAN uses a discriminator to distinguish between generated and real data: when the discriminator correctly identifies that the input was generated, back-propagation updates only the generator weights; otherwise only the discriminator weights are updated. The discriminator can be used as an anomaly detector because it outputs two classes: the first represents the elements that belong to the class, while the other represents the elements that do not. Examples of GANs used for anomaly detection are AnoGAN schlegl2017unsupervised and OCGan perera2019ocgan. Neural networks can also be used for novelty detection through another technique, the autoencoder. An autoencoder creates a compressed version of the input and then regenerates the input from this representation. It is possible to evaluate how closely the decoded output matches the input, so a threshold can be set marchi2015novel: if the reconstruction error exceeds the threshold, the input is classified as novel. Using a variational autoencoder kingma2013auto it is possible to improve this evaluation, because the model is exposed to varied versions of the input, so its reconstruction degrades only when an input is very different from the examples seen.

Figure 1: Overview of the proposed novelty detection model. In the first step (a), features from the training set and the test set are extracted using a CNN. Labels 0, 1 or w are assigned to the test samples using a single MST with two different boundaries. In (b), for each test sample labelled as w (uncertain), two MSTs are created to assign a new label (0 or 1).

3 Proposed Method

In this paper, we propose a novel Deep Hybrid Model (DHM), called OCmst, that effectively exploits convolutional features for one-class novelty detection in image classification. Our goal is to label images never seen during training as belonging to the single class analyzed (0) or as anomalies (1). As graphically represented in Figure 1, this goal is achieved in two main steps: (a) a single MST is used to assign the labels 0, 1 or w; (b) two MSTs are used to resolve all the samples previously labelled w.

We use a generic convolutional neural network as feature extractor, using only samples that share a single class label, as is customary for one-class classifiers. These training images are transformed into deep features by a VGG19 pre-trained on Imagenet. The deep features extracted from the pre-trained CNN are used directly, without any transformation, and fed to our proposed OCmst method.

Going into detail, we can distinguish two main steps in the proposed OCmst model. In the first step, when a new test sample x is given, we select the first γ instances of the training set, ranked by closeness to x, and create a “complete graph” using the Euclidean distance between each pair of nodes as the edge weight. All the training samples belong to the same normal class (the only known class), and the selected ones are those closest to the sample x. Subsequently, we use Kruskal’s algorithm to find the minimum spanning tree of the previously selected instances. In contrast with previous work GrassaGCO19, we use two different boundaries around each MST to create the decision boundary and establish whether a test sample lies inside the first boundary, inside the second, or outside both. In Figs. 26 and 27, we show, in a 3-dimensional space, a real case using OCmst on a toy dataset created to highlight the three possible scenarios (accepted/rejected/uncertain instance). If x lies in the second boundary, we label it as an uncertain test sample; otherwise, if the sample lies in the first boundary we label it as normal class, and if it lies outside both the abnormal class label is assigned.
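The neighbour selection and MST construction described above can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function name `local_mst`, its arguments and the union-find details are assumptions introduced here, and the feature vectors are taken to be rows of a plain array.

```python
import numpy as np

def local_mst(train_feats, x, gamma):
    """Sketch: MST over the gamma training points nearest to the test sample x.

    Returns the indices of the selected neighbours and the MST as a list of
    (i, j, weight) edges, where i and j index into the selected neighbours.
    """
    gamma = min(gamma, len(train_feats))
    # Euclidean distance from the test sample to every training sample
    dists = np.linalg.norm(train_feats - x, axis=1)
    nn = np.argsort(dists)[:gamma]
    pts = train_feats[nn]

    # Complete graph: one weighted edge per pair of selected points
    edges = [(float(np.linalg.norm(pts[i] - pts[j])), i, j)
             for i in range(gamma) for j in range(i + 1, gamma)]
    edges.sort()

    # Kruskal's algorithm with a simple union-find structure
    parent = list(range(gamma))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            mst.append((i, j, w))
    return nn, mst
```

Because the complete graph has γ(γ-1)/2 edges, keeping γ small bounds both the sorting cost and the memory used per test sample, which is consistent with the role the text assigns to this parameter.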

In the second step, for each test sample labelled as w (uncertain) in the previous step (see Fig. 1(b)), we need to assign one of the two labels: normal (0) or abnormal (1). For each such sample we select the nearest neighbours per class and create two MSTs to make the final prediction (see Fig. 31). In this phase we use the predicted labels for the abnormal class and the ground-truth labels for the normal class. In GrassaGCO19, we use both structures based on MST, where the basic elements of the classifier are not only the vertices but also the edges of the graph, giving a richer representation of the data.

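Below is a summary of the basic idea of the work presented in GrassaGCO19. Considering the edges of the graph as target class objects, additional virtual target objects are generated. The classification of a new object is based on its distance to the nearest vertex or edge. The key idea of this classifier is to trace a shape around the training set by considering not only the training instances but also the edges of the graph, so as to obtain more structured information. Therefore, in the prediction phase, the classifier considers two important elements:

  • Projection of the point $x$ on an edge $e_{ij}$ defined by the vertices $\{x_i, x_j\}$

  • Minimum Euclidean distance between $x$ and the pair $(x_i, x_j)$

The projection of $x$ on an edge $e_{ij}$ is defined as follows:

$$p_{e_{ij}}(x) = x_i + \frac{(x_j - x_i)^T (x - x_i)}{\|x_j - x_i\|^2}\,(x_j - x_i) \qquad (1)$$

We check whether $p_{e_{ij}}(x)$ lies on the edge $e_{ij} = (x_i, x_j)$; if so, we compute $p_{e_{ij}}(x)$ and the Euclidean distance between $x$ and $p_{e_{ij}}(x)$. More formally, if the following condition is true

$$0 \le \frac{(x_j - x_i)^T (x - x_i)}{\|x_j - x_i\|^2} \le 1 \qquad (2)$$

then

$$d(x \mid e_{ij}) = \|x - p_{e_{ij}}(x)\| \qquad (3)$$

otherwise we compute the Euclidean distance between $x$ and the pair $(x_i, x_j)$, precisely:

$$d(x \mid e_{ij}) = \min\{\|x - x_i\|, \|x - x_j\|\} \qquad (4)$$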
Therefore, a new object is recognized by the classifier (see Algorithm 1) if it lies inside the decision boundary described below; otherwise, the object is considered an outlier. The decision of whether an object is recognized by the classifier is based on the threshold of the shape created during the training phase, more formally:


where is the subset of nodes defined by the results obtained in Eqs. 3 and 4.
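The projection test and the two distance cases summarized above can be illustrated with a short sketch. It assumes the standard point-to-segment construction (orthogonal projection when it falls on the edge, nearest endpoint otherwise); the names `dist_to_edge` and `dist_to_mst` are hypothetical. For simplicity the sketch scans every edge of the tree, whereas Algorithm 1 appears to restrict the search to the edges incident to the node nearest to the test sample.

```python
import numpy as np

def dist_to_edge(x, xi, xj):
    """Distance from x to the edge (xi, xj).

    If the orthogonal projection of x falls on the edge, return the distance
    to the projection; otherwise return the distance to the nearer endpoint.
    """
    v = xj - xi
    denom = float(v @ v)
    if denom > 0.0:
        # Position of the projection along the edge (0 at xi, 1 at xj)
        t = float(v @ (x - xi)) / denom
        if 0.0 <= t <= 1.0:
            return float(np.linalg.norm(x - (xi + t * v)))
    return float(min(np.linalg.norm(x - xi), np.linalg.norm(x - xj)))

def dist_to_mst(x, points, mst_edges):
    """Minimum distance from x to any edge (i, j) of the tree."""
    return min(dist_to_edge(x, points[i], points[j]) for i, j in mst_edges)
```

Measuring distance to edges rather than only to vertices is what lets the classifier treat points lying between two neighbouring training samples as close to the target class.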

Differently from what is proposed in juszczak2009minimum, where the authors set the threshold as a value taken from the distribution of the edge weights in the given MST, in our approach we enrich this solution by introducing additional thresholds. In juszczak2009minimum, given the edge weights of the MST sorted in ascending order, the threshold is defined as the weight at a chosen quantile of this ordered list; for instance, choosing the middle position assigns the median of all the edge weights in the MST. In our approach, we set two different thresholds to delimit three different decision regions, which allow three kinds of classification.
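A small sketch may help to make the quantile threshold and the three decision regions concrete. The exact indexing rule and the way the two OCmst thresholds are derived are not recoverable from this text, so `edge_quantile_threshold`, `first_step_label`, `alpha`, `theta_inner` and `theta_outer` are all assumptions used only for illustration.

```python
import numpy as np

def edge_quantile_threshold(edge_weights, alpha):
    """Weight found at fraction alpha of the ascending MST edge weights.

    Assumed indexing: floor(alpha * n), clamped to the last valid position.
    """
    e = np.sort(np.asarray(edge_weights, dtype=float))
    idx = min(int(alpha * len(e)), len(e) - 1)
    return float(e[idx])

def first_step_label(distance, theta_inner, theta_outer):
    """Three-way decision: 0 (normal), 'w' (uncertain) or 1 (abnormal)."""
    if distance <= theta_inner:
        return 0      # inside the first boundary: recognized
    if distance <= theta_outer:
        return "w"    # border region between the two thresholds
    return 1          # outside both boundaries: rejected

# Hypothetical usage: two quantiles of the same edge-weight distribution
weights = [0.8, 1.1, 0.9, 1.4, 1.0]
theta_inner = edge_quantile_threshold(weights, 0.5)
theta_outer = edge_quantile_threshold(weights, 1.0)
label = first_step_label(1.2, theta_inner, theta_outer)
```

Samples that receive the label `"w"` are the ones passed on to the second step, while the strongly rejected ones (label 1) provide the abnormal class used there.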

In the first step of our approach, defined in Algorithm 1, we create only one MST, using only the normal class; then, to assign predicted labels to the test samples, we introduce the following discriminative function:


In Eq. 6 the test instance x lies inside the boundary, therefore the MST assigns label 0 (object recognized) to x; otherwise, in Eq. 8, we reject the object because it is outside the boundary defined by the outer threshold. Each instance inside the border region delimited by the two thresholds is assigned the label w (uncertain object), as defined in Eq. 7. Furthermore, differently from juszczak2009minimum, in our work we do not use all the instances of the normal class to create a minimum spanning tree; instead, we select the γ instances of the training set closest to the test instance (see the graphical representation in Figure 1(a)) and create an MST from them. The main reason is to capture a local representation from an MST built using the neighbours of x in the training set. After the first step, we obtain an array of predictions as follows:


where 0 is the label for objects recognized as normal and 1 is the label for abnormality. The label w marks all the test samples inside the border region defined in Eq. 7 of the discriminative function. In the second step, described in Figure 1(b), we use a pair of MSTs built on the normal and abnormal classes to resolve the ambiguity of all the uncertain instances labelled w. In our previous work GrassaGCO19, when both classifiers accepted or both rejected a test instance, we simply searched for the per-class data distribution closest to the test object and made the final classification. In this work, we extend this function by introducing statistical metrics to improve classification performance. Given two sets of data samples containing the elements of each class nearest to the test sample x, we compute their standard deviations:


where is the mean value of these observations.

Finally, given the minimum distance between a test sample and an MST node as defined in Algorithm 2, rows 9-16, we define the zeta score as:


The standard deviation used here is a generic standard deviation of the data samples. This formulation means that a new observation is classified using jointly its distance from the appropriate MST and the standard deviation, and the test object is assigned the class label that obtains the minimum score. More formally we use a function as:

1:=All normal instances (train set)
2:for  do
3:       all euclidean distances
4:end for
5:NodeX = Take min(all euclidean distances) and return node
6:mst, Create small mst(all euclidean distances)
7:EdgesNodeX Search inc/out edge nodeX and return
8:for   do
9:       if  then
12:       else
14:       end if
15:end for
16:min dist0 =
17:1 if
18:w if
19:0 if
Algorithm 1 First step
1:=All normal instances (train set)
2:=All abnormal instances predicted in the first step
3:=All uncertain instances from first step
4:for  do
5:       all euclidean distances
6:end for
7:NodeX = Take min(all euclidean distances) and return node
8:EdgesNodeX Search inc/out edge nodeX and return
9:for   do
10:       if  then
13:       else
15:       end if
16:end for
17:Repeat line 1-16 for graph
18:min dist0 =
19:min dist1 =
20:Create two mst from neighbours of test x
21:=Compute a single threshold for both mst
22:1 if and
23:0 if and
24:if min dist0 and min dist1  then
29:       0 if
30:       1 otherwise
31:end if
32:if min dist0 and min dist1  then
33:       Repeat line 25-31 for graph
34:end if
Algorithm 2 Second Step
1:function create small MST(g0 weight sorted)
2:        = first gamma index sorted values in g0 weight sorted
3:       edges couple all combinations nodes small g0
4:       for  do
6:       end for
11:return small ,
12:end function
Algorithm 3 MST CD with gamma parameter
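The zeta-score rule used in the second step, when both MSTs accept or both reject a sample, can be sketched as below. The exact formula is not recoverable from this text, so the sketch assumes the score is the minimum distance to a class MST divided by the standard deviation of that class's neighbours, with the smaller score winning; `zeta_score` and `resolve_uncertain` are hypothetical names, and computing the deviation over all feature values of the neighbours is likewise an assumption.

```python
import numpy as np

def zeta_score(min_dist, neighbours):
    """Assumed form: distance to the class MST scaled by the class spread."""
    sigma = float(np.std(neighbours))
    # A degenerate neighbourhood (zero spread) cannot normalise the distance
    return min_dist / sigma if sigma > 0.0 else float("inf")

def resolve_uncertain(dist0, neigh0, dist1, neigh1):
    """Return 0 (normal) or 1 (abnormal) for a sample labelled 'w'.

    dist0 / dist1: minimum distance to the normal / abnormal MST.
    neigh0 / neigh1: nearest neighbours of the sample in each class.
    """
    z0 = zeta_score(dist0, neigh0)
    z1 = zeta_score(dist1, neigh1)
    return 0 if z0 <= z1 else 1
```

Under this assumed form, a class whose neighbours are widely spread tolerates a larger raw distance than a tightly packed one, which is one plausible reading of using distance and standard deviation jointly.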

We can summarize the main differences with our previous work as follows:

  1. we use trained CNNs as deep feature extractors;

  2. we introduce different levels of decision boundaries to track uncertain samples, and we use strongly rejected instances to create the abnormal class;

  3. we introduce a new discriminative function for the case in which both MSTs accept or both reject an instance.

4 Experimental Results

4.1 Datasets

To prove the effectiveness of the proposed method we evaluate it on two well-known datasets: Fashion-MNIST xiao2017fashion and CIFAR10 krizhevsky2009learning. Figure 22 shows some examples taken from the two datasets. Below, we describe the datasets in detail.

Fashion-MNIST: it is a dataset containing 60000 instances in the training set and 10000 instances in the test set. The number of classes is 10 and each sample is a 28x28 gray-scale image of clothing. This dataset is more challenging than MNIST lecun1998mnist and it represents a useful benchmark for machine learning algorithms. Looking at the differences, we see that MNIST is too easy: CNNs can reach 99.7% on MNIST, while classic machine learning algorithms can easily reach 97%. Furthermore, MNIST is not representative of modern computer vision problems.

CIFAR10: it consists of 60000 images in 10 classes (6000 per class), with a training set and a test set of 50000 and 10000 images respectively. Each sample is a 32x32 low-resolution color image. The 10 classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. CIFAR10 is more challenging than Fashion-MNIST due to the diverse content and complexity of its images. Further, it is widely used by researchers as a benchmark for comparing classification methods.

Although these two datasets are mainly used to study and compare supervised classification techniques, in this work we use them to study a novelty detection algorithm, working with each single class against all the others.
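The one-class-against-all protocol can be sketched as follows, assuming the images (or their features) and integer class labels are already available as NumPy arrays. The function name `one_class_split` and its arguments are hypothetical.

```python
import numpy as np

def one_class_split(train_x, train_y, test_x, test_y, normal_class):
    """Build one novelty detection task from a multi-class dataset.

    Training keeps only the chosen normal class; every test sample is kept
    and relabelled as 0 (normal) or 1 (abnormal).
    """
    train_normal = train_x[train_y == normal_class]
    test_labels = (test_y != normal_class).astype(int)
    return train_normal, test_x, test_labels
```

Repeating this split once per class gives ten tasks per dataset, one for each column of the result tables.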

Figure 22: Example of images extracted from CIFAR10 and Fashion MNIST datasets.

4.2 Setup

In accordance with the novelty detection setting, we use only one class of the training set, considering all its instances as the normal class, and perform the first step (see Algorithm 1) to discover the outliers that are strongly rejected from the test set. Then we use the rejected test samples to create the abnormal class and subsequently classify the remaining uncertain samples. The test set is composed of 10000 instances, more precisely 1000 normal samples and 9000 abnormal samples. We used the AUC score, in accordance with the literature (ruff2018deep), to compare our approach with the other published ones. The proposed model was implemented using the PyTorch framework paszke2017automatic and a VGG19 simonyan2014very pre-trained on Imagenet imagenet_cvpr09 as deep feature extractor. The dimensionality of the extracted features is 4096. We use these features as input for the first step and then for the second step of our OCmst model. The values of the two thresholds were found experimentally using a validation set extracted from the training data of the two datasets used. These parameters were then fixed and were not modified in any of the other experiments.
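For reference, the AUC used for evaluation can be computed directly from novelty scores as the fraction of (abnormal, normal) pairs that are ranked correctly. The sketch below is a plain NumPy illustration with the hypothetical name `auc_score`; it assumes label 1 marks abnormal samples and that higher scores mean "more abnormal", and it is not taken from the authors' code.

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often an abnormal sample receives a higher score than a
    normal one; ties count as one half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]   # abnormal samples
    neg = scores[labels == 0]   # normal samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

With 1000 normal and 9000 abnormal test samples this compares nine million pairs, which is still manageable in memory; unlike accuracy, the measure is not affected by that class imbalance.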

(a) Accepted instance (green point)
(b) Rejected instance (red point)
(c) Uncertain instance (orange point)
Figure 26: A 3D toy example of the first step, in which the minimum spanning tree accepts (a) or rejects (b) the object. All strongly rejected instances (b) are used as the abnormal class to predict, through a pair of MSTs, all the remaining instances marked as "uncertain" (c). For clarity, the projections on the X, Y and Z axes are plotted in all three panels.
Figure 27: An example of an uncertain instance. The yellow sample lies within the second decision boundary and will be classified as normal or abnormal in the next step.
(a) Instance accepted as normal class by the first MST (blue triangle)
(b) Instance accepted as abnormal class by the second MST (green triangle)
(c) Both classifiers recognized the toy instance. The object will be assigned by the z-score defined in eq.
Figure 31: A 3D toy example of the second step, in which two minimum spanning trees, one per class, are created from the neighbours of the test instance. Panels (a) and (b) show two cases in which the first or the second classifier recognizes the test instance (blue or green triangle) and the other MST rejects it. Panel (c) shows a simple case in which both classifiers recognize or both reject the instance. For clarity, the orthogonal projections on the X, Y and Z axes are also plotted.

4.3 Results

To better understand the potential of the proposed model, we have conducted two groups of experiments:

  1. parameter analysis

  2. comparison with other methods proposed in the literature.

For the first group of experiments we extracted a validation set from the training set of the CIFAR10 dataset. We analyzed the γ parameter to understand what the optimal value is to use in all the other experiments. This parameter is very important because both the speed and the memory occupation of the entire novelty detection process depend on it: for example, on the class Plane of the CIFAR10 dataset (10000 test samples), the execution time in minutes changes with the value of γ. In Table 2 we report the AUC score achieved on CIFAR10 using a different value of γ for each experimental run. As reported in this table, we choose the value of the parameter that gives the best AUC.

In the second group of experiments we compare our approach with many other approaches reported in the literature. In Table 1 we report the results of our method on the CIFAR10 dataset and compare them with the results published in the papers scholkopf2001estimating; bishop2006pattern; hadsell2006dimensionality; kingma2013auto; van2016conditional; schlegl2017unsupervised; abati2018and; ruff2018deep; perera2019ocgan. From this comparison we can see that our approach produces the best average result and the best absolute results for 6 classes out of 10. Another similar experiment is shown in Table 3. In this case we used a different dataset, Fashion-MNIST, and we compared our method with all the results published in the paper schlachterdeep. Looking at the average results, we can see that our approach ranks fourth, but the results are still comparable with the best ones. We can conclude that OCmst shows the best performance on CIFAR10 and competitive results on Fashion-MNIST.

Methods                          Plane  Car    Bird   Cat    Deer   Dog    Frog   Horse  Ship   Truck  Mean
OCSVM scholkopf2001estimating    0.630  0.440  0.649  0.487  0.735  0.500  0.725  0.533  0.649  0.508  0.5856
Kde bishop2006pattern            0.658  0.520  0.657  0.497  0.727  0.496  0.758  0.564  0.680  0.540  0.6097
Dae hadsell2006dimensionality    0.411  0.478  0.616  0.562  0.728  0.513  0.688  0.497  0.487  0.378  0.5358
Vae kingma2013auto               0.700  0.386  0.679  0.535  0.748  0.523  0.687  0.493  0.696  0.386  0.5833
Pix CNN van2016conditional       0.788  0.428  0.617  0.574  0.511  0.571  0.422  0.454  0.715  0.426  0.5506
Gan schlegl2017unsupervised      0.708  0.458  0.664  0.510  0.722  0.505  0.707  0.471  0.713  0.458  0.5916
And abati2018and                 0.717  0.494  0.662  0.527  0.736  0.504  0.726  0.560  0.680  0.566  0.6172
AnoGan schlegl2017unsupervised   0.671  0.547  0.529  0.545  0.651  0.603  0.585  0.625  0.758  0.665  0.6179
Dsvdd ruff2018deep               0.617  0.659  0.508  0.591  0.609  0.657  0.677  0.673  0.759  0.731  0.6481
OCGan perera2019ocgan            0.757  0.531  0.640  0.620  0.723  0.620  0.723  0.575  0.820  0.554  0.6566
Soft-Dsvdd ruff2018deep          0.617  0.648  0.495  0.560  0.591  0.621  0.678  0.652  0.756  0.710  0.6328
OCmst                            0.742  0.789  0.643  0.644  0.709  0.688  0.781  0.724  0.760  0.817  0.729

Table 1: One-class novelty detection results on CIFAR10. Plane and Car are respectively Airplane and Automobile. We report the AUC score from different papers (column methods) and then we compare them with our results (last row on the bottom). Furthermore, we show the average AUC score for all the One-Class classifiers. In bold the best result obtained. In all the experiments the threshold is used.

Methods  Plane  Car    Bird   Cat    Deer   Dog    Frog   Horse  Ship   Truck
OCmst    0.691  0.755  0.632  0.630  0.671  0.653  0.744  0.711  0.721  0.754
OCmst    0.694  0.760  0.632  0.641  0.680  0.668  0.746  0.719  0.723  0.757
OCmst    0.706  0.770  0.645  0.648  0.687  0.684  0.761  0.730  0.745  0.776
OCmst    0.722  0.766  0.642  0.626  0.683  0.655  0.731  0.722  0.726  0.786
OCmst    0.722  0.759  0.628  0.621  0.674  0.653  0.733  0.720  0.722  0.791
OCmst    0.781  0.798  0.665  0.671  0.723  0.699  0.791  0.732  0.777  0.836
OCmst    0.724  0.790  0.642  0.641  0.686  0.681  0.766  0.736  0.753  0.809

Table 2: Impact of our OCmst varying the gamma parameter using a validation dataset extracted from CIFAR10. The row for shows the best AUC values.

Methods      Ankle  Bag    Coat   Dress  Pullover  Sandal  Shirt  Sneaker  T-shirt  Trouser  Mean
OCSVM        97.8   79.5   84.6   85.9   85.6      81.3    78.6   97.6     86.1     93.9     87.09
IF           97.9   88.3   89.8   90.1   87.1      88.7    79.7   98       86.8     97.7     90.41
Imagenet     78.3   61.9   58.3   60.1   58.1      69.2    57.3   75.5     58.1     75.4     65.22
SSIM         98.4   81.6   87.3   89.2   87.2      85.2    75.3   97.8     83.7     98.5     88.42
DSVDD        93.2   79.1   87     82.9   83        80.3    74.9   94.2     79.1     94       84.77
NaiveNN      90.7   72.9   80.8   70     73.6      64      71.8   92       62.9     65.6     74.43
NNwICS       94.9   82     85.8   89.1   82.6      85.5    75.6   94.9     85.1     94.6     87.01
Deep OC-ICS  98.5   88.6   90.2   92.1   88.2      89.4    78.3   98.3     88.3     98.9     91.08
OCmst 15     92.77  85.84  86.21  87.52  77.61     92.27   75.47  94.22    82.58    94.12    86.86
OCmst 10     93.04  86.33  86.66  87.85  78.05     92.66   75.58  94.66    83       94.63    87.24
OCmst 8      93.2   85.87  86.8   87.91  77.92     92.69   75.48  94.95    83.15    94.66    87.26

Table 3: One-class novelty detection results on Fashion-MNIST using AUC score. In all the experiments the threshold . Three different values are used and compared with other results published in schlachterdeep

5 Conclusion

In this work we introduce the first graph-based hybrid model for novelty detection problems. Our method uses the deep features produced by a convolutional neural network to find a good decision boundary by exploiting minimum spanning tree structures. The proposed OCmst outperforms the state of the art in novelty detection on many classes of the CIFAR10 dataset, showing a higher AUC score than the other methods. On the Fashion-MNIST dataset we obtained competitive results. Our experiments show the effectiveness of the proposed approach on two different datasets and highlight its advantages and disadvantages.

The authors gratefully acknowledge NVIDIA's gift of a Titan Xp GPU for this research.

6 Declaration of interest

The authors declare that they have no conflict of interest.