Deep learning methods have achieved impressive performance in object recognition and classification [Krizhevsky et al.(2012)Krizhevsky, Sutskever, and Hinton] [Simonyan and Zisserman(2014)]
by using large networks trained with millions of data examples. However, these networks usually work under a closed-set assumption and thus try to classify each query sample even if it does not belong to one of the training classes. For example, a neural network classifier trained to classify fruits might classify an input from a completely different category, say “bird”, into one of the fruit classes with high confidence, which is unlikely to happen if a human does the same task. To make such systems more intelligent and better suited to real-world (open-set) applications, they should first determine whether the input belongs to one of the trained classes, and only then try to classify it [Pimentel et al.(2014)Pimentel, Clifton, Clifton, and Tarassenko], [Markou and Singh(2003)].
This problem is addressed in recent literature as out-of-distribution detection, anomaly detection, novelty detection, open-set recognition and one-class classification, each with subtle differences from the others. One-class classification rejects all classes except the class of interest as outliers. Open-set recognition aims to recognize unknown classes while also classifying the known classes correctly. Out-of-distribution detection determines whether samples come from other datasets or distributions. Such algorithms often assume knowledge of similar out-of-distribution data.
In this work, we address the multi-class novelty detection task, where given a query, the goal is to understand whether it belongs to one of the training classes.
This is very challenging, since the novel data can come from the same data distribution as that of the training data.
Here, we propose a novel framework, termed Segregation Network, which utilizes the mixup technique for this task.
The network takes as input a pair of data points, and a third interpolated data point which is generated by mixing them together using a variable ratio.
The goal is to segregate the constituent classes and their respective proportions in the interpolated data using a novel loss function.
Once the network is trained, given an unknown query sample, we mix it with the known class prototypes in a predefined proportion and pass it through the network.
Based on the network output, we can infer whether the query belongs to the known set of classes or to a novel class unknown to the system.
The main contributions of our work are as follows:
(1) We propose a novelty detection framework, termed as Segregation Network, using the mixup technique and a novel loss function for training it.
(2) Our algorithm works well with only the available training data and does not require access to any auxiliary or out-of-distribution dataset as in [Perera and Patel(2018)]. This is advantageous, as the collection of auxiliary data is often difficult and expensive, and the choice of such data may depend on the training set of classes.
(3) We perform experiments on two standard benchmark datasets for novelty detection, and the results obtained compare favourably with the state-of-the-art method, which leverages auxiliary training data.
2 Related Work
This work builds on two threads of machine learning research, namely novelty detection algorithms and mixup-based learning techniques.
Novelty Detection: Detecting abnormalities in data is an active area of research. There have been statistical [Stevens(1984)], [Yamanishi et al.(2004)Yamanishi, Takeuchi, Williams, and Milne], [Kim and Scott(2012)], [Eskin(2000)], distance based [Knorr et al.(2000)Knorr, Ng, and Tucakov], [Hautamaki et al.(2004)Hautamaki, Karkkainen, and Franti], [Eskin et al.(2002)Eskin, Arnold, Prerau, Portnoy, and Stolfo]
and deep learning based approaches. Statistical methods generally focus on fitting the distribution of the known data using probability models. Early works on open set recognition were mostly based on statistical methods. One of the early works in this direction uses a 1-vs-Set Machine to determine the representative space between novel and seen classes. Subsequently, [Jain et al.(2014)Jain, Scheirer, and Boult], [Scheirer et al.(2014)Scheirer, Jain, and Boult] were proposed to enhance the performance. Distance based algorithms generally apply a transform and then identify novel classes by thresholding the distance to known examples. The assumption is that, in the transformed space, known class examples will be much closer to the known class representatives than unknown examples. A relatively recent work in this direction is the Kernel Null Foley-Sammon Transform (KNFST) [Bodesheim et al.(2013)Bodesheim, Freytag, Rodner, Kemmler, and Denzler] for multi-class novelty detection. Here, points of the same class are projected onto a single point in the null space and, during testing, the distance to the class representative is thresholded to obtain a novelty score. This algorithm was extended in [Liu et al.(2017)Liu, Lian, Wang, and Xiao] to handle incrementally arriving classes and update its novelty detector accordingly. In addition, [Liu et al.(2017)Liu, Lian, Wang, and Xiao] made the approach more scalable and reduced the computational burden of the method proposed in [Bodesheim et al.(2013)Bodesheim, Freytag, Rodner, Kemmler, and Denzler]. Deep learning based approaches such as OpenMax [Bendale and Boult(2016)] fit a Weibull distribution to determine novelty. A generative version of this approach was proposed in [Ge et al.(2017)Ge, Demyanov, Chen, and Garnavi], where unknown samples were generated.
Several one-class deep learning based novelty detection methods have been proposed in recent literature [Sabokrou et al.(2018)Sabokrou, Khalooei, Fathy, and Adeli], [Perera et al.(2019)Perera, Nallapati, and Xiang], [Sadooghi and Khadem(2018)]. The work in [Perera and Patel(2019)] designs a novel training paradigm where a reference set is used to learn a set of negative filters that will not be activated for the known category data. To this end, they design a novel loss function called membership loss. Masana et al. [Masana et al.(2018)Masana, Ruiz, Serrat, van de Weijer, and Lopez] propose a method to improve the feature space by forming discriminative features with a contrastive loss for this task.
Mixing: Learning algorithms involving interpolation or mixup between classes have recently been introduced in the community. The first works in vision improve classification by interpolating between classes [Zhang et al.(2017)Zhang, Cisse, Dauphin, and Lopez-Paz], [Tokozume et al.(2018)Tokozume, Ushiku, and Harada]. While these works interpolate in the input space, [Berthelot et al.(2018)Berthelot, Raffel, Roy, and Goodfellow], [Dumoulin et al.(2016)Dumoulin, Belghazi, Poole, Mastropietro, Lamb, Arjovsky, and Courville], [Mathieu et al.(2016)Mathieu, Zhao, Zhao, Ramesh, Sprechmann, and LeCun], [Ha and Eck(2017)], [Bowman et al.(2015)Bowman, Vilnis, Vinyals, Dai, Jozefowicz, and Bengio], [Mescheder et al.(2017)Mescheder, Nowozin, and Geiger]
try interpolation in the latent space of autoencoders. In our work, unlike these papers, we interpolate in the feature space to train our model.
3 Proposed Method
In this section, we describe the network architecture of the proposed Segregation Network, the novel loss function used to train the model and the training and testing protocol.
First, we describe the notations used.
Notations: Let the input data be represented as $X \in \mathbb{R}^{N \times d}$, $N$ being the number of training samples and $d$ being the feature dimension. Let the labels be denoted as $y \in \{1, \dots, C\}^{N}$, where $C$ is the number of training or known classes. We define the known class set to be $\mathcal{K} = \{1, \dots, C\}$, and thus $|\mathcal{K}| = C$. In the open set scenario, the testing data can come from the seen classes or from unseen/novel classes, for which no information is available to the system. During testing, given a query, the goal is to determine whether it comes from the set $\mathcal{K}$ or not, i.e. whether it belongs to a seen class or a novel class. Classifying the known examples into their correct classes is not the focus of this work and can be done using the base classifier trained on the training data. Now, we describe the details of the Segregation Network.
Features: Any pre-trained standard deep learning model can be used to extract features. Here, we use the pre-trained Alexnet [Krizhevsky et al.(2012)Krizhevsky, Sutskever, and Hinton] and VGG-16 [Simonyan and Zisserman(2014)] architectures. These networks are fine-tuned, and the extracted features are normalized and given as input to our network.
3.1 Segregation Network
The proposed network consists of three fully connected (fc) layers with ReLU activations and dropout between each layer except the final fc layer. The final layer is of dimension $C$, the number of known classes. Sigmoid is used as the final layer activation, since its output lies between 0 and 1 and can be interpreted as the proportion of the mixtures in our case. In our design, the network has architecture, with the numbers denoting the length of each fc layer. The network is trained using the Adam optimizer with a learning rate of 0.001.
The network takes as input a triplet of data samples $(x_i, x_j, x_{mix})$, where $x_i$, $x_j$ are data from the training set and $x_{mix}$ is the mixture obtained by mixing $x_i$ and $x_j$ in some proportion. Let us denote the outputs of the first fc layer, which is shared by all three inputs, as $f_i$, $f_j$, $f_{mix}$. These are concatenated to form $f = [f_i, f_j, f_{mix}]$, which is then passed forward through the rest of the network. In all implementations, since we use the pre-trained features from Alexnet [Krizhevsky et al.(2012)Krizhevsky, Sutskever, and Hinton] or VGG-16 [Simonyan and Zisserman(2014)]
deep networks, which are of very high dimension, the first fc layer serves as a dimensionality reduction layer, the output of which is then concatenated and passed through the rest of the network structure. The final output of the network after passing through the sigmoid activation function is denoted as $\hat{y}$.
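The forward pass described above can be sketched in NumPy as follows. This is only an illustration: the layer widths (`d=64`, `h1=16`, `h2=8`), the number of classes, the random initialisation and all function names are assumptions and not the values used in our experiments, and dropout is omitted because it is only active during training.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, h1, h2, num_classes, seed=0):
    """Random weights for the three fc layers (all sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (d, h1)), "b1": np.zeros(h1),        # shared fc1
        "W2": rng.normal(0, 0.1, (3 * h1, h2)), "b2": np.zeros(h2),   # fc2 on the concatenation
        "W3": rng.normal(0, 0.1, (h2, num_classes)), "b3": np.zeros(num_classes),
    }

def segregation_forward(params, x_i, x_j, x_mix):
    """Shared fc1 on each input, concatenate, then fc2 and a sigmoid fc3."""
    reduced = [relu(x @ params["W1"] + params["b1"]) for x in (x_i, x_j, x_mix)]
    h = np.concatenate(reduced, axis=-1)
    h = relu(h @ params["W2"] + params["b2"])
    return sigmoid(h @ params["W3"] + params["b3"])

params = init_params(d=64, h1=16, h2=8, num_classes=5)
rng = np.random.default_rng(1)
x_i, x_j = rng.normal(size=64), rng.normal(size=64)
out = segregation_forward(params, x_i, x_j, 0.7 * x_i + 0.3 * x_j)
print(out.shape)  # (5,)
```

Sharing the first layer across the three inputs keeps the parameter count independent of the input order and reduces the dimension before concatenation.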
Training the model: The network is trained such that, given an interpolated input, it decouples/segregates the input into its constituent known classes. This property is exploited in the following way. Given a pair of feature vectors $(x_i, x_j)$, we perform a convex combination on this pair to produce $x_{mix} = \alpha x_i + (1 - \alpha) x_j$, where $\alpha \in [0, 1]$ is the mixing coefficient. We feed these three feature vectors to our network. The output of the network is a $C$-dimensional vector $\hat{y}$ from the final sigmoid layer, $C$ being the number of known classes. Since the output is passed through the sigmoid activation function, each element of the $C$-dimensional vector is bounded between 0 and 1. In addition, each element denotes the proportion in which the mixed sample has been constructed from the corresponding training class. For example, an output with $\hat{y}_p = \alpha$, $\hat{y}_q = 1 - \alpha$ and zeros elsewhere indicates that the mixed sample has been constructed as $\alpha x_i + (1 - \alpha) x_j$, where $x_i$ belongs to class $p$ and $x_j$ belongs to class $q$. Given $(x_i, x_j, x_{mix})$, the following cases may arise:
If $x_i$ belongs to class $p$ and $x_j$ belongs to class $q$, where $p \neq q$ and both $p$, $q$ are seen classes, the output of the model should satisfy $\hat{y}_p = \alpha$ and $\hat{y}_q = 1 - \alpha$, while $\hat{y}_k = 0$ for $k \notin \{p, q\}$. We consider such a pair to be a non-matched pair, as the interpolated point lies somewhere between the two classes depending on the value of $\alpha$.
If both $x_i$, $x_j$ belong to the same class $p$, the network should output $\hat{y}_p = 1$ (or as high as possible), whereas $\hat{y}_k = 0$ for $k \neq p$. This is because a mixed element constructed from two data items of the same class must ideally belong to the same class also. We consider such a pair to be a matched pair.
During testing in the open-set scenario, we pair the query sample with different training examples, and so a third case may arise if the query belongs to a novel class. Here, since only one of the two inputs to the network is from a seen class, only the output node corresponding to that class should be non-zero and equal to the proportion of this class in the generated mixture. We do not explicitly train the network for this scenario, since we do not assume any auxiliary datasets. So, we consider only the first two cases for training.
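As an illustration of the two training cases, the sketch below builds the mixed sample and the target vector that the network is asked to reproduce. The function name `make_training_triplet`, the 0-based class indices and the toy two-dimensional features are assumptions made for this example.

```python
import numpy as np

def make_training_triplet(x_i, y_i, x_j, y_j, alpha, num_classes):
    """Return (x_i, x_j, x_mix) and the target mixing-coefficient vector."""
    x_mix = alpha * x_i + (1.0 - alpha) * x_j   # convex combination of the pair
    target = np.zeros(num_classes)
    target[y_i] += alpha                        # share contributed by the class of x_i
    target[y_j] += 1.0 - alpha                  # share contributed by the class of x_j
    return (x_i, x_j, x_mix), target

x_a = np.array([1.0, 0.0])
x_b = np.array([0.0, 1.0])

# Non-matched pair: classes 0 and 2 out of 4 known classes.
(_, _, x_mix), target = make_training_triplet(x_a, 0, x_b, 2, alpha=0.7, num_classes=4)

# Matched pair: both samples from class 1, so the target for class 1 sums to 1.
_, matched_target = make_training_triplet(x_a, 1, x_a, 1, alpha=0.7, num_classes=4)
```

Using `+=` for both entries means the matched case needs no special handling: the two shares fall on the same index and add up to one.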
Note that since the final activation function is sigmoid and not softmax, the sum of the elements of the network output may not be equal to 1. This is important, since if the input belongs to a novel class, our network will only consider the mixing ratio of the known class. So the proportion of the unknown class in the mixture will be ignored and thus the sum will not be equal to 1.
What is important is that the output peaks at the right places, thus signifying the classes from which the mixed data has been generated.
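A small numerical illustration of this point, with made-up logits: a sigmoid output can report that half of the mixture comes from one known class and nothing from the others, whereas a softmax over the same logits is forced to sum to 1 and therefore assigns almost all of the mass to that class.

```python
import numpy as np

# Hypothetical logits: one known class at proportion ~0.5, the others near zero.
logits = np.array([0.0, -4.0, -4.0, -4.0])

sig = 1.0 / (1.0 + np.exp(-logits))      # element-wise, no coupling between classes
soft = np.exp(logits - logits.max())
soft = soft / soft.sum()                 # coupled: entries must sum to 1

print(sig.sum())   # well below 1
print(soft.sum())  # 1 up to rounding
```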
Since we do not have a softmax output, we cannot use the cross-entropy loss function to train this model.
In addition, cross-entropy loss function tries to maximize the probability of the correct class while in our case we may need to find the two constituent classes.
Here we design our own novel loss function termed as Constituency loss which we describe next.
Constituency loss: This loss ensures that the output of the Segregation Network, $\hat{y}$, gives positive values only for those classes which have been mixed to create the mixed sample $x_{mix}$. Thus, the network is expected to output not only the correct mixing proportion for each mixing class but also zero for the non-mixing classes. Based on this requirement, the loss function can be written for the mixing (“m”) and non-mixing (“nm”) classes as follows
Here, the target is the mixing coefficient vector, which has zeros for the non-mixing classes and has values of $\alpha$ and $(1 - \alpha)$ in the relevant places for the mixing classes. The “nm” term corresponds to the sparse output for the non-mixing classes, while the “m” term is for the classes used in forming $x_{mix}$. It is to be noted that the weight on the “m” term plays a significant role in training the model, as shown in the ablation studies. This factor is important since the number of zero elements is much larger than the number of non-zero (mixing) coefficients. Hence, during training, we penalize errors in wrongly predicting a non-zero mixing coefficient as zero much more severely than incorrect predictions of the zero elements. In the implementation, we found best to be between in all our experiments.
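The description above can be sketched in code. The exact expression and the value of the weight are not reproduced in the text, so the squared-error form, the name `constituency_loss` and the default `lam=10.0` below are assumptions for illustration only, not the formulation used in our experiments.

```python
import numpy as np

def constituency_loss(output, target, lam=10.0):
    """Weighted error: mixing ("m") classes are up-weighted by lam (assumed form)."""
    output = np.asarray(output, dtype=float)
    target = np.asarray(target, dtype=float)
    mixing = target > 0                       # classes used to form the mixture
    err = (output - target) ** 2              # squared error is an assumption
    loss_m = err[mixing].sum()                # error on the mixing coefficients
    loss_nm = err[~mixing].sum()              # error on the entries that should be zero
    return lam * loss_m + loss_nm

target = np.array([0.7, 0.0, 0.3, 0.0])
loss = constituency_loss([0.6, 0.1, 0.3, 0.0], target)

# The same 0.1 error costs more on a mixing class than on a non-mixing class.
miss_mixing = constituency_loss([0.6, 0.0, 0.3, 0.0], target)
false_positive = constituency_loss([0.7, 0.1, 0.3, 0.0], target)
```

With only two non-zero targets among many classes, an unweighted loss could be driven low by predicting zeros everywhere; the weight counteracts that.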
3.2 Testing Scenario
We assume that a base network has been trained on the training classes with a softmax output. Here, it can be taken as the AlexNet or VGGNet from which the features are extracted. In the open set testing scenario, the test query can come from one of the seen classes or from a novel class. Given a test query, we consider the Top-N classes which get the highest output scores, i.e. the classes to which the query most likely belongs. The goal of the Segregation Network is to consider each of these top classes and verify whether the query actually belongs to that class. Taking the top few classes is intuitive since (1) if the query belongs to a seen class, its score is usually high and (2) it reduces the computation required for novelty detection using the proposed Segregation Network. If the query is from a novel class, all retrieved classes are necessarily wrong.
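Selecting the Top-N candidate classes from the base network's softmax scores can be sketched as below. The function name `top_n_classes` and the example scores are assumptions for illustration.

```python
import numpy as np

def top_n_classes(softmax_scores, n):
    """Indices of the n highest-scoring classes, best first."""
    order = np.argsort(softmax_scores)[::-1]   # descending by score
    return order[:n]

scores = np.array([0.05, 0.60, 0.10, 0.25])    # hypothetical softmax output
candidates = top_n_classes(scores, 2)
print(candidates)  # [1 3]
```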
Here we use the training class centres $\mu_c$, i.e. the mean of the training features of class $c$ ($c = 1, \dots, C$), as the prototype exemplars. For each query $x_q$, a set of interpolated points is generated as $x_{mix}^{c} = \beta \mu_c + (1 - \beta) x_q$, where $\beta$ is the mixing coefficient of the prototype, which is then passed through the proposed network. The mixing coefficient for the prototype exemplars is kept low while feeding to our model. In other words, the mixing coefficient is kept high for the incoming test data. This is because of the following reasons:
If the query is from one of the known classes, the high mixing coefficient of the query would produce a high output for the corresponding class.
If the query comes from an unknown class, the low weight given to the prototype exemplars forces the network output to be low for all the classes.
Thus, for each query, the average of the highest network outputs is taken to be the probability of the query being known.
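The test-time procedure can be sketched as follows. Several details here are assumptions: the prototype weight `proto_weight=0.2`, the reading of "average of the highest network output" as the mean over candidate classes of the peak output, and in particular `toy_network`, which is a cosine-similarity stand-in used only to make the example runnable and is not the trained Segregation Network.

```python
import numpy as np

def class_centres(features, labels, num_classes):
    """Mean feature vector of every known class (the prototype exemplars)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def known_score(query, centres, candidate_classes, network, proto_weight=0.2):
    """Mix the query with each candidate prototype and average the peak outputs."""
    peaks = []
    for c in candidate_classes:
        # Low weight on the prototype, high weight on the query.
        mix = proto_weight * centres[c] + (1.0 - proto_weight) * query
        out = network(centres[c], query, mix)
        peaks.append(out.max())
    return float(np.mean(peaks))

features = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centres = class_centres(features, labels, num_classes=2)

def toy_network(proto, query, mix):
    """Stand-in for the trained network: clipped cosine similarity to each centre."""
    norms = np.linalg.norm(centres, axis=1) * np.linalg.norm(mix) + 1e-12
    return np.clip(centres @ mix / norms, 0.0, 1.0)

known = known_score(np.array([0.9, 0.1, 0.0]), centres, [0, 1], toy_network)
novel = known_score(np.array([0.0, 0.0, 1.0]), centres, [0, 1], toy_network)
```

In this toy setting the query close to a known centre receives a higher score than the query orthogonal to both centres, which is the behaviour the two reasons above describe.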
In this section, we evaluate our method, Mixing Novelty Detection (MND), against several state-of-the-art approaches. We also describe the datasets we tested on and the testing protocol that was followed. We then analyze our algorithm.
4.1 Datasets Used and Baselines
Here, we report results on two benchmark datasets, namely Caltech256 and Stanford Dogs.
Caltech256 Dataset [Griffin et al.(2007)Griffin, Holub, and Perona]: This is a standard dataset for visual recognition consisting of 256 diverse object categories.
It contains 30607 images, with a minimum of 81 and a maximum of 827 images per class.
As per our protocol, we take the first 128 classes as known and consider the rest as unknown.
Stanford Dogs Dataset [Khosla et al.(2011)Khosla, Jayadevaprakash, Yao, and Li]: This is a fine-grained dataset consisting of 120 classes of different dog breeds, with a total of 20,580 images. We consider the first 60 classes, sorted alphabetically, as known. Final testing was performed on the remaining 60 classes.
4.2 State-of-the-art Baseline Method
We evaluate our method against the following baseline algorithms:
(1) Finetune [Simonyan and Zisserman(2014)]: The fine-tuned network output is thresholded to determine whether a query is from a known or novel class.
(2) One-class SVM [Schölkopf et al.(2001)Schölkopf, Platt, Shawe-Taylor, Smola, and Williamson]: All known classes are considered while training the SVM. During testing, the maximum SVM score is considered.
(3) KNFST [Bodesheim et al.(2013)Bodesheim, Freytag, Rodner, Kemmler, and Denzler]: The deep features are extracted and normalized, and the KNFST algorithm is applied to these features to detect novel classes.
(4) Local KNFST [Bodesheim et al.(2015)Bodesheim, Freytag, Rodner, and Denzler]: Deep features are extracted and the algorithm is evaluated with 600 local regions.
(5) OpenMax [Bendale and Boult(2016)]: The feature embedding of the penultimate layer of a trained network is taken and mean activation vectors are determined to fit a Weibull distribution.
(6) K-extremes [Bodesheim et al.(2015)Bodesheim, Freytag, Rodner, and Denzler]: VGG16 features are extracted and the top 0.1 activation index is used to get the extreme value signatures.
(7) Finetune () [Perera and Patel(2019)]: The network is trained on additional classes coming from a reference dataset.
(8) The state-of-the-art algorithm proposed in [Perera and Patel(2019)], where an external reference dataset is used to learn negative filters which will not be activated for any data from the known categories, using a novel membership function.
The last method not only requires an auxiliary dataset but is also computationally expensive, since a separate network must be trained. A further concern is which reference dataset to choose for learning the negative filters. Our approach compares favorably with the state-of-the-art algorithm without knowledge of any reference data or training the network on any extra data. This reduces the computational cost and removes the need to collect an extra reference dataset.
4.3 Testing Protocol
In our testing protocol, half of the classes are taken to be known. The rest are considered unknown.
The data of the known classes is divided equally into training and test splits, while the unknown classes are used only during testing.
We use the area under the receiver operating characteristic (ROC) curve (AUC) as the evaluation criterion.
We select AUC because it is threshold independent and hence less susceptible to parameter variation. In addition, there is often an imbalance between the number of known class instances and the number of unknown class instances, and AUC is well suited to handling such imbalance. Imbalance in data during testing arises from the fact that the number of unknown class data could be potentially infinite compared to the fixed set of classes used to train the network.
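To make the threshold independence concrete, AUC can be computed directly from the scores as the fraction of (known, novel) pairs in which the known sample receives the higher score, with ties counted as one half. The sketch below uses made-up scores, and the function name `roc_auc` is an assumption.

```python
import numpy as np

def roc_auc(known_scores, novel_scores):
    """Fraction of (known, novel) pairs ranked correctly; ties count one half."""
    known = np.asarray(known_scores, dtype=float)[:, None]
    novel = np.asarray(novel_scores, dtype=float)[None, :]
    wins = (known > novel).sum() + 0.5 * (known == novel).sum()
    return wins / (known.size * novel.size)

# Hypothetical "probability of being known" scores.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
print(auc)  # 5 of the 6 pairs are ordered correctly
```

Because the value depends only on the ordering of pairs, it is unaffected by the choice of threshold and is normalised by the number of pairs, whatever the ratio of known to novel samples.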
The evaluation of our algorithm is based on features extracted from Alexnet and VGG16, which are also used in [Perera and Patel(2019)]. The compared baseline methods are also evaluated on these features, as reported in [Perera and Patel(2019)]. As seen in Table 1, our method exceeds the state-of-the-art baseline in most cases. With VGG16 features, our method outperforms the method in [Perera and Patel(2019)] by a margin of 7.9 on the Stanford Dogs dataset and 1.3 on Caltech-256. For Alexnet features, Caltech-256 shows a dip in performance, but on Stanford Dogs our method again outperforms all other baselines. We would again like to highlight that our algorithm produces these results without the knowledge of any external or auxiliary datasets. This makes our algorithm much more computationally efficient than deep transfer novelty detection [Perera and Patel(2019)].
4.5 Analysis and Observation
The following points are noted and observed in our experiments:
Effect of the number of class prototypes: The results in Table 1 are obtained when the query is compared with the full set of class prototypes from the training data. Here we investigate the effect of comparing only with the top-N class representatives (as given by the softmax values from the base network, namely Alexnet or VGG16) and provide the results in Table 2. We observe that as the value of N is increased, the performance monotonically increases and starts saturating. This tells us that we need not compare the query with all the class prototypes in the training set, which makes our algorithm faster.
Analysis of class-wise novelty detection score: Here, we plot the novelty detection score (the lower, the more novel) for the images of the seen and unseen categories for the (a) Stanford Dogs and (b) Caltech-256 datasets. The first 60 and 128 categories are the training classes in Stanford Dogs and Caltech-256 respectively. We can draw two conclusions from the plots: (1) The separation between the novelty detection scores of the seen and novel categories is larger for Stanford Dogs than for Caltech-256. This is also reflected in the overall performance of our algorithm in Table 1. (2) The curve for the VGG16 model has higher peaks and lower troughs, indicating that it gives a better margin for error while detecting novelty. This is reflected in the better performance of the VGG16 model over the Alexnet model in Table 1.
Classification task: Our model can be used for the classification task as well. Classification is performed on the test split of the known classes. The accuracy, shown in Table 3, is comparable to the softmax accuracy of the fine-tuned base network.
| No. of class prototypes | Dogs-Alexnet | Dogs-VGG16 | Caltech-Alexnet | Caltech-VGG16 |
In this work we propose a new method for multi-class novelty detection using the mixup technique. For the purpose of training our network, here we define our novel constituency loss which solves the desired objective. Our novelty detection algorithm compares favorably with the state-of-the-art without the need for any auxiliary dataset. Further analysis has shown that our method can be made much more efficient by leveraging the extra softmax confidence outputs of the pre-trained network and gives comparable results.
- [Bendale and Boult(2016)] Abhijit Bendale and Terrance E Boult. Towards open set deep networks. In
- [Berthelot et al.(2018)Berthelot, Raffel, Roy, and Goodfellow] David Berthelot, Colin Raffel, Aurko Roy, and Ian Goodfellow. Understanding and improving interpolation in autoencoders via an adversarial regularizer. arXiv preprint arXiv:1807.07543, 2018.
- [Bodesheim et al.(2013)Bodesheim, Freytag, Rodner, Kemmler, and Denzler] Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler, and Joachim Denzler. Kernel null space methods for novelty detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3374–3381, 2013.
- [Bodesheim et al.(2015)Bodesheim, Freytag, Rodner, and Denzler] Paul Bodesheim, Alexander Freytag, Erik Rodner, and Joachim Denzler. Local novelty detection in multi-class recognition problems. In 2015 IEEE Winter Conference on Applications of Computer Vision, pages 813–820. IEEE, 2015.
- [Bowman et al.(2015)Bowman, Vilnis, Vinyals, Dai, Jozefowicz, and Bengio] Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015.
- [Dumoulin et al.(2016)Dumoulin, Belghazi, Poole, Mastropietro, Lamb, Arjovsky, and Courville] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- [Eskin(2000)] Eleazar Eskin. Anomaly detection over noisy data using learned probability distributions. 2000.
- [Eskin et al.(2002)Eskin, Arnold, Prerau, Portnoy, and Stolfo] Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, and Sal Stolfo. A geometric framework for unsupervised anomaly detection. In Applications of data mining in computer security, pages 77–101. Springer, 2002.
- [Ge et al.(2017)Ge, Demyanov, Chen, and Garnavi] ZongYuan Ge, Sergey Demyanov, Zetao Chen, and Rahil Garnavi. Generative openmax for multi-class open set classification. arXiv preprint arXiv:1707.07418, 2017.
- [Griffin et al.(2007)Griffin, Holub, and Perona] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology, 2007. URL http://authors.library.caltech.edu/7694.
- [Ha and Eck(2017)] David Ha and Douglas Eck. A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477, 2017.
- [Hautamaki et al.(2004)Hautamaki, Karkkainen, and Franti] Ville Hautamaki, Ismo Karkkainen, and Pasi Franti. Outlier detection using k-nearest neighbour graph. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., volume 3, pages 430–433. IEEE, 2004.
- [Jain et al.(2014)Jain, Scheirer, and Boult] Lalit P Jain, Walter J Scheirer, and Terrance E Boult. Multi-class open set recognition using probability of inclusion. In European Conference on Computer Vision, pages 393–409. Springer, 2014.
- [Khosla et al.(2011)Khosla, Jayadevaprakash, Yao, and Li] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Fei-Fei Li. Novel dataset for fine-grained image categorization: Stanford dogs. In Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), volume 2, 2011.
- [Kim and Scott(2012)] JooSeuk Kim and Clayton D Scott. Robust kernel density estimation. Journal of Machine Learning Research, 13(Sep):2529–2565, 2012.
- [Knorr et al.(2000)Knorr, Ng, and Tucakov] Edwin M Knorr, Raymond T Ng, and Vladimir Tucakov. Distance-based outliers: algorithms and applications. The VLDB Journal—The International Journal on Very Large Data Bases, 8(3-4):237–253, 2000.
- [Krizhevsky et al.(2012)Krizhevsky, Sutskever, and Hinton] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- [Liu et al.(2017)Liu, Lian, Wang, and Xiao] Juncheng Liu, Zhouhui Lian, Yi Wang, and Jianguo Xiao. Incremental kernel null space discriminant analysis for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 792–800, 2017.
- [Markou and Singh(2003)] Markos Markou and Sameer Singh. Novelty detection: a review—part 1: statistical approaches. Signal processing, 83(12):2481–2497, 2003.
- [Masana et al.(2018)Masana, Ruiz, Serrat, van de Weijer, and Lopez] Marc Masana, Idoia Ruiz, Joan Serrat, Joost van de Weijer, and Antonio M Lopez. Metric learning for novelty and anomaly detection. arXiv preprint arXiv:1808.05492, 2018.
- [Mathieu et al.(2016)Mathieu, Zhao, Zhao, Ramesh, Sprechmann, and LeCun] Michael F Mathieu, Junbo Jake Zhao, Junbo Zhao, Aditya Ramesh, Pablo Sprechmann, and Yann LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems, pages 5040–5048, 2016.
- [Mescheder et al.(2017)Mescheder, Nowozin, and Geiger] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2391–2400. JMLR. org, 2017.
- [Perera and Patel(2018)] Pramuditha Perera and Vishal M Patel. Learning deep features for one-class classification. arXiv preprint arXiv:1801.05365, 2018.
- [Perera and Patel(2019)] Pramuditha Perera and Vishal M Patel. Deep transfer learning for multiple class novelty detection. Accepted in Proceedings of the IEEE conference on computer vision and pattern recognition, 2019.
- [Perera et al.(2019)Perera, Nallapati, and Xiang] Pramuditha Perera, Ramesh Nallapati, and Bing Xiang. Ocgan: One-class novelty detection using gans with constrained latent representations. arXiv preprint arXiv:1903.08550, 2019.
- [Pimentel et al.(2014)Pimentel, Clifton, Clifton, and Tarassenko] Marco AF Pimentel, David A Clifton, Lei Clifton, and Lionel Tarassenko. A review of novelty detection. Signal Processing, 99:215–249, 2014.
- [Sabokrou et al.(2018)Sabokrou, Khalooei, Fathy, and Adeli] Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, and Ehsan Adeli. Adversarially learned one-class classifier for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3379–3388, 2018.
- [Sadooghi and Khadem(2018)] Mohammad Saleh Sadooghi and Siamak Esmaeilzadeh Khadem. Improving one class support vector machine novelty detection scheme using nonlinear features. Pattern Recognition, 83:14–33, 2018.
- [Scheirer et al.(2014)Scheirer, Jain, and Boult] Walter J Scheirer, Lalit P Jain, and Terrance E Boult. Probability models for open set recognition. IEEE transactions on pattern analysis and machine intelligence, 36(11):2317–2324, 2014.
- [Schölkopf et al.(2001)Schölkopf, Platt, Shawe-Taylor, Smola, and Williamson] Bernhard Schölkopf, John C Platt, John Shawe-Taylor, Alex J Smola, and Robert C Williamson. Estimating the support of a high-dimensional distribution. Neural computation, 13(7):1443–1471, 2001.
- [Simonyan and Zisserman(2014)] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- [Stevens(1984)] James P Stevens. Outliers and influential data points in regression analysis. Psychological Bulletin, 95(2):334, 1984.
- [Tokozume et al.(2018)Tokozume, Ushiku, and Harada] Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada. Between-class learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5486–5494, 2018.
- [Yamanishi et al.(2004)Yamanishi, Takeuchi, Williams, and Milne] Kenji Yamanishi, Jun-Ichi Takeuchi, Graham Williams, and Peter Milne. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8(3):275–300, 2004.
- [Zhang et al.(2017)Zhang, Cisse, Dauphin, and Lopez-Paz] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.