1 Introduction
Anomaly (novelty/outlier) detection refers to the identification of novel or abnormal patterns embedded in a large amount of typical (normal) data (Miljković, 2010). Anomaly detection algorithms find application in fraud detection systems, discovering failures in the industrial domain, detection of adversarial examples, etc.
In contrast to typical binary classification problems, where every class follows some probability distribution, an anomaly is a pattern that does not conform to the expected behavior. In consequence, completely novel types of anomalies, not similar to any known ones, can occur at test time. Moreover, in most cases, we do not have access to any anomalies at training time. Novelty detection is therefore usually solved using unsupervised approaches, such as one-class classifiers, which focus on describing the behavior of the available data (inliers). Any observation that deviates from this behavior is labeled as an outlier.
Our research is motivated by the idea of Support Vector Data Description (SVDD) (Tax and Duin, 2004), which obtains a spherically shaped boundary around a dataset by using a soft margin and penalizing data points that fall outside the bounding region. We propose FlowSVDD – a one-class classifier based on flow-based models (Dinh et al., 2014), which finds a hypersphere with minimal volume that encloses the data. Since flow-based models are commonly used in the context of generative modeling, we redefine their cost function to minimize the volume of the bounding hypersphere instead of maximizing the log-likelihood. On the one hand, flow-based models allow us to calculate the Jacobian of the neural network at every point; in consequence, minimizing the volume of the hypersphere in the feature space leads to the minimization of the volume of the corresponding bounding region in the input space. On the other hand, since flow-based models give an explicit formula for the inverse mapping, we automatically obtain a parametric form of the corresponding bounding region in the input space. In contrast to deep SVDD models, our approach eliminates the problem of hypersphere collapse, which makes it easy to use.
Extensive experiments performed on typical benchmark datasets show that our method significantly outperforms the deep SVDD model while being competitive with state-of-the-art models for anomaly detection.
Our contribution is summarized as follows:

- We propose an adaptation of the SVDD method to deep neural networks with the use of flow models.
- We show that realizing the SVDD loss function with flow-based models prevents hypersphere collapse.
- We experimentally compare FlowSVDD with Deep SVDD and current state-of-the-art methods.
2 Proposed model
Preliminaries: SVDD.
Our approach is motivated by the classical Support Vector Data Description (SVDD) (Tax and Duin, 2004), which tries to find a minimal hypersphere that encloses the data. To allow for outliers in the training set, SVDD uses a soft margin and penalizes data points that lie outside the bounding hypersphere. If \phi maps input data to the output kernel space, then the SVDD loss equals:

\min_{c, R} \; R^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \max\left\{0, \|\phi(x_i) - c\|^2 - R^2\right\},   (1)

where c and R are the center and the radius of the hypersphere, respectively, and \nu \in (0, 1] is the trade-off between the volume and boundary violations of the hypersphere, i.e. the fraction of outliers.
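The objective in Eq. (1) can be sketched directly in NumPy (a minimal illustration; the feature map \phi is taken as the identity for simplicity, so `z` holds the feature-space representations):

```python
import numpy as np

def svdd_loss(z, c, R, nu):
    """Soft-boundary SVDD loss: R^2 plus penalties for points
    lying outside the hypersphere with center c and radius R.

    z  : (n, d) array of feature-space representations phi(x_i)
    c  : (d,) center of the hypersphere
    R  : radius of the hypersphere
    nu : trade-off parameter (bound on the fraction of outliers)
    """
    n = len(z)
    sq_dist = np.sum((z - c) ** 2, axis=1)      # ||phi(x_i) - c||^2
    slack = np.maximum(0.0, sq_dist - R ** 2)   # boundary violations
    return R ** 2 + slack.sum() / (nu * n)

# Points clustered around the origin plus one clear outlier.
z = np.array([[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [3.0, 3.0]])
loss = svdd_loss(z, c=np.zeros(2), R=0.5, nu=0.25)  # only the outlier is penalized
```

In a full SVDD solver, c and R are optimized jointly; here they are fixed only to show how the penalty term behaves.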
The realization of SVDD using deep neural networks was presented in (Ruff et al., 2018) (it was termed DSVDD). However, direct minimization of the SVDD loss may lead to a trivial solution, i.e. the network maps all inputs to a single point and the hypersphere collapses to radius R = 0. To avoid this negative behavior, it has been recommended that the center c must be set to something other than the all-zero-weights solution, and the network should use only unbounded activations and omit bias terms. While the first two conditions can be accepted, omitting bias terms in a network may lead to a suboptimal feature representation due to the role of bias in shifting activation values.
To eliminate the above restrictions, a recent work (Chong et al., 2020) proposes two regularizers that prevent hypersphere collapse and uses an adaptive weighting scheme to control the amount of penalization between the SVDD loss and the respective regularizer.
Flowbased SVDD.
As an alternative to DSVDD, we realize the SVDD objective using flow models. Let us recall that a neural network f is a flow model if the inverse mapping f^{-1} is given explicitly and the Jacobian determinant \det J_f can be easily calculated. In our approach, we use a special class of flow models in which the Jacobian determinant is constant at every point, i.e. |\det J_f(x)| = \mathrm{const}, such as NICE (Dinh et al., 2014). In this case, we get a natural correspondence between the volume of the bounding hypersphere in the output space and the volume of the bounding region in the input space, see below.
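An additive coupling layer of the kind used in NICE illustrates both properties at once: the inverse is given explicitly, and the Jacobian is triangular with unit diagonal, so its determinant equals 1. A minimal NumPy sketch (the shift function `m` stands in for a learned sub-network):

```python
import numpy as np

def coupling_forward(x, m):
    """Additive coupling: the first half of the coordinates passes
    through unchanged; the second half is shifted by m(first half).
    The Jacobian is triangular with unit diagonal, so det J = 1."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    return np.concatenate([x1, x2 + m(x1)], axis=-1)

def coupling_inverse(y, m):
    """Explicit inverse: subtract the same shift."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    return np.concatenate([y1, y2 - m(y1)], axis=-1)

# A toy stand-in for the learned shift network.
m = lambda x1: np.tanh(2.0 * x1)

x = np.array([[0.5, -1.0, 2.0, 0.3]])
y = coupling_forward(x, m)
x_rec = coupling_inverse(y, m)  # recovers x exactly
```

Stacking such layers (with the coordinate split permuted between them) keeps the overall Jacobian determinant constant, which is exactly the property FlowSVDD exploits.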
Let us first consider the simplest situation, when |\det J_f| = 1. In such a scenario, the volume of any shape in the input space equals the volume of its image in the output feature space. In consequence, a direct minimization of the SVDD objective does not lead to hypersphere collapse.
In a more general scenario, when |\det J_f| = \mathrm{const} \neq 1, we need to include the Jacobian determinant in the SVDD objective. Observe that the Jacobian determinant of the mapping x \mapsto f(x) / |\det J_f|^{1/d}, where d is the data dimension, equals 1. Thus, to get the equality of volumes in the input and output spaces, we redefine the SVDD loss (1) as follows:

\min_{c, R} \; R^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \max\left\{0, \left\| \frac{f(x_i)}{|\det J_f|^{1/d}} - c \right\|^2 - R^2\right\}.   (2)
In the test phase, a given example x is deemed an outlier if:

\left\| \frac{f(x)}{|\det J_f|^{1/d}} - c \right\|^2 > R^2,

which is equivalent to

\left\| f(x) - |\det J_f|^{1/d} \, c \right\| > |\det J_f|^{1/d} \, R.

In other words, inliers lie inside the ball B(|\det J_f|^{1/d} c, \; |\det J_f|^{1/d} R).
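Putting Eq. (2) and the test-time rule together, the core of the method fits in a few lines (a minimal sketch, assuming a NICE-like network whose latent representations `z = f(x)` and constant Jacobian determinant `det_J` are given):

```python
import numpy as np

def flow_svdd_loss(z, det_J, c, R, nu):
    """FlowSVDD loss: the SVDD objective applied to features
    rescaled by |det J_f|^(1/d), so that the volume of the bounding
    region agrees between input and latent space."""
    n, d = z.shape
    z_scaled = z / np.abs(det_J) ** (1.0 / d)
    sq_dist = np.sum((z_scaled - c) ** 2, axis=1)
    return R ** 2 + np.maximum(0.0, sq_dist - R ** 2).sum() / (nu * n)

def is_outlier(z, det_J, c, R):
    """Test-time rule: a point is an outlier if its rescaled
    representation falls outside the ball B(c, R)."""
    d = z.shape[-1]
    z_scaled = z / np.abs(det_J) ** (1.0 / d)
    return np.sum((z_scaled - c) ** 2, axis=-1) > R ** 2

det_J = 4.0                             # constant for NICE-like flows
z = np.array([[0.2, 0.1], [4.0, 4.0]])  # latent representations f(x)
flags = is_outlier(z, det_J, c=np.zeros(2), R=1.0)  # [False, True]
```

During training, the loss would be minimized over the network weights together with c and R; the sketch only evaluates both quantities for fixed values.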
3 Experiments
In this section, we experimentally examine FlowSVDD and compare it with several state-of-the-art approaches. FlowSVDD is implemented using the architecture of the NICE flow model (4 coupling layers, each consisting of a 4-layer network with 256 hidden dimensions) with a constant Jacobian determinant and a fixed trade-off parameter \nu.
Illustrative example.
To get the intuition behind FlowSVDD, we first consider 2-dimensional examples, which are easy to visualize. The results presented in Figure 1 show the resulting hyperspheres in the latent space and the corresponding bounding regions in the input space. At first glance, we can observe that the bounding region in the original space closely follows the structure of the inliers. In the latent space, FlowSVDD finds the center c and radius R that enclose a (1 - \nu) fraction of the data inside the ball B(c, R). Observe that, unlike density-based flow models, FlowSVDD does not transform the data into a Gaussian distribution in the latent space.
Benchmark data for anomaly detection.
To provide a quantitative assessment, we take into account the Thyroid (http://odds.cs.stonybrook.edu/thyroiddiseasedataset/) and KDDCUP (http://kdd.ics.uci.edu/databases/kddcup99/kddcup.testdata.unlabeled_10_percent.gz) datasets, which are typically used for anomaly detection. We use the standard training and test splits and follow exactly the same evaluation protocol as in (Wang et al., 2019). In particular, we use the F1 score and the Area Under the Receiver Operating Characteristic curve (AUC).
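For reference, the F1 score used throughout reduces to simple counts over the predicted labels, with anomalies as the positive class (a self-contained sketch, not the exact evaluation code of Wang et al.):

```python
def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R), with anomalies as the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# One missed anomaly and one false alarm: precision = recall = 2/3.
score = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In practice one would use a library implementation (e.g. scikit-learn's `f1_score` and `roc_auc_score`); the sketch only makes the metric explicit.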
Our model is compared with the following algorithms: (1) One-class SVM (OC-SVM) (Schölkopf et al., 2001), (2) Deep structured energy-based models (DSEBM) (Zhai et al., 2016), (3) Deep autoencoding Gaussian mixture model (DAGMM) (Zong et al., 2018), (4) variants of MTQ – multivariate triangular quantile maps (NLL, TQM_1, TQM_2, TQM_∞) (Wang et al., 2019), and (5) Deep Support Vector Data Description (DSVDD) (Ruff et al., 2018) – another implementation of the SVDD cost function in deep neural networks.

Table 1: F1 scores and AUC on the Thyroid and KDDCUP datasets ("–" denotes values not reported).

Thyroid
     | OC-SVM | DSEBM | DAGMM | NLL   | TQM_1 | TQM_2 | TQM_∞ | DSVDD | FlowSVDD
F1   | .3887  | .0403 | .4782 | .7312 | .5269 | .5806 | .7527 | –     | .7097
AUC  | –      | –     | –     | –     | –     | –     | –     | .749  | .9797

KDDCUP
     | OC-SVM | DSEBM | DAGMM | NLL   | TQM_1 | TQM_2 | TQM_∞ | DSVDD | FlowSVDD
F1   | .7954  | .7423 | .9369 | .9622 | .9621 | .9622 | .9622 | –     | .9030
AUC  | –      | –     | –     | –     | –     | –     | –     | –     | .9384
The results presented in Table 1 show that the FlowSVDD model performs better than most methods on the Thyroid dataset and is significantly better than DSVDD in terms of the AUC metric. In the case of KDDCUP, FlowSVDD achieves a score in between the classical methods and the current state-of-the-art.
Image datasets.
To provide further experimental verification, we use two image datasets: MNIST and Fashion-MNIST. In contrast to the previous comparison, these two datasets are usually used for multi-class classification and thus need to be adapted to the problem of anomaly detection. For this purpose, each of the ten classes is, in turn, deemed the nominal class, while the remaining nine classes are treated as anomalies, which results in 10 scenarios for each dataset.
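The one-vs-rest protocol described above amounts to a simple relabeling (a sketch; the hypothetical `labels` array stands in for the MNIST/Fashion-MNIST class labels):

```python
import numpy as np

def one_class_split(labels, nominal_class):
    """Relabel a multi-class dataset for anomaly detection:
    the chosen class becomes nominal (0), all others anomalous (1)."""
    return (labels != nominal_class).astype(int)

labels = np.array([0, 1, 2, 1, 0, 3])
y = one_class_split(labels, nominal_class=1)  # -> [1, 0, 1, 0, 1, 1]
```

Repeating this for each of the ten classes yields the 10 scenarios per dataset; the model is then trained only on the samples labeled 0.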
We additionally compare FlowSVDD with the following models: (1) Geometric transformations (GT) (Golan and El-Yaniv, 2018), (2) Variational autoencoder (VAE) (Kingma and Welling, 2013), (3) Denoising autoencoder (DAE) (Vincent et al., 2008), (4) Generative probabilistic novelty detection (GPND) (Pidhorskyi et al., 2018), and (5) Latent space autoregression (LSA) (Abati et al., 2019). In contrast to the previous experiment, we use only TQM and NLL as implementations of MTQ, because they obtain the highest AUC values (Wang et al., 2019).

To present the results, we compute the ranking on each of the 10 scenarios and summarize it using a box plot, see Figure 2. The results show that FlowSVDD significantly outperforms DSVDD on both datasets. In the case of the MNIST dataset, we observe that FlowSVDD is almost as good as the current state-of-the-art methods, such as GT and NLL.
Finally, we analyze which samples are located closest to and furthest from the center of the bounding hypersphere. The results in Figure 3 show that FlowSVDD maps regular images close to the hypersphere center. On the contrary, examples located far from the center can be hard to identify. This means that FlowSVDD gives results consistent with our intuition.
4 Conclusion
This paper introduced FlowSVDD, which realizes the SVDD paradigm with deep neural networks. Making use of flow-based models and an appropriate SVDD-like cost function, we find a minimal bounding region for the majority of the data. Unlike other deep SVDD realizations, FlowSVDD keeps the Jacobian determinant of the network constant, which means that the resulting hypersphere cannot collapse in the latent space. The experimental results demonstrate that FlowSVDD achieves very good performance in both artificial and real-world one-class settings.
Acknowledgements
The work of M. Śmieja was supported by the National Centre of Science (Poland) Grant No. 2018/31/B/ST6/00993. The work of P. Spurek was supported by the National Centre of Science (Poland) Grant No. 2019/33/B/ST6/00894. The work of J. Tabor was supported by the National Centre of Science (Poland) Grant No. 2017/25/B/ST6/01271.
References
Abati et al. (2019). Latent space autoregression for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 481–490.

Chong et al. (2020). Simple and effective prevention of mode collapse in deep one-class classification. arXiv preprint arXiv:2001.08873.

Dinh et al. (2014). NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516.

Golan and El-Yaniv (2018). Deep anomaly detection using geometric transformations. In Advances in Neural Information Processing Systems, pp. 9758–9769.

Kingma and Welling (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Miljković (2010). Review of novelty detection methods. In The 33rd International Convention MIPRO, pp. 593–598.

Pidhorskyi et al. (2018). Generative probabilistic novelty detection with adversarial autoencoders. In Advances in Neural Information Processing Systems, pp. 6822–6833.

Ruff et al. (2018). Deep one-class classification. In International Conference on Machine Learning, pp. 4393–4402.

Schölkopf et al. (2001). Estimating the support of a high-dimensional distribution. Neural Computation 13(7), pp. 1443–1471.

Tax and Duin (2004). Support vector data description. Machine Learning 54(1), pp. 45–66.

Vincent et al. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103.

Wang et al. (2019). Multivariate triangular quantile maps for novelty detection. In Advances in Neural Information Processing Systems, pp. 5061–5072.

Zhai et al. (2016). Deep structured energy based models for anomaly detection. arXiv preprint arXiv:1605.07717.

Zong et al. (2018). Deep autoencoding Gaussian mixture model for unsupervised anomaly detection.