Flow-based SVDD for anomaly detection

08/10/2021 · Marcin Sendera et al.

We propose FlowSVDD – a flow-based one-class classifier for anomaly/outlier detection that realizes the well-known SVDD principle using deep learning tools. Contrary to other approaches to deep SVDD, the proposed model is instantiated using flow-based models, which naturally prevents the bounding hypersphere from collapsing to a single point. Experiments show that FlowSVDD achieves results comparable to current state-of-the-art methods and significantly outperforms related deep SVDD methods on benchmark datasets.


1 Introduction

Anomaly (novelty/outlier) detection refers to the identification of novel or abnormal patterns embedded in a large amount of typical (normal) data (Miljković, 2010). Anomaly detection algorithms find applications in fraud detection systems, discovering failures in industrial domains, detecting adversarial examples, etc.

In contrast to typical binary classification problems, where every class follows some probability distribution, an anomaly is a pattern that does not conform to expected behavior. In consequence, completely novel types of anomalies, unlike any seen before, can occur at test time. Moreover, in most cases we do not have access to any anomalies at training time. Novelty detection is therefore usually solved using unsupervised approaches, such as one-class classifiers, which focus on describing the behavior of the available data (inliers). Any observation that deviates from this behavior is labeled as an outlier.

Our research is motivated by the idea of Support Vector Data Description (SVDD) (Tax and Duin, 2004), which obtains a spherically shaped boundary around a dataset by using a soft margin and penalizing data points that fall outside the bounding region. We propose FlowSVDD – a one-class classifier based on flow models (Dinh et al., 2014), which finds a hypersphere of minimal volume that encloses the data. Since flow-based models are commonly used as generative models, we redefine their cost function to minimize the volume of the bounding hypersphere instead of maximizing the log-likelihood. On the one hand, flow-based models allow us to calculate the Jacobian of the neural network at every point; in consequence, minimizing the volume of the hypersphere in the feature space also minimizes the volume of the corresponding bounding region in the input space. On the other hand, since flow-based models provide an explicit formula for the inverse mapping, we automatically obtain a parametric form of the corresponding bounding region in the input space. In contrast to deep SVDD models, our approach eliminates the problem of hypersphere collapse, which makes it easy to use.

Extensive experiments performed on typical benchmark datasets show that our method significantly outperforms the deep SVDD model while remaining competitive with state-of-the-art models for anomaly detection.

Our contribution is summarized as follows:

  1. We propose an adaptation of the SVDD method to deep neural networks using flow models.

  2. We show that realizing the SVDD loss function with flow-based models prevents hypersphere collapse.

  3. We experimentally compare FlowSVDD with Deep SVDD and current state-of-the-art methods.

2 Proposed model

Preliminaries: SVDD.

Our approach is motivated by the classical Support Vector Data Description (SVDD) (Tax and Duin, 2004), which tries to find a minimal hypersphere enclosing the data. To allow for outliers in the training set, SVDD uses a soft margin and penalizes data points that lie outside the bounding hypersphere. If $\phi$ maps input data to the output kernel space, then the SVDD loss equals:

$$R^2 + \frac{1}{\nu N} \sum_{i=1}^{N} \max\left(0, \|\phi(x_i) - c\|^2 - R^2\right), \qquad (1)$$

where $c$ and $R$ are the center and the radius of the hypersphere, respectively, and $\nu \in (0, 1]$ is the trade-off between the volume and boundary violations of the hypersphere, i.e. the fraction of outliers.
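To make the objective concrete, below is a minimal PyTorch sketch of this soft-boundary loss; the tensor shapes and names (`phi_x`, `c`, `R`, `nu`) are illustrative assumptions, not the authors' implementation.

```python
import torch

def svdd_loss(phi_x: torch.Tensor, c: torch.Tensor,
              R: torch.Tensor, nu: float) -> torch.Tensor:
    """Soft-boundary SVDD loss (1).

    phi_x: (N, D) embeddings of the N training points.
    c:     (D,) hypersphere center.
    R:     scalar radius, treated as a trainable parameter.
    nu:    trade-off in (0, 1], roughly the fraction of outliers.
    """
    sq_dist = ((phi_x - c) ** 2).sum(dim=1)           # ||phi(x_i) - c||^2
    violation = torch.clamp(sq_dist - R ** 2, min=0)  # penalty outside the sphere
    return R ** 2 + violation.mean() / nu             # mean/nu = (1/(nu*N)) * sum
```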

The realization of SVDD using deep neural networks was presented in (Ruff et al., 2018) (termed DSVDD). However, direct minimization of the SVDD loss may lead to a trivial solution, i.e. the hypersphere collapses to a single point. To avoid this negative behavior, it has been recommended that the center $c$ be something other than the all-zero-weights solution, that the network use only unbounded activations, and that it omit bias terms. While the first two conditions are acceptable, omitting bias terms in a network may lead to a sub-optimal feature representation due to the role of bias in shifting activation values.

To eliminate the above restrictions, a recent work (Chong et al., 2020) proposes two regularizers that prevent hypersphere collapse and uses an adaptive weighting scheme to control the amount of penalization between the SVDD loss and the respective regularizer.

Flow-based SVDD.

As an alternative to DSVDD, we realize the SVDD objective using flow models. Let us recall that a neural network $f$ is a flow model if the inverse mapping $f^{-1}$ is given explicitly and the Jacobian determinant $\det J_f(x)$ can be easily calculated. In our approach, we use a special class of flow models in which the Jacobian determinant is constant at every point, i.e. $\det J_f(x) = \mathrm{const}$, such as NICE (Dinh et al., 2014). In this case, we get a natural correspondence between the volume of the bounding hypersphere in the output space and the volume of a bounding region in the input space, as shown below.

Let us first consider the simplest situation, when $|\det J_f(x)| = 1$. In such a scenario, the volume of any shape in the input space equals the volume of its image in the output feature space. In consequence, direct minimization of the SVDD objective does not lead to hypersphere collapse.

In a more general scenario, when $|\det J_f(x)| = \mathrm{const} \neq 1$, we need to include the Jacobian determinant in the SVDD objective. Observe that the Jacobian determinant of the mapping $x \mapsto f(x) / |\det J_f|^{1/D}$, where $D$ is the data dimension, equals 1 in absolute value. Thus, to obtain equality of volumes in the input and output spaces, we redefine the SVDD loss (1) as follows:

$$R^2 + \frac{1}{\nu N} \sum_{i=1}^{N} \max\left(0, \left\|\frac{f(x_i)}{|\det J_f|^{1/D}} - c\right\|^2 - R^2\right). \qquad (2)$$
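Under the same illustrative assumptions as before, the redefined loss (2) only adds a volume-normalizing rescaling; the following is a sketch for a flow with constant Jacobian determinant.

```python
import torch

def flow_svdd_loss(f_x: torch.Tensor, log_abs_det: torch.Tensor,
                   c: torch.Tensor, R: torch.Tensor, nu: float) -> torch.Tensor:
    """FlowSVDD loss (2) for a flow f with constant Jacobian determinant.

    f_x:         (N, D) outputs of the flow.
    log_abs_det: scalar log|det J_f|, constant over the input space.
    """
    D = f_x.shape[1]
    # Rescale so that x -> f(x) / |det J_f|^(1/D) is volume-preserving.
    z = f_x * torch.exp(-log_abs_det / D)
    sq_dist = ((z - c) ** 2).sum(dim=1)
    return R ** 2 + torch.clamp(sq_dist - R ** 2, min=0).mean() / nu
```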

In the test phase, a given example $x$ is deemed an outlier if:

$$\left\|\frac{f(x)}{|\det J_f|^{1/D}} - c\right\|^2 > R^2,$$

which is equivalent to

$$\left\|\frac{f(x)}{|\det J_f|^{1/D}} - c\right\| > R.$$

In other words, inliers lie inside the ball $B(c, R)$.
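The decision rule then reduces to a single comparison; a sketch under the same assumptions as the loss above:

```python
import torch

def is_outlier(f_x: torch.Tensor, log_abs_det: torch.Tensor,
               c: torch.Tensor, R: float) -> torch.Tensor:
    """Flag examples whose rescaled flow outputs fall outside B(c, R)."""
    D = f_x.shape[1]
    z = f_x * torch.exp(-log_abs_det / D)
    return ((z - c) ** 2).sum(dim=1) > R ** 2  # True = anomaly
```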

3 Experiments

In this section, we experimentally examine FlowSVDD and compare it with several state-of-the-art approaches. FlowSVDD is implemented using the architecture of the NICE flow model (4 coupling layers, each consisting of 4 layers with 256 hidden dimensions), which has a constant Jacobian determinant.
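For reference, below is a minimal sketch of one NICE additive coupling layer; the split and the 4-layer, 256-unit network mirror the description above, though the authors' exact architecture may differ in details.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """NICE additive coupling: y1 = x1, y2 = x2 + m(x1).

    The layer is trivially invertible (x2 = y2 - m(y1)) and its
    Jacobian is triangular with unit diagonal, so det J = 1.
    """

    def __init__(self, dim: int, hidden: int = 256, depth: int = 4):
        super().__init__()
        self.half = dim // 2
        layers, width = [], self.half
        for _ in range(depth):
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers.append(nn.Linear(width, dim - self.half))
        self.m = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.half], x[:, self.half:]
        return torch.cat([x1, x2 + self.m(x1)], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y[:, :self.half], y[:, self.half:]
        return torch.cat([y1, y2 - self.m(y1)], dim=1)
```

Stacking several such layers (alternating which half is transformed) yields an invertible network; in NICE, a final diagonal scaling layer makes the Jacobian determinant a constant possibly different from 1.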

Illustrative example.

To get an intuition behind FlowSVDD, we first consider 2-dimensional examples, which are easy to visualize. The results presented in Figure 1 show the resulting hyperspheres in the latent space and the corresponding bounding regions in the input space. At first glance, we can observe that the bounding region in the original space closely follows the structure of the inliers. In the latent space, FlowSVDD finds a center $c$ and radius $R$ that enclose the required fraction of data inside the ball $B(c, R)$. Observe that, unlike density-based flow models, FlowSVDD does not transform data into a Gaussian distribution in the latent space.

Figure 1: Enclosing hyperspheres in the latent space of FlowSVDD (right) and the corresponding bounding regions in the input space (left).

Benchmark data for anomaly detection.

To provide a quantitative assessment, we take into account the Thyroid (http://odds.cs.stonybrook.edu/thyroid-disease-dataset/) and KDDCUP (http://kdd.ics.uci.edu/databases/kddcup99/kddcup.testdata.unlabeled_10_percent.gz) datasets, which are typically used for anomaly detection. We use the standard training and test splits and follow exactly the same evaluation protocol as in (Wang et al., 2019). In particular, we use the F1 score and the Area Under the Receiver Operating Characteristic curve (AUC).
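As an illustration of how these two metrics can be computed from anomaly scores, here is a scikit-learn sketch; thresholding at the known anomaly ratio is an assumption made for this example and may differ from the exact protocol of Wang et al. (2019).

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

def evaluate(scores: np.ndarray, labels: np.ndarray, anomaly_ratio: float):
    """scores: higher = more anomalous; labels: 1 = anomaly, 0 = nominal."""
    # Flag the top anomaly_ratio fraction of scores as anomalies.
    threshold = np.quantile(scores, 1.0 - anomaly_ratio)
    preds = (scores > threshold).astype(int)
    return f1_score(labels, preds), roc_auc_score(labels, scores)
```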

Our model is compared with the following algorithms: (1) One-class SVM (OC-SVM) (Schölkopf et al., 2001), (2) Deep structured energy-based models (DSEBM) (Zhai et al., 2016), (3) Deep autoencoding Gaussian mixture model (DAGMM) (Zong et al., 2018), (4) variants of MTQ – multivariate triangular quantile maps (NLL, TQM$_1$, TQM$_2$, TQM$_\infty$) (Wang et al., 2019), and (5) Deep Support Vector Data Description (DSVDD) (Ruff et al., 2018) – another implementation of the SVDD cost function in deep neural networks.

Thyroid
       OC-SVM   DSEBM   DAGMM     NLL   TQM_1   TQM_2   TQM_∞   DSVDD   FlowSVDD
F1      .3887   .0403   .4782   .7312   .5269   .5806   .7527       -      .7097
AUC         -       -       -       -       -       -       -    .749      .9797

KDDCUP
       OC-SVM   DSEBM   DAGMM     NLL   TQM_1   TQM_2   TQM_∞   DSVDD   FlowSVDD
F1      .7954   .7423   .9369   .9622   .9621   .9622   .9622       -      .9030
AUC         -       -       -       -       -       -       -       -      .9384

Table 1: Performance on two anomaly detection datasets.

The results presented in Table 1 show that the FlowSVDD model performs better than most methods on the Thyroid dataset and is significantly better than DSVDD in terms of the AUC metric. In the case of KDDCUP, FlowSVDD achieves a score between the classical methods and the current state-of-the-art.

Image datasets.

To provide further experimental verification, we use two image datasets: MNIST and Fashion-MNIST. In contrast to the previous comparison, these two datasets are usually used for multiclass classification and thus need to be adapted to the problem of anomaly detection. For this purpose, each of the ten classes is, in turn, treated as the nominal class, while the remaining nine classes are treated as anomalies, which results in 10 scenarios per dataset.
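For concreteness, here is a sketch of how one such scenario can be constructed with torchvision; the dataset path and preprocessing are illustrative assumptions.

```python
from torchvision.datasets import MNIST

def one_class_split(nominal_class: int, root: str = "data"):
    """Train on a single nominal class; test on all classes,
    labeling every other class as an anomaly (1 = anomaly)."""
    train = MNIST(root, train=True, download=True)
    test = MNIST(root, train=False, download=True)
    train_x = train.data[train.targets == nominal_class].float() / 255.0
    test_x = test.data.float() / 255.0
    test_y = (test.targets != nominal_class).long()
    return train_x, test_x, test_y
```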

We additionally compare FlowSVDD with the following models: (1) Geometric transformations (GT) (Golan and El-Yaniv, 2018), (2) Variational autoencoder (VAE) (Kingma and Welling, 2013), (3) Denoising autoencoder (DAE) (Vincent et al., 2008), (4) Generative probabilistic novelty detection (GPND) (Pidhorskyi et al., 2018), and (5) Latent space autoregression (LSA) (Abati et al., 2019). In contrast to the previous experiment, we use only the TQM and NLL variants of MTQ, because they achieve the highest AUC values (Wang et al., 2019).

Figure 2: Box plots for rankings calculated on MNIST (left) and Fashion-MNIST (right). The median ranking is marked by a line, while the average ranking is marked with a number.

To present the results, we compute the ranking of each method on each of the 10 scenarios and summarize the rankings using box plots, see Figure 2. The results show that FlowSVDD significantly outperforms DSVDD on both datasets. In the case of the MNIST dataset, FlowSVDD is almost as good as the current state-of-the-art methods, such as GT and NLL.

Finally, we analyze which examples lie closest to and furthest from the center of the bounding hypersphere. The results in Figure 3 show that FlowSVDD maps regular images near the hypersphere center, whereas examples located far from the center are hard to identify. This means that FlowSVDD gives results consistent with our intuition.

Figure 3: Best nominal (left) and worst nominal (right) examples determined by FlowSVDD for MNIST (top) and Fashion-MNIST (bottom).

4 Conclusion

This paper introduced FlowSVDD, which realizes the SVDD paradigm with neural networks. By making use of flow-based models and an appropriate SVDD-like cost function, we find a minimal bounding region for the majority of the data. Unlike other deep SVDD realizations, FlowSVDD keeps the Jacobian determinant of the network constant, which means that the resulting hypersphere cannot collapse in the latent space. The experimental results demonstrate that FlowSVDD performs very well in both artificial and real-world one-class settings.

Acknowledgements

The work of M. Śmieja was supported by the National Centre of Science (Poland) Grant No. 2018/31/B/ST6/00993. The work of P. Spurek was supported by the National Centre of Science (Poland) Grant No. 2019/33/B/ST6/00894. The work of J. Tabor was supported by the National Centre of Science (Poland) Grant No. 2017/25/B/ST6/01271.

References

  • D. Abati, A. Porrello, S. Calderara, and R. Cucchiara (2019) Latent space autoregression for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 481–490.
  • P. Chong, L. Ruff, M. Kloft, and A. Binder (2020) Simple and effective prevention of mode collapse in deep one-class classification. arXiv preprint arXiv:2001.08873.
  • L. Dinh, D. Krueger, and Y. Bengio (2014) NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516.
  • I. Golan and R. El-Yaniv (2018) Deep anomaly detection using geometric transformations. In Advances in Neural Information Processing Systems, pp. 9758–9769.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
  • D. Miljković (2010) Review of novelty detection methods. In The 33rd International Convention MIPRO, pp. 593–598.
  • S. Pidhorskyi, R. Almohsen, and G. Doretto (2018) Generative probabilistic novelty detection with adversarial autoencoders. In Advances in Neural Information Processing Systems, pp. 6822–6833.
  • L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft (2018) Deep one-class classification. In International Conference on Machine Learning, pp. 4393–4402.
  • B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson (2001) Estimating the support of a high-dimensional distribution. Neural Computation 13 (7), pp. 1443–1471.
  • D. M. Tax and R. P. Duin (2004) Support vector data description. Machine Learning 54 (1), pp. 45–66.
  • P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103.
  • J. Wang, S. Sun, and Y. Yu (2019) Multivariate triangular quantile maps for novelty detection. In Advances in Neural Information Processing Systems, pp. 5061–5072.
  • S. Zhai, Y. Cheng, W. Lu, and Z. Zhang (2016) Deep structured energy based models for anomaly detection. arXiv preprint arXiv:1605.07717.
  • B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen (2018) Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations.