Boundary of Distribution Support Generator (BDSG): Sample Generation on the Boundary

07/21/2021 · by Nikolaos Dionelis, et al.

Generative models, such as Generative Adversarial Networks (GANs), have been used for unsupervised anomaly detection. While performance keeps improving, several limitations remain, particularly difficulties in capturing multimodal supports and in approximating the underlying distribution close to the tails, i.e. the boundary of the distribution's support. This paper proposes an approach that attempts to alleviate such shortcomings. We propose an invertible-residual-network-based model, the Boundary of Distribution Support Generator (BDSG). GANs generally do not guarantee the existence of a probability density; here, we use the recently developed Invertible Residual Network (IResNet) and Residual Flow (ResFlow) for density estimation. These models have not yet been used for anomaly detection. We leverage IResNet and ResFlow for Out-of-Distribution (OoD) sample detection and for sample generation on the boundary using a compound loss function that forces the samples to lie on the boundary. The BDSG addresses non-convex support, disjoint components, and multimodal distributions. Results on synthetic data and data from multimodal distributions, such as MNIST and CIFAR-10, demonstrate competitive performance compared to methods from the literature.




1 Introduction

Anomaly detection is the identification of samples different from typical data [1, 2]. When anomalies are not known in advance, unsupervised learning with generative models is used. The aim is to learn a model of "normality", with anomalies being detected as deviations from this model [3, 4]. Important goals are reducing misdetections and false alarms, estimating the support of the "normal" data distribution, detecting anomalies close to the support boundary, generating within-distribution and Out-of-Distribution (OoD) data, and providing decision boundaries for inference of within-distribution and OoD samples.

Existing approaches to anomaly detection use probability-based, reconstruction-based [5, 6], and domain-based models. GANs are trained to generate samples and fit the "normal" data distribution [7, 8]. During inference, an anomaly score of a queried test sample, x, is computed by evaluating the probability of obtaining x with the generator [9]. Such models belong to the probability-based methods (e.g. AnoGAN) [10, 11]. However, these models do not directly address the major problems of multimodal support and the ability to generate on the tails/boundaries. Recent approaches have tried to improve performance and alleviate these shortcomings (e.g. MinLGAN and FenceGAN) [12, 13]. At present, generative models based on invertible residual networks, such as [14, 15], are lacking for unsupervised anomaly detection [16, 17]. Anomaly detection techniques show discernible limitations in detecting anomalies near the support boundary of multimodal distributions [18, 19].

This work aims at addressing these limitations. Our aim is to detect abnormalities and generate samples on the boundary of the underlying multimodal distribution of the "normal" data. We train invertible models [14] to estimate the density of typical samples and propose a loss function for the boundary generator. We pay particular attention to anomalies close to the boundary of the data distribution and to anomalies near high-probability normal samples. We focus on the ability to model multimodal distributions with non-convex support and disjoint components. Our model is the Boundary of Distribution Support Generator (BDSG). It achieves competitive performance on synthetic and typically used benchmark data. In summary, our contributions are: (a) training invertible generative models and evaluating the use of inference for anomaly detection, and (b) sample generation on the tails.

2 Related Work: Boundary Generation

The GAN discriminator estimates the distance between the target and model distributions, while the generator learns the mapping from the latent space, z, to the data space, x. The GAN optimization is min_G max_D V(D, G), where V acts as a distance between the data and model distributions, e.g. the Jensen-Shannon divergence. The GAN loss is

V(D, G) = E_{x ~ p_x}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))],   (1)

where p_x is the data distribution, p_z is the latent prior, and G(z) is a generated sample. To perform anomaly detection, we need to change (1) and create a discriminator that can distinguish normal from abnormal. Yet, this implies having learned all underlying modes and having covered the full support of the distribution from limited data. Unfortunately, GANs tend to learn the mass of the underlying multimodal distribution well, focus less on the low-probability regions, i.e. the tails, and have discernible problems with mode collapse [20, 19].
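As a concrete illustration of the value function in this minimax game, a Monte-Carlo estimate from discriminator outputs can be sketched as follows (toy values, not from the paper):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN value function
    E_x[log D(x)] + E_z[log(1 - D(G(z)))], computed from discriminator
    outputs on real samples (d_real) and on generated samples (d_fake)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A near-perfect discriminator (D ~ 1 on real, ~ 0 on fake) attains a higher
# value than a confused one (D = 0.5 everywhere), as the maximizing player seeks.
near_perfect = gan_value(np.full(4, 0.99), np.full(4, 0.01))
confused = gan_value(np.full(4, 0.5), np.full(4, 0.5))
print(bool(near_perfect > confused))  # True
```

The generator then plays the minimizing side of the same value function.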

MinLGAN uses minimum likelihood regularization to generate data on the tail of the normal data distribution [12]. FenceGAN performs both sample generation on the boundary and anomaly detection, using the generator and discriminator, respectively [13]. The generator loss is reinforced with bespoke losses to help model the boundary, and the output of the discriminator is used as an anomaly threshold. However, FenceGAN does not succeed in forming multimodal supports or in detecting anomalies near discontinuous boundaries.

Figure 1: Flowchart of the BDSG, which learns the mapping B: z → x from the latent space to the boundary of the data distribution.

3 The Proposed BDSG Model

We propose the BDSG to detect strong anomalies, which are near the boundary of the normal data distribution. The BDSG flowchart is shown in Fig. 1. The premise of our approach is to use two generators: G models data from the distribution, and B models data that lie close to the support boundary of the distribution. Specifically, we first train an invertible generator, G, in the form of IResNet [14] and ResFlow [22]. The latent variable z follows a standard Gaussian distribution, z ~ N(0, I), and the mapping from the latent space, z, to the data space, x, is given by x = G(z). The inverse is given by z = G^{-1}(x). The second step is to train a generator, B, to perform sample generation on the support boundary of the data distribution, learning the mapping B: z → x.
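Since G is invertible, the density of any point x can be evaluated by the change-of-variables formula. A minimal sketch, using a simple affine bijection as a stand-in for IResNet/ResFlow (the parameters a and b are illustrative, not from the paper):

```python
import numpy as np

# Affine bijection x = G(z) = a*z + b standing in for the invertible model.
a, b = 2.0, 1.0

def g(z):          # latent -> data
    return a * z + b

def g_inv(x):      # data -> latent
    return (x - b) / a

def log_p_g(x):
    """Change of variables: log p_G(x) = log p_z(G^{-1}(x)) + log|dG^{-1}/dx|,
    with the latent prior z ~ N(0, 1)."""
    z = g_inv(x)
    log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))
    log_det = -np.log(abs(a))   # derivative of g_inv is 1/a
    return log_pz + log_det

# The density is highest at the image of the latent mode, x = g(0).
print(bool(log_p_g(g(0.0)) > log_p_g(g(3.0))))  # True
```

IResNet and ResFlow realize the same formula with deep residual bijections, where the log-determinant term is estimated rather than computed in closed form.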

We now formulate the BDSG loss function. The first term, L_1, guides B to find the boundary, while the second term, L_2, penalizes deviations from the "normal class" using the distance from a point to a set. The third term, L_3, controls the scattering of the samples in the x space: it promotes dispersion and diversity and is based on the ratio of distances in the z and x spaces. With L_3, the BDSG addresses the mode collapse problem. The loss function for B is

min_B L,   (3)
where the loss, L, is given by

L = L_1 + λ L_2 + μ L_3,   (4)

where λ and μ are hyper-parameters of the BDSG. In (3) and (4), the first term, L_1, is given by

L_1 = (1/N) Σ_{i=1}^{N} p_G(B(z_i))^2,   (5)

where the density p_G and the inverse mapping G^{-1} are estimated by an invertible model. The parameters of B are obtained by running Gradient Descent on L, which can decrease to zero and is written in terms of the sample size, N, and the batch size, m. In the loss in (4), the effective dimensionality of the boundary set B(z) is lower than that of the data space x.
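The structure of this compound loss can be sketched as follows. This is an illustrative reading, not the paper's exact equations: the boundary term drives the estimated density of generated samples toward zero, the point-to-set term keeps them near the normal data, and the dispersion term penalizes collapse through a latent-to-data distance ratio for one pair of samples:

```python
import numpy as np

def compound_boundary_loss(p_g_bz, bz, normal_set, z, lam=1.0, mu=1.0):
    """Illustrative compound loss in the spirit of (4).
    p_g_bz: estimated densities p_G(B(z_i)) of the generated samples,
    bz: generated samples B(z_i), normal_set: normal training data,
    z: the latent codes, lam/mu: the hyper-parameters lambda and mu."""
    l1 = np.mean(p_g_bz ** 2)                       # push density to zero (boundary)
    dists = np.linalg.norm(bz[:, None] - normal_set[None], axis=-1)
    l2 = np.mean(dists.min(axis=1))                 # point-to-set distance
    dz = np.linalg.norm(z[0] - z[1])                # one latent pair ...
    dx = np.linalg.norm(bz[0] - bz[1]) + 1e-8       # ... and its images
    l3 = dz / dx                                    # dispersion / anti-collapse
    return l1 + lam * l2 + mu * l3

rng = np.random.default_rng(0)
loss = compound_boundary_loss(
    rng.uniform(0.0, 0.1, size=4),      # toy densities
    rng.standard_normal((4, 2)),        # toy boundary samples
    rng.standard_normal((16, 2)),       # toy normal set
    rng.standard_normal((4, 3)),        # toy latent codes
)
print(bool(np.isfinite(loss) and loss > 0))  # True
```

Minimizing l1 alone would let the samples drift arbitrarily far from the data; l2 anchors them to the support, so their equilibrium is the boundary.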

3.1 BDSG Benefits in Sampling Complexity, Anomaly Detection, and Generation of Strong Anomalies

The Sampling Complexity Problem: To perform anomaly detection, FenceGAN estimates the boundary of the data distribution from samples. This is difficult due to the rarity problem, since a very large number of points is needed on the tail of the distribution; sampling from a distribution could fail to yield even a single point in low-probability regions [23, 24]. Moreover, the FenceGAN loss does not succeed in generating a discrete boundary around the modes of multimodal distributions separately, because it is based on the parallel, simultaneous estimation of the density and of sample generation on the boundary. In contrast, the proposed BDSG obviates the rarity problem, achieving better sampling complexity.
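The rarity argument can be made concrete: if the tail region carries probability mass p, then n i.i.d. draws all miss it with probability (1 - p)^n, so on the order of 1/p samples are needed to expect even one tail point:

```python
# Rarity problem sketch: probability that n i.i.d. samples contain NO point
# from a tail region of probability mass p.
def prob_no_tail_point(p, n):
    return (1.0 - p) ** n

# With p = 0.001, even n = 1/p = 1000 samples miss the tail ~37% of the time.
print(round(prob_no_tail_point(1e-3, 1000), 3))   # ~0.368, close to 1/e
print(prob_no_tail_point(1e-3, 100) > 0.9)        # True: 100 samples rarely hit it
```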

Anomaly Detection: During inference, a test sample, x, is anomalous if its estimated density p_G(x) is approximately zero, and normal otherwise. In practice, a threshold, δ, is used instead of zero, i.e. x is flagged as anomalous when p_G(x) < δ. The first term of the loss in (4) discriminates between normal and abnormal data.

Generating Strong Anomalies: The BDSG can generate samples lying on the tail of the data distribution, i.e. strong anomalies. First, the boundary generator, B, generates samples. Then, the probability of each of these boundary samples, p_G(B(z)), is computed using the invertible model in (4), and if p_G(B(z)) < δ, then B(z) is a strong anomalous sample.
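A minimal sketch of this two-step procedure, with hypothetical log-density values standing in for the invertible model's output:

```python
import numpy as np

def is_anomalous(log_p, log_delta):
    """Inference rule: flag a sample as anomalous when its estimated
    (log-)density under the invertible model falls below the threshold."""
    return log_p < log_delta

# Hypothetical log-densities for four boundary samples B(z); the threshold
# log_delta is illustrative, not a value from the paper.
log_p_boundary = np.array([-12.0, -3.5, -15.2, -4.1])
log_delta = -10.0
strong_anomalies = is_anomalous(log_p_boundary, log_delta)
print(strong_anomalies.tolist())  # [True, False, True, False]
```

Working in log space avoids underflow when densities in high-dimensional data spaces are extremely small.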

4 Evaluation of the BDSG

We evaluate BDSG on synthetic and image data considering several criteria that measure its ability to approximate the boundary and detect anomalies. We evaluate the BDSG for anomaly detection using the Area Under the Receiver Operating Characteristics Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC). Using the leave-one-out methodology, we compare the BDSG with the state-of-the-art models of GANomaly, AnoGAN, MinLGAN, and FenceGAN on MNIST, CIFAR-10, and other datasets for OoD.

Figure 2: (a, b) CFS BDSG for unimodal and multimodal distributions. (c, d) IResNet BDSG. The red points are data samples, the green points IResNet samples, and the blue points BDSG samples.

Setup: Synthetic data: We test the BDSG using two experimental setups based on the multivariate Gaussian distribution, where we know the closed form of the underlying probability density function. The first setup uses a closed-form solution (CFS) evaluation of the model distribution p_G, in lieu of a learned density estimate. The second setup uses p_G estimated by IResNet [14].

Benchmark data: We also evaluate the BDSG on MNIST by first training an invertible generator, ResFlow, for density estimation. We then train the BDSG using a convolutional neural network (CNN), applying (4). Then, we evaluate the performance of the BDSG on CIFAR-10. Further, we evaluate the performance of the BDSG trained on MNIST and CIFAR-10 and tested on OoD data, using the algorithm convergence criteria of the proposed loss and its second term, L_2 [21].

Models: We use a fully-connected model for the synthetic data, and a CNN with batch normalization for images.

4.1 B(z) Model Architecture for Synthetic Data

CFS BDSG Model: Based on sensitivity analyses, we use dense fully-connected layers for B. The sample size, N, affects the BDSG performance, while the batch size, m, affects the convergence speed and can lead to a thinner boundary. Figure 2(a) shows the boundary formed using the CFS BDSG for a unimodal distribution: the red points are from the normal data distribution, and the blue points are on the estimated boundary. A 2-8-8-2 model for B achieves a low loss value and converges the samples to the boundary. For the bimodal distribution in Fig. 2(b), a 2-8-8-8-2 network leads to low loss values and accurate boundary formation. The average probability of the points on the boundary is low, as required by the boundary term of the loss. We obtain descending loss values, successfully converging to the boundary.
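For concreteness, a forward pass of the 2-8-8-2 fully-connected generator can be sketched as follows (random placeholder weights and an assumed ReLU activation; the paper trains B with gradient descent on the compound loss):

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-8-8-2 fully-connected boundary generator B for the 2D synthetic data.
sizes = [2, 8, 8, 2]
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def b_of_z(z):
    """Map a batch of latent codes z (shape (batch, 2)) to the data space."""
    h = z
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)   # ReLU hidden layers (assumed)
    return h @ weights[-1] + biases[-1]  # linear output in the data space

z = rng.standard_normal((5, 2))          # latent batch, z ~ N(0, I)
print(b_of_z(z).shape)  # (5, 2)
```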

IResNet-Based BDSG: To show that the BDSG yields competitive performance on synthetic data from multimodal distributions, we also perform a second experiment. We train our chosen invertible model, IResNet, and use the estimated density to create the boundary. If p_G is estimated correctly, then the BDSG estimates the boundary of the data distribution. In Fig. 2(c), we use a 2-8-8-8-2 network for B for the unimodal distribution.

For the bimodal distribution in Fig. 2(d), we use a deeper architecture for B. An ablation study found that L_3 in (4) is necessary, as mode collapse is encountered otherwise. In Fig. 2(d), for evaluation, we also use a boundary clustering algorithm that assigns each generated boundary sample to its nearest mode, where two clusters are formed from the bimodal distribution, N_k samples are drawn from each mode, and x_{k,i} is the i-th sample of mode k. Here, the distance from each boundary sample to its assigned mode is negligible, smaller than the distance from one mode/set to the other set.
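A nearest-mode assignment of this kind can be sketched as follows (an assumed reading of the clustering step, with toy modes):

```python
import numpy as np

def assign_boundary_samples(boundary_pts, modes):
    """Assign each generated boundary point to its nearest mode by the minimum
    point-to-set distance; 'modes' is a list of sample arrays, one per mode."""
    labels = []
    for p in boundary_pts:
        dists = [np.min(np.linalg.norm(m - p, axis=1)) for m in modes]
        labels.append(int(np.argmin(dists)))
    return labels

mode0 = np.zeros((10, 2))            # toy mode at the origin
mode1 = np.full((10, 2), 5.0)        # toy mode at (5, 5)
pts = np.array([[0.5, 0.0], [4.8, 5.1]])
print(assign_boundary_samples(pts, [mode0, mode1]))  # [0, 1]
```

With well-separated modes, every boundary sample sits much closer to its own mode than to the other, which is the evaluation criterion described above.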

Figures 2(b) and (d) show that the BDSG achieves successful boundary formation and stable convergence without mode collapse. When the BDSG is compared to FenceGAN, FenceGAN yields incomplete boundary formation between the modes.

4.2 Binary Classification and Boundary Precision

We create a grid of equidistant points in the 2D space and associate each grid point with a probability using the distribution in Fig. 2(d). Using a threshold, δ, to detect anomalies, we evaluate the inference performance of the loss in (4) by computing binary classification metrics. To examine the influence of the choice of δ, we compute precision, recall, F1 score, and accuracy; these scores remain high across a range of δ values. To examine how accurately we estimate the boundary, and to compare with IResNet, we define two Boundary Precision (BP) scores. By analogy with precision, BP1 is the percentage of B(z)-points that satisfy the boundary criterion. BP2 is defined analogously on the intersection of the grid points with the IResNet boundary. BP1 is consistently higher than BP2.
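The classification metrics used here can be sketched as follows; the BP1 criterion shown is a hypothetical stand-in, since the exact boundary condition is not reproduced here:

```python
import numpy as np

def boundary_precision(p_boundary, delta, eps):
    """Hypothetical BP1-style score: the fraction of generated boundary points
    whose estimated probability lies within eps of the threshold delta."""
    return float(np.mean(np.abs(p_boundary - delta) < eps))

def precision_recall(pred, truth):
    """Precision and recall for thresholded anomaly decisions."""
    tp = int(np.sum(pred & truth))
    precision = tp / max(int(np.sum(pred)), 1)
    recall = tp / max(int(np.sum(truth)), 1)
    return precision, recall

pred = np.array([True, True, False, False])    # p_G(x) < delta decisions
truth = np.array([True, False, True, False])   # ground-truth anomaly labels
print(precision_recall(pred, truth))           # (0.5, 0.5)
```

F1 and accuracy follow directly from the same counts, so sweeping delta over a range and recomputing these scores reproduces the sensitivity analysis described above.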

Figure 3: AUROC and AUPRC evaluation on MNIST data.

4.3 Evaluation of the BDSG on Image Data

MNIST. Setup: We train ResFlow until convergence on MNIST using the leave-one-out evaluation, where the anomaly class is the left-out digit and the normal class is the remaining digits. We then train the BDSG using a CNN with batch normalization, using (4). We also examine different models, such as feed-forward and residual networks. For N, we use the entire training set, and we also examine different values for m in (4). After convergence, the loss and its terms attain low values. The converged L_2, which is the distance from a point to a set, is smaller than the minimum set distance between every pair of MNIST digit classes. For evaluation, we compare the proposed BDSG with state-of-the-art models using AUROC and AUPRC, as they are commonly used evaluation criteria in the literature [5].

Findings: Figure 3 shows that the BDSG achieves competitive performance compared to the alternative techniques: on average and for most digits, the BDSG outperforms EGBAD, AnoGAN, and VAE in AUROC, and GANomaly, EGBAD, AnoGAN, VAE, FenceGAN, and WGAN in AUPRC.

Going beyond the leave-one-out setting, we assess how the BDSG performs when other OoD data are used as anomaly samples, considering MNIST as normal and Fashion-MNIST and KMNIST as OoD abnormal [1]. We report results in Table 1 using algorithm convergence criteria, namely the proposed loss L and its first term L_1. The loss and L_1 are lower for the normal class, digits 1 to 9, than for the anomaly class, digit 0, and for the abnormal OoD data, indicating that the proposed loss and its first term can be used for anomaly detection with an appropriately chosen threshold.

CIFAR-10. Setup: We train ResFlow and IResNet for density estimation on CIFAR-10 [15]. Next, we train BDSG using a CNN with batch normalization and applying (4).

Table 1: Evaluation of the BDSG comparing normality (MNIST Digits 1-9 and CIFAR-10) with the anomaly cases (MNIST Digit 0, Fashion-MNIST, KMNIST and CIFAR-100, SVHN, STL-10), using the loss L and L_2 in (4).
Figure 4: AUROC evaluation of BDSG on CIFAR-10 data.

Findings: Figure 4 presents the AUROC for each CIFAR-10 class. In a leave-one-out evaluation, the BDSG outperforms EGBAD and AnoGAN on average, demonstrating competitive performance in AUROC compared to EGBAD, AnoGAN, and VAE. Table 1 presents the performance of the BDSG at detecting abnormal OoD data from CIFAR-100, SVHN, and STL-10, using the algorithm criteria of the loss L and L_2. Both L and L_2 in (4) are high for the anomaly cases deviating from normality, indicating that an anomaly detection threshold can be imposed on either the proposed cost or its second term, i.e. on L or on L_2.

5 Conclusion

For anomaly detection, the accurate determination of the support boundary is critical. In this paper, we presented the BDSG, which uses the loss in (4) and leverages reversibility to compute the probability at any point in the data space x. It addresses the rarity problem and the detection of strong anomalies, and maps from z to x while concentrating the images of z on the boundary. Using invertible models improves the anomaly detection methodology by allowing us to devise a generator that creates boundary samples. The BDSG performs sample generation on the boundary, addresses mode collapse, and achieves competitive performance on synthetic data from multimodal distributions and on MNIST and CIFAR-10.

6 Acknowledgment

This work was supported by the UK EPSRC Grant Number EP/S000631/1 and the UK MOD UDRC in Signal Processing.