Mirrored Autoencoders with Simplex Interpolation for Unsupervised Anomaly Detection

by   Y. Wu, et al.
University of Maryland

Deep generative models have shown great promise for unsupervised anomaly detection, partly owing to their ability to learn representations of complex input data distributions. Current methods, however, lack a strong latent representation of the data, which results in sub-optimal anomaly detection. In this work, we propose a novel representation learning technique using deep autoencoders to tackle the problem of unsupervised anomaly detection. Our approach replaces the L_p reconstruction loss in the autoencoder optimization objective with a novel adversarial loss to enforce semantic-level reconstruction. In addition, we propose a novel simplex interpolation loss to improve the structure of the latent space representation in the autoencoder. Our technique improves the state-of-the-art unsupervised anomaly detection performance by a large margin on several image datasets including MNIST, Fashion-MNIST, CIFAR-10 and Coil-100, as well as on several non-image datasets including KDD99, Arrhythmia and Thyroid. For example, on the CIFAR-10 dataset, using a standard leave-one-out evaluation protocol, our method achieves a substantial performance gain of 0.23 AUC points compared to the state-of-the-art.






1 Introduction

Data distributions encountered in different applications are typically noisy and may contain out-of-distribution samples, also called outliers or anomalies [7]. Detecting these anomalous patterns is crucial in many applications. In medical imaging, detecting anomalous patterns in X-ray and MRI scans could help doctors diagnose patients more effectively [19]. Detecting anomalous behaviour in credit card usage patterns helps banks identify fraudulent users [18]. Detecting anomalous objects such as guns in baggage scans can identify hazardous materials in airport screening systems [1]. Anomaly detection systems are also used as a pre-processing step in many machine learning pipelines. For instance, a classifier trained to distinguish cats from dogs could assign high probability scores to car samples, since it is forced to assign every input to one of the two classes. An anomaly detection step can weed out such anomalous samples before they are passed to the recognition system.

Anomaly detection is a long-standing problem in machine learning and computer vision [7, 33]. The problem is typically addressed in a supervised, semi-supervised or unsupervised framework. In supervised and semi-supervised anomaly detection, access to a few (or many) labeled anomalous samples is assumed, and the task is typically solved as a supervised learning problem. Unsupervised anomaly detection, on the other hand, is much harder because anomalous samples are not available at training time. Instead, we are given an input dataset, with the goal of detecting any out-of-distribution sample that does not belong to the distribution of that dataset.

The unsupervised setting is important to address for the following reasons. In many applications, such as detecting fraudulent credit card users, the number of labeled anomalous samples can be much smaller than the number of normal samples, and supervised classification in this case typically leads to over-fitting. In other applications, such as medical imaging, annotating data is expensive. Even when a few types of anomalous samples are labeled (e.g., broken arms in X-ray images), supervised models may achieve high performance in detecting the same type of anomaly, but they fail to generalize when a new type of anomaly is presented at test time. By addressing anomaly detection in the unsupervised regime, we focus on detecting any class of out-of-distribution samples. Additionally, we do not rely on any labeled information, so data scarcity is no longer an issue.

Figure 1: Anomaly detection pipeline. Given two samples $x_1$ and $x_2$, we first compute their latent codes using a Mirrored Autoencoder (Fig. 3 (b)) trained with simplex interpolation (Fig. 3 (a)). These latent codes are decoded back to the image space to obtain the reconstructions $\hat{x}_1$ and $\hat{x}_2$. A discriminator network (used in mirrored autoencoder training) then extracts the feature representations of (input, reconstruction) pairs. These features are used to compute anomaly scores. Using simplex interpolation and mirrored autoencoders during training gives latent representations that are well suited for anomaly detection.

Deep generative models have recently received much attention for unsupervised anomaly detection. The core idea is to learn the input distribution using a generative model such as a GAN or an autoencoder, and to flag a sample as anomalous if it lies far away from the generative manifold. Since estimating the distance of a sample from a generative manifold is a hard problem, proxy measures are typically used as anomaly scores. In [23], a linear combination of the distance between images and the distance between discriminator feature representations is used as the anomaly score. In [1] and [2], an encoder network is learnt with an auto-encoding objective, and the distance between the encoded feature representations of the input and the reconstructed image is used as the anomaly score. In [4], the distance of a test sample from a GAN manifold, together with the latent likelihood and an entropy term, is proven to be an estimator of the sample likelihood, which is a natural candidate for an anomaly score.

In this work, we propose a novel approach to unsupervised anomaly detection by learning a powerful autoencoder model that contains two novel components. First, we introduce the Mirrored Adversarial Autoencoder, a variant of the autoencoder in which we replace the $L_p$ reconstruction loss with an adversarial loss on the joint distribution of the input and its mirrored reconstruction. As shown in Fig. 3, the autoencoder model employs a discriminator network that discriminates between (input, input) and (input, reconstruction) pairs, while the encoder-generator pair is trained using an adversarial loss derived from the discriminator network. Second, we extend the interpolation idea proposed in [6] and introduce a novel interpolation scheme for autoencoders, called Simplex Interpolation, in which we make the reconstructions corresponding to simplex interpolations of real latent samples look realistic. This is realized using an adversarial loss, where a discriminator is trained to predict simplex coefficients given the reconstructed images, and the autoencoder is trained to fool the discriminator (see Fig. 3). The proposed interpolation scheme yields a better-clustered latent representation.

The resulting autoencoder performs extremely well on the unsupervised anomaly detection task. On the CIFAR-10 dataset, using a leave-one-out evaluation protocol, the best-performing prior approach obtains an average AUC score of about 0.73 (see Table 1). Our approach, on the other hand, achieves a substantial performance gain of 0.23 AUC points, setting a new state-of-the-art for the problem. In particular, even for harder classes like Bird, on which prior approaches consistently under-perform, our approach achieves a large performance gain. Our approach is versatile and can be applied to non-image datasets as well. We achieve state-of-the-art performance on three non-image datasets, KDD99, Thyroid and Arrhythmia, with an improvement of over 0.2 F1 points on the Thyroid dataset (Table 2).

In summary, our key contributions are as follows:

  • We propose a novel autoencoder model, called the Mirrored Adversarial Autoencoder, in which we replace the $L_p$ reconstruction loss with an adversarial loss involving the joint distribution of the original image and the reconstructed one.

  • We propose a novel interpolation scheme, called simplex adversarial interpolation, to obtain a well-clustered and semantically meaningful latent representation in an autoencoder.

  • The two schemes are used in the unsupervised anomaly detection problem, where we achieve state-of-the-art results on the CIFAR-10, KDD99, Thyroid and Arrhythmia datasets.

2 Related Work

Figure 2: t-SNE visualization of the latent space using different interpolation techniques. Blue points denote anomalous samples, while red points denote normal samples. "No interpolation" denotes a mirrored autoencoder model trained with no interpolation. The top panel shows t-SNE with the truck class of CIFAR-10 used as the anomaly class, while the bottom panel shows t-SNE with ship as the anomaly class. All models are trained on the other CIFAR-10 categories, leaving out the anomaly class. We find that simplex interpolation results in good separation of normal and anomalous samples in latent space (more results are shown in the supplementary material).

Traditional methods for anomaly detection have been surveyed in detail in [7][17][33]. Techniques for unsupervised anomaly detection include one-class SVMs [24], which find the classification boundary of the normal data, and clustering methods [25], which enforce similarity between members of the same cluster. Eskin et al. [8] project data points into a feature space and find anomalous points in its sparse regions. However, these methods are suited to low-dimensional data distributions and perform poorly in high-dimensional settings.

Recently, there has been much interest in using deep generative models for unsupervised anomaly detection. Approaches are based on GANs, autoencoders or variational autoencoders. Zhou et al. [32] build a robust denoising autoencoder and detect anomalous samples using the reconstruction error. Zong et al. [34] and Zhai et al. [31] directly learn a generative model of the normal data distribution using a mixture of Gaussians.

One of the first works to use a GAN model for anomaly detection is [23]. A GAN model is trained on normal samples, and a technique for inverting images to the latent space is proposed. At test time, both normal and abnormal samples are mapped into the latent space and the generator reconstructs them. The anomaly score is calculated using a norm of the difference between the samples and their reconstructions. [1] [2] [5] train a GAN model simultaneously with an encoder network that maps images back into the latent space. Zenati et al. [30] propose the ALAD model, which is a BiGAN network for anomaly detection. FGAN [14] trains a GAN model to generate images along the boundary of the normal distribution, and directly uses the discriminator score as the anomaly threshold.

3 Interpolation

Interpolation is a way to enhance the structure of the latent space in an autoencoder. By forcing intermediate points along the interpolation to be indistinguishable from real data distribution, Berthelot et al. [6] find that the representation in latent space gets enhanced, leading to improved performance on downstream tasks such as supervised learning and clustering.

First, let us understand why interpolation can improve anomaly detection. Consider Figure 2(a), the t-SNE visualization of normal and anomalous latents of a vanilla autoencoder. Even though the autoencoder is trained only on normal samples, we find that the latents of anomalous samples are mixed up with those of normal samples. This results in poor anomaly detection performance. Ideally, we would like to have a loss function that separates the manifolds of normal and anomalous samples. However, the absence of anomalous samples in the training phase prohibits using such a loss term. Instead, we can perform space filling, where we force the space between normal latents to be occupied by in-distribution samples. This produces a tight clustering of the normal distribution, and anomalous samples will inevitably fall outside this cluster. Simplex interpolation is an exact realization of this space filling. By forcing reconstructions from the convex hulls of normal latents to look realistic, we fill the space between the latent codes of normal samples.

3.1 Background: Berthelot et al.’s Interpolation

Berthelot et al. [6] investigate the use of an adversarial loss to enforce semantic consistency in image space using interpolation in latent space. First, latents corresponding to pairs of input images are generated, and a convex combination of these latents is formed and decoded. A critic network takes this decoded image as input and attempts to recover the coefficient of the convex combination. The autoencoder is then trained so that the critic fails, i.e., so that it assigns a coefficient of 0.

Let us denote the encoder and decoder networks as $E$ and $G$ respectively. For two data points $x_1$ and $x_2$, $z_1 = E(x_1)$ and $z_2 = E(x_2)$ are their latent representations. The linear interpolation of these two points can then be represented as $z_\alpha = \alpha z_1 + (1 - \alpha) z_2$, where $\alpha$ is constrained to be in the range $[0, 0.5]$. $z_\alpha$ is first decoded as $\hat{x}_\alpha = G(z_\alpha)$, which is then passed to the critic network. The critic is trained to distinguish real samples from interpolated ones by predicting 0 for non-interpolated inputs and $\alpha$ for interpolated samples; this regression objective is the critic loss, Eq. (1).

Meanwhile, the autoencoder is trained to fool the critic into outputting 0 for interpolations.
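The two-player setup described above can be illustrated with a minimal sketch. It assumes a squared-error regression loss for the critic, as in Berthelot et al.'s formulation; the function names are hypothetical, latent codes are plain Python lists, and the decoder and critic networks are not modelled (their scalar outputs are passed in directly).

```python
def interpolate(z1, z2, alpha):
    """Convex combination alpha * z1 + (1 - alpha) * z2 of two latent vectors."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(z1, z2)]


def critic_loss(pred_on_interpolated, alpha, pred_on_real):
    """Assumed squared-error critic loss.

    The critic's target is alpha for a decoded interpolant and 0 for a
    non-interpolated input.
    """
    return (pred_on_interpolated - alpha) ** 2 + pred_on_real ** 2


def autoencoder_interpolation_loss(pred_on_interpolated):
    """The autoencoder is rewarded when the critic outputs 0 on interpolants."""
    return pred_on_interpolated ** 2


# Toy usage: mix two 2-d latents with alpha = 0.25.
z_alpha = interpolate([1.0, 0.0], [0.0, 1.0], 0.25)
```

A critic that predicts the mixing coefficient exactly and outputs 0 on real inputs has zero loss, while the autoencoder's term vanishes only when the critic is fooled into predicting 0 on the interpolant.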

Figure 3: Frameworks. On the left panel, we show $n$-point simplex interpolation. The discriminator predicts how far the decoded interpolated image is from the original image, while the autoencoder is trained to fool the discriminator. On the right panel, we show the Mirrored Adversarial Autoencoder (MAE) model. MAE is trained to minimize the Wasserstein distance between the joint distributions of (input, input) and (input, reconstruction) samples.

3.2 Our proposal: n-point Simplex Interpolation

In this section, we introduce our simplex interpolation scheme. Our method makes two modifications to the interpolation method of Berthelot et al. [6]. First, we train the discriminator on the joint distribution of the training images and the decoded interpolated images. This forces the autoencoder to generate semantically similar images from points that are close in latent space, rather than simply forcing all interpolated images to be indistinguishable from the training set as a whole, as in the Berthelot et al. formulation. Second, we extend line interpolation to simplex interpolation to cover more points in the latent space, which improves space filling.

Two-point Simplex Interpolation. To estimate the distance between the image generated from an interpolated point and the two non-interpolated images, we introduce a discriminator $D$ trained on the joint distribution of real and decoded interpolated images. For a given pair of training images $x_1$ and $x_2$, with latents $z_1 = E(x_1)$ and $z_2 = E(x_2)$ produced by the encoder $E$, an interpolated image $\hat{x}_\alpha = G(\alpha z_1 + (1 - \alpha) z_2)$ is first generated by decoding a convex combination of their latents with the decoder $G$. $D$ is trained separately on the pairs $(x_1, \hat{x}_\alpha)$ and $(x_2, \hat{x}_\alpha)$ to recover the distance in latent space between the encoding of each training image and the interpolated point; this objective is Eq. (2).

$D$ always outputs 0 for a pair of points that share the same semantics. When $\alpha$ equals 1, $D(x_1, \hat{x}_\alpha)$ should output 0, since $\hat{x}_\alpha = G(z_1)$ should have the same semantic meaning as $x_1$. On the other hand, $D(x_2, \hat{x}_\alpha)$ is supposed to output 1, since $x_2$ and $\hat{x}_\alpha$ have totally different semantic meanings.

A general case. Sainburg et al. [21] argue that pairwise interpolation between samples does not reach all points within the latent distribution, and will not necessarily make the latent distribution convex. Simplex interpolation can cover points that line interpolation cannot. However, the loss function defined in the algorithm of Berthelot et al. [6] (Eq. (1)) is tailored to 2-point interpolation, and modifying Eq. (1) to predict a vector of coefficients instead of a scalar coefficient did not converge. Our approach (Eq. (2)), on the other hand, extends directly to $n$-point simplex interpolation, since it measures how far the interpolant is from each vertex of the simplex. For images $x_1, \dots, x_n$ with latents $z_i = E(x_i)$ and simplex coefficients $\alpha_i \ge 0$ with $\sum_i \alpha_i = 1$, the interpolated image is $\hat{x} = G(\sum_i \alpha_i z_i)$, and the discriminator $D$ is trained on each pair $(x_i, \hat{x})$.

Meanwhile, the autoencoder is trained to fool $D$ into outputting 0 for interpolated points, with the term for each pair $(x_i, \hat{x})$ weighted by $\alpha_i$. Here $n$ is the number of images used to interpolate ($n = 2$ corresponds to 2-point simplex interpolation). Note that in the formulation of Berthelot et al. [6] there is no such $\alpha_i$ weight on the critic output, since they consider only the distance of the decoded interpolated image to one of the original images. In our algorithm, however, $\alpha_i$ is crucial for the following reason. If $\alpha_i$ is large, the decoded image $\hat{x}$ is closer to $x_i$, so the encoder-decoder loss corresponding to $x_i$ should receive a higher weight. Similarly, if $\alpha_i$ equals 0, $\hat{x}$ has no relation to $x_i$, so there is no need to force $E$ and $G$ to generate an $\hat{x}$ close to $x_i$. We therefore scale the discriminator loss with the $\alpha_i$ term.
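A minimal sketch of the $n$-point scheme described above follows. Several choices here are assumptions rather than details given in the text: the coefficients are drawn uniformly from the simplex, the discriminator's target for the pair $(x_i, \hat{x})$ is taken to be $1 - \alpha_i$ (consistent with the two-point case, where the output should be 0 at $\alpha_i = 1$ and 1 at $\alpha_i = 0$), and both losses are squared errors. All function names are hypothetical, and the networks themselves are not modelled.

```python
import random


def sample_simplex_coefficients(n, rng):
    """Draw n non-negative coefficients that sum to 1 (uniform on the simplex)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def simplex_interpolate(latents, alphas):
    """Convex combination sum_i alpha_i * z_i of n latent vectors."""
    dim = len(latents[0])
    return [sum(a * z[k] for a, z in zip(alphas, latents)) for k in range(dim)]


def discriminator_loss(preds, alphas):
    """Regress D(x_i, x_hat) onto 1 - alpha_i (assumed distance to vertex i)."""
    return sum((p - (1.0 - a)) ** 2 for p, a in zip(preds, alphas))


def autoencoder_loss(preds, alphas):
    """Alpha-weighted push of D(x_i, x_hat) toward 0.

    Vertices with larger alpha_i contribute more; a vertex with
    alpha_i = 0 contributes nothing.
    """
    return sum(a * p ** 2 for p, a in zip(preds, alphas))


# Toy usage: 3-point interpolation of 2-d latents.
rng = random.Random(0)
alphas = sample_simplex_coefficients(3, rng)
z_mix = simplex_interpolate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], alphas)
```

Normalising independent exponential draws is one standard way to sample uniformly from the simplex; any other distribution over convex coefficients would fit the same interface.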

Model Bird Car Cat Deer Dog Frog Horse Plane Ship Truck Avg.
Fence GAN [14] 0.67 0.71 0.68 0.75 0.66 0.79 0.75 0.51 0.52 0.73 0.68
EGBAD [30] 0.38 0.51 0.44 0.37 0.48 0.35 0.52 0.57 0.41 0.56 0.46
Ano-GAN [23] 0.41 0.49 0.39 0.33 0.39 0.32 0.39 0.52 0.57 0.51 0.43
Skip-GANomaly [2] 0.44 0.95 0.60 0.69 0.61 0.93 0.78 0.80 0.66 0.91 0.73
2-point Interpolation 0.97 0.83 0.99 0.99 0.92 0.94
3-point Interpolation 0.969 0.99 0.99 0.970 0.961
Table 1: Anomaly detection performance on CIFAR-10 dataset. Each column denotes an anomaly class. Performance is reported in AUC scores.

4 Mirrored Adversarial AutoEncoder

Autoencoders are usually trained with an $L_1$ or $L_2$ reconstruction loss between the original image and its reconstruction, which can be defined as $\|x - \hat{x}\|_p$ with $p \in \{1, 2\}$, where $\hat{x} = G(E(x))$. We propose to replace these pixel-level losses with a semantic-level reconstruction loss that is suited to unsupervised anomaly detection.

$L_1$ and $L_2$ reconstruction losses typically result in blurry reconstructions. Moreover, using them as an anomaly score provides poor estimates, as pixel-wise distances do not measure the semantic similarity between images. Additionally, a high reconstruction loss between the input and the decoded image can be the outcome of poor reconstruction quality rather than of the image being an outlier, which again results in poor anomaly scores. Our proposal is to replace the reconstruction loss with a novel adversarial loss, motivated by two goals: (1) improving the quality of reconstructions, and (2) using the discriminator to obtain a semantically meaningful anomaly score.

4.1 Reconstruction Discriminator

We use a discriminator to measure the Wasserstein distance between the joint distributions of $(x, x)$ and $(x, \hat{x})$ pairs, where $\hat{x} = G(E(x))$ is the reconstruction of $x$. This approach differs from conventional Wasserstein GAN-based architectures [3] in that the distance between joint distributions of images and their reconstructions is minimized, rather than the distance between marginal distributions. The reason for using such a discriminator is as follows. When training an autoencoder, the reconstruction must look similar to the input sample. Minimizing only the Wasserstein distance between the marginals of real and generated samples can lead to a situation where the input and the generated sample belong to the same distribution, yet are semantically different. For example, a cat image in CIFAR-10 could be reconstructed as an airplane. This would still be a feasible solution since both airplanes and cats belong to the input distribution, so the Wasserstein distance would be small.

To resolve this issue, we perform Wasserstein minimization between the joint distributions of $(x, x)$ and $(x, \hat{x})$. The discriminator now takes in pairs of images, $(x, x)$ and $(x, \hat{x})$. This avoids the problem discussed in the previous paragraph, as the distribution of $(x, x)$ pairs only ever contains pairs of matching samples. If a car image is reconstructed as an airplane, the generated distribution will contain a (car, airplane) sample, which is never found in the distribution of $(x, x)$ pairs. Hence, the model is driven to generate samples that share the semantics of the input. We would like to point out that the formulation presented here is equivalent to matching the conditional distributions of the second element of each pair given the first. This model also shares similarities with the discriminator architectures used in conditional image-to-image translation such as Pix2Pix.


Mathematically, our formulation can be written as:
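As an illustration of the quantity being optimised, here is a minimal sketch that assumes the standard Wasserstein critic estimate, i.e. the critic's mean output on (input, input) pairs minus its mean output on (input, reconstruction) pairs. The helper names, the scalar toy data and the toy critic are hypothetical, and the Lipschitz constraint that a Wasserstein critic must satisfy is not enforced here.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def make_pairs(inputs, reconstructions):
    """Build the two kinds of pairs shown to the discriminator."""
    real_pairs = [(x, x) for x in inputs]            # (input, input)
    fake_pairs = list(zip(inputs, reconstructions))  # (input, reconstruction)
    return real_pairs, fake_pairs


def critic_objective(critic, real_pairs, fake_pairs):
    """Estimate E[D(x, x)] - E[D(x, x_hat)].

    The critic is trained to maximise this value; the encoder-generator
    pair is trained to minimise it.
    """
    real_term = mean([critic(a, b) for a, b in real_pairs])
    fake_term = mean([critic(a, b) for a, b in fake_pairs])
    return real_term - fake_term


def toy_critic(a, b):
    """Hypothetical critic on scalars: scores a pair lower the more it differs."""
    return -abs(a - b)


real_pairs, fake_pairs = make_pairs([1.0, 2.0], [1.5, 2.0])
gap = critic_objective(toy_critic, real_pairs, fake_pairs)
```

Because the critic always sees the input alongside its reconstruction, a reconstruction that is realistic but unrelated to its input still produces a large gap, which is the behaviour the joint formulation is designed to penalise.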



Lemma 1

If E and G are optimal encoder and generator networks, i.e., , then =

4.2 Latent Space Regularizer

In addition to Wasserstein minimization between the joint distributions of image-reconstruction pairs, we regularize the norm of the latent codes. We find this regularization useful in practice for obtaining good anomaly detection scores. In the regularizer, $d$ is the dimensionality of the latent representation.

Figure 4: Autoencoder reconstructions. The left panel shows inputs and reconstructions of samples from the normal distribution, while the right panel shows them for the anomaly distribution. In each panel, the top row shows the inputs and the bottom row shows the reconstructions. While the quality of reconstructions is good for normal samples, anomalous samples are reconstructed either as birds or as blurry images.

5 Unsupervised Anomaly Detection

The previous sections discussed two techniques for training autoencoders with improved latent representations: Simplex Interpolation and Mirrored Adversarial Autoencoders. In this section, we discuss how such autoencoder models can be used for unsupervised anomaly detection. Simplex interpolation helps obtain a compact and clustered latent space for normal samples. As discussed in Section 3, interpolation performs space filling, where the space between the latent codes of normal samples is made to decode to samples that look like the normal distribution. Hence, the latent codes of anomalous samples have to lie outside this region, which naturally gives a good separation between normal and anomalous regions in the latent space. This results in improved anomaly detection performance. Mirrored Adversarial Autoencoders, as discussed in Section 4, train the autoencoder using an adversarial loss based on Wasserstein minimization between the joint distributions of real and decoded samples. The learnt discriminator network provides a good feature representation for detecting whether an (input, reconstruction) tuple belongs to the input distribution. We show that this discriminator representation provides a good estimate of the anomaly score.

5.1 Training objective

First, we point out that two discriminator models are used in our training pipeline: an interpolation discriminator, used in the interpolation step of simplex adversarial interpolation, and a reconstruction discriminator, used in the reconstruction step of autoencoder training. They are updated according to Eq. (3) and Eq. (4) respectively. The encoder-decoder pair, on the other hand, has the following two objectives: (1) autoencoder update: minimizing the Wasserstein distance between the joint distributions of $(x, x)$ and $(x, \hat{x})$ pairs, and (2) interpolation update: forcing the interpolated points to look realistic. The overall objective combines these terms, with a scalar hyper-parameter controlling the weight of the interpolation loss. $n$ denotes the number of images used to interpolate ($n = 2$ corresponds to 2-point interpolation).

5.2 Anomaly Score

Let $f(a, b)$ denote the response of the penultimate layer of the discriminator network when the pair $(a, b)$ is used as input. This gives the feature embedding of the pair of points $(a, b)$. We define the anomaly score of a sample $x$ as the norm of the difference between the feature embeddings of the pairs $(x, x)$ and $(x, \hat{x})$, where $\hat{x} = G(E(x))$ is the reconstruction of $x$.

Similar measures of anomaly scores are used by [2, 30].
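The scoring rule can be sketched in a few lines. The choice of the L1 norm is an assumption made for illustration, as is comparing the features of the (input, input) pair with those of the (input, reconstruction) pair; the function name is hypothetical, and the feature vectors are assumed to have already been extracted from the discriminator's penultimate layer.

```python
def anomaly_score(features_input_pair, features_reconstruction_pair):
    """L1 distance between two discriminator feature vectors.

    features_input_pair: embedding of the (input, input) pair.
    features_reconstruction_pair: embedding of the (input, reconstruction) pair.
    The L1 norm is an assumption; another norm could be substituted.
    """
    return sum(
        abs(a - b)
        for a, b in zip(features_input_pair, features_reconstruction_pair)
    )
```

A faithful reconstruction yields nearly identical embeddings and a score close to 0, while a poor or semantically shifted reconstruction yields a larger score.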

Figure 5: In the left panel, we show anomaly detection AUC scores on the CIFAR-10 dataset. Our approach significantly improves AUC scores over prior approaches in all settings. In the right panel, we show ablation studies on the use of regularizers and on varying $n$ in $n$-point simplex interpolation.

6 Experiments

Dataset      Model                     Precision   Recall   F1 score
KDD99        IF [13]                   0.9216      0.9373   0.9294
             DSEBM-r [31]              0.8521      0.6472   0.7328
             DSEBM-e [31]              0.8619      0.6446   0.7399
             DAGMM [32]                0.9297      0.9442   0.9369
             AnoGAN [23]               0.8786      0.8297   0.8865
             ALAD [30]                 0.9427      0.9577   0.9501
             3-point Simplex (Ours)    0.9527      0.9677   0.9601
Arrhythmia   IF                        0.5147      0.5469   0.5303
             DSEBM-r                   0.1515      0.1513   0.1510
             DSEBM-e                   0.4667      0.4565   0.4601
             DAGMM                     0.4909      0.5078   0.4983
             AnoGAN                    0.4118      0.4375   0.4242
             ALAD                      0.5000      0.5313   0.5152
             3-point Simplex (Ours)    0.5294      0.5625   0.5455
Thyroid      PAE-GMM                   0.4532      0.4881   0.4688
             DSEBM-r                   0.0404      0.0403   0.0403
             DSEBM-e                   0.1319      0.1319   0.1319
             DAGMM                     0.4766      0.4834   0.4782
             ALAD                      0.4583      0.4681   0.4632
             3-point Simplex (Ours)    0.6875      0.7021   0.6947
Table 2: Anomaly detection performance on non-image tabular datasets. Our approach consistently improves the performance on all three datasets: KDD99, Arrhythmia and Thyroid.
Outlier   Fence GAN [14]   EGBAD [30]   Ano-GAN [23]   GANomaly [1]   Ours
0         0.900            0.780        0.620          0.800          0.991
1         0.910            0.300        0.492          0.300          0.982
2         0.900            0.692        0.399          0.900          0.992
3         0.850            0.527        0.335          0.700          0.996
4         0.820            0.494        0.393          0.700          0.995
5         0.843            0.453        0.321          0.780          0.993
6         0.830            0.582        0.399          0.800          0.998
7         0.880            0.387        0.516          0.690          0.974
8         0.850            0.413        0.567          0.800          0.991
9         0.737            0.525        0.511          0.500          0.899
Avg       0.852            0.515        0.455          0.697          0.979
Table 3: Anomaly detection performance (AUC scores) on the MNIST dataset. Each row denotes the digit used as the anomaly class.
Method                  Config 1          Config 2          Config 3
                        AUC     F1        AUC     F1        AUC     F1
REAPER [11]             0.900   0.892     0.877   0.703     0.824   0.541
Outlier Pursuit [28]    0.908   0.902     0.837   0.686     0.822   0.528
DPCP [27]               0.900   0.882     0.859   0.684     0.804   0.511
thresholding [26]       0.991   0.978     0.992   0.941     0.991   0.897
R-graph [29]            0.997   0.990     0.996   0.970     0.996   0.955
GPND [16]               0.968   0.979     0.945   0.960     0.919   0.941
Ours                    0.987   0.993     0.971   0.983     0.977   0.986
Table 4: Performance of our method on the Coil-100 dataset. Config 1: inliers from one category of images; Config 2: inliers from four categories; Config 3: inliers from seven categories. In each configuration, outliers are drawn from the remaining categories.
Dataset   ALOCC DR [20]   ALOCC D [20]   DCAE [22]   GPND [16]   OCGAN [15]   Ours
FMNIST    0.88            0.82           0.899       0.932       0.977        0.994
Table 5: Mean one-class novelty detection AUC on the FMNIST dataset.

CIFAR-10. To test unsupervised anomaly detection on the CIFAR-10 dataset, we used the commonly used leave-one-out protocol [1] [2], in which samples from one of the CIFAR-10 classes are used as anomalous samples, and all other classes are used as normal samples (training data). Since the setting is unsupervised, the training data consists only of normal data; anomalous samples are not used during training. Experiments are repeated for 10 trials, each time using one of the CIFAR-10 classes as the anomaly.

Our approach is optimized using the objective function of Section 5.1. For the exact algorithm, please refer to the supplementary material. In all our models, we set the weight of the interpolation loss to 0.5. To optimize the decoder and the two discriminator models, we used the Adam optimizer with initial learning rate 3e-4, $\beta_1 = 0$ and $\beta_2 = 0.999$. The encoder is optimized using the Adam optimizer with initial learning rate 3e-4, $\beta_1 = 0.5$ and $\beta_2 = 0.999$. Among all trained models, we pick as the best model the one that gives the lowest discriminator feature difference loss (the norm of the difference between the discriminator features of the input samples and the reconstructed ones) on training samples after 60 epochs. Experiments are performed using two NVIDIA RTX 2080 Ti GPUs.

To evaluate our models, we compute the anomaly scores for normal and anomalous samples and measure the resulting AUC scores. A plot of our AUC scores compared with other approaches is shown in Figure 5. We find that our approach significantly improves the AUC scores compared to prior approaches. On average, we obtain an improvement of 0.23 AUC points. Additionally, most of the prior approaches perform poorly on hard classes like bird (in CIFAR-10, the bird class is similar to the airplane class), whereas our approach achieves a large improvement on such hard classes (Table 1).
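For concreteness, the AUC of a set of anomaly scores can be computed as the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal sample. The sketch below uses this pairwise (Mann-Whitney) formulation; the function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_score(scores_normal, scores_anomalous):
    """ROC AUC for anomaly scores where higher means more anomalous.

    Counts the fraction of (anomalous, normal) pairs in which the
    anomalous sample scores higher; ties count as one half.
    """
    wins = 0.0
    for a in scores_anomalous:
        for s in scores_normal:
            if a > s:
                wins += 1.0
            elif a == s:
                wins += 0.5
    return wins / (len(scores_anomalous) * len(scores_normal))
```

An AUC of 1.0 means every anomalous sample outscores every normal one, and 0.5 corresponds to scores that carry no information about the label.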

MNIST. We also evaluate our simplex interpolation model on the MNIST dataset. We used the same leave-one-out protocol as in the CIFAR-10 experiment: data points from one class are used as anomalous samples and data points from the other 9 classes as normal samples. The training data consists only of normal data; anomalous samples are not used during training. Experiments are repeated for 10 trials, each time using one of the MNIST classes as the anomaly. The results in Table 3 show that our method reaches an AUC of around 0.97 on MNIST. Please refer to the supplementary material for more details on model architectures and hyper-parameters.

Coil-100 and FMNIST. GPND [16] and OCGAN [15] also evaluate their performance on Coil-100 and FMNIST. We use the same experimental design to test our model on these two datasets. For Coil-100, we randomly take $n$ categories as inliers, where $n \in \{1, 4, 7\}$, and randomly sample outliers from the rest of the categories. We repeat this procedure 30 times. The results in Table 4 show that our method is competitive with these approaches on Coil-100.
For FMNIST, 80% of the in-class samples are used for training and the remaining 20% for testing. Negative samples are randomly selected so that they make up 50% of the test set. We take one class as normal and the others as anomalous, and the final AUC score is calculated as the average over the 10 labels. As shown in Table 5, our method is competitive with OCGAN [15] on the FMNIST dataset.

Experiments on non-image datasets. Our approach is versatile and can be applied to non-image datasets as well. We evaluate our simplex interpolation model on the publicly available tabular datasets KDDCup99 10%, Arrhythmia and Thyroid [12]. In each of these datasets, a fraction of the samples is labelled as anomalous. We evaluate anomaly detection performance using precision, recall and F1 score, as done in previous approaches [30, 31, 34]. We randomly sample a portion of the data as the training set and remove the anomalous samples from it. The resulting dataset is used as our training data.

At test time, we assume that the fraction of anomalous samples in each dataset is known. This is the protocol used in [30, 31, 34]. For the test set, we compute an anomaly score for each sample, sort the samples by anomaly score and label the top $k$% of samples as anomalous, where $k$ is the percentage of anomalous samples in the dataset. With these assignments as predictions, we compute the evaluation metrics. Results are reported in Table 2. We observe that our approach achieves state-of-the-art performance on all three datasets. In particular, on the Thyroid dataset we obtain gains of over 0.2 points in each metric relative to the best prior method. Please refer to the supplementary material for more details on model architectures and hyper-parameters.
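The thresholding protocol described above can be sketched as follows. The function name and the label convention (1 for anomalous, 0 for normal) are assumptions made for illustration; the known anomaly fraction is passed in as a number between 0 and 1.

```python
def evaluate_top_fraction(scores, labels, anomaly_fraction):
    """Flag the highest-scoring fraction of samples and score the result.

    scores: anomaly score per sample (higher means more anomalous).
    labels: 1 for anomalous, 0 for normal.
    Returns (precision, recall, f1).
    """
    n_flagged = int(round(anomaly_fraction * len(scores)))
    # Indices sorted from the most to the least anomalous sample.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flagged])
    true_positives = sum(1 for i in flagged if labels[i] == 1)
    n_anomalous = sum(labels)
    precision = true_positives / n_flagged if n_flagged else 0.0
    recall = true_positives / n_anomalous if n_anomalous else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy usage: four samples, half of them anomalous.
result = evaluate_top_fraction([0.9, 0.1, 0.8, 0.2], [1, 0, 0, 1], 0.5)
```

In this toy example the two highest-scoring samples are flagged, only one of which is truly anomalous, so precision, recall and F1 all equal 0.5.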

Ablation study

Our objective function consists of three main components, as described in Section 5.1: (1) the autoencoder training loss, (2) simplex interpolation and (3) latent space regularization. In this experiment, we perform an ablation study of each of these components. For all experiments, we use the Mirrored Autoencoder as our base architecture. For simplex interpolation, we compare 2-point and 3-point interpolation; in our experiments, we observe that performance saturates for larger $n$. Results of the ablation study on CIFAR-10 are provided in Fig. 5. We make the following observations: (1) interpolation improves performance compared to not using any interpolation, (2) among the different interpolation techniques, simplex interpolation outperforms the interpolation of Berthelot et al., and (3) 3-point simplex interpolation improves over 2-point interpolation. The best performance is obtained by using 3-point simplex interpolation with a combination of all three terms in the objective of Section 5.1.

7 Conclusion

In this paper, we introduced a new method for the unsupervised anomaly detection problem based on a novel representation learning technique using deep autoencoders. It contains two novel components: (1) a Mirrored Adversarial Autoencoder that replaces the reconstruction loss in the autoencoder optimization objective with a novel adversarial loss to enforce semantic-level reconstruction, and (2) Simplex Interpolation, which extends the interpolation idea of [6] to improve the structure of the latent space representation in the autoencoder. We showed that our proposed method improves the state-of-the-art by a large margin on benchmark anomaly detection datasets. We note that the ideas proposed in this work can potentially be used in the semi-supervised anomaly detection problem, where we have access to a few anomaly samples during training. We leave this for future work.

8 Acknowledgment

This work was supported in part by NSF CAREER Award 1942230, a sponsorship from Capital One, and an IBM Faculty Award.


  • [1] S. Akçay, A. A. Abarghouei, and T. P. Breckon (2018) GANomaly: semi-supervised anomaly detection via adversarial training. CoRR abs/1805.06725.
  • [2] S. Akçay, A. A. Abarghouei, and T. P. Breckon (2019) Skip-GANomaly: skip connected and adversarially trained encoder-decoder anomaly detection. CoRR abs/1901.08954.
  • [3] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 214–223.
  • [4] Y. Balaji, H. Hassani, R. Chellappa, and S. Feizi (2019) Entropic GANs meet VAEs: a statistical approach to compute sample likelihoods in GANs. In Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 414–423.
  • [5] A. Berg, J. Ahlberg, and M. Felsberg (2019) Unsupervised learning of anomaly detection from contaminated image data using simultaneous encoder training. CoRR abs/1905.11034.
  • [6] D. Berthelot, C. Raffel, A. Roy, and I. J. Goodfellow (2018) Understanding and improving interpolation in autoencoders via an adversarial regularizer. CoRR abs/1807.07543.
  • [7] V. Chandola, A. Banerjee, and V. Kumar (2007) Anomaly detection: a survey.
  • [8] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo (2002) A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. Applications of Data Mining in Computer Security 6.
  • [9] D. Hendrycks, M. Mazeika, and T. Dietterich (2019) Deep anomaly detection with outlier exposure. In International Conference on Learning Representations.
  • [10] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. CVPR.
  • [11] G. Lerman, M. B. McCoy, J. A. Tropp, and T. Zhang (2012) Robust computation of linear models, or how to find a needle in a haystack. CoRR abs/1202.4044.
  • [12] M. Lichman (2013) UCI machine learning repository. Online.
  • [13] F. T. Liu, K. M. Ting, and Z. Zhou (2012) Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6 (1). ISSN 1556-4681.
  • [14] C. P. Ngo, A. A. Winarto, C. K. L. Kou, S. Park, F. Akram, and H. K. Lee (2019) Fence GAN: towards better anomaly detection. CoRR abs/1904.01209.
  • [15] P. Perera, R. Nallapati, and B. Xiang (2019) OCGAN: one-class novelty detection using GANs with constrained latent representations. CoRR abs/1903.08550.
  • [16] S. Pidhorskyi, R. Almohsen, D. A. Adjeroh, and G. Doretto (2018) Generative probabilistic novelty detection with adversarial autoencoders. CoRR abs/1807.02588.
  • [17] M. A. F. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko (2014) Review: a review of novelty detection. Signal Process. 99, pp. 215–249. ISSN 0165-1684.
  • [18] U. Porwal and S. Mukund (2018) Credit card fraud detection in e-commerce: an outlier detection approach. CoRR abs/1811.02196.
  • [19] P. Rajpurkar, J. Irvin, A. Bagul, D. Y. Ding, T. Duan, H. Mehta, B. Yang, K. Zhu, D. Laird, R. L. Ball, C. Langlotz, K. S. Shpanskaya, M. P. Lungren, and A. Y. Ng (2017) MURA dataset: towards radiologist-level abnormality detection in musculoskeletal radiographs. CoRR abs/1712.06957.
  • [20] M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli (2018) Adversarially learned one-class classifier for novelty detection. CoRR abs/1802.09088.
  • [21] T. Sainburg, M. Thielk, B. Theilman, B. Migliori, and T. Gentner (2018) Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions. CoRR abs/1807.06650.
  • [22] M. Sakurada and T. Yairi (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. pp. 4–11.
  • [23] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. CoRR abs/1703.05921.
  • [24] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt (1999) Support vector method for novelty detection. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS’99, Cambridge, MA, USA, pp. 582–588.
  • [25] R. Smith, A. Bivens, M. Embrechts, C. Palagiri, and B. Szymanski (2002) Clustering approaches for anomaly based intrusion detection. Proceedings of Intelligent Engineering Systems Through Artificial Neural Networks, pp. 579–584.
  • [26] M. Soltanolkotabi and E. J. Candès (2011) A geometric analysis of subspace clustering with outliers. CoRR abs/1112.4258.
  • [27] M. C. Tsakiris and R. Vidal (2015) Dual principal component pursuit. CoRR abs/1510.04390.
  • [28] H. Xu, C. Caramanis, and S. Sanghavi (2010) Robust PCA via outlier pursuit. CoRR abs/1010.4237.
  • [29] C. You, D. P. Robinson, and R. Vidal (2017) Provable self-representation based outlier detection in a union of subspaces. CoRR abs/1704.03925.
  • [30] H. Zenati, M. Romain, C. S. Foo, B. Lecouat, and V. R. Chandrasekhar (2018) Adversarially learned anomaly detection. CoRR abs/1812.02288.
  • [31] S. Zhai, Y. Cheng, W. Lu, and Z. Zhang (2016) Deep structured energy based models for anomaly detection. CoRR abs/1605.07717.
  • [32] C. Zhou and R. C. Paffenroth (2017) Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, pp. 665–674. ISBN 978-1-4503-4887-4.
  • [33] A. Zimek, E. Schubert, and H. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min.
  • [34] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen (2018) Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In ICLR.