MDGAN: Boosting Anomaly Detection Using Multi-Discriminator Generative Adversarial Networks

10/11/2018 · by Yotam Intrator, et al.


Abstract

Anomaly detection is often considered a challenging field of machine learning due to the difficulty of obtaining anomalous samples for training and the need to obtain a sufficient amount of training data. In recent years, autoencoders have been shown to be effective anomaly detectors that train only on ”normal” data. Generative adversarial networks (GANs) have been used to generate additional training samples for classifiers, thus making them more accurate and robust. However, in anomaly detection GANs are only used to reconstruct existing samples rather than to generate additional ones. This stems both from the small amount and lack of diversity of anomalous data in most domains. In this study we propose MDGAN, a novel GAN architecture for improving anomaly detection through the generation of additional samples. Our approach uses two discriminators: a dense network for determining whether the generated samples are of sufficient quality (i.e., valid) and an autoencoder that serves as an anomaly detector. MDGAN enables us to reconcile two conflicting goals: 1) generate high-quality samples that can fool the first discriminator, and 2) generate samples that can eventually be effectively reconstructed by the second discriminator, thus improving its performance. Empirical evaluation on a diverse set of datasets demonstrates the merits of our approach.

Introduction

In machine learning, anomaly detection aims to identify abnormal patterns, particularly those that arise from new classes of behaviors [Chandola, Banerjee, and Kumar2009]. Although it has been extensively researched, anomaly detection is often considered a challenging field of machine learning due to the difficulty of obtaining anomalous samples for training, a problem that often results in a high false positive rate.

In recent years, deep neural nets (DNNs) have been used for training anomaly detection models [Kwon et al.2017]. Despite their ability to learn complex patterns and potentially generate accurate anomaly detectors, deep learning models often require large amounts of training data [Krizhevsky, Sutskever, and Hinton2012]. Moreover, samples of anomalous data must be provided to the algorithm in order to enable it to detect similar samples. Obtaining such samples can be difficult, and defining all possible types of anomalies is often close to impossible.

Generative Adversarial Networks (GANs) [Goodfellow et al.2014] have been proposed in recent years as a solution to the two above-mentioned problems. GANs have been used both to generate additional labeled samples [Odena, Olah, and Shlens2016] and to make classifiers more robust to adversarial attacks [Lee, Han, and Lee2017]. However, to the best of our knowledge, no GAN-based solution has been proposed for generating additional samples in the domain of one-class anomaly detection. The likely cause for the lack of research in this area is the difficulty for the GAN to generate "anomalous" samples when only "normal" ones are available for training (i.e., it is impossible to generate samples from all participating classes).

Another obstacle to using GANs in anomaly detection is the requirement in some domains that the generated samples be valid. This requirement exists in fields such as network-based intrusion detection, where the generated samples need to be realistic in order not to "throw off" the detection algorithm. In cases where valid samples are not a prerequisite (e.g., image classification), existing solutions can be found in the literature [Frid-Adar et al.2018, Lemley, Bazrafkan, and Corcoran2017, Shrivastava et al.2017].

In this study we aim to improve the performance of anomaly detection algorithms, specifically autoencoders [Sakurada and Yairi2014], by using GANs to generate new artificial examples of "normal" cases. In order to achieve this goal, we present the Multi-Discriminator GAN (MDGAN), a novel GAN architecture that uses two discriminators, each with a different role and cost function. The first discriminator attempts to discern the generated samples from the original ones, thus ensuring that the generated samples appear as if they were sampled from the same distribution as the real data (i.e., valid). The second discriminator is an autoencoder that serves as an anomaly detector by measuring reconstruction error.

MDGAN’s use of two discriminators enables the generator component to achieve two seemingly conflicting goals: 1) generate high-quality samples that can fool the first discriminator, and 2) generate samples that can be reconstructed effectively by the second discriminator (the autoencoder). The proposed setting prevents MDGAN from generating simplistic samples that would be easily reconstructed, thus forcing the autoencoder to continuously improve its performance.

To evaluate the merit of our proposed approach we conducted an empirical analysis on ten datasets of varying domains and characteristics (e.g., number of samples, number of features, etc.). The results of our analysis show that MDGAN outperforms a widely-used benchmark in the large majority of tested datasets.

The contributions of this study are twofold: (1) we propose a novel GAN architecture that enables the generation of more finely-tuned training samples for one-class anomaly detection; and (2) we present an in-depth analysis of the performance of our proposed approach and its components.

Related Work

Anomaly Detection

Anomaly detection algorithms focus on finding patterns that do not conform to expected behavior. Anomaly detection has been applied in various areas, including the fields of fraud detection [Van Vlasselaer et al.2015], cyber security [Kuypers, Maillart, and Pate-Cornell2016], medicine [James and Dasarathy2014], and even real-time crime detection [Ravanbakhsh et al.2017].

In this study we focus on spectral anomaly detection methods [Egilmez and Ortega2014]. Approaches of this type focus on generating a lower-dimensionality (i.e., compressed) representation of the data, and then using it to reconstruct the original data. The underlying logic is that a high reconstruction error is indicative of an anomaly, since the characteristics of the original data are not "as expected." This approach includes algorithms such as principal component analysis (PCA) [Shyu et al.2003] and autoencoders [Sakurada and Yairi2014]. Within this family, we concentrate on neural network algorithms for anomaly detection [Kwon et al.2017], and specifically on autoencoders.

Autoencoders

Autoencoders are deep neural networks used for the efficient encoding and reconstruction of input in an unsupervised manner. Traditionally, autoencoders were used for dimensionality reduction and feature learning [Hinton and Salakhutdinov2006], but currently they are also used for de-noising [Lu et al.2013], building generative models [Bengio et al.2013], and adversarial training [Mescheder, Nowozin, and Geiger2017].

Autoencoders consist of two components: the encoder and the decoder (see example in Figure 1). The two components are trained together: the encoder compresses the data, and the decoder attempts to reconstruct it. The network uses the reconstruction error to adjust its weights and obtain compact representations that capture the "essence" of the analyzed data.

Figure 1: An example of an autoencoder with one hidden layer
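To make the mechanism concrete, the following is a minimal PyTorch sketch of such an autoencoder and its training step; the layer sizes, optimizer settings, and names are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal autoencoder: the encoder compresses the input and the
    decoder attempts to reconstruct it."""
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Tanh())

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes the reconstruction error (here, MSE) on "normal" data.
model = AutoEncoder(input_dim=39, hidden_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(batch):
    optimizer.zero_grad()
    loss = loss_fn(model(batch), batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```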

The ability of autoencoders to reconstruct and de-noise data makes them useful anomaly detectors [Meidan et al.2018, Mirsky et al.2018, Fan et al.2018], particularly in cases of one-class anomaly detection (i.e., when only "normal" samples are available) [Erfani et al.2016, Wei et al.2018]. The autoencoder receives a sample (which can also be made "noisy" using dropout or a similar technique), compresses it using the encoder, and then attempts to reconstruct the original sample using the decoder. The discrepancy between the original and reconstructed samples is captured by the loss function and is used to train the neural net. Once the network is trained, samples with high discrepancy (i.e., highly different than expected) are flagged as anomalies. One of the common means of measuring the discrepancy between samples is the root mean squared error (RMSE), which is calculated as

$$RMSE(x, \hat{x}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{x}_i)^2} \qquad (1)$$

where $x$ and $\hat{x}$ are the vectors of the original and reconstructed samples, respectively.
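A small sketch of how Equation 1 can be used to score samples and flag anomalies; the threshold value below is a placeholder that would in practice be calibrated on a validation set of "normal" data:

```python
import numpy as np

def rmse(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Equation 1: per-sample RMSE between the original and the
    reconstructed vectors."""
    return np.sqrt(np.mean((x - x_hat) ** 2, axis=-1))

def flag_anomalies(x, x_hat, threshold=0.5):
    """Flag samples whose reconstruction error exceeds the threshold."""
    return rmse(x, x_hat) > threshold
```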

Generative Adversarial Nets

Generative adversarial nets (GANs) [Goodfellow et al.2014] are deep neural network architectures consisting of two sub-networks: a generator and a discriminator. These sub-networks compete in a two-player zero-sum game (seeking a Nash equilibrium), where the goal of the discriminator is to discern samples produced by the generator from those sampled from the actual data, and the goal of the generator is to fool the discriminator. Since their introduction in 2014, GANs have been used in multiple domains, including image, music, and text generation, as well as anomaly detection.

The application of GANs in anomaly detection has been proposed for video [Ravanbakhsh et al.2017] and medical images [Schlegl et al.2017]. In the former study, the GAN was trained on the RGB channels of "normal" videos in an attempt to reconstruct corrupted videos. The size of the reconstruction error was used to identify anomalous sections in the video. In the latter study, the architecture was trained on benign retina scans. It is important to note that these studies do not require the samples generated by the GAN to be valid, as is the case in some domains (see the Introduction section).

Proposed Method

The proposed architecture

Combining GANs and autoencoders requires us to reconcile two seemingly opposing goals: (1) generating samples that can be reconstructed by the autoencoder with high accuracy, and (2) generating samples that are similar to "real" data. These two goals are contradictory because reducing reconstruction error is easier to achieve with simplistic samples, while generating samples that are similar to the real data requires them to be more complex.

Figure 2: MDGAN architecture

In order to address this challenge, we propose the Multi-Discriminator Generative Adversarial Network (MDGAN). The proposed architecture, presented in Figure 2, consists of a single generator $G$ and two discriminators: $D_1$ and $D_2$. While each discriminator receives, in turn, two batches of samples – "real" and generated – their loss functions (i.e., goals) are different:

  • $D_1$ is a feedforward network whose aim is to correctly separate the "real" samples from the generated ones. The goal of $G$ with respect to $D_1$ is to make the two groups indistinguishable. We do this by defining the following two-player minimax game [Goodfellow et al.2014]:

    $$\min_G \max_{D_1} V(D_1, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_1(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_1(G(z)))] \qquad (2)$$

    where $D_1(x)$ denotes the probability assigned by $D_1$ to sample $x$ of being "real" (i.e., not generated by $G$).

  • $D_2$ is an autoencoder component. Its goal is to correctly reconstruct all the samples it receives, regardless of whether they are real or generated. Unlike in other GAN architectures, the goal of $G$ in this context is to assist $D_2$ (i.e., reduce the values of $D_2$'s loss function). The loss function for both $G$ and $D_2$ is the same and is given by the mean square error (MSE) of the sample reconstruction:

    $$MSE(x, \hat{x}) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{x}_i)^2 \qquad (3)$$

    where $\hat{x} = D_2(x)$ is the reconstruction of sample $x$.

We train the architecture in the following manner. In every iteration, $G$ generates a batch of samples. We send the generated batch to one of the discriminators, along with an equal-sized batch of real samples. We calculate the loss for the generator and the relevant discriminator using either Equation 2 or Equation 3, and then use backpropagation to update the networks. The process is then repeated with the second discriminator. In each step, the parameters of the non-participating discriminator are frozen.
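The following PyTorch sketch illustrates one such training iteration under the description above; the function and parameter names (mdgan_iteration, noise_dim, train_d2_on_fake) are ours, and the losses follow Equations 2 and 3 rather than reproducing the authors' exact implementation.

```python
import torch
import torch.nn as nn

bce, mse = nn.BCELoss(), nn.MSELoss()

def mdgan_iteration(G, D1, D2, opt_g, opt_d1, opt_d2, real,
                    noise_dim=64, train_d2_on_fake=True):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # D1 phase (Equation 2); D2 does not participate in this step,
    # so its parameters are effectively frozen.
    fake = G(torch.randn(n, noise_dim))
    opt_d1.zero_grad()
    d1_loss = bce(D1(real), ones) + bce(D1(fake.detach()), zeros)
    d1_loss.backward()
    opt_d1.step()

    opt_g.zero_grad()
    bce(D1(G(torch.randn(n, noise_dim))), ones).backward()  # G tries to fool D1
    opt_g.step()

    # D2 phase (Equation 3); D1 is frozen here in the same sense.
    fake = G(torch.randn(n, noise_dim)).detach()
    opt_d2.zero_grad()
    d2_loss = mse(D2(real), real)
    if train_d2_on_fake:  # gated off during the warm-up period (see below)
        d2_loss = d2_loss + mse(D2(fake), fake)
    d2_loss.backward()
    opt_d2.step()

    # Unlike a standard GAN, G is also trained to *help* D2:
    # it is rewarded for samples that D2 reconstructs well.
    opt_g.zero_grad()
    gen = G(torch.randn(n, noise_dim))
    ((D2(gen) - gen) ** 2).mean().backward()
    opt_g.step()
```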

Training and initialization strategies

MDGAN utilizes two discriminators with the goal of generating instances that are both valid and able to be successfully reconstructed by an autoencoder. However, we considered the possibility that, because the generator is only adversarial to $D_1$, the autoencoder may be "thrown off" by samples generated in the early epochs. This concern stems from the fact that the generated samples of the early training epochs are not likely to resemble the real data. By trying to reconstruct these wholly-unrelated samples, the autoencoder may assimilate false patterns.

To test the hypothesis that the samples generated during the initial training epochs may be detrimental to $D_2$, we defined a "warm-up" period for MDGAN. During this period we train $G$ and $D_1$ as described above, but train $D_2$ only on the real samples. In our experiments we evaluated the performance of our model with different numbers of warm-up epochs, ranging from zero to six. Once the warm-up period is over, we train both discriminators on real and generated data. The proposed algorithm is presented in Algorithm 1.

1: procedure FIT(real data $X$, warm-up length $w$)
2:     for number of training iterations do
3:         $X_r \leftarrow$ sample a batch of real samples from $X$
4:         optimize $D_1$ on $X_r$
5:         sample: $z \leftarrow$ a batch of noise vectors
6:         $X_g \leftarrow G(z)$
7:         optimize $D_1$ on $X_g$
8:         optimize $G$ on $D_1$ (Equation 2)
9:         optimize $D_2$ on $X_r$
10:        if iteration number > $w$ then
11:            optimize $D_2$ on $X_g$
12:        optimize $G$ on $D_2$ (Equation 3)
13:     return $D_2$
Algorithm 1: MDGAN training
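A compact sketch of this fit procedure, reusing the hypothetical mdgan_iteration step from the previous section; the epoch counts mirror the experimental settings, but the names are illustrative:

```python
def fit(G, D1, D2, opt_g, opt_d1, opt_d2, data_loader,
        warmup_epochs=3, epochs=30):
    """Train MDGAN; during the first `warmup_epochs` epochs, D2 (the
    autoencoder) is trained on real samples only, shielding it from
    low-quality early generations."""
    for epoch in range(epochs):
        past_warmup = epoch >= warmup_epochs
        for real in data_loader:
            mdgan_iteration(G, D1, D2, opt_g, opt_d1, opt_d2, real,
                            train_d2_on_fake=past_warmup)
    return D2
```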

Evaluation

Figure 3: $G$ architecture (left), $D_1$ architecture (right)

Datasets

We evaluated our approach on ten diverse datasets varying in size, number of attributes, and class imbalance. The datasets are available in the OpenML (www.openML.org) and Outlier Detection DataSets (ODDS, http://odds.cs.stonybrook.edu/) repositories, and their properties are presented in Table 1. Five of the datasets are well-known benchmarks for the anomaly detection task: NSL-KDD [Revathi and Malathi2013], Pendigit [Keller, Muller, and Bohm2012], Annthyroid [Abe, Zadrozny, and Langford2006], SWaT [Goh et al.2016], and breast cancer [Mangasarian1990].

All datasets represent binary classification problems (some were originally multi-class, but binary versions exist in the online repositories), with the minority class instances defined as the anomalies we aim to detect.

We partitioned each dataset into training, validation, and test sets. The partitioning process varied depending on whether pre-defined partitions existed (a sketch of the second procedure follows this list):

  • For datasets with pre-defined partitions, we removed all anomalous samples (i.e., the minority class) from the training set. Of the remaining training set samples, 10% was randomly selected as the validation set. The test set was not changed.

  • For datasets without pre-defined partitions, we first assigned all anomalous samples to the test set. We then assigned the remaining samples in the manner described in Table 1. Again, 10% of the training set was randomly assigned to the validation set.
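A minimal sketch of the partitioning procedure for datasets without pre-defined partitions; the test_frac parameter is a stand-in, since the actual train/test sizes are dataset-specific (see Table 1):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def partition(X, y, anomaly_label=1, test_frac=0.3, seed=0):
    """All anomalies go to the test set; the remaining "normal" samples
    are split between train and test, and 10% of the training set is
    held out as a validation set."""
    normal = X[y != anomaly_label]
    anomalies = X[y == anomaly_label]
    train, test_normal = train_test_split(normal, test_size=test_frac,
                                          random_state=seed)
    train, val = train_test_split(train, test_size=0.1, random_state=seed)
    test = np.vstack([test_normal, anomalies])
    return train, val, test
```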

Dataset  # of Features  Training set size  Test set size  Anomalies (%)
NSL-KDD* 39 67,343 22,544 56.92
Pendigit [Keller, Muller, and Bohm2012] 16 6,000 1,870 17.90
Video injection [Mirsky et al.2018] 115 1,000,000 1,369,902 6.96
Annthyroid [Abe, Zadrozny, and Langford2006] 6 6,000 1,200 44.50
Forest cover [Liu, Ting, and Zhou2008] 10 250,000 36,048 7.62
Breast cancer [Mangasarian1990] 10 200 599 48.29
CPU 18 3,000 5,192 47.70
Ailerons 40 2,000 11,750 49.60
SWaT* [Goh et al.2016] 51 496,000 449,919 11.98
Yeast [Dheeru and Karra Taniskidou2017] 8 1,014 470 7.74
Table 1: The characteristics of the evaluated datasets (number of features, size of the training and test sets, and the percentage of anomalies). "*" indicates datasets with pre-existing partitions

In addition, we normalized all numeric features to the range [-1,1] and removed all categorical features with more than three values from the datasets. We took the latter action because the categorical features had to be represented using sparse vectors, which resulted in reduced performance for both MDGAN and the baseline.
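A minimal sketch of this preprocessing step using pandas and scikit-learn; the function name and the assumption that the features arrive in a DataFrame are ours:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop categorical features with more than three distinct values,
    then scale all numeric features to the range [-1, 1]."""
    categorical = df.select_dtypes(include=["object", "category"]).columns
    to_drop = [c for c in categorical if df[c].nunique() > 3]
    df = df.drop(columns=to_drop)
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df[numeric])
    return df
```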

Dataset  No Warm-Up  One-Epoch Warm-Up  Three-Epoch Warm-Up  Six-Epoch Warm-Up
NSL-KDD 0.43% 0.9%* 0.51% 0.66%*
Pendigit 4.25%* 2.77% 3.42%* 1.48%
Video injection 0.03% -2%* -0.4% -0.63%*
Annthyroid -3.96%* -4.01%* -5.9%* -5.03%*
Forest cover 6.73%* 4.12% 1.29% 1.77%
Breast cancer 5.53%* 5.52%* 4.92%* 3.78%*
CPU 0.78%* 0.88%* 0.66%* 0.69%*
Ailerons 2.83%* 1.78%* 2.21%* 1.76%*
SWaT 0.57% 2.27% 2.65% 2.49%
Yeast -2.86% -5.57%* -2.77% -3.44%
Table 2: Percentage of improvement in AUC against the baseline (higher is better), averaged over 30 different seeds. "*" indicates significance with 95% confidence

Training parameters

We used the following settings throughout the evaluation:

  • Stopping criteria – all models were trained for 30 epochs. We then chose the model configuration from the epoch with the highest score on the validation set.

  • Learning rate and optimizers – $G$ was optimized using a stochastic gradient descent optimizer with a learning rate of 0.01; $D_1$ and $D_2$ were optimized using the Adam optimizer with a learning rate of 0.001.

  • Dropout and batch normalization – $D_1$ contains a 10% dropout after each hidden layer; $G$ contains batch normalization after each hidden layer.

  • Warm-up values – we evaluated MDGAN with warm-up values of zero, one, three, and six epochs (see "Training and initialization strategies" in the previous section for more details).

  • Initialization – each experiment was run 30 times, using different initialization parameters.

Experimental setting

The baseline. Since the goal of MDGAN is to improve the performance of an autoencoder through the generation of additional samples, we compared our approach to an autoencoder with a configuration identical to the one used by our $D_2$ component. The same validation set-based stopping criterion was also applied.

Dataset  No Warm-Up  One-Epoch Warm-Up  Three-Epoch Warm-Up  Six-Epoch Warm-Up
NSL-KDD 0 0.3% 0.3% 0.3%*
Pendigit 14.2%* 12.6%* 12.5%* 3.9%
Video injection -0.01%* -0.01%* 0 -0.22%
Annthyroid -2.4% 2.3% 4.1%* 4.2%*
Forest cover 44.8%* 25.2%* 25.6% 12.3%
Breast cancer 3.3%* 3.3%* 3%* 2.3%*
CPU 0.7%* 0.8%* 0.6%* 0.6%*
Ailerons 3.3%* 2.4%* 2.6%* 2.1%*
SWaT -0.04% 7.6% 2.2% 4.7%
Yeast -5.1% 6.6% 4.3% 5.8%
Table 3: Percentage of improvement in AUC PR against the baseline (higher is better), averaged over 30 different seeds. "*" indicates significance with 95% confidence
Dataset  No Warm-Up  One-Epoch Warm-Up  Three-Epoch Warm-Up  Six-Epoch Warm-Up
NSL-KDD 0 -2.5% -3.9%* -5.3%*
Pendigit -20.6%* -12.8% 15.7%* 8.6%
Video injection 1.1% 7.7%* 4.1%* 3.3%*
Annthyroid 6.2%* 5.7% 8.1%* 7.5%*
Forest cover -15.9%* -10.8%* -5.8% -4.3%
Breast cancer -10.8%* -12.4%* -10.2%* -8.5%*
CPU -5.01%* -5.5%* -4.4%* -4.2%*
Ailerons -6.3%* -4.2%* -4.8%* -3.7%*
SWaT 1.5% -0.1% -2.9% -1.7%
Yeast 4.9% 11.4% 6.5% 9%
Table 4: Percentage of improvement in EER against the baseline (lower is better), averaged over 30 different seeds. "*" indicates significance with 95% confidence

Evaluation measures. We used three evaluation measures to analyze our results (a computation sketch follows this list):

  • Area under the receiver operating characteristic curve (AUC) – used to measure the performance of our approach across all possible true-positive/false-positive values.

  • Area under the precision-recall curve (AUC PR) – considered to be more informative than AUC when the percentage of anomalies is low (i.e., for imbalanced datasets).

  • Equal error rate (EER) – designed to measure the trade-off between the false-positive and false-negative rates.
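The three measures can be computed from per-sample anomaly scores (e.g., reconstruction RMSE) as in the following sketch; approximating AUC PR with average precision and deriving EER from the ROC curve are our implementation choices, not necessarily the authors':

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve

def evaluate(y_true, scores):
    """Compute AUC, AUC PR, and EER from anomaly scores
    (higher score = more anomalous)."""
    auc = roc_auc_score(y_true, scores)
    auc_pr = average_precision_score(y_true, scores)
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    eer = fpr[np.argmin(np.abs(fpr - fnr))]  # point where FPR ≈ FNR
    return auc, auc_pr, eer
```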

Statistical tests. To validate the significance of our results, we used the paired t-test on the three evaluation measures described above. We marked results as significant in cases where the confidence level was 95% or higher.
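A minimal sketch of this test using SciPy, with the per-seed scores of the two models as paired samples:

```python
from scipy.stats import ttest_rel

def is_significant(mdgan_scores, baseline_scores, alpha=0.05):
    """Paired t-test over the 30 seeds; a result is marked with "*"
    when p < 0.05 (95% confidence)."""
    _, p_value = ttest_rel(mdgan_scores, baseline_scores)
    return p_value < alpha
```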


The architectures used in our experiments are presented in Figure 3. Their structure is as follows:

$G$ is a four-layer neural net. We used the leaky ReLU activation function (with an alpha of 0.2) after each hidden layer, together with batch normalization. Finally, after the last fully connected layer, we apply the tanh activation function.

$D_1$ is also a four-layer neural net. We used the leaky ReLU function (with an alpha of 0.2) after each hidden layer, followed by dropout. Finally, after the last fully connected layer, we applied the sigmoid activation function to classify each sample as real or fake.

$D_2$ is a four-layer autoencoder, encoding to 70% of the input dimension and then to 50%. The decoder is the exact mirror image, decoding to 70% and then to 100% of the input dimension. We apply ReLU after each hidden layer, and tanh after the output layer.
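A hedged PyTorch sketch of the three networks as described; the hidden-layer widths for $G$ and $D_1$ are placeholders, since the exact configuration is given only in Figure 3:

```python
import torch.nn as nn

def make_generator(noise_dim, out_dim, hidden=(64, 64, 64)):
    """G: batch normalization and LeakyReLU(0.2) after each hidden
    layer, tanh at the output."""
    layers, d = [], noise_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.BatchNorm1d(h), nn.LeakyReLU(0.2)]
        d = h
    return nn.Sequential(*layers, nn.Linear(d, out_dim), nn.Tanh())

def make_d1(in_dim, hidden=(64, 64, 64)):
    """D1: LeakyReLU(0.2) and 10% dropout after each hidden layer,
    sigmoid output for the real/fake decision."""
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.LeakyReLU(0.2), nn.Dropout(0.1)]
        d = h
    return nn.Sequential(*layers, nn.Linear(d, 1), nn.Sigmoid())

def make_d2(in_dim):
    """D2: encode to 70% and then 50% of the input dimension, mirror
    decoder, ReLU on hidden layers and tanh on the output."""
    h1, h2 = int(0.7 * in_dim), int(0.5 * in_dim)
    return nn.Sequential(
        nn.Linear(in_dim, h1), nn.ReLU(),
        nn.Linear(h1, h2), nn.ReLU(),
        nn.Linear(h2, h1), nn.ReLU(),
        nn.Linear(h1, in_dim), nn.Tanh(),
    )
```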

Figure 4: Box-and-whisker plot of the differences in AUC between the various warm-up configurations and the baseline, across 30 different seeds

Results

Figure 5: Percentage of AUC improvement of the different warm-up configurations relative to the baseline, averaged over 30 different seeds

Tables 2-4 and Figures 4 and 5 present the performance of MDGAN according to the three computed performance measures. In each table we present the relative improvement (averaged over the 30 experiments) of the MDGAN autoencoder over the baseline autoencoder.

From the results it can be observed that in seven of the ten datasets, the MDGAN autoencoder outperformed the baseline autoencoder, with the difference being statistically significant in all but one of these datasets.

Another interesting (and perhaps counter-intuitive) observation is that the warm-up period does not always improve the performance of the MDGAN autoencoder (and sometimes even leads to a degradation in performance). From our analysis of the data we conclude that the samples generated during the warm-up period are too similar to the real samples, thus reducing the generator's ability to generate "unexpected" (and valuable) samples that would make the autoencoder more robust. See Figure 5 for a comparison of the warm-up strategies.

Finally, our results lead us to conclude that the size of the training set does not have a clear effect on the performance of our approach (relative to the baseline). However, the complexity of the problem (i.e., the number of features per sample) does have a clear effect on the results: the two datasets for which MDGAN fails to improve are those with the smallest number of features. This leads us to conclude that our approach is better suited for high-dimensional spaces, where the generation of additional samples is a more challenging task.

Figure 6: $D_1$ loss on generated samples during training, for the best- and worst-performing runs; Ailerons (top), Breast Cancer (bottom). It can be seen that the top-performing architectures have higher loss values.
Figure 7: $G$ loss on $D_2$ during training, for the best- and worst-performing runs; Pendigits (top), CPU (bottom). It is clear that the loss converges more slowly for the top-performing architectures.

Discussion

We analyzed the performance of the various components (i.e., the generator and the two discriminators) and were able to draw the following conclusions.

Early "peaking" is indicative of lower performance. For each of our datasets, we compared the two runs with the best and worst results for the "no warm-up" configuration. In six out of the ten datasets, the run that achieved the lower AUC score terminated considerably earlier (3.5 epochs earlier, on average) when using the validation performance as a stopping criterion. In the other four datasets, both the best and worst runs were trained for the full 30 epochs (the maximal number).

We hypothesize that these results reflect the fact that our MDGAN architecture requires a longer training time to perform well, because it has to satisfy a larger number of constraints than a "standard" GAN architecture. A short training time may be indicative of the GAN producing samples that do not contribute to the training process.

The dense discriminator functions as a regulator. One of the base hypotheses behind MDGAN is that the dense discriminator network ($D_1$) will assist in guiding the generator toward generating more "valid" samples. In order to test this hypothesis, we compared the loss function of $D_1$ on the generated samples only, for the best- and worst-performing runs in each dataset. Our results show a direct correlation between higher loss values for $D_1$ and improved overall performance. An example of this is presented in Figure 6.

Slower generator-autoencoder convergence is indicative of better results. We once again compare the best and worst performance for each dataset, but this time we focus on the generator loss with respect to each discriminator. Our analysis shows that while in most cases the loss with respect to $D_1$ (the dense DNN) is similar for the best and worst cases, the top-performing scenarios often displayed a higher initial generator loss when training against the autoencoder. Moreover, the loss reduction was noticeably slower. An example of this scenario is presented in Figure 7.

We argue that these results show that our model deploys a variant of adversarial training, in the sense that MDGAN performs better when the autoencoder has greater difficulty reconstructing the generated samples. We find it likely that, as a result, the autoencoder becomes better at reconstructing previously-unseen types of samples.

Conclusions and Future Work

In this study we have presented MDGAN, a novel multi-discriminator GAN approach for anomaly detection. Our architecture enables the GAN to reconcile two conflicting aims: 1) generating sophisticated samples that can pass as "real" instances of the dataset, and 2) creating instances that can be accurately reconstructed by an autoencoder. Using our approach, we have been able to improve performance across a variety of datasets.

For future work, we plan to pursue several directions. First, we will experiment with more advanced training strategies, dynamically allocating different weights to each discriminator. Second, we will explore architectures with a larger number of discriminators, in an attempt to reconcile a more complex set of constraints. Third, we will evaluate our method in the context of generating adversarial samples. Finally, we plan to apply our approach to additional domains.

References

  • [Abe, Zadrozny, and Langford2006] Abe, N.; Zadrozny, B.; and Langford, J. 2006. Outlier detection by active learning. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 504–509. ACM.
  • [Bengio et al.2013] Bengio, Y.; Yao, L.; Alain, G.; and Vincent, P. 2013. Generalized denoising auto-encoders as generative models. In Advances in Neural Information Processing Systems, 899–907.
  • [Chandola, Banerjee, and Kumar2009] Chandola, V.; Banerjee, A.; and Kumar, V. 2009. Anomaly detection: A survey. ACM computing surveys (CSUR) 41(3):15.
  • [Dheeru and Karra Taniskidou2017] Dheeru, D., and Karra Taniskidou, E. 2017. UCI machine learning repository.
  • [Egilmez and Ortega2014] Egilmez, H. E., and Ortega, A. 2014. Spectral anomaly detection using graph-based filtering for wireless sensor networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, 1085–1089. IEEE.
  • [Erfani et al.2016] Erfani, S. M.; Rajasegarar, S.; Karunasekera, S.; and Leckie, C. 2016. High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning. Pattern Recognition 58:121–134.
  • [Fan et al.2018] Fan, Y.; Wen, G.; Li, D.; Qiu, S.; and Levine, M. D. 2018. Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder. arXiv preprint arXiv:1805.11223.
  • [Frid-Adar et al.2018] Frid-Adar, M.; Klang, E.; Amitai, M.; Goldberger, J.; and Greenspan, H. 2018. Synthetic data augmentation using gan for improved liver lesion classification. In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, 289–293. IEEE.
  • [Goh et al.2016] Goh, J.; Adepu, S.; Junejo, K. N.; and Mathur, A. 2016. A dataset to support research in the design of secure water treatment systems. In International Conference on Critical Information Infrastructures Security, 88–99. Springer.
  • [Goodfellow et al.2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in neural information processing systems, 2672–2680.
  • [Hinton and Salakhutdinov2006] Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. science 313(5786):504–507.
  • [James and Dasarathy2014] James, A. P., and Dasarathy, B. V. 2014. Medical image fusion: A survey of the state of the art. Information Fusion 19:4–19.
  • [Keller, Muller, and Bohm2012] Keller, F.; Muller, E.; and Bohm, K. 2012. Hics: high contrast subspaces for density-based outlier ranking. In Data Engineering (ICDE), 2012 IEEE 28th International Conference on, 1037–1048. IEEE.
  • [Krizhevsky, Sutskever, and Hinton2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 1097–1105.
  • [Kuypers, Maillart, and Pate-Cornell2016] Kuypers, M. A.; Maillart, T.; and Pate-Cornell, E. 2016. An empirical analysis of cyber security incidents at a large organization. Department of Management Science and Engineering, Stanford University, School of Information, UC Berkeley, http://fsi.stanford.edu/sites/default/files/kuypersweis_v7.pdf, accessed July 30.
  • [Kwon et al.2017] Kwon, D.; Kim, H.; Kim, J.; Suh, S. C.; Kim, I.; and Kim, K. J. 2017. A survey of deep learning-based network anomaly detection. Cluster Computing 1–13.
  • [Lee, Han, and Lee2017] Lee, H.; Han, S.; and Lee, J. 2017. Generative adversarial trainer: Defense to adversarial perturbations with gan. arXiv preprint arXiv:1705.03387.
  • [Lemley, Bazrafkan, and Corcoran2017] Lemley, J.; Bazrafkan, S.; and Corcoran, P. 2017. Smart augmentation learning an optimal data augmentation strategy. IEEE Access 5:5858–5869.
  • [Liu, Ting, and Zhou2008] Liu, F. T.; Ting, K. M.; and Zhou, Z.-H. 2008. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, 413–422. IEEE.
  • [Lu et al.2013] Lu, X.; Tsao, Y.; Matsuda, S.; and Hori, C. 2013. Speech enhancement based on deep denoising autoencoder. In Interspeech, 436–440.
  • [Mangasarian1990] Mangasarian, O. L. 1990. Cancer diagnosis via linear programming. SIAM news 23(5):1–18.
  • [Meidan et al.2018] Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Breitenbacher, D.; Shabtai, A.; and Elovici, Y. 2018. N-baiot: Network-based detection of iot botnet attacks using deep autoencoders. arXiv preprint arXiv:1805.03409.
  • [Mescheder, Nowozin, and Geiger2017] Mescheder, L.; Nowozin, S.; and Geiger, A. 2017. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. arXiv preprint arXiv:1701.04722.
  • [Mirsky et al.2018] Mirsky, Y.; Doitshman, T.; Elovici, Y.; and Shabtai, A. 2018. Kitsune: an ensemble of autoencoders for online network intrusion detection. arXiv preprint arXiv:1802.09089.
  • [Odena, Olah, and Shlens2016] Odena, A.; Olah, C.; and Shlens, J. 2016. Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585.
  • [Ravanbakhsh et al.2017] Ravanbakhsh, M.; Sangineto, E.; Nabi, M.; and Sebe, N. 2017. Training adversarial discriminators for cross-channel abnormal event detection in crowds. arXiv preprint arXiv:1706.07680.
  • [Revathi and Malathi2013] Revathi, S., and Malathi, A. 2013. A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection. International Journal of Engineering Research and Technology. ESRSA Publications.
  • [Sakurada and Yairi2014] Sakurada, M., and Yairi, T. 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis,  4. ACM.
  • [Schlegl et al.2017] Schlegl, T.; Seeböck, P.; Waldstein, S. M.; Schmidt-Erfurth, U.; and Langs, G. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, 146–157. Springer.
  • [Shrivastava et al.2017] Shrivastava, A.; Pfister, T.; Tuzel, O.; Susskind, J.; Wang, W.; and Webb, R. 2017. Learning from simulated and unsupervised images through adversarial training. In CVPR, volume 2,  5.
  • [Shyu et al.2003] Shyu, M.-L.; Chen, S.-C.; Sarinnapakorn, K.; and Chang, L. 2003. A novel anomaly detection scheme based on principal component classifier. Technical report, MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING.
  • [Van Vlasselaer et al.2015] Van Vlasselaer, V.; Bravo, C.; Caelen, O.; Eliassi-Rad, T.; Akoglu, L.; Snoeck, M.; and Baesens, B. 2015. Apate: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems 75:38–48.
  • [Wei et al.2018] Wei, Q.; Ren, Y.; Hou, R.; Shi, B.; Lo, J. Y.; and Carin, L. 2018. Anomaly detection for medical images based on a one-class classification. In Medical Imaging 2018: Computer-Aided Diagnosis, volume 10575, 105751M. International Society for Optics and Photonics.