1 Introduction
The ongoing technological advancements in medical imaging result in ever-increasing image quality and quantity in clinical, scientific and industrial settings, leading to a growing number of conditions that become detectable [29]. Currently, most medical image data is inspected manually by trained physicians, which is time- and resource-consuming and does not scale well. Furthermore, while medical experts have a high sensitivity to the specific condition in question, they are vulnerable to inattentional blindness, leading to high miss rates for unexpected anomalies and conditions [12]. Missing an (incidental) finding can have grave consequences for the patient and prevent the early detection of relevant medical conditions [8]. Machine-learning-based support systems might be able to alleviate this problem, but usually require a large annotated dataset for every condition and modality. This is a major drawback that currently hampers the application of machine learning in clinical practice. Moreover, this approach still fails on conditions not explicitly represented in the training database. Anomaly detection aims at identifying unexpected, abnormal data points given only a set of normal data samples, thus highlighting interesting regions for further manual inspection. Importantly, it does not require supervision in the form of manual annotations, is independent of human judgment errors and, instead, automatically internalizes the appearance of normal tissue to recognize anomalies.
1.0.1 Contribution
In this paper, we present a novel anomaly detection method that can be used to identify and localize abnormal regions in medical images. Our contributions are: (i) we show how to combine a Context Encoder [19] with a Variational Autoencoder [14, 21] to improve anomaly scores; (ii) to the best of our knowledge, we are the first to include the deviations (KL-divergence) of the posterior from the prior of the latent variable distributions in a Variational Autoencoder for pixel-wise anomaly localization; (iii) we fuse the deviations from the prior in a Variational Autoencoder with the reconstruction error to improve the localization; (iv) with this approach we are able to outperform the state-of-the-art unsupervised approaches on two public segmentation challenges [11].
2 Related Work
2.1 Autoencoders
An Autoencoder (AE) is trained to reconstruct its input from a learned representation [6]. It consists of two parts: an encoder $f$, which encodes the input $x$ to a learned feature representation $z = f(x)$, and a decoder $g$, which attempts to recapture the original input by decoding the representation. For a deep convolutional AE, the encoder and decoder are each modeled as deep convolutional networks $f_\theta$ and $g_\phi$ with parameters $\theta$ and $\phi$, respectively. Hence, the training of a deep AE can be formalized as:

$\min_{\theta, \phi} L_{rec}\left(x, g_\phi(f_\theta(x))\right)$   (1)
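As a concrete, deliberately minimal illustration of Eq. (1), the following NumPy sketch trains a tiny linear AE by gradient descent on a mean-squared reconstruction error; the dimensions and variable names are illustrative assumptions, not the architecture used later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples that lie in a 3-dimensional subspace of an 8-dimensional space.
basis = rng.normal(size=(3, 8))
x = rng.normal(size=(200, 3)) @ basis

# Deliberately tiny linear encoder f and decoder g standing in for deep conv nets.
w_enc = rng.normal(scale=0.1, size=(8, 3))
w_dec = rng.normal(scale=0.1, size=(3, 8))

def recon_loss(w_enc, w_dec):
    x_hat = x @ w_enc @ w_dec          # g(f(x))
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

initial_loss = recon_loss(w_enc, w_dec)
lr = 0.01
for _ in range(2000):
    z = x @ w_enc                      # encode
    err = z @ w_dec - x                # reconstruction residual
    grad_dec = z.T @ err / len(x)      # d L_rec / d w_dec (up to a constant factor)
    grad_enc = x.T @ (err @ w_dec.T) / len(x)
    w_dec -= lr * grad_dec
    w_enc -= lr * grad_enc

final_loss = recon_loss(w_enc, w_dec)  # far below initial_loss after training
```

Since the data is confined to a 3-dimensional subspace and the bottleneck has 3 dimensions, the reconstruction error can be driven close to zero.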
A common choice for the reconstruction error $L_{rec}$ is the mean-squared error (MSE), $L_{rec}(x, \hat{x}) = \| x - \hat{x} \|_2^2$. To reconstruct the image truthfully, the encoder has to encode the information of the input into the feature vector $z$. To learn more suitable representations, different variations of the AE have been proposed [22, 30]:

2.1.1 Denoising Autoencoder
A Denoising Autoencoder (DAE) is trained to reconstruct the unperturbed data sample $x$ from an input $\tilde{x}$ that has been subjected to noise. This results in more robust and perturbation-invariant representations [30]. The most commonly used noise is additive Gaussian noise, i.e. $\tilde{x} = x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ for a small $\sigma$. Thus, the encoder input $x$ in Eq. (1) becomes $\tilde{x}$, while the reconstruction target remains $x$. Context Encoders (CEs) are a special class of DAEs where, instead of the commonly used additive Gaussian noise, local patches of the input are masked out. This can be interpreted as a variation of salt-and-pepper noise and was shown to result in better-generalizing representations, which in addition to appearance also capture semantic information of the input [19].
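The masking noise of a CE can be sketched as follows; the square counts, sizes, and the exact filling scheme are assumptions for illustration and not necessarily the settings used in any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def context_mask(x, n_squares=(1, 3), max_size=16):
    """Mask out randomly sized and positioned squares, filling them with
    values sampled from the image itself (a stand-in for drawing from the
    data distribution; the filling scheme here is an assumption)."""
    x_noisy = x.copy()
    h, w = x.shape
    for _ in range(rng.integers(n_squares[0], n_squares[1] + 1)):
        sh, sw = rng.integers(2, max_size + 1, size=2)       # square size
        top, left = rng.integers(0, h - sh), rng.integers(0, w - sw)
        fill = rng.choice(x.ravel(), size=(sh, sw))          # random fill values
        x_noisy[top:top + sh, left:left + sw] = fill
    return x_noisy

x = rng.normal(size=(64, 64))
x_tilde = context_mask(x)
# The DAE/CE objective then reconstructs the clean x from x_tilde:
# L_rec(x, g(f(x_tilde)))
```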
2.1.2 Variational Autoencoders
A Variational Autoencoder (VAE) [14, 21] assumes a latent variable model in which a latent variable $z$ causes the observation $x$, facilitating a lower bound on the probability of a data sample $x$ with

$\log p(x) \geq \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - D_{KL}\left(q(z|x) \,\|\, p(z)\right)$   (2)

which is often termed the Evidence Lower Bound (ELBO). Here $p(z)$ is the prior distribution of the latent variable, $q(z|x)$ is the approximate inference model and $p(x|z)$ is the generative model. By maximizing the ELBO, the probability distribution approximates the true data distribution and enables a probability estimate for a data sample. VAEs parameterize $q(z|x)$ and $p(x|z)$ by neural networks and assume diagonal Gaussian distributions for both:

$q(z|x) = \mathcal{N}\left(z;\, \mu_z(x),\, \sigma_z(x)^2 I\right), \quad p(x|z) = \mathcal{N}\left(x;\, \mu_x(z),\, \sigma_x^2 I\right), \quad p(z) = \mathcal{N}(z;\, 0, I)$   (3)

where $\mu_z$, $\sigma_z$, and $\mu_x$ are neural networks with parameters $\theta$ and $\phi$ respectively, and $\sigma_x$ is often chosen as constant. In analogy to AEs, $q(z|x)$ is called the encoder and $p(x|z)$ is called the decoder. The often-used formulation for VAE training is to minimize:

$\mathcal{L}_{VAE} = L_{rec}\left(x, \mu_x(z)\right) + D_{KL}\left(q(z|x) \,\|\, p(z)\right)$   (4)

with $z$ being sampled from $q(z|x)$ using the reparametrization trick [14, 21], and the MSE being chosen for $L_{rec}$.
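The two ingredients of the VAE objective in Eq. (4), the reparametrization trick and the closed-form KL-term for diagonal Gaussians against a standard Gaussian prior, can be sketched in NumPy as follows; the encoder and decoder outputs are stand-in arrays here, not real networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparametrize(mu, log_sigma):
    # z = mu + sigma * eps with eps ~ N(0, I); gradients can flow through mu, sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions:
    # 0.5 * sum( sigma^2 + mu^2 - 1 - log sigma^2 )
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu ** 2 - 1 - 2 * log_sigma, axis=-1)

# Sketch of Eq. (4) for one sample: the encoder would output (mu, log_sigma),
# the decoder would reconstruct x from z; both are placeholders here.
mu, log_sigma = np.array([0.5, -0.2]), np.array([-0.1, 0.3])
z = reparametrize(mu, log_sigma)
x, x_hat = np.ones(4), np.full(4, 0.9)   # dummy input and reconstruction
loss = np.sum((x - x_hat) ** 2) + kl_to_standard_normal(mu, log_sigma)
```

The KL-term vanishes exactly when the posterior matches the prior ($\mu = 0$, $\sigma = 1$) and is non-negative otherwise.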
2.2 Anomaly detection
2.2.1 Classification-based methods
are one class of unsupervised anomaly detection methods. A prominent example is the One-Class Support Vector Machine (OC-SVM) [24]. The OC-SVM finds a decision boundary between the data features and the origin in feature space to differentiate normal data from abnormal data.

2.2.2 Reconstruction-based methods
aim at truthfully reconstructing normal data samples while producing high reconstruction errors for abnormal data. Compared to Principal Component Analysis (PCA)-based reconstruction methods [25], AE-based reconstruction methods can better handle non-linear relations in the data. Reconstruction-based approaches are used almost exclusively in medical imaging, since they allow a pixel-wise anomaly detection and can delineate the pathological conditions. Schlegl et al. [23] use a generative adversarial network (GAN)-based method to estimate an anomaly score. Based on the assumption that a fully trained GAN can only produce samples from the learned data distribution, they use an iterative backpropagation algorithm to find the closest match to the sample of interest that the trained GAN can produce. The anomaly score is then derived from the similarity of the real and the generated sample. Different AEs have been employed for anomaly detection in brain images. Baur et al. [7] employed VAEs and used the reconstruction error for localization of MS lesions on an in-house MRI dataset. In their experiments, a VAE with an adversarial reconstruction loss slightly outperformed a vanilla VAE. Chen et al. [10, 11] show that a combination of a VAE with an adversarial loss on the latent variables can boost performance in detecting brain tumors in the BraTS 2015 MRI dataset [5] using a pixel-wise reconstruction error. Pawlowski et al. [20] train different AEs on an in-house brain CT dataset with intracranial hemorrhages and traumatic brain injuries. Similar to the studies above, they consider the pixel-wise reconstruction error of different AE models for pixel-wise anomaly detection. In their evaluation, an AE with dropout sampling in the bottleneck layer slightly outperforms the other models. Despite their frequent use, most reconstruction-based methods offer no formal assertions regarding the reconstruction error, complicating the interpretation and comparability of anomaly scores.
A more theoretically grounded improvement is given by Alain et al. [2], who show that the denoising task in DAEs can lead to reconstruction errors that approximate the local derivative of the log-density with respect to the input, $\nabla_x \log p(x)$. Consequently, the global reconstruction error for a whole sample reflects the norm of this derivative. While this “direction to normality” can yield important clues, it is still not the probability of the data sample itself, which poses challenges for a sample-wise comparable and well-calibrated anomaly score. Density-based models offer a solution to this problem.

2.2.3 Density-based methods
give a probability estimate for each data sample, allowing for a straightforward normality scoring and ordering. This class can be further split into parametric and non-parametric algorithms, among others. The non-parametric approaches, such as neighborhood-based and clustering-based methods, estimate the data density locally and assign an anomaly score based on the probability of a new data sample [13]. Parametric approaches assume a data distribution and fit the distribution parameters to the data. Due to the “curse of dimensionality” [4], these methods, similar to the OC-SVM and PCA, often struggle in high-dimensional data settings [13]. VAEs [14, 21] are able to alleviate this problem [4], allowing abnormality scores to be estimated on the basis of the evidence lower bound for a data sample [15]. Current anomaly detection methods in the literature, however, employ VAEs for reconstruction [3, 7, 10, 20] and still use only the reconstruction error for anomaly scoring, thus ignoring an essential part of the model. Moreover, to our knowledge, density-based approaches have not been explicitly applied to medical imaging, presumably since they do not directly give an anomaly score on a pixel level.

2.2.4 Problem Statement
Most medical imaging anomaly detection methods are based on the reconstruction error, mostly employing AE variants for reconstruction. However, for AE-based models, the reconstruction error falls short in two essential respects. First, considering only the reconstruction error ignores all model-internal variations, such as deviations of the latent representations from their normal ranges, which can indicate an anomaly, especially in the case of a perfect reconstruction. Second, the reconstruction error on its own in most cases has no formal assertion and no theory-backed validity, rendering it unsuited as a well-calibrated and comparable anomaly score.
3 Methods
To alleviate the mentioned shortcomings, we present a novel anomaly detection method: the Context-encoding Variational Autoencoder (ceVAE). By combining a CE and a VAE, we strive to use the model-internal latent representation deviations together with a more expressive reconstruction error for anomaly detection on a sample as well as on a pixel level. We define the ceVAE with fully convolutional encoders $f_{\mu}$, $f_{\sigma}$ and a decoder $g$, where the CE branch only uses the mean encoder $f_{\mu}$ to encode a data sample ($f_{\mu}$ and $f_{\sigma}$ share most of their weights [14], see Fig. 1).
3.0.1 CE branch
We subject a sample $x$ to context-encoding noise by masking certain regions in the input (randomly sized and positioned), yielding $\tilde{x}$. The CE branch is trained by reconstructing the perturbed input, using $f_{\mu}$ as encoder and $g$ as decoder: $\mathcal{L}_{CE} = L_{rec}\left(x, g(f_{\mu}(\tilde{x}))\right)$. As described in Sec. 2.2.2, the denoising task is expected to gear the reconstruction error towards an approximation of the derivative of the log-density with respect to the input, $\nabla_x \log p(x)$. This, on its own, could be helpful for detecting anomalous parts in a data sample, since it can yield better calibrated and interpretable reconstruction errors [2]. At the same time, CEs have been shown to result in more discriminative, semantically richer representations [19]. This is expected to have a positive influence on the expressiveness of model-internal variations. Such deviations of the latent representation from its mean can be analyzed in the VAE branch:
3.0.2 VAE branch
We use the VAE framework to inspect deviations of the latent representation from its prior. Here, we use the encoders $f_{\mu}$, $f_{\sigma}$, the decoder $g$, and a standard diagonal Gaussian prior $p(z) = \mathcal{N}(0, I)$, resulting in the VAE objective

$\mathcal{L}_{VAE} = L_{rec}\left(x, g(z)\right) + D_{KL}\left(q(z|x) \,\|\, p(z)\right)$   (5)

where $z = f_{\mu}(x) + f_{\sigma}(x) \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, using the reparametrization trick, and $D_{KL}\left(q(z|x) \,\|\, p(z)\right)$ is the Kullback-Leibler divergence loss (KL-loss) against a standard Gaussian, as in Eq. (4). This density estimation of VAEs is designed to yield a comparable per-sample likelihood estimate and thus a comparable anomaly score. To analyze the deviations of the posterior from the prior of the latent variable distributions, we use the KL-loss. Below, we show how to trace these deviations back to the pixel level to complement the reconstruction-based delineation of anomalous parts in a data sample.

3.0.3 ceVAE
By combining CEs and VAEs, we aim at capturing both effects, namely a better-calibrated reconstruction error and model-internal variations, to yield more complete estimates of abnormality for each data sample as well as for the different parts of the sample. The combined objective function is consequently given as:
$\mathcal{L}_{ceVAE} = L_{rec}\left(x, g(z)\right) + D_{KL}\left(q(z|x) \,\|\, p(z)\right) + L_{rec}\left(x, g(f_{\mu}(\tilde{x}))\right)$   (6)
where $D_{KL}\left(q(z|x) \,\|\, p(z)\right)$ is the KL-loss, $z$ is sampled using the reparametrization trick, and $\tilde{x}$ is the input perturbed by masking out regions as in CEs. During training, the CE part of the objective does not put constraints for normality on the prior belief over the latent code of the perturbed input $\tilde{x}$. This is essential to prevent the model from deeming such perturbed cases as ‘normal’. Furthermore, the combination of a CE and a VAE can have a regularizing effect, prevent posterior collapse of the VAE and, due to the CE, lead to representations that better capture the semantics of the data [19].
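A minimal sketch of the combined objective, with tiny linear stand-ins for $f_{\mu}$, $f_{\sigma}$, and $g$ (the actual model is fully convolutional; all dimensions and the masking here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 16, 4

# Tiny linear stand-ins for f_mu, f_sigma and g.
w_mu = rng.normal(scale=0.1, size=(d_x, d_z))
w_logsig = rng.normal(scale=0.1, size=(d_x, d_z))
w_dec = rng.normal(scale=0.1, size=(d_z, d_x))

f_mu = lambda x: x @ w_mu
f_logsig = lambda x: x @ w_logsig
g = lambda z: z @ w_dec

def cevae_loss(x, x_tilde):
    # VAE part: reparametrized sample and KL to the standard Gaussian prior.
    mu, log_sig = f_mu(x), f_logsig(x)
    z = mu + np.exp(log_sig) * rng.standard_normal(d_z)
    l_rec_vae = np.sum((x - g(z)) ** 2)
    kl = 0.5 * np.sum(np.exp(2 * log_sig) + mu ** 2 - 1 - 2 * log_sig)
    # CE part: reconstruct the clean x from the masked input via the mean encoder.
    l_rec_ce = np.sum((x - g(f_mu(x_tilde))) ** 2)
    return l_rec_vae + kl + l_rec_ce

x = rng.normal(size=d_x)
x_tilde = x.copy()
x_tilde[4:8] = rng.normal(size=4)   # "masked" region filled with random values
loss = cevae_loss(x, x_tilde)       # finite, non-negative scalar
```

Note that the CE term reuses $f_{\mu}$ and $g$, so no extra parameters are introduced by the combination.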
3.0.4 Anomaly detection
We can use the ceVAE to detect anomalies on a sample and on a pixel level. After maximizing the ELBO, similar to a VAE, we can estimate the probability of a data sample by evaluating the ELBO for that sample, which can give a well-calibrated anomaly score. Thus the sample-wise anomaly score is given as:
$A_{sample}(x) = L_{rec}\left(x, g(z)\right) + D_{KL}\left(q(z|x) \,\|\, p(z)\right)$   (7)
Simultaneously, to localize abnormal parts in the data sample, we combine the reconstruction-based and density-based pixel-wise anomaly scores. The reconstruction-based score is given by the reconstruction error which, due to the denoising task, is geared towards the derivative of the log-density with respect to the input. The density-based score is given by a pixel-wise backtracing of the latent variable deviations from the prior, which is calculated by backpropagating the approximated ELBO onto the input. This combination results in a more complete estimate of $\nabla_x \log p(x)$, thus outlining the “direction towards normality” for each pixel. Using an element-wise function $C$ to combine the scores, e.g. pixel-wise multiplication, the pixel-wise anomaly score is defined as:
$A_{pixel}(x) = C\left(\left|x - g(z)\right|,\ \left|\nabla_x \mathcal{L}_{VAE}(x)\right|\right)$   (8)
where the reconstruction error $\left|x - g(z)\right|$ is the absolute pixel-wise difference, and the pixel-wise derivative $\nabla_x \mathcal{L}_{VAE}(x)$ is calculated by backpropagating the approximated ELBO onto the data sample.
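To make the backtracing concrete, the following sketch computes the score combination of Eq. (8) for a linear toy model, where the gradient of the KL-loss with respect to the input is available in closed form; for simplicity it backtraces only the KL-term, and the linear encoders/decoder are illustrative stand-ins, not the deep model (which uses smoothed guided backpropagation instead of a closed-form gradient).

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 64, 8

w_mu = rng.normal(scale=0.1, size=(d_x, d_z))    # linear stand-in for f_mu
w_ls = rng.normal(scale=0.05, size=(d_x, d_z))   # linear stand-in for f_sigma (log)
w_dec = rng.normal(scale=0.1, size=(d_z, d_x))   # linear stand-in for g

x = rng.normal(size=d_x)
mu, log_sig = x @ w_mu, x @ w_ls
x_hat = mu @ w_dec                                # reconstruction from the mean code

# KL(q(z|x) || N(0, I)) and its gradient w.r.t. the input, in closed form
# for this linear toy model.
kl = 0.5 * np.sum(np.exp(2 * log_sig) + mu ** 2 - 1 - 2 * log_sig)
grad_kl = w_mu @ mu + w_ls @ (np.exp(2 * log_sig) - 1)

# Eq. (8): combine the two pixel-wise terms by element-wise multiplication.
score = np.abs(x - x_hat) * np.abs(grad_kl)
```

Pixels only receive a high score when both the reconstruction error and the backtraced prior deviation are large, which is the intended complementarity of the two terms.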
4 Experiments
4.0.1 Data
We used T2-weighted images from three different brain MRI datasets. The model was trained on the HCP dataset [28] to learn the distribution of healthy subjects. After training, the model was tested on detecting anomalies in the BraTS2017 [5] and ISLES2015 [17] datasets. The HCP dataset, the only dataset used for training, was split into 1092 patients for training and 20 for validation, i.e. 136,576 and 2,496 slices respectively. The BraTS2017 dataset was split into 20 patients for validation and 266 for testing, and the ISLES2015 dataset was split into 8 patients for validation and 20 for testing. Each dataset was preprocessed similarly, with a patient-wise z-score normalization and slice-wise resampling to a fixed in-plane resolution. During training, we used random mirroring, rotations, and multiplicative brightness augmentations, and the validation data was used to prevent overfitting and to choose the best-performing model for testing.

4.0.2 Model
For the encoder and decoder networks, we chose fully convolutional networks with five 2D convolution layers and five 2D transposed-convolution layers respectively, with CoordConv [16], kernel size 4 and stride 2, each layer followed by a LeakyReLU non-linearity. The encoder and decoder are symmetric with 16, 64, 256, 1024 feature maps and a latent variable size of 1024. Similar to Kingma et al. [14], the encoders share weights, with the last layer having two heads, one predicting the mean and one predicting the log standard deviation. Since it showed similar performance and produced visually slightly sharper images, we chose the L1 loss instead of the MSE/L2 loss as reconstruction loss. Due to the different value ranges of the reconstruction error and the backpropagated values, the combination function $C$ was chosen as element-wise multiplication. To calculate the gradients, we used the smoothed guided-backpropagation algorithm [26, 27] and smoothed the gradient with a Gaussian kernel before multiplication because of checkerboard artifacts caused by the convolution layers [18]. Since backpropagating the reconstruction loss showed no additional benefit and only slowed down the gradient calculation, we only backpropagate the KL-loss to the image. For the CE noise we chose 1–3 randomly sized and positioned squares, but in contrast to Pathak et al. [19] we filled them with random values from the data distribution. This makes the task of correcting the noise slightly harder and is conceptually more akin to DAEs with Gaussian noise. We used Adam and trained the model with a batch size of 64 for 60 epochs.
4.0.3 Benchmark Methods
We compare the proposed model with an OC-SVM and different AE-based methods, which have shown state-of-the-art performance on similar tasks [7, 10, 11, 20]. The OC-SVM was based on the libsvm implementation [9]. For the AE-based methods, we used a standard AE, a DAE, a CE, and a VAE, all using the same model structure and training scheme as the ceVAE. To further inspect the benefits of combining the CE and VAE, we introduced a ceVAE weighting factor, termed ceVAE-factor, which indicates the ratio of the CE loss (the term $L_{rec}\left(x, g(f_{\mu}(\tilde{x}))\right)$ in Eq. (6)) to the VAE loss (the remaining terms in Eq. (6)). A ratio of 0 implies that the model was trained as a VAE only, a ratio of 1 implies that the model was trained as a CE only, and the other ratios are differently weighted ceVAE models.
4.0.4 Evaluation Metrics
We separately evaluated the slice/sample-wise performance and the pixel-wise performance. For the slice-wise evaluation we divided each patient into normal and abnormal slices, depending on the presence of annotations in the slice. Using the estimated sample-wise anomaly score from Eq. (7), we evaluated the algorithm on the task of discriminating between normal and abnormal slices and report the ROC-AUC. For the pixel-wise evaluation, using the pixel-wise anomaly score given by Eq. (8), we determined the pixel-wise ROC-AUC and the mean patient-wise Dice score. The Dice score is calculated with a 5-fold cross-validation, where we use four of the five folds of the patient samples to determine an anomaly threshold, apply it to the remaining data samples to determine a segmentation, and calculate the mean of the patient-wise Dice scores. As anomaly labels, the ground-truth annotations were used, considering all annotations as anomalies. For each model, we performed five runs and report the median as well as the max and min performance.
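The threshold selection step of the Dice evaluation can be sketched as follows on synthetic scores; the candidate grid, toy score distributions, and the single train/apply split shown here are assumptions for illustration (the actual evaluation uses patient-wise 5-fold cross-validation).

```python
import numpy as np

rng = np.random.default_rng(0)

def dice(pred, target):
    # Dice overlap between two binary masks.
    inter = np.sum(pred & target)
    denom = np.sum(pred) + np.sum(target)
    return 2 * inter / denom if denom > 0 else 1.0

def best_threshold(scores, labels, candidates):
    # Pick the anomaly-score threshold that maximizes Dice on held-out data.
    return max(candidates, key=lambda t: dice(scores > t, labels))

# Toy pixel scores: anomalous pixels tend to score higher than normal ones.
labels = rng.random(5000) < 0.1
scores = rng.normal(loc=np.where(labels, 2.0, 0.0))

t = best_threshold(scores, labels, np.linspace(scores.min(), scores.max(), 50))
segmentation = scores > t   # apply t to (here: the same toy) data samples
```

In the actual evaluation, the threshold is determined on four folds and applied to the held-out fold, and the resulting patient-wise Dice scores are averaged.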
5 Results
Given the proposed framework, we first evaluated the effect of combining a CE with a VAE for slice-wise anomaly detection. This is followed by an evaluation of the benefits of combining the reconstruction error with the gradient of the KL-loss for a pixel-wise detection.
5.0.1 Slice-wise detection
First, we compared the performance of the different approaches on the slice-wise anomaly detection task. Fig. 2 shows the performance of the different methods on the BraTS2017 dataset. As often reported, the OC-SVM had difficulties with the structured and high-dimensional data [13]. An AE outperformed the OC-SVM on this task and could be further improved upon by using an auxiliary denoising task, where context encoding appeared to be the better fit in this case. Using a standard VAE further improved the performance, while the ceVAE outperformed all other methods by a margin.
5.0.2 Pixel-wise detection
For the pixel-wise performance we focused on the CE, VAE, and ceVAE, since these were the best-performing models in the slice-wise task and since VAEs have become a de-facto standard in anomaly detection for images [7, 11, 15]. We report the pixel-wise ROC-AUC and Dice scores on the BraTS2017 and ISLES2015 datasets in Fig. 3.
Results for the non-combined methods were as expected: the CE performed best when using solely the reconstruction error (first argument of $C$, Eq. (8)), while the VAE performed best when using solely the gradient of the KL-loss (second argument of $C$, Eq. (8)), outperforming the CE. The ceVAE (the combination of CE and VAE) outperformed the non-combined methods in all cases, while a combination of the reconstruction error and the KL-loss gradient yielded the best results throughout the experiments. Focusing on the reconstruction error only, it is interesting to note that combining a VAE with a CE already shows benefits, possibly due to the regularizing effects described in Sec. 3. It is also important to note the difference in absolute performance on the different datasets. One probable explanation is the difference in dataset quality and thus in the data distribution to start with. For each dataset, we show some qualitative results in Fig. 4.
6 Discussion & Conclusion
In this work we present the ceVAE for unsupervised anomaly detection, combining CEs with VAEs for unsupervised training, detection, and localization of anomalies in medical images. We demonstrate the performance gain over the individual approaches and outperform all presented baselines as well as the results in the literature [10, 11]. We further show how the approach can be used for a pixel-wise localization of the anomalies, achieving state-of-the-art ROC-AUCs for unsupervised segmentation on public benchmark data.
Evaluating the performance of an anomaly detection algorithm is a challenging problem. Since there is no reference anomaly detection dataset in the field, surrogate datasets are used. Not all anomalies in those datasets might be labeled; thus, the performance on those datasets may lower-bound the actual performance. Domain shifts between the different datasets can also obstruct the evaluation. In the HCP training dataset, the subjects are healthy, aged 25–35 years, all recorded on the same scanner type with a high spatial resolution. In contrast, in the BraTS2017 and ISLES2015 test datasets, most patients are older, and different scanners across multiple institutions with varying image quality were used. This results in two additional distribution shifts, age and image quality, which can cause additional misdetections. This is especially evident in the ISLES2015 dataset, where the image quality is quite low, potentially explaining the low absolute scores in the results.
Despite these challenges, the proposed approach yields relatively strong results for unsupervised segmentation and outperforms other state-of-the-art methods on the given datasets [10, 11]. We evaluated different parameter settings and design choices. Adding more layers, residual connections, different normalization layers, and/or using pixel reshuffling as a downsampling operation did not yield any significant benefits, and thus, for the sake of Occam's razor and training speed, we chose to keep our simple (“first educated guess”) model. Using 2.5D input, i.e. including some preceding and subsequent slices, did not show any significant benefits either, so we did not include it in the final models, but extending the work to 3D might be an interesting next step. Early results at higher resolutions showed a similar or slightly better performance (a full analysis is currently omitted due to time constraints). Despite the results discussed by Adebayo et al. [1], we could not find any model or output independence of the guided-backpropagation algorithm, and it slightly outperformed vanilla backpropagation. We also tried replacing/augmenting the KL-loss with an MMD loss or an adversarial loss, which were reported to slightly boost the performance [10], but while showing a minor improvement in the reconstruction error, the overall performance deteriorated due to higher-variance gradients. Using different reconstruction losses, such as the MSE, an adversarial loss, or a feature loss, despite making the reconstructions less blurry, did not show any significant performance benefits and was omitted due to the increased training time and unstable training regime. It might be an interesting future direction to see how different (perceptual) reconstruction losses can further boost the performance or interpretability. Another future direction of research might be to integrate sampling into the anomaly score estimation. Using a bigger sample size for the MC sampling of the VAE might give insights into areas where the learned data distribution is not well represented and thus indicates anomalies. Similarly, dropout sampling might be an alternative and could further aid the performance as well.
We have presented a combination of density-based and reconstruction-based anomaly detection approaches, which does not need labeled data and allows for both a sample-wise anomaly scoring and a localization of the anomalies. The results are promising and have the potential to improve and speed up the future inspection and evaluation of medical images, thus supporting physicians in coping with the increasing amounts of medical imaging data being produced.
References
 [1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity Checks for Saliency Maps (2018)
 [2] Alain, G., Bengio, Y.: What Regularized Autoencoders Learn from the Data-Generating Distribution. JMLR (2014)
 [3] An, J., Cho, S.: Variational Autoencoder based Anomaly Detection using Reconstruction Probability (2015)
 [4] Bach, F.R.: Breaking the Curse of Dimensionality with Convex Neural Networks. JMLR (2017)
 [5] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data (2017)
 [6] Ballard, D.H.: Modular Learning in Neural Networks. In: AAAI (1987)
 [7] Baur, C., Wiestler, B., Albarqouni, S., Navab, N.: Deep Autoencoding Models for Unsupervised Anomaly Segmentation in Brain MR Images. CoRR (2018)
 [8] Bluemke, D.A., Liu, S.: Chapter 41  Imaging in Clinical Trials. In: Principles and Practice of Clinical Research (Third Edition). Academic Press (2012)
 [9] Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM TIST (2011)
 [10] Chen, X., Konukoglu, E.: Unsupervised Detection of Lesions in Brain MRI using constrained adversarial autoencoders. CoRR (2018)
 [11] Chen, X., Pawlowski, N., Rajchl, M., Glocker, B., Konukoglu, E.: Deep Generative Models in the Real-World: An Open Challenge from Medical Imaging. CoRR (2018)
 [12] Drew, T., Vo, M.L.H., Wolfe, J.M.: The invisible gorilla strikes again: Sustained inattentional blindness in expert observers. Psychol Sci (2013)
 [13] Goldstein, M., Uchida, S.: A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE (2016)
 [14] Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. CoRR (2013)
 [15] Kiran, B.R., Thomas, D.M., Parakkal, R.: An Overview of Deep Learning Based Methods for Unsupervised and Semi-Supervised Anomaly Detection in Videos. Journal of Imaging (2018)
 [16] Liu, R., Lehman, J., Molino, P., Such, F.P., Frank, E., Sergeev, A., Yosinski, J.: An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution. CoRR (2018)
 [17] Maier, O., Menze, B.H., von der Gablentz, J., Häni, L., Heinrich, M.P., Liebrand, M., Winzeck, S., Basit, A., Bentley, P., Chen, L., Christiaens, D., Dutil, F., Egger, K., Feng, C., Glocker, B., Götz, M., Haeck, T., Halme, H.L., Havaei, M., Iftekharuddin, K.M., Jodoin, P.M., Kamnitsas, K., Kellner, E., Korvenoja, A., Larochelle, H., Ledig, C., Lee, J.H., Maes, F., Mahmood, Q., Maier-Hein, K.H., McKinley, R., Muschelli, J., Pal, C., Pei, L., Rangarajan, J.R., Reza, S.M.S., Robben, D., Rueckert, D., Salli, E., Suetens, P., Wang, C.W., Wilms, M., Kirschke, J.S., Krämer, U.M., Münte, T.F., Schramm, P., Wiest, R., Handels, H., Reyes, M.: ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med Image Anal (2017)
 [18] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill (2016)
 [19] Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context Encoders: Feature Learning by Inpainting. CVPR (2016)
 [20] Pawlowski, N., Lee, M.C.H., Rajchl, M., McDonagh, S., Ferrante, E., Kamnitsas, K., Cooke, S., Stevenson, S.K., Khetani, A.M., Newman, T., Zeiler, F.A., Digby, R.J., Coles, J.P., Rueckert, D., Menon, D.K., Newcombe, V.F.J., Glocker, B.: Unsupervised Lesion Detection in Brain CT using Bayesian Convolutional Autoencoders (2018)
 [21] Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In: ICML. JMLR.org (2014)
 [22] Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. In: ICML (2011)
 [23] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. In: IPMI. Springer (2017)
 [24] Schölkopf, B., Platt, J.C., Shawe-Taylor, J.C., Smola, A.J., Williamson, R.C.: Estimating the Support of a High-Dimensional Distribution. Neural Comput. (2001)
 [25] Shyu, M.L., Chen, S.C., Sarinnapakorn, K., Chang, L.: A Novel Anomaly Detection Scheme Based on Principal Component Classifier. ICDM (2003)
 [26] Smilkov, D., Thorat, N., Kim, B., Viégas, F.B., Wattenberg, M.: SmoothGrad: removing noise by adding noise. CoRR (2017)
 [27] Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for Simplicity: The All Convolutional Net. In: ICLR (workshop track) (2015)
 [28] Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T.E.J., Bucholz, R., Chang, A., Chen, L., Corbetta, M., Curtiss, S.W., Della Penna, S., Feinberg, D., Glasser, M.F., Harel, N., Heath, A.C., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S., Oostenveld, R., Petersen, S.E., Prior, F., Schlaggar, B.L., Smith, S.M., Snyder, A.Z., Xu, J., Yacoub, E., WU-Minn HCP Consortium: The Human Connectome Project: a data acquisition perspective. Neuroimage (2012)
 [29] Vernooij, M.W., Ikram, M.A., Tanghe, H.L., Vincent, A.J., Hofman, A., Krestin, G.P., Niessen, W.J., Breteler, M.M., van der Lugt, A.: Incidental Findings on Brain MRI in the General Population. NEJM (2007)
 [30] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. JMLR (2010)