Scale-Space Autoencoders for Unsupervised Anomaly Segmentation in Brain MRI

06/23/2020 ∙ by Christoph Baur, et al.

Brain pathologies can vary greatly in size and shape, ranging from a few pixels (e.g. MS lesions) to large, space-occupying tumors. Recently proposed Autoencoder-based methods for unsupervised anomaly segmentation in brain MRI have shown promising performance, but struggle to model distributions with high fidelity, which is crucial for accurately delineating particularly small lesions. Here, similar to these previous works, we model the distribution of healthy brain MRI to localize pathologies from erroneous reconstructions. However, to achieve improved reconstruction fidelity at higher resolutions, we learn to compress and reconstruct different frequency bands of healthy brain MRI using the Laplacian pyramid. In a range of experiments comparing our method to different State-of-the-Art approaches on three different brain MR datasets with MS lesions and tumors, we show improved anomaly segmentation performance and the general capability to obtain much crisper reconstructions of input data at native resolution. Modeling the Laplacian pyramid further enables the delineation and aggregation of lesions at multiple scales, which makes it possible to cope effectively with different pathologies and lesion sizes using a single model.




1 Introduction

Supervised Deep Learning has indisputably shown great performance in the segmentation of medical images, including pathologies in brain MRI. However, these models make assumptions about the nature of the pathologies they segment based on the labeled data they are trained on; rare cases might not be adequately covered in that data and thus may not be delineated properly. More generally, the scarcity of large quantities of labeled data is a burden for the field. Recently, frameworks based on unsupervised representation learning and generative modeling have emerged as promising tools to detect and segment arbitrary pathologies in MRI without requiring pixel-precise expert annotations.

Methods based on GANs model the distribution of normal retinal OCT data and rely on the GANs' incapability to recover anomalous samples from the modeled distribution [9, 8]. Similarly, in the context of brain imaging, Variational Autoencoders (VAEs) [11, 12, 6], Adversarial Autoencoders (AAEs) [2] and combinations of GANs and VAEs [1] have been proposed to model the distribution of healthy brain MRI. The feed-forward nature of these approaches makes it efficient to obtain reconstructions of input data, in which anomalies have likely vanished as they are not part of the modeled distribution. The variational properties of these frameworks also make it possible to project input samples to a probabilistic latent space and to restore more likely, lesion-free counterparts by walking along the manifold [10]. Although promising results have been reported, some important aspects have not yet been adequately addressed: i) different pathologies appear at different sizes and might call for different image resolutions; ii) at high resolution, reconstruction fidelity is paramount for delineating small lesions precisely, but frameworks like VAEs can only provide blurry, coarse reconstructions.

Here, we propose a framework for unsupervised anomaly segmentation based on the Laplacian pyramid, tailored to the family of Autoencoders (AEs). Our approach compresses and reconstructs MR images of the brain with high fidelity while successfully suppressing anomalies. More precisely, inspired by [3], we model the distribution of the scale-space representation of healthy brain MRI rather than actual image pixels. A comparison to classic AEs and other AE-based State-of-the-Art methods on three different datasets with different pathologies shows both superior segmentation performance and higher reconstruction fidelity. The inherent multi-scale nature of the Laplacian pyramid also allows us to segment anomalies at different resolutions and to aggregate the results, which further improves performance and gives insights into which resolution is appropriate for diseases such as MS and Glioblastoma.

2 Methodology

Figure 1: An overview of the Scale-Space Autoencoder (SSAE) framework. A sample is decomposed into a 3-level Laplacian pyramid, and every level uses a separate AE to compress and reconstruct the respective high frequency components.

Similar to previous work, we rely on modeling healthy anatomy with encoder-decoder networks and aim to localize anomalies from reconstruction residuals. However, we do not model the intensity distribution directly. Instead, we split the frequency band of the input data by learning to compress and reconstruct the Laplacian pyramid of healthy brain MRI.

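Given a Gaussian kernel $g_\sigma$ with variance $\sigma^2$, a downsampling operator $d(\cdot)$ and an upsampling operator $u(\cdot)$, a Laplacian pyramid with $K$ levels can be obtained by repeatedly smoothing and downsampling an input image $x$, i.e.

$$I_0 = x, \qquad I_{k+1} = d(g_\sigma * I_k) \qquad (1)$$

and determining the high frequency residuals $H_k$ at each level $k$ with $0 \le k < K$:

$$H_k = I_k - u(I_{k+1})$$

An image is completely represented by the low-resolution image $I_K$ after $K$ downsamplings and the high frequency residuals $H_0, \dots, H_{K-1}$. A reconstruction can be obtained recursively via

$$I_k = u(I_{k+1}) + H_k \qquad (2)$$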
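The decomposition and its exact inversion can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the value `sigma=1.0` and the nearest-neighbour upsampling are assumptions made here for brevity. Nearest-neighbour merely stands in for the interpolation actually used; the reconstruction identity holds for any upsampling operator as long as the same one is used in both directions.

```python
import numpy as np


def gaussian_kernel_1d(length=5, sigma=1.0):
    """Normalized 1D Gaussian kernel (sigma=1.0 is an assumed value)."""
    ax = np.arange(length) - (length - 1) / 2.0
    kernel = np.exp(-0.5 * (ax / sigma) ** 2)
    return kernel / kernel.sum()


def smooth(img, kernel):
    """Separable Gaussian smoothing with edge padding; keeps the input shape."""
    pad = len(kernel) // 2
    out = np.pad(img, pad, mode="edge")
    out = np.apply_along_axis(np.convolve, 1, out, kernel, mode="valid")
    out = np.apply_along_axis(np.convolve, 0, out, kernel, mode="valid")
    return out


def downsample(img):
    """Keep every second row and column."""
    return img[::2, ::2]


def upsample(img, shape):
    """Nearest-neighbour upsampling by 2, cropped to `shape` (assumed operator)."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[: shape[0], : shape[1]]


def build_pyramid(x, levels=3, sigma=1.0):
    """Return (high-frequency residuals ordered fine to coarse, low-resolution image)."""
    kernel = gaussian_kernel_1d(5, sigma)
    current = np.asarray(x, dtype=np.float64)
    highs = []
    for _ in range(levels):
        coarser = downsample(smooth(current, kernel))
        highs.append(current - upsample(coarser, current.shape))
        current = coarser
    return highs, current


def reconstruct(highs, low):
    """Invert the pyramid by recursively upsampling and adding the residuals."""
    current = low
    for high in reversed(highs):
        current = upsample(current, high.shape) + high
    return current
```

Calling `build_pyramid(x, levels=3)` on a 64×64 slice yields residuals of size 64×64, 32×32 and 16×16 plus an 8×8 low-resolution image, and `reconstruct` recovers the input up to floating-point error.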


Let $\mathcal{X}$ be a set of healthy brain MR slices and $x \in \mathcal{X}$ be a single sample. For every level $k$ of the pyramid, we model the distribution of the respective healthy high frequency components $H_k$ with an encoder-decoder network by minimizing the discrepancy between $H_k$ and its reconstruction $\hat{H}_k$ (see Fig. 1). To account for upsampling inaccuracies, we do not minimize the reconstruction error on the high frequency residuals directly. Instead, as a proxy, we minimize the difference between the pyramid images $I_k$ and their reconstructed counterparts $\hat{I}_k$:

$$\mathcal{L}_k = \lVert I_k - \hat{I}_k \rVert \qquad (3)$$

The overall loss is a weighted sum of losses at all scales:

$$\mathcal{L} = \sum_{k} \lambda_k \mathcal{L}_k \qquad (4)$$


Since the Laplacian pyramid of an image is often referred to as its scale-space representation, we refer to the resulting set of encoder-decoder networks as the Scale-Space Autoencoder (SSAE). The underlying encoder-decoder network can be arbitrarily defined as a deterministic Autoencoder or as a VAE.
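The per-level proxy loss and its weighted combination can be illustrated as follows. The sketch assumes a mean absolute difference as the discrepancy measure and uses hypothetical names (`level_loss`, `total_loss`); it operates on plain NumPy arrays and leaves out the encoder-decoder networks entirely.

```python
import numpy as np


def level_loss(image, reconstruction):
    """Discrepancy between a pyramid image and its reconstruction.

    The mean absolute difference is an assumed choice of norm.
    """
    return float(np.mean(np.abs(np.asarray(image) - np.asarray(reconstruction))))


def total_loss(images, reconstructions, weights):
    """Weighted sum of the per-level losses (one weight per pyramid level)."""
    if not (len(images) == len(reconstructions) == len(weights)):
        raise ValueError("need one image, one reconstruction and one weight per level")
    return sum(
        w * level_loss(img, rec)
        for w, img, rec in zip(weights, images, reconstructions)
    )
```

Setting a weight to zero removes the corresponding level from the objective, so the same function covers both joint and per-level optimization.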

2.1 Anomaly Detection

Given a trained model and the scale-space representation of an image, the image can be reconstructed at different resolutions via the recursive aggregation:

$$\hat{I}_k = u(\hat{I}_{k+1}) + \hat{H}_k \qquad (5)$$

Assuming that a model is not capable of reliably reconstructing the high frequency components of anomalies, an anomaly segmentation can be obtained from the residuals between $I_k$ and $\hat{I}_k$:

$$r_k = \lvert I_k - \hat{I}_k \rvert$$

The recursive relation in Eq. 2 can also be applied to the residuals to obtain an aggregated residual image at full resolution, i.e. a multi-scale aggregation of lesion segmentations:

$$R_k = u(R_{k+1}) + r_k \qquad (6)$$
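A possible NumPy sketch of the residual computation and its multi-scale aggregation is given below. It rests on several assumptions: residuals are taken as absolute differences, the recursion starts from the coarsest residual map, and nearest-neighbour upsampling stands in for the actual upsampling operator. All function names are hypothetical.

```python
import numpy as np


def upsample(img, shape):
    """Nearest-neighbour upsampling by 2, cropped to `shape` (assumed operator)."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[: shape[0], : shape[1]]


def residual_maps(images, reconstructions):
    """Absolute per-pixel residual for each pyramid level."""
    return [np.abs(np.asarray(i) - np.asarray(r)) for i, r in zip(images, reconstructions)]


def aggregate_residuals(residuals):
    """Aggregate residual maps (ordered fine to coarse) into one full-resolution map."""
    aggregated = residuals[-1]  # assumed starting point: the coarsest residual
    for residual in reversed(residuals[:-1]):
        aggregated = upsample(aggregated, residual.shape) + residual
    return aggregated
```

Thresholding the aggregated map then gives a segmentation in which a pixel scores highly if it is poorly reconstructed at one or more scales.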


3 Experiments and Results

Figure 2: Visual results. A: input; B: ground-truth segmentation; C: reconstruction from a normal AE; D: median-filtered residuals from C; E: reconstruction from our SSAE; F: median-filtered residuals from E. The high fidelity facilitated by our scale-space approach leads to fewer unwanted residuals.

In the following, we first introduce the datasets used in our experiments. We then provide i) a comparison of our scale-space approach to a variety of State-of-the-Art methods, ii) a study of reconstruction fidelity and segmentation performance at multiple resolutions on different pathologies and iii) an investigation of the proposed multi-scale aggregation.

3.1 Dataset

For evaluating our scale-space approach and the multi-scale aggregation, we employ four different datasets. To train our models, we use the FLAIR images from a dataset of 100 healthy subjects from our clinical partners, acquired with a Philips Achieva 3T MR scanner. For testing, we use a dataset containing FLAIR scans of 49 subjects with MS (MSKRI), taken with the same scanner. Further, we rely on two datasets acquired with Siemens scanners: the non-public GBKRI, consisting of 26 subjects with Glioblastoma, and the publicly available MS dataset from the University Hospital of Ljubljana (MSLUB) [5]. All scans were skull-stripped using ROBEX [4], co-registered to the SRI24 atlas [7], and intensity-normalized by their 98th percentile. In all our experiments, we use 2D axial slices which contain brain tissue.
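The last two preprocessing steps, percentile normalization and slice selection, might look as follows in NumPy. This is only a sketch under stated assumptions: the background is taken to be exactly zero after skull-stripping, the percentile is computed over non-zero voxels only, and the axial direction is assumed to be the first array axis. The function names are hypothetical, and skull-stripping and registration are not covered.

```python
import numpy as np


def normalize_by_percentile(volume, percentile=98.0):
    """Divide intensities by the given percentile of the non-zero voxels."""
    volume = np.asarray(volume, dtype=np.float64)
    foreground = volume[volume > 0]
    if foreground.size == 0:
        raise ValueError("volume contains no foreground voxels")
    return volume / np.percentile(foreground, percentile)


def slices_with_brain(volume, axis=0):
    """Return the 2D slices along `axis` that contain at least one non-zero voxel."""
    moved = np.moveaxis(np.asarray(volume), axis, 0)
    return [s for s in moved if np.any(s > 0)]
```

Note that dividing by the 98th percentile leaves the brightest 2% of voxels above 1; whether these are clipped afterwards is not specified here.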

3.2 Implementation

All our experiments were implemented in Python with TensorFlow and carried out on a commodity GPU. Each model was trained in batches of 8 until convergence using the ADAM optimizer with a learning rate of 0.001 and an automatic early-stopping heuristic. The Lagrangian multipliers $\lambda_k$ for each stage in Eq. 4 were used in a one-hot fashion to train every stage of the pyramid separately, starting with the lowest level. For smoothing the images, we use a length 5 isotropic Gaussian kernel whose $\sigma$ is chosen to cover a fixed fraction of the Gaussian distribution, and for the upsampling operator $u$ we adopt bilinear interpolation.
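The one-hot use of the loss weights amounts to a simple training schedule in which exactly one pyramid stage is active at a time. The sketch below uses hypothetical helper names, and the default ordering (coarsest stage first, with index 0 denoting the finest level) is an assumption about what "lowest level" refers to; the flag allows the opposite order.

```python
def one_hot_weights(num_levels, active_level):
    """Loss weights that select a single pyramid stage."""
    if not 0 <= active_level < num_levels:
        raise ValueError("active_level out of range")
    return [1.0 if k == active_level else 0.0 for k in range(num_levels)]


def training_schedule(num_levels, coarse_to_fine=True):
    """List of (stage index, loss weights), one entry per training phase."""
    if coarse_to_fine:
        order = range(num_levels - 1, -1, -1)
    else:
        order = range(num_levels)
    return [(k, one_hot_weights(num_levels, k)) for k in order]
```

Each entry of the schedule would correspond to one training run until convergence, with only the selected stage contributing to the overall loss.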

3.3 Comparison to State-of-the-Art

First, we compare three different variants of our scale-space approach, i.e. a dense, spatial and variational SSAE, against a variety of State-of-the-Art (SOTA) methods on all testing datasets. We measure the area under the Precision-Recall curve (AUPRC) and the optimally achievable DICE-score (reported as DICE), which constitutes a dataset-specific theoretical upper bound to a model's segmentation performance and is determined via a greedy search for the threshold which yields the highest DICE-score on a test set. All methods are compared at a single common resolution, as we were unable to obtain feasible results at higher resolution with all of the SOTA methods. Results are reported in Table 1. Among all reconstruction-based methods, our scale-space models always show noticeable improvements over their traditional counterparts, with the SSVAE being slightly inferior to the spatial and dense SSAE. However, on two of the datasets, the costly, iterative restoration-based approach from You et al. [10] shows the best overall performance.
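Both metrics can be computed from a residual map and a binary ground-truth mask. The following NumPy sketch makes two simplifications that should be read as assumptions: AUPRC is estimated as step-wise average precision (ties between scores are not treated specially), and the best threshold is found by sweeping a fixed grid of candidate thresholds rather than by a greedy search. Function names are hypothetical.

```python
import numpy as np


def average_precision(scores, labels):
    """Step-wise estimate of the area under the Precision-Recall curve."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    if not labels.any():
        raise ValueError("labels contain no positive pixels")
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    # Average the precision at every rank where a positive pixel is retrieved.
    return float(precision[hits].mean())


def dice_score(prediction, labels):
    """DICE overlap between two binary masks."""
    prediction = np.asarray(prediction).astype(bool)
    labels = np.asarray(labels).astype(bool)
    denominator = prediction.sum() + labels.sum()
    if denominator == 0:
        return 1.0
    return float(2.0 * np.logical_and(prediction, labels).sum() / denominator)


def best_dice(scores, labels, num_thresholds=100):
    """Highest DICE over a grid of thresholds, with the threshold that attains it."""
    scores = np.asarray(scores, dtype=np.float64)
    thresholds = np.linspace(scores.min(), scores.max(), num_thresholds)
    dices = [dice_score(scores >= t, labels) for t in thresholds]
    best = int(np.argmax(dices))
    return dices[best], float(thresholds[best])
```

Because the threshold is selected on the same data it is evaluated on, the resulting DICE is an upper bound on what a fixed threshold could achieve, not an estimate of deployed performance.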

Table 1: Variants of our scale-space approach compared to SOTA methods in terms of AUPRC and DICE (higher is better). Methods marked with an * share the same model complexity. Top-2 methods in each column are bold-faced. Columns: Approach, followed by AUPRC and DICE on each of MSKRI, GBKRI and MSLUB.

3.4 Reconstruction Fidelity

Figure 3: Normalized Reconstruction-Errors at different resolutions using different AE and SSAE models on held-out healthy validation data (lower is better).

Next, we compare variants of AEs, i.e. a dense AE, a spatial AE and a VAE, against their scale-space counterparts in terms of their reconstruction capabilities. Again, all corresponding models share the same architecture and model complexity for a fair comparison. To measure fidelity, we collect the pixel-wise errors between all healthy validation input slices and their reconstructions, normalized by the total number of pixels. Fig. 3 shows the corresponding statistics at three resolutions, the highest of which was set by our training data. In comparison to their AE counterparts, all scale-space models show substantially lower reconstruction errors at all scales. As expected, reconstruction errors increase with image resolution, as the modeling task becomes more complex. The lowest error is achieved by a spatial SSAE, which reconstructs data almost perfectly due to the low level of compression in its bottleneck. Interestingly, a dense SSAE is on par with a spatial AE, although it loses any spatial cues in its latent space. The achieved high fidelity can also be seen in our visual results (Fig. 2).

3.5 Investigating Resolution and Multi-scale Aggregation

Table 2: Segmentation comparing dense, spatial AEs and variational AEs/SSAEs at different resolutions as well as our multi-scale aggregation. Columns: Approach, Resolution, followed by AUPRC and DICE for each of the three test datasets.

Finally, we compare the different scale-space and traditional AE variants by their segmentation performance on the three datasets at different resolutions, again measured using AUPRC and DICE, and investigate the benefits of the proposed multi-scale aggregation of residuals (Eq. 6) at the highest resolution (see Table 2). For MS lesions in MSKRI, which was acquired with the same scanner as our healthy training data, the best AUPRC is achieved by a dense SSAE at native resolution, yielding an absolute improvement over its corresponding dense AE. On MSLUB, performance is significantly lower across the board due to lower contrast, but the dense SSAE still shows the best performance. On both datasets, further gains can be obtained by aggregating residuals from multiple scales. In contrast to MS lesions, segmentation of tumors in GBKRI works best at a lower resolution with the majority of methods, and the proposed multi-scale aggregation shows no gains. The winning approach in this context is the spatial SSAE.

3.6 Discussion

The proposed scale-space formulation appears to be especially beneficial at native resolution, where it leads to considerably better reconstructions across all datasets. This is especially useful for segmenting MS lesions, which can be very small. In this context, multi-scale aggregation also turns out to be beneficial, as these lesions can vary greatly in shape and size. For large, space-occupying lesions such as Glioblastoma (GBKRI), a lower resolution turns out to be preferable. In this context, we also find that our scale-space approach does not provide much benefit, as it generates undesirably good reconstructions of large, homogeneous lesions. Overall, the multi-scale aggregation leads to improvements in most cases, but is generally of greater value for normal AEs, whose anomaly detections appear to be more orthogonal across different resolutions and aggregate to a better consensus. Anomaly segmentations obtained from our scale-space models seem to correlate more across different resolutions.

4 Conclusion

In conclusion, we proposed to model normal brain anatomy in a Laplacian pyramid representation to obtain high fidelity reconstructions and improved segmentation performance. We successfully demonstrated the use of this scale-space approach for unsupervised anomaly segmentation in brain MRI on different datasets with different pathologies. From the inherent multi-scale nature of our scale-space formulation, we derived a multi-scale residual aggregation technique for building an anomaly segmentation consensus among multiple resolutions, which i) turned out to be beneficial in most of the examined scenarios and ii) works for normal AEs as well. In future work, the design of a shared latent space between the different encoder-decoder networks could be investigated, and restoration approaches like [10] could be adapted to our framework. Using a scale-space representation of the MR data, we also see opportunities for improved domain invariance in unsupervised anomaly segmentation methods.


  • [1] C. Baur, B. Wiestler, S. Albarqouni, and N. Navab (2018) Deep autoencoding models for unsupervised anomaly segmentation in brain mr images. arXiv preprint arXiv:1804.04488. Cited by: §1.
  • [2] X. Chen and E. Konukoglu (2018) Unsupervised detection of lesions in brain mri using constrained adversarial auto-encoders. arXiv preprint arXiv:1806.04972. Cited by: §1.
  • [3] G. Dorta, S. Vicente, L. Agapito, N. D. Campbell, S. Prince, and I. Simpson (2017) Laplacian pyramid of conditional variational autoencoders. In Proceedings of the 14th European Conference on Visual Media Production (CVMP 2017), pp. 7. Cited by: §1.
  • [4] J. E. Iglesias, C. Liu, P. M. Thompson, and Z. Tu (2011) Robust Brain Extraction Across Datasets and Comparison With Publicly Available Methods. IEEE Transactions on Medical Imaging 30 (9), pp. 1617–1634. Cited by: §3.1.
  • [5] Ž. Lesjak, A. Galimzianova, A. Koren, M. Lukin, F. Pernuš, B. Likar, and Ž. Špiclin (2018) A novel public mr image dataset of multiple sclerosis patients with lesion segmentations based on multi-rater consensus. Neuroinformatics 16 (1), pp. 51–63. Cited by: §3.1.
  • [6] N. Pawlowski, M. C. Lee, M. Rajchl, S. McDonagh, E. Ferrante, K. Kamnitsas, S. Cooke, S. Stevenson, A. Khetani, T. Newman, et al. (2018) Unsupervised lesion detection in brain ct using bayesian convolutional autoencoders. Cited by: §1.
  • [7] T. Rohlfing, N. M. Zahr, E. V. Sullivan, and A. Pfefferbaum (2009-12) The SRI24 multichannel atlas of normal adult human brain structure. Human Brain Mapping 31 (5), pp. 798–819. Cited by: §3.1.
  • [8] T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth (2019) F-anogan: fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis 54, pp. 30–44. Cited by: §1.
  • [9] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pp. 146–157. Cited by: §1.
  • [10] S. You, K. C. Tezcan, X. Chen, and E. Konukoglu (2019) Unsupervised lesion detection via image restoration with a normative prior. In Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning, M. J. Cardoso, A. Feragen, B. Glocker, E. Konukoglu, I. Oguz, G. Unal, and T. Vercauteren (Eds.), Proceedings of Machine Learning Research, Vol. 102, London, United Kingdom, pp. 540–556. Cited by: §1, §3.3, §4.
  • [11] D. Zimmerer, F. Isensee, J. Petersen, S. Kohl, and K. Maier-Hein (2019) Unsupervised anomaly localization using variational auto-encoders. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 289–297. Cited by: §1.
  • [12] D. Zimmerer, S. A. Kohl, J. Petersen, F. Isensee, and K. H. Maier-Hein (2018) Context-encoding variational autoencoder for unsupervised anomaly detection. arXiv preprint arXiv:1812.05941. Cited by: §1.