Supervised deep learning has indisputably shown great performance in the segmentation of medical images, including pathologies in brain MRI. However, these models inherit assumptions about the nature of the pathologies they segment from the labeled data they are trained on; rare cases might not be adequately covered by that data and thus might not be delineated properly. More generally, the unavailability of large quantities of labeled data poses a burden for the field. Recently, frameworks based on unsupervised representation learning and generative modeling have emerged as promising tools to detect and segment arbitrary pathologies in MRI without calling for pixel-precise expert annotations.
Methods based on GANs model the distribution of normal retinal OCT data and rely on the GANs' incapability to recover anomalous samples from the modeled distribution [9, 8]. Similarly, in the context of brain imaging, Variational Autoencoders (VAEs) [11, 12, 6], Adversarial Autoencoders (AAEs) and combinations of GANs and VAEs have been proposed to model the distribution of healthy brain MRI. The feed-forward nature of these approaches allows reconstructions of input data to be obtained efficiently; in those reconstructions, anomalies have likely vanished, as they are not part of the modeled distribution. The variational properties of these frameworks also allow input samples to be projected to a probabilistic latent space, and more likely, lesion-free counterparts to be restored by walking along the manifold. Although promising results have been reported, some important aspects have not yet been adequately addressed: i) different pathologies appear at different sizes and might call for different image resolutions; ii) at high resolution, reconstruction fidelity is paramount for delineating small lesions with precision, but frameworks like VAEs can only provide blurry, coarse reconstructions.
Here, we propose a framework for unsupervised anomaly segmentation based on the Laplacian pyramid, tailored around the family of Autoencoders (AEs). Our approach allows us to compress and reconstruct MR images of the brain with high fidelity while successfully suppressing anomalies. More precisely, inspired by the Laplacian pyramid of conditional VAEs, we model the distribution of the scale-space representation of healthy brain MRI rather than actual image pixels. A comparison to classic AEs and other AE-based state-of-the-art methods on three different datasets with different pathologies shows both superior segmentation performance and higher reconstruction fidelity. The inherent multi-scale nature of the Laplacian pyramid also allows us to segment anomalies at different resolutions and to aggregate the results, which further improves performance and gives insights into which resolution is appropriate for diseases such as MS and Glioblastoma.
Similar to previous work, we rely on modeling healthy anatomy with encoder-decoder networks and aim to localize anomalies from reconstruction residuals. However, we do not model the intensity distribution directly. Instead, we split the frequency band of the input data by learning to compress and reconstruct the Laplacian pyramid of healthy brain MRI.
Given a Gaussian kernel $g$ with variance $\sigma^2$, a downsampling operator $d(\cdot)$ and an upsampling operator $u(\cdot)$, a Laplacian pyramid with $K$ levels can be obtained by repeatedly smoothing and downsampling an input image $I = I_0$, i.e.

$$I_{k+1} = d(g \ast I_k), \quad k = 0, \ldots, K-1, \qquad (1)$$

and determining the high frequency residuals at each level $k$:

$$H_k = I_k - u(I_{k+1}). \qquad (2)$$

An image $I$ is completely represented by the low-resolution image $I_K$ after $K$ downsamplings and the high frequency residuals $H_0, \ldots, H_{K-1}$. A reconstruction can be obtained recursively via

$$I_k = u(I_{k+1}) + H_k.$$
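The construction and lossless recursive reconstruction described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the kernel width, the use of scipy for smoothing and resizing, and the toy image are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramid(img, K, sigma=1.0):
    """Build a K-level Laplacian pyramid (I_K, [H_0, ..., H_{K-1}])."""
    lows, highs = [img], []
    for _ in range(K):
        smoothed = gaussian_filter(lows[-1], sigma)   # g * I_k
        down = smoothed[::2, ::2]                     # I_{k+1} = d(g * I_k)
        up = zoom(down, 2, order=1)                   # u(I_{k+1}), bilinear
        highs.append(lows[-1] - up)                   # H_k = I_k - u(I_{k+1})
        lows.append(down)
    return lows[-1], highs

def reconstruct(low, highs):
    """Recursively reassemble the image via I_k = u(I_{k+1}) + H_k."""
    img = low
    for h in reversed(highs):
        img = zoom(img, 2, order=1) + h
    return img

img = np.random.rand(64, 64)
low, highs = build_pyramid(img, K=3)
print(low.shape, len(highs))                      # (8, 8) 3
print(np.allclose(reconstruct(low, highs), img))  # True: lossless by construction
```

Because the residuals store exactly what up/downsampling discards, the reconstruction is exact regardless of the kernel or interpolation scheme, as long as the same operators are reused.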
Let $\mathcal{X}$ be a set of healthy brain MR slices and $I \in \mathcal{X}$ be a single sample. For every level $k$ of the pyramid, we model the distribution of the respective healthy high frequency components $H_k$ with an encoder-decoder network by minimizing the discrepancy between $H_k$ and its reconstruction $\hat{H}_k$ (see Fig. 1). To account for upsampling inaccuracies, we do not minimize the reconstruction error on the high frequency residuals directly. Instead, as a proxy, we minimize the difference between $I_k = u(I_{k+1}) + H_k$ and their reconstructed counterpart $\hat{I}_k = u(\hat{I}_{k+1}) + \hat{H}_k$:

$$\mathcal{L}_k = \left\lVert I_k - \hat{I}_k \right\rVert_1. \qquad (3)$$
The overall loss is a weighted sum of the losses at all scales:

$$\mathcal{L} = \sum_{k} \lambda_k \mathcal{L}_k. \qquad (4)$$
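As a minimal illustration of the proxy loss in Eq. 3, the sketch below compares images assembled from reconstructed components rather than comparing the residuals themselves. The noisy inputs are stand-ins for actual encoder-decoder outputs, which we do not model here.

```python
import numpy as np
from scipy.ndimage import zoom

def proxy_loss(I_next_hat, H_hat, I_next, H):
    """l1 discrepancy between I_k and its reconstruction I_k_hat (Eq. 3)."""
    I_k = zoom(I_next, 2, order=1) + H               # I_k = u(I_{k+1}) + H_k
    I_k_hat = zoom(I_next_hat, 2, order=1) + H_hat   # u(I^_{k+1}) + H^_k
    return np.abs(I_k - I_k_hat).mean()

rng = np.random.default_rng(0)
I_next, H = rng.random((32, 32)), rng.random((64, 64))

# A perfect reconstruction yields zero loss; noisy stand-ins do not.
print(proxy_loss(I_next, H, I_next, H))                            # 0.0
loss = proxy_loss(I_next + 0.01 * rng.standard_normal((32, 32)),
                  H + 0.01 * rng.standard_normal((64, 64)),
                  I_next, H)
print(loss > 0.0)                                                  # True
```

Note that errors made by the lower stage enter through $u(\hat{I}_{k+1})$, which is what lets the proxy account for upsampling inaccuracies.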
Since the Laplacian pyramid of an image is often referred to as its scale-space representation, we refer to the resulting set of encoder-decoder networks as the Scale-Space Autoencoder (SSAE). The underlying encoder-decoder network can be arbitrarily defined, e.g. as a deterministic Autoencoder or as a VAE.
2.1 Anomaly Detection
Given a trained model and the scale-space representation $\{I_K, H_0, \ldots, H_{K-1}\}$ of an image, the image can be reconstructed at different resolutions from the recursive aggregation:

$$\hat{I}_k = u(\hat{I}_{k+1}) + \hat{H}_k. \qquad (5)$$
Assuming that a model is not capable of reliably reconstructing the high frequency components of anomalies, an anomaly segmentation can be obtained from the residuals between $I_k$ and $\hat{I}_k$:

$$R_k = \left| I_k - \hat{I}_k \right|.$$
The recursive relation in Eq. 2 can also be applied to the residuals to obtain an aggregated residual image at full resolution, i.e. a multi-scale aggregation of lesion segmentations:

$$\tilde{R}_k = u(\tilde{R}_{k+1}) + R_k. \qquad (6)$$
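The multi-scale aggregation of Eq. 6 can be sketched as follows, assuming per-level residual maps $R_k$ are already available from a trained model; random maps serve as stand-ins here.

```python
import numpy as np
from scipy.ndimage import zoom

def aggregate_residuals(residuals):
    """Aggregate residual maps [R_0, ..., R_K] (fine to coarse) into a
    full-resolution map via R~_k = u(R~_{k+1}) + R_k."""
    agg = residuals[-1]                    # start at the coarsest level
    for r in reversed(residuals[:-1]):
        agg = zoom(agg, 2, order=1) + r    # upsample, add next finer level
    return agg

# Stand-in residual maps at 64x64, 32x32 and 16x16.
residuals = [np.random.rand(64 // 2**k, 64 // 2**k) for k in range(3)]
anomaly_map = aggregate_residuals(residuals)
print(anomaly_map.shape)                   # (64, 64)
```

Each coarse residual contributes to the full-resolution consensus map after repeated upsampling, so anomalies detected at any scale surface in the final segmentation.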
3 Experiments and Results
In the following, we first introduce the datasets used in our experiments. We then provide i) a comparison of our scale-space approach to a variety of state-of-the-art methods, ii) a study of reconstruction fidelity and segmentation performance at multiple resolutions on different pathologies, and iii) an investigation of the proposed multi-scale aggregation.
For evaluating our scale-space approach and the multi-scale aggregation, we employ four different datasets. To train our models, we use the FLAIR images from a dataset of 100 healthy subjects from our clinical partners, acquired with a Philips Achieva 3T MR scanner. For testing, we use MSKRI, a dataset containing FLAIR scans of 49 subjects with MS, acquired with the same scanner. Further, we rely on two datasets acquired with Siemens scanners: the non-public GBKRI, consisting of 26 subjects with Glioblastoma, and the publicly available MS dataset MSLUB from the University Hospital of Ljubljana. All scans were skull-stripped using ROBEX, co-registered to the SRI24 atlas, and normalized by their 98th percentile into [0, 1]. In all our experiments, we use 2D axial slices which contain brain tissue.
All our experiments were implemented in Python with TensorFlow and carried out on a commodity GPU. Each model was trained in batches of 8 until convergence using the ADAM optimizer with a learning rate of 0.001 and an automatic early-stopping heuristic. The Lagrangian multipliers $\lambda_k$ for each stage in Eq. 4 were used in a one-hot fashion to train every stage of the pyramid separately, starting with the lowest level $K$. For smoothing the images, we use a length-5 isotropic Gaussian kernel with a variance chosen such that the bulk of the Gaussian distribution is covered by the kernel window, and for the upsampling operator $u(\cdot)$ we adopt bilinear interpolation.
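The smoothing kernel can be built as follows; the concrete value of sigma is our assumption, chosen so that the 5-tap window spans roughly $\pm 2\sigma$ of the distribution.

```python
import numpy as np

def gaussian_kernel_1d(length=5, sigma=1.0):
    """Normalized 1D Gaussian kernel; with sigma=1, taps sit at -2..2 sigma."""
    x = np.arange(length) - (length - 1) / 2.0
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()                      # normalize to unit sum

k1d = gaussian_kernel_1d()
k2d = np.outer(k1d, k1d)                    # separable, isotropic 2D kernel
print(k2d.shape)                            # (5, 5)
print(np.isclose(k2d.sum(), 1.0))           # True: weights sum to one
```

Normalizing to unit sum keeps image intensities unchanged under smoothing, which matters when residuals are computed against the upsampled low-pass image.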
3.3 Comparison to State-of-the-Art
First, we compare three different variants of our scale-space approach, i.e. a dense, a spatial and a variational SSAE, against a variety of state-of-the-art (SOTA) methods on all testing datasets. We measure the area under the precision-recall curve (AUPRC) and the optimally achievable DICE score ⌈DICE⌉, which constitutes a dataset-specific theoretical upper bound on a model's segmentation performance and is determined via a greedy search for the threshold which yields the highest DICE score on a test set. All methods operate at the same reduced resolution, as we were unable to obtain feasible results at higher resolutions with all of the SOTA methods. Results are reported in Table 3.3. Among all reconstruction-based methods, our scale-space models always show noticeable improvements over their traditional counterparts, with the SSVAE being slightly inferior to the spatial and dense SSAE. However, on two of the three datasets, the costly, iterative restoration-based approach from You et al. shows the best overall performance.
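The greedy search behind ⌈DICE⌉ can be sketched as follows; residual maps and ground-truth masks are random stand-ins, not real data.

```python
import numpy as np

def dice(pred, gt):
    """DICE overlap between two binary masks."""
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom > 0 else 1.0

def best_dice(residuals, gt, n_thresholds=100):
    """Greedily search the residual threshold that maximizes DICE."""
    ts = np.linspace(residuals.min(), residuals.max(), n_thresholds)
    return max(dice(residuals >= t, gt) for t in ts)

rng = np.random.default_rng(0)
residuals = rng.random((10, 64, 64))     # stand-in anomaly maps
gt = rng.random((10, 64, 64)) > 0.95     # stand-in lesion masks
score = best_dice(residuals, gt)
print(0.0 <= score <= 1.0)               # True
```

Since the threshold is tuned on the test set itself, the resulting score is an upper bound rather than an unbiased performance estimate, which is exactly why it is reported as a theoretical ceiling.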
Table 3.3: Comparison to the state-of-the-art. Columns: Approach; AUPRC and ⌈DICE⌉ for each of MSKRI, GBKRI and MSLUB.
3.4 Reconstruction Fidelity
Next, we compare variants of AEs, i.e. a dense AE, a spatial AE and a VAE, against their scale-space counterparts in terms of their reconstruction capabilities. Again, all corresponding models share the same architecture and model complexity for a fair comparison. To measure fidelity, we collect the pixel-wise $\ell_1$-errors among all healthy validation input slices and their reconstructions, normalized by the total number of pixels. Fig. 3 shows the corresponding statistics at three different resolutions; the upper limit was set by the resolution of our training data. In comparison to their AE counterparts, all scale-space models show substantially lower reconstruction errors at all scales. As expected, reconstruction errors increase with image resolution, as the modeling task becomes more complex. The lowest error is achieved by the spatial SSAE, which reconstructs data almost perfectly due to the low level of compression in its bottleneck. Interestingly, the dense SSAE is on par with the spatial AE, although it loses any spatial cues in its latent space. The achieved high fidelity can also be seen in our visual results (Fig. 2).
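The fidelity metric described above, a pixel-wise $\ell_1$ error normalized by the number of pixels, amounts to the following; the slice and its reconstruction are random stand-ins.

```python
import numpy as np

def mean_l1_error(img, rec):
    """Pixel-wise l1 reconstruction error, normalized by pixel count."""
    return np.abs(img - rec).sum() / img.size

rng = np.random.default_rng(0)
img = rng.random((128, 128))                          # stand-in healthy slice
rec = np.clip(img + 0.01 * rng.standard_normal(img.shape), 0.0, 1.0)

print(mean_l1_error(img, img))        # 0.0 for a perfect reconstruction
print(mean_l1_error(img, rec) > 0.0)  # True for a noisy one
```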
3.5 Investigating Resolution and Multi-scale Aggregation
Table 3.5: Segmentation performance at different resolutions, with and without multi-scale aggregation. Columns: Approach; Resolution; AUPRC and ⌈DICE⌉ for each of MSKRI, GBKRI and MSLUB.
Finally, we compare the different scale-space and traditional AE variants by their segmentation performance on the three datasets, again measured with the AUPRC and ⌈DICE⌉, at different resolutions, and investigate the benefits of the proposed multi-scale aggregation of residuals (Eq. 6) at the highest resolution (see Table 3.5). For MS lesions in MSKRI, which has been acquired with the same scanner as our healthy training data, the best AUPRC is achieved by a dense SSAE at native resolution, yielding a clear absolute improvement over its corresponding dense AE. On MSLUB, performance is significantly lower across the board due to lower contrast, but the dense SSAE still shows the best performance. On both datasets, additional performance can be gained by aggregating residuals from multiple scales. In contrast to MS lesions, segmentation of tumors in GBKRI works best at a lower resolution with the majority of methods, and the proposed multi-scale aggregation shows no gains. The winning approach in this context is the spatial SSAE.
The proposed scale-space formulation appears to be especially beneficial at native resolution, where it leads to considerably better reconstructions across all datasets. This is especially useful for segmenting MS lesions, which can become very small. In this context, multi-scale aggregation also turns out to be beneficial, as these lesions can vary greatly in shape and size. For large, space-occupying lesions such as Glioblastoma (GBKRI), a lower resolution turns out to be preferable. Here, we also find that our scale-space approach does not provide much benefit, as it generates undesirably good reconstructions of large, homogeneous lesions. Overall, the multi-scale aggregation leads to improvements in most of the cases, but is generally of greater value for normal AEs, whose anomaly detections appear to be more orthogonal among different resolutions and aggregate to a better consensus. Anomaly segmentations obtained from our scale-space models seem to correlate more across different resolutions.
In conclusion, we proposed to model normal brain anatomy in a Laplacian pyramid representation to obtain high-fidelity reconstructions and improved segmentation performance. We successfully demonstrated the use of this scale-space approach for unsupervised anomaly segmentation in brain MRI on different datasets with different pathologies. From the inherent multi-scale nature of our scale-space formulation, we derived a multi-scale residual aggregation technique for building an anomaly segmentation consensus among multiple resolutions, which i) turned out to be beneficial in most of the examined scenarios and ii) works for normal AEs as well. In future work, the design of a shared latent space between the different encoder-decoder networks could be investigated, and restoration approaches like the one of You et al. could be adapted for our framework. Using a scale-space representation of the MR data, we also see opportunities towards improved domain invariance in unsupervised anomaly segmentation methods.
-  (2018) Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. arXiv preprint arXiv:1804.04488. Cited by: §1.
-  (2018) Unsupervised detection of lesions in brain MRI using constrained adversarial auto-encoders. arXiv preprint arXiv:1806.04972. Cited by: §1.
-  (2017) Laplacian pyramid of conditional variational autoencoders. In Proceedings of the 14th European Conference on Visual Media Production (CVMP 2017), pp. 7. Cited by: §1.
-  (2011) Robust Brain Extraction Across Datasets and Comparison With Publicly Available Methods. IEEE Transactions on Medical Imaging 30 (9), pp. 1617–1634. Cited by: §3.1.
-  (2018) A novel public MR image dataset of multiple sclerosis patients with lesion segmentations based on multi-rater consensus. Neuroinformatics 16 (1), pp. 51–63. Cited by: §3.1.
-  (2018) Unsupervised lesion detection in brain CT using Bayesian convolutional autoencoders. Cited by: §1.
-  (2009-12) The SRI24 multichannel atlas of normal adult human brain structure. Human Brain Mapping 31 (5), pp. 798–819. Cited by: §3.1.
-  (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis 54, pp. 30–44. Cited by: §1.
-  (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pp. 146–157. Cited by: §1.
-  (2019) Unsupervised lesion detection via image restoration with a normative prior. In Proceedings of the 2nd International Conference on Medical Imaging with Deep Learning, Proceedings of Machine Learning Research, Vol. 102, pp. 540–556. Cited by: §1, §3.3, §4.
-  (2019) Unsupervised anomaly localization using variational auto-encoders. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 289–297. Cited by: §1.
-  (2018) Context-encoding variational autoencoder for unsupervised anomaly detection. arXiv preprint arXiv:1812.05941. Cited by: §1.