Few Labeled Atlases are Necessary for Deep-Learning-Based Segmentation

08/13/2019 ∙ by Hyeon Woo Lee, et al.

We tackle biomedical image segmentation in the scenario of only a few labeled brain MR images. This is an important and challenging task in medical applications, where manual annotations are time-consuming. Classical multi-atlas based anatomical segmentation methods use image registration to warp segments from labeled images onto a new scan. These approaches have traditionally required significant runtime, but recent learning-based registration methods promise substantial runtime improvement. In a different paradigm, supervised learning-based segmentation strategies have gained popularity. These methods have consistently used relatively large sets of labeled training data, and their behavior in the regime of a few labeled images has not been thoroughly evaluated. In this work, we provide two important results for anatomical segmentation in the scenario where few labeled images are available. First, we propose a straightforward implementation of an efficient semi-supervised learning-based registration method, which we showcase in a multi-atlas segmentation framework. Second, through a thorough empirical study, we evaluate the performance of a supervised segmentation approach, where the training images are augmented via random deformations. Surprisingly, we find that in both paradigms, accurate segmentation is generally possible even in the context of few labeled images.




1 Introduction

Anatomical segmentation of biomedical images is a fundamental problem in medical image analysis. Recent state-of-the-art methods have focused on deep learning-based supervised methods, which typically use large labeled datasets. However, acquiring a large set of paired manual segmentation maps is challenging and time consuming, so in practice many datasets have few labeled examples. In this work, we investigate this scenario for brain MRI with common segmentation strategies. We show how different state-of-the-art learning methods can yield impressive results with different properties in this setting, and analyze how the number of labeled atlases affects these results. We also propose a straightforward improvement that builds on these methods.

Multi-atlas segmentation (MAS) has been widely studied, especially in the context of only a few labeled images, or atlases [3, 24, 29, 37]. To segment a new scan, each atlas is first registered to the scan, the atlas label map is then propagated using the resulting deformation, and the warped labels from multiple atlases are fused to yield a final segmentation [18, 24, 29, 37]. For the first step, registration methods have traditionally solved an optimization problem for each image pair, and therefore exhibited long runtimes [4, 5, 6, 8, 12, 28, 30]. In contrast, recent learning-based registration approaches learn a function, usually a neural network, that takes in two images and rapidly computes the deformation field. While some methods require many example ("ground truth") deformations or segmentation maps [22, 38], others are unsupervised, requiring only a dataset of images [11]. Importantly, learning-based registration methods have fast runtimes, usually requiring only seconds per registration at test time, even on a CPU [7, 11, 14]. Learning-based registration methods have been further extended to leverage large labeled datasets during training, yielding models that better align segmentation labels [7, 22]. In this paper, we build on this prior work in learning-based registration and propose a semi-supervised registration strategy that improves multi-atlas segmentation in this scenario of few available atlases.

Supervised learning-based segmentation, especially using convolutional neural networks (CNNs), has recently seen tremendous success [2, 26, 27, 35, 36]. By seeing many examples, these methods learn the parameters of a CNN that takes an image as input and outputs a segmentation prediction. While these methods provide state-of-the-art results, they have generally been demonstrated in the context of large labeled datasets [2, 26]. Data augmentation strategies, such as random rotation, scaling, and smooth 3D deformations, are often used to encourage robustness to image variations [2, 23, 33, 36]. We build on these methods and find that, with careful data augmentation, good segmentations can be achieved even in our scenario of very few labeled training images.

There are several contemporaneous papers that are closely related to this work. Some of these focus on a small number (e.g., one) of manually segmented images and leverage sophisticated data augmentation techniques or priors to facilitate supervised segmentation [9, 39]. Others require no labeled examples of the desired modality, but exploit segmentation maps from other datasets [13, 25]. In this paper, we focus on a comparative analysis of how different numbers of labeled images affect MAS and supervised approaches, to understand when each method is plausible or desirable. Based on the insights we gain from our experiments, we propose a new semi-supervised method.

2 Method

Our goal is to segment a dataset of medical images, and we focus on brain MRI in our experiments. Let $\mathcal{A} = \{(a_i, s_i)\}_{i=1}^{N}$ represent a small dataset of $N$ labeled atlases, each consisting of a grayscale image $a_i$ and a discrete segmentation map $s_i$, such that each voxel of $s_i$ corresponds to one of $L$ anatomical labels.

2.1 Background

2.1.1 Image-Based Registration.

Let $a$ and $x$ be an atlas and a test image, respectively. We build on learning-based registration methods that learn a model $g_\theta(a, x) = \phi$, where $\phi$ is a registration field and $\theta$ are the parameters of the function, usually a convolutional neural network (CNN). The goal is for the network to yield deformations $\phi$ such that, for each voxel $p$, $x(p)$ and $[a \circ \phi](p)$ correspond to the same anatomical location, where $a \circ \phi$ represents $a$ warped by $\phi$. To optimize the parameters $\theta$, supervised registration methods employ "ground truth" deformations that are either simulated or obtained using an external registration tool. To avoid the requirement of ground truth, we follow recent unsupervised methods [11]. Specifically, we optimize the network parameters with stochastic gradient descent on the loss

$$\mathcal{L}_{us}(\theta; a, x) = \mathcal{L}_{sim}\big(x,\, a \circ \phi\big) + \lambda\, \mathcal{L}_{smooth}(\phi), \qquad \phi = g_\theta(a, x), \qquad (1)$$

where $\lambda$ is a regularization parameter, $a$ is a labeled atlas from $\mathcal{A}$, and $x$ is an image from a dataset of unlabeled images $\mathcal{X}$; $a$ and $x$ are selected randomly. The first term penalizes the dissimilarity between the image $x$ and the warped atlas $a \circ \phi$, and the second encourages a smooth deformation. For $\mathcal{L}_{sim}$ we use (the negative of) normalized cross-correlation (NCC), which has been shown to be robust to intensity heterogeneity, yielding better registration results than a simple loss such as mean squared error [5, 7].
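Equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a global NCC over the whole volume (VoxelMorph [7] uses a windowed, local NCC), and it takes the smoothness term to be the mean squared finite-difference gradient of a displacement field u, with the deformation written as identity plus u. The function names are hypothetical; the default $\lambda = 1.5$ matches the value given in Section 3.1.4.

```python
import numpy as np

def ncc(fixed: np.ndarray, moving: np.ndarray, eps: float = 1e-8) -> float:
    """Global normalized cross-correlation; 1.0 means a perfect positive linear match."""
    f = fixed - fixed.mean()
    m = moving - moving.mean()
    return float((f * m).sum() / (np.sqrt((f * f).sum() * (m * m).sum()) + eps))

def smoothness(disp: np.ndarray) -> float:
    """Sum over spatial axes of the mean squared forward difference of u, shape (3, D, H, W)."""
    # Axis 0 indexes the displacement component; axes 1-3 are spatial.
    return float(sum((np.diff(disp, axis=ax) ** 2).mean() for ax in range(1, disp.ndim)))

def unsupervised_loss(x: np.ndarray, warped_atlas: np.ndarray,
                      disp: np.ndarray, lam: float = 1.5) -> float:
    """Sketch of equation (1) with L_sim = -NCC (assumed) and the smoothness term above."""
    return -ncc(x, warped_atlas) + lam * smoothness(disp)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 8))
# Perfect alignment and zero displacement: the loss is about -1.0.
print(unsupervised_loss(x, x, np.zeros((3, 8, 8, 8))))
```

In a real training loop the warped atlas would be produced by the network's deformation and the loss differentiated with an autodiff framework; NumPy is used here only to make the arithmetic explicit.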

2.2 Semi-Supervised Registration

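To leverage the few existing atlas segmentation maps in a learning-based registration framework, we build upon the setup above and design a semi-supervised registration method. Specifically, during training with stochastic gradient descent, instead of always providing the network with a random atlas and an unlabeled image, we occasionally provide two atlases as input. In these instances, the network can also be encouraged to optimize the segmentation overlap obtained by warping one atlas' label map onto the other using the resulting deformation $\phi$, building on recent label-supervised methods [7, 22].

Specifically, for two atlases $(a_i, s_i)$ and $(a_j, s_j)$ with $\phi = g_\theta(a_i, a_j)$, we add a segmentation term to the loss function:

$$\mathcal{L}_{ss}(\theta; a_i, a_j) = \mathcal{L}_{us}(\theta; a_i, a_j) + \gamma\, \mathcal{L}_{seg}\big(s_j,\, s_i \circ \phi\big), \qquad (2)$$

where $\mathcal{L}_{us}$ is the unsupervised loss of equation (1) applied to the atlas pair, $\gamma$ is a regularization parameter for the supervised loss, and $\mathcal{L}_{seg}$ captures the agreement of the segmentation maps. We employ the Dice score, which has also been used in recent label-supervised registration methods in the context of large labeled datasets [7, 22]; so that minimizing (2) maximizes overlap, $\mathcal{L}_{seg}$ is the negative Dice averaged over the $L$ structures. The Dice overlap of the two atlases for anatomical structure $k$ is

$$\mathrm{Dice}\big(s_j^k,\, (s_i \circ \phi)^k\big) = \frac{2\,\big|s_j^k \cap (s_i \circ \phi)^k\big|}{\big|s_j^k\big| + \big|(s_i \circ \phi)^k\big|}, \qquad (3)$$

where $s^k$ denotes the set of voxels assigned to structure $k$. This strategy leverages the unlabeled images through equation (1) and the few labeled images through equation (2), thus exploiting the topological consistency offered by registration-based methods.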
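A minimal NumPy sketch of the segmentation term follows. It assumes the warped label map is available as a per-label probability map (as produced by linearly interpolating a one-hot encoding), so a soft Dice is used; on hard one-hot maps it reduces to the set-based Dice overlap above. The helper names and the choice of the segmentation loss as the negative mean Dice are illustrative assumptions; the default $\gamma = 1.0$ matches Section 3.1.4.

```python
import numpy as np

def one_hot(labels: np.ndarray, num_labels: int) -> np.ndarray:
    """Integer label map (D, H, W) -> one-hot array (L, D, H, W)."""
    return np.moveaxis(np.eye(num_labels)[labels], -1, 0)

def soft_dice(s_fixed: np.ndarray, s_warped: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-structure Dice for (L, D, H, W) maps; equals the set Dice when both are one-hot."""
    axes = tuple(range(1, s_fixed.ndim))
    inter = (s_fixed * s_warped).sum(axis=axes)
    sizes = s_fixed.sum(axis=axes) + s_warped.sum(axis=axes)
    return 2.0 * inter / (sizes + eps)

def semi_supervised_loss(l_us: float, s_fixed: np.ndarray, s_warped: np.ndarray,
                         gamma: float = 1.0) -> float:
    """Equation (2): L_us + gamma * L_seg, with L_seg = -(mean Dice over structures)."""
    l_seg = -float(soft_dice(s_fixed, s_warped).mean())
    return l_us + gamma * l_seg

labels = np.arange(64).reshape(4, 4, 4) % 3   # toy map in which all 3 labels occur
s = one_hot(labels, 3)
print(soft_dice(s, s))               # identical maps: Dice close to 1 for every structure
print(soft_dice(s, s[[1, 2, 0]]))    # channels permuted, structures never overlap: Dice 0
```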

2.3 Spatial Data Augmentation

To train the network, at each iteration we randomly deform an atlas $a_i$ and its corresponding segmentation map $s_i$ with a smooth random deformation field $\psi$, yielding the pair $(a_i \circ \psi,\, s_i \circ \psi)$, as has been done in segmentation methods [36, 39]. The warped segmentation $s_i \circ \psi$ is only used in the semi-supervised loss (2). As we demonstrate in our experiments, providing a synthesized atlas and segmentation map improves the robustness of the learning-based registration model, especially in the training iterations where we register one atlas to another.
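The augmentation step can be sketched as follows. This NumPy version is an illustration under assumptions, not the authors' code: the random field (called `psi` below) is drawn as i.i.d. Gaussian noise, smoothed with a separable Gaussian filter, and rescaled to a maximum displacement `max_disp` in voxels; the atlas and its one-hot segmentation are then resampled with trilinear interpolation, the same kind of linear-interpolation spatial transform described in Section 2.5. The values of `sigma` and `max_disp` are hypothetical.

```python
import numpy as np

def gaussian_smooth(vol: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing along every axis (zero padding at the borders)."""
    for axis in range(vol.ndim):
        # Truncate the kernel at 3 sigma, but never longer than the axis itself.
        radius = min(int(3 * sigma), (vol.shape[axis] - 1) // 2)
        t = np.arange(-radius, radius + 1)
        k = np.exp(-0.5 * (t / sigma) ** 2)
        k /= k.sum()
        vol = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, vol)
    return vol

def random_deformation(shape, sigma, max_disp, rng):
    """Smooth random displacement field of shape (3, D, H, W), at most max_disp voxels."""
    disp = np.stack([gaussian_smooth(rng.standard_normal(shape), sigma) for _ in range(3)])
    return disp * (max_disp / (np.abs(disp).max() + 1e-12))

def warp_linear(vol: np.ndarray, disp: np.ndarray) -> np.ndarray:
    """Resample vol (C, D, H, W) at p + u(p) with trilinear interpolation (border clamping)."""
    shape = vol.shape[1:]
    grid = np.stack(np.meshgrid(*[np.arange(n) for n in shape], indexing="ij"))
    upper = (np.array(shape) - 1).reshape(3, 1, 1, 1)
    coords = np.clip(grid + disp, 0, upper)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, upper)
    frac = coords - lo
    out = np.zeros(vol.shape, dtype=float)
    for corner in np.ndindex(2, 2, 2):  # the 8 neighbouring voxels of each sample point
        idx = [hi[d] if c else lo[d] for d, c in enumerate(corner)]
        w = np.prod([frac[d] if c else 1 - frac[d] for d, c in enumerate(corner)], axis=0)
        out += w * vol[:, idx[0], idx[1], idx[2]]
    return out

def augment(atlas: np.ndarray, seg_onehot: np.ndarray, rng, sigma=3.0, max_disp=2.0):
    """Return the atlas and its one-hot segmentation warped by one random smooth field psi."""
    psi = random_deformation(atlas.shape, sigma, max_disp, rng)
    return warp_linear(atlas[None], psi)[0], warp_linear(seg_onehot, psi)

rng = np.random.default_rng(0)
atlas = rng.random((16, 16, 16))
labels = (np.arange(16)[:, None, None] // 6) * np.ones((16, 16, 16), dtype=int)  # 3 toy slabs
seg = np.moveaxis(np.eye(3)[labels], -1, 0)
a_aug, s_aug = augment(atlas, seg, rng)
print(a_aug.shape, s_aug.shape)  # (16, 16, 16) (3, 16, 16, 16)
```

Because linear interpolation forms a convex combination of neighbouring one-hot vectors, the warped segmentation remains a valid per-voxel probability map, which is what the soft Dice term can consume.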

2.4 Multi-Atlas Segmentation

Given a trained network, we warp the labeled atlases and the augmented atlases to each test subject. Rather than using nearest-neighbor interpolation, we propagate the segmentation maps encoded as one-hot probability maps. The label probabilities at each voxel are determined by averaging the warped maps of all atlases, and the label with maximum probability is assigned at each voxel.
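The fusion rule described above (average the warped one-hot maps, then take the most probable label) amounts to a few lines. This NumPy sketch assumes the warped maps have already been computed and have shape (L, D, H, W) each; the function names are hypothetical. Ties are broken toward the lower label index by `np.argmax`, a detail the text does not specify.

```python
import numpy as np
from typing import Sequence

def fuse_labels(warped_onehots: Sequence[np.ndarray]) -> np.ndarray:
    """Average K warped probability maps (each L x D x H x W) and return the argmax label map."""
    prob = np.mean(np.stack(warped_onehots), axis=0)  # (L, D, H, W) averaged label probabilities
    return np.argmax(prob, axis=0)                    # (D, H, W) fused segmentation

def one_hot(labels: np.ndarray, num_labels: int) -> np.ndarray:
    return np.moveaxis(np.eye(num_labels)[labels], -1, 0)

# Three toy "warped atlases" on a 1 x 1 x 3 volume with 2 labels.
maps = [one_hot(np.array([[[0, 1, 1]]]), 2),
        one_hot(np.array([[[0, 1, 0]]]), 2),
        one_hot(np.array([[[1, 1, 0]]]), 2)]
print(fuse_labels(maps))  # [[[0 1 0]]]
```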

2.5 Implementation

We implement the registration function as a CNN with a UNet-style architecture, following recent literature [7, 20, 27]. Figure 2 depicts the network used for registration. The network takes as input a 2-channel 3D image formed by concatenating the two input volumes. We use 3D convolutions with kernel size 3x3x3 and a stride of 2, followed by Leaky ReLU activations. We warp each atlas to the subject using a spatial transformation function with linear interpolation. Figure 1 shows the overall pipeline of the proposed method.
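To make the architecture description concrete, the sketch below computes how the spatial resolution shrinks through a stack of 3x3x3, stride-2 convolutions, and how the 2-channel input is formed. The padding of 1, the number of encoder levels, and the 160x192x224 input size are assumptions for illustration only; the actual resolutions are the ones shown in Figure 2.

```python
import numpy as np

def conv_out(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def encoder_shapes(shape, levels: int = 4):
    """Spatial shape after each stride-2 convolution (hypothetical number of levels)."""
    shapes = [tuple(shape)]
    for _ in range(levels):
        shapes.append(tuple(conv_out(n) for n in shapes[-1]))
    return shapes

def make_input(atlas: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Concatenate the two volumes along a new channel axis: (2, D, H, W)."""
    return np.stack([atlas, image], axis=0)

print(encoder_shapes((160, 192, 224)))
# [(160, 192, 224), (80, 96, 112), (40, 48, 56), (20, 24, 28), (10, 12, 14)]
print(make_input(np.zeros((4, 4, 4)), np.ones((4, 4, 4))).shape)  # (2, 4, 4, 4)
```

With padding 1, each stride-2 convolution halves every dimension (rounding up), which is why input sizes divisible by a power of two keep the decoder's skip connections aligned.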

Figure 1: Overview of end-to-end semi-supervised registration-based MAS. The supervised (atlas-to-atlas) data are used in 10 percent of the training iterations.
Figure 2: CNN architecture for image registration. Each rectangle represents a 3D volume, the number inside the box indicates the number of filters, and the spatial resolution is included below each rectangle.

3 Experiments

We provide two main experiments with the goal of understanding the performance of models when few labeled atlases are available. First, we analyze the effect of our semi-supervised learning-based registration strategy on the performance of multi-atlas segmentation. Second, we analyze more broadly how MAS strategies compare to supervised learning methods in the setting of few labeled examples.

3.1 Setup

3.1.1 Methods.

We explore three variants of MAS with learning-based registration. MAS, MAS-DA, and MAS-SS refer to multi-atlas segmentation, multi-atlas segmentation with our proposed data augmentation (DA), and MAS with semi-supervised (SS) learning and DA, respectively.

We also analyze supervised segmentation strategies. Recent supervised learning-based segmentation methods use a discriminative CNN model $f_\eta$ that maps images $I$ to their segmentation maps and is parametrized by $\eta$. We learn such a model using the labeled atlases, minimizing the categorical cross-entropy loss with stochastic gradient descent. Our focus is not to explore architecture variants but to compare this approach with MAS strategies. To keep model capacity comparable, we use the same UNet-style architecture as in the registration task, with a softmax activation in the final layer to output the segmentation probabilities. For computational efficiency, we divide each image into 120 smaller 3D patches of size 64x64x64. We train variants of supervised learning-based segmentation (which we call SegNet) with limited labeled data. SegNet-DA refers to a supervised method with data augmentation, implemented as in Section 2.3. Finally, as an upper bound, we train a fully supervised model, SegNet-Full, using the labels of all images (not just the atlases) in the training set. We use this model simply to illustrate the best achievable performance and to measure the performance gap to the other tested models.
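For reference, the SegNet training objective (softmax output followed by voxel-wise categorical cross-entropy) can be written as below. This is a NumPy sketch of the standard loss, not the authors' code; in practice the logits would come from the UNet-style network applied to a 64x64x64 patch, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def categorical_cross_entropy(logits: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean voxel-wise cross-entropy; logits (L, D, H, W), integer labels (D, H, W)."""
    num_labels = logits.shape[0]
    onehot = np.moveaxis(np.eye(num_labels)[labels], -1, 0)
    log_prob = np.log(softmax(logits, axis=0) + eps)
    return float(-(onehot * log_prob).sum(axis=0).mean())

labels = np.zeros((2, 2, 2), dtype=int)
print(categorical_cross_entropy(np.zeros((4, 2, 2, 2)), labels))  # uniform prediction: log(4) ≈ 1.386
```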

3.1.2 Dataset.

We use two datasets. First, we use 7829 preprocessed T1-weighted brain MRI scans from eight public datasets: ADNI [34], OASIS [31], ABIDE [15], ADHD200 [1], MCIC [19], PPMI [32], HABS [10], and Harvard GSP [21]. All scans are preprocessed using FreeSurfer tools [17], including affine registration, brain extraction, and segmentation (the segmentations are only used for evaluation). We use 7329 randomly selected images from this dataset as unlabeled data for registration, and we emphasize that their labels are not used during training. Second, we use 38 brain MRI scans paired with hand-annotated segmentation maps from the Buckner40 dataset [17]. We split this dataset into 18, 10, and 10 images for the training, validation, and test sets, respectively. We train MAS models using atlases from the Buckner40 training subset as input, and we use the eight public datasets as unlabeled data. For training SegNet, we use atlases and their corresponding segmentation maps from the Buckner40 training set. SegNet-Full refers to a SegNet-DA model trained on all of the labeled data from Buckner40 and the eight public datasets.

3.1.3 Experimental Setup.

Our goal is to understand the behavior of MAS and supervised learning methods in the context of few labeled examples. We use $N$ labeled scans: for each value of $N$ considered, $N$ atlases are randomly chosen from the Buckner40 training set. We repeat this process several times to construct different random "atlas sets". For each MAS and supervised segmentation variant, a model is trained on each atlas set and used to segment the test dataset. The performance of each paradigm is measured by averaging each evaluation metric (described in Section 3.1.5) over the random atlas sets.
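The atlas-set protocol can be sketched as below, assuming the Buckner40 training subjects are indexed 0 to 17 (the 18-image training split of Section 3.1.2). The number of repetitions and the seed are hypothetical placeholders, and `evaluate` is a hypothetical stand-in for training a model on one atlas set and scoring it on the test set.

```python
import random
from statistics import mean

def sample_atlas_sets(train_ids, n_atlases, n_repeats, seed=0):
    """Draw n_repeats random atlas sets of size n_atlases (no repeats within a set)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(train_ids, n_atlases)) for _ in range(n_repeats)]

def average_over_sets(atlas_sets, evaluate):
    """Average a metric over atlas sets; evaluate(atlas_set) -> float is hypothetical."""
    return mean(evaluate(s) for s in atlas_sets)

train_ids = list(range(18))
sets = sample_atlas_sets(train_ids, n_atlases=3, n_repeats=5)
print(sets)
print(average_over_sets(sets, evaluate=lambda s: float(len(s))))  # dummy metric: 3.0
```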


3.1.4 Parameters.

We set the network architecture and parameters based on results in previous literature [7]. Specifically, we set the regularization parameters $\lambda$ to 1.5 and $\gamma$ to 1.0. During training, we use supervised atlas-to-atlas registration in 10 percent of the iterations. In a single small-scale setting, we experimented with 50 percent and 10 percent and found that 10 percent produced better results.
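The 10 percent schedule can be sketched as a per-iteration choice between an atlas-to-atlas pair (trained with equation (2)) and an atlas-to-image pair (trained with equation (1)). This is an illustrative assumption about how the sampling might be implemented, with hypothetical names. The atlas pool may contain augmented copies (Section 2.3), so drawing the same atlas twice still gives a non-trivial pair once a random deformation is applied.

```python
import random

def sample_training_pair(atlases, unlabeled, rng, p_supervised=0.1):
    """Return (moving, fixed, supervised) for one SGD iteration."""
    if rng.random() < p_supervised:
        # Atlas-to-atlas: both label maps are known, so equation (2) applies.
        return rng.choice(atlases), rng.choice(atlases), True
    # Atlas-to-image: only intensities are available, so equation (1) applies.
    return rng.choice(atlases), rng.choice(unlabeled), False

rng = random.Random(0)
atlases = ["atlas_0", "atlas_1", "atlas_2"]        # hypothetical identifiers
unlabeled = [f"scan_{i}" for i in range(100)]
draws = [sample_training_pair(atlases, unlabeled, rng) for _ in range(10_000)]
print(sum(d[2] for d in draws) / len(draws))  # close to 0.1
```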

3.1.5 Evaluation Metric.

We first evaluate our models by anatomical segmentation overlap, using the Dice score [16]. We focus on 29 anatomical structures that have significant volume in all images. The predicted segmentations are evaluated relative to manual anatomical segmentations from the Buckner40 dataset. Second, we evaluate the surface distance (SD) of all structures: for each predefined anatomical region, we compute the distance in mm between the predicted and manual segmentation surfaces. SD is likely to highlight spurious segmentations that lie far from the correct boundaries. We average the metrics over structures and test subjects.
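The exact surface-distance definition is not spelled out here, so the following NumPy sketch assumes a common symmetric variant: surface voxels are mask voxels with at least one 6-connected neighbor outside the mask, each surface point is matched to the closest point on the other surface, and the mean and maximum of these distances are reported in mm via a voxel `spacing`. The brute-force pairwise distance is fine for a small illustration; for full-size volumes a distance transform would be the usual choice.

```python
import numpy as np

def surface_points(mask: np.ndarray) -> np.ndarray:
    """Indices of mask voxels that have at least one 6-connected neighbour outside the mask."""
    m = np.pad(mask.astype(bool), 1)          # pad with background so borders count as surface
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(m, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior)

def surface_distance(pred: np.ndarray, truth: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Symmetric surface distance in mm between two binary masks; returns (mean, max)."""
    sp = np.asarray(spacing, dtype=float)
    p = surface_points(pred) * sp
    t = surface_points(truth) * sp
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)   # all pairwise distances
    closest = np.concatenate([d.min(axis=1), d.min(axis=0)])    # pred->truth and truth->pred
    return float(closest.mean()), float(closest.max())

truth = np.zeros((10, 10, 10), dtype=bool)
truth[2:6, 2:6, 2:6] = True
pred = np.roll(truth, 1, axis=0)  # the same cube shifted by one voxel
print(surface_distance(truth, truth))  # (0.0, 0.0)
print(surface_distance(pred, truth))   # mean between 0 and 1, max 1.0
```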

Figure 3: Dice score and surface distance on the test data for various segmentation methods. Upper: mean Dice score (higher is better) for variants of MAS (left) and variants of SegNet (right). Lower: surface distance (lower is better) for MAS-SS and variants of SegNet: mean (left) and maximum (right).
Figure 4: Examples of MR slices with segmentation for several test subjects. We indicate the numbers of labeled atlases used for each method.

3.2 Results

Figure 3 presents the performance of each strategy on the test dataset. Surprisingly, we find that all methods, when used in their best variant, can achieve reasonable segmentation results that approach the upper bound demonstrated by the fully supervised SegNet model. For example, we find that with just three atlases, the best MAS and SegNet methods can be within four Dice points of this upper bound. Furthermore, our proposed method, MAS-SS, achieves a maximum surface distance across all structures of less than 7 mm and a mean surface distance of less than 0.4 mm.

We find that data augmentation in the registration strategy improves the Dice score of MAS when few labeled atlases (fewer than three) are available. Importantly, our proposed model, MAS-SS, consistently improves on the Dice score of the MAS and MAS-DA methods in all cases. All of the MAS models show consistently small mean and maximum surface distances (not shown).

Figure 3 also illustrates the segmentation performance of supervised learning-based strategies. Data augmentation significantly improves SegNet segmentation performance in terms of both Dice score and surface distance. Interestingly, MAS-SS yields a higher Dice score than SegNet-DA with few atlases (fewer than three) and equivalent Dice performance with three atlases. Importantly, the proposed method performs significantly better in terms of both mean and maximum surface distance given one to five atlases, highlighting the advantage of combining supervision with a registration-guided method that preserves anatomical topology. SegNet-DA performs slightly better in terms of Dice score with more atlases, but retains a higher surface distance. Figure 4 shows segmentation results from the top-performing strategies.

4 Conclusion

We focus on segmentation in the regime of few labeled data. Through a detailed comparison, we show the surprising result that even with few labeled images, two separate deep learning-based approaches can achieve reasonable results, contrary to the conventional wisdom that deep learning approaches require large amounts of data. After investigating various state-of-the-art segmentation strategies with few labeled data, we also propose a semi-supervised, learning-based multi-atlas segmentation method, which improves on existing methods. The proposed method achieves both improved Dice scores and low surface distances, highlighting the advantage of a semi-supervised framework within the topologically constrained registration setting. These findings suggest two important contributions: first, deep learning segmentation strategies do not always require large amounts of labeled training data, and second, semi-supervised learning provides a new approach to multi-atlas segmentation.


  • [1] ADHD-200 Consortium (2012) The ADHD-200 consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. Frontiers in Systems Neuroscience 6. Cited by: §3.1.2.
  • [2] Z. Akkus, A. Galimzianova, A. Hoogi, D. L. Rubin, et al. (2017) Deep learning for brain MRI segmentation: state of the art and future directions. Journal of Digital Imaging, pp. 1–11. Cited by: §1.
  • [3] X. Artaechevarria, A. Munoz-Barrutia, and C. Ortiz-de-Solorzano (2009) Combination strategies in multi-atlas image segmentation: application to brain MR data. IEEE Transactions on Medical Imaging 28 (8), pp. 1266–1277. Cited by: §1.
  • [4] J. Ashburner (2007) A fast diffeomorphic image registration algorithm. Neuroimage 38 (1), pp. 95–113. Cited by: §1.
  • [5] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee (2008) Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis 12 (1), pp. 26–41. Cited by: §1, §2.1.1.
  • [6] R. Bajcsy and S. Kovačič (1989) Multiresolution elastic matching. Computer vision, graphics, and image processing 46 (1), pp. 1–21. Cited by: §1.
  • [7] G. Balakrishnan, A. Zhao, M. Sabuncu, J. Guttag, and A. V. Dalca (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE TMI. Cited by: §1, §2.1.1, §2.2, §2.2, §2.5, §3.1.4.
  • [8] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International journal of computer vision 61 (2), pp. 139–157. Cited by: §1.
  • [9] K. Chaitanya, N. Karani, C. Baumgartner, and E. Konukoglu (2019) Semi-supervised and task-driven data augmentation. arXiv preprint arXiv:1902.05396. Cited by: §1.
  • [10] A. Dagley, M. LaPoint, W. Huijbers, T. Hedden, D. G. McLaren, J. P. Chatwal, K. V. Papp, R. E. Amariglio, D. Blacker, D. M. Rentz, et al. (2017) Harvard aging brain study: dataset and accessibility. NeuroImage. Cited by: §3.1.2.
  • [11] A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu (2019) Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis. Cited by: §1, §2.1.1.
  • [12] A. V. Dalca, A. Bobu, N. S. Rost, and P. Golland (2016) Patch-based discrete registration of clinical brain images. International Workshop on Patch-based Techniques in Medical Imaging, pp. 60–67. Cited by: §1.
  • [13] A. V. Dalca, J. Guttag, and M. R. Sabuncu (2018) Anatomical priors in convolutional networks for unsupervised biomedical segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9290–9299. Cited by: §1.
  • [14] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum (2019) A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis 52, pp. 128–143. Cited by: §1.
  • [15] A. Di Martino, C. Yan, Q. Li, E. Denio, F. X. Castellanos, K. Alaerts, J. S. Anderson, M. Assaf, S. Y. Bookheimer, M. Dapretto, et al. (2014) The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry. Cited by: §3.1.2.
  • [16] L. R. Dice (1945) Measures of the amount of ecologic association between species. Ecology. Cited by: §3.1.5.
  • [17] B. Fischl (2012) FreeSurfer. NeuroImage 62 (2), pp. 774–781. Cited by: §3.1.2.
  • [18] M. R. Sabuncu, B. T. T. Yeo, K. Van Leemput, et al. (2010) A generative model for image segmentation based on label fusion. IEEE TMI 29 (10), pp. 1714–1729. Cited by: §1.
  • [19] R. L. Gollub, J. M. Shoemaker, M. D. King, White, et al. (2013) The mcic collection: a shared repository of multi-modal, multi-site brain image data from a clinical investigation of schizophrenia. Neuroinformatics 11 (3), pp. 367–388. Cited by: §3.1.2.
  • [20] H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo (2017) Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. Annual Conference on Medical Image Understanding and Analysis, pp. 506–517. Cited by: §2.5.
  • [21] A. J. Holmes, M. O. Hollinshead, T. M. O’Keefe, et al. (2015) Brain genomics superstruct project initial data release with structural, functional, and behavioral measures. Scientific data. Cited by: §3.1.2.
  • [22] Y. Hu, M. Modat, E. Gibson, W. Li, N. Ghavami, E. Bonmati, G. Wang, S. Bandula, C. M. Moore, M. Emberton, et al. (2018) Weakly-supervised convolutional neural networks for multimodal image registration. Medical image analysis. Cited by: §1, §2.2, §2.2.
  • [23] Z. Hussain, F. Gimenez, D. Yi, and D. Rubin (2018) Differential data augmentation techniques for medical imaging classification tasks. Annual Symposium proceedings. AMIA Symposium 2017, pp. 979–984. Cited by: §1.
  • [24] J. E. Iglesias and M. R. Sabuncu (2015) Multi-atlas segmentation of biomedical images: a survey. Medical image analysis 24 (1), pp. 205–219. Cited by: §1.
  • [25] T. Joyce, A. Chartsias, and S. A. Tsaftaris (2018) Deep multi-class segmentation without ground-truth labels. Cited by: §1.
  • [26] K. Kamnitsas, C. Ledig, V. F. Newcombe, et al. (2017) Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis (36), pp. 61–78. Cited by: §1.
  • [27] B. Kayalibay, G. Jensen, and P. van der Smagt (2017) CNN-based segmentation of medical imaging data. arXiv preprint arXiv:1701.03056. Cited by: §1, §2.5.
  • [28] A. Klein, J. Andersson, A. Ardekani, et al. (2009) Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. NeuroImage 46, pp. 786–802. Cited by: §1.
  • [29] L. M. Koch, M. Rajchl, W. Bai, C. F. Baumgartner, T. Tong, J. Passerat-Palmbach, P. Aljabar, and D. Rueckert (2018) Multi-atlas segmentation using partially annotated data: methods and annotation strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (7), pp. 1683–1696. Cited by: §1.
  • [30] J. Krebs, H. Delingette, B. Mailhé, N. Ayache, and T. Mansi (2019) Learning a probabilistic model for diffeomorphic registration. IEEE Transactions on Medical Imaging. Cited by: §1.
  • [31] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner (2007) Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience 19 (9), pp. 1498–1507. Cited by: §3.1.2.
  • [32] K. Marek, D. Jennings, S. Lasch, et al. (2011) The parkinson progression marker initiative (ppmi). Progress in neurobiology. Cited by: §3.1.2.
  • [33] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. Fourth International Conference on 3D Vision, pp. 565–571. Cited by: §1.
  • [34] S. G. Mueller, M. W. Weiner, Thal, et al. (2005) Ways toward an early diagnosis in alzheimer’s disease: the alzheimer’s disease neuroimaging initiative (adni). Alzheimer’s & Dementia 1 (1), pp. 55–66. Cited by: §3.1.2.
  • [35] S. Pereira, A. Pinto, V. Alves, and C. A. Silva (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Transactions on Medical Imaging 35 (5), pp. 1240–1251. Cited by: §1.
  • [36] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. MICCAI, pp. 234–241. Cited by: §1, §2.3.
  • [37] H. Wang, J. W. Suh, S. R. Das, J. B. Pluta, C. Craige, and P. A. Yushkevich (2013) Multi-atlas segmentation with joint label fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (3), pp. 611–623. Cited by: §1.
  • [38] X. Yang, R. Kwitt, M. Styner, and M. Niethammer (2017) Quicksilver: fast predictive image registration–a deep learning approach. NeuroImage 158, pp. 378–396. Cited by: §1.
  • [39] A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca (2019) Data augmentation using learned transforms for one-shot medical image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Cited by: §1, §2.3.