Convolutional neural networks (CNNs) have become a very popular method for medical image segmentation. In the field of brain MRI segmentation, CNNs have been applied to tissue segmentation [20, 13, 14] and various brain abnormality segmentation tasks [5, 8, 3].
A relatively new approach for segmentation with CNNs is the use of dilated convolutions, where the weights of convolutional layers are sparsely distributed over a larger receptive field without losing coverage on the input image [19, 18]. Dilated CNNs are therefore an effective approach to achieve a large receptive field with a limited number of trainable weights and a limited number of convolutional layers, without the use of subsampling layers.
Generative adversarial networks (GANs) provide a method to generate images that are difficult to distinguish from real images [4, 15, 17]. To this end, GANs use a discriminator network that is optimised to discriminate real from generated images, which motivates the generator network to generate images that look real. A similar adversarial training approach has been used for domain adaptation, using a discriminator network that is trained to distinguish images from different domains [2, 7], and for improving image segmentations, using a discriminator network that is trained to distinguish manual from generated segmentations [11]. Recently, such a segmentation approach has also been applied in medical imaging for the segmentation of prostate cancer in MRI [9] and organs in chest X-rays [1].
In this paper we employ adversarial training to improve the performance of brain MRI segmentation in two sets of images using a fully convolutional and a dilated network architecture.
2 Materials and Methods
2.1.1 Adult subjects
35 T1-weighted MR brain images (15 training, 20 test) were acquired on a Siemens Vision 1.5T scanner at an age (mean ± standard deviation) of 32.9 ± 19.2 years, as provided by the MICCAI 2012 challenge on multi-atlas labelling [10]. The images were segmented into six classes: white matter (WM), cortical grey matter (cGM), basal ganglia and thalami (BGT), cerebellum (CB), brain stem (BS), and lateral ventricular cerebrospinal fluid (lvCSF).
2.1.2 Elderly subjects
20 axial T1-weighted MR brain images (5 training, 15 test) were acquired on a Philips Achieva 3T scanner at an age (mean ± standard deviation) of 70.5 ± 4.0 years, as provided by the MRBrainS13 challenge [12]. The images were segmented into seven classes: WM, cGM, BGT, CB, BS, lvCSF, and peripheral cerebrospinal fluid (pCSF). Possible white matter lesions were included in the WM class.
2.2 Network architecture
Two different network architectures are used to evaluate the hypothesis that adversarial training can aid in improving segmentation performance: a fully convolutional network and a network with dilated convolutions. The outputs of these networks are input for a discriminator network, which distinguishes between generated and manual segmentations. The fully convolutional nature of both networks allows arbitrarily sized inputs during testing. Details of both segmentation networks are listed in Figure 1, left.
2.2.1 Fully convolutional network
A network with 15 convolutional layers of 32 3×3 kernels is used (Figure 1, left), which results in a receptive field of 31×31 voxels. During training, an input of 51×51 voxels is used, corresponding to an output of 21×21 voxels. The network has 140,039 trainable parameters for 7 classes (6 plus background; adult subjects) and 140,296 trainable parameters for 8 classes (7 plus background; elderly subjects).
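The reported parameter counts can be checked with a short calculation. The sketch below is not the authors' code. It assumes a single-channel input, one bias per kernel, no batch-normalisation parameters in the count, and two trailing 1×1 convolution layers (256 kernels, then one kernel per class). The 256-kernel layer is an assumption that is not stated in the text; it is used here because it reproduces the counts given above. All function names are illustrative.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per kernel for a 2D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def fcn_param_count(n_classes, n_layers=15, width=32, hidden=256):
    """Parameter count for a stack of 3x3 layers followed by two 1x1 layers.

    Assumed layout: `n_layers` 3x3 convolutions of `width` kernels on a
    single-channel input, a 1x1 layer of `hidden` kernels (assumption),
    and a 1x1 output layer with one kernel per class.
    """
    total = conv_params(1, width, 3)                        # first 3x3 layer
    total += (n_layers - 1) * conv_params(width, width, 3)  # remaining 3x3 layers
    total += conv_params(width, hidden, 1)                  # assumed 1x1 hidden layer
    total += conv_params(hidden, n_classes, 1)              # 1x1 output layer
    return total


print(fcn_param_count(7))  # adult subjects: 6 classes plus background
print(fcn_param_count(8))  # elderly subjects: 7 classes plus background
```

Under these assumptions, the two calls give 140,039 and 140,296. The difference of 257 corresponds to one extra output kernel with 256 weights and one bias.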
2.2.2 Dilated network
The dilated network uses the same architecture as proposed by Yu et al. [19], which uses layers of 3×3 kernels with increasing dilation factors (Figure 1, left). This results in a receptive field of 67×67 voxels using only 7 layers of 3×3 convolutions, without any subsampling layers. During training, an input of 87×87 voxels is used, which corresponds to an output of 21×21 voxels. In each layer 32 kernels are trained. The network has 56,039 trainable parameters for 7 classes (6 plus background; adult subjects) and 56,072 trainable parameters for 8 classes (7 plus background; elderly subjects).
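The receptive field, output size and parameter count of the dilated network follow from simple arithmetic, sketched below. The dilation factors 1, 1, 2, 4, 8, 16, 1 are taken from the context module of Yu et al. and are an assumption here, since the text only states that the factors increase. A single-channel input, unpadded convolutions and a 1×1 output layer with one kernel per class are also assumed. The names are illustrative.

```python
DILATIONS = (1, 1, 2, 4, 8, 16, 1)  # assumed, following Yu et al.'s context module


def receptive_field(dilations, kernel_size=3):
    """Each stride-1 layer widens the field by (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def output_size(input_size, dilations, kernel_size=3):
    """Output side length when no padding is used."""
    return input_size - (receptive_field(dilations, kernel_size) - 1)


def dilated_param_count(n_classes, dilations=DILATIONS, width=32):
    """Dilation spreads a 3x3 kernel out but does not add weights."""
    first = 1 * width * 9 + width                              # single-channel input
    rest = (len(dilations) - 1) * (width * width * 9 + width)  # remaining 3x3 layers
    output = width * n_classes + n_classes                     # assumed 1x1 output layer
    return first + rest + output


print(receptive_field(DILATIONS))   # 67
print(output_size(87, DILATIONS))   # 21
print(receptive_field((1,) * 15))   # 31, fifteen undilated 3x3 layers
print(dilated_param_count(7), dilated_param_count(8))
```

With these assumptions, the sketch reproduces the 67×67 receptive field, the 21×21 output for an 87×87 input, and the 56,039 and 56,072 parameter counts.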
2.2.3 Discriminator network
The inputs to the discriminator network are the segmentation, as one-hot encoding or softmax output, and image data in the form of a 25×25 patch. In this way, the network can distinguish real from generated combinations of image and segmentation patches. The image patch and the segmentation are concatenated after two layers of 3×3 kernels on the image patch. The discriminator network further consists of three layers of 32 3×3 kernels, a 3×3 max-pooling layer, two layers of 32 3×3 kernels, and a fully connected layer of 256 nodes. The output layer with two nodes distinguishes between manual and generated segmentations.
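The spatial sizes through the discriminator can be traced with the sketch below. It assumes unpadded, stride-1 convolutions and a max-pooling stride equal to the pooling size; neither is stated in the text. Under these assumptions, the 25×25 image patch is reduced to 21×21 by its two 3×3 layers, which matches the 21×21 segmentation output, and the feature maps shrink to a single position before the fully connected layer. Function names are illustrative.

```python
def conv_out(size, kernel_size=3):
    """Side length after an unpadded, stride-1 convolution (assumption)."""
    return size - (kernel_size - 1)


def pool_out(size, pool_size=3):
    """Side length after max-pooling with stride equal to pool_size (assumption)."""
    return size // pool_size


def discriminator_shapes(image_patch=25, segmentation_patch=21):
    """Return (stage name, side length) pairs for the discriminator."""
    size = image_patch
    shapes = [("image patch", size)]
    for i in range(2):                      # two 3x3 layers on the image branch
        size = conv_out(size)
        shapes.append((f"image conv {i + 1}", size))
    if size != segmentation_patch:
        raise ValueError("image and segmentation branches differ in size")
    shapes.append(("concatenation", size))
    for i in range(3):                      # three 3x3 layers
        size = conv_out(size)
        shapes.append((f"conv {i + 1}", size))
    size = pool_out(size)                   # 3x3 max-pooling
    shapes.append(("max-pool", size))
    for i in range(2):                      # two 3x3 layers
        size = conv_out(size)
        shapes.append((f"conv {i + 4}", size))
    return shapes


for name, size in discriminator_shapes():
    print(f"{name}: {size}x{size}")
```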
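Figure 1: Left: Details of the segmentation networks. The receptive fields are 67×67 voxels for the dilated network and 31×31 voxels for the fully convolutional network. No subsampling layers are used in either network. Right: Overview of the adversarial training procedure. The red connections indicate how the discriminator loss influences the segmentation network during backpropagation.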
2.3 Adversarial training
An overview of the adversarial training procedure is shown in Figure 1, right.
Three types of updates for the segmentation network parameters and the discriminator network parameters are possible during the training procedure: (1) an update of only the segmentation network based on the cross-entropy loss over the segmentation map, (2) an update of the discriminator network based on the discriminator loss using a manual segmentation as input, and (3) an update of the whole network (segmentation and discriminator network) based on the discriminator loss using an image as input. Only updates (1) and (3) affect the segmentation network. In update (3), the segmentation network parameters are updated to maximise the discriminator loss, i.e. the updates for the segmentation network are performed in the direction that ascends the loss instead of descending it.
The three types of updates are performed in an alternating fashion. The updates based on the segmentation loss and the updates based on the discriminator loss are performed with separate optimisers using separate learning rates. Using a smaller learning rate, the discriminator network adapts more slowly than the segmentation network, such that the discriminator loss does not converge too quickly and can have enough influence on the segmentation network.
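The alternating scheme can be illustrated with a toy sketch in which each network is reduced to a single scalar parameter. This is not the authors' implementation: plain gradient steps stand in for the optimiser, the gradient values are supplied by the caller, and the learning rates are placeholder values chosen only for illustration. The sketch shows which parameters each update type changes, and that the segmentation parameters ascend the discriminator loss in the third update type.

```python
from itertools import cycle, islice

UPDATE_TYPES = ("segmentation", "discriminator_manual", "adversarial")


def apply_update(theta_s, theta_d, kind, grads, lr_seg, lr_disc):
    """Apply one plain gradient step of the given update type.

    theta_s and theta_d are scalar stand-ins for the segmentation and
    discriminator parameters. `grads` maps (loss, parameter) to a gradient.
    """
    if kind == "segmentation":
        # (1) descend the segmentation loss; discriminator untouched
        theta_s -= lr_seg * grads["seg_loss", "theta_s"]
    elif kind == "discriminator_manual":
        # (2) manual segmentation as input; only the discriminator changes
        theta_d -= lr_disc * grads["disc_loss_manual", "theta_d"]
    elif kind == "adversarial":
        # (3) image as input; the discriminator descends its loss while
        # the segmentation network ascends the same loss (sign reversed)
        theta_d -= lr_disc * grads["disc_loss_generated", "theta_d"]
        theta_s += lr_disc * grads["disc_loss_generated", "theta_s"]
    else:
        raise ValueError(f"unknown update type: {kind}")
    return theta_s, theta_d


# Placeholder gradients and learning rates, for illustration only.
grads = {
    ("seg_loss", "theta_s"): 1.0,
    ("disc_loss_manual", "theta_d"): 1.0,
    ("disc_loss_generated", "theta_d"): 1.0,
    ("disc_loss_generated", "theta_s"): 1.0,
}
theta_s, theta_d = 0.0, 0.0
for kind in islice(cycle(UPDATE_TYPES), 3):
    theta_s, theta_d = apply_update(theta_s, theta_d, kind, grads,
                                    lr_seg=0.1, lr_disc=0.01)
    print(kind, theta_s, theta_d)
```

In this sketch the learning rate is chosen per loss rather than per network, so the segmentation parameters also use the smaller rate when they are updated through the discriminator loss.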
For each network, rectified linear units are used throughout, batch normalisation [6] is used on all layers, and dropout [16] is used for the 1×1 convolution layers.
3 Experiments and Results
As a baseline, the segmentation networks are trained without the adversarial network. The updates are performed with RMSprop using a learning rate of  and minibatches of 300 samples. The networks are trained for 5 epochs, where each epoch corresponds to 50,000 training patches per class per image. Note that during this sample-balancing process, the class label corresponds to the label of the central voxel, even though a larger image patch is labelled.
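Class-balanced sampling by central voxel label can be sketched as follows. This is a minimal illustration on a 2D label map, not the authors' sampling code: drawing with replacement, the fixed random seed and the omission of image-border handling are all assumptions, and the function name is hypothetical.

```python
import random


def sample_balanced_centres(label_map, per_class, rng=None):
    """Draw `per_class` patch centres for every label present in a 2D label map.

    label_map: list of rows of integer class labels (one image slice).
    Returns a dict mapping class label -> list of (row, col) centres.
    Centres are drawn with replacement (assumption), so classes with
    few voxels are oversampled.
    """
    rng = rng or random.Random(0)
    voxels_by_class = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            voxels_by_class.setdefault(label, []).append((r, c))
    return {
        label: [rng.choice(voxels) for _ in range(per_class)]
        for label, voxels in voxels_by_class.items()
    }


labels = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
centres = sample_balanced_centres(labels, per_class=4)
for label, points in sorted(centres.items()):
    print(label, points)
```

Each class contributes the same number of patches regardless of its size, and the class assigned to a patch is the label found at its centre.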
The discriminator and segmentation networks are trained using the alternating update scheme. The updates for both loss functions are performed with RMSprop using a learning rate of for the segmentation loss and a learning rate of for the discriminator loss. The updates alternate between the three types of updates, using minibatches of samples for each.
Figure 2 provides a visual comparison between the segmentations obtained with and without adversarial training, showing that the adversarial approach generally resulted in less noisy segmentations. The same can be seen from the total number of 3D components (including the background class) that compose the segmentations. For the adult subjects, the number of components per image () decreased from to using the fully convolutional network and from to using the dilated network. For the elderly subjects, the number of components per image () decreased from to using the fully convolutional network and from to using the dilated network.
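Counting the 3D components of a segmentation can be sketched with a breadth-first flood fill, as below. The connectivity used in the paper is not stated; 6-connectivity (shared faces) is assumed here, and the function name is illustrative. Every class, including background, contributes to the count.

```python
from collections import deque


def count_components(volume):
    """Count 3D connected components over all classes, background included.

    volume: nested lists indexed [z][y][x] holding integer class labels.
    Two voxels are in the same component when they share a face
    (6-connectivity, an assumption) and carry the same label.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    n_components = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if seen[z][y][x]:
                    continue
                # Unvisited voxel: start a new component and flood-fill it.
                n_components += 1
                label = volume[z][y][x]
                seen[z][y][x] = True
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and not seen[pz][py][px]
                                and volume[pz][py][px] == label):
                            seen[pz][py][px] = True
                            queue.append((pz, py, px))
    return n_components


clean = [[[0, 0, 0], [0, 1, 1], [0, 1, 1]]]
noisy = [[[0, 0, 1], [0, 1, 1], [0, 1, 0]]]
print(count_components(clean), count_components(noisy))
```

In the toy example, the noisy map has one more component than the clean map because a single background voxel is cut off from the rest of the background.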
The evaluation results in terms of Dice coefficients (DC) between the automatic and manual segmentations are shown in Figure 3 as boxplots. Significantly improved DC, based on paired t-tests, were obtained for each of the tissue classes, in both image sets, and for both networks. The only exception was lvCSF in the elderly subjects using the dilated network. For the adult subjects, the DC averaged over all 6 classes () increased from to using the fully convolutional network and from to using the dilated network. For the elderly subjects, the DC averaged over all 7 classes () increased from to using the fully convolutional network and from to using the dilated network.
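For reference, the Dice coefficient for one class is twice the number of voxels labelled with that class in both segmentations, divided by the sum of the voxel counts of that class in each segmentation. A minimal sketch on flattened label sequences follows; returning 1.0 when the class is absent from both segmentations is a convention assumed here, and the function name is illustrative.

```python
def dice_coefficient(automatic, manual, label):
    """Dice coefficient for one class between two flat label sequences."""
    in_auto = [v == label for v in automatic]
    in_manual = [v == label for v in manual]
    overlap = sum(1 for a, m in zip(in_auto, in_manual) if a and m)
    total = sum(in_auto) + sum(in_manual)
    if total == 0:
        return 1.0  # assumed convention: class absent from both segmentations
    return 2.0 * overlap / total


automatic = [1, 1, 0, 2]
manual = [1, 0, 0, 2]
per_class = {label: dice_coefficient(automatic, manual, label) for label in (1, 2)}
mean_dc = sum(per_class.values()) / len(per_class)
print(per_class, mean_dc)
```

Averaging the per-class values, as in the last lines of the sketch, corresponds to the class-averaged DC reported above.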
4 Discussion and Conclusions
We have presented an approach to improve brain MRI segmentation by adversarial training. The results showed improved segmentation performance both qualitatively (Figure 2) and quantitatively in terms of DC (Figure 3). The improvements were especially clear for the deeper, more difficult to train, fully convolutional networks as compared with the shallower dilated networks. Furthermore, the approach improved structural consistency, as is visible from the reduced number of components in the segmentations. Because these improvements were usually small in size, their effect on the DC was limited.
The approach includes an additional loss function that distinguishes between real and generated segmentations and can therefore capture inconsistencies that a normal per-voxel loss averaged over the output does not capture. The proposed approach can be applied to any network architecture that, during training, uses an output in the form of an image patch, image slice, or full image instead of a single pixel/voxel.
Various changes to the segmentation network that might improve the results could be evaluated in future work, such as different receptive fields, multiple inputs, skip-connections, and 3D inputs. Using a larger output patch size or even the whole image as output could possibly increase the effect of the adversarial training by including more information that could help in distinguishing manual from generated segmentations. This could, however, also reduce the influence of local information, resulting in a decision that is too global. Further investigation is necessary to evaluate which of the choices in the network architecture and training procedure have the most effect on the results.
Acknowledgements The authors would like to thank the organisers of MRBrainS13 and the multi-atlas labelling challenge for providing the data. The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan X Pascal GPU.
- [1] Dai, W., Doyle, J., Liang, X., Zhang, H., Dong, N., Li, Y., Xing, E.P.: SCAN: Structure correcting adversarial network for chest X-rays organ segmentation. arXiv preprint arXiv:1703.08770 (2017)
- [2] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. J Mach Learn Res 17(59), 1–35 (2016)
- [3] Ghafoorian, M., Karssemeijer, N., Heskes, T., Bergkamp, M., Wissink, J., Obels, J., Keizer, K., de Leeuw, F.E., van Ginneken, B., Marchiori, E., Platel, B.: Deep multi-scale location-aware 3D convolutional neural networks for automated detection of lacunes of presumed vascular origin. NeuroImage: Clin 14, 391–399 (2017)
- [4] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. pp. 2672–2680 (2014)
- [5] Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Med Image Anal 35, 18–31 (2017)
- [6] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
- [7] Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Nori, A., Criminisi, A., Rueckert, D., Glocker, B.: Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In: IPMI (2017)
- [8] Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 36, 61–78 (2017)
- [9] Kohl, S., Bonekamp, D., Schlemmer, H.P., Yaqubi, K., Hohenfellner, M., Hadaschik, B., Radtke, J.P., Maier-Hein, K.: Adversarial networks for the detection of aggressive prostate cancer. arXiv preprint arXiv:1702.08014 (2017)
- [10] Landman, B.A., Ribbens, A., Lucas, B., Davatzikos, C., Avants, B., Ledig, C., Ma, D., Rueckert, D., Vandermeulen, D., Maes, F., Erus, G., Wang, J., Holmes, H., Wang, H., Doshi, J., Kornegay, J., Manjon, J., Hammers, A., Akhondi-Asl, A., Asman, A.J., Warfield, S.K.: MICCAI 2012 Workshop on Multi-Atlas Labeling. CreateSpace Independent Publishing Platform (2012)
- [11] Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks. In: NIPS workshop on adversarial training (2016)
- [12] Mendrik, A.M., Vincken, K.L., Kuijf, H.J., Breeuwer, M., Bouvy, W.H., de Bresser, J., Alansary, A., de Bruijne, M., Carass, A., El-Baz, A., Jog, A., Katyal, R., Khan, A.R., van der Lijn, F., Mahmood, Q., Mukherjee, R., van Opbroek, A., Paneri, S., Pereira, S., et al.: MRBrainS challenge: Online evaluation framework for brain image segmentation in 3T MRI scans. Comput Intel Neurosc p. 813696 (2015)
- [13] Moeskops, P., Viergever, M.A., Mendrik, A.M., de Vries, L.S., Benders, M.J., Išgum, I.: Automatic segmentation of MR brain images with a convolutional neural network. IEEE T Med Imaging 35(5), 1252–1261 (2016)
- [14] Moeskops, P., Wolterink, J.M., van der Velden, B.H., Gilhuijs, K.G., Leiner, T., Viergever, M.A., Išgum, I.: Deep learning for multi-task medical image segmentation in multiple modalities. In: MICCAI. pp. 478–486. Springer (2016)
- [15] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
- [16] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1), 1929–1958 (2014)
- [17] Wolterink, J.M., Leiner, T., Viergever, M.A., Išgum, I.: Generative adversarial networks for noise reduction in low-dose CT. IEEE T Med Imaging (2017)
- [18] Wolterink, J.M., Leiner, T., Viergever, M.A., Išgum, I.: Dilated convolutional neural networks for cardiovascular MR segmentation in congenital heart disease. In: MICCAI HVSMR workshop. pp. 95–102. Springer (2016)
- [19] Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)
- [20] Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, 214–224 (2015)