3D Global Convolutional Adversarial Network for Prostate MR Volume Segmentation

07/18/2018 ∙ by Haozhe Jia, et al.

Advanced deep learning methods have been developed to conduct prostate MR volume segmentation in either a 2D or 3D fully convolutional manner. However, 2D methods tend to have limited segmentation performance, since large amounts of spatial information in prostate volumes are discarded during the slice-by-slice segmentation process; 3D methods also have room for improvement, since they use isotropic kernels to perform 3D convolutions whereas most prostate MR volumes have anisotropic spatial resolution. Besides, fully convolutional methods perform well on localization but neglect the per-voxel classification aspect of segmentation tasks. In this paper, we propose a 3D Global Convolutional Adversarial Network (3D GCA-Net) for efficient prostate MR volume segmentation. We first design a 3D ResNet encoder to extract 3D features from prostate scans, and then develop the decoder, which is composed of a multi-scale 3D global convolutional block and a 3D boundary refinement block, to address the classification and localization issues simultaneously for volumetric segmentation. Additionally, we combine the encoder-decoder segmentation network with an adversarial network in the training phase to enforce the contiguity of long-range spatial predictions. Throughout the proposed model, we use anisotropic convolutional processing for better feature learning on prostate MR scans. We evaluated our 3D GCA-Net model on two public prostate MR datasets and achieved state-of-the-art performance.




1 Introduction

Efficient detection and segmentation of the gland capsule in prostate MR images are critical for diagnosis, management, and prognosis. Traditional approaches to automatic prostate segmentation in MR images are mainly based on anatomical atlas registration [5], deformable models [14], and optimization algorithms [12]. However, the task remains challenging, since prostate MR images tend to exhibit large variability in the size and shape of the gland, heterogeneity in signal intensity around endorectal coils, and low contrast between the gland and adjacent structures.

Figure 1: Overview of the proposed approach

The availability of a large amount of annotated medical image data and pre-trained deep models has made it feasible to use deep learning for medical image segmentation and classification. Recently, some deep convolutional neural network (CNN) based methods have achieved convincing segmentation performance on prostate MR images. A 2D approach [3] was built by combining the Fully Convolutional Network (FCN) [8] and the Residual Network (ResNet) [4] for efficient prostate segmentation. Similar to the popular U-Net [13], a 3D segmentation model [10] was designed based on a volumetric, fully convolutional neural network. Despite using volumetric convolutions and residual connections to maintain spatial and contextual information, [10] ignores that prostate MR volumes tend to have anisotropic voxel resolution and, in particular, low between-slice resolution. In addition, since biomedical volumetric segmentation can be regarded as a dense per-voxel classification problem, both classification and localization are crucial for accurate segmentation. However, existing methods mainly focus on localization and neglect the per-voxel classification problem, which tends to limit segmentation performance.
In this paper, we propose a new deep 3D Global Convolutional Adversarial Network (3D GCA-Net) that combines a 3D global convolutional network with an adversarial network for efficient prostate segmentation. Fig. 1 outlines the main components of our 3D GCA-Net. To address volumetric prediction of prostate MR images, we first design a 3D fully convolutional encoder-decoder model for the segmentation task. We construct a 3D ResNet model [4] as the encoder. In the decoder, we design multi-level hybrid global convolution blocks and boundary refinement blocks to tackle the classification and localization issues simultaneously for volumetric segmentation. In addition, to adapt to the imaging resolution of prostate MR volumes, the traditional 2D convolution kernels are replaced by 3D anisotropic convolutions [7] throughout the segmentation network. A multi-scale concatenation structure is also added to further reduce the loss of volumetric context information between layers. Moreover, CNN-based architectures rarely produce spatially contiguous label maps, since the CNN predicts each label variable independently of the others. To overcome this, we introduce adversarial training [9] into the segmentation network as a regularization term in the training stage. Compared to traditional approaches [2], which use conditional random fields (CRFs) with pair-wise terms to reinforce contiguity in the predicted label maps, the proposed 3D adversarial binary classification network pushes the CNN toward higher-order consistent prediction maps while adding no extra complexity to the segmentation model in the inference stage.
We quantitatively and qualitatively evaluate our approach against several state-of-the-art approaches [10, 3, 13] on both the MICCAI Grand Challenge-Prostate MR Image Segmentation Challenge 2012 (PROMISE 12) dataset [6] and the NCI-ISBI 2013 Challenge-Automated Segmentation of Prostate Structures (ASPS 13) dataset [1]. The experimental results show our 3D GCA-Net achieves superior segmentation performance compared to the state-of-the-art.

Figure 2: Architecture of the 3D encoder-decoder segmentation network. A and B are the 3D global convolutional block and the 3D boundary refinement block of the decoder, respectively. The sizes of the input volume and of the convolutional or pooling kernel of each layer are also shown.

2 Method

2.1 3D Encoder-decoder Segmentation Network

To achieve accurate and efficient segmentation of prostate MR volumes, we construct a 3D fully convolutional encoder-decoder deep network, in which the 3D ResNet encoder extracts the abundant context features of input volumes, and the 3D global convolutional and boundary refinement decoder further exploits the multi-scale features generated by the encoder to give the per-voxel prediction of the original input volumes.

3D ResNet Encoder. We adopt the widely used ResNet-50 model [4] as the base structure for our encoder, owing to its strong feature extraction ability. ResNet-50, however, was originally designed for 2D image processing. To extend it to 3D image segmentation, we replace its 2D convolutional layers with 3D ones. Specifically, the first convolutional layer of ResNet-50, with a kernel size of 7×7, is designed to receive 3-channel RGB 2D input images. After extending ResNet-50 to our 3D model, the first convolutional layer has 3D kernels and is therefore able to receive 3D MR volumes. For all the other convolutional layers, we directly expand the 2D convolutions to 3D convolutions with a kernel size of 1 in the z dimension. As a result, we extend ResNet-50 to a 3D ResNet encoder while retaining compatibility with the original parameter setting. Hence the 3D encoder can still use the pre-trained parameters and transfer the knowledge about image representation learned on large-scale natural images to characterize prostate MR images.
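The expansion of a 2D convolution into a 3D one with kernel size 1 in z can be sketched as a pure reshaping of the weight tensor. The snippet below is a minimal illustration, not the authors' code: it assumes a PyTorch-style weight layout of (out channels, in channels, depth, height, width), and the helper names and the 64-channel 3×3 layer are hypothetical. It shows why the pre-trained parameters remain usable: the number of weights does not change.

```python
from math import prod


def inflate_conv2d_shape(shape_2d):
    """Map a 2D conv weight shape (out, in, kh, kw) to (out, in, 1, kh, kw)."""
    out_ch, in_ch, kh, kw = shape_2d
    return (out_ch, in_ch, 1, kh, kw)


def inflate_kernel(kernel_2d):
    """Wrap one kh x kw kernel (nested lists) into a 1 x kh x kw kernel."""
    # Copy the rows so the 3D kernel does not alias the 2D one.
    return [[row[:] for row in kernel_2d]]


# Hypothetical 3x3 layer with 64 input and 64 output channels.
shape_2d = (64, 64, 3, 3)
shape_3d = inflate_conv2d_shape(shape_2d)

# The weight count is unchanged, so pre-trained 2D weights can be copied over.
print(shape_3d, prod(shape_2d) == prod(shape_3d))
```

Because the added z dimension has size 1, each expanded layer still operates slice by slice; between-slice context has to come from the layers that use larger kernels along z.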
3D Global Convolutional and Boundary Refinement Decoder. Inspired by the Global Convolutional Network (GCN) [11], and aiming to fully exploit the 3D volume features, we build the decoder with multi-scale 3D global convolution blocks and 3D boundary refinement blocks. Compared to conventional 3D convolutions with small, fixed kernel sizes, the 3D global convolution blocks use larger, multi-scale kernels to mimic the densely connected structure of classification models, which enhances the voxel classification capability of the network in addition to its already promising localization performance. Moreover, the 3D boundary refinement blocks use anisotropic convolutions with residual concatenation to further exploit both within-slice and between-slice features, which improves boundary localization in prostate MR volumes.
In the 3D global convolutional block, we realize a large-kernel 3D convolution by decomposing it into a combination of three 1D convolutions along the x, y, and z dimensions, respectively. This establishes dense connections over a large 3D region of the feature map with a limited number of parameters. As a result, we tackle the classification issue in 3D volumetric semantic segmentation at a limited computational cost. In addition, considering the anisotropic resolution of prostate MR volumes, the input features of our boundary refinement block are passed through 2 anisotropic convolutional layers [7]: the within-slice convolution further exploits the 2D features in x-y planes, and the between-slice convolution focuses on the features across different slices. Additionally, residual concatenations are added in both the global convolutional block and the boundary refinement block to minimize the loss of feature information. The structures of the global convolutional block and the boundary refinement block are shown in Fig. 2A and 2B, respectively.
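The parameter saving from decomposing a large 3D kernel into three 1D convolutions can be made concrete with a small count. This is a sketch under stated assumptions: the three 1D convolutions are applied as one sequential chain, biases are ignored, and the kernel size `k = 7` and the 32 channels are hypothetical values chosen only for illustration (the text above does not give them).

```python
def dense_kernel_params(k, c_in, c_out):
    """Weights in one dense k x k x k 3D convolution (bias ignored)."""
    return k ** 3 * c_in * c_out


def decomposed_kernel_params(k, c_in, c_out):
    """Weights in a chain of three 1D convolutions along x, y and z.

    Assumes the first convolution maps c_in -> c_out and the other two
    keep c_out channels (bias ignored).
    """
    return k * c_in * c_out + 2 * k * c_out * c_out


k, channels = 7, 32  # hypothetical values
dense = dense_kernel_params(k, channels, channels)
decomposed = decomposed_kernel_params(k, channels, channels)
print(dense, decomposed, dense / decomposed)
```

With equal input and output channel counts, the dense kernel grows as k^3 while the decomposed chain grows as 3k, so the ratio between them is k^2 / 3.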
In each stage of the decoder, the volumetric features extracted from the corresponding stage of the encoder are passed to a 3D global convolutional block and a 3D boundary refinement block. The output feature maps are then upsampled by trilinear interpolation and added to the higher-scale ones. At the end of the decoder, a final convolution generates the volumetric prediction as the segmentation of the input volume. The detailed components of the proposed segmentation network are shown in Fig. 2.

2.2 Adversarial Training

To further regularize the segmentation network toward accurate and consistent volumetric predictions, we apply adversarial training [9] to the proposed segmentation network. In this approach, the adversarial network is trained to detect higher-order inconsistencies between the ground truth and the segmentation result, and guides the segmentation network to correct them. This is implemented with a hybrid loss function that combines a binary foreground-background weighted 3D cross-entropy segmentation loss term with an adversarial regularization term [9]. We build a deep classification CNN with 6 convolutional layers and no fully connected layers as the discriminator model. The generator and the discriminator are each trained with their own objective function.

The generator, i.e., the proposed segmentation network, is trained to generate a segmentation result that is similar to the ground truth of the prostate MR volume. Meanwhile, the input volume is concatenated with the segmentation result and with the ground truth, separately, and each concatenated pair is passed into the discriminator for discrimination. Note that adversarial training is applied only in the training stage; only the segmentation network is used to segment volumes in the inference stage. We can therefore train the 3D segmentation model in a robust and effective manner with no extra time or computational cost at inference.
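A hybrid objective of this kind can be sketched as follows. This is only an illustration in the general style of adversarial segmentation training [9], not the authors' exact formulation: the function names, the class weights `fg_weight` and `bg_weight`, and the trade-off factor `lam` are all assumptions, and the discriminator outputs are represented by single probabilities rather than a real network.

```python
import math

EPS = 1e-7  # clamp to keep log() finite


def bce(prob, target):
    """Binary cross-entropy for one probability and one 0/1 target."""
    prob = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def weighted_cross_entropy(probs, labels, fg_weight, bg_weight):
    """Foreground/background weighted cross-entropy averaged over voxels."""
    total = 0.0
    for prob, label in zip(probs, labels):
        weight = fg_weight if label == 1 else bg_weight
        total += weight * bce(prob, label)
    return total / len(probs)


def generator_loss(probs, labels, d_on_pred,
                   fg_weight=1.0, bg_weight=1.0, lam=0.1):
    """Segmentation loss plus an adversarial term (lam is hypothetical).

    d_on_pred is the discriminator's probability that the predicted
    segmentation is a ground-truth one; the generator wants it near 1.
    """
    seg = weighted_cross_entropy(probs, labels, fg_weight, bg_weight)
    return seg + lam * bce(d_on_pred, 1.0)


def discriminator_loss(d_on_truth, d_on_pred):
    """The discriminator should output 1 for ground truth and 0 for predictions."""
    return bce(d_on_truth, 1.0) + bce(d_on_pred, 0.0)
```

Only `generator_loss` influences the segmentation network, which is consistent with the discriminator being dropped at inference time.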

3 Experiments and Results

Datasets: Two public prostate MR datasets were used to evaluate the proposed 3D GCA-Net. The first, the MICCAI PROMISE 12 challenge dataset [6], contains 50 training transverse T2-weighted MR scans with corresponding annotated ground truth and 30 testing scans for independent online evaluation. The second, the NCI-ISBI ASPS 13 challenge dataset [1], consists of 60 MR scans, half acquired with a 1.5T machine and the other half with a 3T machine. We trained the proposed method on the 50 training scans of the PROMISE 12 dataset and submitted the segmentation results for the 30 testing scans to the ongoing challenge. For the ASPS 13 challenge dataset, we randomly split all 60 training scans into 4 independent groups to conduct 4-fold cross validation. Additionally, for an intuitive and quantitative evaluation, we also implemented a 3D version of U-Net [13] and V-Net [10] with identical experimental settings for comparison. Three evaluation metrics were applied: the Dice similarity coefficient (DSC) [6], the 95% Hausdorff distance (95%HD) [6], and the average boundary distance (ABD) [6].
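For reference, the Dice similarity coefficient between two binary masks is twice the overlap divided by the total number of foreground voxels in both masks. A minimal sketch on flattened masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something specified by the challenge.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treated as perfect agreement by convention.
        return 1.0
    return 2.0 * overlap / total


# One overlapping voxel, two predicted and one true foreground voxel.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```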

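| Method | Type | DSC Whole | DSC Base | DSC Apex | ABD Whole (mm) | ABD Base (mm) | ABD Apex (mm) | 95%HD Whole (mm) | 95%HD Base (mm) | 95%HD Apex (mm) | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CAMP-TUM2 [10] | 3D | 0.869 | 0.843 | 0.844 | 2.233 | 2.458 | 2.030 | 5.708 | 5.835 | 4.618 | 82.39 |
| UdeM 2D [3] | 2D | 0.874 | 0.849 | 0.842 | 2.171 | 2.386 | 2.070 | 6.124 | 6.444 | 4.705 | 83.02 |
| MBIOS | 2D | 0.881 | 0.850 | 0.847 | 2.827 | 2.204 | 2.596 | 10.543 | 5.969 | 6.494 | 83.70 |
| BDSLab | 3D | 0.883 | 0.876 | 0.798 | 1.864 | 1.997 | 2.574 | 5.341 | 5.316 | 6.312 | 85.16 |
| 3D GCA-Net (ours) | 3D | 0.889 | 0.877 | 0.861 | 1.901 | 1.969 | 1.901 | 4.990 | 4.703 | 4.300 | 85.20 |
| CREATIS | 2D/3D | 0.893 | 0.866 | 0.868 | 1.926 | 2.135 | 1.742 | 5.594 | 5.620 | 4.222 | 85.74 |
| CUMED [16] | 3D | 0.894 | 0.864 | 0.860 | 1.950 | 2.127 | 1.744 | 5.537 | 5.407 | 4.292 | 86.64 |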
Table 1: Quantitative comparison with several variations of convolutional encoder-decoder networks on the PROMISE 12 dataset. The evaluation results on all 3 metrics were obtained from the organizers. For DSC, higher values are better; for ABD and 95%HD, lower values are better.

Implementation Details: We implemented the proposed method in the PyTorch framework on a Linux system with an Intel 3.6 GHz ×8 CPU, 32 GB of memory, and an Nvidia GeForce 1080 Ti GPU with 11 GB of memory. In pre-processing, for both training and testing scans, we performed N4 bias field correction [15], resampled the volumes to a fixed voxel resolution, and normalized the intensities to zero mean and unit variance. During training of the 3D GCA-Net, because of the limited number of training scans, we applied several online data augmentations: random flipping (up-down or left-right in x-y planes), random rotation (one of 25, 90, 180 and 270 degrees in x-y planes), and random Gaussian noise (from 0.3 to 0.7). The input to the model was a fixed-size 3D sub-volume, randomly extracted in an online manner, with a batch size of 2. The 3D segmentation network and the CNN discriminator model were trained together, each using the Adam optimizer with a preset initial learning rate, betas of (0.9, 0.999), and weight decay. The weights of the 3D ResNet encoder were initialized with those of the pre-trained ResNet-50 model; all other parameters in the model were randomly initialized. In the inference phase, we extracted sub-volumes with a fixed stride from each testing scan and averaged the outputs of the segmentation network over these sub-volumes to obtain the final segmentation result.
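The strided sub-volume inference with averaging can be sketched along a single axis; the same bookkeeping applies independently to x, y, and z. This is a simplified illustration under stated assumptions: the function names are hypothetical, `predict` stands in for the segmentation network, and adding one extra window flush with the end of the axis is a choice made here so that every position is covered.

```python
def window_starts(length, window, stride):
    """Start indices of fixed-stride windows that together cover [0, length)."""
    if window > length:
        raise ValueError("window must not exceed the axis length")
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        # Assumption: add a final window flush with the end of the axis.
        starts.append(length - window)
    return starts


def average_overlapping(length, window, stride, predict):
    """Average per-position outputs over all windows that contain them.

    predict(start, stop) must return a list of stop - start values.
    """
    sums = [0.0] * length
    counts = [0] * length
    for start in window_starts(length, window, stride):
        outputs = predict(start, start + window)
        for offset, value in enumerate(outputs):
            sums[start + offset] += value
            counts[start + offset] += 1
    return [total / count for total, count in zip(sums, counts)]


# Toy "network": every position in a window is scored with the window start.
print(average_overlapping(6, 4, 2, lambda a, b: [float(a)] * (b - a)))
```

A smaller stride means more overlapping windows per voxel, which smooths the averaged prediction at the cost of more forward passes.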
Performance of Prostate Segmentation: Fig. 3 shows qualitative segmentation results of the proposed 3D GCA-Net on the PROMISE 12 dataset. The segmented boundary is very close to the ground truth in both the base part (Case04-Slice16) and the central part of the prostate gland. Table 1 shows the quantitative segmentation results achieved by seven methods. In terms of overall score, our 3D GCA-Net outperforms CAMP-TUM2, UdeM 2D, MBIOS and BDSLab, and is competitive with CREATIS and CUMED [16]. Our method also attains the lowest whole-gland 95%HD in Table 1. Since a smaller 95%HD value means fewer outliers in the segmentation results, this suggests that adversarial learning helps generate more consistent and smoother output label maps than the other methods. We also find that the methods implemented with 3D convolutions generally outperform those using 2D convolutions, which suggests that volumetric convolutions are better suited to 3D prostate MR image segmentation. Furthermore, we compare our 3D GCA-Net and 3D encoder-decoder with two other widely used 3D fully convolutional segmentation networks by conducting 4-fold cross validation on the ASPS 13 dataset. The results in Table 2 show that our 3D encoder-decoder network, equipped with global convolutional and anisotropic boundary refinement blocks, achieves more accurate segmentation than V-Net and 3D U-Net, which use small isotropic convolutional kernels. Table 2 also shows that our 3D encoder-decoder has many more convolutional layers but fewer parameters than V-Net and 3D U-Net, which indicates the parameter efficiency of the proposed design. Lastly, we validate the effect of adversarial learning. Comparing the last two rows of Table 2, especially the ABD values, we find that the adversarial network further encourages the 3D encoder-decoder segmentation model to generate reasonable and consistent output label maps.

| Method | Conv Layers | Parameters | DSC | ABD (mm) |
|---|---|---|---|---|
| V-Net [10] | 31 | 65,191,134 | 0.841 | 2.531 |
| 3D U-Net [13] | 23 | 33,854,722 | 0.862 | 2.487 |
| ResNet-50 [4] | 53 | 23,507,904 | / | / |
| 3D Encoder-decoder (ours) | 141 | 29,601,094 | 0.878 | 2.402 |
| 3D GCA-Net (ours) | 148 | 33,540,327 | 0.880 | 2.152 |
Table 2: Quantitative comparison of different methods on the ASPS 13 dataset, with the corresponding numbers of convolutional layers and parameters. ResNet-50 is shown as a reference, since our 3D ResNet encoder has the same numbers of convolutional layers and parameters as it does.
Figure 3: Qualitative results of our proposed 3D GCA-Net on the PROMISE 12 dataset, which were obtained from the organizers. The ground truth is shown in yellow and the segmentation result in red.

4 Conclusions

In this paper, we propose a novel deep learning architecture called 3D GCA-Net for prostate MR volume segmentation. A 3D encoder-decoder segmentation network is first designed for the segmentation task, comprising a ResNet encoder for 3D prostate volume feature extraction and a multi-scale 3D global convolutional and boundary refinement decoder that addresses the classification and localization issues simultaneously. Additionally, in the training phase, an auxiliary adversarial network is attached to the 3D segmentation network to further correct the segmentation results. The evaluation on two public prostate MR datasets demonstrates that our proposed approach improves on the performance of the state of the art.