Boundary-weighted Domain Adaptive Neural Network for Prostate MR Image Segmentation

02/21/2019 · Qikui Zhu et al. · Rensselaer Polytechnic Institute; NetEase, Inc.

Accurate segmentation of the prostate from magnetic resonance (MR) images provides useful information for prostate cancer diagnosis and treatment. However, automated prostate segmentation from 3D MR images still faces several challenges. For instance, the lack of a clear edge between the prostate and other anatomical structures makes it difficult to accurately extract the boundaries. The complex background texture and the large variation in the size, shape and intensity distribution of the prostate itself complicate segmentation further. With deep learning, especially convolutional neural networks (CNNs), emerging as the common methodology for medical image segmentation, the difficulty of obtaining large numbers of annotated medical images for training CNNs has become more pronounced than ever before. Since large-scale datasets are one of the critical components of the success of deep learning, the lack of sufficient training data makes it difficult to fully train complex CNNs. To tackle these challenges, in this paper we propose a boundary-weighted domain adaptive neural network (BOWDA-Net). To make the network more sensitive to boundaries during segmentation, a boundary-weighted segmentation loss (BWL) is proposed. Furthermore, an advanced boundary-weighted transfer learning approach is introduced to address the problem of small medical imaging datasets. We evaluate the proposed model on the publicly available MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset. Our experimental results demonstrate that the proposed model is more sensitive to boundary information and outperforms other state-of-the-art methods.


I. Introduction

Accurate segmentation of prostate magnetic resonance (MR) images plays an important role in the diagnosis and treatment of prostate diseases, particularly prostate cancer, which is one of the most common cancers in men [1]. In clinical practice, medical images are usually segmented manually by radiologists, which is an expensive and time-consuming process that is also prone to inter- and intra-observer variations. Automated segmentation of prostate MR images is therefore highly desirable. Over the past decade, a number of research groups have proposed various automated prostate segmentation methods. For instance, Shen et al. [2] presented a statistical shape model for automatic prostate segmentation in ultrasound images by modeling the shape of the prostate. Guo et al. [3] proposed a deformable prostate segmentation method, which employed a deep feature learning model to extract prostate representations and utilized a sparse patch matching method to infer the prostate likelihood map. Tian et al. [4] proposed a superpixel-based 3D graph cut algorithm that combines 3D graph cuts with a 3D active contour model for segmenting prostate MR images. Although these methods achieved promising performance, the complexity of prostate MR images still makes segmentation a very challenging problem.

Recently, deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in many fields [5, 6, 7, 8, 9, 10, 11], particularly in computer vision and image understanding [12, 13, 14]. Many researchers have also employed CNNs for prostate segmentation [15, 16, 17]. For instance, Milletari et al. [18] proposed a volumetric CNN that can segment prostate volumes in a fast and end-to-end manner. Yang et al. [19] proposed a novel network that seamlessly integrates feature extraction, shape prior exploration and boundary estimation for prostate segmentation. Although great progress has been achieved, there remain challenges that have not been fully addressed, resulting in a gap between clinical needs and the performance of automatic segmentation.

One of the major difficulties in prostate MR image segmentation is that part of the prostate lacks a clear boundary with the surrounding tissues, which is further complicated by complex background texture and large variations in the size, shape and intensity distribution of the prostate itself. Another major challenge is the lack of sufficient training data, which makes it difficult to fully train complex networks, as large datasets are a key pillar of the success of CNNs. Thus, the capability of CNNs can be limited for such segmentation tasks. Facing these challenges, a number of methods have been proposed from different perspectives. For instance, Yu et al. [20] designed an efficient volumetric CNN that employs mixed long and short residual connections to improve training efficiency and discriminative capability under limited training data. Nie et al. [21] proposed a region-attention based semi-supervised learning strategy that overcomes the shortage of training data by employing unlabeled data. To reduce the influence of noise and suppress tissues around the prostate with similar intensity, Wang et al. [22] developed a deep neural network that utilizes an attention mechanism to selectively leverage multi-level features for prostate segmentation. Although these methods improved the representation capability of networks and training efficiency under limited data, obtaining accurate segmentation in apex and base slices that lack boundary information remains challenging. In addition, efficiently utilizing additional data during training to improve performance in those difficult locations is yet to be explored.

In this paper, to tackle the above challenges, a boundary-weighted domain adaptive neural network (BOWDA-Net) is proposed for prostate MR image segmentation. To achieve accurate results even in places with very weak boundaries, a boundary-weighted segmentation loss (BWL) function is designed to make the trained network sensitive to boundaries and to regularize the boundaries of the segmentation results. At the same time, inspired by advances in generative adversarial learning [23] and transfer learning [24, 25, 26, 27], we employ transfer learning to exploit useful information from other datasets and thereby overcome the shortage of training data. Different from typical transfer learning, however, a boundary-weighted knowledge transfer strategy is designed so that the transfer process focuses on the boundary information required by the target data. Extensive experiments on the open MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset (https://promise12.grand-challenge.org/) corroborate the effectiveness of the proposed BOWDA-Net. Our method outperformed other state-of-the-art methods and ranked first in the challenge.

The remainder of the paper is organized as follows. Section II provides a brief review of the related works. Section III describes the data and Section IV presents the proposed BOWDA-Net in detail. In Section V, various experiments on prostate MR image segmentation are performed to validate the proposed methods. Finally, several concluding remarks are drawn in Section VI.

II. Related Works

In this section, we briefly review the related works on medical image segmentation and deep domain adaptation.

II-A. Medical Image Segmentation

Automated medical image segmentation provides very useful information for computer-aided diagnosis and disease treatment [28, 27, 29, 30, 31]. Since deep convolutional neural networks (CNNs) have strong feature representation capability and have achieved state-of-the-art performance in many fields, they have recently become the major machine learning approach in medical image segmentation. Researchers have employed various CNN models to segment different medical images [32, 33, 34, 20]. For instance, Ronneberger et al. [35] proposed an efficient network named U-net for biomedical image segmentation, which reduces information loss and accelerates convergence. To accurately segment the prostate, Zhu et al. [33] proposed a network with bidirectional convolutional recurrent layers for MR prostate image segmentation, which extracts both intra-slice and inter-slice information.

A few studies employed 3D CNNs to extract volumetric features for segmentation. For example, Li et al. [36] proposed a mixed network for liver and tumor segmentation, which consists of a 2D Dense-U-Net and a 3D volumetric network for efficiently extracting intra-slice features and hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm. Yu et al. [37] employed the densely-connected mechanism and proposed a densely-connected volumetric convolutional neural network for automatically segmenting the cardiac and vascular structures from 3D cardiac MR images. Inspired by the deep residual learning network, Chen et al. [38] proposed a 3D volumetric network for brain segmentation. In addition, this model also seamlessly integrates the low-level image appearance features, implicit shape information and high-level context together for further improving the volumetric segmentation performance.

Generally, 3D CNNs have an advantage over 2D CNNs in medical image analysis [39, 40] due to the 3D nature of many medical images. However, 3D CNNs contain a much larger number of parameters, which makes them more difficult to optimize. Given the usually limited size of medical image datasets, such networks are difficult to train and can easily suffer from overfitting. Therefore, much remains to be done in pushing the potential of CNNs under limited training data to improve segmentation performance.

II-B. Deep Domain Adaptation

Fig. 1: Visualization of the source and target domain images using t-SNE, showing the problem of domain shift.

Deep domain adaptation aims to minimize domain shift between the source and target domains by using deep learning methods. An illustration of such domain shift is shown in Fig. 1. Existing deep domain adaptation methods can be divided into three categories: unsupervised, supervised, and semi-supervised adaptations.

Unsupervised adaptation refers to the scenario where labels of the target domain data are not available. For example, Zhang et al. [26] presented a fully convolutional adaptation networks (FCAN) architecture for semantic segmentation, which addresses domain adaptation from both the visual appearance and representation perspectives. Hoffman et al. [41] presented an unsupervised domain adaptation framework with fully convolutional networks for semantic segmentation; their model uses fully convolutional domain adversarial learning for global domain alignment. Sun et al. [42] constructed a novel loss function, the CORAL loss, which achieves unsupervised domain adaptation by minimizing the difference between the second-order statistics of the source and target feature distributions.
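To make the idea concrete, the following is a minimal TensorFlow sketch of the CORAL loss under our reading of [42]; the function name and batch-feature interface are our own, not code from that work.

```python
import tensorflow as tf

def coral_loss(source_feat, target_feat):
    # CORAL loss [42]: squared Frobenius distance between the feature
    # covariance matrices of a source batch and a target batch,
    # normalized by 4*d^2 for feature dimension d.
    d = tf.cast(tf.shape(source_feat)[1], tf.float32)

    def covariance(x):
        n = tf.cast(tf.shape(x)[0], tf.float32)
        x_c = x - tf.reduce_mean(x, axis=0, keepdims=True)
        return tf.matmul(x_c, x_c, transpose_a=True) / (n - 1.0)

    diff = covariance(source_feat) - covariance(target_feat)
    return tf.reduce_sum(tf.square(diff)) / (4.0 * d * d)
```

Minimizing this term aligns the second-order statistics of the two domains without requiring any target labels.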

In contrast to unsupervised adaptation, when labels of the target domain data are available, the problem is referred to as supervised domain adaptation. Tzeng et al. [43] introduced an adaptation layer and an additional domain confusion loss to learn representations that are both semantically meaningful and domain invariant. To improve the effectiveness of domain adaptation, Tzeng et al. [44] also proposed a network architecture that can effectively adapt to a new domain by simultaneously transferring the learned source semantic structure to the target domain.

Semi-supervised adaptation refers to the situation where part of the target domain data possesses labels and the rest does not. Ghafoorian et al. [24] conducted extensive experiments on a white matter hyperintensity segmentation task, training a CNN on one domain and evaluating a domain-adapted network on images from a different domain; they then compared its performance against surrogate scenarios where either the same trained network is reused directly or a new network is trained from scratch on the new data. Long et al. [45] proposed a deep adaptation network architecture that embeds the hidden representations of all task-specific layers in a reproducing kernel Hilbert space.

Fig. 2: Overview of the proposed boundary-weighted domain adaptive neural network.

III. Materials

In our work, the MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset is used as the target domain dataset. It is a benchmark for evaluating algorithms that segment the prostate from MR images and, since it is publicly available, performance comparisons with other state-of-the-art methods can easily be performed. The dataset contains 50 transversal T2-weighted MR images of the prostate, acquired in different hospitals, together with the corresponding ground truth segmentations, which were checked and corrected by a radiological resident. These images are a representative set of prostate MR images from multiple vendors, with different acquisition protocols and variations in voxel size, dynamic range, position, field of view and anatomic appearance.

In our experiments, a separate dataset of 81 prostate MR volumes, acquired by a Philips 3T MRI scanner with an endorectal coil, is used as the source domain dataset. Each volume consists of 26 slices and each slice has 512×512 pixels. The in-plane resolution is 0.27 mm × 0.27 mm and the inter-plane distance is 3 mm.

To visualize the distribution of the datasets from these two domains, we randomly selected 280 slices from each domain and used a pre-trained VGG-16 network [12] to map each slice to a feature vector of length 4096. t-SNE [46] is then employed to visualize the distributions of the two domains, as shown in Fig. 1. It can be seen that domain shift exists between the source and target domain data. Our hypothesis in this work is that, by dealing with the domain shift problem, advanced transfer learning can help achieve more accurate segmentation results than simply extending the dataset. Details of the proposed method are presented in the following section.
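The visualization pipeline can be sketched as follows; the resizing and channel replication needed to feed grayscale MR slices into VGG-16 are our assumptions, as is the t-SNE perplexity.

```python
import numpy as np
from sklearn.manifold import TSNE
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

def project_slices(slices):
    # `slices`: array of shape (n, 224, 224, 3) -- MR slices resized to the
    # VGG-16 input size with the gray channel replicated three times.
    vgg = VGG16(weights="imagenet", include_top=True)
    # "fc2" is the second 4096-d fully connected layer of VGG-16.
    extractor = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)
    feats = extractor.predict(preprocess_input(slices.astype(np.float32)))
    return TSNE(n_components=2, perplexity=30).fit_transform(feats)
```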

IV. Boundary-weighted Domain Adaptation

In this section, we first give an overview of the proposed boundary-weighted domain adaptive neural network (BOWDA-Net) and then present its modules in detail. Fig. 2 illustrates the overall architecture, which consists of three main components: a source domain image segmentation network (SNet-s), a target domain image segmentation network (SNet-t) and a domain feature discriminator ($D$). The segmentation networks (SNet) and $D$ are designed in an adversarial manner, derived from the idea of adversarial learning [23], to overcome the domain shift problem and exploit the information carried by the source domain dataset, thereby addressing the problems of insufficient training data and weak boundaries. Furthermore, to address the poor segmentation performance caused by weak boundaries, we propose a boundary-weighted segmentation loss (BWL) for SNet-t that regularizes the segmentation results by focusing more on the boundaries. Details of the proposed methods are presented as follows.

IV-A. Boundary-weighted Knowledge Transfer

Transferring information from related data has been shown to be useful in dealing with insufficient training data [47, 48, 27]. However, domain shift caused by distribution differences between datasets is a common problem impacting the efficiency and performance of transfer learning. Recently, adversarial adaptation methods have been proposed to deal with this problem; they seek to minimize the distance between domains by minimizing an adversarial loss with respect to a domain discriminator [49, 25]. During training, the representation extractors learn feature representations from the source and target domains, respectively, while the domain discriminator tries to distinguish features of the source domain from those of the target domain. When the domain discriminator can no longer distinguish the two, the domain adaptation process is complete and the domain shift problem is addressed. Although existing methods are effective in solving domain shift and enhancing the performance of transfer learning, the transfer process is not focused on the information required by the target domain data; as a result, existing methods cannot deal effectively with weak boundaries.

To tackle this challenge, we propose a supervised boundary-weighted adversarial domain adaptation strategy. To extract the feature information in the source domain, we first train SNet-s on the source domain data in a supervised manner and then freeze its weights. During training, SNet-s and SNet-t learn feature representations from the source and target domains, respectively, and the extracted features are delivered to $D$, which is designed to discriminate source domain features from target domain features. Different from a traditional domain discriminator, however, to address the lack of strong boundaries, where segmentation is most error-prone, we make the information transfer focus more on the boundaries by improving the capability of $D$ in recognizing boundaries. To achieve this goal, we propose a boundary-weighted loss (BWL) for $D$. Let $(X_s, Y_s)$ represent the training images and ground truths from the source domain, and $(X_t, Y_t)$ be the training images and ground truths from the target domain. $M$ represents a boundary map with Gaussian distribution, constructed by applying a Gaussian function with a 3×3 kernel to the object boundary. The BWL for $D$ is defined as

$$\mathcal{L}_D = -\,\mathbb{E}_{x_s \sim X_s}\big[(1+\alpha M_s)\log D(F_s(x_s))\big] - \mathbb{E}_{x_t \sim X_t}\big[(1+\alpha M_t)\log\big(1-D(F_t(x_t))\big)\big] \tag{1}$$

where $F_s(\cdot)$ and $F_t(\cdot)$ denote the features extracted by SNet-s and SNet-t, $M_s$ represents the boundary map of the source domain image, $M_t$ denotes the boundary map of the target domain image, and $\alpha$ is a weighted coefficient.
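A possible construction of the boundary map $M$ from a binary ground-truth mask is sketched below; the exact Gaussian parameters are not specified beyond the 3×3 kernel, so the sigma here is an assumption.

```python
import numpy as np
from scipy import ndimage

def boundary_map(mask, sigma=1.0):
    # Extract a one-pixel-thick object contour: mask minus its erosion.
    mask = mask.astype(bool)
    contour = mask ^ ndimage.binary_erosion(mask)
    # Spread the contour with a Gaussian; sigma=1.0 with truncate=1.0
    # keeps the effective kernel roughly 3x3, per the description above.
    m = ndimage.gaussian_filter(contour.astype(np.float32), sigma=sigma,
                                truncate=1.0)
    return m / (m.max() + 1e-8)  # normalize weights to [0, 1]
```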

IV-B. Boundary-weighted Segmentation Loss

Generally, for image segmentation, cross entropy ($\mathcal{L}_{ce}$) is an effective loss function. Let $y$ represent the ground truth and $\hat{y}$ be a segmentation result; $\mathcal{L}_{ce}$ can be computed as

$$\mathcal{L}_{ce} = -\sum_{i}\big[\,y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\,\big] \tag{2}$$

However, a problem with using cross entropy as the loss function is that it relies on region information, which makes the trained network unable to accurately identify boundaries. To make the network more sensitive to boundaries during segmentation, a boundary-weighted segmentation loss (BWL) is designed. During training, the BWL utilizes a distance loss to regularize the position, shape and continuity of the segmentation and draw it close to the object boundaries. Accordingly, the BWL for the segmentation network can be formulated as

$$\mathcal{L}_{BWL} = \mathcal{L}_{ce} + \beta \sum_{p \in \partial S} W(p) \tag{3}$$

where the cross entropy term is computed over the whole segmented region $\Omega$, $\partial S$ denotes the boundary points of the segmentation result, and $\beta$ is a weighted coefficient. $W$ is a distance map constructed by the distance transform of the boundary points.
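Under our reconstruction of Eqn. (3), the distance term can be evaluated as sketched below. This NumPy version is for illustration only: it is not differentiable, and in training the distance map $W$ would be precomputed from the ground truth and combined with a differentiable boundary estimate.

```python
import numpy as np
from scipy import ndimage

def distance_term(pred_mask, gt_mask):
    # Distance map W: for every voxel, distance to the nearest
    # ground-truth boundary point.
    gt = gt_mask.astype(bool)
    gt_boundary = gt ^ ndimage.binary_erosion(gt)
    w = ndimage.distance_transform_edt(~gt_boundary)
    # Average W over the boundary points of the predicted segmentation.
    pred = pred_mask.astype(bool)
    pred_boundary = pred ^ ndimage.binary_erosion(pred)
    if not pred_boundary.any():
        return 0.0
    return float(w[pred_boundary].mean())
```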

In summary, when training SNet-s, we employ $\mathcal{L}_{BWL}$ as the loss function. When training SNet-t, a total loss consisting of the proposed $\mathcal{L}_{BWL}$ and the adversarial loss $\mathcal{L}_{adv}$ is optimized, defined as

$$\mathcal{L}_{total} = \mathcal{L}_{BWL} + \mathcal{L}_{adv} \tag{4}$$
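The overall optimization can be sketched as an alternating update of $D$ and SNet-t with SNet-s frozen. The sketch below assumes `snet_s`, `snet_t` and `disc` are Keras models (with `snet_t` returning both features and a prediction) and that `seg_loss` implements Eqn. (3); these names and interfaces are ours, not the authors' code.

```python
import tensorflow as tf

@tf.function
def train_step(x_s, x_t, y_t, m_s, m_t, alpha, opt_d, opt_t):
    f_s = snet_s(x_s, training=False)            # frozen source features
    with tf.GradientTape(persistent=True) as tape:
        f_t, y_hat = snet_t(x_t, training=True)  # target features + prediction
        d_s, d_t = disc(f_s), disc(f_t)
        # Boundary-weighted discriminator loss, Eqn. (1).
        loss_d = -tf.reduce_mean((1 + alpha * m_s) * tf.math.log(d_s + 1e-8)) \
                 - tf.reduce_mean((1 + alpha * m_t) * tf.math.log(1 - d_t + 1e-8))
        # SNet-t loss, Eqn. (4): segmentation term plus the adversarial
        # term that pushes target features to look like source features.
        loss_t = seg_loss(y_t, y_hat) \
                 - tf.reduce_mean((1 + alpha * m_t) * tf.math.log(d_t + 1e-8))
    opt_d.apply_gradients(zip(tape.gradient(loss_d, disc.trainable_variables),
                              disc.trainable_variables))
    opt_t.apply_gradients(zip(tape.gradient(loss_t, snet_t.trainable_variables),
                              snet_t.trainable_variables))
    del tape
```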

IV-C. Network Design and Configurations

The details of the networks used in our work are provided in this section. To fully leverage the 3D spatial contextual information of volumetric data and accurately segment prostate images, a new 3D network is designed as the domain image segmentation network (SNet), drawing inspiration from the seminal works of U-Net [35] and DenseNet [5].

As shown in Fig. 2, SNet-s and SNet-t contain two paths: a down-sampling path and an up-sampling path. The down-sampling path consists of one convolutional block, three densely-connected residual blocks (DRBs) and three average pooling layers. The pooling layers use a stride of two, which gradually reduces the resolution of the feature maps and increases the receptive field of the convolutional layers. The down-sampling path is followed by an up-sampling path, which contains three deconvolutional layers and three DRBs. The deconvolutional layers gradually up-sample the feature maps until the original size is reached. To further improve the gradient information flow between the down-sampling and up-sampling paths and avoid information loss, inspired by U-Net [35], we employ long connections inside the network, which connect the blocks at the same resolution level of the down-sampling and up-sampling paths. These connections have several advantages. First, they effectively propagate context and gradient information both forward and backward between the two paths, alleviating the vanishing-gradient problem. Second, they help deal with information loss: when a feature map passes through convolutional and pooling layers, part of the feature information is discarded and detailed information may be lost, which in turn leads to inaccurate boundaries in the segmentation results. With the long connections, the up-sampling path can retain feature information from earlier blocks in the down-sampling path, helping achieve more accurate segmentation.
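The long connections can be illustrated with a much-reduced functional sketch, with plain Conv3D layers standing in for the DRBs described next; depths and widths here are illustrative only.

```python
from tensorflow.keras import layers

def tiny_snet_body(x):
    d1 = layers.Conv3D(32, 3, padding="same", activation="relu")(x)
    p1 = layers.AveragePooling3D(2)(d1)              # down-sampling path
    d2 = layers.Conv3D(64, 3, padding="same", activation="relu")(p1)
    p2 = layers.AveragePooling3D(2)(d2)
    b = layers.Conv3D(128, 3, padding="same", activation="relu")(p2)
    u2 = layers.Conv3DTranspose(64, 2, strides=2, padding="same")(b)
    u2 = layers.concatenate([u2, d2])                # long connection
    u2 = layers.Conv3D(64, 3, padding="same", activation="relu")(u2)
    u1 = layers.Conv3DTranspose(32, 2, strides=2, padding="same")(u2)
    u1 = layers.concatenate([u1, d1])                # long connection
    return layers.Conv3D(1, 1, activation="sigmoid")(u1)
```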

The DRB is a new structure proposed in our work, shown in Fig. 2, which combines densely connected layers, transition layers and residual connections to tackle overfitting on small training datasets and to promote information propagation within the network for faster convergence. Inside a DRB, the dense connections provide direct connections between all subsequent layers: the feature maps produced by all preceding layers are concatenated as the input to each subsequent layer. To reduce the number of features and fuse the features from the densely connected layers, a transition layer is added at the end of the densely connected layers. The transition layer consists of a 1×1×1 convolutional layer, which reduces the number of feature maps, fuses them and hence improves model compactness. To further promote information propagation and make the network easier to optimize, residual connections are employed by the DRBs. Formally, consider an input $x_0$ that is passed through a DRB with $L$ densely connected layers. Let $x_\ell$ be the output of the $\ell$-th convolutional layer, where $H_\ell(\cdot)$ is the non-linear transformation of the $\ell$-th layer, defined as a convolution followed by batch normalization and a rectified linear unit (ReLU), so that $x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$. The output of the DRB is then

$$y = x_0 + T\big([x_0, x_1, \ldots, x_L]\big) \tag{5}$$

where $[x_0, x_1, \ldots, x_L]$ represents the concatenation of the feature maps produced in layers $0, \ldots, L$ and $T(\cdot)$ is the non-linear transformation of the transition layer. Compared with traditional CNNs, DRBs easily make the network deeper while possessing fewer parameters, giving the network more powerful hierarchical representation capability.
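A Keras sketch of a DRB under this reading follows; the 1×1×1 bottleneck width (four times the growth rate) is a DenseNet convention and an assumption here.

```python
from tensorflow.keras import layers

def drb(x0, num_units, growth_rate=32):
    feats = [x0]
    for _ in range(num_units):
        h = feats[0] if len(feats) == 1 else layers.concatenate(feats)
        h = layers.Activation("relu")(layers.BatchNormalization()(h))
        h = layers.Conv3D(4 * growth_rate, 1, padding="same")(h)  # bottleneck
        h = layers.Activation("relu")(layers.BatchNormalization()(h))
        h = layers.Conv3D(growth_rate, 3, padding="same")(h)
        h = layers.Dropout(0.3)(h)        # dropout after each Conv(3x3x3)
        feats.append(h)                   # dense connectivity
    # Transition layer: 1x1x1 convolution fuses and compresses the features.
    t = layers.Conv3D(x0.shape[-1], 1, padding="same")(layers.concatenate(feats))
    return layers.add([x0, t])            # residual connection, Eqn. (5)
```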

In summary, the proposed SNet includes convolutional layers, pooling layers, DRBs and deconvolutional layers, and is more than 100 layers deep. The DRBs contain different numbers (4, 8, 16, 8, 2) of BN-ReLU-Conv(1×1×1)-BN-ReLU-Conv(3×3×3) units with a growth rate of 32. After each Conv(3×3×3) layer, a dropout layer with a 0.3 dropout rate is added to alleviate overfitting. Similarly, to let $D$ obtain more useful information and enhance the accuracy of adversarial learning, the domain discriminator takes multi-level representations into account: the feature representations extracted by each DRB in the up-sampling paths of SNet-s and SNet-t, six feature representations in total, are treated as inputs to $D$. To eliminate the influence of weight imbalance between the supervised loss from SNet-t and the adversarial loss from $D$, and to keep the focus on boundary information, we design the output of the domain discriminator to have the same size as its input, with each spatial unit in the output representing the probability that the corresponding image pixel belongs to the target domain. Inside the domain discriminator, we employ three ConvBlocks (Conv(3×3×3)-BN-LeakyReLU) with stride 1, two deconvolutional layers and one output layer (Conv(1×1×1)) to discriminate between the source and target domains.
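A sketch of $D$ consistent with this description is given below; channel widths are assumptions, and the two stride-2 deconvolutions up-sample lower-resolution feature inputs toward the image size so the output map matches the input pixels.

```python
from tensorflow.keras import Model, layers

def build_discriminator(feature_shape):
    inp = layers.Input(shape=feature_shape)
    h = inp
    for filters in (64, 128, 256):       # three Conv(3x3x3)-BN-LeakyReLU blocks
        h = layers.Conv3D(filters, 3, strides=1, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.LeakyReLU(0.2)(h)
    for filters in (128, 64):            # two deconvolutional layers
        h = layers.Conv3DTranspose(filters, 2, strides=2, padding="same")(h)
    # Conv(1x1x1) output: per-pixel probability of belonging to the target domain.
    out = layers.Conv3D(1, 1, activation="sigmoid")(h)
    return Model(inp, out)
```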

V. Experiments

 

Team                    |    ABD [mm]      |    HD [mm]       |     DSC [%]        |     RVD [%]        | Overall
                        | Whole Base  Apex | Whole Base  Apex | Whole Base  Apex   | Whole  Base  Apex  |  score
whu_mlgroup (ours)      | 1.35  1.54  1.29 | 4.27  4.48  3.44 | 91.41 89.56 89.29  |  4.11   1.84  3.16 |  89.59
kakatao                 | 1.29  1.47  1.40 | 4.14  4.32  3.77 | 91.76 90.05 88.27  |  2.11   0.39  1.89 |  89.54
sakinis.tomas           | 1.34  1.51  1.44 | 4.15  4.41  3.79 | 91.33 89.73 87.95  |  4.63   5.83  2.54 |  89.44
pxl_mcg                 | 1.40  1.59  1.40 | 4.28  4.35  3.56 | 91.23 89.08 88.55  |  2.08  -0.07  2.23 |  89.39
Isensee (nnU-Net)       | 1.31  1.45  1.46 | 4.00  4.05  3.79 | 91.61 90.29 88.05  |  3.42   1.86  3.48 |  89.28
segsegseg               | 1.37  1.51  1.44 | 4.38  4.36  3.67 | 91.37 89.85 87.60  |  3.06   0.38  4.12 |  89.13
mls.dl.eecs             | 1.38  1.55  1.28 | 4.58  4.68  3.51 | 91.37 89.33 89.42  |  2.76  -0.45  1.84 |  88.92
fly2019                 | 1.62  1.54  1.50 | 5.09  4.31  3.91 | 90.12 88.95 87.72  |  4.99   2.19  6.65 |  88.73
rcc                     | 1.57  1.71  1.53 | 4.59  4.72  3.52 | 91.67 89.31 89.35  |  2.04  -0.73  2.40 |  88.62
NPUSAIIP_JFHealthcare   | 1.45  1.63  1.53 | 4.13  4.55  3.95 | 90.58 89.12 86.89  |  6.68   8.60 -4.49 |  88.59

TABLE I: Quantitative evaluation results of BOWDA-Net and other methods on the PROMISE12 challenge dataset (as of Jan 21, 2019).

V-A. Implementation Details

In the experiments, because the target domain dataset varies in voxel size, resolution, dynamic range, position and field of view, we first resampled all target domain image volumes to a fixed resolution of 0.625 mm × 0.625 mm × 1.5 mm and then normalized each volume to zero mean and unit variance. The source domain dataset already has a uniform resolution of 0.27 mm × 0.27 mm × 3 mm, so we only normalized each volume to zero mean and unit variance. To alleviate overfitting, we employed data augmentation for both source and target domain data, including rotation and flipping, and used random cropping to further enlarge the training set. During network training, we randomly cropped sub-volumes of size 16×96×96 voxels (depth × height × width) from the training data at every iteration. In the testing phase, similar to [37, 20], we used overlapping sliding windows to crop sub-volumes and averaged their probability maps to obtain the whole-volume prediction. The sub-volume size was 16×96×96 and the stride was 8×48×48.
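The overlapping sliding-window inference can be sketched as follows; `predict_fn` is a stand-in for the trained SNet-t applied to one sub-volume.

```python
import numpy as np

def sliding_window_predict(volume, predict_fn,
                           window=(16, 96, 96), stride=(8, 48, 48)):
    prob = np.zeros(volume.shape, dtype=np.float32)
    count = np.zeros(volume.shape, dtype=np.float32)
    # Window start positions per axis, forcing the last window to touch
    # the volume border so every voxel is covered.
    starts = []
    for size, w, st in zip(volume.shape, window, stride):
        s = list(range(0, size - w + 1, st))
        if s[-1] != size - w:
            s.append(size - w)
        starts.append(s)
    for z in starts[0]:
        for y in starts[1]:
            for x in starts[2]:
                sl = (slice(z, z + window[0]),
                      slice(y, y + window[1]),
                      slice(x, x + window[2]))
                prob[sl] += predict_fn(volume[sl])
                count[sl] += 1.0
    return prob / count   # average overlapping probability maps
```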

The proposed method is implemented using the open source deep learning library Keras [50]. Each model is trained end-to-end with the stochastic gradient descent (SGD) optimization method. In the training phase, the learning rate is initially set to 0.0001 and decayed after each epoch, and the momentum is set to 0.9. The experiments were carried out on an NVIDIA GTX 1080 Ti GPU with 11 GB of memory. Due to GPU memory limitations, we chose a batch size of 4; the weighted coefficients $\alpha$ and $\beta$ in Eqns. (1) and (3) were set empirically.
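For reference, the reported optimizer settings translate to the following Keras configuration; the per-epoch decay factor is not preserved in this copy, so the value below is a placeholder assumption.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)

def per_epoch_decay(epoch, lr, decay=1e-6):   # decay factor assumed
    return lr * (1.0 / (1.0 + decay * epoch))

callbacks = [tf.keras.callbacks.LearningRateScheduler(per_epoch_decay)]
# model.compile(optimizer=optimizer, loss=...) and model.fit(...,
# batch_size=4, callbacks=callbacks) would then follow in the training script.
```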

Fig. 3: Sample segmentation results of the prostate. The yellow and red contours indicate ground truth and our segmentation results, respectively.

V-B. Segmentation Performance

To evaluate the performance of BOWDA-Net, we compare its results against several other methods that have also been applied to the MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset. In the PROMISE12 challenge, the organizers provide 30 test MR images whose ground truth is held out to evaluate the submitted algorithms. The evaluation metrics used in the challenge include the Dice similarity coefficient (DSC), the relative volume difference (RVD), the average boundary distance (ABD, the average over the shortest distances between the boundary points of the volumes) and the Hausdorff distance (HD). All metrics are calculated in 3D. In addition to evaluating these metrics over the entire prostate segmentation, the challenge organizers also calculate the boundary measures specifically for the apex and base parts of the prostate, because those parts are difficult to segment yet very important for many clinical applications. The apex and base are determined by dividing the prostate into three approximately equal parts along the axial direction (the first 1/3 as the apex and the last 1/3 as the base). An overall score is then computed by taking all criteria into consideration to rank the algorithms.
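Two of these metrics are straightforward to compute from binary masks, as sketched below; ABD and HD additionally require boundary extraction and distance transforms and are omitted for brevity. The apex/base split follows the protocol just described, with the axial orientation an assumption.

```python
import numpy as np

def dice(pred, gt):
    # Dice similarity coefficient in percent.
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 100.0 * 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def relative_volume_difference(pred, gt):
    # Signed volume difference of the prediction relative to the ground truth.
    return 100.0 * (pred.sum() - gt.sum()) / gt.sum()

def apex_base_ranges(mask):
    # Split the prostate extent into three roughly equal axial thirds;
    # which end is apex vs. base depends on the scan orientation.
    z = np.where(mask.any(axis=(1, 2)))[0]
    lo, hi = int(z.min()), int(z.max()) + 1
    third = max((hi - lo) // 3, 1)
    return (lo, lo + third), (hi - third, hi)
```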

The results of our proposed BOWDA-Net and the competing methods are shown in Table I. Note that all results reported in this section were obtained directly from the challenge website (https://promise12.grand-challenge.org/evaluation/results/) on Jan 21, 2019. Since there are a large number of team submissions, only the evaluation scores of the top 10 teams are listed. As can be seen from Table I, our method performed best and ranked first among all teams with an overall score of 89.59, which demonstrates the advantage of boundary-weighted knowledge transfer and BWL. Remarkably, the source domain data utilized in BOWDA-Net is not resampled to match the target domain data, which shows that BOWDA-Net can exploit generally similar data and thus be easily extended to other medical image analysis tasks, especially those with limited training data. Qualitative results of our method are shown in Fig. 3; BOWDA-Net produces accurate segmentation results and delineates clear prostate contours in MR images.

Fig. 4: Example segmentation results obtained using different loss functions. The gold standard segmentation is delineated in yellow and the deep learning segmentation results are in red.

V-C. Impact of Loss Function

 

SNet-t loss   D loss        | ABD [mm] | HD [mm] | RVD [%] | DSC [%]
L_ce          unweighted    | 1.21     | 11.55   | -3.25   | 90.38
L_BWL         unweighted    | 2.38     | 12.02   |  4.02   | 90.49
L_ce          BWL (Eqn. 1)  | 1.65     |  6.83   |  3.71   | 91.47
L_BWL         BWL (Eqn. 1)  | 1.58     |  6.42   |  3.24   | 92.54

TABLE II: Effects of loss functions on segmentation performance.

To analyze the impact of the proposed BWL on segmentation performance, we compared BOWDA-Net under different supervised and adversarial losses. Before training, we split the target domain dataset into two parts, randomly selecting 10 subjects for validation and using the remaining 40 subjects for training. The source domain data employed in this experiment are not resampled. Table II lists the performance of BOWDA-Net using various combinations of the cross entropy loss and the segmentation loss containing BWL. In addition to DSC for evaluating segmentation accuracy, we also used ABD, HD and RVD to evaluate performance on the boundary.

From Table II, it can be observed that using BWL in the loss functions of both SNet-t and $D$ achieves better performance than using $\mathcal{L}_{ce}$, demonstrating that BWL can enhance the performance of the networks. Moreover, the best performance on the majority of the evaluation metrics is achieved when BWL is used in both loss functions, indicating that the proposed BWL makes the trained networks more effective at securing the prostate boundaries. Segmentation examples from BOWDA-Net with different loss functions are shown in Fig. 4. The results produced with BWL have smoother and more accurate boundaries, which clearly demonstrates that BWL is effective in improving segmentation quality.

V-D. Effects of Training Strategies

Since the source domain data employed in our experiments are also prostate MR images, mixing the source and target datasets can directly extend the training data, which is a basic and straightforward way to address the shortage of training data. On the other hand, fine-tuning a pre-trained network is also a commonly adopted strategy for this problem, especially when differences exist between the source and target domain datasets; fine-tuning can be seen as a rudimentary form of transfer learning. In this section, we compare the performance of SNet under different training strategies to demonstrate the effectiveness of the proposed BOWDA-Net. The tested strategies are:

  1. Target domain training only: The target domain data are split into a training set and a validation set; only the training set is used to train SNet.

  2. Direct mixing of source and target domains: We simply mix the source domain data and the target domain training set together to augment the training data.

  3. Mixing after resampling source domain data: Similar to the above strategy, except that the source domain data are resampled to the same resolution as the target data.

  4. Fine-tuning after training in source domain: We pre-train SNet on the source domain data and then fine-tune it on the target domain training set.

  5. Fine-tuning after training with resampling: Similar to the above strategy, except that the source domain data are resampled to the same resolution as the target data.

  6. The proposed domain adaptation strategy (BOWDA-Net).

 

Strategy DSC [%]
Target domain training only 88.76
Direct mixing of source and target domains 87.78
Mixing after resampling source domain data 89.81
Fine-tuning after training in source domain 89.34
Fine-tuning after training with resampling 89.68
Domain adaptation (BOWDA-Net) 90.38

 

TABLE III: Quantitative evaluation of different training strategies.

Table III shows the segmentation performance of the training strategies described above. It can be seen that directly mixing the source and target domain data has a negative impact on segmentation performance, which is even worse than using the target domain data alone. There are two major reasons for this. One is the domain shift problem shown in Fig. 1. The other is that the amount of source domain data is larger than that of the target domain data; simply mixing the data makes the network focus more on the source domain than on the target domain, so SNet yields poorer performance on the target domain. This problem is partially remedied by resampling the source domain data to the same resolution as the target domain data, which increased the DSC from 87.78% to 89.81%.

Similar effects can be observed when fine-tuning SNet pre-trained on the source domain (Table III). Compared with pre-training SNet directly on the original source domain data, fine-tuning yields more accurate segmentation when the pre-training uses resampled source domain data. However, the performance is still not as good as training SNet on the mixture of resampled source domain data and target domain data, indicating that fine-tuning has limited capability and cannot fully overcome the shortage of training data. Finally, the proposed BOWDA-Net obtained the best performance, as it is more effective than the other strategies in exploiting and transferring the information in the source domain data.

V-E. Network Ablation Study

 

Configurations DSC [%]
FCN 77.92
FCN + Dense 86.02
FCN + Dense + Residual 86.94
SNet 88.76

 

TABLE IV: Performances of SNet under different ablation configurations.

In order to evaluate the effectiveness of residual and dense connections in DRBs and long connections used in our proposed SNet, we created four different configurations of our model as follows.

  1. Fully convolutional network (FCN): The version of our model without any of the dense, residual or long connections.

  2. FCN + Dense: Using only dense connections.

  3. FCN + Dense + Residual: Using both dense and residual connections.

  4. The proposed domain image segmentation network (SNet): Using dense, residual and long connections as presented earlier.

Table IV shows the performance of these networks trained using the target domain data only. It can be seen that progressively adding dense, residual and long connections yields increasingly accurate segmentation.

VI. Conclusions

In this paper, a boundary-weighted domain adaptive neural network (BOWDA-Net) is proposed to address two challenges in prostate MR image segmentation: the lack of clear boundaries and the lack of sufficient annotated data for training CNNs. An advanced transfer learning method is proposed by incorporating boundary weighting into the adaptation scheme. Extensive experiments on an open challenge dataset (PROMISE12) demonstrate that the proposed method obtains more accurate boundaries and achieves superior results compared with other state-of-the-art methods. In future work, we will extend the proposed method to other segmentation tasks on different organs and modalities.

References

  • [1] P. A. Pinto, P. H. Chung, A. R. Rastinehad, A. A. Baccala Jr, J. Kruecker, C. J. Benjamin, S. Xu, P. Yan, S. Kadoury, C. Chua et al., “Magnetic resonance imaging/ultrasound fusion guided prostate biopsy improves cancer detection following transrectal ultrasound biopsy and correlates with multiparametric magnetic resonance imaging,” The Journal of Urology, vol. 186, no. 4, pp. 1281–1285, 2011.
  • [2] D. Shen, Y. Zhan, and C. Davatzikos, “Segmentation of prostate boundaries from ultrasound images using statistical shape model,” IEEE Transactions on Medical Imaging, vol. 22, no. 4, pp. 539–551, 2003.
  • [3] Y. Guo, Y. Gao, and D. Shen, “Deformable MR prostate segmentation via deep feature learning and sparse patch matching,” IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 1077–1089, 2016.
  • [4] Z. Tian, L. Liu, Z. Zhang, and B. Fei, “Superpixel-based segmentation for 3D prostate MR images,” IEEE Transactions on Medical Imaging, vol. 35, no. 3, pp. 791–801, 2016.
  • [5] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
  • [6] W. Wang, J. Shen, and L. Shao, “Video salient object detection via fully convolutional networks,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 38–49, 2018.
  • [7] S. Lai, L. Xu, K. Liu, and J. Zhao, “Recurrent convolutional neural networks for text classification,” in AAAI Conference on Artificial Intelligence (AAAI), 2015, pp. 2267–2273.
  • [8] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” in Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018, pp. 2227–2237.
  • [9] S. Azizi, S. Bayat, P. Yan, A. Tahmasebi, J. T. Kwak, S. Xu, B. Turkbey, P. Choyke, P. Pinto, B. Wood, P. Mousavi, and P. Abolmaesumi, “Deep recurrent neural networks for prostate cancer detection: Analysis of temporal enhanced ultrasound,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2695–2703, 2018.
  • [10] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low dose CT image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
  • [11] Z. Gao, Y. Li, Y. Sun, J. Yang, H. Xiong, H. Zhang, X. Liu, W. Wu, D. Liang, and S. Li, “Motion tracking of the carotid artery wall from ultrasound image sequences: a nonlinear state-space approach,” IEEE Transactions on Medical Imaging, vol. 37, no. 1, pp. 273–283, 2018.
  • [12] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • [14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
  • [15] A. Meyer, A. Mehrtash, M. Rak, D. Schindele, M. Schostak, C. Tempany, T. Kapur, P. Abolmaesumi, A. Fedorov, and C. Hansen, “Automatic high resolution segmentation of the prostate from multi-planar MRI,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), April 2018, pp. 177–181.
  • [16] Z. Tian, L. Liu, Z. Zhang, and B. Fei, “PSNet: prostate segmentation on MRI based on a convolutional neural network,” Journal of Medical Imaging, vol. 5, no. 2, p. 021208, 2018.
  • [17] R. Cheng, H. R. Roth, L. Lu, S. Wang, B. Turkbey, W. Gandler, E. S. McCreedy, H. K. Agarwal, P. Choyke, R. M. Summers et al., “Active appearance model and deep learning for more accurate prostate segmentation on MRI,” in Medical Imaging 2016: Image Processing, vol. 9784, 2016, p. 97842I.
  • [18] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in International Conference on 3D Vision (3DV), 2016, pp. 565–571.
  • [19] X. Yang, L. Yu, L. Wu, Y. Wang, D. Ni, J. Qin, and P.-A. Heng, “Fine-grained recurrent neural networks for automatic prostate segmentation in ultrasound images.” in AAAI Conference on Artificial Intelligence (AAAI), 2017, pp. 1633–1639.
  • [20] L. Yu, X. Yang, H. Chen, J. Qin, and P.-A. Heng, “Volumetric convnets with mixed residual connections for automated prostate segmentation from 3D MR images.” in AAAI Conference on Artificial Intelligence (AAAI), 2017, pp. 66–72.
  • [21] D. Nie, Y. Gao, L. Wang, and D. Shen, “ASDNet: Attention based semi-supervised deep networks for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2018, pp. 370–378.
  • [22] Y. Wang, Z. Deng, X. Hu, L. Zhu, X. Yang, X. Xu, P.-A. Heng, and D. Ni, “Deep attentional features for prostate segmentation in ultrasound,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2018, pp. 523–530.
  • [23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in International Conference on Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
  • [24] M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Marchiori, M. Pesteie, C. R. Guttmann, F.-E. de Leeuw, C. M. Tempany, B. van Ginneken et al., “Transfer learning for domain adaptation in MRI: Application in brain lesion segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2017, pp. 516–524.
  • [25] Z. Luo, Y. Zou, J. Hoffman, and L. F. Fei-Fei, “Label efficient learning of transferable representations acrosss domains and tasks,” in International Conference on Neural Information Processing Systems (NIPS), 2017, pp. 165–177.
  • [26] Y. Zhang, Z. Qiu, T. Yao, D. Liu, and T. Mei, “Fully convolutional adaptation networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6810–6818.
  • [27] H. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
  • [28] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
  • [29] D. Nie, R. Trullo, J. Lian, C. Petitjean, S. Ruan, Q. Wang, and D. Shen, “Medical image synthesis with context-aware generative adversarial networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2017, pp. 417–425.
  • [30] J. Zhang, M. Liu, and D. Shen, “Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks,” IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4753–4764, 2017.
  • [31] W. Xue, A. Lum, A. Mercado, M. Landis, J. Warrington, and S. Li, “Full quantification of left ventricle via deep multitask learning network respecting intra-and inter-task relatedness,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2017, pp. 276–284.
  • [32] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.
  • [33] Q. Zhu, B. Du, B. Turkbey, P. Choyke, and P. Yan, “Exploiting interslice correlation for MRI prostate image segmentation, from recursive neural networks aspect,” Complexity, vol. 2018, p. 10, 2018.
  • [34] A. Mortazi, R. Karim, K. Rhode, J. Burt, and U. Bagci, “Cardiacnet: Segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2017, pp. 377–385.
  • [35] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2015, pp. 234–241.
  • [36] X. Li, H. Chen, X. Qi, Q. Dou, C. Fu, and P. Heng, “H-DenseUNet: Hybrid densely connected UNet for liver and liver tumor segmentation from CT volumes,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
  • [37] L. Yu, J.-Z. Cheng, Q. Dou, X. Yang, H. Chen, J. Qin, and P.-A. Heng, “Automatic 3D cardiovascular mr segmentation with densely-connected volumetric convnets,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).   Springer, 2017, pp. 287–295.
  • [38] H. Chen, Q. Dou, L. Yu, and P.-A. Heng, “Voxresnet: Deep voxelwise residual networks for volumetric brain segmentation,” arXiv preprint arXiv:1608.05895, 2016.
  • [39] O. Oktay, E. Ferrante, K. Kamnitsas, M. Heinrich, W. Bai, J. Caballero, S. A. Cook, A. de Marvao, T. Dawes, D. P. O‘Regan et al., “Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 384–395, 2018.
  • [40] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3D cnn with fully connected CRF for accurate brain lesion segmentation,” Medical Image Analysis, vol. 36, pp. 61–78, 2017.
  • [41] J. Hoffman, D. Wang, F. Yu, and T. Darrell, “FCNs in the wild: Pixel-level adversarial and constraint-based adaptation,” arXiv preprint arXiv:1612.02649, 2016.
  • [42] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in European Conference on Computer Vision (ECCV).   Springer, 2016, pp. 443–450.
  • [43] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” CoRR, vol. abs/1412.3474, 2014.
  • [44] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Simultaneous deep transfer across domains and tasks,” in IEEE Conference on Computer Vision (ICCV), 2015, pp. 4068–4076.
  • [45] M. Long, Y. Cao, J. Wang, and M. I. Jordan, “Learning transferable features with deep adaptation networks,” in International Conference on International Conference on Machine Learning (ICML), 2015, pp. 97–105.
  • [46] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, pp. 2579–2605, 2008.
  • [47] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, “Correction for “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network”,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2750–2750, 2018.
  • [48] A. van Opbroek, M. A. Ikram, M. W. Vernooij, and M. de Bruijne, “Transfer learning improves supervised image segmentation across imaging protocols,” IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.
  • [49] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2962–2971.
  • [50] F. Chollet et al., “Keras,” https://keras.io, 2015.