A Transfer Learning Approach for Automated Segmentation of Prostate Whole Gland and Transition Zone in Diffusion Weighted MRI

by   Saman Motamed, et al.

The segmentation of prostate whole gland and transition zone in Diffusion Weighted MRI (DWI) is the first step in designing computer-aided detection algorithms for prostate cancer. However, variations in MRI acquisition parameters and scanner manufacturing result in different appearances of prostate tissue in the images. Convolutional neural networks (CNNs), which have been shown to be successful in various medical image analysis tasks including segmentation, are typically sensitive to variations in imaging parameters. This sensitivity leads to poor segmentation performance when a CNN trained on a source cohort is tested on a target cohort from a different scanner, and hence limits the applicability of CNNs for cross-cohort training and testing. Contouring prostate whole gland and transition zone in DWI images is time-consuming and expensive. Thus, it is important to enable CNNs pretrained on images of a source domain to segment images of a target domain with a minimal requirement for manual segmentation of images from the target domain. In this work, we propose a transfer learning method based on a modified U-net architecture and loss function for segmentation of prostate whole gland and transition zone in DWIs, using a CNN pretrained on a source dataset and tested on the target dataset. We explore the effect of the size of the target-dataset subset used for fine-tuning the pretrained CNN on the overall segmentation accuracy. Our results show that with fine-tuning data from as few as 30 patients from the target domain, the proposed transfer learning-based algorithm can reach a dice score coefficient of 0.80 for both prostate whole gland and transition zone segmentation. Using fine-tuning data from 115 patients from the target domain, dice score coefficients of 0.85 and 0.84 are achieved for segmentation of whole gland and transition zone, respectively, in the target domain.



1 Introduction

Prostate cancer remains the second most commonly diagnosed cancer and one of the leading causes of cancer death for men in developed countries [39]. Magnetic resonance imaging (MRI) has emerged as an alternative to the clinical standard transrectal ultrasound (TRUS) for cancer detection, localization, staging, biopsy guidance, and focal therapy [41]. More recently, multi-parametric (mp)-MRI has played an increasingly important role in prostate cancer assessment. In addition to the excellent soft tissue contrast, mp-MRI can provide metabolic, diffusion, and perfusion information of prostatic tissue that improves the identification of possible cancerous regions. As the use of mp-MRI increases in clinical practice, as a part of the clinical decision support system, automated prostate cancer detection and segmentation can help radiologists interpret images faster and more accurately. An important step in this process is automated segmentation of prostate whole-gland (WG) and transition zone (TZ) given the tedious nature of manual contouring. Accurate segmentation of prostate and related anatomic structures is an essential task for a number of clinical workflows including radiation treatment planning. In addition, prostate volume has been shown to be a clinical factor in prostate cancer diagnosis [Roobol2012].

Computer-aided detection (CAD) algorithms proposed for automated detection of prostate cancer rely on segmentation of prostate gland as a preprocessing step [Cameron2016, Chung2015, 30, Khalvati2018, 29, 28, DBLP:journals/corr/abs-1905-13145]. Recently, deep convolutional networks (CNNs) have led to a series of breakthroughs in the field of computer vision. CNN models such as U-net [3] based architectures have been explored and shown prominent results in medical image segmentation. It is of high relevance to perform prostate WG and TZ segmentation on MR modalities such as Diffusion Weighted Imaging (DWI). DWI can reveal the internal prostatic anatomy, prostatic margins, and the extent of prostatic tumors [2, 3]. DWI images, however, are subject to variation between different cohorts due to variance in acquisition parameters and scanner manufacturing differences.

Image variability limits the segmentation performance of CNNs trained on DWI images of a given scanner (source domain) when applied to DWI images acquired by a different scanner (target domain). DWI images are acquired with different b-values, which reflect the strength and timing of the gradients used to generate the images [38]. Figure 1 shows DWI images of prostate from two different cohorts with a b-value of 100.

(a) b-100 DWI sample from Source Domain
(b) b-100 DWI sample from Target Domain
Figure 1: b-100 images of prostate in two cohorts

As can be seen from Figure 1, there is a distinguishable difference in the quality and resolution of b-100 images from these two cohorts.

The task of acquiring medical imaging data and the process of manual contouring of images by radiologists can be cumbersome and expensive. Thus, it is imperative to be able to use all available labeled data from different cohorts and scanners, and transfer images and labels from one (source) cohort to the target cohort so that training a segmentation algorithm for the target cohort becomes possible with a minimal amount of labeled images. Data variability between cohorts and scanners remains a challenge for transferring learned knowledge from one cohort to another. Transfer learning is a promising approach to close the gap caused by data variability in image acquisition [42].

Transfer learning is a deep learning technique that enables a neural network trained on one domain to be applied to another domain. Transfer learning is useful when there is insufficient data for training a neural network on a new (target) domain while there is sufficient training data in another domain (source domain) that can be transferred to the task at hand. Insufficient data is a major challenge in medical imaging, and segmentation under DWI variability can benefit from a transfer learning approach.

In this work, we propose a deep transfer learning approach for segmentation of prostate WG and TZ based on a modified U-net architecture. A deep learning architecture pretrained on a relatively large cohort (n=533) was fine-tuned using a small dataset from the target cohort and tested on the remainder of the target cohort (n=33). The source and target cohorts were acquired from two different hospitals using different MRI scanners (Philips scanner for source cohort and Siemens scanner for target cohort). We explored the increase in accuracy of prostate WG and TZ segmentation using different target domain dataset sizes for fine-tuning (n=8-115). To the best of our knowledge, the number of labeled cases required in the target domain to perform transfer learning for prostate segmentation has not been previously explored.

Dice Score Coefficient (DSC) [8] with values between 0 (no overlap of prediction and ground truth) and 1 (perfect segmentation prediction) has been used to evaluate the performance of medical image segmentation algorithms. In order to achieve better performance in segmentation tasks, CNN architectures such as VGG-16 have been used [1, vgg] as a preprocessing step to first detect prostate and only then train the segmentation architecture on images containing prostate [1]. With limited amount of data, however, training more architectures leads to loss of information over each model. To explore the effect of different rewards for prostate and non-prostate images and their corresponding segmentation predictions, we made a modification to the conventional DSC loss function and studied the effect of the modification on overall DSC, sensitivity, specificity and precision of each model’s performance.

2 Related Work

Over the past few years, as the importance of prostate MRI segmentation grows, several segmentation methods have been proposed for the task of prostate zonal segmentation, including registration-based methods [Khalvati2015c, Khalvati2013, 10, 11, 12]. Recently, deep learning methods have made advances in semi- and fully automatic segmentation of medical images including prostate [18, 1, 3, 19].

In registration-based segmentation algorithms, a model is built to represent the prior knowledge, such as shape, features, or relative positions of anatomical parts. After correctly registering the model to the target image, the segmentation result is generated by applying the registration transformation matrix to the training label. Deformable models as a type of registration based segmentation, offer an approach that combines geometry, physics, and approximation theory. They have proven to be effective in segmenting, matching, and tracking anatomic structures by exploiting (bottom-up) constraints derived from the image data together with (top-down) knowledge about the location, size, and shape of these structures. Deformable models are capable of accommodating the significant variability of biological structures over time and across different individuals [25].

Recently, CNNs [31] have been a ground-breaking addition to existing segmentation algorithms and are dominating the field of computer vision. CNNs have been responsible for significant advancements in tasks such as object classification [32] and object localization [33], and continuous improvements to CNN architectures are bringing further progress [36, 34, 35]. Semantic segmentation tasks have also been revolutionized by CNNs. The U-net architecture, proposed by Ronneberger et al. [3], was a great success for the task of medical image segmentation and has been used as a skeleton architecture in many studies [2, 22, 1]. The structure of U-net comprises an encoder and a decoder network. Furthermore, the corresponding layers of the encoder and decoder networks are connected by skip connections, prior to a pooling operation and subsequent to a deconvolution operation, respectively. U-net has shown promising potential in segmenting medical images, even with a scarce amount of labeled training data. We have used this architecture as the base model and have proposed a transfer learning based segmentation algorithm for prostate WG and TZ.

With the rise of deep learning models, which depend heavily on data size in order to be trained sufficiently and perform tasks with high accuracy, transfer learning methods are especially of interest in the field of medical imaging, where data scarcity and MR variation between different institutes and image acquisition protocols significantly affect the performance of deep learning approaches. There is, however, a lack of studies exploring the amount of data needed to perform transfer learning from a source domain to a target domain. This is of especially high importance since the task of contouring medical images by radiologists is expensive, time consuming, and suffers from human error and inter- and intra-user variability.

In a study by Ghafoorian et al. [4], the effect of transfer learning on brain lesion segmentation of MR images was explored. They trained a CNN on legacy MR images of the brain and evaluated the performance of the domain-adapted network on the same task with images from a different domain, reporting the DSC increase for different dataset sizes used in the target domain. As of now, studies that explore the effect of data size on other tasks such as prostate segmentation are very limited in the literature, yet it is of high importance to know how many labeled images are needed in order to efficiently extend a study from one cohort to another.

In this work, we propose an architecture with a transfer learning based approach for the task of prostate WG and TZ segmentation. The study can also act as a guide for any future work by measuring the amount of data needed for the task of prostate segmentation via transfer learning, by comparing final accuracy results based on multiple training size instances.

3 Data and Methods

3.1 Datasets

To perform the task of prostate WG and TZ segmentation and to extend our trained model to different DWI domains, we used two datasets from different institutes. Institutional review board approval was obtained for this study from both institutions and the need for written informed patient consent was waived.

  1. Our source domain contains DWI images of 4 b-values for a total of 533 patients from Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada. The data is split into training, validation, and test sets, with 448 patients used for training and validation and 85 patients used for testing.

  2. Our target domain includes DWI images of 148 patients from University Health Network, Toronto, Ontario, Canada. Through our training, we explore different subset sizes of our target domain and the effect of the size of the dataset used to fine-tune the transfer learning architecture on the accuracy of prostate WG and TZ segmentation, evaluated on a randomly selected, fixed subset of 33 patients from the target domain (out of the 148).

3.2 Architecture and Training

We follow the network architecture proposed by Clark et al. [1]. Figure 2 shows the proposed architecture, where regular convolution blocks in U-net are embedded with inception and residual blocks. Inception blocks apply four convolution and pooling operations in parallel and then concatenate the feature tensors at the end of the block. Merging of signals after parallel operations has been shown theoretically and experimentally to increase segmentation and classification accuracy. Residual connections were added to the connections between the up-sampling and down-sampling paths. This was especially important in the first-layer skip connection, which had only undergone one convolution operation, and it helped to reduce areas of false positives. The Adam optimizer with the same set of initial learning rates was used for training all the instances of U-net for 25 epochs, using an Nvidia GeForce GTX 1080 Ti 11GB GPU.

In training our base model using the source dataset, we combined the 4 available b-value DW images to increase the number of training images rather than individually training the U-net on each b-value separately and picking the best result. Data augmentation was done using Keras built-in augmentation that performs random horizontal and vertical flips and axis rotations.

In convolutional neural networks, shallow layers / features of the network are known to contain more generic features (e.g. edges and size of the object) but deeper layers of the network become progressively more specific to the details of the classes contained in the trained dataset. For the U-net architecture shown in Figure 2, we expect shallow layers (Down-blocks) to be responsible for learning lower level features which are shared by both source and target datasets, while the deeper layers (Up-blocks) are responsible for more high level features that might differ between the two datasets.

Figure 2: Modified U-net Architecture

Left: The modified U-net architecture, which is comprised of Down and Up blocks. Right: Expanded Down and Up blocks.

We experimented with different combinations of which layers to fine-tune. Within the radiology community, there is higher agreement on Prostate WG segmentation compared to that of TZ due to the general shape and vague boundaries of TZ. As a result, for the task of prostate WG segmentation, our best performing model was achieved by fine-tuning only the Up-blocks while keeping the Down-block learned features the same as our source domain learned model. As TZ contouring has more variance between different radiologists compared to WG, our best model for TZ segmentation was achieved by fine-tuning Up-blocks while also fine-tuning the first half of the Down-block layers. We use both conventional and our proposed modified DSC loss function in order to compare the results between and within models (pretrained and transfer learning models). Target domain dataset used for fine-tuning was gradually increased with 10% increments, starting from 8 patients (194 images) to 115 patients (2,768 images), in order to explore minimum dataset size required to attain acceptable results. The final test dataset from the target domain was kept fixed at 33 patients and separate from the fine-tuning datasets.
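The layer-selection rule described above can be illustrated with a minimal sketch. The block names, the number of blocks, and the helper `trainable_blocks` are hypothetical and are not taken from the original implementation; "first half" is assumed here to mean the Down-blocks closest to the input.

```python
def trainable_blocks(zone, down_blocks, up_blocks):
    """Return the names of the blocks to fine-tune for a given zone.

    WG: only the Up-blocks are fine-tuned.
    TZ: the Up-blocks plus the first half of the Down-blocks are fine-tuned.
    Blocks are assumed to be listed in order, starting from the input side.
    """
    if zone == "WG":
        return list(up_blocks)
    if zone == "TZ":
        half = len(down_blocks) // 2
        return list(down_blocks[:half]) + list(up_blocks)
    raise ValueError("zone must be 'WG' or 'TZ'")


# Hypothetical block names; the real network may have a different depth.
down = ["down1", "down2", "down3", "down4"]
up = ["up1", "up2", "up3", "up4"]

wg_trainable = trainable_blocks("WG", down, up)
tz_trainable = trainable_blocks("TZ", down, up)
```

In a deep learning framework, every block whose name is not in the returned list would be frozen before fine-tuning on the target domain.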

An overview of the models used for training and comparison purposes is given below.

  1. Training our U-net based architecture from scratch on the target data with conventional DSC loss function (Equation 2) ignoring the source domain dataset.

  2. Transfer Learning by using conventional DSC loss function where we use the features learned from the source domain and use the target domain images to fine-tune and test our model.

  3. Transfer Learning using our proposed modified DSC loss (Equation 4).

3.3 Modified Dice Score Measure

DSC is the well-accepted measure of segmentation accuracy for medical images in well-known competitions such as PROMISE12 [promise12]. DSC measures the overlap between the predicted mask and the ground truth, and includes a small constant to avoid division by zero.

By using the DSC loss function defined above, the calculated DSC for cases that do not contain the prostate will be 1. In other words, the reward of predicting a perfect segmentation for prostate containing images and not returning any segmentation prediction for images that do not contain the prostate is the same.
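As an illustration of this behaviour, the following is a minimal sketch of a commonly used smoothed DSC. The exact form used in this work is not reproduced here; the function name `dice_score`, the flat 0/1 mask representation, and the placement of the constant `eps` in both numerator and denominator are assumptions chosen so that an empty prediction on an empty ground truth scores 1.

```python
def dice_score(pred, truth, eps=1e-6):
    """Smoothed Dice score between two binary masks given as flat 0/1 sequences.

    eps is an assumed small constant that avoids division by zero when
    both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)


# A slice without prostate and with no predicted pixels.
empty_case = dice_score([0, 0, 0, 0], [0, 0, 0, 0])

# A slice with prostate that is segmented perfectly.
perfect_case = dice_score([1, 1, 0, 0], [1, 1, 0, 0])

# A partially overlapping prediction.
partial_case = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

Under this form, `empty_case` and `perfect_case` receive the same score, which is the equal-reward behaviour described above.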

We propose a change to calculating segmentation accuracy by making the following modification to the DSC based loss function, where we explore X values from 0.0 to 1.0 with 0.1 increments.


With this modification, we anticipate lower X values to increase the segmentation accuracy of the model where the training data size is smaller, and bigger X values to perform better on bigger datasets. The reason lies in the nature of the prostate data, where a portion of the images at the patient level are non-prostate images. By adjusting the reward system for smaller dataset sizes, and penalizing correct performance of the model on such images when no segmentation is produced, we shift the focus of the model to the images containing prostate. As the training data size increases, the effect of the modified DSC may become negligible, since the network will have seen enough cases to reward/penalize correctly.
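One plausible reading of this modification is sketched below. It is an assumption made for illustration, not the exact formulation of this work: the score for a slice with an empty ground truth and an empty prediction is set to X instead of 1, and the loss is taken as one minus the score. The names `modified_dice_score` and `modified_dice_loss` are hypothetical.

```python
def modified_dice_score(pred, truth, x=0.5, eps=1e-6):
    """Dice score where a correct empty prediction is rewarded with x.

    pred and truth are flat 0/1 sequences. With x = 1.0 the function
    falls back to the usual smoothed Dice behaviour for empty slices.
    """
    if sum(pred) == 0 and sum(truth) == 0:
        return x
    intersection = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)


def modified_dice_loss(pred, truth, x=0.5, eps=1e-6):
    """Assumed loss form: one minus the modified Dice score."""
    return 1.0 - modified_dice_score(pred, truth, x=x, eps=eps)


# The X values explored in the text: 0.0 to 1.0 in steps of 0.1.
x_grid = [round(0.1 * i, 1) for i in range(11)]

# Reward for a correctly empty prediction under each X.
empty_rewards = [modified_dice_score([0, 0], [0, 0], x=x) for x in x_grid]
```

Under this reading, slices that contain prostate are scored exactly as before, so lowering X only reduces the reward obtained from non-prostate slices.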

3.4 Post-processing

Two morphological transformations [Gonzalez2007], opening and closing, were used in order to improve segmentation accuracy and reduce noise. We know prostate zones are continuous in volume. The closing operation, which is a dilation followed by an erosion, fills holes in the masks predicted by our algorithm. The opening operation, which is an erosion followed by a dilation, is useful in removing noise outside of our predicted masks. The application of these transformations is shown in Figure 3. Although the DSC improvement with these transformations was minimal, for applications of automated segmentation such as prostate volume calculation, it makes a great difference to have accurate, enclosed, continuous masks.
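The two operations can be sketched in plain Python on a small binary mask. The 3x3 structuring element, the treatment of pixels outside the image as background, and the order in which the two operations are applied (closing first, then opening) are assumptions made for this illustration.

```python
def _window(mask, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            yield mask[rr][cc] if inside else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(mask))


def opening(mask):
    """Erosion followed by dilation: removes small isolated regions."""
    return dilate(erode(mask))


def postprocess(mask):
    """Assumed order: fill holes first, then remove isolated noise."""
    return opening(closing(mask))


# Toy 9x9 mask: a 4x4 block with a one-pixel hole and one isolated noise pixel.
mask = [[0] * 9 for _ in range(9)]
for r in range(2, 6):
    for c in range(2, 6):
        mask[r][c] = 1
mask[3][3] = 0  # hole inside the predicted region
mask[7][7] = 1  # isolated false-positive pixel
cleaned = postprocess(mask)
```

On this toy mask the hole is filled by the closing step and the isolated pixel is removed by the opening step, leaving a solid 4x4 block.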

The segmentation of prostate WG at the prostate base and apex is challenging for both radiologists and CAD tools. As a result, to reduce the number of false positive cases in our segmentation task, we dismiss predicted masks below a threshold of 120 pixels, which is derived from the mean number of pixels for the prostate base and apex (the prostate end-points with the smallest size within the prostate) among all patients. We apply the same methodology to TZ segmentation and filter results at 65 pixels, 90% of the mean number of pixels for the prostate TZ base and apex among all patients. This improved the dice score in our final result, without sacrificing specificity, sensitivity or precision.
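The size filter can be sketched as follows. The thresholds of 120 and 65 pixels come from the text; the helper name `drop_small_mask` and the representation of a mask as a list of rows of 0/1 values are assumptions for this illustration.

```python
# Minimum number of foreground pixels a predicted mask must have to be kept.
MIN_PIXELS = {"WG": 120, "TZ": 65}


def drop_small_mask(mask, zone):
    """Return an all-zero mask if the predicted area is below the zone threshold."""
    area = sum(sum(row) for row in mask)
    if area < MIN_PIXELS[zone]:
        return [[0] * len(row) for row in mask]
    return mask


# A 10x10 prediction (100 pixels) is dismissed for WG but kept for TZ.
prediction = [[1] * 10 for _ in range(10)]
wg_result = drop_small_mask(prediction, "WG")
tz_result = drop_small_mask(prediction, "TZ")
```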

(a) Pre-Transformation
(b) Post-Transformation
Figure 3: Postprocessing segmentation results

4 Results

In the following, the results of the proposed segmentation model are presented for the source domain and target domain datasets, for both prostate WG and TZ.

4.1 Source Domain Segmentation Results for WG and TZ

The result of the modified U-net architecture, trained and validated on 448 patients and tested on 85 patients (6,688 images), all from the source domain dataset, can be seen in Table 1. We used our proposed modified DSC loss in training and we report the conventional DSC on the test set images containing prostate. We calculated specificity, sensitivity, and precision for detection of slices that contain prostate.
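A slice-level computation of these three measures can be sketched as follows. Treating a slice as "positive" when its mask contains at least one foreground pixel is an assumption made for this illustration, as is the helper name `detection_metrics`.

```python
def detection_metrics(pred_masks, truth_masks):
    """Slice-level sensitivity, specificity and precision for prostate detection.

    Each mask is a list of rows of 0/1 values. A slice counts as containing
    prostate if its mask has at least one foreground pixel.
    """
    tp = fp = tn = fn = 0
    for pred, truth in zip(pred_masks, truth_masks):
        has_pred = any(any(row) for row in pred)
        has_truth = any(any(row) for row in truth)
        if has_pred and has_truth:
            tp += 1
        elif has_pred and not has_truth:
            fp += 1
        elif not has_pred and has_truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Five toy 1x1 slices: TP, FN, FP, TN, TP.
preds = [[[1]], [[0]], [[1]], [[0]], [[1]]]
truths = [[[1]], [[1]], [[0]], [[0]], [[1]]]
metrics = detection_metrics(preds, truths)
```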

Zone DSC Specificity Sensitivity Precision
Table 1: Segmentation result for source domain dataset

4.2 Target Domain Segmentation

From the target domain, we randomly picked 33 patients (20% of total number of images) as our test set through all iterations of training. For fine-tuning the model, we used different sizes from 8 patients to 115 patients in the target dataset. Unlike training source domain data, in training (fine-tuning) the target domain images, we did not use data augmentation to keep the training fast and explore minimal data size requirement for transfer learning to achieve optimal segmentation accuracy.
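The patient-level split can be sketched as below. The subset sizes and the fixed test set of 33 patients come from the text; the random seed, the integer patient identifiers, and the choice to make the fine-tuning subsets nested (each larger subset containing the smaller ones) are assumptions made for this illustration.

```python
import random


def split_target(patient_ids, n_test=33,
                 sizes=(8, 30, 42, 70, 85, 106, 115), seed=0):
    """Hold out a fixed test set and build fine-tuning subsets of increasing size."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    test = ids[:n_test]   # fixed across all experiments
    pool = ids[n_test:]   # patients available for fine-tuning
    subsets = {n: pool[:n] for n in sizes}
    return test, subsets


# Hypothetical identifiers for the 148 target-domain patients.
test_ids, finetune_subsets = split_target(range(148))
```

Splitting at the patient level, rather than at the image level, keeps all slices of a test patient out of every fine-tuning subset.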

For each target dataset size, we trained our modified U-net in 2 different ways: 1) training the U-net from scratch while exploring DSC loss modifications, using target domain training data only; 2) transfer learning with modifications to the DSC loss, pretrained on the source domain and fine-tuned on the target domain. As a baseline, we also applied the network trained on the source domain to the target domain without any fine-tuning. The source domain model's performance on its own test set acts as an optimal result for comparison purposes.

4.2.1 Prostate WG

Table 2 shows detailed average DSCs over training with different training data sizes in the target domain. "Transfer Learning" and "Train Target from Scratch" are the results of our training with and without a transfer learning approach after exploring different X values from Equation 3. We achieved higher accuracy with the modified loss for transfer learning on all dataset sizes, and for training the target from scratch only for dataset sizes of 8 and 30 patients. For the rest of the from-scratch experiments, with data sizes of 42 patients and higher (42-115), the conventional loss results in a higher accuracy. Hence, the reported DSC for both transfer learning and training from scratch uses our modified DSC loss, except for training from scratch on data sizes of 42-115 patients. Transfer learning DSC results with our proposed DSC loss were, on average, 1% better than the DSC achieved with the conventional loss. Our proposed loss performed much better for dataset sizes of 8 and 30 when the model was trained from scratch: we achieved a DSC of 0.58 compared to 0.44 when training on 8 patients, and 0.65 compared to 0.57 on 30 patients. "No Training on Target" uses the model trained only on the source data and applies it to the target domain test cases.

Target Size | Transfer Learning | Train Target from Scratch | No Training on Target
8 | | | 0.64
30 | | | 0.64
42 | | | 0.64
70 | | | 0.64
85 | | | 0.64
106 | | | 0.64
115 | | | 0.64
Table 2: DSC results for segmentation of prostate WG in the target domain

Figure 4 shows the improvement of average DSCs for WG on our 33 test patients, trained with and without transfer learning and the modified DSC loss. The blue line shows the result of transfer learning with the modified DSC loss (Equation 3), where our model is pretrained using the source data and fine-tuned using target data of different sizes (8-115). We used our modified loss on dataset sizes of 8 and 30 for training the model from scratch (green line). For n = 42-115, we used the conventional loss. The black line shows the result of applying the pretrained model to the 33 target test patients, without any training.

Figure 4: DSC trend on segmentation of the target domain prostate WG

Transfer learning is able to extend our WG segmentation model, trained on source data, to target data using only 8 cases from the target dataset contoured by the radiologist. Training the U-net from scratch on the same cases results in a lower DSC, with a longer training time compared to the transfer learning approach. Even though our modified DSC loss achieves a higher overall DSC than its conventional counterpart for the smaller dataset sizes, consideration of measures such as specificity, sensitivity and precision at the image level shows that DSC alone is not a good measure of accuracy for segmentation tasks. Figure 5 compares the sensitivity, specificity and precision of our model, trained with 8, 70 and 115 patients, using the conventional DSC loss against our proposed modified dice loss.

(a) 8 Patients
(b) 70 Patients
(c) 115 Patients
Figure 5: The overall sensitivity, specificity and precision for WG detection using modified DSC loss, both with a transfer learning and from scratch training, trained with 8, 70 and 115 patients and tested on 33 patients.

The proposed DSC loss (Equation 3) and its conventional counterpart both performed well using a transfer learning approach, with DSC, sensitivity, specificity and precision not being consistently better in one or the other. The proposed loss, however, performs much better when training a model from scratch using a limited training dataset, as can be seen in Figure 5 (a).

Figure 6 shows the prediction of prostate WG compared to the ground truth contoured by radiologists, using our loss function against its counterpart. As expected, when training from scratch, the conventional loss performs better than our proposed loss. The subtle improvement of predictions as we move from training from scratch and conventional DSC to our transfer learning approach with modified DSC may not be visible in the figure.

(a) From Scratch
(b) From Scratch + Modified DSC loss
(c) Transfer Learning
(d) Transfer Learning + Modified DSC loss
Figure 6: Prostate Whole Gland Predicted Mask vs Ground Truth

Predicted prostate WG mask (white) vs. ground truth (blue) for sample cases. From left to right: 1) Training from scratch 2) Training from scratch with modified DSC 3) Transfer Learning 4) Transfer Learning with modified DSC.

4.2.2 Prostate TZ

For prostate TZ segmentation, we followed the same training scheme as for WG, keeping our test patients unchanged. Table 3 shows detailed average DSCs over training with different sizes of target domain data.

Target Size | Transfer Learning | Train Target from Scratch | No Training on Target
8 | | | 0.71
30 | | | 0.71
42 | | | 0.71
70 | | | 0.71
85 | | | 0.71
106 | | | 0.71
115 | | | 0.71
Table 3: DSC results for segmentation of prostate TZ in the target domain

Figure 7 shows the improvement of average DSCs for TZ on our 33 test patients, trained with and without transfer learning and modified DSC loss. The blue line which is the result of transfer learning uses our modified DSC loss on all training instances (n = 8-115) while we used the same methodology as prostate WG for training from scratch and used the modified DSC loss on 8 and 30 patient data sizes and used conventional loss for n = 42-115.

Figure 7: DSC trend on segmentation of the target domain prostate TZ

Figure 8 shows the prediction of prostate TZ compared to the ground truth of the zone contoured by radiologists, using our loss function against its counterpart.

(a) From Scratch
(b) From Scratch + Modified DSC loss
(c) Transfer Learning
(d) Transfer Learning + Modified DSC loss
Figure 8: Prostate transition zone Predicted Mask vs Ground Truth

Predicted prostate TZ (white) vs. ground truth (blue) for sample cases. From left to right: 1) Training from scratch 2) Training from scratch with modified DSC 3) Transfer Learning 4) Transfer Learning with modified DSC.

5 Discussion

With the increase in using DWI images for the purpose of prostate cancer diagnosis and prognosis, automatic accurate segmentation of DWI images into prostate WG and TZ can assist radiologists in diagnosis and prognosis of prostate cancer. In this work, we have proposed a transfer learning architecture to extend a trained CNN model on a large MRI dataset, to target domain MRI images with different acquisition parameters. With the limitations on available data size in the field of medical imaging, transfer learning allows the use of all available data from different cohorts for a task at hand. As shown in our results, training on the source domain and crudely applying the trained model to target domain results in poor performance (DSCs of 0.64 and 0.71 for WG and TZ, respectively). Moreover, training a model from scratch never reaches the accuracy of the proposed transfer learning method regardless of the size of the training dataset in the target cohort (Figures 4 and 7).

The goal of this work was also to identify the minimum labeled dataset size required to achieve acceptable segmentation results. We have shown that with datasets as small as 30 patients, we can extend our model, trained on the source domain containing 533 patients, to a target domain with a DSC of 0.80 on the test set for segmentation of both prostate WG and TZ. As the number of training cases in the target domain increases (115), the DSC of prostate WG and TZ segmentation reaches 0.85 and 0.84, respectively. This is a reasonable DSC, which can be accepted for different applications such as prostate volume calculation. Given the small training dataset required in the target domain, the proposed approach can have significant practical impact.

We also proposed a loss function based on a modified DSC measure. Although the proposed loss function does not show significant improvement in overall DSC on the test images when using the transfer learning approach, it shows great improvement in DSC, specificity, and precision of detecting images with prostate when training the network from scratch using small training datasets. This was achieved by making the network focus only on images containing prostate. As the dataset size grows, training from scratch with this method leads to lower specificity due to an increase in false positives. Another implication of our modified DSC loss function is that DSC on its own, as currently used by researchers, may not be the best measure for reporting accuracy. As we showed in Figure 5, DSC becomes more meaningful when combined with sensitivity, specificity and precision measures to give a fuller picture of the accuracy of different models (e.g. Figure 5 (c)).

Among the challenges and limitations of prostate segmentation are the ambiguity and difficulty of segmenting the base and apex regions, even for radiologists. Our best performing model for prostate WG segmentation results in 50 false negative and false positive predictions (about 6% of the images in the test set). Of these mis-predictions, we estimated what portion belongs to the apex/base and what portion belongs to the mid-gland. We used pixel thresholding based on the min / max of appearing images (base/apex masks are much smaller than mid-gland masks on average) and the positions of images (if there are predictions before and after a slice, it is most likely mid-gland, and base/apex otherwise). Only around 14 images (28%) of the mis-predictions were not base/apex, while the remaining 36 belong to either the base or the apex of the prostate. For those 36 images, different radiologists may or may not include them as part of the prostate, since the zones get small and difficult to detect. Future work could make use of robust registration / detection methods at the base and apex to improve the segmentation of these slices.
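The position-based part of this heuristic can be sketched as follows. Only the "predictions before and after a slice" rule is shown; the pixel-threshold rule is omitted, and the helper name `is_mid_gland` and the per-slice flag representation are assumptions for this illustration.

```python
def is_mid_gland(slice_has_mask, index):
    """Classify a slice as mid-gland if masks exist both before and after it.

    slice_has_mask is a per-patient list of 0/1 flags, ordered by slice
    position, marking which slices have a non-empty mask.
    """
    before = any(slice_has_mask[:index])
    after = any(slice_has_mask[index + 1:])
    return before and after


# One hypothetical patient with masks on slices 1 to 3.
flags = [0, 1, 1, 1, 0]
labels = ["mid-gland" if is_mid_gland(flags, i) else "base/apex"
          for i in range(len(flags))]

# Share of mis-predictions outside the base/apex reported in the text.
non_base_apex_share = 14 / 50
```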

6 Conclusion

This work proposes a transfer learning architecture for prostate WG and TZ segmentation that extends a model from one cohort to another with a small amount of labeled data. This will give clinicians and radiologists a guideline for the number of contoured images they need in order to achieve segmentation results that meet their accuracy needs. The modification of the dice coefficient loss eliminated the need for training a separate network for detecting the prostate before performing segmentation, while outperforming the conventional dice loss. This model can be used to calculate prostate WG and TZ volumes and to preprocess MR images for computer-aided anomaly detection models that require the segmentation of prostate WG and TZ.