Prostate segmentation from radiology scans is often necessary for radiotherapy, prostatectomy, and calculation of prostate-specific antigen (PSA) density. Among imaging modalities, magnetic resonance imaging (MRI) provides the best soft tissue contrast and yields the most accurate estimate of prostate volume, consistent with prostatectomy specimen volumes. In contrast, computed tomography (CT) scans make it difficult to distinguish the boundary between the prostate and adjacent tissues during segmentation. Despite this, in current clinical practice, prostate radiation therapy dose calculations are primarily based on CT scans, as CT is the only modality from which the electron density needed for dosimetry can be derived. Therefore, planning systems generally require anatomical information to be delineated on CT scans.
In this study, we address the practical yet challenging problem of prostate segmentation from CT images when no ground-truth CT annotations are available to supervise the segmentation algorithm. Instead, we use segmentation labels from widely available MRI data sets and propose a two-step knowledge transfer algorithm that maps the segmentation labels from MRI to CT scans. The correspondence between MRI and CT is established through a CycleGAN algorithm with a structural-similarity-preserving cost function. The highly realistic synthetic CT scans generated in the first step are then used to supervise a deep segmentation network in the second step. The segmentation network is trained only on the synthetic images, while testing is done on both synthetic and real CT scans. Although our framework does not require any specific segmentation network for the delineation step, we chose a 2.5D Res-U-Net for its faster convergence and higher accuracy.
The proposed workflow consists of two main steps, as illustrated in Figure 1. The first step is to generate high-quality, reliable CT images (SynCT) from MR images. Previous work has shown that domain adaptation from MR images to CT images is feasible using the CycleGAN architecture. We used a similar CycleGAN approach as a baseline for knowledge transfer between unpaired MRI and CT.
The second step is automatic segmentation of the prostate. We trained a U-Net-based segmentation network to delineate the whole prostate, with two main differences from the existing literature: (i) we used SynCT for training and real CT scans for testing, and (ii) we modified the U-Net by adding residual blocks to increase segmentation performance. For better 3D information fusion, we also modified the architecture to take two additional adjacent slices as input (i.e., a 3-channel input).
We used three different data sets for our experiments and evaluations. First, for CycleGAN training, T2-weighted MRI scans from the publicly available PROSTATEx challenge data were used. The T2-weighted images were acquired using a turbo spin echo sequence with in-plane resolution of - mm, slice thickness of mm and zero gap. Second, the testing data set for CycleGAN included 60 prostate MRI cases along with their high-quality delineations, obtained from the publicly available NCI-ISBI challenge data. These data were used to generate the synthetic CT scans. We used 6-fold stratified cross-validation to evaluate the algorithms. Third, for real CT scans, as part of a retrospective IRB-approved study, we acquired prostate CT data from 120 anonymized patients at our institution with resolution (). CT intensity was clipped to the range from -500 HU to 500 HU to reveal more soft tissue contrast, similar to a soft tissue CT window. The prostate MRI and CT data come from entirely different sources and are therefore unpaired. Among the in-house CT data, we chose a subset to be manually segmented by a board-certified radiologist for Dice score (DSC) comparison with our automatic segmentation method.
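The intensity clipping step can be illustrated with a minimal sketch. It assumes the window of -500 HU to 500 HU reported in the results; the function name `soft_tissue_window` and the rescaling to [0, 1] are illustrative assumptions, not details given in the text.

```python
import numpy as np

def soft_tissue_window(ct_hu, lo=-500.0, hi=500.0):
    """Clip CT intensities (in HU) to [lo, hi], then rescale to [0, 1].

    The rescaling is an assumption added for illustration; the text
    only states that intensities were clipped.
    """
    ct = np.asarray(ct_hu, dtype=np.float32)
    clipped = np.clip(ct, lo, hi)
    return (clipped - lo) / (hi - lo)

# Toy 2x2 "slice": air, lower window edge, water, dense bone.
slice_hu = np.array([[-1000.0, -500.0], [0.0, 1200.0]])
windowed = soft_tissue_window(slice_hu)
```

Values outside the window saturate at 0 or 1, so the available dynamic range is spent on soft tissue rather than on air or bone.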
2.2 Synthetic CT Network: CycleGAN
The synthetic CT images were generated by the CycleGAN model, which consisted of two pairs of generative adversarial networks (GANs) and two extra generators that convert generated data back to the original domain, enforcing cycle consistency. In our study, the forward-direction GAN has a generator that produces synthetic CT that is as realistic as possible, so that a discriminator cannot distinguish it from real CT. The discriminator ensures that the generated data resemble the original data; hence, the reliability of the generated data depends heavily on the performance of the discriminator. The discriminator loss is given in Eq. 1.
In Eq. 1, the discriminator compares the j-th true CT slice with the image that the generator produces from the i-th MRI slice, and tries to differentiate the two. If the discriminator cannot distinguish the generated image, it assigns the label 1, meaning that it recognized the generated image as a true CT image; otherwise it assigns the label 0.
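As a rough illustration of how such a discriminator objective behaves, the sketch below assumes the least-squares form used in the original CycleGAN paper, in which real CT slices are pushed toward a score of 1 and generated slices toward 0. The function and argument names are hypothetical, and the exact form of Eq. 1 may differ.

```python
import numpy as np

def discriminator_loss(d_real_ct, d_syn_ct):
    """Least-squares discriminator loss (assumed form, not taken from Eq. 1).

    d_real_ct: discriminator scores for true CT slices (target 1).
    d_syn_ct:  discriminator scores for generated CT slices (target 0).
    """
    d_real_ct = np.asarray(d_real_ct, dtype=np.float64)
    d_syn_ct = np.asarray(d_syn_ct, dtype=np.float64)
    real_term = np.mean((d_real_ct - 1.0) ** 2)
    fake_term = np.mean(d_syn_ct ** 2)
    return float(real_term + fake_term)

# A discriminator that separates the two domains perfectly has zero loss.
perfect = discriminator_loss([1.0, 1.0], [0.0, 0.0])
# A discriminator fooled into scoring every synthetic slice as real is penalized.
fooled = discriminator_loss([1.0, 1.0], [1.0, 1.0])
```

The generator is trained with the opposite goal, namely to make the scores of its synthetic slices approach 1.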
A second generator translates the SynCT back to its original data domain (the MR domain). Minimizing the difference between the reconstructed data and the original data (the cycle-consistency loss) imposes a strong constraint on the model that prevents the generated data from deviating from the ground truth. The cycle-consistency loss is expressed in Eq. 2.
In Eq. 2, the loss is computed over an image patch and involves the number of pixels in the patch and a pixel index. The SSIM at a given pixel is defined in Eq. 3 in terms of the mean pixel intensity and the standard deviation of pixel intensity in a local image patch centered at the corresponding pixel of either image, with two small constants added for stability. The cycle loss compares the reconstructed MRI slices with the true MRI slices pixel by pixel. In our new formulation, instead of computing the mean squared error (MSE), we propose to use the structural similarity index (SSIM), which takes the context of the images into account at a higher level than pixel-level MSE.
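A minimal NumPy sketch of an SSIM-based cycle loss is given below. It uses the standard SSIM definition, with local means, variances, and covariance plus two stabilizing constants, and averages 1 - SSIM over all pixels. The uniform 7x7 window, the constants (the common 0.01² and 0.03² for intensities in [0, 1]), and all function names are assumptions made for illustration; the text does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_stats(x, y, win=7):
    """Local means, variances, and covariance over a win x win box window."""
    pad = win // 2
    xw = sliding_window_view(np.pad(x, pad, mode="reflect"), (win, win))
    yw = sliding_window_view(np.pad(y, pad, mode="reflect"), (win, win))
    mu_x, mu_y = xw.mean(axis=(-1, -2)), yw.mean(axis=(-1, -2))
    var_x, var_y = xw.var(axis=(-1, -2)), yw.var(axis=(-1, -2))
    cov_xy = (xw * yw).mean(axis=(-1, -2)) - mu_x * mu_y
    return mu_x, mu_y, var_x, var_y, cov_xy

def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM between two 2D images with intensities in [0, 1]."""
    mu_x, mu_y, var_x, var_y, cov_xy = local_stats(x, y, win)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure

def ssim_cycle_loss(real_mr, reconstructed_mr, win=7):
    """Mean of (1 - SSIM) over all pixels: 0 for a perfect reconstruction."""
    x = np.asarray(real_mr, dtype=np.float64)
    y = np.asarray(reconstructed_mr, dtype=np.float64)
    return float(np.mean(1.0 - ssim_map(x, y, win)))

rng = np.random.default_rng(0)
mr = rng.random((16, 16))
noisy = np.clip(mr + 0.2 * rng.standard_normal((16, 16)), 0.0, 1.0)
loss_same = ssim_cycle_loss(mr, mr)
loss_noisy = ssim_cycle_loss(mr, noisy)
```

Unlike a pixel-wise MSE, each term depends on a neighborhood around the pixel, so the loss responds to local structure rather than only to individual intensity differences.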
2.3 Segmentation Network: 2.5D Res-U-Net
The U-Net architecture has long skip connections to preserve spatial information during down-sampling. In addition to the long skip connections, short skip connections were added to form residual blocks, which prevent vanishing gradients and increase convergence speed; a U-Net with short skip connections is called a Res-U-Net. The proposed 2.5D input technique loads multiple slices simultaneously: one central slice and its adjacent slices in the out-of-plane direction. The number of input channels is the central slice plus its adjacent slices. The number of adjacent slices is set by a designated context number, which selects that many slices in both the positive and the negative direction. For instance, if the context number is set to 1, the selected adjacent slices are the +1 and -1 slices next to the central slice. The context number can be adjusted to optimize the segmentation results.
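The 2.5D input construction can be sketched as follows. The function name `stack_2p5d` is hypothetical, and repeating the edge slice at the volume boundary is an assumption, since the text does not say how the first and last slices are handled.

```python
import numpy as np

def stack_2p5d(volume, index, context=1):
    """Build a 2.5D input of shape (2 * context + 1, H, W).

    volume:  3D array ordered as (slices, H, W).
    index:   position of the central slice.
    context: number of neighbors taken on each side of the central slice.
    """
    depth = volume.shape[0]
    # Clamp neighbor indices to the volume, repeating edge slices (assumption).
    indices = [min(max(index + offset, 0), depth - 1)
               for offset in range(-context, context + 1)]
    return volume[indices]

volume = np.arange(5 * 2 * 2).reshape(5, 2, 2)
middle = stack_2p5d(volume, index=2, context=1)   # slices 1, 2, 3
edge = stack_2p5d(volume, index=0, context=1)     # slices 0, 0, 1
wide = stack_2p5d(volume, index=2, context=2)     # slices 0..4
```

With a context number of 1 the network receives three channels, and each further increment adds two more.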
The CycleGAN model was trained using the Adam optimizer for 200 epochs with an initial learning rate of 0.0002. The 2.5D Res-U-Net model was trained using the Adam optimizer for 300 epochs with a binary cross-entropy loss, because there are only two classes, mask and non-mask. Training took about 24 hours for CycleGAN to generate SynCT and about 12 hours for the 2.5D Res-U-Net on a DGX Station with 4x Tesla V100 GPUs, each with 32 GB of memory. The segmentation results are displayed in Figure 2. For data augmentation, rotation, flipping, and random crops with a ratio from 1 (no crop) to 0.5 (half crop) of the original images were applied during training.
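For reference, the binary cross-entropy between a predicted probability map and a binary prostate mask can be written as in the sketch below. The function name and the clipping constant `eps` are illustrative assumptions; in practice the loss provided by the deep learning framework would be used.

```python
import numpy as np

def binary_cross_entropy(pred_prob, target_mask, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    pred_prob:   predicted probability of the mask class, in [0, 1].
    target_mask: ground-truth labels, 1 for mask and 0 for non-mask.
    """
    # Clip probabilities so that log(0) never occurs.
    p = np.clip(np.asarray(pred_prob, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target_mask, dtype=np.float64)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))

target = np.array([[1, 0], [0, 1]])
confident = binary_cross_entropy([[0.9, 0.1], [0.1, 0.9]], target)
uncertain = binary_cross_entropy([[0.5, 0.5], [0.5, 0.5]], target)
```

A confident, correct prediction yields a lower loss than an uninformative prediction of 0.5 everywhere.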
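|Training dataset|Testing dataset|Dice score (DSC)|
|SynCT, default CycleGAN setting|SynCT|0.83|
|SynCT, default CycleGAN setting|CT|0.45|
|Soft-tissue SynCT|SynCT|0.82|
|Soft-tissue SynCT|CT|0.62|
|Soft-tissue SynCT, data augmented|SynCT|0.65|
|Soft-tissue SynCT, data augmented|CT|0.68|
|Soft-tissue SynCT, data augmented, SSIM loss|SynCT|0.80|
|Soft-tissue SynCT, data augmented, SSIM loss|CT|0.73|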
The 2.5D Res-U-Net trained and tested on MRI data illustrates the upper bound of performance; a network trained on CT/SynCT data is intuitively expected to score below 0.9 DSC (Table 1). SynCTs paired with MRI segmentations were used to train the automatic segmentation network. For SynCT generated with the default CycleGAN setting (MSE loss, random crop with fixed ratio, 284 to 256 pixels) and no intensity clipping, we achieved 0.83 and 0.45 DSC on the SynCT and CT testing sets, respectively; for soft-tissue SynCT (intensity clipped from -500 HU to 500 HU), we achieved 0.82 and 0.62 DSC, respectively. More aggressive data augmentation (random crop with random ratio, rotation, flipping) was also adopted to generate higher-quality SynCT from CycleGAN, which achieved 0.65 and 0.68 DSC on the SynCT and CT testing sets, respectively. To increase structural accuracy, the cycle loss was replaced with one based on the structural similarity index (SSIM); the 2.5D Res-U-Net trained with SynCT-SSIM achieved 0.80 and 0.73 DSC on the SynCT and CT testing sets, respectively. Note that the DSC on SynCT decreases and the DSC on CT increases until the two reach comparable values with no statistically significant difference; the standard deviations also converge. This trend indicates that our SynCT gradually reached a point where, from the perspective of the 2.5D Res-U-Net, it was indistinguishable from true CT.
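The Dice score used throughout these comparisons is twice the overlap between the predicted and reference masks divided by the sum of their sizes. A minimal sketch follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here.

```python
import numpy as np

def dice_score(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return float(2.0 * intersection / total)

pred = np.array([[1, 1], [0, 0]])
true = np.array([[1, 0], [0, 0]])
score = dice_score(pred, true)   # 2 * 1 / (2 + 1)
```

A score of 1 means the automatic and manual contours coincide exactly, while 0 means they do not overlap at all.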
4 Discussion and Concluding Remarks
Prostate CT automatic segmentation has been studied intensively. Recently, the highest DSC was reported by Liu et al., using U-Net and 1114 true CT cases. Our average result is comparable with that of Burgos et al., who used multi-atlas-based SynCT (0.73 DSC). We have shown that the SynCT and CT testing results have no statistically significant difference, indicating the feasibility of using SynCT to train a neural network for a very challenging segmentation task. In some cases the DSC is low, but not because of poor performance of the proposed network. The low DSC is sometimes due to noise in the hand-drawn CT ground-truth contours and to large anatomical and pathological variations (see Figure 2).
Data Augmentation: We used MRI and CT scans from different data sources, and the MRI scans have a smaller field of view (FOV) than the CT scans. The inconsistent FOV encouraged CycleGAN to shift the anatomy without focusing on anatomical details. To generate high-quality SynCT, we center-cropped the CT images by 50% to remove the surrounding air and the scanning table. We then augmented the data with random crops at a random ratio (from 1 to 0.5), rotation, and flipping to reduce geometric biases that could affect the learning process.
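The cropping and augmentation steps can be sketched as below. Several details are assumptions made for illustration: the 50% center crop is applied to each side length, rotation is restricted to multiples of 90 degrees, the flip probability is 0.5, and the function names are hypothetical.

```python
import numpy as np

def center_crop(image, ratio=0.5):
    """Keep the central region whose side lengths are `ratio` of the original."""
    h, w = image.shape[:2]
    ch, cw = int(round(h * ratio)), int(round(w * ratio))
    top, left = (h - ch) // 2, (w - cw) // 2
    return image[top:top + ch, left:left + cw]

def random_augment(image, rng, min_ratio=0.5):
    """Random-ratio random crop, then a random flip and 90-degree rotation."""
    h, w = image.shape[:2]
    ratio = rng.uniform(min_ratio, 1.0)
    ch, cw = max(1, int(round(h * ratio))), max(1, int(round(w * ratio)))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = image[top:top + ch, left:left + cw]
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Rotation by a random multiple of 90 degrees (assumed rotation scheme).
    return np.rot90(out, k=int(rng.integers(0, 4)))

rng = np.random.default_rng(0)
ct_slice = np.zeros((512, 512), dtype=np.float32)
cropped = center_crop(ct_slice)            # removes outer air and table
augmented = random_augment(cropped, rng)   # sides between 128 and 256 pixels
```

Because the crop size varies, the augmented image would still need to be resized to the fixed input size of the network before training.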
2.5D Technique: The 2.5D multi-slice input technique affects segmentation network performance, as Figure 3 shows. For SynCT, DSC increased significantly from single-slice to 3-slice input, showed no significant difference from 3-slice to 5-slice input, and decreased from 5-slice to 7-slice input. For CT, DSC increased significantly from single-slice to 3-slice input, showed no significant difference from 3-slice to 5-slice input, and dropped significantly from 5-slice to 7-slice input. Therefore, to optimize the performance of the 2.5D Res-U-Net while also saving training time, a context number of 1 (3-slice input) was used for all experiments.
In summary, we proposed a novel approach to segment the prostate from CT scans when ground-truth CT annotations are absent. Synthetic CT scans that share high-quality segmentations with MRI were used to train a deep-learning-based automatic segmentation network (2.5D Res-U-Net). Testing on true CT achieved 0.73 DSC, which is comparable with the result on SynCT. We also examined the number of input slices and identified 3 or 5 slices as optimal. Future work will include 3D volume assessment and continued improvement of synthetic CT quality.
-  Nordström, T., et al.: Prostate-specific antigen (PSA) density in the diagnostic algorithm of prostate cancer. Prostate Cancer and Prostatic Diseases 21(1), 57-63 (2017)
-  Smith, W.L. et al.: Prostate volume contouring: A 3D analysis of segmentation using 3DTRUS, CT, and MR. International Journal of Radiation Oncology*Biology*Physics 67(4), 1238–1247 (2007)
-  Rasch, C. et al.: Definition of the prostate in CT and MRI: a multi-observer study. International Journal of Radiation Oncology*Biology*Physics 43(1) 57–66 (1999)
-  Chowdhury, N. et al.: Concurrent segmentation of the prostate on MRI and CT via linked statistical shape models for radiotherapy planning. Medical Physics 39(4) 2214–2228 (2012)
-  Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International Conference on Computer Vision (2017)
-  Wolterink, J.M., Dinkla, A.M., Savenije, M.H., Seevinck, P.R., van den Berg, C.A., Išgum, I.: Deep MR to CT synthesis using unpaired data. In: Workshop on Simulation and Synthesis in Medical Imaging (2017)
-  Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
-  Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance of skip connections in biomedical image segmentation. In: Carneiro, G., et al. (eds.) LABELS/DLMIA -2016. LNCS, vol. 10008, pp. 179–187. Springer, Cham (2016)
-  Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N., Huisman, H.: SPIE-AAPM PROSTATEx Challenge Data. doi:10.7937/K9TCIA.2017.MURS5CL
-  Bloch, N. et al.: NCI-ISBI 2013 Challenge: Automated Segmentation of Prostate Structures. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.zF0vlOPv (2015)
-  Zhao, H. et al.: Loss Functions for Image Restoration With Neural Networks. IEEE Transactions on Computational Imaging. 3(1) 47–57 (2017)
-  Liu, C. et al.: Automatic Segmentation of the Prostate on CT Images Using Deep Neural Networks (DNN). International Journal of Radiation Oncology*Biology*Physics. 104(4) 924–932 (2019)
-  Burgos, N., et al.: Iterative framework for the joint segmentation and CT synthesis of MR images: application to MRI-only radiotherapy treatment planning. Physics in Medicine and Biology. 62 4237–4253 (2017)