Cross-modality Knowledge Transfer for Prostate Segmentation from CT Scans

08/26/2019 ∙ by Yucheng Liu, et al. ∙ 4

Creating large-scale, high-quality annotations is a known challenge in medical imaging. In this work, based on the CycleGAN algorithm, we propose leveraging annotations from one modality in other modalities. More specifically, the proposed algorithm creates highly realistic synthetic CT images (SynCT) from prostate MR images using unpaired data sets. By using SynCT images (without segmentation labels) and MR images (with segmentation labels available), we have trained a deep segmentation network for precise delineation of the prostate from real CT scans. For the generator in our CycleGAN, the cycle consistency term is used to guarantee that SynCT shares the identical manually drawn, high-quality masks originally delineated on MR images. Further, we introduce a cost function based on the structural similarity index (SSIM) to improve the anatomical similarity between real and synthetic images. For the segmentation that follows SynCT generation by CycleGAN, automatic delineation is achieved through a 2.5D Residual U-Net. Quantitative evaluation demonstrates comparable segmentation results between our SynCT and radiologist-drawn masks for real CT images, addressing an important problem in the medical image segmentation field when ground truth annotations are not available for the modality of interest.




1 Introduction

Prostate segmentation from radiology scans is often necessary for radiotherapy, prostatectomy, and calculation of prostate-specific antigen (PSA) density [1]. Among imaging modalities, magnetic resonance imaging (MRI) provides the best soft tissue contrast and yields the most accurate estimation of prostate volume, consistent with prostatectomy specimen volumes [2]. Unlike MRI, computed tomography (CT) scans make it difficult to distinguish the boundaries between the prostate and adjacent tissues during segmentation [3]. Despite this, in current clinical practice, prostate radiation therapy dose calculation is primarily based on CT scans, as CT is the only modality that can derive the electron density needed for dosimetry calculations [4]. Therefore, planning systems generally require anatomical information to be delineated on CT scans.

In this study, we address a practical yet still very challenging issue: prostate segmentation from CT images when there are no ground truth CT annotations to supervise the segmentation algorithm. Instead, we aim to utilize segmentation labels from widely available MRI data sets, and propose a two-step knowledge transfer algorithm to map the segmentation labels from MRI to CT scans. The correspondence between MRI and CT is established through a CycleGAN algorithm [5] with a structural similarity preserving cost function. Highly realistic synthetic CT scans generated in the first step are then used to supervise a deep segmentation network in the second step. The segmentation network is trained only on the synthetic images, while testing is done on both synthetic and real CT scans for evaluation. While our framework does not enforce the use of any specific segmentation network to finalize the delineation process, we choose a 2.5D Res-U-Net to accomplish this task with faster convergence and higher accuracy.

2 Methods

The proposed workflow includes two main steps, as shown in Figure 1. The first step is to generate high-quality and reliable CT images (SynCT) from MR images. Previous work [6] has shown that domain adaptation from MR images to CT images is feasible using the CycleGAN architecture. We used a similar CycleGAN approach as a baseline to create high-quality knowledge transfer between unpaired MRI and CT.

Figure 1: Workflow of CT image synthesis and automatic segmentation. The red box indicates the first step, CT image synthesis via the CycleGAN model. SynCTs with anatomical structures identical to the MRI were generated and thus share the high-quality segmentation with the MRI (labeled red). The blue box indicates the second step, automatic segmentation via a 2.5D Res-U-Net trained with SynCT. The automatically generated segmentations (labeled pink) on true CT images were compared against manual segmentations from a radiologist.

The second step is to conduct automatic segmentation of the prostate. We trained a U-Net based segmentation network to delineate the whole prostate area, with two main differences from the existing literature: (i) we used SynCT in training and real CT scans in testing, and (ii) we modified the U-Net [7] to increase segmentation performance by adding residual blocks to the segmentation network. For better 3D information fusion, we also modified the segmentation architecture to utilize two additional adjacent slices in its input (i.e., a 3-channel input).

2.1 Data

We used a total of three different data sets for our experiments and evaluations. First, for CycleGAN training, T2-weighted MRI scans from the publicly available PROSTATEx challenge data [9] were used. The T2-weighted images were acquired using a turbo spin echo sequence with in-plane resolution of - mm, slice thickness of mm and zero gap. Second, the testing data set for CycleGAN included 60 prostate MRI cases along with their high-quality delineations, obtained from the publicly available NCI-ISBI challenge data [10]. This data was used for generating the synthetic CT scans. We used 6-fold stratified cross validation for evaluation of the algorithms. Third, for real CT scans, as part of a retrospective IRB-approved study, we acquired prostate CT data from 120 anonymized patients from our institution with resolution (). CT intensity was clipped to the range -500 HU to 500 HU to reveal more soft tissue contrast, similar to a soft tissue CT window. The prostate MRI and CT data come from entirely separate sources, i.e., they are unpaired. Among the in-house collected CT data, we chose of them to be manually segmented by a board-certified radiologist for Dice score (DSC) comparison with our automatic segmentation method.
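The intensity clipping step can be illustrated with a minimal sketch. It assumes the -500 HU to 500 HU soft tissue window reported in the Results; the function names and the optional rescaling to [0, 1] are hypothetical choices made here for illustration and are not described in the paper.

```python
def clip_hu(values, lo=-500.0, hi=500.0):
    """Clip CT intensities (in HU) to the window [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def rescale_unit(values, lo=-500.0, hi=500.0):
    """Clip to [lo, hi], then map the window linearly onto [0, 1].

    The rescaling is an assumption added for illustration; the paper
    only states that intensities were clipped.
    """
    return [(v - lo) / (hi - lo) for v in clip_hu(values, lo, hi)]


# One row of HU values ranging from air to bone.
row = [-1000.0, -500.0, 0.0, 40.0, 500.0, 1000.0]
print(clip_hu(row))
print(rescale_unit(row))
```

Values outside the window saturate at its edges, so the available intensity range is spent on soft tissue rather than on air or bone.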

2.2 Synthetic CT Network: CycleGAN

The synthetic CT images were generated by the CycleGAN model [5], which consisted of two pairs of generative adversarial networks (GANs) and two extra generators that convert generated data back to the original domain, enforcing cycle consistency. In our study, the forward-direction GAN has a generator that produces synthetic CT images that are as realistic as possible, such that a discriminator cannot distinguish them from real CT. The discriminator ensures the likeness of the generated data to the original data; hence, the reliability of the generated data depends heavily on the performance of the discriminator. The discriminator loss is described by Eq. 1.


The terms of Eq. 1 are the j-th true CT slice, the i-th MRI slice, the image produced by the generator from that MRI slice, and the discriminator, which tries to differentiate generated images from true CT images. If the discriminator cannot distinguish a generated image, it labels it 1, meaning that it recognized the generated image as a true CT image; otherwise it assigns a label of 0.
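One discriminator loss consistent with the labeling described above is the least-squares objective commonly used with CycleGAN [5]. The following is a sketch under that assumption. The symbols $G_{CT}$ (MR-to-CT generator), $D_{CT}$ (CT discriminator), $x^{MR}_i$, $x^{CT}_j$, and the slice counts $N_{MR}$, $N_{CT}$ are notation introduced here for illustration and may differ from the authors' own.

```latex
\mathcal{L}_{D_{CT}} =
  \frac{1}{N_{CT}} \sum_{j=1}^{N_{CT}} \bigl(1 - D_{CT}(x^{CT}_j)\bigr)^{2}
  + \frac{1}{N_{MR}} \sum_{i=1}^{N_{MR}} D_{CT}\bigl(G_{CT}(x^{MR}_i)\bigr)^{2}
```

The first term pushes the discriminator output toward 1 on true CT slices and the second pushes it toward 0 on generated slices, which matches the 1/0 labels described in the text.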

The backward-direction generator translates the SynCT back to its original data domain (the MR domain). By minimizing the difference between the reconstructed data and the original data (the cycle-consistency loss), a powerful constraint is enforced on the model to prevent the generated data from deviating from the ground truth. The cycle-consistency loss is expressed in Eq. 2.


In Eq. 2, the loss is averaged over the pixels of an image patch, with SSIM evaluated at each pixel as defined in Eq. 3. The quantities entering SSIM are the mean pixel intensity and the standard deviation of pixel intensity in a local image patch centered at the corresponding pixel of either of the two images being compared, together with two small constants added for stability. The cycle loss compares the reconstructed MRI with the true MRI slices. In our new formulation, instead of computing the pixel-by-pixel mean-square-error (MSE), we propose to use the structural similarity index (SSIM), which takes into account the context of the images at a higher level than pixel-level MSE [11].
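For reference, the SSIM loss as formulated by Zhao et al. [11] can be written as below. This is offered as a sketch of the form the description above appears to follow, not as the authors' exact Eq. 2 and Eq. 3. The symbols are assumptions: $P$ is the image patch, $N$ the number of pixels in $P$, $p$ the pixel index, $x$ and $y$ the true and reconstructed MRI, $\mu$ and $\sigma$ the local mean and standard deviation, $\sigma_{xy}$ the local covariance, and $C_1$, $C_2$ the stabilizing constants.

```latex
\mathcal{L}_{\mathrm{SSIM}}(P) = \frac{1}{N} \sum_{p \in P} \bigl(1 - \mathrm{SSIM}(p)\bigr)

\mathrm{SSIM}(p) =
  \frac{2\mu_x \mu_y + C_1}{\mu_x^{2} + \mu_y^{2} + C_1}
  \cdot
  \frac{2\sigma_{xy} + C_2}{\sigma_x^{2} + \sigma_y^{2} + C_2}
```

Because SSIM equals 1 only when the two local patches agree, the loss is zero for a perfect reconstruction and grows as local structure diverges.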

2.3 Segmentation Network: 2.5D Res-U-Net

The U-Net architecture [7] has long skip connections to preserve spatial information during down-sampling. In addition to the long skip connections, short skip connections were added to form residual blocks, which prevent vanishing gradients and increase the convergence speed; the U-Net with short skip connections is called Res-U-Net [8]. The proposed 2.5D input technique loads multiple slices simultaneously: one central slice and its adjacent slices in the out-of-plane direction. The number of channels is determined as the sum of the central slice and the adjacent slices (). The number of adjacent slices is defined through a designated context number, which queries adjacent slices in both the positive and negative directions (). For instance, if the context number is set to 1, the selected adjacent slices will be the +1 and -1 slices adjacent to the central slice. The context number can be adjusted in order to optimize the segmentation results.
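The slice selection behind the 2.5D input can be sketched as follows. A context number c yields 2c + 1 channels, in line with the example of context number 1 giving the -1, 0, and +1 slices. The function names are hypothetical, and clamping neighbours at the first and last slice of the volume is an assumption, since boundary handling is not described in the text.

```python
def context_slice_indices(center, context, depth):
    """Indices of the 2*context + 1 slices centred on `center`.

    Neighbours that fall outside the volume are clamped to the nearest
    valid slice (an assumption made for this sketch).
    """
    if not 0 <= center < depth:
        raise IndexError("center slice is outside the volume")
    return [min(max(center + offset, 0), depth - 1)
            for offset in range(-context, context + 1)]


def stack_25d(volume, center, context=1):
    """Return the channel list for one 2.5D sample.

    `volume` is a sequence of 2D slices ordered along the out-of-plane axis.
    """
    indices = context_slice_indices(center, context, len(volume))
    return [volume[k] for k in indices]


# A toy volume of 10 slices, each represented by a label for brevity.
volume = ["slice%d" % k for k in range(10)]
print(stack_25d(volume, center=5, context=1))
print(stack_25d(volume, center=0, context=1))
```

With context 1 the network sees three channels per sample, with context 2 five channels, and so on.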

3 Results

The CycleGAN model was trained using the Adam optimizer for 200 epochs with an initial learning rate of 0.0002. The 2.5D Res-U-Net model was trained using the Adam optimizer for 300 epochs with a binary cross entropy loss function, because there are only two classes, mask and non-mask. Training took about 24 hours for CycleGAN to generate SynCT and about 12 hours for the 2.5D Res-U-Net on a DGX Station with 4x Tesla V100 GPUs, each with 32 GB of memory. The segmentation results are displayed in Figure 2. For data augmentation, rotation, flipping, and random crops with ratios from 1 (no crop) to 0.5 (half crop) of the original images were performed during training.
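The binary cross entropy loss used for the two-class (mask versus non-mask) problem can be written out as a small sketch. The function name, the flat-list input, and the `eps` clamp are illustrative assumptions; in practice a deep learning framework's built-in loss would be used.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels.

    `probs` and `targets` are flat sequences of equal length, one entry per
    pixel. Probabilities are clamped to [eps, 1 - eps] to avoid log(0).
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


# Four pixels: two inside the mask (label 1) and two outside (label 0).
labels = [1, 1, 0, 0]
print(binary_cross_entropy([0.9, 0.8, 0.2, 0.1], labels))  # mostly correct
print(binary_cross_entropy([0.2, 0.1, 0.9, 0.8], labels))  # mostly wrong
```

The loss is small when predicted probabilities agree with the labels and grows quickly for confident mistakes.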

Training dataset Testing dataset Dice score (DSC)
Soft-tissue SynCT SynCT
Soft-tissue SynCT Data augmented SynCT
Soft-tissue SynCT Data augmented SSIM loss SynCT
Table 1: Segmentation results (DSC) of MRI, SynCT and CT testing dataset.

The 2.5D Res-U-Net trained and tested on MRI data illustrates the upper bound of performance; a network trained on CT/SynCT data will intuitively score lower than 0.9 (Table 1). SynCTs paired with MRI segmentations were used to train the automatic segmentation network. For SynCT generated with the default CycleGAN setting (MSE loss, random crop with fixed ratio, 284 to 256 pixels) and no intensity clipping, we achieved 0.83 and 0.45 DSC on the SynCT and CT testing sets, respectively; for soft-tissue SynCT (intensity clipped from -500 HU to 500 HU), we achieved 0.82 and 0.62 DSC on the SynCT and CT testing sets, respectively. More aggressive data augmentation (random crop with random ratio, rotation, flipping) was also adopted to generate higher quality SynCT from CycleGAN, which achieved 0.65 and 0.68 DSC on the SynCT and CT segmentation testing sets, respectively. To increase structural accuracy, the cycle loss was replaced with a loss based on the structural similarity index (SSIM); the 2.5D Res-U-Net trained with SynCT-SSIM achieved 0.80 and 0.73 DSC on the SynCT and CT testing sets, respectively. Note that the DSC on SynCT decreases and the DSC on CT increases until they reach a comparable point with no statistical difference (), and the standard deviations also converge. This tendency indicates that our SynCT gradually reached a point where it was indistinguishable from true CT from the perspective of the 2.5D Res-U-Net.
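For readers unfamiliar with the metric, the Dice score (DSC) reported above is twice the overlap between two binary masks divided by the sum of their sizes. The following is a minimal sketch on flattened masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * intersection / size_sum


# Prediction covers 4 pixels, ground truth covers 3, and they overlap on 3.
pred = [0, 1, 1, 1, 1, 0]
truth = [0, 1, 1, 1, 0, 0]
print(dice_score(pred, truth))
```

Because the denominator counts both masks, a prediction that is larger than an under-drawn ground truth is penalized even when the extra pixels are anatomically correct.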

4 Discussion and Concluding Remarks

Automatic prostate segmentation on CT has been studied intensively. Recently, the highest reported DSC is by Liu et al. [12], using U-Net and 1114 true CT cases. Our average result of 0.73 DSC on true CT is comparable with that of Burgos et al. [13] using multi-atlas based SynCT (0.73 DSC). We have shown that the SynCT and the CT testing results have no statistical difference, indicating the feasibility of using SynCT to train a neural network for a very challenging segmentation task. In some cases the DSC is low, but not because of low performance of the proposed network. The low DSC is sometimes due to noise in the contouring of the hand-drawn CT ground-truth segmentation and to large anatomical and pathological variations (see Figure 2).


Figure 2: Example slices of segmentation results on true CT. (A) Prostate under-segmented by expert radiologists. The 2.5D Res-U-Net can generate a better segmentation (C) since it adapted the segmentation from MRI; however, this results in a misleadingly lower DSC of 0.74. CT with normal intensity can vary from -1000 HU (air) to 1000 HU (bone); therefore, soft tissues with similar HU values may not be seen clearly on the images, as demonstrated in the middle part of the figure, where (D) is CT with ground-truth segmentation from a radiologist, (E) is CT without any intensity adjustment, and (F) is CT with the segmentation generated by the 2.5D Res-U-Net. The last row shows CT with a soft tissue window (-500 HU to 500 HU, which we call ST-CT (soft tissue CT)), slightly larger than the typical soft tissue window of -150 HU to 350 HU, to accommodate more information in the slices: (G) is ST-CT with ground-truth segmentation, (H) is the ST-CT, and (I) is ST-CT with the segmentation generated by the 2.5D Res-U-Net. For the same case, the DSC of CT and ST-CT is 0.57 and 0.80, respectively.

Data Augmentation: We used MRI and CT scans from different data sources; the MRI scans have a smaller field of view (FOV) than the CT scans. The inconsistent FOV encouraged CycleGAN to shift the anatomy without focusing on anatomical details. To generate high-quality SynCT, we center-cropped the CT images by 50% to remove the surrounding air and scanning table. We then augmented the data with random crops of random ratio (1 to 0.5), rotation, and flipping to reduce the influence of particular geometric tendencies on the learning process.
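The cropping steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, "cropped by 50%" is interpreted as keeping half of each side length, and the same random ratio is applied to height and width. Rotation and flipping are omitted.

```python
import random


def center_crop_box(height, width, ratio=0.5):
    """Return (top, left, h, w) of a centred crop keeping `ratio` of each side."""
    h = max(1, int(round(height * ratio)))
    w = max(1, int(round(width * ratio)))
    return ((height - h) // 2, (width - w) // 2, h, w)


def random_crop_box(height, width, rng, min_ratio=0.5, max_ratio=1.0):
    """Return a crop box with a random side ratio and a random position."""
    ratio = rng.uniform(min_ratio, max_ratio)
    h = max(1, int(round(height * ratio)))
    w = max(1, int(round(width * ratio)))
    top = rng.randint(0, height - h)   # randint is inclusive on both ends
    left = rng.randint(0, width - w)
    return (top, left, h, w)


def crop(image, box):
    """Crop a 2D image stored as a list of rows."""
    top, left, h, w = box
    return [row[left:left + w] for row in image[top:top + h]]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(center_crop_box(512, 512))
print(random_crop_box(256, 256, rng))
```

Drawing a fresh crop box for every training sample varies both the scale and the position of the anatomy in the image.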

2.5D Technique: The 2.5D multi-slice input technique can affect the performance of the segmentation network, as Figure 3 shows. For SynCT, from a single slice to 3 slices, DSC increases significantly () by ; from 3 slices to 5 slices, no significant difference was found; from 5 slices to 7 slices, DSC decreases by . For CT, from a single slice to 3 slices, DSC increases significantly by ; from 3 slices to 5 slices, no significant difference was found; from 5 slices to 7 slices, DSC drops significantly by . Therefore, to optimize the performance of the 2.5D Res-U-Net and also save training time, context number 1 (3-slice input) was used for all experiments.

Figure 3: Boxplots showing the Dice scores for prostate segmentation from MRI, SynCT, and CT, respectively.

In summary, we proposed a novel approach to segment the prostate from CT scans when the ground truth is absent. Synthetic CT scans that share high-quality segmentations with MRI were used to train a deep-learning based automatic segmentation network (2.5D Res-U-Net). The testing results on true CT achieved 0.73 DSC, which is comparable with the results on SynCT. We also examined the number of input slices and identified 3 or 5 slices as optimal. Future steps will include 3D volume assessment and continued improvement of the quality of synthetic CT generation.


  • [1] Nordström, T., et al.: Prostate-specific antigen (PSA) density in the diagnostic algorithm of prostate cancer. Prostate Cancer and Prostatic Diseases 21(1), 57–63 (2017)
  • [2] Smith, W.L., et al.: Prostate volume contouring: A 3D analysis of segmentation using 3DTRUS, CT, and MR. International Journal of Radiation Oncology*Biology*Physics 67(4), 1238–1247 (2007)
  • [3] Rasch, C., et al.: Definition of the prostate in CT and MRI: a multi-observer study. International Journal of Radiation Oncology*Biology*Physics 43(1), 57–66 (1999)
  • [4] Chowdhury, N., et al.: Concurrent segmentation of the prostate on MRI and CT via linked statistical shape models for radiotherapy planning. Medical Physics 39(4), 2214–2228 (2012)
  • [5] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International Conference on Computer Vision (2017)
  • [6] Wolterink, J.M., Dinkla, A.M., Savenije, M.H., Seevinck, P.R., van den Berg, C.A., Išgum, I.: Deep MR to CT synthesis using unpaired data. In: Workshop on Simulation and Synthesis in Medical Imaging (2017)
  • [7] Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015)
  • [8] Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance of skip connections in biomedical image segmentation. In: Carneiro, G., et al. (eds.) LABELS/DLMIA 2016. LNCS, vol. 10008, pp. 179–187. Springer, Cham (2016)
  • [9] Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N., Huisman, H.: SPIE-AAPM PROSTATEx Challenge Data. doi:10.7937/K9TCIA.2017.MURS5CL
  • [10] Bloch, N., et al.: NCI-ISBI 2013 Challenge: Automated Segmentation of Prostate Structures. The Cancer Imaging Archive (2015)
  • [11] Zhao, H., et al.: Loss Functions for Image Restoration With Neural Networks. IEEE Transactions on Computational Imaging 3(1), 47–57 (2017)
  • [12] Liu, C., et al.: Automatic Segmentation of the Prostate on CT Images Using Deep Neural Networks (DNN). International Journal of Radiation Oncology*Biology*Physics 104(4), 924–932 (2019)
  • [13] Burgos, N., et al.: Iterative framework for the joint segmentation and CT synthesis of MR images: application to MRI-only radiotherapy treatment planning. Physics in Medicine and Biology 62, 4237–4253 (2017)