Cross-modality image synthesis from unpaired data using CycleGAN: Effects of gradient consistency loss and training data size

03/18/2018
by   Yuta Hiasa, et al.

Computed tomography (CT) is commonly used in orthopedic procedures. Magnetic resonance imaging (MRI) is used along with CT to identify muscle structures and diagnose osteonecrosis due to its superior soft tissue contrast. However, MRI has poor contrast for bone structures. Clearly, it would be helpful if a corresponding CT were available, as bone boundaries are more clearly seen and CT has standardized (i.e., Hounsfield) units. Therefore, we aim at MR-to-CT synthesis. While CycleGAN has been successfully applied to unpaired CT and MR images of the head, such images do not show as much variation in intensity pairs as images of the pelvic region, owing to the presence of joints and muscles. In this paper, we extend the CycleGAN approach by adding a gradient consistency loss to improve accuracy at the boundaries. We conducted two experiments: to evaluate image synthesis, we investigated the dependency of synthesis accuracy on 1) the number of training data and 2) the gradient consistency loss; to demonstrate the applicability of our method, we also investigated segmentation accuracy on the synthesized images.


1 Introduction

Computed tomography (CT) is commonly used in orthopedic procedures. Magnetic resonance imaging (MRI) is used along with CT to identify muscle structures and diagnose osteonecrosis due to its superior soft tissue contrast [1]. However, MRI has poor contrast for bone structures. It would be helpful if a corresponding CT were available, as bone boundaries are more clearly seen and CT has standardized (i.e., Hounsfield) units. Considering the radiation exposure involved in CT, it is preferable to delineate the boundaries of both muscles and bones from MRI alone. Therefore, we aim at MR-to-CT synthesis.

Image synthesis has been extensively studied using patch-based learning [2] as well as deep learning, specifically convolutional neural networks (CNNs) [3] and generative adversarial networks (GANs) [4]. Conventional approaches required paired training data, i.e., registered images of the same patient from multiple modalities, which limited their application. A method recently proposed by Zhu et al. [5], called CycleGAN, utilizes unpaired training data by means of a cycle consistency loss function. While CycleGAN has already been applied to MR-to-CT synthesis [6], these previous medical imaging applications targeted CT and MRI of the head, in which the scan protocol (i.e., the field-of-view (FOV) and the head orientation within the FOV) is relatively consistent. This results in small variation between the two image distributions even without registration, so a small training data set (20 to 30 volumes) achieved reasonable accuracy. Our target anatomy, the hip region, by contrast, has larger variation both in anatomy and in pose (i.e., joint angle changes and deformation of muscles).

Applications of image synthesis include segmentation. Previous studies have aimed at segmentation of musculoskeletal structures in MRI [7, 8], but they required multiple sequences and devices. Another challenge in MRI segmentation is that there is no standardized unit as in CT, so manually traced label data are necessary for training on each sequence and each imaging device. MR-to-CT synthesis thus enables modality-independent segmentation [9].

In this study, we extend the CycleGAN approach by adding a gradient consistency (GC) loss, to encourage edge alignment between images in the two domains, and by using an order-of-magnitude larger training data set (302 MR and 613 CT volumes), in order to overcome the larger variation and improve accuracy at the boundaries. We investigated the dependency of image synthesis accuracy on 1) the number of training data and 2) the incorporation of the GC loss. To demonstrate the applicability of our method, we also investigated segmentation accuracy on the synthesized images.

2 Method

2.1 Materials

The MRI dataset used in this study consists of 302 unlabeled volumes; the CT dataset consists of 613 unlabeled volumes and 20 labeled volumes with manual segmentation labels of 19 muscles around the hip and thigh, and the pelvis, femur, and sacrum bones. Patients with metal artifacts due to implants in the volume were excluded. As an evaluation dataset, we also used three additional sets of paired MR and CT volumes, and 10 MR volumes with manual segmentation labels of the gluteus medius and minimus muscles and the pelvis and femur bones as ground truth. MR volumes were scanned in the coronal plane for the diagnosis of osteonecrosis on a 1.0 T MR imaging system. The T1-weighted volumes were obtained with a 3D spoiled gradient recalled echo (SPGR) sequence with a repetition time (TR) of 7.9 ms, echo time (TE) of 3.08 ms, and flip angle of 30°. The field of view was 320 mm and the matrix size was 256×256. The slab thickness was 76 mm and the slice thickness was 2 mm, without an inter-slice gap. CT volumes were scanned in the axial plane for diagnosis of patients scheduled for total hip arthroplasty (THA) surgery. The field of view was 360×360 mm and the matrix size was 512×512. The slice thickness was 2.0 mm for the region including the pelvis and proximal femur, 6.0 mm for the femoral shaft region, and 1.0 mm for the distal femur region. In this study, the CT volumes were cropped and resliced so that the FOV resembled that of the MRI volumes, as shown in Figure 1, and then resized to 256×256.

Figure 1: Training datasets used in this study. The MRI dataset consists of 302 unlabeled volumes; the CT dataset consists of 613 unlabeled and 20 labeled volumes. N4ITK intensity inhomogeneity correction [10] was applied to all MRI volumes. The two datasets have a similar field-of-view, although they are not registered.
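To make the preprocessing concrete, a minimal sketch using SimpleITK is shown below. The library and its N4 bias correction filter are real, but the masking and resampling parameters are illustrative assumptions, not the authors' exact pipeline.

```python
# A sketch of the preprocessing described above, using SimpleITK.
# Parameters here are illustrative assumptions.
import SimpleITK as sitk

def n4_correct(mr_image):
    # N4ITK intensity inhomogeneity correction [10];
    # Otsu thresholding gives a rough body mask.
    mask = sitk.OtsuThreshold(mr_image, 0, 1, 200)
    return sitk.N4BiasFieldCorrection(sitk.Cast(mr_image, sitk.sitkFloat32), mask)

def resize_inplane(image, target=(256, 256)):
    # Resample so each slice becomes target[0] x target[1] pixels,
    # keeping the physical extent (FOV) of the volume.
    old_size, old_sp = image.GetSize(), image.GetSpacing()
    new_size = [target[0], target[1], old_size[2]]
    new_sp = [old_sp[i] * old_size[i] / new_size[i] for i in range(3)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_sp, image.GetDirection())
```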

2.2 Image synthesis using CycleGAN with gradient-consistency loss

The underlying algorithm of the proposed MR-to-CT synthesis follows that of Zhu et al. [5], which translates an image between the CT and MR domains without pairwise-aligned CT and MR training images of the same patient. The workflow of the proposed method is shown in Figure 2. The networks $G_{MR \to CT}$ and $G_{CT \to MR}$ are generators that translate real MR and CT images to synthesized CT and MR images, respectively. The networks $D_{CT}$ and $D_{MR}$ are discriminators that distinguish between real and synthesized images. While the discriminators try to distinguish synthesized images by maximizing the adversarial losses $\mathcal{L}_{adv}^{CT}$ and $\mathcal{L}_{adv}^{MR}$, defined as

$$\mathcal{L}_{adv}^{CT}(G_{MR \to CT}, D_{CT}) = \mathbb{E}_{y \sim I_{CT}}\left[\log D_{CT}(y)\right] + \mathbb{E}_{x \sim I_{MR}}\left[\log \left(1 - D_{CT}(G_{MR \to CT}(x))\right)\right] \quad (1)$$
$$\mathcal{L}_{adv}^{MR}(G_{CT \to MR}, D_{MR}) = \mathbb{E}_{x \sim I_{MR}}\left[\log D_{MR}(x)\right] + \mathbb{E}_{y \sim I_{CT}}\left[\log \left(1 - D_{MR}(G_{CT \to MR}(y))\right)\right] \quad (2)$$

the generators try to synthesize images that are indistinguishable from the target domain by minimizing these losses, where $x$ and $y$ are images from domains $I_{MR}$ and $I_{CT}$. However, networks with large capacity can map the same set of images from the source domain to any random permutation of images in the target domain, so adversarial losses alone cannot guarantee that the learned generator translates an individual input to the desired corresponding output. Therefore, the loss function is regularized by cycle consistency, defined as the difference between a real image and its reconstruction, i.e., the inverse mapping of the synthesized image [5]. The cycle consistency loss is defined as

$$\mathcal{L}_{cycle}(G_{MR \to CT}, G_{CT \to MR}) = \mathbb{E}_{x \sim I_{MR}}\left[\left\| G_{CT \to MR}(G_{MR \to CT}(x)) - x \right\|_1\right] + \mathbb{E}_{y \sim I_{CT}}\left[\left\| G_{MR \to CT}(G_{CT \to MR}(y)) - y \right\|_1\right] \quad (3)$$
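For concreteness, a minimal PyTorch sketch of Eq. (3) is shown below; `G_mr2ct` and `G_ct2mr` are placeholder names for the two generator networks.

```python
# A minimal sketch of the cycle consistency loss in Eq. (3).
# G_mr2ct / G_ct2mr stand for any image-to-image nn.Modules.
import torch

def cycle_consistency_loss(G_mr2ct, G_ct2mr, real_mr, real_ct):
    # MR -> CT -> MR reconstruction
    rec_mr = G_ct2mr(G_mr2ct(real_mr))
    # CT -> MR -> CT reconstruction
    rec_ct = G_mr2ct(G_ct2mr(real_ct))
    # L1 difference between real and reconstructed images
    return (rec_mr - real_mr).abs().mean() + (rec_ct - real_ct).abs().mean()
```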

We extended the CycleGAN approach by explicitly adding a gradient consistency loss between real and synthesized images to improve accuracy at the boundaries. Gradient correlation (GC) [11] has been used as a similarity metric in medical image registration; it is defined as the normalized cross correlation (NCC) between the gradients of two images. Given the gradients in the horizontal and vertical directions of two images $A$ and $B$, GC is defined as

$$GC(A, B) = \frac{1}{2}\left\{ NCC(\nabla_x A, \nabla_x B) + NCC(\nabla_y A, \nabla_y B) \right\}, \qquad NCC(A, B) = \frac{\sum_{(i,j)} (A_{ij} - \bar{A})(B_{ij} - \bar{B})}{\sqrt{\sum_{(i,j)} (A_{ij} - \bar{A})^2} \sqrt{\sum_{(i,j)} (B_{ij} - \bar{B})^2}} \quad (4)$$

where $\nabla_x$ and $\nabla_y$ are the gradient operators in the horizontal and vertical directions, and $\bar{A}$ is the mean value of $A$. We formulate the gradient-consistency loss as

$$\mathcal{L}_{gc}(G_{MR \to CT}, G_{CT \to MR}) = \frac{1}{2}\left\{ \left(1 - GC(x, G_{MR \to CT}(x))\right) + \left(1 - GC(y, G_{CT \to MR}(y))\right) \right\} \quad (5)$$
Figure 2: Workflow of the proposed method. $G_{MR \to CT}$ and $G_{CT \to MR}$ are generator networks that translate MR to CT images and CT to MR images, respectively. $D_{CT}$ and $D_{MR}$ are discriminator networks that distinguish between real and synthesized images. The cycle consistency loss $\mathcal{L}_{cycle}$ is a regularization term defined by the difference between real and reconstructed images. To improve accuracy at the edges, the loss function is further regularized by the gradient consistency loss $\mathcal{L}_{gc}$.
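A PyTorch sketch of Eqs. (4)–(5) might look as follows. Sobel kernels are used here as one reasonable choice of gradient operator, since the text does not specify which operator is used; image batches are assumed to have shape (N, 1, H, W).

```python
# A sketch of the gradient-consistency loss of Eqs. (4)-(5).
# Sobel kernels are an assumption, not taken from the paper.
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
SOBEL_Y = SOBEL_X.transpose(2, 3)

def _ncc(a, b, eps=1e-8):
    # normalized cross correlation after mean-centering
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + eps)

def gradient_correlation(a, b):
    gx_a, gx_b = F.conv2d(a, SOBEL_X), F.conv2d(b, SOBEL_X)
    gy_a, gy_b = F.conv2d(a, SOBEL_Y), F.conv2d(b, SOBEL_Y)
    return 0.5 * (_ncc(gx_a, gx_b) + _ncc(gy_a, gy_b))

def gc_loss(real_mr, fake_ct, real_ct, fake_mr):
    # Eq. (5): penalize low gradient correlation between each real
    # image and its synthesized counterpart.
    return 0.5 * ((1 - gradient_correlation(real_mr, fake_ct)) +
                  (1 - gradient_correlation(real_ct, fake_mr)))
```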

Finally, our objective function is defined as:

$$\mathcal{L} = \mathcal{L}_{adv}^{CT} + \mathcal{L}_{adv}^{MR} + \lambda_{cycle} \mathcal{L}_{cycle} + \lambda_{gc} \mathcal{L}_{gc} \quad (6)$$

where $\lambda_{cycle}$ and $\lambda_{gc}$ are weights that balance the loss terms. Then, we solve:

$$G_{MR \to CT}^{*},\ G_{CT \to MR}^{*} = \arg\min_{G_{MR \to CT},\, G_{CT \to MR}} \max_{D_{CT},\, D_{MR}} \mathcal{L} \quad (7)$$
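A minimal sketch of the generator-side objective is given below, using the least-squares adversarial form adopted in the implementation (see the next paragraph); the discriminator update and the actual weight values are omitted.

```python
# A sketch of the generator objective from Eqs. (6)-(7), with
# least-squares adversarial terms. Lambda values are placeholders.
import torch

def generator_objective(D_ct, D_mr, fake_ct, fake_mr,
                        l_cycle, l_gc, lambda_cycle=1.0, lambda_gc=1.0):
    # Adversarial terms: each generator tries to make its discriminator
    # output 1 (i.e., "real") on synthesized images.
    adv_ct = ((D_ct(fake_ct) - 1.0) ** 2).mean()
    adv_mr = ((D_mr(fake_mr) - 1.0) ** 2).mean()
    # Eq. (6): adversarial + weighted cycle and gradient consistency.
    return adv_ct + adv_mr + lambda_cycle * l_cycle + lambda_gc * l_gc
```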

In this paper, we used a 2D CNN with 9 residual blocks for the generators, similar to the one proposed in [12]. For the discriminators, we used PatchGAN [13]. We replaced Eq. (1) and Eq. (2) with the least-squares loss as in [14]. These settings follow [5, 6]. The CycleGAN was trained using Adam [15], first at a fixed learning rate of 0.0002 and then with a learning rate linearly decayed to zero. The balancing weights $\lambda_{cycle}$ and $\lambda_{gc}$ were determined empirically. CT and MR volumes were normalized such that intensities of [-150, 350] HU and [0, 100] were mapped to [0, 255], respectively.
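The intensity normalization can be sketched as a simple linear windowing with clipping; the windows [-150, 350] HU (CT) and [0, 100] (MR) come from the text.

```python
# Sketch of the intensity normalization described above.
import numpy as np

def window_normalize(volume, lo, hi):
    # Clip to the window, then map [lo, hi] linearly onto [0, 255].
    v = np.clip(np.asarray(volume, dtype=np.float32), lo, hi)
    return (v - lo) / (hi - lo) * 255.0

# ct_norm = window_normalize(ct_volume, -150.0, 350.0)  # HU window
# mr_norm = window_normalize(mr_volume, 0.0, 100.0)     # MR window
```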

3 Results

3.1 Quantitative evaluation on image synthesis

To evaluate image synthesis, we investigated the dependency of accuracy on the number of training data and on the GC loss. The CycleGAN was trained with datasets of two sizes, i) 20 MR and 20 CT volumes and ii) 302 MR and 613 CT volumes, each with and without the GC loss. We conducted two experiments. The first used three sets of paired MR and CT volumes of the same patients as test data. Because the availability of paired MR and CT volumes was limited, we conducted a second experiment in which unpaired 10 MR and 20 CT volumes were used.

In the first experiment, we evaluated the synthesized CT by means of the mean absolute error (MAE) and the peak signal-to-noise ratio (PSNR) [dB] between synthesized CT and ground truth CT, both normalized as described in Section 2.2. The ground truth here is a CT registered to the MR of the same patient; the CT and MR volumes were aligned using landmark-based registration as initialization, followed by rigid and non-rigid registration. PSNR is calculated as $\mathrm{PSNR} = 20 \log_{10}\left(255 / \sqrt{\mathrm{MSE}}\right)$, where MSE is the mean squared error. The results are shown in Table 1. The average MAE decreased and the average PSNR increased both with larger training data size and with inclusion of the GC loss. Fig. 3 shows representative results.
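The two metrics can be sketched in a few lines of NumPy, assuming both volumes are already normalized to [0, 255]:

```python
# Sketch of the first-experiment metrics: MAE and PSNR between a
# synthesized CT and the registered ground-truth CT.
import numpy as np

def mae(gt, pred):
    return np.abs(gt.astype(np.float64) - pred.astype(np.float64)).mean()

def psnr(gt, pred, peak=255.0):
    mse = ((gt.astype(np.float64) - pred.astype(np.float64)) ** 2).mean()
    return 20.0 * np.log10(peak / np.sqrt(mse))
```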

|      |              | 20 volumes, w/o GC | 20 volumes, w/ GC | 300 volumes, w/o GC | 300 volumes, w/ GC |
| ---- | ------------ | ------------------ | ----------------- | ------------------- | ------------------ |
| MAE  | Patient #1   | 30.121             | 30.276            | 26.899              | 26.388             |
|      | Patient #2   | 26.927             | 26.911            | 22.319              | 21.593             |
|      | Patient #3   | 33.651             | 32.155            | 29.630              | 28.643             |
|      | Average ± SD | 30.233 ± 2.177     | 29.781 ± 1.777    | 26.283 ± 1.367      | 25.541 ± 1.129     |
| PSNR | Patient #1   | 14.797             | 14.742            | 15.643              | 15.848             |
|      | Patient #2   | 15.734             | 15.628            | 17.255              | 17.598             |
|      | Patient #3   | 14.510             | 14.820            | 15.674              | 15.950             |
|      | Average ± SD | 15.014 ± 0.330     | 15.063 ± 0.380    | 16.190 ± 0.273      | 16.465 ± 0.296     |
Table 1: Mean absolute error (MAE) and Peak-signal-to-noise ratio (PSNR) between synthesized and real CT volumes.
Figure 3: Representative results of the absolute error between the paired ground truth CT and the synthesized CT for two patients. Since the FOVs of the MR and CT volumes are slightly different, there is no corresponding region near the top edge of the ground truth volumes (filled with white); this area was excluded from evaluation.

In the second experiment, we tested with unpaired 10 MR and 20 CT volumes. Mutual information (MI) between the synthesized CT and the original MR was used for evaluation, as paired ground truth was not available. The quantitative results are shown in Fig. 4(a). The left side shows box-and-whisker plots of the per-volume mean of slice-wise MI between real CT and synthesized MR (20 data points in total); the right side shows the mean MI between real MR and synthesized CT (10 data points in total). The larger number of training data yielded a statistically significant improvement in MI according to the paired t-test. The GC loss also led to an increase in MI between MR and synthesized CT. Fig. 4(b) and Fig. 5 show example visualizations of real MR and synthesized CT volumes. As indicated by arrows, synthesized volumes with the GC loss preserved the shape near the femoral head and the adductor muscles.
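A histogram-based sketch of the MI computation is shown below; the bin count is an illustrative choice, not taken from the paper.

```python
# Sketch of mutual information between two images (e.g., a real MR
# slice and the corresponding synthesized CT slice).
import numpy as np

def mutual_information(a, b, bins=64):
    # 2D joint histogram -> joint and marginal probabilities
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0  # avoid log(0)
    return (pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum()
```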

Figure 4: Evaluation of similarity between the real and synthesized volumes. (a) Quantitative comparison of mutual information for different training data sizes, with and without the gradient-consistency loss. (b) Representative result from one patient.
Figure 5: Representative results of translation from real MR to synthesized CT for four patients, with and without the gradient consistency loss. As indicated by arrows, the gradient consistency loss helped preserve the shape near the adductor muscles.

3.2 Quantitative evaluation on segmentation

To demonstrate the applicability of image synthesis to a segmentation task, we evaluated segmentation accuracy. The twenty labeled CT datasets were used to train the segmentation network. We then evaluated segmentation accuracy on the 10 MR volumes with manual segmentation labels of the gluteus medius and minimus muscles, pelvis, and femur.

We employed the 2D U-net proposed by Ronneberger et al. [16] as the segmentation network; it is widely used in medical image analysis and has demonstrated high performance with a limited number of labeled volumes. Muscle boundaries are clearer in MRI, while bone boundaries are clearer in CT. To exploit the advantages of both modalities, we modified the 2D U-net to take a two-channel input of both CT and synthesized MR images. We trained the 2D U-net using Adam [15] at a learning rate of 0.0001. At the test phase, a pair of real MR and synthesized CT images was used as the two-channel input.
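The two-channel construction amounts to stacking the images along the channel axis; a minimal sketch, with `unet` as a placeholder for any 2D U-net accepting two input channels:

```python
# Sketch of the two-channel input described above. At training time the
# pair is (real CT, synthesized MR); at test time (synthesized CT, real MR).
import torch

def two_channel_input(ct_like, mr_like):
    # (N, 1, H, W) + (N, 1, H, W) -> (N, 2, H, W)
    return torch.cat([ct_like, mr_like], dim=1)

# Test phase (illustrative):
# logits = unet(two_channel_input(G_mr2ct(real_mr), real_mr))
```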

The results for the 4 musculoskeletal structures over 10 patients are shown in Fig. 6 (i.e., 10 data points per plot). The larger number of training data yielded statistically significant improvements in the DICE coefficient on the pelvis, femur, gluteus medius, and gluteus minimus regions according to the paired t-test. The GC loss also led to an increase in DICE on the gluteus minimus region. The average DICE coefficients for models trained with more than 300 cases and the GC loss were 0.808 ± 0.036 (pelvis), 0.883 ± 0.029 (femur), 0.804 ± 0.040 (gluteus medius), and 0.669 ± 0.054 (gluteus minimus). Fig. 7 shows an example visualization of real MR, synthesized CT, and estimated labels for one patient. The result with the GC loss shows smoother segmentation, not only in the gluteus minimus but also near the adductor muscles.
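For reference, a minimal sketch of the DICE coefficient computed per structure on binary label masks:

```python
# Sketch of the DICE coefficient used in the segmentation evaluation.
import numpy as np

def dice(gt, pred):
    # 2 * |GT ∩ Pred| / (|GT| + |Pred|) on binary masks
    gt, pred = gt.astype(bool), pred.astype(bool)
    denom = gt.sum() + pred.sum()
    return 2.0 * np.logical_and(gt, pred).sum() / denom if denom else 1.0
```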

Figure 6: Evaluation of segmentation accuracy for different CycleGAN training data sizes, with and without the gradient-consistency loss. Segmentation of (a) the pelvis, (b) femur, (c) gluteus medius, and (d) gluteus minimus muscle in MR volumes was performed via MR-to-CT synthesis.
Figure 7: Representative segmentation results from one patient. The ground truth label consists of 4 musculoskeletal structures in MRI. Although we evaluated only these 4 structures, because ground truth was not available for the other structures on MRI, all 22 estimated labels are shown for qualitative evaluation. In the right-most column, all estimated labels are overlaid on the real MRI. p, f, gmed, and gmin denote the DICE of the pelvis, femur, gluteus medius, and gluteus minimus, respectively.

4 Discussion and Conclusion

In this study, we proposed an image synthesis method that extends the CycleGAN approach by adding the GC loss to improve accuracy at the boundaries. Specifically, the contributions of this paper are 1) the introduction of the GC loss in CycleGAN, and 2) quantitative and qualitative evaluation of the dependency of both image synthesis and segmentation accuracy on the size of the training data. One limitation of this study is that we excluded patients with implants, while our target cohort (i.e., THA patients) sometimes has an implant on one side, for example when planning a secondary surgery. As a comparison against single-modality training, we performed 5-fold cross-validation of MR segmentation using the 10 labeled MR volumes (i.e., trained with 8 MR volumes and tested on the remaining 2) with the U-net segmentation network. The DICE was 0.815 ± 0.046 (pelvis), 0.921 ± 0.023 (femur), 0.825 ± 0.029 (gluteus medius), and 0.752 ± 0.045 (gluteus minimus). We thus found a gap in accuracy between modality-independent and modality-dependent segmentation. A potential improvement for modality-independent segmentation is to construct an end-to-end network that performs both image synthesis and segmentation [17]. Our future work also includes development of a method that effectively incorporates the information in unlabeled CT and MR volumes to improve segmentation accuracy [18].

References

  • [1] Cvitanic, O., et al.: MRI diagnosis of tears of the hip abductor tendons (gluteus medius and gluteus minimus). American Journal of Roentgenology 182(1) (2004) 137–143
  • [2] Torrado-Carvajal, A., et al.: Fast patch-based pseudo-CT synthesis from T1-weighted MR images for PET/MR attenuation correction in brain studies. Journal of Nuclear Medicine 57(1) (2016) 136–143
  • [3] Zhao, C., et al.: Whole brain segmentation and labeling from CT using synthetic MR images. In: International Workshop on Machine Learning in Medical Imaging, Springer (2017) 291–298

  • [4] Kamnitsas, K., et al.: Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In: International Conference on Information Processing in Medical Imaging, Springer (2017) 597–609
  • [5] Zhu, J.Y., et al.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. (2017) 2223–2232

  • [6] Wolterink, J.M., et al.: Deep MR to CT synthesis using unpaired data. In: International Workshop on Simulation and Synthesis in Medical Imaging, Springer (2017) 14–23
  • [7] Gilles, B., et al.: Musculoskeletal MRI segmentation using multi-resolution simplex meshes with medial representations. Medical image analysis 14(3) (2010) 291–302
  • [8] Ranzini, M.B.M., et al.: Joint multimodal segmentation of clinical CT and MR from hip arthroplasty patients. In: International Workshop and Challenge on Computational Methods and Clinical Applications in Musculoskeletal Imaging, Springer (2017) 72–84
  • [9] Hamarneh, G., et al.: Simulation of ground-truth validation data via physically-and statistically-based warps. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2008) 459–467
  • [10] Tustison, N.J., et al.: N4ITK: improved N3 bias correction. IEEE transactions on medical imaging 29(6) (2010) 1310–1320
  • [11] Penney, G.P., et al.: A comparison of similarity measures for use in 2-D-3-D medical image registration. IEEE transactions on medical imaging 17(4) (1998) 586–595
  • [12] Johnson, J., et al.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision, Springer (2016) 694–711
  • [13] Isola, P., et al.: Image-to-image translation with conditional adversarial networks. arXiv preprint (2017)
  • [14] Mao, X., et al.: Multi-class generative adversarial networks with the L2 loss function. CoRR, abs/1611.04076 2 (2016)
  • [15] Kingma, D.P., et al.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [16] Ronneberger, O., et al.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention, Springer (2015) 234–241
  • [17] Huo, Y., et al.: Adversarial synthesis learning enables segmentation without target modality ground truth. arXiv preprint arXiv:1712.07695 (2017)
  • [18] Zhang, Y., et al.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2017) 408–416