Multi-modality super-resolution loss for GAN-based super-resolution of clinical CT images using micro CT image database

by   Tong Zheng, et al.

This paper introduces a multi-modality loss function for GAN-based super-resolution that maintains image structure and intensity when trained on an unpaired dataset of clinical CT and micro CT volumes. Precise non-invasive diagnosis of lung cancer mainly relies on 3D multidetector computed tomography (CT) data. In contrast, micro CT (μCT) can image resected lung specimens at a resolution of 50 micrometers or finer, but it cannot be applied to living humans. To obtain highly detailed information, such as the cancer invasion area, from pre-operative clinical CT volumes of lung cancer patients, super-resolution (SR) of clinical CT volumes to the μCT level is one possible substitute. While most SR methods require paired low- and high-resolution images for training, it is infeasible to obtain precisely paired clinical CT and micro CT volumes. We therefore propose unpaired SR approaches for clinical CT using micro CT images, based on unpaired image translation methods such as CycleGAN or UNIT. Since clinical CT and micro CT differ greatly in structure and intensity, directly applying GAN-based unpaired image translation methods to super-resolution tends to generate arbitrary images. To solve this problem, we propose a new loss function, called the multi-modality loss function, that maintains the similarity between input images and the corresponding output images in the super-resolution task. Experimental results demonstrated that the proposed loss function enabled CycleGAN and UNIT to successfully perform SR of clinical CT images of lung cancer patients to micro CT level resolution, while the original CycleGAN and UNIT failed at super-resolution.




1 Purpose

The purpose of this paper is to propose a new loss function for GAN-based super-resolution of clinical CT images using unpaired micro CT (μCT) images. Lung cancer causes the largest number of deaths per year among cancers in men in Japan[Cancerdeath]. Precise non-invasive diagnosis of lung cancer mainly uses clinical CT images. For more precise clinical diagnosis, including identifying cancer invasion areas, super-resolution (SR) of clinical CT images to the resolution level of μCT images would be one option. Most SR methods require a paired training dataset. However, it is infeasible to collect paired clinical CT and μCT volumes.

It is feasible to use an unpaired image translation approach such as CycleGAN[CycleGAN] or UNIT[UNIT] for super-resolution of clinical CT. However, the original loss functions of CycleGAN and UNIT were not designed to maintain the similarity between input images and the corresponding output SR images. Because of this drawback, CycleGAN and UNIT tend to generate arbitrary images in SR. It is therefore important to design a loss function that maintains this similarity.

Figure 1: Network structures of (a) SR-CycleGAN and (b) SR-UNIT. The modification is the same for both networks: the generator from the clinical CT domain to the μCT domain is replaced with a ResNet[he2016deep]-based SR network, and the generator in the opposite direction is replaced with a network whose output height and width are 1/8 of those of its input. We also added loss terms named "multi-modality SR loss" during the training phase. With the modified network structure and the newly proposed loss functions, both SR-CycleGAN and SR-UNIT successfully performed SR of clinical CT to the μCT scale.

2 Methods

2.1 Overview

We propose a loss function, named the multi-modality loss function, for GAN-based super-resolution on unpaired datasets. We evaluate its effectiveness by implementing it in CycleGAN and UNIT and comparing the modified models with the original CycleGAN and UNIT.

Network training requires clinical CT and μCT volumes. We assume that their resolutions differ by a factor of about 8. We train our network using 2D patches cropped from the clinical CT or μCT volumes. The patch sizes are 32×32 pixels for clinical CT volumes and 256×256 pixels for μCT volumes. Clinical CT and μCT images of the same patients are used for network training.
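The patch preparation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_patches`, the (slices, height, width) volume layout, and the random synthetic volumes are assumptions made for the example.

```python
import numpy as np

def sample_patches(volume, patch_size, n_patches, rng):
    """Crop random 2D patches (patch_size x patch_size) from the slices of a 3D volume.

    Assumes the volume is laid out as (slices, height, width).
    """
    depth, height, width = volume.shape
    patches = np.empty((n_patches, patch_size, patch_size), dtype=volume.dtype)
    for i in range(n_patches):
        z = rng.integers(0, depth)                       # random slice
        y = rng.integers(0, height - patch_size + 1)     # top-left corner, row
        x = rng.integers(0, width - patch_size + 1)      # top-left corner, column
        patches[i] = volume[z, y:y + patch_size, x:x + patch_size]
    return patches

rng = np.random.default_rng(0)
# Synthetic stand-ins for a clinical CT volume and a micro CT volume.
clinical_volume = rng.random((4, 64, 64)).astype(np.float32)
micro_volume = rng.random((4, 512, 512)).astype(np.float32)

clinical_patches = sample_patches(clinical_volume, 32, 10, rng)   # 32x32 patches
micro_patches = sample_patches(micro_volume, 256, 10, rng)        # 256x256 patches
```

Because the data are unpaired, the two sets of patches are sampled independently; no spatial correspondence between a clinical CT patch and a μCT patch is assumed.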

2.2 Multi-modality super-resolution loss (MMSR Loss)

Since CycleGAN and UNIT are designed for domain translation, such as translating Monet's paintings into Van Gogh's style, they do not guarantee that the generated images are similar to the original images. Even after SR, we would like to preserve the structural similarity to the clinical CT volumes. Therefore, the loss function should account for differences in 1) structural similarity and 2) intensity range between the two domains.

The first loss term is based on SSIM[SSIM] (structural similarity). SSIM is a criterion for evaluating the structural similarity between two images. We define the SSIM term of the proposed loss function using the SSIM index, in which $\mu_x$ is the average intensity of a given image $x$, $\sigma_x^2$ is the variance of $x$, $\sigma_{xy}$ is the covariance of the images $x$ and $y$, and $c$ is a constant.
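As a reference point, an SSIM-based loss can be sketched from the quantities named above (mean, variance, covariance, and a stabilizing constant). The sketch below uses the commonly cited global form of SSIM with two constants and takes `1 - SSIM` as the loss. Both choices, along with the constant values and the function names, are assumptions for illustration and may differ from the exact term used in the paper.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed once over the whole image (no sliding window)."""
    c1 = (0.01 * data_range) ** 2        # stabilizing constants, assumed values
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()                 # average intensities
    var_x, var_y = x.var(), y.var()                 # variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()       # covariance of x and y
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

def ssim_loss(x, y):
    """Loss that is 0 for identical images and grows as structure diverges."""
    return 1.0 - global_ssim(x, y)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
loss_same = ssim_loss(image, image)            # identical images
loss_inverted = ssim_loss(image, 1.0 - image)  # inverted contrast
```

In practice SSIM is usually evaluated in local windows and averaged; the global version here only keeps the example short.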

Moreover, despite the intensity range differences between clinical CT and μCT volumes, the intensity of the images after SR should remain consistent with that of the clinical CT volumes. We introduce new loss terms, called the upsample loss and the downsample loss. The upsample loss applies a nearest-neighbor upsampling function, which enlarges an image to 8 times its original size, to the fake clinical CT image produced by the generator that translates μCT images to the clinical CT domain. The downsample loss applies an average pooling function, which reduces a given image to 1/8 of its original size, to the super-resolution result produced by the SR generator. Both terms are computed as the MSE (mean squared error). Although the upsample loss does not directly influence the SR result, it helps to maintain the intensity and structure when translating images from the μCT domain to the clinical CT domain, after which the image is translated back to the μCT domain.
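A minimal NumPy sketch of the two terms follows. The pairing of each rescaled image with its comparison target is an assumption inferred from the description: the upsampled fake clinical CT image is compared with the μCT patch it was generated from, and the downsampled SR result with the clinical CT patch it was generated from. All function names are hypothetical.

```python
import numpy as np

SCALE = 8  # resolution ratio assumed between clinical CT and micro CT patches

def upsample_nearest(image, scale=SCALE):
    """Nearest-neighbor upsampling: repeat every pixel `scale` times per axis."""
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

def downsample_avgpool(image, scale=SCALE):
    """Average pooling over non-overlapping scale x scale blocks."""
    height, width = image.shape
    blocks = image.reshape(height // scale, scale, width // scale, scale)
    return blocks.mean(axis=(1, 3))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def upsample_loss(fake_clinical, micro_patch):
    # Assumed pairing: upsampled fake clinical CT vs. the micro CT input.
    return mse(upsample_nearest(fake_clinical), micro_patch)

def downsample_loss(sr_result, clinical_patch):
    # Assumed pairing: average-pooled SR output vs. the clinical CT input.
    return mse(downsample_avgpool(sr_result), clinical_patch)

rng = np.random.default_rng(0)
clinical_patch = rng.random((32, 32))
ideal_sr = upsample_nearest(clinical_patch)        # trivially intensity-preserving "SR"
down_ok = downsample_loss(ideal_sr, clinical_patch)
down_shifted = downsample_loss(ideal_sr + 0.5, clinical_patch)
```

An SR output whose block averages match the input yields a downsample loss near zero, while a global intensity shift is penalized, which is the behavior the intensity constraint is meant to enforce.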

The overall loss function for CycleGAN is a weighted sum of a term consisting of the loss functions used in the original CycleGAN[CycleGAN] and the loss terms proposed above, with one weight per loss term.
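One plausible way to write such a combined objective is shown below. The symbol names, the choice of three weights $\lambda_1$, $\lambda_2$, $\lambda_3$, and the absence of a weight on the original CycleGAN term are assumptions made for illustration, not the authors' exact formulation.

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{CycleGAN}}
  + \lambda_{1}\,\mathcal{L}_{\mathrm{SSIM}}
  + \lambda_{2}\,\mathcal{L}_{\mathrm{up}}
  + \lambda_{3}\,\mathcal{L}_{\mathrm{down}}
```

Here $\mathcal{L}_{\mathrm{CycleGAN}}$ stands for the adversarial and cycle consistency terms of the original CycleGAN, and the remaining terms are the SSIM, upsample, and downsample losses of Section 2.2.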

2.3 Super-resolution CycleGAN (SR-CycleGAN)

CycleGAN can learn to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. The mathematical idea of CycleGAN is to learn a mapping $G: X \rightarrow Y$ and another translator $F: Y \rightarrow X$. A loss term called "cycle consistency loss" is added to encourage $F(G(x)) \approx x$ and $G(F(y)) \approx y$, where $x$ are images from domain $X$ and $y$ are images from domain $Y$. A discriminator $D_Y$ is added to classify whether a given image is really from domain $Y$ or was generated by the generator $G$ from domain $X$. Another discriminator $D_X$ is added to classify whether a given image is really from domain $X$ or was generated by the generator $F$ from domain $Y$.

However, the existing loss function of CycleGAN cannot guarantee the similarity of structure and intensity between input and output images. To solve this problem, we add the proposed MMSR loss to CycleGAN. Furthermore, in super-resolution the output image is larger than the input image because its resolution is higher. We therefore replace the image-translation generator from the clinical CT domain to the μCT domain with an image super-resolution generator, and replace the generator in the opposite direction with one that translates images from the μCT domain to the clinical CT domain while downsampling them to one-eighth of their original size. We name the modified CycleGAN SR-CycleGAN, as shown in Fig. 1.

2.4 Super-resolution UNIT (SR-UNIT)

UNIT can be seen as a variation of CycleGAN. When applied to super-resolution, UNIT has problems similar to those of CycleGAN: its loss function does not meet the requirements of super-resolution, and it is not an SR network. We modify UNIT in the same way and name the modified UNIT SR-UNIT. The structure of SR-UNIT is also shown in Fig. 1.

2.5 Super-resolution process

Lung regions can be obtained by simple thresholding followed by morphological operation to fill holes and remove excess regions. Intensity normalization is also performed for each scanning modality.
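The preprocessing step above might look like the following sketch, which uses SciPy's `ndimage` module on a single slice. The threshold of -400 HU, the minimum region size, the removal of air connected to the image border, and the synthetic test slice are all assumptions. The paper does not state the exact thresholds or morphological operations.

```python
import numpy as np
from scipy import ndimage

def lung_mask(slice_hu, threshold=-400.0, min_size=50):
    """Rough lung mask for one slice: threshold, drop outside air, fill holes, drop small regions."""
    air_like = slice_hu < threshold                      # lungs and outside air are both dark
    labels, _ = ndimage.label(air_like)
    # Components touching the image border are treated as air outside the body.
    border = np.unique(np.concatenate(
        [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    candidate = air_like & ~np.isin(labels, border[border > 0])
    filled = ndimage.binary_fill_holes(candidate)        # recover vessels inside the lung
    labels, _ = ndimage.label(filled)
    sizes = np.bincount(labels.ravel())[1:]              # pixel count per component
    keep = np.flatnonzero(sizes >= min_size) + 1         # labels of large components
    return np.isin(labels, keep)

# Synthetic slice: outside air, a soft-tissue "body", a dark "lung" with a bright "vessel".
demo_slice = np.full((64, 64), -1000.0)
demo_slice[8:56, 8:56] = 40.0
demo_slice[20:44, 20:44] = -800.0
demo_slice[30:33, 30:33] = 40.0
mask = lung_mask(demo_slice)
```

Intensity normalization per modality would follow this step; it is omitted here because the paper does not specify the normalization used.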

For training, we obtain 2D patches from both the clinical CT volumes and the μCT volumes and use them to train CycleGAN or UNIT. The patch size is 32×32 pixels for the clinical CT and 256×256 pixels for the μCT. We took 2000 patches randomly from each clinical CT and μCT volume. For inference, we feed patches from the input clinical CT volumes to the trained super-resolution generator and take its output.

3 Experimental results and discussion

3.1 Dataset

We evaluated the proposed method on five clinical CT volumes and five corresponding micro-CT volumes of lung cancer specimens obtained after lung resection surgeries. The clinical CT volumes were scanned by a clinical CT scanner (SOMATOM Definition Flash, Siemens Inc., Munich, Germany). The resolution of the clinical CT volumes was 0.625×0.625×0.6 mm. The micro CT volumes were scanned by a micro-CT scanner (inspeXio SMX-90CT Plus, Shimadzu, Kyoto, Japan). The lung cancer specimens were scanned with isotropic resolutions in the range of 42-52 μm.

3.2 Condition

In the training phase, we extracted 2000 patches from each case. The patches extracted from clinical CT volumes were 32×32 pixels, and the patches extracted from μCT volumes were 256×256 pixels. Since super-resolution networks enlarge images by a power of 2, and given the resolutions of the clinical CT volumes (625 μm) and the μCT volumes (52 μm), we considered 8-times super-resolution to be the most appropriate. The weights of the proposed loss function were set empirically. The networks were trained for 200 epochs. The total number of patches was 10000.
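The numbers in this section can be cross-checked with a short calculation. The sketch below only restates the figures given in the text (625 μm, 52 μm, 32-pixel patches, 2000 patches per case, five cases); the variable names are illustrative.

```python
import math

clinical_um = 625.0   # in-plane clinical CT resolution in micrometers
micro_um = 52.0       # coarsest micro CT resolution in micrometers

ratio = clinical_um / micro_um                        # about 12
lower_factor = 2 ** math.floor(math.log2(ratio))      # power of two just below the ratio
upper_factor = 2 ** math.ceil(math.log2(ratio))       # power of two just above the ratio

sr_factor = lower_factor                              # the 8x factor chosen in the paper
sr_pixel_um = clinical_um / sr_factor                 # pixel size after 8x SR
micro_patch_size = 32 * sr_factor                     # micro CT patch matching a 32-pixel input
total_patches = 2000 * 5                              # patches per case times number of cases
```

The resolution ratio falls between 8 and 16; with the 8-times factor, a 32×32 clinical CT patch maps to a 256×256 output with a pixel size of about 78 μm.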

3.3 Results and discussion

SR results of our proposed methods were compared with those of the original CycleGAN, the original UNIT, and the original clinical CT images, as shown in Fig. 2. Lung anatomies, such as the bronchus, can be observed more clearly in the results of SR-CycleGAN and SR-UNIT. The original CycleGAN and UNIT produced results very different from the original clinical CT volumes. These results demonstrate that the proposed loss function works well for clinical CT image super-resolution.

Qualitatively, SR-CycleGAN performed better than SR-UNIT in most cases. Pathological information was kept after SR: in the SR results of SR-CycleGAN, small structures such as veins were well preserved.

One drawback is that the SR results of SR-CycleGAN contain artifacts like those appearing in μCT, which makes them noisy. By contrast, the SR results of SR-UNIT contain few such artifacts.

Figure 2: Comparison of CycleGAN and UNIT with and without the proposed loss function. (a) Images cropped from a bronchus region. (b) Images cropped from a tumor region. (c) Images cropped from a vessel region. Both SR-CycleGAN and SR-UNIT could perform SR of clinical CT, while SR-CycleGAN outperformed the other methods, especially in the bronchus region. In addition, SR-CycleGAN could reconstruct the bronchus walls while SR-UNIT could not. The original CycleGAN and UNIT failed to generate SR images.

3.4 Difficulty of quantitative evaluation

Quantitative evaluation is usually conducted by comparing pairs of SR results and original high-resolution images. However, it is infeasible to obtain such pairs between clinical CT and μCT volumes, as mentioned in Section 1. In this setting, the only feasible quantitative evaluation is to compare the original clinical CT volumes with their SR results, using metrics such as MSE (mean squared error) or PSNR (peak signal-to-noise ratio)[hore2010image]. These metrics evaluate how closely the intensities produced by our method match those of the original clinical CT volumes, without destroying the intensity distribution or the apparent structures. However, we believe this approach is still incomplete as a quantitative evaluation. Finding a better way is left for future work.
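A comparison of this kind could be sketched as follows. Because the SR result is 8 times larger than the clinical CT input, the sketch average-pools the SR result back to the input size before computing MSE and PSNR. That resizing step, the intensity range of 1.0, and the function names are assumptions, since the paper does not describe how the two image sizes would be matched.

```python
import numpy as np

def downsample_avgpool(image, scale=8):
    """Average pooling over non-overlapping scale x scale blocks."""
    height, width = image.shape
    return image.reshape(height // scale, scale, width // scale, scale).mean(axis=(1, 3))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(a, b)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / error))

rng = np.random.default_rng(0)
clinical_patch = rng.random((32, 32))
# Stand-in for an SR output: the upsampled input with a constant intensity offset.
fake_sr = np.repeat(np.repeat(clinical_patch, 8, axis=0), 8, axis=1) + 0.1

sr_at_input_size = downsample_avgpool(fake_sr)
error = mse(sr_at_input_size, clinical_patch)
quality_db = psnr(sr_at_input_size, clinical_patch)
```

Such a score only measures intensity fidelity to the low-resolution input. It cannot reward correctly recovered fine detail, which is why it remains an incomplete evaluation.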

4 Conclusions and Future Work

4.1 Conclusions

The newly proposed loss function, named MMSR loss, was added to CycleGAN and UNIT to maintain image structure and intensity and to avoid generating arbitrary images after SR. The image translation generators of the networks were also replaced by image SR generators. Experiments showed that the proposed method successfully performed SR of lung clinical CT images to the μCT level, while the original CycleGAN and UNIT produced only blank images.

4.2 Future Work

Future work includes quantitative evaluation of the proposed methods. Since it is infeasible to obtain paired HR and LR data, we could not directly evaluate similarity using metrics such as PSNR and SSIM. Furthermore, although the proposed methods focused on SR of clinical CT to the μCT scale, they are not specific to the lung clinical CT SR task. They could be applied to other SR tasks on medical images, such as SR of μCT to the scale of H&E-stained images. Since it is often difficult to register images from modalities with different resolutions, we believe that SR methods trained on unpaired LR and HR images will be important and widely used in the near future.

Parts of this research were supported by MEXT/JSPS KAKENHI (26108006, 17H00867 and 17K20099), the JSPS Bilateral International Collaboration Grants, the AMED (18lk1010028s0401 and 19lk1010036h0001) and the Hori Sciences & Arts Foundation.

Submitted elsewhere: This work has not been submitted elsewhere.