The purpose of this paper is to propose a new loss function for GAN-based super-resolution of clinical CT images using unpaired micro CT (μCT) images. Lung cancer causes the largest number of deaths per year among cancers in males in Japan[Cancerdeath]. Precise non-invasive diagnosis of lung cancer mainly uses clinical CT images. For more precise clinical diagnosis, including the diagnosis of cancer invasion areas, super-resolution (SR) of clinical CT images to the μCT resolution level would be one option. Most SR methods require a paired training dataset. However, it is infeasible to collect paired clinical CT and μCT volumes.
It is feasible to use an unpaired image translation approach such as CycleGAN[CycleGAN] or UNIT[UNIT] for super-resolution of clinical CT. However, the original loss functions of CycleGAN and UNIT were not designed to maintain the similarity between input images and their output SR images. This drawback makes CycleGAN and UNIT tend to generate arbitrary images in SR. It is therefore important to design a loss function that maintains the similarity between input images and their corresponding output images.
We propose a loss function named the multi-modality super-resolution loss for GAN-based super-resolution on unpaired datasets. We evaluate the effectiveness of the proposed loss function by implementing it in CycleGAN and UNIT, and compare the modified models with the original CycleGAN and UNIT.
Network training uses clinical CT and μCT volumes. We assume that they differ in resolution by a factor of about 8. We train our network using 2D patches cropped from the clinical CT or μCT volumes. We set the patch sizes from the clinical CT and μCT volumes to 32×32 pixels and 256×256 pixels, respectively. Clinical CT and μCT images of the same patients are used for network training.
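The patch extraction above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the slice sizes and the `crop_random_patch` helper are hypothetical stand-ins.

```python
import numpy as np

def crop_random_patch(image, patch_size, rng):
    """Crop a random patch_size x patch_size patch from a 2D image."""
    h, w = image.shape
    top = int(rng.integers(0, h - patch_size + 1))
    left = int(rng.integers(0, w - patch_size + 1))
    return image[top:top + patch_size, left:left + patch_size]

rng = np.random.default_rng(0)
clinical_slice = np.zeros((512, 512), dtype=np.float32)  # placeholder clinical CT slice
micro_slice = np.zeros((1024, 1024), dtype=np.float32)   # placeholder micro CT slice

clinical_patch = crop_random_patch(clinical_slice, 32, rng)   # 32x32 from clinical CT
micro_patch = crop_random_patch(micro_slice, 256, rng)        # 256x256 from micro CT
```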
2.2 Multi-modality super-resolution loss (MMSR Loss)
Since CycleGAN and UNIT are designed for domain translation, such as translating Monet's paintings into van Gogh's style, they do not guarantee that the generated images are similar to the original images. Regardless of SR, we would like to preserve the structural similarity of the clinical CT volumes. Therefore, we consider 1) structural similarity and 2) differences in intensity range between the two domains in the loss function.
The first loss term is based on SSIM[SSIM] (structural similarity). SSIM is an evaluation criterion of the structural similarity between two images. We define the SSIM term for our proposed loss function by
$\mathcal{L}_{\mathrm{SSIM}}(x, y) = 1 - \mathrm{SSIM}(x, y)$, with
$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$
where $\mu_x$ is the average intensity of a given image $x$, $\sigma_x^2$ is the variance of a given image $x$, $\sigma_{xy}$ is the covariance of given images $x$ and $y$, and $c_1$, $c_2$ are constants.
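The SSIM term can be sketched in NumPy. Note this simplified version uses whole-image (global) statistics rather than the sliding-window form usually used in practice; the constants `c1` and `c2` are the standard SSIM stabilizers, not values from the paper.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global-statistics SSIM (simplified: whole-image statistics, no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                    # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()          # sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_loss(x, y):
    """SSIM-based loss term: 1 - SSIM(x, y)."""
    return 1.0 - ssim_global(x, y)
```

For identical images the loss is zero, since SSIM of an image with itself is exactly 1.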
Moreover, despite the intensity range differences between the clinical CT and μCT volumes, the intensity of the images after SR should be kept as in the clinical CT volumes. We introduce new loss terms called the upsample and downsample loss terms, defined by
$\mathcal{L}_{\mathrm{up}}(y) = \mathrm{MSE}(U(F(y)), y)$ and $\mathcal{L}_{\mathrm{down}}(x) = \mathrm{MSE}(D(G(x)), x),$
where $U$ is the nearest-neighbor upsampling function that rescales an image to 8 times its original size and $F(y)$ is the fake clinical CT image generated by the generator $F$. $D$ is the average pooling function that rescales a given image to 1/8 of its original size and $G(x)$ is the super-resolution result generated by the generator $G$. MSE denotes the mean squared error. Although the upsample loss term does not directly influence the SR result, it helps to maintain intensity and structure when translating images from the μCT domain to the clinical CT domain and then translating them back to the μCT domain.
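The rescaling operators and the downsample loss term can be sketched in NumPy as below. The generator output `sr` here is a hypothetical stand-in (a nearest-neighbor upsample of the input), used only to exercise the loss computation.

```python
import numpy as np

def nn_upsample(img, factor=8):
    """Nearest-neighbour upsampling by an integer factor (the operator U)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def avg_pool(img, factor=8):
    """Average pooling that rescales an image to 1/factor of its size (the operator D)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

# Hypothetical stand-ins: a 32x32 clinical CT patch and a fake generator output.
x = np.random.default_rng(1).random((32, 32))
sr = nn_upsample(x)               # pretend this is G(x), a 256x256 SR result
loss_down = mse(avg_pool(sr), x)  # downsample loss term MSE(D(G(x)), x)
```

For this degenerate `sr` the downsample loss is zero, since average pooling exactly inverts nearest-neighbor upsampling.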
Here, we write the overall loss function of the modified CycleGAN as
$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{CycleGAN}} + \lambda_2 \mathcal{L}_{\mathrm{SSIM}} + \lambda_3 \mathcal{L}_{\mathrm{up}} + \lambda_4 \mathcal{L}_{\mathrm{down}},$
where $\mathcal{L}_{\mathrm{CycleGAN}}$ is a term consisting of the loss functions used in the original CycleGAN[CycleGAN], and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are the weights of each loss term.
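The weighted combination is straightforward; a minimal sketch (the default weights here are placeholders, not the values used in the paper):

```python
def total_loss(l_cyclegan, l_ssim, l_up, l_down,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms; the weights are hyperparameters."""
    w1, w2, w3, w4 = weights
    return w1 * l_cyclegan + w2 * l_ssim + w3 * l_up + w4 * l_down
```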
2.3 Super-resolution CycleGAN (SR-CycleGAN)
CycleGAN can learn to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. The mathematical idea of CycleGAN is to obtain a mapping $G: X \rightarrow Y$ and another mapping $F: Y \rightarrow X$. A loss term called the "cycle consistency loss" is added to encourage $F(G(x)) \approx x$ and $G(F(y)) \approx y$, where $x$ are images from domain $X$ and $y$ are images from domain $Y$. A discriminator $D_Y$ is added to classify whether a given image is from domain $Y$ or was generated by the generator $G$ from domain $X$. Another discriminator $D_X$ is added to classify whether a given image is from domain $X$ or was generated by the generator $F$ from domain $Y$.
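The cycle consistency constraint can be sketched as follows, using the L1 form from the original CycleGAN; the identity generators are only a degenerate check, not trained models.

```python
import numpy as np

def l1_distance(a, b):
    """Mean absolute difference between two images."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(x, y, G, F):
    """Encourage F(G(x)) ~ x and G(F(y)) ~ y (L1 form, as in CycleGAN)."""
    return l1_distance(F(G(x)), x) + l1_distance(G(F(y)), y)

# Degenerate check with identity mappings: the cycle loss vanishes.
identity = lambda img: img
x = np.ones((4, 4))
y = np.zeros((4, 4))
loss = cycle_consistency_loss(x, y, identity, identity)
```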
However, the existing loss function of CycleGAN cannot guarantee the similarity of structure and intensity between input and output images. To solve this problem, we apply the proposed MMSR loss in CycleGAN. Furthermore, in super-resolution the output image is larger than the input image because its resolution is higher. We therefore modify the image-translation generator $G$ from domain $X$ to domain $Y$ into an image super-resolution generator, and replace the generator $F$ with a generator that translates images from the μCT domain to the clinical CT domain while downsampling them to one-eighth of their original size. We name the modified CycleGAN SR-CycleGAN, as shown in Fig. 1.
2.4 Super-resolution UNIT (SR-UNIT)
UNIT can be seen as a variant of CycleGAN. When applied to the super-resolution problem, UNIT has problems similar to those of CycleGAN: its loss function does not meet the requirements of the super-resolution problem, and it is not an SR network. We modify UNIT in the same way and name the modified UNIT SR-UNIT. The structure of SR-UNIT is also shown in Fig. 1.
2.5 Super-resolution process
Lung regions are obtained by simple thresholding followed by morphological operations to fill holes and remove excess regions. Intensity normalization is also performed for each scanning modality.
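The thresholding and per-modality intensity normalization can be sketched as below. The threshold and normalization window are hypothetical HU-like values for illustration; the actual values depend on the scanner and modality, and the morphological cleanup step is omitted.

```python
import numpy as np

def normalize_intensity(volume, lo, hi):
    """Clip to [lo, hi] and rescale to [0, 1]; lo/hi are chosen per modality."""
    v = np.clip(volume.astype(np.float64), lo, hi)
    return (v - lo) / (hi - lo)

def lung_threshold_mask(volume, threshold):
    """Simple thresholding for air-filled lung; morphological cleanup would follow."""
    return volume < threshold

# Hypothetical HU-like values; real thresholds depend on the scanner and modality.
vol = np.array([[-900.0, 50.0], [-800.0, 200.0]])
norm = normalize_intensity(vol, -1000.0, 400.0)
mask = lung_threshold_mask(vol, -400.0)
```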
For training, we extract 2D patches from both the clinical CT volumes and the μCT volumes and use them to train CycleGAN or UNIT. The patch size is 32×32 pixels for the clinical CT and 256×256 pixels for the μCT. We take 2000 patches randomly from each clinical CT and μCT volume. For inference, we obtain the output of the trained super-resolution generator for patches from the input clinical CT volumes.
3 Experimental results and discussion
We evaluated the proposed method on five clinical CT volumes and five corresponding micro CT (μCT) volumes of lung cancer specimens obtained after lung resection surgeries. The clinical CT volumes were scanned by a clinical CT scanner (SOMATOM Definition Flash, Siemens Inc., Munich, Germany). The resolution of the clinical CT volumes was 0.625 × 0.625 × 0.6 mm. The μCT volumes were scanned by a micro CT scanner (inspeXio SMX-90CT Plus, Shimadzu, Kyoto, Japan). The lung cancer specimens were scanned with isotropic resolutions in the range of 42–52 μm.
In the training phase, we extracted 2000 patches from each case. The size of the patches extracted from the clinical CT volumes was 32×32 pixels, and the size of the patches extracted from the μCT volumes was 256×256 pixels. Since super-resolution typically enlarges images by powers of 2, and comparing the resolution of the clinical CT volumes (625 μm) with that of the μCT volumes (52 μm), we considered 8-times super-resolution to be the most appropriate. The weights $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ of the proposed loss function were set empirically. The number of training epochs was 200. The total number of patches was 10000.
3.3 Results and discussion
The SR results of our proposed methods were compared with those of the original CycleGAN, the original UNIT, and the original clinical CT, as shown in Fig. 2. Lung anatomies such as the bronchus can be observed more clearly in the results of SR-CycleGAN and SR-UNIT. The original CycleGAN and UNIT produced results that differ greatly from the original clinical CT volumes. These results demonstrate that the proposed loss function works well for clinical CT image super-resolution.
Qualitatively, SR-CycleGAN generally performed better than SR-UNIT. Pathological information was preserved after SR: in the SR results of SR-CycleGAN, small structures such as veins were well preserved.
One drawback is that the SR results of SR-CycleGAN contain artifacts similar to those appearing in μCT, which makes them noisy. By contrast, the SR results of SR-UNIT do not contain as many artifacts as those of SR-CycleGAN.
3.4 Difficulty of quantitative evaluation
Quantitative evaluation is usually conducted by comparing SR and original image pairs. However, it is infeasible to obtain such pairs of clinical CT and μCT volumes, as mentioned in the Introduction. In this scheme, the only feasible quantitative evaluation approach is to compare the original clinical CT volumes with their SR results. This is possible using metrics such as MSE (mean squared error) or PSNR (peak signal-to-noise ratio)[hore2010image]. These metrics evaluate whether our method produces intensities similar to those of the original clinical CT volumes without destroying the intensity distribution or anatomical structures. However, we believe that this approach is still incomplete as a quantitative evaluation. Finding better evaluation methods is future work.
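The two metrics mentioned above can be computed as follows; a minimal NumPy sketch with the standard PSNR definition, assuming images scaled to a known data range.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, data_range]."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / err)
```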
4 Conclusions and Future Work
A newly proposed loss function named the MMSR loss was added to CycleGAN and UNIT to maintain image structure and intensity, as well as to avoid generating arbitrary images after SR. The image translation generators of the networks were replaced by image SR generators as well. Experiments showed that the proposed methods successfully performed SR of lung clinical CT images to the μCT level, while the original CycleGAN and UNIT produced blank images.
4.2 Future Work
Future work includes quantitative evaluation of the proposed methods. Since it is infeasible to obtain paired HR and LR data, we could not directly evaluate similarity metrics such as PSNR and SSIM. Furthermore, although the proposed methods focus on SR of clinical CT to the μCT scale, the method is not specific to the lung clinical CT SR task. It could be applied to other medical image SR tasks, such as SR of μCT to the scale of H&E-stained images. Since it is often difficult to register images from modalities with different resolutions, we believe that SR methods trained on unpaired LR and HR images will be important and widely used in the near future.