Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising

04/17/2021 ∙ by Taesung Kwon, et al. ∙ KAIST Department of Mathematical Sciences

Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposal of tunable CycleGAN with Adaptive Instance Normalization (AdaIN) alleviates the problem in part by using a single generator. However, two discriminators and an additional AdaIN code generator are still required for training. To solve this problem, here we present a novel cycle-free CycleGAN architecture, which consists of a single generator and a single discriminator but still guarantees cycle consistency. The main innovation comes from the observation that the use of an invertible generator automatically fulfills the cycle consistency condition and eliminates the additional discriminator in the CycleGAN formulation. To make the invertible generator more effective, our network is implemented in the wavelet residual domain. Extensive experiments using various levels of low-dose CT images confirm that our method can significantly improve denoising performance using only 10% of the learnable parameters and a faster training time compared to the conventional CycleGAN.


I Introduction

X-ray computed tomography (CT) is one of the most commonly used medical imaging modalities, with the benefits of high-resolution imaging in a short scan time. However, excessive X-ray radiation can potentially increase the incidence of cancer, so low-dose CT scanning has been extensively studied to minimize the radiation dose to patients. Unfortunately, various artifacts appear in low-dose CT images, which significantly reduces their diagnostic value.

Recently, deep learning approaches [1, 2, 3, 4, 5, 6] have been proposed for low-dose CT denoising with impressive performance. The majority of these works [1, 3, 2, 4] are based on supervised learning, where the neural network is trained with paired low-dose CT (LDCT) and standard-dose CT (SDCT) images. However, the simultaneous acquisition of images at low and high doses is often difficult, and it also increases the radiation exposure of the subjects.

Accordingly, unsupervised learning approaches that do not require matched LDCT and SDCT images have become a major focus of research in the CT community [5, 7, 8, 6]. In particular, the authors in [5, 6] proposed a CycleGAN approach [9] for low-dose CT denoising that trains the networks with unpaired LDCT and SDCT images. To enable such unpaired training, two generators are necessary: one for the forward mapping from LDCT to SDCT, and the other for the inverse mapping from SDCT to LDCT. Cycle consistency is then enforced so that an image that goes through the successive application of the forward and inverse mappings reverts to the original one. In fact, a recent theoretical study [10] reveals that this CycleGAN architecture emerges as a dual formulation of an optimal transport problem in which the statistical distances between the empirical and transported measures in both the source and target domains are simultaneously minimized.

Although two generators are required for training, only the forward generator is used at inference time. Nonetheless, the inverse-mapping generator requires a similar number of learnable parameters and a similar amount of memory as the forward one, making the CycleGAN architecture inefficient. Furthermore, two generators and two discriminators must be trained simultaneously for convergence, which requires a high level of skill and know-how. To mitigate this problem, Gu et al. [6] proposed a tunable CycleGAN with adaptive instance normalization (AdaIN) [11]. The main idea is that a single generator can be switched to the forward or inverse generator by simply changing the AdaIN code that is produced by a lightweight AdaIN code generator. However, the architecture still requires two discriminators to distinguish fake from real samples in the LDCT and SDCT domains, the individual complexity of which is still as high as that of a generator.

Therefore, one of the ultimate goals of the CycleGAN study for low-dose CT noise removal would be to eliminate the unnecessary generator and discriminator while still maintaining the optimality of CycleGAN from the point of view of optimal transport. Indeed, one of the most important contributions of this paper is to show that using an invertible generator architecture can automatically satisfy the cycle consistency term and completely remove one of the discriminators without affecting the CycleGAN framework (see Fig. 1).

To meet the invertibility conditions, our generator is implemented using the coupling layers originally proposed for normalizing flows [12, 13]. Our generator is then trained with just a single discriminator that distinguishes fake SDCT from real SDCT images. To make the invertible generator sufficiently expressive for low-dose CT denoising, our network is trained in the wavelet residual domain. Despite the lack of explicit cycle consistency, our algorithm maintains the optimality of CycleGAN and offers state-of-the-art noise removal with only 10% of the trainable parameters of the conventional CycleGAN. Furthermore, training is twice as fast. Since there is no explicit cycle consistency, our method is dubbed cycle-free CycleGAN.

This paper is structured as follows. Section II reviews the existing theory of normalizing flow. Then, Section III explains the mathematical theory behind our cycle-free CycleGAN. Section IV explains the implementation issues, training and analysis details, and our low-dose CT datasets. Experimental results using various levels of low-dose CT denoising tasks are shown in Section V, followed by a discussion in Section VI and the conclusion in Section VII.

II Related Works

II-A Normalizing Flow

Our method is inspired by the normalizing flow (NF), or invertible flow [12, 13, 14], so we review it briefly to highlight the similarities to and differences from our work. However, the original derivation of NF [15, 12, 16, 13, 14] makes it difficult to see the link to our cycle-free CycleGAN, so here we present a new derivation, which is inspired by β-VAE [17].

Let $\mathcal{X}$ and $\mathcal{Z}$ denote the ambient space and the latent space, respectively. In classical variational inference, the model distribution $p_\theta(x)$ is obtained by combining a latent space distribution $p(z)$ with a family of conditional distributions $p_\theta(x|z)$, which leads to an interesting lower bound:

$$\log p_\theta(x) \geq \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right), \qquad (1)$$

so that the negative of this bound gives the loss

$$\ell(\theta,\phi) := \mathbb{E}_{q_\phi(z|x)}\left[-\log p_\theta(x|z)\right] + D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right), \qquad (2)$$

where $D_{KL}$ denotes the Kullback–Leibler (KL) divergence [15]. The lower bound in (1) is often called the evidence lower bound (ELBO) or the variational lower bound [18]. The goal of variational inference is then to find the likelihood $p_\theta(x|z)$ and the posterior $q_\phi(z|x)$ that maximize the lower bound.

Among the various choices of the posterior for the ELBO, the following form is most often used [17]: samples from $q_\phi(z|x)$ are generated as

$$z = g_\phi(x, \epsilon), \qquad \epsilon \sim \mathcal{N}(0, I), \qquad (3)$$

where $\epsilon$ is zero-mean unit-variance Gaussian noise, and $g_\phi$ is the encoder function parameterized by $\phi$ for a given input $x$ in addition to the noise $\epsilon$.

For the given encoder in (3), the ELBO loss in (2) can be simplified as [17]:

$$\ell(\theta,\phi) = \mathbb{E}_{\epsilon}\left[-\log p_\theta\left(x \mid g_\phi(x,\epsilon)\right)\right] + D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right), \qquad (4)$$

where the first term in (4) is obtained from the first term in (2), which corresponds to the likelihood term. Assuming a Gaussian distribution, this term can be represented as

$$-\log p_\theta(x \mid z) \simeq \frac{1}{2}\left\| x - f_\theta(z) \right\|^2 + C, \qquad (5)$$

where $f_\theta$ is the decoder function parameterized by $\theta$ and $C$ is a constant. Furthermore, VAE chooses the following form of the encoder function:

$$g_\phi(x, \epsilon) = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad (6)$$

where $\sigma_\phi(x)$ is the noise standard deviation; this construction is often called the reparametrization trick [15]. The normalizing flow then further enforces that the encoder $g_\phi(\cdot) := g_\phi(\cdot, 0)$ is an invertible function such that

$$f_\theta = g_\phi^{-1}. \qquad (7)$$

Thanks to the invertibility condition in (7), a very interesting phenomenon happens. More specifically, the likelihood term in (4) can be simplified as

$$\frac{1}{2}\left\| x - f_\theta\left(g_\phi(x)\right) \right\|^2 = 0, \qquad (8)$$

which is a constant. Therefore, the decoder is no longer necessary for the parameter estimation. Accordingly, the ELBO loss in (4) can be simplified as

$$\ell(\phi) = \mathbb{E}_{x}\left[-\log p\left(g_\phi(x)\right) - \log\left|\det \frac{\partial g_\phi(x)}{\partial x}\right|\right], \qquad (9)$$

where we have also removed the remaining constant term. If we further assume a zero-mean unit-variance Gaussian measure for the latent space $\mathcal{Z}$, (9) can be further simplified as

$$\ell(\phi) = \mathbb{E}_{x}\left[\frac{1}{2}\left\| g_\phi(x) \right\|^2 - \log\left|\det \frac{\partial g_\phi(x)}{\partial x}\right|\right], \qquad (10)$$

which is the final loss function for NF.

Now, the main technical difficulty in minimizing the loss function in (10) arises from the last term, which involves a complicated determinant calculation for a huge matrix. Aside from adopting an invertible network architecture that satisfies (7), normalizing flow therefore focuses on an encoder function composed of a sequence of transformations:

$$g_\phi = g_K \circ g_{K-1} \circ \cdots \circ g_1. \qquad (11)$$

For this encoder function, the change of variables formula leads to

$$\log\left|\det \frac{\partial g_\phi(x)}{\partial x}\right| = \sum_{k=1}^{K} \log\left|\det \frac{\partial g_k(h_{k-1})}{\partial h_{k-1}}\right|, \qquad (12)$$

where $h_k = g_k(h_{k-1})$ and $h_0 = x$. Note that the complicated determinant computation in (10) can thus be replaced by a relatively easy computation at each step [12].
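The additive decomposition in (12) can be checked numerically. Below is a toy sketch in which invertible linear maps stand in for the flow steps $g_k$; the matrices, sizes, and the shift toward the identity are illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Two invertible linear "flow steps" standing in for g_1 and g_2.
# Adding 3*I keeps them well away from singularity.
A1 = rng.standard_normal((d, d)) + 3 * np.eye(d)
A2 = rng.standard_normal((d, d)) + 3 * np.eye(d)

x = rng.standard_normal(d)
h1 = A1 @ x        # h_1 = g_1(x)
z = A2 @ h1        # z   = g_2(h_1)

# Change of variables: the log-determinant of the composition equals
# the sum of the per-step log-determinants, as in eq. (12).
total = np.linalg.slogdet(A2 @ A1)[1]
per_step = np.linalg.slogdet(A1)[1] + np.linalg.slogdet(A2)[1]
assert np.isclose(total, per_step)

# Toy NF objective in the spirit of eq. (10): 0.5*||z||^2 - log|det dz/dx|.
loss = 0.5 * float(z @ z) - total
```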

III Main Contribution

III-A Derivation of Cycle-free CycleGAN

(a) (b)
Fig. 2: Geometry of optimal transport: (a) Conventional CycleGAN minimizes two statistical distances simultaneously, one in each domain. (b) Cycle-free CycleGAN minimizes a single distance, as the other distance is automatically minimized due to the invertibility.

Similar to the normalizing flow, which is concerned with the conversion between the latent space and the ambient space for image generation, the main goal of CycleGAN is image transfer between two spaces, say $\mathcal{X}$ and $\mathcal{Y}$.

Specifically, for the case of low-dose CT denoising, the target SDCT image space $\mathcal{X}$ is equipped with a probability measure $\mu$, whereas the LDCT image space $\mathcal{Y}$ is equipped with a probability measure $\nu$ (see Fig. 2(a)). Then, the goal of CycleGAN is to transport the LDCT distribution to the SDCT distribution so that the transported LDCT distribution follows the SDCT distribution. It turns out that this is closely related to optimal transport [19, 20].

In particular, the transport from $\mathcal{X}$ to $\mathcal{Y}$ is performed by a forward operator $F$, which "pushes forward" the measure $\mu$ in $\mathcal{X}$ to $F_\#\mu$ in the space $\mathcal{Y}$ [19, 20]. On the other hand, the mass transportation from the measure space $(\mathcal{Y}, \nu)$ to the measure space $(\mathcal{X}, \mu)$ is done by a generator $G$; i.e., the generator pushes forward the measure $\nu$ in $\mathcal{Y}$ to a measure $G_\#\nu$ in the target space $\mathcal{X}$. Then, the optimal transport map for unsupervised learning can be achieved by simultaneously minimizing the statistical distances between $\mu$ and $G_\#\nu$, and between $\nu$ and $F_\#\mu$.

Although various forms of statistical distance could be used (for example, the KL divergence in the case of VAE), in our prior work [10] and its extensions [21, 22, 23, 24, 25], we use the Wasserstein metric as the statistical distance. It was then shown that the simultaneous statistical distance minimization can be done by solving the following Kantorovich optimal transport problem:

$$\min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y), \qquad (13)$$

where $\Pi(\mu, \nu)$ refers to the set of joint distributions with marginals $\mu$ and $\nu$, and the transportation cost is defined by

$$c(x, y; F, G) = \| x - G(y) \| + \gamma \| y - F(x) \|, \qquad (14)$$

where $\gamma$ denotes a weighting parameter. In particular, the role of $\gamma$ in (14) was originally studied in the context of $\gamma$-CycleGAN [25]. In many inverse problems, additional regularization is often used. For example, one could use the following [23, 24]:

$$c(x, y; F, G) = \| x - G(y) \| + \gamma \| y - F(x) \| + \eta \| y - G(y) \|, \qquad (15)$$

where $\eta$ is the regularization parameter and the last term penalizes the variation introduced by the generator. Note that the first two terms in (15) are computed using both $x$ and $y$, whereas the last term depends only on $y$. From the optimal transport perspective, this makes a huge difference, since the computation of the last term is trivial, whereas the first terms require the dual formulation [23, 24].

One of the most important contributions of our companion paper [10] is to show that the primal formulation of unsupervised learning in (13) with the transport cost (15) can be represented by:

$$\min_{F, G}\; \ell(F, G), \qquad (16)$$

where

$$\ell(F, G) = \lambda\, \ell_{cycle}(F, G) + \ell_{Disc}(F, G) + \eta \int_{\mathcal{Y}} \| y - G(y) \|\, d\nu(y),$$

where $\lambda$ is a hyper-parameter, the cycle-consistency term is given by

$$\ell_{cycle}(F, G) = \int_{\mathcal{X}} \| x - G(F(x)) \|\, d\mu(x) + \int_{\mathcal{Y}} \| y - F(G(y)) \|\, d\nu(y), \qquad (17)$$

and the discriminator term is

$$\ell_{Disc}(F, G) = \max_{\varphi \in Lip_1(\mathcal{X})} \ell_{GAN}(G; \varphi) + \gamma \max_{\psi \in Lip_1(\mathcal{Y})} \ell_{GAN}(F; \psi), \qquad (18)$$

where $Lip_1(\mathcal{X})$ denotes the space of 1-Lipschitz functions with the domain $\mathcal{X}$, and

$$\ell_{GAN}(G; \varphi) := \int_{\mathcal{X}} \varphi(x)\, d\mu(x) - \int_{\mathcal{Y}} \varphi(G(y))\, d\nu(y). \qquad (19)$$

To make the paper self-contained, the detailed derivation is given in the Appendix.

Similar to the key simplification step (8) in NF, a very interesting thing happens if we use an invertible generator satisfying (7) in the CycleGAN training. The following proposition is our key result.

Proposition 1.

Suppose that the generator is invertible, i.e. $F = G^{-1}$, and $G^{-1}$ is a Lipschitz function. Then, the CycleGAN problem in (16) with the transport cost given by (15) can be equivalently represented by

(20)

where

(21)

Here, $\ell_{GAN}$ is defined in (19).

Proof.

First, the invertibility condition in (7) implies that $G(F(x)) = x$ and $F(G(y)) = y$, so that we can easily see that the cycle-consistency term in (17) vanishes. Second, thanks to the invertibility condition in (7), we have

(22)

where the set is defined by

(23)

and the last equality comes from the invertibility of $G$. Furthermore, since $G^{-1}$ is a Lipschitz function, we have

where the inequality (a) comes from the Lipschitz condition on the potential and (b) comes from the Lipschitz condition on $G^{-1}$. Accordingly, the composite function is a 1-Lipschitz function. Therefore, we can obtain the following upper bound

(24)

by extending the function space to all 1-Lipschitz functions. Next, we show that the upper bound in (24) is tight. Suppose that $\varphi^*$ is the maximizer for (24). To show that the bound is tight, we need to show the existence of a function such that

Thanks to the invertibility condition (7), we can always find such a function. Accordingly,

which achieves the upper bound. Therefore, we have

This concludes the proof. ∎

Compared to NF, our cycle-free CycleGAN has several advantages. First, in NF, the latent space is usually assumed to follow a Gaussian distribution, so the main focus is image generation from noise in the latent space $\mathcal{Z}$ to the ambient space $\mathcal{X}$. To apply NF to image translation between the $\mathcal{X}$ and $\mathcal{Y}$ domains, we would need to implement two NF networks: one for the conversion from $\mathcal{X}$ to $\mathcal{Z}$, and the other from $\mathcal{Z}$ to $\mathcal{Y}$. During such image translation via the latent space, our empirical results show that information loss occurs due to the restriction to the Gaussian latent variable. On the other hand, in our cycle-free CycleGAN, the spaces $\mathcal{X}$ and $\mathcal{Y}$ can carry any empirical distributions.

Additionally, our method has a very interesting geometric interpretation. By replacing the forward operator $F$ with the inverse $G^{-1}$ of the invertible generator, the two statistical distance minimization problems in the original CycleGAN in Fig. 2(a) can be replaced by the single statistical distance minimization problem shown in Fig. 2(b).

Fig. 3: The architecture of our (a) invertible block and (b) invertible generator $G$. Thanks to its invertibility, we can recover the input from the output by reversing the operations.

III-B Invertible Generator

Various architectures have been proposed to construct invertible neural networks for flow-based generative models [12, 13, 14]. For example, Nonlinear Independent Component Estimation (NICE) [12] is based on an additive coupling layer that leads to a volume-preserving invertible mapping. Later, the method was extended to the affine coupling layer, which increases the expressiveness of the model [13].

However, the original architecture imposes some constraints on the functions that the network can represent: for instance, it can only represent volume-preserving mappings. Follow-up works [13, 14] addressed this limitation by introducing new reversible transformations. More specifically, the authors in [13] proposed a coupling layer using real-valued non-volume preserving (Real NVP) transformations. On the other hand, Kingma et al. [14] proposed an invertible 1×1 convolution as a generalization of a permutation operation, which significantly improves the image generation quality of flow-based generative models.

In the following, we explain the specific components of the invertible blocks used in our method. Our network architecture is shown in Fig. 3; it is composed of repetitions of squeeze/unsqueeze blocks interleaved with invertible 1×1 convolutions and stable additive coupling layers. The detailed explanation follows.

III-B1 Squeeze and Unsqueeze Operations

The squeeze operation splits the input image into four sub-images, which are arranged along the channel direction as shown in Fig. 4. The squeeze operation is essential to build the coupling layer, as will become evident soon. The unsqueeze operation then rearranges the separated channels into one image as the inverse of the squeeze operation (see Fig. 4). The unsqueeze operation is applied to the output of the coupling layer, so that the unsqueezed output maintains the same spatial dimension as the input image.

Fig. 4: Squeeze and unsqueeze operations.
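The squeeze/unsqueeze pair can be sketched in a few lines. The particular 2×2 sub-sampling pattern below is an illustrative assumption; any fixed rearrangement with an exact inverse serves the purpose:

```python
import numpy as np

def squeeze(x):
    """Split an HxW image into four (H/2)x(W/2) sub-images stacked
    along a leading channel axis."""
    return np.stack([x[0::2, 0::2], x[0::2, 1::2],
                     x[1::2, 0::2], x[1::2, 1::2]])

def unsqueeze(c):
    """Inverse of squeeze: interleave the four channels back into one image."""
    _, h, w = c.shape
    x = np.empty((2 * h, 2 * w), dtype=c.dtype)
    x[0::2, 0::2], x[0::2, 1::2] = c[0], c[1]
    x[1::2, 0::2], x[1::2, 1::2] = c[2], c[3]
    return x

img = np.arange(16.0).reshape(4, 4)
assert squeeze(img).shape == (4, 2, 2)          # four channel blocks
assert np.array_equal(unsqueeze(squeeze(img)), img)  # exact round trip
```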

III-B2 Invertible 1×1 Convolution

The squeeze operation decomposes the input into four components along the channel dimension. With the resulting fixed channel arrangement, only limited spatial information passes through the neural network. Therefore, random shuffling and reversing of the channel order [12, 13] were proposed. On the other hand, Generative Flow with Invertible 1×1 Convolutions (Glow) [14] proposed an invertible 1×1 convolution with an equal number of input and output channels as a generalization of the permutation operation with learnable parameters.

Mathematically, a 1×1 convolution can be represented by multiplying a matrix $W$ at each spatial location:

$$\mathbf{y}_{i,j} = W\, \mathbf{x}_{i,j}, \qquad (25)$$

which is illustrated in Fig. 5; here $\mathbf{x}_{i,j}$ denotes the channel vector at spatial position $(i, j)$. By multiplying by a fully populated matrix, the channel-wise separated input information can be mixed so that the subsequent operation can be applied more efficiently. The corresponding inversion can then be written as [14]

$$\mathbf{x}_{i,j} = W^{-1}\, \mathbf{y}_{i,j}, \qquad (26)$$

if $W$ is invertible.

Fig. 5: Invertible 1×1 convolution and its inverse.
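As a small numerical sketch of (25)-(26), the channel mixing and its exact inversion can be checked as follows; the tensor sizes and the orthogonal initialization (used in Glow to guarantee invertibility at the start of training) are illustrative choices, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w = 4, 8, 8

# Learnable channel-mixing matrix W, initialized to a rotation so it
# is guaranteed invertible.
W, _ = np.linalg.qr(rng.standard_normal((c, c)))

x = rng.standard_normal((c, h, w))

# 1x1 convolution = multiply every spatial position's channel vector by W.
y = np.einsum('ij,jhw->ihw', W, x)

# Inverse pass: multiply by W^{-1} to recover the input exactly.
x_rec = np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)
assert np.allclose(x_rec, x)
```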

III-B3 Stable Additive Coupling Layer

The coupling layer is the essential component that provides invertibility while also giving the neural network its expressiveness. The additive coupling layer in NICE [12] is based on the even and odd decomposition of the sequence, after which neural networks are applied in an alternating manner.

Tomczak et al. [26] further extended the additive coupling layer to a general coupling layer in which the input image is split into four channel blocks and neural networks are applied at every step. By applying this general invertible transformation, we can handle the separated input more efficiently.

Specifically, the stable coupling layer is given by

$$\begin{aligned} y_1 &= x_1 + \mathcal{F}_1(x_2, x_3, x_4)\\ y_2 &= x_2 + \mathcal{F}_2(y_1, x_3, x_4)\\ y_3 &= x_3 + \mathcal{F}_3(y_1, y_2, x_4)\\ y_4 &= x_4 + \mathcal{F}_4(y_1, y_2, y_3), \end{aligned} \qquad (27)$$

where $\mathcal{F}_1, \ldots, \mathcal{F}_4$ are neural networks and $x_1, \ldots, x_4$ are the four channel blocks. Then, the block inversion can be readily done by

$$\begin{aligned} x_4 &= y_4 - \mathcal{F}_4(y_1, y_2, y_3)\\ x_3 &= y_3 - \mathcal{F}_3(y_1, y_2, x_4)\\ x_2 &= y_2 - \mathcal{F}_2(y_1, x_3, x_4)\\ x_1 &= y_1 - \mathcal{F}_1(x_2, x_3, x_4). \end{aligned} \qquad (28)$$

The additive operation and its inverse operation are shown in Fig. 6.

(a) (b)
Fig. 6: Implementation of (a) a single additive operation and (b) its inverse operation.
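A minimal sketch of such a four-block additive coupling follows. The shared stand-in network `net`, the block sizes, and the block ordering are illustrative assumptions; the point is that the layer inverts exactly by subtraction, without ever inverting the coupling networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def net(*parts):
    # Stand-in for a coupling network F_i: an arbitrary (non-invertible)
    # nonlinear map of the conditioning blocks, same shape as one block.
    return 0.5 * np.tanh(sum(parts))

def couple_forward(x1, x2, x3, x4):
    y1 = x1 + net(x2, x3, x4)
    y2 = x2 + net(y1, x3, x4)
    y3 = x3 + net(y1, y2, x4)
    y4 = x4 + net(y1, y2, y3)
    return y1, y2, y3, y4

def couple_inverse(y1, y2, y3, y4):
    # Undo the steps in reverse order: each block is recovered by
    # subtracting the same network output used in the forward pass.
    x4 = y4 - net(y1, y2, y3)
    x3 = y3 - net(y1, y2, x4)
    x2 = y2 - net(y1, x3, x4)
    x1 = y1 - net(x2, x3, x4)
    return x1, x2, x3, x4

blocks = [rng.standard_normal(5) for _ in range(4)]
rec = couple_inverse(*couple_forward(*blocks))
assert all(np.allclose(b, r) for b, r in zip(blocks, rec))  # exact inversion
```

For brevity a single network is reused for all four steps; in practice each step has its own network, which does not affect invertibility.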

III-B4 Lipschitz Constant Computation

It is easy to see that the Jacobian of the stable coupling layer has unit determinant [12]. In fact, among the aforementioned modules in the invertible network, the only module that does not have a unit determinant is the 1×1 convolution layer. Specifically, the log-determinant of the step (25) is determined by that of $W$ [14]:

$$\log\left|\det \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right| = h \cdot w \cdot \log\left|\det W\right|, \qquad (29)$$

where $h \times w$ is the spatial size of the feature map. Similarly, the Lipschitz constant of the invertible generator can be easily bounded using the matrix norm of $W$.
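The formula (29) can be verified numerically on a tiny feature map, since the full Jacobian of a 1×1 convolution is the block-diagonal matrix obtained by applying $W$ independently at each of the $h \cdot w$ spatial positions (the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w = 3, 2, 2   # tiny sizes so the full Jacobian fits in memory
W = rng.standard_normal((c, c)) + 2 * np.eye(c)  # shifted to avoid singularity

# Full Jacobian of the 1x1 convolution: J = I_{h*w} (kron) W.
J = np.kron(np.eye(h * w), W)

lhs = np.linalg.slogdet(J)[1]                 # log|det J|
rhs = h * w * np.linalg.slogdet(W)[1]         # h * w * log|det W|, eq. (29)
assert np.isclose(lhs, rhs)
```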

III-C Wavelet Residual Learning

Unlike image generation from noise, one of the important observations in image denoising is that the noisy and clean images share structural similarities. Accordingly, rather than learning all components of the images, the authors in [27, 6] proposed a wavelet residual domain learning approach, and we follow the same procedure.

Specifically, as shown in Fig. 7(a), wavelet decomposition separates the high-frequency and low-frequency components; by nulling only the low-frequency (LL) component at the last decomposition level, we obtain wavelet residual images that contain the high-frequency components. Then, as shown in Fig. 7(b), our network is trained using only these high-frequency components. This makes it much easier for the network to handle CT noise, because most CT noise is concentrated in the high frequencies, while the common low-pass content is not processed by the neural network.

(a) (b)
Fig. 7: (a) Generation of wavelet residual image by nulling out the lowest band of the wavelet decomposition. (b) Our network is trained using wavelet residual. Noise pattern is acquired, and then the final image is obtained by subtracting the noise pattern.
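To illustrate the idea, the following sketch builds a wavelet residual with a one-level Haar transform; the paper uses daub3 wavelets with 6 decomposition levels, so the Haar filters and image size here are simplifying assumptions:

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition into LL, LH, HL, HH bands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    return ((a[:, 0::2] + a[:, 1::2]) / 2.0,   # LL
            (a[:, 0::2] - a[:, 1::2]) / 2.0,   # LH
            (d[:, 0::2] + d[:, 1::2]) / 2.0,   # HL
            (d[:, 0::2] - d[:, 1::2]) / 2.0)   # HH

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d (perfect reconstruction)."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

ll, lh, hl, hh = haar2d(img)
zeros = np.zeros_like(ll)
residual = ihaar2d(zeros, lh, hl, hh)   # null the LL band -> wavelet residual
lowpass = ihaar2d(ll, zeros, zeros, zeros)

# By linearity, the residual and the low-pass image sum back to the original.
assert np.allclose(residual + lowpass, img)
```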

IV Method

IV-A Dataset

To verify the denoising performance of our framework, we use two datasets: one for quantitative analysis and the other for qualitative analysis. For quantitative analysis, we use the paired low-dose and standard-dose CT image dataset used in the study by Kang et al. [3]. Specifically, the data are abdominal CT projection data from the AAPM 2016 Low Dose CT Grand Challenge. For qualitative experiments, we use the unpaired 20% dose cardiac multiphase CT scan dataset used in the study by Kang et al. [5]. The details are as follows.

IV-A1 AAPM CT Dataset

The AAPM CT dataset consists of CT images reconstructed from abdominal CT projection data in the AAPM 2016 Low Dose CT Grand Challenge, used in the study by Kang et al. [3]. Data from a total of 10 patients were obtained after approval by the institutional review board of the Mayo Clinic. The CT images were reconstructed using a conventional filtered backprojection algorithm. Poisson noise was inserted into the projection data so that the noise level corresponded to 25% of the standard dose. Because the low-dose CT images were simulated from standard-dose CT images, the dataset is paired. For training, every value in the dataset is converted into Hounsfield units [HU], and values lower than -1000 HU are truncated to -1000 HU. Then, we divide the data by 4000 to normalize all values to [-1, 1]. We use 3839 CT images to train our network, while the remaining 421 images are used for testing.
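The HU truncation and normalization described above can be sketched as follows; the function name and example values are illustrative, and the result lands in [-1, 1] provided the HU values stay below 4000:

```python
import numpy as np

def preprocess_hu(img_hu, floor=-1000.0, scale=4000.0):
    """Truncate values below -1000 HU, then divide by 4000
    (the AAPM-dataset settings described above)."""
    return np.clip(img_hu, floor, None) / scale

slice_hu = np.array([[-2000.0, -1000.0],
                     [0.0, 3000.0]])       # toy 2x2 "slice" in HU
out = preprocess_hu(slice_hu)
assert out.min() >= -1.0 and out.max() <= 1.0
```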

IV-A2 20% Dose Multiphase Cardiac CT Scan Dataset

The 20% dose cardiac multiphase CT scan dataset was acquired from 50 CT scans of mitral valve prolapse patients and 50 CT scans of coronary artery disease patients. The dataset was collected at the University of Ulsan College of Medicine and used in the studies by Kang et al. [5] and Gu et al. [6]. Detailed information about the CT scan protocol is described in previous reports [28, 29]. Electrocardiography (ECG)-gated cardiac CT scanning was performed with a second-generation dual-source CT scanner. For the low-dose CT scan, the tube current was reduced to 20% of the standard-dose scan. For training, every value in the dataset is converted into Hounsfield units [HU], and values lower than -1024 HU are truncated to -1024 HU. After that, we divide the data by 4096 to normalize all values to [-1, 1]. We use 4684 CT images to train our network, while the remaining 772 images are used to test our model.

(a) (b)
Fig. 8: Network architecture for (a) in the coupling layer, and (b) PatchGAN discriminator.
(a) Forward mapping (b) Inverse mapping
Fig. 9: (a) Denoising results of AAPM data: (i) input low-dose image, and denoising results from (ii) the AdaIN-based tunable CycleGAN, (iii) the proposed CycleGAN with an invertible generator, and (iv) target standard-dose image. (v)(vi) Difference images between (i) and the results (ii)(iii), respectively. (b) Synthetic noise generation results: (i) input standard-dose image, and synthetic low-dose image from (ii) the AdaIN-based tunable CycleGAN, (iii) the proposed CycleGAN with an invertible generator, and (iv) the target low-dose image. The intensity window of CT image is (-1000, 1000) [HU] and the intensity window of difference is (-200, 200) [HU].

IV-B Implementation Details

The invertible generator is constructed as shown in Fig. 3. To extract the wavelet residual, we use daub3 wavelets, and the level of wavelet decomposition was set to 6 for all datasets.

The architecture of the neural network in the coupling layer (see (27)) is shown in Fig. 8(a). Basically, the architecture is composed of three convolution layers with spectral normalization [30, 31], followed by a multi-channel-input, single-channel-output convolution. The first and last convolution layers use a 3×3 kernel with a stride of 1, and the second convolution layer uses a 1×1 kernel with a stride of 1. The latent feature map has 256 channels. Zero-padding is applied to the first and last convolution layers so that, at each stage, the height and width of the feature map are equal to those of the previous feature map.

The discriminator is constructed based on a PatchGAN architecture [32]. The overall architecture, shown in Fig. 8(b), is a PatchGAN discriminator composed of four convolution layers rather than five. The first two convolution layers use a stride of 2, and the remaining layers use a stride of 1. Batch normalization is not applied after the first and last convolution layers. Except for the last convolution layer, we apply LeakyReLU with a slope of 0.2 after the batch normalization; at the first convolution layer, which has no batch normalization, LeakyReLU is applied directly after the convolution. The discriminator loss is calculated with the LSGAN loss [33].

For all datasets, the network was trained using the ADAM optimizer [34] with a mini-batch size of 1. The learning rate was halved every 50,000 iterations, and we trained the network for 150,000 iterations on an NVIDIA GeForce RTX 2080 Ti. Our code was implemented with PyTorch v1.6.0 and CUDA 10.1.

Fig. 10: Denoising results for AAPM data using the proposed method. (a) Input low-dose CT images (i-iii), and (b) denoising results from the proposed cycle-free CycleGAN (i-iii). The intensity window of CT image is (-1000, 1000) [HU] and the intensity window of difference between input and output image is (-200, 200) [HU].
(a) Forward mapping (b) Inverse mapping
Fig. 11: (a) Denoising results of 20% dose CT data. (i) Input low-dose image, denoising result from (ii) the AdaIN-based tunable CycleGAN, (iii) the proposed CycleGAN with an invertible generator, and (iv) target standard-dose image. (v-vi) Difference between (i) and the results (ii-iii). (b) Synthetic noise generation results: (i) input standard-dose image, and synthetic low-dose image from (ii) the AdaIN-based tunable CycleGAN, (iii) the proposed CycleGAN with an invertible generator, and (iv) the target low-dose image. Note that the target is not perfectly aligned with the input, since there are no perfectly aligned high-dose images in in vivo experiments. The intensity window of CT image is (-1024, 1024) [HU] and the intensity window of difference is (-200, 200) [HU]

IV-C Quantitative Metrics

For quantitative analysis, we use the peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM) [35]. The PSNR is defined as follows:

$$\mathrm{PSNR}(x, y) = 10 \log_{10}\left(\frac{\mathrm{MAX}_y^2}{\frac{1}{N}\left\| x - y \right\|_2^2}\right), \qquad (30)$$

where $x$ is the input image, $y$ is the target image with $N$ pixels, and $\mathrm{MAX}_y$ is the maximum possible pixel value of the image $y$.

The SSIM is defined as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \qquad (31)$$

where $\mu_x$ is the average of image $x$, $\sigma_x^2$ is its variance, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ with $k_1 = 0.01$, $k_2 = 0.03$, and $L$ the dynamic range, as in the original paper [35].
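As a sketch, the PSNR in (30) can be implemented as follows; the `max_val` default is an assumption, since in practice the dynamic range is fixed by the imaging protocol:

```python
import numpy as np

def psnr(x, y, max_val=None):
    """Global PSNR of image x against target y, as in eq. (30)."""
    if max_val is None:
        max_val = y.max()       # assumed default: peak of the target image
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
target = rng.random((64, 64))
noisy = target + 0.01 * rng.standard_normal((64, 64))

# A larger deviation from the target must yield a lower PSNR.
assert psnr(noisy, target) > psnr(noisy + 0.05, target)
```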

IV-D Comparative Methods

We compared our method with existing unsupervised LDCT denoising networks [9, 6]. For the AAPM dataset, we compared our network with the conventional CycleGAN [9], whose generator is based on the U-Net [36] architecture. We also compared with the AdaIN-based tunable CycleGAN [6], which shows state-of-the-art performance for LDCT denoising. For the unpaired 20% dose CT scan dataset, we compare our method with the AdaIN-based tunable CycleGAN.

For the training of the conventional CycleGAN, the images are cropped into patches, the network is trained for 200 epochs, and the other training settings are the same as for the proposed network. For the AdaIN-based tunable CycleGAN, the same patch size is used, the network is trained for 200 epochs, and the other training settings are likewise the same as for the proposed network. For both comparative methods, we used a PatchGAN discriminator consisting of five convolution layers.

V Experimental Results

V-A AAPM CT Dataset

For the AAPM CT dataset, we first compare the noise reduction performance quantitatively with the conventional CycleGAN and AdaIN-based CycleGAN based on PSNR and SSIM. As shown in Table I, our network shows the highest PSNR and comparable SSIM values.

Network PSNR SSIM
LDCT input 30.468 0.695
Conventional CycleGAN 34.621 0.818
AdaIN-based CycleGAN 34.801 0.824
Proposed 34.940 0.821
TABLE I: Quantitative results for AAPM dataset
Network                  Component               # of Parameters
Conventional CycleGAN    Generator (forward)     6,251,392
                         Generator (inverse)     6,251,392
                         Discriminator (SDCT)    2,766,209
                         Discriminator (LDCT)    2,766,209
                         Total                   18,035,202
AdaIN CycleGAN [6]       Generator               5,900,865
                         AdaIN code generator    274,560
                         Discriminator (SDCT)    2,766,209
                         Discriminator (LDCT)    2,766,209
                         Total                   11,707,843
Proposed                 Generator               1,204,320
                         Discriminator           662,401
                         Total                   1,866,721
TABLE II: Comparison of the network complexity in terms of trainable parameters
Invertible block component            PSNR      SSIM
Coupling layer       1×1 conv
                                      29.931    0.691
                                      30.706    0.704
                                      34.940    0.821
TABLE III: Ablation study for invertible block components

Fig. 9(a) shows representative denoising results from the various methods. The resulting images are cropped to more accurately visualize the denoising performance. The intensity window of the CT images shown is (-1000, 1000) [HU], and that of the difference images is (-200, 200) [HU]. Our CycleGAN with an invertible generator removes more noise components than the AdaIN-based CycleGAN without losing any information. As can be seen in Fig. 9(a), the proposed network (Fig. 9(a-iii)) removes noise components around high-intensity metals more evenly than the AdaIN-based CycleGAN (Fig. 9(a-ii)).

To verify that the invertible generator can also properly perform the inverse mapping, we provide inversely mapped outputs from SDCT images, as shown in Fig. 9(b). The resulting images are cropped in order to more accurately visualize the image quality. The intensity window of the CT images shown is (-1000, 1000) [HU]. Even though the proposed method does not apply any discriminator or loss for the inverse mapping, it adds a reasonable level of noise to the SDCT, which makes the output appear closer to the real LDCT than that of the AdaIN-based CycleGAN.

Fig. 10 shows three representative denoising results from the proposed CycleGAN with an invertible generator to qualitatively verify the noise reduction performance. The gray boxes in the input low-dose CT images and the denoising results are enlarged in order to more accurately visualize the noise reduction, and the differences from the input are also visualized. The difference images clearly show the removed noise components. As can be seen in Fig. 10, the proposed method removes noise components evenly without any loss of structural information. Therefore, bone and each soft tissue are distinguished more clearly.

V-B 20% Dose Cardiac CT Scan Dataset

This dataset does not have paired reference data, so a quantitative comparison using PSNR and SSIM is not possible. Therefore, we qualitatively compared the denoising performance. The intensity window of the CT images is (-1024, 1024) [HU], whereas the difference images are shown with a window of (-200, 200) [HU].

Fig. 11(a) shows denoising results from the various methods. Note that the target is not perfectly aligned with the input, since there are no perfectly aligned high-dose images in in vivo experiments. Still, visual inspection and the difference images from the input show that our cycle-free CycleGAN with an invertible generator removes various noise components more uniformly than the AdaIN-based CycleGAN without incurring any structural distortion. In Fig. 11(b), SDCT images are successfully converted to noisy images. Even though the proposed method does not apply any discriminator or loss for the inverse mapping, our method adds a proper noise level to the SDCT.

Also, Fig. 12 shows three representative denoising results by the proposed cycle-free CycleGAN with an invertible generator. The gray boxes in the low-dose inputs and denoised outputs are enlarged. The proposed method properly removes the noise components from the input low-dose CT images, so that each soft tissue in the denoised images is clearly distinguished.

VI Discussion

As shown in Table II, our network uses only 10% of the parameters of the conventional CycleGAN and 15% of those of the AdaIN-based CycleGAN. This is because, thanks to invertibility, we use a single generator and a single discriminator. In addition, the networks in the stable coupling layers are relatively light, and accordingly the discriminator also requires relatively few parameters.

Thanks to its efficient parameter and memory requirements, the cycle-free CycleGAN also reduces training time. When training with 256×256 input images, the cycle-free CycleGAN runs at 12.2 iterations per second, whereas the comparative AdaIN-based tunable CycleGAN runs at 6.8 iterations per second. Accordingly, training is roughly twice as fast as the AdaIN-based CycleGAN.
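The claimed factor of two follows directly from the reported throughputs:

```python
# Relative training speed from the reported throughputs (iterations/s).
ours, baseline = 12.2, 6.8
speedup = ours / baseline
print(f"{speedup:.2f}x")   # about 1.79x, i.e. roughly twice as fast
```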

To investigate the optimality of our network architecture, we performed ablation studies. In particular, we investigated the effect of the invertible 1×1 convolution and the stable coupling layer, as these are the critical parts of the network design. As shown in Table III, both modules are critical; in particular, the results without the 1×1 convolution layers show inferior performance compared to those with the invertible 1×1 convolution.
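An invertible 1×1 convolution, in the sense of Glow [14], is simply a learned channel-mixing matrix applied at every spatial position; because the matrix is square and invertible, the operation can be undone exactly. A minimal NumPy sketch of the forward and inverse passes (a random matrix stands in for the learned weights; Glow additionally parameterizes W via an LU decomposition for a cheap log-determinant, which we omit here):

```python
import numpy as np

rng = np.random.default_rng(0)
C = 4                                   # number of channels
W = rng.standard_normal((C, C))         # random square matrix is almost surely invertible
W_inv = np.linalg.inv(W)

x = rng.standard_normal((C, 16))        # C channels, 16 spatial positions
y = W @ x                               # forward: mix channels at every position
x_rec = W_inv @ y                       # inverse: unmix with W^{-1}
print(np.allclose(x, x_rec))            # True
```

This channel permutation/mixing role is what the ablation removes; without it, the coupling layers always split the channels the same way, which is consistent with the inferior performance reported in Table III.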

Fig. 12: Denoising results for 20% dose CT data using the proposed method. (a) Input low-dose CT images (i-iii), and (b) denoising results by the proposed cycle-free CycleGAN with an invertible generator (i-iii). The intensity window of the CT images is (-1024, 1024) [HU] and that of the difference images is (-200, 200) [HU].

VII Conclusion

In this paper, we proposed a cycle-free CycleGAN architecture with an invertible generator. Thanks to the invertibility, only a single generator and a single discriminator are necessary, which significantly reduces complexity. Although the number of trainable parameters is only 10% of that of the conventional CycleGAN and 15% of that of the AdaIN-based CycleGAN, extensive experimental results confirmed that the proposed method achieves better low-dose CT denoising performance with this significantly reduced parameter count.

VIII Acknowledgement

This work was supported by the National Research Foundation (NRF) of Korea grant NRF-2020R1A2B5B03001980. The authors would like to thank Dr. Dong Hyun Yang from the University of Ulsan College of Medicine for providing the multiphase cardiac CT scan dataset. The authors also thank the Mayo Clinic, the American Association of Physicists in Medicine (AAPM), and the National Institute of Biomedical Imaging and Bioengineering for providing the Low-Dose CT Grand Challenge dataset.

The derivation of the dual formula is a simple modification of the technique in [10]. Consider the primal OT problem

min_{π ∈ Π(μ, ν)} ∫ c(x, y) dπ(x, y),

where Π(μ, ν) refers to the set of joint distributions whose marginals are μ and ν, and c(x, y) denotes the transportation cost. We can easily show that the cost can be rewritten in terms of the quantity defined in (19).

We now define π* as the optimal joint measure for the primal problem. Using the Kantorovich dual formulation, we obtain the two equalities (32) and (33).

Using the 1-Lipschitz continuity of the Kantorovich potential in (32), and likewise of the potential in (33), we obtain two lower bounds; taking their average yields the lower bound in terms of the loss defined in (18). For the upper bound, instead of finding the maximizing potential in (32), we choose a specific candidate, and similarly in (33); taking the average of the two upper bounds yields the bound in terms of the loss defined in (17). The remaining part of the proof of the dual formula is a simple repetition of the techniques in [10].
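For reference, the classical Kantorovich–Rubinstein duality, which underlies the 1-Lipschitz arguments above (see, e.g., [19], [20]), can be stated as:

```latex
W_1(\mu, \nu)
\;=\; \min_{\pi \in \Pi(\mu,\nu)} \int \|x - y\| \, d\pi(x, y)
\;=\; \max_{\|\varphi\|_{\mathrm{Lip}} \le 1} \int \varphi \, d\mu - \int \varphi \, d\nu ,
```

where the maximum is taken over 1-Lipschitz potentials φ. The derivation above applies this duality separately in each direction of the translation and averages the resulting bounds.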

References

  • [1] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE transactions on medical imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
  • [2] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
  • [3] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” Medical physics, vol. 44, no. 10, pp. e360–e375, 2017.
  • [4] E. Kang, W. Chang, J. Yoo, and J. C. Ye, “Deep convolutional framelet denosing for low-dose CT via wavelet residual network,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1358–1369, 2018.
  • [5] E. Kang, H. J. Koo, D. H. Yang, J. B. Seo, and J. C. Ye, “Cycle-consistent adversarial denoising network for multiphase coronary CT angiography,” Medical physics, vol. 46, no. 2, pp. 550–562, 2019.
  • [6] J. Gu and J. C. Ye, “AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising,” IEEE Transactions on Computational Imaging, vol. 7, pp. 73–85, 2021.
  • [7] C. You, G. Li, Y. Zhang, X. Zhang, H. Shan, M. Li, S. Ju, Z. Zhao, Z. Zhang, W. Cong et al., “CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE),” IEEE transactions on medical imaging, vol. 39, no. 1, pp. 188–203, 2019.
  • [8] K. Kim, S. Soltanayev, and S. Y. Chun, “Unsupervised training of denoisers for low-dose CT reconstruction without full-dose ground truth,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1112–1125, 2020.
  • [9] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
  • [10] B. Sim, G. Oh, J. Kim, C. Jung, and J. C. Ye, “Optimal transport driven CycleGAN for unsupervised learning in inverse problems,” SIAM Journal on Imaging Sciences, vol. 13, no. 4, pp. 2281–2306, 2020.
  • [11] X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501–1510.
  • [12] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.
  • [13] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using Real NVP,” arXiv preprint arXiv:1605.08803, 2016.
  • [14] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in NeurIPS, 2018.
  • [15] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
  • [16] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning.   PMLR, 2015, pp. 1530–1538.
  • [17] J. Su and G. Wu, “f-VAEs: Improve VAEs with conditional flows,” arXiv preprint arXiv:1809.05861, 2018.
  • [18] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference.   Now Publishers Inc, 2008.
  • [19] C. Villani, Optimal transport: old and new.   Springer Science & Business Media, 2008, vol. 338.
  • [20] G. Peyré, M. Cuturi et al., “Computational optimal transport,” Foundations and Trends® in Machine Learning, vol. 11, no. 5-6, pp. 355–607, 2019.
  • [21] S. Lim, H. Park, S.-E. Lee, S. Chang, B. Sim, and J. C. Ye, “CycleGAN with a blur kernel for deconvolution microscopy: Optimal transport geometry,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1127–1138, 2020.
  • [22] G. Oh, B. Sim, H. Chung, L. Sunwoo, and J. C. Ye, “Unpaired deep learning for accelerated MRI using optimal transport driven CycleGAN,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1285–1296, 2020.
  • [23] E. Cha, H. Chung, E. Y. Kim, and J. C. Ye, “Unpaired training of deep learning tMRA for flexible spatio-temporal resolution,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 166–179, 2021.
  • [24] H. Chung, E. Cha, L. Sunwoo, and J. C. Ye, “Two-stage deep learning for accelerated 3D time-of-flight MRA without matched training data,” Medical Image Analysis, p. 102047, 2021.
  • [25] J. Lee, J. Gu, and J. C. Ye, “Unsupervised CT metal artifact learning using attention-guided beta-CycleGAN,” arXiv preprint arXiv:2007.03480, 2020.
  • [26] J. M. Tomczak, “General invertible transformations for flow-based generative modeling,” arXiv preprint arXiv:2011.15056, 2020.
  • [27] J. Song, J.-H. Jeong, D.-S. Park, H.-H. Kim, D.-C. Seo, and J. C. Ye, “Unsupervised denoising for satellite imagery using wavelet directional CycleGAN,” IEEE Transactions on Geoscience and Remote Sensing, 2020.
  • [28] H. J. Koo, D. H. Yang, S. Y. Oh, J.-W. Kang, D.-H. Kim, J.-K. Song, J. W. Lee, C. H. Chung, and T.-H. Lim, “Demonstration of mitral valve prolapse with CT for planning of mitral valve repair,” Radiographics, vol. 34, no. 6, pp. 1537–1552, 2014.
  • [29] D. H. Yang, Y.-H. Kim, J.-H. Roh, J.-W. Kang, D. Han, J. Jung, N. Kim, J. B. Lee, J.-M. Ahn, J.-Y. Lee et al., “Stress myocardial perfusion CT in patients suspected of having coronary artery disease: visual and quantitative analysis—validation by using fractional flow reserve,” Radiology, vol. 276, no. 3, pp. 715–723, 2015.
  • [30] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in International Conference on Learning Representations, 2018.
  • [31] J. Behrmann, P. Vicol, K.-C. Wang, R. Grosse, and J.-H. Jacobsen, “Understanding and mitigating exploding inverses in invertible neural networks,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2021, pp. 1792–1800.
  • [32] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125–1134.
  • [33] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2794–2802.
  • [34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [36] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention.   Springer, 2015, pp. 234–241.