DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net Based Discriminators for Low-Dose CT Denoising

08/24/2021 ∙ Zhizhong Huang, et al. ∙ Fudan University, Sichuan University

Low-dose CT (LDCT) has drawn major attention in the medical imaging field due to the potential health risks of CT-associated X-ray radiation to patients. Reducing the radiation dose, however, decreases the quality of the reconstructed images, which consequently compromises the diagnostic performance. Various deep learning techniques have been introduced to improve the image quality of LDCT images through denoising. GAN-based denoising methods usually leverage an additional classification network, i.e., a discriminator, to learn the most discriminative difference between the denoised and normal-dose images and, hence, regularize the denoising model accordingly; such a discriminator often focuses on either the global structure or the local details. To better regularize the LDCT denoising model, this paper proposes a novel method, termed DU-GAN, which leverages U-Net based discriminators in the GANs framework to learn both global and local differences between the denoised and normal-dose images in both the image and gradient domains. The merit of such a U-Net based discriminator is that it can provide not only per-pixel feedback to the denoising network through the outputs of the U-Net but also a focus on the global structure at a semantic level through the middle layer of the U-Net. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edges of the denoised CT images. Furthermore, the CutMix technique enables the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map to visualize the uncertainty of the denoised results, facilitating LDCT-based screening and diagnosis. Extensive experiments on simulated and real-world datasets demonstrate superior performance over recently published methods both qualitatively and quantitatively.


I. Introduction

Computed tomography (CT) provides cross-sectional images of the internal body using X-ray radiation and is one of the most important imaging modalities in clinical diagnosis. Although CT plays an essential role in diagnosing diseases, its widespread use raises growing public concerns about safety, since CT-related X-ray radiation may damage human health and induce cancers. Consequently, keeping the radiation dose of CT as low as reasonably achievable (ALARA) has been a well-accepted principle in CT-related research over the past decades [1]. Reducing the radiation dose, however, inevitably introduces noise and artifacts into the reconstructed images, severely compromising the subsequent diagnosis and other tasks such as low-dose CT (LDCT)-based lung nodule classification [2].

A straightforward way to address this issue is to reduce the noise in the LDCT image [3, 4]. However, denoising remains a challenging problem due to its ill-posed nature. In recent years, various deep learning based methods have been proposed for LDCT denoising [5, 6, 7, 8, 9, 10, 11], achieving impressive results. There are two key components in designing a denoising model: the network architecture and the loss function; the former determines the capacity of the denoising model while the latter controls how the denoised images look [7]. Although different network architectures such as 2D convolutional neural networks (CNNs) [5], 3D CNNs [7, 10], and residual encoder-decoder CNNs (RED-CNN) [6] have been explored for LDCT denoising, the literature has shown that the loss function is relatively more important than the network architecture as it has a direct impact on the image quality [7, 13].

One of the most popular loss functions is the mean squared error (MSE), which computes the average of the squared per-pixel errors between the denoised and normal-dose images. Although it yields impressive performance in terms of peak signal-to-noise ratio (PSNR), MSE usually leads to over-smoothed images and has been proven to poorly correlate with human perception of image quality [14, 15]. In view of this observation, alternative loss functions such as perceptual loss and adversarial loss have been investigated for LDCT denoising. Among them, the adversarial loss has been shown to be a powerful one as it can dynamically measure the similarity between the denoised and normal-dose images during training, which enables the denoised images to preserve more texture information from the normal-dose ones. The computation of the adversarial loss is based on the discriminator, a classification network that learns a representation differentiating the denoised images from the normal-dose images; it can measure the most discriminative difference at either a global or a local level, depending on whether one output unit of the discriminator corresponds to the whole image or to a local region. Such a discriminator is prone to forgetting previously learned differences because the distribution of synthetic samples shifts as the generator constantly changes through training, failing to maintain a powerful data representation that characterizes the global and local image differences [16]. As a result, it often produces generated images with discontinuous and mottled local structures [17] or images with incoherent geometric and structural patterns [18]. In addition to noise, LDCT images may contain severe streak artifacts caused by photon starvation, which may not be effectively removed through a loss function defined solely in the image domain.

To learn a powerful data representation that regularizes the denoising model in adversarial training, we propose a U-Net based discriminator in the GANs framework for LDCT denoising, termed DU-GAN, which can simultaneously learn the global and local differences between the denoised and normal-dose images in the image and gradient domains. More specifically, our proposed discriminator follows the U-Net architecture, including an encoder and a decoder network, where the encoder encodes the input into a scalar value focusing on the global structures while the decoder reconstructs a per-pixel confidence map capturing the changes of local details between the denoised and normal-dose images. In doing so, it can provide not only per-pixel feedback but also the global structural difference to the denoising network. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edges of the denoised images. Moreover, the CutMix data augmentation technique between the denoised and normal-dose images is introduced to regularize the encoder and decoder of the U-Net independently, enabling the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map that visualizes the uncertainty of the denoised results, facilitating radiologists' screening and diagnosis when using the denoised LDCT images.

The benefits of the proposed DU-GAN are as follows.

  1. Unlike existing GAN-based denoising methods that use a classification network as the discriminator, the proposed DU-GAN utilizes a U-Net based discriminator for LDCT denoising, which can simultaneously learn global and local differences between the denoised and normal-dose images. Consequently, it can provide not only per-pixel feedback but also the global structural difference to the denoising model.

  2. In addition to adversarial training in the image domain, the proposed DU-GAN also performs adversarial training in the image gradient domain, which can alleviate the streak artifacts caused by photon starvation and enhance the edges of the denoised images.

  3. The proposed DU-GAN can provide radiologists with a confidence map visualizing the uncertainty of the denoised results through the CutMix technique, which could facilitate radiologists’ screening and diagnosis when using the denoised LDCT images.

  4. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of the proposed method through both qualitative and quantitative comparisons.

The remainder of this paper is organized as follows. Section II briefly surveys the development of LDCT denoising methods and generative adversarial networks. Section III presents our LDCT denoising framework, DU-GAN, with dual-domain U-Net based discriminators, and then introduces the CutMix regularization technique as well as the network architectures and loss functions in our framework. Section IV provides both qualitative and quantitative comparisons with state-of-the-art methods on simulated and real-world datasets. Finally, we conclude this paper in Section V.

II. Related Work

This section briefly surveys the development of LDCT denoising and generative adversarial networks.

II-A LDCT Denoising

The noise reduction algorithms for LDCT can be grouped into three categories: 1) sinogram filtration; 2) iterative reconstruction; and 3) image post-processing. Different from routine CT, LDCT acquires noisy sinogram data from the scanner. A straightforward solution is to perform denoising on the sinogram data before image reconstruction, i.e., sinogram filtration-based methods [19, 20, 21]. Iterative reconstruction methods combine the statistics of the raw data in the sinogram domain [22, 23] and prior information in the image domain such as total variation [24] and dictionary learning [25]; these pieces of generic information can be effectively integrated in the maximum likelihood and compressed sensing frameworks. These two categories, however, require access to raw data, which are typically unavailable from commercial CT scanners.

Fig. 1: Overall framework of our proposed DU-GAN. The generator produces denoised LDCT images, and two independent branches with U-Net based discriminators operate in the image and gradient domains. The U-Net based discriminator provides both global structure and local per-pixel feedback to the generator. Furthermore, the image discriminator encourages the generator to produce photo-realistic CT images while the gradient discriminator encourages better edges and alleviates streak artifacts caused by photon starvation.

Different from the previous two categories, image post-processing methods directly operate on the reconstructed images, which are publicly available after removing patient privacy. Traditional methods such as non-local means [26] and block-matching 3D [27], however, lead to the loss of some critical structural details and result in over-smoothed denoised LDCT images. The rapid development of deep learning techniques has advanced many medical applications. In LDCT denoising, deep-learning-based models have achieved impressive results [5, 7, 9, 10, 12, 28]. There are two critical components in designing a deep-learning-based denoising model: the network architecture and the loss function; the former determines the capacity of a denoising model while the latter controls how the denoised images look. Although several network architectures have been proposed for LDCT denoising, such as 2D CNNs [5], 3D CNNs [7, 10], RED-CNN [6], and cascaded CNNs [12], the literature has shown that the loss function plays a relatively more important role than the network architecture as it has a direct impact on the image quality [7, 13]. The simplest loss function is the MSE, which has been shown to poorly correlate with human perception of image quality [14, 15]. In view of this observation, alternative loss functions such as perceptual loss, adversarial loss, or mixed loss functions have been investigated for LDCT denoising. Among them, the adversarial loss has been shown to be a powerful one as it can dynamically measure the similarity between the denoised and normal-dose images during training, which enables the denoised images to preserve more texture information from the normal-dose ones. The adversarial loss reflects either global or local similarity, depending on the design of the discriminator.

Unlike the conventional adversarial loss, the adversarial loss used in this study is based on a U-Net based discriminator, which can simultaneously characterize the global and local differences between the denoised and normal-dose images, better regularizing the denoising model. In addition to the adversarial loss in the image domain, the adversarial loss in the image gradient domain proposed in this paper can alleviate the streak artifacts caused by photon starvation and enhance the edges of the denoised images.

II-B Generative Adversarial Networks (GANs)

As one of the most active research topics in recent years, GANs [14] and their variants have been successfully applied to various tasks [29, 30, 31]. They typically consist of two networks: 1) a generator that learns to capture the data distribution of the training data and produce new samples indistinguishable from real ones, and 2) a discriminator that attempts to distinguish real samples from fake ones produced by the generator. These two networks are trained alternately until a balance is reached. In the context of LDCT denoising, the generator aims to produce photo-realistic denoised results to fool the discriminator while the discriminator tries to distinguish real normal-dose CT (NDCT) images from denoised ones. To improve the stability of training GANs, various variants have been proposed, such as Wasserstein GAN (WGAN) [32], WGAN with gradient penalty (WGAN-GP) [33], and least-squares GANs [34].

In this paper, we adopt the least-squares GANs [34], spectral normalization [35], and U-Net based discriminator [16] to form the GANs framework for LDCT denoising. As a significant difference, our DU-GAN performs adversarial training in both image and gradient domains, which can reduce noise and alleviate streak artifacts simultaneously. We note that the proposed DU-GAN is also suitable for other variants of GAN such as WGAN and WGAN-GP.

III. Methodology

Fig. 1 presents the proposed DU-GAN for LDCT denoising, which contains a denoising model as the generator and two U-Net based discriminators in the image and gradient domains. We highlight that the U-Net based discriminator is able to learn the global and local differences between denoised and normal-dose images. Next, we present all components, the network architectures, and the loss functions in detail.

III-A The Denoising Process

The denoising process learns a generative model $G$ that maps an LDCT image $x$ to its normal-dose CT (NDCT) counterpart $y$ by removing the noise in the LDCT image. Formally, it can be written as:

$\hat{y} = G(x),$   (1)

where $\hat{y}$ denotes the denoised LDCT image. LDCT denoising can thus be seen as a specific image translation problem. Therefore, GANs-based methods [7, 8, 9, 36] utilize GANs to improve the visual quality of denoised LDCT images thanks to the strong capability of GANs in generating high-quality images. Different from conventional GANs that take a noise vector as input to generate an image, our denoising model serves as the generator and takes only the LDCT image as input. In this study, we used RED-CNN [6] as the denoising model to demonstrate the effectiveness of the dual-domain U-Net based discriminators in the adversarial training.

Fig. 2: The difference between a) the traditional classification discriminator and b) the U-Net based discriminator. The U-Net based discriminator extends the traditional one to capture global and local information simultaneously.

III-B Dual-Domain U-Net Based Discriminator

The GANs-based methods [7, 8, 9, 36] for LDCT denoising usually maintain the GANs competition at the structural level, where the discriminator progressively downsamples the input into a scalar value and is trained with Wasserstein GANs [32, 33], as shown in Fig. 2(a). However, such a discriminator is prone to forgetting previous samples because the distribution of synthetic samples shifts as the generator constantly changes during training, failing to maintain a powerful data representation that characterizes the global and local image differences [37].

To address the problems above, we introduce the U-Net based discriminators in both image and gradient domains.

III-B1 U-Net based discriminator in the image domain

To learn a powerful data representation that can characterize both global and local differences, we design an LDCT denoising framework based on GANs with a U-Net based discriminator. Traditionally, U-Net contains an encoder, a decoder, and several skip connections copying the feature maps from the encoder to the decoder to preserve high-resolution features; it has demonstrated state-of-the-art performance in many semantic segmentation tasks [38, 39] and image translation tasks [16, 30]. In the context of LDCT denoising, we highlight that U-Net and its variants have only been used as the denoising model and have not been explored as the discriminator. We adapt the U-Net to replace the standard classification discriminator in GANs, yielding a U-Net style discriminator that maintains both global and local data representations. Fig. 2(b) details the architecture of the U-Net based discriminator.

Fig. 3: Gradients of horizontal and vertical directions from a pair of LDCT and NDCT images. The streak artifacts can be easily captured in the gradient domain.

Here, we use $D_{img}$ to denote the U-Net based discriminator in the image domain. The encoder of $D_{img}$, denoted $D_{img}^{enc}$, follows the traditional discriminator and progressively downsamples the input using several convolutional layers, capturing the global structural context. On the other hand, the decoder $D_{img}^{dec}$ performs progressive upsampling with skip connections from the encoder in reverse order, further enhancing the ability of the discriminator to capture the local details of real and fake samples. Furthermore, the discriminator loss is computed from the outputs of both $D_{img}^{enc}$ and $D_{img}^{dec}$, while the traditional discriminator used in previous works [7, 8, 36] only classifies the inputs as real or fake from the encoder. In doing so, the U-Net based discriminator can provide more informative feedback to the generator, including both local per-pixel and global structural information. In this paper, we employ the least-squares GANs [34] rather than conventional GANs [14] for the discriminators to stabilize the training process and improve the visual quality of the denoised LDCT images. Formally, the discriminator loss for $D_{img}$ from both $D_{img}^{enc}$ and $D_{img}^{dec}$ can be written as:

$\mathcal{L}_{D_{img}} = \mathbb{E}_{y}\big[(D_{img}^{enc}(y)-1)^{2} + (D_{img}^{dec}(y)-1)^{2}\big] + \mathbb{E}_{x}\big[(D_{img}^{enc}(G(x)))^{2} + (D_{img}^{dec}(G(x)))^{2}\big],$   (2)

where 1 is the decision boundary of least-squares GANs.
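To make this concrete, the following is a minimal PyTorch sketch of such a least-squares discriminator loss computed over both the encoder and decoder outputs; the discriminator interface (returning a global score and a per-pixel map) and the tensor shapes are our illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def lsgan_d_loss(disc, ndct, denoised):
    """Least-squares discriminator loss using both the encoder (global)
    and decoder (per-pixel) outputs of a U-Net based discriminator.

    Assumes `disc` returns (enc_score, dec_map), where enc_score has
    shape (B, 1) and dec_map has shape (B, 1, H, W)."""
    enc_real, dec_real = disc(ndct)               # real NDCT images
    enc_fake, dec_fake = disc(denoised.detach())  # denoised LDCT, detached from G

    # Real samples are pushed towards the decision boundary 1, fakes towards 0.
    loss_real = F.mse_loss(enc_real, torch.ones_like(enc_real)) + \
                F.mse_loss(dec_real, torch.ones_like(dec_real))
    loss_fake = F.mse_loss(enc_fake, torch.zeros_like(enc_fake)) + \
                F.mse_loss(dec_fake, torch.zeros_like(dec_fake))
    return loss_real + loss_fake
```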

III-B2 U-Net based discriminator in the gradient domain

However, the competition in the image domain alone can only force the generator towards generating photo-realistic denoised LDCT images; it is insufficient to encourage better edges that preserve the pathological changes of the original NDCT images and to alleviate the streak artifacts caused by photon starvation in LDCT. Previous methods such as [9] measure the MSE in the gradient domain, which may be insufficient to enhance the edges as MSE tends to blur images. To this end, we propose to perform an additional GANs competition in the gradient domain; our motivation is presented in Fig. 3. Specifically, the streaks and edges in CT images are highlighted in their horizontal and vertical gradient magnitudes. Therefore, another branch operating on the gradients estimated by a Sobel operator [40] is introduced alongside the image branch, which encourages better edge information and alleviates streak artifacts. Similar to (2), we can define the discriminator loss $\mathcal{L}_{D_{grad}}$ in the gradient domain, where $D_{grad}$ represents the discriminator in the gradient domain.
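For illustration, the snippet below computes horizontal and vertical Sobel gradients of a batch of CT images, which could serve as the input to the gradient-domain discriminator; the kernel values are the standard 3×3 Sobel filters and the helper name is ours.

```python
import torch
import torch.nn.functional as F

def sobel_gradients(img):
    """Horizontal and vertical Sobel gradients of a (B, 1, H, W) image batch."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)   # horizontal gradient
    gy = F.conv2d(img, ky, padding=1)   # vertical gradient
    return torch.cat([gx, gy], dim=1)   # (B, 2, H, W), fed to the gradient-domain branch
```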

III-B3 Dual-domain U-Net based discriminators

Combining the U-Net based discriminators in the image and gradient domains, two independent GANs competitions are maintained during training. The overall framework of our proposed LDCT denoising model is shown in Fig. 1. In detail, the generator denoises an LDCT image, which is then fed into two independent discriminators operating in the image and gradient domains. The discriminator in the image-domain branch pushes the generator towards generating photo-realistic denoised LDCT images while the discriminator in the gradient-domain branch encourages better edges and alleviates streak artifacts caused by photon starvation. Additionally, the discriminator in each branch employs a U-Net based architecture to encourage the generator to focus on both the global structure and the local details, which also boosts the interpretability of the denoising process through the per-pixel confidence maps output by $D_{img}^{dec}$ and $D_{grad}^{dec}$. Finally, the dual-domain U-Net based discriminator loss can be defined as follows:

$\mathcal{L}_{D} = \mathcal{L}_{D_{img}} + \mathcal{L}_{D_{grad}}.$   (3)

III-C CutMix Regularization

The discriminator's ability to recognize the local differences between real and fake samples decreases as training progresses, which may unexpectedly harm the denoising performance. Besides, the discriminator is supposed to focus on structural changes at the global level and local details at the per-pixel level. To address these issues, inspired by [16, 41], we adapt the CutMix augmentation technique to regularize the discriminator, which empowers the discriminator to learn the intrinsic difference between real and fake samples. Specifically, the CutMix technique generates a new training image from two images by cutting patches from one and pasting them onto the other. We define this augmentation in the context of LDCT denoising as follows:

$\tilde{y} = M \odot y + (1 - M) \odot \hat{y},$   (4)

where $M$ is a binary mask controlling how to mix the NDCT image $y$ and the denoised image $\hat{y}$, and $\odot$ represents element-wise multiplication.
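A minimal sketch of the mixing operation in (4), assuming a binary mask tensor broadcast over the channel dimension; the helper name is ours.

```python
def cutmix(ndct, denoised, mask):
    """Mix NDCT and denoised LDCT images according to a binary mask (Eq. 4).

    `mask` has shape (B, 1, H, W) with 1 where NDCT pixels are kept
    and 0 where denoised pixels are pasted in."""
    return mask * ndct + (1.0 - mask) * denoised
```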

Fig. 4: Illustration of the CutMix regularization for the U-Net based discriminator, i.e., $D_{img}$. We randomly sample the ratio and the top-left coordinates of the bounding box to form the mask $M$ controlling where to crop. The decoder $D_{img}^{dec}$ is able to effectively capture the pixel differences between NDCT and denoised LDCT images while the encoder $D_{img}^{enc}$ can predict the mixing ratio. Note that the blue color in the output of $D_{img}^{dec}$ indicates a lower confidence score. Therefore, a well-trained discriminator can provide radiologists with a confidence map showing the uncertainty of the denoised results.

The mixed samples should be regarded as fake samples globally by the encoder since the CutMix operation destroys the global context of the NDCT image; otherwise, CutMix artifacts may be introduced into the denoised LDCT images during the training of GANs, causing undesirable denoising results. Similarly, the decoder $D_{img}^{dec}$ should be able to recognize the mixed area to provide the generator with accurate per-pixel feedback. Therefore, the regularization loss of CutMix can be formulated as:

$\mathcal{L}_{CutMix} = \mathbb{E}\big[(D_{img}^{enc}(\tilde{y}))^{2}\big] + \mathbb{E}\big[\|D_{img}^{dec}(\tilde{y}) - M\|^{2}\big],$   (5)

where the mask $M$ used in CutMix also serves as the per-pixel ground truth for $D_{img}^{dec}$.

Furthermore, to encourage the outputs of the discriminator to be consistent with its per-pixel predictions under the CutMix operation, we introduce another consistency loss following [16] to regularize the discriminator with the CutMix operation, which can be written as:

$\mathcal{L}_{cons} = \big\| D_{img}^{dec}\big(\mathrm{CutMix}(y, \hat{y}, M)\big) - \mathrm{CutMix}\big(D_{img}^{dec}(y), D_{img}^{dec}(\hat{y}), M\big) \big\|_{F}^{2},$   (6)

where $\|\cdot\|_{F}$ represents the Frobenius norm.
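The following sketch illustrates one way this consistency term could be implemented, assuming a callable that returns only the decoder (per-pixel) output of the discriminator; it is an illustration rather than the authors' code.

```python
import torch

def cutmix_consistency_loss(disc_dec, ndct, denoised, mask):
    """Consistency regularization (Eq. 6): the per-pixel output of the
    discriminator on a CutMix-ed image should equal the CutMix of its
    per-pixel outputs on the unmixed images.

    `disc_dec` is assumed to return only the decoder (per-pixel) map."""
    mixed = mask * ndct + (1.0 - mask) * denoised                        # CutMix of the inputs
    target = mask * disc_dec(ndct) + (1.0 - mask) * disc_dec(denoised)   # CutMix of the outputs
    return torch.norm(disc_dec(mixed) - target.detach(), p='fro') ** 2
```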

During training, the binary mask $M$ is generated following the same pipeline as [41, 42]. Specifically, we first sample the combination ratio from a Beta distribution and then uniformly sample the top-left coordinates of the bounding box of the cropped region within the image, preserving the sampled ratio. Similar to [41, 42], we employ an empirically set probability to control whether the CutMix regularization is applied to each mini-batch of samples. Fig. 4 presents the visual results of $D_{img}$ with the CutMix regularization technique. It can be observed that the outputs of $D_{img}^{dec}$ are the spatial combination of the real and generated patches with respect to the real/fake classification score. Therefore, these results demonstrate the strong discriminative capability of the U-Net based discriminator in accurately learning per-pixel differences between real and generated samples, even when they are cut and mixed together to fool the discriminator. Besides learning the per-pixel local details, $D_{img}^{enc}$ can accurately predict the proportion of real patches, i.e., the mixing ratio, as it focuses on the global structures.
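The sketch below shows one plausible way to sample such masks, drawing the cut ratio from a Beta distribution and placing a rectangular box uniformly at random; the Beta parameter and the helper name are illustrative assumptions.

```python
import numpy as np
import torch

def sample_cutmix_mask(batch, height, width, alpha=1.0):
    """Sample binary CutMix masks of shape (batch, 1, height, width).

    The area ratio of the cut region is drawn from Beta(alpha, alpha);
    the top-left corner of the box is sampled uniformly."""
    masks = torch.ones(batch, 1, height, width)
    for b in range(batch):
        ratio = np.random.beta(alpha, alpha)                     # fraction of the image to cut
        cut_h, cut_w = int(height * np.sqrt(ratio)), int(width * np.sqrt(ratio))
        top = np.random.randint(0, height - cut_h + 1)
        left = np.random.randint(0, width - cut_w + 1)
        masks[b, :, top:top + cut_h, left:left + cut_w] = 0.0    # 0 = paste denoised pixels
    return masks
```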

III-D Network Architecture

As described above, our proposed method follows the GANs framework to effectively optimize the generator for LDCT denoising, with the U-Net based discriminator focusing on both global structures and local details, and an extra gradient branch encouraging better boundaries and details. In this subsection, we describe the network architectures of the generator and the U-Net based discriminator.

III-D1 RED-CNN based generator

In this paper, we employ RED-CNN [6] as the generator of our framework for LDCT denoising since this paper mainly focuses on the adversarial loss from the dual-domain U-Net based discriminators. The main difference from [6] is that our framework is optimized in a GANs manner, while the vanilla RED-CNN trained with MSE suffers from over-smoothed LDCT images. Specifically, RED-CNN follows the U-Net architecture but removes the downsampling/upsampling operations to prevent information loss. We stack 10 (de)convolutional layers at both the encoder and decoder, each of which has 32 filters for the sake of computation cost, followed by a ReLU activation function. There are in total 10 residual skip connections. It is important to note that although RED-CNN is adopted as the generator in our framework, the proposed method can also be adapted to other GANs-based methods such as CPCE [7] and WGAN-VGG [8] by only changing the discriminators.
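For concreteness, a simplified sketch of a RED-CNN-style generator along the lines described above (stacked convolutional and deconvolutional layers with 32 filters, ReLU activations, and residual skip connections); the layer count is trimmed for brevity, so it is not the exact architecture of [6].

```python
import torch
import torch.nn as nn

class REDCNNGenerator(nn.Module):
    """Simplified RED-CNN-style denoiser: stacked conv layers in the encoder,
    deconv layers in the decoder, residual skip connections, no down/upsampling."""
    def __init__(self, channels=32, depth=5):
        super().__init__()
        self.encoder = nn.ModuleList(
            [nn.Conv2d(1 if i == 0 else channels, channels, 5, padding=2) for i in range(depth)])
        self.decoder = nn.ModuleList(
            [nn.ConvTranspose2d(channels, 1 if i == depth - 1 else channels, 5, padding=2)
             for i in range(depth)])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        skips, out = [], x
        for conv in self.encoder:
            skips.append(out)                        # residual skip taken before each conv
            out = self.relu(conv(out))
        for i, deconv in enumerate(self.decoder):
            out = self.relu(deconv(out) + skips[-(i + 1)])  # add the matching skip
        return out
```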

III-D2 U-Net based discriminator

As detailed in Section III-B, there are two independent discriminators in the image and gradient domains, each of which follows a U-Net architecture. Specifically, the encoder $D^{enc}$ has 6 downsampling ResBlocks [43] with an increasing number of filters, i.e., 64, 128, 256, 512, 512, and 512. At the bottom of $D^{enc}$, a fully-connected layer is used to output the global confidence score. Similarly, the decoder $D^{dec}$ uses the same number of ResBlocks in reverse order to process the bilinearly upsampled features and the skip residuals of the same resolution, followed by a convolutional layer that outputs the per-pixel confidence map. Most importantly, a spectral normalization layer [35] and a Leaky ReLU activation with a slope of 0.2 for negative inputs follow each convolutional layer of the discriminator except the last one.
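The sketch below illustrates a U-Net based discriminator with a global score head and a per-pixel head; for brevity it uses plain strided convolutions instead of the six ResBlocks described above, so the widths and block structure are illustrative rather than the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch, stride=1):
    """3x3 convolution with spectral normalization, used throughout the discriminator."""
    return spectral_norm(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1))

class UNetDiscriminator(nn.Module):
    """U-Net based discriminator: the encoder yields a global score,
    the decoder yields a per-pixel confidence map."""
    def __init__(self, in_ch=1, widths=(64, 128, 256)):
        super().__init__()
        enc, ch = [], in_ch
        for w in widths:                               # downsampling path
            enc.append(sn_conv(ch, w, stride=2))
            ch = w
        self.encoder = nn.ModuleList(enc)
        self.global_head = nn.Linear(ch, 1)            # global (image-level) score

        dec = []
        for w in reversed(widths[:-1]):                # upsampling path with skips
            dec.append(sn_conv(ch + w, w))
            ch = w
        self.decoder = nn.ModuleList(dec)
        self.pixel_head = nn.Conv2d(ch, 1, 3, padding=1)  # per-pixel confidence map

    def forward(self, x):
        feats = []
        for conv in self.encoder:
            x = F.leaky_relu(conv(x), 0.2)
            feats.append(x)
        global_score = self.global_head(x.mean(dim=(2, 3)))       # (B, 1)

        y = feats[-1]
        for conv, skip in zip(self.decoder, reversed(feats[:-1])):
            y = F.interpolate(y, scale_factor=2, mode='bilinear', align_corners=False)
            y = F.leaky_relu(conv(torch.cat([y, skip], dim=1)), 0.2)
        pixel_map = self.pixel_head(
            F.interpolate(y, scale_factor=2, mode='bilinear', align_corners=False))
        return global_score, pixel_map                             # map at input resolution
```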

III-E Loss Functions

III-E1 Adversarial loss

Here we employ the sum of the losses from these two branches as the adversarial loss, which is defined in the context of least-squares GANs as follows:

$\mathcal{L}_{adv} = \mathbb{E}_{x}\big[(D_{img}^{enc}(G(x)) - 1)^{2} + (D_{img}^{dec}(G(x)) - 1)^{2}\big] + \mathbb{E}_{x}\big[(D_{grad}^{enc}(\nabla G(x)) - 1)^{2} + (D_{grad}^{dec}(\nabla G(x)) - 1)^{2}\big],$   (7)

where $\nabla$ denotes the Sobel operator used to obtain the image gradients.
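A hedged sketch of the generator-side adversarial loss over both branches, reusing the `sobel_gradients` helper sketched earlier; both discriminators are assumed to return a (global score, per-pixel map) pair.

```python
import torch
import torch.nn.functional as F

def lsgan_g_loss(disc_img, disc_grad, denoised):
    """Least-squares adversarial loss for the generator over both branches (Eq. 7)."""
    losses = []
    for disc, inp in ((disc_img, denoised), (disc_grad, sobel_gradients(denoised))):
        enc, dec = disc(inp)
        # Push both the global score and the per-pixel map towards the "real" label 1.
        losses.append(F.mse_loss(enc, torch.ones_like(enc)) +
                      F.mse_loss(dec, torch.ones_like(dec)))
    return sum(losses)
```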

III-E2 Pixel-wise loss

To encourage the generator to output denoised LDCT images that match the NDCT images at both the pixel level and the gradient level, we adopt a pixel-wise loss between the NDCT images and the denoised LDCT images, which includes a pixel loss and a gradient loss for each branch as shown in Fig. 1. The additional gradient loss encourages the generator to better preserve edge information. The two losses can be written as:

$\mathcal{L}_{pix} = \mathbb{E}_{x,y}\big[\|G(x) - y\|_{2}^{2}\big],$   (8)
$\mathcal{L}_{grad} = \mathbb{E}_{x,y}\big[\|\nabla G(x) - \nabla y\|_{1}\big].$   (9)

Note that we employ the mean squared error at the pixel level rather than at the feature level of a pretrained model [7, 8] for the sake of computation cost, and the mean absolute error at the gradient level as the gradients are much sparser than the pixels.
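A compact sketch of these two terms, again reusing the `sobel_gradients` helper from above; it assumes inputs normalized to the same range.

```python
import torch.nn.functional as F

def pixel_and_gradient_losses(denoised, ndct):
    """Pixel-wise losses of Eqs. (8)-(9): MSE in the image domain and
    mean absolute error in the (Sobel) gradient domain."""
    pix_loss = F.mse_loss(denoised, ndct)
    grad_loss = F.l1_loss(sobel_gradients(denoised), sobel_gradients(ndct))
    return pix_loss, grad_loss
```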

III-E3 Final loss

To encourage the generator to generate photo-realistic denoised LDCT images with better edge information and fewer streak artifacts, the final loss function used to optimize the generator is expressed as:

$\mathcal{L}_{G} = \mathcal{L}_{adv} + \lambda_{pix}\mathcal{L}_{pix} + \lambda_{grad}\mathcal{L}_{grad},$   (10)

where $\lambda_{pix}$ and $\lambda_{grad}$ control the trade-off among the different loss components.

The discriminators $D_{img}$ and $D_{grad}$ are optimized by minimizing the following mixed loss:

$\mathcal{L}_{D_{img/grad}}^{total} = \mathcal{L}_{D_{img/grad}} + \mathcal{L}_{CutMix} + \mathcal{L}_{cons}.$   (11)

Note that we employ the same loss function in (11) to optimize both $D_{img}$ and $D_{grad}$, but they are independent of each other and $D_{grad}$ has an additional Sobel operator to compute the gradients.
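Putting the pieces together, the following is a hedged sketch of one alternating training step that reuses the helpers sketched above; the loss weights are placeholders rather than the paper's settings, and the CutMix and consistency terms of (11) are omitted for brevity.

```python
def training_step(gen, disc_img, disc_grad, opt_g, opt_d, ldct, ndct,
                  lam_pix=1.0, lam_grad=1.0):
    """One alternating DU-GAN-style optimization step (sketch only).

    Reuses lsgan_d_loss, lsgan_g_loss, sobel_gradients, and
    pixel_and_gradient_losses from the earlier sketches."""
    denoised = gen(ldct)

    # ---- update the dual-domain discriminators ----
    opt_d.zero_grad()
    d_loss = lsgan_d_loss(disc_img, ndct, denoised) + \
             lsgan_d_loss(disc_grad, sobel_gradients(ndct), sobel_gradients(denoised))
    d_loss.backward()
    opt_d.step()

    # ---- update the generator (Eq. 10) ----
    opt_g.zero_grad()
    adv = lsgan_g_loss(disc_img, disc_grad, denoised)
    pix, grad = pixel_and_gradient_losses(denoised, ndct)
    g_loss = adv + lam_pix * pix + lam_grad * grad
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```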

IV. Experiments

This section presents the datasets, implementation details, qualitative and quantitative evaluations, uncertainty visualization, and ablation study.

Fig. 5: Transverse CT images from the Mayo-10%, Mayo-25%, and Piglet-5%: (a) LDCT; (b) NDCT; (c) RED-CNN; (d) WGAN-VGG; (e) CPCE-2D; (f) Q-AE; and (g) DU-GAN (ours). Zoomed ROI of the red rectangle is shown below the full-size one. The display window is [-160, 240] HU for better visualization. Red arrow indicates low attenuation lesion. Green arrow indicates the white edge artifacts shown in some baseline algorithms while not shown in our method.

IV-A Datasets

IV-A1 Simulated dataset

The LDCT dataset used in this study was originally prepared for the 2016 NIH-AAPM-Mayo Clinic Low-Dose CT Grand Challenge and was later released in [44]. It provides head, abdomen, and chest scans with different simulated low doses. In our experiments, we used the 25% abdomen and 10% chest datasets, named Mayo-25% and Mayo-10%, respectively. We evaluated our method on abdomen scans for comparisons with most previous works, and conducted experiments on chest scans since 10% of the normal dose for the chest is rather challenging compared to 25% of the normal dose for the abdomen. For each dataset, we randomly selected 20 patients for training and another 20 patients for testing, with no patient overlap between training and testing. In detail, 300K and 64K image patches were randomly selected for training and testing, respectively. For more information about this dataset, please refer to [44].

Fig. 6: Transverse neck CT images from the Mayo-10%: (a) LDCT; (b) NDCT; (c) RED-CNN; (d) WGAN-VGG; (e) CPCE-2D; (f) Q-AE; and (g) DU-GAN (ours). Zoomed ROI of the red rectangle is shown in the second row. The display window is [-160, 240] HU for better visualization. Red arrow indicates bone area while green arrow indicates a small structure.
Method | Mayo-10% (PSNR / RMSE / SSIM) | Mayo-25% (PSNR / RMSE / SSIM) | Piglet-5% (PSNR / RMSE / SSIM)
LDCT | 14.6382 / 0.1913 / 0.6561 | 31.5517 / 0.0283 / 0.8639 | 28.7279 / 0.0395 / 0.8587
RED-CNN [6] (MSE-based) | 23.1388 / 0.0721 / 0.7249 | 34.5740 / 0.0196 / 0.9236 | 26.9691 / 0.0450 / 0.9318
Q-AE [45] (MSE-based) | 21.3149 / 0.0884 / 0.7045 | 34.6477 / 0.0197 / 0.9215 | 29.7081 / 0.0331 / 0.9317
WGAN-VGG [8] (GAN-based) | 20.3922 / 0.0992 / 0.7029 | 33.2910 / 0.0226 / 0.9092 | 30.3787 / 0.0318 / 0.9232
CPCE-2D [7] (GAN-based) | 20.1435 / 0.0899 / 0.7295 | 33.0612 / 0.0232 / 0.9125 | 28.5329 / 0.0379 / 0.9211
DU-GAN (ours, GAN-based) | 22.3075 / 0.0802 / 0.7489 | 34.6186 / 0.0196 / 0.9196 | 29.8598 / 0.0325 / 0.9345
TABLE I: Quantitative comparisons of different methods on the testing sets of two simulated datasets and one real-world dataset. The best results among the MSE- and GAN-based methods are marked in bold.

IV-A2 Real-world dataset

The real-world dataset from [36] includes 850 CT scans of a deceased piglet obtained with a GE scanner (Discovery CT750 HD). The dataset provides CT scans at the normal dose and at 50%, 25%, 10%, and 5% of the normal dose; 708 scans served for training while the rest were used for testing. We evaluated our method on the 5% low-dose CT scans, named Piglet-5%, as it is the most challenging dose. We randomly selected 60K and 12K image patches from the training and testing sets, respectively. For more information about this dataset, please refer to [36].

IV-B Implementation Details

Following [7, 8, 45], we trained all models on image patches of size 64×64 extracted with a CT window emphasizing tissue; the trained models are then directly applied to the whole image for visualization and testing. Note that we excluded image patches that were mostly air. During training, all images were linearly normalized.
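As a rough illustration of this preprocessing, the sketch below windows a CT volume in Hounsfield units, normalizes it linearly, and cuts non-air patches; the HU window and the air-exclusion criterion are our placeholders, not the paper's settings.

```python
import numpy as np

def extract_patches(volume_hu, patch=64, stride=64, hu_window=(-160, 240), air_frac=0.9):
    """Window a CT slice stack to an HU range, normalize to [0, 1], and cut patches.

    `hu_window` and `air_frac` are illustrative placeholders."""
    lo, hi = hu_window
    norm = np.clip((volume_hu - lo) / (hi - lo), 0.0, 1.0)   # linear normalization
    patches = []
    for sl in norm:                                          # iterate over slices
        for r in range(0, sl.shape[0] - patch + 1, stride):
            for c in range(0, sl.shape[1] - patch + 1, stride):
                p = sl[r:r + patch, c:c + patch]
                if (p <= 1e-3).mean() < air_frac:            # skip patches that are mostly air
                    patches.append(p)
    return np.stack(patches) if patches else np.empty((0, patch, patch))
```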

We trained each model for a maximum of 100K iterations with a mini-batch size of 64 on one NVIDIA V100 GPU. All networks in the proposed framework are initialized with He initialization [46] and optimized by the Adam method [47] with a fixed learning rate. The hyperparameters in the loss functions were set empirically. We implemented four deep-learning-based methods, including RED-CNN [6], WGAN-VGG [8], CPCE-2D [7], and Q-AE [45], with reference to their official source code.

IV-C Qualitative Evaluations

To demonstrate the effectiveness of the proposed method in generating photo-realistic denoised results with faithful details, Fig. 5 showcases representative results from the three different datasets while Fig. 6 presents the results of one neck CT slice with strong streak artifacts. The regions of interest (ROIs) marked by the red rectangles are zoomed in below.

Fig. 7: Uncertainty visualization obtained by applying the trained discriminator to the outputs of different methods: (a) LDCT; (b) NDCT; (c) RED-CNN; (d) WGAN-VGG; (e) CPCE-2D; (f) Q-AE; and (g) DU-GAN (ours). The display window is [-160, 240] HU for better visualization. Note that blue indicates a lower confidence score while red indicates a higher confidence score.

All methods present visually well-denoised results to some degree. However, RED-CNN and Q-AE over-smooth and blur the LDCT images as they are optimized by the MSE loss, which tends to average the results and causes the loss of structural details. Although WGAN-VGG and CPCE-2D greatly improve the visual fidelity, as expected, due to the use of the adversarial loss, minor streak artifacts can still be observed since their traditional classification discriminator only provides the generator with global structural feedback. Besides, they employ the perceptual loss in a high-level feature space to suppress the blurriness resulting from the MSE loss. The perceptual loss, however, can only preserve the structures of the NDCT images since some local details may be lost after being processed by a pre-trained model. For example, the low attenuation lesion in Fig. 5 and the bones in Fig. 6 are less clear in the results of WGAN-VGG and CPCE-2D, while they can be easily observed in the NDCT images as well as in the results of our method. Most importantly, the small structures and their boundaries are consistently preserved with clear visual fidelity. This benefits from the well-designed dual-domain U-Net based discriminators, which provide the generator with feedback on both global structures and local details, compared to the traditional classification discriminator used in WGAN-VGG and CPCE-2D that provides only structural information. Besides, the gradient-domain branch further encourages the denoising model to better preserve edge information.

Beyond encouraging better edges, Fig. 6 also demonstrates the impressive performance of our method in dealing with LDCT images that contain strong streak artifacts caused by photon starvation. Compared to the baseline methods that produce strongly blurry and ghosted denoised results, our method can effectively address this problem in the following aspects:

  1. streak artifacts can be easily detected by the gradient domain branch; and

  2. once detected, the dual-domain U-Net discriminators can fill in the occluded area through adversarial training to alleviate the impact of streak artifacts.

In summary, all of these results further validate the superiority of our method.

IV-D Quantitative Evaluations

For quantitative evaluation, we adopted three widely-used metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE). More specifically, PSNR and RMSE measure the denoising performance at the pixel level while SSIM computes the structural similarity within a window. Table I presents the results of the different methods. First, RED-CNN and Q-AE are MSE-based denoising methods as they are trained solely with the MSE loss. Although they achieve better PSNR and RMSE results, the visual results in Figs. 5 and 6 confirm that MSE-based methods produce over-smoothed results compared to the NDCT images, leading to a loss of structural information [7, 8, 48]. Note that the over-smoothed denoising results lead to a lower SSIM score. Second, WGAN-VGG, CPCE-2D, and our DU-GAN are GAN-based methods. CPCE-2D performs better than WGAN-VGG due to its conveying path, since WGAN-VGG has to reconstruct the denoised results from the input LDCT images. Our method performs the best in terms of SSIM with high visual fidelity, while its PSNR and RMSE are also better than those of WGAN-VGG and CPCE-2D, indicating superior denoising performance together with better structural fidelity.
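For reference, a small sketch of how PSNR and RMSE could be computed on images normalized to [0, 1]; the data range is an assumption of this sketch.

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error between two images."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio, assuming intensities in [0, data_range]."""
    err = rmse(pred, target)
    return float('inf') if err == 0 else 20.0 * np.log10(data_range / err)
```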

IV-E Uncertainty Visualization

Fig. 4 shows that the proposed discriminator with the U-Net architecture and CutMix regularization can robustly learn the per-pixel differences of local details between NDCT and denoised LDCT images through the decoder, and also focus on the global structures through the encoder. With this well-trained discriminator, we can provide radiologists with a confidence map showing the uncertainty of the denoised results, since the discriminator learns the distribution of the real samples, i.e., NDCT images. Therefore, we directly applied the trained discriminator to the LDCT, NDCT, and denoised LDCT images of the different methods.

Fig. 7 shows the uncertainty visualization. Obviously, the discriminator can accurately distinguish the LDCT from the NDCT images, in terms of both the global score and the per-pixel confidence. As both RED-CNN and Q-AE over-smooth the LDCT images, the abdomen area of the transverse CT image obtains a better confidence map than the LDCT images according to the per-pixel outputs of the decoder. This also explains why RED-CNN and Q-AE have the lowest global scores from the encoder, which indicates that the discriminator can robustly detect the blurriness in the CT images. Furthermore, although CPCE-2D can produce clearer denoised results than RED-CNN, the streak artifacts significantly compromise the quality of its denoised results. Similarly, WGAN-VGG learns more local details than CPCE-2D but still cannot handle the impact of the streak artifacts. On the contrary, the proposed method produces the most photo-realistic denoised results with the highest global score. Compared to the traditional classification discriminator used in CPCE-2D and WGAN-VGG, our DU-GAN can provide the generator with per-pixel feedback by learning the local detail differences, as can be seen from the per-pixel outputs of the decoder. In other words, we achieve a smoother per-pixel confidence map, indicating that the discriminator cannot distinguish the real and fake samples at the per-pixel level.

IV-F Ablation Study

In this subsection, we conducted an ablation study to fully explore the proposed method in terms of the importance of its different components, the architecture of the discriminator, and different patch sizes. The ablation study was done on the testing set of the Mayo-10% dataset, which includes a total of 6,590 slices from 20 patients.

IV-F1 Component analysis

We investigated the impact of the U-Net based discriminator in the image domain, the CutMix regularization, and the dual-domain training (i.e., with the gradient branch) by gradually applying them to the baseline method. Similar to WGAN-VGG and CPCE-2D, the baseline method only includes the traditional classification discriminator, with the same hyperparameters for a fair comparison.

Method PSNR RMSE SSIM
Baseline 21.4988 0.0871 0.7365
 + U-Net Based Discriminator 22.1214 0.0816 0.7454
  + CutMix Regularization 21.7894 0.0844 0.7477
Ours (+ Dual-Domain) 22.3075 0.0802 0.7489
TABLE II: Ablation study of component analysis. Our method is the baseline method with U-Net discriminator in the image domain, CutMix regularization, and the U-Net discriminator in the gradient domain (dual-domain). The best results are marked in bold.

Table II presents the quantitative results of the ablation study. First, replacing the traditional classification discriminator with a U-Net based discriminator can simultaneously provide the generator with both global structure and local per-pixel feedback, which leads to a significant increase in SSIM. Second, when we further use the CutMix technique to regularize the U-Net based discriminator, the mixed samples can boost the discriminative capacity of the discriminator and make it focus more on the local details, leading to an increased SSIM score and slightly decreased PSNR and RMSE. Last, further adding the U-Net based discriminator in the gradient domain, forming the dual-domain training, yields our full method. Specifically, the additional gradient-domain training helps our method remove the streak artifacts and encourages clearer edges in the denoised LDCT images. As a result, it effectively improves all metrics, including PSNR and RMSE in the pixel space and SSIM for structural similarity.

Method PSNR RMSE SSIM
Patch 21.4988 0.0871 0.7365
Global 22.6810 0.0760 0.7262
Pixel 23.1102 0.0724 0.7343
U-Net 22.1214 0.0816 0.7454
TABLE III: Ablation study of different discriminators on testing set of Mayo-10% dataset. The best results are marked in bold.

IV-F2 Architectures of discriminator

Since the architecture of the discriminator plays a critical role in the training of GANs, it is worthwhile studying the advantage of the U-Net based discriminator over other classical discriminator architectures such as the patch discriminator [30], the pixel discriminator [30], and the traditional global discriminator. Compared to the traditional classification discriminator that classifies real and fake samples at the image level, the patch discriminator focuses on image patches. Due to the patch-based training for low-dose CT denoising, the traditional discriminator applied to image patches can be seen as a patch discriminator. The discriminator with seven convolutional layers and one fully-connected layer is regarded as the global discriminator. On the other hand, the pixel discriminator [30] contains 7 convolutional layers to penalize the generator at the per-pixel level. For fair comparisons, we trained the patch and pixel discriminators with image patches and trained the global discriminator with whole images of size 512×512. Table III shows that the combination of global and pixel information in the U-Net based discriminator produces the best SSIM score.

Size PSNR RMSE SSIM
64×64 22.3075 0.0802 0.7489
128×128 22.0218 0.0826 0.7479
256×256 21.8478 0.0843 0.7467
512×512 21.7254 0.0855 0.7441
TABLE IV: Ablation study of patch sizes on the testing set of Mayo-10% dataset. The best results are marked in bold.

IV-F3 Patch size

Due to the U-Net architecture of the discriminator, it is also important to analyze the influence of the patch size during training. However, it is very difficult to directly train the denoising model from scratch on large patches. Therefore, we trained our model with patch sizes of 64×64, 128×128, 256×256, and 512×512, fine-tuning the generator from the model trained on the previous smaller size. Table IV shows that a small patch size achieves better performance because larger patch sizes may introduce training difficulties given fewer training samples.

V. Discussion and Conclusion

In this paper, we proposed a novel DU-GAN for LDCT denoising. The introduced U-Net based discriminator can not only provide per-pixel feedback to the denoising network but also focus on the global structure. We further added an extra U-Net based discriminator in the gradient domain, which can enhance the edge information and alleviate the streak artifacts caused by photon starvation. We also showed that the CutMix technique can boost the training of the discriminator and provide radiologists with a confidence map of the uncertainty of the denoised results. Extensive experiments demonstrated the effectiveness of the proposed method through both visual and quantitative comparisons.

We acknowledge some limitations of this work. First, we used qualitative and quantitative comparisons to evaluate the image quality. A human reader study may be needed to further validate its potential in clinical diagnosis, although there are significant differences between the proposed and other baseline methods. Second, the U-Net based discriminator can provide radiologists with a confidence map of the denoised images. How this helps radiologists in clinical routine could be examined with specific tasks such as liver lesion diagnosis, which can be further studied as a future direction.

In conclusion, the proposed DU-GAN achieves better denoising performance than other GAN-based models and has great potential for clinical use with uncertainty visualization.

References

  • [1] N. B. Shah and S. L. Platt, “ALARA: is there a cause for alarm? reducing radiation risks from computed tomography scanning in children,” Current Opinion Pediatrics, vol. 20, no. 3, pp. 243–247, 2008.
  • [2] Y. Lei, Y. Tian, H. Shan, J. Zhang, G. Wang, and M. K. Kalra, “Shape and margin-aware lung nodule classification in low-dose CT images via soft activation mapping,” Med. Image Anal., vol. 60, p. 101628, 2020.
  • [3] F. Attivissimo, G. Cavone, A. M. L. Lanzolla, and M. Spadavecchia, “A technique to improve the image quality in computer tomography,” IEEE Trans. Instrum. Meas., vol. 59, no. 5, pp. 1251–1257, 2010.
  • [4] G. Wang, J. C. Ye, and B. De Man, “Deep learning for tomographic image reconstruction,” Nat. Mach. Intell., vol. 2, no. 12, pp. 737–748, 2020.
  • [5] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang, “Low-dose CT via convolutional neural network,” Biomed. Opt. Express, vol. 8, no. 2, pp. 679–694, 2017.
  • [6] H. Chen et al., “Low-dose CT with a residual encoder-decoder convolutional neural network (RED-CNN),” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
  • [7] H. Shan et al., “3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1522–1534, 2018.
  • [8] Q. Yang et al., “Low dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
  • [9] H. Shan et al., “Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction,” Nat. Mach. Intell., vol. 1, no. 6, pp. 269–276, 2019.
  • [10] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Isgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
  • [11] G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1289–1296, 2018.
  • [12] D. Wu, K. Kim, G. E. Fakhri, and Q. Li, “A cascaded convolutional neural network for x-ray low-dose CT image denoising,” arXiv preprint arXiv:1705.04267, 2017.
  • [13] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging, vol. 3, no. 1, pp. 47–57, 2016.
  • [14] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
  • [15] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [16] E. Schonfeld, B. Schiele, and A. Khoreva, “A U-Net based discriminator for generative adversarial networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2020, pp. 8207–8216.
  • [17] C. H. Lin, C.-C. Chang, Y.-S. Chen, D.-C. Juan, W. Wei, and H.-T. Chen, “COCO-GAN: Generation by parts via conditional coordinating,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 4512–4521.
  • [18] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in Proc. Int. Conf. Mach. Learn.   PMLR, 2019, pp. 7354–7363.
  • [19] J. Wang, H. Lu, T. Li, and Z. Liang, “Sinogram noise reduction for low-dose CT by statistics-based nonlinear filters,” in Proc. of SPIE, vol. 5747, 2005, p. 2059.
  • [20] J. Wang, T. Li, H. Lu, and Z. Liang, “Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 25, no. 10, pp. 1272–1283, 2006.
  • [21] A. Manduca et al., “Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT,” Med. Phys., vol. 36, no. 11, pp. 4911–4919, 2009.
  • [22] S. Ramani and J. A. Fessler, “A splitting-based iterative algorithm for accelerated statistical x-ray ct reconstruction,” IEEE Trans. Med. Imaging, vol. 31, no. 3, pp. 677–688, 2011.
  • [23] W. Wu, J. Shi, H. Yu, W. Wu, and V. Vardhanabhuti, “Tensor gradient L0-norm minimization-based low-dose CT and its application to COVID-19,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
  • [24] X. Zheng, S. Ravishankar, Y. Long, and J. A. Fessler, “PWLS-ULTRA: An efficient clustering and learning-based approach for low-dose 3d ct image reconstruction,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1498–1510, 2018.
  • [25] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose x-ray CT reconstruction via dictionary learning,” IEEE Trans. Med. Imaging, vol. 31, no. 9, pp. 1682–1697, 2012.
  • [26] J. Ma et al., “Low-dose computed tomography image restoration using previous normal-dose scan,” Med. Phys., vol. 38, no. 10, pp. 5713–5731, 2011.
  • [27] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3D random noise filtering for absorption optical projection tomography,” Phys. Med. Biol., vol. 55, no. 18, p. 5401, 2010.
  • [28] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, “SACNN: Self-attention convolutional neural network for low-dose CT denoising with self-supervised perceptual loss network,” IEEE Trans. Med. Imaging, vol. 39, no. 7, pp. 2289–2301, 2020.
  • [29] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” in Proc. Int. Conf. Learn Represent., 2018, pp. 1–26.
  • [30] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2017, pp. 1125–1134.
  • [31] A. Guo, L. Fang, M. Qi, and S. Li, “Unsupervised denoising of optical coherence tomography images with nonlocal-generative adversarial network,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2020.
  • [32] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
  • [33] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5769–5779.
  • [34] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proc. IEEE Int. Conf. Comp. Vis., 2017, pp. 2794–2802.
  • [35] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in Proc. Int. Conf. Learn Represent., 2018, pp. 1–26.
  • [36] X. Yi and P. Babyn, “Sharpness-aware low-dose CT denoising using conditional generative adversarial network,” J. Digit. Imaging, vol. 31, no. 5, pp. 655–669, 2018.
  • [37] T. Chen, X. Zhai, M. Ritter, M. Lucic, and N. Houlsby, “Self-supervised GANs via auxiliary rotation loss,” in Proc. IEEE/CVF Conf. Comp. Vis. Patt. Recogn., 2019, pp. 12 154–12 163.
  • [38] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested U-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support.   Springer, 2018, pp. 3–11.
  • [39] Y. Han and J. C. Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Trans. on Med. Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
  • [40] N. Kanopoulos, N. Vasanthavada, and R. L. Baker, “Design of an image edge detection filter using the Sobel operator,” IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 358–367, 1988.
  • [41] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, “CutMix: Regularization strategy to train strong classifiers with localizable features,” in Proc. IEEE Int. Conf. Comp. Vis., 2019, pp. 6023–6032.
  • [42] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” Proc. Int. Conf. Learn Represent., 2018.
  • [43] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2016, pp. 770–778.
  • [44] T. R. Moen et al., “Low-dose CT image and projection dataset,” Med. Phys., vol. 48, no. 2, pp. 902–911, 2021.
  • [45] F. Fan et al., “Quadratic autoencoder (Q-AE) for low-dose CT denoising,” IEEE Trans. Med. Imaging, vol. 39, no. 6, pp. 2035–2050, 2019.
  • [46] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
  • [47] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [48] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Eur. Conf. Comp. Vis.   Springer, 2016, pp. 694–711.