DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net Based Discriminators for Low-Dose CT Denoising
Low-dose CT (LDCT) has drawn major attention in the medical imaging field due to the potential health risks of CT-associated X-ray radiation to patients. Reducing the radiation dose, however, decreases the quality of the reconstructed images, which consequently compromises the diagnostic performance. Various deep learning techniques have been introduced to improve the image quality of LDCT images through denoising. GAN-based denoising methods usually leverage an additional classification network, i.e., a discriminator, to learn the most discriminant difference between the denoised and normal-dose images and, hence, regularize the denoising model accordingly; such a discriminator often focuses either on the global structure or on local details. To better regularize the LDCT denoising model, this paper proposes a novel method, termed DU-GAN, which leverages U-Net based discriminators in the GANs framework to learn both global and local differences between the denoised and normal-dose images in both the image and gradient domains. The merit of such a U-Net based discriminator is that it can not only provide per-pixel feedback to the denoising network through the output of the U-Net but also focus on the global structure at a semantic level through the middle layer of the U-Net. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edges of the denoised CT images. Furthermore, the CutMix technique enables the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map to visualize the uncertainty of the denoised results, facilitating LDCT-based screening and diagnosis. Extensive experiments on simulated and real-world datasets demonstrate superior performance over recently published methods both qualitatively and quantitatively.
Computed tomography (CT) provides cross-sectional images of the internal body using X-ray radiation and is one of the most important imaging modalities in clinical diagnosis. Although CT plays an essential role in diagnosing diseases, its widespread use has raised growing public concern about safety, since CT-related X-ray radiation may cause unavoidable damage to human health and even induce cancer. Consequently, keeping the radiation dose of CT as low as reasonably achievable (a.k.a. ALARA) has been a well-accepted principle in CT-related research over the past decades. The reduction of radiation dose, however, inevitably introduces noise and artifacts into the reconstructed images, severely compromising the subsequent diagnosis and other tasks such as LDCT-based lung nodule classification.
A straightforward way to address this issue is to reduce the noise in the LDCT image [3, 4]. However, denoising remains a challenging problem due to its ill-posed nature. In recent years, various deep learning based methods have been proposed for LDCT denoising [5, 6, 7, 8, 9, 10, 11], achieving impressive results. There are two key components in designing a denoising model: the network architecture and the loss function; the former determines the capacity of the denoising model while the latter controls how the denoised images look. Although different network architectures such as 2D convolutional neural networks (CNNs), 3D CNNs [7, 10], and residual encoder-decoder CNNs (RED-CNN) have been explored for LDCT denoising, the literature has shown that the loss function is relatively more important than the network architecture as it has a direct impact on the image quality [7, 13].
One of the most popular loss functions is the mean-squared error (MSE), which computes the average of the squared per-pixel errors between the denoised and normal-dose images. Although it gains impressive performance in terms of peak signal-to-noise ratio (PSNR), MSE usually leads to over-smoothed images and has been shown to correlate poorly with human perception of image quality [14, 15]. In view of this observation, alternative loss functions such as the perceptual loss and the adversarial loss have been investigated for LDCT denoising. Among them, the adversarial loss has been shown to be a powerful one as it dynamically measures the similarity between the denoised and normal-dose images during training, which enables the denoised images to preserve more texture information from the normal-dose ones. The computation of the adversarial loss is based on the discriminator, a classification network that learns a representation differentiating the denoised images from the normal-dose images; it measures the most discriminant difference at either a global or a local level, depending on whether one unit of the discriminator output corresponds to the whole image or to a local region. Such a discriminator is prone to forgetting previously seen differences because the distribution of synthetic samples shifts as the generator constantly changes through training, failing to maintain a powerful data representation that characterizes both global and local image differences. As a result, it often produces generated images with discontinuous and mottled local structures or images with incoherent geometric and structural patterns. In addition to noise, LDCT images may contain severe streak artifacts caused by photon starvation, which may not be effectively removed through a loss function solely in the image domain.
To learn a powerful data representation that regularizes the denoising model in adversarial training, we propose a U-Net based discriminator in the GANs framework for LDCT denoising, termed DU-GAN, which can simultaneously learn the global and local differences between the denoised and normal-dose images in the image and gradient domains. More specifically, our proposed discriminator follows the U-Net architecture, including an encoder and a decoder network, where the encoder maps the input to a scalar value focusing on the global structures while the decoder reconstructs a per-pixel confidence map capturing the changes of local details between the denoised and normal-dose images. In doing so, it can provide not only per-pixel feedback but also the global structural difference to the denoising network. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edges of the denoised images. Moreover, the CutMix data augmentation technique between the denoised and normal-dose images is introduced to regularize the encoder and decoder of the U-Net independently, enabling the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map that visualizes the uncertainty of the denoised results, facilitating radiologists' screening and diagnosis when using the denoised LDCT images.
The benefits of the proposed DU-GAN are as follows.
Unlike existing GAN-based denoising methods that use a classification network as the discriminator, the proposed DU-GAN utilizes a U-Net based discriminator for LDCT denoising, which can simultaneously learn global and local differences between the denoised and normal-dose images. Consequently, it can provide not only per-pixel feedback but also the global structural difference to the denoising model.
In addition to adversarial training in the image domain, the proposed DU-GAN also performs adversarial training in the image gradient domain, which can alleviate the streak artifacts caused by photon starvation and enhance the edges of the denoised images.
The proposed DU-GAN can provide radiologists with a confidence map visualizing the uncertainty of the denoised results through the CutMix technique, which could facilitate radiologists’ screening and diagnosis when using the denoised LDCT images.
Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of the proposed method through both qualitative and quantitative comparisons.
The remainder of this paper is organized as follows. Section II briefly surveys the developments of LDCT denoising methods and generative adversarial networks. Section III presents our LDCT denoising framework, DU-GAN, with dual-domain U-Net based discriminators, and then introduces the CutMix regularization technique as well as the network architectures and loss functions in our framework. Section IV provides both qualitative and quantitative comparisons with the state-of-the-art methods on simulated and real-world datasets. Finally, we conclude this paper in Section V.
This section briefly surveys the development of LDCT denoising and generative adversarial networks.
The noise reduction algorithms for LDCT can be summarized into three categories: 1) sinogram filtration; 2) iterative reconstruction; and 3) image post-processing. As a significant difference from routine CT, LDCT acquires noisy sinogram data from the scanner. A straightforward solution is to perform denoising on the sinogram data before image reconstruction, i.e., sinogram filtration-based methods [19, 20, 21]. Iterative reconstruction methods combine the statistics of raw data in the sinogram domain [22, 23] with prior information in the image domain such as total variation and dictionary learning; these pieces of generic information can be effectively integrated in maximum likelihood and compressed sensing frameworks. These two categories, however, require access to raw data that are typically unavailable from commercial CT scanners.
Different from the previous two categories, image post-processing methods directly operate on the reconstructed images, which are publicly available after removing patient privacy. Traditional methods such as non-local means and block-matching 3D, however, lead to the loss of some critical structural details and result in over-smoothed denoised LDCT images. The rapid development of deep learning techniques has advanced many medical applications. In LDCT denoising, deep-learning-based models have achieved impressive results [5, 7, 9, 10, 12, 28]. There are two critical components in designing a deep-learning-based denoising model: the network architecture and the loss function; the former determines the capacity of a denoising model while the latter controls how the denoised images look. Although the literature has proposed several different network architectures for LDCT denoising, such as 2D CNNs, 3D CNNs [7, 10], RED-CNN, and cascaded CNNs, it has also shown that the loss function plays a relatively more important role than the network architecture as it has a direct impact on the image quality [7, 13]. The simplest loss function is the MSE, which, however, has been shown to correlate poorly with human perception of image quality [14, 15]. In view of this observation, alternative loss functions such as the perceptual loss, the adversarial loss, or mixed loss functions have been investigated for LDCT denoising. Among them, the adversarial loss has been shown to be a powerful one as it dynamically measures the similarity between the denoised and normal-dose images during training, which enables the denoised images to preserve more texture information from the normal-dose ones. The adversarial loss reflects either global or local similarity, depending on the design of the discriminator.
Unlike the conventional adversarial loss, the adversarial loss used in this study is based on a U-Net based discriminator, which can simultaneously characterize global and local differences between the denoised and normal-dose images, better regularizing the denoising model. In addition to the adversarial loss in the image domain, the adversarial loss in the image gradient domain proposed in this paper can alleviate the streak artifacts caused by photon starvation and enhance the edges of the denoised images.
As one of the most active research topics in recent years, GANs and their variants have been successfully applied to various tasks [29, 30, 31]. They typically consist of two networks: 1) a generator learning to capture the data distribution of the training data and produce new samples that are indistinguishable from the real ones, and 2) a discriminator attempting to distinguish real samples from the fake ones produced by the generator. These two networks are trained alternately until a balance is achieved. In the context of LDCT denoising, the generator aims to produce photo-realistic denoised results to fool the discriminator while the discriminator tries to distinguish the real normal-dose CT (NDCT) images from the denoised ones. To foster the stability of training GANs, various variants of GANs have been proposed, such as Wasserstein GAN (WGAN), WGAN with gradient penalty (WGAN-GP), and least-squares GANs.
In this paper, we adopt the least-squares GANs , spectral normalization , and U-Net based discriminator  to form the GANs framework for LDCT denoising. As a significant difference, our DU-GAN performs adversarial training in both image and gradient domains, which can reduce noise and alleviate streak artifacts simultaneously. We note that the proposed DU-GAN is also suitable for other variants of GAN such as WGAN and WGAN-GP.
Fig. 1 presents the proposed DU-GAN for LDCT denoising, which contains a denoising model as the generator and two U-Net based discriminators in the image and gradient domains. We highlight that the U-Net based discriminator is able to learn the global and local differences between denoised and normal-dose images. Next, we present all components, network architectures, and loss functions in detail.
The denoising process learns a generative model G that maps an LDCT image x to its normal-dose CT (NDCT) counterpart y by removing the noise in the LDCT image. Formally, it can be written as ŷ = G(x), where ŷ denotes the denoised image. We utilize GANs to improve the visual quality of the denoised LDCT images thanks to the strong capability of GANs in generating high-quality images. Different from conventional GANs that take a noise vector to generate an image, our denoising model serves as the generator that only takes the LDCT image as the input. In this study, we used RED-CNN as the denoising model to demonstrate the effectiveness of the dual-domain U-Net based discriminators in adversarial training.
The GANs-based methods [7, 8, 9, 36] for LDCT denoising usually maintain the competition of GANs at the structural level: the discriminator progressively downsamples the input into a scalar value and is trained with Wasserstein GANs [32, 33], as shown in Fig. 2(a). However, such a discriminator is prone to forgetting previous samples because the distribution of synthetic samples shifts as the generator constantly changes during training, failing to maintain a powerful data representation that characterizes both global and local image differences.
To address the problems above, we introduce the U-Net based discriminators in both image and gradient domains.
To learn a powerful data representation that can characterize both global and local differences, we design U-Net based discriminators for the GANs framework to deal with LDCT denoising. Traditionally, U-Net contains an encoder, a decoder, and several skip connections copying the feature maps from the encoder to the decoder to preserve high-resolution features; it has demonstrated state-of-the-art performance in many semantic segmentation tasks [38, 39] and image translation tasks [16, 30]. In the context of LDCT denoising, however, U-Net and its variants have only been used as the denoising model and have not been explored as the discriminator. We adapt U-Net to replace the standard classification discriminator in GANs, which allows the discriminator to maintain both global and local data representations. Fig. 2(b) details the architecture of the U-Net based discriminator.
Here, we use D_img to denote the U-Net based discriminator in the image domain. The encoder of D_img, denoted D_enc, follows the traditional discriminator that progressively downsamples the input using several convolutional layers, capturing the global structural context. On the other hand, the decoder D_dec performs progressive upsampling with skip connections from the encoder in reverse order, further enhancing the ability of the discriminator to capture the local details of real and fake samples. Furthermore, the discriminator loss is computed from the outputs of both D_enc and D_dec, while the traditional discriminator used in previous works [7, 8, 36] only classifies the inputs into real and fake from the encoder. In doing so, the U-Net based discriminator can provide more informative feedback to the generator, including both local per-pixel and global structural information. In this paper, we employ least-squares GANs rather than conventional GANs for the discriminators to stabilize the training process and improve the visual quality of the denoised LDCT. Formally, the discriminator loss for D_img from both D_enc and D_dec can be written as:

L_D_img = E_y[(D_enc(y) − 1)² + ‖D_dec(y) − 1‖²] + E_x[(D_enc(ŷ))² + ‖D_dec(ŷ)‖²],

where ŷ = G(x) is the denoised image, y is the NDCT image, and 1 is the decision boundary of least-squares GANs.
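To make the dual-output design concrete, the following is a minimal PyTorch sketch of such a U-Net style discriminator, not the paper's implementation: the depth and channel widths are reduced for brevity, spectral normalization is omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UNetDiscriminator(nn.Module):
    """Toy U-Net discriminator: the encoder yields a global real/fake
    score, the decoder a per-pixel confidence map (widths reduced from
    the paper's 64..512 setting for brevity)."""
    def __init__(self, in_ch=1, width=32):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, width, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(width, width * 2, 4, stride=2, padding=1)
        self.fc = nn.Linear(width * 2, 1)             # global score head
        self.dec1 = nn.Conv2d(width * 2, width, 3, padding=1)
        self.dec2 = nn.Conv2d(width * 2, width, 3, padding=1)
        self.out = nn.Conv2d(width, 1, 3, padding=1)  # per-pixel head

    def forward(self, x):
        e1 = F.leaky_relu(self.enc1(x), 0.2)          # H/2
        e2 = F.leaky_relu(self.enc2(e1), 0.2)         # H/4
        score = self.fc(e2.mean(dim=(2, 3)))          # pooled global score
        d1 = F.interpolate(F.leaky_relu(self.dec1(e2), 0.2), scale_factor=2)
        d1 = torch.cat([d1, e1], dim=1)               # skip connection
        d2 = F.interpolate(F.leaky_relu(self.dec2(d1), 0.2), scale_factor=2)
        return score, self.out(d2)                    # (B,1), (B,1,H,W)
```

Both heads are trained against the same real/fake labels, so a single forward pass supplies the global and the per-pixel terms of the loss above.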
However, the competition in the image domain alone can only force the generator towards generating photo-realistic denoised LDCT images; it is insufficient to encourage sharper edges that preserve the pathological changes of the original NDCT images, or to alleviate the streak artifacts caused by photon starvation in LDCT. Previous methods measure the MSE in the gradient domain, which may be insufficient to enhance the edges as MSE tends to blur images. To this end, we propose to perform an additional GANs competition in the gradient domain; our motivation is presented in Fig. 3. Specifically, the streaks and edges in CT images are highlighted in their horizontal and vertical gradient magnitudes. Therefore, another branch operating on the gradients estimated by a Sobel operator runs alongside the image branch, which encourages better edge information and alleviates streak artifacts. Similar to (2), we can define the discriminator loss L_D_grd in the gradient domain, where D_grd represents the U-Net based discriminator in the gradient domain.
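The gradient branch only needs the horizontal and vertical Sobel responses of each image. A minimal NumPy sketch of that operator follows (the edge-replication padding is an assumption; any standard border handling would do):

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_gradients(img):
    """Return horizontal and vertical Sobel gradient maps of a 2-D image,
    same shape as the input (borders handled by edge replication)."""
    p = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            patch = p[dy:dy + h, dx:dx + w]
            gx += KX[dy, dx] * patch   # accumulate weighted shifts
            gy += KY[dy, dx] * patch
    return gx, gy
```

The gradient-domain discriminator then simply receives the stacked (or magnitude) Sobel responses instead of the raw image.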
Combining the U-Net based discriminators in the image and gradient domains, two independent GANs competitions are maintained during training. The overall framework of our proposed LDCT denoising model is shown in Fig. 1. In detail, the generator denoises an LDCT image, which is then fed into two independent discriminators operating in the image and gradient domains. The discriminator in the image domain branch pushes the generator towards generating photo-realistic denoised LDCT images, while the discriminator in the gradient domain branch encourages better edges and alleviates the streak artifacts caused by photon starvation. Additionally, the discriminator in each branch employs a U-Net based architecture to encourage the generator to focus on both global structure and local details, which can also boost the interpretability of the denoising process with the per-pixel confidence maps output by D_img and D_grd. Finally, the dual-domain U-Net based discriminator loss can be defined as the sum of the two branches: L_D = L_D_img + L_D_grd.
The discriminator suffers from a decreasing capability to recognize the local differences between real and fake samples as training goes on, which may unexpectedly harm the denoising performance. Besides, the discriminator is supposed to focus on structural change at the global level and on local details at the per-pixel level. To address these issues, we adapt the CutMix augmentation technique to regularize the discriminator, inspired by [16, 41], which can empower the discriminator to learn the intrinsic difference between real and fake samples. Specifically, the CutMix technique generates a new training image from two images by cutting patches from one and pasting them onto the other. We define this augmentation technique in the context of LDCT denoising as:

mix(y, ŷ, M) = M ⊙ y + (1 − M) ⊙ ŷ,

where M is a binary mask controlling how to mix the NDCT image y and the denoised image ŷ, and ⊙ represents element-wise multiplication.
The mixed samples should be regarded as fake samples globally by the encoder D_enc since the CutMix operation has destroyed the global context of the NDCT image; otherwise, CutMix artifacts may be introduced into the denoised LDCT images during the training of GANs, causing undesirable denoising. Similarly, the decoder D_dec should be able to recognize the mixed area to provide the generator with accurate per-pixel feedback. Therefore, the regularization loss of CutMix can be formulated as:

L_CutMix = E[(D_enc(mix(y, ŷ, M)))² + ‖D_dec(mix(y, ŷ, M)) − M‖²],

where the mask M used in CutMix also serves as the per-pixel ground truth for D_dec.
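A NumPy sketch of the mixing operation itself may help; it assumes a single rectangular cut region and a Beta(1, 1)-distributed mixing ratio, which are illustrative choices rather than the paper's exact sampling scheme.

```python
import numpy as np

def cutmix(real, fake, rng=None):
    """Cut a random rectangular patch from the fake (denoised) image and
    paste it into the real (NDCT) image. Returns the mixed image and the
    binary mask, which later serves as per-pixel ground truth."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = real.shape
    ratio = rng.beta(1.0, 1.0)               # assumed Beta(1, 1) ratio
    ch, cw = int(h * np.sqrt(ratio)), int(w * np.sqrt(ratio))
    y0 = rng.integers(0, h - ch + 1)         # top-left of the cut region
    x0 = rng.integers(0, w - cw + 1)
    mask = np.ones((h, w))                   # 1 -> keep real pixel
    mask[y0:y0 + ch, x0:x0 + cw] = 0.0       # 0 -> paste fake pixel
    mixed = mask * real + (1.0 - mask) * fake
    return mixed, mask
```

The returned mask is exactly the M fed to the decoder as its per-pixel target in the regularization loss above.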
Furthermore, to force the outputs of the discriminator to be consistent with the per-pixel predictions after the CutMix operation, we introduce another consistency loss, following prior work, to regularize the discriminator under the CutMix operation, which can be written as:

L_cons = ‖D_dec(mix(y, ŷ, M)) − mix(D_dec(y), D_dec(ŷ), M)‖_F,

where ‖·‖_F represents the Frobenius norm.
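This consistency term is straightforward to compute once the decoder outputs on the real, fake, and mixed images are available; a NumPy sketch (function name illustrative):

```python
import numpy as np

def consistency_loss(dec_real, dec_fake, dec_mixed, mask):
    """Frobenius-norm penalty forcing the decoder output on a CutMix image
    to equal the CutMix of the decoder outputs on the real and fake images."""
    target = mask * dec_real + (1.0 - mask) * dec_fake
    return np.linalg.norm(dec_mixed - target)  # Frobenius norm for 2-D maps
```

The loss is zero exactly when the decoder is equivariant under CutMix, i.e., mixing the inputs and mixing the outputs commute.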
To generate the binary mask M, we sample a mixing ratio from a beta distribution and then uniformly sample the top-left coordinates of the bounding box of the cropping region, preserving that ratio. Similar to [41, 42], we employ a probability to control whether to apply the CutMix regularization technique to each mini-batch of samples, which is set empirically. Fig. 4 presents the visual results of D_dec with the CutMix regularization technique. It can be observed that the outputs of D_dec are the spatial combination of the real and generated patches with respect to the real/fake classification score. Therefore, the results demonstrate the strong discriminative capability of the U-Net based discriminator in accurately learning the per-pixel differences between real and generated samples, even though they are cut and mixed together to fool the discriminator. Besides learning the per-pixel local details, D_enc can accurately predict the proportion of real patches, i.e., the mixing ratio, as it focuses on the global structures.
As described above, our proposed method follows the GANs framework to effectively optimize the generator for LDCT denoising, with the U-Net based discriminators focusing on both global structures and local details, and an extra gradient branch encouraging better boundaries and details. In this subsection, we describe the network architectures of the generator and the U-Net based discriminator.
In this paper, we employ RED-CNN as the generator of our framework for LDCT denoising since this paper mainly focuses on the adversarial loss from the dual-domain U-Net based discriminators. The main difference from the original RED-CNN is that our framework is optimized in a GANs manner, while the vanilla RED-CNN trained with MSE suffers from over-smoothed LDCT images. Specifically, RED-CNN employs the U-Net architecture but removes the downsampling/upsampling operations to prevent information loss. We stack 10 (de)convolutional layers in the encoder and decoder, each of which has 32 filters for the sake of computation cost, followed by a ReLU activation function. There are in total 10 residual skip connections. It is important to note that although RED-CNN is adopted as the generator in our framework, the proposed method can also be adapted to other GANs-based methods such as CPCE and WGAN-VGG by only changing the discriminators.
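The generator structure can be sketched in a few lines of PyTorch; this is an illustrative miniature (three layers per side instead of ten, and the class name is ours), not the authors' implementation.

```python
import torch
import torch.nn as nn

class REDCNN(nn.Module):
    """Sketch of a RED-CNN style generator: stacked conv/deconv layers with
    residual skip connections and no down/upsampling (depth reduced from
    the 10-layer, 32-filter setting for brevity)."""
    def __init__(self, ch=32, depth=3):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Conv2d(1 if i == 0 else ch, ch, 5, padding=2)
             for i in range(depth)])
        self.decoders = nn.ModuleList(
            [nn.ConvTranspose2d(ch, ch if i < depth - 1 else 1, 5, padding=2)
             for i in range(depth)])
        self.relu = nn.ReLU()

    def forward(self, x):
        skips, out = [], x
        for enc in self.encoders:
            skips.append(out)            # save input of each encoder layer
            out = self.relu(enc(out))
        for dec in self.decoders:
            out = dec(out)
            out = self.relu(out + skips.pop())  # residual skip connection
        return out
```

Because there is no spatial resampling, the output resolution always matches the input, so the residual additions are shape-safe at every depth.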
As detailed in Section III-B, there are two independent discriminators in the image and gradient domains, each of which follows a U-Net architecture. Specifically, the encoder has 6 downsampling ResBlocks with an increasing number of filters, i.e., 64, 128, 256, 512, 512, and 512. At the bottom of the encoder, a fully-connected layer is used to output the global confidence score. Similarly, the decoder uses the same number of ResBlocks in reverse order to process the bilinearly upsampled features and the skip residuals of the same resolution, followed by a convolutional layer to output the per-pixel confidence map. Most importantly, a spectral normalization layer and a Leaky ReLU activation with a slope of 0.2 for negative inputs follow each convolutional layer except the last one.
Here we employ the sum of these two branches as the adversarial loss for the generator, which is defined in the context of least-squares GANs as follows:

L_adv = E_x[(D_enc^img(ŷ) − 1)² + ‖D_dec^img(ŷ) − 1‖²] + E_x[(D_enc^grd(S(ŷ)) − 1)² + ‖D_dec^grd(S(ŷ)) − 1‖²],

where S denotes the Sobel operator used to obtain the image gradients and ŷ = G(x).
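Each branch contributes the same least-squares form, namely the squared distance of the discriminator outputs from the real label 1; a NumPy sketch of one branch's generator term (function name illustrative):

```python
import numpy as np

def lsgan_g_loss(enc_fake, dec_fake):
    """Least-squares adversarial loss for the generator on one branch:
    push both the encoder's global score and the decoder's per-pixel map
    on the denoised image towards the real label 1."""
    return np.mean((enc_fake - 1.0) ** 2) + np.mean((dec_fake - 1.0) ** 2)
```

The full L_adv is then the sum of this term evaluated on the image-domain outputs and on the gradient-domain outputs.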
To encourage the generator to output denoised LDCT images that match the NDCT images at both the pixel level and the gradient level, we adopt pixel-wise losses between the NDCT images and the denoised LDCT images, including a pixel loss and a gradient loss for the respective branches, as shown in Fig. 1. The additional gradient loss encourages the generator to better preserve edge information. The two losses can be written as:

L_pix = E[‖G(x) − y‖²],   L_grd = E[‖S(G(x)) − S(y)‖₁].

Note that we employ the mean squared error at the pixel level rather than at the feature level of a pretrained model [7, 8] for the sake of computation cost, and the mean absolute error at the gradient level as the gradients are much sparser than pixels.
To encourage the generator to generate photo-realistic denoised LDCT images with better edge information and fewer streak artifacts, the final loss function to optimize the generator is expressed as:

L_G = λ_adv L_adv + λ_pix L_pix + λ_grd L_grd,

where the λ terms control the weighting among the different loss components.
The discriminators D_img and D_grd are optimized by minimizing a mixed loss that combines the adversarial loss, the CutMix regularization loss, and the consistency loss described above. Note that we employ the same loss function in (11) to optimize both D_img and D_grd, but they are independent of each other, and D_grd has an additional Sobel operator to compute the gradients.
This section presents the datasets, implementation details, qualitative and quantitative evaluations, uncertainty visualization, and ablation study.
The LDCT dataset used in this study was originally prepared for the 2016 NIH-AAPM-Mayo Clinic Low-Dose CT Grand Challenge and was later publicly released. It provides scans of the head, abdomen, and chest at different simulated low doses. In our experiments, we used the 25% abdomen and 10% chest datasets, named Mayo-25% and Mayo-10%, respectively. We evaluated our method on abdomen scans for comparison with most previous works, and conducted experiments on chest scans since 10% of the normal dose at the chest is rather challenging compared to 25% of the normal dose at the abdomen. For each dataset, we randomly selected 20 patients for training and another 20 patients for testing, with no identity overlap between training and testing. In detail, 300K and 64K image patches were randomly selected from each set, respectively.
The real-world dataset includes 850 CT scans of a deceased piglet obtained by a GE scanner (Discovery CT750 HD). The dataset provides CT scans at the normal dose and at 50%, 25%, 10%, and 5% doses, 708 of which are used for training while the rest are used for testing. We evaluated our method on the 5% low-dose CTs as this is the most challenging dose; the resulting dataset is named Piglet-5%. We randomly selected 60K and 12K image patches from the training and testing sets, respectively.
Following [7, 8, 45], we employed image patches with a CT window emphasizing tissue to train all models; the trained models are then directly applied to the whole image for visualization and testing. Note that we excluded those image patches that were mostly air. During training, all images are linearly normalized.
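The windowing-plus-normalization step can be sketched as follows; the window bounds here are illustrative defaults for a soft-tissue window, not the paper's exact setting, and the output range [0, 1] is likewise an assumption.

```python
import numpy as np

def window_normalize(hu, lo=-160.0, hi=240.0):
    """Clip a CT image (in Hounsfield units) to a display window and
    linearly normalize it to [0, 1]; the default bounds are an
    illustrative soft-tissue window."""
    clipped = np.clip(hu, lo, hi)
    return (clipped - lo) / (hi - lo)
```

Applying the same fixed window to every scan keeps the training and test intensity distributions comparable across patients.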
We trained the model for a maximum of 100K iterations with a mini-batch size of 64 on one NVIDIA V100 GPU. All networks in the proposed framework are initialized with He initialization and optimized by the Adam optimizer with a fixed learning rate. The hyperparameters in the loss functions were set empirically. We implemented four deep-learning-based baseline methods, including RED-CNN, WGAN-VGG, CPCE-2D, and Q-AE, with reference to their official source code.
To demonstrate the effectiveness of the proposed method in generating photo-realistic denoised results with faithful details, Fig. 5 showcases representative results from three different datasets while Fig. 6 presents the results of one neck CT slice with strong streak artifacts. The regions of interest (ROIs) marked by the red rectangles are zoomed in below, respectively.
All methods produce visually well-denoised results to some degree. However, RED-CNN and Q-AE over-smooth and blur the LDCT images as they are optimized by the MSE loss, which tends to average the results, causing the loss of structural details. Although WGAN-VGG and CPCE-2D have greatly improved the visual fidelity, as expected, due to the use of the adversarial loss, minor streak artifacts can still be observed since their traditional classification discriminators only provide the generator with global structural feedback. Besides, they employ the perceptual loss in a high-level feature space to suppress the blurriness resulting from the MSE loss. The perceptual loss, however, can only preserve the structures of NDCT images since some local details may be lost after being processed by a pre-trained model. For example, the low-attenuation lesions in Fig. 5 and the bones in Fig. 6 are less clear for WGAN-VGG and CPCE-2D, while they can be easily observed in NDCT as well as in the results of our method. Most importantly, the small structures and their boundaries are consistently preserved with clear visual fidelity. This benefits from the well-designed dual-domain U-Net based discriminators, which can provide feedback on both global structures and local details to the generator, compared to the traditional classification discriminator used in WGAN-VGG and CPCE-2D, which provides only structural information. Besides, the gradient-domain branch also encourages the denoising model to better preserve edge information.
Beyond encouraging better edges, Fig. 6 also demonstrates the impressive performance of our method in dealing with LDCT images exhibiting strong streak artifacts caused by photon starvation. Compared to the baseline methods that produce strongly blurry and ghosted denoised results, our method can effectively address this problem in the following aspects:
streak artifacts can be easily detected by the gradient domain branch; and
once detected, the dual-domain U-Net discriminators can fill the occlusion area by adversarial training to alleviate the impact of streak artifacts.
In summary, all of these results further validate the superiority of our methods.
For quantitative evaluations, we adopted three widely used metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE). More specifically, PSNR and RMSE measure the denoising performance at the pixel level while SSIM computes the structural similarity within a window. Table I presents the results of the different methods. First, RED-CNN and Q-AE are MSE-based denoising methods as they are trained solely with the MSE loss. Although they achieve better PSNR and RMSE results, the visual results in Figs. 5 and 6 confirm that MSE-based methods produce over-smoothed results compared to the NDCT images, leading to the loss of structural information [7, 8, 48]. Note that the over-smoothed denoising results lead to a lower SSIM score. Second, WGAN-VGG, CPCE-2D, and our DU-GAN are GAN-based methods. CPCE-2D performs better than WGAN-VGG due to its conveying path, since WGAN-VGG has to reconstruct the denoised results from the input LDCT images. Obviously, our method performs the best in terms of SSIM score with high visual fidelity, while its PSNR and RMSE are also better than those of WGAN-VGG and CPCE-2D, indicating superior denoising performance together with better structural fidelity.
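The pixel-level metrics are standard; a minimal NumPy sketch of RMSE and PSNR for images normalized to [0, 1] follows (SSIM, being windowed, is typically taken from a library rather than reimplemented).

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error between two images."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, data_range=1.0):
    """PSNR in dB for images normalized to [0, data_range]."""
    err = rmse(pred, target)
    return float(20.0 * np.log10(data_range / err)) if err > 0 else float("inf")
```

Because PSNR is a monotone function of RMSE, the two metrics always rank methods identically; SSIM is the one that can disagree, which is why over-smoothed results score well on PSNR/RMSE but poorly on SSIM.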
Fig. 4 shows that the proposed discriminator, with the U-Net architecture and CutMix regularization, can robustly learn the per-pixel differences of local details between NDCT and denoised LDCT images through its decoder while focusing on the global structures through its encoder. With this well-trained discriminator, we can provide radiologists with a confidence map showing the uncertainty of the denoised results, since the discriminator is trained to learn the distribution of the real samples, i.e., the NDCT images. Therefore, we directly applied the trained discriminator to the LDCT images, the NDCT images, and the denoised LDCT images of the different methods.
Fig. 7 shows the uncertainty visualization. The discriminator accurately distinguishes the LDCT from the NDCT images, in terms of both the global score and the per-pixel confidence. As both RED-CNN and Q-AE over-smooth the LDCT images, the abdomen area of the transverse CT image appears better than that of the LDCT images on the confidence map. This also explains why RED-CNN and Q-AE have the lowest global scores, indicating that the discriminator can robustly detect blurriness in CT images. Furthermore, although CPCE-2D produces clearer denoised results than RED-CNN, the streak artifacts significantly compromise the quality of its results. Similarly, WGAN-VGG learns more local details than CPCE-2D but still cannot handle the impact of the streak artifacts. In contrast, the proposed method produces the most photo-realistic denoised results with the highest global score. Compared to the traditional classification discriminator used in CPCE-2D and WGAN-VGG, our DU-GAN provides the generator with per-pixel feedback by learning the local detail differences, as can be seen from the per-pixel confidence maps in Fig. 7. In other words, our method achieves a smoother per-pixel confidence map, indicating that the discriminator can hardly distinguish the real from the denoised samples at the per-pixel level.
In this subsection, we conducted an ablation study to fully explore the proposed method in terms of the importance of its different components, the architecture of the discriminator, and different patch sizes. The ablation study was performed on the testing set of the Mayo-10% dataset, which includes a total of 6,590 slices from 20 patients.
We investigated the impact of the U-Net based discriminator in the image domain, the CutMix regularization, and the dual-domain training (i.e., with the gradient branch) by gradually adding them to the baseline method. Similar to WGAN-VGG and CPCE-2D, the baseline method includes only the traditional classification discriminator, with the same hyperparameters for a fair comparison.
| Method | PSNR | RMSE | SSIM |
| + U-Net Based Discriminator | 22.1214 | 0.0816 | 0.7454 |
| + CutMix Regularization | 21.7894 | 0.0844 | 0.7477 |
| Ours (+ Dual-Domain) | 22.3075 | 0.0802 | 0.7489 |
Table II presents the quantitative results of the ablation study. First, replacing the traditional classification discriminator with a U-Net based discriminator simultaneously provides the generator with both global structure and local per-pixel feedback, which leads to a significant increase in SSIM. Second, further regularizing the U-Net based discriminator with the CutMix technique boosts its discriminative capacity through the mixed samples and makes it focus more on local details, leading to an increased SSIM score at the cost of slightly worse PSNR and RMSE. Last, adding the U-Net based discriminator in the gradient domain to form the dual-domain training yields our full method. The additional gradient-domain training helps remove the streak artifacts and encourages clearer edges in the denoised LDCT images; as a result, it improves all metrics, including PSNR and RMSE in the pixel space and SSIM in structural similarity.
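The CutMix regularization above can be sketched as follows: a random rectangle of the denoised (fake) image is pasted into the real image, and the resulting binary mask is the per-pixel target for the discriminator's decoder. This is a simplified illustration; the box-sampling distribution in the actual training (e.g., the Beta-distributed area of the original CutMix) is an assumption not specified here.

```python
import random

def cutmix(real, fake, rng=None):
    """Paste a random rectangle of `fake` into `real` (both 2D lists).

    Returns the mixed image and a binary mask marking the pasted (fake)
    pixels. The mask serves as the per-pixel ground truth for the decoder
    of the U-Net based discriminator: it must label exactly the pasted
    region as fake, which sharpens its focus on local details.
    """
    rng = rng or random.Random()
    h, w = len(real), len(real[0])
    ch, cw = rng.randint(1, h), rng.randint(1, w)               # box size
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)  # box position
    mixed = [row[:] for row in real]
    mask = [[0] * w for _ in range(h)]
    for i in range(top, top + ch):
        for j in range(left, left + cw):
            mixed[i][j] = fake[i][j]
            mask[i][j] = 1
    return mixed, mask
```

At inference time the same decoder output, applied to an unmixed denoised image, is what yields the per-pixel confidence map shown to radiologists.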
Since the architecture of the discriminator plays a critical role in the training of GANs, it is worthwhile to study the advantage of the U-Net based discriminator over other classical discriminator architectures, such as the patch discriminator, the pixel discriminator, and the traditional global discriminator. In contrast to the traditional classification discriminator, which classifies real and fake samples at the image level, the patch discriminator focuses on image patches; since our low-dose CT denoising models are trained on patches, the traditional classification discriminator here can be regarded as a patch discriminator. The discriminator with seven convolutional layers and one fully-connected layer is regarded as the global discriminator, while the pixel discriminator contains seven convolutional layers and penalizes the generator at the per-pixel level. For a fair comparison, we trained the patch and pixel discriminators with image patches and the global discriminator with whole images. Table III shows that the combination of global and pixel-level information in the U-Net based discriminator produces the best SSIM score.
Due to the U-Net architecture of the discriminator, it is also important to analyze the influence of the patch size during training. However, it is difficult to directly train the denoising model from scratch at a large size. Therefore, we trained our model with progressively larger image sizes, fine-tuning the generator each time on the model trained with the previous smaller size. Table IV shows that a small patch size achieves better performance, as larger patch sizes may introduce training difficulties with fewer training samples.
In this paper, we proposed a novel DU-GAN for LDCT denoising. The introduced U-Net based discriminator can not only provide per-pixel feedback to the denoising network but also focus on the global structure. We further added an extra U-Net based discriminator in the gradient domain, which enhances edge information and alleviates the streak artifacts caused by photon starvation. We also showed that the CutMix technique can boost the training of the discriminator and provide radiologists with a confidence map on the uncertainty of the denoised results. Extensive experiments demonstrated the effectiveness of the proposed method through both visual and quantitative comparisons.
We acknowledge some limitations of this work. First, we used qualitative and quantitative comparisons to evaluate image quality; a human reader study may be needed to further validate its potential in clinical diagnosis, although there are significant differences between the proposed and other baseline methods. Second, the U-Net based discriminator can provide radiologists with a confidence map of the denoised images. How this helps radiologists in clinical routine could be examined with specific tasks such as liver lesion diagnosis, which is left as a future direction.
In conclusion, the proposed DU-GAN achieves better denoising performance than other GAN-based models and has great potential for clinical use with uncertainty visualization.
——, “3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1522–1534, 2018.
G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1289–1296, 2018.
C. H. Lin, C.-C. Chang, Y.-S. Chen, D.-C. Juan, W. Wei, and H.-T. Chen, “COCO-GAN: Generation by parts via conditional coordinating,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 4512–4521.
J. Wang, H. Lu, T. Li, and Z. Liang, “Sinogram noise reduction for low-dose CT by statistics-based nonlinear filters,” in Proc. of SPIE, vol. 5747, 2005, p. 2059.
W. Wu, J. Shi, H. Yu, W. Wu, and V. Vardhanabhuti, “Tensor gradient L0-norm minimization-based low-dose CT and its application to COVID-19,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, “SACNN: Self-attention convolutional neural network for low-dose CT denoising with self-supervised perceptual loss network,” IEEE Trans. Med. Imaging, vol. 39, no. 7, pp. 2289–2301, 2020.
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2017, pp. 1125–1134.
——, “Quadratic autoencoder (Q-AE) for low-dose CT denoising,” IEEE Trans. Med. Imaging, vol. 39, no. 6, pp. 2035–2050, 2019.
K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Eur. Conf. Comp. Vis., Springer, 2016, pp. 694–711.