Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising

05/02/2018 ∙ by Chenyu You, et al. ∙ Sichuan University Rensselaer Polytechnic Institute NetEase, Inc 0

Computed tomography (CT) is a popular medical imaging modality in clinical applications. At the same time, the x-ray radiation dose associated with CT scans raises public concerns due to its potential risks to the patients. Over the past years, major efforts have been dedicated to the development of Low-Dose CT (LDCT) methods. However, the radiation dose reduction compromises the signal-to-noise ratio (SNR), leading to strong noise and artifacts that down-grade CT image quality. In this paper, we propose a novel 3D noise reduction method, called Structure-sensitive Multi-scale Generative Adversarial Net (SMGAN), to improve the LDCT image quality. Specifically, we incorporate three-dimensional (3D) volumetric information to improve the image quality. Also, different loss functions for training denoising models are investigated. Experiments show that the proposed method can effectively preserve structural and texture information from normal-dose CT (NDCT) images, and significantly suppress noise and artifacts. Qualitative visual assessments by three experienced radiologists demonstrate that the proposed method retrieves more detailed information, and outperforms competing methods.




I Introduction

X-ray computed tomography (CT) is one of the most popular imaging modalities in clinical, industrial, and other applications [1]. Nevertheless, the potential risks (i.e., a chance to induce cancer and cause genetic damage) of ionizing radiation associated with medical CT scans cause a public concern [2]. Studies from the National Council on Radiation Protection and Measurements (NCRP) demonstrate a 600% increase in medical radiation dose to the US population from 1980 to 2006, showing both great successes of the CT technology and an elevated alert to patients [3].

The main drawback of radiation dose reduction is increased image background noise, which can severely compromise diagnostic information. How to minimize the exposure to ionizing radiation while maintaining the diagnostic utility of low-dose CT (LDCT) has been a challenge for researchers, who follow the well-known ALARA (as low as reasonably achievable) guideline [1]. Numerous methods were designed for LDCT noise reduction. These methods can be categorized as follows: (1) Sinogram filtering-based techniques [4, 5, 6, 7, 8, 9]: these methods directly process projection data in the projection domain [6]. The main advantage of these methods is computational efficiency. However, they may result in loss of structural information and spatial resolution [6, 10, 7]; (2) Iterative reconstruction (IR) [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]: IR techniques may potentially produce a high signal-to-noise ratio (SNR). However, these methods require a substantial computational cost and troublesome parameter tuning; (3) Image space denoising techniques [21, 22, 20, 23, 24, 25, 26, 27]: these techniques can be performed directly on reconstructed images so that they can be applied across various CT scanners at a very low cost. Examples are non-local means-based filters [21, 16], the dictionary-learning-based K-singular value decomposition (K-SVD) method [20], and the block-matching 3D (BM3D) algorithms [25, 24]. Even though these algorithms greatly suppress noise and artifacts, edge blurring or resolution loss may persist in processed LDCT images.

Deep learning (DL) has recently received tremendous attention in the field of medical imaging [28, 29], in tasks such as brain image segmentation [30], image registration [31, 32], image classification [33], and LDCT noise reduction [34, 35, 36, 37, 38, 39, 40]. For example, Chen et al. [35] proposed a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN) to predict NDCT images from noisy LDCT images. This method greatly reduces the background noise and artifacts. However, a limitation is that the results sometimes look blurry, since the method targets minimizing the mean-squared error between the denoised LDCT and corresponding NDCT images. To cope with this problem, the generative adversarial network (GAN) [41] offers an attractive solution. In the GAN, the generator $G$ learns to capture a real data distribution while the discriminator $D$ attempts to discriminate between the synthetic data distribution and the real counterpart. Note that the loss used in GAN, called the adversarial loss, measures the distance between the synthetic data distribution and the real one in order to improve the performance of $G$ and $D$ simultaneously. Originally, GAN uses the Jensen-Shannon (JS) divergence to evaluate the similarity of the two data distributions [41]. However, several problems exist in training GAN, such as unstable training and non-convergence. To address these issues, Arjovsky et al. introduced the Wasserstein distance instead of the Jensen-Shannon divergence to improve the neural network training [42]. We will discuss more details on this aspect in Section II-D3.

Fig. 1: The overall structure of the proposed SMGAN network. Note that the variable $n$ denotes the number of filters and $s$ denotes the stride size.

In our previous work [37], we first introduced the perceptual loss to capture perceptual differences between denoised LDCT images and the reference NDCT images, providing perceptually better results for clinical diagnosis at the cost of lower scores in traditional image quality metrics. Since the traditional image quality metrics evaluate the generated images with reference to the gold-standard in generic ways, minimizing the perceptual loss does not ensure that the results are optimal in terms of these metrics. To address this discrepancy, and inspired by the work in [36, 43], here we propose a novel 3D clinical Structurally-sensitive Multi-scale Generative Adversarial Network (SMGAN) to capture subtle structural features while maintaining high visual sensitivity. The proposed structurally-sensitive loss leverages a combination of adversarial loss [42], perceptually-favorable structural loss, and pixel-wise loss. Moreover, to validate the diagnostic quality of images processed by our method, we report qualitative image assessments by three expert radiologists. Systematically, we demonstrate the feasibility and merits of mapping LDCT images to corresponding NDCT images in the GAN framework.

Our main contributions in this paper are summarized as follows:

  1. To keep the underlying structural information in LDCT images, we adopt a 3D CNN model as a generator based on WGAN which can enhance the image quality for better diagnosis.

  2. To measure the structural difference between denoised LDCT images and the NDCT gold-standard, a structurally-sensitive loss is used to enhance the accuracy and robustness of the algorithm. Different from [37], we replace the perceptual loss with a combination of $L_1$ loss and structural loss.

  3. To compare the performance of the 2D and the 3D models, we perform an extensive evaluation on their convergence rate and denoising performance.

This paper is organized as follows: Section II introduces the proposed approach and analyzes the impact of each component loss function on the image quality. Section III presents the experimental design and results. Section IV discusses relevant issues. Finally, the concluding remarks and future plans are given in Section V.

II Methods

II-A Problem Inversion

Assuming that $y \in \mathbb{R}^{H \times W \times D}$ denotes the original LDCT image, and $x \in \mathbb{R}^{H \times W \times D}$ denotes the corresponding NDCT image, the relationship between them can be expressed as:

$$y = T(x) + \epsilon \qquad (1)$$

where $T$ is a generic noising process that degrades a real sample of NDCT to a corresponding LDCT sample in a non-linear way, $\epsilon$ stands for the additive noise and unmodeled factors, and $H$, $W$, $D$ are height, width and depth respectively.

From another standpoint, considering that the real NDCT distribution is unknown, we focus on extracting information to recover the desired images $x$ from the noisy LDCT images $y$. In general, the noise distribution in CT images is regarded as a mixture of Poisson quantum noise and Gaussian electronic noise [44]. Compared with traditional denoising methods, the DL-based method is capable of effectively modeling any type of data distribution, since the DL-based denoising model itself can be easily adapted to any practical noise model with statistical properties of typical noise distributions in combination. Therefore, the proposed DL-based denoising network is to solve the inverse problem to retrieve feasible images $\hat{x}$, and the solution can be expressed as:

$$\hat{x} = T^{-1}(y) \approx x \qquad (2)$$
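As a rough illustration of the degradation model and the mixed-noise assumption, the sketch below corrupts a synthetic clean patch with Poisson (quantum) and Gaussian (electronic) noise. The function name `simulate_ldct`, the photon count, and the noise level are illustrative assumptions, not values from this work; real LDCT noise arises in the projection domain, whereas this toy example acts directly on image intensities.

```python
import numpy as np

def simulate_ldct(x, photons=50.0, sigma=0.02, seed=0):
    """Toy y = T(x) + eps: Poisson-type degradation plus Gaussian noise.

    x is a clean patch scaled to [0, 1]; `photons` and `sigma` are
    hypothetical values chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Signal-dependent quantum noise: Poisson counts rescaled back to [0, 1].
    t_x = rng.poisson(x * photons) / photons
    # Signal-independent electronic noise.
    eps = rng.normal(0.0, sigma, size=x.shape)
    return t_x + eps

x = np.full((8, 8, 4), 0.5)   # synthetic "NDCT" patch of shape (H, W, D)
y = simulate_ldct(x)          # synthetic "LDCT" patch of the same shape
```

A denoising network is then trained to map `y` back towards `x`, without ever modeling the noise process explicitly.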


As shown in Fig. 1, the overall network comprises three parts. Part 1 is the generator $G$, part 2 is the Structurally-Sensitive loss (SSL) function, and part 3 is the discriminator $D$.

$G$ maps a volumetric LDCT image to the NDCT feature space, thereby estimating an NDCT image. The SSL function computes the structurally-sensitive dissimilarity, which encodes multi-scale structural information. The loss computed by the SSL function aims to improve the ability of $G$ to generate realistic results. $D$ distinguishes a pair of synthetic and real NDCT images. If $D$ can identify the input image as “synthetic” or “real” correctly and tell us the discrepancy between the estimated CT image and the corresponding real NDCT image, we will know whether $G$ yields a high-quality estimate or not. With the indication from $D$, $G$ can optimize its performance, and $D$ can upgrade its ability as well. Hence, $G$ and $D$ are in competition: $G$ attempts to generate a convincing estimate of an NDCT image while $D$ aims to distinguish the estimated image from real NDCT images. See Sections II-C and II-D for more details. For convenience, the notations used in this paper are summarized in Table V.

II-B 3D Spatial Information

The advantages of using 3D spatial information are evident. Hence, volumetric imaging and 3D visualization have become standards in diagnostic radiology [45]. A large number of 3D NDCT and LDCT volumetric images are available in practice. However, most denoising networks have a 2D architecture. Adjacent cross-sectional slices in a 3D CT image volume exhibit strong spatial correlation, which a 3D network architecture can exploit to preserve more information than 2D models.

As mentioned above, here we use a 3D ConvNet as the generator and introduce a 3D Structurally-Sensitive loss (SSL) function. Accordingly, we extract 3D image patches and use 3D filters instead of 2D filters. The generator in our network takes 3D volumetric LDCT patches as the input and processes them with 3D non-linear transform operations. For convenience and comparison, the 2D and 3D denoising networks are referred to as SMGAN-2D and SMGAN-3D respectively. The details of the network architecture are given in Section II-C.


II-C Network Structure

Inspired by the studies in [36, 37], we introduce our proposed SMGAN-3D network structure. First, in Section II-C1 we present the 3D generator, which captures local anatomical features. Then, in Section II-C2 we define the 3D SSL function, which guides the learning process. Finally, we outline the 2.5D discriminator in Section II-C3.

II-C1 3D CNN Generator

The generator $G$ consists of eight 3D convolutional (Conv) layers. The first seven layers each have 32 filters, and the last layer has only one filter. The odd-numbered and even-numbered convolutional layers use filters of different sizes, and the extracted 3D patches are the input to the whole network; see Fig. 1. Note that the variable $n$ denotes the number of filters and $s$ denotes the stride size, which is the step size of the filter when moving across an image, so that $n32s1$ stands for 32 feature maps with a unit stride. Furthermore, a pooling layer after each Conv layer may lead to loss of subtle textural and structural information. Therefore, pooling layers are not applied in this network. The Rectified Linear Unit (ReLU) is our activation function after each Conv layer.

II-C2 Structurally-Sensitive Loss (SSL) Function

The proposed 3D SSL function measures the patch-wise discrepancy between a 3D output from the 3D ConvNet and the 3D NDCT image in the spatial domain. This measure is back-propagated[47] through the neural network to update the parameters of the network; see Section II-D for more details.

II-C3 Discriminator

The discriminator $D$ consists of six convolutional layers followed by two fully-connected (FC) layers. Each layer is followed by a leaky ReLU defined as $\max(0, x) - \alpha \max(0, -x)$ [46], where $\alpha$ is a small constant. A stride of one pixel is applied for odd-numbered Conv layers and a stride of two pixels for even-numbered Conv layers. The input fed to $D$ comes from the output of $G$. The reason why we use 2D filters in $D$ is to reduce the computational complexity. Since the adversarial loss between each two adjacent slices in one volumetric patch contributes equally to the weighted average in one iteration, it can be easily computed. Following the suggestion in [42], we do not use the sigmoid cross entropy layer in $D$.
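The leaky ReLU used in the discriminator can be sketched in a few lines, assuming the common definition max(0, x) - alpha * max(0, -x). The default slope of 0.2 follows the value reported in the parameter settings; the function name is only illustrative.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: identity for x >= 0, slope `alpha` for x < 0."""
    return np.maximum(0.0, x) - alpha * np.maximum(0.0, -x)

# Negative inputs are scaled by alpha instead of being zeroed out.
out = leaky_relu(np.array([-2.0, 0.0, 3.0]))
```

Unlike the plain ReLU, negative activations keep a small non-zero gradient, which is a common choice for GAN discriminators.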

II-D Loss Functions for Noise Reduction

In this sub-section, we evaluate the impact of different loss functions on LDCT noise reduction. This justifies the use of a hybrid loss function for optimal diagnostic quality.

II-D1 $L_2$ Loss

The $L_2$ loss can efficiently suppress the background noise, but it could make the denoised results unnatural and blurry. This is expected due to its regression-to-mean nature [43, 48]. Furthermore, the $L_2$ loss assumes that background noise is white Gaussian noise, which is independent of local image features [49] and not desirable for LDCT imaging.

The formula of the $L_2$ loss is expressed as:

$$L_2 = \frac{1}{HWD} \left\| G(y) - x \right\|_2^2 \qquad (3)$$

where $H$, $W$, $D$ stand for the height, width, and depth of a 3D image patch respectively, $x$ denotes the gold-standard (NDCT), and $G(y)$ represents the generated result from the source (LDCT) image $y$. It is worth noting that since the $L_2$ loss has appealing properties of differentiability, convexity, and symmetry, the mean squared error (MSE) or $L_2$ loss is still a popular choice in denoising tasks [50].

II-D2 $L_1$ Loss

Although the $L_1$ and $L_2$ losses are both mean-based measures, their impacts on denoising results differ. Compared with the $L_2$ loss, the $L_1$ loss does not over-penalize large differences or tolerate small errors between denoised and gold-standard images. Thus, the $L_1$ loss can alleviate some drawbacks of the $L_2$ loss mentioned earlier. Additionally, the $L_1$ loss enjoys the same fine characteristics as the $L_2$ loss except for differentiability.

The formula for the $L_1$ loss is written as:

$$L_1 = \frac{1}{HWD} \left\| G(y) - x \right\|_1 \qquad (4)$$

where, as before, $G(y)$ is the generated result and $x$ is the NDCT gold-standard. As shown in Figs. 18-57, compared with the $L_2$ loss, the $L_1$ loss suppresses blurring, but does not help reduce blocky artifacts. For more details, see Section III.
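The different behavior of the two mean-based losses can be checked numerically. The sketch below assumes the usual definitions (mean squared error and mean absolute error over all voxels of a patch) and compares a uniform small error with a single large error that has the same mean absolute value; the patch size and error values are arbitrary.

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error over all voxels of a patch."""
    return float(np.mean((pred - target) ** 2))

def l1_loss(pred, target):
    """Mean absolute error over all voxels of a patch."""
    return float(np.mean(np.abs(pred - target)))

target = np.zeros((4, 4, 2))      # 32 voxels
small = target + 0.1              # every voxel is off by 0.1
outlier = target.copy()
outlier[0, 0, 0] = 3.2            # one voxel is off by 3.2

# Both predictions have the same L1 loss (0.1), but the L2 loss of the
# single large error (0.32) is 32 times that of the uniform error (0.01).
losses = {
    "small": (l1_loss(small, target), l2_loss(small, target)),
    "outlier": (l1_loss(outlier, target), l2_loss(outlier, target)),
}
```

This is the sense in which the squared error over-penalizes large differences while tolerating small ones.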

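II-D3 Adversarial Loss

The Wasserstein distance with the regularization term was proposed in [48], which is formulated as

$$L_{adv}(D, G) = -\mathbb{E}_{x}\left[D(x)\right] + \mathbb{E}_{z}\left[D(z)\right] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_2 - 1\right)^2\right] \qquad (5)$$

where the first two terms are for the Wasserstein distance, and the third term implements the gradient penalty with weight $\lambda$. Note that $z$ denotes $G(y)$ for brevity. $\hat{x}$ is uniformly sampled along the straight line between a pair of points sampled from $G$ and the corresponding NDCT images.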
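To make the three terms concrete, the sketch below evaluates a Wasserstein loss with gradient penalty for a toy linear critic D(v) = v · w, whose gradient is w everywhere and can therefore be written down without automatic differentiation. The linear critic, the function name, and the toy data are assumptions for illustration only; the actual discriminator is a convolutional network whose gradients come from the training framework.

```python
import numpy as np

def wgan_gp_critic_loss(w, real, fake, lam=10.0, seed=0):
    """-E[D(x)] + E[D(z)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]
    for a toy linear critic D(v) = v @ w."""
    rng = np.random.default_rng(seed)
    # x_hat: uniformly sampled points on the lines between real/fake pairs.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    wasserstein = -np.mean(real @ w) + np.mean(fake @ w)
    # For a linear critic the gradient at every x_hat is simply w.
    grad = np.broadcast_to(w, x_hat.shape)
    penalty = np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)
    return float(wasserstein + lam * penalty)

real = np.ones((4, 3))               # stand-in for NDCT samples x
fake = np.zeros((4, 3))              # stand-in for generated samples z
loss_unit = wgan_gp_critic_loss(np.array([1.0, 0.0, 0.0]), real, fake)
loss_steep = wgan_gp_critic_loss(np.array([2.0, 0.0, 0.0]), real, fake)
```

With a unit-norm gradient the penalty vanishes and only the Wasserstein estimate remains; a steeper critic separates the samples more strongly but is penalized for violating the unit-gradient constraint.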

II-D4 Structural Loss

Medical images contain strong feature correlations. For example, their voxels have strong inter-dependencies. The structural similarity index (SSIM) [49] and the multi-scale structural similarity index (MS-SSIM) [51] are perceptually motivated metrics, and perform better in visual pattern recognition than mean-based metrics [49]. To measure the structural and perceptual similarity between two images, the SSIM [49] is formulated as follows:

$$\mathrm{SSIM}(z, x) = \frac{2\mu_z \mu_x + C_1}{\mu_z^2 + \mu_x^2 + C_1} \cdot \frac{2\sigma_{zx} + C_2}{\sigma_z^2 + \sigma_x^2 + C_2} \qquad (6)$$

$$= l(z, x) \cdot cs(z, x) \qquad (7)$$

where $C_1$, $C_2$ are constants and $\mu_z$, $\mu_x$, $\sigma_z$, $\sigma_x$, $\sigma_{zx}$ denote the means, standard deviations and cross-covariance of the image pair $(z, x)$, with $z = G(y)$ from $G$ and $x$ the corresponding NDCT image respectively. $l(z, x)$ and $cs(z, x)$ are the first term and second factor we defined in Eqn. 6.
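The two factors of the SSIM can be computed directly from patch statistics. The following is a minimal sketch that uses global statistics of a whole patch; practical SSIM implementations use local (often Gaussian-weighted) windows and average the resulting map. The constants assume intensities in [0, 1] and follow the commonly used values (0.01² and 0.03²), which are an assumption here rather than settings taken from this work.

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return (SSIM, l, cs) computed from whole-patch statistics."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    # First factor: luminance comparison.
    l = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    # Second factor: contrast and structure comparison.
    cs = (2 * cov + c2) / (var_a + var_b + c2)
    return float(l * cs), float(l), float(cs)

rng = np.random.default_rng(0)
ndct = rng.random((16, 16, 4))
noisy = ndct + rng.normal(0.0, 0.2, size=ndct.shape)

same, _, _ = ssim_global(ndct, ndct)      # identical patches
degraded, _, _ = ssim_global(noisy, ndct) # noisy vs. clean patch
```

Identical patches give an SSIM of one, and added noise lowers the score mainly through the second factor.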

The multiscale SSIM provides more flexibility for multi-scale analysis [51]. The formula for MS-SSIM [51] is expressed as:

$$\mathrm{MS\text{-}SSIM}(z, x) = \prod_{j=1}^{M} \mathrm{SSIM}(z_j, x_j) \qquad (8)$$

where $z_j$, $x_j$ are the local image content at the $j$-th level, and $M$ is the number of scale levels. Clearly, SSIM is a special case of MS-SSIM.

The formula for the structural loss (SL) is generally expressed as:

$$L_{SL} = 1 - \mathrm{MS\text{-}SSIM}(z, x) \qquad (9)$$

Note that the loss can be easily back-propagated to update weights in the network, since it can be differentiated [43].
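A multi-scale structural loss can be sketched as one minus a product of per-scale SSIM values, with each coarser scale obtained by 2×2 average pooling in the slice plane. The pooling scheme, the number of levels, the use of global patch statistics, and all function names are assumptions made for this illustration; the standard MS-SSIM of [51] additionally weights the scales with exponents.

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """SSIM from whole-patch statistics (intensities assumed in [0, 1])."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = np.mean((a - mu_a) * (b - mu_b))
    l = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    cs = (2 * cov + c2) / (a.var() + b.var() + c2)
    return float(l * cs)

def downsample2(v):
    """Halve height and width by 2x2 average pooling; depth is kept."""
    h, w = v.shape[0] // 2, v.shape[1] // 2
    return v[:2 * h, :2 * w].reshape(h, 2, w, 2, *v.shape[2:]).mean(axis=(1, 3))

def structural_loss(pred, target, levels=3):
    """1 - prod_j SSIM(pred_j, target_j) over `levels` scales."""
    ms_ssim = 1.0
    for _ in range(levels):
        ms_ssim *= ssim_global(pred, target)
        pred, target = downsample2(pred), downsample2(target)
    return 1.0 - ms_ssim

rng = np.random.default_rng(0)
ndct = rng.random((16, 16, 4))
noisy = ndct + rng.normal(0.0, 0.2, size=ndct.shape)
loss_same = structural_loss(ndct, ndct)
loss_noisy = structural_loss(noisy, ndct)
```

With `levels=1` the loss reduces to one minus the single-scale SSIM, matching the remark that SSIM is a special case of MS-SSIM.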

II-D5 Objective Function

As mentioned in the recent studies [43, 37], minimizing the $L_2$ loss leads to an over-smoothed appearance. The adversarial loss in GAN may yield sharp images, but it does not exactly match the corresponding real NDCT images [37]. The perceptual loss computed by a VGG network [47] evaluates the perceptual differences between the generated images and real NDCT images in a high-level feature space instead of the voxel space. Since the VGG network is trained on a large dataset of natural images, not CT images, it may result in distortions of processed CT images. To tackle these issues, we propose to utilize different loss terms together for high image quality.

As revealed in [43], the $L_1$ loss allows noise suppression and SNR improvement. However, it blurs anatomical structures to some extent. In contrast, the structural loss discourages blurring and keeps high contrast resolution. To have the merits of both loss functions, the structurally-sensitive loss (SSL) is expressed as:

$$L_{SSL} = \tau L_{SL} + (1 - \tau) L_1 \qquad (10)$$

where $\tau$ is the weighting factor to balance between structure preservation in the first term (from Eq. 9) and noise suppression in the second term (from Eq. 4).

Nevertheless, the above-mentioned two losses may still miss some diagnostic features. Hence, the adversarial loss is incorporated to keep textural and structural features as much as possible. In summary, the overall objective function of SMGAN is expressed as:

$$L_{obj} = L_{SSL} + \beta L_{adv} \qquad (11)$$

where $L_{SSL}$ is the structurally-sensitive loss, $L_{adv}$ is the adversarial loss, and $\beta$ is the weight for the adversarial loss. In the last step of the network, we compare the difference between the output volume and the target volume, and then the error can be back-propagated for optimization [52].
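The way the individual terms are combined can be written as two small functions. This is a sketch under stated assumptions: the structural, mean-absolute, and adversarial loss values are taken as already computed scalars, the default of 0.89 for the structure weight follows the value reported in the parameter settings, and the default adversarial weight `beta=1e-3` is a hypothetical placeholder, not a value taken from this work.

```python
def ssl_loss(sl, l1, tau=0.89):
    """Structurally-sensitive loss: tau * SL + (1 - tau) * L1."""
    return tau * sl + (1.0 - tau) * l1

def smgan_objective(sl, l1, adv, tau=0.89, beta=1e-3):
    """Overall objective: SSL plus a weighted adversarial term.

    beta=1e-3 is a hypothetical default used only for illustration.
    """
    return ssl_loss(sl, l1, tau) + beta * adv

# Example with arbitrary scalar loss values.
ssl_value = ssl_loss(sl=0.2, l1=0.05)
total = smgan_objective(sl=0.2, l1=0.05, adv=-1.0)
```

A large structure weight keeps the emphasis on structure preservation, while the remaining weight on the mean absolute error retains noise suppression and overall intensity fidelity.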

III Experiments and Results

III-A Experimental Datasets and Setup

To show the effectiveness of the proposed network for LDCT noise reduction, we used a real clinical dataset, published by Mayo Clinic for the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge [53]. The Mayo dataset consists of 2,378 normal dose CT (NDCT) and low dose (quarter dose) CT (LDCT) images from 10 anonymous patients. The reconstruction interval and slice thickness in the dataset were and respectively.

For limited data, the denoising performance of DL-based methods depends on the size of the training datasets, so large-scale valid training datasets can improve the denoising performance. However, it is worth noting that the training image library may not contain many valid images. To enhance the performance of the network, the strategies we utilized are as follows. First of all, in order to improve generalization performance of the network and avoid over-fitting, we adopted the “10-fold cross validation” strategy. The original dataset was partitioned into 10 equal size subsets. Then, a single subset was used in turn as the validation subset and the rest of data were utilized for training. Moreover, considering the limited number of CT images, we applied the overlapping patches strategy because it can not only consider patch-wise spatial interconnections, but also significantly increase the size of the training patch dataset [54, 55].
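The 10-fold cross-validation split described above can be sketched as follows. The helper name `k_fold_indices` and the shuffling seed are illustrative assumptions; the split here is over image indices, whereas in practice one may prefer to split by patient so that slices from the same patient never appear in both the training and the validation subsets.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is validation once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# 2,378 is the number of images in the Mayo dataset used here.
splits = list(k_fold_indices(2378))
```

Each image index appears in exactly one validation subset, and the remaining nine subsets form the corresponding training set.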

For data preprocessing, the original LDCT and NDCT images are of pixels. Since directly processing the entire patient images is computationally inefficient and infeasible, our denoising model was applied to image patches. First, we applied the overlapped sliding window with a sliding size of to obtain image patches and then randomly extracted 100,100 pairs of training patches and 5,100 pairs for validation from remaining patient images of the same size . Then, the “10-fold cross validation” strategy is used to ensure the accuracy of the proposed algorithm. Next, the CT Hounsfield Unit (HU) scale was normalized to [0, 1] before the images were fed to the network.
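The overlapping-patch strategy and the intensity normalization can be sketched as below. The patch size, stride, and HU clipping range in this example are hypothetical values chosen for a small toy volume, not the settings used in the experiments.

```python
import numpy as np

def extract_patches(volume, size=(16, 16, 4), stride=(8, 8, 2)):
    """Collect overlapping 3D patches with a sliding window."""
    patches = []
    for i in range(0, volume.shape[0] - size[0] + 1, stride[0]):
        for j in range(0, volume.shape[1] - size[1] + 1, stride[1]):
            for k in range(0, volume.shape[2] - size[2] + 1, stride[2]):
                patches.append(
                    volume[i:i + size[0], j:j + size[1], k:k + size[2]]
                )
    return np.stack(patches)

def normalize_hu(volume, lo=-1000.0, hi=1000.0):
    """Clip to an assumed HU range [lo, hi] and rescale to [0, 1]."""
    return (np.clip(volume, lo, hi) - lo) / (hi - lo)

# Toy volume standing in for a stack of CT slices in Hounsfield Units.
vol = np.random.default_rng(0).uniform(-1000, 1000, size=(32, 32, 8))
patches = extract_patches(normalize_hu(vol))
```

Because the stride is smaller than the patch size, neighboring patches overlap, which multiplies the number of training samples obtained from a limited set of images.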

For qualitative comparison, in order to validate the performance of our proposed methods (SMGAN-2D and SMGAN-3D), we compare them with eight state-of-the-art denoising methods, including CNN-L2 ($L_2$-net), CNN-L1 ($L_1$-net), structural-loss net (SL-net), multi-scale structural-loss net (MSL-net), WGAN, BM3D [25], RED-CNN [35], and WGAN-VGG [37]. Among these existing denoising methods, BM3D is a classical image space denoising algorithm. WGAN-VGG represents a 2D perceptual-loss-based network, and RED-CNN refers to a 2D pixel-wise network. Note that the parameter settings in these methods [37, 35, 25] followed the suggestions from the original papers.

For quantitative comparison, to evaluate the effectiveness of the proposed methods, three metrics were chosen to perform image quality evaluation, including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [51], and root-mean-square error (RMSE).
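Two of the three metrics are directly related. Assuming the standard definitions and images normalized to [0, 1] (peak value 1), PSNR equals -20·log10(RMSE). The sketch below implements both and reproduces, up to rounding, the LDCT entry for Fig. 18 in Table I (RMSE 0.0723, PSNR 22.818); the function names and the constant toy images are illustrative.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between two images."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for intensities in [0, max_val]."""
    return float(20.0 * np.log10(max_val / rmse(pred, target)))

# A constant error of 0.0723 gives RMSE = 0.0723 and PSNR of about 22.82 dB.
target = np.zeros((8, 8))
pred = target + 0.0723
metrics = (rmse(pred, target), psnr(pred, target))
```

Since PSNR is a monotone function of RMSE, the two metrics always rank methods identically, whereas SSIM can rank them differently.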



Fig. 5: Comparison of loss function value versus the number of epochs with respect to different algorithms. (a) L1 Loss, (b) Structural Loss, and (c) Wasserstein Distance curves.

III-B Parameter Selection

In our experiments, the Adam optimization algorithm was used for network training [56]. In the training phase, the mini-batch size was 64. The hyperparameter $\lambda$ for the balance between the Wasserstein distance and the gradient penalty was set to 10, per the suggestion from the original paper [42]. The parameter $\beta$ controls the trade-off between the adversarial loss and the mixture loss, and the parameter $\tau$ was set to 0.89. The slope of the leaky ReLU activation function was set to 0.2. The networks were implemented in TensorFlow [57] on an NVIDIA Titan Xp GPU.

Fig. 18: Results from abdomen CT images. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D, and (l) SMGAN-3D. The red rectangle indicates the region zoomed in Fig. 31. The display window is [-160, 240]HU.
Fig. 31: Zoomed parts of the regions of interest (ROIs) marked by the red rectangle in Fig. 18. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN, (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D and (l) SMGAN-3D. The red circle indicates the metastasis and the green and blue arrows indicate two subtle structures. The display window is [-160, 240]HU.

III-C Network Convergence

To examine the robustness of different denoising algorithms, ten methods were separately trained in the same settings as those for SMGAN-3D and compared in terms of the $L_1$ loss, structural loss (SL), and Wasserstein distance. Note that the parameter settings of RED-CNN, WGAN-VGG, and BM3D followed the original papers [35, 37, 25]. In addition, the 2D networks take 2D input patches, whereas our proposed 3D model uses 3D training patches. We calculated the averaged loss value achieved by different methods versus the number of epochs as the measure of convergence in Fig. 5.

Fig. 44: Results from abdomen CT images. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D, and (l) SMGAN-3D. The red rectangle indicates the region zoomed in Fig. 57. This display window is [-160, 240]HU.
Fig. 57: Zoomed parts of the regions of interest (ROIs) marked by the red rectangle in Fig. 44. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN, (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D and (l) SMGAN-3D. The red circle indicates the metastasis and the green and blue arrows indicate two subtle structures. The display window is [-160, 240]HU.

In Figs. 5(a) and 5(b), in terms of $L_1$ and SL, we observe that $L_1$-net and $L_2$-net achieved the fastest convergence rate and have similar convergence trends: all curves decreased initially and then smoothly converged, indicating that these mean-based algorithms both have fast convergence rates. Fig. 5(a) shows that they both converged at around the same epoch. In contrast, Fig. 5(a) reveals differences between SL-based and mean-based methods. The convergence curve of SL-net decreases initially and then slightly rises, as shown in Fig. 5(a). MSL-net also shows a small increase like SL-net in terms of $L_1$. This observation indicates that SL-based and mean-based methods place different emphasis on minimizing perceptually motivated similarity between real NDCT images and generated NDCT images. For WGAN-based methods, it can be clearly observed that the curves for WGAN, WGAN-VGG, SMGAN-2D, and SMGAN-3D slightly oscillate during the convergence process in Figs. 5(a) and 5(b). The reason for such oscillatory behavior is as follows: $G$ attempts to mimic the real NDCT distribution while $D$ aims to differentiate between the real NDCT distribution and the denoised LDCT distribution. Since the GAN is intrinsically a two-player game, the distributions of $G$ and $D$ are constantly changing, and this leads to the oscillatory behavior when converging to their optimal status.

Fig. 5(c) allows us to evaluate the convergence performance of the WGAN-based methods. It can be seen that our proposed SMGAN-2D has the mildest oscillatory behavior compared with the other three models and eventually reaches a stable state. Moreover, SMGAN-3D oscillates in a relatively large range in the training process. This is because our proposed SMGAN-3D considers 3D structural information, which results in a relatively larger oscillation amplitude in the training process. However, the curve still oscillates close to the x-axis, indicating SMGAN-3D’s robustness in minimizing the Wasserstein distance between the generated samples and real samples.

III-D Denoising Performance

To demonstrate the effectiveness of the proposed network, we perform qualitative comparisons over three representative abdominal images presented in Figs. 18, 44, and 70. For better evaluation of the image quality with different denoising models, zoomed regions-of-interest (ROIs) are marked by red rectangles and shown in Figs. 31, 57, and 83 respectively. Note that all results from different denoising models focus on two aspects: content restoration and noise reduction. All CT images in axial view are displayed in the angiography window [-160, 240]HU.

The real NDCT images and the corresponding LDCT images are presented in panels (a) and (b) of each figure. As observed, there are clear distinctions between the ground truth (NDCT) images and the LDCT images. Panel (a) of the first two cases shows the lesions/metastasis, and panel (a) of the third case presents focal fatty sparing/focal fat. These lesions can be clearly observed in the NDCT images in panels (a); in contrast, panels (b) show that the original LDCT images are noisy and lack structural features for task-based clinical diagnosis. All adopted denoising models suppress noise to some extent.

III-D1 Comparison with CNN-based denoising methods

To study the robustness of the adversarial learning framework in SMGAN-3D, we compared SMGAN-3D with the CNN-based methods, including CNN-L2, CNN-L1, RED-CNN [35], SL-net and MSL-net. It is worth noting that CNN-L2, CNN-L1, and RED-CNN are mean-based denoising methods, and SL-net and MSL-net are SL-based denoising methods. All of the methods greatly reduce the noise compared with LDCT images. Our proposed method preserves more structural details, thereby yielding better image quality, compared with the other five methods.

Mean-based methods can effectively reduce noise, but the side effect is impaired image content. In panels (c), $L_2$-net greatly suppresses the noise, but blurs some crucial structural information in the porta hepatis region. Meanwhile, some waxy artifacts can still be observed in panels (c). $L_2$-net does not produce good visual quality because it assumes that the noise is independent of local characteristics of the images. Even though it retains high SNR, its results are not clinically preferable. Compared with $L_2$-net, panels (d) show that $L_1$-net encourages less blurring and preserves more structural information. However, as observed in panels (d), it still over-smooths some anatomical details, and there are some blocky effects marked by the blue arrow. The results obtained by RED-CNN [35] deliver high SNR but blur the vessel details, as shown in panels (i).

For SL-based methods, as observed in panels (e), SL-net generates images with higher contrast resolution and preserves the texture of real NDCT images better than $L_2$-net and $L_1$-net. However, panels (e) also show that SL-net does not preserve the structural features well, and small streak artifacts remain. Furthermore, in panels (e) and (f), SL-net and MSL-net exhibit low-frequency image intensity variance because SSIM/MS-SSIM is insensitive to uniform biases [49, 51]. On the other hand, $L_1$-net preserves the overall image intensity, but it does not preserve high contrast resolution as well as SL-net and MSL-net do.

From Figs. 70 and 83, we can see that mean-based and SL-based methods work well with effective noise suppression and artifact removal. However, the illustrations in Fig. 83 show that these methods blur the local structural features. Our proposed SMGAN-based methods present better edge preservation than the competing methods.

Overall, the observations above support the following statements. First, although the voxel-wise methods show good noise-reduction properties, to some extent they blur the contents and lead to the loss of structural details because they optimize the results in the voxel-wise manner. Second, SL-based methods better preserve texture than mean-based methods, but they cannot preserve overall image intensity. Third, the results produced by the proposed SMGAN-3D demonstrate the benefits of the combination of two loss functions and the importance of the adversarial training [41, 42].

             Fig. 18                   Fig. 44                   Fig. 70
Method       PSNR    SSIM   RMSE      PSNR    SSIM   RMSE      PSNR    SSIM   RMSE
LDCT         22.818  0.761  0.0723    21.558  0.659  0.0836    24.169  0.737  0.0618
CNN-L1       27.791  0.822  0.0408    26.794  0.738  0.0457    29.162  0.807  0.0348
CNN-L2       27.592  0.819  0.0418    26.630  0.736  0.0466    28.992  0.806  0.0355
SL-net       26.864  0.831  0.0453    25.943  0.745  0.0504    28.069  0.813  0.0395
MSL-net      27.667  0.831  0.0414    26.685  0.744  0.0469    28.902  0.812  0.0359
WGAN         25.727  0.801  0.0517    24.655  0.711  0.0585    26.782  0.781  0.0458
BM3D         27.312  0.809  0.0431    26.525  0.728  0.0472    28.959  0.794  0.0356
RED-CNN      28.279  0.825  0.0385    27.243  0.743  0.0444    29.679  0.811  0.0328
WGAN-VGG     26.464  0.811  0.0475    25.300  0.722  0.0543    27.161  0.793  0.0419
SMGAN-2D     26.627  0.821  0.0466    25.507  0.732  0.0530    27.731  0.795  0.0406
SMGAN-3D     26.569  0.824  0.0473    25.372  0.739  0.0538    27.398  0.794  0.0411
TABLE I: Quantitative results (PSNR, SSIM, and RMSE) associated with different approaches in Figs. 18, 44, and 70.

III-D2 Comparison with WGAN-based denoising methods

To evaluate the effectiveness of our proposed objective function, we compare our method with existing WGAN-based networks, including WGAN and WGAN-VGG. Considering the importance of clinical image quality and specific structural features for medical diagnosis, we adopted the adversarial learning method [41, 42] in our experiments because WGAN could help to capture more structural information. Nevertheless, based on our prior experience, utilizing WGAN alone may yield stronger noise than other selected approaches, because it only maps the data distribution from LDCT to NDCT without consideration of local voxel intensity and structural correlations. The observations demonstrate that the noise texture is coarse in the images, as shown in Fig. (g)g and Fig. (g)g, which support our intuition.

Indeed, the images of WGAN-VGG [37], as shown in panel (j), exhibit better visual quality with more details, and share structural details similar to the NDCT images according to human perceptual evaluations. However, the regions in panels (j) marked by the red circle and by the green circle suggest that it may severely distort the original structural information. A possible reason is that the VGG network [47] is a deep CNN pre-trained on natural images, whose structural information and contents differ from those of medical images.

Compared with WGAN and WGAN-VGG, our proposed SMGAN-3D, as shown in panels (l) (regions marked by the red circle and by the green circle), can more clearly visualize the metastasis and better preserve the portal vein.

Figs. 70 and 83 show that the SMGAN-based methods achieve better anatomical feature preservation and visual quality than the other state-of-the-art methods.

The experimental results demonstrate that our proposed objective function is essential to capture more accurate anatomical details.

III-D3 Comparison with image space denoising

To validate the robustness of DL-based methods, we compared our method with an image space denoising method, BM3D. Panels (h) show that BM3D blurs the low-contrast lesion marked by the red circle and smooths the specific features marked by the blue arrow. In contrast, SMGAN-3D performs better on the low-contrast lesion and yields sharper features, as shown in panels (l).

Fig. 70: Results from abdomen CT images. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN, (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D, and (l) SMGAN-3D. The red rectangle indicates the region zoomed in Fig. 83. The display window is [-160, 240] HU.
Fig. 83: Zoomed regions of interest (ROIs) marked by the red rectangle in Fig. 70. (a) NDCT, (b) LDCT, (c) CNN-L2, (d) CNN-L1, (e) SL-net, (f) MSL-net, (g) WGAN, (h) BM3D, (i) RED-CNN, (j) WGAN-VGG, (k) SMGAN-2D, and (l) SMGAN-3D. The red and the green circles indicate subtle edges. The display window is [-160, 240] HU.

III-D4 Comparison with 2D-based SMGAN network

To evaluate the contribution of 3D structural information, we compared SMGAN-3D with SMGAN-2D. As shown in panel (l), our proposed SMGAN-3D generates results with better subtle details than SMGAN-2D, and its statistical noise properties are closer to those of the corresponding NDCT images. The reasons why SMGAN-3D outperforms SMGAN-2D are as follows. First, SMGAN-3D incorporates 3D structural information to improve image quality. Second, SMGAN-2D takes its input slice by slice, thus potentially losing the spatial correlation between adjacent slices.

Figs. 70 and 83 demonstrate that the SMGAN-3D can be used to provide improved anatomical feature preservation over other state-of-the-art methods.

In summary, we compared our proposed methods with existing methods, and it can be clearly observed that SMGAN-3D achieves robust performance in noise suppression, artifact removal, and texture preservation. We recommend that the reader inspect the ROIs (in Figs. 31 and 57) or zoom in to better evaluate our results. To further validate the generalization ability of our proposed model, we provide more details in Appendix A.

Fig. 87: Performance comparison of LDCT and ten algorithms over the ROIs marked by the red rectangles in panels (a) of the corresponding figures.
Method    Fig. 31                                 Fig. 57                                 Fig. 83
          Mean                SD                  Mean                SD                  Mean                SD
NDCT      115.282             45.946              56.903              58.512              51.225              73.297
LDCT      114.955 (-0.2837%)  74.299 (61.709%)    57.228 (0.571%)     85.854 (46.729%)    50.142 (-2.114%)    89.346 (21.896%)
CNN-L1    115.809 (0.4571%)   28.532 (-37.9010%)  57.709 (1.416%)     42.315 (-27.682%)   50.917 (-0.6013%)   66.359 (-9.466%)
CNN-L2    117.191 (1.656%)    29.933 (-34.852%)   58.956 (3.608%)     43.411 (-25.808%)   52.229 (1.960%)     66.922 (-8.698%)
SL-net    131.333 (13.923%)   35.844 (-21.987%)   68.471 (20.329%)    50.789 (-13.199%)   63.874 (24.693%)    72.718 (-0.790%)
MSL-net   118.395 (2.701%)    32.548 (-29.160%)   63.271 (11.191%)    46.979 (-19.711%)   57.052 (11.375%)    69.519 (-5.154%)
WGAN      105.461 (-8.519%)   42.659 (-7.154%)    48.432 (-14.887%)   54.306 (-7.188%)    42.417 (-17.195%)   70.904 (-3.265%)
BM3D      114.058 (-1.062%)   31.515 (-31.409%)   25.649 (-54.925%)   69.411 (18.627%)    15.183 (-70.360%)   100.08 (36.540%)
RED-CNN   116.642 (1.180%)    27.194 (-40.813%)   57.985 (1.902%)     42.048 (-28.138%)   51.272 (0.0918%)    66.961 (-8.644%)
WGAN-VGG  108.229 (-6.118%)   36.721 (-20.078%)   54.450 (-4.311%)    48.660 (-16.838%)   44.959 (-12.232%)   67.059 (-8.511%)
SMGAN-2D  108.758 (-5.659%)   40.948 (-10.878%)   51.243 (-9.947%)    53.065 (-9.309%)    48.230 (-5.847%)    72.073 (-1.670%)
SMGAN-3D  115.569 (0.749%)    43.654 (-6.723%)    54.356 (-4.476%)    56.552 (-3.350%)    55.378 (8.107%)     73.303 (-0.00821%)
TABLE II: Statistical properties of the images in Figs. 31, 57, and 83. These are the ROIs indicated by the red rectangles in Figs. 18, 44, and 70. The relative percentage difference of each model's value from the corresponding NDCT value is given in parentheses to aid the reader.
Method    Sharpness    Noise Suppression  Diagnostic Acceptability  Contrast Retention  Overall Quality
LDCT      2.55 ± 1.43  1.55 ± 0.80        1.85 ± 0.96               1.75 ± 0.83         1.93 ± 1.01
CNN-L1    2.80 ± 0.81  3.30 ± 0.71        2.70 ± 0.78               2.75 ± 0.77         2.89 ± 0.77
CNN-L2    2.12 ± 0.42  3.98 ± 0.58        1.93 ± 0.78               2.07 ± 0.83         2.53 ± 0.55
SL-net    2.95 ± 0.86  3.15 ± 0.65        2.70 ± 0.71               2.80 ± 0.81         2.90 ± 0.76
MSL-net   3.01 ± 0.94  3.16 ± 0.57        2.87 ± 0.83               2.84 ± 0.69         2.97 ± 0.76
WGAN      3.30 ± 0.56  2.80 ± 0.81        3.15 ± 0.91               3.45 ± 1.02         3.09 ± 0.66
BM3D      2.21 ± 1.08  3.29 ± 0.80        2.21 ± 0.86               2.29 ± 0.88         2.50 ± 0.91
RED-CNN   3.29 ± 0.88  3.79 ± 0.70        3.51 ± 0.70               3.46 ± 1.12         3.51 ± 0.85
WGAN-VGG  3.35 ± 0.91  3.50 ± 1.07        3.35 ± 0.91               3.45 ± 1.02         3.41 ± 0.94
SMGAN-2D  3.25 ± 0.65  3.48 ± 0.66        3.32 ± 0.58               3.21 ± 0.78         3.32 ± 0.67
SMGAN-3D  3.56 ± 0.73  3.59 ± 0.68        3.58 ± 0.46               3.61 ± 1.02         3.59 ± 0.72
TABLE III: Visual assessment scores by three radiologist readers.

III-E Quantitative analysis

We performed the quantitative analysis with respect to three selected metrics (PSNR, SSIM, and RMSE). Then, we investigated the statistical properties of the denoised images for each noise-reduction algorithm. Furthermore, we performed a blind reader study with three radiologists on 10 groups of images. Note that the quantitative full-size measurements are in Table I and the image quality assessments of ROIs are in Fig. 87. The NDCT images are chosen as the gold standard.
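As a minimal sketch of two of the three metrics, the snippet below computes RMSE and PSNR with NumPy. It assumes intensities normalized to a unit data range; this is an inference from Table I, where each PSNR value is close to 20·log10(1/RMSE), and is not a normalization stated in the text. The function names are illustrative, and SSIM is omitted because its windowing parameters are not given in this section.

```python
import math

import numpy as np


def rmse(reference, estimate):
    """Root-mean-square error between two equally shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def psnr_from_rmse(err, data_range=1.0):
    """PSNR in dB for a given RMSE; data_range=1.0 is an assumption."""
    return 20.0 * math.log10(data_range / err)


def psnr(reference, estimate, data_range=1.0):
    """PSNR in dB between two images (infinite for identical inputs)."""
    err = rmse(reference, estimate)
    if err == 0.0:
        return math.inf
    return psnr_from_rmse(err, data_range)


# Consistency check against the LDCT entry for Fig. 18 in Table I
# (RMSE 0.0723, PSNR 22.818), under the unit-range assumption.
ldct_psnr = psnr_from_rmse(0.0723)
```

Under this assumption PSNR is a monotone function of RMSE, so the two metrics always rank the methods identically; only SSIM can change the ordering.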

III-E1 Image quality analysis

As shown in Table I, RED-CNN achieves the highest PSNR and the lowest RMSE, and ranks second in SSIM. Since PSNR and RMSE reward regression to the mean, it is expected that RED-CNN, a mean-based regression optimization, performs better on these metrics than the feature-based models. For SL-net and MSL-net, it is not surprising that both models achieve the highest SSIM scores due to the adoption of the structural similarity loss. However, a good score measured by image quality metrics does not ensure the preservation of high-level feature information and structural details, and this explains why RED-CNN can have the best PSNR and RMSE despite over-smoothing the content. PSNR, SSIM, and RMSE are not perfect, and they are subject to image blurring and blocky/waxy artifacts in the denoised images, as shown in Figs. 18 - 83. Hence, these metrics may not be sufficient for evaluating image quality and indicating diagnostic performance. Indeed, WGAN can provide better visual quality and achieve improved statistical properties. Compared with the CNN-based methods, the WGAN architecture can progressively preserve the consistency of the feature distributions between LDCT and NDCT images. However, by encouraging less blurring, WGAN alone could introduce more image noise, which may compromise diagnosis. To retain the information in LDCT images, our novel loss function with a regularization term is structurally sensitive, which enhances the clinical usability compared with the other methods.

Although mean-based approaches, such as L1-net and L2-net (CNN-L1 and CNN-L2), enjoy high metric scores, they may over-smooth the overall image contents and lose feature characteristics, which does not satisfy our HVS requirements, because mean-based methods favor regression toward the mean. Meanwhile, WGAN-VGG satisfies the HVS requirements but gets the lowest scores in the three selected metrics. The reason for the lowest scores is that WGAN-VGG may suffer from the loss of subtle structural information or noise features, which may severely affect the diagnostic accuracy. The proposed SMGAN-2D outperforms the feature-based method WGAN-VGG with reference to the three metrics, illustrating the robust denoising capability of our proposed loss function. Compared with the SMGAN-2D model, SMGAN-3D achieves higher scores in PSNR and SSIM since it incorporates 3D spatial information. To further validate the performance of each denoising model with respect to clinically significant local details, we performed the quantitative analysis over ROIs. The summary of the quantitative results from the ROIs is shown in Fig. 87. It is worth noting that the quantitative results of the ROIs follow a trend similar to that of the full-size images.

III-E2 Statistical analysis

To quantitatively evaluate the statistical properties of the images processed by different denoising models, we calculate the mean CT number (in Hounsfield units) and the standard deviation (SD) of the ROIs, as shown in Table II. For each denoising model, the percent errors of the mean and SD values were calculated relative to those of the reference (NDCT) images. Lower percent errors correspond to more robust denoising models. As shown in Table II, L1-net, L2-net (CNN-L1 and CNN-L2), SL-net, MSL-net, BM3D, RED-CNN, and WGAN-VGG generate high percent errors in SD with respect to the NDCT images. There are blocky and over-smoothing effects in the images, which match our visual inspections. Specifically, for Fig. 83, the absolute difference in SD between BM3D and NDCT is the largest among all of the denoising models, which indicates that BM3D has the most noticeable blurring effects. The standard deviation of BM3D supports our visual observations, as shown in panels (h) of the result figures. The mean values of WGAN, WGAN-VGG, SL-net, and SMGAN-2D deviate considerably from that of the NDCT image in Fig. 31. This indicates that WGAN, WGAN-VGG, and SMGAN-2D effectively reduce the noise level but compromise significant content information. Nevertheless, the SD value of SMGAN-2D is close to that of NDCT, which indicates that it supports the HVS requirements. From the quantitative analysis in Table II, it can be observed that our proposed SMGAN-3D achieves the SD that best matches the NDCT images among all methods. Overall, SMGAN-3D is a highly competitive denoising model for clinical use.
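The percent error described above can be sketched as a signed relative difference from the NDCT reference. The helper names below are illustrative, the ROI is assumed to be an axis-aligned rectangle selected with slices, and the use of the population SD (NumPy's default) is an assumption, since the text does not say which estimator was used. The numeric check uses the LDCT and NDCT SD values for Fig. 31 from Table II.

```python
import numpy as np


def roi_stats(image_hu, rows, cols):
    """Mean and SD (population SD, an assumption) of a rectangular ROI."""
    roi = np.asarray(image_hu, dtype=np.float64)[rows, cols]
    return float(roi.mean()), float(roi.std())


def percent_difference(value, reference):
    """Signed relative difference of `value` from `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Table II, Fig. 31: NDCT SD = 45.946 HU, LDCT SD = 74.299 HU.
ldct_sd_error = percent_difference(74.299, 45.946)
print(f"LDCT SD error relative to NDCT: {ldct_sd_error:.3f}%")
```

A positive value means the image is noisier than NDCT in that ROI, while a strongly negative value suggests over-smoothing, which is how the SD column is interpreted in the paragraph above.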

III-E3 Visual assessments

To validate the clinical image quality of the processed results, three radiologists performed a visual assessment on 10 groups of images. Each group includes an original LDCT image with lesions, the corresponding reference NDCT image, and the images processed by different denoising methods. NDCT, considered as the gold standard, is the only labeled image in each group. All other images were evaluated on sharpness, noise suppression, diagnostic acceptability, and contrast retention using a five-point scale (5 = excellent and 1 = unacceptable). We invited three radiologists with a mean clinical experience of 12.3 years to join our study. Note that these results were evaluated independently, and the overall image quality score for each method was computed by averaging the scores of the four evaluation criteria. For each method, the final score is presented as mean ± SD (average score of the three radiologists ± standard deviation). The final quantitative results are listed in Table III.
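The aggregation of the reader scores can be sketched as follows. The function names are illustrative, and the sample standard deviation is an assumption, because the text does not state which SD estimator was used or how scores were pooled across the 10 image groups. The last line reproduces the overall LDCT score from the four criterion means in Table III.

```python
import statistics


def summarize(scores):
    """Mean and sample SD of the ratings for one method and one criterion."""
    return statistics.mean(scores), statistics.stdev(scores)


def overall_quality(criterion_means):
    """Overall score as the plain average of the four criterion means."""
    return statistics.mean(criterion_means)


# Hypothetical ratings on the five-point scale, for illustration only.
mean_score, sd_score = summarize([3, 4, 5])
print(f"{mean_score:.2f} ± {sd_score:.2f}")

# LDCT row of Table III: sharpness, noise suppression,
# diagnostic acceptability, contrast retention.
ldct_overall = overall_quality([2.55, 1.55, 1.85, 1.75])
```

The average of the four LDCT criterion means is 1.925, which agrees with the tabulated overall score of 1.93 after rounding.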

As observed, the original LDCT images have the lowest scores because of their severe image quality degradation. All denoising models improve the scores to some extent in this study. From Table III, RED-CNN obtains one of the highest scores in noise suppression (3.79), second only to CNN-L2 (3.98). Compared to all other methods, our proposed SMGAN-3D scores best with respect to sharpness, diagnostic acceptability, and contrast retention. Although voxel-wise optimization (CNN-L2) has the best visually assessed noise suppression, it suffers from relatively low scores in sharpness and diagnostic acceptability, indicating a loss of image details. The proposed SMGAN-3D model obtains a higher overall image quality score than the 2D model, which indicates that a 3D model can enhance CT image denoising performance by incorporating spatial information from adjacent slices.

In brief, the visual assessment demonstrates that SMGAN-3D has powerful capabilities in noise reduction, subtle image structure and edge preservation, and artifact removal. Most importantly, it satisfies the HVS requirements as shown in Figs. 18 - 57.

III-F Computational Cost

In CT reconstruction, there is a trade-off between the computational cost and the image quality. In this respect, a DL-based algorithm has great advantages in computational efficiency. Although the training of DL-based methods is time-consuming, a trained model can rapidly perform the denoising task on reconstructed LDCT images. In our study, the proposed 2D method requires about 15 hours and the 3D model needs approximately 26 hours for training to converge. WGAN-VGG, which has the same number of layers, takes about 18 hours in the training phase. Compared with iterative reconstruction, any DL-based approach will require much less execution time, which facilitates the clinical workflow. In practice, our proposed SMGAN-2D and SMGAN-3D took 0.534 s and 4.864 s, respectively, in the validation phase on an NVIDIA Titan GPU. Compared with the results in [58, 59], our method took significantly less time. For example, soft threshold filtering (STF)-based TV minimization in the ordered-subset simultaneous algebraic reconstruction technique (OS-SART) framework took 45.1 s per iteration on the same computing platform. Hence, it is clear that once the model is trained, it requires far less computational overhead than an iterative reconstruction method, given that other conditions are equal.
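The timings quoted above can be put side by side with a few lines of arithmetic. This is only an illustration: the text does not state whether the SMGAN times are per slice or per volume, and the iterative method needs many iterations, so the ratios below compare a full denoising pass with a single STF/OS-SART iteration and therefore understate the gap. The variable names are illustrative.

```python
# Timings reported in the text, in seconds.
STF_OS_SART_SECONDS_PER_ITERATION = 45.1
SMGAN_2D_SECONDS = 0.534
SMGAN_3D_SECONDS = 4.864

# How many SMGAN passes fit into one iteration of the iterative method.
ratio_2d = STF_OS_SART_SECONDS_PER_ITERATION / SMGAN_2D_SECONDS
ratio_3d = STF_OS_SART_SECONDS_PER_ITERATION / SMGAN_3D_SECONDS

print(f"One STF/OS-SART iteration ~ {ratio_2d:.1f} SMGAN-2D passes")
print(f"One STF/OS-SART iteration ~ {ratio_3d:.1f} SMGAN-3D passes")
```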

IV Discussions

As mentioned before, different emphases on visual evaluation and traditional image quality metrics were extensively investigated. When training with only the mean-based losses (L1-net, L2-net, RED-CNN), the results can achieve high scores in quantitative metrics and yield promising results with substantial noise reduction. When training with the feature-based methods (WGAN-VGG), the results can meet the HVS requirements for visualization since they preserve more structural details than mean-based methods. However, these methods suffer from the potential risk of content distortion, since the perceptual loss is computed with a network [47] trained on a natural image dataset. Practically and theoretically, even though adversarial learning can prevent smoothing in the image and capture structural characteristics, it may often result in a severe loss of diagnostic information. To integrate the best characteristics of these loss functions, we have proposed a hybrid loss function to optimize the LDCT image quality.

Although our proposed network has achieved high-quality denoised LDCT images, there is still room for improvement. First and foremost, some feature edges in the processed results still look blurry. Also, some structural variations between NDCT and LDCT do not perfectly match. A possible way to enhance the correlation between NDCT and LDCT is to design a network with a better modeling capability, which is work we have already started. As far as our reader study is concerned, although visual assessment may be subject to intra- as well as inter-operator variability, on average such an assessment can still evaluate different algorithms effectively, especially in a pilot study. In our follow-up study, we will invite more radiologists to rate the results, quantify inter-operator variability in a task-specific fashion, and also study intra-operator variability.

V Conclusion

In conclusion, we have presented a 3D CNN-based method for LDCT noise reduction. As a follow-up to our previous work [37], a 3D convolutional neural network is utilized to improve the image quality in the 3D contextual setting. In addition, we have highlighted that the purpose of loss functions is to preserve high-resolution and critical features for diagnosis. Different from the state-of-the-art LDCT denoising method used in [36], an efficient structurally-sensitive loss has been included to capture informative structural features. Moreover, we have employed the Wasserstein distance to stabilize the training process for GAN. We have performed the quantitative and qualitative comparison of the image quality. The assessments have demonstrated that SMGAN-3D can produce results with higher-level image quality for clinical usage compared with the existing denoising networks [34, 35, 36, 37].

In the future, we will extend our model to other medical imaging modalities in a task-specific manner. Moreover, we plan to incorporate more advanced denoising models such as the networks mentioned in [60, 61, 62] for LDCT reconstruction. Finally, we are also interested in making our denoising software robust over different scanners.

Appendix A Different training sets for SMGAN-3D training

We randomly split the Mayo dataset [53] into four different training sets, each with 5,000 image patches of size pixels. Then, the different training sets were used to validate the generalizability of our proposed 3D SMGAN model. The results are presented in Fig. 100 and Table IV.
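One way to draw such training sets is to shuffle the patch indices once and cut the shuffled order into equal chunks, as in the sketch below. Whether the four sets in the study were disjoint is not stated, so disjointness is an assumption here, as are the function name, the fixed seed, and the size of the patch pool used in the usage line.

```python
import numpy as np


def split_into_training_sets(num_patches, num_sets=4, set_size=5000, seed=0):
    """Return `num_sets` disjoint arrays of patch indices, each of `set_size`."""
    if num_sets * set_size > num_patches:
        raise ValueError("not enough patches for disjoint training sets")
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_patches)  # one random shuffle of all indices
    return [order[i * set_size:(i + 1) * set_size] for i in range(num_sets)]


# Hypothetical pool of 25,000 extracted patches, for illustration only.
training_sets = split_into_training_sets(num_patches=25000)
```

Each index array can then be used to select patch pairs for one training run, corresponding to Case1 to Case4 in Table IV.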

Training set  Fig. 100 (a)-(d)         Fig. 100 (e)-(h)         Fig. 100 (i)-(l)
              PSNR    SSIM   RMSE      PSNR    SSIM   RMSE      PSNR    SSIM   RMSE
Case1         26.678  0.811  0.0463    25.842  0.776  0.0510    26.538  0.812  0.0472
Case2         26.759  0.814  0.0459    25.848  0.781  0.0510    26.544  0.814  0.0470
Case3         26.589  0.807  0.0468    25.701  0.772  0.0519    26.455  0.806  0.0475
Case4         26.903  0.815  0.0452    25.914  0.782  0.0506    26.662  0.816  0.0464
TABLE IV: Quantitative results associated with different training sets for SMGAN-3D in Fig. 100.
Fig. 100: Results from four different training sets (Case1 to Case4) for SMGAN-3D. (a)-(d) refer to Fig. 18, (e)-(h) refer to Fig. 44, and (i)-(l) refer to Fig. 70. The display window is [-160, 240] HU.
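The display window quoted in the figure captions, [-160, 240] HU, can be read as a clip-and-rescale operation applied before an image is shown. The sketch below assumes a linear mapping of the window to the range [0, 1], which is a common convention rather than something specified in the text; the function name is illustrative.

```python
import numpy as np


def apply_display_window(image_hu, lower=-160.0, upper=240.0):
    """Clip CT numbers to [lower, upper] HU and rescale linearly to [0, 1]."""
    image_hu = np.asarray(image_hu, dtype=np.float64)
    clipped = np.clip(image_hu, lower, upper)
    return (clipped - lower) / (upper - lower)


# Air (-1000 HU) and dense bone (1000 HU) saturate; 40 HU maps to mid-gray.
windowed = apply_display_window(np.array([-1000.0, -160.0, 40.0, 240.0, 1000.0]))
```

With this window, the 400 HU range around soft tissue uses the whole gray scale, which is why low-contrast abdominal lesions become visible in the figures.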

Appendix B Summary of notations

Notation Meaning
NDCT Normal dose CT
LDCT Low dose CT
SSL Structurally sensitive loss, integrating the structural loss and the loss as defined in Eq. 10
SSIM Structural similarity index (SSIM) [49]
MS-SSIM Multi-scale structural similarity index (MS-SSIM) [51]
SL-net (CNN-SL) 8-layer CNN with only structural similarity loss
MSL-net (CNN-MSL) 8-layer CNN with only multi-scale structural similarity loss
WGAN Wasserstein Generative Adversarial Networks with loss
BM3D Block-matching and 3D filtering
RED-CNN Residual encoder-decoder CNN with only loss
WGAN-VGG Wasserstein generative adversarial network with perceptual loss
SMGAN-2D 2D Wasserstein generative adversarial network with SSL loss
SMGAN-3D 3D Wasserstein generative adversarial network with SSL loss
TABLE V: Summary of notations.


The authors would like to thank NVIDIA Corporation for the donation of the Titan Xp GPU, which has been utilized for this study. The authors are grateful for helpful discussions with Dr. Mats Persson (Stanford University). This work was supported in part by the National Natural Science Foundation of China under Grant 61671312 and the Science and Technology Project of Sichuan Province of China under Grant 2018HH0070, and in part by the National Institutes of Health under Grants R21 EB019074, R01 EB016977, and U01 EB017140.


  • [1] D. J. Brenner and E. J. Hall, “Computed tomography - an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
  • [2] A. B. de González, M. Mahesh, K.-P. Kim, M. Bhargavan, R. Lewis, F. Mettler, and C. Land, “Projected cancer risks from computed tomographic scans performed in the united states in 2007,” Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009.
  • [3] D. A. Schauer and O. W. Linton, “National council on radiation protection and measurements report shows substantial medical exposure increase,” pp. 293–296, 2009.
  • [4] J. Wang, H. Lu, T. Li, and Z. Liang, “Sinogram noise reduction for low-dose CT by statistics-based nonlinear filters,” in Proc. of SPIE Vol, vol. 5747, 2005, p. 2059.
  • [5] J. Wang, T. Li, H. Lu, and Z. Liang, “Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 25, no. 10, pp. 1272–1283, 2006.
  • [6] M. Balda, J. Hornegger, and B. Heismann, “Ray contribution masks for structure adaptive sinogram filtering,” IEEE Trans. Med. Imaging, vol. 31, no. 6, pp. 1228–1239, 2012.
  • [7] G.-Z. Yang, P. Burger, D. N. Firmin, and S. Underwood, “Structure adaptive anisotropic image filtering,” Proc. IEEE Int. Conf. Image Process. Applicat., vol. 14, no. 2, pp. 135–145, 1996.
  • [8] J. Liu, J. Ma, Y. Zhang, Y. Chen, J. Yang, H. Shu, L. Luo, G. Coatrieux, W. Yang, Q. Feng et al., “Discriminative feature representation to improve projection data inconsistency for low dose ct imaging,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2499–2509, 2017.
  • [9] Y. Chen, J. Ma, Q. Feng, L. Luo, P. Shi, and W. Chen, “Nonlocal prior bayesian tomographic reconstruction,” Journal of Mathematical Imaging and Vision, vol. 30, no. 2, pp. 133–146, 2008.
  • [10] A. Manduca, L. Yu, J. D. Trzasko, N. Khaylova, J. M. Kofler, C. M. McCollough, and J. G. Fletcher, “Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT,” Med. Phys., vol. 36, no. 11, pp. 4911–4919, 2009.
  • [11] E. Y. Sidky, Y. Duchin, X. Pan, and C. Ullberg, “A constrained, total-variation minimization algorithm for low-intensity x-ray ct,” Med. Phys., vol. 38, no. S1, 2011.
  • [12] B. De Man and S. Basu, “Distance-driven projection and backprojection in three dimensions,” Phys. Med. Biol., vol. 49, no. 11, p. 2463, 2004.
  • [13] B. R. Whiting, P. Massoumzadeh, O. A. Earl, J. A. O’Sullivan, D. L. Snyder, and J. F. Williamson, “Properties of preprocessed sinogram data in x-ray computed tomography,” Med. Phys., vol. 33, no. 9, pp. 3290–3303, 2006.
  • [14] I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 21, no. 2, pp. 89–99, 2002.
  • [15] Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, “Low-dose CT reconstruction via edge-preserving total variation regularization,” Phys. Med. Biol., vol. 56, no. 18, p. 5949, 2011.
  • [16] Y. Liu, J. Ma, Y. Fan, and Z. Liang, “Adaptive-weighted total variation minimization for sparse data toward low-dose x-ray computed tomography image reconstruction,” Phys. Med. Biol., vol. 57, no. 23, p. 7923, 2012.
  • [17] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE Trans. Med. Imaging, vol. 31, no. 9, pp. 1682–1697, 2012.
  • [18] Y. Zhang, X. Mou, G. Wang, and H. Yu, “Tensor-based dictionary learning for spectral CT reconstruction,” IEEE Trans. Med. Imaging, vol. 36, no. 1, pp. 142–154, 2017.
  • [19] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, no. 17, p. 4777, 2008.
  • [20] Y. Chen, X. Yin, L. Shi, H. Shu, L. Luo, J.-L. Coatrieux, and C. Toumoulin, “Improving abdomen tumor low-dose CT images using a fast dictionary learning based processing,” Phys. Med. Biol., vol. 58, no. 16, p. 5803, 2013.
  • [21] J. Ma, J. Huang, Q. Feng, H. Zhang, H. Lu, Z. Liang, and W. Chen, “Low-dose computed tomography image restoration using previous normal-dose scan,” Med. Phys., vol. 38, no. 10, pp. 5713–5731, 2011.
  • [22] Z. Li, L. Yu, J. D. Trzasko, D. S. Lake, D. J. Blezek, J. G. Fletcher, C. H. McCollough, and A. Manduca, “Adaptive nonlocal means filtering based on local noise level for CT denoising,” Med. Phys., vol. 41, no. 1, 2014.
  • [23] A. Buades, B. Coll, and J.-M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Model. Simul., vol. 4, no. 2, pp. 490–530, 2005.
  • [24] A. Cheddad, C. Svensson, J. Sharpe, F. Georgsson, and U. Ahlgren, “Image processing assisted algorithms for optical projection tomography,” IEEE Trans. Med. Imaging, vol. 31, no. 1, pp. 1–15, 2012.
  • [25] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3D random noise filtering for absorption optical projection tomography,” Phys. Med. Biol., vol. 55, no. 18, p. 5401, 2010.
  • [26] Y. Chen, L. Shi, Q. Feng, J. Yang, H. Shu, L. Luo, J.-L. Coatrieux, and W. Chen, “Artifact suppressed dictionary learning for low-dose ct image processing,” IEEE Trans. Med. Imaging, vol. 33, no. 12, pp. 2271–2292, 2014.
  • [27] J. Liu, Y. Hu, J. Yang, Y. Chen, H. Shu, L. Luo, Q. Feng, Z. Gui, and G. Coatrieux, “3d feature constrained reconstruction for low dose ct imaging,” IEEE Trans. on Circuits Syst. Video Technol., 2016.
  • [28] G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
  • [29] G. Wang, M. Kalra, and C. G. Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
  • [30] W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, and D. Shen, “Deep convolutional neural networks for multi-modality isointense infant brain image segmentation,” NeuroImage, vol. 108, pp. 214–224, 2015.
  • [31] S. Wang, M. Kim, G. Wu, and D. Shen, “Scalable high performance image registration framework by unsupervised deep feature representations learning,” in Deep Learning for Medical Image Analysis. Elsevier, 2017, pp. 245–269.
  • [32] X. Cao, J. Yang, Y. Gao, Q. Wang, and D. Shen, “Region-adaptive deformable registration of ct/mri pelvic images via learning-based image synthesis,” IEEE Trans. Image Process., 2018.
  • [33] L. Cattell, G. Platsch, R. Pfeiffer, J. Declerck, J. A. Schnabel, C. Hutton, A. D. N. Initiative et al., “Classification of amyloid status using machine learning with histograms of oriented 3d gradients,” NeuroImage: Clinical, vol. 12, pp. 990–1003, 2016.
  • [34] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang, “Low-dose CT via convolutional neural network,” Biomed. Opt. Express, vol. 8, no. 2, pp. 679–694, 2017.
  • [35] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
  • [36] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
  • [37] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, and G. Wang, “Low dose CT image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” arXiv preprint arXiv:1708.00961, 2017.
  • [38] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose x-ray ct reconstruction,” arXiv preprint arXiv:1610.09736, 2016.
  • [39] H. Shan, Y. Zhang, Q. Yang, U. Kruger, W. Cong, and G. Wang, “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,” arXiv preprint arXiv:1802.05656, 2018.
  • [40] W. Yang, H. Zhang, J. Yang, J. Wu, X. Yin, Y. Chen, H. Shu, L. Luo, G. Coatrieux, Z. Gui et al., “Improving low-dose ct image using residual convolutional network,” IEEE Access, vol. 5, pp. 24 698–24 705, 2017.
  • [41] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
  • [42] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
  • [43] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging, vol. 3, no. 1, pp. 47–57, 2017.
  • [44] L. Fu, T.-C. Lee, S. M. Kim, A. M. Alessio, P. E. Kinahan, Z. Chang, K. Sauer, M. K. Kalra, and B. De Man, “Comparison between pre-log and post-log statistical models in ultra-low-dose CT reconstruction,” IEEE Trans. Med. Imaging, vol. 36, no. 3, pp. 707–720, 2017.
  • [45] P. S. Calhoun, B. S. Kuszyk, D. G. Heath, J. C. Carley, and E. K. Fishman, “Three-dimensional volume rendering of spiral ct data: theory and method,” Radiographics, vol. 19, no. 3, pp. 745–764, 1999.
  • [46] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proc. 27th Int. Conf. Machine Learning, 2010, pp. 807–814.
  • [47] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [48] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Proc. Advances Neural Information Processing Systems Conf., 2017, pp. 5769–5779.
  • [49] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [50] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? a new look at signal fidelity measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, 2009.
  • [51] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., vol. 2. IEEE, 2003, pp. 1398–1402.
  • [52] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [53] AAPM, “Low dose ct grand challenge,” 2017. [Online]. Available:
  • [54] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 341–349.
  • [55] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
  • [56] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [57] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [58] R. Liu, Y. Luo, and H. Yu, “Gpu-based acceleration for interior tomography,” IEEE Access, vol. 2, pp. 757–770, 2014.
  • [59] D. Matenine, Y. Goussard, and P. Després, “Gpu-accelerated regularized iterative reconstruction for few-view cone beam ct,” Medical physics, vol. 42, no. 4, pp. 1505–1517, 2015.
  • [60] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” CoRR, abs/1703.06211, vol. 1, no. 2, p. 3, 2017.
  • [61] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2016, pp. 770–778.
  • [62] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 3859–3869.