I Introduction
X-ray computed tomography (CT) is one of the most widely used imaging modalities in clinical, industrial, and other applications [1]. Nevertheless, the potential risks of the ionizing radiation associated with medical CT scans (i.e., a chance to induce cancer and cause genetic damage) raise public concern [2]. Studies from the National Council on Radiation Protection and Measurements (NCRP) report a 600% increase in the medical radiation dose to the US population from 1980 to 2006, reflecting both the great success of CT technology and an elevated risk to patients [3].
The main drawback of radiation dose reduction is increased image background noise, which can severely compromise diagnostic information. How to minimize exposure to ionizing radiation while maintaining the diagnostic utility of low-dose CT (LDCT) has been a long-standing challenge for researchers, who follow the well-known ALARA (as low as reasonably achievable) principle [1]. Numerous methods have been designed for LDCT noise reduction. These methods can be categorized as follows: (1) Sinogram-filtering-based techniques [4, 5, 6, 7, 8, 9]: these methods directly process projection data in the projection domain [6]. Their main advantage is computational efficiency, but they may cause loss of structural information and spatial resolution [6, 10, 7]. (2) Iterative reconstruction (IR) [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]: IR techniques can potentially produce a high signal-to-noise ratio (SNR), but they require substantial computational cost and troublesome parameter tuning. (3) Image-space denoising techniques [21, 22, 20, 23, 24, 25, 26, 27]: these techniques operate directly on reconstructed images, so they can be applied across various CT scanners at very low cost. Examples are non-local-means-based filters [21, 16], the dictionary-learning-based K-singular value decomposition (K-SVD) method [20], and the block-matching 3D (BM3D) algorithms [25, 24]. Even though these algorithms greatly suppress noise and artifacts, edge blurring or resolution loss may persist in the processed LDCT images.

Deep learning (DL) has recently received tremendous attention in the field of medical imaging [28, 29], for tasks such as brain image segmentation [30], image registration [31, 32], image classification [33], and LDCT noise reduction [34, 35, 36, 37, 38, 39, 40]. For example, Chen et al. [35]
proposed a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN) to predict NDCT images from noisy LDCT images. This method greatly reduces background noise and artifacts. However, a limitation is that the results sometimes look blurry, since the method minimizes the mean-squared error between the generated images and the corresponding NDCT images. To cope with this problem, the generative adversarial network (GAN)
[41] offers an attractive solution. In a GAN, the generator G learns to capture the real data distribution while the discriminator D attempts to discriminate between the synthetic data distribution and the real counterpart. The loss used in a GAN, called the adversarial loss, measures the distance between the synthetic and real data distributions in order to improve the performance of G and D simultaneously. Originally, the GAN used the Jensen-Shannon (JS) divergence to evaluate the similarity of the two data distributions [41]. However, several problems exist in training a GAN, such as instability and non-convergence. To address these issues, Arjovsky et al. introduced the Wasserstein distance in place of the JS divergence to improve network training [42]. We discuss this aspect further in Section II-D3.

In our previous work [37], we first introduced the perceptual loss to capture perceptual differences between denoised LDCT images and the reference NDCT images, providing perceptually better results for clinical diagnosis at the cost of lower scores in traditional image quality metrics. Since the traditional metrics evaluate the generated images against the gold-standard in generic ways, minimizing the perceptual loss does not ensure optimality in terms of those metrics. To address this discrepancy, and inspired by the work in [36, 43], here we propose a novel 3D Structurally-sensitive Multi-scale Generative Adversarial Network (SMGAN) to capture subtle structural features while maintaining high visual sensitivity. The proposed structurally-sensitive loss combines the adversarial loss [42], a perceptually-favorable structural loss, and a pixel-wise loss. Moreover, to validate the diagnostic quality of images processed by our method, we report qualitative image assessments by three expert radiologists.
Systematically, we demonstrate the feasibility and merits of mapping LDCT images to corresponding NDCT images in the GAN framework.
Our main contributions in this paper are summarized as follows:

To keep the underlying structural information of LDCT images, we adopt a 3D CNN model as the generator based on WGAN, which enhances image quality for better diagnosis.

To measure the structural differences between generated images and the NDCT gold-standard, a structurally-sensitive loss is used to enhance the accuracy and robustness of the algorithm. Different from [37], we replace the perceptual loss with a combination of the L1 loss and the structural loss.

To compare the performance of the 2D and the 3D models, we perform an extensive evaluation on their convergence rate and denoising performance.
This paper is organized as follows: Section II introduces the proposed approach and analyzes the impact of each component loss function on the image quality. Section III presents the experimental design and results. Section IV discusses relevant issues. Finally, the concluding remarks and future plans are given in Section V.
II Methods
II-A Problem Inversion
Assuming that $x$ denotes the original LDCT image and $y$ denotes the corresponding NDCT image, the relationship between them can be expressed as:

(1)   $x = \sigma(y) + \varepsilon, \quad x, y \in \mathbb{R}^{h \times w \times d}$

where $\sigma$ is a generic noising process that degrades a real NDCT sample to a corresponding LDCT sample in a non-linear way, $\varepsilon$ stands for additive noise and unmodeled factors, and $h$, $w$, $d$ are the height, width, and depth respectively.
From another standpoint, considering that the real NDCT distribution is unknown, we focus on extracting information to recover the desired images from the noisy LDCT images $x$. In general, the noise in CT images is regarded as a mixture of Poisson quantum noise and Gaussian electronic noise [44]. Compared with traditional denoising methods, a DL-based method is capable of effectively modeling such data distributions, since a DL-based denoising model can adapt to practical noise models that combine the statistical properties of typical noise distributions. Therefore, the proposed DL-based denoising network solves the inverse problem to retrieve feasible images $\hat{y}$, and the solution can be expressed as:

(2)   $\hat{y} = G(x)$

where the generator $G$ is trained to approximate the inverse of the degradation process.
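As an illustration of the degradation model above, the following numpy sketch simulates an LDCT patch from an NDCT patch with Poisson quantum noise plus Gaussian electronic noise. The photon count and noise level are illustrative assumptions, not values from the paper:

```python
import numpy as np

def simulate_ldct(ndct, photon_count=1e4, sigma_e=0.01, rng=None):
    """Toy degradation x = sigma(y) + eps: Poisson quantum noise scaled
    by the incident photon count, plus Gaussian electronic noise.
    `ndct` is a normalized volume in [0, 1]; all parameter values here
    are illustrative, not taken from the paper."""
    rng = np.random.default_rng(rng)
    # Poisson noise: fewer photons (lower dose) -> noisier sample.
    quantum = rng.poisson(ndct * photon_count) / photon_count
    electronic = rng.normal(0.0, sigma_e, size=ndct.shape)
    return np.clip(quantum + electronic, 0.0, 1.0)

y = np.full((8, 8, 3), 0.5)                 # toy NDCT patch (h, w, d)
x = simulate_ldct(y, photon_count=1e3, rng=0)
```

Lowering `photon_count` makes the Poisson term noisier, mimicking a lower dose.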
As shown in Fig. 1, the overall network comprises three parts. Part 1 is the generator $G$, part 2 is the structurally-sensitive loss (SSL) function, and part 3 is the discriminator $D$.

$G$ maps a volumetric LDCT image to the NDCT feature space, thereby estimating an NDCT image. The SSL function computes the structurally-sensitive dissimilarity, which encodes multi-scale structural information. The loss computed by the SSL function aims to improve the ability of $G$ to generate realistic results. $D$ distinguishes between a pair of synthetic and real NDCT images. If $D$ can correctly identify the input image as "synthetic" or "real" and tell us the discrepancy between the estimated CT image and the corresponding real NDCT image, we will know whether $G$ yields a high-quality estimate. With the indication from $D$, $G$ can improve its performance; also, $D$ can upgrade its ability as well. Hence, $G$ and $D$ compete: $G$ attempts to generate a convincing estimate of an NDCT image while $D$ aims to distinguish the estimated image from real NDCT images. See Sections II-C and II-D for more details. For convenience, a summary of the notations used in this paper is given in Table V.

II-B 3D Spatial Information
The advantages of using 3D spatial information are evident: volumetric imaging and 3D visualization have become standard in diagnostic radiology [45], and a large amount of 3D NDCT and LDCT volumetric data is available in practice. However, most existing networks have 2D architectures. With a 3D network architecture, adjacent cross-sectional slices of a CT volume exhibit strong spatial correlations, which we can exploit to preserve more information than with 2D models.
As mentioned above, here we use a 3D ConvNet as the generator and introduce a 3D structurally-sensitive loss (SSL) function. Accordingly, we extract 3D image patches and use 3D filters instead of 2D filters. The generator in our network takes 3D volumetric LDCT patches as input and processes them with 3D non-linear transform operations. For convenience and comparison, the 2D and 3D denoising networks are referred to as SMGAN-2D and SMGAN-3D respectively. The details of the network architecture are given in Section II-C.

II-C Network Structure
Inspired by the studies in [36, 37], we now introduce the proposed SMGAN-3D network structure. First, in Section II-C1 we present the 3D generator, which captures local anatomical features. Then, in Section II-C2 we define the 3D SSL function, which guides the learning process of the generator. Finally, we outline the 2.5D discriminator in Section II-C3.
II-C1 3D CNN Generator
The generator $G$ consists of eight 3D convolutional (Conv) layers. The first seven layers each have 32 filters, and the last layer has only one filter. The odd-numbered and even-numbered convolutional layers use different filter sizes, and the size of the extracted 3D patches fed to the whole network is given in Fig. 1. Note that $n$ denotes the number of filters and $s$ denotes the stride size, i.e., the step size of the filter when moving across an image, so that n32s1 stands for 32 feature maps with a unit stride. Furthermore, a pooling layer after each Conv layer might cause loss of subtle textural and structural information; therefore, no pooling layers are used in this network. The Rectified Linear Unit (ReLU) [46] is the activation function after each Conv layer.
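As a sanity check on the architecture described above, the following pure-Python sketch propagates a patch shape through eight stride-1, zero-padded convolutional layers with no pooling; the patch size and kernel size here are illustrative assumptions, since the exact values appear only in Fig. 1:

```python
def conv3d_out_size(size, kernel, stride=1, pad="same"):
    """Spatial output size of a 3-D convolution along one axis."""
    if pad == "same":  # zero-pad so that stride-1 convolutions keep the size
        return (size + stride - 1) // stride
    return (size - kernel) // stride + 1

# Hypothetical 8-layer generator: stride 1, 'same' padding, no pooling,
# so an input patch keeps its spatial extent from end to end.
shape = [32, 32, 9]                      # illustrative (h, w, d) patch
for _ in range(8):
    shape = [conv3d_out_size(s, kernel=3) for s in shape]
```

Because no pooling or striding shrinks the feature maps, the generator's output volume matches its input volume voxel for voxel, as a residual-style denoiser requires.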
II-C2 Structurally-Sensitive Loss (SSL) Function
II-C3 Discriminator
The discriminator $D$ consists of six convolutional layers followed by two fully-connected (FC) layers; the filter numbers and kernel sizes are given in Fig. 1. Each layer is followed by a leaky ReLU, defined as $f(x) = \max(\alpha x, x)$ [46], where $\alpha$ is a small constant. A stride of one pixel is applied for the odd-numbered Conv layers and a stride of two pixels for the even-numbered Conv layers. The input fed to $D$ comes from the output of $G$. The reason why we use 2D filters in $D$ is to reduce the computational complexity. Since the adversarial losses of adjacent slices in one volumetric patch contribute equally to the weighted average in one iteration, the loss can be easily computed. Following the suggestion in [42], we do not use a sigmoid cross-entropy layer in $D$.
II-D Loss Functions for Noise Reduction
In this subsection, we evaluate the impact of different loss functions on LDCT noise reduction. This justifies the use of a hybrid loss function for optimal diagnostic quality.
II-D1 L2 Loss
The L2 loss can efficiently suppress background noise, but it can make the denoised results look unnatural and blurry. This is expected due to its regression-to-the-mean nature [43, 48]. Furthermore, the L2 loss assumes that the background noise is white Gaussian noise independent of local image features [49], which is not the case for LDCT imaging.
The L2 loss is expressed as:

(3)   $\mathcal{L}_{L_2} = \frac{1}{hwd}\,\lVert y - G(x) \rVert_2^2$

where $h$, $w$, $d$ stand for the height, width, and depth of a 3D image patch respectively, $y$ denotes the gold-standard (NDCT) image, and $G(x)$ represents the result generated from the source (LDCT) image $x$. It is worth noting that, since the L2 loss has the appealing properties of differentiability, convexity, and symmetry, the mean squared error (MSE) or L2 loss is still a popular choice in denoising tasks [50].
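As a minimal numpy sketch (illustrative, not the training implementation), the voxel-wise mean squared error of Eq. (3) can be computed as:

```python
import numpy as np

def l2_loss(y, g_x):
    """Voxel-wise mean squared error over an (h, w, d) patch, i.e. the
    squared L2 norm of the residual divided by the voxel count."""
    y, g_x = np.asarray(y, float), np.asarray(g_x, float)
    return float(np.mean((y - g_x) ** 2))
```

During training this quantity would be averaged over a mini-batch of patches.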
II-D2 L1 Loss
Although the L1 and L2 losses are both mean-based measures, their impacts on the denoising results are different. The L1 loss is defined as:

(4)   $\mathcal{L}_{L_1} = \frac{1}{hwd}\,\lVert y - G(x) \rVert_1$

Compared with the L2 loss, the L1 loss does not over-penalize large differences or tolerate small errors between the denoised and gold-standard images. Thus, the L1 loss can alleviate some of the drawbacks of the L2 loss mentioned earlier. Additionally, the L1 loss enjoys the same fine properties as the L2 loss except for differentiability at zero.
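A toy numpy comparison makes the difference concrete: for a single large residual, the L1 penalty grows linearly while the squared (L2) penalty grows quadratically, which is why L2 training tends toward over-smoothed means (the values are illustrative):

```python
import numpy as np

def l1_loss(y, g_x):
    """Mean absolute error: penalizes residuals linearly."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(g_x, float))))

# One outlier residual of 10: the L1 term charges 10, while the squared
# (L2) term charges 100, so L2 optimization concentrates on outliers
# and pushes the network toward blurry mean predictions.
l1_penalty = l1_loss([10.0], [0.0])
l2_penalty = float(np.mean((np.array([10.0]) - 0.0) ** 2))
```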
II-D3 Adversarial Loss
The Wasserstein distance with a gradient-penalty regularization term was proposed in [48], and is formulated as:

(5)   $\mathcal{L}_{WGAN} = -\mathbb{E}_{y}\big[D(y)\big] + \mathbb{E}_{x}\big[D(\hat{y})\big] + \lambda\,\mathbb{E}_{\bar{y}}\Big[\big(\lVert \nabla_{\bar{y}} D(\bar{y}) \rVert_2 - 1\big)^2\Big]$

where the first two terms give the Wasserstein distance and the third term implements the gradient penalty with weight $\lambda$. Note that $\hat{y}$ denotes $G(x)$ for brevity, and $\bar{y}$ is uniformly sampled along the straight line between a pair of points sampled from the generated images $\hat{y}$ and the corresponding NDCT images.
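The gradient-penalty term of Eq. (5) is easiest to see for a critic whose input gradient is known in closed form. The sketch below is an illustrative stand-in, not the trained critic: it uses a linear critic D(v) = w.v, whose gradient at any interpolate is simply w, so the penalty depends only on the norm of w:

```python
import numpy as np

def gradient_penalty_linear(w, y_fake, y_real, lam=10.0, rng=None):
    """WGAN-GP penalty for a *linear* critic D(v) = w . v, evaluated at a
    random interpolate y_bar on the segment between a generated sample
    and a real sample. For a linear critic the input gradient is w
    everywhere, so the penalty reduces to lam * (||w|| - 1)^2."""
    rng = np.random.default_rng(rng)
    t = rng.uniform()                        # uniform point on the segment
    y_bar = t * y_real + (1 - t) * y_fake    # interpolated sample
    grad = w                                 # dD/dy_bar, independent of y_bar here
    return float(lam * (np.linalg.norm(grad) - 1.0) ** 2)
```

A real critic would compute `grad` by automatic differentiation at `y_bar`; the penalty pushes the critic toward unit gradient norm, enforcing the 1-Lipschitz constraint that the Wasserstein formulation requires.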
II-D4 Structural Loss
Medical images contain strong feature correlations; for example, their voxels have strong interdependencies. The structural similarity index (SSIM) [49] and the multi-scale structural similarity index (MS-SSIM) [51] are perceptually motivated metrics, and they perform better in visual pattern recognition than mean-based metrics [49]. To measure the structural and perceptual similarity between two images, the SSIM [49] is formulated as follows:

(6)   $\text{SSIM}(\hat{y}, y) = l(\hat{y}, y) \cdot cs(\hat{y}, y)$

(7)   $l(\hat{y}, y) = \frac{2\mu_{\hat{y}}\mu_{y} + C_1}{\mu_{\hat{y}}^2 + \mu_{y}^2 + C_1}, \quad cs(\hat{y}, y) = \frac{2\sigma_{\hat{y}y} + C_2}{\sigma_{\hat{y}}^2 + \sigma_{y}^2 + C_2}$

where $C_1$, $C_2$ are constants, and $\mu_{\hat{y}}$, $\mu_{y}$, $\sigma_{\hat{y}}$, $\sigma_{y}$, $\sigma_{\hat{y}y}$ denote the means, standard deviations, and cross-covariance of the image pair $(\hat{y}, y)$, i.e., the output of $G$ and the corresponding NDCT image, respectively. $l(\hat{y}, y)$ and $cs(\hat{y}, y)$ are the first and second factors defined in Eqn. 6. The multi-scale SSIM provides more flexibility for multi-scale analysis [51]. The formula for MS-SSIM [51] is expressed as:
(8)   $\text{MS-SSIM}(\hat{y}, y) = l_M(\hat{y}, y) \cdot \prod_{j=1}^{M} cs_j(\hat{y}, y)$

where $l_M$ and $cs_j$ are evaluated on the local image content at the $j$-th scale level, and $M$ is the number of scale levels. Clearly, SSIM is a special case of MS-SSIM with $M = 1$.
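For reference, a single-window SSIM (Eqs. 6-7) over a whole patch can be written in a few lines of numpy; practical SSIM averages this quantity over local sliding windows, and MS-SSIM repeats it across downsampled scales. The stability constants below are illustrative defaults:

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM computed over an entire patch: the luminance
    factor l times the contrast-structure factor cs (Eqs. 6-7)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    l = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    cs = (2 * cov + c2) / (a.var() + b.var() + c2)
    return float(l * cs)
```

Note that a uniform intensity shift only affects the luminance factor, which is exactly the "insensitivity to uniform biases" discussed for SL-net later in the paper.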
The structural loss (SL) is generally expressed as:

(9)   $\mathcal{L}_{SL} = 1 - \text{MS-SSIM}(\hat{y}, y)$

Note that this loss can be easily back-propagated to update the weights of the network, since MS-SSIM is differentiable [43].
II-D5 Objective Function
As noted in recent studies [43, 37], minimizing the L2 loss leads to an over-smoothed appearance. The adversarial loss in a GAN may yield sharp images, but the output does not exactly match the corresponding real NDCT images [37]. The perceptual loss computed by a VGG network [47] evaluates the perceptual differences between the generated and real NDCT images in a high-level feature space instead of the voxel space. Since the VGG network is trained on a large dataset of natural images rather than CT images, it may introduce distortions into the processed CT images. To tackle these issues, we propose to combine different loss terms for high image quality.
As revealed in [43], the L1 loss allows noise suppression and SNR improvement, but it blurs anatomical structures to some extent. In contrast, the structural loss discourages blurring and keeps high contrast resolution. To combine the merits of both loss functions, the structurally-sensitive loss (SSL) is expressed as:

(10)   $\mathcal{L}_{SSL} = \tau\,\mathcal{L}_{SL} + (1 - \tau)\,\mathcal{L}_{L_1}$

where $\tau$ is the weighting factor balancing structure preservation in the first term (from Eq. 9) and noise suppression in the second term (from Eq. 4).
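A minimal numpy sketch of this combination follows, using a single-window SSIM as a stand-in for MS-SSIM; the default tau = 0.89 matches the setting reported in Section III-B, and the stability constants are illustrative:

```python
import numpy as np

def _ssim(a, b, c1=1e-4, c2=9e-4):
    # Single-window SSIM, used here as a stand-in for MS-SSIM.
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    l = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    cs = (2 * cov + c2) / (a.var() + b.var() + c2)
    return l * cs

def ssl_loss(y, g_x, tau=0.89):
    """Structurally-sensitive loss: tau weighs the structural term
    (1 - SSIM) against the voxel-wise L1 term."""
    y, g_x = np.asarray(y, float), np.asarray(g_x, float)
    structural = 1.0 - _ssim(y, g_x)
    pixelwise = np.mean(np.abs(y - g_x))
    return float(tau * structural + (1.0 - tau) * pixelwise)
```

A perfect reconstruction drives both terms, and hence the whole loss, to zero.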
Nevertheless, the above two losses may still miss some diagnostic features. Hence, the adversarial loss is incorporated to keep as many textural and structural features as possible. In summary, the overall objective function of SMGAN is expressed as:

(11)   $\mathcal{L}_{SMGAN} = \mathcal{L}_{SSL} + \lambda_a\,\mathcal{L}_{WGAN}$

where $\lambda_a$ is the weight for the adversarial loss. In the last step of the network, we compare the difference between the output volume and the target volume, and the error is back-propagated for optimization [52].
III Experiments and Results
III-A Experimental Datasets and Setup
To show the effectiveness of the proposed network for LDCT noise reduction, we used a real clinical dataset released by the Mayo Clinic for the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge [53]. The Mayo dataset consists of 2,378 normal-dose CT (NDCT) and low-dose (quarter-dose) CT (LDCT) images from 10 anonymized patients. Details of the reconstruction interval and slice thickness are given in [53].
The denoising performance of DL-based methods depends on the size of the training dataset: a large-scale valid training set improves denoising performance, but the training image library may not contain many valid images. To enhance the performance of the network, we used the following strategies. First, in order to improve the generalization performance of the network and avoid over-fitting, we adopted a 10-fold cross-validation strategy: the original dataset was partitioned into 10 equal-size subsets, a single subset was used in turn as the validation subset, and the rest of the data were used for training. Moreover, considering the limited number of CT images, we applied an overlapping-patches strategy, which not only accounts for patch-wise spatial interconnections but also significantly increases the size of the training patch dataset [54, 55].
For data pre-processing, the original LDCT and NDCT images are 512×512-pixel slices. Since directly processing entire patient images is computationally inefficient and infeasible, our denoising model was applied to image patches. First, we applied an overlapped sliding window to obtain image patches and then randomly extracted 100,100 pairs of training patches and 5,100 pairs of validation patches of the same size from the remaining patient images. Then, the 10-fold cross-validation strategy was used to ensure the accuracy of the proposed algorithm. Finally, the CT Hounsfield Unit (HU) scale was normalized to [0, 1] before the images were fed to the network.
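The pre-processing steps above can be sketched as follows; the HU window used for normalization and the 2-D patch/stride sizes are illustrative assumptions (the paper itself works with 3-D patches):

```python
import numpy as np

def normalize_hu(img, lo=-1000.0, hi=2000.0):
    """Map a HU image into [0, 1]; the HU window here is illustrative."""
    return np.clip((np.asarray(img, float) - lo) / (hi - lo), 0.0, 1.0)

def extract_patches(image, patch, stride):
    """Overlapping sliding-window patches from a 2-D slice; the same
    idea extends to 3-D volumes with a third loop over depth."""
    h, w = image.shape
    ph, pw = patch
    return [image[i:i + ph, j:j + pw]
            for i in range(0, h - ph + 1, stride)
            for j in range(0, w - pw + 1, stride)]

slice_hu = np.zeros((64, 64))            # toy CT slice in HU
patches = extract_patches(normalize_hu(slice_hu), (32, 32), stride=16)
```

With a stride smaller than the patch size, neighboring patches overlap, which is what multiplies the effective size of the training set.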
For qualitative comparison, in order to validate the performance of our proposed methods (SMGAN-2D and SMGAN-3D), we compare them with eight state-of-the-art denoising methods: CNN-L2 (L2-net), CNN-L1 (L1-net), the structural-loss net (SL-net), the multi-scale structural-loss net (MSL-net), WGAN, BM3D [25], RED-CNN [35], and WGAN-VGG [37]. Among these, BM3D is a classical image-space denoising algorithm, WGAN-VGG represents a 2D perceptual-loss-based network, and RED-CNN is a 2D pixel-wise network. Note that the parameter settings of these methods [37, 35, 25] follow the suggestions of the original papers.
For quantitative comparison, three metrics were chosen for image quality evaluation: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [51], and root-mean-square error (RMSE).
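For completeness, RMSE and PSNR for images normalized to [0, 1] can be computed as below (a generic implementation, not tied to any particular toolkit):

```python
import numpy as np

def rmse(ref, img):
    """Root-mean-square error between a reference and a test image."""
    ref, img = np.asarray(ref, float), np.asarray(img, float)
    return float(np.sqrt(np.mean((ref - img) ** 2)))

def psnr(ref, img, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, data_range]."""
    e = rmse(ref, img)
    return float("inf") if e == 0 else float(20.0 * np.log10(data_range / e))
```

For example, a constant error of 0.1 on a [0, 1] image gives an RMSE of 0.1 and a PSNR of 20 dB, which matches the scale of the values reported in Table I.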
Fig. 5: Comparison of loss function values versus the number of epochs for the different algorithms: (a) L1 loss, (b) structural loss, and (c) Wasserstein distance.
III-B Parameter Selection
In our experiments, the Adam optimization algorithm was used for network training [56]. In the training phase, the mini-batch size was 64. The hyper-parameter $\lambda$ for the balance between the Wasserstein distance and the gradient penalty was set to 10, per the suggestion in the original paper [42]. The parameter $\lambda_a$ controls the trade-off between the adversarial loss and the mixture loss, and the parameter $\tau$ was set to 0.89. The slope of the leaky ReLU activation function was set to 0.2. The networks were implemented in TensorFlow [57] on an NVIDIA Titan Xp GPU.

III-C Network Convergence
To examine the robustness of the different denoising algorithms, the ten methods were separately trained with the same settings as SMGAN-3D, and their L1 loss, structural loss (SL), and Wasserstein distance curves were recorded. Note that the parameter settings of RED-CNN, WGAN-VGG, and BM3D follow the original papers [35, 37, 25]. In addition, the 2D networks take 2D input patches, while our proposed 3D model uses 3D training patches. We plot the averaged loss value of each method versus the number of epochs as the measure of convergence in Fig. 5.
In Figs. 5(a) and 5(b), in terms of the L1 loss and SL, we observe that L2-net and L1-net achieved the fastest convergence and have similar convergence trends, in that both curves decrease initially and then smoothly converge; Fig. 5(a) shows that they both converged at around the same epoch, indicating that these mean-based algorithms converge quickly. In contrast, there are differences between the SL-based and mean-based methods. The convergence curve of SL-net decreases initially and then rises slightly, as shown in Fig. 5(a), and MSL-net shows a similar small increase in terms of the L1 loss. This observation indicates that SL-based and mean-based methods place different emphasis on minimizing perceptually motivated similarity between real and generated NDCT images. For the WGAN-based methods, the curves of WGAN, WGAN-VGG, SMGAN-2D, and SMGAN-3D oscillate slightly late in the convergence process in Figs. 5(a) and 5(b). The reason for this oscillatory behavior is as follows: $G$ attempts to mimic the real NDCT distribution while $D$ aims to differentiate between the real NDCT distribution and the denoised LDCT distribution. Since a GAN is intrinsically a two-player game, the outputs of $G$ and $D$ are constantly changing, which leads to oscillation as they converge toward their optimal states.
Fig. 5(c) shows the convergence behavior in terms of the Wasserstein distance. It can be seen that our proposed SMGAN-2D has the mildest oscillatory behavior of the four WGAN-based models and reaches a stable state relatively early. Moreover, SMGAN-3D oscillates over a relatively large range during training, because it incorporates 3D structural information, which leads to a larger oscillation amplitude. However, its curve still oscillates close to the x-axis, indicating SMGAN-3D's robustness in minimizing the Wasserstein distance between the generated and real samples.
III-D Denoising Performance
To demonstrate the effectiveness of the proposed network, we perform qualitative comparisons over three representative abdominal images presented in Figs. 18, 44 and 70. For better evaluation of the image quality obtained with the different denoising models, zoomed regions-of-interest (ROIs) are marked by red rectangles and shown in Figs. 31, 57 and 83 respectively. Note that all results are assessed in two respects: content restoration and noise reduction. All CT images in the axial view are displayed in the angiography window [-160, 240] HU.
The real NDCT images and the corresponding LDCT images are presented in panels (a) and (b) of these figures. As observed, there are clear distinctions between the ground-truth (NDCT) and LDCT images. Panels (a) of Figs. 18 and 44 show lesions/metastases, and panel (a) of Fig. 70 presents focal fatty sparing/focal fat. These lesions can be clearly observed in the NDCT images; in contrast, the original LDCT images in the (b) panels are noisy and lack the structural features needed for task-based clinical diagnosis. All of the denoising models suppress this noise to some extent.
III-D1 Comparison with CNN-based denoising methods
To study the benefits of the adversarial learning framework in SMGAN-3D, we compared SMGAN-3D with CNN-based methods, including CNN-L2, CNN-L1, RED-CNN [35], SL-net and MSL-net. It is worth noting that CNN-L2, CNN-L1, and RED-CNN are mean-based denoising methods, while SL-net and MSL-net are SL-based denoising methods. All of these methods greatly reduce noise compared with the LDCT images, but our proposed method preserves more structural details, thereby yielding better image quality than the other five methods.
Mean-based methods can effectively reduce noise, but the side effect is impaired image content. In the (c) panels, L2-net greatly suppresses the noise but blurs some crucial structural information in the porta hepatis region, and some waxy artifacts can still be observed. L2-net does not produce good visual quality because it assumes that the noise is independent of the local characteristics of the images; even though it retains a high SNR, its results are not clinically preferable. Compared with L2-net, the (d) panels show that L1-net produces less blurring and preserves more structural information. However, it still over-smooths some anatomical details, and some blocky artifacts remain (marked by the blue arrow). The results of RED-CNN [35] deliver a high SNR but blur the vessel details, as shown in the (i) panels.
For the SL-based methods, the (e) panels show that SL-net generates images with higher contrast resolution and preserves the texture of real NDCT images better than L2-net and L1-net. However, SL-net does not preserve the structural features well, and small streak artifacts remain. Furthermore, in the (e) and (f) panels, SL-net and MSL-net exhibit low-frequency image intensity deviations because SSIM/MS-SSIM is insensitive to uniform biases [49, 51]. On the other hand, L1-net preserves the overall image intensity, but it does not preserve high contrast resolution as well as SL-net and MSL-net do. From Figs. 70 and 83, we can see that the mean-based and SL-based methods achieve effective noise suppression and artifact removal; however, Fig. 83 shows that these methods blur local structural features, whereas our proposed SMGAN-based methods achieve better edge preservation than the competing methods.
Overall, the above observations support the following statements. First, although the voxel-wise methods show good noise-reduction properties, they blur the content and lose structural details to some extent because they optimize the results in a voxel-wise manner. Second, SL-based methods preserve texture better than mean-based methods, but they cannot preserve the overall image intensity. Third, the results produced by the proposed SMGAN-3D demonstrate the benefits of combining the two loss functions and the importance of adversarial training [41, 42].
Table I: Quantitative results (PSNR, SSIM, RMSE) of the different methods for the images in Figs. 18, 44 and 70.

          Fig. 18                   Fig. 44                   Fig. 70
       PSNR    SSIM    RMSE      PSNR    SSIM    RMSE      PSNR    SSIM    RMSE
LDCT  22.818  0.761  0.0723  21.558  0.659  0.0836  24.169  0.737  0.0618 
CNNL1  27.791  0.822  0.0408  26.794  0.738  0.0457  29.162  0.807  0.0348 
CNNL2  27.592  0.819  0.0418  26.630  0.736  0.0466  28.992  0.806  0.0355 
SLnet  26.864  0.831  0.0453  25.943  0.745  0.0504  28.069  0.813  0.0395 
MSLnet  27.667  0.831  0.0414  26.685  0.744  0.0469  28.902  0.812  0.0359 
WGAN  25.727  0.801  0.0517  24.655  0.711  0.0585  26.782  0.781  0.0458 
BM3D  27.312  0.809  0.0431  26.525  0.728  0.0472  28.959  0.794  0.0356 
REDCNN  28.279  0.825  0.0385  27.243  0.743  0.0444  29.679  0.811  0.0328 
WGANVGG  26.464  0.811  0.0475  25.300  0.722  0.0543  27.161  0.793  0.0419 
SMGAN2D  26.627  0.821  0.0466  25.507  0.732  0.0530  27.731  0.795  0.0406 
SMGAN3D  26.569  0.824  0.0473  25.372  0.739  0.0538  27.398  0.794  0.0411 
III-D2 Comparison with WGAN-based denoising methods
To evaluate the effectiveness of our proposed objective function, we compare our method with existing WGAN-based networks, namely WGAN and WGAN-VGG. Considering the importance of clinical image quality and specific structural features for medical diagnosis, we adopted the adversarial learning approach [41, 42] in our experiments because WGAN helps capture more structural information. Nevertheless, based on our experience, using WGAN alone may leave stronger noise than the other selected approaches, because it only maps the data distribution from LDCT to NDCT without considering local voxel intensities and structural correlations. The coarse noise texture visible in the (g) panels supports this intuition.
Indeed, the WGAN-VGG [37] images in the (j) panels exhibit better visual quality, with more details, and share structural features similar to the NDCT images according to human perceptual evaluation. However, the regions marked by the red and green circles in the (j) panels suggest that it may severely distort the original structural information. A possible reason is that the VGG network [47] is pre-trained on natural images, whose structural information and content differ from those of medical images.
Compared with WGAN and WGAN-VGG, our proposed SMGAN-3D, as shown in the (l) panels (marked by the red and green circles), more clearly visualizes the metastasis and better preserves the portal vein.
From Figs. 70 and 83, it can be seen that the SMGAN-based methods achieve better anatomical feature preservation and visual quality than the other state-of-the-art methods.
The experimental results demonstrate that our proposed objective function is essential to capture more accurate anatomical details.
III-D3 Comparison with image-space denoising
To validate the advantage of DL-based methods, we compared our method with an image-space denoising method. The (h) panels show that BM3D blurs the low-contrast lesion marked by the red circle and smooths the specific features marked by the blue arrow. In contrast, SMGAN-3D performs better on the low-contrast lesion and yields sharper features, as shown in the (l) panels.
III-D4 Comparison with the 2D SMGAN network
To evaluate the benefit of 3D structural information, we compared SMGAN-3D with SMGAN-2D. As shown in the (l) panels, our proposed SMGAN-3D generates results with finer subtle details than SMGAN-2D and with statistical noise properties more similar to the corresponding NDCT images. The reasons why SMGAN-3D outperforms SMGAN-2D are as follows. First, SMGAN-3D incorporates 3D structural information to improve image quality. Second, SMGAN-2D takes its input slice by slice, potentially losing the spatial correlation between adjacent slices.
Figs. 70 and 83 demonstrate that SMGAN-3D provides improved anatomical feature preservation over the other state-of-the-art methods.
In summary, we compared our proposed methods with existing methods, and SMGAN-3D clearly achieves robust performance in noise suppression, artifact removal, and texture preservation. We recommend that the reader inspect the ROIs (in Figs. 31 and 57) or zoom in to better evaluate the results. Further details on the generalization ability of the proposed model are given in Appendix A.
Mean and standard deviation (SD) of the ROIs in Figs. 31, 57 and 83 (relative differences from the NDCT values in parentheses).

          Fig. 31               Fig. 57               Fig. 83
        Mean      SD          Mean      SD          Mean      SD
NDCT  115.282  45.946  56.903  58.512  51.225  73.297 
LDCT  114.955 (0.2837%)  74.299 (61.709%)  57.228 (0.571%)  85.854 (46.729%)  50.142 (2.114%)  89.346 (21.896%) 
CNNL1  115.809 (0.4571%)  28.532 (37.9010%)  57.709 (1.416%)  42.315 (27.682%)  50.917 (0.6013%)  66.359 (9.466%) 
CNNL2  117.191 (1.656%)  29.933 (34.852%)  58.956 (3.608%)  43.411 (25.808%)  52.229 (1.960%)  66.922 (8.698%) 
SLnet  131.333 (13.923%)  35.844 (21.987%)  68.471 (20.329%)  50.789 (13.199%)  63.874 (24.693%)  72.718 (0.790%) 
MSLnet  118.395 (2.701%)  32.548 (29.160%)  63.271 (11.191%)  46.979 (19.711%)  57.052 (11.375%)  69.519 (5.154%) 
WGAN  105.461 (8.519%)  42.659 (7.154%)  48.432 (14.887%)  54.306 (7.188%)  42.417 (17.195%)  70.904 (3.265%) 
BM3D  114.058 (1.062%)  31.515 (31.409%)  25.649 (54.925%)  69.411 (18.627%)  15.183 (70.360%)  100.08 (36.540%) 
REDCNN  116.642 (1.180%)  27.194 (40.813%)  57.985 (1.902%)  42.048 (28.138%)  51.272 (0.0918%)  66.961 (8.644%) 
WGANVGG  108.229 (6.118%)  36.721 (20.078%)  54.450 (4.311%)  48.660 (16.838%)  44.959 (12.232%)  67.059 (8.511%) 
SMGAN2D  108.758 (5.659%)  40.948 (10.878%)  51.243 (9.947%)  53.065 (9.309%)  48.230 (5.847%)  72.073 (1.670%) 
SMGAN3D  115.569 (0.749%)  43.654 (6.723%)  54.356 (4.476%)  56.552 (3.350%)  55.378 (8.107%)  73.303 (0.00821%) 
TABLE III. Subjective image-quality scores (mean ± SD across three radiologists, five-point scale).

Method | Sharpness | Noise Suppression | Diagnostic Acceptability | Contrast Retention | Overall Quality
LDCT | 2.55±1.43 | 1.55±0.80 | 1.85±0.96 | 1.75±0.83 | 1.93±1.01
CNNL1 | 2.80±0.81 | 3.30±0.71 | 2.70±0.78 | 2.75±0.77 | 2.89±0.77
CNNL2 | 2.12±0.42 | 3.98±0.58 | 1.93±0.78 | 2.07±0.83 | 2.53±0.55
SLnet | 2.95±0.86 | 3.15±0.65 | 2.70±0.71 | 2.80±0.81 | 2.90±0.76
MSLnet | 3.01±0.94 | 3.16±0.57 | 2.87±0.83 | 2.84±0.69 | 2.97±0.76
WGAN | 3.30±0.56 | 2.80±0.81 | 3.15±0.91 | 3.45±1.02 | 3.09±0.66
BM3D | 2.21±1.08 | 3.29±0.80 | 2.21±0.86 | 2.29±0.88 | 2.50±0.91
REDCNN | 3.29±0.88 | 3.79±0.70 | 3.51±0.70 | 3.46±1.12 | 3.51±0.85
WGANVGG | 3.35±0.91 | 3.50±1.07 | 3.35±0.91 | 3.45±1.02 | 3.41±0.94
SMGAN2D | 3.25±0.65 | 3.48±0.66 | 3.32±0.58 | 3.21±0.78 | 3.32±0.67
SMGAN3D | 3.56±0.73 | 3.59±0.68 | 3.58±0.46 | 3.61±1.02 | 3.59±0.72
III-E Quantitative analysis
We performed a quantitative analysis with respect to three selected metrics (PSNR, SSIM, and RMSE). Then, we investigated the statistical properties of the denoised images for each noise-reduction algorithm. Furthermore, we performed a blind reader study with three radiologists on 10 groups of images. Note that the quantitative full-size measurements are in Table I and the image quality assessments of the ROIs are in Fig. 87. The NDCT images were chosen as the gold standard.
III-E.1 Image quality analysis
As shown in Table I, REDCNN scores the highest PSNR and the lowest RMSE, and ranks second in SSIM. Since PSNR and RMSE favor regression toward the mean, it is expected that REDCNN, a mean-based regression optimization, outperforms the feature-based models on these metrics. For SLnet and MSLnet, it is not surprising that both achieve the highest SSIM scores, owing to their adoption of structural similarity losses. However, a good score on these image quality metrics does not ensure the preservation of high-level feature information and structural details, which explains why REDCNN can attain the best PSNR and RMSE despite over-smoothing the content. PSNR, SSIM, and RMSE are not perfect: they do not adequately penalize the image blurring and blocky/waxy artifacts visible in the denoised images, as shown in Figs. 18–83. Hence, these metrics may be insufficient for evaluating image quality and predicting diagnostic performance. Indeed, WGAN can provide better visual quality and achieve improved statistical properties. Compared with the CNN-based methods, the WGAN architecture can progressively preserve the consistency of feature distributions between LDCT and NDCT images. However, by encouraging less blurring, WGAN alone may also introduce additional image noise that compromises diagnosis. To retain the information in LDCT images, our novel loss function with a structurally sensitive regularization term enhances clinical usability compared with the other methods.
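For reference, the three metrics can be computed directly from image arrays. The NumPy sketch below is illustrative: `ssim_global` is a simplified single-window SSIM, not the windowed SSIM of [49] used for our reported scores.

```python
import numpy as np

def rmse(x, y):
    # Root-mean-square error between two images.
    return np.sqrt(np.mean((x - y) ** 2))

def psnr(x, y, data_range=1.0):
    # Peak signal-to-noise ratio in dB for a given intensity range.
    return 20.0 * np.log10(data_range / rmse(x, y))

def ssim_global(x, y, data_range=1.0):
    # Global (single-window) SSIM; a simplification of the windowed
    # SSIM of [49], shown only to illustrate the structure of the metric.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))
```

A uniform intensity shift of 0.1 over a unit range, for example, gives an RMSE of 0.1 and hence a PSNR of exactly 20 dB.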
Although mean-based approaches, such as CNNL1 and CNNL2, enjoy high metric scores, they may over-smooth the overall image content and lose feature characteristics, failing to satisfy human visual system (HVS) requirements because mean-based methods favor regression toward the mean. Meanwhile, WGANVGG satisfies HVS requirements but receives the lowest scores on the three selected metrics, because it may lose subtle structural information or noise features, which can severely affect diagnostic accuracy. The proposed SMGAN2D outperforms the feature-based WGANVGG on all three metrics, illustrating the robust denoising capability of our proposed loss function. Compared with SMGAN2D, SMGAN3D achieves higher PSNR and SSIM scores since it incorporates 3D spatial information. To further validate the performance of each denoising model on clinically significant local details, we performed the quantitative analysis over the ROIs; the results are summarized in Fig. 87. It is worth noting that the quantitative results on the ROIs follow a similar trend to those on the full-size images.
III-E.2 Statistical analysis
To quantitatively evaluate the statistical properties of the images processed by the different denoising models, we calculated the mean CT number (in Hounsfield units, HU) and standard deviation (SD) of the ROIs, as shown in Table II. For each denoising model, the percent errors of the mean and SD values were calculated relative to those of the reference (NDCT) images; lower percent errors correspond to more robust denoising models. As shown in Table II, CNNL1, CNNL2, SLnet, MSLnet, BM3D, REDCNN, and WGANVGG generate high percent errors in SD with respect to the NDCT images, reflecting the blocky and over-smoothing effects in the images and matching our visual inspections. Specifically, for Fig. 83, the absolute difference in SD between BM3D and NDCT is the largest among all denoising models, indicating that BM3D has the most noticeable blurring effects; the SD values support our visual observations in the corresponding subfigures (h). The mean values of WGAN, WGANVGG, SLnet, and SMGAN2D deviate noticeably from that of the NDCT image in Fig. 31, indicating that these methods effectively reduce the noise level but compromise significant content information. Nevertheless, the SD value of SMGAN2D is close to that of NDCT, which indicates that it supports HVS requirements. From the quantitative analysis in Table II, it can be observed that our proposed SMGAN3D achieves the closest SD to the NDCT images among all methods. Overall, SMGAN3D is a highly competitive denoising model for clinical use.
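The percent errors reported in Table II follow a simple recipe; a minimal sketch (the function name is our own, chosen for illustration):

```python
import numpy as np

def roi_percent_errors(denoised_roi, ndct_roi):
    # Percent errors of the ROI mean CT number (HU) and SD relative to
    # the NDCT reference; lower values indicate statistics closer to NDCT.
    mean_err = abs(denoised_roi.mean() - ndct_roi.mean()) / abs(ndct_roi.mean()) * 100.0
    sd_err = abs(denoised_roi.std() - ndct_roi.std()) / ndct_roi.std() * 100.0
    return mean_err, sd_err
```

A denoised ROI whose mean matches NDCT but whose SD is half as large would score 0% mean error and 50% SD error, flagging over-smoothing despite a perfect mean.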
III-E.3 Visual assessments
To validate the clinical image quality of the processed results, three radiologists with a mean clinical experience of 12.3 years performed a visual assessment on 10 groups of images. Each group includes an original LDCT image with lesions, the corresponding reference NDCT image, and the images processed by the different denoising methods. NDCT, considered as the gold standard, is the only labeled image in each group. All other images were evaluated on sharpness, noise suppression, diagnostic acceptability, and contrast retention using a five-point scale (5 = excellent, 1 = unacceptable). The results were evaluated independently, and the overall image quality score for each method was computed by averaging the four evaluation criteria. For each method, the final score is presented as mean ± SD (the average score of the three radiologists ± the standard deviation). The final quantitative results are listed in Table III.
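The scoring protocol above reduces to simple averaging; a sketch under the stated protocol (the array layout and function name are our own):

```python
import numpy as np

def reader_study_summary(scores):
    # scores: shape (readers, images, criteria) on the five-point scale.
    # Overall quality per the protocol: average over the four criteria,
    # then report mean and SD over readers and images (mean +/- SD).
    overall = scores.mean(axis=2)          # (readers, images)
    return overall.mean(), overall.std()
```

With three readers each rating two images at 3 and 5 across all criteria, the summary would be 4.0 ± 1.0.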
As observed, the original LDCT images have the lowest scores because of their severe image quality degradation, and all denoising models improve the scores to some extent. From Table III, REDCNN obtains the highest overall score among the competing methods. Compared with all other methods, our proposed SMGAN3D scores best with respect to sharpness, diagnostic acceptability, and contrast retention. Furthermore, the voxel-wise optimization (CNNL2) achieves the best visually-assessed noise suppression, but it suffers from relatively low scores in sharpness and diagnostic acceptability, indicating a loss of image details. The proposed SMGAN3D model also attains a higher overall image quality score than the 2D model, which indicates that a 3D model can enhance CT image denoising performance by incorporating spatial information from adjacent slices.
III-F Computational Cost
In CT reconstruction, there is a trade-off between computational cost and image quality. In this respect, a DL-based algorithm has great advantages in computational efficiency. Although the training of DL-based methods is time-consuming, once training is completed they can rapidly perform denoising on reconstructed LDCT images. In our study, the proposed 2D method requires about 15 hours of training to converge and the 3D model approximately 26 hours. WGANVGG, which has the same number of layers, takes about 18 hours in the training phase. Compared with iterative reconstruction, any DL-based approach requires much less execution time, which facilitates the clinical workflow. In practice, our proposed SMGAN2D and SMGAN3D took 0.534 s and 4.864 s, respectively, in the validation phase on an NVIDIA Titan GPU. Compared with the results in [58, 59], our method took significantly less time: for example, soft-threshold filtering (STF)-based TV minimization in the ordered-subset simultaneous algebraic reconstruction technique (OS-SART) framework took 45.1 s per iteration on the same computing platform. Hence, once the model is trained, it requires far less computational overhead than an iterative reconstruction method, other conditions being equal.
IV Discussions
As mentioned before, the different emphases of visual evaluation and traditional image quality metrics were extensively investigated. When training with only the mean-based losses (CNNL1, CNNL2, REDCNN), the results achieve high scores on the quantitative metrics and yield promising results with substantial noise reduction. When training with the feature-based methods (WGANVGG), the results can meet HVS requirements for visualization since they preserve more structural details than mean-based methods. However, these methods carry a potential risk of content distortion since the perceptual loss is computed with a network [47] trained on a natural image dataset. Practically and theoretically, even though adversarial learning can prevent smoothing in the image and capture structural characteristics, it may often result in severe loss of diagnostic information. To integrate the best characteristics of these loss functions, we have proposed a hybrid loss function to deliver the LDCT image quality optimally.
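A minimal sketch of such a hybrid objective, combining a mean-based L1 term with a structural (1 − SSIM) term; the weight `lam` and the simplified global-SSIM form are illustrative choices for exposition, not the paper's Eq. 10:

```python
import numpy as np

def structurally_sensitive_loss(pred, target, lam=0.5):
    # Hybrid objective sketch: a mean-based L1 term encourages voxel-wise
    # fidelity, while (1 - SSIM) penalizes structural mismatch. `lam` and
    # the single-window SSIM here are illustrative, not the paper's Eq. 10.
    l1 = np.abs(pred - target).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mp, mt = pred.mean(), target.mean()
    cov = ((pred - mp) * (target - mt)).mean()
    ssim = ((2 * mp * mt + c1) * (2 * cov + c2)) / \
           ((mp ** 2 + mt ** 2 + c1) * (pred.var() + target.var() + c2))
    return lam * (1.0 - ssim) + (1.0 - lam) * l1
```

The loss vanishes for a perfect prediction and grows with either voxel-wise or structural discrepancy, which is the behavior the hybrid design aims for.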
Although our proposed network has achieved high-quality denoised LDCT images, there is still room for improvement. First and foremost, some feature edges in the processed results still look blurry. Also, some structural variations between NDCT and LDCT do not perfectly match. A possible way to enhance the correlation between NDCT and LDCT is to design a network with better modeling capability, which is work we have started. As far as our reader study is concerned, although visual assessment may be subject to intra- as well as inter-operator variability, on average such assessment can still evaluate different algorithms effectively, especially in a pilot study. In our follow-up study, we will invite more radiologists to rate the results, quantify inter-operator variability in a task-specific fashion, and also study intra-operator variability.
V Conclusion
In conclusion, we have presented a 3D CNN-based method for LDCT noise reduction. As a follow-up to our previous work [37], a 3D convolutional neural network is utilized to improve image quality in a 3D contextual setting. In addition, we have highlighted that the purpose of the loss function is to preserve high-resolution features critical for diagnosis. Different from the state-of-the-art LDCT denoising method in [36], an efficient structurally-sensitive loss has been included to capture informative structural features. Moreover, we have employed the Wasserstein distance to stabilize the GAN training process. We have performed quantitative and qualitative comparisons of image quality, which have demonstrated that SMGAN3D can produce results of higher image quality for clinical use compared with the existing denoising networks [34, 35, 36, 37].
In the future, we will extend our model to other medical imaging modalities in a task-specific manner. Moreover, we plan to incorporate more advanced denoising models, such as the networks mentioned in [60, 61, 62], for LDCT reconstruction. Finally, we are also interested in making our denoising software robust across different scanners.
Appendix A Different training sets for SMGAN3D training
We randomly split the Mayo dataset [53] into four different training sets, each containing 5,000 image patches. Then, the different training sets were used to validate the generalizability of our proposed SMGAN3D model. The results are presented in Fig. 100 and Table IV.
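The random split can be sketched as follows (the function name and seed are illustrative choices, not from our experimental code):

```python
import numpy as np

def split_into_training_sets(patches, n_sets=4, seed=0):
    # Randomly partition extracted image patches into n_sets disjoint
    # training sets for the generalizability experiment; the fixed seed
    # is an illustrative choice for reproducibility.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    return [patches[chunk] for chunk in np.array_split(idx, n_sets)]
```

Because the index permutation is partitioned, every patch appears in exactly one training set.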
TABLE IV. Quantitative results of SMGAN3D trained on four different training sets; the three column groups correspond to Figs. (a)–(d), (e)–(h), and (i)–(l) of Fig. 100.

Case | PSNR | SSIM | RMSE | PSNR | SSIM | RMSE | PSNR | SSIM | RMSE
Case 1 | 26.678 | 0.811 | 0.0463 | 25.842 | 0.776 | 0.0510 | 26.538 | 0.812 | 0.0472
Case 2 | 26.759 | 0.814 | 0.0459 | 25.848 | 0.781 | 0.0510 | 26.544 | 0.814 | 0.0470
Case 3 | 26.589 | 0.807 | 0.0468 | 25.701 | 0.772 | 0.0519 | 26.455 | 0.806 | 0.0475
Case 4 | 26.903 | 0.815 | 0.0452 | 25.914 | 0.782 | 0.0506 | 26.662 | 0.816 | 0.0464
Appendix B Summary of notations
Notation | Meaning
NDCT | Normal-dose CT
LDCT | Low-dose CT
SSL | Structurally sensitive loss, integrating the structural loss and the L1 loss as defined in Eq. 10
SSIM | Structural similarity index [49]
MSSSIM | Multi-scale structural similarity index [51]
SLnet (CNNSL) | 8-layer CNN trained with only the structural similarity loss
MSLnet (CNNMSL) | 8-layer CNN trained with only the multi-scale structural similarity loss
WGAN | Wasserstein generative adversarial network
BM3D | Block-matching and 3D filtering
REDCNN | Residual encoder-decoder CNN trained with only the L2 loss
WGANVGG | Wasserstein generative adversarial network with perceptual loss
SMGAN2D | 2D Wasserstein generative adversarial network with SSL loss
SMGAN3D | 3D Wasserstein generative adversarial network with SSL loss
Acknowledgment
The authors would like to thank NVIDIA Corporation for the donation of the Titan Xp GPU utilized in this study. The authors are grateful for helpful discussions with Dr. Mats Persson (Stanford University). This work was supported in part by the National Natural Science Foundation of China under Grant 61671312 and the Science and Technology Project of Sichuan Province of China under Grant 2018HH0070, and in part by the National Institutes of Health under Grants R21 EB019074, R01 EB016977, and U01 EB017140.
References
 [1] D. J. Brenner and E. J. Hall, “Computed tomography — an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
 [2] A. B. de González, M. Mahesh, K.-P. Kim, M. Bhargavan, R. Lewis, F. Mettler, and C. Land, “Projected cancer risks from computed tomographic scans performed in the United States in 2007,” Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009.
 [3] D. A. Schauer and O. W. Linton, “National council on radiation protection and measurements report shows substantial medical exposure increase,” pp. 293–296, 2009.

 [4] J. Wang, H. Lu, T. Li, and Z. Liang, “Sinogram noise reduction for low-dose CT by statistics-based nonlinear filters,” in Proc. SPIE, vol. 5747, 2005, p. 2059.
 [5] J. Wang, T. Li, H. Lu, and Z. Liang, “Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 25, no. 10, pp. 1272–1283, 2006.
 [6] M. Balda, J. Hornegger, and B. Heismann, “Ray contribution masks for structure adaptive sinogram filtering,” IEEE Trans. Med. Imaging, vol. 31, no. 6, pp. 1228–1239, 2012.
 [7] G.-Z. Yang, P. Burger, D. N. Firmin, and S. Underwood, “Structure adaptive anisotropic image filtering,” Proc. IEEE Int. Conf. Image Process. Applicat., vol. 14, no. 2, pp. 135–145, 1996.
 [8] J. Liu, J. Ma, Y. Zhang, Y. Chen, J. Yang, H. Shu, L. Luo, G. Coatrieux, W. Yang, Q. Feng et al., “Discriminative feature representation to improve projection data inconsistency for low-dose CT imaging,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2499–2509, 2017.
 [9] Y. Chen, J. Ma, Q. Feng, L. Luo, P. Shi, and W. Chen, “Nonlocal prior Bayesian tomographic reconstruction,” J. Math. Imaging Vis., vol. 30, no. 2, pp. 133–146, 2008.
 [10] A. Manduca, L. Yu, J. D. Trzasko, N. Khaylova, J. M. Kofler, C. M. McCollough, and J. G. Fletcher, “Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT,” Med. Phys., vol. 36, no. 11, pp. 4911–4919, 2009.
 [11] E. Y. Sidky, Y. Duchin, X. Pan, and C. Ullberg, “A constrained, total-variation minimization algorithm for low-intensity X-ray CT,” Med. Phys., vol. 38, no. S1, 2011.
 [12] B. De Man and S. Basu, “Distance-driven projection and backprojection in three dimensions,” Phys. Med. Biol., vol. 49, no. 11, p. 2463, 2004.
 [13] B. R. Whiting, P. Massoumzadeh, O. A. Earl, J. A. O’Sullivan, D. L. Snyder, and J. F. Williamson, “Properties of preprocessed sinogram data in X-ray computed tomography,” Med. Phys., vol. 33, no. 9, pp. 3290–3303, 2006.
 [14] I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 21, no. 2, pp. 89–99, 2002.
 [15] Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, “Low-dose CT reconstruction via edge-preserving total variation regularization,” Phys. Med. Biol., vol. 56, no. 18, p. 5949, 2011.
 [16] Y. Liu, J. Ma, Y. Fan, and Z. Liang, “Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction,” Phys. Med. Biol., vol. 57, no. 23, p. 7923, 2012.
 [17] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE Trans. Med. Imaging, vol. 31, no. 9, pp. 1682–1697, 2012.

 [18] Y. Zhang, X. Mou, G. Wang, and H. Yu, “Tensor-based dictionary learning for spectral CT reconstruction,” IEEE Trans. Med. Imaging, vol. 36, no. 1, pp. 142–154, 2017.
 [19] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, no. 17, p. 4777, 2008.
 [20] Y. Chen, X. Yin, L. Shi, H. Shu, L. Luo, J.-L. Coatrieux, and C. Toumoulin, “Improving abdomen tumor low-dose CT images using a fast dictionary learning based processing,” Phys. Med. Biol., vol. 58, no. 16, p. 5803, 2013.
 [21] J. Ma, J. Huang, Q. Feng, H. Zhang, H. Lu, Z. Liang, and W. Chen, “Low-dose computed tomography image restoration using previous normal-dose scan,” Med. Phys., vol. 38, no. 10, pp. 5713–5731, 2011.
 [22] Z. Li, L. Yu, J. D. Trzasko, D. S. Lake, D. J. Blezek, J. G. Fletcher, C. H. McCollough, and A. Manduca, “Adaptive nonlocal means filtering based on local noise level for CT denoising,” Med. Phys., vol. 41, no. 1, 2014.
 [23] A. Buades, B. Coll, and J.-M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Model. Simul., vol. 4, no. 2, pp. 490–530, 2005.
 [24] A. Cheddad, C. Svensson, J. Sharpe, F. Georgsson, and U. Ahlgren, “Image processing assisted algorithms for optical projection tomography,” IEEE Trans. Med. Imaging, vol. 31, no. 1, pp. 1–15, 2012.
 [25] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3D random noise filtering for absorption optical projection tomography,” Phys. Med. Biol., vol. 55, no. 18, p. 5401, 2010.
 [26] Y. Chen, L. Shi, Q. Feng, J. Yang, H. Shu, L. Luo, J.-L. Coatrieux, and W. Chen, “Artifact suppressed dictionary learning for low-dose CT image processing,” IEEE Trans. Med. Imaging, vol. 33, no. 12, pp. 2271–2292, 2014.
 [27] J. Liu, Y. Hu, J. Yang, Y. Chen, H. Shu, L. Luo, Q. Feng, Z. Gui, and G. Coatrieux, “3D feature constrained reconstruction for low-dose CT imaging,” IEEE Trans. Circuits Syst. Video Technol., 2016.
 [28] G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.

 [29] G. Wang, M. Kalra, and C. G. Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
 [30] W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, and D. Shen, “Deep convolutional neural networks for multimodality isointense infant brain image segmentation,” NeuroImage, vol. 108, pp. 214–224, 2015.

 [31] S. Wang, M. Kim, G. Wu, and D. Shen, “Scalable high performance image registration framework by unsupervised deep feature representations learning,” in Deep Learning for Medical Image Analysis. Elsevier, 2017, pp. 245–269.
 [32] X. Cao, J. Yang, Y. Gao, Q. Wang, and D. Shen, “Region-adaptive deformable registration of CT/MRI pelvic images via learning-based image synthesis,” IEEE Trans. Image Process., 2018.
 [33] L. Cattell, G. Platsch, R. Pfeiffer, J. Declerck, J. A. Schnabel, C. Hutton, A. D. N. Initiative et al., “Classification of amyloid status using machine learning with histograms of oriented 3D gradients,” NeuroImage: Clinical, vol. 12, pp. 990–1003, 2016.
 [34] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang, “Low-dose CT via convolutional neural network,” Biomed. Opt. Express, vol. 8, no. 2, pp. 679–694, 2017.
 [35] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
 [36] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
 [37] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, and G. Wang, “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” arXiv preprint arXiv:1708.00961, 2017.
 [38] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” arXiv preprint arXiv:1610.09736, 2016.
 [39] H. Shan, Y. Zhang, Q. Yang, U. Kruger, W. Cong, and G. Wang, “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,” arXiv preprint arXiv:1802.05656, 2018.
 [40] W. Yang, H. Zhang, J. Yang, J. Wu, X. Yin, Y. Chen, H. Shu, L. Luo, G. Coatrieux, Z. Gui et al., “Improving low-dose CT image using residual convolutional network,” IEEE Access, vol. 5, pp. 24 698–24 705, 2017.
 [41] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
 [42] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
 [43] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging, vol. 3, no. 1, pp. 47–57, 2017.
 [44] L. Fu, T.-C. Lee, S. M. Kim, A. M. Alessio, P. E. Kinahan, Z. Chang, K. Sauer, M. K. Kalra, and B. De Man, “Comparison between pre-log and post-log statistical models in ultra-low-dose CT reconstruction,” IEEE Trans. Med. Imaging, vol. 36, no. 3, pp. 707–720, 2017.
 [45] P. S. Calhoun, B. S. Kuszyk, D. G. Heath, J. C. Carley, and E. K. Fishman, “Threedimensional volume rendering of spiral ct data: theory and method,” Radiographics, vol. 19, no. 3, pp. 745–764, 1999.

 [46] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Machine Learning, 2010, pp. 807–814.
 [47] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
 [48] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5769–5779.
 [49] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
 [50] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? a new look at signal fidelity measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, 2009.
 [51] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structural similarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., vol. 2. IEEE, 2003, pp. 1398–1402.
 [52] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [53] AAPM, “Low Dose CT Grand Challenge,” 2017. [Online]. Available: http://www.aapm.org/GrandChallenge/LowDoseCT/#
 [54] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 341–349.

 [55] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
 [56] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [57] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
 [58] R. Liu, Y. Luo, and H. Yu, “GPU-based acceleration for interior tomography,” IEEE Access, vol. 2, pp. 757–770, 2014.
 [59] D. Matenine, Y. Goussard, and P. Després, “GPU-accelerated regularized iterative reconstruction for few-view cone-beam CT,” Med. Phys., vol. 42, no. 4, pp. 1505–1517, 2015.
 [60] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” CoRR, abs/1703.06211, vol. 1, no. 2, p. 3, 2017.
 [61] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2016, pp. 770–778.
 [62] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 3859–3869.