1 Introduction
X-ray computed tomography is widely used for clinical screening, diagnosis, and intervention. However, the radiation dose associated with CT examinations may potentially induce cancerous and genetic diseases [1]. As a result, the well-known ALARA [2] (as low as reasonably achievable) principle is universally accepted in practice, reducing unnecessary radiation exposure during medical CT imaging. One of the commonly-used methods is to lower the X-ray flux towards the detector array by adjusting the milliampere-seconds (mAs) and kVp settings for data acquisition. However, since CT imaging is a quantum integration process, an insufficient number of photons will introduce excessive statistical noise and significantly deteriorate image quality. Therefore, how to preserve image quality for clinical tasks at minimum radiation dose has been one of the major endeavors in the CT field over the past decade.
Deep learning (DL) has now been applied in almost all medical tomographic imaging areas, inspired by a large body of image processing results [3, 4]. In particular, several DL-based studies on image noise reduction have been performed [5, 6, 7, 8, 9, 10, 11]. Since CNN models learn high-level representations through multiple layers of feature abstraction from large sets of training images, they are expected to have a better denoising capability than classic image-domain methods. In this paper, we aim to maintain anatomical and pathological information while suppressing image noise due to low radiation dose. Specifically, we develop a new ConvNet architecture for LDCT denoising. To progressively capture both local and global anatomical features, we design cascaded subnetworks that integrate complementary textural information. Moreover, by introducing residual learning at the image reconstruction stage, the network model is made to learn the residuals between a bicubic interpolation image and the corresponding full-dose CT (FDCT) image, so that the denoising performance can be boosted. Finally, with parallelized CNNs (Network in Network), local patches within the receptive field are analyzed effectively [12]. As far as the loss function is concerned, we introduce the ℓ1 norm instead of the ℓ2 distance to discourage blurring [13].
2 Methods
Let a vector x represent a noisy LDCT image of N pixels, and a vector y its corresponding NDCT image, with x, y ∈ R^N. A DL-based network model with multiple processing layers is trained to process LDCT images according to a nonlinear input-output mapping G, which is equivalent to solving the following optimization problem:

  min_G ‖G(x) − y‖²  (1)
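To make this formulation concrete, a minimal NumPy sketch (toy data and function names are our own illustration, not the paper's implementation) evaluates the empirical version of Eq. (1) for a candidate mapping:

```python
import numpy as np

def objective(f, lows, fulls):
    # Empirical form of Eq. (1): squared l2 distance between the mapped
    # low-dose images f(x) and their normal-dose counterparts y.
    return float(np.mean([np.sum((f(x) - y) ** 2)
                          for x, y in zip(lows, fulls)]))

# Toy data: "low-dose" images simulated as full-dose images plus noise.
rng = np.random.default_rng(0)
fulls = [rng.random((8, 8)) for _ in range(4)]
lows = [y + 0.1 * rng.standard_normal(y.shape) for y in fulls]

# The identity mapping leaves all of the simulated noise in place,
# so its objective value is strictly positive.
print(objective(lambda x: x, lows, fulls))
```

Training amounts to searching over mappings G (here, over network weights) that drive this objective down.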
Our network consists of two components, a generative model and a discriminative model, as shown in Fig. 1. In the feature extraction network, the Conv layers use different numbers of filters. Also, in the image reconstruction network, three channels are cascaded, and the reconstruction blocks consist of different numbers of filters. Because all the outputs from the feature extraction layers are densely connected and the final output after reconstruction is large, we introduce a CNN after the reconstruction network to reduce the input dimension and decrease the computational complexity. Instead of constructing a high-quality image by the network itself, we incorporate a residual learning strategy to capture high-frequency features that help improve the quality of low-dose CT images [14].
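The residual learning strategy can be sketched in a few lines, assuming the bicubic-interpolated LDCT image is already available; all names here are illustrative, not the paper's code:

```python
import numpy as np

def residual_target(interp_ldct, fdct):
    # Residual-learning target: the network is trained to predict the
    # (mostly high-frequency) difference between the interpolated LDCT
    # image and the full-dose image, rather than the full image itself.
    return fdct - interp_ldct

def reconstruct(interp_ldct, predicted_residual):
    # Final denoised image = interpolated input + learned residual.
    return interp_ldct + predicted_residual

rng = np.random.default_rng(1)
fdct = rng.random((16, 16))
# Noisy stand-in for a bicubic-interpolated LDCT image.
interp = fdct + 0.05 * rng.standard_normal(fdct.shape)

r = residual_target(interp, fdct)
# With a perfect residual prediction the reconstruction is exact:
assert np.allclose(reconstruct(interp, r), fdct)
```

Predicting the residual rather than the whole image gives the network a sparser, easier target, which is the motivation cited from [14].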
The covariance of pixel-level features significantly influences the denoising performance [13]. Indeed, in our experiments the pure CNN-based model tends to produce blurry features. The generative adversarial network (GAN) [15] is a promising approach to address this limitation, since a GAN is a framework for generative modeling of data that minimizes the discrepancy between the distribution of the generated outputs and the real data distribution. Hence, we force the denoised image to stay on the image manifold by matching the distribution of denoised images to that of real images. Even though GANs have been widely applied in image processing, they suffer from model divergence and are unstable to train [16]. To regularize the GAN training process, we adopt the Earth Mover's distance (EM distance), instead of the original Jensen-Shannon (JS) divergence, in the objective function [17]. Thus, the adversarial loss is formulated as:
  min_G max_D L_WGAN(D, G) = −E_y[D(y)] + E_x[D(G(x))] + λ E_x̂[(‖∇_x̂ D(x̂)‖₂ − 1)²]  (2)

where the first two terms give the Wasserstein estimate, the third term penalizes the deviation of the gradient norm with respect to the input from one, x̂ is uniformly sampled along straight lines between pairs of denoised and real images, and λ is a regularization parameter. Although the ℓ1 and ℓ2 losses are both mean-based loss functions, their effects differ in terms of denoising. Compared with the ℓ2 loss, the ℓ1 loss neither over-penalizes large differences nor tolerates small errors between denoised images and the gold standard. Thus, the ℓ1 loss alleviates some limitations of the ℓ2 loss. Additionally, the ℓ1 loss shares some merits of the ℓ2 loss, e.g., a fast convergence speed.
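The adversarial loss terms of Eq. (2) can be illustrated with a deliberately simple linear critic, for which the input gradient is analytic; this is a sketch of the loss structure under that assumption, not the paper's training code:

```python
import numpy as np

def wgan_gp_terms(w, real, fake, lam=10.0, rng=None):
    # Sketch of Eq. (2) for a linear critic D(x) = <w, x>, whose
    # gradient with respect to x is w everywhere, so the gradient
    # penalty can be evaluated exactly.
    if rng is None:
        rng = np.random.default_rng(0)
    d = lambda batch: batch.reshape(len(batch), -1) @ w.ravel()
    # First two terms: E[D(G(x))] - E[D(y)] (the Wasserstein estimate).
    wass = d(fake).mean() - d(real).mean()
    # x_hat sampled uniformly on straight lines between paired
    # real and denoised images (unused below only because the linear
    # critic's gradient does not depend on the evaluation point).
    t = rng.uniform(size=(len(real),) + (1,) * (real.ndim - 1))
    x_hat = t * real + (1 - t) * fake
    grad_norm = np.linalg.norm(w)
    penalty = lam * (grad_norm - 1.0) ** 2
    return float(wass), float(penalty)

rng = np.random.default_rng(2)
real = rng.random((4, 8, 8))
fake = rng.random((4, 8, 8))
w = rng.standard_normal((8, 8))
wass, pen = wgan_gp_terms(w, real, fake)
print(wass, pen)
```

The penalty vanishes exactly when the critic's gradient norm is one, which is the 1-Lipschitz behavior the EM-distance formulation requires.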
The ℓ1 loss is formulated as follows:

  L_ℓ1 = (1 / (H·W·D)) ‖y − G(x)‖₁  (3)

where H, W, and D stand for the height, width, and depth of a 3D image patch, respectively, y denotes a gold-standard image (NDCT), and G(x) represents a denoised image from an LDCT image x.
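Eq. (3) amounts to the mean absolute error over a patch; a minimal sketch:

```python
import numpy as np

def l1_loss(gold, denoised):
    # Eq. (3): mean absolute error over an H x W x D patch,
    # i.e. ||y - G(x)||_1 / (H * W * D).
    h, w, d = gold.shape
    return float(np.abs(gold - denoised).sum() / (h * w * d))

gold = np.ones((4, 4, 2))
denoised = np.full((4, 4, 2), 0.75)
print(l1_loss(gold, denoised))  # every voxel differs by 0.25 -> 0.25
```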
Besides, there are two aspects in the sparse representation step for image denoising: the prior information level and the sparsity level. We first introduce the adversarial loss to capture local anatomical information, and then use the ℓ1 loss to improve the sparsity of our representation, leading to an overall objective that combines the adversarial and ℓ1 losses.
3 Experimental Results
To evaluate the effectiveness of the proposed method, we compared it with existing state-of-the-art denoising methods, including the CNN-L1 net [13] and the WGAN-based CNN [7]. Note that all parameters of these benchmark methods were set as suggested in the original papers. For brevity, we denote our deep CNN with skip connections and Network in Network as DCSCN, and the model using a Wasserstein generative adversarial network as DCSWGAN.
The experimental setup is as follows. First, to minimize the generalization error, we adopted leave-one-out cross-validation to refine the denoising performance. Then, in the training phase, pairs of image patches from 7 patients were randomly selected. For validation, pairs of image patches of the same size were extracted from 3 other patients. It is worth noting that the extracted patches were made large enough to include regions of liver lesions. Next, to preserve the integrity of the data, we scaled the CT Hounsfield unit (HU) values to the unit interval [0, 1] before the images were fed to the network. Finally, we used three common image quality metrics to evaluate the denoised image quality: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [18], and root-mean-square error (RMSE).
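The preprocessing and evaluation steps can be sketched as follows; the HU window used for scaling is an illustrative assumption (not the paper's exact setting), and SSIM is omitted for brevity:

```python
import numpy as np

def scale_hu(img, hu_min=-1024.0, hu_max=3071.0):
    # Map Hounsfield units to [0, 1]; the window bounds here are an
    # illustrative assumption, not the paper's stated values.
    return np.clip((img - hu_min) / (hu_max - hu_min), 0.0, 1.0)

def rmse(ref, img):
    # Root-mean-square error between a reference and a test image.
    return float(np.sqrt(np.mean((ref - img) ** 2)))

def psnr(ref, img, data_range=1.0):
    # PSNR for images already scaled to [0, 1].
    return float(20.0 * np.log10(data_range / rmse(ref, img)))

rng = np.random.default_rng(3)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(f"RMSE={rmse(ref, noisy):.4f}  PSNR={psnr(ref, noisy):.2f} dB")
```

Lower RMSE and higher PSNR indicate a denoised image closer to the gold standard, matching the direction of the comparisons in Table 1.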
Table 1. Quantitative results for the images in Fig. 2 and Fig. 4.

           |           Fig. 2          |           Fig. 4
           |  PSNR    SSIM    RMSE     |  PSNR    SSIM    RMSE
  LDCT     |  22.818  0.761   0.0723   |  21.558  0.659   0.0836
  CNN-L1   |  27.791  0.822   0.0408   |  26.794  0.738   0.0457
  WGAN     |  25.727  0.801   0.0517   |  24.655  0.711   0.0585
  DCSCN    |  28.016  0.883   0.0397   |  26.943  0.730   0.0530
  DCSWGAN  |  26.928  0.828   0.0449   |  25.721  0.808   0.0517
Visual inspection of our results indicates that the LDCT images in Figs. 2(b) and 4(b) contain strong background noise. Furthermore, we find that the CNN-L1 net has a strong noise suppression capability, but it still over-smooths some textural details in the ROIs in Fig. 3(c). This net achieved a high signal-to-noise ratio (SNR) but yielded lower contrast resolution. From the ROIs in Fig. 5(c), it is seen that there are still some blocky effects, marked by the blue arrow. Figs. 2(d) and 4(d) display the WGAN-processed LDCT images with improved structural identification. However, as shown in Figs. 3(d) and 5(d), the WGAN model also introduced strong image noise. In Figs. 2(e) and 4(e), the proposed DCSCN achieved noise reduction but also suffered from image blurring. As shown in Figs. 2(f) and 4(f), our proposed DCSWGAN network demonstrates the best performance in noise reduction and feature preservation among all the competing denoising methods. Figs. 3(f) and 5(f) illustrate that DCSWGAN not only effectively suppressed strong noise but also kept subtle textural information, outperforming the other denoising models; see the ROIs in Figs. 3 and 5 and/or zoom in for better visualization.
The PSNRs, SSIMs, and RMSEs are listed in Table 1. The noise-reduction metrics were significantly improved by our proposed method (DCSCN), demonstrating that with residual learning, streak artifacts and image noise can be largely removed, enhancing the image quality. In this pilot study, DCSCN achieved the best performance in terms of PSNR and SSIM and preserved anatomical features most faithfully. However, there still exist blurry effects, as shown in Figs. 3 and 5. DCSWGAN obtained the second best results in terms of SSIM. It is noted that DCSWGAN produced visually pleasant results with sharp edges.
4 Conclusion
In this work, we have proposed a CNN-based network with skip connections and Network in Network to capture structural information and suppress image noise. First, both local and global features are cascaded through skip connections before being passed to the reconstruction network. Then, multiple channels with different local receptive fields are introduced in the reconstruction network to optimize the reconstruction performance. Also, the Network in Network technique is applied to lower the computational complexity. Our results suggest that the proposed method could be generalized to various medical image denoising problems, but further efforts are needed for training, validation, testing, and optimization.
References
 [1] Amy Berrington de González, Mahadevappa Mahesh, Kwang-Pyo Kim, Mythreyi Bhargavan, Rebecca Lewis, Fred Mettler, and Charles Land, “Projected cancer risks from computed tomographic scans performed in the United States in 2007,” Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009.
 [2] David J Brenner and Eric J Hall, “Computed tomography — an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
 [3] Ge Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.

 [4] Ge Wang, Mannudeep Kalra, and Colin G Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
 [5] Hu Chen, Yi Zhang, Mannudeep K Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, and Ge Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
 [6] Jelmer M Wolterink, Tim Leiner, Max A Viergever, and Ivana Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
 [7] Qingsong Yang, Pingkun Yan, Yanbo Zhang, Hengyong Yu, Yongyi Shi, Xuanqin Mou, Mannudeep K Kalra, and Ge Wang, “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” arXiv preprint arXiv:1708.00961, 2017.
 [8] Eunhee Kang, Junhong Min, and Jong Chul Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” arXiv preprint arXiv:1610.09736, 2016.
 [9] Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Shenghong Ju, Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Punam K Saha, and Ge Wang, “CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE),” arXiv preprint arXiv:1808.04256, 2018.
 [10] Qing Lyu, Chenyu You, Hongming Shan, and Ge Wang, “Super-resolution MRI through deep learning,” arXiv preprint arXiv:1810.06776, 2018.

 [11] Hongming Shan, Yi Zhang, Qingsong Yang, Uwe Kruger, Mannudeep K Kalra, Ling Sun, Wenxiang Cong, and Ge Wang, “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, 2018.
 [12] Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” Int. Conf. Learn. Representations (ICLR), 2014.
 [13] Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, and Ge Wang, “Structure-sensitive multi-scale deep neural network for low-dose CT denoising,” IEEE Access, 2018.

 [14] Haichao Yu, Ding Liu, Honghui Shi, Hanchao Yu, Zhangyang Wang, Xinchao Wang, Brent Cross, Matthew Bramler, and Thomas S. Huang, “Computed tomography super-resolution using convolutional neural networks,” in Proc. IEEE Intl. Conf. Image Process., 2017, pp. 3944–3948.
 [15] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
 [16] Alec Radford, Luke Metz, and Soumith Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
 [17] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
 [18] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., 2003, vol. 2, pp. 1398–1402.