X-ray computed tomography is widely used for clinical screening, diagnosis, and intervention. However, the radiation dosage associated with CT examinations may potentially induce cancerous and genetic diseases. As a result, the well-known ALARA (as low as reasonably achievable) principle is universally accepted in practice, reducing unnecessary radiation exposure during medical CT imaging. One commonly used method is to lower the X-ray flux towards the X-ray detector array by adjusting the milliampere-seconds (mAs) and kVp settings for data acquisition. However, since CT imaging is a quantum integration process, an insufficient number of photons will introduce excessive statistical noise and significantly deteriorate image quality. Therefore, how to preserve image quality for clinical tasks at minimum radiation dose has been one of the major endeavors in the CT field over the past decade.
Deep learning (DL) has now been applied in almost all medical tomographic imaging areas, inspired by a large body of image processing results [3, 4]. In particular, several DL-based studies on image noise reduction have been performed [5, 6, 7, 8, 9, 10, 11]. Since CNN models learn high-level representations through multiple layers of feature abstraction from large sets of training images, they are expected to have better denoising capability than classic image-domain methods. In this paper, we aim to maintain anatomical and pathological information while suppressing image noise due to low radiation dose. Specifically, we develop a new ConvNet architecture for LDCT denoising. To progressively capture both local and global anatomical features, we design cascaded subnetworks that integrate complementary textural information. Moreover, by introducing residual learning at the image reconstruction stage, the network model is made to learn the residual between a bicubic interpolation image and the corresponding full-dose CT (FDCT) image, so that the denoising performance can be boosted. Finally, with parallelized CNNs (network in network), local patches within the receptive field are effectively analyzed. As far as the loss function is concerned, we introduce the $\ell_1$ norm instead of the $\ell_2$ distance to discourage blurring.
Let a vector $x \in \mathbb{R}^N$ represent a noisy LDCT image of $N$ pixels, and a vector $y \in \mathbb{R}^N$ its corresponding NDCT image. A DL-based network model with multiple processing layers is trained to process LDCT images according to a non-linear input-output mapping $f_\theta$, which is equivalent to solving the following optimization problem:
$$\hat{\theta} = \arg\min_{\theta} \; \mathbb{E}_{(x,y)}\big[L\big(f_\theta(x), y\big)\big],$$
where $L$ denotes the training loss.
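To make this formulation concrete, here is a minimal numeric sketch of fitting a mapping $f_\theta$ by gradient descent on an empirical loss. The linear per-pixel model, the symbol names, and the random stand-in data are illustrative assumptions, not the paper's network:

```python
import numpy as np

# Minimal sketch of the training objective: find parameters theta of a
# mapping f_theta that takes a noisy LDCT image x toward its NDCT
# counterpart y by minimizing an empirical loss (here, mean squared error).

rng = np.random.default_rng(0)
N = 64                                        # pixels per (flattened) image
y = rng.standard_normal((100, N))             # stand-in NDCT images
x = y + 0.1 * rng.standard_normal((100, N))   # stand-in noisy LDCT images

theta = np.zeros(N)                           # per-pixel scaling, a toy f_theta

def f(x, theta):
    return x * theta                          # f_theta(x): elementwise mapping

lr = 0.5
for _ in range(200):                          # gradient descent on empirical risk
    r = f(x, theta) - y                       # residual f_theta(x) - y
    grad = 2 * np.mean(r * x, axis=0)         # d/dtheta of mean squared error
    theta -= lr * grad

mse = np.mean((f(x, theta) - y) ** 2)         # loss after training
```

The fitted scaling recovers most of the signal, driving the loss well below its starting value of roughly 1.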
Our network consists of two components: the generative model and the discriminative model, as shown in Fig. 1. The feature extraction network comprises a sequence of Conv layers, each with its own number of filters. In the image reconstruction network, three channels are cascaded, and the reconstruction blocks consist of filters of different configurations. Because all the outputs from the feature extraction layers are densely connected, the final output after reconstruction is large; we therefore introduce a $1\times 1$ CNN (network in network) after the reconstruction network to reduce the input dimension and decrease the computational complexity. Instead of constructing a high-quality image by the network itself, we incorporate a residual learning strategy to capture high-frequency features that help improve the quality of low-dose CT images.
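The architectural ideas above (cascaded feature extraction with dense skip connections, a $1\times 1$ network-in-network layer for channel reduction, and residual learning) can be sketched as a toy PyTorch module. The layer and filter counts below are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Illustrative sketch of the described generator: cascaded Conv layers whose
# outputs are densely concatenated (skip connections), a 1x1 "network in
# network" conv to reduce the concatenated channel dimension, a small
# reconstruction stage, and residual learning (the network predicts the
# difference from its input).

class DenoiserSketch(nn.Module):
    def __init__(self, feats=3, ch=16):
        super().__init__()
        self.extract = nn.ModuleList(
            [nn.Conv2d(1 if i == 0 else ch, ch, 3, padding=1) for i in range(feats)]
        )
        self.nin = nn.Conv2d(feats * ch, ch, 1)        # 1x1 conv: channel reduction
        self.reconstruct = nn.Conv2d(ch, 1, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        skips, h = [], x
        for conv in self.extract:                      # cascaded feature extraction
            h = self.act(conv(h))
            skips.append(h)                            # keep every level's features
        dense = torch.cat(skips, dim=1)                # dense skip connections
        r = self.reconstruct(self.act(self.nin(dense)))
        return x + r                                   # residual learning

x = torch.randn(2, 1, 32, 32)
out = DenoiserSketch()(x)
```

Note the $1\times 1$ convolution: after concatenating three 16-channel feature maps, it collapses 48 channels back to 16 before reconstruction, which is where the computational savings come from.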
The covariance of pixel-level features significantly influences the denoising performance. Indeed, in our experiments the pure CNN-based model tends to produce blurry features. The generative adversarial network (GAN) is a promising approach to address this limitation, since a GAN models the data distribution by minimizing the discrepancy between the distribution of its generated outputs and the real data distribution. Hence, we force the denoised image to stay on the image manifold by matching the distribution of the synthesized outputs to that of real images. Even though GANs have been widely applied in image processing, they suffer from model divergence and are unstable to train. To regularize the GAN training process, we adopt the Earth Mover's distance (EM distance), instead of the original Jensen-Shannon (JS) divergence, in the objective function. Thus, the adversarial loss is formulated as:
$$\min_{G}\max_{D}\;\mathbb{E}_{x}\big[D(x)\big]-\mathbb{E}_{\tilde{x}}\big[D(G(\tilde{x}))\big]-\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}}D(\hat{x})\|_{2}-1\big)^{2}\Big],$$
where the first two terms are for the Wasserstein distance estimation, the third term penalizes the deviation of the gradient norm with respect to the input from one, $\hat{x}$ is uniformly sampled along straight lines between pairs of denoised and real images, and $\lambda$ is a regularization parameter.
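As a numeric illustration of these three terms, the following sketch evaluates the adversarial loss for a linear critic, whose gradient with respect to the input is known in closed form, so the gradient penalty needs no autograd. The critic, the stand-in data, and the $\lambda$ value are toy assumptions:

```python
import numpy as np

# Numeric illustration of the WGAN-GP adversarial loss. For a linear critic
# D(x) = w . x, the gradient with respect to the input is w everywhere, so
# the gradient penalty can be computed in closed form.

rng = np.random.default_rng(1)
n, d = 32, 8
real = rng.standard_normal((n, d))          # stand-in NDCT samples
fake = rng.standard_normal((n, d)) + 0.5    # stand-in denoised samples
w = rng.standard_normal(d)

def critic(x):
    return x @ w                            # D(x) = w . x

# x_hat: uniform samples along straight lines between real and fake pairs.
eps = rng.uniform(size=(n, 1))
x_hat = eps * real + (1 - eps) * fake

lam = 10.0
wasserstein = critic(fake).mean() - critic(real).mean()  # first two terms
grad_norm = np.linalg.norm(w)               # ||grad_x D(x_hat)||_2 equals ||w||
penalty = lam * (grad_norm - 1.0) ** 2      # penalize deviation from unit norm
loss = wasserstein + penalty
```

For a trained critic the penalty term keeps the gradient norm near one, which is what enforces the 1-Lipschitz constraint required by the Wasserstein estimation.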
Although the $\ell_1$ and $\ell_2$ losses are both mean-based loss functions, their effects differ in terms of denoising. Compared with the $\ell_2$ loss, the $\ell_1$ loss neither over-penalizes large differences nor tolerates small errors between the denoised images and the gold-standard. Thus, the $\ell_1$ loss alleviates some limitations of the $\ell_2$ loss. Additionally, the $\ell_1$ loss shares the merits of the $\ell_2$ loss, e.g., a fast convergence speed.
The $\ell_1$ loss is formulated as follows:
$$L_{\ell_1}=\frac{1}{HWD}\,\big\|y-G(x)\big\|_{1},$$
where $H$, $W$, $D$ stand for the height, width, and depth of a 3D image patch, respectively, $y$ denotes a gold-standard image (NDCT), and $G(x)$ represents a denoised image from an LDCT image $x$.
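A minimal sketch of this patch-wise $\ell_1$ loss, with random stand-in patch contents:

```python
import numpy as np

# L1 loss over a 3D patch of height H, width W, depth D: the mean absolute
# difference between the gold-standard NDCT patch y and the denoised
# output G(x). The patch contents here are random stand-ins.

rng = np.random.default_rng(2)
H, W, D = 4, 4, 3
y = rng.standard_normal((H, W, D))        # gold-standard (NDCT) patch
gx = y + 0.1                              # denoised patch, off by 0.1 everywhere

l1 = np.abs(y - gx).sum() / (H * W * D)   # (1 / HWD) * ||y - G(x)||_1
```

With a uniform error of 0.1 per voxel, the loss evaluates to exactly 0.1, unlike the $\ell_2$ loss, which would quadratically shrink such small errors.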
Besides, there are two aspects of the sparse representation step for image denoising: the prior information level and the sparsity level. We first introduce the adversarial loss to capture local anatomical information. Then, we use the $\ell_1$ loss to improve the sparsity of our representation, leading to the following optimization problem:
$$\min_{G}\max_{D}\;L_{\mathrm{adv}}(G,D)+\lambda_{1}L_{\ell_1}(G),$$
where $\lambda_{1}$ balances the adversarial and $\ell_1$ terms.
3 Experimental Results
To evaluate the effectiveness of the proposed method, we compared it with existing state-of-the-art denoising methods, including CNN-L1 ($\ell_1$-net) and the WGAN-based CNN. Note that all the parameters of these benchmark methods were set to those suggested in the original papers. For brevity, we denote our deep CNN with skip connections and network in network as DCSCN, and the model using a Wasserstein generative adversarial network as DCSWGAN.
The experiment set-up is as follows. First, to minimize the generalization error, we adopted leave-one-out cross-validation to refine the denoising performance. Then, in the training phase, pairs of image patches from 7 patients were randomly selected. For validation, pairs of image patches of the same size were extracted from 3 other patients. It is worth noting that the extracted patches were made large enough to include regions of liver lesions. Next, to preserve the integrity of the data, we scaled the CT Hounsfield units (HU) to the unit interval [0, 1] before the images were fed to the network. Finally, we used three common image quality metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root-mean-square error (RMSE) to evaluate the denoised image quality.
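Two of the reported metrics can be computed directly for images scaled to [0, 1] as described; SSIM requires a windowed statistics computation and is omitted from this sketch:

```python
import numpy as np

# Sketch of PSNR and RMSE for images on [0, 1]. After HU scaling to the unit
# interval, the data range (MAX) in the PSNR formula is 1.

def rmse(ref, img):
    return np.sqrt(np.mean((ref - img) ** 2))

def psnr(ref, img, data_range=1.0):
    # PSNR = 20 log10(MAX) - 10 log10(MSE)
    return 20 * np.log10(data_range) - 10 * np.log10(np.mean((ref - img) ** 2))

ref = np.zeros((8, 8))
img = np.full((8, 8), 0.1)   # constant error of 0.1 against the reference
```

For a constant error of 0.1 the RMSE is 0.1 and the PSNR is 20 dB, a useful sanity check when wiring up an evaluation pipeline.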
Fig. 2 and Fig. 4.
The visual inspection of our results indicates that the LDCT images in Figs. 2(b) and 4(b) have strong background noise. Furthermore, we find that the $\ell_1$-net has a strong noise suppression capability, but it still over-smooths some textural details in the ROIs in Fig. 3(c). The $\ell_1$-net achieved a high signal-to-noise ratio (SNR), but it yielded lower contrast resolution. From the ROIs in Fig. 5(c), it is seen that there are still some blocky effects, marked by the blue arrow. Figs. 2(d) and 4(d) display the WGAN-processed denoised LDCT images with improved structural identification. However, as shown in Figs. 3(d) and 5(d), the WGAN model also introduced strong image noise. In Figs. 2(e) and 4(e), the proposed DCSCN achieved noise reduction but also suffered from image blurring. As shown in Figs. 2(f) and 4(f), our proposed DCSWGAN network model demonstrates the best performance in noise reduction and feature preservation as compared to all the competing denoising methods. Figs. 3(f) and 5(f) illustrate that DCSWGAN not only effectively suppressed strong noise but also kept subtle textural information, outperforming the other denoising models; see the ROIs in Figs. 3 and 5 and/or zoom in for better visualization.
The PSNRs, SSIMs, and RMSEs are listed in Table 1. For noise reduction, the performance metrics were significantly improved by our proposed method (DCSCN). This demonstrates that, using residual learning, streak artifacts and image noise can be largely removed, enhancing the image quality. In this pilot study, DCSCN achieved the best performance in terms of PSNR and SSIM, and preserved anatomical features the most faithfully. However, there still exist blurry effects, as shown in Figs. 3 and 5. DCSWGAN obtained the second best results in terms of SSIM. It is noted that our DCSWGAN produced visually pleasant results with sharp edges.
In this work, we have proposed a CNN-based network with skip connections and network in network to capture structural information and suppress image noise. First, both local and global features are cascaded through skip connections before being passed to the reconstruction network. Then, multiple channels with different local receptive fields are introduced in the reconstruction network to optimize the reconstruction performance. Also, the network-in-network technique is applied to lower the computational complexity. Our results suggest that the proposed method could be generalized to various medical image denoising problems, but further efforts are needed for training, validation, testing, and optimization.
-  Amy Berrington de González, Mahadevappa Mahesh, Kwang-Pyo Kim, Mythreyi Bhargavan, Rebecca Lewis, Fred Mettler, and Charles Land, “Projected cancer risks from computed tomographic scans performed in the United States in 2007,” Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009.
-  David J Brenner and Eric J Hall, “Computed tomography – an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
-  Ge Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
-  Ge Wang, Mannudeep Kalra, and Colin G Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
-  Hu Chen, Yi Zhang, Mannudeep K Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, and Ge Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
-  Jelmer M Wolterink, Tim Leiner, Max A Viergever, and Ivana Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
-  Qingsong Yang, Pingkun Yan, Yanbo Zhang, Hengyong Yu, Yongyi Shi, Xuanqin Mou, Mannudeep K Kalra, and Ge Wang, “Low dose CT image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” arXiv preprint arXiv:1708.00961, 2017.
-  Eunhee Kang, Junhong Min, and Jong Chul Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” arXiv preprint arXiv:1610.09736, 2016.
-  Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Shenghong Ju, Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Punam K Saha, and Ge Wang, “CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE),” arXiv preprint arXiv:1808.04256, 2018.
-  Qing Lyu, Chenyu You, Hongming Shan, and Ge Wang, “Super-resolution MRI through Deep Learning,” arXiv preprint arXiv:1810.06776, 2018.
-  Hongming Shan, Yi Zhang, Qingsong Yang, Uwe Kruger, Mannudeep K Kalra, Ling Sun, Wenxiang Cong, and Ge Wang, “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, 2018.
-  Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” Int. Conf. Learn. Representations. (ICLR), 2014.
-  Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, and Ge Wang, “Structure-sensitive multi-scale deep neural network for low-dose CT denoising,” IEEE Access, 2018.
-  Haichao Yu, Ding Liu, Honghui Shi, Hanchao Yu, Zhangyang Wang, Xinchao Wang, Brent Cross, Matthew Bramler, and Thomas S. Huang, “Computed tomography super-resolution using convolutional neural networks,” in Proc. IEEE Intl. Conf. Image Process., 2017, pp. 3944–3948.
-  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
-  Alec Radford, Luke Metz, and Soumith Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
-  Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
-  Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., 2003, vol. 2, pp. 1398–1402.