1 Introduction
Computed tomography (CT) has been widely used for clinical diagnosis. Meanwhile, concerns about radiation-related cancer risk from CT examinations are growing, especially for repeated CT scans Brenner2016MO . Therefore, decreasing the X-ray dose to reduce the risk to patients is highly desirable. However, without adequate treatment, a lower dose leads to severe noise-induced artifacts in the image reconstructed by filtered backprojection (FBP) Hsieh1998Adaptive Lu2001Noise Xu2009Electronic .
Many methods have been proposed to enhance the quality of low-dose CT (LdCT) images. These methods can be mainly divided into two strategies. One is to characterize the noise distribution or design a handcrafted prior based on the conventional maximum a posteriori probability (MAP) model Ma2012Iterative Ouyang2011Effects Wang2009Iterative Xie2017Robust . Although these MAP-based methods can yield high-quality LdCT sinograms to some extent, they have intrinsic limitations. First, their iterative solution process has a high computational cost and can be hundreds of times slower than deep learning (DL) methods at prediction time. Second, these methods process each sinogram separately, and thus cannot integrate all CT sinogram sources to extract the common latent knowledge underlying the desired CT sinograms.
The other is the DL approach, which learns the mapping from LdCT images to high-dose ones in an end-to-end manner Chen2017Low Chen2016Low Kang2017A Kenji2017Neural , and has obtained state-of-the-art performance on the task. However, this line of methods needs to pre-collect a large quantity of low-dose/high-dose CT image pairs as the training inputs/outputs of the network, where the labeled LdCT images are usually generated from the high-dose images via simulation methods. Due to privacy restrictions, collection costs and domain biases, it is often impractical to obtain as many training sample pairs as needed. Moreover, current DL-based CT image enhancement methods do not take good advantage of the abundant information in unlabeled CT datasets.
It has thus become a critical issue to make unsupervised LdCT images, without the guidance of corresponding high-dose ones, usable in deep network training. Unsupervised deep learning of this kind has been attracting increasing attention throughout machine learning Lehtinen2018Noise2Noise Lotter2016Deep Yin2018GeoNet and many related domains.
To address this issue, this work presents an unsupervised DL regime that directly involves unlabeled LdCT sinograms, without requiring their high-dose counterparts, in network training. Specifically, by fully exploring both the structural characteristics underlying a clean CT sinogram and the specific noise configuration of a measured LdCT sinogram, we can use a MAP objective to represent both kinds of knowledge through elaborately designed regularization and likelihood terms, respectively. Such a MAP model provides an effective gradient direction along which the input LdCT sinogram should move toward the expected clean one, and can thus be readily employed in network training. This unsupervised DL regime can also be easily combined with supervised CT sinogram pairs to further improve performance. The basic implementation mechanism of the method is illustrated in Fig. 1.
In summary, this paper mainly makes the following contributions:
(1) For the low-dose CT enhancement problem, this work first proposes a feasible unsupervised DL regime that is implemented directly on unsupervised LdCT sinograms, without needing supervised low-dose/high-dose CT sinogram pairs as inputs/outputs of the network. This method makes full use of unlabeled LdCT sinograms and allows the DL strategy to be applied more easily and generally in real scenarios with few high-dose data sources.
(2) We further extend our unsupervised DL method to a semi-supervised version. For supervised data, we construct the objective function in a data-driven manner according to the supervised information. For unsupervised data, we construct the objective function in a model-driven manner by fully considering the prior structure and noise configuration. Such an integration of supervised and unsupervised learning may also inspire more general semi-supervised DL paradigms for other tasks.
(3) We have verified the superiority of the proposed unsupervised/semi-supervised DL strategy on real LdCT sinograms, in terms of both computational speed and accuracy, compared with traditional methods.
2 Related Work
2.1 Traditional low-dose CT enhancement approaches
Traditional methods can be mainly categorized into two classes: sinogram statistical iterative methods, which use only information in the sinogram domain, and model-based iterative reconstruction (MBIR) methods, which combine information from the sinogram domain and the CT image domain.
The penalized weighted least-squares (PWLS) method is representative of the first class. One typical PWLS method was proposed by Wang et al. Wang2009Iterative , who modeled the noise distribution accurately and imposed a proper regularization to reduce sinogram noise. Moreover, Xie et al. Xie2017Robust proposed a method that makes full use of both the statistical properties of the projection data and prior structural knowledge in the sinogram domain for CT denoising, and reached the state of the art among methods of this kind. Comparatively, MBIR methods offer the potential to reconstruct CT images with better bias-variance performance by using prior information from the CT image domain. Some works explored different priors in the recovery model, such as total variation (TV) and its variants Bouman1993A Tian2011Low Zhu2010Duality , dictionary learning Xu2014Dictionary and nonlocal means Chen2009Bayesian .
Though some of these methods show satisfactory results on certain LdCT images, they can only be applied to each CT image separately and do not yield a deterministic prediction function that directly takes LdCT images as input and outputs the expected clean ones. This makes them very time-consuming in real scenarios. Besides, such methods use only one CT image to explore its latent ground truth, and cannot integrate more CT images to summarize their common statistical knowledge and use it for further LdCT enhancement. DL techniques have thus attracted more attention recently by ameliorating these issues.
2.2 Deep learning approaches
Currently, data-based DL approaches have achieved inspiring results on this problem. For example, Chen et al. Chen2016Low first introduced CNNs to the CT image denoising task. To extract features more efficiently, they further used an encoder-decoder network instead Chen2017Low . After that, Yang et al. Yang2017CT introduced the perceptual loss to the CT enhancement task, which measures the difference between low-dose/high-dose CT image pairs in a high-level feature space to make them look more similar. Further, in order to preserve detail information as well, Yi et al. Yi2018Sharpness combined a GAN with a low-level feature space measurement network, named the sharpness detection network, to reduce blur.
Though DL methods offer strong performance and fast processing speed, they rely heavily on pre-collected LdCTs and the corresponding high-dose ones as their ground truth. Such supervised samples, however, are very hard to obtain and costly in human labor and collection time. Besides, acquiring high-dose CT images exposes patients to additional radiation risk. An unsupervised DL paradigm, which trains the network on LdCTs alone without their high-dose counterparts as guidance, is thus highly desirable.
3 Unsupervised/Semi-supervised Deep Learning for LdCT Enhancement
3.1 Data-based supervised deep learning
Let the LdCT sinogram be the network input and the corresponding high-dose sinogram be its target. The CNN defines a mapping from the input sinogram to a predicted output, and this mapping is determined by the parameters of the CNN. In a DL network for CT image enhancement, the training dataset is a set of input-target pairs. A commonly used strategy is to minimize the following mean square error (MSE) to tune the network parameters:
(1) 
The supervised DL model can achieve a good trade-off between image quality and computational efficiency given adequate supervised CT sinogram pairs. However, the number of supervised sinograms affects the network's capability, and the ground-truth sinograms are sometimes difficult to obtain. In contrast, unlabeled LdCT sinograms can be collected relatively easily. Therefore, the unlabeled LdCT dataset is expected to be used in network training as well, to further improve enhancement performance.
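For concreteness, an MSE objective of the kind referred to in (1) can be written as below. The notation is assumed purely for illustration: $x_i$ is an LdCT sinogram, $y_i$ its high-dose counterpart, $f(\cdot;\theta)$ the CNN mapping with parameters $\theta$, and $N$ the number of training pairs.

```latex
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \bigl\| f(x_i; \theta) - y_i \bigr\|_2^2
```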
3.2 Model-based unsupervised deep learning
We first briefly introduce the basic generation process of a CT image. The process starts from the number of unattenuated photons (the X-ray fluence). The number of attenuated photons arriving at the detector is the measured quantity and usually follows a combined Poisson-Gaussian noise distribution. Specifically, based on Xie2017Robust , one intermediate quantity represents the attenuated photons with additive electronic noise only, ignoring in theory the quantum fluctuation of the X-ray interactions, and another denotes the attenuated photons without any noise, which can be considered the desired projection data. The CT sinogram is generated from the photon counts by a logarithmic transformation. The final CT image is obtained by FBP. The overall generation process is illustrated in Fig. 2.
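The generation process described above can be simulated on toy data. The following is a minimal sketch, not the simulation method used in the experiments: the function name, the scalar fluence, the zero-mean electronic noise and the clipping threshold are all assumptions made for illustration.

```python
import numpy as np

def simulate_ldct_sinogram(line_integrals, fluence, sigma_e, rng):
    """Simulate a noisy log-transformed sinogram from ideal line integrals.

    line_integrals : ideal (noise-free) sinogram of attenuation line integrals
    fluence        : number of unattenuated photons per detector bin (assumed scalar)
    sigma_e        : standard deviation of the electronic noise (assumed zero-mean)
    rng            : numpy random Generator
    """
    # Noise-free attenuated photon counts (Beer-Lambert law).
    ideal_counts = fluence * np.exp(-line_integrals)
    # Quantum noise: photon counts reaching the detector are Poisson distributed.
    quanta = rng.poisson(ideal_counts).astype(np.float64)
    # Electronic noise: additive Gaussian background.
    measured = quanta + rng.normal(0.0, sigma_e, size=quanta.shape)
    # Guard against non-positive counts before taking the logarithm (assumed threshold).
    measured = np.maximum(measured, 1.0)
    # Logarithmic transformation gives the measured sinogram.
    return np.log(fluence / measured)

rng = np.random.default_rng(0)
ideal = np.full((4, 6), 2.0)  # toy sinogram with a constant line integral
noisy_low = simulate_ldct_sinogram(ideal, fluence=1e3, sigma_e=5.0, rng=rng)
noisy_high = simulate_ldct_sinogram(ideal, fluence=1e6, sigma_e=5.0, rng=rng)
```

Lowering `fluence` plays the role of lowering the dose: the same ideal sinogram is recovered much less accurately from the low-fluence measurement than from the high-fluence one.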
We then introduce how to express the statistical properties (which give rise to the noise) of an LdCT sinogram, as well as the structure expected in its recovery. This knowledge is then used to obtain a proper gradient direction to feed into the network, steering the input LdCT sinogram toward the expected clean one while removing the unwanted noise.
The projection data are generally mixed with noise and can be expressed as follows:
(2) 
where the additive term denotes the electronic noise. For a measured projection, the first term follows X-ray photon statistics and the second term produces the electronic noise background Riviere2004Reduction Xu2009Electronic . The electronic noise can be described by a simple Gaussian distribution Ma2012Iterative , characterized by the variance of the noise. Based on Ma2012Iterative we can obtain that
(3)
The received quanta can be well described by the compound Poisson distribution La2006Penalized Xie2017Robust as follows:
(4)
where denotes along the projection path . The log-transformations of the measured and noise-free photon counts correspond to the input LdCT sinogram and the ideal output of the network, respectively, each with one element per measurement in the scan. By combining (3) and (4), the generative model of the projection data can be obtained as in Xie2017Robust :
(5) 
Note that (3) and (4) account for the electronic noise background and the statistical properties of the photon counts, respectively. This understanding of the statistical properties of the LdCT sinogram constitutes the likelihood term of the MAP model to be optimized. In addition, useful prior knowledge about the ideal recovery can be used to complement the model.
Based on Xie2017Robust , the sinogram data form a manifold approximately composed of several flat surfaces. This flats-combination prior can be introduced to describe the properties of the sinogram, i.e., sparsity in its second-order derivative, which can be formulated as:
(6) 
where the prior contains a constant parameter and the second-order difference matrix. This class of distributions can encode the transformed sparsity of data.
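The role of the second-order difference matrix can be illustrated on a 1-D profile. The sketch below is an illustration only: the helper names are hypothetical, and the L1 norm is an assumed way of measuring sparsity, which may differ from the exact form of (6).

```python
import numpy as np

def second_order_difference_matrix(n):
    """Return the (n-2) x n matrix whose rows apply the stencil [1, -2, 1]."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def flatness_penalty(signal):
    """L1 norm of the second-order differences of a 1-D signal (assumed sparsity measure)."""
    D = second_order_difference_matrix(signal.size)
    return np.abs(D @ signal).sum()

# A piecewise-linear profile has non-zero second differences only at its kink.
t = np.arange(10, dtype=np.float64)
piecewise_linear = np.where(t < 5, 2.0 * t, 10.0 - (t - 5))
# A jagged perturbation makes every second difference non-zero.
jagged = piecewise_linear + np.where(t % 2 == 0, 0.5, -0.5)

clean_penalty = flatness_penalty(piecewise_linear)
jagged_penalty = flatness_penalty(jagged)
```

A profile made of a few flat (linear) pieces yields a sparse vector of second differences and hence a small penalty, whereas oscillating noise activates every entry and is penalized much more strongly.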
By combining (5) and (6), the complete posterior distribution on data can be formulated as follows:
(7) 
The ideal sinogram can be estimated under the MAP framework Fessler2000Statistical La2006Penalized Ma2012Iterative . The network can then be trained under the guidance of the following loss term in an unsupervised DL manner:
(8)
3.3 Semi-supervised deep learning
We can then naturally construct a semi-supervised DL scheme that fully utilizes both supervised and unsupervised LdCT data sources by combining the two types of models above. The corresponding loss function can be expressed as follows:
(9) 
where the two data sets in (9) are the sets of unsupervised and supervised sinogram data, respectively. The trade-off parameter balances the loss functions of the supervised and unsupervised learning components. Its value can be set according to the proportions of the two kinds of data: the more supervised data are available, the larger it should be.
With this loss setting, the network can be trained on both supervised and purely unsupervised inputs. Note that when the trade-off parameter is set to zero, the model directly degenerates to the unsupervised one.
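A minimal sketch of how such a combined loss could be assembled is given below. It assumes that the unsupervised and supervised parts are simple sums over their data sets and that the trade-off parameter multiplies the supervised part, which is consistent with the remark that more supervised data warrants a larger value. The function and argument names are hypothetical, and the unsupervised terms are taken as already evaluated.

```python
import numpy as np

def mse_loss(pred, target):
    """Data-driven term for one supervised pair."""
    return np.mean((np.asarray(pred) - np.asarray(target)) ** 2)

def semi_supervised_loss(unsup_terms, sup_preds, sup_targets, trade_off):
    """Unsupervised terms plus trade_off times the supervised MSE terms.

    unsup_terms : model-driven loss values already evaluated on unlabeled sinograms
    sup_preds   : network outputs for the supervised inputs
    sup_targets : corresponding high-dose sinograms
    trade_off   : weight on the supervised part (assumed placement)
    """
    unsup = float(np.sum(unsup_terms))
    sup = sum(mse_loss(p, t) for p, t in zip(sup_preds, sup_targets))
    return unsup + trade_off * sup

# Toy values: two unlabeled samples and one labeled pair.
unsup_only = semi_supervised_loss([1.0, 2.0], [np.ones(3)], [np.zeros(3)], trade_off=0.0)
combined = semi_supervised_loss([1.0, 2.0], [np.ones(3)], [np.zeros(3)], trade_off=2.0)
```

With `trade_off=0.0` the supervised pairs contribute nothing and only the unsupervised terms remain, mirroring the degenerate case noted above.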
3.4 Alternating optimization algorithm for solving the model
We can readily employ an alternating optimization algorithm to solve (9). The optimization procedure can be summarized as follows:
With the other parameters fixed, can be updated by solving , that is:
(10) 
This problem can be further separated for each as:
(11) 
whose solution can be obtained by Algorithm 1.
With the other parameters fixed, the network parameters can be updated by solving the corresponding subproblem of (9), which is equivalent to the following problem:
(12) 
This corresponds to a standard network training problem and can be solved with any standard deep learning optimizer, such as Adam Kingma2014Adam , for network parameter tuning.
4 Experimental results
The performance of the two proposed methods, i.e., unsupervised learning (unsupCNN) and semi-supervised learning (semiCNN), is verified in this section. Comparison methods include PWLS Wang2009Iterative , MAPFC Xie2017Robust , and supervised CNN methods. PSNR, SSIM Wang2004Image and FSIM Zhang2011FSIM are used for performance evaluation. In addition, the running times of all algorithms are reported for speed comparison. The "2016 Low-dose CT Grand Challenge" datasets (https://www.aapm.org/GrandChallenge/LowDoseCT) were used in the experiments. The normal-dose CT data were acquired at 120 kVp and 200 effective mAs. LdCT data at three dose levels, 20 mAs, 12.5 mAs and 10 mAs, were generated via the simulation method of Zeng2015A . The high-dose data at 200 mAs are taken as the ground truth for comparison in the experiments. More details on the network settings can be found in the supplementary material.
4.1 On the effect of unsupervised DL method
To verify the effectiveness of the unsupervised CNN (unsupCNN) method, the supervised CNN (supCNN) method and the MAPFC method were run for comparison. For unsupCNN, we used only 50 LdCT sinograms as the training set. For supCNN, we used 50 low-dose/high-dose CT sinogram pairs as the training set.
Table 1: PSNR, SSIM, FSIM and running time of the competing methods.
Patient                          L096                          L096
Dose      Method      PSNR     SSIM    FSIM      PSNR     SSIM    FSIM      Time
10 mAs    FBP         29.2551  0.6157  0.8916    33.2117  0.7662  0.9274    -
          MAPFC       35.3356  0.8465  0.9461    38.9901  0.9224  0.9681    21.76
          supCNN      36.8963  0.9128  0.9658    39.1643  0.9494  0.9738    0.083
          unsupCNN    35.5475  0.9128  0.9629    40.1458  0.9471  0.9745    0.083
12.5 mAs  FBP         30.4716  0.6799  0.9178    34.5637  0.8111  0.9444    -
          MAPFC       36.5699  0.8786  0.9602    39.9781  0.9376  0.9762    21.76
          supCNN      37.6384  0.9244  0.9725    40.0542  0.9622  0.9823    0.083
          unsupCNN    36.1274  0.9338  0.9974    40.3582  0.9475  0.9803    0.083
20 mAs    FBP         33.6759  0.7887  0.9521    37.3508  0.8851  0.9682    -
          MAPFC       38.7972  0.9229  0.9771    41.7376  0.9593  0.9841    21.76
          supCNN      38.8859  0.9406  0.9812    42.1377  0.9593  0.9855    0.083
          unsupCNN    38.3038  0.9439  0.9813    41.3767  0.9593  0.9846    0.083
Fig. 3 shows the corresponding results of the different methods. MAPFC, supCNN and unsupCNN all suppress noise effectively. Because the bone regions contain abundant structural details, two regions indicated by the red boxes are selected to assess the improvement in image quality. The unsupCNN method preserves more details, with higher resolution in the magnified ROI, than the other competing methods. In addition, Table 1 lists the PSNR, SSIM and FSIM measurements and the running time of all competing methods. The two CNN-based methods are much faster than MAPFC (0.083 versus 21.76) and match or exceed it in SSIM and FSIM in all cases; in PSNR, supCNN exceeds MAPFC in all cases, while unsupCNN is slightly lower than MAPFC in three of the six settings. The unsupCNN method obtains performance similar to that of supCNN, although the latter requires extra supervised samples for training. These results substantiate that the proposed unsupCNN method can properly extract gradients to guide network training on purely unsupervised LdCT sinograms.
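For readers who wish to reproduce the kind of PSNR figures reported in the tables, the metric can be computed as sketched below. The choice of peak value (the dynamic range of the reference image) is an assumption; the exact convention used in the experiments is not stated here.

```python
import numpy as np

def psnr(reference, test, data_range=None):
    """Peak signal-to-noise ratio in dB between a reference image and a test image."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    if data_range is None:
        # Assumption: use the dynamic range of the reference as the peak value.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a constant offset of 0.1 on an image with unit dynamic range.
ref = np.array([[0.0, 1.0], [1.0, 0.0]])
value = psnr(ref, ref + 0.1)
```

Here the high-dose reconstruction would play the role of `reference` and the enhanced LdCT reconstruction the role of `test`; a larger value indicates a smaller mean squared error relative to the peak.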
4.2 On the effect of unsupervised DL method for single image
To evaluate the capability of the unsupervised network in an extreme case, only one LdCT sinogram was used for network training. Fig. 4 shows the CT images processed by the FBP, PWLS, MAPFC and unsupCNN methods. Visual inspection shows that PWLS and MAPFC suppress noise-induced artifacts well, but at the cost of resolution loss. The proposed unsupCNN method effectively reduces the noise-induced artifacts while preserving the resolution.
This experiment illustrates that the proposed unsupCNN can work well even when the training data are limited. This is because the network can also act as an optimizer that minimizes the target function. Note that in such scenarios, one advantage of the proposed method is that it yields an explicit prediction network, which can be used efficiently for further CT enhancement tasks.
4.3 On the effect of semi-supervised DL method
We used 20 low-dose/high-dose CT sinogram pairs (supervised samples) and 50 LdCT sinograms (unsupervised samples) as training data in the semi-supervised CNN experiment. For comparison, we used the same LdCT data as training data for the unsupervised CNN. Fig. 5 shows the LdCT results processed by the FBP, unsupCNN and semiCNN methods. Both of the proposed CNN-based methods remove noise-induced artifacts satisfactorily when compared with the high-dose (200 mAs) image. The semiCNN method suppresses noise-induced artifacts in the flat region better than the unsupCNN method. The zoomed bone regions (ROIs indicated by the red boxes) suggest that the proposed semiCNN method can reconstruct the fine structures with higher resolution than the unsupCNN method.
Table 2 lists the PSNR, SSIM and FSIM measurements for the results of the FBP, unsupCNN and semiCNN methods at three noise levels. Both CNN-based methods outperform the conventional FBP method. The semiCNN method, which jointly uses the supervised and unsupervised CT data sources, yields better reconstruction quality overall than the unsupCNN method.
Table 2: PSNR, SSIM and FSIM of the FBP, unsupCNN and semiCNN methods.
Patient                          L286                          L286
Dose      Method      PSNR     SSIM    FSIM      PSNR     SSIM    FSIM
10 mAs    FBP         29.9772  0.6426  0.8857    29.8674  0.6275  0.8757
          unsupCNN    36.6898  0.9053  0.9514    36.7192  0.9048  0.9515
          semiCNN     37.4342  0.9165  0.9604    37.6782  0.9569  0.9592
12.5 mAs  FBP         31.4157  0.7042  0.9137    31.2056  0.6875  0.9002
          unsupCNN    37.0445  0.9191  0.9649    36.9405  0.9224  0.9599
          semiCNN     38.2185  0.9128  0.9681    38.2185  0.9213  0.9637
20 mAs    FBP         34.3839  0.8088  0.9593    34.0601  0.7967  0.9401
          unsupCNN    38.4932  0.9384  0.9699    38.3164  0.9352  0.9378
          semiCNN     39.6660  0.9406  0.9776    39.5755  0.9457  0.9751
5 Conclusion
In this study, we have proposed a new mechanism that makes DL applicable to unsupervised training data, and have realized it for the task of LdCT enhancement. By understanding and formulating the statistical properties embedded in the data and the prior structures underlying the expected recovery, we construct a MAP model that provides an effective gradient direction to guide the unsupervised LdCT sinogram toward the expected clean one. Such gradients can be easily fed into the network for parameter tuning, so that deep learning can be implemented in an unsupervised manner. The proposed method constitutes a general paradigm for realizing unsupervised/semi-supervised DL in related tasks, such as signal recovery and image reconstruction.
References
(1) C. Bouman and K. Sauer. A generalized Gaussian image model for edge-preserving MAP estimation. IEEE Transactions on Image Processing, volume 2, 1993.
(2) D. Brenner. MO-FG-207A-02: What do we really know about cancer risks at doses pertinent to CT scans. Medical Physics, 43(6):3714, 2016.
(3) H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging, 36(12):2524–2535, 2017.
(4) H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang. Low-dose CT via deep neural network. Biomedical Optics Express, 8(2):679–694, 2016.
(5) Y. Chen, D. Gao, C. Nie, L. Luo, W. Chen, X. Yin, and Y. Lin. Bayesian statistical reconstruction for low-dose X-ray computed tomography using an adaptive-weighting nonlocal prior. Computerized Medical Imaging & Graphics, 33(7):495, 2009.
(6) J. Fessler. Statistical image reconstruction methods for transmission tomography. In Handbook of Medical Imaging, Volume 2. Medical Image Processing and Analysis, 2000.
(7) J. Hsieh. Adaptive streak artifact reduction in computed tomography resulting from excessive X-ray photon noise. Medical Physics, 25(11):2139–2147, 1998.
(8) E. Kang, J. Min, and J. Ye. A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Medical Physics, 44(10):e360–e375, 2017.
(9) D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014.
(10) P. J. La Rivière, J. Bian, and P. A. Vargas. Penalized-likelihood sinogram restoration for computed tomography. IEEE Transactions on Medical Imaging, 25(8):1022–1036, 2006.
(11) J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2Noise: Learning image restoration without clean data. arXiv, 2018.
(12) W. Lotter, G. Kreiman, and D. Cox. Deep predictive coding networks for video prediction and unsupervised learning. 2016.
(13) H. Lu, I. Hsiao, X. Li, and Z. Liang. Noise properties of low-dose CT projections and noise treatment by scale transformations. In IEEE Nuclear Science Symposium Conference Record, pages 1662–1666, 2001.
(14) J. Ma, H. Zhang, Y. Gao, J. Huang, Z. Liang, Q. Feng, and W. Chen. Iterative image reconstruction for cerebral perfusion CT using pre-contrast scan induced edge-preserving prior. Physics in Medicine & Biology, 57(22):7519–7542, 2012.
(15) L. Ouyang, T. Solberg, and J. Wang. Effects of the penalty on the penalized weighted least-squares image reconstruction for low-dose CBCT. Physics in Medicine & Biology, 56(17):5535–5552, 2011.
(16) P. Riviere and D. Billmire. Reduction of noise-induced streak artifacts in X-ray computed tomography through penalized-likelihood sinogram smoothing. In IEEE Transactions on Medical Imaging, volume 24, pages 105–111, 2005.
(17) K. Suzuki, J. Liu, A. Zarshenas, T. Higaki, W. Fukumoto, and K. Awai. Neural network convolution (NNC) for converting ultra-low-dose to virtual high-dose CT images. In International Workshop on Machine Learning in Medical Imaging, pages 334–343, 2017.
(18) Z. Tian, X. Jia, K. Yuan, T. Pan, and S. Jiang. Low-dose CT reconstruction via edge-preserving total variation regularization. Physics in Medicine & Biology, 56(18):5949–5967, 2011.
(19) J. Wang, T. Li, and L. Xing. Iterative image reconstruction for CBCT using edge-preserving prior. Medical Physics, 36(1):252–260, 2009.
(20) Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
(21) Q. Xie, D. Zeng, Q. Zhao, D. Meng, Z. Xu, Z. Liang, and J. Ma. Robust low-dose CT sinogram preprocessing via exploiting noise-generating mechanism. IEEE Transactions on Medical Imaging, 36(12):2487–2498, 2017.
(22) J. Xu and B. Tsui. Electronic noise modeling in statistical iterative reconstruction. IEEE Transactions on Image Processing, 18(6):1228–1238, 2009.
(23) Q. Xu, H. Yu, G. Wang, and X. Mou. Low-dose X-ray CT reconstruction via dictionary learning, volume 31. 2012.
(24) Q. Yang, P. Yan, M. Kalra, and G. Wang. CT image denoising with perceptive deep neural networks. arXiv, 2017.
(25) X. Yi and P. Babyn. Sharpness-aware low-dose CT denoising using conditional generative adversarial network. Journal of Digital Imaging, pages 1–15, 2018.
(26) Z. Yin and J. Shi. GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. arXiv, 2018.
(27) D. Zeng, J. Huang, Z. Bian, S. Niu, H. Zhang, Q. Feng, Z. Liang, and J. Ma. A simple low-dose X-ray CT simulation from high-dose scan. IEEE Transactions on Nuclear Science, 62(5):2226–2233, 2015.
(28) L. Zhang, L. Zhang, X. Mou, and D. Zhang. FSIM: a feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8):2378–2386, 2011.
(29) M. Zhu, S. Wright, and T. Chan. Duality-based algorithms for total-variation-regularized image restoration. Computational Optimization & Applications, 47(3):377–400, 2010.