
3D U-NetR: Low Dose Computed Tomography Reconstruction via Deep Learning and 3 Dimensional Convolutions

by Doga Gunduzalp, et al.

In this paper, we introduce a novel deep learning based reconstruction technique that exploits the correlations in all three dimensions by taking into account the correlation between adjacent 2D low-dose CT slices. Sparse or noisy sinograms are back projected to the image domain with the filtered back projection (FBP) operation, and a denoising step is then applied with a U-Net-like three-dimensional network called 3D U-NetR. The proposed network is trained with synthetic and real chest CT images, and a 2D U-Net is trained on the same datasets to demonstrate the importance of the third dimension. The proposed network shows better quantitative performance in terms of SSIM and PSNR. More importantly, 3D U-NetR captures medically critical visual details that the 2D network cannot recover.



1 Introduction

X-ray Computed Tomography (CT) has played a vital role in medicine since its invention in the 20th century. Unlike planar X-ray radiography, CT produces volumetric images that are reconstructed from many 2D projections and allow soft tissue to be visualized. CT is widely used in the diagnosis of serious illnesses such as cancer, pneumonia and COVID-19.

CT measurements are modeled by the Radon transform of the object's X-ray attenuation. The most traditional reconstruction technique is Filtered Back Projection (FBP), which is based on the inverse Radon transform [20] and gives satisfactory results when the signal-to-noise ratio and the number of projections are sufficient. However, CT has an inherent drawback: ionizing radiation, which can cause cancer. To reduce the radiation dose of CT imaging, either the number of projections or the tube current is decreased, which makes image reconstruction an ill-posed problem.

Iterative techniques have been suggested to solve ill-posed problems and reconstruct higher-quality images [2, 8]. Building on their success, iterative methods have been combined with regularization, giving regularized iterative methods, which impose prior knowledge on the problem during image reconstruction. The traditional prior for CT image reconstruction is total variation (TV) [21], and the deep image prior is a more recent approach [29, 12]. There are also studies that apply regularized iterative models in the sinogram domain to improve image quality [25].

In addition, deep learning (DL) models have become a popular solution for inverse imaging as well as for many other problems such as classification [11], segmentation [22, 6] and reconstruction [10, 31, 9, 4, 27, 1, 5]. As with regularized iterative models, there are deep learning networks that operate in both the sinogram and the image domain [31, 9, 1].

AUTOMAP is a neural network that learns the mapping between the projection and spatial domains as a data-driven supervised learning task. AUTOMAP is mainly applied to MRI reconstruction, but it is suggested that it can work for many domain transformations such as CT, PET and ultrasound [31]. Another model that maps from the sinogram to the image domain is iRadonMAP, which implements the theoretical inverse Radon transform as a deep learning model and achieves improvements in both the sinogram and spatial domains alongside the transformation between them [9]. However, networks with a fully learned structure need large datasets to give satisfactory results; with insufficient data, they perform worse than FBP and iterative methods [3]. Another network that operates in both the projection and spatial domains and gives promising results is Learned Primal-Dual (PD). Unlike fully learned networks, Learned PD alternates between the sinogram and spatial domains many times during reconstruction [1].

Recently, networks operating only in the spatial domain have emerged with the widespread use of autoencoders in medical imaging. An autoencoder first maps the input to a hidden representation and then maps this hidden representation back to a reconstruction. Denoising autoencoders take a stochastic approach: they are trained to reconstruct the clean input from a randomly corrupted version of it. Residual connections are widely used in denoising and reconstruction networks to model the difference between the input image and the ground truth; they also help prevent overfitting and speed up learning. The residual encoder-decoder convolutional neural network (RED-CNN), which combines autoencoders and residual connections, has been proposed for low-dose CT reconstruction [4].

Network models designed for one imaging problem can be used for another. The U-Net was originally created for image segmentation [19], but it is also used for inverse imaging problems [3, 10]. In U-Net type networks, the image size is halved at each level while the number of extracted feature maps is doubled. The FBP-ConvNet model, which enhances images obtained with FBP, has expanded the reach of deep learning models in medical imaging. It uses a U-Net-like network with a residual connection from the input to the output, so that the U-Net models the artifacts created by sparse-view FBP [10].

Artifacts caused by a low number of projections or a low tube current have been greatly reduced by the 2D networks mentioned above. However, in some cases small details are lost in the sinograms because of the low dose, and it is impossible to reconstruct the missing parts from a single sinogram. Since CT produces 3D images consisting of multiple 2D slices, correlations along the third dimension exist between slices in addition to the 2D correlations within each slice. For this reason, extracting features from adjacent slices is very effective for capturing and enhancing fine details. Liu et al. noted the importance of the third dimension and applied a 1D convolution on top of 2D convolutions for the segmentation of digital breast tomosynthesis and CT volumes [15].

3D CNNs have also become feasible as hardware has improved; they can detect surface features in volumes, whereas 2D networks detect 2D edges and correlations in the 2D spatial domain. Çiçek et al. proposed 3D convolutions for volumetric segmentation [6]. Similar to RED-CNN, which works in 2D, Huidong et al. have taken into account the relationships between 2D CT slices with a network built from 3D encoder-decoder structures [27]. In this paper, we propose a 3D U-Net-like network called 3D U-NetR for low-dose CT image reconstruction, which exploits the correlation in all three dimensions using 3D convolutions and surface features. The proposed 3D U-NetR architecture is tested on both synthetic and real chest CT data, with experimental setups for recovering low-dose images created with either sparse views or a low tube current.

The rest of the article is organized as follows. Section 2 introduces the problems encountered in low-dose CT imaging and presents a mathematical formulation. Section 3 describes the proposed deep learning model. Section 4 details the datasets and the training setup. Section 5 presents the experimental results and compares 3D U-NetR with FBP and a 2D U-Net. Finally, Section 6 discusses the results and concludes the paper.

2 General Problems in Low Dose Computed Tomography

CT reconstruction can be expressed in linear algebra form as an inverse problem:

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n} \qquad (1)$$

where $\mathbf{A}$ represents the forward operator, $\mathbf{x}$ is the vector form of the ground-truth CT image and $\mathbf{y}$ is the vector form of the sinogram. In addition, $\mathbf{n}$ represents the noise in the system [23].

To obtain CT images with few projections, the number of measurements $m$ (the length of $\mathbf{y}$) is reduced. The forward operator $\mathbf{A}$ then becomes a fat matrix, with fewer rows than image pixels, and the sparse-view CT inverse problem arises from this non-invertible forward operator. The projections in sparse-view CT have a high signal-to-noise ratio (SNR), but the low number of projections makes the inversion ill-posed. Another way to reduce the dose is to decrease the signal power by reducing the tube current and peak voltage while keeping the number of projections constant. A decrease in signal power, in other words a lower SNR, is modeled mathematically by increasing the magnitude of $\mathbf{n}$ in (1), which yields noisy sinograms. Although a sufficient number of observations is obtained, each observation has a low SNR.
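To make the two dose-reduction regimes concrete, the toy sketch below uses a small random Gaussian matrix as a stand-in for the forward operator. This is an assumption for illustration only: a real Radon operator is sparse and structured, and the sizes are arbitrary. With fewer measurements than pixels, any null-space component of A is invisible in y, so two different images produce the same sinogram; the low-current case keeps A but enlarges the noise term n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): N image pixels, m < N measurements -> "fat" operator
n_pixels, n_measurements = 64, 30
A = rng.standard_normal((n_measurements, n_pixels))  # stand-in for the Radon operator
x = rng.random(n_pixels)                              # stand-in for a vectorized image

# Sparse view: the rows of vt beyond rank(A) span the null space of A
_, _, vt = np.linalg.svd(A)
null_direction = vt[-1]
y_clean = A @ x
y_ambiguous = A @ (x + 5.0 * null_direction)  # a different image, same sinogram

# Low current: same A, but a larger noise term n in y = A x + n
noise_std = 0.5
y_noisy = y_clean + rng.normal(0.0, noise_std, size=n_measurements)
```

Because `y_clean` and `y_ambiguous` agree, no method that only looks at y can tell the two images apart; some prior (regularization or a learned model) is needed to choose between them.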

Although traditional methods use the inverse Radon transform to solve the low-dose CT problem, they first apply a filtering step. When the Radon transform is applied to a CT image, low frequencies are sampled much more densely than high frequencies, so simple back projection without filtering yields a blurred, low-frequency-dominated image. The filter compensates for this by reweighting the frequencies before back projection. The FBP method applies filters such as Ramp, Hann or Hamming to the sinogram before performing the inverse Radon transform [17].
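A minimal NumPy sketch of the filtering step follows. It assumes a ramp filter apodized with a Hann window and a frequency-scaling cutoff (the default of 0.8 matches the setting used for the synthetic data later in the paper). The window form and normalization follow a common convention but are assumptions; reconstruction libraries may normalize differently. Each projection (row of the sinogram) is zero-padded and filtered along the detector axis; the back projection itself is not shown.

```python
import numpy as np

def hann_ramp_filter(n_freq, frequency_scaling=0.8):
    """Ramp filter apodized by a Hann window, on an FFT frequency grid.

    Frequencies are normalized so that 1.0 is the Nyquist frequency;
    components above `frequency_scaling` are removed.
    """
    norm_freq = np.abs(np.fft.fftfreq(n_freq)) * 2.0
    filt = norm_freq * np.cos(norm_freq * np.pi / (2.0 * frequency_scaling)) ** 2
    filt[norm_freq > frequency_scaling] = 0.0
    return filt


def filter_sinogram(sinogram, frequency_scaling=0.8):
    """Filter every projection (row) of a (views, detectors) sinogram."""
    n_det = sinogram.shape[1]
    n_pad = int(2 ** np.ceil(np.log2(2 * n_det)))  # zero-padding limits wrap-around
    filt = hann_ramp_filter(n_pad, frequency_scaling)
    spectrum = np.fft.fft(sinogram, n=n_pad, axis=1)
    filtered = np.real(np.fft.ifft(spectrum * filt, axis=1))
    return filtered[:, :n_det]
```

The ramp term `norm_freq` suppresses the over-represented low frequencies (it is zero at DC), while the Hann factor rolls off the high frequencies, where noise dominates.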

Iterative and DL-based solutions simulate the measurements with (1) and minimize the error between the reconstruction and the ground-truth CT image. The optimization problem of image-to-image reconstructors can be defined as:

$$\hat{\theta} = \arg\min_{\theta} \left\| G_{\theta}(\tilde{\mathbf{x}}) - \mathbf{x} \right\| \qquad (2)$$

where $\tilde{\mathbf{x}}$ and $\mathbf{x}$ represent the sparse or noisy CT image and the ground-truth CT image, respectively. In addition, $\theta$ denotes the parameters of the model and $G_{\theta}$ denotes the iterative or DL-based solution [5]. The function $G_{\hat{\theta}}$, whose parameters minimize (2), is taken as the solution.
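As a toy illustration of (2), the sketch below evaluates the empirical L1 risk of a hypothetical one-parameter reconstructor G_θ(x) = θx over a grid of θ values and keeps the minimizer. The grid search, the model and the "intensity doubling" degradation are assumptions chosen so that the answer is known in advance (θ = 0.5); real networks are trained with gradient-based optimizers such as ADAM.

```python
import numpy as np

def empirical_l1_risk(model, theta, degraded_images, clean_images):
    """Mean L1 error between G_theta(degraded) and the ground truth over all pairs."""
    return np.mean([np.mean(np.abs(model(theta, xd) - xc))
                    for xd, xc in zip(degraded_images, clean_images)])


def scaled_model(theta, image):
    """Hypothetical one-parameter reconstructor G_theta(x) = theta * x."""
    return theta * image


rng = np.random.default_rng(1)
clean = [rng.random((8, 8)) for _ in range(5)]
degraded = [2.0 * c for c in clean]            # toy degradation: intensities doubled
thetas = np.linspace(0.0, 1.0, 101)
risks = [empirical_l1_risk(scaled_model, t, degraded, clean) for t in thetas]
theta_hat = thetas[int(np.argmin(risks))]      # minimizer of the toy objective
```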

3 Proposed Method

The success of 2D deep learning based solutions such as FBP-ConvNet [10], RED-CNN [4], Learned PD [1] and iRadonMAP [9] for inverse CT problems is well documented in the literature. However, in-slice detail losses resulting from sparse or noisy views cannot be detected and reconstructed by a 2D network. On the other hand, these details can be recaptured when the correlations between slices are taken into account, so the reconstruction can be optimized based on 3D surface features rather than 2D edge features. Based on this insight, we propose a deep learning based solution for inverse CT problems called 3D U-NetR, which uses 3D convolutions and the U-Net architecture. 3D U-NetR maps the initial FBP reconstructions of the sinograms to the ground-truth volumetric images. The proposed reconstruction process is not limited to CT images and can be applied to any 3D imaging modality.

First, the spatial-domain forms of the sparse or noisy sinograms are reconstructed with the inverse operator:

$$\tilde{\mathbf{X}} = \mathbf{A}^{\dagger}(\mathbf{Y}) \qquad (3)$$

where $\mathbf{Y}$ represents the sparse or noisy sinogram of the volume and $\tilde{\mathbf{X}}$ represents the volumetric low-dose CT image. In addition, $\mathbf{A}^{\dagger}$ is the (approximate) inverse operator, which in our case is the FBP operator. The low-dose CT images are then mapped to the ground-truth images with minimum error using the 3D U-NetR architecture:

$$\hat{\mathbf{X}} = G_{\theta}(\tilde{\mathbf{X}}) \qquad (4)$$

where $\hat{\mathbf{X}}$ represents the reconstructed volumetric CT image and $G_{\theta}$ is the trained neural network, i.e. 3D U-NetR. The working principle of the proposed reconstruction method is shown in Figure 1.

Figure 1: Proposed working scheme of 3D U-NetR

3.1 Network Architecture

Based on the success of the 2D FBP-ConvNet [10] architecture and of the 3D U-Net used for segmentation [6], a U-Net-like network is built with 3D CNNs. Figure 2 describes the 3D U-NetR architecture. The network is a modified U-Net with 4 depths and can be divided into analysis and synthesis parts. Each layer of the analysis part contains 2 blocks of 3×3×3 convolution, batch normalization and leaky ReLU. Layers at consecutive depths are connected by 2×2×2 max pooling with stride 2.
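The 2×2×2 max pooling with stride 2 that connects consecutive depths can be sketched in NumPy with a reshape: each non-overlapping 2×2×2 block is reduced to its maximum, halving every spatial dimension. This illustrates the operation only and is not the authors' PyTorch code; it assumes a single-channel volume with even dimensions.

```python
import numpy as np

def max_pool3d_2x(volume):
    """2x2x2 max pooling with stride 2 on a (D, H, W) volume with even sizes."""
    d, h, w = volume.shape
    if d % 2 or h % 2 or w % 2:
        raise ValueError("all dimensions must be even")
    # Axes 1, 3 and 5 index the position inside each 2x2x2 block
    blocks = volume.reshape(d // 2, 2, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3, 5))
```

A 128×128×128 patch therefore shrinks to 64³, 32³ and 16³ at the successive depths.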

Figure 2: Network Architecture of 3D U-NetR.

Starting from the deepest layer, the synthesis layers are connected by trilinear interpolation with a scale factor of 2, followed by 2 blocks of 3×3×3 convolution, batch normalization and leaky ReLU. Before the convolution blocks, the channels are concatenated with the feature maps from the skip connection of the corresponding analysis layer. The skip connections help mitigate the vanishing gradient problem and carry high-resolution features. Trilinear interpolation is chosen as a simple 3D upsampling method. Finally, a 1×1×1 convolution combines all channel outputs into a 1-channel image, and the result is added to the input through a shortcut connection.
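The final step can be sketched as follows: a 1×1×1 convolution is a per-voxel weighted sum over channels plus a bias, and the shortcut adds the network input back. The function name and the channel-first (C, D, H, W) layout are assumptions for illustration.

```python
import numpy as np

def output_head(features, weights, bias, fbp_input):
    """1x1x1 convolution to one channel followed by the input shortcut.

    features:  (C, D, H, W) feature maps of the last synthesis layer
    weights:   (C,) one weight per channel (a 1x1x1 kernel has no spatial extent)
    bias:      scalar bias of the 1x1x1 convolution
    fbp_input: (D, H, W) low-dose FBP volume fed to the network
    """
    residual = np.einsum("c,cdhw->dhw", weights, features) + bias
    return fbp_input + residual
```

With all weights and the bias at zero the head returns the FBP input unchanged, so the network only has to learn the correction (residual) on top of FBP.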

The 3D U-NetR architecture contains 5,909,459 parameters, approximately 3 times more than the 2D structure. Because of the memory limitations of 3D networks, the number of filters starts from 16 and doubles at each layer down to the deepest layer, where it reaches 256. In the synthesis part, the number of filters is halved at each layer from the deepest layer to the output. This is why the 3D network uses fewer filters than typical 2D networks. The low number of filters may cause the model to underfit, but this is offset by the depth of the model, which provides a larger receptive field, and by the large dataset.
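The per-layer cost behind these numbers can be checked with a little arithmetic. The sketch below counts the weights and biases of one convolution (assuming a bias term, which is often omitted when batch normalization follows) and lists the filter schedule described above. For equal channel counts, a 3×3×3 kernel has 27/9 = 3 times the weights of a 3×3 kernel; the sketch does not attempt to reproduce the reported total of 5,909,459 parameters.

```python
def conv_param_count(c_in, c_out, kernel=3, dims=3, bias=True):
    """Learnable parameters of one convolution: weights plus optional biases."""
    return c_in * c_out * kernel ** dims + (c_out if bias else 0)


# Analysis-path filter schedule: start at 16 and double at each depth
filters = [16 * 2 ** level for level in range(5)]  # [16, 32, 64, 128, 256]

# Weight ratio of a 3x3x3 kernel to a 3x3 kernel for the same channel counts
ratio = (conv_param_count(64, 128, dims=3, bias=False)
         / conv_param_count(64, 128, dims=2, bias=False))  # 27 / 9 = 3.0
```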

The skip connections contain 1 block of 1×1×1 convolution, batch normalization and leaky ReLU rather than an identity connection, so that the number of channels passed through each skip connection can be tuned. In addition, a shortcut connection links the input to the output, since the main purpose is to reduce the noise in the FBP images; through this shortcut, the network models the noise (the residual) rather than the full image.

4 Experimental Setup

4.1 Dataset Preparation

Two datasets are used to evaluate the proposed method. Because of the nature of the CT modality and the network architecture, 3D datasets are prepared rather than shuffled 2D CT slices. First, a synthetic dataset, a 3D version of the 2D ellipses dataset of the Deep Inversion Validation Library (DIVaL), is prepared [14]. Second, a chest dataset acquired by Mayo Clinic for the AAPM Low Dose CT Grand Challenge is used as the real dataset [16].

4.1.1 Synthetic Data

The 2D ellipses dataset, which contains randomly generated ellipses, is modified to create random ellipsoids in 3D space. In the 2D ellipses dataset, the number of ellipses in each image slice is drawn from a Poisson distribution with an expected value of 40 and capped at 70. For our 3D ellipsoid dataset, the number of ellipsoids in each volume is drawn from a Poisson distribution with an expected value of 114 and capped at 200. Each volume is then normalized by setting all negative values to zero and dividing by the maximum value of the volume. Finally, all volumes are masked with a cylindrical mask along the slice axis to resemble CT images.
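A minimal NumPy sketch of the volume generation described above: it draws the ellipsoid count from a Poisson distribution with mean 114 capped at 200, sums the ellipsoids, clips negative values, normalizes by the maximum and applies a cylindrical mask along the slice axis. The ellipsoids here are axis-aligned, and the ranges for centers, semi-axes and intensities are assumptions; the DIVaL ellipses generator uses its own parameter distributions, including rotations.

```python
import numpy as np

def random_ellipsoid_volume(size=128, mean_count=114, max_count=200, rng=None):
    """Sum of random axis-aligned ellipsoids, normalized and cylinder-masked.

    Axis 0 is the slice axis. Center, semi-axis and intensity ranges are
    illustrative assumptions, not the DIVaL settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    axis = np.linspace(-1.0, 1.0, size)
    z, y, x = np.meshgrid(axis, axis, axis, indexing="ij")
    volume = np.zeros((size, size, size))
    count = min(rng.poisson(mean_count), max_count)  # Poisson, capped
    for _ in range(count):
        cz, cy, cx = rng.uniform(-0.8, 0.8, size=3)
        az, ay, ax = rng.uniform(0.05, 0.4, size=3)
        inside = ((z - cz) / az) ** 2 + ((y - cy) / ay) ** 2 + ((x - cx) / ax) ** 2 <= 1.0
        volume[inside] += rng.uniform(-0.4, 1.0)  # intensities may be negative
    volume = np.clip(volume, 0.0, None)           # set negative values to zero
    if volume.max() > 0:
        volume /= volume.max()                    # divide by the volume maximum
    cylinder = y ** 2 + x ** 2 <= 1.0             # same disk on every slice
    return volume * cylinder
```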

A parallel-beam geometry with 60 views and 182 detectors is chosen as the projection geometry. A sparse-view sinogram of each slice of the volumetric image is obtained with the forward operator. In addition, additive white Gaussian noise (AWGN) is applied to the sinograms with 35 dB SNR. The sinograms are reconstructed slice by slice with a 2D FBP using a Hann filter with 0.8 frequency scaling. 2D FBP is chosen instead of a 3D FBP operation to prevent the reconstruction of extra ellipses caused by artifacts along the third dimension.
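Adding noise at a target SNR can be sketched as follows, assuming the SNR is defined as the ratio of mean signal power to noise power in decibels (the exact definition used in the paper's pipeline may differ). For 35 dB, the noise power is the signal power divided by 10^3.5.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that mean signal power / noise power = snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
```

Usage on a sinogram with the geometry above would be `add_awgn(sinogram, 35.0)` for a (60, 182) array.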

220 volumetric images are generated, each consisting of 128 slices of 128×128 pixels. The volumes are split into 192 training, 8 validation and 20 test volumes.

4.1.2 Real Chest CT Data

The real chest dataset from Mayo Clinic consists of full-dose and quarter-dose CT image pairs. The quarter-dose data are noisy, full-view images, since the number of angles is not reduced. Of the 50 patients in the dataset, 2 are excluded because their numbers of low-dose and full-dose images differ. Another 11 patients are excluded to decrease the variance of the pixel spacing values, since CNNs are not scale invariant and differing pixel spacings could adversely affect training. The slices are evenly spaced for every patient, with 1.5 mm slice thickness.

Originally, the selected patients have 334 ± 22 slices, but only the middle 256 slices are used, to focus on the middle part of the volume where the main medical information lies. As Leuschner et al. noted, real CT data contain circular reconstructions [13], and the data must be cropped to a square inside this circle to prevent value jumps. Accordingly, we use the central 384×384 pixels of each 512×512 slice. Consequently, a dataset of 37 patients with mm pixel spacing, 1.5 mm slice thickness and 384×384×256 voxels is prepared.

The 37 patients are split into 31 training, 3 validation and 3 test volumes. In addition, because of memory limitations, each volume is divided into 18 patches of 128×128×128 voxels.
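The cropping and patching arithmetic can be sketched in NumPy: a centered 384×384 crop of a 512×512 slice starts at offset (512 - 384)/2 = 64, and a 384×384×256 volume splits into (384/128)·(384/128)·(256/128) = 3·3·2 = 18 non-overlapping 128³ patches. The helper names and the (H, W, slices) axis order are assumptions.

```python
import numpy as np

def center_crop(slices, size):
    """Crop the central size x size region of every slice (axes 0 and 1)."""
    h, w = slices.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return slices[top:top + size, left:left + size]


def split_into_patches(volume, p):
    """Split an (H, W, D) volume into non-overlapping p x p x p patches."""
    h, w, d = volume.shape
    if h % p or w % p or d % p:
        raise ValueError("volume shape must be divisible by the patch size")
    blocks = volume.reshape(h // p, p, w // p, p, d // p, p)
    return blocks.transpose(0, 2, 4, 1, 3, 5).reshape(-1, p, p, p)


n_patches_full = (384 // 128) * (384 // 128) * (256 // 128)  # 18 patches per patient
```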

4.2 Training Strategy

The 3D U-NetR architecture is implemented with the PyTorch toolbox [18]. A Tesla T4 graphics processing unit (GPU) with 16 GB memory and a GeForce RTX 2080 Ti with 11 GB memory are used for training. The Tesla T4 is used for training on the Ellipses dataset because of its larger memory, and the GeForce RTX 2080 Ti is used for the Real Chest CT dataset because of its higher processing power. The L1 norm is used as the loss, since it performs better than the L2 norm in image denoising problems [30]. The error between the reconstructed image and the ground-truth image can be minimized with different algorithms; the ADAM optimizer is used in this work. In 3D U-NetR training, each sample in a batch is a 128×128×128 volume, and batch sizes larger than 4 exceed the memory of both GPUs. The batch size is therefore set to 4 for the Ellipses dataset and 3 for the Real Chest CT dataset. The learning rate is 0.001, and the decay rates for the running averages of the gradient and its square are kept at their defaults, 0.9 and 0.999. 3D U-NetR is trained for 745 epochs on the Ellipses images and for 850 epochs on the Real Chest CT images; training continued until the loss no longer decreased. Training on the Ellipses and Real Chest CT images took approximately 62 hours and 127.5 hours, respectively.
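For reference, the ADAM update with the hyperparameters above (learning rate 0.001, β1 = 0.9, β2 = 0.999) can be written out in NumPy, together with the subgradient of the mean L1 loss. This is a sketch for illustration, not the PyTorch implementation used for training; eps = 1e-8 is an assumption (it matches PyTorch's default). On the first step, bias correction makes each update approximately lr · sign(gradient).

```python
import numpy as np

class Adam:
    """Minimal ADAM optimizer for a NumPy parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running average of the gradient
        self.v = None  # running average of the squared gradient
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def l1_loss_grad(pred, target):
    """Subgradient of mean(|pred - target|) with respect to pred."""
    return np.sign(pred - target) / pred.size
```

A training step on a toy prediction would read `pred = opt.step(pred, l1_loss_grad(pred, target))`; in the real setting the gradient comes from backpropagation through the network.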

The 2D U-Net architecture is also trained on the same datasets to demonstrate the importance of the third dimension. PyTorch, the Tesla T4 and GeForce RTX 2080 Ti GPUs, the L1 loss and the ADAM optimizer are used as in 3D U-NetR training. The optimizer parameters, such as the learning rate and the gradient coefficients, are kept the same; only the batch size is changed. In 2D U-Net training, a batch consists of 128×128 slices taken from the volumetric images. The batch size is set to 3 for the Ellipses dataset and 2 for the Real Chest CT dataset. The 2D U-Net is trained for 763 epochs on Ellipses and 1500 epochs on Real Chest CT. Training on the Ellipses and Real Chest CT datasets took approximately 20.5 hours and 50 hours, respectively.

5 Results

Popular metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are evaluated to measure the quantitative performance of 3D U-NetR. The root mean squared error (RMSE) represents the L2 error and is used when calculating PSNR [28]. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - x_i\right)^2} \qquad (5)$$

where $\hat{\mathbf{x}}$ and $\mathbf{x}$ represent the vector forms of the reconstructed and ground-truth images, respectively, the subscript $i$ denotes each pixel and $N$ is the number of pixels. Similarly, PSNR can be defined as:

$$\mathrm{PSNR} = 20\log_{10}\left(\frac{\mathrm{MAX}}{\mathrm{RMSE}}\right) \qquad (6)$$

where $\mathrm{MAX}$ is the maximum value of the image, which is 255 for 8-bit images. Although PSNR is commonly used for image quality assessment, it only measures the cumulative pixel-wise error and does not capture how similar the images are. Therefore, SSIM is used as a second image quality metric to evaluate similarity in luminance, contrast and structure [26]:

$$\mathrm{SSIM}(\hat{\mathbf{x}}, \mathbf{x}) = \frac{(2\mu_{\hat{x}}\mu_{x} + c_1)(2\sigma_{\hat{x}x} + c_2)}{(\mu_{\hat{x}}^2 + \mu_{x}^2 + c_1)(\sigma_{\hat{x}}^2 + \sigma_{x}^2 + c_2)} \qquad (7)$$

where $\mu_{\hat{x}}$ and $\mu_{x}$ represent the means of the reconstructed and ground-truth images, respectively, $\sigma_{\hat{x}}^2$ and $\sigma_{x}^2$ their variances, and $\sigma_{\hat{x}x}$ their covariance. The constants $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ depend on the dynamic range $L$ of the image; with $k_1 = 0.01$ and $k_2 = 0.03$, $k_1 L = 2.55$ and $k_2 L = 7.65$ for 8-bit images.
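The RMSE, PSNR and SSIM definitions above translate directly into NumPy. Note that the SSIM below is the single-window (global) form with image-wide means, variances and covariance; the widely used SSIM of [26] evaluates the same expression over local windows and averages the results, and the text does not state which variant produced the tables. The 8-bit defaults (MAX = L = 255, k1 = 0.01, k2 = 0.03) are assumptions consistent with the constants given above.

```python
import numpy as np

def rmse(x_hat, x):
    """Root mean squared error over all pixels."""
    return np.sqrt(np.mean((x_hat - x) ** 2))


def psnr(x_hat, x, max_value=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    return 20.0 * np.log10(max_value / rmse(x_hat, x))


def global_ssim(x_hat, x, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM using image-wide statistics."""
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    mu_a, mu_b = x_hat.mean(), x.mean()
    var_a, var_b = x_hat.var(), x.var()
    cov = np.mean((x_hat - mu_a) * (x - mu_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```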

5.1 Synthetic Data Reconstruction Results

The performance of 3D U-NetR on the synthetic dataset is examined in this section. The images reconstructed with FBP, 2D U-Net and 3D U-NetR are shown in Figure 3. As can be seen, some details lost in the FBP images cannot be recovered by 2D U-Net but are recovered by 3D U-NetR. Moreover, 3D U-NetR reconstructs elliptical edges more smoothly. The PSNR and SSIM values on the test data are given in Table 1 to show the quantitative performance of 3D U-NetR.

Figure 3: Synthetic image reconstruction examples. Ground Truth, FBP, 2D U-Net and 3D U-NetR, respectively, from left to right.
Phantom No   SSIM of FBP (%)   SSIM of 2D U-Net (%)   SSIM of 3D U-NetR (%)
Phantom 1    72.73 ± 3.12      97.85 ± 0.85           97.81 ± 0.71
Phantom 2    69.83 ± 3.31      98.06 ± 0.50           98.01 ± 0.47
Phantom 3    73.87 ± 2.37      97.44 ± 1.20           97.48 ± 1.03
Phantom 4    71.71 ± 2.48      97.52 ± 0.96           97.56 ± 0.80
Phantom 5    72.22 ± 3.71      97.46 ± 1.36           97.47 ± 1.19
Phantom 6    75.06 ± 3.33      97.52 ± 0.74           97.50 ± 0.67
Phantom 7    73.17 ± 2.81      97.03 ± 1.29           97.07 ± 1.06
Phantom 8    74.84 ± 2.60      97.52 ± 0.86           97.48 ± 0.73
Phantom 9    71.36 ± 3.68      97.20 ± 1.38           97.22 ± 1.18
Phantom 10   73.50 ± 2.63      97.64 ± 1.20           97.74 ± 0.98
Phantom 11   71.81 ± 2.57      96.32 ± 1.45           96.61 ± 1.11
Phantom 12   73.24 ± 3.39      97.04 ± 1.12           97.17 ± 0.94
Phantom 13   73.77 ± 2.66      97.25 ± 1.06           97.37 ± 0.81
Phantom 14   72.95 ± 2.86      97.12 ± 1.35           97.22 ± 1.12
Phantom 15   76.30 ± 3.35      97.80 ± 0.61           97.69 ± 0.64
Phantom 16   75.87 ± 3.37      97.02 ± 0.75           97.05 ± 0.62
Phantom 17   73.56 ± 3.75      97.80 ± 0.68           97.83 ± 0.65
Phantom 18   70.59 ± 3.05      96.89 ± 1.41           97.11 ± 1.09
Phantom 19   74.07 ± 3.42      96.64 ± 1.11           96.86 ± 0.86
Phantom 20   73.30 ± 2.97      97.01 ± 1.16           97.11 ± 0.90
Average      73.19 ± 3.07      97.31 ± 1.05           97.37 ± 0.88

Phantom No   PSNR of FBP (dB)  PSNR of 2D U-Net (dB)  PSNR of 3D U-NetR (dB)
Phantom 1    25.07 ± 1.09      34.02 ± 1.58           33.99 ± 1.32
Phantom 2    25.05 ± 1.38      34.43 ± 1.52           34.52 ± 1.42
Phantom 3    26.34 ± 1.26      34.64 ± 2.08           34.68 ± 1.81
Phantom 4    25.49 ± 1.44      34.21 ± 2.17           34.29 ± 1.82
Phantom 5    25.20 ± 1.76      34.20 ± 2.75           34.13 ± 2.71
Phantom 6    26.68 ± 1.60      34.71 ± 1.63           34.70 ± 1.57
Phantom 7    25.76 ± 1.37      33.70 ± 2.48           33.71 ± 2.20
Phantom 8    26.28 ± 1.25      34.43 ± 1.64           34.31 ± 1.34
Phantom 9    24.87 ± 1.30      33.50 ± 1.65           33.63 ± 1.72
Phantom 10   27.10 ± 0.99      35.40 ± 2.29           35.53 ± 2.03
Phantom 11   25.20 ± 0.87      32.87 ± 1.81           33.04 ± 1.51
Phantom 12   25.95 ± 1.42      33.48 ± 1.62           33.71 ± 1.54
Phantom 13   25.74 ± 1.22      33.99 ± 1.91           34.06 ± 1.62
Phantom 14   27.13 ± 1.23      34.86 ± 1.83           35.01 ± 1.63
Phantom 15   26.98 ± 1.59      35.42 ± 1.68           35.30 ± 1.69
Phantom 16   26.47 ± 1.36      34.03 ± 1.56           34.07 ± 1.39
Phantom 17   26.50 ± 1.48      35.06 ± 1.56           35.11 ± 1.54
Phantom 18   25.78 ± 1.49      34.01 ± 2.58           34.12 ± 2.22
Phantom 19   27.19 ± 1.17      33.73 ± 1.34           34.04 ± 1.20
Phantom 20   25.98 ± 1.21      33.83 ± 1.74           33.86 ± 1.40
Average      26.04 ± 1.32      34.23 ± 1.87           34.29 ± 1.68
Table 1: Quantitative results of FBP, 2D U-Net and 3D U-NetR on the synthetic test volumes (SSIM in %, PSNR in dB)

5.2 Real Chest Data Reconstruction Results

Forward propagation of the real CT data is handled differently from the ellipses because of the image size. Although the medical images are split into patches for training because of GPU limitations, larger portions of the data can be processed at once during forward propagation thanks to the larger memory available on the CPU side (system RAM). The first thing to note here is the receptive field of the network architecture. 3D U-NetR is a deep and complex network with a receptive field of 140×140×140 voxels; in terms of slices, any output slice is affected by the 70 adjacent slices in each direction. Therefore, reconstructing each patch separately would introduce large errors. Instead, we process the first 192 and the last 192 slices of each patient separately and keep only the first 128 and the last 128 output slices, respectively. In this way, the 64 slices on either side of the middle serve only as padding (context) for the kept slices. This is still 6 slices short of the ideal 70, but given our resource restrictions it was the way to reconstruct with minimum error.
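The slice bookkeeping above can be sketched as follows, with `model` a hypothetical callable standing in for the trained network and the slice axis placed first (an assumption). With 256 slices, the first chunk covers slices 0 to 191 and the second slices 64 to 255; keeping 128 output slices from each leaves 192 - 128 = 64 slices of context beyond the seam, 6 fewer than the 70 implied by the receptive field.

```python
import numpy as np

def stitch_two_chunks(volume, model, chunk=192, keep=128):
    """Run `model` on the first and last `chunk` slices and keep `keep` from each.

    volume: (slices, H, W) array with exactly 2 * keep slices
    model:  hypothetical callable mapping a (chunk, H, W) array to the same shape
    """
    n = volume.shape[0]
    if n != 2 * keep or not keep <= chunk <= n:
        raise ValueError("expected 2 * keep slices and keep <= chunk <= slices")
    first = model(volume[:chunk])[:keep]             # slices 0 .. keep-1
    last = model(volume[n - chunk:])[chunk - keep:]  # slices keep .. n-1
    return np.concatenate([first, last], axis=0)


context = 192 - 128       # slices of context beyond the seam: 64
shortfall = 70 - context  # versus the 70 implied by the receptive field: 6
```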

We further investigated the performance of 3D U-NetR trained with the low-dose and full-dose real CT images prepared by Mayo Clinic. The ill-posed problem is easier with the synthetic data than with real CT images, since the synthetic images have a lower resolution. Figure 4 shows the images reconstructed with FBP and the outputs of the trained 2D U-Net and 3D U-NetR. As can be seen, 3D U-NetR captures some details that FBP and 2D U-Net cannot reconstruct: details in the vessels and bone tissue that are lost because of the noise in the low-dose images are not recovered by FBP or 2D U-Net, whereas 3D U-NetR presents them with a view closer to the ground truth, since it takes into account the correlation between slices. The quantitative results of 3D U-NetR in terms of PSNR and SSIM are given in Table 2.

Figure 4: Real chest CT image reconstruction examples. Ground Truth, FBP, 2D U-Net and 3D U-NetR, respectively, from left to right.
Patient No   SSIM of FBP (%)   SSIM of 2D U-Net (%)   SSIM of 3D U-NetR (%)
Patient 1    51.17 ± 8.94      79.04 ± 5.97           80.84 ± 5.76
Patient 2    41.51 ± 6.37      66.08 ± 5.87           68.28 ± 6.05
Patient 3    44.54 ± 8.47      73.07 ± 6.43           74.50 ± 6.08
Average      45.74 ± 7.93      72.73 ± 6.09           74.54 ± 5.96

Patient No   PSNR of FBP (dB)  PSNR of 2D U-Net (dB)  PSNR of 3D U-NetR (dB)
Patient 1    22.09 ± 1.27      29.41 ± 0.97           32.18 ± 1.99
Patient 2    18.16 ± 0.73      22.46 ± 0.64           24.84 ± 1.32
Patient 3    20.22 ± 1.08      26.44 ± 0.83           27.13 ± 0.89
Average      20.16 ± 1.03      26.11 ± 0.81           28.05 ± 1.40
Table 2: Quantitative performance of FBP, 2D U-Net and 3D U-NetR on the real chest CT test patients (SSIM in %, PSNR in dB)

6 Discussion and Conclusion

In this paper, the 3D U-NetR architecture is proposed for CT image reconstruction, inspired by 3D networks previously used for image segmentation. What distinguishes 3D U-NetR from other CT reconstruction networks in the literature is that it treats the image as 3D data and optimizes its filters in all three dimensions. Details lost in a 2D slice can be recovered by examining the third dimension, as demonstrated with the prepared experimental setups.

Two different datasets, synthetic ellipses and real chest CTs, are used to validate the model. On the synthetic ellipses dataset, 3D U-NetR improves on the traditional FBP method by about 8 dB in PSNR and 24 percentage points in SSIM, and its reconstructions contain visually far fewer artifacts. Compared with 2D U-Net, 3D U-NetR is slightly ahead quantitatively, but the difference is too small to establish that 3D U-NetR is better. However, visual inspection shows that 3D U-NetR reconstructs the edges of the ellipses better and captures some small details, which indicates that 3D U-NetR performs better on the synthetic data. Since the synthetic dataset uses a low number of projections, it demonstrates the success of 3D U-NetR in the sparse-view setting.

3D U-NetR has the best quantitative performance among the networks trained on the real chest CT dataset. As with the synthetic data, some details are lost with 2D U-Net and FBP in the real chest CT images, whereas some vessels in the lung and some tissues in the bone are better recovered with 3D U-NetR. The ability of 3D U-NetR to capture details in vascular and soft tissue suggests that low-dose CT could become more widespread commercially and that 3D networks can be used for denoising. The biggest problem with the real chest CT dataset is that the images labeled as ground truth themselves contain some noise, which causes the PSNR and SSIM values to be lower than on the synthetic data.

3D U-NetR gives better results because it takes into account the correlation along the third dimension, but it also has some disadvantages. First, since the convolution blocks in the network are 3D, the number of parameters is approximately 3 times that of 2D networks. The larger number of parameters means that the loss curve needs more iterations to settle and that training takes longer. Because of this long training time, alternative experimental setups, such as different residual connections, numbers of filters, activation functions and dataset configurations, could not be explored.

Second, increasing the number of filters might have a larger impact on performance than in 2D networks, but due to memory limitations the highest feasible number of filters was already used in the experiments, as mentioned in Section 3.1. Finally, memory limitations also constrained how the synthetic and real chest CT datasets could be prepared: higher-resolution synthetic images could not be generated, and the real chest CT volumes could not be used without splitting them into patches.

The proposed method gives better results than its 2D counterpart on both real and synthetic data. In addition, 3D U-NetR can be applied to other experimental setups involving CT images and to other imaging modalities, which we leave to future work.


  • [1] J. Adler and O. Öktem (2018) Learned primal-dual reconstruction. IEEE transactions on medical imaging 37 (6), pp. 1322–1332. Cited by: §1, §1, §3.
  • [2] A. H. Andersen and A. C. Kak (1984) Simultaneous algebraic reconstruction technique (sart): a superior implementation of the art algorithm. Ultrasonic imaging 6 (1), pp. 81–94. Cited by: §1.
  • [3] D. O. Baguer, J. Leuschner, and M. Schmidt (2020) Computed tomography reconstruction using deep image prior and learned reconstruction methods. Inverse Problems 36 (9), pp. 094004. Cited by: §1, §1.
  • [4] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang (2017) Low-dose ct with a residual encoder-decoder convolutional neural network. IEEE transactions on medical imaging 36 (12), pp. 2524–2535. Cited by: §1, §1, §3.
  • [5] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang (2017) Low-dose ct denoising with convolutional neural network. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Vol. , pp. 143–146. External Links: Document Cited by: §1, §2.
  • [6] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pp. 424–432. Cited by: §1, §1, §3.1.
  • [7] L. Gondara (2016)

    Medical image denoising using convolutional denoising autoencoders

    In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Vol. , pp. 241–246. External Links: Document Cited by: §1.
  • [8] R. Gordon, R. Bender, and G. T. Herman (1970) Algebraic reconstruction techniques (art) for three-dimensional electron microscopy and x-ray photography. Journal of theoretical Biology 29 (3), pp. 471–481. Cited by: §1.
  • [9] J. He, Y. Wang, and J. Ma (2020) Radon inversion via deep learning. IEEE Transactions on Medical Imaging 39 (6), pp. 2076–2087.
  • [10] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522.
  • [11] D. Kumar, A. Wong, and D. A. Clausi (2015) Lung nodule classification using deep features in CT images. In 2015 12th Conference on Computer and Robot Vision, pp. 133–138.
  • [12] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep image prior. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9446–9454.
  • [13] J. Leuschner, M. Schmidt, D. O. Baguer, and P. Maaß (2019) The LoDoPaB-CT dataset: a benchmark dataset for low-dose CT reconstruction methods. arXiv preprint arXiv:1910.01113.
  • [14] J. Leuschner, M. Schmidt, and D. Erzmann (2019) Deep inversion validation library. GitHub.
  • [15] S. Liu, D. Xu, S. K. Zhou, O. Pauly, S. Grbic, T. Mertelmeier, J. Wicklein, A. Jerebko, W. Cai, and D. Comaniciu (2018) 3D anisotropic hybrid network: transferring convolutional features from 2D images to 3D anisotropic volumes. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 851–858.
  • [16] C. McCollough, B. Chen, D. Holmes, X. Duan, Z. Yu, L. Xu, S. Leng, and J. Fletcher (2020) Low dose CT image and projection data [data set]. The Cancer Imaging Archive.
  • [17] X. Pan, E. Y. Sidky, and M. Vannier (2009) Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Problems 25 (12), 123009.
  • [18] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035.
  • [19] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
  • [20] L. A. Shepp and B. F. Logan (1974) The Fourier reconstruction of a head section. IEEE Transactions on Nuclear Science 21 (3), pp. 21–43.
  • [21] E. Y. Sidky and X. Pan (2008) Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Physics in Medicine & Biology 53 (17), pp. 4777.
  • [22] B. A. Skourt, A. El Hassani, and A. Majda (2018) Lung CT image segmentation using deep neural networks. Procedia Computer Science 127, pp. 109–113.
  • [23] M. O. Unal, M. Ertas, and I. Yildirim (2020) An unsupervised reconstruction method for low-dose CT using deep generative regularization prior. arXiv preprint arXiv:2012.06448.
  • [24] P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103.
  • [25] J. Wang, T. Li, H. Lu, and Z. Liang (2006) Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography. IEEE Transactions on Medical Imaging 25 (10), pp. 1272–1283.
  • [26] Z. Wang, E. P. Simoncelli, and A. C. Bovik (2003) Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, Vol. 2, pp. 1398–1402.
  • [27] H. Xie, H. Shan, and G. Wang (2019) Deep encoder-decoder adversarial reconstruction (DEAR) network for 3D CT from few-view data. Bioengineering 6 (4), 111.
  • [28] W. Yuanji, L. Jianhua, L. Yi, F. Yao, and J. Qinzhong (2003) Image quality evaluation based on image weighted separating block peak signal to noise ratio. In International Conference on Neural Networks and Signal Processing, 2003. Proceedings of the 2003, Vol. 2, pp. 994–997.
  • [29] H. Zhang, J. Huang, J. Ma, Z. Bian, Q. Feng, H. Lu, Z. Liang, and W. Chen (2013) Iterative reconstruction for X-ray computed tomography using prior-image induced nonlocal regularization. IEEE Transactions on Biomedical Engineering 61 (9), pp. 2367–2378.
  • [30] H. Zhao, O. Gallo, I. Frosio, and J. Kautz (2016) Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging 3 (1), pp. 47–57.
  • [31] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen (2018) Image reconstruction by domain-transform manifold learning. Nature 555 (7697), pp. 487–492.