MANAS: Multi-Scale and Multi-Level Neural Architecture Search for Low-Dose CT Denoising

03/24/2021 ∙ by Zexin Lu, et al. ∙ Sichuan University, Fudan University

Lowering the radiation dose in computed tomography (CT) can greatly reduce the potential risk to public health. However, images reconstructed from dose-reduced CT, or low-dose CT (LDCT), suffer from severe noise, which compromises subsequent diagnosis and analysis. Recently, convolutional neural networks have achieved promising results in removing noise from LDCT images; the network architectures used are either handcrafted or built on top of conventional networks such as ResNet and U-Net. Recent advances in neural architecture search (NAS) have shown that the network architecture has a dramatic effect on model performance, which indicates that current network architectures for LDCT may be sub-optimal. Therefore, in this paper, we make the first attempt to apply NAS to LDCT and propose a multi-scale and multi-level NAS for LDCT denoising, termed MANAS. On the one hand, MANAS fuses features extracted by cells of different scales to capture multi-scale image structural details. On the other hand, MANAS searches a hybrid cell- and network-level structure for better performance. Extensive experimental results on three different dose levels demonstrate that MANAS preserves image structural details better than several state-of-the-art methods. In addition, we validate the effectiveness of the multi-scale and multi-level architecture for LDCT denoising.

1 Introduction

X-ray computed tomography (CT) is now widely used in the medical field. Since the radiation generated by CT scanning may cause irreversible damage to the human body, more and more researchers are paying attention to low-dose CT (LDCT) under the well-known “as low as reasonably achievable” (ALARA) principle [4]. The most common way to lower the radiation dose is to reduce the X-ray flux by decreasing the operating current. As the radiation dose decreases, the image quality is degraded by severe noise and artifacts, which compromise subsequent clinical diagnoses. Achieving satisfactory image quality in LDCT remains an active and challenging problem.

To solve this problem, numerous algorithms have been developed over the last decade. These methods can be generally divided into three categories: 1) sinogram domain filtration, 2) iterative reconstruction, and 3) image post-processing. Sinogram domain filtration directly processes either raw data or log-transformed data with a specific filter and then performs image reconstruction using filtered back projection (FBP). Typical methods include structural filtering [2], bilateral filtering [37], and penalized weighted least-squares [53]. However, these methods usually suffer from spatial resolution loss once the edges in the sinogram are smoothed. Iterative reconstruction is effective at suppressing noise and artifacts. As a representative method for LDCT, model-based iterative reconstruction repeatedly performs forward- and back-projection and computes the regularizers in the image domain [8, 56, 7]. As a result, the computational burden of iterative reconstruction methods is tremendous, hampering their wide application in clinical use. In addition, these methods often suffer from the difficulty of accessing raw data, since the details of the scanner geometry and data correction are usually not available to users. In contrast to these two categories, image post-processing is an efficient alternative that does not rely on raw data and can be easily integrated into the current CT pipeline. Inspired by the idea of sparse representation, several classic image denoising algorithms, such as non-local means, dictionary learning, and block-matching 3D (BM3D), have attained state-of-the-art performance [14, 47, 22, 5]. However, unlike the noise assumed in traditional natural image denoising, the noise and artifacts in LDCT images do not obey any statistical distribution and cannot be accurately determined, which limits these image denoising methods.

Recently, deep learning (DL) has received increasing attention in many fields of computer vision, including object detection [43, 42, 34], semantic segmentation [44] and image restoration [63, 54, 49]. In the field of LDCT denoising, a growing number of DL-based methods have been proposed [50, 52, 51]. Since DL-based post-processing methods are computationally efficient and do not need any assumption on the noise distribution, they have become a hot research topic. Such methods usually use a classic or modified end-to-end network architecture and are trained on paired data in a supervised manner. Typical network architectures include VGG [48], AlexNet [26], ResNet [18], DenseNet [19] and U-Net [44]. However, there are no further studies on the design of the network architecture and its influence on denoising performance. All these networks are handcrafted and are therefore limited by the researcher’s experience and computing resources. Manually designed networks usually face two main problems. First, due to the differences in target datasets, substantial effort is required to select an appropriate network structure according to the characteristics of the data. Second, it is difficult to balance the tradeoff between the scale and the performance of the network. Compared with manually designed methods, neural architecture search (NAS), which does not rely on expert experience and knowledge, has attracted significant interest and achieved competitive performance in image classification [33, 30, 6] and segmentation [29, 32, 55, 59]. Current NAS methods can be roughly classified into reinforcement learning [1, 64, 65, 66], evolutionary algorithms [41, 57, 39], and gradient-based methods [30, 33, 59, 55]. The first two kinds are computationally expensive and possibly not suitable for pixel-level denoising tasks. In this paper, inspired by the ideas of [33, 32, 59], we develop a nested-like super-net that contains all candidate operations, as shown in Fig. 1, to enlarge the search space, and we apply the continuous relaxation method [33] at both the inner cell and outer network levels. Once the searching stage finishes, the Dijkstra algorithm is adopted to find the optimal sub-net with its corresponding cells.

Our contributions are summarized as follows:

  • We propose a multi-scale and multi-level NAS, termed MANAS, for low-dose CT denoising. To the best of our knowledge, this is the first attempt to extend NAS to LDCT denoising.

  • We propose a multi-scale fusion cell to leverage the features extracted from different scales in the searching stage.

  • We propose a multi-level super-net integrating both cell- and network-level search to enlarge the search space. It enables the algorithm to easily find a more efficient sub-net from the super-net defined in Fig. 1 while handling data from different dose levels. The training of the proposed MANAS can be done in 2 RTX 8000 GPU days under the continuous relaxation method.

  • Extensive experiments demonstrate that the network architectures searched by our method provide better visual effects and reduce the scale of the network compared with several state-of-the-art models.

The rest of this paper is organized as follows. Sec. 2 reviews recent advances in DL-based LDCT denoising and NAS. Sec. 3 elaborates on the implementation of the proposed MANAS. Sec. 4 presents the experimental design and representative results, followed by a concluding summary in Sec. 5.

2 Related Works

2.1 Deep Learning based LDCT Image Denoising

Recently, thanks to the rapid development of convolutional neural networks (CNNs), the performance of LDCT image denoising algorithms has improved significantly. As the first work, Chen et al. introduced the well-known super-resolution CNN (SRCNN) [12] into LDCT restoration and then proposed a residual encoder-decoder CNN (RED-CNN) for LDCT with encouraging results. Kang et al. [23] introduced the wavelet transform to U-Net [55]. With the emergence of generative adversarial networks (GANs), Yang et al. proposed to combine Wasserstein GANs and perceptual loss, termed WGAN-VGG, which can recover the mottle-like texture in CT images [60]. Shan et al. extended this model to 3D LDCT with an efficient training strategy based on transfer learning [46], and then proposed a modularized adaptive processing neural network (MAP-NN), which performs an end-to-process mapping with a modularized neural network and allows radiologists in the loop to optimize the denoising depth in a task-specific fashion [45]. Inspired by successful applications in computer vision, the attention mechanism was introduced into LDCT to retrieve pixels with strong relationships across long distances and achieved promising results [28, 21]. In addition, CycleGAN was used to mitigate the difficulty of acquiring paired training samples [24, 16]. Compared with these handcrafted network architectures, our work does not rely on expert experience and can achieve better LDCT denoising performance.

Figure 1: The overall framework of the proposed MANAS. (a) The architecture of the super-net and (b) the basic block involving three kinds of cells.

2.2 Network Architecture Search (NAS)

NAS is dedicated to automating the design of neural network architectures, so that this tedious work, which heavily depends on researchers’ practical experience, can be significantly alleviated. Several studies proposed to optimize network architectures using the basic operations of evolutionary algorithms [41, 57]. Reinforcement learning methods, e.g., Q-learning [64, 1] and policy gradients [65, 66], aim to train a recurrent neural network as a controller to generate specific architectures. Different from these two strategies, which are computationally intensive and time-consuming in the search stage, differentiable architecture search (DARTS) employs continuous relaxation to perform an efficient search of the cell architecture using gradient descent, which makes it possible to train NAS networks on a single GPU. Building on DARTS, many efforts were made for various tasks. For example, Liu et al. [32] proposed Auto-DeepLab to search in a hierarchical architecture search space for semantic image segmentation. Ghiasi et al. proposed NAS-FPN to learn a scalable feature pyramid architecture for object detection [15]. In [62], HiNAS is proposed for natural image denoising. In the field of medical imaging, most works target image segmentation and classification [59, 55, 61, 35, 13, 17]. Only two works, which are the most relevant to ours, are dedicated to MRI reconstruction using DARTS [20, 58]. Nevertheless, both methods adopt a simple plain network architecture combined with residual blocks, search only the repeatable cell structure, and ignore the impact of the outer network-level structure. Furthermore, as demonstrated by [9, 21], fusing the features extracted from different scales helps to recover more details. At the same time, networks designed manually by people who lack practical experience tend to contain redundant structure and unnecessary parameters. Despite the shortcomings mentioned above, these works have shown encouraging potential in the field of medical image reconstruction. In this study, motivated by these pioneering works [59, 32, 33], we make the first attempt to integrate NAS into LDCT denoising, aided by both inner cell-level and outer network-level search and multi-scale feature fusion. Our work follows the formulation of differentiable NAS methods. Unlike other NAS models, our work jointly searches the cell and network architectures. We also construct a nested-like super-net to enlarge the search space so that the model captures more features from different scales.

3 Method

3.1 Problem Formulation

Assuming that $I_{LD} \in \mathbb{R}^{w \times h}$ is an LDCT image of size $w \times h$ and $I_{ND} \in \mathbb{R}^{w \times h}$ is the corresponding normal-dose CT (NDCT) image, LDCT image restoration is formulated as finding a function $F$ that maps an LDCT image to its normal-dose counterpart:

$$\hat{F} = \arg\min_{F} \mathcal{L}_{MSE}\left(F(I_{LD}), I_{ND}\right) \tag{1}$$

where $\mathcal{L}_{MSE}$ represents the widely used mean squared error (MSE).

Instead of finding an explicit function $F$, we use a convolutional neural network to learn this mapping in a data-driven fashion. Unlike the traditional manual design of neural networks, we use NAS to find a suitable network. That is, we try to find $F$ using an automatic method.
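As a minimal sketch of the objective in Eq. (1), the snippet below scores two toy candidate mappings by their mean squared error on a single LDCT/NDCT pair and keeps the better one. The 2×2 images and the `identity` and `damp` functions are hypothetical stand-ins chosen only for illustration; they are not the networks searched in this paper.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]


def mse(pred: Image, target: Image) -> float:
    """Mean squared error between two equally sized 2-D images."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def empirical_risk(f: Callable[[Image], Image],
                   pairs: Sequence[Tuple[Image, Image]]) -> float:
    """Average MSE of the mapping f over (low-dose, normal-dose) pairs."""
    return sum(mse(f(ld), nd) for ld, nd in pairs) / len(pairs)


def identity(img: Image) -> Image:
    """Candidate 1: leave the low-dose image unchanged."""
    return img


def damp(img: Image) -> Image:
    """Candidate 2 (toy denoiser): pull every pixel halfway toward the image mean."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return [[mean + 0.5 * (v - mean) for v in row] for row in img]


# Hypothetical data: a flat normal-dose patch and a noisy low-dose version of it.
normal_dose = [[1.0, 1.0], [1.0, 1.0]]
low_dose = [[1.2, 0.8], [0.9, 1.1]]
pairs = [(low_dose, normal_dose)]

# Picking the candidate with the lowest risk mirrors the arg-min in Eq. (1).
best = min([identity, damp], key=lambda f: empirical_risk(f, pairs))
```

In the paper the arg-min runs over network architectures and weights rather than over two fixed functions, but the selection criterion has the same form.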

3.2 The Proposed MANAS

Similar to [33, 32], MANAS adopts a gradient-based strategy to search for the architecture of the basic cells and the sub-net architecture within the nested-like super-net. Current research provides limited choices of network architecture for low-level tasks, especially for medical images [20, 62]. As existing DL-based LDCT image restoration methods have shown, multi-scale transformation can extract features at different spatial resolutions and helps to recover details at multiple scales. Following the idea in [32, 59], a more flexible search space is proposed for LDCT image restoration. A nested-like super-net is built as shown in Fig. 1(a), which can be divided into three stages: 1) encoding stage, 2) feature transformation stage, and 3) decoding stage. In the encoding stage, the details extracted from different scales are encoded using various candidate operations. In the feature transformation stage, the multi-scale features are extracted and filtered to suppress the noise and artifacts. In the decoding stage, the features processed in the previous stage are decoded to recover the details with the best decoding strategy using the candidate operations. In each stage, a multi-scale fusion block is adopted to make full use of the multi-scale features.

Figure 2: Inner cell construct stage and search stage.

This subsection first introduces how to search basic cells using continuous relaxation, and then shows how to construct a basic block from cells, and finally describes how to form a multi-level super-net.

3.2.1 Basic Cell

We define three different types of cells for multi-scale image feature extraction. Following [33], we employ the continuous relaxation strategy to search for the appropriate cells. Typically, a super-cell is represented by a directed acyclic graph (DAG) with $N$ nodes. Fig. 2(a) and (b) show a super-cell containing all candidate operations and one example of the optimized result, respectively. For simplicity, only three nodes are shown in Fig. 2. Our target is to map the input to the output through the cell. The output of one cell is represented as follows:

(2)

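where $H_i$ denotes the output of the $i$-th node and is defined as $H_i = \sum_{H_j \in S_i} O_{j \to i}(H_j)$, $i = 1, \dots, N$ (with $N$ the number of nodes in the cell). $S_i$ is the set of all the nodes before node $i$. $O_{j \to i}$ combines all the candidate operations from node $j$ to node $i$ as

$$O_{j \to i}(H_j) = \sum_{k=1}^{|\mathcal{O}|} \alpha_{j \to i}^{k} \, o^{k}(H_j) \tag{3}$$

where $\mathcal{O} = \{o^{k}\}$ represents the set of candidate operations and $\alpha_{j \to i}^{k}$ denotes the weight of the corresponding candidate operation.

For simplicity, we define $S_i = \{H_{in}\}$ when $i = 1$, where $H_{in}$ is the cell input, which means that the first node only receives $H_{in}$ as input, and the other nodes receive all the preceding tensors (including $H_{in}$) as input.

According to [33], the softmax function is applied over all possible operations to make the search space continuous:

$$\alpha_{j \to i}^{k} = \frac{\exp\left(\tilde{\alpha}_{j \to i}^{k}\right)}{\sum_{k'=1}^{|\mathcal{O}|} \exp\left(\tilde{\alpha}_{j \to i}^{k'}\right)}$$

where $\tilde{\alpha}_{j \to i}^{k}$ is the unnormalized architecture parameter of the $k$-th candidate operation.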

| Acronym | Meaning |
|---|---|
| C@3 | 3×3 convolution |
| SC@3 | 3×3 separable convolution |
| SC@5 | 5×5 separable convolution |
| DC@3 | 3×3 convolution with dilation rate 2 |
| DC@5 | 5×5 convolution with dilation rate 2 |
| Skip | Skip connection |
| None | Without any connection and return zero |

Table 1: Candidate operations

Based on recent works on DL-based LDCT image restoration, several typical candidate operations are listed in Table 1. Each operation starts with a ReLU layer, followed by a Conv layer that keeps the number of features consistent. Batch normalization is excluded since it does not perform well on PSNR-oriented tasks [31].
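The continuous relaxation used for the cell search can be sketched in a few lines. The example below mixes three toy candidate operations on a single edge with softmax-normalized weights and then picks the operation with the largest weight, as is done after the search. The candidates `skip`, `none` and `smooth` are hypothetical stand-ins working on plain Python lists; the real candidates are the layers in Table 1.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def smooth(x: Vector) -> Vector:
    """3-tap moving average with edge replication (toy stand-in for a convolution)."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))]


# Hypothetical candidate set for one edge of the cell.
CANDIDATES: Dict[str, Callable[[Vector], Vector]] = {
    "skip": lambda x: list(x),
    "none": lambda x: [0.0] * len(x),
    "smooth": smooth,
}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over the architecture parameters of one edge."""
    peak = max(logits.values())
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def mixed_op(x: Vector, logits: Dict[str, float]) -> Vector:
    """Weighted sum of all candidate outputs, with softmax weights."""
    weights = softmax(logits)
    out = [0.0] * len(x)
    for name, weight in weights.items():
        candidate_out = CANDIDATES[name](x)
        out = [o + weight * v for o, v in zip(out, candidate_out)]
    return out


def select_op(logits: Dict[str, float]) -> str:
    """After the search, keep only the candidate with the largest weight."""
    return max(logits, key=logits.get)


uniform = {"skip": 0.0, "none": 0.0, "smooth": 0.0}
mixed = mixed_op([3.0, 3.0, 3.0], uniform)
chosen = select_op({"skip": 0.1, "none": -1.0, "smooth": 2.0})
```

With uniform parameters each candidate contributes one third of the output; as the search proceeds, the parameters drift apart and `select_op` returns the dominant operation.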

3.2.2 Multi-Scale Block

To better use the features extracted by the different kinds of cells, we construct a multi-scale block. At different scales, the model captures different image features, which helps it to recover finer image details [3]. However, not all scales contribute equally within one block, so we add architecture weights to select the most contributing cell. As illustrated in Fig. 1(b), the basic block in the super-net is defined as:

(4)

where is the output for the cell located in layer and depth . , , and denote the feature expand cell, feature transfer cell and feature compress cell, respectively. means from depth to where and is the depth of the super-net. and are the weight of the candidate operations in the cells and the weight for different cells in the block, respectively. It should be noted that not all the blocks have same cells as shown in the top and the bottom of the network. Meanwhile, the softmax function is applied to normalize as , where is the set of candidate cells in current block.

3.2.3 Multi-Level Super-Net

In our model, as the number of network layers increases by one, the number of features is doubled. In the same layer, as the network depth increases by one, the image size is reduced by half; conversely, as the depth decreases by one, the image size is doubled. By stacking the basic blocks into a nested-like network, the multi-scale features extracted from the images can be fully exploited. Each block receives the fused features from the previous blocks in the super-net and then outputs processed features to the subsequent blocks. Aided by this architecture, the super-net provides a more extensive search space covering different scales, from the network, block, and cell down to individual operations, which makes it easier for the NAS algorithm to find an efficient network architecture.

MANAS has the following multi-level features:

Cell-level. Instead of using three different cell structures for each block, we create only three kinds of inner cells and use them to construct all the blocks in our model, which saves computational resources. Once optimized, the candidate operation with the largest weight is chosen on each edge in the cell.

Network-level. To optimize the super-net and determine the architecture at the network level, we transform the super-net into a DAG and construct all the paths that connect the input and the final output. By calculating the accumulated cell weights along each path using the Dijkstra algorithm, the top-$k$ paths are selected as the best sub-net architecture. In this paper, $k$ is set to 5 throughout the experiments.
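The network-level selection can be illustrated with a small path search. The sketch below assumes that each edge of the DAG carries a normalized weight in (0, 1], that a path is scored by the product of its edge weights, and that this product is turned into an additive cost with a negative logarithm so that a Dijkstra-style priority queue can be used. The paper does not spell out these details, so they are assumptions of this example, as are the four-node graph and the function name `top_k_paths`.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def top_k_paths(graph: Graph, source: str, target: str, k: int) -> List[Tuple[float, List[str]]]:
    """Return up to k source-to-target paths with the largest weight product.

    Dijkstra-style best-first search on costs -log(weight); every node may be
    expanded up to k times, which yields the k cheapest paths in cost order.
    """
    heap: List[Tuple[float, List[str]]] = [(0.0, [source])]
    expansions: Dict[str, int] = {}
    found: List[Tuple[float, List[str]]] = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        expansions[node] = expansions.get(node, 0) + 1
        if expansions[node] > k:
            continue
        if node == target:
            found.append((math.exp(-cost), path))
            continue
        for successor, weight in graph.get(node, {}).items():
            heapq.heappush(heap, (cost - math.log(weight), path + [successor]))
    return found


# Hypothetical toy DAG; values are edge weights after softmax normalization.
toy_graph: Graph = {
    "in": {"a": 0.7, "b": 0.3},
    "a": {"out": 0.6, "b": 0.5},
    "b": {"out": 0.9},
}

best_two = top_k_paths(toy_graph, "in", "out", k=2)
```

On this toy graph the two selected paths are in → a → out (score 0.42) and in → a → b → out (score 0.315); the union of the selected paths would form the sub-net.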

3.3 Loss Function and Optimization

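Loss function. In our model, the mean squared error and the perceptual loss are combined into a hybrid loss function, which is defined as:

$$\mathcal{L} = \mathcal{L}_{MSE} + \lambda \mathcal{L}_{PL} \tag{5}$$

where the perceptual loss is defined as $\mathcal{L}_{PL} = \left\| \phi(F(I_{LD})) - \phi(I_{ND}) \right\|_2^2$ and $\phi$ is the pretrained VGG-16 network with parameters fixed. $\lambda$ is a weighting coefficient, which is set empirically in this paper.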
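The structure of the hybrid loss can be sketched as a pixel-wise MSE term plus a weighted MSE term computed in a feature space. In the example below the feature extractor `toy_features` (first differences of a 1-D signal) is a hypothetical stand-in for the fixed VGG-16 network, and the weight `lam=0.5` is an arbitrary illustrative value, not the coefficient used in the paper.

```python
from typing import Callable, List

Signal = List[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(signal: Signal) -> Signal:
    """Hypothetical fixed feature extractor: first differences (edge strength)."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def hybrid_loss(pred: Signal, target: Signal,
                feature_fn: Callable[[Signal], Signal], lam: float) -> float:
    """Pixel-wise MSE plus lam times the MSE between extracted features."""
    pixel_term = mse(pred, target)
    perceptual_term = mse(feature_fn(pred), feature_fn(target))
    return pixel_term + lam * perceptual_term


prediction = [0.0, 1.0, 2.0, 4.0]
reference = [0.0, 1.0, 2.0, 3.0]
loss_value = hybrid_loss(prediction, reference, toy_features, lam=0.5)
```

The feature term penalizes differences in structure (here, edges) even when the pixel-wise error is small, which is the motivation for adding a perceptual term to MSE.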

Optimization. Similar to other works [32, 59], the continuous relaxation strategy is adopted to optimize the cell and network parameters. This strategy has been shown to optimize these parameters efficiently with gradient descent. Here we use the first-order approximation in DARTS and split the training set into train-data and arch-data; the ratio of train-data to arch-data is 7:1 (350 and 50 of the 400 training images, see Sec. 4.1). The loss function in Eq. (5) is computed on the train-data to optimize the network parameters and on the arch-data to optimize the architecture weights; these two steps are performed alternately.
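The alternating first-order scheme can be sketched on a toy problem with one network weight and one architecture parameter. Everything in the example is an assumption made for illustration: the scalar model (a sigmoid gate times a weight), the target mapping y = 2x, the data points, and the learning rates. Only the overall pattern, a weight step on train-data followed by an architecture step on arch-data after a short warm-up, follows the description in the text.

```python
import math
from typing import List, Tuple


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def loss(w: float, alpha: float, xs: List[float]) -> float:
    """MSE of the toy model gate(alpha) * w * x against the target 2 * x."""
    gate = sigmoid(alpha)
    return sum((gate * w * x - 2.0 * x) ** 2 for x in xs) / len(xs)


def grad_w(w: float, alpha: float, xs: List[float]) -> float:
    """Analytic gradient of the loss with respect to the network weight."""
    gate = sigmoid(alpha)
    return sum(2.0 * (gate * w * x - 2.0 * x) * gate * x for x in xs) / len(xs)


def grad_alpha(w: float, alpha: float, xs: List[float]) -> float:
    """Analytic gradient of the loss with respect to the architecture parameter."""
    gate = sigmoid(alpha)
    return sum(2.0 * (gate * w * x - 2.0 * x) * w * x * gate * (1.0 - gate) for x in xs) / len(xs)


def search(train_xs: List[float], arch_xs: List[float], epochs: int = 200,
           warmup: int = 8, lr_w: float = 0.05, lr_alpha: float = 0.05) -> Tuple[float, float]:
    """First-order alternating updates: weights on train-data, architecture on arch-data."""
    w, alpha = 1.0, 0.0
    for epoch in range(epochs):
        w -= lr_w * grad_w(w, alpha, train_xs)
        if epoch >= warmup:  # architecture parameters are frozen during warm-up
            alpha -= lr_alpha * grad_alpha(w, alpha, arch_xs)
    return w, alpha


TRAIN_XS = [1.0, 2.0]  # hypothetical train-data inputs
ARCH_XS = [1.5]        # hypothetical arch-data inputs
w_final, alpha_final = search(TRAIN_XS, ARCH_XS)
```

The first-order approximation means that the architecture gradient is evaluated at the current weights, without differentiating through the weight update.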

4 Experiment

4.1 Dataset

The “2016 NIH-AAPM-Mayo Clinic Low-Dose CT Grand Challenge” [38] dataset is chosen to evaluate the performance of the proposed model. The dataset has 5,936 full-dose CT images from 10 patients. In our experiments, we randomly choose 400 images from 8 patients as the training set, of which 50 images serve as the arch-data, and 100 images from the remaining two patients as the testing set. The images are resized to . Poisson noise and electronic noise are added to the measured projection data to simulate LDCT images with different dose levels:

$$\hat{p} = \ln \frac{I_0}{\mathrm{Poisson}\left(I_0 \exp(-p)\right) + \mathrm{Normal}\left(0, \sigma_e^2\right)}$$

where $\mathrm{Poisson}$ and $\mathrm{Normal}$ represent the Poisson distribution and Gaussian distribution, respectively, $I_0$ is the number of photons before the X-ray penetrates the object, $\sigma_e^2$ is the variance of the electronic noise generated from equipment measurement error, and $p$ represents the noise-free projection. The X-ray intensity of a normal dose is set to  [40]. LDCT images with three different dose levels, 10%, 5% and 2.5%, which correspond to , and , respectively, are simulated. In all experiments, we set .
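The noise simulation described above can be sketched with the standard library alone. The example assumes the commonly used form of this model: the detected photon count is Poisson distributed around the attenuated incident count, zero-mean Gaussian electronic noise is added, and the noisy projection is the logarithm of the incident count over the noisy count. The incident photon number (2000), the electronic noise variance (10) and the projection values are hypothetical and chosen small so that the simple Poisson sampler stays fast; they are not the settings of the paper.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Count unit-rate exponential arrivals in [0, lam]; the count is Poisson(lam)."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_low_dose(projection: List[float], incident_photons: float,
                      electronic_variance: float, rng: random.Random) -> List[float]:
    """Add Poisson and Gaussian noise to noise-free line integrals."""
    noisy = []
    for p in projection:
        expected_counts = incident_photons * math.exp(-p)
        counts = sample_poisson(expected_counts, rng)
        counts += rng.gauss(0.0, math.sqrt(electronic_variance))
        counts = max(counts, 1.0)  # assumption: clamp so the logarithm stays defined
        noisy.append(math.log(incident_photons / counts))
    return noisy


rng = random.Random(0)
clean_projection = [0.5, 1.0, 2.0]
noisy_projection = simulate_low_dose(clean_projection, incident_photons=2000.0,
                                     electronic_variance=10.0, rng=rng)
```

Lowering `incident_photons` increases the relative size of both noise terms, which is how different dose levels are emulated; the noisy projections would then be reconstructed with FBP to obtain LDCT images.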

4.2 Implementation Details

Figure 3: Image denoising with 10%, 5%, and 2.5% dose data by different methods.
| Method | 10% PSNR | 10% SSIM | 10% PL | 5% PSNR | 5% SSIM | 5% PL | 2.5% PSNR | 2.5% SSIM | 2.5% PL |
|---|---|---|---|---|---|---|---|---|---|
| FBP | 38.04±0.68 | 0.908±0.013 | 1.45±0.18 | 35.25±0.74 | 0.846±0.021 | 2.32±0.23 | 32.36±0.77 | 0.756±0.030 | 3.44±0.27 |
| BM3D [11] | 43.45±0.55 | 0.980±0.003 | 0.51±0.09 | 42.24±0.62 | 0.977±0.004 | 0.50±0.07 | 39.86±0.94 | 0.960±0.009 | 0.76±0.12 |
| DnCNN [63] | 39.81±0.76 | 0.970±0.003 | 0.63±0.09 | 39.21±0.81 | 0.963±0.004 | 0.84±0.13 | 38.62±0.66 | 0.957±0.006 | 1.03±0.16 |
| RED-CNN [9] | 44.86±0.60 | 0.986±0.003 | 0.20±0.05 | 43.40±0.62 | 0.982±0.003 | 0.31±0.06 | 41.93±0.64 | 0.976±0.004 | 0.45±0.09 |
| MAP-NN [45] | 43.40±0.50 | 0.983±0.003 | 0.27±0.05 | 41.15±0.48 | 0.974±0.004 | 0.47±0.08 | 38.95±0.45 | 0.959±0.006 | 0.95±0.13 |
| MANAS (ours) | 43.22±0.53 | 0.973±0.003 | 0.17±0.03 | 42.02±0.62 | 0.981±0.003 | 0.23±0.04 | 40.21±0.52 | 0.955±0.006 | 0.42±0.07 |

Table 2: Quantitative results (Mean±SD) of different methods on the testing sets of three low-dose levels. The configuration of MANAS is , , .

In this work, the number of nodes in each basic cell is set to 3. The numbers of layers and depths are set to 12 and 4, respectively. The top-$k$ paths are selected to form the best sub-net architecture. The super-net was trained for 200 epochs with a batch size of 2. Two different Adam [25] optimizers were used in the network searching stage: one for the network parameters of the super-net and one for the architecture weights. For the former, the initial learning rate is set to and the decay to using the cosine annealing strategy [36]. For the latter, the initial learning rate is set to and the other configurations are set as default. Existing studies have shown that, as the number of training epochs increases, the network tends to select skip-connections in the cell search stage, which causes model collapse [10, 30]. To avoid model collapse, early stopping is employed to restrict the number of skip-connections in each cell to two. To better train the architecture in the search stage, only the network parameters are optimized in the first 8 epochs. Once the outputs are relatively stable, the network parameters and architecture weights are optimized simultaneously. After optimization, we obtain the network architecture for the specific dataset. Our code is based on PyTorch 1.7 and runs on Windows 10 with 32 GB RAM and an NVIDIA RTX 8000 GPU.
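The cosine annealing strategy mentioned above can be written as a one-line schedule. The sketch below uses the usual closed form without restarts; the 200-epoch horizon comes from the text, while the maximum and minimum learning rates are hypothetical placeholders, since the actual values are not reproduced here.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int, lr_max: float, lr_min: float) -> float:
    """Learning rate that decays from lr_max to lr_min along half a cosine period."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


TOTAL_EPOCHS = 200      # number of super-net training epochs stated in the text
LR_MAX = 1e-3           # hypothetical initial learning rate
LR_MIN = 1e-5           # hypothetical final learning rate

schedule = [cosine_annealing_lr(e, TOTAL_EPOCHS, LR_MAX, LR_MIN) for e in range(TOTAL_EPOCHS + 1)]
```

The schedule changes slowly at the start and at the end of training and fastest in the middle, which avoids abrupt drops in the learning rate.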

4.3 Comparison with State-of-the-art Methods

To verify the performance of the proposed method, we conduct experiments at three different dose levels: 2.5%, 5%, and 10%. Several state-of-the-art LDCT restoration methods, namely BM3D [11], DnCNN [63], MAP-NN [45], and RED-CNN [9], are compared. All the models except RED-CNN are implemented according to the original papers and employ the original loss functions. For RED-CNN, we employ the same loss as for MANAS. The quantitative results on the whole testing set are listed in Table 2. The networks searched using our method rank only in the middle in terms of both PSNR and SSIM for all dose levels. However, according to current studies [60, 50, 27], PSNR and SSIM cannot always judge image quality well. Therefore, we add the perceptual loss, termed PL, as a metric of how well image details are preserved. MANAS has the lowest perceptual loss at every dose level, which shows that it preserves more image details. To further illustrate the performance of MANAS and the visual effect of image detail restoration, three slices reconstructed using different methods from the 10%, 5%, and 2.5% dose levels are given in Fig. 3. As the dose level decreases, the artifacts and noise become more serious and most details are covered. All the methods can suppress the artifacts and noise to a certain degree. In the 3rd row of Fig. 3, there are obvious streak artifacts near the femurs in the results of BM3D. In the results of DnCNN and MAP-NN, the details are blurred, especially in the 1st and 2nd rows of Fig. 3. Some contrast-enhanced vessels in the liver, indicated by arrows, are smoothed. RED-CNN obtains the performance closest to ours, but it still suffers from perceptible over-smoothing, which leads to a loss of spatial contrast. Overall, the proposed model achieves the best visual effects in both artifact reduction and detail preservation.
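For reference, the PSNR reported in Table 2 follows from the MSE between a restored image and its normal-dose counterpart. The sketch below uses the standard definition; the `data_range` argument (the maximum possible pixel value) and the two-pixel example are assumptions of this illustration, since the intensity range used for evaluation is not stated here.

```python
import math
from typing import List


def psnr(pred: List[float], target: List[float], data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for flattened images of equal length."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


restored = [0.6, 0.4]
reference = [0.5, 0.5]
score = psnr(restored, reference)
```

Because PSNR depends only on the pixel-wise MSE, a smoother result can score higher while losing fine structures, which is why the perceptual loss is reported alongside it.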

4.4 Comparison of Network Parameters

To assess the complexity of the searched networks, the number of parameters (model scale) and the number of floating-point operations (FLOPs) are adopted. The results are listed in Table 3. DnCNN is a simple plain CNN aided by one residual connection. MAP-NN has a parameter-sharing generator, which is a lightweight model, but its discriminator is large. The networks searched using our method require fewer FLOPs than RED-CNN at every dose level, and the network searched for the 5% dose also has fewer parameters. It is reasonable to say that the performance of a network is roughly positively related to its scale, and it is very hard to obtain satisfactory results at every dose level with a lightweight model.

| Method | Params (M) | FLOPs (G) |
|---|---|---|
| MAP-NN (G+D) [45] | 0.06+269.58 | 1.96+4.55 |
| DnCNN [63] | 0.56 | 36.51 |
| RED-CNN [9] | 1.85 | 121.2 |
| MANAS Dose 2.5% | 2.96 | 114.01 |
| MANAS Dose 5% | 1.66 | 105.51 |
| MANAS Dose 10% | 2.79 | 98.02 |

Table 3: Parameters and FLOPs of the different network architectures.

4.5 Model Investigation

Architecture Analysis  Figs. 4, 5 and 6 show the sub-network architectures and the corresponding cells obtained by our method for the different datasets. By looking into the generated network structures, we make the following observations. First, different types of convolutions are included in the final sub-networks, which demonstrates the ability of our model to select operations. Although separable convolution is efficient in reducing network scale and dilated convolution excels at enlarging the receptive field, normal convolution and skip connection are still selected. Second, model collapse, in which an entire cell is filled with skip-connections, is avoided by the early stopping strategy [30, 58, 62]. Third, none of the searched architectures uses the fourth depth to build the network, which implies that, for LDCT restoration, simply increasing the network depth does not always improve performance. Last, the network architectures generated for different dose levels are quite different. A possible reason is that images at different dose levels are contaminated by artifacts and noise to varying degrees, which may have a significant impact on the choice of network architecture.
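The remark on separable and dilated convolutions can be made concrete with a little arithmetic. The sketch below counts the weights of a standard convolution and of a depthwise-separable convolution (bias terms ignored) and computes the effective kernel size of a dilated convolution. The channel count of 64 is a hypothetical example and not a value taken from the searched networks.

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights of a standard 2-D convolution, without bias."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


def effective_kernel(kernel: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


CHANNELS = 64  # hypothetical feature width
standard_3x3 = conv_params(3, CHANNELS, CHANNELS)
separable_3x3 = separable_conv_params(3, CHANNELS, CHANNELS)
dilated_3x3_extent = effective_kernel(3, 2)
dilated_5x5_extent = effective_kernel(5, 2)
```

For 64 input and output channels, the separable 3×3 convolution needs 4,672 weights instead of 36,864, while a 3×3 convolution with dilation rate 2 covers a 5×5 window with the same nine weights per channel pair.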

Figure 4: Network architecture result with , , with 10% dose data and the corresponding basic blocks.
Figure 5: Network architecture result with , , with 5% dose data and the corresponding basic blocks.
Figure 6: Network architecture result with , , with 2.5% dose data and the corresponding basic blocks.
Figure 7: Image denoising with 5% dose data using different layer numbers in super-net.
Figure 8: Image denoising with 5% dose data using different path numbers in super-net.
| Layer | PSNR | SSIM | PL |
|---|---|---|---|
| 8 | 42.46±0.54 | 0.974±0.004 | 0.24±0.04 |
| 10 | 44.43±0.60 | 0.984±0.003 | 0.22±0.05 |
| 12 | 44.36±0.62 | 0.984±0.003 | 0.22±0.04 |

Table 4: Quantitative results (Mean±SD) with different numbers of initial layers on the testing set.

Effects of Number of Super-network Layers  To evaluate the impact of the number of super-network layers, we initialize the super-network with 8, 10 and 12 layers, respectively. The LDCT images with 5% dose are used as the training and testing sets. The other parameters are fixed as mentioned before. The quantitative results for the whole testing set are given in Table 4. Increasing the number of layers from 8 to 10 improves all three metrics, whereas going from 10 to 12 brings almost no further change. The qualitative results of one representative slice are shown in Fig. 7. The results using 10 and 12 layers as initial architectures show better visual effects and fewer artifacts than the one with 8 layers, which is consistent with the quantitative scores. The arrows indicate some obvious differences. The result with recovers more details than the one with . Based on this observation, the number of layers in our model is initialized to 12.

| Path | PSNR | SSIM | PL |
|---|---|---|---|
| 4 | 43.24±0.54 | 0.976±0.004 | 0.27±0.05 |
| 5 | 44.36±0.62 | 0.984±0.003 | 0.22±0.04 |
| 6 | 42.88±0.58 | 0.973±0.005 | 0.27±0.05 |

Table 5: Quantitative results (Mean±SD) with different numbers of paths on the testing set.

Effects of the Numbers of Paths  We evaluate the impact of the number of paths used to form the final network. Three different numbers of paths are tested in the experiments: 4, 5, and 6. The other parameters are fixed and . The quantitative results are listed in Table 5. With 5 paths, the searched network achieves the best scores. One typical thoracic slice processed using networks with different numbers of paths is illustrated in Fig. 8. The result reconstructed by the network with 5 paths better visualizes the structural details indicated by the arrows in Fig. 8, which is also confirmed by the quantitative metrics. These results suggest that the optimal number of paths may vary with the specific dataset or super-net architecture.

5 Conclusion

In this paper, we proposed a multi-scale and multi-level gradient-based NAS for low-dose CT denoising. The proposed method searches the network architecture at both the cell and network levels, which provides an extended search space and is more flexible and efficient than traditional NAS methods. Meanwhile, to leverage multi-scale features, three different feature fusion cells are introduced. The searched networks are evaluated on datasets with different dose levels and preserve image structural details better than several handcrafted state-of-the-art models, which reflects the robustness and effectiveness of MANAS. In addition, different configurations of MANAS further illustrate the influence of features at different scales on image detail restoration.

We acknowledge some limitations of our method. First, since we expand the search space, training is time- and resource-consuming even with DARTS. Moreover, since the model must be trained twice, once for architecture search and once for parameter optimization, the computational burden is further aggravated. In this study, it takes two GPU days (RTX 8000) to train the model. Second, the three proposed kinds of cells have the same structure, which may limit the possible results. In the future, designing a more general search space will be the next step.

References

  • [1] B. Baker, O. Gupta, N. Naik, and R. Raskar (2016) Designing Neural Network Architectures using Reinforcement Learning. arXiv preprint arXiv:1611.02167. Cited by: §1, §2.2.
  • [2] M. Balda, J. Hornegger, and B. Heismann (2012) Ray Contribution Masks for Structure Adaptive Sinogram Filtering. IEEE Transactions on Medical Imaging 31 (6), pp. 1228–1239. Cited by: §1.
  • [3] L. Bao, Z. Yang, S. Wang, D. Bai, and J. Lee (2020) Real Image Denoising Based on Multi-Scale Residual Dense Block and Cascaded U-Net With Block-Connection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 448–449. Cited by: §3.2.2.
  • [4] D. J. Brenner and E. J. Hall (2007) Computed Tomography—An Increasing Source of Radiation Exposure. New England Journal of Medicine 357 (22), pp. 2277–2284. Cited by: §1.
  • [5] A. Buades, B. Coll, and J. Morel (2005) A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 60–65. Cited by: §1.
  • [6] H. Cai, L. Zhu, and S. Han (2018) ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. arXiv preprint arXiv:1812.00332. Cited by: §1.
  • [7] G. Chen, X. Hong, Q. Ding, Y. Zhang, H. Chen, S. Fu, Y. Zhao, X. Zhang, H. Ji, G. Wang, et al. (2020) AirNet: Fused analytical and iterative reconstruction with deep neural network regularization for sparse-data CT. Medical physics 47 (7), pp. 2916–2930. Cited by: §1.
  • [8] H. Chen, Y. Zhang, Y. Chen, J. Zhang, W. Zhang, H. Sun, Y. Lv, P. Liao, J. Zhou, and G. Wang (2018) LEARN: Learned Experts’ Assessment-Based Reconstruction Network for Sparse-Data CT. IEEE Transactions on Medical Imaging 37 (6), pp. 1333–1347. Cited by: §1.
  • [9] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang (2017) Low-Dose CT With a Residual Encoder-Decoder Convolutional Neural Network. IEEE Transactions on Medical Imaging 36 (12), pp. 2524–2535. Cited by: §2.2, §4.3, Table 2, Table 3.
  • [10] X. Chu, T. Zhou, B. Zhang, and J. Li (2020) Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 465–480. Cited by: §4.2.
  • [11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian (2007) Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing 16 (8), pp. 2080–2095. Cited by: §4.3, Table 2.
  • [12] C. Dong, C. C. Loy, K. He, and X. Tang (2014) Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199. Cited by: §2.1.
  • [13] N. Dong, M. Xu, X. Liang, Y. Jiang, W. Dai, and E. Xing (2019) Neural Architecture Search for Adversarial Medical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 828–836. Cited by: §2.2.
  • [14] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder (2010) Block matching 3D random noise filtering for absorption optical projection tomography. Physics in Medicine & Biology 55 (18), pp. 5401. Cited by: §1.
  • [15] G. Ghiasi, T. Lin, and Q. V. Le (2019) NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7036–7045. Cited by: §2.2.
  • [16] J. Gu and J. C. Ye (2020) AdaIN-Switchable CycleGAN for Efficient Unsupervised Low-Dose CT Denoising. arXiv preprint arXiv:2008.05753. Cited by: §2.1.
  • [17] D. Guo, D. Jin, Z. Zhu, T. Ho, A. P. Harrison, C. Chao, J. Xiao, and L. Lu (2020) Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4223–4232. Cited by: §2.2.
  • [18] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §1.
  • [19] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely Connected Convolutional Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4700–4708. Cited by: §1.
  • [20] Q. Huang, Y. Xian, P. Wu, J. Yi, H. Qu, D. Metaxas, et al. (2020) Enhanced MRI Reconstruction Network Using Neural Architecture Search. In International Workshop on Machine Learning in Medical Imaging, pp. 634–643. Cited by: §2.2, §3.2.
  • [21] Z. Huang, Z. Chen, Q. Zhang, G. Quan, M. Ji, C. Zhang, Y. Yang, X. Liu, D. Liang, H. Zheng, et al. (2020) CaGAN: A Cycle-Consistent Generative Adversarial Network With Attention for Low-Dose CT Imaging. IEEE Transactions on Computational Imaging 6, pp. 1203–1218. Cited by: §2.1, §2.2.
  • [22] D. Kang, P. Slomka, R. Nakazato, J. Woo, D. S. Berman, C. J. Kuo, and D. Dey (2013) Image denoising of low-radiation dose coronary CT angiography by an adaptive block-matching 3D algorithm. In Medical Imaging 2013: Image Processing, Vol. 8669, pp. 86692G. Cited by: §1.
  • [23] E. Kang, W. Chang, J. Yoo, and J. C. Ye (2018) Deep Convolutional Framelet Denosing for Low-Dose CT via Wavelet Residual Network. IEEE Transactions on Medical Imaging 37 (6), pp. 1358–1369. Cited by: §2.1.
  • [24] E. Kang, H. J. Koo, D. H. Yang, J. B. Seo, and J. C. Ye (2019) Cycle-consistent adversarial denoising network for multiphase coronary CT angiography. Medical physics 46 (2), pp. 550–562. Cited by: §2.1.
  • [25] D. P. Kingma and J. Ba (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  • [26] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25, pp. 1097–1105. Cited by: §1.
  • [27] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4681–4690. Cited by: §4.3.
  • [28] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao (2020) SACNN: Self-Attention Convolutional Neural Network for Low-Dose CT Denoising With Self-Supervised Perceptual Loss Network. IEEE Transactions on Medical Imaging 39 (7), pp. 2289–2301. Cited by: §2.1.
  • [29] Y. Li, L. Song, Y. Chen, Z. Li, X. Zhang, X. Wang, and J. Sun (2020) Learning Dynamic Routing for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8553–8562. Cited by: §1.
  • [30] H. Liang, S. Zhang, J. Sun, X. He, W. Huang, K. Zhuang, and Z. Li (2019) DARTS+: Improved Differentiable Architecture Search with Early Stopping. arXiv preprint arXiv:1909.06035. Cited by: §1, §4.2, §4.5.
  • [31] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Cited by: §3.2.1.
  • [32] C. Liu, L. Chen, F. Schroff, H. Adam, W. Hua, A. L. Yuille, and L. Fei-Fei (2019) Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 82–92. Cited by: §1, §2.2, §3.2, §3.3.
  • [33] H. Liu, K. Simonyan, and Y. Yang (2018) DARTS: Differentiable Architecture Search. arXiv preprint arXiv:1806.09055. Cited by: §1, §2.2, §3.2.1, §3.2.1, §3.2.
  • [34] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 21–37. Cited by: §1.
  • [35] Z. Liu, H. Wang, S. Zhang, G. Wang, and J. Qi (2020) NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 263–272. Cited by: §2.2.
  • [36] I. Loshchilov and F. Hutter (2016) SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv preprint arXiv:1608.03983. Cited by: §4.2.
  • [37] A. Manduca, L. Yu, J. D. Trzasko, N. Khaylova, J. M. Kofler, C. M. McCollough, and J. G. Fletcher (2009) Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT. Medical physics 36 (11), pp. 4911–4919. Cited by: §1.
  • [38] C. McCollough (2016) TU-FG-207A-04: Overview of the Low Dose CT Grand Challenge. Medical physics 43 (6Part35), pp. 3759–3760. Cited by: §4.1.
  • [39] R. Miikkulainen, J. Liang, E. Meyerson, A. Rawal, D. Fink, O. Francon, B. Raju, H. Shahrzad, A. Navruzyan, N. Duffy, et al. (2019) Evolving Deep Neural Networks. In Artificial intelligence in the age of neural networks and brain computing, pp. 293–312. Cited by: §1.
  • [40] S. Niu, Y. Gao, Z. Bian, J. Huang, W. Chen, G. Yu, Z. Liang, and J. Ma (2014) Sparse-view x-ray CT reconstruction via total generalized variation regularization. Physics in Medicine & Biology 59 (12), pp. 2997. Cited by: §4.1.
  • [41] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin (2017) Large-Scale Evolution of Image Classifiers. In International Conference on Machine Learning, pp. 2902–2911. Cited by: §1, §2.2.
  • [42] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 779–788. Cited by: §1.
  • [43] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv preprint arXiv:1506.01497. Cited by: §1.
  • [44] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1.
  • [45] H. Shan, A. Padole, F. Homayounieh, U. Kruger, R. D. Khera, C. Nitiwarangkul, M. K. Kalra, and G. Wang (2019) Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nature Machine Intelligence 1 (6), pp. 269–276. Cited by: §2.1, §4.3, Table 2, Table 3.
  • [46] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang (2018) 3-D Convolutional Encoder-Decoder Network for Low-Dose CT via Transfer Learning From a 2-D Trained Network. IEEE Transactions on Medical Imaging 37 (6), pp. 1522–1534. Cited by: §2.1.
  • [47] K. Sheng, S. Gou, J. Wu, and S. X. Qi (2014) Denoised and texture enhanced MVCT to improve soft tissue conspicuity. Medical physics 41 (10), pp. 101916. Cited by: §1.
  • [48] K. Simonyan and A. Zisserman (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556. Cited by: §1.
  • [49] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep Image Prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: §1.
  • [50] G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler (2018) Image Reconstruction is a New Frontier of Machine Learning. IEEE Transactions on Medical Imaging 37 (6), pp. 1289–1296. Cited by: §1, §4.3.
  • [51] G. Wang, J. C. Ye, and B. De Man (2020) Deep learning for tomographic image reconstruction. Nature Machine Intelligence 2 (12), pp. 737–748. Cited by: §1.
  • [52] G. Wang, Y. Zhang, X. Ye, and X. Mou (2019) Machine Learning for Tomographic Imaging. IOP Publishing. Cited by: §1.
  • [53] J. Wang, T. Li, H. Lu, and Z. Liang (2006) Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography. IEEE Transactions on Medical Imaging 25 (10), pp. 1272–1283. Cited by: §1.
  • [54] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy (2018) ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0–0. Cited by: §1.
  • [55] Y. Weng, T. Zhou, Y. Li, and X. Qiu (2019) NAS-Unet: Neural Architecture Search for Medical Image Segmentation. IEEE Access 7, pp. 44247–44257. Cited by: §1, §2.1, §2.2.
  • [56] W. Xia, W. Wu, S. Niu, F. Liu, J. Zhou, H. Yu, G. Wang, and Y. Zhang (2019) Spectral CT Reconstruction—ASSIST: Aided by Self-Similarity in Image-Spectral Tensors. IEEE Transactions on Computational Imaging 5 (3), pp. 420–436. Cited by: §1.
  • [57] L. Xie and A. Yuille (2017) Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1379–1388. Cited by: §1, §2.2.
  • [58] J. Yan, S. Chen, Y. Zhang, and X. Li (2020) Neural Architecture Search for compressed sensing Magnetic Resonance image reconstruction. Computerized Medical Imaging and Graphics 85, pp. 101784. Cited by: §2.2, §4.5.
  • [59] X. Yan, W. Jiang, Y. Shi, and C. Zhuo (2020) MS-NAS: Multi-scale Neural Architecture Search for Medical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 388–397. Cited by: §1, §2.2, §3.2, §3.3.
  • [60] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang (2018) Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss. IEEE Transactions on Medical Imaging 37 (6), pp. 1348–1357. Cited by: §2.1, §4.3.
  • [61] Q. Yu, D. Yang, H. Roth, Y. Bai, Y. Zhang, A. L. Yuille, and D. Xu (2020) C2FNAS: Coarse-to-fine neural architecture search for 3D medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4126–4135. Cited by: §2.2.
  • [62] H. Zhang, Y. Li, H. Chen, and C. Shen (2020) Memory-efficient hierarchical neural architecture search for image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3657–3666. Cited by: §2.2, §3.2, §4.5.
  • [63] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §1, §4.3, Table 2, Table 3.
  • [64] Z. Zhong, J. Yan, W. Wu, J. Shao, and C. Liu (2018) Practical Block-Wise Neural Network Architecture Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. Cited by: §1, §2.2.
  • [65] B. Zoph and Q. V. Le (2016) Neural Architecture Search with Reinforcement Learning. arXiv preprint arXiv:1611.01578. Cited by: §1, §2.2.
  • [66] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le (2018) Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8697–8710. Cited by: §1, §2.2.