Neural Architecture Search for Gliomas Segmentation on Multimodal Magnetic Resonance Imaging

05/13/2020 ∙ Feifan Wang et al.

The past few years have witnessed an artificial-intelligence-inspired evolution across various medical fields. The diagnosis and treatment of gliomas, one of the most common brain tumors and one with a low survival rate, rely heavily on computer-assisted segmentation of magnetic resonance imaging (MRI) scans. Although encoder-decoder shaped deep learning networks have become the de facto standard for semantic segmentation tasks in medical image analysis, considerable effort must still be spent on designing the detailed architecture of the down-sampling and up-sampling blocks. In this work, we propose a neural architecture search (NAS) based solution to brain tumor segmentation tasks on multimodal volumetric MRI scans. Three sets of candidate operations are composed for the three kinds of basic building blocks, and each operation is assigned a learnable probabilistic parameter. By alternately updating the weights of the operations and the other parameters in the network, the searching mechanism ends up with two optimal structures for the downward and upward blocks. Moreover, the developed solution integrates normalization and patching strategies tailored for brain MRI processing. Extensive comparative experiments on the BraTS 2019 dataset demonstrate that the proposed algorithm not only relieves the pressure of hand-crafting block architectures but also achieves competitive performance.

1 Introduction

The human brain is stable under normal conditions. Nevertheless, this balance can be compromised by the presence of brain tumors, which pathologically are clusters of dysfunctional brain cells [36]. According to their origins, brain tumors can be classified into primary and secondary brain tumors. The former start in the brain area, while the latter, also called metastatic brain tumors, spread from other organs in the human body. From another perspective, brain tumors can also be categorized as malignant or benign: malignant tumors are cancerous and likely to spread through the whole brain, while benign ones are not. Gliomas, among the most common primary malignant brain tumors, keep attracting researchers' attention because they cause more suffering and loss than any other brain tumor [27].

Magnetic resonance imaging (MRI) processing, in particular the semantic segmentation that tells tumor tissues apart from the other parts of the brain volume, plays an important role in the diagnosis and treatment of gliomas [16]. One vital task for neurosurgeons before resection surgery is to annotate the tumor regions as precisely as possible, since an ideal brain tumor segmentation can not only preserve enough healthy tissue but also prevent subsequent tumor recurrence [15]. During the past few years, scientists from academia and industry have spared no effort in exploring computer-assisted solutions that relieve doctors of this laborious and time-consuming annotation work.

The prevalence of deep learning has pushed forward the evolution of semantic segmentation methods. The fully convolutional network (FCN) started the trend of taking advantage of convolutional neural networks (CNNs) for dense image prediction [20]. Intrigued by the idea behind the FCN, many cutting-edge CNNs found their way into segmentation tasks, for example the fully convolutional dense nets (FCDN) [12] and DeepLab [6]. In 2015, the U-Net was first brought out for the Drosophila cell tracking task in the IEEE International Symposium on Biomedical Imaging (ISBI) challenge [26]. Soon after that, it became renowned for its demonstrated effectiveness and efficiency on two-dimensional (2D) and three-dimensional (3D) medical image datasets [8].

For glioma segmentation, the U-Net is also one of the most frequently chosen architecture styles. For instance, Kamnitsas et al. made an ensemble of the FCN, the U-Net and DeepMedic [14], a variant of DeepLab for brain lesion segmentation, and won the first place award in the multimodal brain tumor segmentation challenge (BraTS) in 2017 [13]. Myronenko, winner of BraTS 2018, benefited from a combination of a variational auto-encoder (VAE) and the U-Net structure [23]. Isensee et al. argued that a well-trained U-Net would suffice for the segmentation task, with no need for extra accessories [11]. However, even if that is the case, picking the right building blocks for the down-sampling and up-sampling routes in a U-Net can still be hard work, since the diversity of off-the-shelf operation modules results in a huge variety of candidate architectures.

As a branch of automated machine learning (AutoML), NAS focuses specifically on finding the optimal network structure among numerous candidate architectures automatically. The blossoming of NAS can be attributed to Zoph and Le, who first came up with the idea of training a recurrent neural network (RNN) in a reinforcement learning manner to decide the arguments of a convolutional module and hyperparameters such as the number of filters [34]. In NASNet, Zoph et al. proposed a two-step searching strategy: first look for two kinds of basic units called 'Cells' on a small-scale dataset, then build up the final solution for larger datasets by stacking these Cells [35]. One major problem with NASNet is its enormous cost, which typically amounts to hundreds of GPUs working together for a few days. How to reduce the consumption of time and memory while maintaining high effectiveness has long been a hot topic in the NAS field. The progressive NAS (PNAS) is about seven times faster than NASNet thanks to a smaller search space, a heuristic search strategy and an empirical surrogate evaluator [18]. The efficient NAS (ENAS) enormously improves the efficiency of NASNet by means of weight sharing among the basic modules in the search space [24]. To further cut the expenditure on memory, researchers turned their eyes to hypernetwork-style solutions. The one-shot NAS [3] and the proxyless NAS [4] get rid of the reinforcement or evolutionary learning controller: they assign each operation unit a probability and train once and for all, directly on the target dataset rather than on a smaller proxy dataset. The differentiable architecture search (DARTS) introduced a mathematical relaxation from discrete to continuous searching and replaced controller learning with a gradient descent updating process [19, 31].

The success of NAS in classification tasks has also stimulated endeavors in semantic segmentation scenarios. Chen et al. demonstrated the feasibility by recursively searching the encoder and decoder blocks [5]. Liu et al. devised Auto-DeepLab and brought up the idea of a hierarchical search space that covers both the 'Cell' level and the backbone network level [17]. When it comes to medical image segmentation, the NAS-Unet separately looks for the optimal basic down-sampling and up-sampling units and constructs U-Net shaped architectures; it has been tested on 2D medical image datasets including prostate MRI, liver computed tomography (CT) and nerve ultrasound images [29]. Zhu et al. developed a DARTS-style differentiable NAS U-Net for segmentation of the lung and pancreas on 2D and 3D CT datasets [33].

In this paper, we present a NAS-based solution for multi-label glioma volumetric segmentation on four modalities of structural MRI scans. Patching strategies that cut a big image into small pieces are employed because of the high resolution of the input data. The searching process takes place on small patches, while the training and test processes are carried out on larger patches. The basic units to be searched come in three types; correspondingly, there are three kinds of search spaces. NAS is in charge of finding the best building blocks for the downward and upward sampling routes in the U-Net. Since the input multimodal MRI data are four-dimensional matrices and all the deep learning modules work in their 3D versions, throughout this article we call the proposed solution NAS-3D-U-Net. The multimodal brain tumor image segmentation (BraTS) benchmark [21, 1, 2] is employed as the testbed for the developed algorithms.

The contributions of this work are threefold:

  • A NAS-based solution for the multimodal volumetric MRI glioma segmentation task is proposed, which, in the long run, can liberate network designers from laborious parameter tuning work.

  • The empirical searching strategy of learning two categories of parameters alternately on different data splits has proven effective for the brain tumor segmentation tasks on the BraTS 2019 Dataset.

  • Last but not least, we introduce a brain-wise normalization and a patching strategy designed specifically for brain MRI processing.

The remainder of this paper is organized as follows. Section 2 details the searching and training procedures of the NAS-3D-U-Net. In Section 3 we exhibit and discuss the brain tumor segmentation results. Finally, Section 4 concludes this work.

2 Method

2.1 Prerequisite

For glioma segmentation tasks, all the provided structural MRI images are 3D volumes revealing the distribution of hydrogen proton energy in brain tissues. More specifically, at the beginning of a scan, the hydrogen nuclei in the human body spin in different phases, and the axes they rotate on are oriented in the same direction as the magnetic field of the MRI scanner. Then an extra radio frequency (RF) pulse is introduced, and the protons are forced to realign to the new orientation and spin in the same phase. When the pulse stops, the protons return to their former state, during which they emit the absorbed energy. The MRI scanner monitors these emissions and maps them into gray-scale images. The speed of proton realignment and phase transition varies across tissues; as a result, by controlling the scanning time intervals we can have different types of tissue highlighted. The two tunable variables are the repetition time (TR), which decides the time slot between two RF pulses, and the time to echo (TE), which constrains the time span between the RF pulse generation and the reception of the emitted signal. In ascending order of TR and TE, the three most commonly used MRI modalities are T1-weighted, T2-weighted and fluid attenuated inversion recovery (FLAIR) (typically T1: TR 500, TE 14; T2: TR 4000, TE 90; FLAIR: TR 9000, TE 114, in milliseconds). The constants T1 (the longitudinal relaxation time) and T2 (the transverse relaxation time) reflect the respective times protons need for the realignment and the phase transition. Correspondingly, the contrast and brightness of T1-weighted and T2-weighted images are predominantly determined by the T1 and T2 properties of tissues. FLAIR is pathologically sensitive: it suppresses free fluid and lightens pathological tissues. Besides these, in this work we also take into account another T1-weighted MRI with gadolinium (T1Gd). Gadolinium works as an injected contrast agent that helps enhance the tumor areas. Instances of the four modalities mentioned above can be found in Fig. 1, which also illustrates the ground truth tumor annotations.

Figure 1: An instance of MRI slices in four modalities and the annotated tumor delineations. 'Sagittal', 'Coronal' and 'Axial' indicate the three anatomical planes. Different colors in the last column represent different levels of tumor tissue: red for the necrotic and non-enhancing tumor core, yellow for the enhancing tumor core, and blue for the edema [21]. (Bear in mind that the colors are for illustration purposes only and do not reflect the real contrast.)

2.2 Preprocessing

In this work, the preprocessing takes place under the skull, which means we do a z-score normalization and min-max scaling for each volume without taking the black background into account. Mathematically speaking, for each modality there is a mean value $\mu$ and a standard deviation $\sigma$ computed over all the non-zero valued voxels in the whole set of training volumes. Given that $X$ represents an original input MRI image and $H$, $W$ and $D$ refer to the volume size on the three dimensions, the preprocessing is carried out in the following way:

$\tilde{X}_{h,w,d} = \dfrac{X_{h,w,d} - \mu}{\sigma}, \quad \text{for } X_{h,w,d} \neq 0$   (1)

$\hat{X}_{h,w,d} = \begin{cases} \dfrac{\tilde{X}_{h,w,d} - V_{\min}}{V_{\max} - V_{\min}}\,(b - a) + a, & X_{h,w,d} \neq 0 \\ 0, & X_{h,w,d} = 0 \end{cases}$   (2)

in which $h \in [1, H]$, $w \in [1, W]$ and $d \in [1, D]$. $V_{\min}$ and $V_{\max}$ indicate the minimum and maximum values over all the $\tilde{X}_{h,w,d}$ whose corresponding $X_{h,w,d} \neq 0$. $a$ and $b$ are two constants used for discriminating the normalized brain voxels from the background. $\hat{X}$ is the preprocessed image.
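As a concrete illustration, the following sketch applies the brain-wise normalization of Eqs. (1)-(2) to one modality; the statistics $\mu$ and $\sigma$ are assumed to be precomputed over the non-zero voxels of the training set, and the function name is ours, not the authors'.

```python
import numpy as np

def brainwise_normalize(volume, mean, std, a, b):
    """Brain-wise z-score normalization followed by min-max scaling.

    `volume` is one modality of shape (H, W, D); `mean` and `std` are the
    modality-wide statistics over all non-zero voxels of the training set.
    The constants a and b (0 < a < b) map brain voxels into [a, b] so they
    stay distinguishable from the zero-valued background.
    """
    out = np.zeros_like(volume, dtype=np.float32)
    brain = volume != 0                      # skull-stripped background is 0
    z = (volume[brain] - mean) / std         # Eq. (1): z-score on brain voxels
    v_min, v_max = z.min(), z.max()
    out[brain] = (z - v_min) / (v_max - v_min) * (b - a) + a   # Eq. (2)
    return out
```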

2.3 Patching

Patching strategies cut a large input image into smaller pieces and stitch the corresponding small-sized output volumes back together. In this way, it is theoretically possible for a memory-limited GPU to deal with pictures of any size. In this work, a patching strategy which we call 'auto-fitting' is deployed. The idea of auto-fitting is to cover the brain-encapsulated space with the least number of symmetrically located volumetric patches. For each MRI image, we define a property named 'brain cube' to record the three-dimensional extent of the brain area. On each axis, let $l_b$ and $l_p$ respectively denote the lengths of the brain cube and the patch, with the assumption that $l_p \le l_b$. Then the number of patches on that axis would be $n = \lceil l_b / l_p \rceil$ and the length of the overlap between adjacent patches would be $l_p - s$, where $s$ is the moving step of the patches. Supposing the brain cube starts at 0 on that axis, the first patch starts at 0 and the moving step is $s = (l_b - l_p)/(n - 1)$, so that the first and last patches are pinned to the two ends of the brain cube. Fig. 2 illustrates an example of the patching arrangement on a sagittal slice.

Figure 2: An example of the auto-fitting patching strategy deployment. This is a sagittal slice of a T1-weighted MRI volume of size 240 x 240 x 155; the shape of each patch is 64 x 64 x 64. The white rectangle indicates the brain cube and the blue dashed line blocks represent the patches. There are nine patches overlapping with each other.
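A minimal sketch of the auto-fitting placement described above, assuming patches are pinned to both ends of the brain cube with evenly spaced, voxel-rounded steps in between; the function names and the rounding policy are our assumptions.

```python
import math
from itertools import product

def auto_fit_axis(cube_len, patch_len):
    """Place the fewest symmetric patches covering one axis of the brain cube.

    Returns patch starting offsets relative to the cube origin.
    Assumes patch_len <= cube_len; real-valued steps are rounded to voxels.
    """
    n = math.ceil(cube_len / patch_len)          # least number of patches
    if n == 1:
        return [0]
    step = (cube_len - patch_len) / (n - 1)      # overlap = patch_len - step
    return [round(i * step) for i in range(n)]

def auto_fit(brain_cube_shape, patch_shape):
    """Cartesian product of per-axis placements gives the 3D patch origins."""
    axes = [auto_fit_axis(c, p) for c, p in zip(brain_cube_shape, patch_shape)]
    return list(product(*axes))

# e.g. a 170 x 130 x 140 brain cube with 64^3 patches -> 3*3*3 = 27 origins,
# which appear as a 3 x 3 grid of nine patches on any single slice (cf. Fig. 2)
print(auto_fit((170, 130, 140), (64, 64, 64))[:3])
```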

2.4 Neural Architecture Search

2.4.1 Backbone Network:

The essence of the U-Net is the U-shaped architecture composed of mutually connected down-sampling and up-sampling blocks. In NAS-3D-U-Net, the macro structure still consists of one downward route and one upward route. Nonetheless, when it comes to the micro structure of each individual block on these routes, NAS is in charge of the organization. Following the convention of NAS, the generated building blocks for the outer network are called 'Cells'. Throughout this work, the down-sampling and up-sampling blocks are named the downward Cell (DC) and the upward Cell (UC) respectively. The DCs take responsibility for feature embedding, which compresses the resolution and extracts target-sensitive information. The UCs mix the embedded features with the previously reserved DC outputs, which store the positional information that is inevitably lost during down-sampling. The constitution of NAS-3D-U-Net is depicted in Fig. 3, taking the BraTS dataset as an instance. Ahead of the DCs there are two primary modules, P0 and P1, each made up of one 3D convolutional (Conv) layer followed by a 3D group normalization (GN) layer (we choose GN in this work because the batch size is very small [30]). P0 does not change the spatial resolution, while P1 shrinks the image to half its size. The numbers of kernels (or filters) in P0 and P1 are decided according to the number of Nodes, the basic operation union in a Cell, which will be explored in the next part. Right after the last UC, the resolution-recovered data is transformed into degrees of confidence for the three types of labels through Conv and sigmoid layers.

Figure 3: Schematic of the NAS-3D-U-Net. This figure shows an example of the three-tumor-subregion segmentation task on the BraTS 2019 dataset. The first dimension of the input shape represents the number of modalities; the remaining dimensions indicate the patch shape. Both P0 and P1 are composed of a Conv layer and a GN layer; the parameter 's' refers to the stride. There are two inputs and one output for each downward Cell (DC) and upward Cell (UC). In accordance with the disciplines of BraTS 2019, the three tumor subregions are: 1) the enhancing tumor (ET), which is equal to the enhancing tumor core; 2) the tumor core (TC), including the necrotic and non-enhancing tumor core and ET; 3) the whole tumor (WT), which contains TC and the edema.
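For reference, a plausible PyTorch sketch of the P0/P1 stem described above, i.e., a 3D Conv layer followed by GN; the kernel size, group count and channel numbers here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Stem(nn.Module):
    """Conv3d + GroupNorm stem, in the spirit of the P0/P1 modules of Fig. 3.

    stride=1 keeps the resolution (P0); stride=2 halves it (P1).
    """
    def __init__(self, in_ch, out_ch, stride):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.norm = nn.GroupNorm(num_groups=4, num_channels=out_ch)

    def forward(self, x):
        return self.norm(self.conv(x))

# x: (batch, modalities, H, W, D), e.g. a 4-modality patch
x = torch.randn(1, 4, 64, 64, 64)
p0, p1 = Stem(4, 12, stride=1), Stem(4, 12, stride=2)
print(p0(x).shape, p1(x).shape)  # P0 keeps, P1 halves the spatial size
```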

2.4.2 Searching Space:

The explanation of the searching space starts with the definition of a hybrid module (HM), the fundamental computing unit in DC and UC. As illustrated in Fig. 4, the HM is a mixture of different operations (OPs), with the assurance that all of the OPs have the same output shape. For each $OP_i$, $i \in \{1, \dots, N\}$, there is a parameter $\alpha_i$ whose softmax transformation $e^{\alpha_i} / \sum_{j=1}^{N} e^{\alpha_j}$ is assigned to the output of $OP_i$ as a weight; this weight plays as the probability indicating how much $OP_i$ contributes to the HM. The output of the HM is the weighted sum of all the OPs' outputs, $HM(x) = \sum_{i=1}^{N} \frac{e^{\alpha_i}}{\sum_j e^{\alpha_j}} \, OP_i(x)$. As the searching process goes on, the optimizer increases the $\alpha_i$ whose bonded OPs contribute more to the HM and decreases the $\alpha_i$ belonging to less important OPs. In the rest of this work, for simplicity, we call the $\alpha_i$ the hybrid parameters (HP) and the remaining parameters in the network are referred to as the kernel parameters (KP).

Figure 4: Hybrid module structure. $OP_i$ refers to the individual operation and $\alpha_i$ is the corresponding weight parameter, $i \in \{1, \dots, N\}$.

Inspired by the search space for 2D medical image segmentation tasks [29], in this work we propose three kinds of HMs: the down-sampling HM (DHM), the up-sampling HM (UHM) and the normal HM (NHM). The operation sets for each HM are listed in Table 1, from which we can see that four types of Conv modules are picked out and used in common.

HM type Operation candidates
DHM  d_conv d_dil_conv d_dep_conv d_se_conv max_pool avg_pool
UHM  u_conv u_dil_conv u_dep_conv u_se_conv
NHM  conv dil_conv dep_conv se_conv identity
Table 1: Operation candidates for three kinds of hybrid modules (HM).

The ‘conv’ represents the basic Conv layer. The ‘dil_conv’ refers to the dilated convolution, which has an enlarged receptive field [32]. The ‘dep_conv’ is the depthwise separable convolution, in which the function of a standard Conv kernel is implemented by a combination of a per-channel depthwise Conv kernel and a pointwise Conv kernel [7]. The ‘se_conv’ is short for the squeeze-and-excitation convolution, which brings in the attention mechanism by learning the significance distribution among different channels [9]. The prefix ‘d_’ in a DHM indicates that the stride is two for that Conv layer, which halves the resolution. On the contrary, the Conv operations with the prefix ‘u_’ in a UHM are transposed convolutions [25], which double the resolution. All these Conv OPs involve GN and ReLU activation layers coming after the convolutions. Aside from the Conv modules, the DHM also has the max pooling layer (‘max_pool’) and the average pooling layer (‘avg_pool’). The NHM has the ‘identity’ OP, which only does the GN and ReLU calculations. Finally, none of the HMs changes the channel size.
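As an example of a candidate OP, here is one plausible reading of the ‘se_conv’ unit: a 3D convolution whose output channels are reweighted by a squeeze-and-excitation gate [9], followed by the GN and ReLU layers mentioned above. The reduction ratio and the exact layer ordering are our assumptions, not the authors' specification.

```python
import torch
import torch.nn as nn

class SEConv3d(nn.Module):
    """A plausible 'se_conv' candidate: Conv3d + squeeze-and-excitation gate,
    then GN and ReLU. Assumes `channels` is divisible by 4 and >= reduction."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                                  # squeeze
            nn.Conv3d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                             # excite
        )
        self.norm = nn.GroupNorm(4, channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.conv(x)
        y = y * self.gate(y)          # channel-wise attention reweighting
        return self.act(self.norm(y))
```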

With hybrid modules at hand, we are able to assemble the DC and UC as exhibited in Fig. 5. Here we define a 'Node' as a cluster of HMs encircled by the red dashed lines. In this work, each Cell has three Nodes, and the numbers of HMs in these Nodes are in ascending order because the output of each previous Node is taken as an extra input signal for the next Node. Finally, the output of a Cell is a concatenation of the three Nodes' outputs, each of which is an accumulation of all the HMs in that Node.

Figure 5: Diagrams of Cells. (a) The downward Cell (DC). (b) The upward Cell (UC). The blocks labeled 'pre0' and 'pre1' represent the preprocessing Conv layers with certain stride arguments. 'D', 'U' and 'N' are short for the downward, upward and normal hybrid modules (HM) respectively. The plus signs in blue squares indicate element-wise additions and the yellow squared 'C' is concatenation along the channel dimension. The concept of the Node is illustrated with red dashed lines and labels.

According to the design of NAS-3D-U-Net displayed in Fig. 3, the two inputs of a Cell always have different shapes. Hence in Fig. 5, two preprocessing maneuvers, which are mainly Conv layers, are set up before the HMs. Assuming the MRI dataset has $m$ modalities (channels) and there are $n$ Nodes in each Cell, whose output channel we want to be $z$ times as large or small as the input channel, then the number of kernels for P0 and P1 (in Fig. 3) would be set to $m \cdot n$ and the output channel of the $i$-th DC or UC (from top to bottom in Fig. 3) would be $m \cdot n \cdot z^{i}$, in which $i$ indexes the Cells and $z$ is called the zoom factor.
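Under one consistent reading of this channel bookkeeping (with $m = 4$ modalities, the $n = 3$ Nodes per Cell used in this work, and an assumed zoom factor $z = 2$; the concrete $z$ is our illustration, not a value from the paper):

```python
# Channel bookkeeping sketch for the rule above (values illustrative).
m, n, z = 4, 3, 2
stem_kernels = m * n                                 # P0 / P1 kernels: 12
cell_channels = [m * n * z**i for i in range(1, 5)]  # depths 1..4: 24, 48, 96, 192
print(stem_kernels, cell_channels)
```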

2.4.3 Searching Strategy:

The searching process learns the parameters in the HMs and thereby decides the structure of the DC and UC. Given a learned set of $\alpha$, each HM in a Cell keeps only the OP with the highest weight, and in each Node only the two highest-ranked HMs are finally elected. Fig. 6 displays the searched architectures of the DC and UC on the BraTS 2019 dataset, giving an example of what the generated Cells look like.

Figure 6: The NAS-3D-U-Net generated Cells. (a) The searched downward Cell. (b) The searched upward Cell. The rectangles with operation names (which can be found in Table 1) represent the selected hybrid modules in the Nodes.
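A sketch of the discretization rule described above, assuming each hybrid module exposes its $\alpha$ vector; the names and data layout are ours.

```python
import torch

def derive_cell(node_hms):
    """Discretize a searched Cell: keep the strongest OP inside every hybrid
    module, then keep the two highest-ranked HMs per Node (sketch only).

    `node_hms` maps a node id to a list of (hm_name, alpha_tensor) pairs.
    """
    cell = {}
    for node, hms in node_hms.items():
        scored = [(name, alpha.softmax(0).max().item(), int(alpha.argmax()))
                  for name, alpha in hms]
        scored.sort(key=lambda t: t[1], reverse=True)   # rank HMs by top weight
        cell[node] = [(name, best_op) for name, _, best_op in scored[:2]]
    return cell

# toy usage with random hybrid parameters for two HMs in one Node
alphas = {"node1": [("hm_a", torch.randn(6)), ("hm_b", torch.randn(6))]}
print(derive_cell(alphas))
```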

In this work, we carry out an empirical searching strategy that alternately updates the hybrid parameters and the kernel parameters in the network, the details of which are summarized in Table 2. For each category of parameters there is one optimizer, and the two optimizers work sequentially in each epoch. The data for searching are separated into two parts: one for hybrid parameter learning and the rest for kernel parameter learning. Correspondingly, the loss values calculated on these two sets are referred to as the hybrid loss and the kernel loss. Once the best Cell structures are found, we replace the DCs and UCs with the searched architectures and retrain the network on the same dataset.

Searching algorithm:
1  Prepare the datasets for the hybrid parameters and the kernel parameters.
   Let M record the searched Cell structures and their counts.
   c_min: the minimum count required to elect a best Cell C*.
   E: the total number of epochs.
   C* ← ∅.
2  for e in 1 to E:
3    Get the searched Cell structure C_e.
4    The count of C_e in M += 1.
5    if the count of C_e ≥ c_min:
6      C* ← C_e.
7      break.
8    Get the hybrid loss; back propagation; update the hybrid parameters.
9    Get the kernel loss; back propagation; update the kernel parameters.
10 if C* = ∅:
11   C* ← the most common Cell structure in M.
12 return C*.
Table 2: Searching strategy in NAS-3D-U-Net.
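In PyTorch-style code, the alternating loop of Table 2 could look like the following sketch. The helper methods for splitting hybrid and kernel parameters, the optimizers and the learning rates are our assumptions, and Cell structures are assumed hashable (e.g., tuples of chosen operation indices).

```python
import torch

def search(model, hybrid_loader, kernel_loader, loss_fn, epochs, c_min):
    """Alternating search loop paraphrasing Table 2 (sketch, not the
    authors' exact code). `model.hybrid_parameters()`, `.kernel_parameters()`
    and `model.current_cell()` are assumed helper methods."""
    opt_h = torch.optim.Adam(model.hybrid_parameters(), lr=3e-4)
    opt_k = torch.optim.Adam(model.kernel_parameters(), lr=1e-3)
    counts = {}
    for epoch in range(1, epochs + 1):
        cell = model.current_cell()          # structure under current alphas
        counts[cell] = counts.get(cell, 0) + 1
        if counts[cell] >= c_min:            # a stable structure emerged
            return cell
        for x, y in hybrid_loader:           # update hybrid parameters
            opt_h.zero_grad(); loss_fn(model(x), y).backward(); opt_h.step()
        for x, y in kernel_loader:           # update kernel parameters
            opt_k.zero_grad(); loss_fn(model(x), y).backward(); opt_k.step()
    return max(counts, key=counts.get)       # fall back to most common cell
```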

3 Experiment and Results

3.1 Dataset

The BraTS 2019 dataset used in this work comes from the multimodal brain tumor segmentation challenge 2019, a contest hosted annually since 2012 [21]. Aiming to push forward computer vision solutions for brain tumor diagnosis and treatment, BraTS keeps providing abundant clinically acquired MRI scans [1, 2]. The BraTS 2019 dataset collects pre-operative multimodal MRI scans, with tumor labels verified by neuroradiologists, of subjects with glioblastoma/high-grade glioma (HGG) or low-grade glioma (LGG) from 19 institutions. The four MRI modalities are T1-weighted, T2-weighted, FLAIR and T1Gd, as described in Section 2.1. All the MRI and label volumes are in the shape of 240 x 240 x 155. The ground truth labels, pathologically confirmed by experts, have four voxel values: 1 representing the necrotic and non-enhancing tumor core, 2 referring to the edema, 4 indicating the enhancing tumor core, and 0 covering all other places. The tumor subregions considered in the evaluation system are inclusive combinations of the values 1, 2 and 4. Specifically, the enhancing tumor (ET) is 4, the tumor core (TC) includes 1 and 4, and the whole tumor (WT) equals the complete set of 1, 2 and 4. Samples of the dataset can be seen in Fig. 1. The BraTS 2019 Training Dataset is made up of the MRI scans and ground truth labels of 259 HGG and 76 LGG subjects. The BraTS 2019 Validation Dataset, intended for generalization and scalability verification, only provides the MRI volumes of 125 subjects; in this work it is leveraged as the testing dataset. All of the MRI data have undergone preprocessing including registration, resampling to millimeter resolution, and skull stripping.
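The mapping from raw label values to the three nested evaluation subregions can be expressed directly; a small sketch (the function name is ours):

```python
import numpy as np

def to_subregions(label):
    """Map BraTS label values {0, 1, 2, 4} to the three nested subregions.

    Returns a (3, H, W, D) float array with channels (ET, TC, WT)."""
    et = (label == 4)                 # enhancing tumor core
    tc = np.isin(label, (1, 4))       # necrotic/non-enhancing core + ET
    wt = np.isin(label, (1, 2, 4))    # TC + edema
    return np.stack([et, tc, wt]).astype(np.float32)
```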

3.2 Loss Function

The loss function integrated in NAS-3D-U-Net is the weighted multi-class Dice loss, which has been demonstrated to be an effective variant of the Dice loss for brain tumor segmentation tasks [10, 22], as exhibited in Eq. (3):

$\mathcal{L} = 1 - \dfrac{1}{3} \sum_{c=1}^{3} \dfrac{2 \sum_{h,w,d} Y_{c,h,w,d}\, \hat{Y}_{c,h,w,d} + \epsilon}{\sum_{h,w,d} Y_{c,h,w,d} + \sum_{h,w,d} \hat{Y}_{c,h,w,d} + \epsilon}$   (3)

in which $h$, $w$ and $d$ run over $H$, $W$ and $D$, the patch shape on the three dimensions, $c$ indexes the three tumor subregions, and $\epsilon$ is a tiny constant to avoid zero division errors. $Y_c$ indicates the $H \times W \times D$ sized matrix extracted from the ground truth label, and $\hat{Y}_c$ represents the predicted output.
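A direct PyTorch transcription of Eq. (3); the $\epsilon$ value is a placeholder, and the prediction is assumed to have already passed the sigmoid layer.

```python
import torch

def multiclass_dice_loss(pred, target, eps=1e-5):
    """Multi-class Dice loss in the spirit of Eq. (3) (sketch; eps assumed).

    pred, target: (batch, 3, H, W, D), channels = (ET, TC, WT)."""
    dims = (2, 3, 4)                                   # sum over H, W, D
    intersect = (pred * target).sum(dim=dims)
    denom = pred.sum(dim=dims) + target.sum(dim=dims) + eps
    dice = (2.0 * intersect + eps) / denom             # per sample, per class
    return 1.0 - dice.mean()                           # average the 3 classes
```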

3.3 Configurations

The NAS-3D-U-Net has been developed with a single GTX1080Ti GPU card and the PyTorch framework. For the brain-wise normalization in Eq. (2), $a$ and $b$ are set to fixed constants that keep the normalized brain voxels away from the zero-valued background. The patch size is 64 x 64 x 64 for the searching process and 128 x 128 x 128 for the training process. The hybrid parameters are initialized to 0. In a Cell, the number of Nodes is $n = 3$ and the zoom factor is $z = 2$. For the loss function in Eq. (3), $\epsilon$ is a tiny smoothing constant. In the data stream pipeline, data augmentations including random distortion, flipping and rotation are implemented on the fly. The BraTS 2019 Training Dataset is split in the 5-fold cross-validation style. The batch size is 1 for all scenarios in the searching and training processes. When stitching the patches, we take the average as the value for each overlapped voxel.
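The patch-stitching rule, averaging overlapped voxels, might be sketched as follows (the function name and data layout are ours):

```python
import numpy as np

def stitch(patches, origins, out_shape):
    """Reassemble overlapping patch predictions, averaging overlapped voxels.

    patches: list of (C, ph, pw, pd) arrays; origins: matching (h, w, d)
    starting offsets; out_shape: (C, H, W, D) of the full volume."""
    acc = np.zeros(out_shape, dtype=np.float32)
    cnt = np.zeros(out_shape[1:], dtype=np.float32)
    for patch, (h, w, d) in zip(patches, origins):
        _, ph, pw, pd = patch.shape
        acc[:, h:h+ph, w:w+pw, d:d+pd] += patch
        cnt[h:h+ph, w:w+pw, d:d+pd] += 1.0
    return acc / np.maximum(cnt, 1.0)[None]   # guard voxels never covered
```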

3.4 Searching Results

For the searching process, we set the stopping count $c_{\min} = 40$ and recorded the searched Cell structures in a hash map. One fifth of the training set is used for updating the hybrid parameters and the rest for the kernel parameters. Histories of the hybrid loss and the kernel loss are shown in Fig. 7, from which we can see that the hybrid loss oscillated a lot during the first 10 epochs and then converged to a stable state. The iteration was stopped by the break statement after the 56th epoch, when the searched structure appeared for the 40th time.

Figure 7: Searching history. Note that the learning processes for the hybrid and kernel parameters are applied alternately within the same epoch. The last epoch is highlighted by the red dashed line.

The searched Cell structures are depicted in Fig. 6. One obvious characteristic shared by the searched DC and UC is that X1 is fed into every Node while X0 only affects the first Node. This phenomenon can partially be attributed to the fact that, for both DC and UC, X1 comes directly from the previous Cell whereas X0 is a shortcut from Cells even further away. Another detail we noticed is that in the DC the resolution of all inputs is compressed in the first place, which we believe creates more diversity for the next layer. In Fig. 6, the positional information to be recovered is carried by X0, and we can see that this information is merged with the embedded features from X1 in the first Node and then propagated to the other two Nodes and the final output.

3.5 Training and Validation Results

To prove the feasibility and scalability of the NAS-3D-U-Net, in this part we compare the proposed solution with the 3D-U-Net, a manually designed architecture for brain tumor segmentation [28]. The development environment, preprocessing, patching strategies, data augmentation and configurations for the baseline algorithm are mostly identical to those of the NAS-3D-U-Net. For the training process, we deploy the 5-fold cross validation at first and then make use of the whole training dataset again; the final training run lasted 200 epochs. Fig. 8 illustrates one sample of the tumor subregions detected by the two methods, together with the accompanying ground truth labels. It can be seen that the NAS-3D-U-Net has detected most parts of the three tumor subregions, on par with what the 3D-U-Net could figure out.

Figure 8: An example of the segmentation results of the 3D-U-Net and the NAS-3D-U-Net. For illustration purposes, the tumor image has been overlaid on the T1-weighted slices. The colors for the tumor subregions have the same meanings as in Fig. 1.

Following the BraTS benchmark convention, four metrics are used to evaluate the performance of an algorithm [36]. For each tumor subregion (ET, TC and WT), let $T$ and $P$ be the binary volumes respectively extracted from the ground truth label and the network output. Then the Dice score would be

$Dice = \dfrac{2\,|T \cap P|}{|T| + |P|}.$   (4)

The Sensitivity (or Recall) reflects the true positive rate, which is

$Sens = \dfrac{|T \cap P|}{|T|}.$   (5)

The Specificity is the complement of the false positive rate, which equals

$Spec = \dfrac{|T_0 \cap P_0|}{|T_0|},$   (6)

in which $T_0$ and $P_0$ are the complements of $T$ and $P$. Besides the three volumetric similarities, the 95% Hausdorff distance measuring the difference between the tumor boundaries has also been leveraged. Given $\nabla T$ and $\nabla P$ representing the gradients of $T$ and $P$, the surfaces of the tumor areas can be expressed as $\partial T = \{\,t \mid \nabla T(t) \neq 0\,\}$ and $\partial P = \{\,p \mid \nabla P(p) \neq 0\,\}$. Then we have

$Haus95 = \max\Big\{ \operatorname*{quant95}_{t \in \partial T} \min_{p \in \partial P} \lVert t - p \rVert,\ \operatorname*{quant95}_{p \in \partial P} \min_{t \in \partial T} \lVert t - p \rVert \Big\},$   (7)

in which $\lVert \cdot \rVert$ is the Euclidean distance and $\operatorname{quant95}$ means taking the 95% quantile of the values rather than the maximum.
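For reference, these four metrics can be computed with NumPy and SciPy as in the sketch below; the surface extraction via a one-voxel binary erosion and the assumption of isotropic 1 mm voxels (matching the BraTS resampling) are ours, and both masks are assumed non-empty boolean volumes.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def volumetric_metrics(t, p):
    """Dice, Sensitivity, Specificity per Eqs. (4)-(6); t, p boolean volumes."""
    tp = np.logical_and(t, p).sum()
    dice = 2.0 * tp / (t.sum() + p.sum())
    sens = tp / t.sum()
    spec = np.logical_and(~t, ~p).sum() / (~t).sum()
    return dice, sens, spec

def hausdorff95(t, p):
    """95% Hausdorff distance (Eq. (7)) between the surfaces of t and p."""
    surf_t = t ^ ndimage.binary_erosion(t)   # voxels removed by erosion
    surf_p = p ^ ndimage.binary_erosion(p)
    pts_t, pts_p = np.argwhere(surf_t), np.argwhere(surf_p)
    d_tp, _ = cKDTree(pts_p).query(pts_t)    # each t-surface voxel -> p surface
    d_pt, _ = cKDTree(pts_t).query(pts_p)
    return max(np.percentile(d_tp, 95), np.percentile(d_pt, 95))
```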

Fig. 9 shows the comparative experiment results in terms of these four metrics; the box plots reveal the distribution characteristics.

Figure 9: Evaluation results of the 3D-U-Net (pink) and the NAS-3D-U-Net (blue). The first row shows the results on the training dataset and the second row those on the testing set. Each column represents one metric evaluated for the three tumor subregions. The median (red line) and the 25% and 75% quantiles are included in each box, accompanied by the 1.5 inter-quartile-range whiskers and the mean values (black stars).

From the Dice score and Sensitivity values we can see that it is more difficult to correctly predict the TC and ET than the WT. On the other hand, the WT does not have the smallest Hausdorff distance, which means it is not easy to anticipate the boundary of the WT. These seemingly contradictory outcomes reflect the fact that, compared with the tumor core, the edema may have a more irregular surface, which is also partially revealed in Fig. 8. In contrast with the manually fabricated 3D-U-Net, the NAS-3D-U-Net shows competitive performance. For most testing cases in Fig. 9 the NAS-3D-U-Net has smaller inter-quartile ranges; the exceptions are the Dice score and Sensitivity of the tumor core and the Hausdorff distance of the whole tumor. The exact mean values for the training and testing datasets are listed in Table 3 and Table 4 respectively. From the mean values we can also find that the NAS-3D-U-Net works about as well as the 3D-U-Net, especially on the testing dataset.

Algorithm Dice Sens Spec Haus
ET WT TC ET WT TC ET WT TC ET WT TC
3D-U-Net 0.83 0.92 0.89 0.86 0.91 0.89 1.00 1.00 1.00 3.07 4.01 3.67
NAS-3D-U-Net 0.79 0.91 0.88 0.85 0.90 0.88 1.00 1.00 1.00 3.44 3.96 4.20
Table 3: Mean values in evaluations on the training dataset.

All these comparisons demonstrate that the automatically generated network is potentially good enough, in both feasibility and scalability, to replace a manually designed network for brain tumor segmentation tasks.

Algorithm Dice Sens Spec Haus
ET WT TC ET WT TC ET WT TC ET WT TC
3D-U-Net 0.74 0.89 0.81 0.77 0.90 0.83 1.00 0.99 1.00 5.99 5.68 7.36
NAS-3D-U-Net 0.74 0.89 0.80 0.78 0.89 0.81 1.00 1.00 1.00 6.36 6.40 8.02
Table 4: Mean values in evaluations on the testing dataset.

4 Conclusion

In this work, we developed an automated machine learning solution named NAS-3D-U-Net for the 3D multimodal MRI brain tumor segmentation task. By alternately updating the two classes of parameters, the searching process ends up with the most frequently appearing cell structures, which are then used as the building blocks of the U-Net architecture. In order to feed the large 4D inputs into our networks, z-score normalization and scaling are employed only within the brain area. Moreover, NAS-3D-U-Net takes advantage of the patching strategy, with different patch sizes for the searching and training processes. On the BraTS 2019 dataset, it has been demonstrated that the AutoML-searched network achieves competitive performance in terms of both feasibility and generalization.

References

  • [1] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos (2017) Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4, pp. 170117. Cited by: §1, §3.1.
  • [2] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki, et al. (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629. Cited by: §1, §3.1.
  • [3] G. Bender, P. Kindermans, B. Zoph, V. Vasudevan, and Q. Le (2018) Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pp. 549–558. Cited by: §1.
  • [4] H. Cai, L. Zhu, and S. Han (2019) ProxylessNAS: direct neural architecture search on target task and hardware. In International Conference on Learning Representations, Cited by: §1.
  • [5] L. Chen, M. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff, H. Adam, and J. Shlens (2018) Searching for efficient multi-scale architectures for dense image prediction. In Advances in neural information processing systems, pp. 8699–8710. Cited by: §1.
  • [6] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40 (4), pp. 834–848. Cited by: §1.
  • [7] F. Chollet (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. Cited by: §2.4.2.
  • [8] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pp. 424–432. Cited by: §1.
  • [9] J. Hu, L. Shen, and G. Sun (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141. Cited by: §2.4.2.
  • [10] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein (2017) Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. In International MICCAI Brainlesion Workshop, pp. 287–297. Cited by: §3.2.
  • [11] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein (2018) No new-net. In International MICCAI Brainlesion Workshop, pp. 234–244. Cited by: §1.
  • [12] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio (2017) The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 11–19. Cited by: §1.
  • [13] K. Kamnitsas, W. Bai, E. Ferrante, S. McDonagh, M. Sinclair, N. Pawlowski, M. Rajchl, M. Lee, B. Kainz, D. Rueckert, et al. (2017) Ensembles of multiple models and architectures for robust brain tumour segmentation. In International MICCAI Brainlesion Workshop, pp. 450–462. Cited by: §1.
  • [14] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78. Cited by: §1.
  • [15] D. Kwon, M. Niethammer, H. Akbari, M. Bilello, C. Davatzikos, and K. M. Pohl (2013) PORTR: pre-operative and post-recurrence brain tumor registration. IEEE transactions on medical imaging 33 (3), pp. 651–667. Cited by: §1.
  • [16] K. Lenting, R. Verhaak, M. Ter Laan, P. Wesseling, and W. Leenders (2017) Glioma: experimental models and reality. Acta neuropathologica 133 (2), pp. 263–282. Cited by: §1.
  • [17] C. Liu, L. Chen, F. Schroff, H. Adam, W. Hua, A. L. Yuille, and L. Fei-Fei (2019) Auto-deeplab: hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 82–92. Cited by: §1.
  • [18] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy (2018) Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 19–34. Cited by: §1.
  • [19] H. Liu, K. Simonyan, and Y. Yang (2019) DARTS: differentiable architecture search. In International Conference on Learning Representations, Cited by: §1.
  • [20] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §1.
  • [21] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2014) The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34 (10), pp. 1993–2024. Cited by: §1, Figure 1, §3.1.
  • [22] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. Cited by: §3.2.
  • [23] A. Myronenko (2018) 3D mri brain tumor segmentation using autoencoder regularization. In International MICCAI Brainlesion Workshop, pp. 311–320. Cited by: §1.
  • [24] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean (2018) Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268. Cited by: §1.
  • [25] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations, Cited by: §2.4.2.
  • [26] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1.
  • [27] J. A. Schwartzbaum, J. L. Fisher, K. D. Aldape, and M. Wrensch (2006) Epidemiology and molecular pathology of glioma. Nature clinical practice Neurology 2 (9), pp. 494–503. Cited by: §1.
  • [28] F. Wang, R. Jiang, L. Zheng, C. Meng, and B. Biswal (2019) 3D u-net based brain tumor segmentation and survival days prediction. arXiv preprint arXiv:1909.12901. Cited by: §3.5.
  • [29] Y. Weng, T. Zhou, Y. Li, and X. Qiu (2019) NAS-unet: neural architecture search for medical image segmentation. IEEE Access 7, pp. 44247–44257. Cited by: §1, §2.4.2.
  • [30] Y. Wu and K. He (2018) Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19. Cited by: §2.4.1.
  • [31] Y. Xu, L. Xie, X. Zhang, X. Chen, G. Qi, Q. Tian, and H. Xiong (2020) Pc-darts: partial channel connections for memory-efficient differentiable architecture search. In International Conference on Learning Representations, Cited by: §1.
  • [32] F. Yu and V. Koltun (2016) Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations, Cited by: §2.4.2.
  • [33] Z. Zhu, C. Liu, D. Yang, A. Yuille, and D. Xu (2019) V-nas: neural architecture search for volumetric medical image segmentation. In 2019 International Conference on 3D Vision (3DV), pp. 240–248. Cited by: §1.
  • [34] B. Zoph and Q. V. Le (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §1.
  • [35] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710. Cited by: §1.
  • [36] K. J. Zülch (2013) Brain tumors: their biology and pathology. Springer-Verlag. Cited by: §1, §3.5.