Bottleneck Supervised U-Net for Pixel-wise Liver and Tumor Segmentation

10/16/2018 ∙ by Song Li, et al. ∙ City University of Hong Kong

Convolutional neural networks (CNNs) have been widely used for image processing tasks. In this paper we design a bottleneck supervised U-Net model and apply it to liver and tumor segmentation. Taking an image as input, the model outputs a segmented image of the same size, each pixel of which takes a value from 1 to K, where K is the number of classes to be segmented. The innovations of this paper are two-fold: first, we design a novel U-Net structure that includes a dense block and an inception block as the base U-Net; second, we design a double U-Net architecture based on the base U-Net, which includes an encoding U-Net and a segmentation U-Net. The encoding U-Net is first trained to encode the labels, then the encodings are used to supervise the bottleneck of the segmentation U-Net. While training the segmentation U-Net, a weighted average of dice loss (for the final output) and MSE loss (for the bottleneck) is used as the overall loss function. This approach can help retain the hidden features of input images. The model is applied to a liver tumor 3D CT scan dataset to conduct liver and tumor segmentation sequentially. Experimental results indicate that the bottleneck supervised U-Net can accomplish segmentation tasks effectively, with better performance in controlling shape distortion and reducing false positives and false negatives, as well as accelerating convergence. In addition, the approach generalizes to other base U-Net designs, which leaves room for further improvement.




1 Introduction

Convolutional Neural Networks (CNNs) and Deep Convolutional Neural Networks (DCNNs) have gained tremendous attention in recent years because they can learn very complicated features from the training dataset. Neural networks can be traced back to the last century (Le Cun et al. 1990). After years of silence, they have become popular in recent years because of the increasing data volume, which facilitates the training of complex models, and the development of hardware for fast computing. Deep learning has been used in many practical applications, such as object detection (Ren et al. 2017; Redmon and Farhadi 2017), style transfer (Luan et al. 2017; Gatys et al. 2016) and semantic image segmentation (Chen et al. 2018; Long et al. 2015; Ronneberger et al. 2015). In this paper we focus on its application in medical image segmentation and apply it to liver and tumor segmentation.

Automatic medical image segmentation is a long-standing problem and is still one of the most active research areas. Medical imaging is a technique for visualizing the interior of a body for clinical analysis and medical intervention. In recent decades, medical imaging technologies such as X-ray, CT, MRI and ultrasound have been developed for human health care. Radiologists and physicians make diagnoses by examining the generated images and videos. For example, the two applications in this paper are hand bone age assessment from X-ray images and liver tumor detection from CT scan images. Even though the accuracy of manual diagnosis has greatly improved, it depends highly on radiologists’ expertise. Misdiagnoses keep occurring because of the inevitable fatigue caused by the time-consuming work, and they can have fatal consequences. Therefore, an automatic segmentation method is required in clinical practice. This task is likely to be fulfilled by the recent fast developments in computer vision techniques.

Computer vision aims to help computers gain a high-level understanding of digital images or videos, so as to relieve manual work and reduce errors of judgement. In general, its methods fall into two categories: supervised methods and unsupervised methods. The algorithmic foundations of traditional computer vision were formed in the 1970s, including non-polyhedral and polyhedral modeling, motion estimation, edge extraction and so on. Modern computer vision is gaining popularity mainly because of the development of convolutional neural networks (CNNs) and hardware such as multi-core CPUs and GPUs. Nowadays, computer vision related tasks (such as image classification, semantic segmentation, object detection, deep reinforcement learning and so on) have been widely studied in academia and have found practical applications in industry.

Taking advantage of the development of computer vision techniques, researchers use computers to analyze the images generated from medical devices, trying to achieve results comparable to physicians.

1.1 Related Work

The task of automatic segmentation of medical images has been widely studied. These studies mainly fall into two streams: supervised methods and unsupervised methods.

Unsupervised methods learn patterns of the dataset without referring to ground truth. In Stawiaski et al. (2008), a sub-volume is manually defined at first, then a region adjacency graph is extracted using watershed segmentation. Later, liver boundaries are extracted using the minimal surfaces technique. Finally, the tumors are segmented using MAP estimation of a Markov random field (MRF). Li et al. (2013) used a complex level set model to semi-automatically segment hepatic tumors from contrast-enhanced clinical CT images. Li et al. (2012) also used a level set model to integrate image gradient, region competition and prior information for CT liver tumor segmentation. Lipková et al. (2017) used a phase separation approach. They assumed that the healthy phase and lesion phase images are corrupted by noise, and then removed the noise and separated the mixture using the Cahn-Hilliard equation. In Das and Sabut (2016), adaptive thresholding, morphological processing, and the kernel fuzzy C-means (KFCM) clustering algorithm were used to visualize and measure the tumor areas from abdominal CT images. One of the advantages of unsupervised methods is that they generalize better, since they are not learned from a certain population. However, the performance of unsupervised methods can be inferior to that of supervised ones, since they do not have ground truth for supervision.

For supervised methods, each training example is a pair consisting of an input object (usually a vector) and an output object (ground truth). After training, a function is learned from the training pairs to map new examples. When optimizing the function, the labels are used as supervision. For example, the function can be optimized to minimize the average difference between function outputs and ground truths. Some supervised methods are used in the literature for medical applications. In Zhou et al. (2008), a support vector machine (SVM) was trained to extract the tumor region from each 2D slice, then the extracted tumor contour was projected to its neighboring slices after some morphological steps. Smeets et al. (2010) took a semi-automatic level set segmentation approach to segment liver tumors. Initialization is done by a dynamic programming based spiral-scanning technique, and the level set evolves according to a speed image generated from supervised statistical pixel classification. Zhang et al. (2011) used a support vector machine (SVM) to detect tumors from liver parenchyma. All the above researchers used traditional supervised methods. Next, we discuss one class of supervised methods: the deep convolutional neural network (DCNN).

In this paper, we focus on using DCNNs for medical image segmentation and apply them to liver and tumor segmentation. In recent years, some neural network architectures have been proposed for this task. Shelhamer et al. (2017) adapted the classification networks (AlexNet, GoogLeNet and VGG net) to fully convolutional networks (FCN) to conduct end-to-end, pixels-to-pixels semantic segmentation. Similar to FCN, Badrinarayanan et al. (2015) proposed an encoder-decoder neural network architecture that achieved state-of-the-art segmentation results even without post-processing. As an extension of the above studies, Ronneberger et al. (2015) used a U-shaped neural network architecture, which consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. They use short and long skip connections between contracting layers and expanding layers to recover spatial information lost during downsampling. These networks are also widely applied to medical image segmentation. Oliveira et al. (2018) combined the Stationary Wavelet Transform with a fully convolutional neural network for retinal vessel segmentation. Havaei et al. (2017) used a cascaded 2-pathway architecture for brain tumor segmentation. Li et al. (2017) proposed a hybrid densely connected U-Net (H-DenseUNet), which uses a 2D Dense UNet to extract intra-slice features and a 3D counterpart to aggregate volumetric contexts under the spirit of the auto-context algorithm. Liu et al. (2017) proposed a 3D Anisotropic Hybrid Network (AH-Net) that transfers convolutional features learned from 2D images to 3D anisotropic volumes.

1.2 Motivation and Contribution

Compared with natural image segmentation tasks, medical image segmentation is more difficult for the following reasons. First, medical images take up much more memory than natural images. For example, each 3D CT scan volume used in this paper consumes about 1GB of memory on average. This makes it hard to feed an entire image into any model due to hardware limitations. Second, the objects in medical images are more irregular in size, shape and intensity, which makes it hard to learn patterns well. Third, the problem of false positives is serious for medical images due to the similarity between target tissues and some background tissues. Fourth, since annotating medical images is expensive and time-consuming, positive and negative cases in training datasets are highly unbalanced. This can also contribute to the third problem. Focusing on the above problems, we propose a bottleneck supervised U-Net model for pixel-wise medical image segmentation. Our contributions are as follows:

  • We design a variation of the standard U-Net by including dense blocks and inception blocks in its encoding path and bottleneck layers, so as to improve the overall performance. The structure is shown in Figure 3.

  • We design a double U-Net structure (Figure 4). During training, the encoding U-Net is first trained as an auto-encoder to learn bottleneck encodings of real labels. The encodings are then used to supervise the training of the segmentation U-Net at the bottleneck. Experiments show this approach can partly control shape distortion, reduce false positives and false negatives, and accelerate convergence.

  • Liver segmentation results on the border and tumor regions are often unfavorable due to their similarity with background regions. Therefore, we design a weight function and incorporate it into the loss function so as to strengthen supervision on such regions.

2 Technical background

2.1 Convolutional neural network

Convolutional neural network (CNN) is a class of feed-forward artificial neural networks evolved from fully connected feedforward neural networks (Figure 1(A)). Each neuron in the hidden layer and output layer is calculated by a transformation of the weighted sum of all the neurons connected to it from the previous layer. For example:

$$h_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right)$$

where $h_j$ denotes the $j$-th neuron in the hidden layer, $w_{ij}$ denotes the weight on the line from $x_i$ to $h_j$, and $b_j$ is a bias term. Since there are $n$ neurons in the input layer, $i$ takes values from $1$ to $n$. $f$ is the activation function for nonlinear transformation.
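As a small illustration of the weighted-sum-plus-activation computation described above, here is a minimal NumPy sketch of one fully connected layer. The function names, the choice of ReLU as the activation, and the toy numbers are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU, used here as an example activation."""
    return np.maximum(z, 0.0)

def dense_forward(x, W, b, activation=relu):
    """Each hidden neuron j is activation(sum_i W[i, j] * x[i] + b[j])."""
    return activation(x @ W + b)

# Toy example: 3 input neurons, 2 hidden neurons (values are arbitrary).
x = np.array([1.0, -2.0, 0.5])
W = np.array([[0.2, -0.1],
              [-0.4, 0.3],
              [0.6, 0.8]])   # W[i, j]: weight from input i to hidden neuron j
b = np.array([0.1, -0.2])

h = dense_forward(x, W, b)
print(h)  # first neuron is positive, second is clipped to 0 by ReLU
```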

Even though classical fully connected feedforward neural networks can extract features well when the number of input neurons is not very large, it is hard to extend them to tasks with images as input. Consider a $100 \times 100$ input image: in the fully connected setting there would be 10000 weights for each neuron in the next layer. The convolutional layer solves this problem by sliding a shared kernel over the input. In this way, only the weights of the kernel need to be estimated. This approach greatly reduces the total number of weights, relieves the computational burden and allows networks to go deeper. A plot of an example convolutional neural network (CNN) is shown in Figure 1(B) (cited from the website). Convolutional layers normally come with hyperparameters such as padding, stride and pooling. These hyperparameters should be carefully selected to guarantee the best performance. Similar to fully connected feedforward neural networks, an activation layer usually follows the convolutional layer to conduct a non-linear transformation. Many different kinds of activation layers have been proposed, mainly to avoid the gradient vanishing and exploding issues, such as ReLU, PReLU, TanH and Sigmoid layers. In this paper we use the ReLU activation function $f(x) = \max(0, x)$. We also use Batch Normalization (BN) layers (Ioffe and Szegedy (2015)) to normalize the layer inputs by adjusting and scaling them. There are many benefits of incorporating BN layers, such as relieving ’internal covariate shift’, serving as a regularizer to relieve overfitting, and relieving gradient vanishing and exploding.

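  Input: Values of $x$ over a mini-batch: $\mathcal{B} = \{x_1, \ldots, x_m\}$; Parameters to be learned: $\gamma$, $\beta$.
  Output: $\{y_i = \mathrm{BN}_{\gamma,\beta}(x_i)\}$.
  $\mu_{\mathcal{B}} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_i$      //mini-batch mean
  $\sigma_{\mathcal{B}}^2 \leftarrow \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2$      //mini-batch variance
  $\hat{x}_i \leftarrow \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}$      //normalize
  $y_i \leftarrow \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)$      // scale and shift
Algorithm 1 Batch normalization transform, applied to activation $x$ over a mini-batch.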
Figure 1: An example of fully connected feedforward neural network.
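The batch normalization transform of Algorithm 1 can be sketched in a few lines of NumPy. This is a training-time forward pass only; the value of the small constant `eps` and the toy data are assumptions for illustration, and running statistics for inference are not covered.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over axis 0 (the mini-batch axis).

    x has shape (m, features); gamma and beta have shape (features,).
    eps is a small constant for numerical stability (assumed value).
    """
    mu = x.mean(axis=0)                      # mini-batch mean
    var = x.var(axis=0)                      # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize
    return gamma * x_hat + beta              # scale and shift

# Two features on very different scales end up comparable after BN.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

With `gamma = 1` and `beta = 0`, each output column has approximately zero mean and unit standard deviation; the learned `gamma` and `beta` let the network undo the normalization where that is useful.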

2.2 Segmentation network: U-Net

U-Net is a U-shaped convolutional neural network used for image segmentation. Figure 2 shows the standard U-Net architecture (cited from Ronneberger et al. (2015)). It consists of a contracting path, which encodes an input image into a feature tensor at the bottleneck, and an expanding path, which decodes the feature tensor into a segmented image of the same size as the input image. As shown in Figure 2, there are short and long skip connections between the contracting path and the expanding path; each arrow indicates that the feature map on its left-hand side (LHS) is concatenated to the feature map on its right-hand side (RHS), so the concatenated feature map has as many channels as the two feature maps combined. The skip connections recover information lost along the encoding path. With the development of deep neural networks, some newly proposed modules (such as the dense block of Huang et al. (2017) and the inception block of Szegedy et al. (2016)) have been included in the U-Net architecture to improve its performance. For each layer in the dense block, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs for all subsequent layers. The use of dense connections can relieve the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse and reduce the number of parameters. In the inception block, the input is processed by convolution layers with different filter sizes to capture information at different scales. The output feature maps are then concatenated together as the block output. The structures of the dense block and inception block used in this paper can be found in Figure 3. Note that the dense block in this paper is a simplified version with only one skip concatenation.
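To make the concatenation mechanics concrete, the sketch below shows a U-Net style skip connection and a one-concatenation dense block as channel-wise concatenations in NumPy. The tensor sizes are arbitrary, and `transform` is a hypothetical stand-in for the real Batch Normalization and convolution layers, which are not implemented here.

```python
import numpy as np

def skip_concat(encoder_feat, decoder_feat):
    """Concatenate two (H, W, C) feature maps along the channel axis."""
    if encoder_feat.shape[:2] != decoder_feat.shape[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([encoder_feat, decoder_feat], axis=-1)

def simplified_dense_block(x, transform):
    """Dense block with a single skip: concatenate x with transform(x)."""
    return np.concatenate([x, transform(x)], axis=-1)

# Illustrative sizes only: 8x8 feature maps with 16 channels each.
enc = np.zeros((8, 8, 16))
dec = np.ones((8, 8, 16))
merged = skip_concat(enc, dec)

# The lambda is a placeholder for BN + convolution with unchanged shape.
dense_out = simplified_dense_block(enc, lambda t: t + 1.0)

print(merged.shape, dense_out.shape)  # channels add up: 16 + 16 = 32
```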

Figure 2: U-net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the lower edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

2.3 Auto-encoder

An auto-encoder is a type of artificial neural network used to learn efficient data encodings in an unsupervised manner. It can be used for dimensionality reduction by encoding high-dimensional data into low-dimensional features. A simple form of auto-encoder is a feed-forward, non-recurrent neural network that uses its input as the target output. An auto-encoder usually consists of an encoder part and a decoder part. Denoting the encoder function as $\phi$ and the decoder function as $\psi$, an auto-encoder tries to find $\phi, \psi = \arg\min_{\phi, \psi} \| X - \psi(\phi(X)) \|^2$, where $X$ is the input. The auto-encoder has many variants corresponding to various applications, such as the denoising auto-encoder, the sparse auto-encoder and the variational auto-encoder (VAE). The motivation of our model is that a U-Net can be trained as an auto-encoder if all skip connections are removed and the labels are used as both the input and the target. After training, the feature at the bottleneck can be regarded as the encoding of the labels.

3 Model and algorithm

This section provides details of training of the bottleneck supervised U-Net. The network structure, the weight map and the loss function are to be presented.

3.1 Network structure

This subsection first shows structure of the base U-Net, then builds bottleneck supervised U-Net based on it.

Different from the standard U-Net (Figure 2), we include inception modules and a dense module to improve the overall performance. Figure 3 shows our proposed base U-Net architecture. It has five inception modules in the encoding path, and an inception module and a dense module in the bottom transformation layers. The decoding path is similar to that of the standard U-Net. In the inception module, the input is first processed through four groups of convolution operations with different filter sizes (see Figure 3). The outputs are then concatenated together and processed by some other layers. Independently using different filter sizes in this way can improve encoding performance by capturing input information at different scales, thereby reducing information loss. The dense module used in this paper is a simplified version, which only concatenates the input with the feature map obtained after Batch Normalization and convolution. It can help relieve the vanishing-gradient problem and strengthen feature propagation during the bottom information transformation. Note that a ReLU activation layer follows each convolutional layer.

Figure 3: A variation of original U-Net. Five inception modules are put into the encoding path, an inception module and a dense module are put into the bottom layers. We do not change too much for the decoding layers.

U-Net is also a kind of auto-encoder if the skip connections are taken away and the labels are used as both model input and target. We call such a U-Net the ‘encoding U-Net’. We call the U-Net used for segmentation the ‘segmentation U-Net’; it has skip connections and uses the original image as input. The encoding U-Net is trained to learn representations (encodings) of the labels, which are the bottleneck features. Ideally, the bottleneck features of the encoding U-Net and the segmentation U-Net should be the same, because the final output should be the same. Based on this observation, we design the bottleneck supervised U-Net (BS U-Net) as shown in Figure 4. The BS U-Net is a combination of an encoding U-Net and a segmentation U-Net connected at the bottleneck. The two U-Nets have the same base structure (Figure 3), except that the skip connections in the encoding U-Net are removed. To train BS U-Net, the encoding U-Net is trained first. The segmentation U-Net is then trained using a sum of two loss functions: one minimizes the difference between the output and the label, and the other minimizes the difference between the bottleneck features of the segmentation U-Net and those of the well-trained encoding U-Net. Specifically, we use the weighted sum of the weighted dice loss and the Euclidean loss.

Figure 4: The bottleneck supervised U-Net

3.2 Weight function

When doing liver segmentation, we found that segmentation results at the border of the liver are always worse than the inner areas, especially when there are tumors at the border. The reason is that the intensity of tumors and liver borders can be similar to the background, making it hard for the neural network to learn the difference well. To force the neural network to focus more on the difficult areas, we compute weight maps of the same size as input images and include them in the loss function. Given an image and its corresponding label, computation of a weight map has two steps: first compute a distance map , each pixel of which is the distance from this pixel to the nearest pixel on the liver contour; then compute the weight map based on using the following formulas:


where the mask is a predetermined binary (0-1) matrix whose pixels equal to 1 mark the regions to focus on. The two hyperparameters, one controlling the decay with distance and one weighting the mask, are selected by repeated trials. As shown in Figure 5, a larger decay parameter leads to a slower decrease when moving away from the border, and a larger mask weight leads to larger weights over the regions induced by the mask.
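The two-step computation described above (distance map, then weight map) can be sketched as follows. The exact formulas are not recoverable from this text, so the functional form used here, a Gaussian decay in the distance plus a constant bonus on the focus mask, is purely an assumption chosen to reproduce the described behaviour; the names `sigma` and `focus_weight` are hypothetical. The distance map is computed by brute force, which is only practical for small images.

```python
import numpy as np

def contour_pixels(mask):
    """Foreground pixels with at least one 4-neighbour outside the mask."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior

def distance_to_contour(mask):
    """Euclidean distance from every pixel to the nearest contour pixel."""
    ys, xs = np.nonzero(contour_pixels(mask))
    if ys.size == 0:
        raise ValueError("mask has no contour")
    gy, gx = np.indices(mask.shape)
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    return np.sqrt(d2.min(axis=-1))

def weight_map(liver_mask, focus_mask, sigma=2.0, focus_weight=1.0):
    """ASSUMED form: 1 + exp(-D^2 / (2 sigma^2)) + focus_weight * M."""
    d = distance_to_contour(liver_mask)
    return 1.0 + np.exp(-d ** 2 / (2.0 * sigma ** 2)) + focus_weight * focus_mask

# Toy example: a 5x5 "liver" in a 9x9 slice, one focus pixel at its centre.
liver = np.zeros((9, 9))
liver[2:7, 2:7] = 1
focus = np.zeros((9, 9))
focus[4, 4] = 1
w = weight_map(liver, focus, sigma=2.0, focus_weight=3.0)
```

Under this assumed form the weight peaks on the liver border, decays more slowly for larger `sigma`, and is raised by `focus_weight` wherever the focus mask is 1.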

Figure 5: Plots of an example of original CT slice image, liver segmentation image, lesion segmentation image, weight map with , weight map with , weight map with .

3.3 Loss function

The loss function has two parts. One is the dice loss used to supervise the final output. The other is the MSE loss used to supervise the bottleneck. Given label and output , the dice loss function is defined as follows:


where is the summation of all elements of the inside matrix. equals if and otherwise. Dice loss has the advantage of relieving data imbalance problem, which is ideal for medical images segmentation.

Denote two bottleneck features generated by encoding U-Net and segmentation U-Net respectively as and respectively. The Euclidean loss of and is:


where takes all the locations of and .

To strengthen supervision on the border areas and regions induced by , we include the weight map calculated from Equations (1) and (2) into the dice loss. The weighted dice loss is calculated as follows:


The total loss of BS U-Net is a weighted summation of the above dice loss and Euclidean loss.


where .
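The combined objective can be sketched in NumPy as below. Since the formulas themselves are missing from this text, the code uses the common soft dice formulation, a mean-squared bottleneck term, and a convex combination with weight `lam`; these forms, the smoothing constant `eps`, and all names are assumptions rather than the paper's exact definitions.

```python
import numpy as np

def dice_loss(y_true, y_pred, weight=None, eps=1e-7):
    """Soft dice loss; an optional per-pixel weight map gives a weighted variant."""
    if weight is None:
        weight = np.ones_like(y_true, dtype=float)
    inter = np.sum(weight * y_true * y_pred)
    denom = np.sum(weight * y_true) + np.sum(weight * y_pred)
    return 1.0 - 2.0 * inter / (denom + eps)

def bottleneck_mse(enc_feat, seg_feat):
    """Mean squared difference between the two bottleneck tensors."""
    return np.mean((enc_feat - seg_feat) ** 2)

def total_loss(y_true, y_pred, enc_feat, seg_feat, weight=None, lam=0.5):
    """Weighted sum of the output loss and the bottleneck loss (lam assumed)."""
    return (lam * dice_loss(y_true, y_pred, weight)
            + (1.0 - lam) * bottleneck_mse(enc_feat, seg_feat))

# Toy example: one of two foreground pixels is predicted.
y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([1.0, 0.0, 0.0, 0.0])
enc = np.zeros(2)   # bottleneck of the encoding U-Net (placeholder values)
seg = np.ones(2)    # bottleneck of the segmentation U-Net (placeholder values)

loss = total_loss(y_true, y_pred, enc, seg)
```

Here the dice term is 1 - 2/3 and the bottleneck term is 1, so equal weighting gives a total of about 2/3. In practice the encoding U-Net would be frozen while this loss trains the segmentation U-Net.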

4 Application

In this section, we apply our method to the dataset of the Liver Tumor Segmentation (LiTS) challenge organized by CodaLab. The dataset consists of 130 3D abdominal CT scans, each of which contains the liver as well as some other organs. The challenge is mainly about liver tumor segmentation, but it also includes evaluation of liver segmentation results. An example 3D CT volume visualized by slices in different directions is shown in Figure 7. We follow the routine of most existing studies, which use the second plot (coronal position) as the model input. In addition, we use a cascaded approach to perform liver and tumor segmentation sequentially. The procedure for preprocessing raw 3D CT data is as follows:

  1. Pixels with Hounsfield Units (HU) larger than 250 are set to 250, lower than -200 are set to -200;

  2. All the HUs are normalized to 0-255 using the min-max method;
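The two preprocessing steps above can be sketched as follows. The clipping window of -200 to 250 HU comes from the text; using the window bounds themselves as the minimum and maximum of the min-max scaling is an assumption (it coincides with per-volume min-max whenever the volume contains values at or beyond both bounds), and the function name is hypothetical.

```python
import numpy as np

def preprocess_ct(volume_hu, hu_min=-200.0, hu_max=250.0):
    """Clip Hounsfield Units to [hu_min, hu_max], then rescale to 0-255."""
    clipped = np.clip(volume_hu.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min) * 255.0

# Air, lower bound, soft tissue, upper bound, dense bone.
sample = np.array([-1000, -200, 25, 250, 3000])
print(preprocess_ct(sample))  # [0, 0, 127.5, 255, 255]
```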

4.1 Liver segmentation

This section shows details of liver segmentation carried out using BS U-Net. Following the existing literature, two different kinds of model input are used: (1) a single slice as a one-channel input; (2) a concatenation of three consecutive slices as a three-channel input. The purpose of the second approach is to introduce more contextual information. As described in Section 3.1, training the bottleneck supervised U-Net has two steps: first train the encoding U-Net, then train the segmentation U-Net. Only positive slices, which contain liver, are used for training both U-Nets. Data augmentation is used when training BS U-Net: each input image is first scaled up by a random factor and then randomly cropped back to the original input size. For both U-Nets, the data is finally normalized to the 0-1 range before being fed into the network. Both U-Nets use the same hyper-parameters, with the learning rate adjusted at a preset epoch. Figure 6 shows the loss as a function of the number of iterations. Both BS U-Net and standard U-Net converge, and BS U-Net converges faster. Figure 8 shows some representative predictions that illustrate the advantage of BS U-Net. As shown in the figure, BS U-Net significantly reduces the occurrence of false positives (as in the first, second, and fourth rows) and false negatives (the third row). Note that even though the input image resolution in the third row is lower, BS U-Net successfully segments out the target liver.
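The scale-then-crop augmentation mentioned above can be sketched as below. The scale range (up to `max_scale = 1.2`) and the nearest-neighbour resizing are assumptions for illustration, since the actual values and interpolation are not given in this text. The image and its label are stacked along the last axis so that both receive the same random transform.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, ...) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]

def random_scale_crop(img, rng, max_scale=1.2):
    """Scale up by a random factor in [1, max_scale), then crop to the input size."""
    h, w = img.shape[:2]
    s = rng.uniform(1.0, max_scale)
    big = resize_nearest(img, int(round(h * s)), int(round(w * s)))
    top = rng.integers(0, big.shape[0] - h + 1)
    left = rng.integers(0, big.shape[1] - w + 1)
    return big[top:top + h, left:left + w]

# Toy slice and label, augmented together.
rng = np.random.default_rng(0)
image = np.arange(64.0).reshape(8, 8)
label = (image > 31).astype(float)
pair = np.stack([image, label], axis=-1)   # shape (8, 8, 2)
aug = random_scale_crop(pair, rng)
```

For real CT slices a smoother interpolation would normally be used for the image, while nearest-neighbour remains the appropriate choice for the label so that class values are not blended.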

Figure 6: Plots of dice loss of standard U-Net with 1-channel input, dice loss of BS U-Net with 1-channel input, dice loss of BS U-Net with 3-channel input, dice loss of standard U-Net with 1-channel input, Euclidean loss of BS U-Net with 1-channel input, Euclidean loss of standard U-Net with 3-channel input.

We submitted the segmentation results of standard U-Net and BS U-Net to the official competition. Our purpose is not to win the competition, which requires many tricks and repeated tuning of the parameters; therefore, we just focus on comparing the metrics. The dice per case (DPC), dice global (DG), volume overlap error (VOE) and relative volume difference (RVD) on the test set are shown in Table 1 (calculation methods of these metrics can be found on the website). We mainly focus on DPC and DG. The single-slice original U-Net (Figure 2) performs worse than BS U-Net (Figure 4) and STD U-Net (Figure 3), meaning that including the dense block and inception block indeed improves the overall performance. For both single-slice input and 3-slice concatenated input, the difference between standard U-Net and BS U-Net on dice global is 0.001 (0.1 percentage points), but that on dice per case is 0.002 (0.2 percentage points). This means that there are a lot of minor corrections (Figure 8). It is a little surprising that 3-slice input works worse than single-slice input. The result of 1S BS U-Net ranked third on the leaderboard when this work was done. Note that we did not use any post-processing techniques or fine-tune the parameters, with which the results could be further improved.

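Model          DPC     DG      VOE    RVD    ASSD   MSD     RMSD
1S Ori U-Net   0.957   0.960   0.080  0.019  1.503  70.076  4.256
1S BS U-Net    0.9610  0.9640  0.075  0.018  1.419  47.217  3.831
1S STD U-Net   0.9590  0.9630  0.078  0.016  1.540  57.106  4.236
3S BS U-Net    0.9600  0.9620  0.077  0.021  1.543  55.696  4.307
3S STD U-Net   0.9580  0.9610  0.079  0.020  1.470  76.054  4.174
Table 1: Metrics on the test dataset. 1S: single-slice input; 3S: 3-slice input. DPC: dice per case; DG: dice global; VOE: volume overlap error; RVD: relative volume difference; ASSD: average symmetric surface distance; MSD: maximum surface distance; RMSD: root mean square symmetric surface distance.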
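To see why dice per case and dice global can move by different amounts, the sketch below computes both on a toy example. The definitions used (per-case dice averaged over volumes versus a single dice over all volumes pooled together) follow common usage for this challenge and are stated here as an assumption; the convention of returning 1.0 for an empty prediction and empty ground truth is also an assumption.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient of two boolean arrays (1.0 if both are empty)."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else 2.0 * inter / total

def dice_per_case(preds, truths):
    """Average of the per-volume dice scores."""
    return float(np.mean([dice(p, t) for p, t in zip(preds, truths)]))

def dice_global(preds, truths):
    """Dice computed after pooling the voxels of all volumes."""
    inter = sum(np.logical_and(p, t).sum() for p, t in zip(preds, truths))
    total = sum(p.sum() + t.sum() for p, t in zip(preds, truths))
    return 2.0 * inter / total

# Case A: large organ, perfect prediction. Case B: small organ, half missed.
truth_a = np.ones(100, dtype=bool)
pred_a = np.ones(100, dtype=bool)
truth_b = np.array([1, 1, 1, 1, 0, 0], dtype=bool)
pred_b = np.array([1, 1, 0, 0, 0, 0], dtype=bool)

dpc = dice_per_case([pred_a, pred_b], [truth_a, truth_b])
dg = dice_global([pred_a, pred_b], [truth_a, truth_b])
```

In this toy case the small, poorly segmented volume pulls dice per case down to about 0.83 while dice global stays near 0.99, so corrections on small or difficult cases show up more strongly in dice per case.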
Figure 7: Different positions of an example 3D CT image. Upper left: oblique coronal position of the image; upper right: the upper left image with labels added, where the yellow area is the liver and the green area is the tumor on the liver; lower left: sagittal position of the image; lower right: oblique-axial position of the image.

Figure 8: Three-slice segmentation results: the first column shows the input middle slice before being divided by 255, the second column shows segmentation results using the standard U-Net model, and the third column shows segmentation results using the bottleneck supervised U-Net (BS U-Net).

4.2 Tumor segmentation

In this subsection, we use BS U-Net for tumor segmentation based on the liver segmentation results in Section 4.1. This is called a cascaded approach. The system of two neural networks, one for liver segmentation and one for tumor segmentation, is also called a cascaded neural network structure. We again use the base network in Figure 3. The encoding U-Net and the segmentation U-Net share the same hyper-parameters, with the learning rate adjusted at a preset epoch. Figure 9 shows the preprocessing steps for input images. The pipeline of image preprocessing for tumor segmentation is:

  1. Use the liver mask to mask out the background;

  2. Crop out a rectangular area containing the liver, with a fixed pixel margin at the top, bottom, left and right;

  3. Pad the cropped image to a square whose side is the maximum of the cropped image’s length and width;

  4. Rescale the square image to the network input size.

Figure 9: The pipeline of image preprocessing for tumor segmentation.
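The four preprocessing steps listed above can be sketched in NumPy as follows. The margin of 5 pixels, the output size of 64, zero padding centred around the crop, and nearest-neighbour rescaling are all assumptions for illustration, since the actual values are not recoverable from this text; the function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]

def crop_liver_region(ct_slice, liver_mask, margin=5, out_size=64):
    mask = liver_mask.astype(bool)
    if not mask.any():
        raise ValueError("slice contains no liver")
    # Step 1: mask out the background.
    masked = np.where(mask, ct_slice, 0)
    # Step 2: crop the liver bounding box plus a margin, clamped to the image.
    ys, xs = np.nonzero(mask)
    top = max(ys.min() - margin, 0)
    bottom = min(ys.max() + margin + 1, mask.shape[0])
    left = max(xs.min() - margin, 0)
    right = min(xs.max() + margin + 1, mask.shape[1])
    crop = masked[top:bottom, left:right]
    # Step 3: zero-pad the shorter side so the crop becomes square.
    side = max(crop.shape)
    pad_h, pad_w = side - crop.shape[0], side - crop.shape[1]
    square = np.pad(crop, ((pad_h // 2, pad_h - pad_h // 2),
                           (pad_w // 2, pad_w - pad_w // 2)))
    # Step 4: rescale to the network input size.
    return resize_nearest(square, out_size, out_size)

# Toy slice with constant intensity 7 and a rectangular "liver".
ct = np.full((40, 40), 7.0)
liver = np.zeros((40, 40))
liver[10:20, 5:30] = 1
patch = crop_liver_region(ct, liver, margin=2, out_size=16)
```

The same crop box and padding would also have to be applied to the tumor label during training, and inverted at inference time to map predictions back to the original slice.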

To relieve the computational burden and reduce false negatives, only slices that contain liver are used as input. During training, the data augmentation methods include: (1) the images are first scaled up and then randomly cropped back to the input size; (2) the images are rotated by a randomly chosen angle. Since our focus is to illustrate the strength of BS U-Net, we do not commit ourselves to tuning hyperparameters or exploring preprocessing and data augmentation methods. Figure 10 shows the losses as a function of the number of iterations. Segmentation results of BS U-Net and standard U-Net were submitted to the official challenge website; the feedback metrics are shown in Table 2. Both dice per case and dice global of BS U-Net are better than those of standard U-Net.

Model       DPC     DG      VOE    RVD     ASSD   MSD    RMSD
BS U-Net    0.569   0.751   0.437  -0.228  1.702  9.130  2.426
STD U-Net   0.5520  0.7290  0.414  -0.101  1.395  8.324  2.069
Table 2: Metrics on the test dataset.

Figure 10: Plots of dice loss of BS U-Net, MSE loss of BS U-Net, dice loss of standard U-Net.

5 Conclusion and discussion

This paper extends the research on U-Net for image segmentation. First, we propose a variation of the U-Net architecture that includes a dense block and inception blocks. Then we design a double U-Net architecture to train the Bottleneck Supervised U-Net. For liver segmentation, we calculate a weight map and include it in the loss function to focus more on the border regions. To evaluate the performance of BS U-Net and standard U-Net, we tested on the test set of the LiTS dataset for both liver and tumor segmentation tasks. Results were submitted to the official LiTS challenge website. Feedback metrics show that BS U-Net has larger dice per case and dice global. In addition, visualization results show that BS U-Net performs better in controlling shape distortion and reducing false positives and false negatives, as well as accelerating convergence.

The structure of BS U-Net can be generalized to any variation of base U-Net, which makes it a great choice to further improve the performance. In other words, when you have designed a U-Net and achieved good performance, you can try using it as the base U-Net and train a BS U-Net to further improve the performance.

For the LiTS dataset, we use equal weights for the dice loss and the bottleneck MSE loss. Still, it is possible to use other weights to improve the performance. In our experience, there is no general method; the best weight has to be found through repeated trials.

In this paper we additionally supervise one bottleneck tensor. In fact, all the tensors in the bottleneck and decoding path can be supervised. However, too much supervision can lead to over-fitting. Finding the optimal amount of supervision is desirable, and we leave this topic to future research.


  • Badrinarayanan et al. (2015) Badrinarayanan, V., Handa, A., Cipolla, R., 2015. Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. Computer Science .
  • Chen et al. (2018) Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2018. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis & Machine Intelligence 40, 834–848.
  • Das and Sabut (2016) Das, A., Sabut, S.K., 2016. Kernelized fuzzy c-means clustering with adaptive thresholding for segmenting liver tumors. Procedia Computer Science 92, 389–395.
  • Gatys et al. (2016) Gatys, L.A., Ecker, A.S., Bethge, M., 2016. Image style transfer using convolutional neural networks, in: Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, IEEE. pp. 2414–2423.

  • Havaei et al. (2017) Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H., 2017. Brain tumor segmentation with deep neural networks. Medical image analysis 35, 18–31.
  • Huang et al. (2017) Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks., in: CVPR, p. 3.
  • Ioffe and Szegedy (2015) Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 .
  • Le Cun et al. (1990) Le Cun, Y., Matan, O., Boser, B., Denker, J.S., Henderson, D., Howard, R., Hubbard, W., Jackel, L., Baird, H., 1990. Handwritten zip code recognition with multilayer networks, in: Pattern Recognition, 1990. Proceedings., 10th International Conference on, IEEE. pp. 35–40.
  • Li et al. (2012) Li, B.N., Chui, C.K., Chang, S., Ong, S.H., 2012. A new unified level set method for semi-automatic liver tumor segmentation on contrast-enhanced ct images. Expert Systems with Applications 39, 9661–9668.
  • Li et al. (2013) Li, C., Wang, X., Eberl, S., Fulham, M., Yin, Y., Chen, J., Feng, D.D., 2013. A likelihood and local constraint level set model for liver tumor segmentation from ct volumes. IEEE Transactions on Biomedical Engineering 60, 2967–2977.
  • Li et al. (2017) Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A., 2017. H-denseunet: Hybrid densely connected unet for liver and liver tumor segmentation from ct volumes. arXiv preprint arXiv:1709.07330.
  • Lipková et al. (2017) Lipková, J., Rempfler, M., Christ, P., Lowengrub, J., Menze, B.H., 2017. Automated unsupervised segmentation of liver lesions in ct scans via cahn-hilliard phase separation. arXiv preprint arXiv:1704.02348.
  • Liu et al. (2017) Liu, S., Xu, D., Zhou, S.K., Mertelmeier, T., Wicklein, J., Jerebko, A., Grbic, S., Pauly, O., Cai, W., Comaniciu, D., 2017. 3d anisotropic hybrid network: Transferring convolutional features from 2d images to 3d anisotropic volumes. arXiv preprint arXiv:1711.08580.
  • Long et al. (2015) Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440.
  • Luan et al. (2017) Luan, F., Paris, S., Shechtman, E., Bala, K., 2017. Deep photo style transfer. CoRR, abs/1703.07511.
  • Oliveira et al. (2018) Oliveira, A.F.M., Pereira, S.R.M., Silva, C.A.B., 2018. Retinal vessel segmentation based on fully convolutional neural networks. Expert Systems with Applications.
  • Redmon and Farhadi (2017) Redmon, J., Farhadi, A., 2017. Yolo9000: better, faster, stronger. arXiv preprint.
  • Ren et al. (2017) Ren, S., He, K., Girshick, R., Sun, J., 2017. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence 39, 1137–1149.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer. pp. 234–241.
  • Shelhamer et al. (2017) Shelhamer, E., Long, J., Darrell, T., 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence 39, 640–651.
  • Smeets et al. (2010) Smeets, D., Loeckx, D., Stijnen, B., De Dobbelaer, B., Vandermeulen, D., Suetens, P., 2010. Semi-automatic level set segmentation of liver tumors combining a spiral-scanning technique with supervised fuzzy pixel classification. Medical image analysis 14, 13–20.
  • Stawiaski et al. (2008) Stawiaski, J., Decenciere, E., Bidault, F., 2008. Interactive liver tumor segmentation using graph-cuts and watershed, in: Workshop on 3D segmentation in the clinic: a grand challenge II. Liver tumor segmentation challenge. MICCAI, New York, USA.
  • Szegedy et al. (2016) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826.
  • Zhang et al. (2011) Zhang, X., Tian, J., Xiang, D., Li, X., Deng, K., 2011. Interactive liver tumor segmentation from ct scans using support vector classification with watershed, in: Engineering in medicine and biology society, EMBC, 2011 annual international conference of the IEEE, IEEE. pp. 6005–6008.
  • Zhou et al. (2008) Zhou, J., Xiong, W., Tian, Q., Qi, Y., Liu, J., Leow, W.K., Han, T., Venkatesh, S.K., Wang, S.c., 2008. Semi-automatic segmentation of 3d liver tumors from ct scans using voxel classification and propagational learning, in: MICCAI workshop, p. 43.