Normalization in Training Deep Convolutional Neural Networks for 2D Bio-medical Semantic Segmentation

09/11/2018 · by Xiao-Yun Zhou, et al. · Imperial College London

2D bio-medical semantic segmentation is important for surgical robotic vision. Segmentation methods based on Deep Convolutional Neural Networks (DCNN) out-perform conventional methods in terms of both accuracy and automation. One common issue in training a DCNN is internal covariate shift, where the convolutional kernels are trained to fit the changing distribution of the input features, decreasing both the training speed and the performance. Batch Normalization (BN) was the first method proposed to address internal covariate shift and is widely used. Instance Normalization (IN) and Layer Normalization (LN) were proposed later but are used much less than BN. Group Normalization (GN) was proposed very recently and has not yet been applied to 2D bio-medical semantic segmentation. Most DCNN-based bio-medical semantic segmentation adopts BN as the normalization method by default, without reviewing its performance. In this paper, four normalization methods - BN, IN, LN and GN - are compared and reviewed in detail specifically for 2D bio-medical semantic segmentation. The results show that GN out-performed the other three normalization methods - BN, IN and LN - in 2D bio-medical semantic segmentation regarding both accuracy and robustness. Unet is adopted as the basic DCNN structure. 37 RVs from both asymptomatic and Hypertrophic Cardiomyopathy (HCM) subjects and 20 aortas from asymptomatic subjects were used for the validation. The code and trained models will be available online.


I Introduction

Bio-medical semantic segmentation, which labels the class/anatomy/prosthesis of each pixel/voxel in an image/volume, is important for both intra-operative surgical robotic navigation and path planning. For example, segmenting the Right Ventricle (RV) from Magnetic Resonance (MR) images intra-operatively is essential to instantiate 3D RV shapes and hence to navigate robotic cardiac interventions [2]. Segmenting markers on fenestrated stent grafts is useful for instantiating 3D stent graft shapes and hence 3D robotic path planning for Fenestrated Endovascular Aortic Repair (FEVAR) [3].

Conventional segmentation methods for both natural and bio-medical images are usually based on features (edges, regions, angles, etc.), which need an expert-designed feature extractor and classifier, while recent segmentation methods based on Deep Convolutional Neural Networks (DCNN) extract and classify the features automatically with multiple non-linear modules [4]. The Fully Convolutional Network (FCN) was the first DCNN to realize pixel-level classification, and hence semantic segmentation, by using convolutional layers, deconvolutional layers and a skip architecture [5]. Ronneberger et al. introduced the FCN into 2D bio-medical semantic segmentation, proposed U-Net with dense skip architectures, and achieved reasonable segmentation results on neuronal structure and cell segmentation [6]. A systematic review of the application of DCNNs in bio-medical image analysis, including segmentation, classification, detection, registration and other tasks, was carried out by Litjens et al. [7]. Bio-medical semantic segmentation can be divided into 3D [8] and 2D [9], based on the dimension of the convolution. In this paper, we focus on 2D DCNN-based bio-medical semantic segmentation and omit the qualifier "2D" in the following content.

Most previous research on bio-medical semantic segmentation focused on architecture design, loss functions, and network cascades for specific tasks. For example, AtriaNet, composed of multi-scale and dual-path convolutional architectures, was proposed for left atrial segmentation from late gadolinium-enhanced MR imaging [10]. A hierarchical DCNN with a two-stage FCN and a dice-sensitivity-like loss function was designed to segment breast tumors from dynamic contrast-enhanced MR imaging [11]. The thrombus was segmented from CT images with DetectNet, an FCN and holistically-nested edge detection [12]. An equally-weighted focal Unet, combining a focal loss with Unet, was proposed to segment the small markers from fluoroscopic images in FEVAR [13].

The main operation in a DCNN is to apply convolutional kernels with trainable parameters to feature maps to extract new features iteratively. During the training of a DCNN, the input of a layer depends on all the parameters/values in its previous layers/feature maps. Small changes in the first input feature map/image batch accumulate and amplify along the depth of the network, causing later layers to be trained to fit these changes rather than the real and useful content. This phenomenon is called internal covariate shift [14] and is more pronounced when the training images have large variance. Batch Normalization (BN) [14] was the first proposed solution for internal covariate shift; it normalizes the feature map along the channel direction and retains the representation capacity of the DCNN by re-scaling and re-translating the normalized feature map. It is the most widely used DCNN normalization method for both bio-medical and natural semantic segmentation.

Later, Instance Normalization (IN) was proposed to normalize the feature map along both the batch and channel dimensions for image stylization [15]. Layer Normalization (LN) was proposed to normalize the feature map along the batch dimension for recurrent neural networks [16]. Group Normalization (GN) was proposed to normalize the feature map along the batch dimension while dividing the channel dimension into groups, for image classification and instance segmentation [17]. Here, "instance" segmentation, which labels both the class and the instance of a pixel, is different from semantic segmentation. Weight normalization was proposed to normalize the trainable weights of a DCNN, with validations on supervised image recognition, generative modelling and deep reinforcement learning; its accuracy improvement is similar to that of BN in image classification [18]. Cosine normalization was proposed to use cosine similarity instead of the dot product in DCNNs to decrease internal covariate shift, with limited validations shown in [19]. Cosine normalization is complex to implement, as it requires rewriting all the convolution and deconvolution functions.

Fig. 1: (a) a segmentation example of cars, people, trees, etc. in a natural image from Cityscapes [20]; (b) a segmentation example of the RV from an MR image.

These normalization methods were proposed for different tasks; no review or comparison of their performance on semantic segmentation has been published in either the bio-medical or the natural-image community. The comparison in [17] between BN, IN, LN and GN is for image classification, and that between BN and GN is for instance segmentation, both of which differ from semantic segmentation. In natural semantic segmentation, targets are usually cars, people, trees, etc. (as shown in Fig. 1a), indicating shareable parameters. Fine-tuning or extracting features from pre-trained models is popular, which discourages exploring normalization methods; BN is used by default without comparing its performance with other normalization methods. In bio-medical semantic segmentation, the target is a specific anatomy or prosthesis, i.e. the RV in Fig. 1b. A network trained from scratch is common, memory-efficient and accurate, allowing different normalization methods to be explored.

In this paper, the four most widely-applied normalization methods - BN, IN, LN, and GN - are reviewed and compared specifically for bio-medical semantic segmentation. Unet is selected as the basic network architecture due to its wide application and performance advantage. The network details, the four normalization methods, the data collection for the RV and aorta, and the implementation details are introduced in section II. Detailed comparisons regarding performance, hyper-parameters, cross validation, etc. are shown in section III. It is shown that GN out-performed the other three normalization methods in both the RV and aortic segmentation, even though BN is currently the most widely used. The discussion and conclusion are in section IV and section V respectively.

II Methodology

Systematic details about DCNNs can be found in [21]; this paper focuses on explaining the concepts of data propagation, network architecture and loss function in section II-A. The algorithms of BN, IN, LN and GN are explained in sections II-B, II-C, II-D and II-E respectively. The data collection and implementation details are given in section II-F.

Fig. 2: The main operation in a DCNN: extracting a new feature map from the input feature map $\mathrm{F}^{\rm i}$ with trainable convolutional kernels $\mathrm{K}$.

II-A Network Details

With an input feature map $\mathrm{F}^{\rm i}\in\mathbb{R}^{N\times H\times W\times C}$ (the first feature map is the image batch), where $N$ is the batch size, $H$ is the height, $W$ is the width and $C$ is the channel number, a trainable kernel $\mathrm{K}\in\mathbb{R}^{M\times M\times C}$ moves along the height and width of $\mathrm{F}^{\rm i}$, as shown in fig. 2, producing an output feature map:

$\mathrm{F}^{\rm o}_{n,h,w} = \sum_{i=1}^{M}\sum_{j=1}^{M}\sum_{c=1}^{C}\mathrm{F}^{\rm i}_{n,\,h\cdot S+i,\,w\cdot S+j,\,c}\cdot\mathrm{K}_{i,j,c}$   (1)

where $M$ is the kernel size, $H^{\rm o}=H\oslash S$ and $W^{\rm o}=W\oslash S$, where $\oslash$ is exact division and $S$ is the convolutional stride. When $S>1$, the feature spatial dimension decreases. When $S<1$, the feature spatial dimension increases. For extracting richer features, multiple kernels $\mathrm{K}$ are trained, resulting in $\mathrm{F}^{\rm o}\in\mathbb{R}^{N\times H^{\rm o}\times W^{\rm o}\times C^{\rm o}}$.

Fig. 3: The structure of Unet used in this paper, Conv - convolution, Deconv - deconvolution.

Unet, a widely applied DCNN structure with performance advantages for bio-medical semantic segmentation, is used as the basic framework in this paper; its architecture is shown in fig. 3. It gradually increases the receptive field (the pixels each output sees) with max-pooling layers, resulting in decreased spatial dimensions. Unet then recovers and increases the spatial dimension with deconvolutional layers, which are similar to convolutional layers but with $S<1$.

An increased receptive field is useful for extracting semantic information. However, the decreased spatial dimension is harmful to spatial information. Skip connections are used to concatenate the feature maps from shallower and deeper layers to combine the semantic and spatial information.

All the convolutional and deconvolutional layers are followed by a ReLU activation and dropout, except the final $1\times1$ convolutional layer which predicts the class probability; it is followed only by a ReLU activation, without dropout.
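The following is a minimal TensorFlow 1.x sketch of the Unet pattern described above (conv + ReLU + dropout blocks, max-pooling, deconvolution with stride 2, skip concatenation, and a final 1x1 convolution). The depth, filter counts and function names are illustrative assumptions and do not reproduce the exact architecture of fig. 3.

```python
import tensorflow as tf

def conv_block(x, filters, training, name):
    with tf.variable_scope(name):
        x = tf.layers.conv2d(x, filters, kernel_size=3, padding='same',
                             activation=tf.nn.relu)
        x = tf.layers.dropout(x, rate=0.25, training=training)
        return x

def mini_unet(images, num_classes, training, root_filters=64):
    # Encoder: convolutions increase channels, max-pooling halves H and W.
    e1 = conv_block(images, root_filters, training, 'enc1')
    p1 = tf.layers.max_pooling2d(e1, pool_size=2, strides=2)
    e2 = conv_block(p1, root_filters * 2, training, 'enc2')
    p2 = tf.layers.max_pooling2d(e2, pool_size=2, strides=2)

    bottom = conv_block(p2, root_filters * 4, training, 'bottom')

    # Decoder: deconvolutions (stride 2) recover H and W; skip connections
    # concatenate shallow and deep feature maps.
    u2 = tf.layers.conv2d_transpose(bottom, root_filters * 2, kernel_size=3,
                                    strides=2, padding='same')
    d2 = conv_block(tf.concat([u2, e2], axis=-1), root_filters * 2, training, 'dec2')
    u1 = tf.layers.conv2d_transpose(d2, root_filters, kernel_size=3,
                                    strides=2, padding='same')
    d1 = conv_block(tf.concat([u1, e1], axis=-1), root_filters, training, 'dec1')

    # Final 1x1 convolution with ReLU only (no dropout) predicts the class scores.
    return tf.layers.conv2d(d1, num_classes, kernel_size=1, activation=tf.nn.relu)
```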

Softmax is used to transform the final feature map into probabilities $\mathrm{P}$ and cross-entropy is used as the loss function:

$L = -\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C}\mathrm{G}_{n,h,w,c}\,\log(\mathrm{P}_{n,h,w,c})$   (2)

where $\mathrm{G}$ is the ground truth and $\mathrm{P}$ is the prediction probability. Stochastic Gradient Descent (SGD) is adopted to train the kernels $\mathrm{K}$ to minimize the loss. When the distribution of the input feature map changes, $\mathrm{K}$ is trained to fit this distribution change rather than the real content or the loss function, which increases the difficulty of training the DCNN model. This phenomenon is defined as internal covariate shift.
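A minimal NumPy sketch of the softmax and the cross-entropy loss in Eq. (2), assuming one-hot ground-truth labels G with the same (N, H, W, C) shape as the final feature map; names are illustrative only.

```python
import numpy as np

def softmax_cross_entropy(F, G, eps=1e-12):
    """F: (N, H, W, C) final feature map (logits); G: (N, H, W, C) one-hot labels."""
    F_shift = F - F.max(axis=-1, keepdims=True)            # numerical stability
    P = np.exp(F_shift) / np.exp(F_shift).sum(axis=-1, keepdims=True)   # softmax
    return -np.sum(G * np.log(P + eps))                     # Eq. (2)

F = np.random.randn(1, 4, 4, 2)                             # two classes
G = np.eye(2)[np.random.randint(0, 2, size=(1, 4, 4))]      # random one-hot labels
print(softmax_cross_entropy(F, G))
```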

Fig. 4: An intuitive illustration of mean and variance calculation for the BN (a), IN (b), LN (c) and GN (d).

II-B Batch Normalization (BN)

BN was the first algorithm proposed for solving internal covariate shift. It normalizes $\mathrm{F}$ with the mean and variance of $\mathrm{F}$, and maintains the representation capability of the DCNN with two additional trainable parameters - $\gamma$ and $\beta$.

In BN [14], the mean and variance of a feature map are calculated along the channel, i.e. one mean and one variance per channel over the batch, height and width, as shown in fig. 4a:

$\mu_c = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathrm{F}_{n,h,w,c}$   (3)

$\sigma_c^2 = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W}(\mathrm{F}_{n,h,w,c}-\mu_c)^2$   (4)

The feature map is normalized by:

$\hat{\mathrm{F}}_{n,h,w,c} = \frac{\mathrm{F}_{n,h,w,c}-\mu_c}{\sqrt{\sigma_c^2+\epsilon}}$   (5)

where $\epsilon$ is a small value used to increase the division stability. After this normalization, $\hat{\mathrm{F}}$ always has a mean of 0 and a variance of 1, which limits the DCNN representation capacity. Two additional trainable parameters $\gamma$ and $\beta$ are therefore added to recover the representation power:

$\mathrm{F}^{\rm o}_{n,h,w,c} = \gamma\,\hat{\mathrm{F}}_{n,h,w,c}+\beta$   (6)

BN is applied after the convolutional layer and before the activation, as the values after the convolution are less stable than the values after the activation. Although [14] recommended using a moving average of the training mean and variance to normalize the feature maps at test time, in this paper we use the mean and variance of the test feature map itself to normalize the feature map at inference. This inference normalization was recommended in [15] and is suitable for training DCNNs with limited training data.
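A minimal NumPy sketch of BN (Eqs. 3-6): per-channel statistics computed over the batch, height and width, followed by the trainable re-scaling (gamma) and re-translation (beta). This is an illustration only; the paper uses the tf.layers implementation.

```python
import numpy as np

def batch_norm(F, gamma, beta, eps=1e-5):
    """F: (N, H, W, C); gamma, beta: (C,) trainable scale and shift."""
    mu = F.mean(axis=(0, 1, 2), keepdims=True)     # Eq. (3), one mean per channel
    var = F.var(axis=(0, 1, 2), keepdims=True)     # Eq. (4)
    F_hat = (F - mu) / np.sqrt(var + eps)          # Eq. (5)
    return gamma * F_hat + beta                    # Eq. (6)

F = np.random.randn(4, 8, 8, 16)
print(batch_norm(F, gamma=np.ones(16), beta=np.zeros(16)).shape)   # (4, 8, 8, 16)
```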

II-C Instance Normalization (IN)

In IN [15], the mean and variance of a feature map are calculated along the channel and batch, i.e. one mean and one variance per sample and per channel over the height and width, as shown in fig. 4b:

$\mu_{n,c} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathrm{F}_{n,h,w,c}$   (7)

$\sigma_{n,c}^2 = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}(\mathrm{F}_{n,h,w,c}-\mu_{n,c})^2$   (8)

The feature map is normalized by:

$\hat{\mathrm{F}}_{n,h,w,c} = \frac{\mathrm{F}_{n,h,w,c}-\mu_{n,c}}{\sqrt{\sigma_{n,c}^2+\epsilon}}$   (9)

Then $\hat{\mathrm{F}}$ is re-scaled and re-translated by the trainable variables $\gamma$ and $\beta$ to restore the representation power:

$\mathrm{F}^{\rm o}_{n,h,w,c} = \gamma\,\hat{\mathrm{F}}_{n,h,w,c}+\beta$   (10)
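A minimal NumPy sketch of IN (Eqs. 7-10): compared with the BN sketch above, the statistics are computed per sample and per channel, i.e. only over the height and width axes. Illustrative only.

```python
import numpy as np

def instance_norm(F, gamma, beta, eps=1e-5):
    mu = F.mean(axis=(1, 2), keepdims=True)                 # (N, 1, 1, C), Eq. (7)
    var = F.var(axis=(1, 2), keepdims=True)                 # Eq. (8)
    return gamma * (F - mu) / np.sqrt(var + eps) + beta     # Eqs. (9)-(10)
```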

II-D Layer Normalization (LN)

In LN [16], the mean and variance of a feature map are calculated along the batch, i.e. one mean and one variance per sample over the height, width and channels, as shown in fig. 4c:

$\mu_{n} = \frac{1}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C}\mathrm{F}_{n,h,w,c}$   (11)

$\sigma_{n}^2 = \frac{1}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C}(\mathrm{F}_{n,h,w,c}-\mu_{n})^2$   (12)

The feature map is normalized by:

$\hat{\mathrm{F}}_{n,h,w,c} = \frac{\mathrm{F}_{n,h,w,c}-\mu_{n}}{\sqrt{\sigma_{n}^2+\epsilon}}$   (13)

Then $\hat{\mathrm{F}}$ is re-scaled and re-translated by the trainable variables $\gamma$ and $\beta$ to restore the representation power:

$\mathrm{F}^{\rm o}_{n,h,w,c} = \gamma\,\hat{\mathrm{F}}_{n,h,w,c}+\beta$   (14)
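A minimal NumPy sketch of LN (Eqs. 11-14): the statistics are computed per sample, over the height, width and channel axes. Illustrative only.

```python
import numpy as np

def layer_norm(F, gamma, beta, eps=1e-5):
    mu = F.mean(axis=(1, 2, 3), keepdims=True)              # (N, 1, 1, 1), Eq. (11)
    var = F.var(axis=(1, 2, 3), keepdims=True)               # Eq. (12)
    return gamma * (F - mu) / np.sqrt(var + eps) + beta      # Eqs. (13)-(14)
```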

II-E Group Normalization (GN)

In GN [17], the mean and variance of a feature map are calculated along the batch and the channel. The difference between GN and IN/LN is that the channels are divided into groups for the normalization: $G$ is the group number and $C/G$ is the number of channels per group, as shown in fig. 4d:

$\mu_{n,g} = \frac{G}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c\in g}\mathrm{F}_{n,h,w,c}$   (15)

$\sigma_{n,g}^2 = \frac{G}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c\in g}(\mathrm{F}_{n,h,w,c}-\mu_{n,g})^2$   (16)

The feature map is normalized by:

$\hat{\mathrm{F}}_{n,h,w,c} = \frac{\mathrm{F}_{n,h,w,c}-\mu_{n,g}}{\sqrt{\sigma_{n,g}^2+\epsilon}},\quad c\in g$   (17)

Then $\hat{\mathrm{F}}$ is re-scaled and re-translated by the trainable variables $\gamma$ and $\beta$ to restore the representation power:

$\mathrm{F}^{\rm o}_{n,h,w,c} = \gamma\,\hat{\mathrm{F}}_{n,h,w,c}+\beta$   (18)

For one feature map, BN, IN, LN and GN each introduce only two sets of additional trainable variables, the scale $\gamma$ and the shift $\beta$.
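A minimal NumPy sketch of GN (Eqs. 15-18): the channels are split into G groups and statistics are computed per sample and per group. With G = C this reduces to IN, and with G = 1 to LN. Illustrative only; the paper uses the tf.contrib.layers implementation.

```python
import numpy as np

def group_norm(F, gamma, beta, G=32, eps=1e-5):
    N, H, W, C = F.shape
    x = F.reshape(N, H, W, G, C // G)                   # split channels into G groups
    mu = x.mean(axis=(1, 2, 4), keepdims=True)          # Eq. (15), per sample and group
    var = x.var(axis=(1, 2, 4), keepdims=True)          # Eq. (16)
    x_hat = (x - mu) / np.sqrt(var + eps)               # Eq. (17)
    return gamma * x_hat.reshape(N, H, W, C) + beta     # Eq. (18)

F = np.random.randn(1, 8, 8, 64)
print(group_norm(F, gamma=np.ones(64), beta=np.zeros(64), G=32).shape)   # (1, 8, 8, 64)
```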

II-F Data Collection and Implementation

To maintain the diversity of the validation data in terms of image modality and image resolution, RVs scanned with MR and aortas scanned with CT, which have different image resolutions, were used for the validation.

37 RV scans were acquired on a 1.5T MR scanner (Sonata, Siemens, Erlangen, Germany), from both asymptomatic and Hypertrophic Cardiomyopathy (HCM) subjects, covering from the atrioventricular ring to the apex with a 10 mm slice gap and multiple time frames over the cardiac cycle. 6082 images were collected in total for the RV. To achieve a ground truth with high precision and consistency, all the images were labelled by one expert with Analyze (AnalyzeDirect, Inc, Overland Park, KS, USA). The 37 subjects were split randomly into four groups: Subject 1-9, Subject 10-18, Subject 19-27, and Subject 28-37. In the cross validation, one group was used for testing while the other three groups were used for training. In all other experiments, Subject 28-37 were used for testing while Subject 1-9, Subject 10-18, and Subject 19-27 were used for training.

20 aortic CT scans were acquired from the VISCERAL data set [22]. 4631 images were collected in total for the aorta. The 20 subjects were split randomly into three groups: Subject 1-7, Subject 8-14, and Subject 15-20. In the cross validation, one group was used for testing while the other two groups were used for training. In all other experiments, Subject 15-20 were used for testing while Subject 1-7 and Subject 8-14 were used for training.

Training images were augmented by rotation at a fixed angular interval. All images were normalized by their maximum intensity. As the performance of the normalization methods is the main focus of this paper, no separate validation set was split off.
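A minimal NumPy sketch of the pre-processing described above: rotation augmentation and normalization by maximum intensity. The 90-degree interval is an assumption for illustration only; the exact rotation interval is not stated here.

```python
import numpy as np

def augment_and_normalize(image):
    """image: (H, W) array; returns rotated copies normalized to [0, 1]."""
    rotated = [np.rot90(image, k) for k in range(4)]     # 0, 90, 180, 270 degrees
    return [img / img.max() for img in rotated]          # max-intensity normalization

samples = augment_and_normalize(np.random.rand(256, 256).astype(np.float32))
print(len(samples), samples[0].shape)   # 4 (256, 256)
```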

The kernel size of the convolutional and deconvolutional layers is 3, except for the last convolutional layer whose kernel size is 1. The pool size for max-pooling is 2. The stride of the deconvolutional layers is 2. The root channel number of the feature maps, C, is 64. The momentum is 0.9. The dropout rate is 0.25, i.e. 25% of units are dropped. Both the RV and aortic DCNNs were trained for two epochs with a step-wise learning rate. The learning rate in the first epoch was set to 0.05 for the RV and 0.05 for the aorta, while that in the second epoch was set to 0.005 for the RV and 0.01 for the aorta, chosen empirically.

The DCNN framework is programmed concisely with the TensorFlow Estimator Application Programming Interface (API), in about 200 lines of code. The convolutional, deconvolutional, and max-pooling layers are programmed with tf.layers. BN is taken from tf.layers while IN, LN and GN are taken from tf.contrib.layers. The data is fed into TensorFlow with tfrecords and the tf.data API. The images were shuffled when generating the tfrecords files and then shuffled again with a shuffle buffer size of 500 during feeding, which guaranteed random input images.
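A minimal TensorFlow 1.x sketch of how the four normalization layers could be selected from tf.layers and tf.contrib.layers, matching the libraries named above. The helper name normalize() and its arguments are illustrative assumptions, not the paper's code.

```python
import tensorflow as tf

def normalize(x, method, training, groups=32):
    if method == 'BN':
        return tf.layers.batch_normalization(x, training=training)
    elif method == 'IN':
        return tf.contrib.layers.instance_norm(x)
    elif method == 'LN':
        return tf.contrib.layers.layer_norm(x)
    elif method == 'GN':
        return tf.contrib.layers.group_norm(x, groups=groups)
    return x   # plain Unet: no normalization

# Normalization is applied after the convolution and before the activation:
x = tf.placeholder(tf.float32, [None, None, None, 1])
feat = tf.layers.conv2d(x, 64, kernel_size=3, padding='same')
feat = tf.nn.relu(normalize(feat, method='GN', training=True))
```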

Dice Similarity Coefficient (DSC) was calculated as the evaluation metric:

$\mathrm{DSC} = \frac{2\sum\mathrm{G}\cdot\mathrm{P}}{\sum\mathrm{G}+\sum\mathrm{P}}$   (19)

where $\mathrm{G}$ is the ground truth and $\mathrm{P}$ is the prediction probability. Only the foreground DSC is reported in this paper. As only two classes exist, the trend of the background DSC is the same as that of the foreground DSC.
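A minimal NumPy sketch of the DSC in Eq. (19) for a binary foreground mask; G is the ground truth and P the (thresholded) prediction. Illustrative only.

```python
import numpy as np

def dice(G, P, eps=1e-7):
    intersection = np.sum(G * P)
    return 2.0 * intersection / (np.sum(G) + np.sum(P) + eps)

G = np.zeros((4, 4)); G[1:3, 1:3] = 1     # 2x2 foreground square (4 pixels)
P = np.zeros((4, 4)); P[1:3, 1:4] = 1     # prediction overlaps 4 of its 6 pixels
print(round(dice(G, P), 3))               # 2*4 / (4 + 6) = 0.8
```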

Row Test Patient Index Batch Size Group Number Unet Unet-BN Unet-IN Unet-LN Unet-GN Training Time
1 28-37 15 32 0.5192 0.5834 0.5597 0.5934 0.5799 1.3h
2 28-37 8 32 0.5355 0.5969 0.5495 0.5931 0.6315 2h
3 28-37 1 32 0.6201 0.6520 0.6320 0.6273 0.6800 6h
4 28-37 1 16 0.6201 - - - 0.6692 -
5 28-37 1 8 0.6201 - - - 0.6610 -
6 1-9 1 32 0.6434 - - - 0.7022 6h
7 10-18 1 32 0.5675 - - - 0.5995 6.25h
8 19-27 1 32 0.6741 - - - 0.7352 6.25h
TABLE I: Mean DSCs and training time of the RV segmentation with different batch size, data division and normalization methods, with the highest DSC in bold.
Row Test Patient Index Batch Size Group Number Unet Unet-BN Unet-IN Unet-LN Unet-GN Training Time
1 15-20 3 32 0.6046 0.6046 0.6049 0.6148 0.6197 7h
2 15-20 1 32 0.6435 0.6269 0.6368 0.6319 0.6684 8.5h
3 15-20 1 16 0.6435 - - - 0.6431 -
4 15-20 1 8 0.6435 - - - 0.6816 -
5 1-7 1 8 0.6974 - - - 0.6937 6.25h
6 8-14 1 8 0.6388 - - - 0.6516 7h
7 1-7 1 16 0.6974 - - - 0.7173 -
8 1-7 1 32 0.6974 - - - 0.7325 -
TABLE II: Mean DSCs and training time of the aortic segmentation with different batch size, data division and normalization methods, with the highest DSC in bold.

III Results

Different batch sizes and different normalization methods are explored in section III-A. The impact of the group number in Unet-GN is shown in section III-B. Cross validation is presented in section III-C. The mean±std DSC for each patient and detailed segmentation results are illustrated in section III-D. The convergence of Unet, Unet-BN, Unet-IN, Unet-LN and Unet-GN is presented in section III-E.

III-A The Influence of Batch Size

Due to the different resolutions of the RV and aortic images, three batch sizes - 15, 8 and 1 - were explored for the RV segmentation, with the mean DSCs shown in Rows 1-3 in table I. Two batch sizes - 3 and 1 - were explored for the aortic segmentation, with the mean DSCs shown in Rows 1-2 in table II.

Unet-BN achieved 0.03-0.07 DSC improvements on the RV segmentation, while achieving similar or worse results (-0.02) for the aortic segmentation. Unet-IN achieved smaller improvements on the RV data and similar or worse results (-0.01) on the aortic data. Unet-LN achieved larger improvements for large batch sizes (0.08 for the RV and 0.01 for the aorta) and minor or worse results for small batch sizes (0.007 for the RV and -0.01 for the aorta). Unet-GN achieved noticeable improvements for both data sets: 0.06-0.1 for the RV and 0.01-0.02 for the aorta.

For both the RV and aortic segmentation with Unet, a batch size of 1 gave better performance, 0.6201 for the RV and 0.6435 for the aorta, than larger batch sizes, 0.5192 for the RV and 0.6046 for the aorta. However, the training time also increased as the batch size decreased, as shown in the "Training Time" column of table I and table II. As the training time is mainly influenced by the batch size and hardly by the normalization, only the training time for different batch sizes is reported.

Unet-GN out-performed Unet, Unet-BN and Unet-IN by noticeable margins in all tests and under-performed Unet-LN in only one test (the RV segmentation with a batch size of 15). We can conclude that GN is more suitable for bio-medical semantic segmentation, especially when the batch size is as small as 1.

III-B The Influence of Group Number

Although a group number of 32 was recommended as the default in [17], we also explored the alternatives of 16 and 8, with the mean DSCs for the RV/aorta shown in Rows 3-5/Rows 2-4 in table I/table II respectively. For the RV segmentation, group number 32 achieved the highest mean DSC, while for the aortic segmentation, group number 8 achieved the highest mean DSC. It is difficult to conclude a single best choice for the group number.

III-C Cross Validation

To prove the robustness of the improvement of Unet-GN over Unet to data division, cross validation is performed for Unet and Unet-GN. The four cross validation results with Subject 1-9, Subject 10-18, Subject 19-27 and Subject 28-37 as the test subjects are shown in Rows 6, 7, 8 and 1 in table I. The three cross validation results with Subject 1-7, Subject 8-14, and Subject 15-20 as the test subjects are shown in Rows 5, 6 and 4 in table II. The group number was set to 32 for the RV and to 8 for the aorta, following section III-B. Unet-GN out-performed Unet by 0.03-0.06 DSC for the RV and 0.02 DSC for the aorta, except for the test in Row 5 in table II. We think this exception is due to the group number and further explored 16 and 32, with the results shown in Rows 7-8 in table II: improvements of 0.02 and 0.04 DSC were achieved for group numbers 16 and 32 respectively.

Fig. 5: Mean±std DSCs for each patient for the RV (a) and aorta (b), segmented with Unet, Unet-BN, Unet-IN, Unet-LN and Unet-GN.

Fig. 6: Segmentation examples: a) one 2D segmentation example of the RV; b) the segmented aorta of one patient in 3D; red - the ground truth, green - the segmentation results, yellow - the overlap between the ground truth and the segmentation results.

III-D Patient Accuracy and Segmentation Results

The mean±std DSCs for the models in Row 3 in table I and Row 2 in table II were calculated for each RV and aortic subject, as shown in fig. 5. Unet-GN out-performed the other methods especially for poorly-performing subjects, e.g. subject 3 for the RV, which has lower image intensity, and subject 1 for the aorta, which has a smaller aortic size. Improved data augmentation may help these two subjects; however, this is beyond the scope of this paper, which focuses on normalization in DCNN training.

As the RV MR images have a 10 mm slice gap, no 3D reconstruction was extracted. One 2D segmentation example is shown in fig. 6a, which could be used to instantiate 3D RV shapes and hence navigate cardiac robotic interventions. One 3D aortic reconstruction from the proposed segmentation is shown in fig. 6b, which could be registered to navigate the Magellan (Hansen Medical, CA, USA) robotic system.

Fig. 7: The training loss of Unet, Unet-BN, Unet-IN, Unet-LN and Unet-GN for the RV (a) and aortic (b) segmentation. The loss was truncated to 0.1 for the RV and 0.03 for the aorta for a clearer plot; the losses were recorded every 200 steps, and a moving average with a window of 31 was applied to remove noise.

III-E Convergence

The training losses of the RV and aortic segmentation are shown in fig. 7. Although the DCNNs converged very early (within the first 1/3 of an epoch), two epochs were trained to illustrate the convergence more clearly. Unlike the report in [14], where the DCNN was trained 14 times faster with BN, in this paper Unet converged slightly slower while Unet-IN converged slightly faster; in general, there is not much difference. This may be because the task in this paper is easier than ImageNet classification, which contains 1.2 million images and 1000 classes.

Two workstations, one with an Intel® Xeon(R) CPU E5-1650 v4 @ 3.60GHz × 12 and one with an Intel® Xeon(R) CPU E5-1620 v4 @ 3.50GHz × 8, were used for the training and testing. The GPUs used are a Titan Xp with 12 GB memory and a 1080 Ti with 11 GB memory.

IV Discussion

Most DCNNs for semantic segmentation apply BN as the normalization method. For bio-medical semantic segmentation, which is usually trained from scratch, it is possible to substitute BN with other normalization methods for higher performance. In this paper, we showed that a small batch size contributes to the segmentation performance at the cost of a longer training time. At a batch size of 1, GN out-performed BN, IN and LN based on the validation on the RV and aorta. The choice of the group number is related to the task and data, and a single best setting could not be determined. The normalization methods, including BN, IN, LN and GN, speed up the convergence only slightly. Cross validation proves the robustness of the improvement of Unet-GN over Unet.

Although the focus of this paper is a fundamental problem in training DCNNs for bio-medical semantic segmentation - normalization - this paper connects and contributes to surgical robotic vision. The two segmented anatomies - the RV and the aorta - could be used for cardiac robotic navigation and surgical robotic path planning, based on previous work on 3D shape instantiation in [2] and [23] respectively.

V Conclusion

This paper explores bio-medical semantic segmentation for surgical robotic vision and focuses on normalization in training DCNNs. The four most popular normalization methods - BN, IN, LN and GN - are reviewed and compared in detail. A small batch size leads to higher segmentation accuracy, and GN out-performed the other three normalization methods including the widely-applied BN. Different from the conclusion in [17], where different group numbers perform similarly, different group numbers perform differently in bio-medical semantic segmentation in this paper.

Acknowledgment

This work is supported by EPSRC project grant EP/L020688/1. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

  • [1] P.-Y. Kao, T. Ngo, A. Zhang, J. Chen, and B. Manjunath, “Brain tumor segmentation and tractographic feature extraction from structural mr images for overall survival prediction,” arXiv preprint arXiv:1807.07716, 2018.
  • [2] X.-Y. Zhou, G.-Z. Yang, and S.-L. Lee, “A real-time and registration-free framework for dynamic shape instantiation,” Medical image analysis, vol. 44, pp. 86–97, 2018.
  • [3] X.-Y. Zhou, J. Lin, C. Riga, G.-Z. Yang, and S.-L. Lee, “Real-time 3-d shape instantiation from single fluoroscopy projection for fenestrated stent graft deployment,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1314–1321, 2018.
  • [4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [5] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
  • [6] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2015, pp. 234–241.
  • [7] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” arXiv preprint arXiv:1702.05747, 2017.
  • [8] X. Li, Q. Dou, H. Chen, C.-W. Fu, X. Qi, D. L. Belavỳ, G. Armbrecht, D. Felsenberg, G. Zheng, and P.-A. Heng, “3d multi-scale fcn with random modality voxel dropout learning for intervertebral disc localization and segmentation from multi-modality mr images,” Medical image analysis, vol. 45, pp. 41–54, 2018.
  • [9] O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, X. Yang, P.-A. Heng, I. Cetin, K. Lekadir, O. Camara, M. A. G. Ballester et al., “Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: Is the problem solved?” IEEE Transactions on Medical Imaging, 2018.
  • [10] Z. Xiong, V. V. Fedorov, X. Fu, E. Cheng, R. Macleod, and J. Zhao, “Fully automatic left atrium segmentation from late gadolinium enhanced magnetic resonance imaging using a dual fully convolutional neural network,” IEEE Transactions on Medical Imaging, 2018.
  • [11] J. Zhang, A. Saha, Z. Zhu, and M. A. Mazurowski, “Hierarchical convolutional neural networks for segmentation of breast tumors in mri with application to radiogenomics,” IEEE transactions on medical imaging, 2018.
  • [12] K. López-Linares, N. Aranjuelo, L. Kabongo, G. Maclair, N. Lete, M. Ceresa, A. García-Familiar, I. Macía, and M. A. G. Ballester, “Fully automatic detection and segmentation of abdominal aortic thrombus in post-operative cta images using deep convolutional neural networks,” Medical image analysis, vol. 46, pp. 202–214, 2018.
  • [13] X.-Y. Zhou, C. Riga, S.-L. Lee, and G.-Z. Yang, “Towards automatic 3d shape instantiation for deployed stent grafts: 2d multiple-class and class-imbalance marker segmentation with equally-weighted focal u-net,” IROS, 2018.
  • [14] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
  • [15] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016.
  • [16] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” stat, vol. 1050, p. 21, 2016.
  • [17] Y. Wu and K. He, “Group normalization,” arXiv preprint arXiv:1803.08494, 2018.
  • [18] T. Salimans and D. P. Kingma, “Weight normalization: A simple reparameterization to accelerate training of deep neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 901–909.
  • [19] C. Luo, J. Zhan, L. Wang, and Q. Yang, “Cosine normalization: Using cosine similarity instead of dot product in neural networks,” arXiv preprint arXiv:1702.05870, 2017.
  • [20] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
  • [21] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
  • [22] O. Jimenez-del Toro, H. Müller, M. Krenn, K. Gruenberg, A. A. Taha, M. Winterstein, I. Eggel, A. Foncubierta-Rodríguez, O. Goksel, A. Jakab et al., “Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: Visceral anatomy benchmarks,” IEEE transactions on medical imaging, vol. 35, no. 11, pp. 2459–2475, 2016.
  • [23] D. Toth, M. Pfister, A. Maier, M. Kowarschik, and J. Hornegger, “Adaption of 3d models to 2d x-ray images during endovascular abdominal aneurysm repair,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2015, pp. 339–346.