A Strategy of MR Brain Tissue Images' Suggestive Annotation Based on Modified U-Net

by   Yang Deng, et al.
Tsinghua University

Accurate segmentation of MR brain tissue is a crucial step for diagnosis,surgical planning, and treatment of brain abnormalities. However,it is a time-consuming task to be performed by medical experts. So, automatic and reliable segmentation methods are required. How to choose appropriate training dataset from limited labeled dataset rather than the whole also has great significance in saving training time. In addition, medical data labeled is too rare and expensive to obtain extensively, so choosing appropriate unlabeled dataset instead of all the datasets to annotate, which can attain at least same performance, is also very meaningful. To solve the problem above, we design an automatic segmentation method based on U-shaped deep convolutional network and obtain excellent result with average DSC metric of 0.8610, 0.9131, 0.9003 for Cerebrospinal Fluid (CSF), Gray Matter (GM) and White Matter (WM) respectively on the well-known IBSR18 dataset. We use bootstrapping algorithm for selecting the most effective training data and get more state-of-the-art segmentation performance by using only 50 of MR brain tissue images' suggestive annotation for unlabeled medical data based on the modified U-net. The proposed method performs fast and can be used in clinical.



page 5


DASN:Data-Aware Skilled Network for Accurate MR Brain Tissue Segmentation

Accurate segmentation of MR brain tissue is a crucial step for diagnosis...

A Fast Segmentation-free Fully Automated Approach to White Matter Injury Detection in Preterm Infants

White Matter Injury (WMI) is the most prevalent brain injury in the pret...

A cross-center smoothness prior for variational Bayesian brain tissue segmentation

Suppose one is faced with the challenge of tissue segmentation in MR ima...

Automatic brain tissue segmentation in fetal MRI using convolutional neural networks

MR images of fetuses allow clinicians to detect brain abnormalities in a...

Markov Random Field Segmentation of Brain MR Images

We describe a fully-automatic 3D-segmentation technique for brain MR ima...

Efforts estimation of doctors annotating medical image

Accurate annotation of medical image is the crucial step for image AI cl...

Transfer Learning by Asymmetric Image Weighting for Segmentation across Scanners

Supervised learning has been very successful for automatic segmentation ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

With the advent of aging society, degenerative diseases of the central nervous system are becoming more common. Degenerative diseases of the central nervous system refer to a group of diseases caused by degeneration of chronic progressive central nervous system. The degeneration and loss of neurons in the brain spinal cord can be seen in pathology. Magnetic resonance (MR) can be more clearly and safer to display the structure of the brain because of its non-invasive, non-radioactive, free selection profile, higher signal to noise ratio, and higher resolution of the soft tissue with smaller density difference, so as to provide more information for the pathological diagnosis of brain diseases. It has become a common method for the examination of brain diseases. The precise segmentation of brain tissue is the first step of the volume and quantitative analysis of the brain. It is of great significance to the diagnosis and treatment of brain diseases, especially the neurodegenerative diseases, and the discovery of many subsequent neurological diseases.

The classical segmentation methods are mainly classified as follows: (1) threshold segmentation (2) segmentation based on regional growth (3) segmentation based on the watershed algorithm (4) segmentation based on the statistical algorithms like the mixed Gauss model (GMM) and Markov random field (MRF) (5) segmentation based on Atlas (6) segmentation based on clustering algorithm like K mean (K-means), fuzzy C mean (FCM) (7) a mixed application of the above algorithm


Since AlexNet [7]

won the ImageNet Challenge in 2012, deep learning attracted the attention of researchers again. Over the last few years, deep learning especially deep convolutional neural networks (CNNs) have emerged as one of the most prominent approaches for image recognition problems in various domains. There are three main ideas for the segmentation of brain MRI image with convolution neural network

[2]: (1) Patch-Wise CNN Architecture. This is a simple approach to train a CNN algorithm for segmentation. A fixed size patch around each pixel is extracted from a given image, and the model is trained on these patches with labels of the patches’ center pixel classes, such as normal brain and tumor. The disadvantages of this method are huge computation and hard to train (2) Semantic-Wise CNN Architecture [8] [12]

. This type of architecture makes predictions for each pixel of the whole input image, which the network needs only one forward inference. This kind of architecture includes encoder part that extracts features and decoder part that combines lower level features from the encoder part to form abstract features. The input image is mapped to the segmentation labels in a way that minimizes a loss function. (3) Cascaded CNN Architecture

[5]. This type of architecture combines two CNN architectures. The first CNN is used to train the model for preliminary prediction and the second CNN is used to further adjust the prediction of the first network.

Zhang et al.[10] proposed to use CNN method based on image patches to segment gray matter, white matter and cerebrospinal fluid from multimodal baby MR images, and performed better compared with the traditional methods; Nie et al.[10] proposed a semantic level full convolution network segmentation method and got higher dice similarity coefficient (DSC) than Zhang’s method; Moeskops et al. [9] proposed a multi-scale ( pixels) pach-wise CNN method to segment brain images of infants and young adults; Bao et al. [3]also proposed a multi-scale patch-wise CNN method together with dynamic random walker with decay region of interest to obtain smooth segmentation of subcortical structures in IBSR (developed by the Centre for Morphometric Analysis at Massachusetts General Hospital-available at https://www.nitrc.org/projects/ibsr to download) and LPBA40 datasets; Chen et al. [4] proposed deep voxelwise residual networks for volumetric brain segmentation.

Compared to CNN, FCN has no the full connection layer, which makes it possible to input any size of the image, to greatly reduce the training parameters and to improve the speed of training. U-shaped deep convolutional network, as a kind of outstanding FCN, has excellent performance for biomedical image segmentation and almost becomes the preferred method due to the advantage of light, flexible, strong robustness and needing a small amount of data. In this paper, we designed a U-shaped net to segmentation the MRI brain tissue and proposed a strategy of MR brain tissue images’ suggestive annotation based on our model, both of which achieved promising result.

The remainder of this paper is organized as follows. In section 2, we present our method. Experiments and results are detailed in section 3. Finally, the discussion and the main conclusions are presented in section 4 and section 5 separately.

2 Method

2.1 U-shape Network Architecture

Figure 1 illustrates the network architecture we use in this paper. Like the standard U-Net [12], it has an analysis and a synthesis path each with four resolution steps.

Figure 1: The modified U-Net

In the analysis path, each layer contains two 3×3 convolutions each followed by a rectified linear unit (ReLu), and then a 2×2 max pooling with strides of 2 for down-sampling. In the synthesis path, each layer consists of an upconvolution of 2×2 by strides of one in each dimension, followed by two 3×3 convolutions each followed by a ReLu. To keep the same shape after convolution, we use padding. Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path

[1]. In the last layer, a 1×1 convolution reduces the subject of output channels to the subject of labels which is 4 in our case. The architecture has 31030788 parameters in total.

Like suggested in [13] we avoid bottlenecks by doubling the subject of channels already before max pooling. We also adopt this scheme in the synthesis path. The input size to the network is 64x64 and the output is 64x64x4, where the four channel corresponds to the background, CSF, GM, and WM.

Different from the purely U-net[12], our network can segment CSF, GM and WM three tissues at once because we use our own loss function which will be detailed in section 3. At the same time, the shape of output is the same as input owing to the use of padding.

2.2 Bootstrapping algorithm for selecting the most effective training data

From Lin Yang’s paper [15], we know that the quantity and quality of train dataset is very important for the model. Lin Yang’s paper indicates that state-of-the-art segmentation performance can be achieved by using only 50% of training data. Inspired by this result, we use bootstrapping algorithm to pick out the most effective 50% training data on IBSR18 dataset. Bootstrapping [6]

is a standard way for evaluating the uncertainty of learning models. Its basic idea is to train a set of models while restricting each of them to use a subset of the training data (generated by sampling with replacement) and calculate the variance (disagreement) among these models. The measure of variance we used in this paper is dice similarity coefficient and lower dice similarity coefficient means higher variance.

To ensure to choose the training data randomly, we focus on the data of subject 01,02,03,04,05,10,11,12,13,14 and what we want to do is to choose the more valuable data in this ten datasets, which make the model more excellent. The evaluation standard to determine whether the selected data is more effective than others is the average dice similarity coefficient of remaining 8 sets of testing data.

Figure 2: The process to find the more valuable data from training dataset

2.3 The Strategy of MR Brain Tissue Images’ Suggestive Annotation Based on Modified U-Net

Figure 3 shows the process to find the more valuable data from unlabeled dataset. We used the training data selected in section 2.2(dataset subject 3, 10, 11, 12, 13) to train our segmentation model. When the model was trained, we used it to predict the unlabeled data (in this paper, we abandoned the labels of dataset subject 6, 7, 8, 9, 15, 16, 17, 18 and regarded them as unlabeled dataset) and got the predicted label. Then, we combined this label with the preprocessed unlabeled data as new training sets and separately added them into the original training set one at a time to train the model again. We kept repeating this process with fixed epochs and got the unlabeled data’s model. Finally, we got 8 models for 8 unlabeled testing sets. Next, we used the 8 new trained model to predict the testing set (dataset subject 1, 2, 4, 5, 14) and attained 8 average DSC of testing set for the 8 model. Finally, we sorted the DSC according to their numeric value. The lower DSC meant the higher variance and should be label.

Figure 3: The process to find the more valuable data from unlabeled dataset

3 Rusults

3.1 Dataset and pre-processing

We validated our method on the well-known IBSR18 dataset111https://www.nitrc.org/frs/?group_id=48, which is one of the standard datasets for tissue quantification and segmentation evaluation. The dataset consists of 18 MRI volumes and the corresponding ground truth (GT) is provided.

Typical preprocessing steps for structural brain MRI include the following key steps [2]: registration, skull stripping, bias field correction, intensity normalization and noise reduction. With advent of deep learning techniques, some of the preprocessing steps became less critical for the final segmentation performance [2]. In the given dataset, skull-stripping and bias field correction algorithm were already applied. Thus, we do not need to apply these techniques.

3.2 Segmentation quantitative evaluation and comparison

In this paper, three metrics are used to evaluate the segmentation result: DSC (Dice’s coefficient), HD (Hausdorff distance) and AVD (Absolute Volume Difference). First, the DSC, is the most used metric in the evaluation of medical volume segmentations. In addition to the direct comparison between automatic and ground truth segmentations, it is common to use the DSC to measure reproducibility (repeatability) [14]. DSC is computed by:


Where TP, FP and FN are the subjects of true positive, false positive and false negative predictions for the considered class. Second, the distance between crisp volumes (HD) between two finite point sets A and B is defined by:


Where h(A,B)is called the directed HD and given by:


Finally, the AVD is defined by:


Where A is ground truth and B is predicted volume of one class.

Regarding the evaluation of testing data, we compared our method with several state-of-the-art methods, including Moeskops’ multi-scale ( pixels) patch-wise CNN method[3]and Chen’s voxel based residual network [4]. For all the there methods, we used IBSR subject 01-05 as training set and the remaining 13 subjects as testing set.

Figure 4: Boxplot of each metrics with modified U-Net
Method CSF GM WM Time(s)
Ours 81.16 28.27 37.50 89.46 12.47 21.11 88.73 11.39 21.65 40
VoxResNet [4] 79.55 30.39 42.12 88.09 12.08 18.15 88.92 9.15 21.89 100
Multi-scale CNN [3] 63.01 80.53 82.16 3500
Table 1: Results of MR Brain Segmentation using Different Methods (HD: mm)( Moeskops method is too hard to train and the cost of time is huge, we chose the best result of several experiments here, the metrics (AVD and HD) have been unfortunately lost, so we show the DSC only. ) (DC: %, HD: mm, AVD: %).

From Table 1, we can see that the modified U-Net achieved quite good performance than others.

3.3 The most effective training data

With bootstrapping algorithm, we got the result of different training sets below.

Training set Dice of CSF Dice of GM Dice of WM Average dice
1 0.8062 0.8886 0.8851 0.8600
2 0.8296 0.8961 0.8832 0.8697
3 0.8525 0.9114 0.9010 0.8883
4 0.8678 0.9185 0.9070 0.8978
5 0.8102 0.8873 0.8772 0.8582
Table 2: The result of different training sets used on modified U-Net (training set 1 including subject 1,2,4,5,14;training set 2 including subject 1,2,3,4,5;training set 3 including subject 10,11,12,13,14; training set 4 including subject 3,10,11,12,13 and training set 5 including subject 1,2,3,4,5,10,11,12,13,14. Testing set are subject 6 7 8 9 15 16 17 18)

Fig. 5 is the different results while using different training sets on modified U-Net in table2.

Figure 5: The different results while using different training sets

From Table 2, we find that different training set has great influence on the result and the model performance is not always better with more training data, even the model will be bad when the training set is too many. So picking out the more valuable training set has great significance for saving training time and even for improving the performance of model.

3.4 Suggestive Annotation

We used the most effective training set acquired from section 3.3 to predict the dataset composed of subject 6, 7, 8, 9, 15, 16, 17, 18, which we assumed that they were not labeled. Using the method in section 2.3, we added the predicted label with its preprocessed image to the original training set one at a time to become new training set and predicted testing set (subject 1, 2, 4, 5, 14). Following are the results.

Training set(subject) Dice of CSF Dice of GM Dice of WM Average dice
3 10 11 12 13 6 0.8440 0.9071 0.8912 0.8807
3 10 11 12 13 7 0.8465 0.8988 0.8848 0.8767
3 10 11 12 13 8 0.8548 0.9078 0.8919 0.8848
3 10 11 12 13 9 0.8566 0.9020 0.8871 0.8819
3 10 11 12 13 15 0.8650 0.9085 0.8915 0.8883
3 10 11 12 13 16 0.8530 0.9053 0.8913 0.8832
3 10 11 12 13 17 0.8462 0.9068 0.8903 0.8811
3 10 11 12 13 18 0.8578 0.9111 0.8960 0.8883
Table 3: The result with different training sets on testing set

From Table 3, we think the subject 7, 6, 17, 9 are the suggestive annotation because their DSCs are lower. To prove whether the data we picked out is more valuable, we used bootstrapping method like section 2.2 because the data we used here (subject 6, 7, 8, 9, 15, 16, 17, 18) have been actually already labeled. Results are below (testing sets here are themselves):

Training set(subject) Dice of CSF Dice of GM Dice of WM Average dice
15 18 8 16 0.7976 0.8922 0.8824 0.8574
7 8 9 15 0.8071 0.8879 0.8808 0.8586
7 6 17 9 0.8168 0.8900 0.8790 0.8619
6 7 8 9 15 16 17 18 0.7928 0.8860 0.8746 0.8511
Table 4: The result of proving suggestive annotation

From Table 4, we know that the dataset we picked out using our strategy are actually worth to be labeled because they get higher DSC.

3.5 Implementation details and Computation cost

The project was to find a method which returns the highest average DSC of three tissues, we made our own loss function as following:


where and are predicted and ground-truth for class i, respectively.

In the stage of segmentation reconstruction, we found the maximum probability among four classes and returned the corresponding label for each voxel rather than finding the optimized threshold.

Figure 6: Dice loss as a function of a training epoch for our proposed models.(label_0_dice-coef, label_1_dice-coef , label_2_dice-coef , label_3_dice-coef means the DSC of background, CSF, GM and WM respectively.)

The network was trained for 500 epochs on a single NVIDIA TitanX GPU. In order to prevent the network from over-fitting, we applied early stopping in the training process. The training process was automatically terminated when the validation accuracy did not increase after 30 epochs, which took approximately 3 hours for the whole training process. We used the glorot_uniform initialization and the Adam algorithm in keras. Segmentation runtime is 40-50 seconds for processing each set of testing data (size 256 ×128×256). Thus, our method is fast enough to be used in clinical practice.

4 Discusion

From Table 1 and Fig. 4, we can see that the result of CSF is worse than GM and WM due to the small number of CSF, so we can enhance the data of CSF to balance the samples in future research.

In the initial experiment, we tried normalization and Contrast-limited Adaptive Histogram Equalization (CLAHE) described in [4], but it benefited little for the segmentation result of ours method, so we did not use these preprocessing technology. On the other hand, we found BN layer had no effect on our model and we did not apply it either.

To exploit the relationship between training sets with deep learning models, we used the method described in section 2.2 on the VoxResNet [4] model. We found that specific training sets are needed for different deep learning models. Table 5 shows the most effective training sets found by the VoxResNet [4] model. We can see that when we used the VoxResNet model, the performance would be better on using the whole training sets. In other words, each model has its own strong point and making full use of these strong point has great significance in improving the final segmentation performance, which is what we will continue to research in our future work.

Training set Dice of CSF Dice of GM Dice of WM Average dice
1 0.7555 0.8861 0.8837 0.8418
2 0.7939 0.8873 0.8868 0.8560
3 0.7586 0.9166 0.9042 0.8598
4 0.7824 0.8915 0.8898 0.8545
5 0.7983 0.9114 0.8980 0.8692
Table 5: The result of different training sets used on VoxResNet (training set 1 including subject 1,2,4,5,14;training set 2 including subject 1,2,3,4,5;training set 3 including subject10,11,12,13,14; training set 4 including subject 3,10,11,12,13 and training set 5 including subject 1,2,3,4,5,10,11,12,13,14. Testing set are subject 6 7 8 9 15 16 17 18)

5 Conclusions

In this paper, we segmented MR brain tissue with modified U-shape model and achieved better result compared with some other method and our method can be used in clinical. In addition, we chose more valuable data based on modified U-Net from training sets. Finally, we proposed an effective strategy of MR brain tissue images’ suggestive annotation with trained model.


  • [1] A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: Learning dense volumetric segmentation from sparse annotation,” pp. 424–432, 2016.
  • [2] Z. Akkus, A. Galimzianova, A. Hoogi, D. L. Rubin, and B. J. Erickson, “Deep learning for brain mri segmentation: State of the art and future directions,” Journal of Digital Imaging, vol. 30, no. 4, pp. 1–11, 2017.
  • [3] S. Bao and A. C. S. Chung, “Multi-scale structured cnn with label consistency for brain mr image segmentation,” pp. 1–5, 2015.
  • [4] H. Chen, Q. Dou, L. Yu, J. Qin, and P. A. Heng, “Voxresnet: Deep voxelwise residual networks for brain segmentation from 3d mr images,” Neuroimage, vol. 170, 2017.
  • [5] Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V. C. Mok, L. Shi, and P. A. Heng, “Automatic detection of cerebral microbleeds from mr images via 3d convolutional neural networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1182–1195, 2016.
  • [6] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap.   Chapman & Hall,, 1993.
  • [7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [8] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in

    IEEE Conference on Computer Vision and Pattern Recognition

    , 2015, pp. 3431–3440.
  • [9] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders, and I. Isgum, “Automatic segmentation of mr brain images with a convolutional neural network.” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1252–1261, 2016.
  • [10] D. Nie, L. Wang, Y. Gao, and D. Shen, “Fully convolutional networks for multi-modality isointense infant brain image segmentation,” Proc IEEE Int Symp Biomed Imaging, vol. 108, pp. 1342–1345, 2015.
  • [11] Z. Qin, F. Wang, Z. Xiao, T. Lan, and Y. Ding, “Brain tissue segmentation with the gka method in mri,” in IEEE International Conference on Signal and Image Processing, 2017, pp. 273–276.
  • [12] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation.   Springer International Publishing, 2015.
  • [13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
  • [14] A. A. Taha and A. Hanbury, “Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool,” BMC Medical Imaging,15,1(2015-08-12), vol. 15, no. 1, p. 29, 2015.
  • [15]

    L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” pp. 399–407, 2017.