1 Introduction
Image segmentation is one of the central problems in medical imaging. It is often more challenging than natural image segmentation because the results are expected to be highly accurate while, at the same time, little training data is available.
To address these issues, strong assumptions and anatomical priors are often imposed on the expected segmentation results. For quite a few years, the field was dominated by energy-based approaches, where the segmentation task is formulated as an energy minimization problem. Different types of regularizers and priors [1] can be easily incorporated into such formulations. Since the seminal work of Chan and Vese [2], the level set has been one of the preferred models due to its ability to handle topological changes of the segmentation function. Nevertheless, traditional level set approaches have some limitations: they rely on a good contour initialization and a good guess of the parameters involved in the model, and, despite some progress [1, 3], they often use a relatively simple appearance model.
Recently introduced deep neural network architectures address some of these issues by automatically learning appearance models from a large annotated dataset. In particular, FCNs [4] have proven successful in many segmentation tasks, including medical imaging [5, 6]. Despite their success, FCNs have a few limitations compared to traditional energy-based approaches: they have no explicit way of incorporating regularization and prior information, they often require a lot of training data, and they tend to produce low-resolution results due to subsampling in the strided convolutional and pooling layers.
We address these limitations by proposing an integrated FCN-levelset model that iteratively refines the FCN using a level set module. We show that (1) the integrated model achieves good performance even when little training data is available, outperforming the FCN or the level set alone, and (2) the unified iterative model trains the FCN in a semi-supervised way, which allows an efficient use of unlabeled data. In particular, we show that using only a subset of the training data with labels, the jointly-trained FCN achieves performance comparable to an FCN trained on the whole training set.
A few other works address the problem of introducing smoothness into convolutional nets, either by using explicit regularization terms in the cost function [7] or by using a conditional random field (CRF), applied as a postprocessing step [8, 9] or jointly trained [10] with the FCN or CNN. However, only specific graphical models can be trained within the FCN pipeline, and they cannot easily integrate shape priors. A joint deep learning and level set approach has also been recently proposed [11, 12], but that work considers a generative model (a deep belief network) that is not trained by the joint model.
2 Methods
An overview of the proposed FCN-levelset framework is shown in Fig. 1. The FCN is pretrained on a small labeled dataset. Next, in the semi-supervised training stage, the integrated FCN and level set model is trained with both labeled (top half of Fig. 1) and unlabeled (bottom half of Fig. 1) data. The segmentation is gradually refined by the level set at each iteration. Based on the refined segmentation, the loss is computed and backpropagated through the network to improve the FCN. With the FCN trained in this manner, inference is done in the traditional way: a new image is fed to the network and a probability map of the segmentation is obtained. The output can be further refined by the level set if desired. The following subsections give details on the level set model and its integration with the FCN.
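The semi-supervised training loop described above can be sketched as follows. The callables `fcn_predict`, `fcn_update`, and `level_set_refine` are hypothetical stand-ins (not names from the paper) for the FCN forward pass, one gradient update on the loss, and the level set refinement, respectively:

```python
def joint_training_step(fcn_predict, fcn_update, level_set_refine,
                        labeled_batch, unlabeled_batch):
    """One pass of the semi-supervised joint training.

    labeled_batch   : iterable of (image, ground-truth label) pairs
    unlabeled_batch : iterable of images without labels
    """
    losses = []
    # Labeled branch (top half of Fig. 1): standard supervised update.
    for image, label in labeled_batch:
        prob = fcn_predict(image)
        losses.append(fcn_update(prob, label))
    # Unlabeled branch (bottom half of Fig. 1): the level-set-refined
    # prediction plays the role of the label.
    for image in unlabeled_batch:
        prob = fcn_predict(image)
        pseudo_label = level_set_refine(prob)
        losses.append(fcn_update(prob, pseudo_label))
    return sum(losses) / len(losses)
```

Mixing both branches in one step reflects the paper's observation that training on unlabeled data alone can corrupt the weights learned from the labeled data.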
2.1 The level set method
Following traditional level set formulations [2, 13], an optimal segmentation is found by minimizing a functional of the following form:
$$E(\phi) = E_{\text{data}}(\phi) + \mu\, E_{\text{smooth}}(\phi) + \nu\, E_{\text{shape}}(\phi) \qquad (1)$$
where $\phi$, defined over the image domain $\Omega$, is a signed distance function that encodes the segmentation boundary, and $\mu$ and $\nu$ are constants, manually tuned and kept fixed during the experiments. The non-uniform smoothness term has the form of a weighted curve length:
$$E_{\text{smooth}}(\phi) = \int_{\Omega} g(x)\, \delta_{\epsilon}(\phi(x))\, |\nabla\phi(x)|\, dx \qquad (2)$$
where $\delta_{\epsilon}$ is the regularized Dirac function and the weights $g(x)$ are inversely proportional to the image gradients. The data term models the object/background intensity distribution as:
$$E_{\text{data}}(\phi) = -\int_{\Omega} \left[ H(\phi(x)) \log p_{o}(x) + \left(1 - H(\phi(x))\right) \log p_{b}(x) \right] dx \qquad (3)$$
where $H$ is the Heaviside function and $p_{o}$, $p_{b}$ are the probabilities of belonging to the object/background regions. In our model, the probabilities estimated by the FCN are used. The shape term is a critical component in knowledge-based segmentation. Based on the squared difference between the evolving level set and the shape prior level set, we choose [14]:
$$E_{\text{shape}}(\phi) = \int_{\Omega} \left( \phi(x) - \phi_{0}(T(x)) \right)^{2} dx \qquad (4)$$
where $\phi_{0}$ denotes the shape model and $T$ is an affine transformation between the shape prior and the current segmentation.
To minimize the segmentation energy functional of Eq. (1), the calculus of variations is used: the position of the curve is updated by a geometric gradient flow. Jointly with the evolution of the level set function, the parameters of the affine transformation aligning the shape prior are also estimated.
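As a rough illustration of such a gradient flow, the sketch below evolves a level set on a pixel grid with a data force given by the log-ratio of the object/background probability maps and a curvature smoothness force. This is a simplified stand-in for the paper's update equations: it omits the shape term and the estimation of the affine transformation.

```python
import numpy as np

def evolve_level_set(phi, p_obj, p_bkg, mu=0.2, dt=0.5, n_iter=100, eps=1.0):
    """Toy gradient-flow evolution of a level set phi on a pixel grid.

    p_obj, p_bkg : per-pixel object/background probability maps
                   (in the paper, supplied by the FCN).
    """
    for _ in range(n_iter):
        # Regularized Dirac: concentrates the update near the zero level set.
        delta = (eps / np.pi) / (eps ** 2 + phi ** 2)
        # Curvature kappa = div(grad phi / |grad phi|).
        gy, gx = np.gradient(phi)
        norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
        kappa = np.gradient(gx / norm)[1] + np.gradient(gy / norm)[0]
        # Data force: push phi up where the object is more likely.
        force = np.log(p_obj + 1e-8) - np.log(p_bkg + 1e-8)
        phi = phi + dt * delta * (force + mu * kappa)
    return phi
```

Thresholding the evolved function at zero recovers the region where the object probability dominates, with curvature-smoothed boundaries.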
2.2 The integrated FCNlevelset model
FCN. We create a shallow FCN to make the network less prone to overfitting on a small training set, with few pooling layers so as to achieve a finer segmentation: 4 convolutional layers with 23938 parameters and a total subsampling rate of 8 for liver segmentation, and 7 convolutional layers with 51238 parameters and a total subsampling rate of 6 for left ventricle segmentation. Each convolutional layer consists of a convolution filter followed by a rectified linear unit (ReLU) and a max pooling with a stride of two. An up-convolution is added after the last convolution to achieve a dense prediction. During training, all new layers are randomly initialized by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01, and ADADELTA [15] is used to optimize the cross-entropy loss:
$$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log p_{\theta}(x_{i}) + (1 - y_{i}) \log\left(1 - p_{\theta}(x_{i})\right) \right] \qquad (5)$$
where $N$ is the number of pixels in one image, $(x_{i}, y_{i})$ denotes a pixel and its label, $\theta$ are the network parameters, and $p_{\theta}(x_{i})$ is the network's predicted probability of $x_{i}$ belonging to the object.
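The pixel-wise binary cross entropy of Eq. (5) can be computed directly in NumPy; this is a generic sketch, not the paper's training code:

```python
import numpy as np

def cross_entropy_loss(p, y, eps=1e-12):
    """Pixel-wise binary cross entropy as in Eq. (5).

    p : predicted probabilities of belonging to the object (flattened image)
    y : binary ground-truth labels of the same shape
    """
    p = np.clip(np.asarray(p, float), eps, 1.0 - eps)  # guard against log(0)
    y = np.asarray(y, float)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```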
FCN-levelset. The pretrained FCN is further refined by the integrated level set module. Each unlabeled image is fed to the FCN, producing a probability map. This provides the level set module with a reliable initialization and with the foreground/background distributions in Eq. (3). The level set then refines the output of the FCN, and we compute the cross-entropy loss between the FCN prediction and the level set output, treated as the label. This loss is backpropagated through the network to update the weights. In this manner, the FCN can implicitly learn the prior knowledge imposed on the level set, especially from the unlabeled portion of the dataset. Tuning the model weights only with the unlabeled data may cause drastic changes and corrupt the weights already learned from the labeled data; this is especially important at the beginning of the joint training, when the performance of the system is not yet good. To keep the learning progress smooth, the integrated FCN-levelset model is therefore trained with both labeled and unlabeled data, as illustrated in Fig. 1.
During the joint training, to ensure a stable improvement, a memory replay technique is used, which prevents outliers from disrupting the training. In this technique, a dynamic buffer caches the recently processed samples. Whenever the buffer is full, a training pass on the buffered data is triggered, updating the network weights. The oldest sample is then removed from the buffer and the next training iteration is initiated.
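A minimal sketch of such a replay buffer follows; the class name and the `train_fn` callback are hypothetical, not names from the paper:

```python
from collections import deque

class ReplayBuffer:
    """FIFO cache of recently processed samples; a training pass over the
    whole buffer is triggered each time the buffer fills up."""

    def __init__(self, capacity, train_fn):
        self.capacity = capacity
        self.train_fn = train_fn      # e.g. one gradient update on the batch
        self.buffer = deque()

    def add(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) == self.capacity:
            self.train_fn(list(self.buffer))  # train on the buffered batch
            self.buffer.popleft()             # evict the oldest sample
```

Because every update averages over a full buffer of recent samples, a single outlier cannot dominate any one weight update.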
Inference. To infer a segmentation, a forward pass through the FCN yields the probability map. The level set is initialized, and its data term set, according to this probability map; the final segmentation is then obtained by refining the FCN's output contour. Unlike during training, where the level set is mainly used to improve the performance of the FCN, at inference the level set is a postprocessing step and the refined output is not backpropagated to the FCN.
3 Experiments and Results
3.1 Data
Liver Segmentation contains a total of 20 CT scans [16]. All segmentations were created manually by radiology experts, working slice-by-slice in the transversal view. The scans are randomly shuffled and divided into training (10 scans), validation (5 scans), and testing (5 scans) sets. We select the middle 6 slices from each scan to form our datasets.
Left Ventricle Segmentation is a collection of 45 cine-MR sequences (about 20 slices each) taken during one breath-hold cycle [17]. Manual segmentation is provided by an expert for the images at the end-diastolic (ED) and end-systolic (ES) phases. The original data division was used in our experiments, where the 45 sequences are randomly divided into 3 subsets of 15 sequences each, for training, validation, and testing. All slices are selected from the scans.
For both datasets, we divide the training set into two parts, a labeled part (30% of the liver data; 50% of the left ventricle data) and an unlabeled part. The ground truth segmentation is withheld for the unlabeled part, to simulate a scenario where only a fraction of the data is available for supervised training.
3.2 Experiments
The networks used in the experiments are described in Section 2.2. As the level set method is sensitive to the initialization, the FCN is first pretrained on the labeled part of the data. During training, the batch size was 12, and the maximum number of epochs was set to 500. The best model on the validation set was stored and used for the evaluations.
The level set models used for the liver and left ventricle data differ slightly, depending on the properties of each dataset. For the liver data, we use the weighted curve length of Eq. (2) as the smoothness term, with weights inversely proportional to the image gradients. The left ventricle data has no obvious edges, so a uniform weight is used instead.
All experiments are performed on 2D slices. The shape prior is randomly taken from the ground truth segmentations. To show the advantage of the proposed integrated FCN-levelset model and its potential for semi-supervised learning, we compared five models:

(i) Pretrained FCN: trained with the labeled images from the training set (30% of the liver data; 50% of the left ventricle data);
(ii) Postprocessing level set: model (i) followed by a postprocessing level set;
(iii) Jointly-trained FCN: the FCN module of the integrated model, jointly trained with the level set;
(iv) FCN-levelset: the proposed integrated model as described in Section 2.2;
(v) Baseline FCN: an FCN trained on all images in the training set.
The performance of the above methods is evaluated using Dice score and Intersection over Union (IoU) score. Both scores measure the amount of overlap between the predicted region and the ground truth region.
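Both overlap scores are straightforward to compute from binary masks; a minimal NumPy sketch:

```python
import numpy as np

def dice_score(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks pred and gt."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou_score(pred, gt):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks pred and gt."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```

Dice is always at least as large as IoU for the same prediction, which is why the Dice columns in Table 1 exceed the IoU columns.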
3.3 Results
To illustrate the sensitivity of the level set method to its initialization, we compare the output contours under different initializations in Fig. 2. The level set with a manual initialization (Fig. 2(a)) converges to a proper shape (red) but with low precision. Initialization with the FCN-predicted probability map (Fig. 2(b)) leads to a faster and more accurate convergence after a few iterations. Notably, when initialized with the probability map, the level set managed to eliminate the wrongly segmented parts (yellow) in the FCN output.
The performance of the five models described in Section 3.2 on the two datasets is summarized in Table 1 and illustrated in Fig. 3. On both datasets, the joint semi-supervised training improves the performance of the deep network. For the liver data, the level set initialized with the probability map outperformed the pretrained FCN by 4.2% in Dice score. Through the joint training, the FCN was fine-tuned and gained an improvement of 3.6%. The performance of the FCN-levelset increased by 8.0% in Dice score and 12.8% in IoU compared to the pretrained FCN. Notably, despite using only a few manually labeled images, the integrated model performed even better than the baseline FCN, with an improvement of about 2.7% in Dice score and 4.6% in IoU. The integrated FCN-levelset model has a clear advantage on datasets where the segmented object presents a prominent shape.
Table 1: Dice and IoU scores of the five models on the two datasets.

Dataset        | Model                    | Dice Score | IoU
Liver          | Pretrained FCN           | 0.843      | 0.729
Liver          | Postprocessing level set | 0.885      | 0.794
Liver          | Jointly-trained FCN      | 0.879      | 0.784
Liver          | FCN-levelset             | 0.923      | 0.857
Liver          | Baseline FCN             | 0.896      | 0.811
Left Ventricle | Pretrained FCN           | 0.754      | 0.605
Left Ventricle | Postprocessing level set | 0.678      | 0.620
Left Ventricle | Jointly-trained FCN      | 0.772      | 0.623
Left Ventricle | FCN-levelset             | 0.788      | 0.635
Left Ventricle | Baseline FCN             | 0.804      | 0.672
The left ventricle data is more challenging for the level set model, mainly because there is no clear regional property to distinguish the object from the background. Nevertheless, the joint model still improved the performance of the FCN by 1.8% in Dice score during training. The FCN-levelset (trained with only 50% of the data) improved the segmentation result by 3.4% compared to the pretrained FCN and achieved a performance comparable to the baseline FCN (trained with all data).
4 Discussion
In this paper, a novel technique for integrating a level set with a fully convolutional network is presented. This technique combines the generality of convolutional networks with the precision of the level set method. Two advantages of this integration are shown in the paper. First, the level set initialized with the FCN output achieves better performance than the FCN alone. Second, as a training technique, jointly training the FCN with the level set on unlabeled data improves the FCN's performance. While the proposed model handles only 2D binary segmentation with a simple shape prior, its extension to 3D and to more complex probabilistic shape models is straightforward, and we are currently working on it.
References

[1] D. Cremers, M. Rousson, and R. Deriche, "A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape," International Journal of Computer Vision, vol. 72, April 2007.
[2] T. Chan and L. Vese, "Active contours without edges," IEEE Trans. Image Processing, vol. 10, no. 2, pp. 266–277, 2001.
[3] M. B. Salah, A. Mitiche, and I. B. Ayed, "Effective level set image segmentation with a kernel induced data term," IEEE Trans. Image Processing, vol. 19, no. 1, pp. 220–232, 2010.
[4] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR, pp. 3431–3440, 2015.
[5] O. Ronneberger, P. Fischer, and T. Brox, "U-net: convolutional networks for biomedical image segmentation," in MICCAI, pp. 234–241, 2015.
[6] T. Brosch, Y. Yoo, L. Y. Tang, D. K. Li, A. Traboulsee, and R. Tam, "Deep convolutional encoder networks for multiple sclerosis lesion segmentation," in MICCAI, pp. 3–11, 2015.
[7] A. BenTaieb and G. Hamarneh, "Topology aware fully convolutional networks for histology gland segmentation," in MICCAI, 2016.
[8] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation," Medical Image Analysis, vol. 36, pp. 61–78, 2017.
[9] J. Cai, L. Lu, Z. Zhang, F. Xing, L. Yang, and Q. Yin, "Pancreas segmentation in MRI using graph-based decision fusion on convolutional neural networks," in MICCAI, 2016.
[10] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. S. Torr, "Conditional random fields as recurrent neural networks," in ICCV, pp. 1529–1537, 2015.
[11] T. A. Ngo, Z. Lu, and G. Carneiro, "Combining deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance," Medical Image Analysis, vol. 35, pp. 159–171, 2017.
[12] F. Chen, H. Yu, R. Hu, and X. Zeng, "Deep learning shape priors for object segmentation," in CVPR, pp. 1870–1877, 2013.
[13] N. Paragios and R. Deriche, "Geodesic active regions: A new paradigm to deal with frame partition problems in computer vision," Journal of Visual Communication and Image Representation, vol. 13, pp. 249–268, 2002.
[14] D. Cremers, S. J. Osher, and S. Soatto, "Kernel density estimation and intrinsic alignment for shape priors in level set segmentation," International Journal of Computer Vision, vol. 69, no. 3, pp. 335–351, 2006.
[15] M. D. Zeiler, "ADADELTA: an adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[16] B. van Ginneken, T. Heimann, and M. Styner, "3D segmentation in the clinic: A grand challenge," in 3D Segmentation in the Clinic: A Grand Challenge, pp. 7–15, 2007.
[17] P. Radau, "Cardiac MR Left Ventricle Segmentation Challenge." http://smial.sri.utoronto.ca/LV_Challenge/Home.html, 2008. [Online; accessed 10 December 2016].