Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. Aiming at the semi-supervised segmentation problem, we introduce a transformation (i.e., rotation and flipping) consistent strategy in our self-ensembling model to enhance the regularization effect for pixel-level predictions. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared with state-of-the-art methods, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation.






I Introduction

Figure 1: Examples of three representative medical images. The first row shows a skin lesion in a dermoscopy image; the second row shows the optic disc in a retinal fundus image; the third row shows liver segmentation from a CT scan. Blue denotes the structure boundary and red denotes the liver.

Segmenting anatomical structures or abnormal regions from medical images, such as dermoscopy images, fundus images, and 3D computed tomography (CT) scans, is of great significance for clinical practice, especially for disease diagnosis and treatment planning. Recently, deep learning techniques have made impressive progress on semantic image segmentation tasks and have become a popular choice in both the computer vision and medical imaging communities [1, 2]. The success of deep neural networks usually relies on massive labeled datasets. However, it is hard and expensive to obtain labeled data, notably in the medical imaging domain where only experts can provide reliable annotations [3]. For example, there are thousands of dermoscopy image records in clinical centers, but melanoma delineations by experienced dermatologists are very scarce; see Figure 1. Similar cases can be observed in optic disc segmentation from retinal fundus images, and especially in liver segmentation from CT scans, where delineating organs from volumetric images in a slice-by-slice manner is very time-consuming and expensive. The lack of labeled data motivates the study of methods that can be trained with limited supervision, such as semi-supervised learning [4, 5, 6], weakly supervised learning [7, 8, 9], and unsupervised domain adaptation [10, 11, 12]. In this paper, we focus on semi-supervised segmentation approaches, considering that it is relatively easy to acquire a large amount of unlabeled medical image data.

Semi-supervised learning aims to learn from a limited amount of labeled data and an arbitrary amount of unlabeled data, which is a fundamental and challenging problem with high impact on real-world clinical applications. The semi-supervised problem has been widely studied in the medical image research community [13, 14, 15, 16, 17]. Recent progress in semi-supervised learning for medical image segmentation has featured deep learning [18, 5, 19, 20, 21]. Bai et al. [18] presented a semi-supervised deep learning model for cardiac MR image segmentation, where the segmented label maps from unlabeled data are incrementally added into the training set to refine the segmentation network. Other semi-supervised learning methods are based on recent techniques such as the variational autoencoder (VAE) [5] and the generative adversarial network (GAN) [19]. We tackle the semi-supervised segmentation problem from a different point of view. Motivated by the success of the self-ensembling model in semi-supervised classification [22], we further advance this method to medical image segmentation tasks, including both 2D and 3D cases.

In this paper, we present a novel semi-supervised learning method based on the self-ensembling strategy for medical image segmentation tasks. The whole framework is trained with a weighted combination of a supervised loss and an unsupervised loss. The supervised loss is designed to utilize the labeled data for accurate prediction. To leverage the unlabeled data, our self-ensembling method encourages consistent predictions of the network for the same input under different regularizations, e.g., randomized Gaussian noise, network dropout, and randomized data transformation. In particular, we design our method to account for the challenges of the segmentation task, where pixel-level predictions are required. We observe that in the segmentation problem, if one transforms (e.g., rotates) the input image, the expected prediction should be transformed in the same manner. However, when the inputs of CNNs are rotated, the corresponding network predictions do not rotate in the same way [23]. In this regard, we take advantage of this property by introducing a transformation (i.e., rotation, flipping) consistent scheme at the input and output space of our network. Specifically, we design the unsupervised loss by minimizing the differences between the network predictions under different transformations of the same input. We extensively evaluate our method for semi-supervised medical image segmentation on three representative segmentation tasks, i.e., skin lesion segmentation from dermoscopy images, optic disc segmentation from retinal images, and liver segmentation from CT scans. For training on 3D CT images, we conduct experiments with 2D and 3D convolutional neural networks, respectively. To train with a 2D convolutional neural network, we slice the volumetric data into stacks of three adjacent slices, and the result is the concatenation of the network outputs. We also show that our method performs well with a 3D convolutional neural network.
In summary, our semi-supervised method achieves significant improvements over the supervised baseline, and also outperforms other semi-supervised segmentation methods. A preliminary version of this work was presented in [24]. The main contributions of this paper are:

  • We present a simple and effective semi-supervised segmentation method for various medical images segmentation tasks. Our method is flexible and can be easily applied on both 2D and 3D convolutional neural networks.

  • To better utilize the unlabeled data for segmentation tasks, we propose a transformation consistent self-ensembling model (TCSM), which shows effectiveness for the semi-supervised segmentation problem.

  • Extensive experiments on three representative yet challenging medical image segmentation tasks, including 2D and 3D datasets, demonstrate the effectiveness of our semi-supervised method over other methods.

  • Our method outperforms other state-of-the-art approaches and sets a new record on the ISIC 2017 skin lesion segmentation dataset under the semi-supervised setting.

The remainder of this paper is organized as follows. We review the related techniques in Section II and elaborate the semi-supervised method in Section III. The experimental results and ablation analysis on dermoscopy images, retinal fundus images, and liver CT scans are shown in Section IV. We further discuss our method in Section V and draw conclusions in Section VI.

Figure 2: The pipeline of our proposed transformation consistent self-ensembling model for semi-supervised medical image segmentation (we visualize liver CT scans as an example). The total loss is a weighted combination of the cross-entropy loss on labeled data and the mean square error loss on both labeled and unlabeled data. The model encourages the network to be transformation consistent by utilizing the unlabeled data. Note that the transformation operation $\pi$ remains the same within each training pass but is changed across different passes.

II Related Work

Semi-supervised segmentation for medical images.  Early works for semi-supervised medical image segmentation are mainly based on hand-crafted features [13, 14, 15, 16, 17]. For example, You et al. [13] combined radial projection and self-training to obtain an improved overall segmentation of retinal vessels from fundus images. Portela et al. [14] presented a clustering-based semi-supervised Gaussian mixture model (GMM) to automatically segment brain MR images. Later on, Gu et al. [16] proposed a semi-supervised method for vessel segmentation by constructing forest-oriented superpixels. For skin lesion segmentation, Jaisakthi et al. [17] designed a semi-supervised method based on K-means clustering and the flood fill algorithm. However, these semi-supervised methods are based on hand-crafted features, which suffer from limited representation capacity.

Recent progress in semi-supervised segmentation has featured deep learning. An iterative approach was proposed by Bai et al. [18] for cardiac segmentation from MR images, where the network parameters and the segmentation masks for the unlabeled data are alternately updated. Generative-model-based semi-supervised approaches are also popular in the medical image analysis community [5, 19, 25, 26]. Sedai et al. [5] introduced a variational autoencoder (VAE) for optic cup segmentation from retinal fundus images. They learned the feature embedding from unlabeled images using the VAE, and then combined the feature embedding with the segmentation autoencoder trained on the labeled images for pixel-wise segmentation of the cup region. To involve the unlabeled data in the training, Nie et al. [19] presented an attention-based GAN approach to select trustworthy regions of the unlabeled data to train the segmentation network. Another GAN-based work [26] employed the cycle-consistency principle for cardiac MR image segmentation. More recently, Ganaye et al. [21] proposed a semi-supervised method for brain structure segmentation by taking advantage of the invariant nature and semantic constraints of anatomical structures. Multi-view co-training based methods [4, 27] have been explored on 3D medical data. Differently, our method takes advantage of transformation consistency and the self-ensembling model, which is simple yet effective for medical image segmentation tasks.

Transformation equivariant representation.  There is a body of related literature on equivariant representations, where transformation equivariance is encoded into the network to exploit the network equivariance property [28, 29, 23]. For example, Cohen and Welling [28] proposed a group equivariant neural network to improve network generalization, where equivariance to $90^\circ$-rotations and dihedral flips is encoded by copying the transformed filters at different rotation-flip combinations. Concurrently, Dieleman et al. [29] designed four different equivariance-preserving operations by rotating feature maps instead of filters. Recently, Worrall et al. [23] restricted the filters to circular harmonics to achieve continuous rotation equivariance. However, these works aim to encode equivariance into the network to improve its generalization capability, while our method aims to better utilize the unlabeled data in semi-supervised learning.

Medical image segmentation. Early approaches for medical image segmentation mainly focused on thresholding [30], statistical shape models [31], and machine learning related methods [32, 33, 34, 35, 36]. Recently, many researchers have employed deep learning based methods for medical image segmentation [37, 38, 39]. These deep learning based methods have achieved promising results on skin lesion segmentation, optic disc segmentation, and liver segmentation [40, 41, 42, 43, 44]. Yu et al. [40] explored the network depth property and developed a deep residual network for automatic skin lesion segmentation, where several residual blocks were stacked together to increase the network's representative capability. Yuan et al. [45] presented a 19-layer deep convolutional neural network and trained it in an end-to-end manner for skin lesion segmentation. For optic disc segmentation, Fu et al. [41] presented an M-Net for joint OC and OD segmentation, and a disc-aware network [41] was designed for glaucoma screening using an ensemble of different feature streams of the network. For liver segmentation, Chlebus et al. [36] presented a cascaded FCN combined with hand-crafted features. Li et al. [46] presented a 2D-3D hybrid architecture for liver and tumor segmentation from CT images. Although these approaches achieve good results in the experiments, they are based on fully supervised learning, requiring massive pixel-wise annotations from experienced dermatologists or radiologists.

III Method

Figure 2 gives an overview of our proposed transformation consistent self-ensembling model (TCSM) for semi-supervised medical image segmentation. The transformation operations are added to a standard fully convolutional network (FCN). The total loss function is a weighted combination of the cross-entropy loss and the mean square error loss, where the cross-entropy loss is optimized on the labeled data and the mean square error loss is computed on both labeled and unlabeled data. The framework is trained for medical image segmentation in a semi-supervised manner.

III-A Overview

To ease the description of our method, we first formulate the semi-supervised segmentation task in general, where the training set consists of $N+M$ inputs in total, including $N$ labeled inputs and $M$ unlabeled inputs. We denote the labeled set as $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N}$ and the unlabeled set as $\mathcal{D}_U = \{x_i\}_{i=N+1}^{N+M}$, where $x_i$ is the input image and $y_i$ is the ground-truth label for 2D medical images, e.g., retinal fundus images and dermoscopy images. The general semi-supervised segmentation task can be formulated as learning the network parameters $\theta$ by optimizing:

$$\min_{\theta} \; \sum_{i=1}^{N} \mathcal{L}_{s}\big(f(x_i; \theta), y_i\big) \;+\; \lambda \, \mathcal{R}\big(\theta, \mathcal{D}_L \cup \mathcal{D}_U\big),$$

where $\mathcal{L}_{s}$ denotes the supervised loss function, $\mathcal{R}$ represents the regularization (unsupervised) loss, and $f$ denotes the segmentation neural network. The first term in the loss function is trained with the cross-entropy loss, aiming at evaluating the correctness of the network output on labeled inputs only. The second term is optimized with a regularization loss, which utilizes both labeled and unlabeled inputs. $\lambda$ is a weighting factor that controls the strength of the regularization.

Recent progress in semi-supervised learning has shown promising results with self-ensembling methods [47, 22]. The key to this success relies on the smoothness assumption; that is, data points close to each other in the image space are likely to be the same in the label space. Specifically, these methods focus on improving the quality of targets by using self-ensembling and exploring different perturbations. The perturbations include input noise and network dropout. The network with the regularization loss encourages the predictions to be consistent and is expected to give better predictions. The regularization loss can be described as:

$$\mathcal{R} = \sum_{i=1}^{N+M} \mathbb{E}_{\xi', \xi} \, \big\| f(x_i; \theta, \xi') - f(x_i; \theta, \xi) \big\|^2,$$

where $\xi'$ and $\xi$ refer to different regularizations or perturbations of the input data. In our work, we share the same spirit with these methods by designing different perturbations for the input data. Specifically, we design the regularization term as a consistency loss to encourage smooth predictions for the same data under different regularizations or perturbations (e.g., Gaussian noise, network dropout, and randomized data transformation).
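As a concrete illustration, the sketch below evaluates this consistency loss for one input under two noisy forward passes. The network `f` and the Gaussian perturbation are toy stand-ins for our actual segmentation model and its perturbations, so the names and values here are purely illustrative:

```python
import numpy as np

def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two predictions of the same
    input under different perturbations (the regularization term R)."""
    return float(np.mean((pred_a - pred_b) ** 2))

def perturb(x, rng, sigma=0.1):
    """One simple perturbation xi: additive Gaussian input noise."""
    return x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = rng.random((4, 4))                       # toy single-channel "image"
f = lambda img: 1.0 / (1.0 + np.exp(-img))   # stand-in for the network
loss = consistency_loss(f(perturb(x, rng)), f(perturb(x, rng)))  # > 0 in general
```

Minimizing this quantity over many inputs pushes the network toward identical outputs regardless of the sampled perturbation, which is exactly the smoothness the regularizer enforces.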

III-B Transformation Consistent Self-ensembling Model

Figure 3: (a) Segmentation is desired to be rotation equivariant: if the input image is rotated, the ground truth mask should be rotated in the same manner. (b) Convolutions are not rotation equivariant in general: if the input image is rotated, the generated output is not the same as the original output rotated in the same manner.

In this subsection, we introduce how to effectively design the randomized data transformation regularization for the segmentation problem, i.e., the transformation consistent self-ensembling model (TCSM). In general self-ensembling semi-supervised learning, most regularizations and perturbations are designed for the classification problem. However, in the medical image domain, the accurate segmentation of important structures or lesions is a very challenging, practical problem, and the perturbations for segmentation tasks are worth exploring. One prominent difference between these two common tasks is that the classification problem is transformation invariant while the segmentation task is expected to be transformation equivariant. Specifically, for image classification, the convolutional neural network only recognizes the presence or absence of an object in the whole image. In other words, the classification result should remain the same no matter what data transformation (i.e., translation, rotation, or flipping) is applied to the input image. However, for image segmentation, if the input image is rotated, the segmentation mask is expected to be rotated in the same manner, with the corresponding pixel-wise predictions unchanged; see examples in Figure 3 (a). However, in general, convolutions are not transformation (i.e., flipping, rotation) equivariant (transformation in this work refers to flipping and rotation), meaning that if one rotates or flips the CNN input, then the feature maps do not necessarily rotate in a meaningful manner [23], as shown in Figure 3 (b). Therefore, a convolutional network consisting of a series of convolutions is also not transformation equivariant. Formally, every transformation $\pi$ of the input $x$ is associated with a transformation $\psi$ of the outputs; that is, $f(\pi(x)) = \psi(f(x))$, but in general $\psi \neq \pi$.
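This non-equivariance is easy to verify numerically. The illustrative sketch below uses a hand-rolled "valid" 2D cross-correlation on a random image: for a generic kernel, convolving a rotated image differs from rotating the convolved image, whereas a rotation-symmetric kernel (all ones) commutes with the rotation:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D cross-correlation, enough to probe equivariance."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.random((6, 6))
kernel = rng.random((3, 3))        # a generic, non-symmetric filter

out_of_rotated = conv2d_valid(np.rot90(img), kernel)    # f(pi(x))
rotated_output = np.rot90(conv2d_valid(img, kernel))    # pi(f(x))
equivariant = bool(np.allclose(out_of_rotated, rotated_output))  # False here
```

For an almost-surely asymmetric random kernel, the two results disagree, which is precisely the $f(\pi(x)) \neq \pi(f(x))$ behavior described above.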

This phenomenon limits the unsupervised regularization effect of randomized data transformation for the segmentation problem [22]. To enhance the regularization and more effectively utilize unlabeled data in our segmentation task, we introduce a transformation consistent scheme in the unsupervised regularization term. Specifically, this transformation consistent scheme is embedded into the framework by encouraging $f(\pi(x))$ to approximate $\pi(f(x))$ at the input and output space. A detailed illustration of the framework is shown in Figure 2, and the pseudocode is presented in Algorithm 1. Under the transformation consistent scheme and other perturbations (e.g., Gaussian noise and network dropout), each input $x_i$ is fed into the network for two evaluations to acquire two outputs $\hat{y}_{i,1}$ and $\hat{y}_{i,2}$. More specifically, the transformation consistent scheme consists of three $\pi$ operations; see Figure 2. For one training input $x_i$, in the first evaluation, the operation $\pi$ is applied to the input image, while in the second evaluation, $\pi$ is applied to the prediction map. Random perturbations (e.g., Gaussian noise and network dropout) are also applied to the network during the two evaluations. By minimizing the difference between $\hat{y}_{i,1}$ and $\hat{y}_{i,2}$ with a mean square error loss, the network is regularized to be transformation consistent, which increases the network's generalization capacity. Notably, the regularization loss is evaluated on both labeled and unlabeled inputs. To utilize the labeled data, the same operation $\pi$ is also performed on the label $y_i$, and the prediction is optimized with the standard cross-entropy loss. Finally, the network is trained by minimizing the weighted combination of the unsupervised regularization loss and the supervised cross-entropy loss. Note that we employed the same data augmentation in the training procedure of all the experiments for fair comparison. However, our method is different from traditional data augmentation: it utilizes the unlabeled data by minimizing the difference between network outputs under transformed inputs, thereby complying with the smoothness assumption.

III-C TCSM for Semi-supervised Medical Image Segmentation

$\lambda(t)$ = unsupervised weight function
$f_{\theta}$ = neural network with trainable parameters $\theta$
$\pi$ = transformation operations
for $t$ in $[1, \mathrm{num\_epochs}]$ do
     for each minibatch $B$ do
          randomly update $\pi$
          $\hat{y}_{i \in B, 1} \leftarrow f_{\theta}(\pi(x_{i \in B}); \xi')$
          $\hat{y}_{i \in B, 2} \leftarrow \pi(f_{\theta}(x_{i \in B}; \xi))$
          $loss \leftarrow$ cross-entropy on labeled $i \in B$ against $\pi(y_i)$ $+\; \lambda(t) \, \| \hat{y}_{i \in B, 1} - \hat{y}_{i \in B, 2} \|^2$
          update $\theta$ using optimizer
     end for
end for
return $\theta$;
Algorithm 1 Algorithm pseudocode.
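The training pass of Algorithm 1 can be sketched end-to-end in plain numpy. The `network` below is a hypothetical per-pixel toy model standing in for the actual segmentation network, and the noise level is an arbitrary illustrative choice; the point is the placement of $\pi$: on the input in the first branch, and on the output (and the label) in the second:

```python
import numpy as np

rng = np.random.default_rng(0)

def transform(x, k, flip):
    """pi: rotation by 90 * k degrees plus an optional horizontal flip."""
    x = np.rot90(x, k, axes=(-2, -1))
    return x[..., ::-1] if flip else x

def network(x, theta, noise_sigma=0.0):
    """Hypothetical per-pixel stand-in for f(x; theta, xi): a sigmoid of
    a scaled input, with optional Gaussian noise as the perturbation."""
    z = theta * x
    if noise_sigma:
        z = z + rng.normal(0.0, noise_sigma, x.shape)
    return 1.0 / (1.0 + np.exp(-z))

def tcsm_losses(x, y, theta, k, flip, lam):
    """One TCSM pass: pi on the input in the first branch, pi on the
    prediction map (and the label) in the second branch."""
    pred1 = network(transform(x, k, flip), theta, noise_sigma=0.05)  # f(pi(x))
    pred2 = transform(network(x, theta), k, flip)                    # pi(f(x))
    unsup = float(np.mean((pred1 - pred2) ** 2))     # MSE consistency loss
    sup = 0.0
    if y is not None:                                # labeled input only
        t, eps = transform(y, k, flip), 1e-7         # pi(y)
        sup = float(-np.mean(t * np.log(pred1 + eps)
                             + (1 - t) * np.log(1 - pred1 + eps)))
    return sup + lam * unsup, sup, unsup

x = rng.random((8, 8))
y = (x > 0.5).astype(float)
total, sup, unsup = tcsm_losses(x, y, theta=2.0, k=1, flip=True, lam=0.5)
```

For an unlabeled input, `y` is `None` and only the weighted consistency term contributes, matching the role of the regularization loss in the total objective.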

TCSM with 2D medical images For dermoscopy images and retinal fundus images, we employ the 2D DenseUNet architecture [46] as our baseline model. Compared to the standard DenseNet [48], we add a decoder part for the segmentation tasks. The decoder comprises four blocks, each consisting of upsampling, convolution, batch normalization, and ReLU activation layers. UNet-like skip connections are added between the final convolution layer of each dense block in the encoder and the corresponding convolution layer in the decoder. The final prediction layer is a convolution layer with two output channels. Before the final convolution layer, we add a dropout layer with a dropout rate of 0.3.

TCSM with 3D medical images To generalize our method to 3D medical images, e.g., liver CT scans, we train TCSM with the 2D DenseUNet and the 3D U-Net [37], respectively. For training DenseUNet on liver CT scans, the volumetric data, including both raw images and volumetric labels, is sliced into a large number of stacks of three adjacent slices, where the middle slice of each stack provides the ground-truth label. In the testing stage, the network output is the concatenation of the sequential predictions on three-adjacent-slice stacks from the volumetric images. For training with the 3D U-Net, we follow the original setting with the following modifications: we set the number of base filters to 16 to accommodate the input size; the optimizer is SGD with a learning rate of 0.01; batch normalization layers are employed to facilitate the training process; and the loss function is the standard weighted cross-entropy loss.
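The three-adjacent-slice preparation can be sketched as follows. The border treatment (replicating the first and last slices so every slice gets a full stack) is our assumption for illustration, as the text does not specify it:

```python
import numpy as np

def to_triplets(volume):
    """Turn a (D, H, W) CT volume into D stacks of three adjacent
    slices; stack i is centered on slice i, whose label serves as
    the ground truth. Border slices are replicated at the two ends."""
    padded = np.concatenate([volume[:1], volume, volume[-1:]], axis=0)
    return np.stack([padded[i:i + 3] for i in range(volume.shape[0])])

vol = np.arange(5 * 2 * 2, dtype=float).reshape(5, 2, 2)
stacks = to_triplets(vol)   # shape (5, 3, 2, 2)
```

At test time, concatenating the per-stack predictions in order reconstructs a prediction for the whole volume, mirroring the testing procedure described above.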

Details of TCSM The transformation consistent scheme includes the horizontal flipping operation as well as four rotation operations applied to the input, with angles of $90^\circ \times k$, where $k \in \{0, 1, 2, 3\}$. During each training pass, one operation is randomly chosen and applied. We avoid other angles for simplicity of implementation, but the proposed framework can be generalized to other angles. To keep the balance of the two terms in the loss function, we evenly and randomly select the labeled and the unlabeled samples in each minibatch. The time-dependent warming-up function $\lambda(t)$ weights the regularization loss against the supervised loss. This weighting function is a Gaussian ramp-up curve $\lambda(t) = \lambda_{\max} \cdot e^{-5(1 - t/t_r)^2}$, where $t$ denotes the training epoch, $t_r$ is the ramp-up length, and $\lambda_{\max}$ scales the maximum value of the weighting function. In our experiments, we empirically set $\lambda_{\max}$ to 1.0.
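For concreteness, this Gaussian ramp-up can be written as a small function; the ramp length of 80 epochs is an assumed illustrative value rather than a setting reported here:

```python
import math

def rampup_weight(epoch, ramp_epochs=80, lam_max=1.0):
    """Gaussian ramp-up lambda(t) = lam_max * exp(-5 * (1 - t/ramp_epochs)^2),
    rising from a near-zero value to lam_max over the ramp period."""
    if epoch >= ramp_epochs:
        return lam_max
    t = max(0.0, float(epoch)) / ramp_epochs
    return lam_max * math.exp(-5.0 * (1.0 - t) ** 2)
```

Starting the unsupervised weight near zero lets the supervised term dominate early, so the consistency targets are meaningful before the regularization takes full effect.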

III-D Technical Details

III-D1 Implementation

The model was implemented using the Keras package [49], and was trained with the stochastic gradient descent (SGD) algorithm (momentum 0.9, minibatch size 10). The initial learning rate was 0.01 and was gradually decayed during training. We use standard on-the-fly data augmentation to avoid overfitting, including random flipping, rotation, and scaling with a random scale factor from 0.9 to 1.1. Note that all the experiments employed data augmentation for fair comparison.

III-D2 Inference procedure

In the inference phase, we remove the transformation operations from the network and perform a single test with the original input for fair comparison. After obtaining the probability map from the network, we first apply thresholding at 0.5 to get the binary segmentation result, and then use a morphological operation, i.e., filling holes, to get the final segmentation result.
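This post-processing (threshold at 0.5, then fill holes) can be sketched as below; we use a plain BFS flood fill from the image border instead of a library routine (e.g., the equivalent `scipy.ndimage.binary_fill_holes`) purely to keep the snippet self-contained:

```python
import numpy as np
from collections import deque

def fill_holes(mask):
    """Fill holes in a binary mask: background pixels that cannot be
    reached from the image border (4-connectivity) become foreground."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque((i, j) for i in range(h) for j in range(w)
              if (i in (0, h - 1) or j in (0, w - 1)) and not mask[i, j])
    for i, j in q:                      # seed: background border pixels
        outside[i, j] = True
    while q:                            # BFS flood fill from the border
        i, j = q.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w and not mask[ni, nj] \
                    and not outside[ni, nj]:
                outside[ni, nj] = True
                q.append((ni, nj))
    return mask | ~outside              # foreground plus enclosed holes

def postprocess(prob, threshold=0.5):
    """Threshold the probability map, then fill holes."""
    return fill_holes(prob >= threshold)
```

Any background region fully enclosed by predicted foreground (a "hole" in the lesion or liver mask) is absorbed into the final segmentation, while background connected to the border is preserved.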

IV Experiments

IV-A Datasets

To evaluate the effectiveness of our method, we conduct experiments on various modalities of medical images, including dermoscopy images, retinal fundus images and liver CT scans.

Dermoscopy image dataset. The dermoscopy image dataset in our experiments is the 2017 ISIC skin lesion segmentation challenge dataset [50]. It includes a training set with 2000 annotated dermoscopic images, a validation set with 150 images, and a testing set with 600 images. The image sizes vary considerably across the dataset. To balance segmentation performance and computational cost, we first resize all the images to a fixed resolution using bicubic interpolation.

Retinal fundus image dataset. The fundus image dataset is acquired from the MICCAI 2018 Retinal Fundus Glaucoma Challenge (REFUGE). Manual pixel-wise annotations of the optic disc were obtained by seven independent ophthalmologists from Zhongshan Ophthalmic Center, Sun Yat-sen University, China. The experiments are conducted on the released training dataset, which contains 400 retinal images. The training dataset is randomly split into training and test sets, and we resize all the images to a fixed resolution using bicubic interpolation.

Liver segmentation dataset. The liver segmentation dataset is from the 2017 Liver Tumor Segmentation Challenge (LiTS) [51]. The LiTS dataset contains 131 and 70 contrast-enhanced 3D abdominal CT scans for training and testing, respectively. The dataset was acquired with different scanners and protocols at six different clinical sites, with largely varying in-plane resolutions from 0.55 mm to 1.0 mm and slice spacings from 0.45 mm to 6.0 mm.

IV-B Evaluation Metrics

For the dermoscopy image dataset, we use five evaluation metrics to measure the segmentation performance: the Jaccard index (JA), Dice coefficient (DI), pixel-wise accuracy (AC), sensitivity (SE), and specificity (SP), defined as:

$$JA = \frac{TP}{TP + FN + FP}, \quad DI = \frac{2 \cdot TP}{2 \cdot TP + FN + FP},$$
$$AC = \frac{TP + TN}{TP + TN + FP + FN}, \quad SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP},$$

where $TP$, $TN$, $FP$, and $FN$ refer to the number of true positives, true negatives, false positives, and false negatives, respectively. For the retinal fundus image dataset, we use JA to measure the optic disc segmentation accuracy. For the liver CT dataset, the Dice per case score is employed to measure the accuracy of the liver segmentation result, following the evaluation of the 2017 LiTS challenge [51].
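These metrics follow directly from the confusion-matrix counts; a small reference implementation (function name hypothetical) is:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute JA, DI, AC, SE, SP from a binary prediction and ground
    truth; assumes both classes appear so no denominator is zero."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    return {
        "JA": tp / (tp + fn + fp),
        "DI": 2 * tp / (2 * tp + fn + fp),
        "AC": (tp + tn) / (tp + tn + fp + fn),
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
    }

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
scores = segmentation_metrics(pred, gt)   # JA = 1/3, DI = 0.5
```

Note that DI is always at least JA for the same prediction, since the Dice coefficient counts the overlap twice in both numerator and denominator.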

Figure 4: Examples of the segmentation results of supervised learning (left) and our method (right) on the validation set in the dermoscopy image dataset. The blue and red contours denote the ground truth and our segmentation result, respectively.

IV-C Experiments on Dermoscopy Image Dataset

IV-C1 Quantitative and visual results with 50 labeled data

We report the performance of our method trained with only 50 labeled images and 1950 unlabeled images. Note that the labeled images are randomly selected from the whole dataset. Table I shows the experiments with the supervised method, supervised with regularization, and our semi-supervised method on the validation dataset. We use the same network architecture (DenseUNet) in all these experiments for fair comparison. The supervised experiment is optimized with the standard cross-entropy loss on the 50 labeled images. The supervised-with-regularization experiment is also trained with 50 labeled images, but the total loss function is a weighted combination of the cross-entropy loss and the regularization loss, the same as our TCSM loss function. The TCSM experiment is trained with 50 labeled and 1950 unlabeled images in the semi-supervised manner. From Table I, it is obvious that our semi-supervised method achieves higher performance than the supervised counterpart on all the evaluation metrics, with prominent improvements of 2.46%, 2.64%, and 3.60% on JA, DI, and SE, respectively. It is worth mentioning that the supervised-with-regularization experiment improves over supervised training due to the regularization loss on the labeled images; see "Supervised+regu" in Table I. The consistent improvements of "Supervised+regu" on all evaluation metrics demonstrate that the regularization loss is also effective for the labeled images. Figure 4 presents some segmentation results (red contour) of the supervised method (left) and our method (right). Compared with the segmentation contours achieved by the supervised method (left column), the semi-supervised method fits more consistently with the ground-truth boundary. This observation shows the effectiveness of our semi-supervised learning method, i.e., TCSM, compared with the supervised method.

Metric Supervised Supervised+regu Ours
JA 72.85% 73.25% 75.31%
DI 81.15% 81.60% 83.79%
AC 93.70% 93.71% 93.73%
SE 82.77% 83.30% 86.37%
SP 96.35% 96.40% 97.50%
Table I: Comparison of supervised learning and semi-supervised learning (50 labeled/1950 unlabeled) on the validation set in the dermoscopy image dataset. "Supervised+regu" denotes supervised with regularization.

IV-C2 Effectiveness of transformation consistent scheme

To show the effectiveness of the transformation consistent regularization scheme, we conduct an ablation analysis of our method on the dermoscopy image dataset. We compare our method with the most common perturbation regularizations, i.e., Gaussian noise and network dropout. Table II shows the experimental results, where "Ours-A" refers to semi-supervised learning with Gaussian noise and dropout regularization, "Ours-B" denotes semi-supervised learning with transformation consistent regularization, and "Ours" refers to the experiment with all of these regularizations. Note that all experiments are conducted on the same training data with 50 labeled and 1950 unlabeled images. As shown in Table II, both kinds of regularization independently contribute to the performance gains of semi-supervised learning. The improvement with transformation consistent regularization is very competitive compared with the increment from Gaussian noise and dropout regularizations. We also observe that these two regularizations are complementary: when both kinds are employed, the performance is further enhanced.

Metric Supervised Ours-A Ours-B Ours
JA 72.85% 74.59% 74.21% 75.31%
DI 81.15% 83.27% 82.68% 83.79%
AC 93.70% 93.70% 93.71% 93.73%
SE 82.77% 82.75% 83.15% 86.37%
SP 96.35% 96.43% 97.01% 97.50%
Table II: Ablation of semi-supervised method (50 labeled/1950 unlabeled) on the validation set in the dermoscopy image dataset. "Ours-A" denotes semi-supervised learning with dropout, Gaussian noise. "Ours-B" denotes semi-supervised learning with transformation consistent strategy.
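The transformation consistent regularization compared above can be sketched as follows: the network prediction for a rotated/flipped input is penalized against the equally transformed prediction of the original input. A minimal numpy version, assuming `net` is any function mapping an image to a per-pixel prediction (our own helper names):

```python
import numpy as np

def transform(x, k, flip):
    # one of the paper's transformations: rotation by k*90 degrees,
    # optionally followed by a horizontal flip
    x = np.rot90(x, k, axes=(0, 1))
    return np.flip(x, axis=1) if flip else x

def transformation_consistency(net, x, k, flip):
    # penalize the gap between "predict the transformed input" and
    # "transform the prediction of the original input"
    return np.mean((net(transform(x, k, flip)) - transform(net(x), k, flip)) ** 2)
```

For a network that is already equivariant to these transformations (e.g., any pointwise function), this loss is zero; for an ordinary segmentation network it is generally nonzero and acts as a regularizer.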

IV-C3 Results under different numbers of labeled data

Label/Unlabel Metric Supervised Ours
50/1950 JA 72.85% 75.31%
DI 81.15% 83.79%
AC 93.70% 93.73%
SE 82.77% 86.37%
SP 96.35% 97.50%
100/1900 JA 74.80% 75.85%
DI 83.00% 84.10%
AC 94.80% 94.58%
SE 84.85% 85.69%
SP 96.17% 97.18%
300/1700 JA 77.69% 78.10%
DI 85.62% 86.37%
AC 94.92% 95.58%
SE 87.32% 88.20%
SP 96.12% 96.38%
2000/0 JA 79.60% 79.95%
DI 87.26% 88.01%
AC 95.82% 95.82%
SE 89.35% 89.80%
SP 96.80% 96.91%
Table III: Results of our method on the validation set under different numbers of labeled/unlabeled images.

Table III shows the lesion segmentation results of our semi-supervised method (trained with labeled and unlabeled data) and the supervised method (trained only with labeled data) under different numbers of labeled/unlabeled images. We plot the JA scores in Figure 5. It is obvious that the semi-supervised method consistently performs better than the supervised method under different labeled/unlabeled data settings, which demonstrates that our method effectively utilizes the unlabeled data for performance gains. Note that in all semi-supervised learning experiments, we train the network with 2000 images in total, including labeled and unlabeled images. As expected, the performance of supervised training increases when more labeled training images are available; see the blue line in Figure 5. At the same time, the segmentation performance of semi-supervised learning also increases with more labeled training images; see the orange line in Figure 5. The performance gap between supervised training and semi-supervised learning narrows as more labeled samples become available, which conforms with our expectation. When the amount of labeled data is small, our method gains a large improvement, since the regularization loss can effectively leverage more information from the unlabeled data. Comparatively, as the number of labeled images increases, the improvement becomes limited. This is partially because the labeled and unlabeled data are randomly selected from the same dataset, and a large amount of labeled data may already reach the upper-bound performance on the dataset.

From the comparison between the semi-supervised method and the supervised method trained with 2000 labeled images in Figure 5, it is observed that our method increases the JA performance even when all labels are used (from 79.60% to 79.95%). This improvement indicates that the unsupervised loss also provides a regularization effect on the labeled data. In other words, the consistency requirement in the regularization term encourages the network to learn more robust features, improving the segmentation performance.

Figure 5: Results of our semi-supervised method on the validation set of the dermoscopy image dataset with different numbers of labeled/unlabeled data.
Team Label/Unlabel JA DI AC SE SP
Our Semi-supervised Method 300/1700 0.798 0.874 0.943 0.879 0.953
Our baseline 300/0 0.772 0.853 0.936 0.837 0.969
Yuan et al. [52] 2000/0 0.765 0.849 0.934 0.825 0.975
Venkatesh et al. [53] 0.764 0.856 0.936 0.83 0.976
Berseth et al. [54] 0.762 0.847 0.932 0.820 0.978
Bi et al. [55] 0.760 0.844 0.934 0.802 0.985
RECOD 0.754 0.839 0.931 0.817 0.970
Jer 0.752 0.837 0.930 0.813 0.976
NedMos 0.749 0.839 0.930 0.810 0.981
INESC 0.735 0.824 0.922 0.813 0.968
Shenzhen U (Lee) 0.718 0.810 0.922 0.789 0.975
Table IV: Results on the test dataset in the ISIC 2017 dermoscopy lesion segmentation challenge.
Method Backbone Result Improvement
Supervised DenseUNet 72.85% -
Bai et al. [18] DenseUNet 74.40% 1.55%
Hung et al. [56] DenseUNet 73.31% 0.46%
Ours DenseUNet 75.31% 2.46%
Table V: JA performance of different semi-supervised methods on the validation set of the dermoscopy image dataset. “Supervised” denotes training with 50 labeled images.

IV-C4 Comparison with other semi-supervised segmentation methods

We compare our method with the latest semi-supervised segmentation method [18] in the medical imaging community and an adversarial-learning-based semi-supervised method [56]. Note that the method [19] for medical image segmentation adopts a similar idea to the adversarial-learning-based method [56]. For a fair comparison, we re-implement their methods with the same network backbone on this dataset. We conduct experiments with the setting of 50 labeled and 1950 unlabeled images. Table V shows the JA performance of the different methods on the validation set. As shown in Table V, our proposed method achieves a 2.46% JA improvement by utilizing unlabeled data, whereas the methods of Bai et al. [18] and Hung et al. [56] achieve only 1.55% and 0.46% improvements on JA, respectively. This comparison shows the effectiveness of our semi-supervised segmentation method compared to other semi-supervised methods.

IV-C5 Comparison with methods on the challenge leaderboard

We also compare our method with state-of-the-art methods submitted to the ISIC 2017 skin lesion segmentation challenge. There are 21 submissions in total, and the top results are listed in Table IV. Note that the final rank is determined by JA on the testing set. We trained two models: a semi-supervised model with 300 labeled and 1700 unlabeled images, and a supervised model with only 300 labeled images; the latter is denoted as our baseline model. As shown in Table IV, our semi-supervised method achieves the best performance on the benchmark, outperforming the state-of-the-art method [52] by 3.3% on JA (from 76.5% to 79.8%). The performance gains on DI and SE are consistent with that on JA, with 2.5% and 5.4% improvements, respectively. Our baseline model with 300 labeled images also surpasses some other methods due to the state-of-the-art network architecture. On top of this strong baseline, our semi-supervised learning method further makes significant improvements, which demonstrates the effectiveness of the overall semi-supervised learning method.

IV-D Experiments on Retinal Fundus Image Dataset

We report the performance of our method for optic disc segmentation from retinal fundus images. The 400 training images from the REFUGE challenge were randomly separated into training and test sets with a ratio of 9:1. For training the semi-supervised model, only a portion of the labels (i.e., 10% or 20%) in the training set was used. We preprocessed all input images by subtracting the mean RGB values of the training set. When training the supervised model, the loss function was the traditional cross-entropy loss and we used the SGD algorithm with learning rate 0.01 and momentum 0.9. To train the semi-supervised model, we added the extra unsupervised regularization loss and changed the learning rate to 0.001.
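The mean-RGB subtraction above can be sketched as follows; a minimal numpy illustration with our own helper names, assuming images stacked as (N, H, W, 3):

```python
import numpy as np

def mean_rgb(images):
    # per-channel mean over the whole training set, shape (N, H, W, 3)
    return images.astype(np.float64).mean(axis=(0, 1, 2))

def preprocess(image, channel_mean):
    # subtract the training-set mean RGB values from an input image
    return image.astype(np.float32) - channel_mean.astype(np.float32)
```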

We report the JA performance of the supervised and semi-supervised models under the settings of 10% and 20% labeled training images, respectively. As shown in Table VII, we also report two other representative semi-supervised methods. It is observed that our method achieves a 1.52% improvement under the 10% labeled setting, ranking top among all these methods. In addition, the improvement achieved by our method under the 20% setting is also the highest. Figure 6 shows some visual segmentation results of our semi-supervised method; we can see that it better captures the boundary of the optic disc structure.

Figure 6: Examples of our semi-supervised (20%) segmentation results for the fundus image and liver CT scans. Blue color denotes the segmented boundary of optic disc and red color represents the segmented liver.
Method Backbone 10% Imp 20% Imp
Supervised DenseUNet 88.75% - 91.41% -
Bai et al. [18] DenseUNet 90.11% 1.36% 92.22% 0.81%
Hung et al. [56] DenseUNet 89.55% 0.80% 91.65% 0.24%
Ours DenseUNet 91.15% 2.40% 93.58% 2.17%
Supervised 3D U-Net - 4 blocks 88.55% - 91.10% -
Bai et al. [18] 3D U-Net - 4 blocks 90.36% 1.81% 91.68% 0.58%
Ours 3D U-Net - 4 blocks 91.57% 3.02% 92.05% 0.95%
Supervised 3D U-Net - 5 blocks 87.97% - 88.55% -
Bai et al. [18] 3D U-Net - 5 blocks 89.64% 1.67% 89.65% 1.10%
Ours 3D U-Net - 5 blocks 90.24% 2.27% 90.53% 1.98%
Table VI: Dice performance of different semi-supervised methods on the LiTS dataset. “10%” and “20%” denote training with 10% and 20% labeled data, respectively. “Imp” refers to the improvement over the supervised baseline.
Method Backbone 10% Imp 20% Imp
Supervised DenseUNet 93.61% - 94.61% -
Bai et al. [18] DenseUNet 94.20% 0.59% 95.01% 0.40%
Hung et al. [56] DenseUNet 94.32% 0.71% 94.93% 0.32%
Ours DenseUNet 95.13% 1.52% 95.25% 0.64%
Table VII: JA performance of different methods on the fundus image dataset. “10%” and “20%” denote training with 10% and 20% labeled data in the training set, respectively. “Imp” refers to the improvement over the supervised baseline.

IV-E Experiments on LiTS dataset

For this dataset, we evaluate the performance of liver segmentation from CT volumes. Under our semi-supervised setting, we randomly separated the original 131 training volumes from the challenge into 118 training volumes and 13 testing volumes. For image preprocessing, we truncated the image intensity values of all scans to the range of [-200, 250] HU to remove irrelevant details. We run experiments with 2D DenseUNet and 3D U-Net to verify the effectiveness of our method. For the 3D U-Net, the input is randomly cropped to a fixed patch size to leverage information from the third dimension. We also trained two U-Net variants, with 4 blocks and 5 blocks, to verify the effectiveness of our method on 3D CT scans.
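The HU truncation step above amounts to a simple intensity clip; a minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def truncate_hu(volume, lo=-200.0, hi=250.0):
    # clip CT intensities to the [-200, 250] HU window used above,
    # discarding details outside the liver-relevant intensity range
    return np.clip(volume.astype(np.float32), lo, hi)
```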

Following the evaluation protocol of the 2017 LiTS challenge, we employed the Dice per case score, i.e., the average Dice score per volume, to evaluate the liver segmentation results. We report the performance of our method and the other two semi-supervised methods under the settings of 10% and 20% labeled training images in Table VI. We can see that with the DenseUNet backbone, our approach achieves the highest performance improvement in both the 10% and 20% labeled settings, with 2.40% and 2.17% improvements, respectively. For the 3D U-Net, the 4-block variant achieves better results than the 5-block variant. In semi-supervised learning, our method consistently gains higher performance than Bai et al. [18] in both the 10% and 20% settings. We also visualize some liver segmentation results from CT scans in the second row of Figure 6.
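The Dice per case score described above can be sketched as follows: Dice is computed on each volume separately and the per-volume scores are averaged (a numpy illustration with our own helper name):

```python
import numpy as np

def dice_per_case(pred_volumes, gt_volumes):
    # LiTS "Dice per case": compute Dice for each volume separately,
    # then average the per-volume scores
    scores = []
    for p, g in zip(pred_volumes, gt_volumes):
        p, g = p.astype(bool), g.astype(bool)
        total = p.sum() + g.sum()
        scores.append(2.0 * np.logical_and(p, g).sum() / total if total else 1.0)
    return float(np.mean(scores))
```

Averaging per volume, rather than pooling all voxels, keeps small volumes from being dominated by large ones.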

V Discussion

Supervised deep learning has proven extremely effective for many problems in the medical image community. However, its promising performance heavily relies on the availability of massive annotations. Developing new learning methods that require limited annotation will largely advance real-world clinical applications. In this work, we focus on developing semi-supervised learning methods for medical image segmentation. These methods have great potential to reduce the annotation effort by taking advantage of large amounts of unlabeled data and to make progress beyond supervised learning. The key insight of our semi-supervised learning method is the transformation consistent self-ensembling strategy. Extensive experiments on three representative and challenging datasets have sufficiently demonstrated the improvements achieved by our method.

Medical image data comes in different formats, such as 2D in-plane scans (e.g., dermoscopy and fundus images) and 3D volumetric data (e.g., MRI, CT). In this paper, we employ both 2D and 3D networks to segment these various data formats. Our method is very flexible and can be easily applied to both 2D and 3D networks. It is worth mentioning that the recent works [4, 27] are specifically designed for 3D volume data by considering three-view co-training, i.e., the coronal, sagittal, and axial views of the volumetric data. However, we aim for a more general approach that is applicable to 2D and 3D medical images simultaneously. For 3D semi-supervised learning, designing specific methods that consider the 3D nature of volumetric data may be a promising direction.

Recent works on network equivariance [23, 28, 29] improve the generalization capacity of the trained network by exploiting the equivariance property. For example, Cohen and Welling [28] presented a group equivariant neural network that is equivariant to 90° rotations and dihedral flips, aiming to improve generalization and achieve higher results with the same number of weights. Our method also leverages the transformation consistency principle, but differently, we aim for the semi-supervised segmentation task. Moreover, if these networks, e.g., the harmonic network [23], were trained in the semi-supervised way to leverage unlabeled data, the transformation regularization would ideally have no effect, since the network outputs are identical under transformations of the input images. Therefore, the limited regularization would restrict the performance improvement obtainable from the unlabeled data.

One limitation of our method is that we assume both labeled and unlabeled data come from the same distribution. However, in real-world clinical applications, the labeled and unlabeled data may not be collected from the same distribution, and there may exist a domain shift between them. Oliver et al. [57] demonstrated that the performance of semi-supervised learning methods can degrade substantially when the unlabeled dataset contains out-of-distribution examples, yet most current semi-supervised approaches for medical image segmentation do not consider this issue. Therefore, in the future, we will explore domain adaptation techniques [10] and investigate how to combine them with the self-ensembling strategy to bring our method towards real-world clinical applications.

VI Conclusion

In this paper, we present a novel semi-supervised learning method for medical image segmentation. The whole framework is trained with a weighted combination of the supervised loss and the unsupervised loss. Specifically, we introduce a transformation consistent self-ensembling model for the segmentation task, which enhances the regularization effect to utilize unlabeled data and can be easily applied to 2D and 3D networks. Comprehensive experimental analysis on three medical imaging datasets, i.e., a skin lesion dataset, a retinal image dataset, and a liver CT dataset, demonstrates the effectiveness of our method. Our method is general and can be widely used in other semi-supervised medical image analysis problems. Future work includes investigating other domain adaptation techniques to further enhance the effectiveness of our semi-supervised learning method.


  • Nie et al. [2018a] D. Nie, L. Wang, Y. Gao, J. Lian, and D. Shen, “Strainet: Spatially varying stochastic residual adversarial networks for mri pelvic organ segmentation,” IEEE transactions on neural networks and learning systems, 2018.
  • Mahmud et al. [2018] M. Mahmud, M. S. Kaiser, A. Hussain, and S. Vassanelli, “Applications of deep learning and reinforcement learning to biological data,” IEEE transactions on neural networks and learning systems, vol. 29, no. 6, pp. 2063–2079, 2018.
  • Kohli et al. [2017] M. D. Kohli, R. M. Summers, and J. R. Geis, “Medical image data and datasets in the era of machine learning—whitepaper from the 2016 c-mimi meeting dataset session,” Journal of digital imaging, vol. 30, no. 4, pp. 392–399, 2017.
  • Zhou et al. [2018] Y. Zhou, Y. Wang, P. Tang, W. Shen, E. K. Fishman, and A. L. Yuille, “Semi-supervised 3d abdominal multi-organ segmentation via deep multi-planar co-training,” WACV, 2018.
  • Sedai et al. [2017] S. Sedai, D. Mahapatra, S. Hewavitharanage et al., “Semi-supervised segmentation of optic cup in retinal fundus images using variational autoencoder,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 75–82.
  • Cheplygina et al. [2018] V. Cheplygina, M. de Bruijne, and J. P. Pluim, “Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis,” arXiv preprint, 2018.
  • Hu et al. [2018] Y. Hu, M. Modat, E. Gibson, W. Li, N. Ghavami, E. Bonmati, G. Wang, S. Bandula, C. M. Moore, M. Emberton et al., “Weakly-supervised convolutional neural networks for multimodal image registration,” Medical image analysis, vol. 49, pp. 1–13, 2018.
  • Gondal et al. [2017] W. M. Gondal, J. M. Köhler, R. Grzeszick, G. A. Fink, and M. Hirsch, “Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images,” in International Conference on Image Processing.    IEEE, 2017, pp. 2069–2073.
  • Feng et al. [2017] X. Feng, J. Yang, A. F. Laine, and E. D. Angelini, “Discriminative localization in cnns for weakly-supervised segmentation of pulmonary nodules,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 568–576.
  • Kamnitsas et al. [2017] K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert et al., “Unsupervised domain adaptation in brain lesion segmentation with adversarial networks,” in IPMI.    Springer, 2017, pp. 597–609.
  • Dong et al. [2018] N. Dong, M. Kampffmeyer, X. Liang, Z. Wang, W. Dai, and E. Xing, “Unsupervised domain adaptation for automatic estimation of cardiothoracic ratio,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2018, pp. 544–552.
  • Mahmood et al. [2018] F. Mahmood, R. Chen, and N. J. Durr, “Unsupervised reverse domain adaptation for synthetic medical images via adversarial training,” IEEE transactions on medical imaging, vol. 37, no. 12, pp. 2572–2581, 2018.
  • You et al. [2011] X. You, Q. Peng, Y. Yuan, Y.-m. Cheung, and J. Lei, “Segmentation of retinal blood vessels using the radial projection and semi-supervised approach,” Pattern Recognition, vol. 44, no. 10-11, pp. 2314–2324, 2011.
  • Portela et al. [2014] N. M. Portela, G. D. Cavalcanti, and T. I. Ren, “Semi-supervised clustering for mr brain image segmentation,” Expert Systems with Applications, vol. 41, no. 4, pp. 1492–1497, 2014.
  • Masood et al. [2015] A. Masood, A. Al-Jumaily, and K. Anam, “Self-supervised learning model for skin cancer diagnosis,” in Neural Engineering (NER), 2015 7th International IEEE/EMBS Conference on.    IEEE, 2015, pp. 1012–1015.
  • Gu et al. [2017] L. Gu, Y. Zheng, R. Bise, I. Sato, N. Imanishi, and S. Aiso, “Semi-supervised learning for biomedical image segmentation via forest oriented super pixels (voxels),” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 702–710.
  • Jaisakthi et al. [2017] S. Jaisakthi, A. Chandrabose, and P. Mirunalini, “Automatic skin lesion segmentation using semi-supervised learning technique,” arXiv preprint, 2017.
  • Bai et al. [2017] W. Bai, O. Oktay, M. Sinclair et al., “Semi-supervised learning for network-based cardiac mr image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 253–260.
  • Nie et al. [2018b] D. Nie, Y. Gao, L. Wang, and D. Shen, “Asdnet: Attention based semi-supervised deep networks for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2018, pp. 370–378.
  • Perone and Cohen-Adad [2018] C. S. Perone and J. Cohen-Adad, “Deep semi-supervised segmentation with weight-averaged consistency targets,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support.    Springer, 2018, pp. 12–19.
  • Ganaye et al. [2018] P.-A. Ganaye, M. Sdika, and H. Benoit-Cattin, “Semi-supervised learning for segmentation under semantic constraint,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2018, pp. 595–602.
  • Laine and Aila [2016] S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” arXiv preprint, 2016.
  • Worrall et al. [2017] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, “Harmonic networks: Deep translation and rotation equivariance,” in Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, 2017.
  • Li et al. [2018a] X. Li, L. Yu, H. Chen, C.-W. Fu, and P.-A. Heng, “Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model,” British Machine Vision Conference, 2018.
  • Zhang et al. [2017] Y. Zhang, L. Yang, J. Chen, M. Fredericksen, D. P. Hughes, and D. Z. Chen, “Deep adversarial networks for biomedical image segmentation utilizing unannotated images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 408–416.
  • Chartsias et al. [2018] A. Chartsias, T. Joyce, G. Papanastasiou, S. Semple, M. Williams, D. Newby, R. Dharmakumar, and S. A. Tsaftaris, “Factorised spatial representation learning: application in semi-supervised myocardial segmentation,” International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018.
  • Xia et al. [2018] Y. Xia, F. Liu, D. Yang, J. Cai, L. Yu, Z. Zhu, D. Xu, A. Yuille, and H. Roth, “3d semi-supervised learning with uncertainty-aware multi-view co-training,” arXiv preprint, 2018.
  • Cohen and Welling [2016] T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International conference on machine learning, 2016, pp. 2990–2999.
  • Dieleman et al. [2016] S. Dieleman, J. D. Fauw, and K. Kavukcuoglu, “Exploiting cyclic symmetry in convolutional neural networks,” in International conference on machine learning, 2016, pp. 1889–1898.
  • Emre Celebi et al. [2013] M. Emre Celebi, Q. Wen, S. Hwang, H. Iyatomi, and G. Schaefer, “Lesion border detection in dermoscopy images using ensembles of thresholding methods,” Skin Research and Technology, vol. 19, no. 1, 2013.
  • Heimann and Meinzer [2009] T. Heimann and H.-P. Meinzer, “Statistical shape models for 3d medical image segmentation: a review,” Medical image analysis, vol. 13, no. 4, pp. 543–563, 2009.
  • He and Xie [2012] Y. He and F. Xie, “Automatic skin lesion segmentation based on texture analysis and supervised learning,” in ACCV.    Springer, 2012, pp. 330–341.
  • Sadri et al. [2013] A. R. Sadri, M. Zekri, S. Sadri, N. Gheissari, M. Mokhtari, and F. Kolahdouzan, “Segmentation of dermoscopy images using wavelet networks,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 4, pp. 1134–1141, 2013.
  • Cheng et al. [2013] J. Cheng, J. Liu, Y. Xu, F. Yin, D. W. K. Wong, N.-M. Tan, D. Tao, C.-Y. Cheng, T. Aung, and T. Y. Wong, “Superpixel classification based optic disc and optic cup segmentation for glaucoma screening,” IEEE transactions on medical imaging, vol. 32, no. 6, pp. 1019–1032, 2013.
  • Abramoff et al. [2007] M. D. Abramoff, W. L. Alward, E. C. Greenlee, L. Shuba, C. Y. Kim, J. H. Fingert, and Y. H. Kwon, “Automated segmentation of the optic disc from stereo color photographs using physiologically plausible features,” Investigative ophthalmology & visual science, vol. 48, no. 4, pp. 1665–1673, 2007.
  • Chlebus et al. [2018] G. Chlebus, A. Schenk, J. H. Moltz, B. van Ginneken, H. K. Hahn, and H. Meine, “Automatic liver tumor segmentation in ct with fully convolutional neural networks and object-based postprocessing,” Scientific reports, vol. 8, no. 1, p. 15497, 2018.
  • Çiçek et al. [2016] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2016, pp. 424–432.
  • Ronneberger et al. [2015] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2015, pp. 234–241.
  • Milletari et al. [2016] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 3D Vision (3DV), 2016 Fourth International Conference on.    IEEE, 2016, pp. 565–571.
  • Yu et al. [2017] L. Yu, H. Chen, Q. Dou, J. Qin, and P.-A. Heng, “Automated melanoma recognition in dermoscopy images via very deep residual networks,” IEEE transactions on medical imaging, vol. 36, no. 4, pp. 994–1004, 2017.
  • Fu et al. [2018] H. Fu, J. Cheng, Y. Xu, C. Zhang, D. W. K. Wong, J. Liu, and X. Cao, “Disc-aware ensemble network for glaucoma screening from fundus image,” IEEE transactions on medical imaging, 2018.
  • Tan et al. [2017] J. H. Tan, U. R. Acharya, S. V. Bhandary, K. C. Chua, and S. Sivaprasad, “Segmentation of optic disc, fovea and retinal vasculature using a single convolutional neural network,” Journal of Computational Science, vol. 20, pp. 70–79, 2017.
  • Lu et al. [2017] F. Lu, F. Wu, P. Hu, Z. Peng, and D. Kong, “Automatic 3d liver location and segmentation via convolutional neural network and graph cut,” International journal of computer assisted radiology and surgery, vol. 12, no. 2, pp. 171–182, 2017.
  • Yang et al. [2017] D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, and D. Comaniciu, “Automatic liver segmentation using an adversarial image-to-image network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.    Springer, 2017, pp. 507–515.
  • Yuan et al. [2017] Y. Yuan, M. Chao, and Y.-C. Lo, “Automatic skin lesion segmentation using deep fully convolutional networks with jaccard distance,” IEEE transactions on medical imaging, vol. 36, no. 9, pp. 1876–1886, 2017.
  • Li et al. [2018b] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-denseunet: Hybrid densely connected unet for liver and tumor segmentation from ct volumes,” IEEE transactions on medical imaging, 2018.
  • Sajjadi et al. [2016] M. Sajjadi, M. Javanmardi, and T. Tasdizen, “Regularization with stochastic transformations and perturbations for deep semi-supervised learning,” in Advances in neural information processing systems, 2016, pp. 1163–1171.
  • Huang et al. [2017] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks.” in Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 1, no. 2, 2017, p. 3.
  • Chollet [2015] F. Chollet, “Keras,” 2015.
  • Codella et al. [2018] N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler et al., “Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic),” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on.    IEEE, 2018, pp. 168–172.
  • Bilic et al. [2019] P. Bilic, P. F. Christ, E. Vorontsov, G. Chlebus, H. Chen, Q. Dou, C.-W. Fu, X. Han, P.-A. Heng, J. Hesser et al., “The liver tumor segmentation benchmark (lits),” arXiv preprint arXiv:1901.04056, 2019.
  • Yuan and Lo [2017] Y. Yuan and Y.-C. Lo, “Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks,” IEEE journal of biomedical and health informatics, 2017.
  • Venkatesh et al. [2018] G. Venkatesh, Y. Naresh, S. Little, and N. E. O’Connor, “A deep residual architecture for skin lesion segmentation,” in OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis.    Springer, 2018, pp. 277–284.
  • Berseth [2017] M. Berseth, “Isic 2017-skin lesion analysis towards melanoma detection,” arXiv preprint, 2017.
  • Bi et al. [2017] L. Bi, J. Kim, E. Ahn, and D. Feng, “Automatic skin lesion analysis using large-scale dermoscopy images and deep residual networks,” arXiv preprint, 2017.
  • Hung et al. [2018] W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, and M.-H. Yang, “Adversarial learning for semi-supervised semantic segmentation,” British Machine Vision Conference, 2018.
  • Oliver et al. [2018] A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. J. Goodfellow, “Realistic evaluation of deep semi-supervised learning algorithms,” Advances in neural information processing systems, 2018.