Train, Learn, Expand, Repeat

03/18/2020 ∙ by Abhijeet Parida, et al. ∙ Technische Universität München

High-quality labeled data is essential to successfully train supervised machine learning models. Although a large amount of unlabeled data is present in the medical domain, labeling poses a major challenge: medical professionals who can expertly label the data are a scarce and expensive resource. Making matters worse, voxel-wise delineation of data (e.g. for segmentation tasks) is tedious and suffers from high inter-rater variance, thus dramatically limiting available training data. We propose a recursive training strategy to perform the task of semantic segmentation given only very few training samples with pixel-level annotations. We expand on this small training set having cheaper image-level annotations using a recursive training strategy. We apply this technique on the segmentation of intracranial hemorrhage (ICH) in CT (computed tomography) scans of the brain, where typically few annotated data is available.




1 Introduction

Deep artificial neural networks such as convolutional neural networks (CNNs) are currently the state of the art for semantic segmentation of both natural and medical images (Dolz et al., 2018). This is achieved by leveraging the information from large, well-labeled datasets with ground truth annotations. However, in the medical domain, large delineated datasets of high quality are difficult to generate. Bø et al. (2017) explain that while the annotation of medical images is done by highly specialized physicians, the resulting segmentation is still very prone to inter/intra-observer variability. Algorithms that can learn from unlabeled or weakly labeled training data are therefore essential in the medical domain to leverage the vast amount of already available unlabeled data.

Datasets equipped with image-level labels are much more readily available and, where absent, are easier and faster to create (Lin et al., 2014). We therefore propose a recursive method that transfers knowledge obtained from a small, fully supervised segmented dataset to obtain segmentations for a larger weakly supervised dataset.

Our main contributions are:

  1. We present a semi-supervised recursive learning strategy that transfers knowledge gained from a small segmented dataset to a larger weakly labeled dataset.

  2. We apply the algorithm to the task of semantic segmentation of intracranial hemorrhage in brain CT scans and evaluate the results on several datasets.

2 Related Work

Accurate annotation of medical imaging data is pivotal for the successful training of supervised algorithms. It is very difficult to find datasets which are both large enough to train robust models and of sufficiently high annotation quality. To mitigate this problem, several strategies have been explored to reduce the need for such datasets.

One strategy to reduce the amount of training data needed is transfer learning: Pan and Yang (2010) describe transfer learning as a technique which relays knowledge acquired from a domain with rich data availability to a domain with low data availability. Transfer learning techniques can be applied both across domains and across tasks.

Another approach is semi-supervised learning, where algorithms are trained on a dataset that contains a small amount of labeled data and a large portion of unlabeled examples. One popular semi-supervised technique is self-training (Cheplygina et al., 2018). The idea in self-training is to propagate predictions from the small labeled set to the large unlabeled set and subsequently use the newly created labeled set for training. Zhu and Goldberg (2009) point out that this approach assumes that the method's high-confidence predictions are correct and can be used further. As seen in Su et al. (2015), an active verification step, in which a human is queried to verify some of the labels, can be deployed to avoid error propagation.
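The core of such a self-training step can be sketched as a confidence-based pseudo-labeling function; the threshold value and array layout here are illustrative assumptions, not taken from the cited works:

```python
import numpy as np

def pseudo_label(probabilities, threshold=0.9):
    """Assign pseudo-labels to unlabeled samples.

    probabilities: (n_samples, n_classes) predicted class probabilities.
    Returns the argmax class per sample, or -1 where the model's
    confidence falls below the threshold (sample stays unlabeled).
    """
    confidence = probabilities.max(axis=1)
    labels = probabilities.argmax(axis=1)
    labels[confidence < threshold] = -1
    return labels
```

Samples marked -1 would be revisited in a later round, once the retrained model has become more confident on them.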

In situations where an imprecise or inaccurate annotation of the data is much cheaper to obtain than an accurate one, weakly supervised learning can be taken into consideration. Khoreva et al. (2017) argue that a large number of noisy annotations should convey enough information to perform the task with reasonable accuracy while reducing the burden on the data annotator. The aim is thus to replace time-consuming annotation procedures, paving the way for labeling potentially much larger datasets. Yang et al. (2018) have shown that it is possible to successfully learn a segmentation task in the medical domain by using axis-aligned bounding boxes derived from 6 points. Instead of full pixel-wise segmentation labels, less time-consuming labeling techniques include bounding boxes (Khoreva et al., 2017; Yang et al., 2018), scribbles (Lin et al., 2016), or points (Rakelly et al., 2018).
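To illustrate why box annotations are so cheap to use, rasterizing a bounding box into a coarse weak-label mask is a one-liner; the coordinate convention below (x0, y0, x1, y1 with exclusive upper bounds) is our own illustrative choice:

```python
import numpy as np

def box_to_mask(shape, box):
    """Rasterize an axis-aligned bounding box (x0, y0, x1, y1),
    given in pixel coordinates with exclusive upper bounds, into a
    binary weak-label mask of the given (height, width) shape."""
    mask = np.zeros(shape, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask
```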

3 Proposed Method

Figure 1: The flowchart depicts the schema of the proposed three-stage training strategy. The first stage, marked in red, shows the supervised learning on a small but fully segmented dataset. In the second stage, marked in blue, samples with good segmentations produced by the fully supervised model are actively selected to augment the dataset. Finally, in the third stage, shown in green, training continues recursively until the stopping criteria are reached.

To perform semantic segmentation in data regimes where only few segmented samples are available, we propose a new multi-stage semi-supervised training strategy, shown in Figure 1, supported by a human-in-the-loop. The idea combines the ability of CNNs to learn efficiently from few training samples with a recursive de-noising training strategy similar to that of Khoreva et al. (2017), and thus yields segmentations of high quality while requiring only few annotated input samples.

We consider a semi-supervised setting with two datasets: a small image dataset equipped with precise pixel-level segmentations, i.e. for each image, segmentations for the respective classes are available, and a second dataset for which only image-level class labels are available. Since full segmentations are more expensive to generate, we assume that the weakly labeled dataset is considerably larger than the segmented one.

In the first stage of training, a naive function approximator is trained on the segmented dataset to segment an image of width W and height H with respect to C classes, i.e. it maps each of the W × H pixels to one of the C classes. This naive baseline is a deep neural network which has been trained to overfit on the small segmented dataset. The learning is done by minimizing the multi-class cross-entropy loss

    L_CE = − (1 / (W · H)) Σ_{i=1}^{W·H} Σ_{c=1}^{C} y_{i,c} log(ŷ_{i,c})     (1)

with y_{i,c} indicating whether the i-th pixel of the ground-truth segmentation belongs to class c, and ŷ_{i,c} denoting the corresponding predicted probability.
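A minimal NumPy sketch of this per-pixel multi-class cross-entropy (the flattened array shapes and the `eps` clipping constant are our own illustrative choices):

```python
import numpy as np

def multiclass_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean per-pixel multi-class cross-entropy (Equation 1).

    y_true: one-hot ground truth, shape (n_pixels, n_classes)
    y_pred: predicted class probabilities, same shape
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```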

In the second stage of training, the trained model is used to create segmentations for the weakly labeled dataset; these serve as a cheap but inaccurate approximation to the unknown ground truth segmentation of each image, i.e., a weak label is generated. In the selection step shown in Figure 1, a human expert identifies those segmentations with a low number of false positives. The extent of the true positives is not taken into consideration at all, and the expert is not allowed to make any changes to the annotations. The selected subset of samples is then used to augment the original segmented dataset.

Before the selected samples are used to augment the original dataset, a post-processing step may be applied to the generated weak labels to refine them further. If it can be assumed that the generated labels are larger than the true but unknown ground-truth segmentations, the post-processing algorithm may be chosen such that it reduces the false positives. If, in contrast, it can be assumed that the generated weak labels are smaller than the unknown ground truth, it can be chosen such that it increases the true positives. Another criterion that the post-processing could enforce is objectness (Khoreva et al., 2017), to make disjoint objects continuous and to force existing image boundaries onto the respective segmentations.
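As an illustration, one simple post-processing step along these lines drops small connected components (reducing false positives) and optionally dilates the remainder (increasing true positives); the `min_size` and `grow` parameters are hypothetical defaults, not values from the paper:

```python
import numpy as np
from scipy import ndimage

def postprocess_weak_label(mask, min_size=20, grow=0):
    """Post-process a generated binary weak label.

    min_size: connected components smaller than this many pixels are
              dropped (reduces false positives, enforces objectness).
    grow:     number of binary-dilation steps applied afterwards
              (increases true positives when labels undershoot).
    """
    labeled, n = ndimage.label(mask)
    component_ids = np.arange(1, n + 1)
    sizes = ndimage.sum(mask, labeled, component_ids)
    keep = np.isin(labeled, component_ids[sizes >= min_size])
    if grow > 0:
        keep = ndimage.binary_dilation(keep, iterations=grow)
    return keep
```

Whether to shrink or grow is decided per the assumption above about how the weak labels deviate from the unknown ground truth.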

In the final stage, the network is recursively re-trained by minimizing Equation 1. The loss is computed between the segmentation generated in the previous recursion, used as ground truth, and the current recursion's prediction. After each recursion, newly segmented images from the weakly labeled dataset are added to the training data. For the images where true ground truth is available, the objective is to minimize Equation 1 together with the Dice loss (Jimenez-del-Toro et al., 2016) as a regularization. The main idea exploited by the recursion is that the noisy generated segmentations lead to better generalization on the inputs and produce robust predictions (Khoreva et al., 2017). The Dice loss allows for controlled growth/shrinkage of the segmentations learned by the network. This recursive process continues until the network no longer expands to new data.
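The overall three-stage strategy can be sketched as the following loop; `train_fn`, `predict_fn`, and `select_fn` are hypothetical placeholders standing in for the network training, inference, and human-in-the-loop selection steps, not code from the paper:

```python
def recursive_training(model, train_fn, predict_fn, select_fn,
                       labeled, weak_pool, max_rounds=10):
    """Recursive expansion loop: train on the labeled set, predict on
    the weakly labeled pool, move accepted samples into the labeled
    set, and repeat until no new samples are accepted."""
    for _ in range(max_rounds):
        model = train_fn(model, labeled)
        predictions = {k: predict_fn(model, img)
                       for k, img in weak_pool.items()}
        accepted = select_fn(predictions)  # human-in-the-loop selection
        if not accepted:
            break  # stopping criterion: the network no longer expands
        for k in accepted:
            labeled[k] = (weak_pool.pop(k), predictions[k])
    return model, labeled
```

Note that each accepted prediction becomes the ground truth for subsequent rounds, which is what drives the gradual expansion into the weakly labeled pool.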

4 Semantic Segmentation of Intracranial Hemorrhage

We choose the task of ICH segmentation in brain CT scans to validate the proposed approach. It satisfies all the criteria necessary for the application of the method presented in this paper: (1) it is difficult and time-consuming to create segmentation labels, (2) no large segmented datasets are publicly available, and (3) large datasets with image-level class labels are available.

4.1 Datasets

The datasets we make use of are the PhysioNet dataset (Hssayeni et al., 2019), which serves as the dataset with full pixel-level annotations, and the dataset provided in the RSNA Intracranial Hemorrhage Detection challenge. From the RSNA dataset, we sample images in which only one class is present to form our weakly labeled subset. Apart from these, we use an in-house dataset of brain CT scans with ICH (we have obtained a positive ethical vote from the ethical committee of the Technical University of Munich to use the data for research purposes, 344/19-SR) and the CQ 500 dataset (Chilamkurthy et al., 2018) for benchmarking the performance of the recursive strategy. Each dataset comprises several bleed classes, whose distribution can be seen in Table 2 in Appendix A.

4.2 Setup and Experiments

For the ICH segmentation task, we choose the UNet (Ronneberger et al., 2015) as the segmentation network and the Felzenszwalb-Huttenlocher algorithm (Felzenszwalb and Huttenlocher, 2004) as the post-processing approximator to refine the labels produced by the UNet.

Initially, the UNet was trained on the PhysioNet dataset for 120 epochs before running inference on the RSNA dataset. We then manually selected good segmentations from the predictions to use as ground truth for the recursive training of the UNet. In each recursion, the UNet was trained for three epochs, with the results from each recursion used as the ground truth for the next.

To test the models, a radiologist was tasked with labeling 20 pathological volumes from the CQ 500 dataset (Chilamkurthy et al., 2018). Additionally, the models were tested on an in-house dataset containing various ICHs. We evaluate the models' performance using the Dice coefficient and Intersection-over-Union (IoU) scores (Jimenez-del-Toro et al., 2016) on both datasets.
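For reference, both evaluation metrics can be computed for binary masks as follows (a standard formulation, not code from the paper):

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice coefficient and Intersection over Union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / union
    return dice, iou
```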

4.3 Results

Test results on the two unseen datasets are provided in Table 1. We observe that the recursive strategy provides a significant improvement in performance on our internal dataset. There is a slight decrease in performance on the fully supervised dataset, which we assume to be due to the increased generalization capacity of the recursively trained model. Put differently, minor details specific to the PhysioNet dataset are ignored while the large trends are maintained.

Table 1: The segmentation results after performing the recursive training strategy across datasets (CQ 500 and others), reporting Dice, IoU, Precision, and Recall per dataset, model, and recursion, together with the median change. (The numerical values were not preserved in this extraction.)

The per-patient performance on CQ 500 is shown in the boxplots in Figure 2. The increase in performance after applying the recursive training strategy is lower than on our internal dataset, owing to the noisy nature of the scans in this dataset. We expect that training directly on noisy input images could improve performance globally and further increase the model's generalizability.

Figure 2: Boxplots of the segmentation metrics on CQ 500. The complete boxplots for all datasets are available in Figure 8 in Appendix A.

Overall, two trends are seen in the performance of the model before and after applying the recursive training strategy. First, the recursively trained model can identify new regions that would not have been segmented without recursion (Figure 7 in Appendix A). Second, with recursion, the segmentation quality of already identified regions improves (Figure 3 and Figures 4 and 6 in Appendix A).

Figure 3: Predicted segmentation masks from the model on CQ 500. The green regions correspond to the ground truth, the orange regions show segmentations without the recursive strategy, and the red regions show segmentations after recursion. More samples from CQ 500 can be seen in Appendix A.


Figure 4: Predicted results for a slice. The left image (green) is the ground truth, the middle image (orange) is the output without recursive training, and the right image (red), with recursive training, shows the model's capacity to identify new regions and expand from seed points.

5 Conclusion

We propose a new training scheme for segmentation tasks in situations where only little ground truth data is available. Our results indicate that starting with a small labeled dataset and recursively expanding into a new dataset without pixel-level annotations can improve the network's ability to generalize across different datasets. We demonstrated this on the task of intracranial hemorrhage segmentation, using the PhysioNet dataset as the initial labeled dataset and the RSNA ICH dataset as the unlabeled expansion dataset.

Especially in the medical sector, there are many weakly or semi-supervised data scenarios where our proposed training scheme can be applied. The recursive training scheme, which combines semi-supervised and transfer learning with a human-in-the-loop approach, is general enough that we believe other medical use cases and tasks beyond ICH segmentation could benefit from it.


  • H. K. Bø, O. Solheim, A. S. Jakola, K. Kvistad, I. Reinertsen, and E. M. Berntsen (2017) Intra-rater variability in low-grade glioma segmentation. Journal of Neuro-Oncology 131 (2), pp. 393–402.
  • V. Cheplygina, M. de Bruijne, and J. P. W. Pluim (2018) Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. arXiv:1804.06353.
  • S. Chilamkurthy, R. Ghosh, S. Tanamala, M. Biviji, N. G. Campeau, V. K. Venugopal, V. Mahajan, P. Rao, and P. Warier (2018) Development and validation of deep learning algorithms for detection of critical findings in head CT scans. arXiv:1803.05854.
  • J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, and I. B. Ayed (2018) HyperDense-Net: a hyper-densely connected CNN for multi-modal image segmentation. IEEE Transactions on Medical Imaging 38 (5), pp. 1116–1126.
  • P. F. Felzenszwalb and D. P. Huttenlocher (2004) Efficient graph-based image segmentation. International Journal of Computer Vision 59 (2), pp. 167–181.
  • M. D. Hssayeni, M. S. Croock, A. Al-Ani, H. Falah Al-khafaji, Z. A. Yahya, and B. Ghoraani (2019) Intracranial hemorrhage segmentation using a deep convolutional model. arXiv:1910.08643.
  • O. Jimenez-del-Toro, H. Müller, M. Krenn, K. Gruenberg, A. A. Taha, M. Winterstein, I. Eggel, Weber, Y. Dicente Cid, T. Gass, M. Heinrich, F. Jia, F. Kahl, R. Kechichian, D. Mai, A. B. Spanier, G. Vincent, C. Wang, D. Wyeth, and A. Hanbury (2016) Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: VISCERAL anatomy benchmarks. IEEE Transactions on Medical Imaging 35 (11), pp. 2459–2475.
  • A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele (2017) Simple does it: weakly supervised instance and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 876–885.
  • D. Lin, J. Dai, J. Jia, K. He, and J. Sun (2016) ScribbleSup: scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3159–3167.
  • T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: common objects in context.
  • S. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10).
  • K. Rakelly, E. Shelhamer, T. Darrell, A. A. Efros, and S. Levine (2018) Meta-learning to guide segmentation.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
  • H. Su, Z. Yin, S. Huh, T. Kanade, and J. Zhu (2015) Interactive cell segmentation based on active and semi-supervised learning. IEEE Transactions on Medical Imaging 35 (3), pp. 762–777.
  • L. Yang, Y. Zhang, Z. Zhao, H. Zheng, P. Liang, M. T. Ying, A. T. Ahuja, and D. Z. Chen (2018) BoxNet: deep learning based biomedical image segmentation using boxes only annotation. arXiv:1806.00593.
  • X. Zhu and A. B. Goldberg (2009) Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3 (1), pp. 1–130.

Appendix A Appendix

Dataset Distribution

PhysioNet is a small dataset with full pixel-level segmentations for each of the bleed classes. RSNA ICH is a very large dataset with image-level labels for the bleed classes in the images. We choose the training slices in which only one type of bleeding is apparent. We then subsample the dataset to obtain an equal occurrence of each class. For CQ 500, we consider a random subselection of 20 patients, which have been manually annotated by a neuro-radiologist.

Table 2: The distribution of the number of CT slices in each of the datasets (PhysioNet, RSNA, CQ 500, in-house) used for all the experiments, broken down by type of bleed (including intraparenchymal, intraventricular, subarachnoid, and subdural), as well as bleed, no-bleed, and total slice counts. (The numerical values were not preserved in this extraction.)


Additional Visualizations

Figure 5: Some slices of CQ 500 where a decrease in the Dice score is seen. The green regions correspond to the ground truth, the orange regions show segmentations without the recursive strategy, and the red regions show segmentations after recursion.
Figure 6: Segmentation results on a few slices of CQ 500. The green regions correspond to the ground truth, the orange regions show segmentations without the recursive strategy, and the red regions show segmentations after recursion.
Figure 7: Segmentation results on a few slices of CQ 500 where the recursion helps identify new regions for segmentation that were not identified without the recursive strategy. The green regions correspond to the ground truth, the orange regions show segmentations without the recursive strategy, and the red regions show segmentations after recursion.


This section shows the detailed comparison of the boxplots for all available datasets.

Figure 8: Boxplot for segmentation metrics for all available datasets.